Queries

Query your documents and annotations with SQL.

Ragnerock provides a SQL query interface that lets you run SELECT queries against virtual tables representing your documents, chunks, pages, and annotation data. All queries are automatically scoped to your project and executed in a read-only, security-hardened environment. This page covers the Query Explorer UI, available tables, supported SQL syntax, and security model.

Using the Query Explorer

The Query Explorer is the primary way to run SQL queries. Open it from the Queries section in the sidebar, or add a Query cell in a notebook.

SQL Mode

The Query Explorer provides a full-featured SQL editor powered by Monaco (the same editor used in VS Code). Write your query, then click Execute or press Shift+Enter to run it.

Results appear in a table below the editor with:

  • Sortable columns: Click any column header to sort ascending or descending
  • Filtering: Add conditions to filter rows (equals, contains, greater than, etc.)
  • Grouping: Group rows by any column with collapsible sections
  • Row limit: Choose how many rows to display (100, 500, 1,000, 2,500, or 5,000)
  • Pagination: Navigate through large result sets
  • Export: Download results for use in other tools

The Query Explorer showing a SQL query in the editor, with results displayed in a table below

AI-Assisted Mode

Switch to AI-assisted mode to describe what you want in natural language. The system generates the appropriate SQL query, shows you the generated SQL with an explanation, and executes it.

For example, type: “Show me all documents with negative sentiment and their confidence scores, sorted by confidence”

The generated SQL appears in the editor where you can review or modify it before executing.

The AI-assisted query mode showing a natural language input and the generated SQL query with explanation

Saving and History

  • Save queries: Click Save to name and store frequently used queries. Access saved queries from the saved queries popover.
  • Query history: Recent queries are automatically tracked with timestamps, so you can quickly re-run previous analyses.

Virtual Tables

Every project has three built-in virtual tables. All queries are automatically filtered to your current project, and soft-deleted records are excluded.

documents

The documents table contains metadata for every uploaded document in your project.

ColumnTypeDescription
idUUIDDocument identifier
nameTEXTDocument filename
file_typeTEXTFile format (pdf, docx, csv, etc.)
filesizeINTEGERFile size in bytes
created_atTIMESTAMPUpload timestamp
updated_atTIMESTAMPLast modification timestamp

You can also reference this table as document. Both forms are accepted.

SELECT name, file_type, filesize
FROM documents
WHERE file_type = 'pdf'
ORDER BY created_at DESC
LIMIT 10

chunks

The chunks table contains text segments extracted from documents during ingestion. Each chunk represents a paragraph or sentence, depending on how the document was processed.

ColumnTypeDescription
idUUIDChunk identifier
document_idUUIDParent document
contentTEXTChunk text content
document_offsetINTEGERPosition within the document (0-indexed)
chunk_typeENUMPARAGRAPH or SENTENCE
start_char_idxINTEGERStart character offset in the source document
end_char_idxINTEGEREnd character offset in the source document

Table aliases: chunks, document_chunks, documentchunk.

The chunk_type column accepts both lowercase and uppercase values. 'paragraph' is automatically converted to 'PARAGRAPH'.

pages

The pages table contains page-level content for documents that have distinct pages (primarily PDFs).

ColumnTypeDescription
idUUIDPage identifier
document_idUUIDParent document
page_numberINTEGERPage number (0-indexed)
contentTEXTFull page text
created_atTIMESTAMPExtraction timestamp

Table aliases: pages, document_pages, documentpage.

Annotation Tables

Each operator you create in your project automatically generates a queryable virtual table. The system reads your operator’s JSON Schema and builds the table dynamically. No schema migration is needed.

Table Naming

Operator names are converted to SQL-friendly table names by lowercasing and replacing spaces and hyphens with underscores.

Operator NameSQL Table Name
Sentiment Analysissentiment_analysis
Content Classificationcontent_classification
Risk-Assessmentrisk_assessment
Entity Extractionentity_extraction

You can use either the original operator name or the SQL-friendly name in your queries.

Standard Columns

Every annotation table includes these columns alongside the fields defined in the operator’s JSON Schema:

ColumnTypeDescription
annotation_idUUIDUnique annotation identifier
document_idUUIDSource document
chunk_idUUIDSource chunk (for paragraph- or sentence-scoped operators)
page_idUUIDSource page (for page-scoped operators)
confidence_scoreNUMERICModel confidence score
created_atTIMESTAMPWhen the annotation was created

Which ID columns are populated depends on the operator’s scope. A document-scoped operator populates document_id; a paragraph-scoped operator populates both chunk_id and document_id.

Type Casting

Your operator’s JSON Schema types are automatically cast to SQL types:

JSON Schema TypeSQL Type
stringTEXT
numberNUMERIC
integerBIGINT
booleanBOOLEAN
arrayJSONB
objectJSONB

Nullable types like ["string", "null"] are resolved to the non-null type. Array and object fields are reconstructed into proper JSON values. You can select them, but filtering on individual elements within arrays or objects is not supported.

Tabular Document Tables

CSV and Excel files uploaded to your project automatically become queryable as virtual tables, with columns typed according to the detected schema.

CSV Files

The table name is derived from the filename: lowercase, strip the .csv extension, and replace non-alphanumeric characters with underscores.

FilenameSQL Table Name
Q3 Financials.csvq3_financials
holdings-2024.csvholdings_2024

Each table includes row_id, document_id, and row_index columns in addition to the data columns from the file.

Excel Files

Multi-sheet Excel files generate one table per sheet using the convention {document}__{sheet} (double underscore separator).

FileSheetSQL Table Name
portfolio.xlsxHoldingsportfolio__holdings
portfolio.xlsxTradesportfolio__trades

Naming Constraints

Tabular document table names cannot collide with operator names, built-in tables (documents, chunks, pages), or SQL reserved words. If a collision occurs, rename the document or query the data through the built-in tabular_rows table with a document_id filter.

Supported SQL

SELECT

Standard column selection with aliases:

SELECT
    d.name AS document_name,
    sa.overall_sentiment,
    sa.confidence_score
FROM sentiment_analysis sa
JOIN documents d ON sa.document_id = d.id

All whitelisted functions can be used in the SELECT clause.

FROM and JOIN

All standard join types are supported:

Join TypeSyntax
Inner joinJOIN or INNER JOIN
Left joinLEFT JOIN
Right joinRIGHT JOIN
Full outer joinFULL JOIN
Cross joinCROSS JOIN
SELECT d.name, c.content, c.chunk_type
FROM documents d
JOIN chunks c ON d.id = c.document_id
WHERE d.name LIKE '%earnings%'
ORDER BY c.document_offset

WHERE

OperatorExample
=, !=WHERE file_type = 'pdf'
<, <=, >, >=WHERE confidence_score >= 0.8
LIKE, ILIKEWHERE name ILIKE '%annual report%'
INWHERE risk_level IN ('high', 'critical')
BETWEENWHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'
IS NULL, IS NOT NULLWHERE revenue IS NOT NULL
AND, OR, NOTWHERE score > 0.5 AND category = 'positive'

GROUP BY and HAVING

Standard aggregation with GROUP BY. Use HAVING to filter on aggregated values.

SELECT risk_category, COUNT(*) AS count
FROM risk_classification
GROUP BY risk_category
HAVING COUNT(*) > 5
ORDER BY count DESC

ORDER BY and LIMIT

ORDER BY supports ASC (default) and DESC. Use LIMIT to cap the number of rows returned.

SELECT name, filesize
FROM documents
ORDER BY filesize DESC
LIMIT 25

Functions

Aggregate Functions

FunctionDescriptionExample
COUNT(*)Row countCOUNT(*) AS total
COUNT(DISTINCT col)Distinct countCOUNT(DISTINCT document_id)
SUM(col)SumSUM(revenue)
AVG(col)AverageAVG(sentiment_score)
MIN(col)MinimumMIN(created_at)
MAX(col)MaximumMAX(confidence_score)

String Functions

FunctionDescription
LOWER(col)Convert to lowercase
UPPER(col)Convert to uppercase
LENGTH(col)String length

Date Functions

FunctionDescription
DATE(col)Extract date from a timestamp
CURRENT_DATECurrent date
CURRENT_TIMESTAMPCurrent timestamp

Type and Conditional Functions

FunctionDescription
CAST(expr AS type)Type conversion
COALESCE(a, b, ...)First non-null value
CASE WHEN ... THEN ... ELSE ... ENDConditional logic

CASE is useful for mapping annotation values to custom categories:

SELECT
    d.name,
    CASE
        WHEN sa.overall_sentiment IN ('positive', 'very_positive') THEN 'bullish'
        WHEN sa.overall_sentiment IN ('negative', 'very_negative') THEN 'bearish'
        ELSE 'neutral'
    END AS signal
FROM sentiment_analysis sa
JOIN documents d ON sa.document_id = d.id

Unsupported Operations

The query system uses a whitelist-based security model. Only explicitly allowed operations are permitted. The following are not supported:

OperationReason
SubqueriesNot supported for security reasons
UNION / INTERSECT / EXCEPTSet operations are not supported
WITH (CTEs)Reserved for internal use by the query engine
Window functions (OVER, PARTITION BY)Not currently supported
INSERT, UPDATE, DELETEAll queries are read-only
CREATE, ALTER, DROPSchema modification is not allowed
SQL comments (--, /* */)Stripped for security

If you need capabilities beyond what the query interface supports, export your query results to a notebook. In a Ragnerock notebook, add a Query cell with your SQL. The results become a DataFrame artifact you can import into a code cell and work with using pandas, or connect to your JupyterLab environment for analysis with your full local toolkit.

Project Scoping and Security

Automatic Project Filtering

All queries are automatically scoped to your current project. You never need to add WHERE project_id = .... The system injects this constraint into every query before execution. Queries against chunks and pages include an existence check ensuring the parent document belongs to your project.

Read-Only Enforcement

Only SELECT statements are permitted. Any attempt to run data modification (INSERT, UPDATE, DELETE) or schema modification (CREATE, ALTER, DROP) statements returns a FORBIDDEN_OPERATION error.

Query Validation

You can validate a query before executing it. The validation endpoint checks for syntax errors, unknown tables, unknown columns, and forbidden operations, returning structured error messages with suggestions for fixing common issues.

Limits and Constraints

ConstraintValueDescription
Default result rows1,000Default maximum rows returned per query
Maximum result rows500,000Hard cap on rows returned
Query timeout300 secondsMaximum execution time
Rate limit (execute)30 per minuteQuery executions per user
Rate limit (validate)10 per minuteValidation requests per user

You can set the row limit in the Query Explorer toolbar or by adding a LIMIT clause to your query. Queries that exceed the timeout return a TIMEOUT error.

Example Queries

These examples progress from simple lookups to multi-table joins with aggregation.

Simple Document Lookup

List recent PDF documents:

SELECT name, filesize, created_at
FROM documents
WHERE file_type = 'pdf'
ORDER BY created_at DESC
LIMIT 10

Document and Chunk Join

Find paragraphs in earnings-related documents:

SELECT d.name, c.content, c.chunk_type
FROM documents d
JOIN chunks c ON d.id = c.document_id
WHERE c.chunk_type = 'paragraph'
  AND d.name LIKE '%earnings%'
ORDER BY c.document_offset

Annotation Query with Filtering

Find documents with negative sentiment and high confidence:

SELECT
    d.name AS document_name,
    sa.overall_sentiment,
    sa.confidence_score
FROM sentiment_analysis sa
JOIN documents d ON sa.document_id = d.id
WHERE sa.overall_sentiment IN ('negative', 'very_negative')
  AND sa.confidence_score > 0.8
ORDER BY sa.confidence_score DESC

Multi-Table Join with Aggregation

Aggregate annotation statistics per document:

SELECT
    d.name AS document_name,
    COUNT(*) AS annotation_count,
    AVG(sa.confidence_score) AS avg_confidence,
    MIN(sa.confidence_score) AS min_confidence,
    MAX(sa.confidence_score) AS max_confidence
FROM documents d
JOIN sentiment_analysis sa ON d.id = sa.document_id
WHERE d.file_type = 'pdf'
GROUP BY d.name
ORDER BY avg_confidence DESC
LIMIT 20

GROUP BY with HAVING

Find frequently mentioned entities:

SELECT
    name,
    type,
    COUNT(*) AS mentions
FROM entity_extraction
GROUP BY name, type
HAVING COUNT(*) >= 3
ORDER BY mentions DESC
LIMIT 50

Next Steps

  • Operators: Design the operators that become queryable tables
  • JupyterLab Integration: Import query results into your local Jupyter environment for analysis