Advanced Query Patterns

SQL recipes for common analysis tasks, from simple lookups to multi-table annotation aggregations.

This tutorial is a cookbook of SQL query patterns for Ragnerock, organized from simple to complex. Each recipe is self-contained; jump to whatever pattern fits your current task. For a full reference on available tables and SQL syntax, see Queries.

Setup

Open the Query Explorer from the sidebar. All SQL examples in this tutorial can be run directly in the Query Explorer’s editor. Type or paste the query and press Shift+Enter to execute.

Queries can also be executed as Query cells in the notebook interface, where results become DataFrame artifacts you can import into code cells. Standalone queries in the Query Explorer can be saved for reuse. The examples assume the following operators exist in your project (your actual operator names will differ):

OperatorDescriptionKey Fields
sentiment_analysisChunk-level sentiment scoringoverall_sentiment (text), sentiment_score (number)
entity_extractionNamed entity recognitionname (text), type (text), relevance (number)
financial_metricsFinancial data extractionrevenue (number), net_income (number), period (text)
risk_classificationRisk category labelingrisk_level (text), risk_factors (text)

Document Queries

List all PDF documents

SELECT name, filesize, created_at
FROM documents
WHERE file_type = 'pdf'
ORDER BY created_at DESC
namefilesizecreated_at
Q4_2024_earnings.pdf24583012024-12-15 09:23:00
annual_report_2024.pdf51200442024-11-30 14:05:00

Search documents by name

Use ILIKE for case-insensitive pattern matching:

SELECT name, file_type, created_at
FROM documents
WHERE name ILIKE '%earnings%'
ORDER BY name

Chunk Queries

Find paragraphs in a specific document

SELECT d.name, c.content, c.document_offset
FROM documents d
JOIN chunks c ON d.id = c.document_id
WHERE d.name ILIKE '%10-K%'
  AND c.chunk_type = 'paragraph'
ORDER BY c.document_offset

The chunk_type column accepts both 'paragraph' and 'PARAGRAPH'. The system normalizes automatically.

Search chunk content

Find chunks that mention a specific topic:

SELECT d.name, c.content
FROM documents d
JOIN chunks c ON d.id = c.document_id
WHERE c.content ILIKE '%supply chain%'
LIMIT 20

Page Queries

Read specific pages from a document

SELECT d.name, p.page_number, p.content
FROM pages p
JOIN documents d ON p.document_id = d.id
WHERE d.name ILIKE '%annual_report%'
  AND p.page_number BETWEEN 0 AND 4
ORDER BY p.page_number

Page numbers are 0-indexed, so BETWEEN 0 AND 4 returns the first five pages.

Count pages per document

SELECT d.name, COUNT(*) AS page_count
FROM documents d
JOIN pages p ON d.id = p.document_id
GROUP BY d.name
ORDER BY page_count DESC

Querying a Single Annotation Table

Retrieve annotations with filtering

Each operator you create becomes a queryable table. Query it directly by its name:

SELECT overall_sentiment, sentiment_score, confidence_score
FROM sentiment_analysis
WHERE sentiment_score < -0.5
  AND confidence_score > 0.8
ORDER BY sentiment_score ASC
LIMIT 20

Aggregate annotation statistics

SELECT
    AVG(sentiment_score) AS avg_sentiment,
    MIN(sentiment_score) AS min_sentiment,
    MAX(sentiment_score) AS max_sentiment,
    COUNT(*) AS total_annotations
FROM sentiment_analysis
avg_sentimentmin_sentimentmax_sentimenttotal_annotations
0.12-0.930.984827

Joining Annotations with Document Metadata

Sentiment by document

Join an annotation table to documents to see which document each annotation belongs to:

SELECT
    d.name AS document_name,
    sa.overall_sentiment,
    sa.sentiment_score,
    sa.confidence_score
FROM sentiment_analysis sa
JOIN documents d ON sa.document_id = d.id
WHERE sa.confidence_score > 0.8
ORDER BY sa.sentiment_score ASC
LIMIT 20

Average sentiment per document

SELECT
    d.name AS document_name,
    AVG(sa.sentiment_score) AS avg_sentiment,
    COUNT(*) AS chunk_count
FROM documents d
JOIN sentiment_analysis sa ON d.id = sa.document_id
GROUP BY d.name
ORDER BY avg_sentiment ASC
document_nameavg_sentimentchunk_count
risk_disclosure_2024.pdf-0.7234
Q4_earnings_call.pdf-0.1589
annual_report_2024.pdf0.31156

Annotations with chunk content

Join through chunks to see the original text alongside annotation results:

SELECT
    d.name,
    c.content,
    sa.overall_sentiment,
    sa.sentiment_score
FROM sentiment_analysis sa
JOIN chunks c ON sa.chunk_id = c.id
JOIN documents d ON sa.document_id = d.id
WHERE sa.sentiment_score < -0.7
ORDER BY sa.sentiment_score ASC
LIMIT 10

Multi-Annotation Joins

Combine sentiment and entity data

Join two annotation tables through their shared chunk_id to correlate different analyses on the same text:

SELECT
    c.content,
    sa.overall_sentiment,
    sa.sentiment_score,
    ee.name AS entity_name,
    ee.type AS entity_type
FROM chunks c
JOIN sentiment_analysis sa ON c.id = sa.chunk_id
JOIN entity_extraction ee ON c.id = ee.chunk_id
WHERE sa.sentiment_score < -0.5
  AND ee.type = 'ORGANIZATION'
ORDER BY sa.sentiment_score ASC
LIMIT 20

This finds chunks where a specific organization is mentioned in a negative-sentiment context.

Sentiment and financial metrics together

Use LEFT JOIN when not every chunk has both annotation types. This avoids dropping chunks that have sentiment data but no financial metrics:

SELECT
    d.name,
    sa.overall_sentiment,
    sa.sentiment_score,
    fm.revenue,
    fm.net_income
FROM documents d
JOIN sentiment_analysis sa ON d.id = sa.document_id
LEFT JOIN financial_metrics fm ON sa.chunk_id = fm.chunk_id
WHERE fm.revenue IS NOT NULL
ORDER BY fm.revenue DESC
LIMIT 20

Aggregation Patterns

Count annotations per category

SELECT
    overall_sentiment,
    COUNT(*) AS count
FROM sentiment_analysis
GROUP BY overall_sentiment
ORDER BY count DESC
overall_sentimentcount
neutral2145
positive1389
negative876
very_positive312
very_negative105

Filter groups with HAVING

Find documents that have more than 50 annotations with low average confidence:

SELECT
    d.name,
    COUNT(*) AS annotation_count,
    AVG(sa.confidence_score) AS avg_confidence
FROM documents d
JOIN sentiment_analysis sa ON d.id = sa.document_id
GROUP BY d.name
HAVING COUNT(*) > 50
   AND AVG(sa.confidence_score) < 0.7
ORDER BY avg_confidence ASC

Entity frequency across documents

Find the most frequently mentioned entities:

SELECT
    ee.name AS entity,
    ee.type AS entity_type,
    COUNT(*) AS mentions,
    COUNT(DISTINCT ee.document_id) AS document_count
FROM entity_extraction ee
GROUP BY ee.name, ee.type
HAVING COUNT(*) >= 5
ORDER BY mentions DESC
LIMIT 25
entityentity_typementionsdocument_count
Federal ReserveORGANIZATION4712
SECORGANIZATION318
JPMorganORGANIZATION286

Date-Based Filtering

Documents uploaded in a date range

SELECT name, file_type, created_at
FROM documents
WHERE created_at BETWEEN '2024-10-01' AND '2024-12-31'
ORDER BY created_at DESC

Recent annotations

Use CURRENT_DATE to filter relative to today:

SELECT
    d.name,
    sa.overall_sentiment,
    sa.created_at
FROM sentiment_analysis sa
JOIN documents d ON sa.document_id = d.id
WHERE sa.created_at > CURRENT_DATE - INTERVAL '7 days'
ORDER BY sa.created_at DESC

Group by date

Use the DATE() function to aggregate annotations by day:

SELECT
    DATE(sa.created_at) AS annotation_date,
    COUNT(*) AS annotations_created
FROM sentiment_analysis sa
GROUP BY DATE(sa.created_at)
ORDER BY annotation_date DESC
LIMIT 14

Pattern Matching

Search annotation text fields without worrying about case:

SELECT
    d.name,
    rc.risk_level,
    rc.risk_factors
FROM risk_classification rc
JOIN documents d ON rc.document_id = d.id
WHERE rc.risk_factors ILIKE '%cybersecurity%'

IN for enum filtering

Filter on a set of known values:

SELECT
    d.name,
    rc.risk_level,
    rc.confidence_score
FROM risk_classification rc
JOIN documents d ON rc.document_id = d.id
WHERE rc.risk_level IN ('high', 'critical')
  AND rc.confidence_score > 0.8
ORDER BY rc.confidence_score DESC

Combine multiple filters

SELECT
    d.name,
    c.content,
    sa.sentiment_score,
    ee.name AS entity
FROM chunks c
JOIN documents d ON c.document_id = d.id
JOIN sentiment_analysis sa ON c.id = sa.chunk_id
JOIN entity_extraction ee ON c.id = ee.chunk_id
WHERE d.name ILIKE '%10-K%'
  AND ee.type IN ('PERSON', 'ORGANIZATION')
  AND sa.sentiment_score < -0.3
ORDER BY sa.sentiment_score ASC
LIMIT 20

CASE Expressions

Map annotation values to custom categories

Use CASE to create derived columns that map annotation values into your own classification scheme:

SELECT
    d.name,
    sa.sentiment_score,
    CASE
        WHEN sa.sentiment_score > 0.5 THEN 'bullish'
        WHEN sa.sentiment_score > 0.1 THEN 'slightly_bullish'
        WHEN sa.sentiment_score > -0.1 THEN 'neutral'
        WHEN sa.sentiment_score > -0.5 THEN 'slightly_bearish'
        ELSE 'bearish'
    END AS signal
FROM sentiment_analysis sa
JOIN documents d ON sa.document_id = d.id

Aggregate over CASE categories

Combine CASE with GROUP BY to count documents in each custom category:

SELECT
    CASE
        WHEN sa.sentiment_score > 0.3 THEN 'positive'
        WHEN sa.sentiment_score < -0.3 THEN 'negative'
        ELSE 'neutral'
    END AS sentiment_bucket,
    COUNT(*) AS count,
    AVG(sa.confidence_score) AS avg_confidence
FROM sentiment_analysis sa
GROUP BY
    CASE
        WHEN sa.sentiment_score > 0.3 THEN 'positive'
        WHEN sa.sentiment_score < -0.3 THEN 'negative'
        ELSE 'neutral'
    END
ORDER BY count DESC
sentiment_bucketcountavg_confidence
neutral25340.87
positive17010.82
negative9810.79

Use COALESCE for missing values

Handle annotations that may have null fields:

SELECT
    d.name,
    COALESCE(fm.revenue, 0) AS revenue,
    COALESCE(fm.net_income, 0) AS net_income
FROM financial_metrics fm
JOIN documents d ON fm.document_id = d.id
ORDER BY revenue DESC
LIMIT 20

Working with Results in Notebooks

Query results from the Query Explorer are useful for quick exploration, but for deeper analysis you’ll want to work with the data in a notebook.

Import Query Results as DataFrames

In a Ragnerock notebook, add a Query cell with your SQL:

SELECT
    d.name AS document_name,
    AVG(sa.sentiment_score) AS avg_sentiment,
    COUNT(*) AS chunk_count
FROM documents d
JOIN sentiment_analysis sa ON d.id = sa.document_id
GROUP BY d.name
ORDER BY avg_sentiment ASC

The results appear as a DataFrame artifact. Click the artifact badge and select Import to bring the data into your code cells as a variable (e.g., sentiment_by_doc).

Analyze with pandas

In a Code cell, work with the imported data:

import pandas as pd

df = pd.DataFrame(sentiment_by_doc)
print(df.describe())

# Pivot table: average sentiment by document and category
pivot = df.pivot_table(
    values="sentiment_score",
    index="name",
    columns="overall_sentiment",
    aggfunc="mean",
)
print(pivot)

# Filter and sort
bearish = df[df["sentiment_score"] < -0.5].sort_values("sentiment_score")
print(f"Found {len(bearish)} bearish chunks")

Visualization

import matplotlib.pyplot as plt

df.plot.bar(x="overall_sentiment", y="count", title="Sentiment Distribution")
plt.tight_layout()
plt.show()

Next Steps

  • Queries: Full virtual table reference and SQL syntax documentation
  • Operators: Design the operators that become queryable tables