Advanced Query Patterns - Ragnerock Docs

This tutorial is a cookbook of SQL query patterns for Ragnerock, organized from simple to complex. Each recipe is self-contained; jump to whatever pattern fits your current task. For a full reference on available tables and SQL syntax, see Queries.

Setup

Open the Query Explorer from the sidebar. All SQL examples in this tutorial can be run directly in the Query Explorer’s editor. Type or paste the query and press Shift+Enter to execute.

Queries can also be executed as Query cells in the notebook interface, where results become DataFrame artifacts you can import into code cells. Standalone queries in the Query Explorer can be saved for reuse. The examples assume the following operators exist in your project (your actual operator names will differ):

Operator	Description	Key Fields
`sentiment_analysis`	Chunk-level sentiment scoring	`overall_sentiment` (text), `sentiment_score` (number)
`entity_extraction`	Named entity recognition	`name` (text), `type` (text), `relevance` (number)
`financial_metrics`	Financial data extraction	`revenue` (number), `net_income` (number), `period` (text)
`risk_classification`	Risk category labeling	`risk_level` (text), `risk_factors` (text)

Document Queries

List all PDF documents

SELECT name, filesize, created_at
FROM documents
WHERE file_type = 'pdf'
ORDER BY created_at DESC

name	filesize	created_at
Q4_2024_earnings.pdf	2458301	2024-12-15 09:23:00
annual_report_2024.pdf	5120044	2024-11-30 14:05:00

Search documents by name

Use ILIKE for case-insensitive pattern matching:

SELECT name, file_type, created_at
FROM documents
WHERE name ILIKE '%earnings%'
ORDER BY name

Chunk Queries

Find paragraphs in a specific document

SELECT d.name, c.content, c.document_offset
FROM documents d
JOIN chunks c ON d.id = c.document_id
WHERE d.name ILIKE '%10-K%'
  AND c.chunk_type = 'paragraph'
ORDER BY c.document_offset

The chunk_type column accepts both 'paragraph' and 'PARAGRAPH'. The system normalizes automatically.

Search chunk content

Find chunks that mention a specific topic:

SELECT d.name, c.content
FROM documents d
JOIN chunks c ON d.id = c.document_id
WHERE c.content ILIKE '%supply chain%'
LIMIT 20

Page Queries

Read specific pages from a document

SELECT d.name, p.page_number, p.content
FROM pages p
JOIN documents d ON p.document_id = d.id
WHERE d.name ILIKE '%annual_report%'
  AND p.page_number BETWEEN 0 AND 4
ORDER BY p.page_number

Page numbers are 0-indexed, so BETWEEN 0 AND 4 returns the first five pages.

Count pages per document

SELECT d.name, COUNT(*) AS page_count
FROM documents d
JOIN pages p ON d.id = p.document_id
GROUP BY d.name
ORDER BY page_count DESC

Querying a Single Annotation Table

Retrieve annotations with filtering

Each operator you create becomes a queryable table. Query it directly by its name:

SELECT overall_sentiment, sentiment_score, confidence_score
FROM sentiment_analysis
WHERE sentiment_score < -0.5
  AND confidence_score > 0.8
ORDER BY sentiment_score ASC
LIMIT 20

Aggregate annotation statistics

SELECT
    AVG(sentiment_score) AS avg_sentiment,
    MIN(sentiment_score) AS min_sentiment,
    MAX(sentiment_score) AS max_sentiment,
    COUNT(*) AS total_annotations
FROM sentiment_analysis

avg_sentiment	min_sentiment	max_sentiment	total_annotations
0.12	-0.93	0.98	4827

Joining Annotations with Document Metadata

Sentiment by document

Join an annotation table to documents to see which document each annotation belongs to:

SELECT
    d.name AS document_name,
    sa.overall_sentiment,
    sa.sentiment_score,
    sa.confidence_score
FROM sentiment_analysis sa
JOIN documents d ON sa.document_id = d.id
WHERE sa.confidence_score > 0.8
ORDER BY sa.sentiment_score ASC
LIMIT 20

Average sentiment per document

SELECT
    d.name AS document_name,
    AVG(sa.sentiment_score) AS avg_sentiment,
    COUNT(*) AS chunk_count
FROM documents d
JOIN sentiment_analysis sa ON d.id = sa.document_id
GROUP BY d.name
ORDER BY avg_sentiment ASC

document_name	avg_sentiment	chunk_count
risk_disclosure_2024.pdf	-0.72	34
Q4_earnings_call.pdf	-0.15	89
annual_report_2024.pdf	0.31	156

Annotations with chunk content

Join through chunks to see the original text alongside annotation results:

SELECT
    d.name,
    c.content,
    sa.overall_sentiment,
    sa.sentiment_score
FROM sentiment_analysis sa
JOIN chunks c ON sa.chunk_id = c.id
JOIN documents d ON sa.document_id = d.id
WHERE sa.sentiment_score < -0.7
ORDER BY sa.sentiment_score ASC
LIMIT 10

Multi-Annotation Joins

Combine sentiment and entity data

Join two annotation tables through their shared chunk_id to correlate different analyses on the same text:

SELECT
    c.content,
    sa.overall_sentiment,
    sa.sentiment_score,
    ee.name AS entity_name,
    ee.type AS entity_type
FROM chunks c
JOIN sentiment_analysis sa ON c.id = sa.chunk_id
JOIN entity_extraction ee ON c.id = ee.chunk_id
WHERE sa.sentiment_score < -0.5
  AND ee.type = 'ORGANIZATION'
ORDER BY sa.sentiment_score ASC
LIMIT 20

This finds chunks where a specific organization is mentioned in a negative-sentiment context.

Sentiment and financial metrics together

Use LEFT JOIN when not every chunk has both annotation types. This avoids dropping chunks that have sentiment data but no financial metrics:

SELECT
    d.name,
    sa.overall_sentiment,
    sa.sentiment_score,
    fm.revenue,
    fm.net_income
FROM documents d
JOIN sentiment_analysis sa ON d.id = sa.document_id
LEFT JOIN financial_metrics fm ON sa.chunk_id = fm.chunk_id
WHERE fm.revenue IS NOT NULL
ORDER BY fm.revenue DESC
LIMIT 20

Aggregation Patterns

Count annotations per category

SELECT
    overall_sentiment,
    COUNT(*) AS count
FROM sentiment_analysis
GROUP BY overall_sentiment
ORDER BY count DESC

overall_sentiment	count
neutral	2145
positive	1389
negative	876
very_positive	312
very_negative	105

Filter groups with HAVING

Find documents that have more than 50 annotations with low average confidence:

SELECT
    d.name,
    COUNT(*) AS annotation_count,
    AVG(sa.confidence_score) AS avg_confidence
FROM documents d
JOIN sentiment_analysis sa ON d.id = sa.document_id
GROUP BY d.name
HAVING COUNT(*) > 50
   AND AVG(sa.confidence_score) < 0.7
ORDER BY avg_confidence ASC

Entity frequency across documents

Find the most frequently mentioned entities:

SELECT
    ee.name AS entity,
    ee.type AS entity_type,
    COUNT(*) AS mentions,
    COUNT(DISTINCT ee.document_id) AS document_count
FROM entity_extraction ee
GROUP BY ee.name, ee.type
HAVING COUNT(*) >= 5
ORDER BY mentions DESC
LIMIT 25

entity	entity_type	mentions	document_count
Federal Reserve	ORGANIZATION	47	12
SEC	ORGANIZATION	31	8
JPMorgan	ORGANIZATION	28	6

Date-Based Filtering

Documents uploaded in a date range

SELECT name, file_type, created_at
FROM documents
WHERE created_at BETWEEN '2024-10-01' AND '2024-12-31'
ORDER BY created_at DESC

Recent annotations

Use CURRENT_DATE to filter relative to today:

SELECT
    d.name,
    sa.overall_sentiment,
    sa.created_at
FROM sentiment_analysis sa
JOIN documents d ON sa.document_id = d.id
WHERE sa.created_at > CURRENT_DATE - INTERVAL '7 days'
ORDER BY sa.created_at DESC

Group by date

Use the DATE() function to aggregate annotations by day:

SELECT
    DATE(sa.created_at) AS annotation_date,
    COUNT(*) AS annotations_created
FROM sentiment_analysis sa
GROUP BY DATE(sa.created_at)
ORDER BY annotation_date DESC
LIMIT 14

Pattern Matching

ILIKE for case-insensitive search

Search annotation text fields without worrying about case:

SELECT
    d.name,
    rc.risk_level,
    rc.risk_factors
FROM risk_classification rc
JOIN documents d ON rc.document_id = d.id
WHERE rc.risk_factors ILIKE '%cybersecurity%'

IN for enum filtering

Filter on a set of known values:

SELECT
    d.name,
    rc.risk_level,
    rc.confidence_score
FROM risk_classification rc
JOIN documents d ON rc.document_id = d.id
WHERE rc.risk_level IN ('high', 'critical')
  AND rc.confidence_score > 0.8
ORDER BY rc.confidence_score DESC

Combine multiple filters

SELECT
    d.name,
    c.content,
    sa.sentiment_score,
    ee.name AS entity
FROM chunks c
JOIN documents d ON c.document_id = d.id
JOIN sentiment_analysis sa ON c.id = sa.chunk_id
JOIN entity_extraction ee ON c.id = ee.chunk_id
WHERE d.name ILIKE '%10-K%'
  AND ee.type IN ('PERSON', 'ORGANIZATION')
  AND sa.sentiment_score < -0.3
ORDER BY sa.sentiment_score ASC
LIMIT 20

CASE Expressions

Map annotation values to custom categories

Use CASE to create derived columns that map annotation values into your own classification scheme:

SELECT
    d.name,
    sa.sentiment_score,
    CASE
        WHEN sa.sentiment_score > 0.5 THEN 'bullish'
        WHEN sa.sentiment_score > 0.1 THEN 'slightly_bullish'
        WHEN sa.sentiment_score > -0.1 THEN 'neutral'
        WHEN sa.sentiment_score > -0.5 THEN 'slightly_bearish'
        ELSE 'bearish'
    END AS signal
FROM sentiment_analysis sa
JOIN documents d ON sa.document_id = d.id

Aggregate over CASE categories

Combine CASE with GROUP BY to count documents in each custom category:

SELECT
    CASE
        WHEN sa.sentiment_score > 0.3 THEN 'positive'
        WHEN sa.sentiment_score < -0.3 THEN 'negative'
        ELSE 'neutral'
    END AS sentiment_bucket,
    COUNT(*) AS count,
    AVG(sa.confidence_score) AS avg_confidence
FROM sentiment_analysis sa
GROUP BY
    CASE
        WHEN sa.sentiment_score > 0.3 THEN 'positive'
        WHEN sa.sentiment_score < -0.3 THEN 'negative'
        ELSE 'neutral'
    END
ORDER BY count DESC

sentiment_bucket	count	avg_confidence
neutral	2534	0.87
positive	1701	0.82
negative	981	0.79

Use COALESCE for missing values

Handle annotations that may have null fields:

SELECT
    d.name,
    COALESCE(fm.revenue, 0) AS revenue,
    COALESCE(fm.net_income, 0) AS net_income
FROM financial_metrics fm
JOIN documents d ON fm.document_id = d.id
ORDER BY revenue DESC
LIMIT 20

Working with Results in Notebooks

Query results from the Query Explorer are useful for quick exploration, but for deeper analysis you’ll want to work with the data in a notebook.

Import Query Results as DataFrames

In a Ragnerock notebook, add a Query cell with your SQL:

SELECT
    d.name AS document_name,
    AVG(sa.sentiment_score) AS avg_sentiment,
    COUNT(*) AS chunk_count
FROM documents d
JOIN sentiment_analysis sa ON d.id = sa.document_id
GROUP BY d.name
ORDER BY avg_sentiment ASC

The results appear as a DataFrame artifact. Click the artifact badge and select Import to bring the data into your code cells as a variable (e.g., sentiment_by_doc).

Analyze with pandas

In a Code cell, work with the imported data:

import pandas as pd

df = pd.DataFrame(sentiment_by_doc)
print(df.describe())

# Pivot table: average sentiment by document and category
pivot = df.pivot_table(
    values="sentiment_score",
    index="name",
    columns="overall_sentiment",
    aggfunc="mean",
)
print(pivot)

# Filter and sort
bearish = df[df["sentiment_score"] < -0.5].sort_values("sentiment_score")
print(f"Found {len(bearish)} bearish chunks")

Visualization

import matplotlib.pyplot as plt

df.plot.bar(x="overall_sentiment", y="count", title="Sentiment Distribution")
plt.tight_layout()
plt.show()

Next Steps

Queries: Full virtual table reference and SQL syntax documentation
Operators: Design the operators that become queryable tables