Advanced Query Patterns
SQL recipes for common analysis tasks, from simple lookups to multi-table annotation aggregations.
This tutorial is a cookbook of SQL query patterns for Ragnerock, organized from simple to complex. Each recipe is self-contained; jump to whatever pattern fits your current task. For a full reference on available tables and SQL syntax, see Queries.
Setup
Open the Query Explorer from the sidebar. All SQL examples in this tutorial can be run directly in the Query Explorer’s editor. Type or paste the query and press Shift+Enter to execute.
Queries can also be executed as Query cells in the notebook interface, where results become DataFrame artifacts you can import into code cells. Standalone queries in the Query Explorer can be saved for reuse. The examples assume the following operators exist in your project (your actual operator names will differ):
| Operator | Description | Key Fields |
|---|---|---|
sentiment_analysis | Chunk-level sentiment scoring | overall_sentiment (text), sentiment_score (number) |
entity_extraction | Named entity recognition | name (text), type (text), relevance (number) |
financial_metrics | Financial data extraction | revenue (number), net_income (number), period (text) |
risk_classification | Risk category labeling | risk_level (text), risk_factors (text) |
Document Queries
List all PDF documents
SELECT name, filesize, created_at
FROM documents
WHERE file_type = 'pdf'
ORDER BY created_at DESC
| name | filesize | created_at |
|---|---|---|
| Q4_2024_earnings.pdf | 2458301 | 2024-12-15 09:23:00 |
| annual_report_2024.pdf | 5120044 | 2024-11-30 14:05:00 |
Search documents by name
Use ILIKE for case-insensitive pattern matching:
SELECT name, file_type, created_at
FROM documents
WHERE name ILIKE '%earnings%'
ORDER BY name
Chunk Queries
Find paragraphs in a specific document
SELECT d.name, c.content, c.document_offset
FROM documents d
JOIN chunks c ON d.id = c.document_id
WHERE d.name ILIKE '%10-K%'
AND c.chunk_type = 'paragraph'
ORDER BY c.document_offset
The chunk_type column accepts both 'paragraph' and 'PARAGRAPH'. The system normalizes automatically.
Search chunk content
Find chunks that mention a specific topic:
SELECT d.name, c.content
FROM documents d
JOIN chunks c ON d.id = c.document_id
WHERE c.content ILIKE '%supply chain%'
LIMIT 20
Page Queries
Read specific pages from a document
SELECT d.name, p.page_number, p.content
FROM pages p
JOIN documents d ON p.document_id = d.id
WHERE d.name ILIKE '%annual_report%'
AND p.page_number BETWEEN 0 AND 4
ORDER BY p.page_number
Page numbers are 0-indexed, so BETWEEN 0 AND 4 returns the first five pages.
Count pages per document
SELECT d.name, COUNT(*) AS page_count
FROM documents d
JOIN pages p ON d.id = p.document_id
GROUP BY d.name
ORDER BY page_count DESC
Querying a Single Annotation Table
Retrieve annotations with filtering
Each operator you create becomes a queryable table. Query it directly by its name:
SELECT overall_sentiment, sentiment_score, confidence_score
FROM sentiment_analysis
WHERE sentiment_score < -0.5
AND confidence_score > 0.8
ORDER BY sentiment_score ASC
LIMIT 20
Aggregate annotation statistics
SELECT
AVG(sentiment_score) AS avg_sentiment,
MIN(sentiment_score) AS min_sentiment,
MAX(sentiment_score) AS max_sentiment,
COUNT(*) AS total_annotations
FROM sentiment_analysis
| avg_sentiment | min_sentiment | max_sentiment | total_annotations |
|---|---|---|---|
| 0.12 | -0.93 | 0.98 | 4827 |
Joining Annotations with Document Metadata
Sentiment by document
Join an annotation table to documents to see which document each annotation belongs to:
SELECT
d.name AS document_name,
sa.overall_sentiment,
sa.sentiment_score,
sa.confidence_score
FROM sentiment_analysis sa
JOIN documents d ON sa.document_id = d.id
WHERE sa.confidence_score > 0.8
ORDER BY sa.sentiment_score ASC
LIMIT 20
Average sentiment per document
SELECT
d.name AS document_name,
AVG(sa.sentiment_score) AS avg_sentiment,
COUNT(*) AS chunk_count
FROM documents d
JOIN sentiment_analysis sa ON d.id = sa.document_id
GROUP BY d.name
ORDER BY avg_sentiment ASC
| document_name | avg_sentiment | chunk_count |
|---|---|---|
| risk_disclosure_2024.pdf | -0.72 | 34 |
| Q4_earnings_call.pdf | -0.15 | 89 |
| annual_report_2024.pdf | 0.31 | 156 |
Annotations with chunk content
Join through chunks to see the original text alongside annotation results:
SELECT
d.name,
c.content,
sa.overall_sentiment,
sa.sentiment_score
FROM sentiment_analysis sa
JOIN chunks c ON sa.chunk_id = c.id
JOIN documents d ON sa.document_id = d.id
WHERE sa.sentiment_score < -0.7
ORDER BY sa.sentiment_score ASC
LIMIT 10
Multi-Annotation Joins
Combine sentiment and entity data
Join two annotation tables through their shared chunk_id to correlate different analyses on the same text:
SELECT
c.content,
sa.overall_sentiment,
sa.sentiment_score,
ee.name AS entity_name,
ee.type AS entity_type
FROM chunks c
JOIN sentiment_analysis sa ON c.id = sa.chunk_id
JOIN entity_extraction ee ON c.id = ee.chunk_id
WHERE sa.sentiment_score < -0.5
AND ee.type = 'ORGANIZATION'
ORDER BY sa.sentiment_score ASC
LIMIT 20
This finds chunks where a specific organization is mentioned in a negative-sentiment context.
Sentiment and financial metrics together
Use LEFT JOIN when not every chunk has both annotation types. This avoids dropping chunks that have sentiment data but no financial metrics:
SELECT
d.name,
sa.overall_sentiment,
sa.sentiment_score,
fm.revenue,
fm.net_income
FROM documents d
JOIN sentiment_analysis sa ON d.id = sa.document_id
LEFT JOIN financial_metrics fm ON sa.chunk_id = fm.chunk_id
WHERE fm.revenue IS NOT NULL
ORDER BY fm.revenue DESC
LIMIT 20
Aggregation Patterns
Count annotations per category
SELECT
overall_sentiment,
COUNT(*) AS count
FROM sentiment_analysis
GROUP BY overall_sentiment
ORDER BY count DESC
| overall_sentiment | count |
|---|---|
| neutral | 2145 |
| positive | 1389 |
| negative | 876 |
| very_positive | 312 |
| very_negative | 105 |
Filter groups with HAVING
Find documents that have more than 50 annotations with low average confidence:
SELECT
d.name,
COUNT(*) AS annotation_count,
AVG(sa.confidence_score) AS avg_confidence
FROM documents d
JOIN sentiment_analysis sa ON d.id = sa.document_id
GROUP BY d.name
HAVING COUNT(*) > 50
AND AVG(sa.confidence_score) < 0.7
ORDER BY avg_confidence ASC
Entity frequency across documents
Find the most frequently mentioned entities:
SELECT
ee.name AS entity,
ee.type AS entity_type,
COUNT(*) AS mentions,
COUNT(DISTINCT ee.document_id) AS document_count
FROM entity_extraction ee
GROUP BY ee.name, ee.type
HAVING COUNT(*) >= 5
ORDER BY mentions DESC
LIMIT 25
| entity | entity_type | mentions | document_count |
|---|---|---|---|
| Federal Reserve | ORGANIZATION | 47 | 12 |
| SEC | ORGANIZATION | 31 | 8 |
| JPMorgan | ORGANIZATION | 28 | 6 |
Date-Based Filtering
Documents uploaded in a date range
SELECT name, file_type, created_at
FROM documents
WHERE created_at BETWEEN '2024-10-01' AND '2024-12-31'
ORDER BY created_at DESC
Recent annotations
Use CURRENT_DATE to filter relative to today:
SELECT
d.name,
sa.overall_sentiment,
sa.created_at
FROM sentiment_analysis sa
JOIN documents d ON sa.document_id = d.id
WHERE sa.created_at > CURRENT_DATE - INTERVAL '7 days'
ORDER BY sa.created_at DESC
Group by date
Use the DATE() function to aggregate annotations by day:
SELECT
DATE(sa.created_at) AS annotation_date,
COUNT(*) AS annotations_created
FROM sentiment_analysis sa
GROUP BY DATE(sa.created_at)
ORDER BY annotation_date DESC
LIMIT 14
Pattern Matching
ILIKE for case-insensitive search
Search annotation text fields without worrying about case:
SELECT
d.name,
rc.risk_level,
rc.risk_factors
FROM risk_classification rc
JOIN documents d ON rc.document_id = d.id
WHERE rc.risk_factors ILIKE '%cybersecurity%'
IN for enum filtering
Filter on a set of known values:
SELECT
d.name,
rc.risk_level,
rc.confidence_score
FROM risk_classification rc
JOIN documents d ON rc.document_id = d.id
WHERE rc.risk_level IN ('high', 'critical')
AND rc.confidence_score > 0.8
ORDER BY rc.confidence_score DESC
Combine multiple filters
SELECT
d.name,
c.content,
sa.sentiment_score,
ee.name AS entity
FROM chunks c
JOIN documents d ON c.document_id = d.id
JOIN sentiment_analysis sa ON c.id = sa.chunk_id
JOIN entity_extraction ee ON c.id = ee.chunk_id
WHERE d.name ILIKE '%10-K%'
AND ee.type IN ('PERSON', 'ORGANIZATION')
AND sa.sentiment_score < -0.3
ORDER BY sa.sentiment_score ASC
LIMIT 20
CASE Expressions
Map annotation values to custom categories
Use CASE to create derived columns that map annotation values into your own classification scheme:
SELECT
d.name,
sa.sentiment_score,
CASE
WHEN sa.sentiment_score > 0.5 THEN 'bullish'
WHEN sa.sentiment_score > 0.1 THEN 'slightly_bullish'
WHEN sa.sentiment_score > -0.1 THEN 'neutral'
WHEN sa.sentiment_score > -0.5 THEN 'slightly_bearish'
ELSE 'bearish'
END AS signal
FROM sentiment_analysis sa
JOIN documents d ON sa.document_id = d.id
Aggregate over CASE categories
Combine CASE with GROUP BY to count documents in each custom category:
SELECT
CASE
WHEN sa.sentiment_score > 0.3 THEN 'positive'
WHEN sa.sentiment_score < -0.3 THEN 'negative'
ELSE 'neutral'
END AS sentiment_bucket,
COUNT(*) AS count,
AVG(sa.confidence_score) AS avg_confidence
FROM sentiment_analysis sa
GROUP BY
CASE
WHEN sa.sentiment_score > 0.3 THEN 'positive'
WHEN sa.sentiment_score < -0.3 THEN 'negative'
ELSE 'neutral'
END
ORDER BY count DESC
| sentiment_bucket | count | avg_confidence |
|---|---|---|
| neutral | 2534 | 0.87 |
| positive | 1701 | 0.82 |
| negative | 981 | 0.79 |
Use COALESCE for missing values
Handle annotations that may have null fields:
SELECT
d.name,
COALESCE(fm.revenue, 0) AS revenue,
COALESCE(fm.net_income, 0) AS net_income
FROM financial_metrics fm
JOIN documents d ON fm.document_id = d.id
ORDER BY revenue DESC
LIMIT 20
Working with Results in Notebooks
Query results from the Query Explorer are useful for quick exploration, but for deeper analysis you’ll want to work with the data in a notebook.
Import Query Results as DataFrames
In a Ragnerock notebook, add a Query cell with your SQL:
SELECT
d.name AS document_name,
AVG(sa.sentiment_score) AS avg_sentiment,
COUNT(*) AS chunk_count
FROM documents d
JOIN sentiment_analysis sa ON d.id = sa.document_id
GROUP BY d.name
ORDER BY avg_sentiment ASC
The results appear as a DataFrame artifact. Click the artifact badge and select Import to bring the data into your code cells as a variable (e.g., sentiment_by_doc).
Analyze with pandas
In a Code cell, work with the imported data:
import pandas as pd
df = pd.DataFrame(sentiment_by_doc)
print(df.describe())
# Pivot table: average sentiment by document and category
pivot = df.pivot_table(
values="sentiment_score",
index="name",
columns="overall_sentiment",
aggfunc="mean",
)
print(pivot)
# Filter and sort
bearish = df[df["sentiment_score"] < -0.5].sort_values("sentiment_score")
print(f"Found {len(bearish)} bearish chunks")
Visualization
import matplotlib.pyplot as plt
df.plot.bar(x="overall_sentiment", y="count", title="Sentiment Distribution")
plt.tight_layout()
plt.show()