Agent Orchestration
How Ragnerock orchestrates agentic data processing pipelines using workflows and operators.
Ragnerock provides a general-purpose framework for orchestrating agentic data processing pipelines. Workflows and operators are the core abstractions. They let you define what your AI agents produce, how they chain together, and at what granularity they process your data. Use cases range from classification and entity extraction to sentiment analysis, feature generation, summarization, and custom business logic.
Overview
The orchestration framework has three core building blocks:
- Operators define what to produce: a JSON Schema describing the output shape, AI instructions for the task, and a scope controlling granularity
- Workflows define how to orchestrate production: a graph of operators with dependencies, conditions, and error handling
- Annotations are the results: structured data attached to your data sources that you can query with SQL or work with in notebooks
A typical pipeline might classify data sources by type, extract financial metrics from filings, analyze sentiment in earnings call transcripts, and produce risk assessments, all in a single workflow.
Operators
An operator is a reusable processing definition. Each operator specifies three things:
-
Output schema: A JSON Schema defining the structure of the output data. This might be as simple as a single sentiment score or as complex as a nested object with financial metrics, entity lists, and confidence scores.
-
Generation prompt: Instructions for the AI agent, covering what to look for, how to handle edge cases, and what format to return.
-
Scope: The granularity at which processing operates:
| Scope | Description |
|---|---|
| Document | Analyze the entire data source as a whole |
| Page | Process each page independently |
| Paragraph | Process individual paragraphs |
| Sentence | Fine-grained sentence-level processing |
| Sheet | Analyze an entire tabular sheet |
| Row | Process each row of a table independently |
Scope determines what content the AI agent sees for each unit of work. A document-scoped operator sees the full data source; a paragraph-scoped operator runs once per paragraph, seeing only that paragraph’s content.
Example
A sentiment analysis operator might define:
- Schema: An object with
overall_sentiment(enum),confidence(number), andkey_themes(array of strings) - Prompt: “Analyze the sentiment of this financial document. Consider tone, outlook statements, and risk language.”
- Scope: Document
Workflows
Workflows compose multiple operators into processing pipelines by arranging them as a directed acyclic graph (DAG).
Each node in the graph wraps a single operator and adds workflow-specific configuration:
- Dependencies: Edges between nodes define execution order. A node only runs after all its upstream dependencies have completed.
- Conditions: Optional logic that determines whether a node executes, based on upstream results (see Conditional Execution)
- Error handling: What happens if a node fails: stop the entire workflow, skip the failed node and continue, or retry
Execution Order
When a workflow is created or modified, the system computes a topological ordering of its nodes. This ordering guarantees that every node runs only after all of its upstream dependencies have finished, while maximizing parallelism for independent branches.
Workflows can be triggered manually on a set of data sources, or configured to run automatically whenever new data sources are uploaded to the project.
Execution
When a workflow runs against a data source, the execution engine processes nodes in topological order. For each node, three things happen:
1. Context Building
The engine collects annotations from the node’s direct upstream dependencies and assembles them into a context object. This context is included in the AI prompt alongside the source content, giving the agent access to what previous pipeline stages have already produced.
Only direct upstream nodes contribute to context, not transitive dependencies. This keeps the context focused and prevents token bloat.
2. Scope Filtering
Not all upstream annotations are relevant to every downstream node. Scope filtering applies hierarchy rules to determine visibility:
| Upstream Scope | Downstream Scope | Behavior |
|---|---|---|
| Document | Any | Broadcast: available to all downstream nodes |
| Page | Page, Paragraph, Sentence | Available to nodes processing the same page |
| Paragraph | Paragraph, Sentence | Available to nodes processing the same paragraph |
| Sentence | Sentence | Available only to nodes on the same sentence |
| Sheet | Row | Available to nodes processing rows in that sheet |
3. Generation
The AI agent receives the source content (scoped appropriately), the assembled context from upstream nodes, and the operator’s generation prompt. It produces structured output conforming to the operator’s JSON Schema.
The result is added to the execution context for downstream nodes, and persisted to the database.
Conditional Execution
Workflow nodes can include conditions that control whether they execute, based on results from upstream nodes. This enables branching pipelines where different processing paths activate depending on data characteristics.
Conditions support three types of logic:
- Field comparisons: Compare an upstream annotation field against a value. For example: “only run this node if the upstream classifier’s
document_typeequalsfinancial_filing” - List operations: Check array fields for membership, length, or aggregate values. For example: “only run if the
entitiesarray containsApple Inc.” - Logical combinations: Combine conditions with AND, OR, and NOT for complex branching logic
Example: Branching Pipeline
Consider a pipeline that handles different data types:
- Document Classifier (always runs): Determines the data source type
- Financial Metrics: Runs only when
document_type == "financial_filing" - Executive Summary Generator: Runs only when
document_type == "report"
The classifier runs first, then its output determines which downstream branch activates. Both branches can execute independently in the same workflow.
Annotation Storage
Persisted annotations are stored in a flattened key-value format designed for portability across different database backends.
Each annotation records:
- Source tracking: Which data source, page, or text segment the annotation was produced from, enabling full provenance and traceability back to the original content
- Produced values: The structured data conforming to the operator’s schema, flattened into individually queryable fields
- Metadata: Confidence scores, generation details, and the operator that produced the annotation
Ragnerock handles the translation automatically across supported database backends, so annotations work identically regardless of your storage configuration.
Querying Annotations
Once annotations are persisted, they become queryable as SQL tables. Each operator in your project creates a virtual table named after the operator, with columns corresponding to the schema fields.
SELECT
document_name,
overall_sentiment,
confidence,
key_themes
FROM sentiment_analysis
WHERE overall_sentiment IN ('positive', 'very_positive')
ORDER BY confidence DESC
You can join across operators, aggregate results, and filter by date ranges using standard SQL. Run these queries in the Query Explorer, in a notebook Query cell, or through the Python SDK. Results from notebook query cells become DataFrame artifacts you can import into your JupyterLab environment for further analysis.
Next Steps
- Operators: Deep dive into schema design, prompts, and scopes
- Workflows: Building multi-stage processing pipelines
- Annotations: Core concepts and usage
- AI Research Agent: Interactive research using your annotations