Agent Orchestration

How Ragnerock orchestrates agentic data processing pipelines using workflows and operators.

Ragnerock provides a general-purpose framework for orchestrating agentic data processing pipelines. Workflows and operators are the core abstractions. They let you define what your AI agents produce, how they chain together, and at what granularity they process your data. Use cases range from classification and entity extraction to sentiment analysis, feature generation, summarization, and custom business logic.

Overview

The orchestration framework has three core building blocks:

  • Operators define what to produce: a JSON Schema describing the output shape, AI instructions for the task, and a scope controlling granularity
  • Workflows define how to orchestrate production: a graph of operators with dependencies, conditions, and error handling
  • Annotations are the results: structured data attached to your data sources that you can query with SQL or work with in notebooks

A typical pipeline might classify data sources by type, extract financial metrics from filings, analyze sentiment in earnings call transcripts, and produce risk assessments, all in a single workflow.

Operators

An operator is a reusable processing definition. Each operator specifies three things:

  1. Output schema: A JSON Schema defining the structure of the output data. This might be as simple as a single sentiment score or as complex as a nested object with financial metrics, entity lists, and confidence scores.

  2. Generation prompt: Instructions for the AI agent, covering what to look for, how to handle edge cases, and what format to return.

  3. Scope: The granularity at which processing operates:

ScopeDescription
DocumentAnalyze the entire data source as a whole
PageProcess each page independently
ParagraphProcess individual paragraphs
SentenceFine-grained sentence-level processing
SheetAnalyze an entire tabular sheet
RowProcess each row of a table independently

Scope determines what content the AI agent sees for each unit of work. A document-scoped operator sees the full data source; a paragraph-scoped operator runs once per paragraph, seeing only that paragraph’s content.

Example

A sentiment analysis operator might define:

  • Schema: An object with overall_sentiment (enum), confidence (number), and key_themes (array of strings)
  • Prompt: “Analyze the sentiment of this financial document. Consider tone, outlook statements, and risk language.”
  • Scope: Document

Workflows

Workflows compose multiple operators into processing pipelines by arranging them as a directed acyclic graph (DAG).

Each node in the graph wraps a single operator and adds workflow-specific configuration:

  • Dependencies: Edges between nodes define execution order. A node only runs after all its upstream dependencies have completed.
  • Conditions: Optional logic that determines whether a node executes, based on upstream results (see Conditional Execution)
  • Error handling: What happens if a node fails: stop the entire workflow, skip the failed node and continue, or retry

Execution Order

When a workflow is created or modified, the system computes a topological ordering of its nodes. This ordering guarantees that every node runs only after all of its upstream dependencies have finished, while maximizing parallelism for independent branches.

Workflows can be triggered manually on a set of data sources, or configured to run automatically whenever new data sources are uploaded to the project.

Execution

When a workflow runs against a data source, the execution engine processes nodes in topological order. For each node, three things happen:

1. Context Building

The engine collects annotations from the node’s direct upstream dependencies and assembles them into a context object. This context is included in the AI prompt alongside the source content, giving the agent access to what previous pipeline stages have already produced.

Only direct upstream nodes contribute to context, not transitive dependencies. This keeps the context focused and prevents token bloat.

2. Scope Filtering

Not all upstream annotations are relevant to every downstream node. Scope filtering applies hierarchy rules to determine visibility:

Upstream ScopeDownstream ScopeBehavior
DocumentAnyBroadcast: available to all downstream nodes
PagePage, Paragraph, SentenceAvailable to nodes processing the same page
ParagraphParagraph, SentenceAvailable to nodes processing the same paragraph
SentenceSentenceAvailable only to nodes on the same sentence
SheetRowAvailable to nodes processing rows in that sheet

3. Generation

The AI agent receives the source content (scoped appropriately), the assembled context from upstream nodes, and the operator’s generation prompt. It produces structured output conforming to the operator’s JSON Schema.

The result is added to the execution context for downstream nodes, and persisted to the database.

Conditional Execution

Workflow nodes can include conditions that control whether they execute, based on results from upstream nodes. This enables branching pipelines where different processing paths activate depending on data characteristics.

Conditions support three types of logic:

  • Field comparisons: Compare an upstream annotation field against a value. For example: “only run this node if the upstream classifier’s document_type equals financial_filing
  • List operations: Check array fields for membership, length, or aggregate values. For example: “only run if the entities array contains Apple Inc.
  • Logical combinations: Combine conditions with AND, OR, and NOT for complex branching logic

Example: Branching Pipeline

Consider a pipeline that handles different data types:

  1. Document Classifier (always runs): Determines the data source type
  2. Financial Metrics: Runs only when document_type == "financial_filing"
  3. Executive Summary Generator: Runs only when document_type == "report"

The classifier runs first, then its output determines which downstream branch activates. Both branches can execute independently in the same workflow.

Annotation Storage

Persisted annotations are stored in a flattened key-value format designed for portability across different database backends.

Each annotation records:

  • Source tracking: Which data source, page, or text segment the annotation was produced from, enabling full provenance and traceability back to the original content
  • Produced values: The structured data conforming to the operator’s schema, flattened into individually queryable fields
  • Metadata: Confidence scores, generation details, and the operator that produced the annotation

Ragnerock handles the translation automatically across supported database backends, so annotations work identically regardless of your storage configuration.

Querying Annotations

Once annotations are persisted, they become queryable as SQL tables. Each operator in your project creates a virtual table named after the operator, with columns corresponding to the schema fields.

SELECT
    document_name,
    overall_sentiment,
    confidence,
    key_themes
FROM sentiment_analysis
WHERE overall_sentiment IN ('positive', 'very_positive')
ORDER BY confidence DESC

You can join across operators, aggregate results, and filter by date ranges using standard SQL. Run these queries in the Query Explorer, in a notebook Query cell, or through the Python SDK. Results from notebook query cells become DataFrame artifacts you can import into your JupyterLab environment for further analysis.

Next Steps