Workflows

Chain operators into DAG-based processing pipelines with conditional logic and error handling.

Workflows chain multiple operators into directed acyclic graph (DAG) processing pipelines. Each node in the workflow runs an operator, and edges between nodes define data dependencies: a downstream node can access the results of its upstream dependencies. Workflows support conditional execution, configurable error handling, ephemeral (non-persisted) intermediate results, automatic triggering on document upload, and YAML import/export for version control.

Building a Workflow

Navigate to the Workflows section in the sidebar and click New Workflow to open the visual graph editor.

Step-by-Step

  1. Name your workflow: Enter a name and optional description. Toggle Auto-run on upload if you want this workflow to execute automatically when new documents are added to the project.

  2. Add operator nodes: Click the operator palette on the left side of the canvas and drag operators onto the graph. Each node wraps an existing operator that you’ve created.

  3. Connect nodes: Click an output port on one node and drag to an input port on another to create an edge. Edges define data dependencies: a downstream node receives the annotation results of its direct upstream dependencies as context for its LLM prompt.

  4. Configure each node: Click a node to open its configuration panel on the right side. Here you can set:

    • Condition: An optional rule that determines whether this node runs for a given document (see Conditional Execution)
    • Error handling: What happens if the operator fails (see Error Handling)
    • Persist: Whether to save the annotations to the database or keep them ephemeral (see Ephemeral Annotations)

  5. Save: Click Save in the toolbar to persist your workflow.

Figure: The visual workflow editor, showing an operator node on the canvas.

Running a Workflow

To run a workflow on your documents:

  1. Open the workflow and click Run in the toolbar, or right-click a workflow in the sidebar and select Run
  2. In the dialog, select the documents to process (use search and filters to find specific documents)
  3. Click Run Workflow

Figure: The run workflow dialog, showing the Excel file selected and the Run Workflow button.

Monitor execution progress in the Jobs panel. Each job shows a timeline of execution stages, per-document subtask status, and detailed logs.

Figure: The Jobs dashboard, showing a completed workflow with per-document subtasks and status indicators.

Execution Order

Ragnerock determines execution order using topological sort. Nodes with no upstream dependencies (roots) run first, followed by nodes whose dependencies have all completed. This ensures every node has access to the results it needs before it executes.

The execution order is cached on the workflow and recomputed automatically whenever you add, remove, or reconnect nodes. If your graph contains a cycle (e.g., Node A depends on Node B, which depends on Node A), the editor rejects the connection and reports an error. DAGs must be acyclic by definition.

Key behaviors:

  • Nodes at the same depth in the graph can execute in parallel
  • The execution order is deterministic for a given graph structure
  • Adding or removing a node triggers recomputation of the cached order
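The ordering described above can be sketched with Kahn's algorithm, a standard topological sort. This is a minimal illustration rather than Ragnerock's implementation: the function name `execution_levels`, the node names, and the edge-pair representation are all assumptions made for the example. Nodes in the same inner list have no unmet dependencies on each other, so they are the candidates for parallel execution.

```python
from collections import defaultdict


def execution_levels(nodes, edges):
    """Group nodes into dependency levels; raise ValueError on a cycle.

    nodes: iterable of node names (hypothetical representation).
    edges: (upstream, downstream) pairs.
    """
    indegree = {n: 0 for n in nodes}
    downstream = defaultdict(list)
    for up, down in edges:
        downstream[up].append(down)
        indegree[down] += 1

    # Roots have no upstream dependencies. Sorting keeps the order deterministic.
    level = sorted(n for n, d in indegree.items() if d == 0)
    levels, seen = [], 0
    while level:
        levels.append(level)
        seen += len(level)
        next_level = []
        for node in level:
            for child in downstream[node]:
                indegree[child] -= 1
                if indegree[child] == 0:  # all dependencies completed
                    next_level.append(child)
        level = sorted(next_level)

    # Any node never reached still has an unmet dependency, which means a cycle.
    if seen != len(indegree):
        raise ValueError("workflow graph contains a cycle")
    return levels


edges = [("classify", "entities"), ("classify", "sentiment"),
         ("entities", "risk"), ("sentiment", "risk")]
levels = execution_levels(["classify", "entities", "sentiment", "risk"], edges)
print(levels)  # [['classify'], ['entities', 'sentiment'], ['risk']]
```

The same routine doubles as a cycle check: a graph where A depends on B and B depends on A never produces a root, so the final count mismatch raises the error.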

Conditional Execution

Each workflow node can have an optional condition that controls whether it executes for a given document. Conditions are evaluated against the annotation results of the node’s direct upstream dependencies. If the condition evaluates to false, the node is skipped for that document.

To configure a condition, click a node in the workflow editor to open its configuration panel, then expand the Condition section.

Conditions reference upstream results using field paths in the format {operator_name}.{field}, where operator_name is the programmatic name of the upstream operator (lowercase, spaces replaced with underscores). For example, if the upstream operator is named “Sentiment Analysis” and it produces a score field, reference it as sentiment_analysis.score.
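As a rough sketch of the naming rule above, the following shows how a display name could map to its programmatic name and how a field path could be resolved. The helper names and the dictionary layout of the upstream results are assumptions for illustration; how Ragnerock treats punctuation or nested fields is not covered here.

```python
def programmatic_name(display_name):
    """Lowercase the operator name and replace spaces with underscores."""
    return display_name.lower().replace(" ", "_")


def resolve(field_path, upstream_results):
    """Look up '{operator_name}.{field}' in a dict keyed by programmatic name.

    The dict-of-dicts layout is a hypothetical stand-in for upstream results.
    """
    operator_name, field = field_path.split(".", 1)
    return upstream_results[operator_name][field]


upstream = {programmatic_name("Sentiment Analysis"): {"score": 0.82}}
print(programmatic_name("Sentiment Analysis"))        # sentiment_analysis
print(resolve("sentiment_analysis.score", upstream))  # 0.82
```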

Field Comparison

Select Field Comparison in the condition builder. Choose the upstream field, a comparison operator, and a value. For example, to run a node only when the upstream sentiment score is at least 0.7: select the field sentiment_analysis.score, set the operator to >=, and enter the value 0.7.

The following comparison operators are available in the condition builder:

| Operator | Description | Example |
|---|---|---|
| == | Equal to | category == "financial" |
| != | Not equal to | category != "irrelevant" |
| > | Greater than | score > 0.8 |
| < | Less than | risk_level < 3 |
| >= | Greater than or equal | confidence >= 0.5 |
| <= | Less than or equal | count <= 100 |
| matches | Regex pattern match | topic matches "^earnings" |

Type coercion is applied automatically ("5" equals 5, true equals 1) so conditions work even when upstream types are slightly inconsistent.
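To make the coercion behavior concrete, here is a minimal sketch of a comparison helper. It is an assumption-laden illustration, not the product's evaluator: the function names are invented, numeric strings are coerced with `float`, `matches` is treated as a regex search (not a full match), and incomparable types are treated as a non-match.

```python
import operator
import re

COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}


def coerce(value):
    """Turn booleans and numeric strings into numbers; leave the rest alone."""
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return value  # not numeric, keep the string
    return value


def compare(actual, op, expected):
    """Apply one comparison operator from the table to two values."""
    if op == "matches":
        # Assumption: search semantics, so "^" is needed to anchor at the start.
        return re.search(expected, str(actual)) is not None
    left, right = coerce(actual), coerce(expected)
    try:
        return COMPARATORS[op](left, right)
    except TypeError:
        # Incomparable types (a word against a number) do not match.
        return False


print(compare("5", "==", 5))                                # True
print(compare(True, "==", 1))                               # True
print(compare("earnings call Q3", "matches", "^earnings"))  # True
```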

List Operations

Select List Operation to perform operations on array fields produced by upstream operators. For example, to check if an upstream entity list contains a specific company: select the field entity_extraction.organizations, set the operation to contains, and enter the value "Apple Inc.".

The following list operations are available:

| Operation | Description | Requires comparison | Example |
|---|---|---|---|
| contains | Check if value exists in list | No | organizations contains "Apple Inc." |
| count | Compare list length | Yes | entities count >= 3 |
| min | Compare minimum element | Yes | scores min > 0.5 |
| max | Compare maximum element | Yes | scores max < 100 |

For count, min, and max, you also select a comparison operator (==, !=, >, <, >=, <=) and enter a comparison value.
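The four list operations can be sketched as a reduction followed by an optional comparison. The function below is illustrative only; its name and signature are hypothetical, and the handling of an empty list for min and max (treated as a non-match) is an assumption.

```python
import operator

COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}

REDUCERS = {"count": len, "min": min, "max": max}


def list_operation(values, operation, comparison=None, target=None):
    """Evaluate contains/count/min/max against a list from an upstream field."""
    if operation == "contains":
        return target in values  # no comparison operator needed
    if operation not in REDUCERS:
        raise ValueError(f"unknown list operation: {operation}")
    if operation != "count" and not values:
        return False  # assumption: min/max of an empty list never matches
    reduced = REDUCERS[operation](values)
    return COMPARATORS[comparison](reduced, target)


orgs = ["Apple Inc.", "Microsoft"]
print(list_operation(orgs, "contains", target="Apple Inc."))  # True
print(list_operation(orgs, "count", ">=", 3))                 # False
print(list_operation([0.6, 0.9], "min", ">", 0.5))            # True
```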

Logical Operators

Select AND, OR, or NOT to compose multiple conditions. Each sub-condition is configured the same way, as a field comparison, list operation, or another logical group. This lets you build rules like “category is not irrelevant AND at least one organization was extracted.”

NOT negates a single condition. The node runs only when the inner condition is false.

Logical operators nest arbitrarily. You can put an AND inside an OR, or a NOT around a complex expression.

Condition Safety

If a referenced field is missing from the upstream results (e.g., the upstream operator didn’t produce that field for a particular document), the condition evaluates to false and the node is skipped. This fail-safe prevents downstream nodes from running on incomplete data.
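The sketch below combines nested logical operators with the missing-field rule. The keys `type`, `field_path`, `operator`, and `value` follow the exported YAML shown later on this page; the `and`/`or`/`not` type names and the `conditions`/`condition` keys are hypothetical. One design choice here is also an assumption: a missing field makes the whole condition false, even when it sits inside a NOT or an OR, so the node is always skipped on incomplete data.

```python
import operator

# A subset of comparison operators, enough for the example.
COMPARATORS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge}


class MissingField(Exception):
    """Raised when a condition references a field the upstream results lack."""


def lookup(field_path, results):
    operator_name, field = field_path.split(".", 1)
    try:
        return results[operator_name][field]
    except KeyError:
        raise MissingField(field_path) from None


def _evaluate(cond, results):
    kind = cond["type"]
    if kind == "and":
        # List comprehensions evaluate every branch, so a missing field
        # anywhere is detected instead of being hidden by short-circuiting.
        return all([_evaluate(c, results) for c in cond["conditions"]])
    if kind == "or":
        return any([_evaluate(c, results) for c in cond["conditions"]])
    if kind == "not":
        return not _evaluate(cond["condition"], results)
    if kind == "field_comparison":
        value = lookup(cond["field_path"], results)
        return COMPARATORS[cond["operator"]](value, cond["value"])
    raise ValueError(f"unknown condition type: {kind}")


def should_run(cond, results):
    """No condition means always run; a missing field means skip."""
    if cond is None:
        return True
    try:
        return _evaluate(cond, results)
    except MissingField:
        return False


rule = {"type": "and", "conditions": [
    {"type": "not", "condition": {
        "type": "field_comparison",
        "field_path": "document_classification.category",
        "operator": "==", "value": "irrelevant"}},
    {"type": "field_comparison",
     "field_path": "document_classification.relevance",
     "operator": ">=", "value": 0.5},
]}

complete = {"document_classification": {"category": "earnings_call", "relevance": 0.9}}
incomplete = {"document_classification": {"relevance": 0.9}}
print(should_run(rule, complete))    # True
print(should_run(rule, incomplete))  # False
```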

Error Handling

Each node has a configurable error handling strategy that determines what happens if the operator fails for a document:

| Strategy | Behavior | Use When |
|---|---|---|
| SKIP_NODE (default) | Skip the failed node and continue the workflow. Downstream nodes execute without the failed node’s results. | Production resilience: partial results are acceptable |
| FAIL_JOB | Fail the entire job immediately. No further nodes execute. | Accuracy is critical: partial results are unacceptable |
| RETRY | Retry the failed node up to max_retries times before falling back to skip or fail behavior. | Transient errors (rate limits, timeouts) |

Set the error strategy per node in the configuration panel. SKIP_NODE is the recommended default: it keeps the pipeline running even when individual extractions fail, and the failed items are logged for review.
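The three strategies can be sketched as a small wrapper around a node's operator call. Everything here is illustrative: the function and exception names are invented, the lowercase strategy strings follow the `on_error: "skip_node"` spelling in the exported YAML (with `fail_job` and `retry` assumed by analogy), and the `after_retries` parameter is a hypothetical stand-in for whichever fallback applies once retries are exhausted.

```python
class JobFailed(Exception):
    """Signals that the whole job should stop."""


def run_node(operator_fn, document, on_error="skip_node", max_retries=0,
             after_retries="skip_node"):
    """Return the operator's result, or None when the node is skipped."""
    # Only the retry strategy gets extra attempts beyond the first.
    attempts = 1 + (max_retries if on_error == "retry" else 0)
    last_error = None
    for _ in range(attempts):
        try:
            return operator_fn(document)
        except Exception as exc:
            last_error = exc

    final = after_retries if on_error == "retry" else on_error
    if final == "fail_job":
        raise JobFailed(str(last_error)) from last_error
    return None  # skip_node: downstream nodes run without this result


calls = {"n": 0}


def flaky(document):
    """Fails twice with a transient error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("rate limited")
    return {"ok": document}


def broken(document):
    raise RuntimeError("schema violation")


retried = run_node(flaky, "doc-1", on_error="retry", max_retries=2)
skipped = run_node(broken, "doc-1", on_error="skip_node")
print(retried, skipped)  # {'ok': 'doc-1'} None
```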

Ephemeral Annotations

By default, every node’s annotations are persisted to the database and become queryable via SQL. For intermediate processing steps where you only need the results as input to downstream nodes, toggle Persist off on the node.

Ephemeral annotations are:

  • Generated normally: The operator runs and produces structured output
  • Available to downstream nodes: Downstream operators see ephemeral results in their upstream context
  • Not written to the database: They don’t appear in query results, don’t consume storage, and don’t clutter your annotation tables

When to use ephemeral annotations:

  • Intermediate classification steps that only exist to route documents through conditional branches
  • Preprocessing steps that normalize or enrich data before a final extraction
  • Any node whose output is only meaningful as input to another node
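A minimal way to picture the Persist toggle: every result goes into the in-memory context that downstream nodes read, and only persisted results are also written to storage. The dictionaries below are hypothetical stand-ins for the run context and the database.

```python
def record_result(node, result, context, database):
    """Expose every result downstream; store only persisted ones."""
    context[node["operator"]] = result  # always visible to downstream nodes
    if node.get("persist", True):       # Persist is on by default
        database.setdefault(node["operator"], []).append(result)


context, database = {}, {}
record_result({"operator": "sentiment_analysis", "persist": False},
              {"overall_sentiment": "negative"}, context, database)
record_result({"operator": "risk_assessment", "persist": True},
              {"risk_level": "high"}, context, database)

print(sorted(context))   # ['risk_assessment', 'sentiment_analysis']
print(sorted(database))  # ['risk_assessment']
```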

Auto-Run on Upload

When Auto-run on upload is enabled on a workflow, it triggers automatically whenever new documents are uploaded to the project. This is useful for standardized processing pipelines: upload a batch of earnings calls and have them automatically classified, annotated, and scored without manual intervention.

Disable auto-run for workflows you only want to trigger manually (e.g., experimental pipelines, one-off analyses).

YAML Import/Export

Workflows can be exported as YAML files and imported into the same or different projects. YAML import/export supports an analysis-as-code paradigm inspired by Kubernetes declarative configuration. Workflow YAML files are configuration that you can track and version in source control, independently of Ragnerock, which makes analysis pipelines reproducible across teams and environments.

Export

Export a workflow from the toolbar menu. The YAML file captures the complete workflow definition (operators, schemas, prompts, node positions, conditions, error strategies, and connections). For example:

version: "1"
workflow:
  name: "Earnings Call Analysis"
  description: "Multi-stage pipeline for earnings call processing"
  is_active: true
  auto_run_on_upload: true
operators:
  - name: "Document Classification"
    description: "Classify document type and relevance"
    generation_prompt: |
      Classify this document by type and assess its relevance
      to equity research...
    chunk_type: "document"
    batch_size: null
    multi_annotation: false
    jsonschema:
      type: object
      properties:
        category:
          type: string
          enum: ["earnings_call", "10-K", "10-Q", "research_note", "other"]
        relevance:
          type: number
          minimum: 0
          maximum: 1
      required: ["category", "relevance"]
  - name: "Sentiment Analysis"
    description: "Extract overall sentiment"
    generation_prompt: |
      Analyze sentiment toward the company's future performance...
    chunk_type: "document"
    batch_size: null
    multi_annotation: false
    jsonschema:
      type: object
      properties:
        overall_sentiment:
          type: string
          enum: ["very_negative", "negative", "neutral", "positive", "very_positive"]
        confidence:
          type: number
          minimum: 0
          maximum: 1
      required: ["overall_sentiment", "confidence"]
nodes:
  - operator: "Document Classification"
    condition: null
    persist: true
    on_error: "skip_node"
    max_retries: 0
    position:
      x: 100
      y: 200
      z: 0
    connections:
      in: []
      out: [1]
  - operator: "Sentiment Analysis"
    condition:
      type: field_comparison
      field_path: document_classification.relevance
      operator: ">="
      value: 0.5
    persist: true
    on_error: "skip_node"
    max_retries: 0
    position:
      x: 400
      y: 200
      z: 0
    connections:
      in: [0]
      out: []

Import

Import a YAML file via the workflow toolbar. The import process:

  1. Creates or updates operators: Operators in the YAML are matched by name. New operators are created; existing ones are updated with the imported definition.
  2. Creates the workflow: Workflow metadata (name, description, flags) is applied.
  3. Creates nodes and connections: Nodes are created with their operator references, conditions, and error strategies, then edges are wired up.
  4. Validates the graph: Cycle detection runs on the imported connections. Invalid graphs are rejected.

Import validation checks for: valid YAML structure, version compatibility, required operator fields (name, prompt, schema, chunk type), valid error strategy names, and connection indices within bounds.
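The checks listed above can be approximated on an already-parsed workflow document. This is a sketch under stated assumptions, not the importer itself: the function name is invented, the required keys `generation_prompt`, `jsonschema`, and `chunk_type` are inferred from the export example, the set of supported versions and the lowercase `fail_job` and `retry` strategy names are guesses, and cycle detection is left out.

```python
REQUIRED_OPERATOR_FIELDS = ("name", "generation_prompt", "jsonschema", "chunk_type")
ERROR_STRATEGIES = {"skip_node", "fail_job", "retry"}  # assumed spellings
SUPPORTED_VERSIONS = {"1"}                             # assumed


def validate_workflow(doc):
    """Return a list of problems found in a parsed workflow document (a dict)."""
    problems = []
    if doc.get("version") not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported version: {doc.get('version')!r}")

    for i, op in enumerate(doc.get("operators", [])):
        for field in REQUIRED_OPERATOR_FIELDS:
            if not op.get(field):
                problems.append(f"operator {i}: missing {field}")

    nodes = doc.get("nodes", [])
    for i, node in enumerate(nodes):
        strategy = node.get("on_error", "skip_node")
        if strategy not in ERROR_STRATEGIES:
            problems.append(f"node {i}: unknown error strategy {strategy!r}")
        conns = node.get("connections", {})
        # Connection entries are indices into the nodes list.
        for j in conns.get("in", []) + conns.get("out", []):
            if not 0 <= j < len(nodes):
                problems.append(f"node {i}: connection index {j} out of bounds")
    return problems


doc = {
    "version": "1",
    "operators": [{
        "name": "Document Classification",
        "generation_prompt": "Classify this document...",
        "jsonschema": {"type": "object"},
        "chunk_type": "document",
    }],
    "nodes": [{
        "operator": "Document Classification",
        "on_error": "skip_node",
        "connections": {"in": [], "out": [1]},  # index 1 does not exist
    }],
}
problems = validate_workflow(doc)
print(problems)  # ['node 0: connection index 1 out of bounds']
```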

Worked Example: Multi-Stage Document Analysis

This example builds a four-node pipeline that classifies documents, extracts entities from relevant ones, computes sentiment as an intermediate signal, and produces a final risk assessment for documents with negative sentiment.

Pipeline Structure

The pipeline has four stages:

  1. Document Classification → 2. Entity Extraction → 3. Sentiment Analysis (ephemeral) → 4. Risk Assessment

Each stage gates the next through a condition, so only relevant documents flow through the full pipeline.

Node Configurations

Node 1: Document Classification: Root node, no condition, always runs. Classifies each document’s type and relevance. Persisted so you can query classification results directly.

Node 2: Entity Extraction: Paragraph-scoped, persisted. Only runs if the upstream classification says the document is not irrelevant. Configure a Field Comparison condition: set field to document_classification.category, operator to !=, and value to "irrelevant".

Node 3: Sentiment Analysis: Document-scoped, ephemeral (Persist toggled off). Serves only as an intermediate signal for the risk assessment node. Only runs if at least one entity was extracted. Configure a List Operation condition: set field to entity_extraction.entities, operation to count, comparison to >=, and value to 1.

Node 4: Risk Assessment: Document-scoped, persisted. Runs only when sentiment is negative. Configure a Logical OR condition with two sub-conditions:

  • Field Comparison: sentiment_analysis.overall_sentiment == "negative"
  • Field Comparison: sentiment_analysis.overall_sentiment == "very_negative"

Running the Pipeline

Open the “Document Analysis Pipeline” workflow in the editor and click Run. Select all documents in the run dialog and click Run Workflow. Monitor progress in the Jobs panel. The timeline view shows each node’s execution stage.

Querying the Results

Since Sentiment Analysis is ephemeral, it doesn’t appear in query results. The persisted nodes (Document Classification, Entity Extraction, and Risk Assessment) are queryable. Open the Query Explorer and run:

-- Which documents were flagged as risky?
SELECT
    document_name,
    risk_level,
    risk_factors,
    confidence
FROM risk_assessment
WHERE risk_level IN ('high', 'critical')
ORDER BY confidence DESC;

-- What entities appear most frequently in relevant documents?
SELECT
    name,
    type,
    COUNT(*) AS mentions
FROM entity_extraction
GROUP BY name, type
ORDER BY mentions DESC
LIMIT 20;

Best Practices

  1. Keep operators focused: Each operator should extract one concept. Use workflows to compose multiple extractions rather than building a single massive operator that tries to do everything.

  2. Use ephemeral nodes for intermediates: If a node’s output is only needed as input to downstream nodes, toggle Persist off to keep your annotation tables clean and reduce storage.

  3. Start with SKIP_NODE error handling: The default strategy keeps your pipeline running even when individual extractions fail. Switch to FAIL_JOB only when partial results would be misleading.

  4. Test on a small sample first: Run your workflow on 2-3 representative documents before processing your entire corpus. Check that conditions fire correctly and downstream nodes receive the expected context.

  5. Use YAML export for version control: Export your workflows as YAML and commit them to your repository. This gives you change history, code review, and the ability to reproduce pipelines across environments.

  6. Design conditions to fail safely: Conditions that reference missing fields evaluate to false, skipping the node. Structure your conditions so that a missing upstream result means the downstream node shouldn’t run anyway.

Next Steps

  • Operators: Deep dive into schema design, prompts, and scopes
  • Annotations: Core concepts and usage
  • Queries: SQL reference and Query Explorer