Multi-Row Batching Guide

Process N rows in a single API call to achieve up to a 100× speedup on large datasets.

Overview

Multi-row batching aggregates multiple prompts into a single API call, dramatically reducing:

- API calls: 100× fewer (5M → 50K with batch_size=100)
- Processing time: 100× faster (69 hours → 42 minutes)
- Rate limit issues: virtually eliminated

Quick Start

Basic Usage

from ondine import PipelineBuilder

# Enable multi-row batching with one line
pipeline = (
    PipelineBuilder.create()
    .from_csv("data.csv", input_columns=["text"], output_columns=["sentiment"])
    .with_prompt("Classify sentiment: {text}")
    .with_batch_size(100)  # Process 100 rows per API call!
    .with_llm(provider="openai", model="gpt-4o-mini")
    .build()
)

result = pipeline.execute()

How It Works

Without Batching (batch_size=1, default):

Row 1: "Product A is great" → API call 1 → "positive"
Row 2: "Product B is bad" → API call 2 → "negative"
Row 3: "Product C is okay" → API call 3 → "neutral"
...
100 rows = 100 API calls

With Batching (batch_size=100):

Batch prompt:
[
  {"id": 1, "input": "Product A is great"},
  {"id": 2, "input": "Product B is bad"},
  ...
  {"id": 100, "input": "Product Z"}
]
↓ 1 API call ↓
Batch response:
[
  {"id": 1, "result": "positive"},
  {"id": 2, "result": "negative"},
  ...
  {"id": 100, "result": "neutral"}
]

100 rows = 1 API call (100× reduction!)
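
Conceptually, the aggregation step can be sketched as below. This is illustrative only: build_batch_prompt and map_batch_response are hypothetical helpers, not Ondine's internals.

import json

def build_batch_prompt(rows: list[str]) -> str:
    # Each row gets an id so results can be mapped back after the call.
    entries = [{"id": i + 1, "input": text} for i, text in enumerate(rows)]
    return (
        "Classify the sentiment of each input. Respond with a JSON array "
        'of {"id", "result"} objects, one per input.\n' + json.dumps(entries)
    )

def map_batch_response(response: str, n_rows: int) -> list[str]:
    # Missing ids are surfaced as parse errors rather than silently dropped.
    by_id = {item["id"]: item["result"] for item in json.loads(response)}
    return [by_id.get(i + 1, "[PARSE_ERROR]") for i in range(n_rows)]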

Choosing Batch Size

Use Case                 Batch Size   Reason
Simple classification    100-500      Short prompts, low failure risk
Sentiment analysis       50-100       Medium complexity
Text summarization       10-50        Longer outputs, higher risk
Complex extraction       10-20        Complex prompts, careful parsing

Factors to Consider

1. Model Context Window (see the sizing sketch after this list)

- GPT-4o/GPT-4o-mini: 128K tokens → batch_size up to 500
- Claude Sonnet: 200K tokens → batch_size up to 800
- Llama 3.1: 131K tokens → batch_size up to 500

Ondine automatically validates batch size against context limits.

2. Prompt Complexity

- Simple prompts (20 tokens): larger batches (100-500)
- Medium prompts (100 tokens): medium batches (50-100)
- Complex prompts (500+ tokens): smaller batches (10-50)

3. Failure Tolerance

- Larger batches = higher risk of partial failures
- Start with batch_size=10 and increase gradually
- Monitor the partial failure rate
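
As a rough planning aid, you can derive an upper bound on batch_size from the context window and your per-row token budget. A minimal sketch, assuming illustrative token numbers (Ondine performs its own validation regardless):

def estimate_max_batch_size(
    context_window: int,
    prompt_tokens_per_row: int,
    output_tokens_per_row: int,
    batch_overhead_tokens: int = 200,  # JSON formatting overhead per batch
    safety_margin: float = 0.8,        # leave headroom for token-count variance
) -> int:
    # Reserve the margin and overhead, then divide the rest across rows.
    budget = int(context_window * safety_margin) - batch_overhead_tokens
    return max(1, budget // (prompt_tokens_per_row + output_tokens_per_row))

# e.g. 128K context, ~100 prompt and ~10 output tokens per row:
print(estimate_max_batch_size(128_000, 100, 10))  # -> 929; cap lower in practice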

Batch Strategies

JSON Strategy (Default)

Formats batches as JSON arrays:

pipeline = (
    PipelineBuilder.create()
    .from_csv("data.csv", input_columns=["text"], output_columns=["sentiment"])
    .with_prompt("Classify sentiment: {text}")
    .with_batch_size(100)
    .with_batch_strategy("json")  # Default, most reliable
    .with_llm(provider="openai", model="gpt-4o-mini")
    .build()
)

Pros:

- Most reliable (Pydantic validation; see the sketch below)
- Handles complex outputs
- Automatic error detection

Cons:

- ~200 token overhead per batch
- Requires the LLM to follow JSON format instructions
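
The reliability comes from validating the response against a schema before accepting it. A minimal sketch of that idea with Pydantic v2 (illustrative; these are not Ondine's internal models):

from pydantic import BaseModel, TypeAdapter, ValidationError

class BatchItem(BaseModel):
    id: int
    result: str

# The batch response is a bare JSON array, as shown in "How It Works".
adapter = TypeAdapter(list[BatchItem])
raw = '[{"id": 1, "result": "positive"}, {"id": 2, "result": "negative"}]'

try:
    items = adapter.validate_json(raw)
except ValidationError as exc:
    # Malformed output is caught here instead of silently corrupting rows.
    print(f"Batch failed validation: {exc}")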

CSV Strategy (Future)

Coming soon: a more compact format for simple use cases.

Error Handling

Partial Failures

If some rows fail to parse, Ondine handles them gracefully:

# Batch with 100 rows
# LLM returns 97 results (missing IDs: 23, 67, 89)

# Ondine automatically:
# 1. Parses 97 successful results
# 2. Marks 3 failed rows with [PARSE_ERROR]
# 3. Continues processing

# Result:
# - 97 rows: Valid results
# - 3 rows: "[PARSE_ERROR: Row not found in batch response]"
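
Downstream, you can locate the failed rows by their marker. A sketch, assuming the execution result exposes a pandas DataFrame as result.data with your output column (an assumption; adjust to the actual result shape):

result = pipeline.execute()
df = result.data  # assumption: results are exposed as a pandas DataFrame

# "PARSE_ERROR" matches both [PARSE_ERROR] and [BATCH_PARSE_ERROR] markers.
failed = df[df["sentiment"].str.contains("PARSE_ERROR", na=False)]
print(f"{len(failed)} of {len(df)} rows failed to parse")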

Complete Failures

If the entire batch fails to parse:

# Batch response: "I cannot provide results in JSON format"

# Ondine automatically:
# 1. Marks all rows with [BATCH_PARSE_ERROR]
# 2. Logs error for debugging
# 3. Continues with next batch

# Result:
# - All rows: "[BATCH_PARSE_ERROR: Invalid JSON]"
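
A common recovery pattern is to re-run only the marked rows with batch_size=1, trading throughput for per-row reliability. A sketch under the same DataFrame assumption as above:

# Collect all rows carrying either error marker and reprocess them singly.
errors = df[df["sentiment"].str.contains("PARSE_ERROR", na=False)]
errors.to_csv("failed_rows.csv", index=False)

retry_pipeline = (
    PipelineBuilder.create()
    .from_csv("failed_rows.csv", input_columns=["text"], output_columns=["sentiment"])
    .with_prompt("Classify sentiment: {text}")
    .with_batch_size(1)  # no batching: each row is parsed independently
    .with_llm(provider="openai", model="gpt-4o-mini")
    .build()
)
retry_result = retry_pipeline.execute()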

Performance Benchmarks

Scaling to 5M Rows

Batch Size    API Calls    Time         Speedup        Cost Overhead
1 (default)   5,000,000    ~69 hours    1× (baseline)  0%
10            500,000      ~7 hours     10×            ~2%
100           50,000       ~42 minutes  100×           ~5%
500           10,000       ~8 minutes   500×           ~10%

Cost overhead: JSON formatting adds ~200 tokens per batch
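
The call counts in the table are straightforward arithmetic, easy to reproduce when planning a run:

import math

rows = 5_000_000
for batch_size in (1, 10, 100, 500):
    calls = math.ceil(rows / batch_size)
    # JSON formatting overhead only applies when batching is enabled.
    overhead_tokens = calls * 200 if batch_size > 1 else 0
    print(f"batch_size={batch_size}: {calls:,} calls, ~{overhead_tokens:,} overhead tokens")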

Real-World Example

Dataset: 10 rows, sentiment classification

Without batching:

- API calls: 10
- Duration: ~15 seconds
- Tokens: 210 (21 per row)

With batching (batch_size=5):

- API calls: 2 (5× reduction!)
- Duration: ~6 seconds (2.5× faster)
- Tokens: 250 (25 per row, ~20% overhead)

Combining with Prefix Caching

For maximum cost savings, combine both techniques:

# Shared context (cached across all rows)
SHARED_CONTEXT = """You are an expert data analyst.
[1024+ tokens of general knowledge]
"""

pipeline = (
    PipelineBuilder.create()
    .from_csv("data.csv", input_columns=["text"], output_columns=["label"])  # illustrative data source
    .with_prompt("TASK: Classify\nINPUT: {text}")
    .with_system_prompt(SHARED_CONTEXT)  # Cached (40-50% savings)
    .with_batch_size(100)  # 100× fewer API calls
    .with_llm(provider="openai", model="gpt-4o-mini")
    .build()
)

Combined savings:

- Prefix caching: 40-50% cost reduction
- Multi-row batching: 100× fewer API calls
- Total: 90%+ cost reduction + 100× speedup!

CLI Configuration

Enable batching via YAML config:

prompt:
  template: "Classify: {text}"
  batch_size: 100  # Multi-row batching
  batch_strategy: json
  system_message: "You are a classifier."  # Cached

llm:
  provider: openai
  model: gpt-4o-mini

Run with:

ondine process --config config.yaml

Best Practices

  1. Start small: Begin with batch_size=10, increase gradually
  2. Monitor failures: Check for [PARSE_ERROR] in results
  3. Validate context: Ondine auto-validates, but check logs for warnings
  4. Combine techniques: Use with prefix caching for maximum savings
  5. Test first: Run on 100-1000 rows before scaling to millions (see the sketch below)
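
For practice 5, one way to dry-run on a sample before committing to the full dataset (pandas is used here only to cut the sample; the columns match the Quick Start example):

import pandas as pd

from ondine import PipelineBuilder

# Cut a 1,000-row sample from the full dataset.
pd.read_csv("data.csv").head(1000).to_csv("sample.csv", index=False)

pipeline = (
    PipelineBuilder.create()
    .from_csv("sample.csv", input_columns=["text"], output_columns=["sentiment"])
    .with_prompt("Classify sentiment: {text}")
    .with_batch_size(10)  # start small, increase gradually
    .with_llm(provider="openai", model="gpt-4o-mini")
    .build()
)
result = pipeline.execute()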

Troubleshooting

Issue: Batch parsing failures

Symptom: Many rows with [PARSE_ERROR]

Solutions:

- Reduce batch_size (try 10-20)
- Simplify prompt instructions
- Add more explicit JSON format examples
- Check that the LLM is following instructions

Issue: Context window exceeded

Symptom: Warning logs about batch size validation

Solutions:

- Reduce batch_size
- Use a model with a larger context window
- Simplify prompts to reduce tokens

Issue: Slower than expected

Symptom: Not seeing 100× speedup

Solutions:

- Check that batch_size is actually set (print pipeline.specifications.prompt.batch_size)
- Verify that batch aggregation logs appear
- Check whether rate limiting is the bottleneck (increase rate_limit_rpm)

Examples

See examples/21_multi_row_batching.py for complete working examples:

- Example 1: Without batching (baseline)
- Example 2: With batching (5× speedup)
- Example 3: Large dataset extrapolation (5.4M rows)

API Reference

with_batch_size(batch_size: int)

Enable multi-row batching.

Parameters:

- batch_size (int): Number of rows to process per API call (1-500)

Returns: PipelineBuilder (for chaining)

Raises: ValueError if batch_size < 1 or called before with_prompt()
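
A short usage sketch reflecting the documented constraints (the exact error messages are Ondine's and not shown here):

builder = PipelineBuilder.create().with_prompt("Classify: {text}")
builder = builder.with_batch_size(100)  # valid: prompt set, batch_size >= 1

try:
    PipelineBuilder.create().with_prompt("Classify: {text}").with_batch_size(0)
except ValueError:
    print("batch_size must be >= 1")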

with_batch_strategy(strategy: str)

Set batch formatting strategy.

Parameters:

- strategy (str): "json" or "csv"

Returns: PipelineBuilder (for chaining)

Raises: ValueError if strategy not supported

Performance Tips

  1. Combine with concurrency: Use both for maximum throughput (a combined sketch follows these tips)

    .with_batch_size(100)  # 100 rows per call
    .with_concurrency(10)  # 10 concurrent calls
    # = 1000 rows processed simultaneously!
    

  2. Use with streaming: Process results as they arrive

    .with_batch_size(100)
    .with_streaming(chunk_size=10000)
    

  3. Enable prefix caching: Reduce token costs

    .with_batch_size(100)
    .with_system_prompt("...")  # Cached
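
Putting the three tips together, a combined pipeline might look like this sketch (all builder calls are the ones shown earlier in this guide):

pipeline = (
    PipelineBuilder.create()
    .from_csv("data.csv", input_columns=["text"], output_columns=["sentiment"])
    .with_system_prompt("You are an expert classifier.")  # cached prefix
    .with_prompt("Classify sentiment: {text}")
    .with_batch_size(100)               # 100 rows per call
    .with_concurrency(10)               # 10 concurrent calls in flight
    .with_streaming(chunk_size=10000)   # process results chunk by chunk
    .with_llm(provider="openai", model="gpt-4o-mini")
    .build()
)
result = pipeline.execute()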
    

Limitations

  • JSON strategy only (CSV coming soon)
  • Batch size limited by model context window
  • Requires LLM to follow JSON format instructions
  • Larger batches = higher risk of partial failures