ondine ¶
LLM Dataset Processing Engine.
A production-grade SDK for processing tabular datasets using Large Language Models with reliability, observability, and cost control.
DatasetProcessor ¶
DatasetProcessor(data: str | DataFrame, input_column: str, output_column: str, prompt: str, llm_config: dict[str, any])
Simplified API for single-prompt, single-column use cases.
This is a convenience wrapper around PipelineBuilder for users who don't need fine-grained control.
Example
```python
processor = DatasetProcessor(
    data="data.csv",
    input_column="description",
    output_column="cleaned",
    prompt="Clean this text: {description}",
    llm_config={"provider": "openai", "model": "gpt-4o-mini"},
)
result = processor.run()
```
Initialize dataset processor.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `str \| DataFrame` | CSV file path or DataFrame | required |
| `input_column` | `str` | Input column name | required |
| `output_column` | `str` | Output column name | required |
| `prompt` | `str` | Prompt template | required |
| `llm_config` | `dict[str, any]` | LLM configuration dict | required |
Source code in ondine/api/dataset_processor.py
run ¶
Execute processing and return results.
Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame with results |
run_sample ¶
Test on first N rows.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int` | Number of rows to process | `10` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame with sample results |
Source code in ondine/api/dataset_processor.py
estimate_cost ¶
Estimate total processing cost.
Returns:

| Type | Description |
|---|---|
| `float` | Estimated cost in USD |
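Cost estimation of this kind typically multiplies expected token counts by per-token prices. A minimal sketch of the arithmetic (the per-1K-token prices below are illustrative assumptions, not ondine's actual pricing tables):

```python
def estimate_cost(rows, avg_input_tokens, avg_output_tokens,
                  input_price_per_1k, output_price_per_1k):
    """Rough cost model: per-row token averages times per-1K-token prices."""
    input_cost = rows * avg_input_tokens / 1000 * input_price_per_1k
    output_cost = rows * avg_output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost

# 1,000 rows at ~200 input / ~50 output tokens per row,
# with assumed prices of $0.00015 / $0.0006 per 1K tokens:
estimate_cost(1000, 200, 50, 0.00015, 0.0006)  # ≈ 0.06 USD
```

Running an estimate before `run()` lets you sanity-check a job against a budget before any API calls are made.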
Pipeline ¶
Pipeline(specifications: PipelineSpecifications, dataframe: DataFrame | None = None, executor: ExecutionStrategy | None = None)
Main pipeline class - Facade for dataset processing.
Provides high-level interface for building and executing LLM-powered data transformations.
Example
```python
pipeline = Pipeline(specifications)
result = pipeline.execute()
```
Initialize pipeline with specifications.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `specifications` | `PipelineSpecifications` | Complete pipeline configuration | required |
| `dataframe` | `DataFrame \| None` | Optional pre-loaded DataFrame | `None` |
| `executor` | `ExecutionStrategy \| None` | Optional execution strategy (default: SyncExecutor) | `None` |
Source code in ondine/api/pipeline.py
add_observer ¶
Add execution observer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `observer` | `ExecutionObserver` | Observer to add | required |

Returns:

| Type | Description |
|---|---|
| `Pipeline` | Self for chaining |
validate ¶
Validate pipeline configuration.
Returns:

| Type | Description |
|---|---|
| `ValidationResult` | ValidationResult with any errors/warnings |
Source code in ondine/api/pipeline.py
estimate_cost ¶
Estimate total processing cost.
Returns:

| Type | Description |
|---|---|
| `CostEstimate` | Cost estimate |
Source code in ondine/api/pipeline.py
execute ¶
Execute pipeline end-to-end.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `resume_from` | `UUID \| None` | Optional session ID to resume from checkpoint | `None` |

Returns:

| Type | Description |
|---|---|
| `ExecutionResult` | ExecutionResult with data and metrics |
Source code in ondine/api/pipeline.py
execute_async async ¶
Execute pipeline asynchronously.
Uses AsyncExecutor for non-blocking execution. Ideal for integration with FastAPI, aiohttp, and other async frameworks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `resume_from` | `UUID \| None` | Optional session ID to resume from checkpoint | `None` |

Returns:

| Type | Description |
|---|---|
| `ExecutionResult` | ExecutionResult with data and metrics |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If executor doesn't support async |
Source code in ondine/api/pipeline.py
execute_stream ¶
Execute pipeline in streaming mode.
Processes data in chunks for memory-efficient handling of large datasets. Ideal for datasets that don't fit in memory.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `chunk_size` | `int \| None` | Number of rows per chunk (uses executor's chunk_size if None) | `None` |

Yields:

| Type | Description |
|---|---|
| `ExecutionResult` | ExecutionResult objects for each processed chunk |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If executor doesn't support streaming |
Source code in ondine/api/pipeline.py
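The chunked iteration behind streaming execution can be sketched with plain Python (a conceptual illustration of fixed-size chunking, not ondine's implementation):

```python
def iter_chunks(rows, chunk_size):
    """Yield successive slices of at most chunk_size rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

# 2,500 rows with chunk_size=1000 produce chunks of 1000, 1000, and 500 rows
sizes = [len(chunk) for chunk in iter_chunks(list(range(2500)), 1000)]
# → [1000, 1000, 500]
```

Because each chunk is processed and yielded before the next is loaded, peak memory stays proportional to `chunk_size` rather than to the full dataset.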
PipelineBuilder ¶
Fluent builder for constructing pipelines.
Provides an intuitive, chainable API for common use cases.
Example
```python
pipeline = (
    PipelineBuilder.create()
    .from_csv("data.csv", input_columns=["text"], output_columns=["result"])
    .with_prompt("Process: {text}")
    .with_llm(provider="openai", model="gpt-4o-mini")
    .build()
)
```
Initialize builder with None values.
Source code in ondine/api/pipeline_builder.py
create staticmethod ¶
from_specifications staticmethod ¶
Create builder from existing specifications.
Useful for loading from YAML and modifying programmatically.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `specs` | `PipelineSpecifications` | Complete pipeline specifications | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | PipelineBuilder pre-configured with specs |
Example
```python
specs = load_pipeline_config("config.yaml")
builder = PipelineBuilder.from_specifications(specs)
pipeline = builder.build()
```
Source code in ondine/api/pipeline_builder.py
from_csv ¶
from_csv(path: str, input_columns: list[str], output_columns: list[str], delimiter: str = ',', encoding: str = 'utf-8') -> PipelineBuilder
Configure CSV data source.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to CSV file | required |
| `input_columns` | `list[str]` | Input column names | required |
| `output_columns` | `list[str]` | Output column names | required |
| `delimiter` | `str` | CSV delimiter | `','` |
| `encoding` | `str` | File encoding | `'utf-8'` |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
from_excel ¶
from_excel(path: str, input_columns: list[str], output_columns: list[str], sheet_name: str | int = 0) -> PipelineBuilder
Configure Excel data source.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to Excel file | required |
| `input_columns` | `list[str]` | Input column names | required |
| `output_columns` | `list[str]` | Output column names | required |
| `sheet_name` | `str \| int` | Sheet name or index | `0` |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
from_parquet ¶
Configure Parquet data source.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to Parquet file | required |
| `input_columns` | `list[str]` | Input column names | required |
| `output_columns` | `list[str]` | Output column names | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
from_dataframe ¶
from_dataframe(df: DataFrame, input_columns: list[str], output_columns: list[str]) -> PipelineBuilder
Configure DataFrame source.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Pandas DataFrame | required |
| `input_columns` | `list[str]` | Input column names | required |
| `output_columns` | `list[str]` | Output column names | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
with_prompt ¶
Configure prompt template.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `template` | `str` | Prompt template with {variable} placeholders | required |
| `system_message` | `str \| None` | Optional system message | `None` |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
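The `{variable}` placeholders map one-to-one onto input columns. Assuming they follow standard `str.format` syntax, extracting them can be sketched with the standard library (illustrative; ondine's own template parsing may differ):

```python
from string import Formatter

def template_fields(template):
    """Return the {placeholder} names referenced by a prompt template."""
    return [field for _, field, _, _ in Formatter().parse(template)
            if field is not None]

template_fields("Context: {context}\n\nQuestion: {question}")
# → ['context', 'question']
```

This is also the kind of check that makes it possible to validate, before execution, that every placeholder has a matching input column.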
with_llm ¶
with_llm(provider: str, model: str, api_key: str | None = None, temperature: float = 0.0, max_tokens: int | None = None, **kwargs: any) -> PipelineBuilder
Configure LLM provider.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `provider` | `str` | Provider name (openai, azure_openai, anthropic) or custom provider ID | required |
| `model` | `str` | Model identifier | required |
| `api_key` | `str \| None` | API key (or from env) | `None` |
| `temperature` | `float` | Sampling temperature | `0.0` |
| `max_tokens` | `int \| None` | Max output tokens | `None` |
| `**kwargs` | `any` | Provider-specific parameters | `{}` |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
with_llm_spec ¶
Configure LLM using a pre-built LLMSpec object.
This method allows using LLMSpec objects directly, enabling:

- Reusable provider configurations
- Use of LLMProviderPresets for common providers
- Custom LLMSpec instances for advanced use cases
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `spec` | `LLMSpec` | LLM specification object | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |

Raises:

| Type | Description |
|---|---|
| `TypeError` | If spec is not an LLMSpec instance |
Example
Use preset¶

```python
from ondine.core.specifications import LLMProviderPresets

pipeline = (
    PipelineBuilder.create()
    .from_csv("data.csv", input_columns=["text"], output_columns=["result"])
    .with_prompt("Process: {text}")
    .with_llm_spec(LLMProviderPresets.TOGETHER_AI_LLAMA_70B)
    .build()
)
```

Custom spec¶

```python
custom = LLMSpec(
    provider=LLMProvider.OPENAI,
    model="gpt-4o-mini",
    temperature=0.7,
)
pipeline.with_llm_spec(custom)
```

Override preset¶

```python
spec = LLMProviderPresets.GPT4O_MINI.model_copy(
    update={"temperature": 0.9}
)
pipeline.with_llm_spec(spec)
```
Source code in ondine/api/pipeline_builder.py
with_custom_llm_client ¶
Provide a custom LLM client instance directly.
This allows advanced users to create their own LLM client implementations by extending the LLMClient base class. The custom client will be used instead of the factory-created client.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `any` | Custom LLM client instance (must inherit from LLMClient) | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Example
```python
class MyCustomClient(LLMClient):
    def invoke(self, prompt: str, **kwargs) -> LLMResponse:
        # Custom implementation
        ...

pipeline = (
    PipelineBuilder.create()
    .from_dataframe(df, ...)
    .with_prompt("...")
    .with_custom_llm_client(MyCustomClient(spec))
    .build()
)
```
Source code in ondine/api/pipeline_builder.py
with_batch_size ¶
Configure batch size.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `size` | `int` | Rows per batch | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
with_concurrency ¶
Configure concurrent requests.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `threads` | `int` | Number of concurrent threads | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
with_checkpoint_interval ¶
Configure checkpoint frequency.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `rows` | `int` | Rows between checkpoints | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
with_rate_limit ¶
Configure rate limiting.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `rpm` | `int` | Requests per minute | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
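Conceptually, an rpm limit spaces request starts at least 60/rpm seconds apart. A naive pacing sketch of that idea (illustrative only, not ondine's actual limiter):

```python
import time

class RateLimiter:
    """Naive pacer: enforce a minimum interval of 60/rpm seconds between request starts."""

    def __init__(self, rpm):
        self.interval = 60.0 / rpm
        self.last = -self.interval  # allow the first request immediately

    def wait(self, now=None):
        """Return how many seconds the caller should sleep before sending."""
        if now is None:
            now = time.monotonic()
        delay = max(0.0, self.last + self.interval - now)
        self.last = now + delay  # time at which this request effectively starts
        return delay

limiter = RateLimiter(rpm=120)  # at most one request every 0.5 s
```

Real limiters are usually token buckets that allow short bursts, but the pacing invariant is the same.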
with_max_retries ¶
Configure maximum retry attempts.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `retries` | `int` | Maximum number of retry attempts | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
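Retries are commonly paired with exponential backoff between attempts. A sketch of that pattern (the base delay and growth factor are assumptions, not ondine's exact policy):

```python
def backoff_delays(max_retries, base=1.0, factor=2.0):
    """Seconds to wait before each retry attempt: base * factor**attempt."""
    return [base * factor ** attempt for attempt in range(max_retries)]

backoff_delays(3)  # → [1.0, 2.0, 4.0]
```

With 3 retries a transiently failing row is attempted up to 4 times in total before the configured error policy takes over.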
with_max_budget ¶
Configure maximum budget.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `budget` | `float` | Maximum budget in USD | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
with_error_policy ¶
Configure error handling policy.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `policy` | `str` | Error policy ('skip', 'fail', 'retry', 'use_default') | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
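The four policies differ in what happens when a row fails. Their semantics can be sketched as a simple dispatch (a conceptual mapping, not ondine's dispatcher):

```python
def handle_failure(policy, error, default=None):
    """Sketch of per-row error handling under each policy."""
    if policy == "fail":
        raise error                  # abort the whole run on first failure
    if policy == "skip":
        return None                  # leave this row's output empty
    if policy == "use_default":
        return default               # substitute a fallback value
    if policy == "retry":
        return "retry"               # re-enqueue the row, up to max retries
    raise ValueError(f"unknown error policy: {policy!r}")

handle_failure("use_default", RuntimeError("LLM call failed"), default="N/A")
# → 'N/A'
```

'fail' suits correctness-critical jobs; 'skip' and 'use_default' favor completing the run over per-row completeness.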
with_checkpoint_dir ¶
Configure checkpoint directory.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `directory` | `str` | Path to checkpoint directory | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
with_parser ¶
Configure response parser.
This method allows setting a custom parser. The parser type determines the response_format in the prompt spec.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `parser` | `any` | Parser instance (JSONParser, RegexParser, PydanticParser, etc.) | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
to_csv ¶
Configure CSV output destination.
Alias for with_output(path, format='csv').
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Output CSV file path | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
with_output ¶
Configure output destination.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Output file path | required |
| `format` | `str` | Output format (csv, excel, parquet) | `'csv'` |
| `merge_strategy` | `str` | Merge strategy (replace, append, update) | `'replace'` |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
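The three merge strategies govern how new results are combined with an existing output destination. Their effect can be sketched on `(row_id, value)` pairs (a conceptual illustration; ondine's writer operates on files and DataFrames):

```python
def merge_results(existing, new, strategy):
    """Combine existing and new (row_id, value) pairs under a merge strategy."""
    if strategy == "replace":
        return list(new)                    # discard the old output entirely
    if strategy == "append":
        return list(existing) + list(new)   # add new rows after the old ones
    if strategy == "update":
        merged = dict(existing)
        merged.update(dict(new))            # new values overwrite matching row ids
        return sorted(merged.items())
    raise ValueError(f"unknown merge strategy: {strategy!r}")

merge_results([(0, "a"), (1, "b")], [(1, "B"), (2, "c")], "update")
# → [(0, 'a'), (1, 'B'), (2, 'c')]
```

'update' is the natural choice when re-running a pipeline that resumed from a checkpoint, since reprocessed rows overwrite their earlier values.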
with_executor ¶
Set custom execution strategy.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `executor` | `ExecutionStrategy` | ExecutionStrategy instance | required |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
with_async_execution ¶
Use async execution strategy.
Enables async/await for non-blocking execution. Ideal for FastAPI, aiohttp, and async frameworks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `max_concurrency` | `int` | Maximum concurrent async tasks | `10` |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
with_streaming ¶
Use streaming execution strategy.
Processes data in chunks for memory-efficient handling. Ideal for large datasets (100K+ rows).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `chunk_size` | `int` | Number of rows per chunk | `1000` |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |
Source code in ondine/api/pipeline_builder.py
with_stage ¶
Add a custom pipeline stage by name.
Enables injection of custom processing stages at specific points in the pipeline. Stages must be registered via StageRegistry.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `stage_name` | `str` | Registered stage name (e.g., "rag_retrieval") | required |
| `position` | `str` | Where to inject the stage. Options: "after_loader" / "before_prompt" (after data loading, before prompt formatting); "after_prompt" / "before_llm" (after prompt formatting, before LLM invocation); "after_llm" / "before_parser" (after LLM invocation, before parsing); "after_parser" (after response parsing) | `'before_prompt'` |
| `**stage_kwargs` | | Arguments to pass to stage constructor | `{}` |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If stage_name not registered or position invalid |
Example
RAG retrieval example¶

```python
pipeline = (
    PipelineBuilder.create()
    .from_csv("questions.csv", input_columns=["question"], output_columns=["answer"])
    .with_stage(
        "rag_retrieval",
        position="before_prompt",
        vector_store="pinecone",
        index_name="my-docs",
        top_k=5,
    )
    .with_prompt("Context: {retrieved_context}\n\nQuestion: {question}\n\nAnswer:")
    .with_llm(provider="openai", model="gpt-4o")
    .build()
)
```

Content moderation example¶

```python
pipeline = (
    PipelineBuilder.create()
    .from_csv("content.csv", input_columns=["text"], output_columns=["moderated"])
    .with_stage(
        "content_moderation",
        position="before_llm",
        block_patterns=["spam", "offensive"],
    )
    .with_prompt("Moderate: {text}")
    .with_llm(provider="openai", model="gpt-4o-mini")
    .build()
)
```
Source code in ondine/api/pipeline_builder.py
with_observer ¶
Add observability observer to the pipeline.
Observers receive events during pipeline execution for monitoring, logging, and tracing.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Observer identifier (e.g., "langfuse", "opentelemetry", "logging") | required |
| `config` | `dict[str, any] \| None` | Observer-specific configuration dictionary | `None` |

Returns:

| Type | Description |
|---|---|
| `PipelineBuilder` | Self for chaining |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If observer not registered |
Example
OpenTelemetry for infrastructure monitoring¶

```python
pipeline = (
    PipelineBuilder.create()
    .from_csv("data.csv", ...)
    .with_prompt("...")
    .with_llm(provider="openai", model="gpt-4o-mini")
    .with_observer("opentelemetry", config={
        "tracer_name": "my_pipeline",
        "include_prompts": False,
    })
    .build()
)
```

Langfuse for LLM-specific observability¶

```python
pipeline = (
    PipelineBuilder.create()
    .from_csv("data.csv", ...)
    .with_prompt("...")
    .with_llm(provider="openai", model="gpt-4o-mini")
    .with_observer("langfuse", config={
        "public_key": "pk-lf-...",
        "secret_key": "sk-lf-...",
    })
    .build()
)
```

Multiple observers¶

```python
pipeline = (
    PipelineBuilder.create()
    .from_csv("data.csv", ...)
    .with_prompt("...")
    .with_llm(provider="openai", model="gpt-4o-mini")
    .with_observer("langfuse", config={...})
    .with_observer("opentelemetry", config={...})
    .with_observer("logging", config={"log_level": "DEBUG"})
    .build()
)
```
Source code in ondine/api/pipeline_builder.py
build ¶
Build final Pipeline.
Returns:

| Type | Description |
|---|---|
| `Pipeline` | Configured Pipeline |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If required specifications missing |
Source code in ondine/api/pipeline_builder.py
QuickPipeline ¶
Simplified pipeline API with smart defaults.
Designed for rapid prototyping and common use cases. Automatically:

- detects input columns from prompt template placeholders
- infers the provider from the model name (e.g., gpt-4 → openai, claude → anthropic)
- selects the parser type (JSON for multi-column output, text for a single column)
- applies reasonable defaults for batch size, concurrency, and retries
Examples:
Minimal usage:
>>> pipeline = QuickPipeline.create(
... data="data.csv",
... prompt="Categorize this text: {text}"
... )
>>> result = pipeline.execute()
With explicit outputs:
>>> pipeline = QuickPipeline.create(
... data="products.csv",
... prompt="Extract: {description}",
... output_columns=["brand", "model", "price"]
... )
Override defaults:
>>> pipeline = QuickPipeline.create(
... data=df,
... prompt="Summarize: {content}",
... model="gpt-4o",
... temperature=0.7,
... max_budget=Decimal("5.0")
... )
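The provider auto-detection above can be sketched as a prefix match on the model name (the mapping shown covers only the examples given on this page; the library's actual table may be larger):

```python
def detect_provider(model):
    """Guess the LLM provider from a model-name prefix."""
    prefixes = {"gpt": "openai", "claude": "anthropic"}
    for prefix, provider in prefixes.items():
        if model.lower().startswith(prefix):
            return provider
    raise ValueError(f"cannot infer provider from {model!r}; pass provider= explicitly")

detect_provider("gpt-4o-mini")  # → 'openai'
```

When the model name is ambiguous or custom, passing `provider=` explicitly avoids relying on this inference.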
create staticmethod ¶
create(data: str | Path | DataFrame, prompt: str, model: str = 'gpt-4o-mini', output_columns: list[str] | str | None = None, provider: str | None = None, temperature: float = 0.0, max_tokens: int | None = None, max_budget: Decimal | float | str | None = None, batch_size: int | None = None, concurrency: int | None = None, **kwargs: Any) -> Pipeline
Create a pipeline with smart defaults.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `str \| Path \| DataFrame` | CSV/Excel/Parquet file path or DataFrame | required |
| `prompt` | `str` | Prompt template with {placeholders} | required |
| `model` | `str` | Model name (default: gpt-4o-mini) | `'gpt-4o-mini'` |
| `output_columns` | `list[str] \| str \| None` | Output column name(s). If None, uses ["output"] | `None` |
| `provider` | `str \| None` | LLM provider. If None, auto-detected from model name | `None` |
| `temperature` | `float` | Sampling temperature (default: 0.0 for deterministic) | `0.0` |
| `max_tokens` | `int \| None` | Max output tokens (default: provider's default) | `None` |
| `max_budget` | `Decimal \| float \| str \| None` | Maximum cost budget in USD | `None` |
| `batch_size` | `int \| None` | Rows per batch (default: auto-sized based on data) | `None` |
| `concurrency` | `int \| None` | Parallel requests (default: auto-sized) | `None` |
| `**kwargs` | `Any` | Additional arguments passed to PipelineBuilder | `{}` |

Returns:

| Type | Description |
|---|---|
| `Pipeline` | Configured Pipeline ready to execute |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If input data cannot be loaded or prompt is invalid |
Source code in ondine/api/quick.py
CostEstimate dataclass ¶
CostEstimate(total_cost: Decimal, total_tokens: int, input_tokens: int, output_tokens: int, rows: int, breakdown_by_stage: dict[str, Decimal] = dict(), confidence: str = 'estimate')
Cost estimation for pipeline execution.
ExecutionResult dataclass ¶
ExecutionResult(data: DataFrame, metrics: ProcessingStats, costs: CostEstimate, errors: list[ErrorInfo] = list(), execution_id: UUID = uuid4(), start_time: datetime = datetime.now(), end_time: datetime | None = None, success: bool = True, metadata: dict[str, Any] = dict())
Complete result from pipeline execution.
validate_output_quality ¶
Validate the quality of output data by checking for null/empty values.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `output_columns` | `list[str]` | List of output column names to check | required |

Returns:

| Type | Description |
|---|---|
| `QualityReport` | QualityReport with quality metrics and warnings |
Source code in ondine/core/models.py
ProcessingStats dataclass ¶
ProcessingStats(total_rows: int, processed_rows: int, failed_rows: int, skipped_rows: int, rows_per_second: float, total_duration_seconds: float, stage_durations: dict[str, float] = dict())
Statistics from pipeline execution.
QualityReport dataclass ¶
DatasetSpec ¶
Bases: BaseModel
Specification for data source configuration.
validate_source_path classmethod ¶
Convert string paths to Path objects.
validate_no_overlap classmethod ¶
Ensure output columns don't overlap with input columns.
Source code in ondine/core/specifications.py
LLMSpec ¶
Bases: BaseModel
Specification for LLM provider configuration.
validate_base_url_format classmethod ¶
Validate base_url is a valid HTTP(S) URL with a host.
Source code in ondine/core/specifications.py
validate_azure_config classmethod ¶
Validate Azure-specific configuration.
Source code in ondine/core/specifications.py
validate_provider_requirements ¶
Validate provider-specific requirements.
Source code in ondine/core/specifications.py
PipelineSpecifications ¶
Bases: BaseModel
Container for all pipeline specifications.
ProcessingSpec ¶
PromptSpec ¶
Bases: BaseModel
Specification for prompt template configuration.