streaming_executor¶
streaming_executor ¶
Streaming execution strategy.
Provides memory-efficient processing for large datasets by processing data in chunks.
StreamingExecutor ¶
Bases: ExecutionStrategy
Streaming execution strategy.
Processes data in chunks to maintain constant memory usage. Ideal for very large datasets (100K+ rows) that don't fit in memory.
Benefits: - Constant memory footprint - Can process unlimited dataset sizes - Checkpoints at chunk boundaries - Early results available
Initialize streaming executor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chunk_size
|
int
|
Number of rows per chunk |
1000
|
Source code in ondine/orchestration/streaming_executor.py
execute ¶
Execute stages in streaming mode.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
stages
|
list[PipelineStage]
|
Pipeline stages |
required |
context
|
ExecutionContext
|
Execution context |
required |
Yields:
| Type | Description |
|---|---|
DataFrame
|
DataFrames with processed chunks |
Source code in ondine/orchestration/streaming_executor.py
supports_async ¶
supports_streaming ¶
execute_stream ¶
StreamingResult ¶
Result container for streaming execution.
Provides access to metrics after consuming the stream.
Initialize streaming result.
Source code in ondine/orchestration/streaming_executor.py
add_chunk ¶
finalize ¶
Create final ExecutionResult.