Data Orchestrator - Concepts
Orchestration
Orchestration is the automated coordination of tasks and workflows that keeps data operations reliable, scalable, and maintainable. In the context of Scaleway Data Orchestrator, it lets users define, schedule, and manage complex data pipelines while handling dependencies, error recovery, and execution order. Instead of manually triggering scripts and monitoring jobs, orchestration brings structure to fragmented processes, turning them into unified, business-aligned workflows.
Tasks
Action task
An action task represents the executable unit within a workflow that performs concrete work. An action task can be:
- Serverless Jobs: Long-running batch processes that scale automatically without infrastructure management.
- Serverless Functions: Lightweight, event-driven code execution for quick transformations or API calls.
- Spark Jobs: Distributed data processing tasks for large-scale ETL or analytics using Apache Spark.
- Other compute-intensive or service-specific jobs (e.g., data validation, model inference).
These tasks are orchestrated in sequence or in parallel, forming the backbone of data processing pipelines.
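For instance, a pipeline chaining a Serverless Job and a Spark Job could be sketched as follows. This is an illustrative YAML fragment only: the task type names, field names (`dependsOn`, `image`, `script`), and URIs are hypothetical and do not reflect the exact Data Orchestrator schema.

```yaml
# Illustrative only: two action tasks run in sequence.
id: nightly-etl
tasks:
  - id: extract
    type: serverless-job            # hypothetical type name
    image: registry.example/extract:latest
  - id: transform
    type: spark-job                 # hypothetical type name
    dependsOn: [extract]            # runs only after "extract" succeeds
    script: s3://my-bucket/transform.py
```

Here `transform` declares a dependency on `extract`, so the orchestrator runs the two tasks in order; tasks with no dependency between them could run in parallel.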
Logic task
A logic task controls the flow and decision-making within a workflow, enabling dynamic behavior beyond simple linear execution. A logic task can be:
- Switch: Direct flow based on runtime conditions (e.g., file size, data quality).
- Fork: Split execution into parallel branches to process data concurrently.
- Try/catch: Implement error-handling blocks to manage failures and enable retries or fallback logic.
These tasks allow users to embed business logic directly into pipelines, making them resilient and adaptable.
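The three logic task types above could be combined as in the following sketch. Again, this is a hypothetical YAML shape for illustration: the type names (`switch`, `fork`, `try-catch`), field names, and templating syntax are assumptions, not the actual Data Orchestrator schema.

```yaml
# Illustrative only: routing, parallelism, and error handling.
tasks:
  - id: route-by-size
    type: switch                     # branch on a runtime condition
    on: "{{ inputs.file_size }}"
    cases:
      large: [spark-transform]       # big files go to Spark
      default: [function-transform]  # small files use a function
  - id: parallel-load
    type: fork                       # run both branches concurrently
    branches:
      - [load-warehouse]
      - [update-search-index]
  - id: safe-ingest
    type: try-catch
    try: [ingest]
    catch: [notify-on-failure]       # fallback if "ingest" fails
```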
Trigger
A trigger is the event that initiates a workflow execution. A trigger can be:
- Manual: The user starts the run via the Scaleway Console or CLI (ideal for testing).
- Schedule: Automatic execution based on time (e.g., daily at 8:00 AM), set with a built-in scheduler.
- Event: Triggered by external signals (e.g., new file in object storage, message in a queue), enabling reactive, real-time data processing.
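A schedule trigger and an event trigger might be declared as follows. The cron expression `0 8 * * *` is standard crontab syntax for "daily at 8:00 AM"; the surrounding YAML field names and type names are hypothetical, not the exact product schema.

```yaml
# Illustrative only: two ways to start the same workflow.
triggers:
  - id: every-morning
    type: schedule
    cron: "0 8 * * *"        # daily at 8:00 AM
  - id: on-upload
    type: event
    source: object-storage   # hypothetical event source name
    bucket: raw-data         # fire when a new file lands here
```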
Views
Code view
Every workflow can be viewed as code, showing the tasks and their dependencies in declarative form.
Graph view
Every workflow can be visualized as a Directed Acyclic Graph (DAG), showing the tasks and their dependencies.
Workflow
A workflow is a structured sequence of action tasks and logic tasks that defines an end-to-end data process.
Workflow definition
The declarative blueprint of a workflow, typically described in code (e.g., YAML or Python) or designed visually. It specifies tasks, dependencies, conditions, and execution parameters. This definition is version-controlled, reusable, and portable across environments.
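Putting the pieces together, a minimal definition combining a trigger, tasks with dependencies, and execution parameters could look like this. All type and field names here are illustrative assumptions rather than the exact Data Orchestrator syntax.

```yaml
# Illustrative only: a complete minimal workflow definition.
id: daily-report
triggers:
  - id: morning-run
    type: schedule
    cron: "0 8 * * *"          # daily at 8:00 AM
tasks:
  - id: extract
    type: serverless-job        # hypothetical type name
  - id: report
    type: serverless-function   # hypothetical type name
    dependsOn: [extract]        # dependency defines execution order
parameters:
  retention_days: 30            # example execution parameter
```

Because the blueprint is plain text, it can be committed to version control, reviewed like any other code, and deployed unchanged across environments.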
Workflow execution / run
The runtime instance of a workflow definition. Each execution (or run) tracks the state, logs, and results of every task, providing full observability and auditability. Runs can succeed, fail, or be paused, with detailed insights for debugging.