octopipe pipeline Command Reference

The octopipe pipeline command creates and manages complete data pipelines in Octopipe. It connects sources, transformations, and destinations into a single workflow that can run on a schedule.

Purpose

  • Pipeline Orchestration: Create, update, and remove pipelines that define the end-to-end data flow.
  • Scheduling: Configure pipelines to run at specific intervals using cron expressions.
  • Comprehensive Management: Manage all aspects of pipeline execution including start, stop, and status monitoring.

Usage

octopipe pipeline <subcommand> [options]

Subcommands

create

Purpose: Create a new pipeline.

Usage Example:

octopipe pipeline create --name daily_sales --source sales_api --destination sales_db --transform sales_transform --schedule "0 0 * * *"

Options:

--name <pipeline_name>: Unique name for the pipeline.

--source <source_name>: The data source for the pipeline.

--destination <destination_name>: The data destination for the pipeline.

--transform <transform_name>: The transformation to apply.

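--schedule <cron_expression>: Cron expression that sets when the pipeline runs, for example "0 0 * * *" for daily at midnight. Quote the expression so the shell does not expand the asterisks.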

--option <key>=<value>: Additional pipeline-specific options.

list

Purpose: List all configured pipelines.

Usage Example:

octopipe pipeline list

update

Purpose: Update an existing pipeline, identified by its name.

Usage Example:

octopipe pipeline update daily_sales --option new_setting=value

remove

Purpose: Remove a pipeline.

Usage Example:

octopipe pipeline remove daily_sales

Detailed Behavior

Creation Process:

The create command validates each component (source, destination, and transform) before constructing the pipeline. It also checks that the --schedule value is a well-formed cron expression.
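Octopipe's own validation happens inside the create command, but a wrapper script can catch obviously incomplete input earlier. The following is a minimal sketch of such a pre-check. The function name print_create_cmd is hypothetical, and treating all five values as required is an assumption of this sketch, not documented behavior. It only prints the command line so it can be reviewed before being run.

```shell
# print_create_cmd: print (not run) an "octopipe pipeline create" command line.
# Hypothetical helper; assumes all five values are required.
print_create_cmd() {
    if [ "$#" -ne 5 ]; then
        echo "usage: print_create_cmd NAME SOURCE DESTINATION TRANSFORM SCHEDULE" >&2
        return 2
    fi
    # Reject empty values before building the command line.
    for value in "$@"; do
        if [ -z "$value" ]; then
            echo "error: empty argument" >&2
            return 1
        fi
    done
    # The schedule is wrapped in double quotes so the asterisks survive copy and paste.
    printf 'octopipe pipeline create --name %s --source %s --destination %s --transform %s --schedule "%s"\n' \
        "$1" "$2" "$3" "$4" "$5"
}

print_create_cmd daily_sales sales_api sales_db sales_transform "0 0 * * *"
```

Printing instead of executing keeps the sketch safe to try on a machine where the named source and destination do not exist yet.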

Scheduling:

Pipelines are scheduled with standard cron expressions passed to --schedule. Octopipe integrates with Apache Airflow to manage these schedules.
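A standard cron expression has five fields: minute, hour, day of month, month, and day of week. To make a schedule such as "0 0 * * *" easier to read, the sketch below labels each field. It is a plain shell helper for illustration and not part of Octopipe; the name describe_cron is hypothetical.

```shell
# describe_cron: print each field of a five-field cron expression with its name.
# Hypothetical helper for illustration; it is not an octopipe command.
describe_cron() {
    # read splits on whitespace without glob-expanding the asterisks.
    read -r minute hour dom month dow extra <<EOF
$1
EOF
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
        "$minute" "$hour" "$dom" "$month" "$dow"
}

describe_cron "0 0 * * *"
```

For the schedule used in this reference, the helper shows minute 0 and hour 0 with every other field left as a wildcard, which corresponds to a run at midnight every day.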

Options Management:

Additional options, passed as --option <key>=<value>, can fine-tune behavior such as retry policies and resource allocation.
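When several settings need to change together, a small helper can turn key=value pairs into --option flags. The sketch below assumes that --option may be given more than once, which this reference does not confirm. The function name option_flags and the key other_setting are hypothetical; new_setting=value is taken from the update example.

```shell
# option_flags: turn key=value arguments into "--option key=value" pairs.
# Assumes --option may be repeated; this is not confirmed by the reference.
option_flags() {
    flags=""
    for kv in "$@"; do
        case "$kv" in
            # Accept only arguments with a non-empty key before the equals sign.
            ?*=*) flags="${flags:+$flags }--option $kv" ;;
            *) echo "skipping malformed option: $kv" >&2 ;;
        esac
    done
    printf '%s\n' "$flags"
}

option_flags new_setting=value other_setting=42
```

The printed flags can be appended to an update command by hand. Values that contain spaces would need extra quoting, which this sketch does not handle.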

Examples

Creating a Pipeline

octopipe pipeline create --name daily_sales --source sales_api --destination sales_db --transform sales_transform --schedule "0 0 * * *"

Listing Pipelines

octopipe pipeline list

Updating a Pipeline

octopipe pipeline update daily_sales --option new_setting=value

Removing a Pipeline

octopipe pipeline remove daily_sales

Best Practices

Modular Design:

Test each source, transformation, and destination on its own before combining them in a pipeline.

Clear Naming:

Use descriptive names for pipelines to help with management and debugging.

Regular Reviews:

Periodically review pipeline configurations and schedules, and update or remove pipelines that no longer match your needs.

Troubleshooting

Invalid Cron Expression:

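Check that the --schedule string conforms to standard cron syntax: five space-separated fields (minute, hour, day of month, month, day of week). Also confirm that the expression is quoted, since an unquoted asterisk is expanded by the shell.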
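A quick field count catches the most common mistakes, such as a missing field or an expression that lost its quotes. The sketch below is a shell helper with the hypothetical name is_five_field_cron. It only counts fields and does not check that each field holds a valid value, so Octopipe may still reject an expression that passes.

```shell
# is_five_field_cron: succeed if the argument has exactly five whitespace-separated fields.
# This is only a field-count check, not a full cron parser.
is_five_field_cron() {
    # set -f disables globbing so the asterisks are not expanded during word splitting.
    # The subshell keeps both set calls from affecting the caller.
    ( set -f; set -- $1; [ "$#" -eq 5 ] )
}

if is_five_field_cron "0 0 * * *"; then
    echo "schedule has five fields"
else
    echo "schedule must have five fields" >&2
fi
```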

Component Mismatch:

Verify that the specified source, destination, and transformation exist and are correctly configured.

Execution Failures:

Use the logs and status commands to identify which stage of the pipeline (source, transformation, or destination) the error occurs in.

Conclusion

The octopipe pipeline command ties sources, transformations, and destinations into a single scheduled workflow. Use create to define a pipeline, update to adjust its options, list to review what is configured, and remove to retire a pipeline that is no longer needed.