Configuration Guide

Octopipe pipelines depend on correct configuration. This guide explains how to manage global and project-specific settings from the CLI, through configuration files, and with environment variables.

Overview

Octopipe's configuration system works at two levels: global settings that apply to every pipeline, and project-specific settings that apply to a single pipeline.

Global vs. Project Configurations

  • Global Configuration: Applies settings across all projects. Examples include API keys, default LLM models, and common source or destination types.
  • Project-Specific Configuration: Settings unique to a particular pipeline. These may include custom scheduling, local paths, or environment-specific variables.

Managing Configurations via CLI

Setting a Configuration Value

Use the config set command to define or update a configuration value.

octopipe config set default_llm_model gpt-4

Explanation:

This sets the default LLM model used for type-safe API generation across all pipelines.

Getting a Configuration Value

To retrieve the current value of a configuration setting:

octopipe config get default_llm_model

Output:

The command prints the current value, so you can confirm that a setting was applied correctly.
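A quick way to confirm a change took effect is to read the value back immediately after setting it. The sketch below wraps the two documented commands in a hypothetical helper, set_and_verify. It assumes that octopipe config get prints only the value on standard output, which you should check against your installed version. The OCTOPIPE variable is also a convention of this sketch: it lets you point the helper at a different binary or at a stub for testing.

```shell
# Hypothetical helper, not part of Octopipe: set a value, then read it back.
# ASSUMPTION: `octopipe config get KEY` prints just the value on stdout.
OCTOPIPE="${OCTOPIPE:-octopipe}"

set_and_verify() {
    key="$1"
    expected="$2"
    "$OCTOPIPE" config set "$key" "$expected" || return 1
    actual="$("$OCTOPIPE" config get "$key")" || return 1
    if [ "$actual" != "$expected" ]; then
        echo "mismatch for $key: expected '$expected', got '$actual'" >&2
        return 1
    fi
    echo "$key = $actual"
}

# Usage:
#   set_and_verify default_llm_model gpt-4
```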

Listing All Configurations

View all current configurations with:

octopipe config list

Usage:

This shows all current global settings in one place, which helps with troubleshooting and verification.

Configuration Files

Octopipe supports configuration files in JSON or YAML format for more complex settings. To use a configuration file during initialization:

octopipe init --name my_pipeline --config-file ./config/my_config.json

Tip:

Use configuration files to store recurring settings and ensure consistency across environments.
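This guide does not document the configuration file schema, so the following is only a sketch. It writes a JSON file whose keys are assumed to mirror the config set keys used elsewhere in this guide (default_llm_model, default_source_type, default_destination_type, llm_timeout), then passes the file to octopipe init. Check the keys against your Octopipe version before relying on them. The helper name init_with_config is hypothetical.

```shell
OCTOPIPE="${OCTOPIPE:-octopipe}"

# Hypothetical helper: write a config file, then initialize a pipeline with it.
# ASSUMPTION: these JSON keys mirror the `config set` keys shown in this guide.
# No API key here: supply it through OCTOPIPE_API_KEY instead.
init_with_config() {
    name="$1"
    config_path="$2"
    mkdir -p "$(dirname "$config_path")" || return 1
    cat > "$config_path" <<'EOF'
{
  "default_llm_model": "gpt-4",
  "default_source_type": "api",
  "default_destination_type": "postgres",
  "llm_timeout": 30
}
EOF
    "$OCTOPIPE" init --name "$name" --config-file "$config_path"
}

# Usage:
#   init_with_config my_pipeline ./config/my_config.json
```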

Environment Variables

Certain configurations can also be managed through environment variables. For example:

export OCTOPIPE_ENV=local
export OCTOPIPE_API_KEY=YOUR_API_KEY

Usage:

Environment variables are useful for sensitive data and deployment-specific configurations.
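Because environment variables are easy to forget in a new shell or CI job, it can help to fail fast when one is missing. The hypothetical require_env helper below checks that each named variable is set and non-empty, and reports only the names of missing variables so that secret values are never echoed. OCTOPIPE_ENV and OCTOPIPE_API_KEY are the variables shown above; whether Octopipe requires both in every setup is an assumption.

```shell
# Hypothetical helper: fail if any named variable is unset or empty.
# Prints only variable names, never values, so secrets stay out of logs.
require_env() {
    missing=""
    for name in "$@"; do
        # Only accept valid shell identifiers before the indirect lookup.
        case "$name" in
            ''|[0-9]*|*[!A-Za-z0-9_]*)
                echo "invalid variable name: $name" >&2
                return 2 ;;
        esac
        eval "value=\${$name:-}"
        [ -n "$value" ] || missing="$missing $name"
    done
    unset value
    if [ -n "$missing" ]; then
        echo "missing required environment variables:$missing" >&2
        return 1
    fi
}

# Usage, before running pipeline commands:
#   require_env OCTOPIPE_ENV OCTOPIPE_API_KEY || exit 1
```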

Customizing Default Settings

Default Source and Destination Types:

You can set defaults to streamline the creation of new pipelines.

octopipe config set default_source_type api
octopipe config set default_destination_type postgres
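When several defaults need to be set together, for example on a fresh machine, a small loop over KEY=VALUE pairs keeps them in one place. The helper name apply_settings is hypothetical; it calls only the documented octopipe config set command and stops at the first failure.

```shell
OCTOPIPE="${OCTOPIPE:-octopipe}"

# Hypothetical helper: apply each KEY=VALUE argument with `octopipe config set`.
apply_settings() {
    for pair in "$@"; do
        key="${pair%%=*}"     # text before the first '='
        value="${pair#*=}"    # text after the first '=' (may itself contain '=')
        if [ "$key" = "$pair" ] || [ -z "$key" ]; then
            echo "expected KEY=VALUE, got: $pair" >&2
            return 1
        fi
        "$OCTOPIPE" config set "$key" "$value" || return 1
    done
}

# Usage:
#   apply_settings default_source_type=api default_destination_type=postgres
```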

Custom LLM Parameters:

Additional parameters control how the LLM is used for type inference. For example, llm_timeout sets a time limit for LLM requests:

octopipe config set llm_timeout 30
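This guide does not state the unit for llm_timeout (seconds is a common convention, but confirm it for your version). A guard like the hypothetical set_llm_timeout below rejects non-numeric or zero values, such as a mistyped "30s", before they reach the configuration.

```shell
OCTOPIPE="${OCTOPIPE:-octopipe}"

# Hypothetical guard: only accept a positive whole number for llm_timeout.
set_llm_timeout() {
    case "$1" in
        ''|*[!0-9]*)
            echo "llm_timeout must be a positive integer, got: '$1'" >&2
            return 1 ;;
    esac
    if [ "$1" -eq 0 ]; then
        echo "llm_timeout must be greater than zero" >&2
        return 1
    fi
    "$OCTOPIPE" config set llm_timeout "$1"
}

# Usage:
#   set_llm_timeout 30
```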

Advanced Configuration Scenarios

Overriding Configurations:

When the same setting is defined at both levels, the project-specific value overrides the global one. This lets individual pipelines diverge from shared defaults where needed.
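The override rule can be pictured as a lookup that checks the project layer first and falls back to the global layer. The sketch below only illustrates that rule; it is not how Octopipe resolves settings internally. The resolve_setting helper and the plain KEY=VALUE files it reads are hypothetical.

```shell
# Illustration only: look up KEY in files given in priority order
# (project file first, then global file). Files are hypothetical KEY=VALUE lists.
resolve_setting() {
    key="$1"
    shift
    for file in "$@"; do
        [ -f "$file" ] || continue
        # Exact key match; if a key repeats within a file, the last line wins.
        if value=$(awk -v k="$key" '
            index($0, k "=") == 1 { v = substr($0, length(k) + 2); found = 1 }
            END { if (found) print v; exit !found }' "$file"); then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}

# Usage:
#   resolve_setting llm_timeout ./project.conf ./global.conf
```

With a global file containing llm_timeout=30 and a project file containing llm_timeout=60, the lookup returns 60; a key defined only globally falls through to the global value.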

Dynamic Configuration Reloads:

Some settings can be updated without restarting the pipeline, enabling real-time adjustments.

Backup and Restore:

Regularly export your configuration using:

octopipe config list > backup_config.json

If settings are lost, this file records the values you can reapply with octopipe config set.
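To keep more than one snapshot, a timestamped file name avoids overwriting the previous backup. Restoring is sketched under an explicit assumption: each line of the saved listing is taken to be a KEY=VALUE pair, which this guide does not guarantee, so inspect your config list output and adapt the parsing first. backup_config and restore_config are hypothetical helper names.

```shell
OCTOPIPE="${OCTOPIPE:-octopipe}"

# Hypothetical helper: save a timestamped snapshot of `octopipe config list`.
backup_config() {
    dir="${1:-./config_backups}"
    mkdir -p "$dir" || return 1
    file="$dir/octopipe_config_$(date +%Y%m%d_%H%M%S).txt"
    "$OCTOPIPE" config list > "$file" || return 1
    echo "$file"
}

# Hypothetical helper: reapply settings from a snapshot.
# ASSUMPTION: each non-empty, non-comment line has the form KEY=VALUE.
restore_config() {
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;
            # </dev/null keeps the command from consuming the snapshot on stdin.
            *=*) "$OCTOPIPE" config set "${line%%=*}" "${line#*=}" < /dev/null || return 1 ;;
            *) echo "skipping unrecognized line: $line" >&2 ;;
        esac
    done < "$1"
}

# Usage:
#   snapshot=$(backup_config)
#   restore_config "$snapshot"
```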

Best Practices

Keep Sensitive Data Secure:

Avoid hardcoding API keys in configuration files. Use environment variables or secret management tools.
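A lightweight safeguard is to scan configuration files for anything that looks like a credential before committing them. The hypothetical check_no_secrets helper below flags lines that mention key names such as api_key, secret, token, or password. The pattern is a heuristic: it will produce false positives and is no substitute for a dedicated secret scanner.

```shell
# Hypothetical pre-commit style check: fail if any file mentions likely secret keys.
# Heuristic only: matches key names, not actual secret values.
check_no_secrets() {
    status=0
    grep -EniH 'api[_-]?key|secret|token|password' "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "possible hardcoded secret found; move it to an environment variable or secret manager" >&2
        return 1
    fi
    # grep exits 1 when nothing matched and 2 on errors such as a missing file.
    [ "$status" -eq 1 ]
}

# Usage:
#   check_no_secrets ./config/*.json ./config/*.yaml
```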

Version Control:

Keep your configuration files in Git so you can review changes and roll back when needed.
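Once a configuration file is tracked, reviewing its history and rolling it back take only standard Git commands. config_history and rollback_config are hypothetical wrappers around git log and git checkout; the path is whichever file you pass in.

```shell
# Hypothetical wrappers around standard Git commands for a tracked config file.

# Show every committed change to the file, newest first.
config_history() {
    git log --follow -p -- "$1"
}

# Restore the file's contents from an earlier commit REV.
# The restored version is written to the working tree and staged.
rollback_config() {
    rev="$1"
    file="$2"
    git checkout "$rev" -- "$file"
}

# Usage:
#   config_history config/my_config.json
#   rollback_config HEAD~1 config/my_config.json
#   git commit -m "Roll back my_config.json"
```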

Document Configurations:

Maintain internal documentation that outlines the purpose of each setting and its expected values.

Conclusion

Managing your Octopipe configuration deliberately helps keep your pipelines predictable and reliable. Use the CLI commands, configuration files, environment variables, and best practices described in this guide to manage both global and project-specific settings.

A well-configured environment is the backbone of a reliable and scalable data pipeline system.