CLI Usage Guide

Octopipe’s command-line interface (CLI) is at the heart of how you interact with the system. This guide provides a comprehensive overview of the CLI, its commands, and best practices for efficient pipeline management.

Overview

Octopipe’s CLI is designed to be intuitive, combining the best practices of tools like kubectl and Meltano’s CLI. It abstracts complex operations into simple commands, allowing you to initialize projects, manage sources and destinations, configure transformations, and orchestrate pipelines—all from your terminal.

Getting Started with the CLI

  • Installation Verification: Once Octopipe is installed, verify the CLI is accessible by running:
    octopipe --version
    

This command outputs the current version and confirms that the CLI is ready.
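In scripts, the same check can be made defensive. A minimal sketch (the `require_cli` helper is illustrative, not part of Octopipe):

```shell
# Fail fast in scripts if the CLI is missing (helper name is illustrative).
require_cli() {
  # Return 0 if the named binary is on PATH, non-zero otherwise.
  command -v "$1" >/dev/null 2>&1
}

if require_cli octopipe; then
  octopipe --version
else
  echo "octopipe not found on PATH" >&2
fi
```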

Help Command:

For any command, you can view detailed usage instructions by appending the help flag:

octopipe help

To get help on a specific command:

octopipe help init

Common CLI Commands

  1. Initialization (init):

Purpose: Set up a new pipeline or project.

Usage:

octopipe init --name my_pipeline --description "Description of your pipeline"

Details: This command creates the necessary configuration files and directories.
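The exact layout depends on your version; as a hypothetical example (file and directory names below are assumptions, not Octopipe's documented structure):

```shell
# Hypothetical layout created by `init` (names are illustrative):
#
# my_pipeline/
# ├── octopipe.yaml     # top-level pipeline configuration
# ├── sources/          # source definitions
# ├── destinations/     # destination definitions
# └── transforms/       # transformation schemas
```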

  2. Authentication (login/logout):

Purpose: Securely authenticate to access cloud or local resources.

Usage:

octopipe login --api-key YOUR_API_KEY
octopipe logout

  3. Source and Destination Management:

Add a Data Source:

octopipe source add --name my_source --type api --option url=https://api.example.com --option token=TOKEN

Add a Data Destination:

octopipe destination add --name my_destination --type postgres --option host=localhost --option port=5432
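Repeated `--option key=value` flags follow a simple pattern. The sketch below shows one way such pairs can be split in POSIX shell (illustrative only, not Octopipe's actual parser):

```shell
# Split repeated --option key=value pairs (illustrative parser sketch).
parse_options() {
  while [ "$#" -gt 0 ]; do
    if [ "$1" = "--option" ]; then
      shift
      key=${1%%=*}   # text before the first '='
      val=${1#*=}    # text after the first '='
      echo "option: $key -> $val"
    fi
    shift
  done
}

parse_options --option host=localhost --option port=5432
# prints:
#   option: host -> localhost
#   option: port -> 5432
```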

  4. Transformation Commands:

Define a Transformation:

octopipe transform add --name my_transform --source my_source --destination my_destination --schema-file ./path/to/schema.json

  5. Pipeline Management:

Create a Pipeline:

octopipe pipeline create --name daily_pipeline --source my_source --destination my_destination --transform my_transform --schedule "0 0 * * *"

Start, Stop, and Monitor Pipelines:

octopipe start daily_pipeline
octopipe stop daily_pipeline
octopipe logs daily_pipeline --follow
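The `--schedule` value used above is a standard five-field cron expression; the fields read, left to right:

```shell
# ┌──────── minute (0-59)
# │ ┌────── hour (0-23)
# │ │ ┌──── day of month (1-31)
# │ │ │ ┌── month (1-12)
# │ │ │ │ ┌ day of week (0-6, Sunday = 0)
# │ │ │ │ │
#  0 0 * * *    # runs at midnight (00:00) every day
```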

Advanced CLI Options

Verbose Output:

Use the --verbose flag with any command to see detailed logs of the process:

octopipe init --name verbose_pipeline --verbose

Custom Configurations:

Pass a configuration file using --config-file to customize settings on initialization:

octopipe init --name custom_pipeline --config-file ./config/custom.json
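The contents of such a file depend on your setup. As a rough sketch (the field names here are assumptions drawn from the flags above, not Octopipe's documented schema):

```shell
# Write a hypothetical custom config (field names are assumptions).
mkdir -p ./config
cat > ./config/custom.json <<'EOF'
{
  "name": "custom_pipeline",
  "description": "Pipeline initialized from a config file",
  "schedule": "0 0 * * *"
}
EOF
```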

Real-Time Monitoring:

The logs command can stream logs live for troubleshooting:

octopipe logs daily_pipeline --follow --tail 100

Best Practices

Consistency:

Always follow naming conventions for sources, destinations, and pipelines to avoid confusion.

Modular Commands:

Use the CLI in small, modular steps. For example, test source connectivity before creating a full pipeline.

Regular Updates:

Keep your Octopipe CLI updated to benefit from the latest features and security improvements.

Troubleshooting CLI Commands

Command Not Found:

If you receive an error indicating a command is not found, verify your PATH includes the Octopipe binary.
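A quick way to check, and to add the install directory for the current session (`$HOME/.octopipe/bin` is an assumed location; substitute wherever the binary actually lives):

```shell
# Assumed install location; adjust to your actual install directory.
OCTOPIPE_BIN="$HOME/.octopipe/bin"

case ":$PATH:" in
  *":$OCTOPIPE_BIN:"*)
    echo "install directory already on PATH" ;;
  *)
    PATH="$PATH:$OCTOPIPE_BIN"
    export PATH
    echo "added $OCTOPIPE_BIN to PATH for this session" ;;
esac
```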

Authentication Issues:

Re-run the login command and check your API key if authentication fails.

Error Logs:

Use the --verbose flag to capture detailed error messages, and refer to the troubleshooting guide for common issues.

Conclusion

The Octopipe CLI is a powerful tool that simplifies pipeline management. By mastering the basic commands and leveraging advanced options, you can build, deploy, and monitor pipelines with ease. Experiment with different commands to become more comfortable with the CLI and explore its full potential.

---

### docs/guides/advanced-topics.md

# Advanced Topics

For users looking to take their Octopipe experience to the next level, this guide covers advanced configurations, customizations, and integrations. Dive into detailed topics that enable you to optimize performance, extend functionality, and tailor the platform to your unique needs.

## Customizing Transformations

- **Advanced Mapping:**
  Learn how to refine the mapping between your type-safe API schema and the destination schema. Custom scripts can be injected to handle complex data transformations.
  ```bash
  # Example: custom transformation logic
  octopipe transform update my_transform --option custom_script=./scripts/custom_transform.py
  ```
- **User-Defined Logic:**
  Incorporate conditional logic and data-cleaning steps to ensure that only high-quality data is loaded.
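As a toy example of the kind of cleaning step such logic might perform, the filter below drops CSV records whose second field is empty (plain awk; illustrative, not an Octopipe API):

```shell
# Drop CSV rows with an empty second column (toy data-cleaning step).
printf 'a,1\nb,\nc,3\n' | awk -F',' '$2 != ""'
# prints:
#   a,1
#   c,3
```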

## Integrating External Tools

- **Third-Party Monitoring:**
  Integrate Octopipe with monitoring tools like Prometheus and Grafana. Detailed logs and metrics from Airflow and Spark can be forwarded to these systems.
- **Custom Connectors:**
  If your data source or destination is not supported out of the box, learn how to develop and plug in custom connectors.
- **Enhanced Logging:**
  Use middleware to intercept logs and enrich them with additional metadata for better debugging and analysis.
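A minimal sketch of such enrichment as a shell filter, stamping each line with a UTC timestamp and a pipeline name (illustrative; real middleware would sit inside the log pipeline itself):

```shell
# Prefix each log line with a UTC timestamp and a pipeline name.
enrich() {
  pipeline="$1"
  while IFS= read -r line; do
    printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$pipeline" "$line"
  done
}

echo "transform step finished" | enrich daily_pipeline
```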

## Performance Tuning

- **Optimizing Spark:**
  Adjust Spark configurations (e.g., executor memory, number of cores) to improve transformation performance. For example:
  ```bash
  export SPARK_EXECUTOR_MEMORY=4g
  export SPARK_EXECUTOR_CORES=2
  ```
- **Airflow Scheduling:**
  Fine-tune scheduling intervals to balance resource usage and data freshness. Configure task retries and timeouts for improved reliability.
- **Scalability Strategies:**
  Use Kafka to buffer high-velocity data streams and scale your services horizontally to handle increased load.

## Advanced Debugging Techniques

- **Interactive Debugging:**
  Leverage breakpoints and interactive shells in your transformation scripts to diagnose issues in real time.
- **Detailed Metrics:**
  Configure detailed metric collection for each pipeline stage. Use these insights to pinpoint bottlenecks and optimize performance.
- **Custom Alerting:**
  Set up alerting systems to notify you of failures or performance degradation. This proactive approach helps maintain high availability.

## Security and Compliance

- **Data Governance:**
  Implement data masking and encryption techniques during transformation to meet compliance requirements.
- **Access Controls:**
  Use role-based access control (RBAC) to restrict access to sensitive pipeline configurations and logs.
- **Audit Logging:**
  Enable detailed audit logs to track configuration changes and data access, ensuring accountability.
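As a sketch of the idea, a minimal append-only audit trail can be kept even from shell wrappers (illustrative; not a built-in Octopipe feature):

```shell
# Append-only audit trail: timestamp, user, and the action taken.
audit() {
  printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${USER:-unknown}" "$*" >> audit.log
}

audit "pipeline create daily_pipeline"
tail -n 1 audit.log
```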

## Future Enhancements

- **Interactive UIs:**
  Plans are underway for a graphical interface that allows users to design and monitor pipelines visually.
- **AI-Driven Optimizations:**
  Future releases may include AI-driven recommendations for performance tuning and error resolution.
- **Plugin Ecosystem:**
  Expand the platform with a rich ecosystem of community-developed plugins that extend Octopipe's functionality.

## Best Practices for Advanced Users

- **Documentation:**
  Maintain internal documentation for custom transformations and integrations to facilitate team collaboration.
- **Regular Reviews:**
  Periodically review performance metrics and configurations to ensure that your pipelines are optimized for current workloads.
- **Community Engagement:**
  Share your advanced configurations and custom scripts with the Octopipe community to contribute to the collective knowledge base.

## Conclusion

Advanced topics in Octopipe empower you to fully harness the platform’s potential. By customizing transformations, integrating with external tools, and optimizing performance, you can create highly efficient and secure data pipelines tailored to your organization’s needs.

Explore these advanced features, experiment with custom configurations, and join our community to stay updated on the latest enhancements and best practices!