Self-Hosting Octopipe

Octopipe is designed to provide an outstanding local development experience. Self-hosting enables you to test and run pipelines on your own infrastructure, ensuring you have full control over the environment and can debug issues in real time.

Why Self-Host?

  • Local Development: Focus on rapid development and testing without the overhead of cloud deployment.
  • Real-Time Monitoring: Access detailed logs and status updates to troubleshoot and optimize pipeline performance.
  • Full Control: Customize your environment to suit specific development needs.

Setting Up Your Local Environment

Prerequisites

Ensure your system meets the following requirements:

  • Python 3.8+ installed.
  • Node.js and npm installed.
  • Docker and Docker Compose (recommended for managing multiple services).
  • Git for source control.

Step 1: Clone the Repository

Clone the Octopipe repository from GitHub:

git clone https://github.com/your-org/octopipe.git
cd octopipe

Step 2: Install Dependencies

Install Python dependencies:

pip install -r requirements.txt

If Node.js dependencies are needed, run:

npm install

Step 3: Set Up Docker Compose

For a self-hosted setup, Docker Compose can launch all required services (Meltano, Airflow, Kafka, Spark, etc.). Create or update the docker-compose.yml file with the required services:

version: '3.8'
services:
  octopipe:
    image: your-org/octopipe:latest
    ports:
      - "8000:8000"
    environment:
      - OCTOPIPE_ENV=local
  airflow:
    image: apache/airflow:2.2.2
    ports:
      - "8080:8080"
  kafka:
    image: confluentinc/cp-kafka:latest
  spark:
    image: bitnami/spark:latest

Tip: Customize the configuration to match your environment and available resources. Note that the Airflow and Kafka images shown here typically need additional configuration (for example, an Airflow startup command and Kafka broker/ZooKeeper settings) before they will start cleanly.
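As one example of such customization, the stock Kafka image will not start without broker settings, and a healthcheck lets Compose report readiness. The values below are illustrative assumptions, not required settings, and the referenced zookeeper service would also need to be defined:

```
  kafka:
    image: confluentinc/cp-kafka:latest
    environment:
      # Illustrative single-broker settings; adjust hosts and ports
      # for your network, and add a matching zookeeper service.
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
    healthcheck:
      test: ["CMD-SHELL", "kafka-topics --bootstrap-server localhost:9092 --list"]
      interval: 30s
      retries: 5
```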

Step 4: Launch the Environment

Start all services using Docker Compose:

docker-compose up

This brings up all the required services together, making local development easier to manage. Add the -d flag (docker-compose up -d) to run them in the background.
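Note that Compose returns before the services inside the containers are actually ready to accept connections. A small helper can poll a TCP port before you run pipelines — a sketch that shells out to python3, which the prerequisites above already require:

```shell
#!/usr/bin/env sh
# wait_for HOST PORT [TIMEOUT_SECS]: poll until the port accepts connections.

# probe: attempt one TCP connection; exit status 0 means the port is open.
probe() {
  python3 -c 'import socket,sys;s=socket.socket();s.settimeout(1);sys.exit(s.connect_ex((sys.argv[1],int(sys.argv[2]))))' "$1" "$2"
}

wait_for() {
  host=$1; port=$2; timeout=${3:-30}; waited=0
  until probe "$host" "$port" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $host:$port" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$host:$port is up"
}
```

With the ports from the compose file above, you might wait on wait_for localhost 8000 60 (Octopipe) and wait_for localhost 8080 60 (Airflow) before starting a pipeline.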

Running and Testing Pipelines Locally

Initialize a New Pipeline:

octopipe init --name local_pipeline --description "Local development pipeline" --local

Manage Components:

Add data sources, destinations, and transformations to match your project's requirements.

Start and Monitor Pipelines:

octopipe start local_pipeline
octopipe logs local_pipeline --follow

Monitoring and Debugging

Real-Time Logs:

Use the logs command to stream output to your terminal, allowing for on-the-fly debugging.

Status Checks:

Regularly check pipeline status with:

octopipe status local_pipeline

Step-by-Step Debugging:

In case of errors, stop the pipeline, inspect logs, adjust configurations, and restart:

octopipe stop local_pipeline
octopipe start local_pipeline

Tips for an Amazing Local Experience

Use a Dedicated Environment:

Run Octopipe in a separate virtual machine or container to avoid conflicts with other applications.

Automate Routine Tasks:

Use scripts to automate repetitive tasks such as starting/stopping services.

Document Local Configurations:

Keep notes on any local tweaks to facilitate quick troubleshooting and team onboarding.

Conclusion

Self-hosting Octopipe offers a powerful and flexible way to develop, test, and optimize your data pipelines locally. With detailed logs, easy management of services through Docker Compose, and robust CLI tools, you can enjoy a development experience that is both efficient and scalable.

Embrace the freedom of local development, and fine-tune your pipelines before deploying them to production!