Pipeline Deployment Guide

A complete deployment guide for the Telegen unified pipeline on bare metal, Docker Compose, Kubernetes, Helm, OpenShift, and AWS ECS.

Overview

The pipeline provides a single data path for metrics, traces, and logs, with:

  • Data Quality Controls: Cardinality limiting, rate limiting, attribute limits
  • Signal Transformation: Rule-based transformations with PII redaction
  • Flexible Export: Multi-endpoint failover, persistent queuing
  • Hot Reload: Configuration changes without restart
  • Graceful Shutdown: Drain in-flight data on shutdown

Configuration Reference

All pipeline features are configured under the pipeline section:

pipeline:
  enabled: true
  
  limits:
    cardinality:
      enabled: true
      default_max_series: 10000
      global_max_series: 100000
    rate:
      enabled: true
      metrics_per_second: 100000
      traces_per_second: 50000
      logs_per_second: 200000
    attributes:
      enabled: true
      max_resource_attributes: 128
      max_attribute_value_size: 4096
  
  transform:
    enabled: true
    rules:
      - name: add-environment
        actions:
          - type: set_attribute
            set_attribute:
              key: environment
              value: production
  
  pii_redaction:
    enabled: true
    scan_log_bodies: true
  
  export:
    otlp:
      endpoint: otel-collector:4317
    batch:
      size: 1000
      timeout: 5s
  
  operations:
    hot_reload:
      enabled: true
      enable_sighup: true
    shutdown:
      timeout: 30s
      drain_timeout: 10s

Bare Metal / Virtual Machines

systemd Deployment

This is the recommended method for Linux servers, VMs, and bare-metal hosts.

Prerequisites

  • Linux kernel 4.18+ (5.8+ recommended for full eBPF support)
  • systemd
  • Root access or CAP_BPF/CAP_SYS_ADMIN capabilities
  • Network access to OTLP endpoint

Step 1: Download Binary

# Resolve the latest release version (the release/mark-v tag prefix is stripped)
VERSION=$(curl -s https://api.github.com/repos/mirastacklabs-ai/telegen/releases/latest \
  | grep tag_name | cut -d '"' -f4 | sed 's/release\/mark-v//')

# Download (amd64)
curl -LO "https://github.com/mirastacklabs-ai/telegen/releases/download/release/mark-v${VERSION}/telegen-linux-amd64.tar.gz"
tar xzf telegen-linux-amd64.tar.gz
sudo mv telegen-linux-amd64 /usr/local/bin/telegen
sudo chmod +x /usr/local/bin/telegen

# Verify
telegen --version

For ARM64, download the arm64 archive instead and use telegen-linux-arm64 in the tar and mv commands above:

curl -LO "https://github.com/mirastacklabs-ai/telegen/releases/download/release/mark-v${VERSION}/telegen-linux-arm64.tar.gz"
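
Instead of choosing the archive by hand, the architecture can be detected with uname -m. This is a sketch under the assumption that only the two archive names shown above exist; the helper name telegen_asset_for is hypothetical.

```shell
# Map a machine name as reported by `uname -m` to the release archive name.
telegen_asset_for() {
  case "$1" in
    x86_64|amd64)  echo "telegen-linux-amd64.tar.gz" ;;
    aarch64|arm64) echo "telegen-linux-arm64.tar.gz" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Usage (VERSION as resolved in Step 1):
#   ASSET=$(telegen_asset_for "$(uname -m)") &&
#   curl -LO "https://github.com/mirastacklabs-ai/telegen/releases/download/release/mark-v${VERSION}/${ASSET}"
```

Failing on unknown architectures avoids downloading an archive that cannot run on the host.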

Step 2: Create V3 Configuration

sudo mkdir -p /etc/telegen

cat << 'EOF' | sudo tee /etc/telegen/config.yaml
telegen:
  mode: agent
  service_name: telegen
  log_level: info

# Pipeline Configuration
pipeline:
  enabled: true
  
  limits:
    cardinality:
      enabled: true
      default_max_series: 10000
      global_max_series: 100000
      series_ttl: 1h
    rate:
      enabled: true
      metrics_per_second: 100000
      traces_per_second: 50000
      logs_per_second: 200000
    attributes:
      enabled: true
      max_resource_attributes: 128
      max_attribute_value_size: 4096
      protected_attributes:
        - service.name
        - host.name
  
  pii_redaction:
    enabled: true
    scan_log_bodies: true
    rules:
      - name: email
        type: email
        enabled: true
      - name: ssn
        type: ssn
        enabled: true

  transform:
    enabled: true
    rules:
      - name: add-host-info
        match:
          signal_types: [metrics, traces, logs]
        actions:
          - type: set_attribute
            set_attribute:
              key: deployment.environment
              value: ${TELEGEN_ENVIRONMENT:-production}

  export:
    otlp:
      endpoint: ${TELEGEN_OTLP_ENDPOINT:-otel-collector:4317}
      insecure: ${TELEGEN_OTLP_INSECURE:-true}
    batch:
      size: 1000
      timeout: 5s
    queue:
      enabled: true
      directory: /var/lib/telegen/queue
      max_size_bytes: 500000000

  operations:
    hot_reload:
      enabled: true
      config_path: /etc/telegen/config.yaml
      check_interval: 30s
      enable_sighup: true
    shutdown:
      timeout: 30s
      drain_timeout: 10s

# Agent configuration
agent:
  ebpf:
    enabled: true
    network:
      enabled: true
      http: true
      grpc: true
    syscalls:
      enabled: true
  profiling:
    enabled: true
    cpu: true
    memory: true
  discovery:
    enabled: true
    interval: 30s

self_telemetry:
  enabled: true
  listen: ":19090"
EOF
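
The configuration above uses ${VAR:-default} placeholders. The sketch below prints what those placeholders would resolve to, assuming Telegen expands them the way a POSIX shell does (the guide does not state the exact rules). The function name show_effective_settings is hypothetical.

```shell
# Print the values the Step 2 placeholders would resolve to, assuming
# shell-style ${VAR:-default} semantics: the default applies when the
# variable is unset or empty.
show_effective_settings() {
  printf 'environment=%s\n' "${TELEGEN_ENVIRONMENT:-production}"
  printf 'endpoint=%s\n'    "${TELEGEN_OTLP_ENDPOINT:-otel-collector:4317}"
  printf 'insecure=%s\n'    "${TELEGEN_OTLP_INSECURE:-true}"
}

show_effective_settings
```

Under this assumption the insecure flag falls back to true when TELEGEN_OTLP_INSECURE is not set, so a missing environment file would silently disable transport security.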

Step 3: Environment File

cat << 'EOF' | sudo tee /etc/telegen/telegen.env
# OTLP endpoint
TELEGEN_OTLP_ENDPOINT=otel-collector.example.com:4317
TELEGEN_OTLP_INSECURE=false

# Environment tag
TELEGEN_ENVIRONMENT=production

# Optional: API authentication
# OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer your-token

# Logging
TELEGEN_LOG_LEVEL=info
EOF

sudo chmod 600 /etc/telegen/telegen.env
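
Before starting the service, it can help to confirm that the environment file is owner-only and defines the variables the configuration expects. A minimal sketch follows; check_env_file is a hypothetical helper, and it must run as a user that can read the file (root, given the mode set above).

```shell
# Check that an env file has mode 600 and defines each named variable.
check_env_file() {
  file=$1; shift
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  # The first ten characters of `ls -ld` are the type and permission bits.
  mode=$(ls -ld "$file" | cut -c1-10)
  [ "$mode" = "-rw-------" ] || { echo "unexpected mode: $mode" >&2; return 1; }
  for name in "$@"; do
    grep -q "^${name}=" "$file" || { echo "not set: $name" >&2; return 1; }
  done
}

# Usage:
#   check_env_file /etc/telegen/telegen.env TELEGEN_OTLP_ENDPOINT TELEGEN_ENVIRONMENT
```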

Step 4: systemd Service

cat << 'EOF' | sudo tee /etc/systemd/system/telegen.service
[Unit]
Description=Telegen V3 Observability Agent
Documentation=https://telegen.mirastacklabs.ai
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=root
Group=root

EnvironmentFile=/etc/telegen/telegen.env
ExecStart=/usr/local/bin/telegen --config=/etc/telegen/config.yaml
ExecReload=/bin/kill -HUP $MAINPID

Restart=always
RestartSec=5
LimitNOFILE=65536
LimitMEMLOCK=infinity

# eBPF capabilities
AmbientCapabilities=CAP_SYS_ADMIN CAP_SYS_PTRACE CAP_NET_ADMIN CAP_BPF CAP_PERFMON CAP_SYS_RESOURCE CAP_DAC_READ_SEARCH
NoNewPrivileges=false

StandardOutput=journal
StandardError=journal
SyslogIdentifier=telegen

[Install]
WantedBy=multi-user.target
EOF

Step 5: Start Service

sudo systemctl daemon-reload
sudo systemctl enable telegen
sudo systemctl start telegen

# Check status
sudo systemctl status telegen

# View logs
sudo journalctl -u telegen -f

Hot Reload Configuration

# Edit configuration
sudo vim /etc/telegen/config.yaml

# Reload without restart
sudo systemctl reload telegen
# Or send SIGHUP directly
sudo kill -HUP $(pidof telegen)

Docker Compose

Single Node Agent

# docker-compose.yaml
version: '3.8'

services:
  telegen:
    image: ghcr.io/mirastacklabs-ai/telegen:latest
    container_name: telegen
    restart: unless-stopped
    privileged: true
    pid: host
    network_mode: host
    
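    environment:
      # Host networking bypasses Docker's service-name DNS, so otel-collector
      # must resolve on the host; use localhost:4317 for a collector on this node.
      - TELEGEN_OTLP_ENDPOINT=otel-collector:4317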
      - TELEGEN_ENVIRONMENT=production
      - TELEGEN_LOG_LEVEL=info
    
    volumes:
      - /sys:/sys:ro
      - /proc:/host/proc:ro
      - /sys/kernel/debug:/sys/kernel/debug
      - /sys/fs/bpf:/sys/fs/bpf
      - ./configs/telegen.yaml:/etc/telegen/config.yaml:ro
      - telegen-queue:/var/lib/telegen/queue
    
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:19090/healthz"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  telegen-queue:

Full Observability Stack

# docker-compose.full.yaml
version: '3.8'

services:
  # Telegen Agent
  telegen:
    image: ghcr.io/mirastacklabs-ai/telegen:latest
    container_name: telegen
    restart: unless-stopped
    privileged: true
    pid: host
    network_mode: host
    environment:
      - TELEGEN_OTLP_ENDPOINT=localhost:4317
    volumes:
      - /sys:/sys:ro
      - /proc:/host/proc:ro
      - /sys/kernel/debug:/sys/kernel/debug
      - /sys/fs/bpf:/sys/fs/bpf
      - ./configs/agent.yaml:/etc/telegen/config.yaml:ro

  # OpenTelemetry Collector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    container_name: otel-collector
    restart: unless-stopped
    command: ["--config=/etc/otel/config.yaml"]
    volumes:
      - ./configs/otel-collector.yaml:/etc/otel/config.yaml:ro
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8888:8888"   # Metrics

  # Prometheus
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.enable-remote-write-receiver'
    volumes:
      - ./configs/prometheus.yaml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"

  # Loki
  loki:
    image: grafana/loki:latest
    container_name: loki
    restart: unless-stopped
    command: -config.file=/etc/loki/config.yaml
    volumes:
      - ./configs/loki.yaml:/etc/loki/config.yaml:ro
      - loki-data:/loki
    ports:
      - "3100:3100"

  # Grafana
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"

volumes:
  prometheus-data:
  loki-data:
  grafana-data:

Collector Mode (Non-eBPF)

# docker-compose.collector.yaml
version: '3.8'

services:
  telegen-collector:
    image: ghcr.io/mirastacklabs-ai/telegen:latest
    container_name: telegen-collector
    restart: unless-stopped
    
    environment:
      - TELEGEN_OTLP_ENDPOINT=otel-collector:4317
    
    volumes:
      - ./configs/collector.yaml:/etc/telegen/config.yaml:ro
    
    ports:
      - "19090:19090"  # Health/metrics
    
    # No privileged mode needed for collector mode

The collector reads the following configuration:

# configs/collector.yaml
telegen:
  mode: collector
  service_name: telegen-collector

pipeline:
  enabled: true
  limits:
    cardinality:
      enabled: true
      default_max_series: 50000
  pii_redaction:
    enabled: true
  export:
    otlp:
      endpoint: ${TELEGEN_OTLP_ENDPOINT}

collectors:
  prometheus:
    enabled: true
    scrape_interval: 30s
    targets:
      - name: node-exporter
        address: node-exporter:9100
      - name: cadvisor
        address: cadvisor:8080
        metrics_path: /metrics

Kubernetes

DaemonSet (Agent Mode)

# telegen-daemonset.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: telegen
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: telegen
  namespace: telegen
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: telegen
rules:
  - apiGroups: [""]
    resources: ["nodes", "pods", "services", "endpoints", "namespaces", "configmaps"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets", "daemonsets", "statefulsets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/proxy", "nodes/stats"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: telegen
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: telegen
subjects:
  - kind: ServiceAccount
    name: telegen
    namespace: telegen
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: telegen-config
  namespace: telegen
data:
  config.yaml: |
    telegen:
      mode: agent
      service_name: telegen
      log_level: info
    
    pipeline:
      enabled: true
      
      limits:
        cardinality:
          enabled: true
          default_max_series: 10000
          global_max_series: 100000
        rate:
          enabled: true
          metrics_per_second: 100000
        attributes:
          enabled: true
          protected_attributes:
            - service.name
            - k8s.namespace.name
            - k8s.pod.name
      
      pii_redaction:
        enabled: true
        scan_log_bodies: true
      
      transform:
        enabled: true
        rules:
          - name: add-cluster
            match:
              signal_types: [metrics, traces, logs]
            actions:
              - type: set_attribute
                set_attribute:
                  key: k8s.cluster.name
                  value: ${CLUSTER_NAME}
      
      export:
        otlp:
          endpoint: ${OTLP_ENDPOINT}
          insecure: true
        batch:
          size: 1000
          timeout: 5s
        queue:
          enabled: true
          directory: /var/lib/telegen/queue
          max_size_bytes: 100000000
      
      operations:
        hot_reload:
          enabled: true
          enable_sighup: true
        shutdown:
          timeout: 30s
          drain_timeout: 10s
    
    agent:
      ebpf:
        enabled: true
        network:
          enabled: true
      profiling:
        enabled: true
      discovery:
        enabled: true
    
    kube_metrics:
      enabled: true
    
    self_telemetry:
      enabled: true
      listen: ":19090"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: telegen
  namespace: telegen
  labels:
    app: telegen
spec:
  selector:
    matchLabels:
      app: telegen
  template:
    metadata:
      labels:
        app: telegen
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "19090"
    spec:
      serviceAccountName: telegen
      hostPID: true
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      
      tolerations:
        - operator: Exists
          effect: NoSchedule
        - operator: Exists
          effect: NoExecute
      
      containers:
        - name: telegen
          image: ghcr.io/mirastacklabs-ai/telegen:latest
          imagePullPolicy: IfNotPresent
          
          args:
            - --config=/etc/telegen/config.yaml
          
          env:
            - name: OTLP_ENDPOINT
              value: "otel-collector.observability:4317"
            - name: CLUSTER_NAME
              value: "production"
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 1Gi
          
          securityContext:
            privileged: true
          
          volumeMounts:
            - name: config
              mountPath: /etc/telegen
              readOnly: true
            - name: sys
              mountPath: /sys
              readOnly: true
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: debugfs
              mountPath: /sys/kernel/debug
            - name: bpffs
              mountPath: /sys/fs/bpf
            - name: queue
              mountPath: /var/lib/telegen/queue
          
          livenessProbe:
            httpGet:
              path: /healthz
              port: 19090
            initialDelaySeconds: 30
            periodSeconds: 30
          
          readinessProbe:
            httpGet:
              path: /readyz
              port: 19090
            initialDelaySeconds: 10
            periodSeconds: 10
      
      volumes:
        - name: config
          configMap:
            name: telegen-config
        - name: sys
          hostPath:
            path: /sys
        - name: proc
          hostPath:
            path: /proc
        - name: debugfs
          hostPath:
            path: /sys/kernel/debug
        - name: bpffs
          hostPath:
            path: /sys/fs/bpf
        - name: queue
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: telegen
  namespace: telegen
  labels:
    app: telegen
spec:
  type: ClusterIP
  ports:
    - name: metrics
      port: 19090
      targetPort: 19090
  selector:
    app: telegen

Apply to Cluster

# Create resources
kubectl apply -f telegen-daemonset.yaml

# Verify
kubectl -n telegen get pods
kubectl -n telegen logs -l app=telegen -f

# Check health
kubectl -n telegen exec ds/telegen -- curl -s localhost:19090/healthz
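
On clusters with many nodes, scanning the pod list by eye gets tedious. The sketch below filters the default kubectl get pods output for pods that are not Running. It assumes the standard column layout (NAME, READY, STATUS, ...); the helper name pods_not_running is hypothetical.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the names
# of pods whose STATUS column (third field) is not "Running".
# Exits non-zero if at least one such pod is found.
pods_not_running() {
  awk '$3 != "Running" { print $1; bad = 1 } END { exit bad }'
}

# Usage:
#   kubectl -n telegen get pods -l app=telegen --no-headers | pods_not_running
```

An empty result with exit status 0 means every Telegen pod reports Running; it does not replace the readiness probe.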

Helm

Quick Start

# Add repository
helm repo add telegen https://charts.mirastacklabs.ai
helm repo update

# Install with defaults
helm install telegen telegen/telegen -n telegen --create-namespace

# Or install with overrides (keys follow the values.yaml layout under Custom Values)
helm install telegen telegen/telegen -n telegen --create-namespace \
  --set otlp.endpoint=otel-collector:4317 \
  --set v3Pipeline.enabled=true \
  --set v3Pipeline.limits.cardinality.enabled=true

Custom Values

# values.yaml
replicaCount: 1  # For DaemonSet, this is ignored

image:
  repository: ghcr.io/mirastacklabs-ai/telegen
  tag: latest
  pullPolicy: IfNotPresent

# OTLP configuration
otlp:
  endpoint: otel-collector.observability:4317
  insecure: true
  headers: {}

# Pipeline configuration
v3Pipeline:
  enabled: true
  
  limits:
    cardinality:
      enabled: true
      defaultMaxSeries: 10000
      globalMaxSeries: 100000
    rate:
      enabled: true
      metricsPerSecond: 100000
      tracesPerSecond: 50000
      logsPerSecond: 200000
    attributes:
      enabled: true
      maxResourceAttributes: 128
      protectedAttributes:
        - service.name
        - k8s.namespace.name
  
  piiRedaction:
    enabled: true
    scanLogBodies: true
  
  transform:
    enabled: true
    rules:
      - name: add-cluster
        actions:
          - type: set_attribute
            setAttribute:
              key: k8s.cluster.name
              value: "production"
  
  export:
    batch:
      size: 1000
      timeout: 5s
    queue:
      enabled: true
      maxSizeBytes: 100000000
  
  operations:
    hotReload:
      enabled: true
    shutdown:
      timeout: 30s

# Agent configuration
agent:
  ebpf:
    enabled: true
    network: true
  profiling:
    enabled: true
  discovery:
    enabled: true

# Resources
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 1Gi

# Pod tolerations for scheduling on all nodes
tolerations:
  - operator: Exists
    effect: NoSchedule
  - operator: Exists
    effect: NoExecute

# Service monitor for Prometheus Operator
serviceMonitor:
  enabled: true
  interval: 30s

Install with Values File

helm install telegen telegen/telegen -n telegen --create-namespace -f values.yaml

OpenShift

On OpenShift, the privileged, host-networked Telegen pods need a custom SecurityContextConstraints (SCC) object bound to the telegen service account.

Create SCC

# telegen-scc.yaml
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: telegen-scc
allowHostDirVolumePlugin: true
allowHostIPC: true
allowHostNetwork: true
allowHostPID: true
allowHostPorts: true
allowPrivilegedContainer: true
allowedCapabilities:
  - SYS_ADMIN
  - SYS_PTRACE
  - NET_ADMIN
  - BPF
  - PERFMON
  - SYS_RESOURCE
  - DAC_READ_SEARCH
fsGroup:
  type: RunAsAny
readOnlyRootFilesystem: false
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
users:
  - system:serviceaccount:telegen:telegen
volumes:
  - '*'

Apply and Deploy

# Create SCC
oc apply -f telegen-scc.yaml

# Create project
oc new-project telegen

# Deploy (use the Kubernetes DaemonSet YAML from above)
oc apply -f telegen-daemonset.yaml

# Verify
oc get pods -n telegen

AWS ECS

Task Definition

{
  "family": "telegen",
  "networkMode": "host",
  "pidMode": "host",
  "requiresCompatibilities": ["EC2"],
  "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::ACCOUNT:role/telegenTaskRole",
  "containerDefinitions": [
    {
      "name": "telegen",
      "image": "ghcr.io/mirastacklabs-ai/telegen:latest",
      "essential": true,
      "privileged": true,
      "environment": [
        {"name": "TELEGEN_OTLP_ENDPOINT", "value": "otel-collector.internal:4317"},
        {"name": "TELEGEN_ENVIRONMENT", "value": "production"}
      ],
      "mountPoints": [
        {"sourceVolume": "sys", "containerPath": "/sys", "readOnly": true},
        {"sourceVolume": "proc", "containerPath": "/host/proc", "readOnly": true},
        {"sourceVolume": "debugfs", "containerPath": "/sys/kernel/debug"},
        {"sourceVolume": "bpffs", "containerPath": "/sys/fs/bpf"}
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/telegen",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "telegen"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:19090/healthz || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      },
      "cpu": 256,
      "memory": 512
    }
  ],
  "volumes": [
    {"name": "sys", "host": {"sourcePath": "/sys"}},
    {"name": "proc", "host": {"sourcePath": "/proc"}},
    {"name": "debugfs", "host": {"sourcePath": "/sys/kernel/debug"}},
    {"name": "bpffs", "host": {"sourcePath": "/sys/fs/bpf"}}
  ]
}

Service (Daemon Scheduling)

{
  "serviceName": "telegen",
  "cluster": "production",
  "taskDefinition": "telegen",
  "schedulingStrategy": "DAEMON",
  "deploymentConfiguration": {
    "maximumPercent": 100,
    "minimumHealthyPercent": 0
  }
}

Verification

After deployment, verify Telegen is working:

# Check health
curl http://localhost:19090/healthz
# Expected: {"status":"healthy"}

# Check readiness
curl http://localhost:19090/readyz
# Expected: {"status":"ready"}

# Check metrics
curl http://localhost:19090/metrics | head -20

# Check pipeline stats
curl http://localhost:19090/debug/pipeline/stats

# View logs
# systemd: journalctl -u telegen -f
# Docker: docker logs telegen -f
# Kubernetes: kubectl -n telegen logs -l app=telegen -f
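
Right after a start or restart the endpoints may not answer yet, so a single curl can fail spuriously. The sketch below retries the readiness check a few times. The helper names (retry, status_is, telegen_ready) are hypothetical, and the response format is assumed to match the expected bodies shown above exactly.

```shell
# Run a command up to $1 times, sleeping $2 seconds between attempts.
retry() {
  attempts=$1 delay=$2; shift 2
  n=1
  while ! "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Succeed when the JSON body on stdin contains "status":"<value>".
status_is() {
  grep -q "\"status\":\"$1\""
}

# Query the readiness endpoint; assumes curl is installed.
telegen_ready() {
  curl -fsS http://localhost:19090/readyz | status_is ready
}

# Usage:
#   retry 10 3 telegen_ready && echo "telegen is ready"
```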

Troubleshooting

Common Issues

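Issue                 Cause                                Solution
--------------------  -----------------------------------  ------------------------------------------------------------------
eBPF not starting     Missing capabilities                 Run privileged, or grant CAP_BPF (CAP_SYS_ADMIN on kernels before 5.8)
No metrics exported   OTLP endpoint unreachable            Check network and firewall rules between the host and the endpoint
High memory usage     Cardinality explosion                Enable pipeline.limits.cardinality
Config not reloading  Hot reload or SIGHUP handling off    Set pipeline.operations.hot_reload.enabled and enable_sighup to true
Data loss on restart  No persistent queue                  Set pipeline.export.queue.enabled to true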

Debug Commands

# Check kernel version
uname -r  # Must be 4.18+

# Check eBPF support
ls /sys/fs/bpf

# Check capabilities (container)
capsh --print

# Test OTLP connectivity
nc -zv otel-collector 4317
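
The connectivity test above hardcodes the host and port. A small sketch that derives both from the host:port endpoint string used elsewhere in this guide follows; the helper names endpoint_host and endpoint_port are hypothetical.

```shell
# Split a host:port endpoint (the TELEGEN_OTLP_ENDPOINT format) into parts.
endpoint_host() { echo "${1%:*}"; }    # everything before the last colon
endpoint_port() { echo "${1##*:}"; }   # everything after the last colon

# Usage (assumes nc is installed):
#   ep=${TELEGEN_OTLP_ENDPOINT:-otel-collector:4317}
#   nc -zv "$(endpoint_host "$ep")" "$(endpoint_port "$ep")"
```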

Logs

# Increase log verbosity
# In config.yaml, under the telegen: section, set log_level: debug

# Filter errors only
journalctl -u telegen | grep -i error

Next Steps