Pipeline Configuration

Complete reference for the unified pipeline configuration.

Overview

The unified pipeline provides enhanced data quality controls, transformation capabilities, and operational features. Enable it by setting:

pipeline:
  enabled: true

Data Quality Limits

Cardinality Limiter

Prevents metric cardinality explosion that can overwhelm backends.

pipeline:
  limits:
    cardinality:
      enabled: true
      
      # Per-metric series limit (unique label combinations)
      default_max_series: 10000
      
      # Global limit across all metrics
      global_max_series: 100000
      
      # Per-metric overrides for high-cardinality metrics
      metric_limits:
        http_request_duration_seconds: 50000
        api_requests_total: 20000
      
      # How long to remember series (for cleanup)
      series_ttl: 1h
      
      # Action when limit reached
      # "drop" - silently drop new series
      # "hash_labels" - hash label values to reduce cardinality
      on_limit: drop

Parameter	Type	Default	Description
`enabled`	bool	false	Enable cardinality limiting
`default_max_series`	int	10000	Default per-metric series limit
`global_max_series`	int	100000	Total series limit across all metrics
`metric_limits`	map	{}	Per-metric overrides
`series_ttl`	duration	1h	Time to remember series for cleanup
`on_limit`	string	drop	Action when limit reached

Rate Limiter

Controls data ingestion rate to protect backends.

pipeline:
  limits:
    rate:
      enabled: true
      
      # Maximum data points/spans/logs per second
      metrics_per_second: 100000
      traces_per_second: 50000
      logs_per_second: 200000
      
      # Allow temporary bursts
      burst_multiplier: 2.0
      
      # Action when limit reached
      on_limit: drop

Parameter	Type	Default	Description
`enabled`	bool	false	Enable rate limiting
`metrics_per_second`	int	100000	Max metric data points/second
`traces_per_second`	int	50000	Max spans/second
`logs_per_second`	int	200000	Max log records/second
`burst_multiplier`	float	2.0	Allow this multiple for bursts
`on_limit`	string	drop	Action when limit reached

Attribute Limiter

Controls attribute counts and sizes to reduce payload size.

pipeline:
  limits:
    attributes:
      enabled: true
      
      # Maximum attributes per level
      max_resource_attributes: 128
      max_scope_attributes: 64
      max_data_point_attributes: 32
      
      # Maximum value sizes
      max_attribute_value_size: 4096
      max_attribute_key_size: 256
      
      # Protected attributes (never dropped or truncated)
      protected_attributes:
        - service.name
        - service.namespace
        - k8s.pod.name
        - k8s.namespace.name

Parameter	Type	Default	Description
`enabled`	bool	false	Enable attribute limiting
`max_resource_attributes`	int	128	Max attributes on resource
`max_scope_attributes`	int	64	Max attributes on scope
`max_data_point_attributes`	int	32	Max attributes on data points
`max_attribute_value_size`	int	4096	Max string value length
`max_attribute_key_size`	int	256	Max key length
`protected_attributes`	[]string	[]	Never drop or truncate these

Signal Transformation

Transform Rules

Apply rule-based transformations to signals before export.

pipeline:
  transform:
    enabled: true
    rules:
      # Add cluster information
      - name: add-cluster-info
        enabled: true
        match:
          signal_types: [metrics, traces, logs]
        actions:
          - type: set_attribute
            set_attribute:
              key: k8s.cluster.name
              value: production

      # Filter debug metrics
      - name: drop-debug-metrics
        enabled: true
        match:
          signal_types: [metrics]
          metric_names:
            - "^debug_.*"
            - "^internal_.*"
        actions:
          - type: filter
            filter:
              drop: true

      # Hash sensitive data
      - name: hash-user-ids
        enabled: true
        match:
          signal_types: [traces]
          resource_attributes:
            service.name: "user-service"
        actions:
          - type: hash_attribute
            hash_attribute:
              key: user.id
              algorithm: sha256

Rule Structure

Field	Type	Description
`name`	string	Rule identifier
`enabled`	bool	Enable/disable rule
`match`	object	Conditions for applying rule
`actions`	[]object	Actions to perform

Match Conditions

Field	Type	Description
`signal_types`	[]string	Signal types: metrics, traces, logs
`resource_attributes`	map	Match on resource attribute values
`metric_names`	[]string	Regex patterns for metric names
`span_names`	[]string	Regex patterns for span names
`log_bodies`	[]string	Regex patterns for log bodies

Action Types

set_attribute - Add or update an attribute

- type: set_attribute
  set_attribute:
    key: environment
    value: ${ENVIRONMENT}  # Supports env vars

delete_attribute - Remove an attribute

- type: delete_attribute
  delete_attribute:
    key: internal.debug.info

rename_attribute - Rename an attribute key

- type: rename_attribute
  rename_attribute:
    old_key: host.hostname
    new_key: host.name

hash_attribute - Hash an attribute value

- type: hash_attribute
  hash_attribute:
    key: user.id
    algorithm: sha256  # sha256, sha512, xxhash
    salt: ${HASH_SALT}  # Optional salt

filter - Drop matching signals

- type: filter
  filter:
    drop: true

transform - Regex transformation

- type: transform
  transform:
    key: http.url
    pattern: "([?&])password=[^&]*"
    replacement: "${1}password=***"

PII Redaction

Automatically detect and mask personally identifiable information.

pipeline:
  pii_redaction:
    enabled: true
    
    # Mask string
    redaction_string: "[REDACTED]"
    
    # Scan log message bodies (impacts performance)
    scan_log_bodies: true
    
    # Scan span names
    scan_span_names: false
    
    # Use hash instead of mask (preserves uniqueness)
    hash_redaction: false
    
    # Attributes that should never be scanned
    allowed_attributes:
      - service.name
      - k8s.pod.name
      - http.route
    
    # PII detection rules
    rules:
      - name: email
        type: email
        enabled: true
      - name: phone
        type: phone
        enabled: true
      - name: ssn
        type: ssn
        enabled: true
      - name: credit_card
        type: credit_card
        enabled: true
      - name: jwt
        type: jwt
        enabled: true
      - name: api_key
        type: api_key
        enabled: true
      
      # Custom pattern
      - name: internal_id
        type: regex
        enabled: true
        pattern: "INTERNAL-[A-Z0-9]{8}"

Built-in PII Types

Type	Pattern	Example
`email`	Email addresses	`user@example.com`
`phone`	Phone numbers	`555-123-4567`
`ssn`	Social Security Numbers	`123-45-6789`
`credit_card`	Credit card numbers	`4111-1111-1111-1111`
`ipv4`	IPv4 addresses	`192.168.1.1`
`ipv6`	IPv6 addresses	`2001:db8::1`
`jwt`	JWT tokens	`eyJhbG...`
`api_key`	API keys	`sk-xxx`, `AKIA...`
`password`	Password-like strings	(configurable)
`regex`	Custom regex pattern	User-defined

Export Configuration

OTLP Export

pipeline:
  export:
    otlp:
      endpoint: otel-collector:4317
      protocol: grpc  # grpc or http
      insecure: true
      
      # TLS configuration
      tls:
        cert_file: /etc/telegen/certs/client.crt
        key_file: /etc/telegen/certs/client.key
        ca_file: /etc/telegen/certs/ca.crt
        insecure_skip_verify: false
      
      # Headers
      headers:
        X-API-Key: ${OTLP_API_KEY}
        Authorization: Bearer ${OTLP_TOKEN}
      
      # Timeouts
      timeout: 30s
      
      # Retry configuration
      retry:
        enabled: true
        max_attempts: 3
        initial_interval: 1s
        max_interval: 30s
        backoff_multiplier: 2.0

Batching

pipeline:
  export:
    batch:
      # Items per batch
      size: 1000
      
      # Max wait before flush
      timeout: 5s
      
      # Minimum batch size to send immediately
      send_batch_size: 500

Multi-Endpoint Export

Support failover, round-robin, or fan-out to multiple endpoints.

pipeline:
  export:
    multi_endpoint:
      enabled: true
      
      # Mode: failover, round_robin, fanout
      mode: failover
      
      endpoints:
        - name: primary
          endpoint: primary-collector:4317
          priority: 1
        
        - name: secondary
          endpoint: secondary-collector:4317
          priority: 2
        
        - name: archive
          endpoint: archive-collector:4317
          mode: fanout  # Always send regardless of mode

Persistent Queue

Survive restarts without data loss.

pipeline:
  export:
    queue:
      enabled: true
      directory: /var/lib/telegen/queue
      max_size_bytes: 500000000  # 500MB
      max_items: 100000

Operations

Hot Reload

Reload configuration without restart.

pipeline:
  operations:
    hot_reload:
      enabled: true
      
      # Path to watch
      config_path: /etc/telegen/config.yaml
      
      # Check interval for file changes
      check_interval: 30s
      
      # Enable SIGHUP reload
      enable_sighup: true
      
      # Validation timeout
      validation_timeout: 10s
      
      # Auto-rollback on error
      rollback_on_error: true

Trigger reload:

# Send SIGHUP
kill -HUP $(pidof telegen)

# systemd
systemctl reload telegen

Graceful Shutdown

Drain in-flight data before stopping.

pipeline:
  operations:
    shutdown:
      # Total shutdown timeout
      timeout: 30s
      
      # Time to drain in-flight data
      drain_timeout: 10s
      
      # Mark unhealthy during shutdown
      enable_health_check: true

Environment Variables

All configuration values support environment variable substitution:

pipeline:
  export:
    otlp:
      endpoint: ${OTLP_ENDPOINT:-otel-collector:4317}
      headers:
        Authorization: Bearer ${OTLP_TOKEN}
  
  transform:
    rules:
      - name: add-env
        actions:
          - type: set_attribute
            set_attribute:
              key: environment
              value: ${ENVIRONMENT:-production}

Variable	Description
`${VAR}`	Value of VAR, error if unset
`${VAR:-default}`	Value of VAR, or “default” if unset
`${VAR:?error}`	Value of VAR, or error message if unset

Complete Example

telegen:
  mode: agent
  service_name: telegen
  log_level: info

pipeline:
  enabled: true
  
  limits:
    cardinality:
      enabled: true
      default_max_series: 10000
      global_max_series: 100000
    rate:
      enabled: true
      metrics_per_second: 100000
      traces_per_second: 50000
      logs_per_second: 200000
    attributes:
      enabled: true
      max_resource_attributes: 128
      protected_attributes:
        - service.name
        - k8s.namespace.name
  
  transform:
    enabled: true
    rules:
      - name: add-cluster
        match:
          signal_types: [metrics, traces, logs]
        actions:
          - type: set_attribute
            set_attribute:
              key: k8s.cluster.name
              value: ${CLUSTER_NAME:-default}
  
  pii_redaction:
    enabled: true
    scan_log_bodies: true
  
  export:
    otlp:
      endpoint: ${OTLP_ENDPOINT:-otel-collector:4317}
      insecure: true
    batch:
      size: 1000
      timeout: 5s
    queue:
      enabled: true
      directory: /var/lib/telegen/queue
  
  operations:
    hot_reload:
      enabled: true
      enable_sighup: true
    shutdown:
      timeout: 30s
      drain_timeout: 10s

agent:
  ebpf:
    enabled: true
  profiling:
    enabled: true
  discovery:
    enabled: true

self_telemetry:
  enabled: true
  listen: ":19090"