Auto-Discovery

Telegen automatically discovers your infrastructure, cloud environment, and running applications.

Overview

When Telegen starts, it automatically:

  1. Detects cloud provider - AWS, GCP, Azure, etc.
  2. Discovers Kubernetes metadata - Pods, services, namespaces
  3. Identifies application runtimes - Go, Java, Python, Node.js
  4. Maps network topology - Services, connections, dependencies

No configuration required.


Cloud Detection

Telegen queries cloud metadata services to identify the environment:

Provider Detection Method Metadata Collected
AWS IMDS v1/v2 Instance ID, region, AZ, instance type, AMI
GCP Metadata server Instance ID, zone, machine type, project
Azure IMDS VM ID, location, VM size, subscription
DigitalOcean Metadata service Droplet ID, region, size
Alibaba Cloud Metadata service Instance ID, region, zone

Example AWS Metadata

# Automatically added to all telemetry
cloud.provider: aws
cloud.platform: aws_ec2
cloud.account.id: "123456789012"
cloud.region: us-east-1
cloud.availability_zone: us-east-1a
host.id: i-0abc123def456
host.type: m5.xlarge
host.image.id: ami-0abc123

Example Kubernetes Metadata

# Automatically added when running in K8s
k8s.cluster.name: production
k8s.namespace.name: default
k8s.pod.name: my-app-xyz123
k8s.pod.uid: a1b2c3d4-e5f6-7890-abcd-ef1234567890
k8s.deployment.name: my-app
k8s.node.name: ip-10-0-1-100.ec2.internal
k8s.container.name: app

Runtime Detection

Telegen identifies running application runtimes through process analysis:

Runtime Detection Method Auto-Instrumentation
Go Binary analysis, goroutine patterns ✅ HTTP, gRPC, database
Java JVM process, JFR integration ✅ Full JVM tracing
Python Interpreter detection ✅ HTTP, database, asyncio
Node.js V8 process patterns ✅ HTTP, database, async
.NET CoreCLR detection ✅ HTTP, database, EF Core
Ruby Interpreter detection ⚠️ Partial support
Rust Binary analysis ✅ Full tracing
C/C++ Binary analysis ✅ Network, syscalls

Example Runtime Metadata

# Automatically detected for a Go service
process.runtime.name: go
process.runtime.version: go1.21.5
process.executable.name: api-server
process.executable.path: /app/api-server
process.pid: 12345
process.command_line: /app/api-server --port=8080

Database Detection

Telegen identifies database connections and auto-traces queries:

Database Detection Tracing Support
PostgreSQL Port 5432, wire protocol ✅ Queries, latency, errors
MySQL Port 3306, wire protocol ✅ Queries, latency, errors
MongoDB Port 27017, wire protocol ✅ Operations, aggregations
Redis Port 6379, RESP protocol ✅ Commands, latency
Elasticsearch Port 9200, HTTP ✅ Queries, bulk ops

Message Queue Detection

Queue Detection Tracing Support
Kafka Port 9092, protocol ✅ Produce, consume, lag
RabbitMQ Port 5672, AMQP ✅ Publish, consume
Redis Pub/Sub Port 6379, RESP ✅ Publish, subscribe
NATS Port 4222 ✅ Publish, subscribe

Service Discovery

Telegen builds a topology map of all services:

flowchart LR
    subgraph Discovery["Auto-Discovery"]
        A["Frontend\n(Node.js)"]
        B["API Gateway\n(Go)"]
        C["Order Service\n(Java)"]
        D["User Service\n(Python)"]
        E["PostgreSQL"]
        F["Redis"]
        G["Kafka"]
    end
    
    A -->|HTTP| B
    B -->|gRPC| C
    B -->|gRPC| D
    C -->|SQL| E
    D -->|SQL| E
    B -->|Commands| F
    C -->|Produce| G

Service Metadata

# Automatically generated service topology
service.name: order-service
service.version: 1.2.3
service.namespace: production
service.instance.id: order-service-abc123

# Detected dependencies
dependencies:
  - service: postgres
    type: database
    protocol: postgresql
  - service: kafka
    type: message_queue
    protocol: kafka
  - service: user-service
    type: service
    protocol: grpc

Process Discovery (Port-Based & Path-Based)

Telegen discovers processes to instrument using port-based and/or path-based selection. Port-based discovery is often more reliable in containerized environments where executable paths vary.

Discovery Methods

Method Use Case Reliability in Containers
Port-based Known service ports (8080, 3000, etc.) ✅ High
Path-based Known executable patterns ⚠️ Medium (paths vary)
Kubernetes metadata Label/namespace selectors ✅ High
Combined Port + path + K8s metadata ✅ Highest precision

Port-Based Discovery

Discover services by the ports they listen on:

discovery:
  instrument:
    # Single port
    - open_ports: "8080"
    
    # Port range
    - open_ports: "8000-8999"
    
    # Multiple ports and ranges
    - open_ports: "80,443,3000,8080-8089"

Path-Based Discovery

Discover services by executable path patterns (glob syntax):

discovery:
  instrument:
    # Match any Java process
    - exe_path: "*java*"
    
    # Match specific application
    - exe_path: "/usr/bin/myapp"
    
    # Match Node.js processes
    - exe_path: "*node*"

Combined Discovery (AND Logic)

When multiple criteria are in one entry, ALL must match:

discovery:
  instrument:
    # Must be: Java process AND listening on port 8080
    - open_ports: "8080"
      exe_path: "*java*"
    
    # Must be: in production namespace AND on port 3000
    - k8s_namespace: "production"
      open_ports: "3000"

Kubernetes-Aware Discovery

Use Kubernetes metadata for precise targeting:

discovery:
  instrument:
    # By namespace
    - k8s_namespace: "production"
    
    # By deployment name
    - k8s_deployment_name: "api-gateway"
    
    # By pod labels
    - k8s_pod_labels:
        app: "frontend*"
        version: "v2*"
    
    # By pod annotations
    - k8s_pod_annotations:
        telegen.io/instrument: "true"
    
    # Combined: namespace + port
    - k8s_namespace: "production"
      open_ports: "8080-8089"

Container-Only Discovery

Limit discovery to containerized processes:

discovery:
  instrument:
    - containers_only: true
      open_ports: "8080"

Excluding Services

Exclude specific services from instrumentation (takes precedence over instrument):

discovery:
  instrument:
    - open_ports: "8080-8089"
  
  exclude_instrument:
    # Exclude health check services
    - open_ports: "9090"
    
    # Exclude test namespaces
    - k8s_namespace: "*-test"
    
    # Exclude by path
    - exe_path: "*health*"

Default Exclusions

Telegen excludes itself and common observability tools by default:

discovery:
  default_exclude_instrument:
    - exe_path: "*telegen*"
    - exe_path: "*alloy*"
    - exe_path: "*otelcol*"
    - k8s_namespace: "kube-system"
    - k8s_namespace: "monitoring"

Full Discovery Example

discovery:
  # Skip already-instrumented services
  exclude_otel_instrumented_services: true
  exclude_otel_instrumented_services_span_metrics: false
  
  # Use generic tracers for all languages
  skip_go_specific_tracers: false
  
  # What to instrument
  instrument:
    # Instrument common application ports
    - open_ports: "8080-8089"
    - open_ports: "3000,5000"
    
    # Instrument all Java apps in production
    - exe_path: "*java*"
      k8s_namespace: "production"
    
    # Instrument anything with our annotation
    - k8s_pod_annotations:
        telegen.io/instrument: "true"
  
  # What to exclude
  exclude_instrument:
    - k8s_namespace: "kube-system"
    - k8s_namespace: "monitoring"
    - open_ports: "9090"  # Prometheus
  
  # Timing
  min_process_age: 5s
  poll_interval: 5s

Configuration

Enabling/Disabling Discovery

agent:
  discovery:
    enabled: true
    interval: 30s
    
    # What to discover
    detect_cloud: true
    detect_kubernetes: true
    detect_runtimes: true
    detect_databases: true
    detect_message_queues: true

Cloud-Specific Settings

cloud:
  aws:
    enabled: true
    timeout: 200ms
    refresh_interval: 15m
    collect_tags: true
    tag_allowlist:
      - "app_*"
      - "env"
      - "team"
      - "cost_center"
  
  gcp:
    enabled: true
    timeout: 200ms
    refresh_interval: 15m
  
  azure:
    enabled: true
    timeout: 200ms
    refresh_interval: 15m

Kubernetes Settings

agent:
  kubernetes:
    enabled: true
    
    # Metadata to collect
    pod_metadata: true
    node_metadata: true
    service_metadata: true
    
    # Label filtering
    label_allowlist:
      - "app.kubernetes.io/*"
      - "helm.sh/*"
      - "app"
      - "version"
      - "team"
    
    # Namespace filtering
    namespace_include: []  # Empty = all
    namespace_exclude:
      - kube-system
      - kube-public
      - kube-node-lease

Resource Attributes

All discovered metadata is attached as OpenTelemetry resource attributes:

Cloud Attributes (Semantic Conventions)

Attribute Description
cloud.provider Cloud provider (aws, gcp, azure)
cloud.platform Platform (aws_ec2, gcp_compute_engine)
cloud.region Cloud region
cloud.availability_zone Availability zone
cloud.account.id Account/project ID
host.id Instance ID
host.type Instance type

Kubernetes Attributes

Attribute Description
k8s.cluster.name Cluster name
k8s.namespace.name Namespace
k8s.pod.name Pod name
k8s.pod.uid Pod UID
k8s.deployment.name Deployment name
k8s.replicaset.name ReplicaSet name
k8s.node.name Node name
k8s.container.name Container name

Process Attributes

Attribute Description
process.pid Process ID
process.executable.name Executable name
process.executable.path Full path
process.command_line Command line
process.runtime.name Runtime (go, java, python)
process.runtime.version Runtime version

Best Practices

1. Use Label Allowlists

Avoid collecting unnecessary labels that increase cardinality:

agent:
  kubernetes:
    label_allowlist:
      - "app"
      - "version"
      - "team"
    # NOT: "*" (collects everything)

2. Set Reasonable Timeouts

Fast timeouts prevent slow cloud APIs from blocking:

cloud:
  aws:
    timeout: 200ms  # Quick timeout
    refresh_interval: 15m  # Cache results

3. Exclude System Namespaces

Reduce noise from infrastructure components:

agent:
  kubernetes:
    namespace_exclude:
      - kube-system
      - kube-public
      - monitoring
      - logging

Next Steps