Network Observability

Telegen provides deep network observability using eBPF.

Overview

Network observability includes:

  • DNS tracing - Query/response correlation
  • TCP metrics - RTT, retransmits, connection tracking
  • HTTP/gRPC tracing - Request/response details
  • Flow tracking - Connection topology
  • XDP packet analysis - High-performance packet inspection

DNS Tracing

What’s Captured

Field     Description
Query     Domain name, type (A, AAAA, CNAME)
Response  Answer records, response code
Latency   Query-to-response time
Server    DNS server address

Sample Event

{
  "timestamp": "2024-01-15T10:30:00.123Z",
  "attributes": {
    "dns.question.name": "api.example.com",
    "dns.question.type": "A",
    "dns.response_code": "NOERROR",
    "dns.answers": ["10.0.1.100", "10.0.1.101"],
    "dns.latency_ms": 2.5,
    "net.peer.ip": "10.0.0.2",
    "net.peer.port": 53,
    "process.pid": 12345,
    "k8s.pod.name": "my-app-xyz"
  }
}
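
A consumer of these events might triage them by response code and latency. The following is a minimal Python sketch, assuming events arrive as JSON in the shape shown above; the `SLOW_DNS_MS` threshold and the `classify_dns_event` helper are hypothetical and not part of Telegen.

```python
import json

# Hypothetical threshold for illustration; not a Telegen default.
SLOW_DNS_MS = 100.0


def classify_dns_event(raw: str) -> str:
    """Return 'error', 'slow', or 'ok' for one DNS event in the shape shown above."""
    attrs = json.loads(raw)["attributes"]
    if attrs["dns.response_code"] != "NOERROR":
        return "error"
    if attrs["dns.latency_ms"] > SLOW_DNS_MS:
        return "slow"
    return "ok"


# A trimmed copy of the sample event.
sample = json.dumps({
    "timestamp": "2024-01-15T10:30:00.123Z",
    "attributes": {
        "dns.question.name": "api.example.com",
        "dns.response_code": "NOERROR",
        "dns.latency_ms": 2.5,
    },
})
print(classify_dns_event(sample))
```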

Configuration

agent:
  network:
    dns:
      enabled: true
      capture_queries: true
      capture_responses: true
      
      # Capture query/response content
      capture_content: true

TCP Metrics

Metrics Collected

Metric              Description
tcp_rtt_us          Round-trip time in microseconds
tcp_retransmits     Packet retransmission count
tcp_connections     Connection count
tcp_bytes_sent      Bytes transmitted
tcp_bytes_received  Bytes received

Connection Tracking

# Metrics example
tcp_rtt_us{
  src_ip="10.0.1.50",
  dst_ip="10.0.2.100",
  dst_port="5432",
  k8s_src_pod="api-server",
  k8s_dst_service="postgres"
} 1250

tcp_retransmits_total{
  src_ip="10.0.1.50",
  dst_ip="10.0.2.100",
  dst_port="5432"
} 3

Configuration

agent:
  network:
    tcp:
      enabled: true
      rtt: true
      retransmits: true
      connection_tracking: true
      
      # Flow sampling (1 in N connections)
      sample_rate: 1  # Capture all
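
To illustrate what "1 in N" means for `sample_rate`, here is a small Python sketch of one possible sampling decision. It hashes the connection 4-tuple so the same connection always gets the same answer. This is an assumption made for illustration; the source does not say how Telegen picks which connections to keep, and `should_sample` is a hypothetical helper.

```python
import zlib


def should_sample(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                  sample_rate: int) -> bool:
    """Keep roughly 1 in `sample_rate` connections, deterministically per 4-tuple."""
    if sample_rate <= 1:
        return True  # sample_rate: 1 captures everything
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % sample_rate == 0


# Simulate 10,000 connections from different ephemeral ports at sample_rate: 10.
kept = sum(
    should_sample("10.0.1.50", port, "10.0.2.100", 5432, 10)
    for port in range(32768, 42768)
)
print(f"kept {kept} of 10000 connections")
```

With `sample_rate: 10` the kept share should land near 10%, and with `sample_rate: 100` near 1%.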

HTTP/gRPC Tracing

HTTP Details

Field                         Description
http.method                   GET, POST, PUT, DELETE, etc.
http.url                      Full request URL
http.route                    Matched route pattern
http.status_code              Response status
http.request_content_length   Request body size
http.response_content_length  Response body size

gRPC Details

Field                 Description
rpc.system            grpc
rpc.service           Service name
rpc.method            Method name
rpc.grpc.status_code  gRPC status code

Configuration

agent:
  ebpf:
    network:
      enabled: true
      http: true
      grpc: true
      
      # URL/path filtering
      exclude_paths:
        - "/health"
        - "/healthz"
        - "/ready"
        - "/metrics"
        - "/favicon.ico"
      
      # Capture request/response headers
      capture_headers:
        - "content-type"
        - "user-agent"
        - "x-request-id"
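
The effect of `exclude_paths` can be sketched in Python as follows. The sketch assumes shell-style glob matching (so `/health` matches only that exact path, while a pattern such as `/health*` also matches `/healthz`) and assumes the query string is ignored; the source does not spell out Telegen's matching rules, and `is_excluded` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_excluded(path: str, patterns: list[str]) -> bool:
    """True if the request path matches any exclude pattern (glob semantics assumed)."""
    path_only = path.split("?", 1)[0]  # assumption: query string is not matched
    return any(fnmatchcase(path_only, pattern) for pattern in patterns)


exclude_paths = ["/health", "/healthz", "/ready", "/metrics", "/favicon.ico"]

for candidate in ["/healthz", "/metrics?format=text", "/api/users"]:
    print(candidate, is_excluded(candidate, exclude_paths))
```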

Service Topology

Telegen automatically builds a service dependency map:

flowchart LR
    subgraph External
        LB["Load Balancer"]
    end
    
    subgraph Cluster["Kubernetes Cluster"]
        FE["Frontend"]
        API["API Gateway"]
        US["User Service"]
        OS["Order Service"]
        PG["PostgreSQL"]
        RD["Redis"]
        KF["Kafka"]
    end
    
    LB -->|HTTP| FE
    FE -->|HTTP| API
    API -->|gRPC| US
    API -->|gRPC| OS
    US -->|SQL| PG
    OS -->|SQL| PG
    API -->|TCP| RD
    OS -->|Produce| KF

Topology Data

topology:
  nodes:
    - id: "api-gateway"
      type: "service"
      attributes:
        k8s.deployment: "api-gateway"
        k8s.namespace: "default"
    
    - id: "user-service"
      type: "service"
      attributes:
        k8s.deployment: "user-service"
        k8s.namespace: "default"
  
  edges:
    - source: "api-gateway"
      target: "user-service"
      attributes:
        protocol: "grpc"
        requests_per_second: 150
        avg_latency_ms: 12
        error_rate: 0.01
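
Once loaded, topology data in this shape is straightforward to query. The Python sketch below assumes the YAML above has already been parsed into a dictionary; `downstream` and `failing_edges` are hypothetical helpers, not part of Telegen.

```python
from collections import defaultdict

# The topology sample above, as a parsed dictionary (node attributes omitted).
topology = {
    "nodes": [
        {"id": "api-gateway", "type": "service"},
        {"id": "user-service", "type": "service"},
    ],
    "edges": [
        {
            "source": "api-gateway",
            "target": "user-service",
            "attributes": {
                "protocol": "grpc",
                "requests_per_second": 150,
                "avg_latency_ms": 12,
                "error_rate": 0.01,
            },
        },
    ],
}


def downstream(topo: dict, node_id: str) -> set[str]:
    """All services reachable from node_id by following edges."""
    adjacency = defaultdict(list)
    for edge in topo["edges"]:
        adjacency[edge["source"]].append(edge["target"])
    seen, stack = set(), [node_id]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def failing_edges(topo: dict, max_error_rate: float) -> list[tuple[str, str]]:
    """Edges whose error_rate exceeds the given threshold."""
    return [
        (edge["source"], edge["target"])
        for edge in topo["edges"]
        if edge["attributes"]["error_rate"] > max_error_rate
    ]


print(downstream(topology, "api-gateway"))
print(failing_edges(topology, max_error_rate=0.005))
```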

XDP Packet Analysis

Telegen can attach XDP programs for high-performance packet inspection at the NIC level.

Configuration

agent:
  network:
    xdp:
      enabled: true
      
      # Sample rate (1 in N packets)
      sample_rate: 1000  # 0.1% of packets
      
      # Interfaces to attach
      interfaces:
        - eth0
        - eth1
      
      # Packet filters
      filters:
        # Only specific ports
        ports:
          - 80
          - 443
          - 8080
        
        # Only specific protocols
        protocols:
          - tcp
          - udp

Use Cases

  • DDoS detection - High packet rate anomalies
  • Protocol analysis - Non-HTTP traffic inspection
  • Network debugging - Low-level packet issues

Network Metrics

RED Metrics (Rate, Errors, Duration)

# Request rate by service
sum(rate(http_server_requests_total[5m])) by (service_name)

# Error rate
sum(rate(http_server_requests_total{status_code=~"5.."}[5m])) 
/ sum(rate(http_server_requests_total[5m]))

# Latency percentiles
histogram_quantile(0.99, 
  sum(rate(http_server_duration_bucket[5m])) by (le, service_name)
)
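
For readers unfamiliar with how `histogram_quantile` turns bucket counts into a latency estimate, the Python sketch below reproduces the core idea: find the bucket that contains the target rank, then interpolate linearly inside it. It is a simplified illustration, not Prometheus's implementation, and the bucket values are invented for the example.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Nothing to interpolate against: fall back to the last finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Invented example: request durations in seconds.
example_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.99, example_buckets))
```

Because the estimate is interpolated, its accuracy depends on how closely the bucket boundaries bracket the latencies of interest.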

Connection Metrics

# Active connections by service pair
telegen_tcp_connections{state="established"}

# Connection errors
sum(rate(telegen_tcp_connection_errors_total[5m])) by (error_type)

# Retransmit rate
sum(rate(telegen_tcp_retransmits_total[5m])) 
/ sum(rate(telegen_tcp_segments_total[5m]))

DNS Metrics

# DNS query rate
sum(rate(telegen_dns_queries_total[5m])) by (domain)

# DNS latency
histogram_quantile(0.95, 
  sum(rate(telegen_dns_latency_bucket[5m])) by (le)
)

# DNS errors
sum(rate(telegen_dns_queries_total{response_code!="NOERROR"}[5m]))

Interface Filtering

Control which network interfaces are monitored:

agent:
  network:
    # Include specific interfaces
    interfaces:
      - eth0
      - ens5
    
    # Or exclude interfaces
    exclude_interfaces:
      - lo        # Loopback
      - docker0   # Docker bridge
      - veth*     # Container veths

Port Filtering

Focus on specific ports:

agent:
  ebpf:
    network:
      # Only trace these ports
      include_ports:
        - 80
        - 443
        - 8080
        - 3000
        - 5432
        - 6379
      
      # Or exclude ports
      exclude_ports:
        - 22    # SSH
        - 2379  # etcd
        - 2380  # etcd peer

Network Security

Suspicious Connection Detection

agent:
  network:
    security:
      enabled: true
      
      # Detect connections to unusual ports
      suspicious_ports:
        - 4444   # Common reverse shell
        - 31337  # Elite port
      
      # Detect connections to external IPs
      external_connection_alerts: true
      
      # Known bad IP lists
      blocklists:
        - "/etc/telegen/ip-blocklist.txt"

Example Alert

{
  "timestamp": "2024-01-15T10:30:00Z",
  "severity": "WARNING",
  "body": "Suspicious outbound connection to known bad IP",
  "attributes": {
    "network.event_type": "suspicious_connection",
    "net.peer.ip": "198.51.100.50",
    "net.peer.port": 4444,
    "process.pid": 12345,
    "process.executable.path": "/tmp/shell",
    "k8s.pod.name": "compromised-pod"
  }
}
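
The three checks configured above (suspicious ports, external destinations, blocklisted IPs) can be sketched in Python as follows. This is an illustration under stated assumptions, not Telegen's detection logic: `INTERNAL_NETS` is a hypothetical definition of "internal" address space, the blocklist is hard-coded rather than read from the configured file, and `connection_findings` is a hypothetical helper.

```python
import ipaddress

SUSPICIOUS_PORTS = {4444, 31337}
# Assumption: everything outside these networks counts as external.
INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]
# Assumption: in practice this set would be loaded from the blocklist file.
BLOCKLIST = {ipaddress.ip_address("198.51.100.50")}


def connection_findings(peer_ip: str, peer_port: int) -> list[str]:
    """Return the reasons, if any, that an outbound connection looks suspicious."""
    ip = ipaddress.ip_address(peer_ip)
    findings = []
    if ip in BLOCKLIST:
        findings.append("blocklisted_ip")
    if peer_port in SUSPICIOUS_PORTS:
        findings.append("suspicious_port")
    if not any(ip in net for net in INTERNAL_NETS):
        findings.append("external_ip")
    return findings


print(connection_findings("198.51.100.50", 4444))
print(connection_findings("10.0.2.100", 5432))
```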

Performance Considerations

Overhead

Feature        CPU Impact  Memory Impact
TCP metrics    ~0.5%       10 MB
DNS tracing    ~0.2%       5 MB
HTTP tracing   ~1%         20 MB
XDP (sampled)  ~0.1%       5 MB

Reducing Overhead

agent:
  network:
    # Reduce ring buffer size
    ring_buffer_size: 8388608  # 8MB instead of 16MB
    
    # Sample fewer connections (a higher N lowers overhead)
    tcp:
      sample_rate: 10  # 1 in 10 connections
    
    # Limit captured data
    http:
      max_body_capture: 0  # Don't capture bodies
      max_headers: 5       # Limit headers

Best Practices

1. Filter Noisy Traffic

Exclude health checks and internal traffic:

agent:
  ebpf:
    network:
      exclude_paths:
        - "/health*"
        - "/ready*"
        - "/metrics"
      exclude_ports:
        - 2379  # etcd
        - 10250 # kubelet

2. Use Appropriate Sampling

For high-traffic environments:

agent:
  network:
    tcp:
      sample_rate: 100  # 1% of connections
    xdp:
      sample_rate: 10000  # 0.01% of packets

3. Monitor Key Services

Focus on critical paths:

agent:
  ebpf:
    network:
      include_ports:
        - 80    # HTTP
        - 443   # HTTPS
        - 5432  # PostgreSQL
        - 6379  # Redis

Messaging Protocols

Telegen captures tracing data for AMQP 0-9-1, CQL (Cassandra), and NATS at the eBPF level — no SDK instrumentation or configuration changes required.


AMQP 0-9-1 Tracing

AMQP 0-9-1 is the wire protocol used by RabbitMQ and other brokers. Telegen captures publish and consume operations at the channel level.

What’s Captured

Field                                       Description
messaging.system                            rabbitmq
messaging.operation                         publish or process
messaging.destination.name                  Exchange name
messaging.rabbitmq.destination.routing_key  Routing key
messaging.client_id                         AMQP channel ID
net.peer.ip / net.peer.port                 Broker address

Sample Span

{
  "name": "orders.created publish",
  "kind": "PRODUCER",
  "duration_ms": 0.8,
  "attributes": {
    "messaging.system": "rabbitmq",
    "messaging.operation": "publish",
    "messaging.destination.name": "events",
    "messaging.rabbitmq.destination.routing_key": "orders.created",
    "net.peer.ip": "10.0.2.50",
    "net.peer.port": 5672
  }
}
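
To show where the exchange and routing key in such a span come from, here is a Python sketch that decodes a `basic.publish` method frame according to the AMQP 0-9-1 wire format (7-byte frame header, class 60 / method 40, a reserved short, then two short strings). It is a user-space illustration only, not Telegen's eBPF parser; `parse_basic_publish` and `build_publish_frame` are hypothetical helpers, and malformed short strings are not handled.

```python
import struct

FRAME_METHOD = 1       # AMQP 0-9-1 method frame type
FRAME_END = 0xCE       # frame-end octet
CLASS_BASIC = 60
METHOD_PUBLISH = 40


def parse_basic_publish(frame: bytes):
    """Return (channel, exchange, routing_key) for a basic.publish frame, else None."""
    if len(frame) < 8:
        return None
    frame_type, channel, size = struct.unpack_from(">BHI", frame, 0)
    if frame_type != FRAME_METHOD or len(frame) < 7 + size + 1:
        return None
    if frame[7 + size] != FRAME_END:
        return None
    payload = frame[7:7 + size]
    if len(payload) < 6:
        return None
    class_id, method_id = struct.unpack_from(">HH", payload, 0)
    if (class_id, method_id) != (CLASS_BASIC, METHOD_PUBLISH):
        return None
    pos = 6  # skip class-id, method-id, and the 2-byte reserved field
    fields = []
    for _ in range(2):  # exchange, then routing key, both length-prefixed short strings
        length = payload[pos]
        fields.append(payload[pos + 1:pos + 1 + length].decode("utf-8"))
        pos += 1 + length
    return channel, fields[0], fields[1]


def build_publish_frame(channel: int, exchange: str, routing_key: str) -> bytes:
    """Build a basic.publish method frame for testing the parser."""
    def shortstr(text: str) -> bytes:
        raw = text.encode("utf-8")
        return bytes([len(raw)]) + raw

    payload = (struct.pack(">HHH", CLASS_BASIC, METHOD_PUBLISH, 0)
               + shortstr(exchange) + shortstr(routing_key)
               + b"\x00")  # mandatory/immediate flag bits
    return struct.pack(">BHI", FRAME_METHOD, channel, len(payload)) + payload + bytes([FRAME_END])


frame = build_publish_frame(1, "events", "orders.created")
print(parse_basic_publish(frame))
```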

Configuration

agent:
  network:
    protocols:
      amqp:
        enabled: true
        capture_routing_key: true

CQL (Cassandra) Tracing

Telegen parses the Cassandra Query Language binary protocol (CQL v3–v5) to capture query statements, keyspaces, batch operations, and prepared statement execution.

See Database Tracing for the full Cassandra tracing reference.


NATS Tracing

NATS is a lightweight, text-based publish/subscribe messaging system. Telegen captures PUB, MSG, and subscription operations from the NATS wire protocol.

What’s Captured

Field                        Description
messaging.system             nats
messaging.operation          publish or process
messaging.destination.name   Subject name
net.peer.ip / net.peer.port  NATS server address

Sample Span

{
  "name": "sensor.readings publish",
  "kind": "PRODUCER",
  "duration_ms": 0.2,
  "attributes": {
    "messaging.system": "nats",
    "messaging.operation": "publish",
    "messaging.destination.name": "sensor.readings",
    "net.peer.ip": "10.0.3.10",
    "net.peer.port": 4222
  }
}
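
Because NATS is text-based, the subject in a span like this can be read directly from the `PUB` control line, which has the form `PUB <subject> [reply-to] <#bytes>`. The Python sketch below illustrates that step. It is not Telegen's parser, and the `reply_to` and `payload_size` keys are hypothetical names chosen for the example.

```python
def parse_nats_pub(data: bytes):
    """Parse a NATS PUB control line into span-style attributes, else return None."""
    line, separator, _rest = data.partition(b"\r\n")
    if not separator:
        return None  # control line is incomplete
    parts = line.split()
    if len(parts) not in (3, 4) or parts[0].upper() != b"PUB":
        return None
    try:
        size = int(parts[-1])
    except ValueError:
        return None
    return {
        "messaging.system": "nats",
        "messaging.operation": "publish",
        "messaging.destination.name": parts[1].decode("utf-8"),
        # Hypothetical keys, used only in this example:
        "reply_to": parts[2].decode("utf-8") if len(parts) == 4 else None,
        "payload_size": size,
    }


print(parse_nats_pub(b"PUB sensor.readings 5\r\nhello\r\n"))
```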

Configuration

agent:
  network:
    protocols:
      nats:
        enabled: true
        capture_subject: true

Connection Statistics

Telegen tracks byte-level connection statistics via TCP close events, providing a low-overhead measure of throughput per connection without full payload capture.

Metrics Emitted

Metric                             Type     Labels          Description
telegen.connection.bytes_sent      Counter  src, dst, port  Bytes sent per connection lifetime
telegen.connection.bytes_received  Counter  src, dst, port  Bytes received per connection lifetime

These metrics are emitted when a TCP connection closes and complement the per-request span data produced by the protocol parsers.
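
The emit-on-close model can be pictured with the following Python sketch, which adds each closed connection's byte totals to counters keyed by the `src`, `dst`, and `port` labels. The `on_tcp_close` callback and its arguments are hypothetical; the source does not describe Telegen's internal event format.

```python
from collections import Counter

# Counters keyed by the (src, dst, port) label set.
bytes_sent = Counter()
bytes_received = Counter()


def on_tcp_close(src: str, dst: str, port: int, sent: int, received: int) -> None:
    """Record one closed connection's lifetime byte totals."""
    key = (src, dst, port)
    bytes_sent[key] += sent
    bytes_received[key] += received


# Two connections between the same endpoints close one after the other.
on_tcp_close("10.0.1.50", "10.0.2.100", 5432, sent=1200, received=48000)
on_tcp_close("10.0.1.50", "10.0.2.100", 5432, sent=800, received=2000)

print(bytes_sent[("10.0.1.50", "10.0.2.100", 5432)])
print(bytes_received[("10.0.1.50", "10.0.2.100", 5432)])
```

One consequence of this model is that a long-lived connection contributes nothing to the counters until it closes.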

Configuration

agent:
  ebpf:
    conn_stats:
      enabled: true

Next Steps