Network Observability

Telegen provides deep network observability using eBPF.

Overview

Network observability includes:

DNS tracing - Query/response correlation
TCP metrics - RTT, retransmits, connection tracking
HTTP/gRPC tracing - Request/response details
Flow tracking - Connection topology
XDP packet analysis - High-performance packet inspection

DNS Tracing

What’s Captured

Field	Description
Query	Domain name, type (A, AAAA, CNAME)
Response	Answer records, response code
Latency	Query-to-response time
Server	DNS server address

Sample Event

{
  "timestamp": "2024-01-15T10:30:00.123Z",
  "attributes": {
    "dns.question.name": "api.example.com",
    "dns.question.type": "A",
    "dns.response_code": "NOERROR",
    "dns.answers": ["10.0.1.100", "10.0.1.101"],
    "dns.latency_ms": 2.5,
    "net.peer.ip": "10.0.0.2",
    "net.peer.port": 53,
    "process.pid": 12345,
    "k8s.pod.name": "my-app-xyz"
  }
}
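
As a rough illustration of consuming events shaped like the sample above, the following Python sketch flags lookups that failed or were slow. The function name `flag_dns_event` and the 100 ms threshold are assumptions made for this example; they are not part of Telegen.

```python
import json

SLOW_DNS_MS = 100.0  # hypothetical threshold; tune for your environment


def flag_dns_event(event, slow_ms=SLOW_DNS_MS):
    """Return a list of problems found in one DNS event (empty if none)."""
    attrs = event.get("attributes", {})
    problems = []
    rcode = attrs.get("dns.response_code")
    if rcode != "NOERROR":
        problems.append(f"rcode={rcode}")
    latency = attrs.get("dns.latency_ms", 0.0)
    if latency > slow_ms:
        problems.append(f"slow={latency}ms")
    return problems


# A trimmed copy of the sample event, parsed from JSON.
sample = json.loads("""
{
  "timestamp": "2024-01-15T10:30:00.123Z",
  "attributes": {
    "dns.question.name": "api.example.com",
    "dns.question.type": "A",
    "dns.response_code": "NOERROR",
    "dns.latency_ms": 2.5
  }
}
""")

print(flag_dns_event(sample))  # healthy lookup, so nothing is flagged
```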

Configuration

agent:
  network:
    dns:
      enabled: true
      capture_queries: true
      capture_responses: true
      
      # Capture query/response content
      capture_content: true

TCP Metrics

Metrics Collected

Metric	Description
`tcp_rtt_us`	Round-trip time in microseconds
`tcp_retransmits_total`	Packet retransmission count
`tcp_connections`	Connection count
`tcp_bytes_sent`	Bytes transmitted
`tcp_bytes_received`	Bytes received

Connection Tracking

# Metrics example
tcp_rtt_us{
  src_ip="10.0.1.50",
  dst_ip="10.0.2.100",
  dst_port="5432",
  k8s_src_pod="api-server",
  k8s_dst_service="postgres"
} 1250

tcp_retransmits_total{
  src_ip="10.0.1.50",
  dst_ip="10.0.2.100",
  dst_port="5432"
} 3

Configuration

agent:
  network:
    tcp:
      enabled: true
      rtt: true
      retransmits: true
      connection_tracking: true
      
      # Flow sampling (1 in N connections)
      sample_rate: 1  # Capture all

HTTP/gRPC Tracing

HTTP Details

Field	Description
`http.method`	GET, POST, PUT, DELETE, etc.
`http.url`	Full request URL
`http.route`	Matched route pattern
`http.status_code`	Response status
`http.request_content_length`	Request body size
`http.response_content_length`	Response body size

gRPC Details

Field	Description
`rpc.system`	grpc
`rpc.service`	Service name
`rpc.method`	Method name
`rpc.grpc.status_code`	gRPC status code

Configuration

agent:
  ebpf:
    network:
      enabled: true
      http: true
      grpc: true
      
      # URL/path filtering
      exclude_paths:
        - "/health"
        - "/healthz"
        - "/ready"
        - "/metrics"
        - "/favicon.ico"
      
      # Capture request/response headers
      capture_headers:
        - "content-type"
        - "user-agent"
        - "x-request-id"

Service Topology

Telegen automatically builds a service dependency map:

flowchart LR
    subgraph External
        LB["Load Balancer"]
    end
    
    subgraph Cluster["Kubernetes Cluster"]
        FE["Frontend"]
        API["API Gateway"]
        US["User Service"]
        OS["Order Service"]
        PG["PostgreSQL"]
        RD["Redis"]
        KF["Kafka"]
    end
    
    LB -->|HTTP| FE
    FE -->|HTTP| API
    API -->|gRPC| US
    API -->|gRPC| OS
    US -->|SQL| PG
    OS -->|SQL| PG
    API -->|TCP| RD
    OS -->|Produce| KF

Topology Data

topology:
  nodes:
    - id: "api-gateway"
      type: "service"
      attributes:
        k8s.deployment: "api-gateway"
        k8s.namespace: "default"
    
    - id: "user-service"
      type: "service"
      attributes:
        k8s.deployment: "user-service"
        k8s.namespace: "default"
  
  edges:
    - source: "api-gateway"
      target: "user-service"
      attributes:
        protocol: "grpc"
        requests_per_second: 150
        avg_latency_ms: 12
        error_rate: 0.01
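
To show how topology data of this shape might be consumed, here is a minimal Python sketch that builds a dependency map and estimates failing requests per edge. It assumes the YAML has already been loaded into a dictionary and that `error_rate` is a fraction (0.01 meaning 1%); both function names are hypothetical.

```python
from collections import defaultdict

# The topology above, as it might look once loaded into Python.
topology = {
    "nodes": [
        {"id": "api-gateway", "type": "service"},
        {"id": "user-service", "type": "service"},
    ],
    "edges": [
        {
            "source": "api-gateway",
            "target": "user-service",
            "attributes": {
                "protocol": "grpc",
                "requests_per_second": 150,
                "avg_latency_ms": 12,
                "error_rate": 0.01,
            },
        }
    ],
}


def downstream_map(topo):
    """Map each source node id to the list of node ids it calls."""
    deps = defaultdict(list)
    for edge in topo["edges"]:
        deps[edge["source"]].append(edge["target"])
    return dict(deps)


def failing_requests_per_second(edge):
    """Estimate failing requests per second, treating error_rate as a fraction."""
    attrs = edge["attributes"]
    return attrs["requests_per_second"] * attrs["error_rate"]


print(downstream_map(topology))
print(failing_requests_per_second(topology["edges"][0]))
```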

XDP Packet Analysis

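XDP (eXpress Data Path) programs run at the NIC driver level, before packets reach the kernel network stack, which keeps per-packet inspection overhead low.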

Configuration

agent:
  network:
    xdp:
      enabled: true
      
      # Sample rate (1 in N packets)
      sample_rate: 1000  # 0.1% of packets
      
      # Interfaces to attach
      interfaces:
        - eth0
        - eth1
      
      # Packet filters
      filters:
        # Only specific ports
        ports:
          - 80
          - 443
          - 8080
        
        # Only specific protocols
        protocols:
          - tcp
          - udp

Use Cases

DDoS detection - High packet rate anomalies
Protocol analysis - Non-HTTP traffic inspection
Network debugging - Low-level packet issues

Network Metrics

RED Metrics (Rate, Errors, Duration)

# Request rate by service
sum(rate(http_server_requests_total[5m])) by (service_name)

# Error rate
sum(rate(http_server_requests_total{status_code=~"5.."}[5m])) 
/ sum(rate(http_server_requests_total[5m]))

# Latency percentiles
histogram_quantile(0.99, 
  sum(rate(http_server_duration_bucket[5m])) by (le, service_name)
)
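
For intuition about what the latency query returns, the sketch below reimplements the linear interpolation that `histogram_quantile` applies to classic (bucketed) histograms. It is a simplified model for illustration only: it ignores native histograms and several edge cases, and the bucket bounds and counts are made up.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile falls in the +Inf bucket: report the
                # highest finite bound instead of extrapolating.
                return buckets[-2][0]
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan  # not reached: the +Inf bucket always satisfies the test


# Illustrative buckets: (le, cumulative count)
latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
print(estimate_quantile(0.95, latency_buckets))  # about 0.78
```

Because the result is interpolated within a bucket, its accuracy depends on how closely the bucket boundaries bracket the latencies you care about.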

Connection Metrics

# Active connections by service pair
telegen_tcp_connections{state="established"}

# Connection errors
sum(rate(telegen_tcp_connection_errors_total[5m])) by (error_type)

# Retransmit rate
sum(rate(telegen_tcp_retransmits_total[5m])) 
/ sum(rate(telegen_tcp_segments_total[5m]))
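
The retransmit-rate query divides one counter's increase by another's. The Python sketch below works through the same arithmetic on two hypothetical scrapes; the dictionary keys and numbers are invented for the example, and the reset handling is a simplification of what `rate()` does.

```python
def counter_increase(previous, current):
    """Increase of a monotonic counter between two samples.

    If the counter went backwards, assume it was reset to zero in between.
    """
    return current if current < previous else current - previous


def retransmit_ratio(previous, current):
    """Fraction of segments retransmitted between two scrapes."""
    segments = counter_increase(previous["segments"], current["segments"])
    if segments == 0:
        return 0.0
    retransmits = counter_increase(previous["retransmits"], current["retransmits"])
    return retransmits / segments


earlier = {"retransmits": 10, "segments": 10_000}
later = {"retransmits": 13, "segments": 11_000}
print(f"{retransmit_ratio(earlier, later):.2%}")  # 3 of 1000 segments: 0.30%
```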

DNS Metrics

# DNS query rate
sum(rate(telegen_dns_queries_total[5m])) by (domain)

# DNS latency
histogram_quantile(0.95, 
  sum(rate(telegen_dns_latency_bucket[5m])) by (le)
)

# DNS errors
sum(rate(telegen_dns_queries_total{response_code!="NOERROR"}[5m]))

Interface Filtering

Control which network interfaces are monitored:

agent:
  network:
    # Include specific interfaces
    interfaces:
      - eth0
      - ens5
    
    # Or exclude interfaces
    exclude_interfaces:
      - lo        # Loopback
      - docker0   # Docker bridge
      - veth*     # Container veths
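
The include and exclude lists can be thought of as a simple predicate over interface names. The Python sketch below assumes shell-style wildcards (as `veth*` suggests) and assumes that a non-empty include list acts as an allow-list that takes precedence; Telegen's actual matching rules may differ, and `should_monitor` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def should_monitor(iface, include=None, exclude=None):
    """Decide whether an interface is monitored.

    Assumed semantics: if `include` is non-empty, only matching interfaces
    are monitored; otherwise everything not matching `exclude` is.
    """
    if include:
        return any(fnmatchcase(iface, pattern) for pattern in include)
    return not any(fnmatchcase(iface, pattern) for pattern in (exclude or []))


exclude = ["lo", "docker0", "veth*"]
for name in ["eth0", "lo", "veth1a2b3c"]:
    print(name, should_monitor(name, exclude=exclude))
```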

Port Filtering

Focus on specific ports:

agent:
  ebpf:
    network:
      # Only trace these ports
      include_ports:
        - 80
        - 443
        - 8080
        - 3000
        - 5432
        - 6379
      
      # Or exclude ports
      exclude_ports:
        - 22    # SSH
        - 2379  # etcd
        - 2380  # etcd peer

Network Security

Suspicious Connection Detection

agent:
  network:
    security:
      enabled: true
      
      # Detect connections to unusual ports
      suspicious_ports:
        - 4444   # Common reverse shell
        - 31337  # Elite port
      
      # Detect connections to external IPs
      external_connection_alerts: true
      
      # Known bad IP lists
      blocklists:
        - "/etc/telegen/ip-blocklist.txt"

Example Alert

{
  "timestamp": "2024-01-15T10:30:00Z",
  "severity": "WARNING",
  "body": "Suspicious outbound connection to known bad IP",
  "attributes": {
    "network.event_type": "suspicious_connection",
    "net.peer.ip": "198.51.100.50",
    "net.peer.port": 4444,
    "process.pid": 12345,
    "process.executable.path": "/tmp/shell",
    "k8s.pod.name": "compromised-pod"
  }
}
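
The checks behind an alert like this can be sketched in a few lines of Python. The blocklist file format (one IP address or CIDR per line, `#` comments) is an assumption, as are the function names and reason strings; the sketch only shows how port and blocklist matching could combine.

```python
import ipaddress

SUSPICIOUS_PORTS = {4444, 31337}  # mirrors the suspicious_ports example


def parse_blocklist(lines):
    """Parse blocklist lines into networks (assumed format: IP or CIDR per line)."""
    networks = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # drop comments and blanks
        if entry:
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def classify_connection(peer_ip, peer_port, blocklist):
    """Return the reasons a connection looks suspicious (empty if none)."""
    reasons = []
    address = ipaddress.ip_address(peer_ip)
    if peer_port in SUSPICIOUS_PORTS:
        reasons.append("suspicious_port")
    if any(address in network for network in blocklist):
        reasons.append("blocklisted_ip")
    return reasons


blocklist = parse_blocklist([
    "# known bad ranges",
    "198.51.100.0/24",
    "",
    "203.0.113.7  # single host",
])
print(classify_connection("198.51.100.50", 4444, blocklist))
```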

Performance Considerations

Overhead

Feature	CPU Impact	Memory Impact
TCP metrics	~0.5%	10 MB
DNS tracing	~0.2%	5 MB
HTTP tracing	~1%	20 MB
XDP (sampled)	~0.1%	5 MB
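
For a back-of-the-envelope budget, the figures in the table can simply be summed for the features you plan to enable. The Python sketch below assumes the overheads are roughly additive, which is an approximation; the feature keys are hypothetical labels for the table rows.

```python
# (approximate CPU %, memory in MB), copied from the overhead table
OVERHEAD = {
    "tcp": (0.5, 10),
    "dns": (0.2, 5),
    "http": (1.0, 20),
    "xdp": (0.1, 5),
}


def estimate_overhead(features):
    """Sum approximate CPU and memory overhead for the enabled features."""
    cpu = sum(OVERHEAD[name][0] for name in features)
    memory = sum(OVERHEAD[name][1] for name in features)
    return round(cpu, 2), memory


cpu_pct, memory_mb = estimate_overhead(["tcp", "dns", "http", "xdp"])
print(f"~{cpu_pct}% CPU, ~{memory_mb} MB")  # ~1.8% CPU, ~40 MB
```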

Reducing Overhead

agent:
  network:
    # Reduce ring buffer size
    ring_buffer_size: 8388608  # 8MB instead of 16MB
    
    # Sample fewer connections (a higher N captures fewer)
    tcp:
      sample_rate: 10  # 1 in 10 connections
    
    # Limit captured data
    http:
      max_body_capture: 0  # Don't capture bodies
      max_headers: 5       # Limit headers

Best Practices

1. Filter Noisy Traffic

Exclude health checks and internal traffic:

agent:
  ebpf:
    network:
      exclude_paths:
        - "/health*"
        - "/ready*"
        - "/metrics"
      exclude_ports:
        - 2379  # etcd
        - 10250 # kubelet

2. Use Appropriate Sampling

For high-traffic environments:

agent:
  network:
    tcp:
      sample_rate: 100  # 1% of connections
    xdp:
      sample_rate: 10000  # 0.01% of packets
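
Since `sample_rate` means "1 in N", the captured fraction is 1/N, and sampled counts need to be multiplied by N to approximate true totals. The Python sketch below shows that arithmetic; the helper names are hypothetical, and the scaling assumes sampling is uniform.

```python
def sampled_fraction(sample_rate):
    """Fraction of items captured when sampling 1 in `sample_rate`."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be >= 1")
    return 1.0 / sample_rate


def estimate_total(observed, sample_rate):
    """Scale a sampled count back up to an estimate of the true count."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be >= 1")
    return observed * sample_rate


for name, rate in [("tcp", 100), ("xdp", 10000)]:
    print(f"{name}: {sampled_fraction(rate):.2%} captured")
# tcp: 1.00% captured
# xdp: 0.01% captured
```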

3. Monitor Key Services

Focus on critical paths:

agent:
  ebpf:
    network:
      include_ports:
        - 80    # HTTP
        - 443   # HTTPS
        - 5432  # PostgreSQL
        - 6379  # Redis

Messaging Protocols

Telegen captures tracing data for AMQP 0-9-1, CQL (Cassandra), and NATS at the eBPF level, with no SDK instrumentation or configuration changes required.

AMQP 0-9-1 Tracing

AMQP 0-9-1 is the wire protocol used by RabbitMQ and other brokers. Telegen captures publish and consume operations at the channel level.

What’s Captured

Field	Description
`messaging.system`	`rabbitmq`
`messaging.operation`	`publish` or `process`
`messaging.destination.name`	Exchange name
`messaging.rabbitmq.destination.routing_key`	Routing key
`messaging.client_id`	AMQP channel ID
`net.peer.ip` / `net.peer.port`	Broker address

Sample Span

{
  "name": "orders.created publish",
  "kind": "PRODUCER",
  "duration_ms": 0.8,
  "attributes": {
    "messaging.system": "rabbitmq",
    "messaging.operation": "publish",
    "messaging.destination.name": "events",
    "messaging.rabbitmq.destination.routing_key": "orders.created",
    "net.peer.ip": "10.0.2.50",
    "net.peer.port": 5672
  }
}

Configuration

agent:
  network:
    protocols:
      amqp:
        enabled: true
        capture_routing_key: true

CQL (Cassandra) Tracing

Telegen parses the Cassandra Query Language binary protocol (CQL v3–v5) to capture query statements, keyspaces, batch operations, and prepared statement execution.

See Database Tracing for the full Cassandra tracing reference.

NATS Tracing

NATS is a lightweight, text-based publish/subscribe messaging system. Telegen captures PUB, MSG, and subscription operations from the NATS wire protocol.

What’s Captured

Field	Description
`messaging.system`	`nats`
`messaging.operation`	`publish` or `process`
`messaging.destination.name`	Subject name
`net.peer.ip` / `net.peer.port`	NATS server address

Sample Span

{
  "name": "sensor.readings publish",
  "kind": "PRODUCER",
  "duration_ms": 0.2,
  "attributes": {
    "messaging.system": "nats",
    "messaging.operation": "publish",
    "messaging.destination.name": "sensor.readings",
    "net.peer.ip": "10.0.3.10",
    "net.peer.port": 4222
  }
}
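
Because AMQP and NATS spans share the same `messaging.*` attributes, they can be summarised with one piece of code. The Python sketch below counts spans per system, destination, and operation; the span dictionaries are trimmed copies of the samples on this page, and the function names are hypothetical.

```python
from collections import Counter


def span_key(span):
    """Group key: (messaging system, destination name, operation)."""
    attrs = span["attributes"]
    return (
        attrs["messaging.system"],
        attrs["messaging.destination.name"],
        attrs["messaging.operation"],
    )


def count_by_destination(spans):
    """Count spans for each (system, destination, operation) combination."""
    return Counter(span_key(span) for span in spans)


spans = [
    {"attributes": {"messaging.system": "rabbitmq",
                    "messaging.operation": "publish",
                    "messaging.destination.name": "events"}},
    {"attributes": {"messaging.system": "nats",
                    "messaging.operation": "publish",
                    "messaging.destination.name": "sensor.readings"}},
    {"attributes": {"messaging.system": "nats",
                    "messaging.operation": "publish",
                    "messaging.destination.name": "sensor.readings"}},
]

for key, count in count_by_destination(spans).items():
    print(key, count)
```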

Configuration

agent:
  network:
    protocols:
      nats:
        enabled: true
        capture_subject: true

Connection Statistics

Telegen tracks byte-level connection statistics via TCP close events, providing a low-overhead measure of throughput per connection without full payload capture.

Metrics Emitted

Metric	Type	Labels	Description
`telegen.connection.bytes_sent`	Counter	src, dst, port	Bytes sent per connection lifetime
`telegen.connection.bytes_received`	Counter	src, dst, port	Bytes received per connection lifetime

These metrics are emitted when a TCP connection closes and complement the per-request span data produced by the protocol parsers.
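
Since each connection contributes its byte totals once, at close, per-destination throughput can be derived by summing over closed connections. The Python sketch below assumes a hypothetical event shape whose fields mirror the metric labels (`src`, `dst`, `port`); it is not Telegen's export format.

```python
from collections import defaultdict


def total_bytes(close_events):
    """Sum bytes sent and received per (src, dst, port) across closed connections."""
    totals = defaultdict(lambda: {"sent": 0, "received": 0})
    for event in close_events:
        key = (event["src"], event["dst"], event["port"])
        totals[key]["sent"] += event["bytes_sent"]
        totals[key]["received"] += event["bytes_received"]
    return dict(totals)


# Two connections to the same destination, plus one to another.
events = [
    {"src": "10.0.1.50", "dst": "10.0.2.100", "port": 5432,
     "bytes_sent": 1_200, "bytes_received": 48_000},
    {"src": "10.0.1.50", "dst": "10.0.2.100", "port": 5432,
     "bytes_sent": 800, "bytes_received": 2_000},
    {"src": "10.0.1.50", "dst": "10.0.3.10", "port": 4222,
     "bytes_sent": 500, "bytes_received": 100},
]

for key, value in total_bytes(events).items():
    print(key, value)
```

Long-lived connections only appear in these totals once they close, so short reporting windows can under-count traffic on persistent connections.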

Configuration

agent:
  ebpf:
    conn_stats:
      enabled: true

Next Steps

Database Tracing - Deep database network tracing
Security Observability - Network security events
Agent Mode - Network configuration