Distributed Tracing
Telegen provides zero-configuration distributed tracing using eBPF.
Overview
Telegen automatically traces:
- HTTP/HTTPS - All HTTP/1.1 and HTTP/2 traffic
- gRPC - All gRPC calls
- Database queries - PostgreSQL, MySQL, MongoDB, Redis
- Message queues - Kafka, RabbitMQ
- Internal function calls - For supported runtimes
No code changes or SDK integration required.
For targeted tracing, use **port-based discovery** to instrument only specific services.
See [Auto Discovery](auto-discovery) for details.
```yaml
discovery:
  instrument:
    - open_ports: "8080-8089"  # Only trace these ports
```
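To make the effect of a range such as "8080-8089" concrete, the following Python sketch expands a port expression and tests whether a listening port falls inside it. The `parse_port_spec` and `matches` helpers are hypothetical, and support for comma-separated lists is an assumption made for illustration; Telegen's actual matching logic may differ.

```python
def parse_port_spec(spec: str) -> list[range]:
    """Parse a spec like "8080-8089" or "80,8080-8089" into ranges.

    Comma-separated lists are an assumption for illustration only.
    """
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low, sep, high = part.partition("-")
        start = int(low)
        end = int(high) if sep else start
        if not (0 < start <= end <= 65535):
            raise ValueError(f"invalid port range: {part!r}")
        ranges.append(range(start, end + 1))  # inclusive upper bound
    return ranges


def matches(port: int, spec: str) -> bool:
    """Return True if the port falls inside any range of the spec."""
    return any(port in r for r in parse_port_spec(spec))


print(matches(8080, "8080-8089"))  # True
print(matches(8090, "8080-8089"))  # False
```

Both ends of the range are treated as inclusive here, so a service listening on 8089 would be instrumented and one on 8090 would not.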
How It Works
flowchart TB
subgraph Kernel["Linux Kernel"]
K["eBPF Programs"]
end
subgraph App["Application"]
A["HTTP Handler"]
B["gRPC Client"]
C["DB Query"]
end
subgraph Telegen["Telegen Agent"]
T["Trace Correlator"]
E["OTLP Exporter"]
end
K -->|"Intercept"| A
K -->|"Intercept"| B
K -->|"Intercept"| C
A --> K
B --> K
C --> K
K --> T
T --> E
E -->|"OTLP"| OC["OTel Collector"]
Trace Context Propagation
Telegen automatically extracts and propagates trace context:
- Incoming requests - Extract `traceparent`/`tracestate` from headers
- Outgoing requests - Inject trace context into outgoing calls
- Cross-service correlation - Link spans across service boundaries
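The extract and inject steps can be sketched in Python using the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`). This is a minimal illustration, not Telegen's implementation: the `SpanContext`, `extract`, and `inject` names are hypothetical, header keys are assumed to be lowercase already, and `tracestate` handling is omitted.

```python
import re
import secrets
from typing import NamedTuple, Optional

# W3C Trace Context, version 00: version-traceid-parentid-flags
_TRACEPARENT_RE = re.compile(
    r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def extract(headers: dict[str, str]) -> Optional[SpanContext]:
    """Read the context from incoming headers; None if absent or invalid.

    Assumes header names have already been lowercased.
    """
    match = _TRACEPARENT_RE.match(headers.get("traceparent", "").strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid per the specification.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def inject(parent: Optional[SpanContext], headers: dict[str, str]) -> SpanContext:
    """Create a child context and write it into the outgoing headers."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    sampled = parent.sampled if parent else True
    child = SpanContext(trace_id, secrets.token_hex(8), sampled)
    flags = "01" if child.sampled else "00"
    headers["traceparent"] = f"00-{child.trace_id}-{child.span_id}-{flags}"
    return child


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
outgoing: dict[str, str] = {}
child = inject(extract(incoming), outgoing)
print(outgoing["traceparent"])  # same trace ID, new span ID
```

Because the child keeps the parent's trace ID and only the span ID changes, the downstream service can link its spans to the same trace.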
Protocol Support
HTTP Tracing
# Automatically captured for every HTTP request
span:
  name: "GET /api/users/{id}"
  kind: SERVER
  attributes:
    http.method: GET
    http.url: "https://api.example.com/api/users/123"
    http.route: "/api/users/{id}"
    http.status_code: 200
    http.request_content_length: 0
    http.response_content_length: 1234
    http.user_agent: "curl/7.88.0"
    net.peer.ip: "10.0.1.50"
    net.peer.port: 45678
    net.host.ip: "10.0.1.100"
    net.host.port: 8080
gRPC Tracing
span:
  name: "/users.UserService/GetUser"
  kind: SERVER
  attributes:
    rpc.system: grpc
    rpc.service: users.UserService
    rpc.method: GetUser
    rpc.grpc.status_code: 0
    net.peer.ip: "10.0.1.50"
    net.peer.port: 45678
Database Tracing
span:
  name: "SELECT users"
  kind: CLIENT
  attributes:
    db.system: postgresql
    db.name: mydb
    db.user: appuser
    db.statement: "SELECT * FROM users WHERE id = $1"
    db.operation: SELECT
    db.sql.table: users
    net.peer.ip: "10.0.2.100"
    net.peer.port: 5432
Message Queue Tracing
# Kafka produce
span:
  name: "orders send"
  kind: PRODUCER
  attributes:
    messaging.system: kafka
    messaging.destination.name: orders
    messaging.kafka.partition: 3
    messaging.kafka.message.offset: 12345
    messaging.message.payload_size_bytes: 256
# Kafka consume
span:
  name: "orders receive"
  kind: CONSUMER
  attributes:
    messaging.system: kafka
    messaging.destination.name: orders
    messaging.kafka.consumer.group: order-processor
    messaging.kafka.partition: 3
    messaging.kafka.message.offset: 12345
Runtime-Specific Tracing
Go Applications
Telegen traces Go applications at the runtime level:
- Goroutine tracking - Track execution across goroutines
- HTTP handlers - `net/http`, Gin, Echo, Chi, Fiber
- gRPC - All gRPC calls
- Database drivers - `database/sql`, pgx, go-redis
Java Applications
Telegen integrates with JFR (Java Flight Recorder) to capture:
- Method tracing - Hot methods and stack traces
- GC events - Garbage collection correlation
- Lock contention - Synchronized blocks and locks
- Thread events - Thread creation, blocking
Python Applications
- ASGI/WSGI - FastAPI, Django, Flask
- asyncio - Async operation tracking
- Database - psycopg2, SQLAlchemy, pymongo
Node.js Applications
- HTTP - Express, Fastify, Koa
- Async hooks - Promise and callback tracking
- Database - pg, mysql2, mongodb, redis
Trace Correlation
Automatic Signal Linking
Telegen automatically correlates the traces, metrics, logs, and profiles produced by a single request:
flowchart LR
subgraph Request["Single Request"]
T["Trace\n(span_id: abc123)"]
M["Metrics\n(labeled: span_id=abc123)"]
L["Logs\n(trace_id, span_id)"]
P["Profile\n(span_id: abc123)"]
end
T --- M
T --- L
T --- P
Log Correlation
Logs are automatically enriched with trace context:
{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "info",
  "message": "User created successfully",
  "trace_id": "a1b2c3d4e5f6789012345678",
  "span_id": "abc123def456",
  "service.name": "user-service",
  "k8s.pod.name": "user-service-xyz"
}
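Once logs carry a `trace_id`, collecting every log line that belongs to one trace becomes a simple filter. The Python sketch below shows one way a consumer of these logs might do that; `logs_for_trace` is a hypothetical helper, and the input is assumed to be one JSON object per line.

```python
import json
from typing import Iterable, Iterator


def logs_for_trace(lines: Iterable[str], trace_id: str) -> Iterator[dict]:
    """Yield parsed JSON log records whose trace_id matches."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if isinstance(record, dict) and record.get("trace_id") == trace_id:
            yield record


lines = [
    '{"message": "User created successfully", '
    '"trace_id": "a1b2c3d4e5f6789012345678", "span_id": "abc123def456"}',
    '{"message": "unrelated", '
    '"trace_id": "ffffffffffffffffffffffff", "span_id": "000000000001"}',
    "plain text line without trace context",
]
for record in logs_for_trace(lines, "a1b2c3d4e5f6789012345678"):
    print(record["message"])  # User created successfully
```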
Metric Exemplars
Metrics include exemplars linking to traces:
http_server_duration:
  type: histogram
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
  exemplars:
    - value: 0.045
      trace_id: "a1b2c3d4e5f6789012345678"
      span_id: "abc123def456"
Configuration
Basic Configuration
otlp:
  endpoint: "otel-collector:4317"
  traces:
    enabled: true
    sample_rate: 1.0  # 100% sampling
Sampling
otlp:
  traces:
    enabled: true
    # Sample 10% of traces
    sample_rate: 0.1
    # Head-based sampling (default)
    sampler: parent_based_traceidratio
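A parent-based, trace-ID-ratio sampler makes the decision once at the root of a trace and lets every downstream span follow it. The Python sketch below illustrates the general technique as it is commonly implemented in OpenTelemetry SDKs; it is an assumption that Telegen behaves the same way, and `should_sample` is a hypothetical name.

```python
from typing import Optional


def should_sample(
    trace_id: str,
    sample_rate: float,
    parent_sampled: Optional[bool] = None,
) -> bool:
    """Parent-based trace-ID-ratio decision (illustrative sketch).

    trace_id: 32-character hex string.
    parent_sampled: the parent's sampled flag, or None for a root span.
    """
    if parent_sampled is not None:
        return parent_sampled  # honor the upstream decision
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    # Treat the low 64 bits of the trace ID as a uniform random value.
    low_bits = int(trace_id[-16:], 16)
    return low_bits < int(sample_rate * (1 << 64))


print(should_sample("0" * 32, 0.1))                       # True
print(should_sample("f" * 32, 0.1))                       # False
print(should_sample("f" * 32, 0.1, parent_sampled=True))  # True
```

Deriving the decision from the trace ID rather than a fresh random number means every service that sees the same trace ID reaches the same verdict, so traces are kept or dropped as a whole.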
Network Filtering
agent:
  ebpf:
    network:
      enabled: true
      http: true
      grpc: true
      # Exclude noisy endpoints
      exclude_paths:
        - "/health"
        - "/healthz"
        - "/ready"
        - "/metrics"
      # Exclude by port
      exclude_ports:
        - 22  # SSH
        - 2379  # etcd
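The effect of these exclusions can be modeled as a predicate that runs before a span is recorded. The Python sketch below is a hypothetical model, not Telegen's code: the `NetworkFilter` class is invented for illustration, and it assumes paths are compared by exact match after the query string is removed.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkFilter:
    exclude_paths: set[str] = field(default_factory=set)
    exclude_ports: set[int] = field(default_factory=set)

    def should_trace(self, path: str, port: int) -> bool:
        """Return False for requests matching an excluded path or port."""
        if port in self.exclude_ports:
            return False
        # Compare on the path only, ignoring any query string.
        return path.split("?", 1)[0] not in self.exclude_paths


flt = NetworkFilter(
    exclude_paths={"/health", "/healthz", "/ready", "/metrics"},
    exclude_ports={22, 2379},
)
print(flt.should_trace("/api/users/123", 8080))  # True
print(flt.should_trace("/healthz", 8080))        # False
print(flt.should_trace("/api/users/123", 2379))  # False
```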
Database Query Settings
agent:
  database:
    # Capture full query text
    capture_queries: true
    # Sanitize sensitive data
    sanitize_queries: true
    # Max query length
    max_query_length: 1024
    # Capture query parameters
    capture_parameters: false  # Privacy consideration
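To illustrate what sanitizing and truncating a query can mean in practice, the Python sketch below replaces string and numeric literals with `?` and caps the result at a maximum length. It is a deliberately simple regex-based approximation; the `sanitize_query` function is hypothetical and the source does not describe how Telegen sanitizes queries internally.

```python
import re

# Single-quoted string literals, allowing '' as an escaped quote.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# Standalone numbers; the lookbehind leaves placeholders such as $1
# and identifiers such as users2 untouched.
_NUMERIC_LITERAL = re.compile(r"(?<![\w$])\d+(?:\.\d+)?\b")


def sanitize_query(query: str, max_query_length: int = 1024) -> str:
    """Replace literal values with '?' and truncate the result."""
    sanitized = _STRING_LITERAL.sub("?", query)
    sanitized = _NUMERIC_LITERAL.sub("?", sanitized)
    return sanitized[:max_query_length]


print(sanitize_query("SELECT * FROM users WHERE name = 'O''Brien' AND age > 42"))
# SELECT * FROM users WHERE name = ? AND age > ?
```

A parameterized statement such as `SELECT * FROM users WHERE id = $1` passes through unchanged, because the bound values are never part of the query text.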
Span Enrichment
Automatic Enrichment
All spans are automatically enriched with:
| Attribute | Source |
|---|---|
| service.name | Discovery or config |
| service.version | Binary analysis |
| host.name | System |
| k8s.pod.name | Kubernetes |
| cloud.region | Cloud metadata |
| process.pid | System |
Custom Attributes
Add custom attributes via environment variables:
# Kubernetes deployment
env:
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "team=platform,cost_center=engineering"
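`OTEL_RESOURCE_ATTRIBUTES` is the standard OpenTelemetry variable holding comma-separated `key=value` pairs with percent-encoded values. The Python sketch below shows how such a value decomposes into individual attributes; `parse_resource_attributes` is a hypothetical helper, and silently skipping malformed entries is an assumption of this sketch.

```python
from urllib.parse import unquote


def parse_resource_attributes(value: str) -> dict[str, str]:
    """Parse "key1=value1,key2=value2" into a dictionary."""
    attributes = {}
    for pair in value.split(","):
        key, sep, raw = pair.partition("=")
        key = key.strip()
        if not sep or not key:
            continue  # skip malformed entries
        attributes[key] = unquote(raw.strip())
    return attributes


print(parse_resource_attributes("team=platform,cost_center=engineering"))
# {'team': 'platform', 'cost_center': 'engineering'}
```

Values that contain a comma or an equals sign need to be percent-encoded (for example `%2C` for a comma) so they survive this split.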
Performance Impact
Telegen is designed for minimal overhead:
| Metric | Overhead |
|---|---|
| Latency | < 100μs per request |
| CPU | < 1% additional |
| Memory | ~50MB for trace buffers |
| Network | Compressed OTLP batches |
Optimizations
- Ring buffers - Efficient kernel-to-userspace transfer
- Batching - Spans batched before export
- Compression - gzip compression by default
- Sampling - Configurable head-based sampling
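Batching and compression work together: grouping spans amortizes per-request overhead, and gzip shrinks the repetitive attribute names. The Python sketch below demonstrates the idea with JSON payloads for readability; real OTLP exports typically use protobuf, and the `batches` and `encode_batch` helpers are hypothetical.

```python
import gzip
import json
from typing import Iterator


def batches(spans: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Split spans into consecutive batches of at most batch_size."""
    for start in range(0, len(spans), batch_size):
        yield spans[start:start + batch_size]


def encode_batch(batch: list[dict]) -> bytes:
    """Serialize one batch as compact JSON and gzip-compress it."""
    payload = json.dumps(batch, separators=(",", ":")).encode("utf-8")
    return gzip.compress(payload)


spans = [
    {"name": "GET /api/users/{id}", "http.status_code": 200, "seq": i}
    for i in range(500)
]
for batch in batches(spans, 200):
    raw = len(json.dumps(batch, separators=(",", ":")))
    print(len(batch), "spans:", raw, "bytes raw,", len(encode_batch(batch)), "bytes gzipped")
```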
Troubleshooting
Missing Traces
- Check eBPF status:
      # Verify eBPF programs loaded
      bpftool prog list | grep telegen
- Check OTLP connectivity:
      # Verify endpoint is reachable
      curl -v http://otel-collector:4317
- Check sampling rate:
      otlp:
        traces:
          sample_rate: 1.0  # Ensure 100% for debugging
Missing Span Correlation
- Verify trace context propagation:
  - Check incoming requests have `traceparent` header
  - Verify W3C Trace Context format
- Check time synchronization:
  - Ensure NTP is configured
  - Spans may appear out of order with clock drift
Next Steps
- Continuous Profiling - Link profiles to traces
- Database Tracing - Deep database tracing
- Agent Mode - Trace configuration options