Performance Tuning
Optimize Telegen for your environment and workload.
Resource Guidelines
Default Resource Requirements
| Component | CPU | Memory |
|---|---|---|
| Agent (minimal) | 0.1 cores | 128MB |
| Agent (full features) | 0.5 cores | 512MB |
| Agent (high volume) | 1.0 cores | 1GB |
| Collector (SNMP) | 0.2 cores | 256MB |
| Collector (storage) | 0.3 cores | 384MB |
Kubernetes Resources
resources:
requests:
cpu: "100m"
memory: "256Mi"
limits:
cpu: "1000m"
memory: "1Gi"
Ring Buffer Tuning
The ring buffer is the primary channel for eBPF events.
Sizing
| Buffer Size | Use Case | Event Capacity |
|---|---|---|
| 4MB | Low traffic, testing | ~40K events |
| 16MB | Default, balanced | ~160K events |
| 64MB | High traffic | ~640K events |
| 256MB | Very high volume | ~2.5M events |
Configuration
agent:
ebpf:
ringbuf_size: 16777216 # 16MB (default)
Signs You Need Larger Buffer
# High loss rate
rate(telegen_ebpf_ringbuf_lost_total[5m]) > 100
If events are being lost, increase buffer size:
agent:
ebpf:
ringbuf_size: 67108864 # 64MB
CPU Optimization
Reduce Collection Overhead
- Limit traced ports
agent: ebpf: network: include_ports: - 80 - 443 - 8080 exclude_ports: - 22 - 2379 - Reduce syscall tracing
agent: ebpf: syscalls: enabled: false # Disable if not needed - Limit profiling frequency
agent: profiling: sample_rate: 49 # Lower than default 99 Hz
Parallel Processing
agent:
processing:
workers: 4 # Match available CPU cores
Memory Optimization
Queue Limits
queues:
traces:
mem_limit: "128Mi"
max_age: "1h"
batch_size: 256
metrics:
mem_limit: "64Mi"
max_age: "5m"
batch_size: 500
logs:
mem_limit: "128Mi"
max_age: "6h"
batch_size: 500
Reduce Cardinality
High cardinality labels increase memory:
agent:
kubernetes:
# Only essential labels
label_allowlist:
- "app"
- "version"
# NOT: "*"
Limit Active Connections Tracked
agent:
ebpf:
network:
# Limit tracked connections
max_connections: 50000 # Default: 100000
Network/Export Optimization
Compression
otlp:
compression: gzip # Reduce bandwidth
Batching
queues:
traces:
batch_size: 512 # Larger batches = fewer requests
flush_interval: 5s # Don't wait too long
Connection Pooling
otlp:
max_connections: 10 # Connection pool size
idle_timeout: 60s
Sampling
Head-Based Sampling
Sample at collection time:
otlp:
traces:
sample_rate: 0.1 # 10% of traces
Tail-Based Sampling
For more intelligent sampling, configure your OTel Collector:
# OTel Collector config
processors:
tail_sampling:
policies:
- name: errors
type: status_code
status_code: { status_codes: [ERROR] }
- name: slow
type: latency
latency: { threshold_ms: 1000 }
- name: sample
type: probabilistic
probabilistic: { sampling_percentage: 10 }
Per-Feature Tuning
Profiling
agent:
profiling:
# Lower sample rate for less overhead
sample_rate: 49 # Hz
# Longer upload interval
upload_interval: 120s
# Disable unused profile types
mutex: false
block: false
goroutine: false
Security Monitoring
agent:
security:
# Focus on critical syscalls only
syscall_audit:
syscalls:
- execve
- setuid
- ptrace
# NOT all syscalls
# Limit file paths
file_integrity:
paths:
- /etc/passwd
- /etc/shadow
# NOT: /var/**
Network Monitoring
agent:
network:
# Use sampling for high-volume
tcp:
sample_rate: 10 # 1 in 10 connections
# XDP sampling
xdp:
sample_rate: 1000 # 0.1% of packets
High-Volume Environments
Recommended Configuration
For environments with >10K requests/second:
telegen:
log_level: warn # Reduce logging
agent:
ebpf:
ringbuf_size: 134217728 # 128MB
perf_buffer_size: 32768 # 32KB per CPU
network:
exclude_paths:
- "/health*"
- "/ready*"
- "/metrics"
exclude_ports:
- 22
- 2379
- 2380
- 10250
resources:
cpu_limit: 2.0
memory_limit: "2Gi"
rate_limit:
spans_per_second: 100000
metrics_per_second: 200000
otlp:
compression: gzip
queues:
traces:
mem_limit: "512Mi"
batch_size: 1024
Low-Resource Environments
Minimal Configuration
For resource-constrained environments:
telegen:
log_level: error
agent:
ebpf:
ringbuf_size: 4194304 # 4MB
network:
enabled: true
http: true
grpc: false
dns: false
syscalls:
enabled: false
profiling:
enabled: false
security:
enabled: false
queues:
traces:
mem_limit: "64Mi"
batch_size: 128
Kubernetes Resources
resources:
requests:
cpu: "50m"
memory: "128Mi"
limits:
cpu: "200m"
memory: "256Mi"
Monitoring Performance
Key Metrics to Watch
# CPU usage
rate(telegen_process_cpu_seconds_total[5m])
# Memory usage
telegen_process_resident_memory_bytes
# Event loss rate
rate(telegen_ebpf_ringbuf_lost_total[5m]) / rate(telegen_ebpf_ringbuf_events_total[5m])
# Export latency
histogram_quantile(0.99, rate(telegen_export_latency_seconds_bucket[5m]))
# Queue depth
telegen_export_queue_size
Performance Alerts
groups:
- name: telegen-performance
rules:
- alert: TelegenHighCPU
expr: rate(telegen_process_cpu_seconds_total[5m]) > 0.8
for: 10m
labels:
severity: warning
annotations:
summary: "Telegen using high CPU"
- alert: TelegenHighMemory
expr: telegen_process_resident_memory_bytes > 1.5e9
for: 5m
labels:
severity: warning
annotations:
summary: "Telegen memory above 1.5GB"
- alert: TelegenExportSlow
expr: histogram_quantile(0.99, rate(telegen_export_latency_seconds_bucket[5m])) > 5
for: 5m
labels:
severity: warning
annotations:
summary: "Telegen export P99 latency high"
Benchmarking
Test Configuration
Before deploying changes, benchmark:
# Generate test load
hey -n 10000 -c 100 http://your-app:8080/api/test
# Monitor Telegen metrics
watch -n 1 'curl -s http://localhost:19090/metrics | grep -E "cpu|memory|lost"'
Compare Before/After
- Baseline current configuration
- Apply changes
- Run same load test
- Compare metrics
Best Practices Summary
- Start conservative - Begin with defaults, tune based on actual needs
- Monitor loss rates - If losing events, increase buffers
- Use sampling - For high-volume, sample rather than drop
- Filter noise - Exclude health checks, internal traffic
- Batch efficiently - Larger batches reduce export overhead
- Set limits - Protect against runaway memory usage
- Test changes - Benchmark before and after tuning
Next Steps
- Monitoring - Set up performance monitoring
- Troubleshooting - Diagnose performance issues
- Full Reference - All configuration options