Continuous Profiling
Telegen provides always-on, low-overhead profiling for production environments.
Overview
Continuous profiling helps you understand:
- CPU hotspots - Where is time being spent?
- Memory allocations - What’s allocating memory?
- Off-CPU time - What’s blocking or waiting?
- Lock contention - Where are threads competing?
All profiles are automatically correlated with traces and metrics.
Profile Types
| Profile Type | What It Measures | Use Case |
|---|---|---|
| CPU | On-CPU execution time | Find hot code paths |
| Off-CPU | Blocking/waiting time | Find I/O bottlenecks |
| Memory | Heap allocations | Find memory issues |
| Mutex | Lock contention | Find concurrency issues |
| Block | Blocking operations | Find synchronization issues |
| Goroutine | Goroutine stacks (Go) | Debug goroutine leaks |
How It Works
flowchart TB
subgraph Kernel["Linux Kernel"]
P["perf_event"]
U["uprobes"]
end
subgraph Telegen["Telegen Agent"]
S["Stack Sampler"]
A["Symbol Resolver"]
C["Profile Builder"]
E["OTLP Exporter"]
end
subgraph App["Application"]
F["Functions"]
end
P -->|"CPU samples"| S
U -->|"Alloc samples"| S
F --> P
F --> U
S --> A
A --> C
C --> E
E -->|"pprof"| Backend["Profiling Backend"]
Sampling
Telegen uses statistical sampling to minimize overhead:
- CPU profiling - Sample at 99Hz (configurable)
- Stack unwinding - Frame pointer and DWARF-based
- Symbol resolution - Binary analysis and debug symbols
- Aggregation - Aggregate identical stacks
Configuration
Enable Profiling
agent:
profiling:
enabled: true
Full Configuration
agent:
profiling:
enabled: true
# Sampling rate in Hz
sample_rate: 99 # 99 Hz avoids aliasing with timers
# Profile types
cpu: true # On-CPU time
off_cpu: true # Off-CPU waiting time
memory: true # Heap allocations
mutex: true # Lock contention (Go, Java)
block: true # Blocking operations (Go)
goroutine: true # Goroutine stacks (Go only)
# Duration of each profile sample
duration: 10s
# How often to upload profiles
upload_interval: 60s
# Symbol resolution
symbols:
demangle_rust: true
demangle_cpp: true
include_kernel: false # Include kernel symbols
# Filtering
filters:
# Minimum samples to include
min_samples: 1
# Exclude system functions
exclude_patterns:
- "runtime.*" # Go runtime
- "java.lang.*" # Java internals
CPU Profiling
CPU profiling shows where your application spends execution time.
Sample Output
Total CPU time: 60s
Flat Flat% Sum% Cum Cum% Name
15.20s 25.3% 25.3% 15.20s 25.3% encoding/json.(*decodeState).scanWhile
10.50s 17.5% 42.8% 25.70s 42.8% net/http.(*conn).serve
8.30s 13.8% 56.6% 8.30s 13.8% runtime.mallocgc
5.20s 8.7% 65.3% 5.20s 8.7% compress/gzip.(*Reader).Read
4.10s 6.8% 72.1% 4.10s 6.8% database/sql.(*DB).queryDC
Use Cases
- Identify hot functions - Find the most CPU-intensive code
- Compare before/after - Measure optimization impact
- Find regressions - Catch performance degradation
Off-CPU Profiling
Off-CPU profiling shows where your application is blocked or waiting.
What Causes Off-CPU Time
- I/O operations - Disk, network reads/writes
- Lock contention - Waiting for mutexes
- Sleep/yield - Explicit waits
- Scheduling - Waiting for CPU time
Sample Output
Total Off-CPU time: 45s
Flat Flat% Sum% Cum Cum% Name
12.30s 27.3% 27.3% 12.30s 27.3% syscall.Read
9.80s 21.8% 49.1% 9.80s 21.8% sync.(*Mutex).Lock
7.50s 16.7% 65.8% 7.50s 16.7% net.(*netFD).Read
5.20s 11.6% 77.4% 5.20s 11.6% database/sql.(*DB).conn
Memory Profiling
Memory profiling tracks heap allocations.
Configuration
agent:
profiling:
memory: true
# Track allocations or in-use memory
memory_mode: alloc # alloc or inuse
Sample Output
Total allocations: 1.2GB
Flat Flat% Sum% Cum Cum% Name
350MB 29.2% 29.2% 350MB 29.2% encoding/json.(*decodeState).object
220MB 18.3% 47.5% 220MB 18.3% bytes.makeSlice
180MB 15.0% 62.5% 180MB 15.0% net/http.(*persistConn).readLoop
Runtime-Specific Features
Go Profiling
Full goroutine and runtime profiling:
agent:
profiling:
# Go-specific profile types
goroutine: true # Goroutine stacks
mutex: true # Mutex contention
block: true # Blocking operations
Java Profiling
Integration with JFR (Java Flight Recorder):
agent:
profiling:
java:
jfr_enabled: true
# JFR event types
cpu: true
memory: true
gc: true
locks: true
Python Profiling
Frame-based profiling:
agent:
profiling:
python:
enabled: true
# AsyncIO support
asyncio: true
Profile-Trace Correlation
Profiles are automatically linked to traces:
flowchart LR
subgraph Request["Slow Request"]
T["Trace\nspan_id: abc123\nlatency: 2.3s"]
P["Profile\nspan_id: abc123\nCPU: 1.8s parsing"]
end
T -->|"Linked"| P
How It Works
- Span context - Profiles capture active span_id
- Time range - Profiles filtered to span duration
- Aggregation - Stack samples grouped by span
Viewing Correlated Profiles
In your tracing UI, slow spans will show:
- Profile link - Click to see CPU/memory profile
- Hot functions - Top functions during the span
- Flame graph - Visual stack trace
Symbol Resolution
Automatic Symbol Loading
Telegen resolves symbols from:
| Source | Priority |
|---|---|
| Debug symbols | Highest (DWARF, .debug_info) |
| Symbol table | High (.symtab) |
| Build ID | Medium (debuginfod lookup) |
| Binary name | Fallback |
Missing Symbols
If you see [unknown] in profiles:
- Compile with symbols:
# Go go build -gcflags="-N -l" ./... # C/C++ gcc -g -O2 main.c # Rust RUSTFLAGS="-C debuginfo=2" cargo build - Use debuginfod (Linux):
export DEBUGINFOD_URLS="https://debuginfod.elfutils.org/" - Include frame pointers:
# Go GOFLAGS="-buildmode=exe" go build # C/C++ gcc -fno-omit-frame-pointer main.c
Java Profiling
Telegen supports two approaches for profiling Java applications:
Approach Comparison
| Aspect | JFR (Recommended) | eBPF + perf-map-agent |
|---|---|---|
| JVM Flag Required | None | -XX:+PreserveFramePointer |
| JVM Restart | Not required | Not required |
| Symbol Accuracy | Always accurate | Depends on perf-map refresh |
| GC Events | ✅ Native | ❌ Not available |
| Lock Contention | ✅ Native | ❌ Not available |
| Kernel Stacks | ❌ JVM only | ✅ Full system view |
| Mixed-mode | ❌ JVM only | ✅ Java + native |
| Overhead | ~1% | ~1-2% |
JFR-Based Profiling (Recommended)
Java Flight Recorder (JFR) is built into the JVM and provides the most accurate profiling:
profiles:
jfr:
enabled: true
input_dirs:
- /var/log/jfr
# Export as OTLP Logs for unified pipeline
direct_export:
log_export:
enabled: true
endpoint: "http://otel-collector:4318/v1/logs"
JFR provides:
- Execution samples - CPU profiling with accurate Java symbols
- Allocation samples - Memory allocation tracking
- Lock contention - Monitor enter/exit timing
- GC events - Garbage collection correlation
eBPF-Based Java Profiling
For scenarios requiring kernel-level visibility or mixed-mode profiling (Java + native code), use eBPF with perf-map-agent:
profiling:
enabled: true
cpu:
enabled: true
# Enable Java symbol resolution for eBPF profiles
java_ebpf:
enabled: true
agent_jar_path: "/opt/perf-map-agent/attach-main.jar"
agent_lib_path: "/opt/perf-map-agent/libperfmap.so"
refresh_interval: 60s
unfold_all: true
dotted_class: true
# Export as OTLP Logs (same format as JFR)
log_export:
enabled: true
endpoint: "http://otel-collector:4318/v1/logs"
Requirements:
- JVM Flag - Java applications must be started with:
java -XX:+PreserveFramePointer -jar myapp.jar - perf-map-agent - Install from jvm-profiling-tools/perf-map-agent:
git clone https://github.com/jvm-profiling-tools/perf-map-agent cd perf-map-agent cmake . make # Copy to /opt/perf-map-agent/
How It Works:
sequenceDiagram
participant JVM as Java App
participant Agent as perf-map-agent
participant Telegen as Telegen eBPF
participant Map as /tmp/perf-PID.map
Telegen->>Agent: Attach to PID
Agent->>JVM: JVMTI attach
JVM->>Agent: JIT method addresses
Agent->>Map: Write symbol map
loop Profiling
Telegen->>JVM: eBPF CPU samples
Telegen->>Map: Resolve addresses
Telegen->>Telegen: Build profile
end
Unified OTLP Logs Export
Both JFR and eBPF profiles can export to the same OTLP Logs endpoint using identical ProfileEvent JSON format:
{
"timestamp": "2024-02-05T10:30:00.123Z",
"eventType": "jdk.ExecutionSample",
"profileType": "cpu",
"serviceName": "my-java-app",
"k8s_pod_name": "my-java-app-7b5f8d4c9b-x2kpq",
"k8s_namespace": "production",
"threadName": "main",
"threadId": 1,
"topClass": "com.example.MyService",
"topMethod": "processRequest",
"stackPath": "processRequest <- handleRequest <- doFilter",
"stackDepth": 25,
"sampleWeight": 150,
"stackTrace": "[{\"class\":\"com.example.MyService\",\"method\":\"processRequest\",\"line\":42}]"
}
Performance Overhead
Telegen profiling is designed for production:
| Profile Type | CPU Overhead | Memory Overhead |
|---|---|---|
| CPU | ~1% | 10MB buffer |
| Off-CPU | ~0.5% | 10MB buffer |
| Memory | ~2% | 20MB buffer |
| All enabled | ~3% | 50MB buffer |
Reducing Overhead
agent:
profiling:
# Lower sample rate
sample_rate: 49 # Instead of 99
# Increase upload interval
upload_interval: 120s # Instead of 60s
# Disable unused profile types
mutex: false
block: false
goroutine: false
Best Practices
1. Always Enable CPU and Off-CPU
These two profiles cover most performance issues:
agent:
profiling:
enabled: true
cpu: true
off_cpu: true
2. Use Appropriate Sample Rates
| Environment | Recommended Rate |
|---|---|
| Development | 99 Hz |
| Staging | 49 Hz |
| Production (high volume) | 19 Hz |
3. Include Debug Symbols in Production
Build with symbols, then strip for deployment:
# Build with symbols
go build -o app ./...
# Keep symbols for profiling (in separate file)
objcopy --only-keep-debug app app.debug
# Strip binary
strip app
Troubleshooting
No Profiles Generated
- Check eBPF permissions:
# Verify perf_event access cat /proc/sys/kernel/perf_event_paranoid # Should be 1 or less - Check capabilities:
# Container needs CAP_SYS_ADMIN or CAP_PERFMON capsh --print | grep -E "sys_admin|perfmon"
Missing Stack Frames
- Frame pointers disabled - Rebuild with
-fno-omit-frame-pointer - JIT code - Enable JIT symbol mapping
- Optimized code - Some inlined functions won’t appear
Next Steps
- Distributed Tracing - Correlate profiles with traces
- Security Observability - Security event profiling
- Agent Mode - Full profiling configuration