How We Made Telemetry Queries 10x Faster: Chunk-Split Caching for Metrics, Logs, and Traces
Building a caching architecture that serves metrics range queries, log aggregations, and trace searches in sub-second responses — across all three observability pillars — without sacrificing freshness.
At MIRASTACK LABS, we are building Agentic AI DevOps Toolchains starting with observability tools that help engineering teams understand what their systems are doing. Metrics, logs, and traces — the three pillars of observability — billions of data points flowing through 100% open-source observability signal stores all surfaced through a single interaction layers in fully AIR-GAPPED REGULATED DATA CENTER Environments.
Our engineering team comes from a background of building population-scale systems and platforms. As our telemetry signals grew with scale — some running thousands of microservices generating gigabytes of telemetry per hour — the next bottleneck appeared in our data fetch layer, the middleware between our UI, AI Agents and the telemetry stores. The backend foundation remained strong and continued to handle scale exactly as designed for long term purposes.
This is the story of how we redesigned the caching layer, and the engineering decisions behind a system that now serves most queries in under 50 ms — across metrics, logs, AND traces — even when the underlying data spans a week and billions of raw data points. The same telemetry fabric is consumed by MIRASTACK AI Agents for deep correlation, failure detection, and near-real-time Root Cause Analysis, known internally as #5YRCA. For us, it is neither fair nor efficient to push that entire agentic-analysis burden directly onto the Observability datastores (VictoriaStack, Clickhouse, etc.) query path.
This solution was implemented to preserve backend efficiency, protect query latency, and let both observability and AI analysis pipelines scale cleanly together.
The Problem: Death by a Thousand Queries — Across All Three Pillars
Here’s what happens when someone opens our App Performance page for a single service:
Metrics:
- 30+ PromQL queries fire in parallel — throughput, error rate, P50/P95/P99 latency, anomaly scores, dependency graphs
- 6 metric probe queries determine which histogram naming convention the collector uses
- 2 service graph probes detect metric naming variants
Logs:
- Log volume histograms (
stats_query_range) for the service — error distribution over time - Hit count aggregations (
hits) showing log lines per severity level - Field facets — what fields exist in the logs for this service
Traces:
- Trace search across the entire time window — find all traces touching this service
- RED metrics from traces — rate, errors, duration computed from span data via LogSQL
- Service dependency graph — which services call what, derived from trace analytics
- Operations list — what endpoints this service exposes, fetched from the Jaeger API
That’s 50+ HTTP round-trips split across three different backends (the exact count depends on which panels are visible and which probes are warm). Now multiply that by 200 users viewing dashboards simultaneously, with auto-refresh every 30 seconds.
200 users × 50 queries × every 30 seconds = ~20,000 backend queries per minute, split roughly 60% metrics, 25% logs, 15% traces. Most return identical or near-identical results to what was fetched 30 seconds ago.
The Insight: Telemetry Data Has a Natural Temporal Cache Structure
Here’s what we realised — and this applies across all three pillars, not just metrics:
Metrics: Historical data is immutable
A metric value recorded at 2:00 PM yesterday will never change. The TSDB has committed it.
Logs: Historical aggregations are stable
A stats_query_range result asking “how many ERROR logs per minute yesterday” won’t change — those logs are indexed and immutable. Only the current minute is still receiving writes.
Traces: Committed traces never change
A trace that completed 10 minutes ago — its spans, durations, tags, processes — is immutable. The Jaeger API will return the exact same JSON for that trace for eternity. Only traces from the last few seconds might still have spans arriving.
The live edge exists in all three domains, but it’s narrow. For a 24-hour query:
- 23 hours and 55 minutes of data that will never change (metrics, logs, AND traces)
- 5 minutes of data that’s still being written
If we could cache the historical portion with a long TTL and only re-fetch the live edge — across all three backends — we’d eliminate the vast majority of upstream calls.
The Architecture: Five Strategies for All Query Patterns
We identified five distinct query patterns spanning all three telemetry backends and designed a purpose-built cache strategy for each:
flowchart TB
subgraph TP[Telemetry Proxy]
M[Metrics Chunk Cache\nPromQL]
L[Logs Chunk Cache\nLogSQL]
T[Traces Bucket Cache\nJaeger]
I[Instant Cache\nPromQL]
C[TTL Cache\nall endpoints]
U[Shared Cache Utilities\ngzip, pipeline MGET, tiered TTL sec and microseconds]
M --> U
L --> U
T --> U
I --> U
C --> U
end
U --> V[Valkey]
V --> MS[MetricStore]
V --> LS[LogStore]
V --> TS[TraceStore]
Strategy 1: Metrics Chunk-Split Cache (PromQL Range Queries)
This is where the journey started. A long time-range PromQL query gets decomposed into clock-aligned chunks, each cached independently with an age-based TTL.
flowchart LR
Q[24h range query T minus 24h to T] --> S[Split into 2h aligned chunks]
S --> C1[C1 2h\nTTL 1h old]
S --> C2[C2 2h\nTTL 1h old]
S --> C3[C3 2h\nTTL 1h old]
S --> C4[...]
S --> C11[C11 2h\nTTL 2m recent]
S --> C12[C12 2h\nTTL 15s live]
The algorithm runs in 9 steps:
- Full-result cache check — Return immediately if cached (120 s TTL)
- Short-range bypass — Queries under 10 minutes go direct (chunking overhead not worth it)
- Canonical step selection — Different auto-step values map to the same canonical step, sharing cache entries
- Chunk splitting — Clock-aligned boundaries ensure overlapping time windows share chunks
- Batch cache lookup — All chunk keys fetched in a single
MGET(one round-trip) - Parallel fetch of misses — Bounded concurrency (max 6 workers)
- Merge — Deduplicate by timestamp (same PromQL at same timestamp = same value)
- Downsample — Resample from canonical to user-requested step
- Cache the full result — Store assembled result for fast repeat access
The key innovation — tiered TTL — applies the same way across all three backends:
| How old is this chunk? | TTL |
|---|---|
| Less than 1 minute | 15 seconds |
| Less than 5 minutes | 2 minutes |
| Older than 5 minutes | 1 hour |
A 24-hour metrics query produces ~12 chunks of 2 hours each. After the first request, 11 of the 12 are cached for a full hour (only the live chunk needs re-fetching). The next user gets 11 cache hits and 1 backend call — a ~92% reduction in upstream queries.
Strategy 2: Logs Chunk-Split Cache (LogSQL Range Queries)
Once we proved chunk-splitting worked for metrics, the obvious question was: can we apply the same pattern to logs?
The answer is yes — but with critical differences in merge semantics and time filtering.
LogSQL stats_query_range — The Well-Behaved Sibling
LogStore exposes a stats_query_range endpoint that returns time-series aggregations over log data — “how many ERROR logs per minute over the last 24 hours.” The response shape is nearly identical to PromQL range queries. We reuse the exact same 9-step algorithm with the same chunk tier table.
The merge semantics are the same as metrics: deduplicate by timestamp. An aggregation at timestamp T over the same log data is deterministic.
Where it’s used: This single cache strategy serves six different route files — log explorer, trace analytics, RED report generation, RED metrics dashboards, and the main server routes. That’s the power of a well-designed abstraction: one caching implementation, six consumers.
LogSQL hits — The One That Tried to Break Us
The /select/logsql/hits endpoint returns hit count histograms: “how many log lines matched this query per time bucket, grouped by field.” Looks simple. It’s not.
Problem 1: Time filtering lives in the query, not the URL.
The hits endpoint doesn’t accept start/end as query parameters. Time filtering must be embedded in the LogSQL query itself:
Original query: error AND service:payments
Chunk query: _time:[2024-01-15T00:00:00Z, 2024-01-15T01:00:00Z) AND (error AND service:payments)
This means our chunk-split code has to rewrite the query for each chunk, injecting a _time filter. If the original query is * (match all), we use just the time filter without AND.
Problem 2: Hit counts are additive, not idempotent.
This is the one that would silently corrupt data if you got it wrong. Metric values at timestamp T are deterministic — the same PromQL evaluation at the same timestamp produces the same value. Deduplicating overlapping chunks is safe.
Hit counts are additive. If a time bucket straddles two chunks, each chunk returns a partial count for that bucket. You must sum the overlapping values, not deduplicate them:
Chunk 1 returns: { "14:00": 42, "14:05": 31 }
Chunk 2 returns: { "14:05": 17, "14:10": 89 }
WRONG (deduplicate): { "14:00": 42, "14:05": 31, "14:10": 89 } ← 14:05 undercounted!
RIGHT (sum): { "14:00": 42, "14:05": 48, "14:10": 89 } ← correct total
We designed the merge semantics per-endpoint from the start rather than applying a one-size-fits-all deduplication. The downsampling follows the same principle: metric downsampling picks the first sample per bucket, while hits downsampling sums values per bucket.
Where hits caching is used: Both the Log Explorer and the Trace Explorer use hits aggregations — traces analytics surfaces log hit counts correlated with trace data, so the same LogSQL hits cache serves both pages.
Strategy 3: Traces Bucketed Search Cache (Jaeger API)
This is the strategy that didn’t exist in our first iteration — and the one that made the most dramatic difference for trace-heavy workflows.
The Jaeger Limit Problem
The Jaeger trace search API (/api/traces) accepts a limit parameter that caps results per request. Send limit=50 over a 24-hour window, and you get the 50 most recent traces. Which means:
- A trace that caused a P1 incident at 3 AM? Not in the results.
- The slow outlier at 11 AM that affected 10,000 users? Not in the results.
- 23 hours of system behaviour? Invisible.
The standard advice is “narrow your time range.” That’s user-hostility disguised as a feature.
The Fix: Time-Bucketed Search
We split the 24-hour search window into time buckets, search each bucket independently with its own limit, then merge the results:
flowchart TB
B1[Before naive single 24h query limit 50]
B2[Returns only latest 50 traces]
B3[Most of the 24h window is invisible]
B1 --> B2 --> B3
A1[After bucketed split 24h into twelve 2h buckets]
A2[Each bucket queried with limit 50]
A3[Merge and deduplicate by traceID]
A4[Up to about 600 traces spread across full window]
A1 --> A2 --> A3 --> A4
The algorithm runs in 6 steps:
- Full-result cache check — Return immediately if cached
- Split into clock-aligned time buckets — Auto-select bucket size based on time range
- Batch cache lookup — Pipeline MGET for all bucket keys
- Parallel fetch of misses — Bounded at 6 concurrent Jaeger API calls
- Merge and deduplicate — By traceID (not by timestamp — traces aren’t time-series)
- Cache the full result
Microsecond Timestamps
Here’s a subtlety that bit us early: the Jaeger v1 API uses microsecond timestamps, not seconds. (Note: while the OpenTelemetry specification defines trace timestamps in nanoseconds, the Jaeger search API uses microseconds — a distinction that matters for bucket arithmetic.) Our tiered TTL function — originally built for seconds-precision metrics — needed a microsecond variant:
age_seconds = (now_microseconds - chunk_end_microseconds) / 1,000,000
The bucket boundary arithmetic also uses microseconds. Getting this wrong means buckets misalign, cache keys don’t match, and you get zero cache hits while wondering why your cache “isn’t working.”
Auto-Bucket Sizing
The bucket size auto-selects based on the time range, keeping the bucket count reasonable:
| Query Range | Bucket Size | Approx. Buckets |
|---|---|---|
| ≤ 5 min | 1 min | 5 |
| 5 min – 30 min | 5 min | 6 |
| 30 min – 2 h | 15 min | 8 |
| 2 h – 6 h | 30 min | 12 |
| 6 h – 24 h | 2 h | 12 |
| > 24 h | 6 h | varies |
Safety cap: If the calculated bucket count exceeds 120 (someone queries “last 90 days”), the bucket size automatically widens. We also cap each bucket at 1,000 traces to prevent any single Jaeger request from returning a 200 MB response.
Trace Merge — Deduplication by traceID
Traces aren’t time-series. You don’t merge them by summing values or deduplicating timestamps. A trace is a tree of spans, identified by a unique traceID. If the same trace appears in two adjacent buckets (because its spans straddle the boundary), we keep the first occurrence:
Bucket 1: [trace-abc, trace-def, trace-ghi]
Bucket 2: [trace-ghi, trace-jkl, trace-mno] ← trace-ghi spans the boundary
Merged: [trace-abc, trace-def, trace-ghi, trace-jkl, trace-mno] ← deduped
Traces are immutable once committed, so any copy is authoritative. First-occurrence-wins avoids re-processing.
Where it’s used: The primary trace search page uses the full bucketed search. For internal callers — business flow correlation, anchor trace search, journey flow analysis — we use a simpler 15-second TTL wrapper around raw Jaeger search, since those queries use tight time ranges that don’t need bucketing.
Strategy 4: Time-Quantised Instant Cache (PromQL)
Instant queries evaluate a PromQL expression at a single point in time. Our App Performance page fires 6 histogram probes within the same second, each asking “does this metric exist right now?”
We quantise the evaluation timestamp to 30-second buckets:
Query 1: time=1711900817 → quantised to 1711900800
Query 2: time=1711900818 → quantised to 1711900800 ← same bucket!
Query 3: time=1711900819 → quantised to 1711900800 ← same bucket!
The first query fetches from MetricStore. Queries 2–6 get the cached response. Six round-trips become one.
Strategy 5: Simple TTL Cache (Metadata Across All Three Backends)
Metadata queries — from all three backends — use a straightforward check-cache → fetch-if-miss → store pattern with endpoint-appropriate TTLs.
Traces:
| Data | TTL | Why |
|---|---|---|
| Trace by ID | 1 hour | Immutable once committed — the longest TTL in the system |
| Service list | 2 min | Services deploy/undeploy slowly |
| Operations | 2 min | Endpoints change rarely |
Logs:
| Data | TTL | Why |
|---|---|---|
| Full-text log query (NDJSON) | 15 s | Near-real-time, parsed from newline-delimited JSON |
| Stats query (non-range) | 30 s | Summary aggregation |
| Field names | 60 s | Schema-level metadata |
| Field values | 60 s | Value enumeration |
| Facets | 30 s | Dynamic aggregation |
Metrics:
| Data | TTL | Why |
|---|---|---|
| Label values | 2 min | Cardinality changes slowly |
| Probe results (histogram/svc graph) | 5 min | Metric naming doesn’t change mid-flight |
The Shared Foundation: Making It All Fast
Four utilities underpin every strategy across all three backends:
Transparent Gzip Compression
Each backend produces large JSON in its own way:
- Metrics: 24 h × 15 s step × 100 series → ~1.5 MB of PromQL JSON
- Logs: Hit count histograms across 50 field values → ~800 KB
- Traces: A single span tree with 200 spans, processes, and tags → ~2 MB per trace search
We transparently compress entries larger than 1 KB before writing to Valkey:
| Backend | Typical Compression Ratio | Why |
|---|---|---|
| Metrics (PromQL JSON) | 5–10× | Repetitive label key-value pairs |
| Logs (NDJSON/stats JSON) | 3–8× | Repetitive field names across log lines |
| Traces (Jaeger JSON) | 4–8× | Repeated process maps, service names, tag arrays |
Detection is automatic: we check for gzip magic bytes on read and decompress transparently. Entries exceeding 15 MB are silently skipped.
Pipeline MGET
When a query splits into 8 chunks (metrics), 12 buckets (traces), or 6 chunks (logs), naively checking each key means many round-trips. MGET collapses them all into one round-trip:
Sequential: 12 keys × ~0.5 ms RTT = ~6 ms
Pipeline: 1 MGET × ~0.5 ms RTT = ~0.5 ms
This is identical for metrics chunks, log chunks, and trace buckets. The same pipelineMGet function serves all three.
Tiered TTL — Seconds and Microseconds
The three-tier TTL applies identically across backends:
| Chunk Age | TTL | Metrics | Logs | Traces |
|---|---|---|---|---|
| < 1 min (live) | 15 s | ✓ | ✓ | ✓ |
| < 5 min (recent) | 2 min | ✓ | ✓ | ✓ |
| ≥ 5 min (historical) | 1 hour | ✓ | ✓ | ✓ |
The difference: metrics and logs use seconds-precision timestamps, traces use microseconds. We provide two variants of the same function — one divides by 1, the other divides by 1,000,000 — to compute chunk age correctly.
Graceful Degradation
The cache is never in the critical path for correctness — for any backend:
try {
return cache_strategy.execute(query)
} catch {
// Valkey is down? Go direct to Metrics/Logs/Traces DataStore.
return direct_fetch(query)
}
If Valkey goes down, all three cache strategies transparently fall back to direct backend queries. Users experience slower responses but never see errors. When Valkey comes back, the cache warms organically.
Real-World Impact
Quantitative: What the Code Guarantees
| Metric | Before | After (from cache architecture) |
|---|---|---|
| Backend calls per 24 h metrics query | 1 large range query | 1 live chunk re-fetch + 11 cache hits (~92% reduction) |
| Backend calls for 24 h trace search | 1 Jaeger query returning only last N traces | 1 live bucket re-fetch + 11 cache hits |
| Histogram probe overhead per page load (warm) | 6 instant queries | 0 (cached for 5 min) |
| Repeated identical query (any backend) | Full backend round-trip | Valkey cache hit (~sub-ms) |
| Trace search coverage (24 h window) | Last limit traces only |
Traces distributed across all 12 buckets — full time coverage |
| Metadata refresh (services, labels, fields) | Every page load from backend | Cached at 30 s–120 s TTLs |
Note: Latency improvements depend on deployment topology (Valkey latency, backend response times, network). The architecture eliminates redundant upstream calls; actual wall-clock gains vary by environment.
The trace bucketing benefit deserves special emphasis: Before, a 24-hour trace search returned only the most recent N traces within the limit. After, traces are distributed across the entire window — every 2-hour bucket is represented — giving users full visibility into system behaviour across the entire time range.
Design Decisions Worth Highlighting
Why the Same Chunk-Split Pattern Across Three Different Backends?
The fundamental insight — “old data doesn’t change, so split along time boundaries” — applies universally to metrics, logs, and traces. By designing a shared abstraction (cache utilities, pipeline MGET, bounded worker pool, tiered TTL), we implemented three separate strategies without tripling the code:
| Component | Shared | Per-Backend |
|---|---|---|
| Gzip compression | ✓ | |
| Pipeline MGET | ✓ | |
| Tiered TTL | ✓ (sec + µs variants) | |
| Bounded worker pool | ✓ | |
| Chunk/bucket splitting | Metrics (seconds), Logs (seconds), Traces (microseconds) | |
| Merge semantics | Dedup (metrics), Dedup (log stats), Sum (log hits), TraceID dedup (traces) | |
| Time filter injection | Log hits embeds _time in query |
|
| Auto bucket sizing | Traces (6 tiers + safety cap) |
Why Different Merge Semantics per Backend?
This is the design decision that prevents silent data corruption:
- Metrics: Same PromQL at same timestamp = same value → deduplicate
- Log stats: Same aggregation at same timestamp = same value → deduplicate
- Log hits: Counts are additive across chunk boundaries → sum
- Traces: Immutable objects identified by unique ID → deduplicate by traceID
A one-size-fits-all merge function would silently undercount log volumes or duplicate traces. We caught this early because we designed merge semantics per-endpoint from the start.
Why Clock-Aligned Chunks?
If User A queries [14:02, 14:32] and User B queries [14:05, 14:35], free-form chunks would be completely different. Clock-aligned chunks snap to wall-clock boundaries, so overlapping middle chunks share cache entries. This applies identically to metrics chunks, log chunks, and trace buckets — humans across the same team tend to look at similar time ranges.
Why Microsecond Arithmetic for Traces?
The Jaeger v1 API uses microsecond timestamps (the OpenTelemetry specification uses nanoseconds). If you use seconds-precision arithmetic for trace bucket boundaries, your buckets misalign by up to 999,999 microseconds — nearly a full second. Cache keys won’t match between requests, hit rate drops to zero, and you’ve built an expensive no-op. We learned to provide a dedicated tieredChunkTTLMicros function alongside the seconds variant.
Why Auto-Widen Trace Buckets?
Unlike metrics (where chunk size is deterministic from the tier table), trace searches can span very long time ranges. A “last 90 days” search at 2-hour buckets would produce 1,080 buckets — 1,080 Jaeger API calls. The MAX_BUCKETS=120 safety cap auto-widens bucket size to keep the request count bounded, at the cost of coarser cache granularity for very long ranges.
Why Sum for Hits Downsampling?
Metric downsampling picks the first sample per target bucket — because you’re resampling a continuous signal. Hits downsampling sums values per target bucket — because you’re aggregating counts. Using the wrong downsampling function for hits would silently lose log volume data.
What We’d Do Differently
Cache warming: Currently, the cache warms organically from user traffic. For critical dashboards with known query patterns, a background warmer that pre-fetches historical chunks and trace buckets during low-traffic periods would further reduce first-load latency.
Adaptive chunk sizing: Our fixed tier tables work well for most workloads, but extremely high-cardinality metric queries (10,000+ series) and high-volume log queries produce large chunks. An adaptive system that adjusts chunk size based on estimated result size would optimise Valkey memory usage.
Write coalescing: When 50 users hit the same uncached query simultaneously, all 50 fetch from the backend. A “single-flight” pattern (coalescing concurrent requests for the same cache key into a single backend call) would eliminate this thundering herd — especially impactful for trace searches where each Jaeger API call is expensive.
Cross-backend preloading: When a user opens the App Performance page, we know they’ll query metrics, logs, AND traces for that service. Proactively warming the caches for all three backends in parallel (instead of waiting for each component to mount and fire its own requests) would cut perceived load time further.
Why We Chose VictoriaMetrics As Our Observability Backend
We want to extend sincere thanks to the VictoriaMetrics Engineering Team for building and maintaining exceptional open source observability infrastructure.
Their continued contributions to the open source ecosystem enable engineering teams around the world to build reliable, high-performance systems on transparent and battle-tested foundations.
We chose the VictoriaMetrics stack deliberately. For our roadmap, observability is foundational infrastructure, not a sidecar decision. We needed a backend that scales with product ambition and remains economically efficient under sustained load.
-
Horizontal scaling by design
The VictoriaMetrics ecosystem gives us a practical scale-out path for real production workloads. For example, VictoriaLogs cluster mode uses the same binary with different flags, so teams can expand capacity without a separate “cluster edition” migration step. The gateway and storage role separation (including stateless gateway patterns) also maps cleanly to modern Kubernetes operations and progressive scaling strategies. -
Extreme resource optimization and cost efficiency
The VictoriaMetrics team consistently publishes benchmark-driven engineering and resource-usage analysis in public. Their performance work and cost-focused product improvements reinforce why we trust this stack for high-ingest telemetry while maintaining disciplined infrastructure spend at scale.
If you build serious observability systems, we strongly recommend following the VictoriaMetrics engineering blog for architecture, performance, and production operations insights.
If you have not explored their work yet, we highly recommend visiting victoriametrics.com.
Conclusion
The fundamental insight is deceptively simple: telemetry data — metrics, logs, and traces — all share a natural temporal cache structure. Old data doesn’t change, regardless of which backend stores it. Split along time boundaries, cache aggressively, merge correctly (deduplicate for metrics, sum for hit counts, deduplicate-by-ID for traces), and degrade gracefully.
The implementation required careful attention to three different merge semantics, two timestamp precisions, time-filter injection for endpoints that don’t accept time parameters, and safety caps for unbounded searches. But the result is a system where the same architectural pattern — chunk-split with tiered TTL — applies across all three observability pillars, sharing infrastructure while respecting the unique semantics of each data type.
For a platform serving population-scale observability data across metrics, logs, and traces, the caching layer is the difference between a system that groans under load and one that hums — regardless of which backend the user is querying.
This post is part of the MIRASTACK LABS engineering blog series on building production-grade Agentic AI + AIR-GAPPED environments for highly secure and regulated entities