Supercharge your LLM infrastructure with Redis-powered caching. Reduce API costs by up to 80%, slash response times to milliseconds, and handle rate limits gracefully with intelligent caching strategies.

Exact Match Cache

Store responses for exact prompt matches. Fast lookups with minimal overhead. Best for FAQ-style queries and repetitive requests. A minimal sketch follows the feature list below.
- SHA-256 hash for keys
- Sub-millisecond lookups
- Configurable TTL
- Memory-efficient storage
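Here is one way the exact-match path could look with the redis-py client; the key prefix and helper names are illustrative, not a fixed API.

```python
import hashlib

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cache_key(prompt: str, model: str) -> str:
    # SHA-256 over model + prompt yields a fixed-size, collision-resistant key.
    digest = hashlib.sha256(f"{model}:{prompt}".encode("utf-8")).hexdigest()
    return f"llmcache:exact:{digest}"

def get_cached(prompt: str, model: str) -> str | None:
    # A single GET: sub-millisecond against a local or same-region Redis.
    return r.get(cache_key(prompt, model))

def put_cached(prompt: str, model: str, response: str, ttl: int = 3600) -> None:
    # SETEX writes the value with a configurable TTL (cache_ttl, default 3600 s).
    r.setex(cache_key(prompt, model), ttl, response)
```

Hashing the model name into the key keeps responses from different models from colliding on the same prompt.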
Semantic Cache

Cache responses based on meaning, not exact text. Prompts are embedded, and a vector similarity search matches semantically similar queries to existing entries.
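A brute-force sketch of the idea, assuming a hypothetical embed() function; a production setup would use Redis vector search rather than scanning in Python, and the 0.95 cutoff mirrors the similarity_threshold setting below.

```python
import hashlib

import numpy as np

SIMILARITY_THRESHOLD = 0.95  # see similarity_threshold in the configuration table

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: deterministic pseudo-vector derived from a text hash.
    # Swap in a real sentence-embedding model in practice.
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "big")
    vec = np.random.default_rng(seed).standard_normal(384)
    return vec / np.linalg.norm(vec)

_entries: list[tuple[np.ndarray, str]] = []  # (unit prompt embedding, response)

def semantic_get(prompt: str) -> str | None:
    q = embed(prompt)
    for vec, response in _entries:
        # Cosine similarity of unit vectors reduces to a plain dot product.
        if float(np.dot(q, vec)) >= SIMILARITY_THRESHOLD:
            return response
    return None

def semantic_put(prompt: str, response: str) -> None:
    _entries.append((embed(prompt), response))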
TTL Cache

Automatic expiration for time-sensitive content. Perfect for models with knowledge cutoffs or frequently updated information.
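Time-to-live can be chosen per entry at write time; the topic-based table below is purely a hypothetical heuristic.

```python
# Hypothetical freshness heuristic: shorter TTLs for fast-changing topics.
TTL_BY_TOPIC = {"news": 300, "stock": 300, "weather": 900}
DEFAULT_TTL = 3600

def ttl_for(prompt: str) -> int:
    lowered = prompt.lower()
    for topic, ttl in TTL_BY_TOPIC.items():
        if topic in lowered:
            return ttl
    return DEFAULT_TTL
```

Passing the result as the ttl argument of put_cached above lets Redis handle the expiry itself.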
Tiered Cache

Hierarchical caching with L1 (local) and L2 (Redis) layers. Minimize latency while maximizing cache coverage across deployments.
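A sketch of the two-layer lookup, again assuming the redis-py client; the in-process LRU size and the promote-on-hit policy are illustrative choices.

```python
from collections import OrderedDict

import redis

r = redis.Redis(decode_responses=True)

_l1: OrderedDict[str, str] = OrderedDict()  # in-process LRU layer
L1_MAX_ENTRIES = 1024  # hypothetical per-process cap

def tiered_get(key: str) -> str | None:
    # L1: local memory, no network hop.
    if key in _l1:
        _l1.move_to_end(key)  # mark as recently used
        return _l1[key]
    # L2: shared Redis layer, consulted on an L1 miss.
    value = r.get(key)
    if value is not None:
        _l1[key] = value  # promote so the next lookup stays local
        if len(_l1) > L1_MAX_ENTRIES:
            _l1.popitem(last=False)  # evict the least recently used entry
    return value
```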
Request flow: Prompt + Model → Cache Check (Lookup) → on a hit, Return Cached; on a miss, Call LLM API, then Cache for Future.
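The same flow in code, reusing get_cached and put_cached from the exact-match sketch; call_llm is a hypothetical stand-in for your provider's client.

```python
def call_llm(prompt: str, model: str) -> str:
    # Hypothetical upstream call; wire in your provider's SDK here.
    return f"[{model}] response to: {prompt}"

def complete(prompt: str, model: str) -> str:
    cached = get_cached(prompt, model)   # Cache Check / Lookup
    if cached is not None:
        return cached                    # Return Cached
    response = call_llm(prompt, model)   # Call LLM API
    put_cached(prompt, model, response)  # Cache for Future
    return response
```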
| Parameter | Default | Description |
|---|---|---|
| cache_ttl | 3600 | Time-to-live in seconds for cached responses |
| similarity_threshold | 0.95 | Minimum similarity score for semantic cache hits |
| max_cache_size | 4GB | Maximum memory allocation for the cache |
| cache_models | all | Which models to cache (can filter by model name) |
| cache_streaming | true | Cache streaming responses chunk by chunk |
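One way these settings might be carried in application code; the CacheConfig name and field types are assumptions that simply mirror the table above.

```python
from dataclasses import dataclass

@dataclass
class CacheConfig:
    cache_ttl: int = 3600               # seconds before a cached response expires
    similarity_threshold: float = 0.95  # minimum score for a semantic cache hit
    max_cache_size: str = "4GB"         # memory ceiling for the cache
    cache_models: str = "all"           # which models to cache; filter by name
    cache_streaming: bool = True        # cache streaming responses chunk by chunk
```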