Overview
A production-grade distributed caching system built from scratch to understand the internals of systems like Redis Cluster and Memcached. The system supports key-value storage with TTL, consistent hashing for data distribution, leader-based replication for fault tolerance, and multiple eviction policies.
Problem
Single-node caches become a bottleneck and single point of failure at scale. This project explores the engineering challenges of distributing cache state across multiple nodes while maintaining low latency, high availability, and consistency guarantees.
Architecture
The system consists of three main components:
- Cache Nodes: Each node runs an in-memory key-value store with configurable eviction (LRU, LFU, TTL-based)
- Hash Ring: Consistent hashing with virtual nodes distributes keys across cache nodes, minimizing redistribution when nodes join or leave
- Replication Manager: Raft-based consensus for leader election and log replication ensures data survives node failures
- Client Library: A Go client that handles node discovery, request routing, and automatic failover
Tradeoffs
- CP over AP: Chose consistency over availability — writes are acknowledged only after replication to a quorum, which adds latency but prevents stale reads
- gRPC over HTTP: Binary protocol reduces serialization overhead for the high-frequency cache operations
- Memory-only over disk-backed: No persistence to disk keeps latency predictable but means a full cluster restart loses all data
Implementation
Key engineering decisions:
- Virtual nodes: Each physical node maps to 150 virtual nodes on the hash ring, ensuring even key distribution even with heterogeneous hardware
- Gossip protocol: Nodes exchange health information via a gossip protocol, enabling fast failure detection without a centralized coordinator
- Batch operations: Multi-get and multi-set operations are split by destination node and executed in parallel
- Backpressure: Connection pooling with circuit breakers prevents cascade failures when a node becomes slow
Challenges
- Handling split-brain scenarios during network partitions required careful Raft configuration and partition-aware client routing
- Achieving sub-millisecond p99 latency while maintaining replication guarantees
- Hot key detection and mitigation — popular keys are replicated to additional nodes with read-your-writes consistency
- Memory fragmentation management in long-running Go processes
Future Improvements
- Redis-compatible protocol support for drop-in replacement
- Tiered storage with SSD backing for warm data
- Cluster auto-scaling based on memory pressure and hit rate metrics
- Cross-datacenter replication for geo-distributed deployments