Bigtable — Summary, Strengths & Weaknesses
Bigtable is Google’s distributed storage system for structured data. It scales to petabytes across thousands of servers and powers products like Search, Earth, and Analytics with low-latency reads and high-throughput writes.
1. Summary
Bigtable stores data as a sparse, distributed, persistent, multidimensional sorted map indexed by (row key, column key, timestamp). Writes are log-structured: each mutation is appended to a commit log and applied to an in-memory memtable; when the memtable fills up, it is frozen and written out as an immutable SSTable, and background compactions merge SSTables to bound their number. Reads are served from a merged view of the memtable and the SSTables, and are accelerated by the block and scan caches, by Bloom filters that let a read skip SSTables that cannot contain the requested row/column pair, and by locality groups that store column families accessed together in the same SSTables. The master assigns tablets to tablet servers and balances load, GFS stores the log and SSTable files, and Chubby (a distributed lock service) handles master election, tablet-server discovery, and the location of the root tablet. Together these give Bigtable fast access, fault tolerance, and horizontal scalability.
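The write and read paths described above can be sketched with a toy, single-process model. This is only an illustration under simplifying assumptions, not Bigtable's implementation: the class name `ToyTablet`, the `memtable_limit` parameter, and the use of plain dicts for SSTables are all hypothetical, and the commit log here is just a list that is never replayed or truncated.

```python
class ToyTablet:
    """Toy model of one tablet: a commit log, a memtable, and immutable runs."""

    def __init__(self, memtable_limit=4):
        self.commit_log = []   # append-only record of every mutation
        self.memtable = {}     # (row, column) -> {timestamp: value}
        self.sstables = []     # newest first; treated as immutable once added
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def put(self, row, column, timestamp, value):
        # 1. Log the mutation, 2. apply it to the in-memory memtable.
        self.commit_log.append((row, column, timestamp, value))
        self.memtable.setdefault((row, column), {})[timestamp] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into a key-sorted "SSTable" and start a new one.
        frozen = {key: dict(cells) for key, cells in sorted(self.memtable.items())}
        self.sstables.insert(0, frozen)
        self.memtable = {}

    def get(self, row, column):
        """Return (timestamp, value) of the newest version, or None."""
        versions = {}
        # Merge oldest to newest so newer sources win on equal timestamps.
        for source in [*reversed(self.sstables), self.memtable]:
            versions.update(source.get((row, column), {}))
        if not versions:
            return None
        newest = max(versions)
        return newest, versions[newest]

    def compact(self):
        """Merge every SSTable into a single one."""
        merged = {}
        for table in reversed(self.sstables):  # oldest first
            for key, cells in table.items():
                merged.setdefault(key, {}).update(cells)
        self.sstables = [dict(sorted(merged.items()))] if merged else []


tablet = ToyTablet(memtable_limit=2)
tablet.put("com.cnn.www", "contents:", 1, "<html>v1")
tablet.put("com.cnn.www", "anchor:cnnsi.com", 1, "CNN")   # triggers a flush
tablet.put("com.cnn.www", "contents:", 2, "<html>v2")     # lands in the new memtable
print(tablet.get("com.cnn.www", "contents:"))
```

A read has to consult both the memtable and every SSTable, which is why the number of SSTables matters and why compactions, caches, and Bloom filters pay off.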
2. Strengths
- Massive scalability: Designed for petabytes and thousands of nodes.
- High throughput & low latency: Log-structured writes, caches, and Bloom filters.
- Operational resilience: Automatic tablet reassignment and recovery.
- Flexibility: Simple model fits varied structured/semi-structured data.
- Workload tuning: Locality groups, per-group compression, and in-memory locality groups.
- Ecosystem fit: Integrates with MapReduce for ETL/analytics.
- Battle-tested: Backed major Google products in production.
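To make the Bloom filter point above concrete, here is a minimal sketch of how a per-SSTable filter lets a read skip files that cannot hold a given row/column pair. The class name, the bit-array size, the number of hashes, and the SHA-256-based hashing are illustrative assumptions, not details of Bigtable's own filters.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits        # assumed size, chosen for illustration
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions by hashing the key with different prefixes.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


# One filter per SSTable, built when the SSTable is written.
sstable_filter = BloomFilter()
sstable_filter.add("com.cnn.www/contents:")

if sstable_filter.might_contain("com.cnn.www/contents:"):
    pass  # only now would the read go to disk for this SSTable
```

The trade-off is a little memory per SSTable in exchange for avoiding most disk seeks on lookups of rows or columns that do not exist.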
3. Weaknesses
- Limited transactions: Only single-row atomicity; no multi-row ACID.
- No SQL/joins: Not a relational database; requires custom access patterns.
- Ops complexity: Row keys must be designed carefully to avoid hotspots, and compactions may need tuning.
- Platform dependencies: Availability tied to Chubby and GFS health.
- Config sensitivity: Block sizes, locality grouping, and caching are workload-dependent.
- Learning curve: Schema and access design differ from RDBMS norms.
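The key-design point above can be illustrated with two small helpers. Because rows are kept in sorted order by key, the key decides both which rows sit together and which tablet absorbs the writes. The first helper reverses hostnames so that pages from one domain are contiguous. The second prefixes a hash bucket so that writes from many entities spread across the key space; this is a widely used mitigation for hotspots, shown here as an assumption rather than something these notes prescribe. The function names, the `#` separator, and the bucket count are hypothetical.

```python
import hashlib


def reversed_domain_key(host):
    """Turn 'www.cnn.com' into 'com.cnn.www' so a domain's pages sort together."""
    return ".".join(reversed(host.split(".")))


def salted_key(entity_id, timestamp, buckets=16):
    """Prefix a stable hash bucket so different entities spread across tablets.

    The bucket depends only on entity_id, so one entity's rows stay contiguous
    and, thanks to zero-padding, sort chronologically within that entity.
    """
    digest = hashlib.md5(entity_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{bucket:02d}#{entity_id}#{timestamp:020d}"


print(reversed_domain_key("www.cnn.com"))
print(salted_key("sensor-42", 1700000000))
```

The cost of salting is that a scan across all entities in key order has to fan out over every bucket, which is the kind of access-pattern trade-off that has to be settled at schema design time.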
