Applications
Bigtable backs more than sixty Google products and projects, with workloads ranging from latency-sensitive serving to throughput-oriented batch processing. This page summarizes the three case studies described in the paper (Google Analytics, Google Earth, and Personalized Search) together with their characteristics from Table 2.
1. Production Snapshot (from Table 2)
| Project (table) | Table size (TB, before compression) | Compression ratio (% of original size) | # Cells (billions) | # Column families | # Locality groups | % in memory | Latency-sensitive? |
|---|---|---|---|---|---|---|---|
| Google Analytics (summary) | 20 | 29% | 10 | 1 | 1 | 0% | Yes |
| Google Analytics (raw click) | 200 | 14% | 80 | 1 | 1 | 0% | Yes |
| Google Earth (serving index) | 0.5 | 64% | 8 | 7 | 2 | 33% | Yes |
| Google Earth (imagery) | 70 | – | 9 | 8 | 3 | 0% | No |
| Personalized Search | 4 | 47% | 6 | 93 | 11 | 5% | Yes |
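The compression column gives each table's compressed size as a percentage of its original size, so the approximate on-disk footprint is size × ratio / 100. A small Python sketch of that arithmetic for the compressed rows above; the `ROWS` list, the table labels, and `stored_tb` are illustrative names, not from the paper.

```python
# Rows from Table 2 with compression enabled:
# (project, table, size in TB before compression, compressed size as % of original)
ROWS = [
    ("Google Analytics", "summary", 20, 29),
    ("Google Analytics", "raw click", 200, 14),
    ("Google Earth", "serving index", 0.5, 64),
    ("Personalized Search", "user actions", 4, 47),
]


def stored_tb(size_tb: float, percent: int) -> float:
    """Approximate compressed footprint in TB: size * percent / 100."""
    return size_tb * percent / 100


for project, table, size, pct in ROWS:
    print(f"{project} ({table}): {size} TB -> ~{stored_tb(size, pct):.2f} TB")
```

For example, the ~200 TB raw click table occupies roughly 28 TB once compressed, while the 64% ratio means the Earth serving index shrinks comparatively little.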
2. Google Analytics
Raw → Aggregations → Summary
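Webmasters embed a small JavaScript program in their pages; it runs on every page visit and records information about the request, such as a user identifier and the page being fetched. The system stores this raw session data and periodically produces predefined per-site summaries (for example, unique visitors per day and page views per URL per day) that webmasters use for traffic analysis. The overall system's throughput is limited by the throughput of GFS.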
- Raw click table (~200 TB): one row per end-user session; the row key is (website name, session creation time), so sessions for the same site are contiguous and sorted chronologically.
- Summary table (~20 TB): predefined summaries for each website, generated from the raw click table by periodically scheduled MapReduce jobs that extract recent session data.
- Compression: the raw click table compresses to 14% of its original size, the summary table to 29%.
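The raw click table's key design relies on Bigtable keeping rows sorted lexicographically by row key. The Python sketch below models that with a sorted list of byte-string keys: the website name, a separator byte, then a fixed-width big-endian timestamp, so one site's sessions form a contiguous, time-ordered range. It then computes a toy summary in a single process as a stand-in for the MapReduce job. The key encoding, the NUL separator, `row_key`, `scan_site`, and the sessions-per-day summary are all hypothetical; the paper gives neither the exact key format nor the summary definitions.

```python
import bisect
import struct
from collections import Counter

SEP = b"\x00"  # assumption: website names never contain a NUL byte
DAY_US = 86_400 * 1_000_000  # microseconds per day


def row_key(site: str, session_start_us: int) -> bytes:
    """Hypothetical encoding of the (website, session creation time) row name.

    The 8-byte big-endian timestamp makes byte order match numeric order.
    """
    return site.encode("utf-8") + SEP + struct.pack(">Q", session_start_us)


def scan_site(sorted_keys: list[bytes], site: str) -> list[bytes]:
    """All sessions for one site: a contiguous range of the sorted keys."""
    prefix = site.encode("utf-8") + SEP
    start = bisect.bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


def sessions_per_day(sorted_keys: list[bytes]) -> Counter:
    """Single-process stand-in for a summary MapReduce: sessions per (site, day)."""
    counts: Counter = Counter()
    for key in sorted_keys:
        # Map: decode the row key back into (site, day).
        site = key[:-9].decode("utf-8")
        (start_us,) = struct.unpack(">Q", key[-8:])
        # Reduce: add one per session for each (site, day) pair.
        counts[(site, start_us // DAY_US)] += 1
    return counts


if __name__ == "__main__":
    keys = sorted([
        row_key("example.org", 2_000),
        row_key("example.com", 5_000),
        row_key("example.com", 1_000),
        row_key("example.com", DAY_US + 1),
    ])
    print(len(scan_site(keys, "example.com")))  # 3 sessions, in time order
    print(sessions_per_day(keys))
```

Without the fixed-width encoding, a timestamp written as decimal text would sort "10000" before "9000", breaking the chronological order the schema depends on.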
3. Google Earth
Spatial Locality for Satellite Imagery
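A preprocessing pipeline stores raw imagery in one table, where it is cleaned and consolidated into final serving data. A separate serving system uses a relatively small table that indexes data stored in GFS. That index must answer tens of thousands of queries per second per datacenter with low latency, so it is hosted across hundreds of tablet servers and uses in-memory column families.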
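- Imagery table (~70 TB, served from disk): each row is one geographic segment, and rows are named so that adjacent segments are stored near each other. A column family tracks the source images for each segment, with essentially one column per raw image; since each segment is built from only a few images, the family is very sparse. Bigtable compression is disabled because the images are already efficiently compressed.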
- Serving index (~0.5 TB): in-memory column families; tens of thousands of queries per second per datacenter.
- Preprocessing: relies heavily on MapReduce over Bigtable; some of these jobs process over 1 MB/s of data per tablet server.
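The paper says only that imagery rows are named so adjacent geographic segments are stored near each other; it does not give the naming scheme. One common technique with that property is a Z-order (Morton) key, which interleaves the bits of a segment's grid coordinates. The Python sketch below assumes segments are addressed by integer (x, y) grid cells of at most 16 bits each; `interleave_bits`, `morton_key`, and the hex row-name format are hypothetical.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) code: bit i of x goes to position 2i, bit i of y to 2i+1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_key(x: int, y: int, bits: int = 16) -> str:
    """Hypothetical row name: fixed-width hex, so string order equals numeric order."""
    if not (0 <= x < 1 << bits and 0 <= y < 1 << bits):
        raise ValueError("grid coordinates out of range")
    width = (2 * bits + 3) // 4  # hex digits needed for 2 * bits bits
    return format(interleave_bits(x, y, bits), f"0{width}x")


if __name__ == "__main__":
    # The four cells of a 2x2 block receive consecutive row names.
    for x, y in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        print((x, y), morton_key(x, y))
```

Z-order keeps every aligned 2^k × 2^k block of cells contiguous in key order, so a range scan over a small region touches few tablets; some neighbors that straddle block boundaries still land far apart, which a Hilbert curve reduces at the cost of a more involved encoding.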
4. Personalized Search
Per-User Activity with Versioned Cells
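This opt-in service records user queries and clicks across Google properties such as web search, images, and news. Each user is a row keyed by userid; a separate column family is reserved for each type of action, and each cell's timestamp is the time of the action. User profiles are generated by a MapReduce over Bigtable and used to personalize live search results.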
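Bigtable indexes each cell by (row, column, timestamp) and stores versions in decreasing timestamp order, so the most recent versions are read first. The Python sketch below models one Personalized Search row in memory under that model: the row key is the userid, columns are named `family:qualifier`, and each write uses the action time as its timestamp. `UserRow`, the `web_query`/`click` family names, and the sample values are hypothetical.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class UserRow:
    """In-memory stand-in for one row keyed by userid (hypothetical model)."""

    userid: str
    # Column "family:qualifier" -> sorted list of (-timestamp, value); negating
    # the timestamp keeps the newest version first.
    cells: dict[str, list[tuple[int, str]]] = field(default_factory=dict)

    def put(self, column: str, timestamp_us: int, value: str) -> None:
        """Record one user action; the timestamp is when the action occurred."""
        bisect.insort(self.cells.setdefault(column, []), (-timestamp_us, value))

    def read(self, column: str, max_versions: int = 1) -> list[tuple[int, str]]:
        """Up to max_versions (timestamp, value) pairs, newest first."""
        versions = self.cells.get(column, [])[:max_versions]
        return [(-neg_ts, value) for neg_ts, value in versions]

    def columns_in_family(self, family: str) -> list[str]:
        """Column names in one family, e.g. the family holding all web queries."""
        prefix = family + ":"
        return sorted(c for c in self.cells if c.startswith(prefix))


if __name__ == "__main__":
    row = UserRow("user42")
    row.put("web_query:", 1_000, "bigtable paper")
    row.put("web_query:", 3_000, "gfs paper")
    row.put("click:", 2_000, "http://example.com/")
    print(row.read("web_query:", max_versions=2))
```

A history view reads many versions of a family's columns, while a profile-building job can scan the same row; keeping versions newest-first makes the common "recent activity" read cheap.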
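- Schema: one column family per action type (for example, one family stores all web queries); timestamps record when each action occurred. Other groups add their own per-user columns, such as configuration options and settings, which is why this shared table has an unusually large number of column families (93 in Table 2).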
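- Consistency and replication: updates within a user's row are atomic under Bigtable's single-row guarantee. The data is replicated across several Bigtable clusters to increase availability and to reduce latency for distant clients; an original client-side mechanism ensured eventual consistency of replicas and was later replaced by a replication subsystem built into the servers.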
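- Quotas: because many groups share the table, a simple quota mechanism limits the storage consumption of any particular client in shared tables, providing some isolation between product groups.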
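The paper describes the quota mechanism only as a simple limit on per-client storage in shared tables. A minimal Python sketch of that idea, assuming each write is charged by its byte size to the client that issued it and rejected if it would exceed that client's limit; `SharedTableQuota`, `QuotaExceeded`, byte-based accounting, and the client names are assumptions, not Bigtable's actual implementation.

```python
class QuotaExceeded(Exception):
    """Raised when a write would push a client past its storage limit."""


class SharedTableQuota:
    """Hypothetical per-client storage accounting for one shared table."""

    def __init__(self, limits_bytes: dict[str, int]) -> None:
        self.limits = dict(limits_bytes)
        self.used: dict[str, int] = {client: 0 for client in self.limits}

    def charge(self, client: str, nbytes: int) -> None:
        """Account for a write, or raise without recording anything."""
        if client not in self.limits:
            raise KeyError(f"no quota configured for client {client!r}")
        if self.used[client] + nbytes > self.limits[client]:
            raise QuotaExceeded(
                f"{client}: {self.used[client]} + {nbytes} > {self.limits[client]} bytes"
            )
        self.used[client] += nbytes

    def release(self, client: str, nbytes: int) -> None:
        """Credit storage back when a client's cells are deleted."""
        self.used[client] = max(0, self.used[client] - nbytes)


if __name__ == "__main__":
    quota = SharedTableQuota({"search": 1_000, "settings": 100})
    quota.charge("settings", 80)
    try:
        quota.charge("settings", 30)  # would reach 110 > 100 bytes
    except QuotaExceeded as err:
        print("rejected:", err)
    quota.charge("search", 900)  # unaffected by the other group's usage
```

The isolation comes from keeping a separate counter per client: one group exhausting its allowance fails only its own writes, not those of other groups sharing the table.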
Notes: Sizes, compression ratios, locality group counts, and in-memory percentages reflect the paper's Table 2 and the descriptions in its "Real Applications" section. A dash (–) in the compression column marks a table with compression disabled.
