Applications

Bigtable backs many Google products, from latency-sensitive serving to throughput-heavy batch jobs. This page covers the three case studies described in the paper (Google Analytics, Google Earth, and Personalized Search), together with the relevant characteristics from Table 2.

1. Production Snapshot (from Table 2)

Project | Table size (TB, pre-compression) | Compression ratio | # Cells (billions) | # Families | # Locality groups | % in memory | Latency-sensitive?
Google Analytics | 20 | 29% | 10 | 1 | 1 | 0% | Yes
Google Analytics | 200 | 14% | 80 | 1 | 1 | 0% | Yes
Google Earth | 0.5 | 64% | 8 | 7 | 2 | 33% | Yes
Google Earth | 70 | – | 9 | 8 | 3 | 0% | No
Personalized Search | 4 | 47% | 6 | 93 | 11 | 5% | Yes
Extracted directly from the paper’s Table 2. “–” indicates compression disabled.
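
As a rough worked example, the stored size of each table can be estimated by multiplying its pre-compression size by its compression ratio, where the ratio is the compressed size as a fraction of the original. The sketch below is illustrative only: the row labels and the `on_disk_tb` helper are made up here, and the estimate ignores GFS replication and index overhead.

```python
# Rows from Table 2: (label, size in TB before compression, compression ratio).
# A ratio of None stands for the "–" entry, i.e. compression disabled.
TABLES = [
    ("Google Analytics (summary)", 20.0, 0.29),
    ("Google Analytics (raw clicks)", 200.0, 0.14),
    ("Google Earth (serving index)", 0.5, 0.64),
    ("Google Earth (imagery)", 70.0, None),
    ("Personalized Search", 4.0, 0.47),
]


def on_disk_tb(size_tb, ratio):
    """Estimate stored size in TB; uncompressed tables keep their full size."""
    return size_tb if ratio is None else size_tb * ratio


for name, size_tb, ratio in TABLES:
    print(f"{name}: about {on_disk_tb(size_tb, ratio):.2f} TB stored")
```

Under these assumptions the 200 TB raw click table occupies roughly 28 TB after compression, while the 70 TB imagery table stays at 70 TB.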

2. Google Analytics

Raw → Aggregations → Summary

Webmasters embed a small JavaScript snippet in their pages; it records page views and events. The system stores raw sessions and periodically produces per-site summaries. Summaries power dashboards and reporting; overall throughput is limited by the throughput of GFS.

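  • Raw click table (~200 TB): one row per end-user session; the row key is the tuple (site, session_time), so sessions for the same site are contiguous and sorted chronologically.
  • Summary table (~20 TB): predefined per-site summaries, generated from the raw click table by periodically scheduled MapReduce jobs.
  • Compression: the raw table compresses to ≈14% of its original size, the summary table to ≈29%.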
Access: range scans + targeted lookups
Writes: heavy ingest with group commit
Latency: user-facing reports (low-ms on summaries)
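
The (site, session_time) row key can be illustrated with a small sketch. The encoding below is an assumption, not the format Google Analytics actually uses: it joins the site name and a zero-padded Unix timestamp with `#` (and assumes site names never contain `#`), so that plain string order keeps each site's sessions together and in time order. A sorted Python list stands in for Bigtable's sorted row space.

```python
import bisect


def session_row_key(site: str, session_start: int) -> str:
    """Hypothetical key: site, '#', then a 10-digit zero-padded Unix timestamp.

    Zero-padding makes lexicographic order match chronological order.
    """
    return f"{site}#{session_start:010d}"


def scan_site(sorted_keys, site, start_ts, end_ts):
    """Return keys for one site with start_ts <= session time < end_ts."""
    lo = bisect.bisect_left(sorted_keys, session_row_key(site, start_ts))
    hi = bisect.bisect_left(sorted_keys, session_row_key(site, end_ts))
    return sorted_keys[lo:hi]


# Sessions arrive in arbitrary order; sorting mimics the table's row order.
keys = sorted(
    session_row_key(site, ts)
    for site, ts in [
        ("example.org", 1_000_000_300),
        ("example.com", 1_000_000_200),
        ("example.org", 1_000_000_100),
        ("example.com", 1_000_000_050),
    ]
)

early_com_sessions = scan_site(keys, "example.com", 1_000_000_000, 1_000_000_100)
```

Because the site is the key prefix, a per-site report becomes one contiguous range scan rather than a scatter of point reads.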

3. Google Earth

Spatial Locality for Tiled Imagery

Raw imagery is preprocessed into geographic tiles in one table; serving uses a compact index in another table. The index uses in-memory families and many tablet servers to meet very low-latency read targets during panning/zooming.

  • Imagery table (~70 TB): one row per geographic segment, named so that adjacent segments are stored near each other; Bigtable compression is disabled because the source imagery is already compressed.
  • Serving index (~0.5 TB): memory-resident families, high QPS; keys reflect (zoom, lat, lon).
  • ETL: MapReduce pipelines for cleaning, tiling, and loading.
Access: point reads by tile key
Writes: batch loads during imagery updates
Latency: highly latency-sensitive (map UX)
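
One way to get spatially local tile keys is a quadkey, which interleaves the bits of a tile's x and y coordinates into a base-4 string. The paper does not publish Google Earth's key format, so this is a swapped-in illustration rather than the real scheme; `quadkey` and `children` are hypothetical helpers, and integer tile coordinates stand in for latitude and longitude.

```python
def quadkey(tile_x: int, tile_y: int, zoom: int) -> str:
    """Interleave the bits of tile_x and tile_y into a base-4 string of length zoom."""
    if not (0 <= tile_x < 2**zoom and 0 <= tile_y < 2**zoom):
        raise ValueError("tile coordinates out of range for this zoom level")
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def children(tile_x: int, tile_y: int, zoom: int):
    """The four tiles one zoom level deeper that cover the same area."""
    return [
        (2 * tile_x + dx, 2 * tile_y + dy, zoom + 1)
        for dy in (0, 1)
        for dx in (0, 1)
    ]


parent_key = quadkey(3, 5, 3)
child_keys = sorted(quadkey(*child) for child in children(3, 5, 3))
```

Every child key starts with its parent's key, so all tiles inside one region sort into a single contiguous key range.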

4. Personalized Search

Per-User Activity with Versioned Cells

Opt-in histories capture queries, clicks, and preferences. Each user is a row keyed by userid; column families separate action types, with timestamps as versions. Profiles are built by batch jobs and used at serve time.

  • Schema: families like queries, clicks, prefs; timestamps = action time.
  • Consistency: single-row atomic updates; multi-cluster replication for availability.
  • Ops: quotas on shared tables to bound per-client usage.
Access: per-user lookups, short range scans
Writes: steady stream of small updates
Latency: on the user-visible path
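
The row-per-user layout with timestamped versions can be mimicked with a toy in-memory structure. This is a sketch under simplifying assumptions, not a Bigtable client: `UserActivityTable` and its methods are invented for illustration, and column qualifiers are omitted so each family holds a single versioned cell list.

```python
from collections import defaultdict


class UserActivityTable:
    """Toy stand-in for a table with one row per user, keyed by userid."""

    def __init__(self):
        # row key -> column family -> list of (timestamp, value)
        self._rows = defaultdict(lambda: defaultdict(list))

    def record(self, userid: str, family: str, timestamp: int, value: str) -> None:
        """Store one action, using the action time as the cell version."""
        cells = self._rows[userid][family]
        cells.append((timestamp, value))
        # Keep newest first, mirroring Bigtable's decreasing-timestamp order.
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def latest(self, userid: str, family: str, n: int = 1):
        """Return up to n most recent values for one user and action type."""
        row = self._rows.get(userid)
        if row is None:
            return []
        return [value for _, value in row.get(family, [])[:n]]


table = UserActivityTable()
table.record("user42", "queries", 1_700_000_100, "bigtable paper")
table.record("user42", "queries", 1_700_000_300, "gfs throughput")
table.record("user42", "clicks", 1_700_000_200, "https://example.com/a")

recent_queries = table.latest("user42", "queries", n=2)
```

Keeping each action type in its own family means a serve-time lookup can read only the families it needs from a single row.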

Notes: Sizes, compression ratios, locality group counts, and in-memory percentages reflect the paper’s Table 2 and the “Real Applications” section descriptions.