Data Model – big table article

1) Definition & Semantics

Formal definition:

(row: string, column: string, time: int64) → bytes

Row keys define lexicographic order and tablet boundaries (key ranges). Column keys are written as family:qualifier. Values are arbitrary byte strings.

Atomicity: mutations and reads are atomic per row.
Ordering: data is sorted lexicographically by row key.
Sparsity: nonexistent columns or cells take no storage.
Versions: multiple versions per cell, identified by timestamps.

2) Rows & Row Key Design

Row keys are arbitrary byte strings (typically 10–100 bytes) that define locality and load distribution.

Group related data: reversed-domain for URLs (e.g., com.google.maps/index.html).
Avoid hotspots: use salting/hash prefixes for high-throughput IDs.
Efficient scans: time bucketing (e.g., YYYYMMDD#user) for range queries.

Row-level atomicity Lexicographic ordering Tablet splits

3) Column Families & Qualifiers

Columns belong to families, which are the basic units for access control and storage tuning. Families are few and fixed; qualifiers are dynamic.

family:qualifier anchor:cnnsi.com → “Sports anchor text” contents: → “<html>…</html>” @ t_i language:id → “en”

Per-family policies: compression, caching, Bloom filters, max_versions, TTL.
Locality: design schema so frequently co-accessed data is colocated.

4) Versions, Timestamps & Retention

Each cell may have multiple versions stored in descending timestamp order. Reads return the newest version unless a timestamp or time range is specified.

Timestamps: 64-bit integers, assigned by the system or client.
Retention: per-family limits such as keep last N or keep newer than X days.
Compaction: obsolete versions are pruned during background merges.

// Example (pseudo API) put(row=”com.cnn.www”, mutations=[ set(“anchor:www.c-span.org”,”CNN”, ts=1692), delete(“anchor:www.abc.com”, ts=1692) ]) get(row=”com.cnn.www”, columns=[“anchor:*”], return_all_versions=true)

5) Mapping to Physical Storage

The logical map is persisted via an in-memory memtable and on-disk immutable SSTables stored in GFS. Each tablet (a row-key range) is the unit of load balancing and parallelism.

Write path: commit log → memtable → flush → SSTables → compactions.
Read path: merge memtable + SSTables with block cache & Bloom filters.
Locality groups: group families for co-access and selective caching.

Design insight: row-key, family, and locality-group choices translate access patterns directly into performance and storage efficiency.