Data Then vs Now: The Shift from Data Warehouses to Streaming Data

Author

Shivam Dhawan

Date Published


Data analytics has changed shape three times.

Data’s New Physics: From Stored Truth to Moving Signals

Not in theory. In the way it moves, where it lives, and how we treat it.

A simple mental model we use for this article:

  • Data used to be a well.
  • Then it became a lake.
  • Now it behaves like streams.

Each era comes with its own “treatment plant”: its own ETL-style flow, expectations, and failure modes.


1) The Well Era (Past): scarce, centralized, controlled

A well is deep, narrow, and intentional.

You don’t pull water unless you have a reason.

You don’t let everyone build their own plumbing.

Where data lived

  • Relational databases, data warehouses
  • Carefully modeled schemas
  • Central BI team as gatekeeper

Typical treatment (ETL flow)

  1. Extract from operational systems (ERP/CRM)
  2. Transform into strict business models (star schemas, dimensions)
  3. Load into a warehouse where queries are “safe”
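The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a real warehouse loader; the table shapes, the `status` filter, and the surrogate-key dimension are all hypothetical.

```python
def extract(operational_rows):
    """Extract: pull rows from an operational system (ERP/CRM)."""
    return [r for r in operational_rows if r.get("status") == "complete"]

def transform(rows, customer_dim):
    """Transform: conform rows to a strict star-schema fact table,
    resolving each customer to a surrogate key in the dimension."""
    facts = []
    for r in rows:
        key = customer_dim.get(r["customer_id"])
        if key is None:
            continue  # unmodeled customer: reject rather than load bad data
        facts.append({"customer_key": key, "amount": r["amount"]})
    return facts

def load(facts, warehouse):
    """Load: append only fully modeled rows, so queries stay 'safe'."""
    warehouse.extend(facts)

# usage: only the complete, fully resolved order reaches the warehouse
warehouse = []
customer_dim = {"c1": 101, "c2": 102}  # customer_id -> surrogate key
source = [
    {"customer_id": "c1", "amount": 40.0, "status": "complete"},
    {"customer_id": "c9", "amount": 99.0, "status": "complete"},  # unmodeled
    {"customer_id": "c2", "amount": 15.0, "status": "pending"},
]
load(transform(extract(source), customer_dim), warehouse)
```

Note how the strictness lives in `transform`: anything that doesn't fit the model is dropped before it can become “official.”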

Strengths

  • High trust and consistency
  • Clear definitions and governance

Trade-offs

  • Slow to change
  • Data becomes “official” only after it’s fully processed

The well optimized for certainty.

 

2) The Lake Era (Present-ish): abundant, flexible, messy

A lake is wide and full of water from many sources.

You can drop in logs, events, files, tables — and decide later what it means.

Where data lives

  • Object storage + lakehouse patterns
  • Semi-structured data (JSON, Parquet)
  • Many producers, many consumers

Typical treatment (ELT-leaning flow)

  1. Extract everything (raw ingestion)
  2. Load into a lake (raw, bronze)
  3. Transform downstream into curated layers (silver/gold)
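The load-first flow above can be sketched the same way. The bronze/silver/gold layer names follow the common medallion convention; the record shapes and the dedupe-by-`id` rule are hypothetical.

```python
import json

def ingest_raw(payloads, bronze):
    """Load first: land every payload untouched, schema or not."""
    bronze.extend(payloads)

def to_silver(bronze):
    """Transform downstream: parse, drop malformed rows, dedupe by id."""
    seen, silver = set(), []
    for raw in bronze:
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue  # bad rows stay in bronze; silver is curated
        if rec.get("id") in seen:
            continue
        seen.add(rec["id"])
        silver.append(rec)
    return silver

def to_gold(silver):
    """Aggregate curated rows into a business-ready metric."""
    return {"total_amount": sum(r.get("amount", 0) for r in silver)}

# usage: everything lands in bronze, including junk and duplicates
bronze = []
ingest_raw(['{"id": 1, "amount": 10}', 'not-json',
            '{"id": 1, "amount": 10}', '{"id": 2, "amount": 5}'], bronze)
gold = to_gold(to_silver(bronze))
```

The key inversion versus the well era: `ingest_raw` accepts everything, and the meaning-making (`to_silver`, `to_gold`) is deferred until someone needs it.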

Strengths

  • Fast ingestion, flexible exploration
  • Supports many analytics use cases without upfront modeling

Trade-offs

  • Without discipline, you get a swamp:
    • unclear ownership
    • inconsistent definitions
    • “raw forever” datasets nobody trusts

The lake optimized for optionality.

 

3) The Stream Era (Now): continuous, time-sensitive, always in motion

A stream is not stored first and analyzed later.

It’s processed as it flows.

Latency becomes part of the product.

Where data lives

  • Event pipelines, streaming platforms
  • Real-time feature stores, operational analytics
  • Metrics and monitoring become first-class citizens

Typical treatment (Streaming ETL)

  1. Capture events as they happen (CDC / event tracking)
  2. Validate and enrich in motion (schemas, contracts, joins)
  3. Route to multiple sinks:
    • warehouse/lake for history
    • serving systems for real-time decisions
    • alerting for operations
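A per-event version of the flow above, as a sketch: validate against a contract, enrich in motion, fan out to sinks. The required-field contract, the reference lookup, and the sink names are all hypothetical.

```python
REQUIRED_FIELDS = {"event_id", "user_id", "ts"}

def process(event, reference, sinks):
    """Handle one event as it arrives: validate, enrich, route."""
    # 1) Validate against the schema contract at the source.
    if not REQUIRED_FIELDS <= event.keys():
        sinks["dead_letter"].append(event)
        return
    # 2) Enrich in motion (a stream-table join against reference data).
    enriched = {**event, "country": reference.get(event["user_id"], "unknown")}
    # 3) Route to multiple sinks.
    sinks["lake"].append(enriched)       # warehouse/lake for history
    sinks["serving"].append(enriched)    # real-time decisions
    if enriched["country"] == "unknown":
        sinks["alerts"].append(enriched) # operations

# usage: a valid event, an unenrichable one, and a contract violation
sinks = {"lake": [], "serving": [], "alerts": [], "dead_letter": []}
reference = {"u1": "IN"}
for ev in [{"event_id": "e1", "user_id": "u1", "ts": 1},
           {"event_id": "e2", "user_id": "u2", "ts": 2},
           {"user_id": "u1"}]:  # missing fields -> dead letter
    process(ev, reference, sinks)
```

Notice that governance has shifted left here: the contract check runs on every event before anything downstream sees it, and failures are routed rather than silently dropped.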

Strengths

  • Real-time feedback loops (fraud, personalization, monitoring)
  • Closer alignment between data and product behavior

Trade-offs

  • Harder debugging (time windows, out-of-order events)
  • Governance must shift “left” (contracts at the source)
  • Observability becomes mandatory, not optional

The stream optimized for responsiveness.

 

The quiet point

These aren’t replacements. They stack.

Most mature systems are hybrid:

  • Streams for immediacy
  • Lakes for breadth and history
  • Wells (warehouses/models) for trusted business truth

The question isn’t “which one is best?”

It’s: What kind of water are we dealing with, and how quickly do we need it to be safe to drink?