Materialize
A real-time data integration platform that continuously maintains SQL views over streaming data — always-fresh results with no polling, caching, or custom pipelines.
Materialize is a streaming SQL database written in Rust. Rather than running
queries against a snapshot of your data, Materialize continuously maintains the
results of your SQL views as the underlying data changes — so every SELECT
returns a correct, up-to-date answer with millisecond latency, without polling,
cache invalidation, or custom pipeline code.
It connects to streaming sources like Kafka and Redpanda, or directly to PostgreSQL and MySQL via CDC (Change Data Capture), and keeps materialized views incrementally updated using a dataflow engine built on Timely Dataflow and Differential Dataflow.
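The core idea behind that incremental maintenance can be sketched outside Materialize: Differential Dataflow represents changes as (record, diff) pairs, and an aggregate stays current by applying each diff rather than rescanning the dataset. A minimal Python sketch of that idea (function and variable names are illustrative, not Materialize APIs):

```python
from collections import defaultdict

def apply_updates(totals, counts, updates):
    """Maintain SUM(amount) and COUNT(*) per customer from a change stream.

    Each update is (customer_id, amount, diff), where diff is +1 for an
    insert and -1 for a delete -- the multiplicity convention used by
    Differential Dataflow. Work done is O(changes), not O(dataset size).
    """
    for customer_id, amount, diff in updates:
        totals[customer_id] += amount * diff
        counts[customer_id] += diff
    return totals, counts

totals = defaultdict(float)
counts = defaultdict(int)

# Initial inserts
apply_updates(totals, counts, [(42, 10.0, +1), (42, 5.0, +1), (7, 3.0, +1)])
# A correction: retract one order and insert its replacement
apply_updates(totals, counts, [(42, 5.0, -1), (42, 6.0, +1)])

print(totals[42], counts[42])  # 16.0 2
```

Deletes and updates cost the same as inserts, which is why a query against the maintained result never waits on recomputation.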
Features
- Always-fresh SQL — materialized views are updated incrementally as source data changes; queries always return correct, consistent results without manual refresh
- Standard SQL — supports a comprehensive PostgreSQL-dialect SQL including multi-way joins, subqueries, aggregations, window functions, and CTEs
- Streaming sources — native connectors for Kafka, Redpanda, Kinesis, and Postgres/MySQL CDC; no ETL pipeline required
- `SUBSCRIBE` — push changes from a view to connected clients in real time as they happen, eliminating the need for polling
- Delta joins — avoid the intermediate-state explosion of naive nested-loop joins; handle joins of up to 64 relations efficiently
- PostgreSQL wire protocol — connect with `psql`, any Postgres client library, JDBC, or BI tools like Grafana, Metabase, and dbt without modification
- High availability — multi-active replication and horizontal scaling in the managed cloud offering
- Embedded sources — built-in load generators (`LOAD GENERATOR TPCH`, `LOAD GENERATOR COUNTER`) for testing and benchmarking without external dependencies
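For intuition on how an incremental join avoids recomputation: each input keeps an index on the join key, and an arriving row probes only the other input's index, emitting just the new matches. A rough Python sketch of the two-input case (illustrative only, not Materialize internals):

```python
from collections import defaultdict

class IncrementalJoin:
    """Maintain orders JOIN customers ON customer_id incrementally.

    Each side keeps an index keyed by the join key; a new row on one
    side probes only the other side's index, emitting just the new
    matches instead of re-running the full join.
    """
    def __init__(self):
        self.orders_by_cust = defaultdict(list)
        self.customers_by_id = {}
        self.output = []

    def insert_order(self, order_id, customer_id, amount):
        self.orders_by_cust[customer_id].append((order_id, amount))
        cust = self.customers_by_id.get(customer_id)
        if cust is not None:  # probe the customers index
            self.output.append((order_id, amount, cust))

    def insert_customer(self, customer_id, name):
        self.customers_by_id[customer_id] = name
        for order_id, amount in self.orders_by_cust[customer_id]:
            self.output.append((order_id, amount, name))  # probe the orders index

j = IncrementalJoin()
j.insert_order(1, 42, 10.0)   # no matching customer yet -> no output
j.insert_customer(42, "Ada")  # matches the buffered order
j.insert_order(2, 42, 5.0)    # matches immediately
print(j.output)  # [(1, 10.0, 'Ada'), (2, 5.0, 'Ada')]
```

Delta joins generalize this probing pattern to many inputs while avoiding materialized intermediate results between them.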
Installation
Materialize is available as a fully managed cloud service and as a self-hosted Community Edition:
# Try the managed cloud service (free trial)
# https://materialize.com/register
# Self-hosted: download the binary
curl -L https://materialize.com/downloads/mz | sh
# Or via Docker
docker pull materialize/materialized
docker run -p 6875:6875 -p 6876:6876 materialize/materialized
Connect with psql:
psql -h localhost -p 6875 materialize
Core Concepts
Sources
A source connects Materialize to an upstream data system and continuously ingests changes:
-- Connect to a Kafka topic
CREATE SOURCE orders_raw
FROM KAFKA CONNECTION my_kafka (TOPIC 'orders')
FORMAT JSON;
-- Ingest all tables from a PostgreSQL database via CDC
CREATE SOURCE pg_source
FROM POSTGRES CONNECTION my_pg (PUBLICATION 'mz_source')
FOR ALL TABLES;
-- Built-in load generator for testing
CREATE SOURCE auction_house
FROM LOAD GENERATOR AUCTION
FOR ALL TABLES;
Materialized Views
Define a view and Materialize keeps it continuously up to date:
-- A simple aggregation that stays current
CREATE MATERIALIZED VIEW order_totals AS
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id;
-- A multi-source join — results update whenever either source changes
CREATE MATERIALIZED VIEW enriched_orders AS
SELECT
o.order_id,
o.amount,
c.name AS customer_name,
c.email
FROM orders o
JOIN customers c ON o.customer_id = c.id;
Indexes
Creating an index on a materialized view keeps its results in memory, enabling instant point lookups:
CREATE INDEX ON order_totals (customer_id);
-- Now this query returns in <1ms regardless of dataset size
SELECT * FROM order_totals WHERE customer_id = 42;
SUBSCRIBE — Real-time change streams
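SUBSCRIBE delivers each change to a view as a row tagged with mz_timestamp and mz_diff (+1 for an insert, -1 for a retraction); a client mirrors the view by applying those diffs to local state, and an update arrives as a retraction plus an insert. A minimal Python sketch of that client-side loop, using hand-written sample rows in place of a live connection (the row shape assumes a view like order_totals):

```python
def apply_subscribe_rows(state, rows):
    """Fold SUBSCRIBE output into a local copy of a view.

    Each row is (mz_timestamp, mz_diff, customer_id, order_count,
    total_spent). The state maps each view row to its multiplicity;
    rows whose multiplicity drops to zero have left the view.
    """
    for ts, diff, customer_id, order_count, total_spent in rows:
        key = (customer_id, order_count, total_spent)
        state[key] = state.get(key, 0) + diff
        if state[key] == 0:
            del state[key]
    return state

state = {}
# Snapshot: customer 42 currently has 2 orders totalling 15.0
apply_subscribe_rows(state, [(100, +1, 42, 2, 15.0)])
# A third order arrives: the old row is retracted, the new one inserted
apply_subscribe_rows(state, [(200, -1, 42, 2, 15.0), (200, +1, 42, 3, 21.0)])
print(sorted(state))  # [(42, 3, 21.0)]
```

The SQL below shows the server side of the same exchange.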
-- Subscribe to changes in a view; rows stream to your client as they update
SUBSCRIBE TO order_totals;
-- Subscribe with a snapshot of current state, then follow changes
SUBSCRIBE TO enriched_orders WITH (SNAPSHOT = true);
Connecting with psql
# Connect to a local instance
psql -h localhost -p 6875 materialize
# Connect to Materialize Cloud
psql "postgresql://user@host.materialize.cloud:6875/materialize"
Once connected, standard SQL works as expected:
-- Show all sources
SHOW SOURCES;
-- Show all materialized views
SHOW MATERIALIZED VIEWS;
-- Query a view — always returns fresh results
SELECT * FROM order_totals ORDER BY total_spent DESC LIMIT 10;
-- Explain the dataflow plan for a query
EXPLAIN SELECT * FROM enriched_orders WHERE customer_id = 42;
Use Cases
- Operational dashboards — live metrics that update with sub-second latency, no scheduled refreshes
- Fraud and anomaly detection — evaluate rules continuously against a stream of transactions
- AI / RAG pipelines — keep context fresh for retrieval-augmented generation without batch recomputation
- Feature stores — serve always-current feature values to ML inference endpoints
- Data mesh — expose live, consistent data products to downstream consumers via SQL
Why not just use a streaming framework?
Tools like Apache Flink or Spark Streaming can do continuous computation, but they require you to write dataflow code in Java/Scala/Python and manage state explicitly. Materialize lets you express the same logic as standard SQL — a language any engineer or analyst already knows — and handles the incremental maintenance, consistency, and fault tolerance automatically.