Materialize
A real-time data integration platform that continuously maintains SQL views over streaming data — always-fresh results with no polling, caching, or custom pipelines.
Materialize is a streaming SQL database written in Rust. Rather than running
queries against a snapshot of your data, Materialize continuously maintains the
results of your SQL views as the underlying data changes — so every SELECT
returns a correct, up-to-date answer with millisecond latency, without polling,
cache invalidation, or custom pipeline code.
It connects to streaming sources like Kafka and Redpanda, or directly to PostgreSQL and MySQL via CDC (Change Data Capture), and keeps materialized views incrementally updated using a dataflow engine built on Timely Dataflow and Differential Dataflow.
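The core idea behind that incremental maintenance can be sketched outside Materialize: Differential Dataflow represents changes as (record, diff) pairs, and an aggregate stays current by applying each diff rather than rescanning the dataset. A minimal Python sketch of that idea (function and variable names are illustrative, not Materialize APIs):

```python
from collections import defaultdict

def apply_updates(totals, counts, updates):
    """Maintain SUM(amount) and COUNT(*) per customer from a change stream.

    Each update is (customer_id, amount, diff), where diff is +1 for an
    insert and -1 for a delete -- the multiplicity convention used by
    Differential Dataflow. Work done is O(changes), not O(dataset size).
    """
    for customer_id, amount, diff in updates:
        totals[customer_id] += amount * diff
        counts[customer_id] += diff
    return totals, counts

totals = defaultdict(float)
counts = defaultdict(int)

# Initial inserts
apply_updates(totals, counts, [(42, 10.0, +1), (42, 5.0, +1), (7, 3.0, +1)])
# A correction: retract one order and insert its replacement
apply_updates(totals, counts, [(42, 5.0, -1), (42, 6.0, +1)])

print(totals[42], counts[42])  # 16.0 2
```

Deletes and updates cost the same as inserts, which is why a query against the maintained result never waits on recomputation.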
Features
- Always-fresh SQL — materialized views are updated incrementally as source data changes; queries always return correct, consistent results without manual refresh
- Standard SQL — supports a comprehensive PostgreSQL-dialect SQL including multi-way joins, subqueries, aggregations, window functions, and CTEs
- Streaming sources — native connectors for Kafka, Redpanda, Kinesis, and Postgres/MySQL CDC; no ETL pipeline required
- `SUBSCRIBE` — push changes from a view to connected clients in real time as they happen, eliminating the need for polling
- Delta joins — avoid the intermediate-state explosion of naive nested-loop joins; handle joins of up to 64 relations efficiently
- PostgreSQL wire protocol — connect with `psql`, any Postgres client library, JDBC, or BI tools like Grafana, Metabase, and dbt without modification
- High availability — multi-active replication and horizontal scaling in the managed cloud offering
- Embedded sources — built-in load generators (`LOAD GENERATOR TPCH`, `LOAD GENERATOR COUNTER`) for testing and benchmarking without external dependencies
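For intuition on how an incremental join avoids recomputation: each input keeps an index on the join key, and an arriving row probes only the other input's index, emitting just the new matches. A rough Python sketch of the two-input case (illustrative only, not Materialize internals):

```python
from collections import defaultdict

class IncrementalJoin:
    """Maintain orders JOIN customers ON customer_id incrementally.

    Each side keeps an index keyed by the join key; a new row on one
    side probes only the other side's index, emitting just the new
    matches instead of re-running the full join.
    """
    def __init__(self):
        self.orders_by_cust = defaultdict(list)
        self.customers_by_id = {}
        self.output = []

    def insert_order(self, order_id, customer_id, amount):
        self.orders_by_cust[customer_id].append((order_id, amount))
        cust = self.customers_by_id.get(customer_id)
        if cust is not None:  # probe the customers index
            self.output.append((order_id, amount, cust))

    def insert_customer(self, customer_id, name):
        self.customers_by_id[customer_id] = name
        for order_id, amount in self.orders_by_cust[customer_id]:
            self.output.append((order_id, amount, name))  # probe the orders index

j = IncrementalJoin()
j.insert_order(1, 42, 10.0)   # no matching customer yet -> no output
j.insert_customer(42, "Ada")  # matches the buffered order
j.insert_order(2, 42, 5.0)    # matches immediately
print(j.output)  # [(1, 10.0, 'Ada'), (2, 5.0, 'Ada')]
```

Delta joins generalize this probing pattern to many inputs while avoiding materialized intermediate results between them.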
Installation
Materialize is available as a fully managed cloud service and as a self-hosted Community Edition:
# Try the managed cloud service (free trial)
# https://materialize.com/register
# Self-hosted: download the binary
curl -L https://materialize.com/downloads/mz | sh
# Or via Docker
docker pull materialize/materialized
docker run -p 6875:6875 -p 6876:6876 materialize/materialized
Connect with psql:
psql -h localhost -p 6875 materialize
Core Concepts
Sources
A source connects Materialize to an upstream data system and continuously ingests changes:
-- Connect to a Kafka topic
CREATE SOURCE orders_raw
FROM KAFKA CONNECTION my_kafka (TOPIC 'orders')
FORMAT JSON;
-- Ingest all tables from a PostgreSQL database via CDC
CREATE SOURCE pg_source
FROM POSTGRES CONNECTION my_pg (PUBLICATION 'mz_source')
FOR ALL TABLES;
-- Built-in load generator for testing
CREATE SOURCE auction_house
FROM LOAD GENERATOR AUCTION
FOR ALL TABLES;
Materialized Views
Define a view and Materialize keeps it continuously up to date:
-- A simple aggregation that stays current
CREATE MATERIALIZED VIEW order_totals AS
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id;
-- A multi-source join — results update whenever either source changes
CREATE MATERIALIZED VIEW enriched_orders AS
SELECT
o.order_id,
o.amount,
c.name AS customer_name,
c.email
FROM orders o
JOIN customers c ON o.customer_id = c.id;
Indexes
Creating an index on a materialized view keeps its results in memory, enabling instant point lookups:
CREATE INDEX ON order_totals (customer_id);
-- Now this query returns in <1ms regardless of dataset size
SELECT * FROM order_totals WHERE customer_id = 42;
SUBSCRIBE — Real-time change streams
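SUBSCRIBE delivers each change to a view as a row tagged with mz_timestamp and mz_diff (+1 for an insert, -1 for a retraction); a client mirrors the view by applying those diffs to local state, and an update arrives as a retraction plus an insert. A minimal Python sketch of that client-side loop, using hand-written sample rows in place of a live connection (the row shape assumes a view like order_totals):

```python
def apply_subscribe_rows(state, rows):
    """Fold SUBSCRIBE output into a local copy of a view.

    Each row is (mz_timestamp, mz_diff, customer_id, order_count,
    total_spent). The state maps each view row to its multiplicity;
    rows whose multiplicity drops to zero have left the view.
    """
    for ts, diff, customer_id, order_count, total_spent in rows:
        key = (customer_id, order_count, total_spent)
        state[key] = state.get(key, 0) + diff
        if state[key] == 0:
            del state[key]
    return state

state = {}
# Snapshot: customer 42 currently has 2 orders totalling 15.0
apply_subscribe_rows(state, [(100, +1, 42, 2, 15.0)])
# A third order arrives: the old row is retracted, the new one inserted
apply_subscribe_rows(state, [(200, -1, 42, 2, 15.0), (200, +1, 42, 3, 21.0)])
print(sorted(state))  # [(42, 3, 21.0)]
```

The SQL below shows the server side of the same exchange.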
-- Subscribe to changes in a view; rows stream to your client as they update
SUBSCRIBE TO order_totals;
-- Subscribe with a snapshot of current state, then follow changes
SUBSCRIBE TO enriched_orders WITH (SNAPSHOT = true);
Connecting with psql
# Connect to a local instance
psql -h localhost -p 6875 materialize
# Connect to Materialize Cloud
psql "postgresql://user@host.materialize.cloud:6875/materialize"
Once connected, standard SQL works as expected:
-- Show all sources
SHOW SOURCES;
-- Show all materialized views
SHOW MATERIALIZED VIEWS;
-- Query a view — always returns fresh results
SELECT * FROM order_totals ORDER BY total_spent DESC LIMIT 10;
-- Explain the dataflow plan for a query
EXPLAIN SELECT * FROM enriched_orders WHERE customer_id = 42;
Use Cases
- Operational dashboards — live metrics that update with sub-second latency, no scheduled refreshes
- Fraud and anomaly detection — evaluate rules continuously against a stream of transactions
- AI / RAG pipelines — keep context fresh for retrieval-augmented generation without batch recomputation
- Feature stores — serve always-current feature values to ML inference endpoints
- Data mesh — expose live, consistent data products to downstream consumers via SQL
Why not just use a streaming framework?
Tools like Apache Flink or Spark Streaming can do continuous computation, but they require you to write dataflow code in Java/Scala/Python and manage state explicitly. Materialize lets you express the same logic as standard SQL — a language any engineer or analyst already knows — and handles the incremental maintenance, consistency, and fault tolerance automatically.