# Databend

An open-source, cloud-native data warehouse written in Rust — a high-performance alternative to Snowflake built for object storage.
Databend is an open-source, cloud-native data warehouse written in Rust. It is designed as a modern alternative to Snowflake — storing all data on object storage (S3, GCS, Azure Blob, or any S3-compatible service), separating compute from storage, and delivering interactive query performance at a fraction of the cost of traditional data warehouses.
It speaks standard SQL with a MySQL-compatible wire protocol, so existing BI tools, SQL clients, and data pipelines connect without modification.
## Features
- Object storage native — all data lives on S3 (or compatible); compute nodes are fully stateless and can scale to zero
- MySQL-compatible protocol — connect with any MySQL client, JDBC/ODBC driver, or tool that supports MySQL without additional drivers
- Vectorized query execution — a SIMD-accelerated columnar execution engine with Apache Arrow as the in-memory format
- Apache Parquet on disk — data is stored as Parquet files on object storage, making it portable and interoperable with the rest of the data ecosystem
- Semi-structured data — first-class support for JSON, Parquet, CSV, TSV, ORC, and Avro — query nested JSON directly with SQL
- Data sharing — share live, read-only datasets across accounts without copying data
- Time travel — query historical snapshots of your data at any point in the retention window
- Continuous data loading — auto-ingest files from S3 buckets as they arrive using stage-based pipelines
- Full-text search — built-in inverted index for fast keyword search within SQL queries
- Serverless option — Databend Cloud offers a fully managed, pay-per-query serverless experience
## Installation

### Docker (quickest for local testing)

Docker works on all Linux distributions, including Debian, Ubuntu, and Fedora.

```shell
docker pull datafuselabs/databend
docker run -p 3307:3307 -p 8000:8000 datafuselabs/databend
```
Connect with any MySQL client:

```shell
mysql -h 127.0.0.1 -P 3307 -u root
```

### Binary

Download the latest release from the releases page. Pre-built `.deb` and `.rpm` packages are available there for Debian/Ubuntu and Fedora respectively, as well as plain tarballs for other Linux distributions.
```shell
# Debian / Ubuntu — download the .deb from the releases page, then:
dpkg -i databend-v1.2.725-x86_64-unknown-linux-gnu.deb

# Fedora — download the .rpm from the releases page, then:
rpm -i databend-v1.2.725-x86_64-unknown-linux-gnu.rpm

# Or extract the tarball directly
tar xzf databend-v1.2.725-x86_64-unknown-linux-gnu.tar.gz
./bin/databend-meta &
./bin/databend-query
```

### Databend Cloud

The fully managed cloud service handles infrastructure, scaling, and upgrades automatically:

```shell
# Install the Python SQLAlchemy driver for Databend
pip install databend-sqlalchemy
# or use any MySQL-compatible client
```

## Usage
### Connecting

```shell
# MySQL client
mysql -h 127.0.0.1 -P 3307 -u root -e "SELECT version();"

# bendsql — the official Databend CLI client
cargo install bendsql
bendsql --host 127.0.0.1 --port 8000
```

### Creating tables and loading data
```sql
-- Create a database and table
CREATE DATABASE sales;

CREATE TABLE sales.orders (
    order_id INT,
    customer VARCHAR,
    amount DECIMAL(10, 2),
    region VARCHAR,
    order_date DATE
);

-- Load from a staged file on S3
COPY INTO sales.orders
FROM 's3://my-bucket/orders/'
FILE_FORMAT = (TYPE = 'PARQUET');

-- Or load from a local file via stage
PUT fs:///path/to/orders.parquet @my_stage;
COPY INTO sales.orders FROM @my_stage FILE_FORMAT = (TYPE = 'PARQUET');
```

### Querying
```sql
-- Standard analytical queries
SELECT
    region,
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS order_count,
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order_value
FROM sales.orders
WHERE order_date >= '2024-01-01'
GROUP BY region, month
ORDER BY month, total_revenue DESC;
```
```sql
-- Query JSON columns directly
SELECT
    raw_data:user.name::VARCHAR AS user_name,
    raw_data:event.type::VARCHAR AS event_type
FROM events
WHERE raw_data:user.country = 'US';
```
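The `raw_data:user.name` path syntax walks nested JSON fields. Conceptually, the lookup behaves roughly like this simplified Python sketch (an illustration of the semantics, not Databend's implementation):

```python
import json

def extract_path(doc, path):
    """Walk a dotted path like 'user.name' through nested dicts,
    returning None when any key is missing (akin to SQL NULL)."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

raw_data = json.loads('{"user": {"name": "Ada", "country": "US"}, "event": {"type": "login"}}')
print(extract_path(raw_data, "user.name"))     # -> Ada
print(extract_path(raw_data, "user.missing"))  # -> None
```

The trailing `::VARCHAR` then casts the extracted variant value to a plain string.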
```sql
-- Full-text search within SQL
SELECT title, body
FROM articles
WHERE MATCH(body, 'rust programming language');
```

### Time travel
```sql
-- Query a table as it existed 1 hour ago
SELECT * FROM sales.orders AT (OFFSET => -3600);

-- Query at a specific snapshot
SELECT * FROM sales.orders AT (SNAPSHOT => 'abc123...');
```
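Since `OFFSET` is a number of seconds relative to the current time (negative for the past), a small helper can compute it from an absolute timestamp. This is a hypothetical convenience for building queries, not part of Databend:

```python
from datetime import datetime, timedelta, timezone

def at_offset(past: datetime) -> int:
    """Convert an absolute past timestamp into the negative seconds
    offset expected by an AT (OFFSET => n) clause."""
    return int((past - datetime.now(timezone.utc)).total_seconds())

one_hour_ago = datetime.now(timezone.utc) - timedelta(hours=1)
print(f"SELECT * FROM sales.orders AT (OFFSET => {at_offset(one_hour_ago)});")
```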
```sql
-- Restore a dropped table
UNDROP TABLE sales.orders;
```

### Continuous ingestion
```sql
-- Auto-ingest new files from S3 as they land
CREATE PIPE orders_pipe
AUTO_INGEST = TRUE
AS COPY INTO sales.orders
FROM 's3://my-bucket/orders/'
FILE_FORMAT = (TYPE = 'PARQUET');
```

## bendsql — the CLI client
bendsql is the official command-line client for Databend:
```shell
cargo install bendsql

# Connect to a local instance
bendsql

# Connect to Databend Cloud
bendsql --dsn "databend://user:password@host.databend.com:443/mydb?sslmode=enable"

# Run a query directly
bendsql --query "SELECT COUNT(*) FROM sales.orders"

# Execute a SQL file
bendsql < query.sql

# Output as CSV
bendsql --output csv --query "SELECT * FROM sales.orders LIMIT 100"
```

## Architecture
Databend separates three concerns that are coupled in traditional databases:
- Storage — raw data as Parquet files on object storage; no proprietary format lock-in
- Metadata — a lightweight catalog service tracking table schemas, snapshots, and statistics
- Compute — stateless query nodes that read from storage, execute queries, and can be scaled or paused independently
This means you pay only for storage at rest (object storage prices) and compute when running queries — the same model that made Snowflake successful, but open source and self-hostable.
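As a back-of-the-envelope sketch of that billing model (the per-unit prices below are illustrative assumptions, not actual quotes):

```python
def monthly_cost(storage_gb: float, compute_hours: float,
                 storage_price_per_gb: float = 0.023,    # assumed S3-like $/GB-month
                 compute_price_per_hour: float = 1.00):  # assumed $/compute-hour
    """Storage is billed continuously at object-storage rates;
    compute is billed only for the hours queries actually run."""
    return storage_gb * storage_price_per_gb + compute_hours * compute_price_per_hour

# 1 TB at rest plus 10 hours of query time in a month
print(f"${monthly_cost(1000, 10):.2f}")  # -> $33.00
```

Because compute nodes are stateless and can scale to zero, idle periods cost nothing beyond storage at rest.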