Databend

An open-source, cloud-native data warehouse written in Rust — a high-performance alternative to Snowflake built for object storage.

Screenshot of Databend

Databend is an open-source, cloud-native data warehouse written in Rust. It is designed as a modern alternative to Snowflake — storing all data on object storage (S3, GCS, Azure Blob, or any S3-compatible service), separating compute from storage, and delivering interactive query performance at a fraction of the cost of traditional data warehouses.

It speaks standard SQL with a MySQL-compatible wire protocol, so existing BI tools, SQL clients, and data pipelines connect without modification.
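Because the wire protocol is MySQL-compatible, a standard MySQL driver is all a script needs. A minimal sketch in Python — the `databend_connect_args` helper is hypothetical (it just bundles the default local connection parameters), and `pymysql` is one assumed driver among many:

```python
# Databend listens on a MySQL-compatible port, so any MySQL driver works.
# Hypothetical helper: bundle the default local connection parameters.
def databend_connect_args(host="127.0.0.1", port=3307,
                          user="root", password="", database="default"):
    return {"host": host, "port": port,
            "user": user, "password": password, "database": database}

# With a live server and a driver such as pymysql (assumed, not bundled):
#   import pymysql
#   conn = pymysql.connect(**databend_connect_args())
#   with conn.cursor() as cur:
#       cur.execute("SELECT version()")
#       print(cur.fetchone())
```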

Features

  • Object storage native — all data lives on S3 (or compatible); compute nodes are fully stateless and can scale to zero
  • MySQL-compatible protocol — connect with any MySQL client, JDBC/ODBC driver, or other tool that speaks the MySQL protocol; no additional drivers needed
  • Vectorized query execution — a SIMD-accelerated columnar execution engine with Apache Arrow as the in-memory format
  • Apache Parquet on disk — data is stored as Parquet files on object storage, making it portable and interoperable with the rest of the data ecosystem
  • Semi-structured data — first-class support for JSON, Parquet, CSV, TSV, ORC, and Avro — query nested JSON directly with SQL
  • Data sharing — share live, read-only datasets across accounts without copying data
  • Time travel — query historical snapshots of your data at any point in the retention window
  • Continuous data loading — auto-ingest files from S3 buckets as they arrive using stage-based pipelines
  • Full-text search — built-in inverted index for fast keyword search within SQL queries
  • Serverless option — Databend Cloud offers a fully managed, pay-per-query serverless experience

Installation

Docker (quickest for local testing)

The Docker image runs anywhere Docker does. Port 3307 serves the MySQL-compatible protocol and port 8000 the HTTP API.

docker pull datafuselabs/databend
docker run -p 3307:3307 -p 8000:8000 datafuselabs/databend

Connect with any MySQL client:

mysql -h 127.0.0.1 -P 3307 -u root

Binary

Download the latest release from the releases page. Pre-built .deb and .rpm packages are available there for Debian/Ubuntu and Fedora respectively, as well as plain tarballs for other Linux distributions.

# Debian / Ubuntu — download the .deb from the releases page, then:
dpkg -i databend-v1.2.725-x86_64-unknown-linux-gnu.deb

# Fedora — download the .rpm from the releases page, then:
rpm -i databend-v1.2.725-x86_64-unknown-linux-gnu.rpm

# Or extract the tarball directly
tar xzf databend-v1.2.725-x86_64-unknown-linux-gnu.tar.gz
./bin/databend-meta &
./bin/databend-query

Databend Cloud

The fully managed cloud service handles infrastructure, scaling, and upgrades automatically:

# Install the Python SQLAlchemy dialect for Databend
pip install databend-sqlalchemy
# or connect with bendsql or any MySQL-compatible client

Usage

Connecting

# MySQL client
mysql -h 127.0.0.1 -P 3307 -u root -e "SELECT version();"

# bendsql — the official Databend CLI client
cargo install bendsql
bendsql --host 127.0.0.1 --port 8000

Creating tables and loading data

-- Create a database and table
CREATE DATABASE sales;

CREATE TABLE sales.orders (
    order_id    INT,
    customer    VARCHAR,
    amount      DECIMAL(10, 2),
    region      VARCHAR,
    order_date  DATE
);

-- Load from a staged file on S3
COPY INTO sales.orders
FROM 's3://my-bucket/orders/'
FILE_FORMAT = (TYPE = 'PARQUET');

-- Or load from a local file via a named stage
CREATE STAGE my_stage;
PUT fs:///path/to/orders.parquet @my_stage;
COPY INTO sales.orders FROM @my_stage FILE_FORMAT = (TYPE = 'PARQUET');

Querying

-- Standard analytical queries
SELECT
    region,
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*)                         AS order_count,
    SUM(amount)                      AS total_revenue,
    AVG(amount)                      AS avg_order_value
FROM sales.orders
WHERE order_date >= '2024-01-01'
GROUP BY region, month
ORDER BY month, total_revenue DESC;

-- Query JSON columns directly
SELECT
    raw_data:user.name::VARCHAR   AS user_name,
    raw_data:event.type::VARCHAR  AS event_type
FROM events
WHERE raw_data:user.country = 'US';

-- Full-text search within SQL (requires an inverted index on the column)
CREATE INVERTED INDEX articles_body_idx ON articles(body);

SELECT title, body
FROM articles
WHERE MATCH(body, 'rust programming language');

Time travel

-- Query a table as it existed 1 hour ago
SELECT * FROM sales.orders AT (OFFSET => -3600);

-- Query at a specific snapshot
SELECT * FROM sales.orders AT (SNAPSHOT => 'abc123...');

-- Restore a dropped table
UNDROP TABLE sales.orders;
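The OFFSET form takes seconds relative to now. A tiny helper (hypothetical, pure string-building) makes that intent explicit when generating time-travel queries from application code:

```python
def at_offset(table, seconds_ago):
    """Build a Databend time-travel query for `seconds_ago` seconds in the past."""
    return f"SELECT * FROM {table} AT (OFFSET => -{int(seconds_ago)})"

def at_offset_hours(table, hours):
    """Same, with the offset given in hours."""
    return at_offset(table, hours * 3600)

# at_offset_hours("sales.orders", 1)
#   -> "SELECT * FROM sales.orders AT (OFFSET => -3600)"
```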

Continuous ingestion

-- Auto-ingest new files from S3 as they land
CREATE PIPE orders_pipe
  AUTO_INGEST = TRUE
  AS COPY INTO sales.orders
  FROM 's3://my-bucket/orders/'
  FILE_FORMAT = (TYPE = 'PARQUET');

bendsql — the CLI client

bendsql is the official command-line client for Databend:

cargo install bendsql

# Connect to a local instance
bendsql

# Connect to Databend Cloud
bendsql --dsn "databend://user:password@host.databend.com:443/mydb?sslmode=enable"

# Run a query directly
bendsql --query "SELECT COUNT(*) FROM sales.orders"

# Execute a SQL file
bendsql < query.sql

# Output as CSV
bendsql --output csv --query "SELECT * FROM sales.orders LIMIT 100"

Architecture

Databend separates three concerns that are coupled in traditional databases:

  • Storage — raw data as Parquet files on object storage; no proprietary format lock-in
  • Metadata — a lightweight catalog service tracking table schemas, snapshots, and statistics
  • Compute — stateless query nodes that read from storage, execute queries, and can be scaled or paused independently

This means you pay only for storage at rest (object storage prices) and compute when running queries — the same model that made Snowflake successful, but open source and self-hostable.
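As a back-of-envelope illustration of that cost model (all prices here are hypothetical placeholders, chosen only to make the arithmetic concrete — substitute your provider's actual rates):

```python
# Hypothetical prices -- substitute your provider's actual rates.
S3_PER_GB_MONTH = 0.023        # object storage at rest, $/GB-month
COMPUTE_PER_HOUR = 2.00        # one stateless query node, $/hour

def monthly_cost(data_gb, query_hours):
    """Storage is billed continuously; compute only while queries run."""
    return data_gb * S3_PER_GB_MONTH + query_hours * COMPUTE_PER_HOUR

# 1 TB of Parquet queried 2 hours a day vs. a node left running 24/7:
on_demand = monthly_cost(1000, 2 * 30)    # scale-to-zero compute
always_on = monthly_cost(1000, 24 * 30)   # traditional always-on node

print(f"scale-to-zero: ${on_demand:.2f}/month")   # 23 + 120  = $143.00
print(f"always-on:     ${always_on:.2f}/month")   # 23 + 1440 = $1463.00
```

The storage term is identical in both cases; only the compute term scales with actual query time, which is where the pause-and-resume model saves money.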