What is DuckLake, and why is it interesting?
DuckLake is an open lakehouse format that stores table data in open files like Parquet while managing all metadata in a SQL database. That design makes lakehouse management simpler, faster, and more reliable.
Instead of relying on many metadata files, DuckLake moves catalog and table metadata into relational tables managed through SQL transactions.
That brings a few interesting advantages:
Why use it?
Because it offers a lightweight way to combine:
For development, the catalog can even be a local DuckDB file. For more centralized setups, DuckLake can use systems like PostgreSQL, SQLite, MySQL, or MotherDuck as the catalog backend.
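To make the choice of catalog backend concrete, here is a minimal Python sketch of how attaching a DuckLake catalog from DuckDB might look. The `ATTACH 'ducklake:...'` form and the `DATA_PATH` option follow the DuckLake documentation as I understand it, so check them against the version you use. The helper names, the `my_lake` alias, and the file and directory names are hypothetical, and the helper does no quoting or escaping of its inputs.

```python
def ducklake_attach_sql(catalog, alias="my_lake", data_path="data_files/"):
    """Build an ATTACH statement for a DuckLake catalog.

    `catalog` is whatever follows the `ducklake:` prefix: a local DuckDB
    file for development, or a backend-specific connection string.
    Inputs are interpolated as-is, so pass trusted values only.
    """
    return f"ATTACH 'ducklake:{catalog}' AS {alias} (DATA_PATH '{data_path}');"


def attach_local(catalog_file="metadata.ducklake"):
    """Attach a local-file catalog (assumes the `duckdb` package is installed
    and the ducklake extension can be downloaded)."""
    import duckdb  # third-party; imported lazily so the helper above runs without it

    con = duckdb.connect()
    con.execute("INSTALL ducklake")
    con.execute("LOAD ducklake")
    con.execute(ducklake_attach_sql(catalog_file))
    con.execute("USE my_lake")
    return con


# Local development: the catalog is just a DuckDB file next to the data.
local_sql = ducklake_attach_sql("metadata.ducklake")

# Centralized setup (assumed connection-string form for a PostgreSQL catalog).
postgres_sql = ducklake_attach_sql(
    "postgres:dbname=ducklake_catalog host=localhost"
)
```

Only the string after `ducklake:` changes between the two setups; the data files stay in the same open format either way.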
The interesting part is that DuckLake can support multiple compute nodes reading and writing the same dataset through the central catalog. That addresses a concurrency limitation of plain DuckDB, where only one process at a time can open a database file for writing.
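The idea of committing table changes as SQL transactions against a shared catalog can be illustrated with a toy example. The sketch below is not DuckLake's actual schema or commit protocol: it uses Python's built-in `sqlite3` module, and the `snapshot` and `data_file` tables, the function names, and the Parquet file names are all made up for illustration. Two connections stand in for two compute nodes, and a failed commit shows that metadata changes are all-or-nothing.

```python
import os
import sqlite3
import tempfile

# Hypothetical, heavily simplified catalog schema (not DuckLake's real tables).
SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshot (
    snapshot_id INTEGER PRIMARY KEY,
    author      TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS data_file (
    path        TEXT PRIMARY KEY,
    snapshot_id INTEGER NOT NULL REFERENCES snapshot(snapshot_id)
);
"""


def open_catalog(db_path):
    # isolation_level=None disables implicit transactions, so BEGIN/COMMIT
    # below are issued explicitly.
    con = sqlite3.connect(db_path, isolation_level=None, timeout=5.0)
    con.executescript(SCHEMA)
    return con


def commit_files(con, author, paths):
    """Register new data files and a new snapshot in a single transaction."""
    con.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        (latest,) = con.execute(
            "SELECT COALESCE(MAX(snapshot_id), 0) FROM snapshot"
        ).fetchone()
        new_id = latest + 1
        con.execute("INSERT INTO snapshot VALUES (?, ?)", (new_id, author))
        con.executemany(
            "INSERT INTO data_file VALUES (?, ?)",
            [(path, new_id) for path in paths],
        )
        con.execute("COMMIT")
        return new_id
    except Exception:
        con.execute("ROLLBACK")  # no partial snapshot is ever visible
        raise


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "catalog.db")
        node_a = open_catalog(db)  # two connections play two compute nodes
        node_b = open_catalog(db)

        s1 = commit_files(node_a, "node_a", ["part-0001.parquet"])
        s2 = commit_files(node_b, "node_b", ["part-0002.parquet"])

        # This commit re-registers an existing file, so the whole
        # transaction (including part-0003 and its snapshot) rolls back.
        try:
            commit_files(node_a, "node_a",
                         ["part-0003.parquet", "part-0001.parquet"])
        except sqlite3.IntegrityError:
            pass

        files = [row[0] for row in
                 node_b.execute("SELECT path FROM data_file ORDER BY path")]
        (snapshots,) = node_b.execute("SELECT COUNT(*) FROM snapshot").fetchone()
        node_a.close()
        node_b.close()
        return s1, s2, files, snapshots
```

Calling `demo()` returns snapshot ids 1 and 2, the two successfully registered files, and a snapshot count of 2. Both "nodes" see the same committed state, and the failed third commit leaves no trace in the catalog.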
In short, DuckLake aims to simplify the lakehouse model by keeping the data open and letting a real SQL database do what it does best: manage metadata and transactions.