I was almost going to build a lakehouse* with DuckDB because I low-key love it. It's the easiest and strongest analytical engine I've found yet: it scales from laptops to big metal, stays mostly out-of-core when doing sane stuff, and avoids distributed computing for SQL in the process (looking at you, Spark).
That is, until I found out it does not support Iceberg writes[1]. That's a big no-no, as I would need another engine for inserts, and I want a simple stack :(. What a bummer.
[1] https://github.com/duckdb/duckdb_iceberg/issues/37
*that is what they are called now aren't they? I just can't follow the terms anymore haha.
I'm curious, did you consider delta tables? Pretty sure duckdb supports them nicely. If you did, how come you chose not to go with them?
Fivetran tried to upstream write support but it was not accepted https://github.com/duckdb/duckdb-iceberg/pull/95
That sounds less "not accepted" and more "will implement, rewrite required". It was only a couple months ago.
This is one of the ideas behind using DuckDB in github.com/spiceai/spiceai
That looks like an amazing "swiss army knife"...!
Looks very cool! I will take a look, tysm!
Not just for building a new one, it can also complement existing data-warehouse/lakehouses: https://github.com/buremba/universql
The flight extension is excellent as it removes the need to write C++ extensions and lets you use your favorite language to develop native DuckDB catalogs. It's straightforward to build data lake connectors and plug them in as a flight catalog, thanks to Airport!
it's coming. they already have hive style parquet writes. Iceberg is more complicated than that, but it's certainly doable.
Yeah, it just would be great if it already did so and I hope it supports Iceberg soon, as it would enable me to change expensive (and bad) engines like AWS Athena for something more manageable.
Don't get me wrong, I'm just being a tongue-in-cheek egotistical bastard data engineer from hell. DuckDB is a fine piece of software as it is, and those maintainers deserve heaven.
same here man, ended up going with trino explicitly for writing and data management and using chdb/duckdb to process data for front-ends etc (mostly ethereum data so chdb "support" for ui256 is quite important)
I love DuckDB. We use it a ton for indexing and organizing system/kernel-level metrics exported by eBPF.
Check out our sandbox:
https://yeet.cx/play
This is a cool thought exercise: everything we do in the data world can be done in SQL, from SQL. In a sense this is MCP, but for the DuckDB world.
Thanks for taking the time to understand the philosophy of the extension.
Not clear. Will this allow loading ipc files in DuckDB finally? That's been my biggest issue, since I use IPC files for append operations before I turn them into parquet files.
That’s possible with the arrow extension today.
I was sure it supported .arrow but not the streaming .ipc format, but will re-check when I have a chance
It was not supported for quite a while indeed, but now there's https://duckdb.org/2025/05/23/arrow-ipc-support-in-duckdb.ht...
Does this mean the data source and destination both have to set up flight servers? I imagine then this won’t be useful for integration of third-party services.
Only the data source.
This is very nice. I also love the fuzzycomplete and lindel from the same org/authors.
fuzzycomplete - https://github.com/Query-farm/fuzzycomplete "This fuzzycomplete extension serves as an alternative to DuckDB's autocomplete extension, with several key differences: ..."
lindel - https://github.com/Query-farm/lindel "This lindel extension adds functions for the linearization and delinearization of numeric arrays in DuckDB. It allows you to order multi-dimensional data using space-filling curves. ... Linearization maps multi-dimensional data into a one-dimensional sequence while preserving locality, enhancing the efficiency of data structures and algorithms for spatial data, such as in databases, GIS, and memory caches."
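For anyone wondering what "linearization" means in practice, here is a rough sketch of one space-filling curve, the Z-order (Morton) curve, in plain Python. To be clear, this is not lindel's API or implementation; the function names `morton_encode` and `morton_decode` are made up for illustration, and it only handles 2-D non-negative integers. It just shows the idea of interleaving coordinate bits so that sorting by a single integer tends to keep nearby points together.

```python
def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one integer (Z-order).

    Hypothetical helper for illustration; not part of lindel or DuckDB.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return code


def morton_decode(code: int, bits: int = 16) -> tuple[int, int]:
    """Inverse of morton_encode: split the interleaved bits back into (x, y)."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


# Sorting 2-D points by their Morton code gives a one-dimensional ordering.
points = [(1, 1), (0, 1), (2, 0), (1, 0), (0, 0)]
ordered = sorted(points, key=lambda p: morton_encode(*p))
```

Z-order only preserves locality approximately (there are jumps at quadrant boundaries), which is why curves like Hilbert are often preferred when locality matters more than encoding cost.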
Thanks for the compliments!
last monday: https://news.ycombinator.com/item?id=44036343
What’s the situation where this is useful? Seems like ‘replace your remote duckDB instance—used to replace a DB server—with duckDB instance + a flight server (or a bunch of them!)’. Who has a problem for which this is the solution?
A Flight server paired with duckdb is a good way to get concurrent writes.