mrbungie 17 hours ago

I was about to build a lakehouse* with DuckDB because I low-key love it. It's the easiest and strongest analytical engine I've found yet: it scales from laptops to big metal, stays mostly out-of-core when doing sane stuff, and avoids distributed computing for SQL in the process (looking at you, Spark).

That is, until I found out it does not support Iceberg writes[1]. That's a big no-no, as I would need another engine for inserts, and I want a simple stack :(. What a bummer.

[1] https://github.com/duckdb/duckdb_iceberg/issues/37

*That is what they are called now, aren't they? I just can't follow the terms anymore haha.

  • benrutter 6 minutes ago

    I'm curious, did you consider Delta tables? Pretty sure DuckDB supports them nicely. If you did, how come you chose not to go with them?

  • jeadie 16 hours ago

    This is one of the ideas behind using DuckDB in github.com/spiceai/spiceai

    • anentropic 7 hours ago

      That looks like an amazing "swiss army knife"...!

    • mrbungie 16 hours ago

      Looks very cool! I will take a look, tysm!

  • buremba 7 hours ago

    It's not just for building a new one; it can also complement existing data warehouses/lakehouses: https://github.com/buremba/universql

    The flight extension is excellent as it removes the need to write C++ extensions and lets you use your favorite language to develop native DuckDB catalogs. It's straightforward to build data lake connectors and plug them in as a flight catalog, thanks to Airport!

  • mritchie712 17 hours ago

    It's coming. They already have Hive-style Parquet writes. Iceberg is more complicated than that, but it's certainly doable.
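
For readers unfamiliar with the term, "Hive-style" refers to a directory layout in which each partition column becomes a `key=value` path segment. The snippet below is only a stdlib sketch of that layout, not DuckDB code: the function names, the `orders` root, and the sample rows are all made up for illustration, and no Parquet files are actually written.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def hive_partition_dir(root, partition_values):
    """Build a Hive-style directory such as root/year=2024/month=1.

    `partition_values` is an ordered mapping of partition column -> value.
    (Hypothetical helper, for illustration only.)
    """
    path = PurePosixPath(root)
    for key, value in partition_values.items():
        path = path / f"{key}={value}"
    return str(path)


def group_rows_by_partition(rows, root, partition_keys):
    """Group row dicts by the Hive-style directory they would be written to."""
    groups = defaultdict(list)
    for row in rows:
        values = {key: row[key] for key in partition_keys}
        groups[hive_partition_dir(root, values)].append(row)
    return dict(groups)


# Made-up sample data: three orders spread over two months.
rows = [
    {"year": 2024, "month": 1, "amount": 10},
    {"year": 2024, "month": 1, "amount": 20},
    {"year": 2024, "month": 2, "amount": 30},
]
layout = group_rows_by_partition(rows, "orders", ["year", "month"])
for directory in sorted(layout):
    print(directory, len(layout[directory]))
```

An engine that writes this layout would place one or more data files under each directory. The appeal is that a reader can skip whole directories when a query filters on a partition column.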

    • mrbungie 17 hours ago

      Yeah, it would just be great if it already did. I hope it supports Iceberg soon, as it would let me swap expensive (and bad) engines like AWS Athena for something more manageable.

      Don't get me wrong, I'm just being a tongue-in-cheek egotistical bastard data engineer from hell. DuckDB is a fine piece of software as it is, and those maintainers deserve heaven.

  • sukhavati 7 hours ago

    Same here, man. I ended up going with Trino explicitly for writing and data management, and using chdb/DuckDB to process data for front-ends etc. (mostly Ethereum data, so chdb "support" for uint256 is quite important).

r3tr0 15 hours ago

I love DuckDB. We use it a ton for indexing and organizing system/kernel-level metrics exported by eBPF.

Check out our sandbox:

https://yeet.cx/play

blef 18 hours ago

It's a cool thought exercise: everything we do in the data world can be done in SQL, from SQL. In a sense, this is MCP but for the DuckDB world.

  • rustyconover 7 hours ago

    Thanks for taking the time to understand the philosophy of the extension.

k_bx 12 hours ago

It's not clear to me: will this finally allow loading IPC files in DuckDB? That's been my biggest issue, since I use IPC files for append operations before I turn them into Parquet files.

rubenvanwyk 14 hours ago

Does this mean the data source and destination both have to set up flight servers? I imagine then this won’t be useful for integration of third-party services.

vkaku 12 hours ago

This is very nice. I also love the fuzzycomplete and lindel from the same org/authors.

  • code_biologist 11 hours ago

    fuzzycomplete - https://github.com/Query-farm/fuzzycomplete "This fuzzycomplete extension serves as an alternative to DuckDB's autocomplete extension, with several key differences: ..."

    lindel - https://github.com/Query-farm/lindel "This lindel extension adds functions for the linearization and delinearization of numeric arrays in DuckDB. It allows you to order multi-dimensional data using space-filling curves. ... Linearization maps multi-dimensional data into a one-dimensional sequence while preserving locality, enhancing the efficiency of data structures and algorithms for spatial data, such as in databases, GIS, and memory caches."
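
To make "linearization" concrete, here is a minimal plain-Python sketch of one well-known space-filling curve, the Morton (Z-order) curve, which interleaves the bits of each coordinate into a single integer. This does not use lindel or reproduce its API; the function names and the `bits` parameter are assumptions chosen for illustration.

```python
def morton_encode(coords, bits=16):
    """Linearize non-negative integer coordinates by interleaving their bits.

    Bit `b` of dimension `d` lands at position `b * len(coords) + d`.
    (Illustrative helper, not part of any extension.)
    """
    dims = len(coords)
    for value in coords:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"coordinate {value} does not fit in {bits} bits")
    code = 0
    for bit in range(bits):
        for dim, value in enumerate(coords):
            code |= ((value >> bit) & 1) << (bit * dims + dim)
    return code


def morton_decode(code, dims, bits=16):
    """Delinearize: recover the coordinates from an interleaved code."""
    coords = [0] * dims
    for bit in range(bits):
        for dim in range(dims):
            coords[dim] |= ((code >> (bit * dims + dim)) & 1) << bit
    return coords


# The four cells of a 2x2 grid map to 0..3 in a "Z" pattern.
for point in ([0, 0], [1, 0], [0, 1], [1, 1]):
    print(point, morton_encode(point, bits=1))

# Round trip for an arbitrary 2-D point.
code = morton_encode([3, 5], bits=4)
print(code, morton_decode(code, dims=2, bits=4))
```

Sorting rows by such a code tends to keep points that are close in several dimensions close together in the one-dimensional order, which is the locality property the quoted description refers to.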

the_optimist 17 hours ago

What’s the situation where this is useful? Seems like ‘replace your remote DuckDB instance (used to replace a DB server) with a DuckDB instance + a Flight server (or a bunch of them!)’. Who has a problem for which this is the solution?

  • simlevesque 16 hours ago

    A Flight server paired with DuckDB is a good way to get concurrent writes.
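
One way to picture this is a single server process that owns the database connection and applies writes from many clients one at a time. The sketch below illustrates only that general pattern, under loud assumptions: the stdlib `sqlite3` module stands in for DuckDB, a `queue.Queue` stands in for the Flight RPC layer, and all function names are hypothetical. It is not how Airport or any real Flight server is implemented.

```python
import queue
import sqlite3
import threading

STOP = object()  # sentinel telling the writer to shut down


def serve_writes(requests, results):
    """Single writer: the only thread that ever touches the connection."""
    conn = sqlite3.connect(":memory:")  # stand-in for a DuckDB database
    conn.execute("CREATE TABLE events (client INTEGER, seq INTEGER)")
    while True:
        item = requests.get()
        if item is STOP:
            break
        conn.execute("INSERT INTO events VALUES (?, ?)", item)
        conn.commit()
    # Report the final row count back to the caller before closing.
    results.append(conn.execute("SELECT count(*) FROM events").fetchone()[0])
    conn.close()


def client(requests, client_id, n_rows):
    """A client never opens the database; it only submits write requests."""
    for seq in range(n_rows):
        requests.put((client_id, seq))


def run_demo(num_clients=4, rows_per_client=100):
    requests = queue.Queue()
    results = []
    server = threading.Thread(target=serve_writes, args=(requests, results))
    server.start()
    clients = [
        threading.Thread(target=client, args=(requests, i, rows_per_client))
        for i in range(num_clients)
    ]
    for thread in clients:
        thread.start()
    for thread in clients:
        thread.join()
    requests.put(STOP)  # all clients are done, so nothing follows the sentinel
    server.join()
    return results[0]


print(run_demo())
```

Clients submit concurrently, but the database only ever sees one writer, so no client needs its own lock on the database file.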