Hacker News | jeadie's comments

This is exactly what we found. Ingest rates were tough. We partitioned and ran across multiple DuckDB instances too (and wrangled the complexity that came with it).

We ended up building a SQLite + Vortex file alternative for our use case: https://spice.ai/blog/introducing-spice-cayenne-data-acceler...
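A minimal sketch of the partitioning approach mentioned above, using stdlib SQLite as a stand-in (the partition count, schema, and routing-by-key-hash are all illustrative assumptions, not Cayenne's actual design):

```python
import sqlite3

def open_partitions(paths):
    # One connection per partition; each path can be a file or ":memory:".
    conns = [sqlite3.connect(p) for p in paths]
    for con in conns:
        con.execute(
            "CREATE TABLE IF NOT EXISTS events (key TEXT, ts INTEGER, payload TEXT)"
        )
    return conns

def ingest(conns, rows):
    # Route each row to a partition by hashing its key, so writes
    # spread across writers instead of bottlenecking on one instance.
    for key, ts, payload in rows:
        con = conns[hash(key) % len(conns)]
        con.execute("INSERT INTO events VALUES (?, ?, ?)", (key, ts, payload))
    for con in conns:
        con.commit()

def query_all(conns, sql):
    # Fan the query out and merge results -- this merge step is
    # where the wrangled complexity lives in a real system.
    out = []
    for con in conns:
        out.extend(con.execute(sql).fetchall())
    return out
```

The awkward part in practice is the `query_all` side: aggregations and ordering have to be re-done over the merged partial results, which is exactly the complexity a single engine hides.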



Thanks for this, really enjoyed reading it, and it helps validate some of my own thinking.

OT but: You joined in 2019, barely post anything, then suddenly in 2026 your comments are copy pasted LLM output. Why? Why don't you use your own voice and type with your own hands? Notice how all those copy pasta posts were nuked - for good reason - we don't like being insulted.

You joined in 2017, barely post anything, then suddenly in 2025/2026 2/3 of your posts are copy pasted links, 1 of which is dead and another is 10 years old. Why? Why don’t you use your own voice and type with your own hands? Why don’t you post something new and relevant that you made instead of attacking people who are posting entire code repos of interesting technology?

I call it suspicious activity.

touché

We’re building vector indexes into DataFusion for search (starting with S3 Vectors).

Open source at https://github.com/spiceai/spiceai


This is one of the ideas behind using DuckDB in github.com/spiceai/spiceai


That looks like an amazing "swiss army knife"...!


Looks very cool! I will take a look, tysm!



This is a common feature now. If anything, for being so early to vector databases, Pinecone was rather late to integrating embeddings.

Timescale added it most recently, but yes, a bunch of others have it too: Weaviate, Spice AI, Marqo, etc.


A difference between Pinecone and many of the others you listed is that we host both embedding and reranking models in a serverless fashion. You pay for what you use while we manage the entire stack.
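A toy sketch of the retrieve-then-rerank pattern these services manage end to end. The embedding vectors and the term-overlap reranker here are placeholders for illustration, not any vendor's actual models:

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, index, k=10):
    # index: list of (doc_id, vector) pairs; brute-force nearest
    # neighbours stand in for a real ANN index.
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

def rerank(query_text, candidates, docs):
    # Placeholder reranker: re-score the candidate set by term overlap
    # with the query; a hosted service swaps in a cross-encoder here.
    terms = set(query_text.lower().split())
    def score(item):
        return len(terms & set(docs[item[0]].lower().split()))
    return sorted(candidates, key=score, reverse=True)
```

The point of the two-stage design is cost: the cheap vector retrieval narrows millions of documents to a handful, and only that handful pays for the expensive reranking model.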


Do any of the others also handle reranking?


Qdrant does with its ‘Query API’.

https://qdrant.tech/documentation/concepts/hybrid-queries/

And handles embedding creation with its fastembed package.

https://github.com/qdrant/fastembed



I don't know about them, but Manticore does.

https://manticoresearch.com/use-case/vector-search/


Why not just federate Postgres and the Parquet files? That way the query planner can push down as much of the query as possible and reduce how much data has to move around.
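A minimal sketch of the pushdown idea, with stdlib SQLite standing in for both the Postgres source and the Parquet scan (a real federated planner such as DataFusion or DuckDB performs this predicate pushdown automatically):

```python
import sqlite3

def make_source(rows):
    # Stand-in for one backing store (Postgres, a Parquet file, ...).
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, val TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return con

def federated_query(sources, predicate_sql, params=()):
    # The predicate is evaluated inside each source (the "pushdown"),
    # so only qualifying rows cross the wire to the coordinator,
    # which then unions the partial results.
    out = []
    for con in sources:
        out.extend(
            con.execute(f"SELECT id, val FROM t WHERE {predicate_sql}", params).fetchall()
        )
    return out
```

Without the pushdown, the coordinator would pull every row from every source and filter centrally, which is exactly the data movement the comment is trying to avoid.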


This looks functionally similar to using http://github.com/spiceai/spiceai with a PostgreSQL data accelerator.


Spice AI | Senior Software Engineer | GMT+10 (e.g. Australia) through GMT-7 (e.g. Seattle/SF/LA) | Remote | Full Time

Spice AI provides building blocks for data- and AI-driven applications, composing real-time and historical time-series data, high-performance SQL querying, and machine learning training and inference into a single, interconnected AI backend-as-a-service.

We just launched github.com/spiceai/spiceai, a unified SQL query interface and portable runtime to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.

We're hiring experienced software engineers, ideally with Rust and/or Golang production experience. We're focused on large data and distributed systems, so experience with these is important too. More details: https://spice.ai/careers#section-open-positions


It says remote, but the open positions are mostly hybrid.


And yes, Iceberg is very high up on our list

