
It's great to see more and more talk of vector search and vector databases. We've been promoting this technology for over a year now and have several intro articles for anyone looking to learn more[1], and a generous free tier on our vector search service[2] for anyone looking to give vector search a shot.

[1] https://www.pinecone.io/learn/

[2] https://app.pinecone.io/

We are also actively researching the space, and just recently published a paper on improving Google's ScaNN: https://arxiv.org/abs/2112.02179



That reference/learning page is a great resource!

As for Pinecone itself, what are the main selling points as you see them for a simple application (e.g. comparing trigram-vectorized sets of strings) when compared to a home-rolled solution using postgres with array types? Better performance, ease of indexing, etc.?
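For reference, the home-rolled baseline described here could look roughly like the sketch below: each string is reduced to a set of character trigrams and two sets are compared by Jaccard overlap. The function names and the padding scheme are assumptions made for illustration, not anything taken from Postgres or Pinecone.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of character trigrams of a lowercased, padded string."""
    # Padding (two leading spaces, one trailing) is an illustrative choice
    # so that short strings still produce a few trigrams.
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two trigram sets: |A & B| / |A | B|."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


if __name__ == "__main__":
    print(trigram_similarity("night", "nacht"))
    print(trigram_similarity("cat", "dog"))
```

This only measures surface overlap between strings, so two phrases with the same meaning but different spelling would score near zero.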


It will depend on your use-case, but primarily:

(1) Pinecone uses dense vectors, which can encode much more meaningful information, e.g. the actual 'semantic meaning' behind a sentence as we (people) would understand it, or the context in an image. Because of this, we can enable much richer, human-like interaction/search in your applications.

(2) Performance-wise, before joining Pinecone I spent a lot of time with other dense vector search tools like Faiss, and it isn't easy to get good or even reasonable accuracy and latency, particularly for large datasets. When I first used Pinecone, it took me maybe 10 minutes to figure everything out and start querying a reasonable dataset; search times were very fast and the accuracy was incredible. Pinecone's tech is built by people who live and breathe vector search, and what they've built outperforms anything I could build, even if I spent months trying. I got better performance with Pinecone in 10 minutes.

(3) Everything is production ready. There's no need to worry about deployment, security, maintenance, etc.; Pinecone deals with it, and you can even use the service for free up to 1M vectors.


I pinged someone more technical from our team to chime in.

In the meantime I can say moving to the dense vector + ANN search combo turns regular searches into semantic searches, which means more relevant results.

If that's the case for you, then you can use Pinecone to go further and make those results fast (<100ms), fresh (CRUD + live index updates), and filtered (apply single-stage metadata filtering). All on a fully managed system that you can scale up/down with one API call.
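To make "filtered" a bit more concrete, here is a minimal brute-force sketch in plain Python. It is not Pinecone's API or algorithm: the `search` function, the `where` parameter, and the toy two-dimensional vectors are all hypothetical, and a real service would use an approximate nearest-neighbour index rather than scanning every item. The point is only that the metadata check happens inside the same pass that scores vectors, rather than as a separate step before or after the search.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index, query, top_k=3, where=None):
    """Return ids of the top_k most similar items whose metadata matches `where`."""
    where = where or {}
    scored = []
    for item_id, (vector, metadata) in index.items():
        # Items failing the metadata filter are skipped while scanning,
        # so the filter and the similarity scoring share a single pass.
        if all(metadata.get(key) == value for key, value in where.items()):
            scored.append((cosine(query, vector), item_id))
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:top_k]]


# Toy index: id -> (vector, metadata). The values are made up for illustration.
index = {
    "a": ([1.0, 0.0], {"type": "creature"}),
    "b": ([0.9, 0.1], {"type": "instant"}),
    "c": ([0.0, 1.0], {"type": "creature"}),
}

if __name__ == "__main__":
    print(search(index, [1.0, 0.0], top_k=2))
    print(search(index, [1.0, 0.0], top_k=2, where={"type": "creature"}))
```

With the filter applied, item "b" drops out even though it is the second-closest vector, and the next matching item takes its place.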


I've been toying with making a deckbuilder for Magic: The Gathering and could see this being potentially useful for finding fun card combinations. Thanks!


We are actually discussing this on the Weaviate Slack :-) https://weaviate.slack.com/archives/D02JM9D3HND/p16347312830...


That would be a fun use case for us to promote. Let me know when it's ready! The free plan supports as many as 1 million items, more than enough for all the MTG cards in existence. Plus you can add and filter by metadata, like card type and properties.


> Plus you can add and filter by metadata, like card type and properties.

I read through your docs and figure that will be part of the approach.

An idea I had was to find similar, or "next best", cards for replacement in popular decks or to achieve similar effects in order to bring down the cost of EDH, Modern, etc. formats. I'm just getting back into the hobby again, so having a tool like this would make my wife and wallet happy :)


I’ve resorted to playing Modern with high-quality fakes. Otherwise I wouldn’t have the budget. Check out bootlegmtg on reddit.


I love this idea. I would pay for that service!


I just want to chime in and say that the resources on your website look amazing. I spent 5 minutes poking around and it looks really high quality.

I'm dabbling in Postgres's full text search (tsvector) for a small website. I know that is extremely simple compared to the offerings you provide, but your site has me quite interested in this space now.

Eager to learn more about this tech!


Glad you think so. Makes me want to expand it even more! What would you like to see covered?


Maybe this is too simple, but a comparison to Postgres' tsvector and how to do something similar in your service?


Does Pinecone have any position on the status of document embeddings and whether they would be considered PII? One of the challenges of using a fully managed service is the headache of adding yet another data subprocessor and all of the legal and compliance questions that raises.


That depends on the document. We do not see the original document, only the embedding. You can argue that is sufficiently obfuscated to not count as PII. The good news is we are SOC2 compliant and GDPR-friendly and do a bunch of other stuff to help you meet security compliance requirements: https://www.pinecone.io/security/


No, I understand that. I guess my question is actually around your experience with "You can argue that is sufficiently obfuscated to not count as PII" and whether your customers are actually successful with this argument.


Those who need more assurance just look at our SOC2 compliance, or have us go through a security review, or opt for the dedicated-environment deployment option.



