In many animals, hearing becomes generally less sensitive during sleep. I think people are often surprised by that, but it works that way to protect the quality of sleep.
Dogs are thought to be an exception, because part of their domestication involved selection for the most alert offspring (watch dogs).
The brain is thought to remain hypersensitive to a certain subset of sounds during sleep, such as babies crying.
White noise is thought to work by drowning out the sounds we are most sensitive to.
The day I discovered that marquee tags have a direction attribute, which lets you make the text scroll up, down, left, or right (and that you can nest multiple of these tags), is still etched in my memory.
We generally tend to engage in in-depth conversations with our users.
But in this case, when you opened the GitHub issue, we noticed that you're part of the Meilisearch team, so we didn't want to spend too much time explaining something in depth to someone who was just doing competitive research, when we could have instead spent that time helping other Typesense users. That's why the response to you might have seemed brief.
For what it’s worth, the approach used in Typesense is called Reciprocal Rank Fusion (RRF) and it’s a well researched topic that has a bunch of academic papers published on it. So it’s best to read those papers to understand the tradeoffs involved.
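To make the idea concrete, here is a minimal sketch of Reciprocal Rank Fusion. This is not Typesense's actual implementation, just the textbook formula: each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in, with a smoothing constant k (commonly 60 in the literature).

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists into one.

    rankings: iterable of ranked lists of document IDs (best first).
    k: smoothing constant; 60 is the value commonly cited in RRF papers.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from a keyword search and a semantic search
keyword_results = ["a", "b", "c"]
semantic_results = ["b", "d", "a"]
fused = rrf([keyword_results, semantic_results])
# "b" wins: it ranks highly in both lists
```

Note that RRF only looks at ranks, never at the underlying scores, which is exactly the trade-off being debated in this thread.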
> But in this case, when you opened the GitHub issue, we noticed that you’re part of the Meilisearch team, so we didn’t want to spend too much time explaining something in-depth to someone who was just doing competitive research, when we could have instead spent that time helping other Typesense users. Which is why the response to you might have seemed brief.
Well, in this case I was just a normal user who wanted the best relevance possible and couldn't find a solution.
But the reason I couldn't find one was not that you didn't want to spend more time on my case; it was that Typesense provides no solution to this problem.
> it’s a well researched topic that has a bunch of academic papers published on it. So it’s best to read those papers to understand the tradeoffs involved.
Yeah, cool. In other words: "it's bad, we know it, and we can't help you, but it's the state of the art, so go educate yourself."
But guess what: Meilisearch may need some fine-tuning around your model, etc., but in the end it gives you the tools to build a proper hybrid search that knows the quality of the results before mixing them.
I think this is a good example of why people should disclose their background when commenting on competing products/projects. Even if the intentions were sound, which seems to be the case here, upfront disclosure would have given the conversation more weight and meaning.
We’ve interacted before on Twitter and GitHub, and I want to address your point about Raft in Typesense since you mention it explicitly:
I can confidently say that Raft in Typesense is NOT broken.
We run thousands of clusters on Typesense Cloud, reliably serving close to 2 billion searches per month.
We have airlines using us, a few national retailers with hundreds of physical stores in their POS systems, logistics companies for scheduling, food delivery apps, large entertainment sites, etc. Collectively these are use cases where even an hour of downtime could cause millions of dollars in losses. And we power these reliably on Typesense Cloud, using Raft.
For an n-node cluster, the Raft protocol only guarantees automatic recovery from failures of up to (n-1)/2 nodes. Beyond that, manual intervention is needed. This is by design, to prevent a split-brain situation. It's not a Typesense thing, but a Raft protocol thing.
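The arithmetic behind that guarantee is just integer quorum math: a cluster needs a strict majority of nodes up to elect a leader and commit writes. A quick illustration (generic Raft math, nothing Typesense-specific):

```python
def raft_fault_tolerance(n):
    """Max node failures an n-node Raft cluster recovers from automatically.

    A quorum of n // 2 + 1 nodes must remain up to elect a leader and
    commit writes, so at most (n - 1) // 2 nodes may fail.
    """
    return (n - 1) // 2

# 1-node cluster: 0 failures tolerated
# 3-node cluster: 1 failure tolerated
# 5-node cluster: 2 failures tolerated
for n in (1, 3, 5, 7):
    print(f"{n}-node cluster tolerates {raft_fault_tolerance(n)} failure(s)")
```

This is also why even-sized clusters buy you nothing: a 4-node cluster tolerates the same single failure as a 3-node one, since the quorum grows to 3.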
I'm biased, but I'd recommend exploring Typesense for search.
It's an open source alternative to Algolia + Pinecone, optimized for speed (since it's in-memory) and an out-of-the-box dev experience. E-commerce is also a very common use case I see among our users.
I work on Typesense [1] - historically considered an open source alternative to Algolia.
We then launched vector search in Jan 2023, and just last week we launched the ability to generate embeddings from within Typesense.
You'd just need to send JSON data, and Typesense can generate embeddings for your data using OpenAI, PaLM API, or built-in models like S-BERT, E-5, etc (running on a GPU if you prefer) [2]
You can then do a hybrid (keyword + semantic) search by just sending the search keywords to Typesense, and Typesense will automatically generate embeddings for you internally and return a ranked list of keyword results weaved with semantic results (using Rank Fusion).
You can also combine filtering, faceting, typo tolerance, etc - the things Typesense already had - with semantic search.
For context, we serve over 1.3B searches per month on Typesense Cloud [3]
We store a couple million documents in typesense and the vector store is performing great so far (average search time is a fraction of overall RAG time). Didn’t realise you’ve updated to support creating the embeddings automatically; great news!
This is very difficult for me to understand. Can you explain like I'm an undergrad? What exactly does this mean? What is an embedding? What is the difference between keyword and semantic search?
Let's say your dataset has the words "Oceans are blue" in it.
With keyword search, if someone searches for "Ocean", they'll see that record, since it's a close match. But if they search for "sea" then that record won't be returned.
This is where semantic search comes in. It can automatically deduce semantic / conceptual relationships between words and return a record with "Ocean" even if the search term is "sea", because the two words are conceptually related.
The way semantic search works under the hood is using these things called embeddings, which are just a big array of floating point numbers for each record. It's an alternate way to represent words, in an N-dimensional space created by a machine learning model. Here's more information about embeddings: https://typesense.org/docs/0.25.0/api/vector-search.html#wha...
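A toy example of the idea, with made-up 3-dimensional vectors (real models produce hundreds of dimensions, and the values here are invented purely for illustration): semantic closeness is typically measured as cosine similarity between embedding vectors.

```python
import math

# Fake 3-dimensional "embeddings"; real models emit hundreds of dimensions.
embeddings = {
    "ocean": [0.9, 0.1, 0.2],
    "sea":   [0.85, 0.15, 0.25],
    "car":   [0.1, 0.9, 0.3],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# "sea" scores much closer to "ocean" than "car" does, even though the
# strings share no characters -- that's the basis of semantic search.
ocean_vs_sea = cosine_similarity(embeddings["ocean"], embeddings["sea"])
ocean_vs_car = cosine_similarity(embeddings["ocean"], embeddings["car"])
```

A keyword search would never match "sea" against "ocean"; the embedding comparison does, which is what lets the hybrid search return both kinds of results.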
With the latest release, you essentially don't have to worry about embeddings (except maybe picking one of the model names to use and experimenting), and Typesense will do the semantic search for you by generating embeddings automatically.
In the upcoming version, we've also added the ability to automatically generate embeddings from within Typesense, using either OpenAI, the PaLM API, or a built-in model like S-BERT or E5. So you only have to send JSON and pick a model; Typesense will then do a hybrid vector + keyword search for queries.
I love that the discussions we're having (in public channels) are now automatically indexed and made searchable publicly to any users who are looking for information on Google, etc, even if they're not a part of our Slack community.
I previously used to worry about all the time and effort we were putting into the walled garden of information that Slack was becoming, not to mention their untenable pricing for communities.
I now find myself spending more time writing more detailed answers in Slack, because I know it's going to be available publicly for future searchers.