Show HN: Weaviate – Build your own generative health search engine (github.com/weaviate)
39 points by edichief on July 18, 2023 | 26 comments
We are super excited to release our latest open-source demo, Healthsearch. This demo decodes user reviews of supplements and performs semantic and generative search on them, retrieving the most related products for specific health effects and leveraging Large Language Models to generate product and review summaries. The demo can understand natural language queries and derive all search filters directly from the context of your query.


The near-bot responses from people who do not clearly identify that they work at this company are very frustrating. This feels like using HN for manufactured marketing campaigns.

For instance, take CShorten's last two responses to threads that do not clearly identify that they are direct marketing for Weaviate.

https://news.ycombinator.com/item?id=36670548 Thanks so much for sharing!

https://news.ycombinator.com/item?id=36760269 Loved this paper - so much opportunity to explore Retrieval-Augmented Generation with these longer input LLMs + Vector Databases & Search!

Take another example of a user whose submission CShorten commented on: https://news.ycombinator.com/threads?id=bobvanluijt

The overwhelming majority of these comments are _spam_.

If you work at Weaviate, I'd recommend you stop doing this stuff. I'm way more likely to look at a competitor who feels less like a bot network.


IMHO, having employees (and others who are affiliated) come to the thread can be a huge positive for the conversation, but it is indeed very important to disclose the affiliation! Don't fall into the trap of "well I'm just commenting my personal opinion, I don't need to disclose."

I think you can make a perfectly logical and reasonable argument about why it should be fine, but in the end it doesn't matter. It will backfire because it's been abused to the nth degree in the past and now people's antennae are up.

Transparency around affiliation is key. If you're commenting something of substance, then people will appreciate it and value your comment. If you're not adding anything substantive, then it's best not to comment at all.


Thanks, appreciate your perspective. This is much more insightful than just "affiliated==botnet". I agree that this can be a huge positive and - at least for me - that was the motivation for wanting to be active in the comments; to see if there are any challenging questions, feedback, etc.

As a HN user, I often go straight to the comments and only look at the submission if there's an interesting discussion there. I can totally see how spammy-looking comments with an undisclosed affiliation can have the exact opposite effect.

(affiliated with Weaviate)


Nobody said affiliated equals botnet. You have accounts which go into discussions about Weaviate and write low quality comments that are essentially spam. They have no other real HN history.

Do you not think that the comments in here could’ve been generated by bots?

Stop it.


Hey all, my apologies for these comments! I will be more mindful of this going forward!


    # Easter Egg
    if query_text == "easteregg":
        return JSONResponse(
            content={
                "query": " Congratulations, you rolled the demo!",
                "results": {},
                "generative_summary": "You just got rick-rolled...",
            }
        )
Haha


> ==

Interesting encoding/embedding and vector search you have there :-)


Haha! I initially used the rick-roll meme whenever the demo crashed, but at some point, I couldn't stand getting rick-rolled by my demo any more... :)


Haha awesome you found it so quickly!


Weaviate is pretty cool IMO. It is open source and fairly easy* to get this running locally and do stuff with it. Definitely worth playing with. In the past I have done a search-type thing for a side project and used Elastic Search. While I think ES is great, it might be overkill and something like this might be easier to manage.

You can even run Weaviate as an embedded Python package rather than a separate process, so compare that to running an ES cluster and I think life gets a lot easier. You are probably working 'closer to the metal', but really, if you use ES you will be tuning the hell out of it anyway.

* depends on your experience, but if you can use Docker and know roughly what an embedding is as a concept, then it's not too hard to get something up and running.


I wonder what Weaviate users think of the GraphQL interface. When using an LLM to do this type of query generation it needs to understand the schema and be trained to generate a semantically correct request. We found better results putting the schema itself in the prompt to help generate valid requests from English and I see here it is added in the system prompt as well. I wonder if a simpler interface might suffice so that the retrieval can be more flexible, though I don't have a better alternative at the moment :)
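A minimal sketch of the schema-in-prompt idea described above. The `Product` schema and the helper names `build_system_prompt` and `unknown_fields` are assumptions made up for illustration; they are not taken from the Healthsearch code or from any Weaviate API.

```python
import json

# Hypothetical schema for illustration only; not the demo's actual schema.
SCHEMA = {
    "class": "Product",
    "properties": [
        {"name": "name", "dataType": ["text"]},
        {"name": "summary", "dataType": ["text"]},
        {"name": "rating", "dataType": ["number"]},
    ],
}


def build_system_prompt(schema: dict) -> str:
    """Embed the schema verbatim so the model only sees real field names."""
    return (
        "Translate the user's request into a single GraphQL query.\n"
        "Use only the class and properties in this schema:\n"
        f"{json.dumps(schema, indent=2)}\n"
        "Return the query and nothing else."
    )


def unknown_fields(requested: list, schema: dict) -> list:
    """Return the requested fields that the schema does not define."""
    known = {prop["name"] for prop in schema["properties"]}
    return [name for name in requested if name not in known]
```

Checking the fields of a generated query against the same schema (as `unknown_fields` does) is one cheap way to catch hallucinated property names before the request is sent.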


Great to hear that you also experimented with this! I agree that this approach is limiting and I think this can be further improved. I'm wondering whether a Retrieval Augmented Generation (RAG) approach could make this more flexible.


One of my college roommates ended up with kidney stones at a very young age (early 20s) because of a dietary supplement. He got somewhat lucky; there have been cases of liver failure [3], damage to vision [4], and many other serious health complications. Worse still, diagnosis can be difficult because in the US dietary supplements are covered under separate rules from drugs [1], the FDA is limited to post-market enforcement [2], and ingredient labels are often incomplete or wrong [5,6,7].

If Healthsearch is based on the contents of advertisements and reviews, then it may not be much more than a search engine for at best unsubstantiated marketing claims and anecdotes about the placebo effect, and at worst summarization and promotion of dangerously inaccurate or incomplete product information.

I think this is a good example of a space where generative AI is likely to do far more harm than good, and where the best advice remains to seek out the advice of educated and licensed medical/dietetic professionals, where education ensures a minimum of competence and licensure ensures a minimum of skin-in-the-game.

Just remember: if it's a dietary supplement, then it's not a food or a drug. It's a third thing and mostly a wild west. You have no idea what you are putting in your body, and the words on the label are just that: mere words designed to get you to purchase something. Without ground-truth data, it's all GIGO.

[1] https://www.fda.gov/food/dietary-supplements

[2] https://www.fda.gov/food/information-consumers-using-dietary...

[3] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3076034/

[4] https://journals.sagepub.com/doi/10.1177/2474126419877567

[5] https://www.health.harvard.edu/blog/whats-in-your-supplement...

[6] https://www.gao.gov/assets/gao-19-23r.pdf

[7] https://www.gao.gov/assets/gao-19-23r.pdf


this is bulls*it

all the smart people know that big pharma with their _symptom not the reason_ medicines are making billions and investing hundreds of millions for political influencing, internet influencing and many other illegal things. And herbs, and unpatentable things like sodium bicarbonate, borax and silverwater are the things they have been shooting down for decades successfully (sadly)

Big pharma medicine kills millions every month. Vitamin supplements maybe 3 people in the last 50 years. Think about it.


Thank you for your comment. I appreciate your thoughtful feedback and couldn't agree more with many of your points. The primary purpose of this demo isn't to encourage self-diagnosis or the unmonitored consumption of dietary supplements but to demonstrate the potential of semantic search in a field that we know is fraught with misinformation.

As you rightly pointed out, the regulatory landscape of dietary supplements is quite different from that of drugs or food, and this often leads to a host of issues, including misleading labels and unproven health claims. Our goal is not to feed into these problems but instead to use technology to parse through the vast amount of data and draw out meaningful information. Dietary supplements are not a cure-all or replacement for professional medical advice.

They can, however, be beneficial in some instances if used correctly and under the guidance of healthcare professionals. Melatonin, a hormone that our bodies naturally produce in response to darkness, plays an essential role in regulating our sleep-wake cycle. Research has shown that supplemental melatonin can benefit those with disrupted sleep patterns. For instance, a systematic review and meta-analysis published in "Sleep Medicine Reviews" found that melatonin reduces sleep onset latency (the time it takes to fall asleep), increases total sleep time, and enhances overall sleep quality [1]. Another study in the "Journal of Clinical Sleep Medicine" reported that melatonin could benefit those with Delayed Sleep-Wake Phase Disorder, a condition characterized by a significant delay in sleep onset and waking times [2]. These studies suggest that, under specific circumstances, melatonin can be an effective sleep aid, emphasizing the importance of consulting with a healthcare provider to assess individual needs and potential benefits.

[1] https://pubmed.ncbi.nlm.nih.gov/23691095/ [2] https://pubmed.ncbi.nlm.nih.gov/26414986/


That's a nice narrative.

How is any of it relevant to a system that uses LLMs trained on user-written reviews of health supplements to recommend products based on end user's descriptions of health conditions?


Disclaimer: the author of this project is on my team, and I am the initiator of this project, so you see me now fighting for it.

In the end, how do you want to make self-treatment options accessible? We reflected on exactly this valid critique before building and publishing this project. User- or patient-written feedback about medication or self-treatment options is always risky as a source. But it doesn't help if big pharma is the funnel that determines what information is available, since there the interest is purely monetary. Yes, this project is explicitly about supplement reviews, but the bigger picture and the next iteration also include the analysis of Reddit comments in support groups, including those for rare diseases.

There are dozens of anecdotes about treatment options that have not reached the mainstream only because they use substances that are not, or are no longer, patentable, so patients have to resort to pharmacotherapy that has significantly more adverse side effects. To complete the picture, I have experienced this myself: I have a rare autoimmune disease and only learned about a new experimental treatment option thanks to a self-built NLP pipeline that analyzed Reddit comments. In the end I was able to stop my classic medications, which had unpleasant side effects. I don't want to make this too personal because I want to address your legitimate criticism. All I want to say is that we do not want to be uncritical about the adverse effects of supplements, as they're evident depending on the substance we're discussing.

The narrative "it is just a supplement, so there are no dangers" is definitely something to avoid. You should not take them without critical reflection, as supplements are not without adverse drug effects. Sometimes the same substance is, depending on your location, either a medication or an uncontrolled substance; Berberine is one example. I took it for too long before researching it and finding out that it could adversely affect your microbiome, etc.

This project is not about being uncritical of supplements. Btw the NLP pipeline also detects adverse drug effects because, as already said, anything that has an effect can also have side effects.

So long post is long, now help us improve here. What would you do besides the big fat red disclaimer we have in the project to address your concerns? Happy to adjust the project to improve here, we're going to have a blog post about this project, so I'm already thinking of having a passage around what I have written in this post.


> What would you do besides the big fat red disclaimer we have in the project to address your concerns?

I probably wouldn't use an LLM for this problem domain. Or, if I did use an LLM, it'd be as a way to map the user's requests into an expert system. The expert system would generate a recommendation as well as a set of diagnostic assumptions extracted from the prompt. The user should be presented with a checklist of extracted diagnostic assumptions along with the recommendation. The recommendation should include any specific warnings about the active ingredient(s) together with a general warning about the wild west nature of health supplements: the active ingredient may not be present and other harmful ingredients may be present.

To build out the expert system, I would find a team member who is a medical professional with expertise in health supplements. An MD or researcher with relevant SME would be the obvious choice, but I've also talked with some truly excellent registered dieticians, nurses, and PharmDs.

Finally, I would only recommend a limited white-list of health supplements that have some form of third party verified quality control in place.
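To make the shape of that proposal concrete, here is a rough sketch under stated assumptions: the rule table, the whitelist, and every name in it (`RULES`, `VERIFIED_WHITELIST`, `recommend`, `condition_x`, `supplement_a`, and so on) are hypothetical placeholders, not real medical data, and the step that extracts conditions from the user's text is left out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical placeholders: neither the rules nor the whitelist are real data.
RULES = {"condition_x": "supplement_a", "condition_y": "supplement_b"}
VERIFIED_WHITELIST = {"supplement_a"}  # pretend only this one has third-party QC
GENERAL_WARNING = (
    "Labels may be incomplete or wrong: the active ingredient may be absent "
    "and other ingredients may be present."
)


@dataclass
class Recommendation:
    supplement: Optional[str]  # None when nothing whitelisted matches
    assumptions: List[str]     # checklist for the user to confirm or reject
    warnings: List[str] = field(default_factory=list)


def recommend(extracted_conditions: List[str]) -> Recommendation:
    """Map already-extracted conditions to a whitelisted supplement, if any."""
    supplement = None
    for condition in extracted_conditions:
        candidate = RULES.get(condition)
        if candidate in VERIFIED_WHITELIST:
            supplement = candidate
            break
    return Recommendation(
        supplement=supplement,
        assumptions=list(extracted_conditions),
        warnings=[GENERAL_WARNING],
    )
```

The point of returning `assumptions` alongside the recommendation is that the user sees what the system believed about them, and a non-whitelisted match yields no recommendation at all rather than a hedged one.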

Honestly, if you're interested in innovating in this space, third-party vetting a la the GAO reports from my original post seems like a MUCH more valuable product than anything using AI hotness. I don't think people need an LLM here; what they need is ground truth, and an LLM can't help with that. If I wanted to innovate in the health supplement space, I'd put the NLP away and figure out how to automate ingredient testing.


Where can you download the database? Where are the health reviews?


Epic! Love this, one of the best Weaviate demos I've seen!


> Epic! Love this, one of the best Weaviate demos I've seen!

Just out of interest, do you still work for Weaviate? Probably worth mentioning.


Hey robertlagrant, my apologies -- still figuring out best practices on Hacker News. Will be more mindful of this going forward!


Thanks! Working on this and all the exciting features was really fun, from semantic and generative search to using Weaviate as a Semantic Cache and translating natural queries to GraphQL. The live demo was also recently updated with some good stuff! https://healthsearch-frontend.onrender.com/
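For readers unfamiliar with the semantic cache idea, a toy sketch follows. It assumes query embeddings are already available as plain lists of floats and uses an in-memory list with cosine similarity; the class name `SemanticCache` and the threshold value are illustrative assumptions, and this is not how the demo or Weaviate implements it.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Return a stored result when a new query vector is close enough to an old one."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (vector, cached_result) pairs

    def put(self, vector, result):
        self.entries.append((list(vector), result))

    def get(self, vector):
        # Pick the most similar stored vector, then apply the threshold.
        best = max(self.entries, key=lambda e: cosine(vector, e[0]), default=None)
        if best is not None and cosine(vector, best[0]) >= self.threshold:
            return best[1]
        return None
```

A hit means an expensive generative call can be skipped for a query that is worded differently but means nearly the same thing; the threshold trades cache hit rate against the risk of serving a mismatched answer.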


Love the visual design, too. The generative/RAG/semantic search is the interesting part, but it's also just very pleasant to look at. Which can go a long way.

EDIT: Disclaimer, I am affiliated with Weaviate.


> Love the visual design, too. The generative/RAG/semantic search is the interesting part, but it's also just very pleasant to look at. Which can go a long way.

Just out of interest, do you still work for Weaviate? Probably worth mentioning.


Fair point, edited my post. The fact that I love the visual design is my personal opinion though :-D



