Just to compare both fusion approaches, we ran some internal benchmarks measuring recall on a standard dataset (FIQA). In those benchmarks, the relativeScoreFusion algorithm showed a ~6% improvement in recall over the default rankedFusion method.
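For anyone unfamiliar with the two approaches, here is a minimal, illustrative Python sketch of the underlying ideas (not Weaviate's actual implementation): rankedFusion combines the keyword and vector result lists based on each document's rank, while relativeScoreFusion min-max normalizes each list's raw scores before summing them per document.

```python
# Illustrative sketch only; names and constants are assumptions, not Weaviate code.

def ranked_fusion(result_sets, k=60):
    """Rank-based fusion: each result contributes 1 / (k + rank)."""
    fused = {}
    for results in result_sets:                          # e.g. [bm25_hits, vector_hits]
        for rank, (doc_id, _score) in enumerate(results, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

def relative_score_fusion(result_sets):
    """Min-max normalize each result set's raw scores, then sum per document."""
    fused = {}
    for results in result_sets:
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        for doc_id, score in results:
            norm = (score - lo) / (hi - lo) if hi > lo else 1.0
            fused[doc_id] = fused.get(doc_id, 0.0) + norm
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

# Usage: each result set is a ranked list of (doc_id, raw_score) pairs.
bm25_hits   = [("a", 12.4), ("b", 9.1), ("c", 2.3)]
vector_hits = [("b", 0.91), ("a", 0.88), ("d", 0.20)]
print(ranked_fusion([bm25_hits, vector_hits]))
print(relative_score_fusion([bm25_hits, vector_hits]))
```

The intuition behind the recall difference is that relativeScoreFusion preserves how much better one hit scored than another within each list, whereas rankedFusion throws that information away and keeps only the ordering.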
Highly opinionated, as I work for Weaviate, so take my comment with a large grain of salt.
As always, it depends on your use case.
My highly opinionated view on Elastic: they're not really open source, and the Lucene ecosystem's dependency on Java is a big disadvantage. As you already said, speed is an issue; they're getting better at it, but if you need to scale, that problem scales with you.
So if you already run an ELK stack and don't need to scale, sure, go for it. Otherwise, Weaviate is genuinely open source, so you can use it for free on your own infrastructure: https://github.com/weaviate/weaviate
Disclaimer: the author of this project is on my team, and I am the initiator of the project, so you now see me fighting for it.
In the end, how do you want to make self-treatment options accessible? We reflected on exactly your valid critique before building and publishing this project. User- or patient-written feedback about medication or self-treatment options is always a risky source. But it doesn't help if big pharma is the funnel that determines what information is available, since the interest there is purely monetary. Yes, this project is explicitly about supplement reviews, but the bigger picture is that the next iteration will also include analysis of Reddit comments in support groups, including for rare diseases.
There are dozens of anecdotes about treatment options that have not reached the mainstream only because they use substances that are not, or are no longer, patentable, so patients have to resort to pharmacotherapy with significantly more adverse side effects. To complete the picture: I have experienced this myself. I have a rare autoimmune disease and only learned about a new experimental treatment option thanks to a self-built NLP pipeline that analyzed Reddit comments. In the end, I was able to stop my classic medications, which had unpleasant side effects. I don't want to make this too personal, because I want to address your legitimate criticism. All I want to say is that we do not want to be uncritical about the adverse effects of supplements; they are evident, depending on the substance we're discussing.
The narrative "it is just a supplement, so there are no dangers" is definitely something to avoid. You should not take them without critical reflection, as supplements are not without adverse drug effects. Sometimes the same substance depending on your location, is a medication or an uncontrolled substance like Berberine for example. I took it too long before researching to find out it could adversely affect your microbiome etc.
This project is not about being uncritical of supplements. Btw, the NLP pipeline also detects adverse drug effects because, as already said, anything that has an effect can also have side effects.
So, long post is long; now help us improve here. What would you do besides the big fat red disclaimer we have in the project to address your concerns? Happy to adjust the project accordingly. We're going to publish a blog post about this project, so I'm already thinking of including a passage along the lines of what I have written in this comment.
> What would you do besides the big fat red disclaimer we have in the project to address your concerns?
I probably wouldn't use an LLM for this problem domain. Or, if I did use an LLM, it'd be as a way to map the user's requests into an expert system. The expert system would generate a recommendation as well as a set of diagnostic assumptions extracted from the prompt. The user should be presented with a checklist of extracted diagnostic assumptions along with the recommendation. The recommendation should include any specific warnings about the active ingredient(s), together with a general warning about the wild west nature of health supplements: the active ingredient may not be present, and other harmful ingredients may be present.
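To make that concrete, here is a hypothetical sketch of the flow, with all names made up for illustration: the LLM's only job would be to extract structured diagnostic assumptions from the free-text request, the user would confirm or correct them, and a rule-based expert system would own the recommendation and warning logic.

```python
# Hypothetical sketch of the LLM -> expert-system split described above.
# RULES, GENERAL_WARNING, and the assumption labels are all invented examples.
from dataclasses import dataclass, field

@dataclass
class Recommendation:
    supplement: str
    warnings: list = field(default_factory=list)
    assumptions: list = field(default_factory=list)   # shown to the user as a checklist

GENERAL_WARNING = ("Supplements are loosely regulated: the active ingredient "
                   "may be absent and other harmful ingredients may be present.")

RULES = {
    # frozenset of required assumptions -> (supplement, specific warnings)
    frozenset({"low_vitamin_d", "no_kidney_disease"}):
        ("vitamin D3", ["Avoid high doses if hypercalcemia is present."]),
}

def recommend(assumptions: set):
    """Rule-based recommendation: fire the first rule whose conditions are met."""
    for condition, (supplement, warnings) in RULES.items():
        if condition <= assumptions:
            return Recommendation(
                supplement=supplement,
                warnings=warnings + [GENERAL_WARNING],
                assumptions=sorted(assumptions),
            )
    return None

# The LLM would only be responsible for producing a set like this from the prompt,
# which the user confirms before seeing the recommendation.
extracted = {"low_vitamin_d", "no_kidney_disease"}
print(recommend(extracted))
```

The point of the split is that the LLM never invents medical advice; every recommendation and warning traces back to an auditable rule curated by a domain expert.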
To build out the expert system, I would find a team member who is a medical professional with expertise in health supplements. An MD or researcher with relevant SME would be the obvious choice, but I've also talked with some truly excellent registered dieticians, nurses, and PharmDs.
Finally, I would only recommend a limited white-list of health supplements that have some form of third party verified quality control in place.
Honestly, if you're interested in innovating in this space, third-party vetting a la the GAO reports from my original post seems like a MUCH more valuable product than anything using AI hotness. I don't think people need an LLM here; what they need is ground truth, and an LLM can't help with that. If I wanted to innovate in the health supplement space, I'd put the NLP away and figure out how to automate ingredient testing.
Coreference resolution is something all of us do instinctively many times every day even though most of us haven’t heard the term before. People use language to talk about entities, events and the relationships between them. When we mention the same thing multiple times throughout a discourse we tend to use different expressions.
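To make the idea concrete before any formal definitions, here is a hand-built toy example (not the output of any particular library) of what a coreference resolver produces: clusters of mention spans that all refer to the same entity.

```python
# Toy illustration of coreference clusters; the text and clusters are hand-written.
text = "Anna dropped her phone. She picked it up and checked that it still worked."

# A coreference system would group mention spans into entity clusters like these:
clusters = {
    "ANNA":  ["Anna", "her", "She"],
    "PHONE": ["her phone", "it", "it"],
}

for entity, mentions in clusters.items():
    print(f"{entity}: {mentions}")
```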