Show HN: Customizable, embeddable Chat GPT based on your own documents

Zak · on March 29, 2023

I said "Avada kedavra!" to the Dumbledore bot and got an internal server error. Is this intended behavior?

gcanyon · on March 30, 2023

Well, you killed it, obviously. :-)

pavelstoev · on March 29, 2023

Hi Bea and thank you for sharing your creation. Looking good ! Couple of questions:

1) could you please describe your data privacy considerations. Like what happens to my documents after they are uploaded ? Are they stored somewhere (encrypted or not) or deleted ?

2) could you please share more details on how this works “under the hood”. Specifically how do you ingest and digest the knowledge contained in my documents ?

Thanks !

highwaylights · on March 29, 2023

On the website it states:

“Step 1

Import or sync documents into Libraria, or add API integrations like Google and Shopify (beta). Bring the docs - let GPT-3.5 do the heavy lifting.”

I might be reading it wrong, or might have missed it on the website, but is it actually GPT-3.5 running over those imported documents? (As in, are you using OpenAI or another third party provider in the background?).

If you’re running a local LLM then the privacy implications are clearly pretty different than if you’re essentially sending people’s documents verbatim into an external LLM.

monkeydust · on March 29, 2023

Also curious before I use.

Tbh I have managed to build this minus the nicer interface using langchain. Was surprisingly easy as someone who doesn't dev daily.

https://blog.langchain.dev/retrieval/

bsenftner · on March 29, 2023

This is the first "real" app people make after initially getting familiar the OpenAI API. I don't see how this can be sustained without an expensive feature race with the horde of similar services that are appearing, and with more than one programmer.

monkeydust · on March 29, 2023

Yea I mean it is going to get commoditized to the point where its just an Azure service - throw some docs/links etc, maybe tune some parameters (if your and advanced user) and off you go - Q&A bot against your own material.

baq · on March 29, 2023

Given their office copilot demo they’ve already got something better.

monkeydust · on March 29, 2023

Thanks I hadn't got around to watching that yet just did. My to-read/watch list is getting ridiculous now as I just cant keep up with the pace given job/life!

https://www.youtube.com/watch?v=S7xTBa93TX8

WastingMyTime89 · on March 29, 2023

> If you’re running a local LLM then the privacy implications are clearly pretty different than if you’re essentially sending people’s documents verbatim into an external LLM.

Is it?

You are basically entering a contractual relationship with them regarding the propagation of your document and they are themselves entering a contractual relationship with their suppliers. It's not different from hosting in the cloud.

Do you expect every webapps you use to tell you if they use Azure, AWS or GCP?

ceejayoz · on March 29, 2023

> Do you expect every webapps you use to tell you if they use Azure, AWS or GCP?

In their privacy policies, yes. And I expect them to have signed a DPA with those and other vendors.

WastingMyTime89 · on March 29, 2023

I’m downvoted but I still believe the position doesn’t make sense. You state and negotiate the level of protection you want for your data with your direct interlocutor. They then enforce that with their own suppliers.

You are free to want to know the whole chain of companies involved in providing you a service but that’s very unusual.

ceejayoz · on March 29, 2023

It's hardly that unusual.

https://cloud.google.com/terms/subprocessors

https://aws.amazon.com/compliance/sub-processors/

https://www.cloudflare.com/gdpr/subprocessors/

https://www.atlassian.com/legal/sub-processors

https://slack.com/slack-subprocessors

https://www.linkedin.com/legal/l/customer-subprocessors

https://grafana.com/legal/list-of-subprocessors/

WastingMyTime89 · on March 29, 2023

Looking at your answer, I surmised it was a legal obligation and yes, it's indeed part of the European GDPR. I learnt something new today.

yipbub · on March 29, 2023

I asked Dumbledore something from the books[0] and got something somewhat innovative, somewhat wrong: "How can I turn a staircase into a slide?"

> To turn a staircase into a slide, you can use the spell 'Tergeo' to remove any dirt or grime from the stairs, and then use the spell 'Lubricus' to make the stairs slippery. Lubricus is an incantation, which can be cast with a wand and by reciting the word 'Lubricus' while pointing the wand at the object in question. You can also use the spell 'Flagrante Curse', which can make a surface slippery by causing the target object to become extremely hot, in order to render the staircase slippery. However, this spell is considered one of the Unforgivable Curses and is strictly forbidden. It takes years of study and practice to become a proficient wizard, and performing spells can be dangerous, so it's important to learn and practice magic under the guidance of a skilled wizard or witch.

[0] https://harrypotter.fandom.com/wiki/Glisseo

userbinator · on March 29, 2023

and got something somewhat innovative, somewhat wrong

That's not surprising, it's just par for the course with LLMs.

bealuga · on March 29, 2023

:D LOL. The dumbledore example is interesting. I left "GPT" enabled, so it's not just pulling from the documents. I think I'll add settings for users to be able to change the temperature etc. for the assistants in the future as well

afro88 · on March 29, 2023

The first thing I did was try and get it to hallucinate or talk about things that aren't in the harry potter books.

This is the main concern for me with chatgpt bots: they make things up and can be prompt hacked to get outside their bounds.

So just some feedback that your demos should show how you tackle this too.

tirpen · on March 29, 2023

That's a pretty cool service. But as usual with these services today, it's totally trivial to make it do something completely different than what's intended:

https://i.imgur.com/VKrfWYm.png

which would make me very wary of hooking it up with an API-key that I'm paying for, since I'd basically be paying for free GPT access for anyone who visits my site, while I would probably only be interesting in paying when they are asking questions related to my topics.

bealuga · on March 29, 2023

hi, try the same with Zappy! As mentioned in another comment, Dumbledore was set up to "enable" GPT: https://ibb.co/8sQh1X6 . I give you the option to "only" use your own documents.

dash2 · on March 29, 2023

I wonder if you’ve considered academics as a target market. We have a lot of pdfs and might like help in “thinking” about them.

golergka · on March 29, 2023

There's http://chatpdf.com already

dash2 · on March 29, 2023

Interesting and cool, but here's why serious academics should avoid it:

When you post Contributions, you grant us a license (including use of your name, trademarks, and logos): By posting any Contributions, you grant us an unrestricted, unlimited, irrevocable, perpetual, non-exclusive, transferable, royalty-free, fully-paid, worldwide right, and license to: use, copy, reproduce, distribute, sell, resell, publish, broadcast, retitle, store, publicly perform, publicly display, reformat, translate, excerpt (in whole or in part), and exploit your Contributions (including, without limitation, your image, name, and voice) for any purpose, commercial, advertising, or otherwise, to prepare derivative works of, or incorporate into other works, your Contributions, and to sublicense the licenses granted in this section. Our use and distribution may occur in any media formats and through any media channels.

bealuga · on March 29, 2023

That would be really really cool if I could able to serve that space. I'd be curious to know what kind of features you'd want to have, what would be deal-breakers, etc!

dash2 · on March 29, 2023

Basic setup: point it at a folder of PDFs, have it recurse in and read them all, then ask it questions like:

* Summarize these papers on chimpanzee cooperation in the wild. What other papers should I be reading?

* Suggest an interesting master's thesis topic on the early modern economy.

* How good are polygenic scores at predicting educational attainment, and how has this developed over time?

Bonus: integrate it with e.g. google scholar, so it can go and find and read new papers.

Pricing: it is probably easier to start selling this to individual academics. Then when you've got a compelling product and the word is out, you could sell it to the whole lab (at a much higher price because people can put it in their grant budgets).

Gotchas: privacy. Nobody wants their hot unpublished paper to be scooped by a large language model.

MaxikCZ · on March 29, 2023

Not person with original question but he asks for the same feature as I would like to see.

I am not sure how the documents are handled in your product, since Chat GPT has a context limit that probably wont be able to hold longer papers in memory.

For me, I have a pdf[0] depicting a system that can be programmed, along with bits of pseudo-code and a lot of clarifications. Something that the Chat GPT could use to spit out an actual implementation, if it were able to "think" about the pdf as a whole. I would love to see if your product is capable of such feat.

[0]: https://arrow.tudublin.ie/cgi/viewcontent.cgi?article=1177&c...

Drive-Based Utility-Maximizing Computer Game Non-Player Characters by Colm Sloan (note that basically only chapter 3 is needed in this case, but its still over 40 pages long)

malborodog · on March 29, 2023

Yep. This is such an obvious use case. Have you seen the best thing out there that does this? Where can I load 100gb of pdfs and ask questions about what's in them??

ssdspoimdsjvv · on March 29, 2023

Yes, this is the next step I'm looking forward to, and what would probably make LLMs really take off. Let me dump my own knowledge base or source code into ChatGPT and have it use that as its source of knowledge. I can only imagine the cost and resources required to train and run these individual models on a large scale must still be prohibitively large.

osigurdson · on March 29, 2023

This definitely seems like a huge untapped space: ChatGPT for my stuff that answers questions correctly. Sure there are privacy concerns for some things but unless you are going to train your own LLM on premise (yeah right) this will always be an issue.

I broke it trying to upload a PDF but that is ok, I'll try it again at some point.

zirgs · on March 29, 2023

Can I train a LORA on premise instead?

osigurdson · on March 29, 2023

You can do anything on premise if you have the necessary skills and hardware.

zirgs · on March 30, 2023

LORAs were invented so that we don't have to retrain the whole model if we want to add something new.

Retraining the whole Stable Diffusion base model costs a lot of money. But training a LORA on a few thousand images is totally doable with a single GPU.

mb_72 · on March 29, 2023

A few things:

1) I went through Stripe checkout to upgrade to the $10/month plan, but it's still showing me as on the free plan on the billing page.

2) I guess related to 1), but I want to show my business partner the results of a quick dump of a PDF plus scrape of our website; it's not clear how to supply him with the public chat/bot URL.

3) 'Last scraped' always shows 'invalid date'.

Feel free to reach out to me directly (email address is ***72@gmail.com for my account). Thanks, good luck with the product!

bealuga · on March 29, 2023

Thank you! Looking you up right now!

jimmySixDOF · on March 29, 2023

Love the approach and amazed by your packaging it looks slick and mature even for such a novel tech stack. But my understanding is that all the OpenAI models (3.5-Turbo et al) have a non compete terms of use clause so wondering how you approach this ? I can see the case that you are a complementary service increasing paid usage of OpenAI, but, I can also see them saying hey we want to be the ones to do that with enhanced features you are competing against. Perhaps just a long term risk to live with if you get past the short term market traction test ? Asking because I face a similar set of questions for a different set of reasons.

PanMan · on March 30, 2023

Hi Bea, I have been experimenting with it just now. The add multiple urls behaves strange.. Sometimes it gets the urls right, but after the first 20 pages, it changed all my https urls into..

  212);">(URL WITHOUT HTTPS)</span></div><div><span

I changed the url, but it seems something goes wrong in the https:// replacement - this prevents me to add more pages after the first 20 did work - I have already reloaded a few times, and it's strange that it did work for the first 20 pages (they are from the same list / url format)

Second issue: When I enable "use only my library" I get a server error from the chat. And without that, the answer wasn't specific enough - it was literally answered in the content

maphew · on March 29, 2023

I love the idea, and from the overview implementation looks like a great start. For me though it's not something I will explore. Uploading our documents to a 3rd party is not something we can do.

colordrops · on March 29, 2023

How does this work considering the token limitation of gpt? Or does the gpt api let you create your own models through their API? I'm admitting my ignorance of OpenAI's offering here.

skerit · on March 29, 2023

Wondering about this too. I thought GPT3.5 didn't offer any custom training. And it has a pretty small token limit, at least when talking about dumping all your documentation into it.

printvoid · on March 29, 2023

This is brilliant, but I am not sure if you have rushed it to launch, I am unable to upload any docs, tried to add a URL for scraping the import was successful but then it never appeared in my Library section. Connecting to github gave me issues but then when I did finally connect to github. the import is never successful. I keep on getting 500 service error and then a bunch of console.log messages with `hi` as null and sometimes `hi` having a value of an object. Let me know if you need more info regarding these issues. But this is a great product definitely.

bealuga · on March 29, 2023

Hi! I definitely did not expect this reception, so I've been running into quite a few issues (,:

Send me an email and I'll debug it for you https://libraria.dev/about

ajhai · on March 29, 2023

Congrats on the launch. We are building something similar at https://trypromptly.com that allows chatbot building on user's own documents powered by GPT. We also allow users to build AI powered apps without writing code.

For example, https://trypromptly.com/app/9da10a4f-6d20-431e-98f7-048fab81... is a chatbot built on Coursera video transcript. Similarly https://trypromptly.com/app/d478594d-2082-46c9-bee7-f057f4bc... is a web app that generates resume based on user information and the position they are applying to. Both are powered by Open AI message completions API and built using Promptly's no-code builder. Another fun app we built allows end user to provide a audio file, pipe that to Whisper and use that in the chain below to extract summaries etc.

Ps: We are still updating our landing page with app builder videos and other content

Sai_ · on March 29, 2023

I feel it is impolite to hijack someone else’s Show HN to promote your own, like-for-like product, that too as a top level comment. Not sure about HN norms around this though.

esperent · on March 29, 2023

It's a for-profit app being posted here to get free advertising. Other people jumping on to get free advertising too is fair game and definitely the HN norm.

Besides, I always bookmark threads for apps that look interesting and check back later when everyone else has posted their similar apps and been commented on, then I can compare and choose whichever one looks best (or whichever is open source).

Sai_ · on March 29, 2023

I can see your point of view.

Personally, if I find that my app is similar to an ongoing Show HN, I like to wait for an opening where I can respond with a plug for my own app.

I don't feel comfortable talking up my own app unprompted. Feels like making a big attention grabbing announcement at someone else's wedding.

malborodog · on March 29, 2023

Nah it was done kinda tastefully here and it’s helpful to know the comps

zby · on March 29, 2023

It is useful for users to learn about the alternatives.

noduerme · on March 29, 2023

I'm wondering if this is designed only for in-house use, or if it's something intended to be public-facing. If a company were to embed this on their website, say, to answer questions about products for sale...

1. Would there be any way to restrict the conversation with the customer to questions related to the products?

2. What strategies could be used to prevent malicious overuse / spamming / flooding the bot to cost the company money?

alphadog · on March 29, 2023

Congrats on the launch! I'm testing it now and are excited about it. I've tried few similar solutions in the past few days but they were buggy and not as feature rich as your solution. I do have a few questions:

- What's the differences between Assistant and Chat mode?

- If I have GPT4 access is there anyway I can insert my API key and use it with your app? I assume you're using 3.5 turbo.

- What toolchain are you using for splitting / chunking?

bealuga · on March 29, 2023

Thank you!

- Nothing as of right now! Just an interface. I don't think I implemented snippets yet for Chat mode though - Not yet, although I plan to introduce this. I found that GPT4 is a little slow right now, and much more expensive, but it'll be awesome to enable this for users, then I could charge even less. - I recommend langchain, it's one of the few things that I didn't modify for my use case on their library.

fumplethumb · on March 29, 2023

This is amazing!

I would love to have an AWS expert handy at all times, so I tried to upload all AWS documentation using this: https://docs.aws.amazon.com/sitemap_index.xml. I can no longer use the site, so I suspect that busted something.

In hindsight, that was not cool and I'm sorry about it.

maski · on March 29, 2023

Most AWS documentation is on GitHub: https://github.com/awsdocs/

bealuga · on March 29, 2023

Heh. Looking into this and will try to fix. I'll comment here again when I can get this working!

edmundsauto · on March 29, 2023

This is really cool! I would suggest a landing page based on the benefits - what cool things are people doing with this? IE some applications - what immediately pops to mind is a chatbot for your website or documentation, but I'm sure there are many more use cases you've thought of. Would be cool to see on the home page.

lysecret · on March 29, 2023

This is what LangChain is about right? I think these kind of things have to be open source and self hostable, maybe with a paid and hosted offering as a side. Of course until Apple will eventually integrate this.

serf · on March 29, 2023

I ask 'googly' "tell me about the weather in san francisco, ca today and tomorrow" -- I get a factual answer that quotes Weather Underground.

I ask 'Dumbledore' the same thing, and I get (more-or-less) the same answer.

I think that I initially expected 'Dumbledore' to either feign ignorance on the topic, or at least output a 'flavored' answer in the style of his dialogue.

I guess what i'm saying is that assigning 'characters' to each of these pre-trained/biased prompts creates a user expectation of a character rather than just a pre-loaded knowledge-base.

bealuga · on March 29, 2023

Try the same with "Zappy". It won't give you the same answer!

I have a checkbox for you to enable or disable GPT. I enabled GPT for the Dumbledore one

https://ibb.co/8sQh1X6

slagfart · on March 29, 2023

Let's say I have about 100k novel-length texts I want to import into a model like this (I want to build a book recommendation system).

How should I approach it? I think this product can't deal with that much text yet?

klibertp · on March 29, 2023

Also very interested in this. From what I read, you need to train some other model in the "transformer" family yourself, or you can use OpenAI API for fine-tuning their model. Not sure which would be better in practice, or if either would work, but AI people at my company say the fine-tuning with even just a bit of data can result in a lot better (or different) results for the given domain.

ElFitz · on March 29, 2023

Providing the sources is a nice touch! I really like it. Providing supporting excerpts too would have been nice.

Isn’t publicly hosting some specific content on S3 and providing links to it risky though?

bealuga · on March 29, 2023

good callout - I had it setup to delete immediately after processing, but looks like there was an issue on my end for that! Fixed now and deleted ones that were uploaded before. s3 was storing manual uploads (not data scraped from urls), and should not be storing them anymore.

replwoacause · on March 29, 2023

Beautifully designed. I assumed a team of people was behind this based on the apparent quality and then learned you're a solo dev. Very impressive!

bealuga · on March 29, 2023

omg. thank you.

Pamar · on March 29, 2023

Hi! Haven't tested it yet but I already have a question... I have an historio.us account which I use to "collect" articles I consider interesting.

Would it be possible make your product access a historio.us instance?

(In my use case there would be no privacy concern because all the articles stored in Historio.us are already public... also it would be quite useful because search is quite limited)

guldbog · on March 29, 2023

Love it! For positioning, I'd look into www.raffle.ai - they've found a very good way to explain why enterprise search is valuable

SMAAART · on March 29, 2023

Testing it, nice.

I tried to import a .pdf but I got an error message about not having enough "credits" https://i.imgur.com/7za9FEn.png

Also, the google drive integration is not working (yet).

Can you give us a glimpse of what's coming down the road? How can I get more credits and when will Gdrive integration will be available?

Also: pricing?

bealuga · on March 29, 2023

Hi! Strange, that google drive integration should work. I'll take a look and update you right here when it does.

Pricing: Feel free to send me an email bea@bogpad.io, I'll send you some credits to play around with. I'm still trying to figure out pricing!

SMAAART · on March 29, 2023

https://i.imgur.com/OB5yLvc.png

lxe · on March 29, 2023

Ah this is crazy how fast the space is moving. I'm literally trying to build a document embedding UI to do Q&A on. Great work!

quickthrower2 · on March 29, 2023

Love the idea. Would suggest my co. pay for it if a. know what it costs and b. there is details about how data is handled and plausible assurance it is not fed into a model and is secure.

The only way I would be convinced is if it is free/open source and self hosted, but that makes charging for it more difficult

varunjain99 · on March 29, 2023

Pretty cool that all you need is the sitemap for the assitant to index the data! Will give this a try soon :)

hhw3h · on March 29, 2023

For the Dumbledore demo I asked, "What are the 3 unforgivable curses?" and it was unable to provide an answer.

https://ibb.co/bL61nDk

Would your tool work in gated knowledge bases for example training courses behind logins?

bealuga · on March 29, 2023

As for working with gated knowledge bases, you can. You can just set your assistant as "protected".

https://bogpad.libraria.dev/ for an example of a protected one.

https://ibb.co/T8qjCgZ (strange, i tried again and got this! I'll work on improving stability)

bealuga · on March 29, 2023

Oh no! Let me manuially add the 3 unforgivable curses to the library. The way I did the dumbledore demo was upload a PDF for each of the 7 books and there ~might~ not be an explicit section in harry potter that states all of them at once?

Waterluvian · on March 29, 2023

> an explicit section in harry potter that states all of them at once?

Isn’t part of the point of GPT that it finds relationships without the training data having to be well-structured? So long as the text describes them in a way that a human, having read the books, could answer the question?

I don’t really understand how training works. This isn’t a jab.

bealuga · on March 29, 2023

It's a great question!

Right now, I don't train anything, I've broken the text down to n characters and created embeddings for that subset of text -- then I search for the closest distance / relationships between the question asked. Then I add the text to the prompt, and tell gpt to use those paragraphs to answer the question (to ensure that it doesn't make anything up). This is one of the ways I can get around the token limit, but it comes at the cost of thinking it can only use the paragraphs I show it. I'm trying to improve the prompt to get more consistent results, and maybe 4 can help me give it larger bodies of text!

Hope that answers your question, let me know if you have any more!

Waterluvian · on March 29, 2023

Oh! I think I understand.

So your software takes a prompt from me, does non-GPT work to find additional context from your source (the books, parsed and re-structured into word vectors or whatnot), and then asks GPT my prompt combined with the added context?

Like,

“What are the three foobars when considering these passages from a book <…> ?”

bealuga · on March 29, 2023

Yeah more or less! I still use open ai for their embeddings (translating text into vector space)

- Your question -> vectors with open ai embeddings - Text you uploaded before -> vectors with open ai embeddings

Get the most similar above a certain threshold, and then add it to the prompt saying

"From these articles / paragraphs, answer the user's question: What are the three foobars"

So yep! I preprocess it

Waterluvian · on March 29, 2023

I think that’s a really clever idea.

It feels like training is analogous to years of growing up and going school. And what you’ve done is taken that educated “mind” and said, “here’s a document. I’m going to quiz you on it.”

That seems really practical compared to sending an AI back to school to learn all about some specific topic.

I wonder if we will end up with a library of trained models, each of which went to different schools to obtain different focuses of knowledge. Maybe LiteratureGPT is better suited for this than General GPT or ProgrammerGPT.

Okay I think I’ve stretched the analogy far enough.

just_boost_it · on March 29, 2023

The GPT models create sequences of words that are likely to be look correct. A side effect is that it sometimes happens to find a connection between ideas. The three unforgivable curses are something that it should be able to connect, as these are all probably in a similar conceptual space, but you would need a fair bit of data to push the model to see that. It's possible to uncover these connections with less data, but with LLMs you need to do things like prompt engineering to give you more leverage on your data if it's limited.

bealuga · on March 29, 2023

Yeah, it looks like it actually got the documents correctly, but the prompt itself needed some reworking. I think there's a space out there for prompt testing.

Ozzie_osman · on March 29, 2023

Congrats, this is really cool. Have you thought about making the assistants Slack bots?

ClearAndPresent · on March 29, 2023

It appears the only way to sign up is with a Google account. (I know that is a solution suggested by OpenAI for user authentication.)

Will there be an option for email, for those of us who don't use Google, or other large corporations?

ttul · on March 29, 2023

I’m guessing that Libraria is using HyDE (1) to create hallucinogenic embeddings of the documents. It works extremely well.

1. https://github.com/texttron/hyde

aryc19 · on March 29, 2023

I asked "Dumbledore" a question. It gave the answer along with a "learn more" button which ended up being a link to PDF of the book. It shows up only for one particular question. Seems like a bug?

felgueres · on March 29, 2023

Congrats on the launch! I’m working on a similar product at https://upstreamapi.com

Im excited to see what features we build on top of semantic search + chat.

pmontra · on March 29, 2023

See also the ChatGPT Retrieval Plugin by OpenAI at https://github.com/openai/chatgpt-retrieval-plugin

topnde · on March 29, 2023

Was this plugin used to create the service OP posted?

calculated · on March 29, 2023

Great service, Bea! Would it support Databases as a source in the future?

bealuga · on March 29, 2023

Yes! I'm currently working on it. I'm trying to improve CSV imports right now (current users have large csvs they want to go through)

calculated · on March 29, 2023

Amazing. Another thing - make it si that we can give you our emails and you can send us updates there so we can track the progress of the app and come back to it when needed.

bealuga · on March 29, 2023

It's at the very bottom of the page! I'll try to make it more visible tomorrow as well (:

imcoconut · on March 29, 2023

This looks awesome. I'd be super interested in testing this out and providing feedback on it. I'm going to make an account and is there an email/link to submit feedback as I use it?

bealuga · on March 29, 2023

Will be adding my email at the about page! https://libraria.dev/about (vercel takes a few minutes to deploy, but it should be up soon)

an_aparallel · on March 29, 2023

so - if i download a boatload of research papers from scihub about topic X - this will be base its knowledge off them? if so i was just thinking this would be cool to have - and here it is?

luccasiau · on March 29, 2023

I was thinking something similar and the answer is yes I think.

I uploaded a public-domain copy of Meditations by Marcus Aurelius and now I feel like I can get direct knowledge from the man himself

https://ibb.co/g7ry4X3

It worked pretty well on the first attempt. Will try putting more authors in it now

KidComputer · on March 29, 2023

https://github.com/whitead/paper-qa

bealuga · on March 29, 2023

Yes. If you do try that, let me know! The most difficult part that i'm encountering right now is the "prompt" engineering part. So I'd be curious to know how I can make it better for you

rivals · on March 29, 2023

Just a heads-up, I asked a simple question on Harry Potter and the "Learn more" section had links to 3 full Harry Potter PDF books publicly hosted on your S3.

ChatPGT · on March 29, 2023

Do you know what would be pretty cool? A feature that let the user to upload N documents, from different fields.

Then through an API he/she can extract the knowledge he/she wants.

bealuga · on March 29, 2023

Hmm, right now you can kind of simulate that by creating n assistants with only the knowledge you want in them. But yes, no API for now. If there's demand for it, I'll gladly build it though

blacksoil · on March 29, 2023

I'm getting internal server error when pressing the demo button. Also the page after login doesn't show anything other than the sidebar on FF mobile

mcluck · on March 29, 2023

Nice! I'll be taking it for a spin after work. One question: I'm interested in trying to use this as a way of querying my Zettlekasten-like note system

ignorantguy · on March 29, 2023

This is cool. I just did a small poc yesterday. I scraped our api documentation and created an embedding and use chatgpt to answer questions. It is pretty slick.

bealuga · on March 29, 2023

Thank you! Hope you had as much fun as I've been having!

KaoruAoiShiho · on March 30, 2023

How is this possible? Doesn't it have a 4k token limit, how are you able to ingest documents that have vastly more data than that?

alexnew · on March 29, 2023

One thing that's understated is the fact that all you need is a sitemap. Wild how you can summon the knowledge of any site with a list of links.

it · on March 29, 2023

The "Ask Libraria" text field doesn't let me type anything. I tried it with Chrome and Safari on a Mac. Is this intentional?

agotterer · on March 29, 2023

I also thought it would be interactive and then realized it was just a screenshot. Would be cool to have an interactive demo right there at the top.

bealuga · on March 29, 2023

(,: I'm sorry! I'll put it up at the top soon hopefully (I'm hoping to go back to just coding on the app! Spent all day on the landing page today (,: )

bealuga · on March 29, 2023

yeah sorry! That's just an image (,: There are other demos a bit below when you scroll down

gcanyon · on March 29, 2023

I broke Dumbledore :-(

Q: Which piece did Hermione play in the chess game?

A: I'm sorry! I may not have the article in question yet. Please try again!

But this looks awesome.

munchler · on March 29, 2023

Spelling error on upload page, should be: "Add documents to your library for your assistant to use".

arcatech · on March 29, 2023

I wish people were a little more hesitant to send every single one of their thoughts and documents to OpenAI.

milar · on March 29, 2023

This is cool! I’m guessing you use something like langchain’s document loaders to make it work?

intalentive · on March 29, 2023

I did this for my email and Google docs. It’s about 15 lines of Python with langchain.

mat_jack1 · on March 29, 2023

Is this Ok to upload all your messages and your contacts to these services? I'm very worried about that, really

highwaylights · on March 29, 2023

I’d assume it has exactly the consequences you think it does, but I don’t know that and it becomes more of an unknown as more third parties are layered between you and the LLM.

I’d assume the right prompt by another user of the same underlying LLM on another platform could well expose your private information / content / passwords and would treat this with a lot of caution until you get satisfactory evidence otherwise.

mat_jack1 · on March 29, 2023

Yes, that's what concerns me. Since it's not really clear what's going on, how can we trust a third party service by uploading personal data?

oezi · on March 29, 2023

Doesn't it cost a lot of money to finetune on so many documents?

tester457 · on March 29, 2023

Yes however langchain is not finetuning, it's basically indexing!

cdelsolar · on March 29, 2023

nice. this is _exactly_ what I wanted to build when ChatGPT came out, but you actually went out and did it while I coded on passion toy apps that don't actually make me any money. Hope this is successful!

Beefin · on March 29, 2023

building something similar that scrapes a website' html, image, pdf, etc. files here's an example:

https://collie.ai/tesla

meghan_rain · on March 29, 2023

One question, are any of the prompts shared with OpenAI?

bealuga · on March 29, 2023

I use their APIs, so yes. There's no way around it right now, but hopefully in the future I can move away from relying on them

EGreg · on March 29, 2023

How does it scrape the documents?

It inserts them into the prompt?

brimstedt · on March 29, 2023

Nice work!

How many documents do you need for it to be usable?

on March 29, 2023

[deleted]

livelielife · on March 29, 2023

R.I.P. software, now there are only services (backed by software) but the software is not longer distributed, just access to the service enabled.

EntrePrescott · on March 29, 2023

Indeed, the ongoing fashion shift from locally installed software to external services is quite problematic, as it shifts the control over the user's data from the user themself to… whatever tree/graph of third parties somewhere in the network where the data is accessed, processed… and further shared (willingly or not, knowingly or not, in conformance with the agreements and laws or not) which the user has no control over.

That being said, it seems like an almost inescapable consequence of the fact that the software itself is also data and as such subject to a similar problem: in times of ubiquitous high-speed internet access where you can't protect software (data) from being widely copied and shared over the network once anyone else gets hold of it, the only simple and reliable way for a software company to keep control over the distribution of their software (and thus its monetization) is to never share it in the first place but only provide access to it as a service.

So in the end, the shift from locally installed software to remote services kinda comes down to the fact that most users don't have the same level of awareness or care about their own data and keeping control over it as the software/internet service companies have for their data (e.g. software).

marktolson · on March 29, 2023

SaaS has been around for a while now.