Show HN: Customizable, embeddable Chat GPT based on your own documents (libraria.dev)
271 points by bealuga on March 29, 2023 | hide | past | favorite | 130 comments
Hi Hacker News!

My name is Bea. I built a site called Libraria that uses GPT to do a few things:

1. Let you spin up multiple assistants based on your own documents. You can make each one public, private, or protected, and each gets its own subdomain and landing page.
2. Respond in full markdown always, so it can output images, links, code, and more
3. Let you upload articles on the fly within the chat, so you can ask it questions
4. Make it embeddable in your site with one line of code
5. Let you update it for fun / with your branding
6. Enable syncing for any URLs you let us scrape, so you can make sure it's always up to date
7. Let you upload multiple file types

I've been working on this for about a month now by myself and you can keep track of my feature updates here: https://libraria.dev/feature-updates

I would LOVE your feedback on anything. And if you're willing to try it out, I'm looking for a few beta users who can give me more continuous feedback, and I'd gladly waive the fee for them!



I said "Avada kedavra!" to the Dumbledore bot and got an internal server error. Is this intended behavior?


Well, you killed it, obviously. :-)


Hi Bea, and thank you for sharing your creation. Looking good! A couple of questions:

1) Could you please describe your data privacy considerations? E.g., what happens to my documents after they are uploaded? Are they stored somewhere (encrypted or not) or deleted?

2) Could you please share more details on how this works “under the hood”? Specifically, how do you ingest and digest the knowledge contained in my documents?

Thanks!


On the website it states:

“Step 1

Import or sync documents into Libraria, or add API integrations like Google and Shopify (beta). Bring the docs - let GPT-3.5 do the heavy lifting.”

I might be reading it wrong, or might have missed it on the website, but is it actually GPT-3.5 running over those imported documents? (As in, are you using OpenAI or another third party provider in the background?).

If you’re running a local LLM then the privacy implications are clearly pretty different than if you’re essentially sending people’s documents verbatim into an external LLM.


Also curious before I use.

Tbh I have managed to build this, minus the nicer interface, using langchain. It was surprisingly easy as someone who doesn't dev daily.

https://blog.langchain.dev/retrieval/
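The retrieval pattern that post describes can be sketched without any framework at all. In this toy version, bag-of-words counts stand in for real embedding vectors, and all the document text and function names are illustrative; the loop is the same: embed the documents, embed the question, pick the closest passage, and paste it into the prompt.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "vector"; a real system would call an embedding API here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "Glisseo turns a staircase into a slide.",
    "Tergeo removes dirt and grime from objects.",
]
doc_vecs = [embed(d) for d in docs]

def build_prompt(question, k=1):
    # Rank documents by similarity to the question, keep the top k as context.
    q = embed(question)
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(q, doc_vecs[i]), reverse=True)
    context = "\n".join(docs[i] for i in ranked[:k])
    return f"Answer using only these passages:\n{context}\n\nQuestion: {question}"

print(build_prompt("How do I turn a staircase into a slide?"))
```

Frameworks like langchain mostly wrap this same flow, swapping the toy pieces for a real embedding model and vector store.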


This is the first "real" app people make after initially getting familiar with the OpenAI API. I don't see how this can be sustained without an expensive feature race against the horde of similar services that are appearing, and without more than one programmer.


Yea, I mean it is going to get commoditized to the point where it's just an Azure service: throw in some docs/links etc., maybe tune some parameters (if you're an advanced user), and off you go, a Q&A bot against your own material.


Given their Office Copilot demo, they've already got something better.


Thanks, I hadn't got around to watching that yet, but I just did. My to-read/watch list is getting ridiculous now, as I just can't keep up with the pace given job/life!

https://www.youtube.com/watch?v=S7xTBa93TX8


> If you’re running a local LLM then the privacy implications are clearly pretty different than if you’re essentially sending people’s documents verbatim into an external LLM.

Are they?

You are basically entering a contractual relationship with them regarding the propagation of your documents, and they are themselves entering a contractual relationship with their suppliers. It's no different from hosting in the cloud.

Do you expect every webapp you use to tell you whether it uses Azure, AWS, or GCP?


> Do you expect every webapp you use to tell you whether it uses Azure, AWS, or GCP?

In their privacy policies, yes. And I expect them to have signed a DPA with those and other vendors.


I’m downvoted but I still believe the position doesn’t make sense. You state and negotiate the level of protection you want for your data with your direct interlocutor. They then enforce that with their own suppliers.

You are free to want to know the whole chain of companies involved in providing you a service but that’s very unusual.



Looking at your answer, I surmised it was a legal obligation and yes, it's indeed part of the European GDPR. I learnt something new today.


I asked Dumbledore something from the books[0] and got something somewhat innovative, somewhat wrong: "How can I turn a staircase into a slide?"

> To turn a staircase into a slide, you can use the spell 'Tergeo' to remove any dirt or grime from the stairs, and then use the spell 'Lubricus' to make the stairs slippery. Lubricus is an incantation, which can be cast with a wand and by reciting the word 'Lubricus' while pointing the wand at the object in question. You can also use the spell 'Flagrante Curse', which can make a surface slippery by causing the target object to become extremely hot, in order to render the staircase slippery. However, this spell is considered one of the Unforgivable Curses and is strictly forbidden. It takes years of study and practice to become a proficient wizard, and performing spells can be dangerous, so it's important to learn and practice magic under the guidance of a skilled wizard or witch.

[0] https://harrypotter.fandom.com/wiki/Glisseo


> and got something somewhat innovative, somewhat wrong

That's not surprising, it's just par for the course with LLMs.


:D LOL. The Dumbledore example is interesting. I left "GPT" enabled, so it's not just pulling from the documents. I think I'll add settings for users to change the temperature etc. for their assistants in the future as well.


The first thing I did was try to get it to hallucinate or talk about things that aren't in the Harry Potter books.

This is the main concern for me with ChatGPT bots: they make things up and can be prompt-hacked to get outside their bounds.

So just some feedback that your demos should show how you tackle this too.


That's a pretty cool service. But as usual with these services today, it's totally trivial to make it do something completely different than what's intended:

https://i.imgur.com/VKrfWYm.png

which would make me very wary of hooking it up with an API key that I'm paying for, since I'd basically be paying for free GPT access for anyone who visits my site, while I would probably only be interested in paying when they ask questions related to my topics.


Hi, try the same with Zappy! As mentioned in another comment, Dumbledore was set up with GPT "enabled": https://ibb.co/8sQh1X6 . I give you the option to "only" use your own documents.


I wonder if you’ve considered academics as a target market. We have a lot of pdfs and might like help in “thinking” about them.


There's http://chatpdf.com already


Interesting and cool, but here's why serious academics should avoid it:

When you post Contributions, you grant us a license (including use of your name, trademarks, and logos): By posting any Contributions, you grant us an unrestricted, unlimited, irrevocable, perpetual, non-exclusive, transferable, royalty-free, fully-paid, worldwide right, and license to: use, copy, reproduce, distribute, sell, resell, publish, broadcast, retitle, store, publicly perform, publicly display, reformat, translate, excerpt (in whole or in part), and exploit your Contributions (including, without limitation, your image, name, and voice) for any purpose, commercial, advertising, or otherwise, to prepare derivative works of, or incorporate into other works, your Contributions, and to sublicense the licenses granted in this section. Our use and distribution may occur in any media formats and through any media channels.


That would be really, really cool if I were able to serve that space. I'd be curious to know what kinds of features you'd want to have, what would be deal-breakers, etc.!


Basic setup: point it at a folder of PDFs, have it recurse in and read them all, then ask it questions like:

* Summarize these papers on chimpanzee cooperation in the wild. What other papers should I be reading?

* Suggest an interesting master's thesis topic on the early modern economy.

* How good are polygenic scores at predicting educational attainment, and how has this developed over time?

Bonus: integrate it with e.g. google scholar, so it can go and find and read new papers.

Pricing: it is probably easier to start selling this to individual academics. Then when you've got a compelling product and the word is out, you could sell it to the whole lab (at a much higher price because people can put it in their grant budgets).

Gotchas: privacy. Nobody wants their hot unpublished paper to be scooped by a large language model.


Not the person with the original question, but they're asking for the same feature I would like to see.

I am not sure how the documents are handled in your product, since ChatGPT has a context limit that probably won't be able to hold longer papers in memory.

For me, I have a pdf[0] depicting a system that can be programmed, along with bits of pseudo-code and a lot of clarifications. Something that ChatGPT could use to spit out an actual implementation, if it were able to "think" about the pdf as a whole. I would love to see if your product is capable of such a feat.

[0]: https://arrow.tudublin.ie/cgi/viewcontent.cgi?article=1177&c...

Drive-Based Utility-Maximizing Computer Game Non-Player Characters by Colm Sloan (note that basically only chapter 3 is needed in this case, but it's still over 40 pages long)


Yep. This is such an obvious use case. Have you seen the best thing out there that does this? Where can I load 100gb of pdfs and ask questions about what's in them??


Yes, this is the next step I'm looking forward to, and what would probably make LLMs really take off. Let me dump my own knowledge base or source code into ChatGPT and have it use that as its source of knowledge. I can only imagine the cost and resources required to train and run these individual models on a large scale must still be prohibitively large.


This definitely seems like a huge untapped space: ChatGPT for my stuff that answers questions correctly. Sure there are privacy concerns for some things but unless you are going to train your own LLM on premise (yeah right) this will always be an issue.

I broke it trying to upload a PDF but that is ok, I'll try it again at some point.


Can I train a LoRA on premise instead?


You can do anything on premise if you have the necessary skills and hardware.


LoRAs were invented so that we don't have to retrain the whole model when we want to add something new.

Retraining the whole Stable Diffusion base model costs a lot of money. But training a LoRA on a few thousand images is totally doable on a single GPU.
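A rough back-of-the-envelope sketch of why that is: a LoRA replaces the update to a full d_out × d_in weight matrix with two thin factors of rank r, so the trainable parameter count collapses. The dimensions below are made up for illustration.

```python
# Parameter-count intuition behind LoRA: instead of updating a full
# d_out x d_in weight matrix W, learn a low-rank update B @ A with
# B: d_out x r and A: r x d_in, where r is much smaller than either dimension.
d_out, d_in, r = 1024, 1024, 8

full_params = d_out * d_in          # what full fine-tuning would touch
lora_params = d_out * r + r * d_in  # what the LoRA actually trains

print(full_params)                  # 1048576
print(lora_params)                  # 16384
print(full_params // lora_params)   # 64x fewer trainable parameters
```

Same idea scales to every attention matrix in a big model, which is why a single consumer GPU can handle it.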


A few things:

1) I went through Stripe checkout to upgrade to the $10/month plan, but it's still showing me as on the free plan on the billing page.

2) I guess related to 1), but I want to show my business partner the results of a quick dump of a PDF plus scrape of our website; it's not clear how to supply him with the public chat/bot URL.

3) 'Last scraped' always shows 'invalid date'.

Feel free to reach out to me directly (email address is ***72@gmail.com for my account). Thanks, good luck with the product!


Thank you! Looking you up right now!


Love the approach, and I'm amazed by your packaging; it looks slick and mature even for such a novel tech stack. But my understanding is that all the OpenAI models (3.5-turbo et al.) have a non-compete clause in their terms of use, so I'm wondering how you approach this. I can see the case that you are a complementary service increasing paid usage of OpenAI, but I can also see them saying, hey, we want to be the ones to do that, with enhanced features you'd be competing against. Perhaps it's just a long-term risk to live with if you get past the short-term market-traction test? Asking because I face a similar set of questions for a different set of reasons.


Hi Bea, I have been experimenting with it just now. Adding multiple URLs behaves strangely: sometimes it gets the URLs right, but after the first 20 pages it changed all my https URLs into

  212);">(URL WITHOUT HTTPS)</span></div><div><span
I changed the URL, but it seems something goes wrong in the https:// replacement. This prevents me from adding more pages after the first 20 that did work. I have already reloaded a few times, and it's strange that it worked for the first 20 pages (they are from the same list / URL format).

Second issue: when I enable "use only my library" I get a server error from the chat. And without that, the answer wasn't specific enough, even though the question was literally answered in the content.


I love the idea, and from the overview implementation looks like a great start. For me though it's not something I will explore. Uploading our documents to a 3rd party is not something we can do.


How does this work considering the token limit of GPT? Or does the GPT API let you create your own models? I'm admitting my ignorance of OpenAI's offering here.


Wondering about this too. I thought GPT-3.5 didn't offer any custom training. And it has a pretty small token limit, at least when it comes to dumping all your documentation into it.


This is brilliant, but I am not sure if you have rushed the launch. I am unable to upload any docs: I tried to add a URL for scraping, the import was successful, but then it never appeared in my Library section. Connecting to GitHub gave me issues, and when I did finally connect, the import is never successful. I keep getting a 500 server error and then a bunch of console.log messages with `hi` as null and sometimes `hi` having an object value. Let me know if you need more info regarding these issues. But this is a great product, definitely.


Hi! I definitely did not expect this reception, so I've been running into quite a few issues (,:

Send me an email and I'll debug it for you https://libraria.dev/about


Congrats on the launch. We are building something similar at https://trypromptly.com that allows chatbot building on user's own documents powered by GPT. We also allow users to build AI powered apps without writing code.

For example, https://trypromptly.com/app/9da10a4f-6d20-431e-98f7-048fab81... is a chatbot built on a Coursera video transcript. Similarly, https://trypromptly.com/app/d478594d-2082-46c9-bee7-f057f4bc... is a web app that generates a resume based on user information and the position they are applying for. Both are powered by the OpenAI message completions API and built using Promptly's no-code builder. Another fun app we built lets the end user provide an audio file, pipes it to Whisper, and uses that in the chain below to extract summaries, etc.

PS: We are still updating our landing page with app builder videos and other content.


I feel it is impolite to hijack someone else's Show HN to promote your own, like-for-like product, and as a top-level comment at that. Not sure about HN norms around this, though.


It's a for-profit app being posted here to get free advertising. Other people jumping on to get free advertising too is fair game and definitely the HN norm.

Besides, I always bookmark threads for apps that look interesting and check back later when everyone else has posted their similar apps and been commented on, then I can compare and choose whichever one looks best (or whichever is open source).


I can see your point of view.

Personally, if I find that my app is similar to an ongoing Show HN, I like to wait for an opening where I can respond with a plug for my own app.

I don't feel comfortable talking up my own app unprompted. Feels like making a big attention grabbing announcement at someone else's wedding.


Nah it was done kinda tastefully here and it’s helpful to know the comps


It is useful for users to learn about the alternatives.


I'm wondering if this is designed only for in-house use, or if it's something intended to be public-facing. If a company were to embed this on their website, say, to answer questions about products for sale...

1. Would there be any way to restrict the conversation with the customer to questions related to the products?

2. What strategies could be used to prevent malicious overuse / spamming / flooding the bot to cost the company money?


Congrats on the launch! I'm testing it now and am excited about it. I've tried a few similar solutions in the past few days, but they were buggy and not as feature-rich as yours. I do have a few questions:

- What's the difference between Assistant and Chat mode?

- If I have GPT-4 access, is there any way I can insert my API key and use it with your app? I assume you're using 3.5-turbo.

- What toolchain are you using for splitting / chunking?


Thank you!

- Nothing as of right now! Just an interface. I don't think I've implemented snippets for Chat mode yet, though.

- Not yet, although I plan to introduce this. I found that GPT-4 is a little slow right now, and much more expensive, but it'll be awesome to enable this for users; then I could charge even less.

- I recommend langchain for splitting / chunking; it's one of the few libraries that I didn't have to modify for my use case.


This is amazing!

I would love to have an AWS expert handy at all times, so I tried to upload all AWS documentation using this: https://docs.aws.amazon.com/sitemap_index.xml. I can no longer use the site, so I suspect that busted something.

In hindsight, that was not cool and I'm sorry about it.


Most AWS documentation is on GitHub: https://github.com/awsdocs/


Heh. Looking into this and will try to fix. I'll comment here again when I can get this working!


This is really cool! I would suggest a landing page based on the benefits: what cool things are people doing with this? I.e., some applications. What immediately pops to mind is a chatbot for your website or documentation, but I'm sure there are many more use cases you've thought of. Would be cool to see on the home page.


This is what LangChain is about, right? I think these kinds of things have to be open source and self-hostable, maybe with a paid hosted offering on the side. At least until Apple eventually integrates this.


I ask 'googly' "tell me about the weather in san francisco, ca today and tomorrow" -- I get a factual answer that quotes Weather Underground.

I ask 'Dumbledore' the same thing, and I get (more-or-less) the same answer.

I think that I initially expected 'Dumbledore' to either feign ignorance on the topic, or at least output a 'flavored' answer in the style of his dialogue.

I guess what I'm saying is that assigning 'characters' to each of these pre-trained/biased prompts creates a user expectation of a character rather than just a pre-loaded knowledge base.


Try the same with "Zappy". It won't give you the same answer!

I have a checkbox for you to enable or disable GPT. I enabled GPT for the Dumbledore one

https://ibb.co/8sQh1X6


Let's say I have about 100k novel-length texts I want to import into a model like this (I want to build a book recommendation system).

How should I approach it? I think this product can't deal with that much text yet?


Also very interested in this. From what I've read, you either need to train some other model in the "transformer" family yourself, or you can use the OpenAI API for fine-tuning their model. Not sure which would be better in practice, or whether either would work, but the AI people at my company say that fine-tuning with even just a bit of data can produce much better (or different) results for a given domain.


Providing the sources is a nice touch! I really like it. Providing supporting excerpts too would have been nice.

Isn’t publicly hosting some specific content on S3 and providing links to it risky though?


Good callout! I had it set up to delete uploads immediately after processing, but it looks like there was an issue on my end with that. It's fixed now, and I've deleted the ones uploaded before. S3 was storing manual uploads (not data scraped from URLs), and it should not be storing them anymore.


Beautifully designed. I assumed a team of people was behind this based on the apparent quality and then learned you're a solo dev. Very impressive!


omg. thank you.


Hi! Haven't tested it yet but I already have a question... I have an historio.us account which I use to "collect" articles I consider interesting.

Would it be possible to make your product access a historio.us instance?

(In my use case there would be no privacy concern because all the articles stored in Historio.us are already public... also it would be quite useful because search is quite limited)


Love it! For positioning, I'd look into www.raffle.ai - they've found a very good way to explain why enterprise search is valuable


Testing it, nice.

I tried to import a .pdf but I got an error message about not having enough "credits" https://i.imgur.com/7za9FEn.png

Also, the google drive integration is not working (yet).

Can you give us a glimpse of what's coming down the road? How can I get more credits, and when will the Gdrive integration be available?

Also: pricing?


Hi! Strange, that google drive integration should work. I'll take a look and update you right here when it does.

Pricing: Feel free to send me an email bea@bogpad.io, I'll send you some credits to play around with. I'm still trying to figure out pricing!



Ah this is crazy how fast the space is moving. I'm literally trying to build a document embedding UI to do Q&A on. Great work!


Love the idea. I would suggest my company pay for it if (a) we knew what it costs and (b) there were details about how data is handled, plus plausible assurance that it is not fed into a model and is secure.

The only way I would be fully convinced is if it were free/open source and self-hosted, but that makes charging for it more difficult.


Pretty cool that all you need is the sitemap for the assistant to index the data! Will give this a try soon :)


For the Dumbledore demo I asked, "What are the 3 unforgivable curses?" and it was unable to provide an answer.

https://ibb.co/bL61nDk

Would your tool work with gated knowledge bases, for example training courses behind logins?


As for working with gated knowledge bases, you can. You can just set your assistant as "protected".

https://bogpad.libraria.dev/ for an example of a protected one.

https://ibb.co/T8qjCgZ (strange, i tried again and got this! I'll work on improving stability)


Oh no! Let me manually add the 3 unforgivable curses to the library. The way I did the Dumbledore demo was to upload a PDF for each of the 7 books, and there ~might~ not be an explicit section in Harry Potter that states all of them at once?


> an explicit section in harry potter that states all of them at once?

Isn’t part of the point of GPT that it finds relationships without the training data having to be well-structured? So long as the text describes them in a way that a human, having read the books, could answer the question?

I don’t really understand how training works. This isn’t a jab.


It's a great question!

Right now, I don't train anything. I've broken the text down into n-character chunks and created embeddings for each chunk; then I search for the closest distances / relationships to the question asked. Then I add that text to the prompt and tell GPT to use those paragraphs to answer the question (to ensure that it doesn't make anything up). This is one of the ways I get around the token limit, but it comes at the cost of GPT thinking it can only use the paragraphs I show it. I'm trying to improve the prompt to get more consistent results, and maybe GPT-4 can help me give it larger bodies of text!
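The n-character chunking step described above might look something like this minimal sketch. The overlap parameter is my own addition, not necessarily what Libraria does; it's a common trick so sentences split across a boundary survive intact in at least one chunk.

```python
def chunk(text, n=1000, overlap=100):
    # Split text into ~n-character pieces. Each chunk starts `n - overlap`
    # characters after the previous one, so consecutive chunks share
    # `overlap` characters of context.
    step = n - overlap
    return [text[i:i + n] for i in range(0, max(len(text) - overlap, 1), step)]

book = "x" * 2500
pieces = chunk(book)
print([len(p) for p in pieces])  # [1000, 1000, 700]
```

Each chunk then gets its own embedding, so retrieval works per chunk rather than per document.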

Hope that answers your question, let me know if you have any more!


Oh! I think I understand.

So your software takes a prompt from me, does non-GPT work to find additional context from your source (the books, parsed and re-structured into word vectors or whatnot), and then asks GPT my prompt combined with the added context?

Like,

“What are the three foobars when considering these passages from a book <…> ?”


Yeah, more or less! I still use OpenAI for the embeddings (translating text into vector space):

- Your question -> vectors with OpenAI embeddings

- Text you uploaded before -> vectors with OpenAI embeddings

I get the most similar above a certain threshold, and then add them to the prompt, saying:

"From these articles / paragraphs, answer the user's question: What are the three foobars"

So yep! I preprocess it
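That last step, keeping only chunks above a similarity threshold and wrapping them in the instruction quoted above, could be sketched like this. The helper names and the threshold value are hypothetical, not Libraria's actual code.

```python
def select_context(scored_chunks, threshold=0.75, max_chunks=3):
    # Keep only chunks whose similarity clears the threshold, best first.
    ranked = sorted(scored_chunks, key=lambda sc: sc[0], reverse=True)
    return [text for score, text in ranked if score >= threshold][:max_chunks]

def make_prompt(question, scored_chunks):
    context = select_context(scored_chunks)
    if not context:
        # Nothing relevant retrieved: better to say "I don't know"
        # than to let the model make something up.
        return None
    joined = "\n---\n".join(context)
    return ("From these articles / paragraphs, answer the user's question:\n"
            f"{joined}\n\nQuestion: {question}")

scored = [(0.91, "The three Unforgivable Curses are..."),
          (0.42, "Quidditch rules...")]
print(make_prompt("What are the three Unforgivable Curses?", scored))
```

The threshold also explains the earlier failure mode in the thread: if no chunk scores high enough, the bot reports it can't find the article rather than answering.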


I think that’s a really clever idea.

It feels like training is analogous to years of growing up and going to school. And what you’ve done is taken that educated “mind” and said, “here’s a document. I’m going to quiz you on it.”

That seems really practical compared to sending an AI back to school to learn all about some specific topic.

I wonder if we will end up with a library of trained models, each of which went to different schools to obtain different focuses of knowledge. Maybe LiteratureGPT is better suited for this than General GPT or ProgrammerGPT.

Okay I think I’ve stretched the analogy far enough.


The GPT models create sequences of words that are likely to look correct. A side effect is that they sometimes happen to find a connection between ideas. The three unforgivable curses are something the model should be able to connect, as these are all probably in a similar conceptual space, but you would need a fair bit of data to push the model to see that. It's possible to uncover these connections with less data, but with LLMs you need to do things like prompt engineering to get more leverage out of your data if it's limited.


Yeah, it looks like it actually retrieved the documents correctly, but the prompt itself needed some reworking. I think there's a space out there for prompt testing.


Congrats, this is really cool. Have you thought about making the assistants Slack bots?


It appears the only way to sign up is with a Google account. (I know that is a solution suggested by OpenAI for user authentication.)

Will there be an option for email, for those of us who don't use Google, or other large corporations?


I’m guessing that Libraria is using HyDE (1) to create hypothetical document embeddings. It works extremely well.

1. https://github.com/texttron/hyde


I asked "Dumbledore" a question. It gave the answer along with a "learn more" button which ended up being a link to PDF of the book. It shows up only for one particular question. Seems like a bug?


Congrats on the launch! I’m working on a similar product at https://upstreamapi.com

I'm excited to see what features we build on top of semantic search + chat.


See also the ChatGPT Retrieval Plugin by OpenAI at https://github.com/openai/chatgpt-retrieval-plugin


Was this plugin used to create the service OP posted?


Great service, Bea! Would it support Databases as a source in the future?


Yes! I'm currently working on it. I'm trying to improve CSV imports right now (current users have large csvs they want to go through)


Amazing. Another thing: make it so that we can give you our emails and you can send us updates there, so we can track the progress of the app and come back to it when needed.


It's at the very bottom of the page! I'll try to make it more visible tomorrow as well (:


This looks awesome. I'd be super interested in testing this out and providing feedback on it. I'm going to make an account; is there an email/link to submit feedback as I use it?


Will be adding my email to the about page! https://libraria.dev/about (vercel takes a few minutes to deploy, but it should be up soon)


So, if I download a boatload of research papers from Sci-Hub about topic X, will this base its knowledge on them? If so, I was just thinking this would be cool to have, and here it is?


I was thinking something similar and the answer is yes I think.

I uploaded a public-domain copy of Meditations by Marcus Aurelius and now I feel like I can get direct knowledge from the man himself

https://ibb.co/g7ry4X3

It worked pretty well on the first attempt. Will try putting more authors in it now



Yes. If you do try that, let me know! The most difficult part I'm encountering right now is the "prompt" engineering. So I'd be curious to know how I can make it better for you.


Just a heads-up, I asked a simple question on Harry Potter and the "Learn more" section had links to 3 full Harry Potter PDF books publicly hosted on your S3.


Do you know what would be pretty cool? A feature that lets the user upload N documents from different fields.

Then, through an API, they can extract the knowledge they want.


Hmm, right now you can kind of simulate that by creating n assistants with only the knowledge you want in them. But yes, no API for now. If there's demand for it, I'll gladly build it though


I'm getting an internal server error when pressing the demo button. Also, the page after login doesn't show anything other than the sidebar on FF mobile.


Nice! I'll be taking it for a spin after work. One question: I'm interested in trying to use this as a way of querying my Zettelkasten-like note system.


This is cool. I just did a small PoC yesterday: I scraped our API documentation, created embeddings, and used ChatGPT to answer questions. It is pretty slick.


Thank you! Hope you had as much fun as I've been having!


How is this possible? Doesn't it have a 4k-token limit? How are you able to ingest documents that have vastly more data than that?


One thing that's understated is the fact that all you need is a sitemap. Wild how you can summon the knowledge of any site with a list of links.


The "Ask Libraria" text field doesn't let me type anything. I tried it with Chrome and Safari on a Mac. Is this intentional?


I also thought it would be interactive and then realized it was just a screenshot. Would be cool to have an interactive demo right there at the top.


(,: I'm sorry! I'll put it up at the top soon hopefully (I'm hoping to go back to just coding on the app! Spent all day on the landing page today (,: )


yeah sorry! That's just an image (,: There are other demos a bit below when you scroll down


I broke Dumbledore :-(

Q: Which piece did Hermione play in the chess game?

A: I'm sorry! I may not have the article in question yet. Please try again!

But this looks awesome.


Spelling error on upload page, should be: "Add documents to your library for your assistant to use".


I wish people were a little more hesitant to send every single one of their thoughts and documents to OpenAI.


This is cool! I’m guessing you use something like langchain’s document loaders to make it work?


I did this for my email and Google docs. It’s about 15 lines of Python with langchain.


Is it OK to upload all your messages and contacts to these services? I'm very worried about that, really.


I’d assume it has exactly the consequences you think it does, but I don’t know that and it becomes more of an unknown as more third parties are layered between you and the LLM.

I’d assume the right prompt by another user of the same underlying LLM on another platform could well expose your private information / content / passwords and would treat this with a lot of caution until you get satisfactory evidence otherwise.


Yes, that's what concerns me. Since it's not really clear what's going on, how can we trust a third-party service with our personal data?


Doesn't it cost a lot of money to finetune on so many documents?


Yes; however, langchain is not fine-tuning, it's basically indexing!


nice. this is _exactly_ what I wanted to build when ChatGPT came out, but you actually went out and did it while I coded on passion toy apps that don't actually make me any money. Hope this is successful!


Building something similar that scrapes a website's HTML, image, PDF, etc. files. Here's an example:

https://collie.ai/tesla


One question, are any of the prompts shared with OpenAI?


I use their APIs, so yes. There's no way around it right now, but hopefully in the future I can move away from relying on them


How does it scrape the documents?

Does it insert them into the prompt?


Nice work!

How many documents do you need for it to be usable?


[deleted]


R.I.P. software. Now there are only services (backed by software), but the software is no longer distributed; only access to the service is enabled.


Indeed, the ongoing fashion shift from locally installed software to external services is quite problematic, as it moves control over the user's data from the users themselves to whatever tree/graph of third parties somewhere in the network accesses, processes, and further shares it (willingly or not, knowingly or not, in conformance with agreements and laws or not), which the user has no control over.

That being said, it seems like an almost inescapable consequence of the fact that the software itself is also data and as such subject to a similar problem: in times of ubiquitous high-speed internet access where you can't protect software (data) from being widely copied and shared over the network once anyone else gets hold of it, the only simple and reliable way for a software company to keep control over the distribution of their software (and thus its monetization) is to never share it in the first place but only provide access to it as a service.

So in the end, the shift from locally installed software to remote services kinda comes down to the fact that most users don't have the same level of awareness or care about their own data and keeping control over it as the software/internet service companies have for their data (e.g. software).


SaaS has been around for a while now.



