
> Try entering a query like 'gpt3' or '2019' into the news search demo

The results are bad because this is not the kind of input the engine was trained to handle.

You should copy-paste part of an existing article. The engine will embed it into a multidimensional space and do a similarity search, the same way it was trained to: by converting a full article paragraph to a vector.
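To make the similarity search step concrete, here is a minimal sketch. The vectors are made up and stand in for whatever the trained encoder would produce for each paragraph; `cosine`, `nearest` and the article ids are hypothetical names, and cosine similarity is an assumed choice of metric.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(query_vec, index, k=1):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical paragraph embeddings; a real engine would compute these
# with its trained encoder.
index = {
    "article_a": [0.9, 0.1, 0.0],
    "article_b": [0.1, 0.8, 0.3],
    "article_c": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of a pasted paragraph.
query_vec = [0.85, 0.2, 0.05]
print(nearest(query_vec, index, k=2))
```

A real index with millions of vectors would probably use an approximate nearest-neighbor structure instead of this linear scan.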

If you give it just a word it can't convert it to a meaningful representation because the network wasn't trained to do this.

But you can train it differently so that it can handle a few words. For example, you can summarize every article into a few sentences and keywords, then use a traditional keyword search.
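A rough sketch of the keyword-search fallback, assuming each article has already been reduced to a handful of keywords by some summarization step (not shown here). The keyword lists and the `build_index` and `search` helpers are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase keyword to the set of doc ids that list it."""
    inverted = defaultdict(set)
    for doc_id, keywords in docs.items():
        for keyword in keywords:
            inverted[keyword.lower()].add(doc_id)
    return inverted

def search(inverted, query):
    """Rank doc ids by how many query terms they match (ties by id)."""
    counts = defaultdict(int)
    for term in query.lower().split():
        for doc_id in inverted.get(term, ()):
            counts[doc_id] += 1
    return sorted(counts, key=lambda doc_id: (-counts[doc_id], doc_id))

# Hypothetical keywords, standing in for the output of a summarizer.
docs = {
    "article_a": ["gpt-3", "language", "model"],
    "article_b": ["2019", "election", "results"],
    "article_c": ["gpt-3", "2019", "openai"],
}

inverted = build_index(docs)
print(search(inverted, "gpt-3"))
print(search(inverted, "gpt-3 2019"))
```

Short queries like 'gpt-3' or '2019' work here because matching is done on exact terms rather than on a paragraph-level embedding.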

One usual way to create a good vector representation is to encode two different spaces into the same vector space simultaneously. You encode 'queries' and 'answers' such that they are close for the known (query, answer) pairs. This is what CLIP did, encoding both images and their corresponding descriptions into the same vector space.
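The "close for known pairs" objective can be sketched as a symmetric contrastive loss over a batch, in the spirit of CLIP. This is only the loss computation on made-up 2-d vectors, not a training loop; the fixed `temperature=0.1` is an assumption (CLIP learns its temperature), and the function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def contrastive_loss(queries, answers, temperature=0.1):
    """Symmetric loss: low when queries[i] is closest to answers[i]."""
    q = [normalize(v) for v in queries]
    a = [normalize(v) for v in answers]
    logits = [
        [sum(x * y for x, y in zip(qi, aj)) / temperature for aj in a]
        for qi in q
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))

# Made-up embeddings: in `aligned`, answer i matches query i.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
shuffled = [aligned[1], aligned[0]]

print(contrastive_loss(queries, aligned) < contrastive_loss(queries, shuffled))
```

Training would adjust both encoders to push this loss down, which is what pulls matching pairs together and pushes mismatched pairs apart.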

You can download the precomputed CLIP embeddings for the LAION-400-MILLION OPEN DATASET from academictorrents.com.

CLIP can do this for semantic image search because the problem of matching an image to its description is quite well defined. But quite often there is no unique, a priori meaningful way of matching a query to an answer, especially as the index gets big.

In the case of a basic query like 'gpt-3', the query is quite vague and it's not obvious along which direction you should rank the results. (Do you want articles generated by gpt-3? Articles mentioning gpt-3? A basic definition?) There is no a priori good answer, and that's where additional context can be used to refine the query. For example, Siri or an NLP bot could ask you to be more explicit about what you mean.

Or it can keep multiple representations of the vector space, return the top-1 result for each representation, and hope that you give it feedback by clicking the one that was most meaningful to you as the requester.
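As a rough illustration of the multiple-representations idea, the sketch below keeps one small index per interpretation of the query, returns the best hit from each, and counts clicks as feedback. The space names, vectors, and helper functions are all hypothetical, and dot product is an assumed scoring function.

```python
from collections import Counter

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top1_per_space(query_by_space, index_by_space):
    """For each representation, return the best-scoring doc id."""
    results = {}
    for space, docs in index_by_space.items():
        q = query_by_space[space]
        results[space] = max(docs, key=lambda doc_id: dot(q, docs[doc_id]))
    return results

# Hypothetical spaces: one per interpretation of the query 'gpt-3'.
index_by_space = {
    "generated_by": {"article_a": [0.9, 0.1], "article_b": [0.2, 0.8]},
    "mentions": {"article_a": [0.1, 0.9], "article_b": [0.7, 0.3]},
}
query_by_space = {"generated_by": [1.0, 0.0], "mentions": [1.0, 0.0]}

candidates = top1_per_space(query_by_space, index_by_space)
print(candidates)

# Feedback: count which representation the user clicked, so that it can
# be preferred for similar queries later.
clicks = Counter()
clicks["mentions"] += 1
print(clicks.most_common(1))
```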


