To optimize for fast nearest neighbors, I chose 256 dims. Notably, this actually hurt some of the pre-training classification losses pretty severely compared to 2k dims, so it definitely has a quality cost.
The site uses cosine distance. The code itself actually implements Euclidean distance, but I decided at the last minute to normalize the vectors, out of fear that a few unusually small vectors would show up as neighbors for an abnormal number of examples.
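Normalizing actually makes the worry moot in a stronger sense: on unit vectors, squared Euclidean distance equals 2 minus twice the cosine similarity, so the two metrics produce identical neighbor rankings. A quick NumPy sketch (the shapes and data here are illustrative, not the real index):

    import numpy as np

    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(1000, 256))                  # stand-in 256-dim embeddings
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # L2-normalize, as the site does

    query = vecs[0]
    euclidean_sq = np.sum((vecs - query) ** 2, axis=1)   # what the code computes
    cosine = 1.0 - vecs @ query                          # cosine distance on unit vectors

    # ||a - b||^2 = 2 - 2*cos(a, b) on unit vectors, so the rankings are identical
    # and unusually small vectors can no longer dominate the neighbor lists.
    assert np.allclose(euclidean_sq, 2.0 * cosine)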
You definitely highlighted a shortcoming of the feature vector model in this case. Indeed it's quite a small model trained on a single Mac for about a week, so it's not very "smart".
I'd expect this is a problem that could be solved by using larger off-the-shelf models for image similarity. For this project, I thought it would be cooler to train the model end-to-end myself, but doing so definitely has its costs.
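As a hedged sketch of that off-the-shelf route (one way it could look, not something the site does): strip the classifier head off a pretrained torchvision ResNet-50 and use the pooled features as image embeddings.

    import torch
    from torchvision.models import resnet50, ResNet50_Weights

    weights = ResNet50_Weights.DEFAULT
    model = resnet50(weights=weights)
    model.fc = torch.nn.Identity()  # drop the ImageNet head; keep the 2048-dim features
    model.eval()

    preprocess = weights.transforms()

    def embed(image):
        # image: a PIL.Image of a product photo; returns a unit-norm feature vector.
        x = preprocess(image).unsqueeze(0)
        with torch.no_grad():
            z = model(x).squeeze(0)
        return z / z.norm()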
I think it would be a useful feature. To keep this a fun project, I didn't use CLIP; I only wanted to use models I trained myself on a single Mac. To make the site more genuinely useful, though, text search would help a lot.
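If I did add it, CLIP would be the natural route, since it embeds text and images into the same space. A hedged sketch using the Hugging Face transformers API (the file names and the wiring into a product index are hypothetical):

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    # Embed the product images once, offline (hypothetical file names).
    images = [Image.open(p) for p in ["shoe.jpg", "lamp.jpg"]]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        image_emb = model.get_image_features(**inputs)
    image_emb /= image_emb.norm(dim=-1, keepdim=True)

    # Embed a free-text query at request time and rank by cosine similarity.
    text_inputs = processor(text=["red leather boots"], return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(**text_inputs)
    text_emb /= text_emb.norm(dim=-1, keepdim=True)

    scores = (text_emb @ image_emb.T).squeeze(0)
    print(scores.argsort(descending=True))  # best-matching products first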
Yup, it's a small model I trained on my Mac mini! The model itself just classifies product attributes like keywords, price, retailer, etc. The features it learns along the way are then used as the embeddings.
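In sketch form, the setup is something like this (PyTorch; the layer sizes and heads are illustrative guesses, not the actual architecture): a shared trunk feeds the per-attribute classification heads, and after training the trunk output is kept as the embedding.

    import torch
    import torch.nn as nn

    class ProductModel(nn.Module):
        def __init__(self, in_dim=2048, emb_dim=256, n_keywords=1000, n_retailers=50):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU())
            # One head per product attribute the model is trained to classify.
            self.keyword_head = nn.Linear(emb_dim, n_keywords)
            self.retailer_head = nn.Linear(emb_dim, n_retailers)
            self.price_head = nn.Linear(emb_dim, 1)  # price as a regression target

        def forward(self, x):
            z = self.trunk(x)  # the shared features
            return self.keyword_head(z), self.retailer_head(z), self.price_head(z)

        def embed(self, x):
            # After training, discard the heads and keep the trunk features,
            # normalized so they can be compared with cosine distance.
            with torch.no_grad():
                z = self.trunk(x)
            return z / z.norm(dim=-1, keepdim=True)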
People making quick social media posts while out on a casual walk, on websites that don't make it easy to edit posts, who aren't expecting to be nitpicked about it.
Overall, it's something I've seen very often on social media and in less technical articles about LLMs. OpenAI would fall into the "almost" category.
It's okay to say that you mistyped or whatever while taking a casual walk outside, on websites that don't make it easy to edit posts and where you aren't expecting to be nitpicked. Throwing in that everyone uses them interchangeably, however, is just profoundly wrong on every level.
I wasn't nitpicking. It is a HUGE distinction, and I pointed it out specifically because people pick up on terminology: someone who doesn't know better will go forward and drop in the fancier-sounding "hyperparameter," not realizing that it makes them look like they don't know what they're talking about. As I said in the other post, no one who knows anything uses them interchangeably. It is just completely wrong.
Again, I've heard and used "model hyperparameter" in place of "model parameter," and vice versa, because not every human interaction is a paper on arXiv and the terms are obviously very similar. What matters in the end is context (as demonstrated by the other commenters who followed my intended meaning), and society will not crumble if someone uses either term loosely in casual conversation. No one intentionally uses the wrong term, but as another comment jokingly put it, "when you get really deep into model training, it can seem like there are a billion hyperparameters you have to worry about."
I appreciate being corrected, but you're the one who asked for my opinion based on my extensive time in AI; you can choose to believe it or not.