def average_embeddings(embeddings):
return [
sum([embedding[x] for embedding in embeddings]) / len(embeddings)
for x in range(len(embeddings))
]
Just finding the mid-point in N-dimensional space between all points. Although I believe OpenAI says these are directions rather than locations. Averaging makes sure you don't accidentally pick a target vector that happens to be on the fringes of your desired embedding space.
So if you're querying a document for predatory birds and you are using the string "Eagles" as your target you will be accidentally digging into embedding space for sports teams (go birds). But if your target is `average_embeddings([eagles_embedding, hawks_embedding, falcons_embedding])` you'll end up with less noise.
When you input a query, you get a numerical representation output. If you have 3-5 similar queries (each phrased a bit differently), they all have slightly different outputs. If you average them all together, it combines them in a way that improves results.