What Are Embeddings?
In my experience the word “embedding” is not clearly understood. It doesn’t need to be; we just need to be able to reason about embeddings usefully. Here’s how I think about them.
An embedding is a coordinate in a space; it can equally well be described as a numerical representation, or a vector. For this example, let’s assume we train a model that outputs an embedding (a vector) of size 2. We can think of these two numbers as our x and y axes.
These axes carry meaning: let’s say the x axis expresses whether a phrase is casual (low) or formal (high), and the y axis expresses whether a phrase is negative (low) or positive (high).
Now we let the model encode the following two phrases:
1. Dear sir, I regret to inform you that we have to cancel your appointment.
2. lessgo we are so back guys wgmi
It will look something like this:
So here we see a visualization of this so-called “numerical representation” of a piece of text: our vector of size 2. The graph shows that the two phrases sit at opposite ends of both axes, which we humans can easily confirm as roughly true.
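In code, those two points might look like the following. The coordinates are entirely hypothetical, made up for illustration; the actual numbers would depend on the model and its training:

```python
# Hypothetical (x, y) coordinates in our toy space:
# x: casual (low) to formal (high), y: negative (low) to positive (high).
embeddings = {
    "Dear sir, I regret to inform you that we have to cancel your appointment.": (0.9, -0.8),
    "lessgo we are so back guys wgmi": (-0.9, 0.8),
}
```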
Let’s now embed a third phrase: “Hey sorry can’t make it tonight, something came up” and plot it.
Now we’re getting to the heart of why embeddings are useful. With two points we can only say how far apart they are, but with three points we can ask: “which of the other two is the third one closest to?” All of this just by measuring the Euclidean distance between the coordinates on the plane!
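Here is a sketch of that comparison, again with made-up coordinates standing in for what a model would output:

```python
import math

# Hypothetical (casualness, sentiment) coordinates for the three phrases.
formal_cancel = (0.9, -0.8)   # "Dear sir, I regret to inform you..."
hype = (-0.9, 0.8)            # "lessgo we are so back guys wgmi"
casual_cancel = (-0.7, -0.5)  # "Hey sorry can't make it tonight..."

# math.dist computes the Euclidean distance between two points.
to_formal = math.dist(casual_cancel, formal_cancel)
to_hype = math.dist(casual_cancel, hype)

# The smaller distance tells us which phrase the third one is closest to.
nearest = "formal cancellation" if to_formal < to_hype else "hype phrase"
```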
Our toy model has two dimensions. A real embedding can have hundreds or even thousands of dimensions. It’s exactly the same idea: we still measure the distance; we just can’t draw it on a flat plane anymore.
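The distance computation carries over unchanged to higher dimensions. A sketch with two random 384-dimensional vectors (384 is just an illustrative size; the values here are random rather than from a real model):

```python
import math
import random

# Two hypothetical 384-dimensional embeddings, filled with random values.
random.seed(0)
a = [random.uniform(-1, 1) for _ in range(384)]
b = [random.uniform(-1, 1) for _ in range(384)]

# math.dist accepts points of any (equal) dimension, exactly as in 2D.
d = math.dist(a, b)
```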
This is why techniques like semantic search and RAG can match “how to fix my car” to a document about “automobile repairs” even when the two share no words: they’re simply nearby coordinates.
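A toy nearest-neighbour search illustrates the idea. Every vector below is hand-made for the example; in a real system they would come from an embedding model, and the collection would typically live in a vector index:

```python
import math

# Hypothetical document embeddings (a real model would produce these).
documents = {
    "automobile repairs": (0.9, 0.1, 0.0),
    "baking sourdough bread": (0.0, 0.9, 0.2),
    "tax filing deadlines": (0.1, 0.0, 0.9),
}

# Pretend this is the embedding of the query "how to fix my car".
query = (0.85, 0.15, 0.05)

# Retrieve the document whose embedding is nearest to the query.
best_match = min(documents, key=lambda doc: math.dist(documents[doc], query))
```

Note that no word from “how to fix my car” appears in “automobile repairs”; the match comes purely from the (made-up) coordinates being close.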
Unlike our toy model, real embedding models come with an important caveat: their many dimensions are (as of now) uninterpretable. It’s hard to say exactly what each dimension encodes, let alone capture the interactions between dimensions. Although their inner workings remain a mystery for now, they seem to work pretty well.
So, next time someone says “embedding”, just think: a coordinate in a very high-dimensional space, where distance measures similarity.