
What Are Embeddings?

In my experience, the word “embedding” is not widely understood. It doesn’t need to be; we just need to be able to reason about embeddings usefully. Here’s how I think about them.

An embedding is a coordinate in a space; it can just as accurately be described as a numerical representation or a vector. For this example, let’s assume we train a model that outputs an embedding (a vector) of size 2. We can think of those two numbers as positions on an x and a y axis.

These axes carry meaning: let’s say the x axis expresses whether a phrase is casual (low) or formal (high), and the y axis expresses whether a phrase is negative (low) or positive (high).

Now we let the model encode the following two phrases:

1. Dear sir, I regret to inform you that we have to cancel your appointment.
2. lessgo we are so back guys wgmi

It will look something like this:

2D embedding example showing two phrases plotted on casual/formal and negative/positive axes
Two phrases mapped to a 2D embedding space. The formal, negative phrase lands in the bottom-right; the casual, positive phrase lands in the top-left.

Here we see a visualization of that so-called “numerical representation” of a piece of text: our vector of size 2. The graph shows that the two phrases are complete opposites on both axes, which we humans can easily judge to be roughly true.

Let’s now embed a third phrase: “Hey sorry can’t make it tonight, something came up” and plot it.

3 phrases plotted in 2D embedding space with distance lines showing phrase 3 is closer to phrase 1
Adding a third phrase: "Hey sorry can't make it tonight, something came up." The dashed lines show distances. Phrase 3 is closer to phrase 1 (d=0.78) than phrase 2 (d=1.28), because both involve cancelling something with a slightly negative tone.

Now we’re getting to the heart of why embeddings are useful. With two points we can express how far apart they are, but with three points we can ask “which of the first two is the third one closest to?”. All of this just by measuring the Euclidean distance between the coordinates on the plane!
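That closeness check is a few lines of Python. The coordinates below are made up for illustration (they don’t come from a real model, and the distances won’t match the figure exactly), but the relationship holds: the third phrase lands closer to the first.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two coordinates of any dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 2D embeddings: (casual -> formal, negative -> positive)
phrase1 = (0.9, -0.7)   # "Dear sir, I regret to inform you..." (formal, negative)
phrase2 = (-0.9, 0.8)   # "lessgo we are so back guys wgmi"     (casual, positive)
phrase3 = (-0.2, -0.5)  # "Hey sorry can't make it tonight..."  (casual, a bit negative)

print(round(euclidean(phrase3, phrase1), 2))  # 1.12 -- closer
print(round(euclidean(phrase3, phrase2), 2))  # 1.48 -- farther
```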

Our toy model has two dimensions; a real embedding can have hundreds or even thousands. It’s exactly the same idea of measuring distance between coordinates, we just can’t draw it on a flat plane anymore.

This is why techniques like semantic search and RAG can match “how to fix my car” to a document about “automobile repairs” even when the two share no words: they’re simply nearby coordinates.
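A minimal nearest-neighbor search is the same trick scaled up. Everything here is hypothetical: the five-dimensional vectors are hand-made stand-ins for real model output (a real model would emit hundreds or thousands of dimensions), but the lookup logic is the core of semantic search.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two coordinates of any dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical document embeddings, hand-picked for illustration.
documents = {
    "automobile repairs":   (0.8, 0.1, -0.3, 0.5, 0.2),
    "banana bread recipe":  (-0.6, 0.7, 0.4, -0.2, 0.1),
    "tax filing deadlines": (0.1, -0.8, 0.6, 0.3, -0.5),
}

# Stand-in for the embedding of the query "how to fix my car".
query = (0.7, 0.2, -0.2, 0.4, 0.3)

# Semantic search = find the document whose coordinate is nearest.
best = min(documents, key=lambda name: euclidean(documents[name], query))
print(best)  # automobile repairs
```

In practice you would get `query` and the document vectors from an actual embedding model and use a vector index instead of a linear scan, but the principle is unchanged: nearby coordinates mean related meaning.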

Unlike our toy model, real models come with an important caveat: their many dimensions are (as of now) largely uninterpretable. It’s hard to say exactly what any single dimension encodes, let alone capture the interactions between dimensions. Although their inner workings remain a mystery for now, the approach works remarkably well.

So, next time someone says “embedding”, just think: a coordinate in a very high-dimensional space, where small distance = high similarity.