The math of meaning: Embeddings

Written by Aaron Hammond

02.11.2025

Have you ever wondered how AI understands that cat and kitten are related words? Or how your music app can recommend songs similar to your favorites? There’s no master list of synonyms or similar songs.

Instead, the computer understands meaning using a tool called embeddings. These nifty mathematical constructions give the computer the power to grasp the meaning of a text much the way humans do.

Vectors everywhere

An embedding is a special kind of vector. You may remember this word from physics class, but the concept is simple: a vector is a list of numbers.

Vectors are commonly used to describe motion through space, where each number represents an amount of change along a different dimension.

In a two-dimensional vector space, the two components of a vector represent displacement along the X axis and Y axis respectively. If we draw a vector from the origin, it points to exactly the X-Y point given by its components.

When we measure velocity, we record the result as a vector of three numbers representing change in the X, Y, and Z dimensions. This velocity vector captures both the direction of motion and its magnitude, or speed.

A vector can also be defined by a magnitude of change in a specific direction. The magnitude of a vector is equal to its length, while the direction is typically measured as the angle between the vector and the axes.
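As a quick illustration (the numbers here are invented), we can compute both quantities in a few lines of Python:

```python
import math

# A 2-D vector: 3 units along X, 4 units along Y.
v = [3.0, 4.0]

# Magnitude (length), via the Pythagorean theorem.
magnitude = math.hypot(v[0], v[1])            # 5.0

# Direction: the angle between the vector and the X axis.
angle = math.degrees(math.atan2(v[1], v[0]))  # about 53.13 degrees

print(magnitude, angle)
```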

We can manipulate vectors mathematically. For example, when we add the velocity vectors for the wind and an object, we calculate a new vector for total velocity.

We add vectors by adding their constituent components pairwise. Graphically, this is equivalent to placing the tail of one vector at the head of the other and tracing a line from the origin to the resulting point. In the image above, the plane would follow the dotted yellow line, even though it points in the direction of the green line.
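Here's a minimal sketch of that pairwise addition, with made-up velocities for the plane and the wind:

```python
def add(u, v):
    """Add two vectors by summing their components pairwise."""
    return [a + b for a, b in zip(u, v)]

plane = [120.0, 0.0]   # the plane's own velocity: due east
wind = [0.0, 30.0]     # a crosswind blowing due north

total = add(plane, wind)
print(total)           # [120.0, 30.0] -- the path actually flown
```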

Technically, a vector is defined by a direction of change and a magnitude of change, but we’re going to talk about vectors as single points. For our purposes, the vectors will all have a magnitude of 1, so we can think of them simply as points on a sphere with radius 1.
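Scaling a vector to magnitude 1 while keeping its direction is called normalization; a short sketch:

```python
import math

def normalize(v):
    """Scale a vector to magnitude 1, keeping its direction."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

print(normalize([3.0, 4.0]))  # [0.6, 0.8] -- same direction, length 1
```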

Embed this

A word embedding is a vector meant to represent the meaning (or semantics) contained within a word. Each embedding may be 256 numbers long or longer. The values in an embedding vector are assigned in a very careful way, so that semantic relationships are preserved when we do math over multiple embeddings.

Our embedding generator produces a unique vector for every word in our dictionary. These vectors all share the same number of dimensions and the same total magnitude.
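For a concrete illustration, here's how this looks with sentence-transformers, one widely used open-source embedding library (our choice of library and model here is illustrative, not a claim about any particular pipeline):

```python
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 is one small, widely used embedding model;
# it maps any input text to a 384-dimensional vector.
model = SentenceTransformer("all-MiniLM-L6-v2")

vectors = model.encode(["cat", "kitten"], normalize_embeddings=True)
print(vectors.shape)  # (2, 384) -- same length for every word
```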

The calculation of word embeddings was first introduced as Word2Vec in a 2013 paper by Tomas Mikolov et al. This paper ignited the modern wave of developments in natural language processing, and we can trace a straight line from Word2Vec to the large language models (LLMs) we use today.

We expect that two vectors representing similar words or concepts should be “close” together. We can think about this like our points in three-dimensional X-Y-Z space. Each embedding can be understood as a point in N-dimensional space instead, where each value in the vector represents a different dimension.
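We can measure that closeness directly. One common choice is cosine similarity, which for unit-length vectors is just a dot product; the toy vectors below are invented for illustration:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means the same
    direction, 0.0 means unrelated (perpendicular)."""
    u, v = np.asarray(u), np.asarray(v)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-D vectors, invented for illustration.
print(cosine_similarity([1, 0, 0], [1, 0.1, 0]))  # close to 1: similar
print(cosine_similarity([1, 0, 0], [0, 1, 0]))    # 0.0: unrelated
```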

Embeddings for related words or concepts will cluster in space. Above, dog, canine, and wolf form one cluster, while cat, feline, and lion form a second cluster. Note the symmetry between the two clusters.

First, in this N-dimensional space, we find that the vector for the word wolf is closer to the vector for dog than it is to the vector for cat. The opposite is true for lion. Second, the difference between lion and cat mirrors the difference between wolf and dog, meaning their relationships are structurally similar. These two remarkable properties unlock computation over natural language.

We can perform arithmetic over word embeddings in a way surprisingly consistent with intuition. It’s hard to understand what exactly (Lion - Cat) means on its own, but it’s easy to understand why it would equal (Wolf - Dog).
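We can sanity-check that intuition with toy numbers. The vectors below are hand-picked for illustration; real embeddings are learned by a model and have hundreds of dimensions:

```python
import numpy as np

# Hand-picked toy "embeddings" -- dimension 1 is roughly "dog-like",
# dimension 2 "cat-like", and dimension 3 "wild".
dog  = np.array([0.8, 0.1, 0.1])
wolf = np.array([0.7, 0.1, 0.5])
cat  = np.array([0.1, 0.8, 0.1])
lion = np.array([0.0, 0.8, 0.5])

print(wolf - dog)  # [-0.1  0.   0.4]
print(lion - cat)  # [-0.1  0.   0.4] -- the same "wildness" offset
```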

Information retrieval

Just as with individual words, we can also calculate embeddings over longer chunks of text. This enables efficient retrieval of information based on a user’s input.

When we calculate embeddings over entire phrases, we still produce a vector of the same length and magnitude as for a single word. This enables easy comparison across words or phrases that represent larger concepts.

Let’s suppose we calculate a single embedding vector to represent the complete content of a student’s record. This embedding could be understood to represent the student themselves. We expect the embedding vectors for students with similar characteristics to sit closer together than those for dissimilar students.

To calculate the embedding for a student, we first produce a textual representation of that student’s characteristics. Then, we produce a phrase embedding in the usual way.
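Here's a sketch of that two-step process. The record fields are invented, and the embed function is a deterministic stand-in for a real embedding model:

```python
import hashlib
import numpy as np

def embed(text, dim=8):
    """Stand-in for a real embedding model: deterministic but
    semantically meaningless unit vectors. Swap in a real model
    (hosted or local) to get meaningful embeddings."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

def student_to_text(student):
    """Flatten a student record into one descriptive sentence."""
    return (f"{student['name']}, grade {student['grade']}, "
            f"allergies: {', '.join(student['allergies']) or 'none'}")

student = {"name": "Ada", "grade": 5, "allergies": ["peanuts"]}
vector = embed(student_to_text(student))
```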

Let’s say we take a human prompt like “students with peanut allergies.” We can convert the whole prompt into one of our embeddings, representing a new point in N-dimensional space. Just as the embedding for lion was closer to the embedding for cat, the embedding for our prompt will be closer to the embeddings of students who actually have peanut allergies. Trippy!

When a new prompt is received by our agent, we convert the text into a phrase embedding. We can then compare the vector representing the prompt with the vectors representing other objects in our system. We expect related objects to have vectors close to the prompt embedding, as in this example.

We can use embeddings in this way for information retrieval. When a user prompts the agent with a new question, we first encode that question as a new embedding. The agent scans its store of content to identify objects related to that question by minimizing the distance between the objects’ embeddings and the prompt’s embedding. These related documents are then provided to the agent as additional context when formulating its response.
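Here's what that retrieval loop might look like, reusing the embed stand-in from the sketch above (with a real embedding model, the top results would be the students who actually have peanut allergies; with the stub, the ranking is meaningless):

```python
import numpy as np

documents = [
    "Ada, grade 5, allergies: peanuts",
    "Ben, grade 4, allergies: none",
    "Cy, grade 5, allergies: peanuts, shellfish",
]
doc_vectors = [embed(d) for d in documents]  # embed: see the stub above

def retrieve(prompt, k=2):
    """Return the k documents whose embeddings sit closest to the
    prompt's embedding. All vectors are unit length, so the dot
    product is the cosine similarity."""
    q = embed(prompt)
    scores = [float(np.dot(q, v)) for v in doc_vectors]
    ranked = sorted(zip(scores, documents), reverse=True)
    return [doc for _, doc in ranked[:k]]

context = retrieve("students with peanut allergies")
```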

This technique, called Retrieval-Augmented Generation (RAG), is the primary way our agent discovers new information about the world over the course of a thread. Training an AI model can take a long time, so we can’t rely on training alone to capture information in real time. Instead, we provide additional context we think will be useful at the moment we ask the question.
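The final step is simple string assembly: the retrieved documents are pasted into the prompt before it reaches the model. Continuing the sketch above:

```python
question = "Which students have peanut allergies?"
context = retrieve(question)

augmented_prompt = (
    "Use the following context to answer the question.\n\n"
    "Context:\n" + "\n".join(context) +
    f"\n\nQuestion: {question}"
)
# `augmented_prompt` is what actually gets sent to the language model.
```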

We’re all just vectors anyway

In just over ten years, embedding techniques have completely transformed AI practice. These little vectors laid the tracks for the explosion of large language models (LLMs) we’re experiencing today. As we discover more and more entities that can be represented in vector form as embeddings, the reach of AI into the outside world will only grow.

What’s next

In the next entry in this series, we’ll learn more about RAG and discuss how we maintain privacy and role-based access control in our agent.