Computers are good at numbers. They’re not naturally good at understanding that “dog” and “puppy” are related, that a photo of a beach and the phrase “summer vacation” share something in common, or that a five-star review and the sentence “this product is amazing” mean roughly the same thing.
Embedding models are how we bridge that gap.
An embedding model takes something human-readable (like a word, a sentence, a paragraph, or an image) and converts it into a list of numbers. Not an arbitrary list, but one where the numbers encode meaning. Things that are semantically similar end up with similar numbers. Things that are unrelated end up with very different ones.
That list of numbers is called an embedding, or a vector.
Why Numbers?
Because once meaning is expressed as numbers, you can do math on it. And math is something computers are extremely good at.
With embeddings, you can measure how similar two pieces of text are by calculating the distance between their vectors. You can cluster documents by topic. You can find the most relevant passage in a knowledge base in response to a question. You can compare an image to a text description and determine how closely they match.
None of that is straightforward with raw text. With vectors, it reduces to geometry.
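That geometry is simple to sketch. The most common similarity measure is cosine similarity: the cosine of the angle between two vectors, where 1.0 means "pointing the same way" and values near 0 mean "unrelated". The three-dimensional vectors below are purely illustrative stand-ins for real embeddings, which have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (illustrative values, not from a real model)
cat = [0.9, 0.3, 0.1]
kitten = [0.85, 0.35, 0.15]
interest_rate = [0.1, 0.2, 0.95]

print(cosine_similarity(cat, kitten))         # close to 1.0
print(cosine_similarity(cat, interest_rate))  # much lower
```

The same few lines of arithmetic work whether the vectors have 3 dimensions or 3072, which is exactly why reducing meaning to geometry is so useful.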
What Does an Embedding Actually Look Like?
In practice, an embedding is just an array of floating point numbers. A typical embedding model might represent a sentence as a list of 384, 768, or 1536 numbers, depending on the model. Each number corresponds to a dimension in what’s called a high-dimensional space.
You can’t visualise 1536 dimensions, but the concept is the same as a simpler case. In two dimensions, similar things cluster together on a map. In 1536 dimensions, similar meanings cluster together in vector space. The embedding model has learned, through training on enormous amounts of data, which regions of that space correspond to which kinds of meaning.
“Cat” and “kitten” will have vectors close together. “Cat” and “interest rate” will be far apart. “I loved this movie” and “fantastic film” will be near each other even though they share no words in common.
Embedding dimensions in AI typically range from 128 to 3072, depending on the model’s complexity and the balance between speed and accuracy.
You’ll notice these numbers are written without commas (1536 rather than 1,536). That’s the standard convention in AI and technical writing: dimension counts are treated more like identifiers than quantities, and since integer literals in code can’t contain commas (dimension=1536), the comma-free form carries over into model specs, code, and documentation across the field.
Common Dimension Sizes
While 1536 dimensions has become a familiar benchmark thanks to the popularity of OpenAI’s models, the broader AI landscape uses a variety of vector lengths. These sizes are fixed during a model’s training and define the specific “coordinate system” the model uses to map out semantic meaning.
Here are the common ones:
- 384 dimensions: Popular for lightweight or “mini” models designed for speed and mobile applications (e.g., all-MiniLM-L6-v2).
- 768 dimensions: A widely used industry standard established by BERT and many general-purpose sentence transformer models.
- 1024 dimensions: Often used by “large” models, both open-source ones like BGE-large and hosted offerings such as Voyage AI’s, to capture deeper semantic nuance.
- 3072 dimensions: The current high-end for models like OpenAI’s text-embedding-3-large, offering maximum precision for complex retrieval tasks.
As open-source ecosystems like Hugging Face continue to grow, the 384 and 768-dimension models remain the most popular choices for developers running self-hosted infrastructure due to their lower memory overhead.
Typical Use Case Breakdown
Choosing the right dimension size is a balancing act between the precision of your search results and the operational costs of your database. The following breakdown illustrates how different vector lengths align with specific project requirements, from rapid mobile lookups to deep-dive document analysis:
| Dimensions | Use Case | Main Benefit |
|---|---|---|
| 128–256 | Real-time retrieval, mobile apps | Low latency & small storage |
| 384–768 | Balanced RAG & general search | Industry standard performance |
| 1024–1536 | Multilingual or high-precision tasks | Captures complex relationships |
| 3072+ | Long-context & maximum accuracy | Best for massive datasets |
When selecting a size, keep the curse of dimensionality in mind: larger vectors can capture more nuance but require significantly more computational power and storage. Most modern RAG (Retrieval-Augmented Generation) systems find their sweet spot between 768 and 1536 dimensions.
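The storage side of that trade-off is easy to quantify with back-of-the-envelope arithmetic, assuming the common default of 4-byte float32 values per dimension:

```python
def index_size_gb(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone, ignoring metadata and index overhead."""
    return num_vectors * dimensions * bytes_per_value / 1e9

# One million documents at the common dimension sizes
for dims in (384, 768, 1536, 3072):
    print(f"{dims:>4} dims: {index_size_gb(1_000_000, dims):.2f} GB")
```

A million 384-dimension vectors fit in about 1.5 GB, while the same corpus at 3072 dimensions needs roughly eight times that, before any index overhead, which is why dimension size matters long before accuracy differences do.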
Modern Flexibility (Matryoshka Embeddings)
Newer models from OpenAI and Google now support Matryoshka Representation Learning. This allows you to generate a large vector (like 3072) and simply “slice” it down to a smaller size (like 256 or 512) without significant loss in accuracy, helping you save on database costs.
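The slicing itself is trivial: keep the first k values and re-normalise to unit length. The sketch below uses a synthetic stand-in vector; with a real Matryoshka-trained embedding, this truncation is what preserves most of the accuracy:

```python
import math

def truncate_embedding(vector, k):
    """Keep the first k dimensions and re-normalise to unit length,
    the usual recipe for shortening Matryoshka-style embeddings."""
    sliced = vector[:k]
    norm = math.sqrt(sum(x * x for x in sliced))
    return [x / norm for x in sliced]

# Stand-in for a 3072-dimension embedding (synthetic values, not a real model output)
full = [0.01 * (i % 7 - 3) for i in range(3072)]
short = truncate_embedding(full, 256)
print(len(short))  # 256
```

The smaller vector then drops straight into the same similarity calculations and vector database, just at a fraction of the storage cost.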
How Embedding Models Are Trained
Embedding models learn from examples. These are usually massive amounts of text, image-caption pairs, or other data where relationships between things are implicit.
During training, the model is repeatedly shown pairs of things and learns to push related pairs closer together in vector space while pushing unrelated pairs further apart. Over millions of examples, the model develops a rich internal map of meaning that generalises well to new inputs it’s never seen before.
The result is a model that can take any new sentence or image and place it accurately in that map, even if it was never part of the training data.
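The push/pull idea described above can be sketched with a triplet-style loss: given an anchor, a related (positive) example, and an unrelated (negative) one, the loss is zero only when the positive is already closer to the anchor than the negative by some margin. This is a simplified illustration of the general principle, not any specific model’s training objective:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero when the positive is more similar to the anchor than the
    negative by at least `margin`; positive otherwise."""
    return max(0.0, margin - (cosine(anchor, positive) - cosine(anchor, negative)))

anchor   = [1.0, 0.0]
positive = [0.9, 0.1]   # nearly the same direction as the anchor
negative = [0.0, 1.0]   # orthogonal to the anchor
print(triplet_loss(anchor, positive, negative))  # 0.0: already well separated
```

During training, the model’s parameters are nudged in whatever direction reduces this loss, which over millions of examples is what pulls related pairs together and pushes unrelated pairs apart.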
What Embedding Models Are Used For
Embeddings show up in more places than you may realise:
- Semantic search — finding results that match the meaning of a query, not just the keywords. Search for “cheap flights to Europe” and get results that mention “budget airlines” and “affordable European travel” even if those exact words aren’t in your query.
- Retrieval-Augmented Generation (RAG) — the retrieval step in a RAG system relies on embeddings to find the most relevant documents in response to a question before passing them to a language model.
- Recommendation systems — embedding user behaviour and content lets you find items similar to what someone has engaged with before.
- Duplicate and near-duplicate detection — finding documents, support tickets, or database records that say the same thing in different words.
- Classification — once text is in vector form, it’s much easier to train a classifier on top of it.
- Clustering — grouping large volumes of text by topic without manually defining categories.
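The semantic-search and RAG-retrieval cases above boil down to the same operation: embed the query, then rank stored vectors by similarity. Here is a sketch with hand-made toy vectors standing in for real embeddings; an actual system would call an embedding model and use a vector database rather than a Python dict:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy corpus: document text -> pretend embedding (illustrative values only)
corpus = {
    "budget airlines to Paris": [0.9, 0.1, 0.2],
    "affordable European travel tips": [0.8, 0.2, 0.3],
    "history of the Roman Empire": [0.1, 0.9, 0.2],
}

def search(query_vector, corpus, top_k=2):
    """Return the top_k documents ranked by cosine similarity to the query."""
    ranked = sorted(corpus.items(), key=lambda kv: cosine(query_vector, kv[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

query = [0.85, 0.15, 0.25]  # pretend embedding of "cheap flights to Europe"
print(search(query, corpus))
```

The travel documents rank above the history one despite sharing no keywords with the query, which is the whole point of searching by meaning rather than by words.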
Embedding Models vs. Large Language Models
These are related but different things, and the distinction is worth being clear on.
A large language model (LLM) is designed to generate text. You give it a prompt, it produces a response. GPT-4, Claude, and Gemini are all generative models.
An embedding model is designed to represent text, not generate it. You give it an input, it gives you back a vector. No words come out. Just numbers. The model has done its job when it accurately encodes the meaning of the input into that vector.
Some model families include both. OpenAI, for example, offers GPT models for generation and separate embedding models like text-embedding-3-small for producing vectors. They’re trained differently and used for different tasks, even if they share some underlying architecture.
Multimodal Embeddings
So far the focus has mostly been on text, but embedding models can work across modalities. A multimodal embedding model can encode both text and images into the same vector space, which means you can compare them directly.
This is how image search works when you type a description and get back photos. It’s also what powers features like “find images similar to this one”. The image gets embedded as a vector, and the system retrieves other images whose vectors are nearby.
OpenAI’s CLIP model was an early and influential example of this. Multimodal embeddings are now a core building block for applications that work across text, image, audio, and video.
Choosing an Embedding Model
There are a lot of embedding models available, ranging from small open-source models you can run locally to large hosted models accessible via API. A few things worth considering when choosing one:
- Dimensionality — higher-dimensional embeddings can capture more nuance but cost more to store and search
- Domain fit — a model trained on general web text may not perform as well on medical or legal documents as one trained specifically on that kind of content
- Language support — not all embedding models work equally well across languages
- Speed and cost — if you’re embedding millions of documents, inference speed and API cost start to matter a lot
For most general-purpose applications, a well-regarded off-the-shelf model is a perfectly reasonable starting point. Specialised use cases may need something fine-tuned on domain-specific data.
To Summarize
Embedding models are the part of the AI stack that converts meaning into math. They’re not the most visible technology (you rarely interact with them directly) but they’re foundational to a huge range of AI applications, from search to recommendations to the retrieval step in RAG systems.
Understanding what they do and why they work makes it a lot easier to reason about how modern AI applications are put together.