What is Retrieval-Augmented Generation (RAG)?

Large language models are impressive, but they have a fundamental limitation: they only know what they were trained on. Ask a model about something that happened after its training cutoff, or about a document sitting in your company’s internal knowledge base, and it either makes something up or tells you it doesn’t know.

Retrieval-augmented generation, almost always shortened to RAG, is the approach the industry has settled on to fix this.

The idea is pretty straightforward. Instead of relying purely on what the model has memorized, you give it the ability to pull in relevant information from an external source, then use that information to generate a response.
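As a rough illustration of that retrieve-then-generate flow, here is a minimal sketch. Everything in it is an assumption made for the example: the `DOCUMENTS` list, the word-overlap scoring, and the prompt wording are all hypothetical. Real systems typically retrieve with embeddings and a vector store, and the final prompt would be sent to a language model, which this sketch leaves out.

```python
import re

# Hypothetical in-memory "knowledge base"; a real system would use a document store.
DOCUMENTS = [
    "The refund policy allows returns within 30 days of purchase.",
    "Office hours are 9am to 5pm on weekdays.",
    "Employees accrue 1.5 vacation days per month.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, top_k=1):
    """Rank documents by how many words they share with the query (toy scoring)."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, documents, top_k=1):
    """Combine the retrieved text with the question; this is what the model would see."""
    context = "\n".join(retrieve(query, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("What is the refund policy?", DOCUMENTS))
```

The point of the sketch is the shape of the pipeline: the model never needs to have memorized the refund policy, because the relevant text is fetched at question time and placed in front of it.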


What is an Embedding Model?

Computers are good at numbers. They’re not naturally good at understanding that “dog” and “puppy” are related, that a photo of a beach and the phrase “summer vacation” share something in common, or that a five-star review and the sentence “this product is amazing” mean roughly the same thing.

Embedding models are how we bridge that gap.
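To make the "dog" and "puppy" point concrete, here is a small sketch of how related items end up close together once they are represented as numbers. The three-dimensional vectors below are invented by hand purely for illustration; they are not the output of any real embedding model, which would produce vectors with hundreds or thousands of dimensions. Cosine similarity is one common way to compare such vectors.

```python
import math

# Hand-made toy vectors (an assumption for this example, not real model output).
TOY_EMBEDDINGS = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.9, 0.15],
    "invoice": [0.1, 0.05, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (closer to 1.0 means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    dog = TOY_EMBEDDINGS["dog"]
    print("dog vs puppy:  ", cosine_similarity(dog, TOY_EMBEDDINGS["puppy"]))
    print("dog vs invoice:", cosine_similarity(dog, TOY_EMBEDDINGS["invoice"]))
```

With these made-up numbers, "dog" scores much closer to "puppy" than to "invoice", which is the kind of relationship an embedding model is meant to capture.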


What is a Data Mesh?

Data mesh is one of the newer ideas in the data world, and it’s generated a lot of confusion. Unlike data lakes or data warehouses, it’s not a technology you buy or install. Rather, it’s a way of organizing how your company thinks about and manages data.


Data Governance Explained

Data governance isn’t the most exciting term in the data world, but it might be one of the most important. Companies that ignore it tend to find out why it matters the hard way, often through a compliance failure, a data breach, or a boardroom argument about whose numbers are right.


Column-Level Security Explained

You may be aware of a concept called row-level security, which controls which rows a user can see in a table. Column-level security is a similar concept that controls which columns are visible. It solves a different problem: same table, same rows, but some fields in those rows shouldn’t be visible to everyone.

Think about an employees table. A manager might reasonably see a list of all staff and their departments. But salary? National ID numbers? Personal contact details? These should be visible to the manager, but they probably shouldn’t be visible to most other employees, even if they’re querying the same table.
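The employees example can be sketched in a few lines. This is only an illustration of the idea, under assumptions: the role names, column names, sample row, and the `VISIBLE_COLUMNS` policy are all hypothetical, and in practice column-level security is usually enforced by the database or data platform itself rather than in application code.

```python
# Hypothetical policy: which columns each role may see.
VISIBLE_COLUMNS = {
    "manager": {"name", "department", "salary", "national_id", "phone"},
    "employee": {"name", "department"},
}

# Hypothetical sample data standing in for an employees table.
EMPLOYEES = [
    {
        "name": "Ana",
        "department": "Finance",
        "salary": 85000,
        "national_id": "X-0001",
        "phone": "555-0100",
    },
]


def select_visible(rows, role):
    """Return the same rows, keeping only the columns the role is allowed to see."""
    allowed = VISIBLE_COLUMNS.get(role, set())  # unknown roles see nothing
    return [{col: val for col, val in row.items() if col in allowed} for row in rows]


if __name__ == "__main__":
    print(select_visible(EMPLOYEES, "manager"))
    print(select_visible(EMPLOYEES, "employee"))
```

Both roles query the same table and get the same rows back; only the set of fields differs, which is the distinction from row-level security.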


What is a Data Lake?

Data lake is one of those terms that gets thrown around a lot in conversations about data strategy, often alongside data warehouses and data marts. But what actually is a data lake, and how does it fit into the picture? Let’s find out.
