Create a Text Index with Different Field Weights in MongoDB

When you create a text index in MongoDB, you have the option of applying different weights to each indexed field.

These weights denote the significance of the indexed fields relative to each other. A match in a field with a higher weight contributes more to a document's text search score than a match in a field with a lower weight.

This gives you some control over how documents are scored, and therefore ranked, in text search results.

The default weight is 1, so if you don’t specify a weight for a field, it will be assigned a weight of 1.
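
The defaulting rule can be sketched in plain JavaScript. The helper name resolveWeights and the field names are hypothetical and not part of MongoDB; the function only mimics the rule described above, where any indexed field without an explicit weight gets 1.

```javascript
// Hypothetical helper (not a MongoDB API): mimics how fields without
// an explicit weight fall back to the default weight of 1.
function resolveWeights(indexedFields, weights = {}) {
  const resolved = {};
  for (const field of indexedFields) {
    // Use the explicit weight if one was given, otherwise default to 1.
    resolved[field] = weights[field] ?? 1;
  }
  return resolved;
}

// Two illustrative fields, with a weight given for only one of them.
const resolved = resolveWeights(["title", "body"], { body: 10 });
console.log(resolved.title); // 1
console.log(resolved.body);  // 10
```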

Example

Suppose we have a collection called posts, and it contains documents like this:

{
	"_id" : 1,
	"title" : "The Web",
	"body" : "Body text...",
	"abstract" : "Abstract text..."
}

We could create a compound text index on the three text fields and apply different weights to each one.

Like this:

db.posts.createIndex( 
  { 
    title : "text",
    body : "text",
    abstract : "text"
  },
  {
    weights: {
      body: 10,
      abstract: 5
    } 
  } 
)

When we created the compound text index, we specified three fields, but we specified weights for only two of them.

The result is that those two fields will be weighted as specified, and the other field (title) will have the default weight of 1.

We can see this when we run getIndexes():

db.posts.getIndexes()

Result:

[
	{
		"v" : 2,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_"
	},
	{
		"v" : 2,
		"key" : {
			"_fts" : "text",
			"_ftsx" : 1
		},
		"name" : "title_text_body_text_abstract_text",
		"weights" : {
			"abstract" : 5,
			"body" : 10,
			"title" : 1
		},
		"default_language" : "english",
		"language_override" : "language",
		"textIndexVersion" : 3
	}
]

This means that the body field will have twice the significance of the abstract field, and ten times the significance of the title field.
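
To see how the weights influence ranking, a query can project the text score and sort by it. The following is a minimal sketch, assuming the index above exists. The function name searchPosts and the search term are illustrative, and db is passed in as a parameter (in mongosh, that would be the global db).

```javascript
// Illustrative helper: runs a text search on posts and orders the
// results by relevance. `db` is the database handle (the global `db`
// in mongosh).
function searchPosts(db, term) {
  return db.posts
    .find(
      { $text: { $search: term } },
      { score: { $meta: "textScore" } } // include the computed score
    )
    .sort({ score: { $meta: "textScore" } }); // highest score first
}

// In mongosh: searchPosts(db, "web")
```

All else being equal, a document that matches the term in its body field should score higher than one that matches it only in its title field.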

Wildcard Text Indexes with Weighted Fields

You can apply weights when creating wildcard text indexes. Wildcard text indexes can be handy when you don’t know what the text fields are going to be in the documents. You may know some, but not all.

In such cases, you could create a wildcard text index, and assign a weight to those fields that you are aware of. Any other fields will be assigned the default weight of 1.

Suppose we have the following document as a guideline:

{
	"_id" : 1,
	"title" : "Title text...",
	"body" : "Body text...",
	"abstract" : "Abstract text...",
	"tags" : [
		"tag1",
		"tag2",
		"tag3"
	]
}

It’s similar to the previous document, except that it now has a tags field that contains an array. But for all we know, future documents in that collection could have other fields – like maybe categories, keywords, author_bio, etc.

Since we don’t actually know, we will create a wildcard text index that covers all fields with string data, and we will assign weights to some of the known fields.

Example:

db.posts.createIndex( 
  { "$**": "text" },
  { weights: {
      body: 10,
      abstract: 5
    } 
  } 
)

In this case, the body field gets a weight of 10 and the abstract field gets a weight of 5. This means that the body field has twice the impact of the abstract field, and ten times the impact of all other text fields (because they will be assigned the default weight of 1).
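
Note that a collection can have at most one text index, so if the index from the previous example still exists on posts, it would need to be dropped before the wildcard index can be created. A sketch of that sequence, assuming the index name shown in the earlier getIndexes() output; the function name replaceTextIndex is illustrative:

```javascript
// Illustrative helper: drops the earlier text index, then creates the
// wildcard text index with the same weights as in the example above.
// `db` is the database handle (the global `db` in mongosh).
function replaceTextIndex(db) {
  // Index name taken from the earlier getIndexes() output.
  db.posts.dropIndex("title_text_body_text_abstract_text");
  return db.posts.createIndex(
    { "$**": "text" },
    { weights: { body: 10, abstract: 5 } }
  );
}

// In mongosh: replaceTextIndex(db)
```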

After creating that index, if we call getIndexes(), we can see the weightings given to the fields:

db.posts.getIndexes()

Result:

[
	{
		"v" : 2,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_"
	},
	{
		"v" : 2,
		"key" : {
			"_fts" : "text",
			"_ftsx" : 1
		},
		"name" : "$**_text",
		"weights" : {
			"$**" : 1,
			"abstract" : 5,
			"body" : 10
		},
		"default_language" : "english",
		"language_override" : "language",
		"textIndexVersion" : 3
	}
]

As expected, the body field gets 10, the abstract field gets 5, and all others get 1.
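
The weights document can also be read programmatically. The following sketch uses a hypothetical helper, weightFor, which takes the array returned by getIndexes() and looks up the weight that applies to a field, falling back to the "$**" entry for fields without an explicit weight. The tags field is taken from the sample document above.

```javascript
// Hypothetical helper (not a MongoDB API): given the array returned by
// db.posts.getIndexes(), report the weight that applies to a field.
function weightFor(indexes, field) {
  // The text index is the one whose key contains "_fts": "text".
  const textIndex = indexes.find((ix) => ix.key && ix.key._fts === "text");
  if (!textIndex) return undefined; // no text index on the collection
  const weights = textIndex.weights || {};
  // Explicit weight first, then the wildcard entry if present.
  return weights[field] ?? weights["$**"];
}

// Trimmed copy of the getIndexes() output shown above.
const indexes = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  {
    v: 2,
    key: { _fts: "text", _ftsx: 1 },
    name: "$**_text",
    weights: { "$**": 1, abstract: 5, body: 10 },
  },
];

console.log(weightFor(indexes, "body")); // 10
console.log(weightFor(indexes, "tags")); // 1
```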