Import CSV Data as an Array in MongoDB using mongoimport

MongoDB’s import utility – mongoimport – introduced a new parameter that allows you to import CSV data as an array.

The --useArrayIndexFields parameter interprets natural numbers in fields as array indexes when importing CSV or TSV files.

Example

Suppose we have a CSV file called tags.csv that looks like this:

tags.0,tags.1,tags.2,tags.3
html,css,sql,xml

We can import that using the --useArrayIndexFields parameter, which will cause mongoimport to interpret the numbers in the column headers as array indexes.

Example:

mongoimport --db=krankykranes --type=csv --headerline --useArrayIndexFields --file=tags.csv

Output:

2021-01-03T20:55:44.284+1000	no collection specified
2021-01-03T20:55:44.284+1000	using filename 'tags' as collection
2021-01-03T20:55:44.297+1000	connected to: mongodb://localhost/
2021-01-03T20:55:44.330+1000	1 document(s) imported successfully. 0 document(s) failed to import.

Seeing as I didn’t specify a collection name, it created a collection called tags (based on the name of the file), then imported my document.

Let’s jump over to the mongo shell and check the collection.

db.tags.find()

Result:

{ "_id" : ObjectId("5ff1a2b0300ed79d9836882f"), "tags" : [ "html", "css", "sql", "xml" ] }

We can see that the CSV data has been imported as a JSON array.

Here it is again but with nicer formatting, which might make the array easier to see.

db.tags.find().pretty()

Result:

{
	"_id" : ObjectId("5ff1a2b0300ed79d9836882f"),
	"tags" : [
		"html",
		"css",
		"sql",
		"xml"
	]
}

Combined with Other CSV Data

The CSV file in the previous example only consisted of array data. But we can include other data in the file too.

Here’s an example of a CSV file called articles.csv that contains other data.

_id,title,body,tags.0,tags.1,tags.2,tags.3
1.0,Web,blah,html,css,sql,xml
2.0,Animals,blah 2,cats,dogs
3.0,Plants,blah 3,trees
4.0,Oceans,blah 4

Let’s import that file:

mongoimport --db=krankykranes --type=csv --headerline --useArrayIndexFields --file=articles.csv

Output:

2021-01-03T21:14:38.286+1000	no collection specified
2021-01-03T21:14:38.287+1000	using filename 'articles' as collection
2021-01-03T21:14:38.336+1000	connected to: mongodb://localhost/
2021-01-03T21:14:38.407+1000	4 document(s) imported successfully. 0 document(s) failed to import.

Now let’s jump over to the mongo shell and check the collection.

db.articles.find()

Result:

{ "_id" : 1, "title" : "Web", "body" : "blah", "tags" : [ "html", "css", "sql", "xml" ] }
{ "_id" : 3, "title" : "Plants", "body" : "blah 3", "tags" : [ "trees" ] }
{ "_id" : 4, "title" : "Oceans", "body" : "blah 4" }
{ "_id" : 2, "title" : "Animals", "body" : "blah 2", "tags" : [ "cats", "dogs" ] }

Let’s pretty it up a little.

db.articles.find().pretty()

Result:

{
	"_id" : 1,
	"title" : "Web",
	"body" : "blah",
	"tags" : [
		"html",
		"css",
		"sql",
		"xml"
	]
}
{
	"_id" : 3,
	"title" : "Plants",
	"body" : "blah 3",
	"tags" : [
		"trees"
	]
}
{ "_id" : 4, "title" : "Oceans", "body" : "blah 4" }
{
	"_id" : 2,
	"title" : "Animals",
	"body" : "blah 2",
	"tags" : [
		"cats",
		"dogs"
	]
}

So we can see that the documents have been created as perfect JSON/BSON documents, and arrays have been created for the relevant data.

Notice that no array has been created for document 4. This is because no array data was provided in the CSV file. So rather than create a field with an empty array, it didn’t create the field or the array.