If you happen to read a lot of data-related material, you might occasionally find the word “data” being treated in different ways. In some cases you’ll see “this data is…” and in other cases “these data are…”. You might even think “they obviously made a mistake with their grammar”.
Not so fast!
The word data is the plural of datum, which is Latin for “something given”. In that sense of the word, “these data are” would be the correct usage.
However, the Oxford English Dictionary states [emphasis mine]:
In Latin, data is the plural of datum and, historically and in specialized scientific fields, it is also treated as a plural in English, taking a plural verb, as in the data were collected and classified. In modern non-scientific use, however, it is generally not treated as a plural. Instead, it is treated as a mass noun, similar to a word like information, which takes a singular verb.
So as long as you’re not using it in a scientific context, it’s usually fine to use “this data is”.
When you use “this data is”, you are using it as a mass noun (also known as an uncountable noun, or non-count noun). This usage treats all data as one unit, where the individual pieces can’t be counted.
Here are some examples of sentences where data is treated as a mass noun:
- This data is confusing
- The data doesn’t lie
- The data was out of date
- Comprehensive data has been published
And here are the same sentences, but with data treated as a plural noun:
- These data are confusing
- The data don’t lie
- The data were out of date
- Comprehensive data have been published
But there are occasions where the usage is ambiguous – the word could be read as either a plural noun or a mass noun, because nothing else in the sentence has to agree with it:
- I’ve loaded the data
- Can you please send me the data?