"Data" is the plural form of "datum".

It is proper to say "these data" instead of "this data", and "the data are" instead of "the data is".

However, doing this makes you look like an uptight pedant, like the people who are picky about ending sentences with prepositions and fret over the correct use of "who" and "whom".

Sure, in Latin data is the plural of datum. But in English the word data can be used as either singular or plural (at least according to Webster's New Collegiate Dictionary, published by Merriam-Webster).

Quite frankly, it seems to me that the most common usage is rather abstract, i.e., there is no such thing as "a data" (or "a datum" for that matter) in common English usage, nor is there "two data." Rather, there is a data item, two data items, etc. Kind of like water: There is no such thing as "a water" but there is "a glass of water".

And just as we say "water is," we can say "the data is."

The problem with this distinction is that scientists who refer to their observations as data are almost never interested in a single datum. They consider any single observation to be decontextualized and potentially anomalous, and look for confirmation through statistically validated trends across large bodies of data.
Therefore, all the data together can be seen as a unitary piece of information, a single way to understand the phenomenon being studied. It's similar to the way that "story" is singular, even though one story can contain many subplots and characters that are stories in their own right. I think this makes sense as the basis for a singular form of the word "data."

Also note that, at least in the social sciences, "datum" and "data" have largely been replaced by "data point" and "data set," respectively. Perhaps the proper argument is that "data" is plural in Latin and singular in English.

Sadly, treating "data" as a singular noun leaves the door open for a grammatical horror: datae.

This arises from treating the neuter plural as a feminine singular, with the following reasoning:
  1. data is singular.
  2. I want to talk about more than one of it.
  3. I know it's a Latin word.
  4. Latin words that end in "a" are made plural by changing the "a" to "ae".
  5. The plural of data is therefore datae.

I have seen this in formal writing. I have wept.

In practice, the usage of the words data and datum in published scientific work seems to vary depending on how much of a grammarian an author's co-authors or editors are. For papers I have written for conferences and journals, my dissertation advisor (who was learning Latin) always demanded that I write "the data show", as in "the image interpolation data show that Britney Spears has breast implants", rather than "the image interpolation data shows" the same thing. His rationale for this demand was based on the idea that conference papers and journals were formally published work that may be read by others years or decades later. As such, he believed that correct grammar should be used.

Data can be used in two different ways: as the plural of datum or as a mass noun. The second usage is more recent. In neither usage is it a singular count noun.

My theory for why this is so goes like this:

A datum is a single fact, a value or bit. When you deal with only a few of them it makes sense to count them: One datum, another datum, here are the data.

These days, now that a 10-million-byte MP3 data file is common, we are looking at the values from so far away that the individual bits are no longer visible, and what was a lot of discrete points has merged into a continuous stream of data, like air composed of molecules: Here is a lot of data.
