Intercoder reliability is a term used in the social sciences to describe the simple fact that no group of individuals will interpret any set of non-numeric data the exact same way. In any type of
social science research, the greater the numbers of people tested, the stronger the results. Size does matter when it comes to
sampling. Unfortunately sample size also increases the
labor involved in gathering and interpreting results. Social science
jargon calls these people 'coders'. Often it becomes impossible for one person to do the work. If a researcher has the funds, they may hire or employ
volunteer labor to help with the work. Unfortunately, while humans may agree on many things we rarely agree precisely on any one thing. Thus in any endeavor involving more than one person results may be gathered or interpreted with some inconsistencies
even if all make a good faith effort to follow study guidelines exactly.
This comes because a human's history and background strongly affects their interpretation of language. Take a very precise word like house. If you asked people to draw a house, they would draw different homes, usually reflecting one they lived in during a significant period of their life. That level of inconsistency rarely matters for 'house' is not a subjective term. But suppose an interviewer for a survey is expected to note if the subject became 'agitated' during a particular series of questions. Now the interviewer must make a judgement call about the subject. In The Caine Mutiny the story makes it clear that when Captain Queeg is agitated he rolls little steel balls in his hand. But suppose we found ourselves interviewing Queeg and he pulls out his balls. We haven't seen an hour of back story, we just met the man. Is he 'agitated'. Is his ball rolling a compulsive habit? Or perhaps some form of odd physical therapy? The interviewer does not know and must make a judgement based solely upon Queeg's rolling balls.
How the interviewer (often an undergraduate) will interpret Queeg's balls depends upon their previous life experiences and personal predispositons. An interviewer familiar with ticks like Queeg's will probably correctly identify Queeg as 'agitated'. Others who know others with odd ticks that mean nothing may interpret Queeg as 'Not agitated'.
In social science research many forms of data are not clear, but still constitute valid data. Any time you have more than one person tasked with interpreting a data set there will be some variation in how different coders will interpret a particular datum. This leads to built-in inconsistencies, and thus makes any inferences drawn from the data weaker than they might be. A stronger result is required to attain significance. People often say "I know it when I see it." They do, but they must all see for themselves. We all see things slightly differently, and the more abstract the data, the more differently we see.