Cohesion and coupling are computer programming code metrics. They are more like rules of thumb than numbers that can be automatically measured.

I've reviewed a fair bit of code from prospective programmers keen to show that they are competent with object-oriented software construction. The sad thing is that a significant minority of them have problems with concepts much more basic than design patterns and other object-oriented idioms. Object orientation was conceived as a layer over procedural code. A function in the latest object-oriented Java system contains the same basic stuff as a function in C code written in the 1970s: variable declarations, if statements, for loops, assignment and so on.

Object orientation is a way to organise your functions, so you still need to know how to write a function first: what to break down into which functions, and which object to attach them to. That is a basic skill, and its rules of thumb were discussed as early as 1974 (1), as part of the study of procedural programming, functional decomposition and the like.

The worst of the bad code that I have seen contains cut-and-paste coding. The history of programming language constructs is largely a history of finding new ways to avoid repetitive coding. Ctrl+C and Ctrl+V are tools, not weapons.

More often, bad code is written without an understanding of cohesion and coupling, and exhibits the sins of low cohesion and high coupling. You may have looked at code and thought it bad, without knowing what to call the quality that made it bad. Maybe you read a procedure and realised that it didn't have a clearly identifiable role. Maybe you noticed lots of global variables used instead of data members and parameters. You didn't find it to be modular. Cohesion and coupling are often the metrics that we unconsciously employ here. Well-designed code, OO or plain procedural, has high cohesion and low coupling.

High cohesion means that a block of code does only one thing, and nearby code does related things. The code in a file, or in an object, hangs together. You can describe unambiguously what it does.

Low coupling means that the code does not draw on other far-flung parts of the program unnecessarily. You can understand it without reference to much other code.

Cohesion

Here are the now-recognised levels of cohesion (2), in order from highest (best) to lowest (worst):

Functional cohesion: a function that does only one operation, e.g. Sin(), FloatToStr() etc. This is as good as cohesion gets.

Sequential cohesion: a routine contains related operations that must occur in a particular order, e.g. open a file, read data from it, then close it.

Communicational cohesion: the operations in a routine use the same data but are not otherwise related, e.g. several distinct operations that each work on a newly arrived data packet.

Temporal cohesion: operations that must occur at the same time, e.g. Initialise(), Shutdown() and AfterLogin() are likely to be grab-bags of things that are unrelated except that they must occur at the same time. Not brilliant cohesion, but necessary and acceptable.
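The four acceptable levels can be sketched roughly as follows in C++. This is only an illustration: every name here (Mean, MeanOfText, Packet, Inspect, LoadConfig and so on) is made up for the example and does not come from the sources cited.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Functional cohesion: exactly one operation.
double Mean(const std::vector<double>& xs) {
    if (xs.empty()) return 0.0;
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum / static_cast<double>(xs.size());
}

// Sequential cohesion: the steps must run in this order, and each one
// feeds the next (open a stream on the text, read the numbers, average them).
double MeanOfText(const std::string& text) {
    std::istringstream in(text);
    std::vector<double> xs;
    double x;
    while (in >> x) xs.push_back(x);
    return Mean(xs);
}

// Communicational cohesion: two otherwise unrelated operations that happen
// to work on the same data (here, one hypothetical packet).
struct Packet { std::vector<int> bytes; };
struct PacketInfo { std::size_t length; int checksum; };

PacketInfo Inspect(const Packet& p) {
    int checksum = 0;
    for (int b : p.bytes) checksum = (checksum + b) % 256;
    return PacketInfo{p.bytes.size(), checksum};
}

// Temporal cohesion: unrelated jobs grouped only because they run at the
// same moment. Keeping the body to plain calls limits the damage.
std::string LoadConfig()  { return "config"; }
std::string OpenLog()     { return "log"; }
std::string ConnectToDb() { return "db"; }

std::vector<std::string> Initialise() {
    return {LoadConfig(), OpenLog(), ConnectToDb()};
}
```

Note how Initialise() is easy to describe only in terms of when it runs, whereas Mean() can be described entirely by what it computes.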

The following kinds of cohesion (or lack thereof) are generally considered unacceptable.

Procedural cohesion: a procedure performs the details of several unrelated tasks in order. This means that it is not OK to code

void MonthEnd()
{
  Report ExR = InitExpenseReport();
  Report rr = InitRevenueReport();
  Report EmpR = InitEmployeeReport();

  EmpR.Init();
  rr.Init();
  ExR.SetEmployees(true);

  if (ExR.GetReportParams())
    EmpR.GetReportParams();

  SendToPrinter(rr);
  SendToPrinter(ExR);
  SendToPrinter(EmpR);
}

What's wrong with this is that it's not clear how to modify it. It's not clear which statements are related and have to happen in that order, and which aren't. It has only procedural cohesion. Rather, put the detail into lower-level procedures and code it as:

void MonthEnd()
{
  PrintRevenueReport();
  PrintExpenseReport();
  PrintEmployeeReport();
}

Now it's clear what is related. Readability and maintainability are improved.
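For something that actually compiles, here is one possible shape for those lower-level procedures. It is a sketch under assumptions: the Report struct, the Print helper and the std::ostream parameter are stand-ins invented for this example, since the original Report type is not shown.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the Report type used above.
struct Report {
    std::string title;
    bool includeEmployees;
};

// Shared low-level step: write one report to the given stream.
void Print(const Report& r, std::ostream& out) {
    out << r.title << " report";
    if (r.includeEmployees) out << " (with employees)";
    out << '\n';
}

// Each routine owns every step for exactly one report, so the statements
// that belong together now live together.
void PrintRevenueReport(std::ostream& out)  { Print(Report{"Revenue", false}, out); }
void PrintExpenseReport(std::ostream& out)  { Print(Report{"Expense", true}, out); }
void PrintEmployeeReport(std::ostream& out) { Print(Report{"Employee", false}, out); }

// The top-level routine reads as a list of independent jobs.
void MonthEnd(std::ostream& out) {
    PrintRevenueReport(out);
    PrintExpenseReport(out);
    PrintEmployeeReport(out);
}
```

Calling MonthEnd(std::cout) would print the three reports in order. Reordering or dropping one of them is now a one-line change.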

A procedure should not be just one damn thing after another; it should group statements into a logical whole. If you've ever written code like that, you'll know how the different operations tend to get glued onto each other over time, and tearing them apart is painful. Similarly, a paragraph of text is not just a collection of good sentences; such a collection can still be incoherent.

Logical cohesion: the routine does one of two or more things depending on flags passed in ... and does not solely consist of dispatch to smaller routines. E.g. it is OK to code

void MonthEnd(ReportType rtype)
{
  if (rtype == revenue)
    PrintRevenueReport();
  else if (rtype == expense)
    PrintExpenseReport();
  else
    PrintEmployeeReport();
}

if you like that kind of dispatch, but rolling the dependent procedures into the main one is not, since the result has only logical cohesion.
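To make the contrast concrete, here is a small C++ sketch of both variants. The Ledger struct, its fields and the report strings are assumptions invented for the illustration.

```cpp
#include <sstream>
#include <string>

enum class ReportType { Revenue, Expense, Employee };

// Hypothetical month-end figures, made up for this example.
struct Ledger { int revenue; int expense; int headcount; };

// Logical cohesion only: the flag picks one of several unrelated bodies,
// and all of those bodies live inside this one routine.
std::string MonthEndRolledIn(ReportType rtype, const Ledger& l) {
    std::ostringstream out;
    if (rtype == ReportType::Revenue) {
        out << "Revenue: " << l.revenue;
    } else if (rtype == ReportType::Expense) {
        out << "Expense: " << l.expense;
    } else {
        out << "Employees: " << l.headcount;
    }
    return out.str();
}

// The acceptable form: each report is its own routine...
std::string RevenueReport(int revenue)    { return "Revenue: " + std::to_string(revenue); }
std::string ExpenseReport(int expense)    { return "Expense: " + std::to_string(expense); }
std::string EmployeeReport(int headcount) { return "Employees: " + std::to_string(headcount); }

// ...and the flag-driven routine does nothing except dispatch.
std::string MonthEnd(ReportType rtype, const Ledger& l) {
    switch (rtype) {
        case ReportType::Revenue: return RevenueReport(l.revenue);
        case ReportType::Expense: return ExpenseReport(l.expense);
        default:                  return EmployeeReport(l.headcount);
    }
}
```

With bodies this small the two look almost alike. The difference shows once each branch grows to twenty lines and starts sharing local variables with its neighbours.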

Coupling

Coupling gauges the strength of the connection between two routines. Here are the kinds of coupling, from best (loosest) to worst (most coupled):

Simple data coupling: Only simple unstructured data is passed between the routines as parameters. The result value depends only on the parameters. E.g. x = Sin(y); the value of x returned by Sin depends only on y.

Data structure coupling: As simple data coupling, but structured data (records or classes) are passed between the routines.

Control coupling: The flags passed by the first routine to the second tell the second routine what to do.

Global data coupling: The routines make use of the same global data. Tolerable if the data is only read, not written. Stealth communication between distant parts of the program by writing global variables is probably the best-known kind of bad coupling.

Pathological coupling: One routine modifies another's internal data, or jumps to an address inside it. You don't see this much in modern, structured languages.
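The first four kinds can be shown side by side in a few lines of C++. This is a minimal sketch, and the names (Area, Rect, Measure, g_scale, ScaledArea) are invented for it. Pathological coupling is left out because a structured language makes it hard to write on purpose.

```cpp
// Simple data coupling: plain values in, plain value out.
double Area(double width, double height) { return width * height; }

// Data structure coupling: a record is passed instead of loose values.
struct Rect { double width; double height; };
double Area(const Rect& r) { return Area(r.width, r.height); }

// Control coupling: the caller's flag tells the routine what to do.
double Measure(const Rect& r, bool wantPerimeter) {
    return wantPerimeter ? 2.0 * (r.width + r.height) : Area(r);
}

// Global data coupling: the result depends on state that any other part of
// the program can change, so the call site no longer tells the whole story.
double g_scale = 1.0;
double ScaledArea(const Rect& r) { return g_scale * Area(r); }
```

Two identical calls to ScaledArea() can return different values if some distant code assigns to g_scale in between, which is exactly the stealth communication described above.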

The modern age

This all looks rather procedural. Well, that was the state of the art at the time. The same concepts can be applied to the larger blocks of code in use at present: classes or unit files.

Does each class in your program have a distinct unique role in life? This is a cohesiveness question. How much external data does it need? This is a coupling question. If a class is highly cohesive, then you are more likely to see situations where you could reuse it, and if it is lightly coupled, then it will be easier to carry out that reuse.

If a function on your class is passed an Employee object, but uses only the Age and Salary properties of that object, then consider passing in two simple values rather than the whole object - it's less coupled, and easier to understand and reuse.
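A before-and-after sketch of that advice in C++ follows. The Employee fields, the pension rule and both function names are hypothetical, chosen only to have something to compute.

```cpp
#include <string>

// Hypothetical record; only age and salary matter to the calculation below.
struct Employee {
    std::string name;
    std::string department;
    int age;
    double salary;
};

// Before: demands a whole Employee although it reads just two fields, so it
// cannot be called, tested or reused without constructing one.
double PensionContributionCoupled(const Employee& e) {
    int percent = (e.age >= 50) ? 10 : 5;
    return e.salary * percent / 100.0;
}

// After: depends only on the two values it actually uses.
double PensionContribution(int age, double salary) {
    int percent = (age >= 50) ? 10 : 5;
    return salary * percent / 100.0;
}
```

A caller that holds an Employee writes PensionContribution(e.age, e.salary), while a caller that has only the two numbers (a what-if calculator, say) can use the same function unchanged.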

A class called TrigUtils is cohesive (as long as it actually does what it claims to do), a class called MiscUtils probably isn't, and it may be a good idea to refactor by splitting it up. If not, at least make sure that the various utilities therein are not coupled to each other.

If the functions that could make up MiscUtils are attached to other classes (e.g. you were coding in Java or C# and they needed to be attached somewhere, so they ended up next to where they were needed), then they should probably be detached, since they make those classes less cohesive.

You may need sharp eyes to see that there is indeed an independent function trying to get out: a whole object passed where two parameters would do, as above, is one sign, and a repeated couple of lines of code is another.

Exceptions

You cannot get perfectly high cohesion and low coupling throughout your program - once you've written all those unitary functions that depend only on their inputs, you will need a bit of less cohesive, more coupled code to glue them together into a working program.

Do you need to adhere to these rules slavishly? Well, no. As with any art, breaking the rules deliberately on occasion for a good reason is entirely different to hindering yourself by breaking them continually out of ignorance.

References

1) Cohesion was first discussed in a paper called "Structured Design" by Wayne Stevens, Glenford Myers and Larry Constantine, IBM Systems Journal, 1974.

2) This writeup draws chiefly on Steve McConnell's "Code Complete", first edition, pages 81-92.