One way of maintaining cache coherence in multiprocessor designs where each CPU has a local cache is to ensure that a given cache line is never held by more than one CPU at a time. With write-through caches, this is easily implemented by having each CPU invalidate its copy of a cache line on a snoop hit.
However, if multiple CPUs are working on the same set of data from main memory, this can lead to the following scenario:
- CPU #1 reads a cache line from memory.
- CPU #2 reads the same line; CPU #1 snoops the access and invalidates its local copy.
- CPU #1 needs the data again and has to re-read the entire cache line, invalidating the copy in CPU #2 in the process.
- CPU #2 now also re-reads the entire line, invalidating the copy in CPU #1.
- Lather, rinse, repeat.
The result is a dramatic performance loss because the CPUs keep fetching the same data over and over again from slow main memory.
Possible solutions include:
- Use a smarter cache coherence protocol, such as MESI, which lets several CPUs hold shared read-only copies of a line and only forces invalidation on writes.
- Mark the address space in question as cache-inhibited. Most CPUs will then resort to single-word accesses, which should be faster than reloading entire cache lines (usually 32 or 64 bytes) over and over.
- If the data set is small, make one copy in memory for each CPU.
- If the data set is large and processed sequentially, have each CPU work on a different part of it (one starting at the beginning, one at the middle, etc.).