Denormalization is the process of attempting to optimize the performance of a database system by adding redundant data.

Examples of denormalization that I've found in E2 include storing the number of cools a writeup has received in the writeup table, when a simple SQL select count on the coolwriteups table can retrieve that information. E2 also stores the number of writeups a user has created in that user's variables, when a select count over node joined to writeup could calculate it.
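For instance, the two counts could be computed on demand with queries along these lines; the table and column names below are guesses at E2's schema, not its actual definitions:

  -- Assumed table and column names; E2's real schema may differ.
  -- Cools received by a single writeup:
  SELECT COUNT(*)
  FROM coolwriteups
  WHERE coolwriteups_id = 12345;

  -- Writeups created by a single user:
  SELECT COUNT(*)
  FROM node
  JOIN writeup ON writeup.writeup_id = node.node_id
  WHERE node.author_user = 67890;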

Database designers often justify denormalization on performance grounds, but they should note that logical denormalization can easily break the consistency of the database, one of the all-important ACID properties. However, a designer can get the performance benefits while retaining consistency by denormalizing at the physical level instead: create an indexed view on the tables you are interested in, and the DBMS will physically denormalize the data into the view's index, making queries on the view faster. Every time a user inserts, updates, or deletes something in the underlying tables, the DBMS automatically updates all views and indexes, so the redundant copy can never drift out of sync.
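As a rough sketch in SQL Server syntax, where this feature is called an indexed view, the writeup cool count from the example above could be materialised like so; the table and column names are again assumptions:

  -- Assumed schema; SQL Server indexed-view syntax.
  CREATE VIEW dbo.writeup_cool_counts
  WITH SCHEMABINDING
  AS
  SELECT coolwriteups_id AS writeup_id,
         COUNT_BIG(*)    AS cool_count
  FROM dbo.coolwriteups
  GROUP BY coolwriteups_id;
  GO

  -- The unique clustered index stores the view's rows physically;
  -- the DBMS maintains them automatically on every insert, update,
  -- and delete against coolwriteups.
  CREATE UNIQUE CLUSTERED INDEX ix_writeup_cool_counts
  ON dbo.writeup_cool_counts (writeup_id);

Queries against the view then read the precomputed counts straight out of the index instead of counting rows on every request.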

The biggest remaining causes of harmful denormalization nowadays:

  1. newbie DB designers
  2. MySQL, or any other DBMS that can't index a view

Denormalisation is the keystone of data warehousing, in which data from one or more tables of one or more databases is denormalised and stored in a large warehouse database.

Because the denormalised tables represent the results of table joins commonly performed by operational applications - joins which may be slow due to their iterative nature - data warehouse databases tend to be regenerated periodically (nightly or weekly) rather than kept up to date in real time.

An easy way to imagine a data warehouse is to think of all the joined queries you would perform on your database, then create tables that represent those queries. Because no join is necessary, the warehouse database is much faster at returning result sets.
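As a minimal sketch in MySQL-flavoured SQL, with made-up table and column names, a nightly rebuild of one such warehouse table might look like this:

  -- Hypothetical names; rebuild the precomputed join periodically
  -- (e.g. from a nightly cron job) instead of keeping it in sync.
  DROP TABLE IF EXISTS warehouse_user_writeups;

  CREATE TABLE warehouse_user_writeups AS
  SELECT u.user_id,
         u.username,
         n.node_id,
         n.title,
         n.createtime
  FROM node AS n
  JOIN writeup AS w ON w.writeup_id = n.node_id
  JOIN users   AS u ON u.user_id    = n.author_user;

  -- Reports query the flat table directly, with no joins at run time:
  SELECT username, COUNT(*) AS writeup_count
  FROM warehouse_user_writeups
  GROUP BY username;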
