In a CPU, some operations (like division or loading from a uncached location) may take much longer than others. If the CPU can only execute the instructions in the order they appear, when executing a long-latency instruction, it cannot execute anything after that, even if many of the following instructions do not depend on the result of the long-latency instruction. This is inefficient.
Using out-of-order execution, when some long-latency instructions are being executed, the CPU can choose some following instructions to execute, if they don't depend on the result of the instruction, and there are resources (execution units) available to execute them. When the long-latency instruction finishes, many of the following independent instructions would have finished too, so we only have to execute the instructions that depend on the long-latency instruction. The results of executed instructions are then "committed" (also called "retired") to real registers (or memory) in their original order, to keep correct semantics.
For example, look at the following pseudocode:
Load A from memory
Load B from memory
Do things with E
If B was not found in cache, so it will take a long time before B become available, we can calculate C while waiting for the load to complete. When B is finally loaded from memory, C is probably already calculated, so we only need to calculate D and E then.