Something all empirical researchers should do, but...

So, you're reading this study, where they've got some model which they want to statistically or econometrically test. The paper carefully outlines the theory, explains how they do some fancy regression analysis on it, and then shows you the results.

And, lo and behold, they've got GREAT p-values! Obviously the model captures some truth, and the significant coefficients are awfully enlightening.

Yeah, well, maybe. Assuming that the author took their work seriously, and didn't actually intend to lie with their statistics, here's how it probably went:

  1. Come up with a model and theory.
  2. Go looking for data which could support (or refute) the theory.
  3. Fail to find such data.
  4. Modify the theory a bit to match the data that might actually be available.
  5. Re-interpret variables in the model so they can be applied to sorta similar variables in the data.
  6. Actually collect or purchase said data, and spend a long while cleaning it up.
Actually, so far that's pretty benign. Research is hard, and one has to make do. But then:

      run regressions;
      while ( !(significant coefficients) || (coefficient signs all wrong) )
      {
        tear out hair;
        rethink model;
        futz with model;
        rationalize changes;
        run regressions;
      }
      write up results;
      publish;

All of which is worthwhile data mining: valuable things can be learned from the process, and there's a decent chance that the final model will be an improvement over the original. But that's not how it's presented. It's presented as, "We tested our theory, and here are the statistics."

Those aren't the statistics for a tested theory. The p-values and t-statistics can't be interpreted, because they're derived from probability theory under specific assumptions about random error. To really test the results, one would need to rederive that theory to answer:

What are the probability distributions of my error terms when the model I'm using was selected in a jerry-rigged, entrail-reading, Zen angst process, entirely CONDITIONAL on a path-dependent series of previous regressions?
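
To put a rough number on it, here's a small simulation (Python with numpy and statsmodels; the setup and the 20-candidate figure are my own illustration, not drawn from any particular study). The outcome and every candidate regressor are pure noise, yet a loop that keeps swapping regressors until something clears the 5% bar "finds" a significant coefficient most of the time:

    # Hypothetical simulation of the "futz with model until significant" loop,
    # run on pure noise so any "significant" coefficient is necessarily spurious.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)

    def specification_search(n=100, n_candidates=20, alpha=0.05):
        """Try single-regressor models until one looks 'significant'."""
        y = rng.normal(size=n)                       # outcome: pure noise
        X_pool = rng.normal(size=(n, n_candidates))  # candidate regressors: also noise
        for j in range(n_candidates):                # "rethink model; futz with model"
            fit = sm.OLS(y, sm.add_constant(X_pool[:, j])).fit()
            if fit.pvalues[1] < alpha:               # a "result": stop and write up
                return True
        return False

    trials = 500
    hits = sum(specification_search() for _ in range(trials))
    # With 20 independent tries at the 5% level, roughly 1 - 0.95**20, about 64%,
    # of searches on pure noise end with a "significant" coefficient.
    print(f"'significant' finding in {hits / trials:.0%} of searches on noise")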
The process is valuable, but the results are flawed. Hence, out-of-sample testing. After one has tortured the data until it confesses, get a clean set of data and see whether the confession was valid. Run the final regression on the new data, and the usual statistics are applicable.
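
As a minimal sketch of that check, continuing the hypothetical setup above: do the specification search on the first half of the data only, then run the winning specification once on the untouched second half. The in-sample p-value is the tortured confession; the holdout p-value is the one the usual theory actually licenses you to interpret:

    # Hypothetical out-of-sample check: search on one half, validate on the other.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)

    n, n_candidates = 200, 20
    y = rng.normal(size=n)                        # again pure noise, for illustration
    X_pool = rng.normal(size=(n, n_candidates))

    train, test = slice(0, n // 2), slice(n // 2, n)

    # Specification search on the training half only.
    best_j, best_p = 0, 1.0
    for j in range(n_candidates):
        fit = sm.OLS(y[train], sm.add_constant(X_pool[train, j])).fit()
        if fit.pvalues[1] < best_p:
            best_j, best_p = j, fit.pvalues[1]

    # Run the final model once on the clean holdout; this is the p-value the
    # usual theory actually applies to.
    holdout = sm.OLS(y[test], sm.add_constant(X_pool[test, best_j])).fit()
    print(f"in-sample p = {best_p:.3f}, out-of-sample p = {holdout.pvalues[1]:.3f}")

On noise, the searched-for in-sample p-value will usually look good and the holdout p-value usually won't; if the effect is real, it should survive the second half.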

Unfortunately, researchers often use up all the data they have in the data mining process, and then put a good face on it.