One of my pet peeves is the random use of statistics. I posted the following on Bob Smizik’s blog after he said, “From the All-Star game to the end of the season, the Pirates scored only three fewer runs than the Giants (who won the World Series)”:
There may not be anything factually wrong with what you wrote, but it is misleading. The problems include, but are not limited to:
1). Time interval chosen – you can choose statistics from different periods of time to fit whatever it is you are trying to say.
2). Sampling bias – your choice of the Giants as the team to compare the Pirates against is arbitrary. They won the World Series, but that does nothing to show that the Pirates have hope just because they scored three fewer runs than the Giants in the second half of the year.
3). Sampling bias II – If you trawl enough data, e.g. slicing by the first quarter of the season, by pitchers, or by division winners from the eastern half of the country, you not only MAY find anomalies that support a hypothesis, you are LIKELY to find such anomalies.
4). Incorrect inferences – what is the implication of that stat? Some (maybe you) will take it to mean that we are somehow slightly closer to being a respectable team, which is a completely incorrect reading. I'm not saying you necessarily think that, but I guarantee others do… it needs a disclaimer.
As www.profootballtalk.com likes to say, “Boom. Roasted.”
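
To put a rough number on point 3, here is a minimal Python sketch of why trawling enough data is LIKELY to turn up an anomaly. It rests on assumptions of mine, not on any real baseball data: each split has a 5% chance of looking "anomalous" purely by luck, and the splits are independent of one another. The function names are made up for this illustration.

```python
import random


def chance_of_at_least_one_anomaly(n_splits, alpha=0.05):
    """Probability that at least one of n_splits independent splits looks
    anomalous by pure chance, when each has probability alpha (assumed)."""
    return 1 - (1 - alpha) ** n_splits


def simulate(n_splits, alpha=0.05, trials=20_000, seed=0):
    """Monte Carlo check of the formula above, using made-up random splits."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One "trawl": look at n_splits splits, see if any is a fluke.
        if any(rng.random() < alpha for _ in range(n_splits)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (1, 10, 50):
        exact = chance_of_at_least_one_anomaly(n)
        approx = simulate(n)
        print(f"{n:>3} splits: formula {exact:.3f}, simulated {approx:.3f}")
```

Under those assumptions, the formula gives about a 40% chance of finding at least one fluke among 10 splits and better than 90% among 50. Real splits overlap and are not independent, so treat these as ballpark figures rather than a measurement of anything.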