I don’t normally like taking down defenseless sports writers, but…


One of my pet peeves is the random use of statistics. I posted the following on Bob Smizik’s blog after he said, “From the All-Star game to the end of the season, the Pirates scored only three fewer runs than the Giants (who won the World Series)”:

There may not be anything factually wrong with your statistic, but it is misleading. The problems include, but are not limited to:

1). Time interval chosen – you can pick statistics from whatever period of time happens to fit the point you are trying to make.
2). Sampling bias – your choice of the Giants as the team to compare the Pirates with is arbitrary. They won the World Series, but that does nothing to show that the Pirates have hope just because they scored three fewer runs than the Giants in the second half of the year.
3). Sampling bias II – if you trawl enough data (e.g., ERA in the first quarter of the season as compared to NL West pitchers, or OBP of division winners from the eastern half of the country), you not only MAY find anomalies that support a hypothesis, you are LIKELY to find such anomalies.
4). Incorrect inferences – what is the implication of that stat? Some (maybe you) will take it to mean that we are somehow slightly closer to being a respectable team, which is a completely incorrect reading. I am not saying you necessarily think that, but I guarantee others do, so it needs a disclaimer.
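
To put a rough number on point 3, here is a minimal sketch in Python. Everything in it is an assumption made for illustration, not real baseball data: the function names are made up, each comparison is treated as independent, and each one is given a 5% chance of looking impressive purely by luck.

```python
import random


def exact_chance(n_comparisons, p_single):
    """P(at least one of n independent comparisons looks impressive by luck)."""
    return 1 - (1 - p_single) ** n_comparisons


def simulated_chance(n_comparisons, p_single, trials=5000, seed=0):
    """Monte Carlo estimate of the same probability.

    Each trial stands for one writer trawling n_comparisons stat/period/team
    combinations; the trial counts as a hit if any single comparison
    comes up "impressive" by chance alone.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same estimate
    hits = 0
    for _ in range(trials):
        if any(rng.random() < p_single for _ in range(n_comparisons)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical trawl sizes: 1, 20 and 200 candidate comparisons.
    for n in (1, 20, 200):
        print(n, round(exact_chance(n, 0.05), 4), round(simulated_chance(n, 0.05), 4))
```

Under those assumptions, 20 candidate comparisons already give about a 64% chance of turning up at least one flattering stat, and 200 (say, 10 stats over 4 time windows against 5 teams) make it close to certain. Real stats are correlated, so the true figures would differ, but the direction of the effect is the point.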

As Mike Florio (www.profootballtalk.com) likes to say, “Boom. Roasted.”



2 Responses to “I don’t normally like taking down defenseless sports writers, but…”

  1. Matt Kelly Says:

    I respectfully disagree with you. The sample size is large enough and the comparison is on offenses only. The point of the article is that you can win with an offense like the Pirates IF you have GREAT pitching. Obviously the Pirates don’t have GREAT pitching so it doesn’t really matter. If a sample of 77 games isn’t large enough in baseball, what is? Look at splits, they are either compiled monthly or pre/post All Star break.

  2. David Rajakovich Says:

    Hey Matt, thanks for your comment. I only indirectly referenced sample size…that is a very small part of the overall argument. The main thrust of the argument is that if you look at many different stats (ERA, OBP, etc.), over many different periods of time, comparing the Pirates to many different (most likely successful) teams, you are likely to find a stat like this. However, the stat is essentially meaningless, and even misleading, as it takes one of the worst teams in the history of the Pirate organization and suggests that they were not too far away from being a decent team. It is essentially like the argument that if you take 1000 equity traders and measure each at the end of the year, you are likely to find one that beats the market for three years in a row by pure luck. It does not make that person a good trader, just that they were the one that randomly had good luck for three years…one of them was likely to.
