Baseball’s Sabermetrics

Unless you love baseball and/or are a bit of a numbers geek, you will probably find the first portion of this column rather tedious. I ask that you at least skim over the following so that you will understand the last portion of the column, which offers a broader perspective that will give each of you, my dear readers, something to consider over the course of this coming week.

Baseball, as virtually everyone realizes, is rife with statistics. There are the obvious counting statistics, such as total home runs, doubles, triples, and runs batted in, and there are the common statistics involving mathematical calculations, such as batting average, on base percentage, and slugging percentage. Back in the 1980s, however, the computer became personal and the world of baseball statistics changed forever.

Baseball has been around a long, long time and, as a result, there is an enormous amount of data for every year, every team, and every player. Computers made it possible to organize, sift, and calculate this data in ways that were virtually impossible prior to their arrival, and a handful of ardent baseball fans (who also happened to be mathematicians) recognized the potential of combining these huge data sets with the computer.

One man in particular, Bill James, began to experiment with analyzing the data to attempt to predict future performance by individual players, creating a variety of new statistics under a banner he termed Sabermetrics (derived from the acronym SABR, for Society for American Baseball Research). He began publishing an annual book entitled “The Bill James Baseball Abstract” that made his research (along with work by others) available to the general public, and garnered his new statistical work legions of fans, including many in the front offices of Major League Baseball teams.

One of the most widely accepted statistics to come out of this era is a measurement for pitching success called WHIP (its invention is usually credited to writer Daniel Okrent rather than to James himself). In this instance the concept was so simple, and the math was so easy, you have to wonder why no one thought of it sooner. All WHIP does is take the number of hits a pitcher allows, add that to the number of walks (bases on balls) he allows, and divide the total by the number of innings he pitches. This statistic is now so commonplace that it is even printed on the backs of baseball cards.
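For the numbers geeks, here is a minimal Python sketch of WHIP as described above. The function name and the season line are my own inventions for illustration, and I am assuming innings are entered as a true decimal (200 and one-third innings as 200.333, not the box-score shorthand 200.1).

```python
def whip(walks: int, hits: int, innings_pitched: float) -> float:
    """Walks plus hits, divided by innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return (walks + hits) / innings_pitched


# Hypothetical season line, made up for illustration only.
print(round(whip(walks=50, hits=170, innings_pitched=200.0), 2))  # 1.1
```

In other words, this imaginary pitcher put a little more than one runner on base per inning.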

Needless to say, Sabermetrics attracted many more lovers of statistics with strong math backgrounds, and the world of baseball statistics will never be the same. Below are just a few of the new statistics available to fans and baseball professionals:

wOBA (weighted on base average)

This statistic is an update of the standard on base percentage (OBP) that has been around almost since baseball began. The idea here is that OBP only measures how often a player reached base; wOBA not only shows how often a player reached base, it also takes into account the value his reaching base had to the team over the course of a season. Here’s what the math looks like:

((0.72 x NIBB) + (0.75 x HBP) + (0.90 x 1B) + (0.92 x RBOE) + (1.24 x 2B) + (1.56 x 3B) + (1.95 x HR)) / PA

I grant the above looks rather intimidating (and this is one of the simpler calculations Sabermetric practitioners use) but here’s a quick explanation: NIBB stands for non-intentional base on balls (or walks); HBP is hit by pitch; RBOE is reached base on error; 1B, 2B, 3B, and HR are single, double, triple, and home run, respectively; and PA is plate appearances (i.e. how many times the player came up to bat).

The numbers you see in the above equation can change depending on the scope of the data set used (the 2009 season will have one set of numbers while the 2006 through 2009 seasons will have another set of numbers). These represent, roughly, the average run value of each of the outcomes for a batter listed above. Put more simply, a single is credited with about nine-tenths of a run. The figure is a weight, not a probability, so it does not mean a single has a 90 percent chance of producing a run (a home run’s 1.95 could hardly be a 195 percent chance).

These individual numbers for each outcome are based on crunching the numbers for all players for a given season (or set of seasons).
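If it helps to see the wOBA arithmetic run, here is a small Python sketch using the weights quoted in the formula above. The names WEIGHTS and woba are mine, the season line is invented, and since the weights shift with the data set, treat them as placeholders rather than official values.

```python
# Weights copied from the formula above; they vary by season or span of seasons.
WEIGHTS = {
    "NIBB": 0.72,  # non-intentional walks
    "HBP": 0.75,   # hit by pitch
    "1B": 0.90,    # singles
    "RBOE": 0.92,  # reached base on error
    "2B": 1.24,    # doubles
    "3B": 1.56,    # triples
    "HR": 1.95,    # home runs
}


def woba(counts: dict, plate_appearances: int, weights: dict = WEIGHTS) -> float:
    """Weighted sum of a batter's outcomes, divided by plate appearances."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    total = sum(weight * counts.get(event, 0) for event, weight in weights.items())
    return total / plate_appearances


# Hypothetical season line, made up for illustration only.
season = {"NIBB": 60, "HBP": 5, "1B": 100, "RBOE": 5, "2B": 30, "3B": 3, "HR": 25}
print(round(woba(season, plate_appearances=650), 3))  # 0.357
```

Keeping the weights in a dictionary means a different season’s numbers can be swapped in without touching the calculation.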

Are you with me so far? Well, most of you probably aren’t, but again, hang with me. I’ll go quickly through just a few more before I get to my point.

PECOTA (Player Empirical Comparison and Optimization Test Algorithm)

Mathematicians do have a sense of humor: this very, very complex formula honors journeyman baseball player Bill Pecota, who is considered by practitioners of Sabermetrics to be the archetype of an average player. It is used to forecast future performance in a variety of advanced sabermetric categories.

UZR (Ultimate Zone Rating)

This statistic measures a player’s ability to field his position. A baseball field is divided into 78 zones, 64 of which are used to calculate this statistic. In essence, UZR measures how effectively a player fields in the zones surrounding the position he plays on the field compared to other players. Think of it this way: shortstop A has the ability to go deep into the hole between short and third, while shortstop B does not. At the end of a season, shortstop A may have several more errors than shortstop B, but because shortstop A was able to reach more balls and, presumably, convert them into outs, shortstop A saved his team more runs than did shortstop B and, as a result, will have a better UZR.

WAR (wins above replacement)

This is the statistic that combines all sorts of other sabermetrics in an attempt to measure a player’s true value to his team: how many wins did the team gain by playing a particular player that it wouldn’t have achieved if it had played a lesser player (think Bill Pecota)? Individuals on the management side of baseball have taken this statistic a step further by breaking it down into salary groupings (i.e., how much did it cost us to gain this many wins?). Frankly, it’s almost insidious.
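The front-office arithmetic is not complicated. Here is a rough Python sketch of the “cost per win” idea, assuming it is nothing more than salary divided by WAR; the function name and the dollar figures are hypothetical, and real teams surely use something more elaborate.

```python
def cost_per_win(salary: float, war: float) -> float:
    """Dollars paid for each win above replacement."""
    if war <= 0:
        raise ValueError("war must be positive for this ratio to mean anything")
    return salary / war


# Hypothetical player, made up for illustration only.
print(cost_per_win(salary=12_000_000, war=4.0))  # 3000000.0
```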

And that brings me to the point of this whole column. Keep in mind that the above descriptions are just a tiny portion of the new statistics being used to measure a baseball player’s performance and, with that in mind, start thinking about what might happen if these mathematicians turned their attention to other, non-sport professions.

In some cases, this might not be a bad thing. Wouldn’t the world be a better place if we had an EAR (earnings above replacement) for financial advisors? What about a CVI (component value index) that measured the years of service we could expect from a washing machine against the initial cost of the machine and the anticipated cost of repairs during the machine’s lifetime? (That should kill off “planned obsolescence.”)

On the other hand, how many of us could withstand the type of scrutiny baseball players face from their employers?

The scary part of this is that, thanks to the wealth of data available these days and the ability to manage and shape that data, almost anything is possible, even an EAR or CVI.