Saturday, December 24, 2005

The Validation Of Data Analysts - A Change Is Coming

Basketball TV broadcasts frustrate me when they present statistics like this:

"Offensive rebounds: Michigan State 9, Wisconsin-Green Bay 9."

The broadcast team presents statistics to give the viewer an understanding of how teams or players have performed. If the above statistic were presented, though, the viewer would likely take away the message that the two teams rebounded equally well on the offensive end (and, therefore, on the defensive end, too). While each team did secure the same number of offensive boards, Michigan State did so in only 25 opportunities (22 missed FG, 3 missed FT) while UWGB did so in 41 opportunities (35 missed FG, 6 missed FT). This statistic would convey a much stronger message:

"Offensive rebounding percentage: Michigan State 36%, Wisconsin-Green Bay 22%"

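For anyone who wants to check the arithmetic, here is a minimal Python sketch of how those percentages fall out of the counts above. The function name is my own, and treating every missed field goal and missed free throw as a rebounding opportunity is simply the counting used in this post, not an official definition.

```python
def offensive_rebound_pct(off_rebounds, missed_fg, missed_ft):
    """Share of a team's own misses that it rebounded.

    Assumption: every missed field goal and missed free throw counts
    as one rebounding opportunity, as in the counts quoted above.
    """
    opportunities = missed_fg + missed_ft
    if opportunities == 0:
        raise ValueError("no rebounding opportunities")
    return off_rebounds / opportunities


# (offensive rebounds, missed FG, missed FT) taken from the example above
teams = {
    "Michigan State": (9, 22, 3),
    "Wisconsin-Green Bay": (9, 35, 6),
}

for name, (orb, fg_miss, ft_miss) in teams.items():
    pct = offensive_rebound_pct(orb, fg_miss, ft_miss)
    print(f"{name}: {pct:.0%}")
# Michigan State: 36%
# Wisconsin-Green Bay: 22%
```

Same nine rebounds for each team, but 9 of 25 is a very different number from 9 of 41.
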
In the past couple of years, Ken Pomeroy, Kyle Whelliston, John Gasaway, and others have evaluated players and teams using statistics that measure efficiency (usually called "possession-based stats" or "tempo-free stats") rather than "totals" (points, rebounds, assists, etc.) that depend in part on the number of opportunities a player or team has to accumulate them. Mainstream media, meanwhile, have continued to focus on totals. Or is there some other explanation for Bracey Wright's selection to the 2004-2005 All-Big Ten first team?

The old guard of sports-talkers is vulnerable to the new breed of data analysts. Dick Vitale's skill as an analyst lies in watching players and teams perform and passing judgment on their ability to compete against other players and teams. I won't dispute his skill in this regard. What data analysts have taught us, though, is that there are tools that predict success with greater accuracy than Dick Vitale does, or than just about any other analyst who insists on using his own judgment of specific situations and performances without considering the value of overall statistics.

I refer to Dick Vitale, Bill Walton, Charles Barkley, and others like them as "sports-talkers" because (1) the word would have an entirely different meaning without the hyphen, and (2) their most important skill lies in their ability to perform on live television when they have nothing to say. Data analysts can show so many things through statistics, though, that they would almost always have something to say, even without relying on mentioning J.J. Redick.

Data analysts threaten sportswriters more directly. A sportswriter can make his mark as a reporter (Chris Mortensen), an analyst, or a writer (Bill Simmons). I don't have a good example of the sportswriter-analyst because I really can't think of one. Sportswriters break stories or produce feature articles, or they entertain readers through their writing. Data analysts have a niche waiting for them, because they can make their mark through analysis alone.

The key to writing as an analyst is clarity. An analyst's purpose in writing is to convey to the reader the meaning of the analysis. If the reader can't understand the message, the analyst has failed. Beyond that, it's all good, as long as the analysis is sound and the matter holds significance for the reader. Gasaway, as Big Ten Wonk, is entertaining, but he wouldn't have to be to get his point across.

ESPN.com has recognized the analyst niche and begun to fill it -- Pomeroy and Whelliston have, quite recently, joined the payroll. Sportswriters who masquerade as analysts are now officially endangered. Thankfully, most sportswriters are true reporters as well and don't feel threatened. Grant Wahl and Luke Winn of SI.com each endorsed the aforementioned data analysts in online columns earlier this month. And sportswriters like Bill Simmons (not such a large group, is it?) are writers first, so they're not going anywhere, either.

I find the articles from Wahl and Winn encouraging because they signal a shift in the attitude of mainstream media. It's only a matter of time before television broadcasters start citing effective field goal percentage, rebounding percentage, and points-per-possession in their pre-game analysis instead of things like "Shoot The 3; Avoid Foul Trouble; Run, Run, Run." From there it's a hop, skip, and a jump to seeing these figures show up in box scores and in mid-game analysis.

Statistics have analytical and predictive value that rivals and often exceeds the judgment of even the most talented individuals. The next time you read or hear someone disregarding statistics in favor of personal judgment, without any sort of qualification, consider the source and his motivation. Self-preservation is a powerful instinct.
