Saturday, December 24, 2005

The Validation Of Data Analysts - A Change Is Coming

Basketball TV broadcasts frustrate me when they present statistics like this:

"Offensive rebounds: Michigan State 9, Wisconsin-Green Bay 9."

The broadcast team presents statistics to give the viewer an understanding of how teams or players have performed. If the above statistic were presented, though, the viewer would likely take away the message that the two teams rebounded equally well on the offensive end (and, therefore, on the defensive end, too). While each team did secure the same number of offensive boards, Michigan State did so in only 25 opportunities (22 missed FG, 3 missed FT) while UWGB did so in 41 opportunities (35 missed FG, 6 missed FT). This statistic would convey a much stronger message:

"Offensive rebounding percentage: Michigan State 36%, Wisconsin-Green Bay 22%"

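For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The function name and its arguments are my own labels, and "opportunities" is counted the way it is above (missed field goals plus missed free throws), which is only one of several ways to define rebounding percentage.

```python
def offensive_rebound_pct(off_rebounds, missed_fg, missed_ft):
    """Share of a team's own misses that it rebounded."""
    opportunities = missed_fg + missed_ft
    return off_rebounds / opportunities


# Figures quoted in the post
msu = offensive_rebound_pct(9, missed_fg=22, missed_ft=3)
uwgb = offensive_rebound_pct(9, missed_fg=35, missed_ft=6)

print(f"Michigan State {msu:.0%}, Wisconsin-Green Bay {uwgb:.0%}")
```

Same raw total, very different rates, which is the whole point.
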
In the past couple of years, Ken Pomeroy, Kyle Whelliston, John Gasaway, and others have evaluated players and teams using statistics that measure efficiency (usually named "possession-based stats" or "tempo-free stats") rather than "totals" that depend in part on the number of opportunities a player or team has to obtain them (points, rebounds, assists, etc.). Mainstream media, meanwhile, have continued to focus on totals. Or is there some other explanation for Bracey Wright's selection to the 2004-2005 All-Big Ten first team?

The old guard of sports-talkers is vulnerable to the new breed of data analysts. Dick Vitale's skill as an analyst lies in watching players and teams perform and passing judgment on their ability to compete against other players and teams. I won't dispute his skill in this regard. What data analysts have taught us, though, is that there are tools that predict success with greater accuracy than Dick Vitale. Or just about any other analyst who insists on using his own judgment of specific situations and performances without considering the value of overall statistics.

I refer to Dick Vitale, Bill Walton, Charles Barkley, and others like them as "sports-talkers" because (1) the word would have an entirely different meaning without the hyphen, and (2) their most important skill lies in their ability to perform on live television when they have nothing to say. Data analysts can show so many things through statistics, though, that they would almost always have something to say, even without relying on mentioning J.J. Redick.

Data analysts threaten sportswriters more directly. A sportswriter can make his mark as a reporter (Chris Mortensen), an analyst, or a writer (Bill Simmons). I don't have a good example for the sportswriter-analyst because I really can't think of one. Sportswriters break stories or produce feature articles, or they entertain readers through their writing. Data analysts have a niche waiting for them, because they can make their mark through analysis alone.

The key to writing as an analyst is clarity. An analyst's purpose in writing is to convey to the reader the meaning of the analysis. If the reader can't understand the message, the analyst has failed. Beyond that, it's all good, as long as the analysis is sound and the matter holds significance for the reader. Just because Gasaway, as Big Ten Wonk, is entertaining doesn't mean he has to be to get his point across.

ESPN.com has recognized the analyst niche and begun to fill it -- Pomeroy and Whelliston have, quite recently, joined the payroll. Sportswriters that masquerade as analysts are now officially endangered. Thankfully, most sportswriters are true reporters as well and don't feel threatened. Grant Wahl and Luke Winn from SI.com each endorsed the aforementioned data analysts in an online column earlier this month. And sportswriters like Bill Simmons (not such a large group, is it?) are writers first, so they're not going anywhere, either.

I find the articles from Wahl and Winn encouraging because they signal a shift in the attitude of mainstream media. It's only a matter of time before television broadcasters start citing effective field goal percentage, rebounding percentage, and points-per-possession in their pre-game analysis instead of things like "Shoot The 3; Avoid Foul Trouble; Run, Run, Run." From there it's a hop, skip, and a jump to seeing these figures show up in box scores and in mid-game analysis.

Statistics have analytical and predictive value that rivals and often exceeds the judgment of even the most talented individuals. The next time you read or hear someone disregarding statistics in favor of personal judgment, without any sort of qualification, consider the source and his motivation. Self-preservation is a powerful instinct.

Sunday, December 18, 2005

Microwave squares

When my mom reheats a mug of coffee in the microwave, she'll mike it for 12 seconds, or 16 seconds, but not something "normal" like 15 seconds. This works for longer times, too -- I've seen her mike frozen vegetables for something like 2:43. I'm not normal, either, because when I wanted to mike something for 2 minutes and 30 seconds, I used to enter 1:90. The microwave I own today seems to accept only whole increments of 30 seconds, which disappoints me. In any case, here is a math problem I came up with while waking up on Saturday morning.

The display on a microwave contains four digits, two for minutes and two for seconds (mm:ss). Because there are only 60 seconds in 1 minute, the number that can be read from left to right does not represent the true number of seconds that will be counted down by the timer -- 210 (2:10) represents 130 seconds, while 210 seconds can be represented by either 330 (3:30) or 290 (2:90). If the displayed number is one or two digits only, then the displayed number and true number of seconds will be the same (i.e., this is the trivial or uninteresting case). When the two numbers are different, they still may share interesting properties in some cases...

For what three- or four-digit displayed numbers are the displayed number and true number of seconds both perfect squares?
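
If you'd rather let a computer grind through it, here is a brute-force sketch in Python. It assumes the display accepts any two digits in the seconds field (so 2:90 is legal, as above) and requires at least one non-zero minutes digit, so the displayed number runs from 100 to 9999. The function names are mine. Spoiler warning: running it prints the answers.

```python
from math import isqrt


def is_square(n):
    """True if n is a perfect square."""
    root = isqrt(n)
    return root * root == n


def microwave_squares():
    """Return (displayed number, true seconds) pairs where both are squares."""
    hits = []
    for minutes in range(1, 100):      # minutes >= 1 gives a 3- or 4-digit display
        for seconds in range(100):     # the seconds field may exceed 59
            displayed = 100 * minutes + seconds
            true_seconds = 60 * minutes + seconds
            if is_square(displayed) and is_square(true_seconds):
                hits.append((displayed, true_seconds))
    return hits


for displayed, true_seconds in microwave_squares():
    print(f"{displayed // 100}:{displayed % 100:02d} -> {true_seconds} seconds")
```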

Saturday, December 17, 2005

Curious about the can

I have three questions about the bathroom and related activities:
  1. Why do some men wash their hands before doing their business in the restroom? Or am I the weird one for even asking this question? I've done it a couple of times, but only if I've just eaten and I know there's something on my hands that would get smudges on my clothes. So is this also the case each time I see someone wash his hands first? I'd really like to ask a total stranger why he does this -- not someone I know, because I'd have to talk to him again afterwards -- but I've never felt comfortable sticking around until the guy comes out of the stall. And having a conversation with a total stranger who's in a bathroom stall is something I don't do while sober. Actually, there's a better reason to stick around until he's done -- to see whether he washes his hands after as well.

  2. Do you leave the toilet seat up or down? When you replace the toilet paper roll, do you have it roll forward (pull over the top and down the front) or backward? I know what I do (down; forward) but I'm not qualified to tell you that you're wrong if you do it differently. I wonder, though, whether there's a correlation between the two. Are people that leave the toilet seat up more likely to have the toilet paper roll face forward than those that leave the toilet seat down? And so on. This is the kind of research I'd do if there were 87 hours in the day.

  3. What goes through a dog's mind when his owner cleans up after him with a plastic bag while taking a walk? I can imagine just about anything:
  • "My owner is one sick ____."
  • "Hey, leave that! I was marking my territory! Kind of like what you do with your dirty clothes."
  • "As long as you're going to pick up after me, would you mind cleaning my _______ with your tongue, too?"
  • "If I had known this stuff was so desirable, I wouldn't have eaten it all these years."

Wednesday, December 14, 2005

Boxing Day -- not quite brownies, but close...

Just now I discovered the true meaning of Canada's Boxing Day. And I don't mean in the sense of how the Grinch discovered the true meaning of Christmas. I mean in the sense of why the day is named Boxing Day. It took me so long only because I never drove myself to resolve the uncertainty I felt when considering the name. "Why is it called Boxing Day?" is a question easily displaced by "How many strips of bacon can I have?" and "What do you mean, we're out of batteries?" It wasn't until today that the true meaning was shoved in my face.

The day has nothing to do with prizefighting, as I may have supposed in my early years. The day has nothing to do with Chinese anti-imperialism, as I may have speculated in high school. Boxing Day has to do with the practice of attending church on Christmas Day and leaving the distribution/receipt/opening of gifts (depending on whose story you believe) until the following day. The name arises from the boxes that contain the gifts.

Now that, my friend, is a lightbulb moment.

Friday, December 09, 2005

To do list...

Things to do in the near future:

  1. Put up links to this blog's archives. You know, for all my readers. Which leads me to wonder, in today's increasingly virtual world, do people invent imaginary online friends?
  2. See that someone gets jail time for the thousand messages my spam mailbox has received in the past 24 hours for "The Bouncer."
  3. Find the web's best-formatted single-page listing of all World Cup matches (this will have to do until the Sun-Times devotes a full page to it next summer), print it out, and thumb-tack it to my cubicle at work. Right next to my inspired diagram for Feature Group traffic. Ah, Microsoft Visio...it is so choice. If you have the means, I highly recommend picking one up...I mean, finding out if your company has a corporate license. But after spending an hour in 2002 fruitlessly trying to put my own spreadsheet together -- unfortunately, even conditional formatting has its limits -- I'm willing to let sports websites do the work for me this time around.
  4. Decide whether I should stop listening to the latest Jamiroquai album before I feel the need to officially declare "Starchild" my current favorite song, which would pretty much guarantee that I'd stop liking it within the next month. From another perspective, though, it could be my fourth favorite song on the album -- like an ACC team that finishes way back in conference, but everyone knows they still have a shot to win the national title -- so I should be in the clear.
  5. Go see Aeon Flux before I get talked out of it by the reviews I haven't read. [And yet somehow I know what they say. Damn news website leads.]

At least we have Ghana

It's the World Cup, baby! Today's draw...could have gone better for the United States. Making the second round shouldn't come cheap, though...and in Group E, it definitely won't.

Group A - Almost History

Poland was one ping-pong ball away from facing Germany in the opening match (!). After being drawn first from Pot 3, Poland was to be placed in one of the two remaining spots in Group A, A2 or A3 (Ecuador had already been placed in A4). Alas, the historic match-up was not to be. Germany will likely reprise its opening match from the 1994 World Cup, an instantly forgettable 1-0 victory over Bolivia, when facing Costa Rica in Munich on June 9.

As the host nation, Germany should run away with this group. Second place will earn a spot against England, with a chance of Sweden, in the second round.

Group B - The Afterthought

Barring some remarkable performances from Paraguay, this group's top spot should come down to who can score the most against Trinidad & Tobago. England and Sweden should reach their end-of-round match with six points each, leading to a meaningless 1-1 draw in which Wayne Rooney and David Beckham strive desperately to avoid yellow card suspensions.

Group C - 9, 6, 3, 0

I'm calling it now -- someone will win the Argentina-Netherlands match. It won't really matter, though, because each of them will make it to the final eight, and neither of them can face Brazil until at least the semifinals. I'm also calling this now -- the Ivory Coast-Serbia & Montenegro match will have more red cards than any other in this tournament. Think about it -- two teams coming in with no points; an African team that has probably turned on its manager (they qualified over Cameroon, so they may as well follow in the Indomitable Lions' footsteps); and, as Turkey didn't qualify, I think S&M has what it takes to be the 2006 sending-off champ.

Group D - Even Portugal Can't Screw This Up

Mexico must be happy. Iran and Angola, meet Jared Borgetti's head. This is why it's good to be seeded...especially when the only threatening team in Pot 4, the United States, can't be placed in your group. But Portugal must be even happier. After being overwhelmed in the first half-hour by the United States and giving up a weak goal to South Korea to choke their way out in 2002, the Portuguese have received valet parking and a red carpet on their way to a round-of-16 defeat.

Group E - And I Like The Chicago Bears' Chances, Too

It's not a true Group of Death -- unless Ghana turns out to be really good, that is -- but it does have the Czech Republic, ranked second in the world, and Italy, against whom I'll be amazed if DaMarcus Beasley doesn't get injured. (I just can't imagine the Azzurri letting him run by them for 90 minutes without incident.) If the United States had known they would end up with this draw, they would have been much more upset about missing out on a seed by one lousy point.

The path to the second round for the United States is clear -- don't lose either of the first two matches, and then take Ghana out behind the Holzhütte (bastardized German for woodshed). No injuries. (Claudio, J.O'B., this means you.) No suspensions. (Entire backline, this means you.) Donovan actually converting a breakaway. Keller playing as out-of-his-mind as Friedel did four years earlier. Gooch frustrating strikers. Beasley driving defenders insane. And McBride justifying my jersey purchase.

Group F - Ronaldinho reminds me of Tori Spelling. There, I said it.

Brazil's offensive depth -- all-world Robinho (which is pronounced Rob-EEN-yo, not ROB-in-ho...) probably won't even start -- could be the best thing to happen to Australia or Croatia, because even the Brazilian subs will still be able to light up Japan at the end of the group stage. Whoever takes second place in this group will end up being one of those "Wow, I didn't realize they made the second round in 2006" teams. You know, like Paraguay in 2002 or Paraguay in 1998. [Hmmm, maybe Paraguay won't be quite the walk-over I think they will be in Group B.]

Group G - Even France Can't Screw This Up

Repeated heading, yes. But, seriously -- Switzerland, South Korea, and Togo? France should thank their lucky stars that they've been handed a waterslide to a penalty shootout with Italy in the quarterfinals. [Did that Bill and Ted's reference fly by too quickly?] I would like Togo's chances to go through -- if the three teams other than France all play to draws, Togo could advance by knocking off France, who will likely have wrapped up the group by then, on the last day of group play. However, after getting embarrassed by Senegal in their first match in 2002, France simply can't blow this game against another African unknown...can they?

Group H - The Big East Conference

Seriously, folks...this is West Virginia in a BCS bowl. Except if the conference got to send two teams.

I've never been a believer in Spain, even though I'd be surprised to see them fail to advance. I'm not much of a believer in Ukraine, either, even though they were the first side to qualify from Europe. As usual, I have no idea what to expect with Tunisia or Saudi Arabia. (That should reflect on them, not on me, I hope.) But as the group winner gets Brazil in the quarters and the runner-up gets France in the next round, this may be one of the groups whose games I don't stay home from work to watch. [After all, that's what live web updates are for.]

Tuesday, December 06, 2005

Choosing among outcomes, not methods

On Tuesday FIFA announced the eight teams, out of the 32 that have qualified for the World Cup finals next June, that will receive a top seed in one of the eight groups. After round-robin play within the groups, the top two teams from each group advance to a 16-team single-elimination bracket to determine the World Cup champion. Being seeded allows a team to avoid facing any of the other top seeds in the group stage of the competition, giving the team a more certain path to the second round.

The seeds are determined using a method that is dictated in general terms by FIFA guidelines. The specific implementation of those guidelines, though, is up to the Organizing Committee for each World Cup. The fine-tuning of these guidelines is announced at the same time as the seeds -- the Committee does not announce in advance the specific criteria it will use. This makes sense from the standpoint of event promotion, because the seeds would no longer be a mystery once the criteria were announced. Of potentially greater importance, it also allows the Committee to select a set of criteria that produces the most desirable set of seeded teams.

FIFA is running a business, so it would like to make decisions and take actions that produce the largest profits. Certain teams have larger fan bases with larger potential television revenues, so it would be beneficial to smooth their path to a longer stay in the tournament. And in the 1994 World Cup, two of Italy's first-round matches were played as relative home games, in Giants Stadium just outside of New York City, which boasts a huge Italian population. However, allowing economic and other off-field issues to override other measures of teams' relative quality on the field would make for very bad press. Given the choice, then, the Organizing Committee would prefer that the seeds turn out as desired by economic considerations alone, while using a method that utilizes only performance-based measures.

This wasn't as complex a problem as it may sound, at least this time around. Of the eight seeds, one is given to the host nation, in this case Germany. A second belonged to Brazil, unquestionably the strongest team in the world. There are another 10 teams that could reasonably have been seeded (Argentina, Czech Republic, England, France, Italy, Mexico, Netherlands, Portugal, Spain, United States), so there is a relatively limited set of potential outcomes.

[If you're thinking of a number, and it's 210, you're probably as big of a dork as I am.]
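
For the non-dorks: with Germany and Brazil taking two of the eight seeds, six seeds are left for the ten candidates listed above, and the number of ways to choose six teams from ten is 210. A quick check in Python (the variable names are mine; `math.comb` needs Python 3.8 or later):

```python
from math import comb

total_seeds = 8
locked_in = 2          # Germany (host) and Brazil
candidates = 10        # the teams listed above

open_seeds = total_seeds - locked_in
possible_outcomes = comb(candidates, open_seeds)
print(possible_outcomes)  # 210
```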

There is also a relatively limited set of methods, or selection and weightings of criteria, that are available for use in determining the seeds. Because the seeds will be definitively determined by the method chosen, the Committee knows the outcome that would result from each method. This means that the choice for any particular method is really a choice for its resulting outcome. The Committee is really choosing among outcomes, not methods.

Rather than comparing the possible methods to determine which is the fairest, the Committee can rank the derived outcomes in order of their projected economic value. Starting with the most favorable outcome, the Committee can proceed down the list until it finds one that is associated with a method that will withstand scrutiny.
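
To make that procedure concrete, here is a rough sketch in Python. Everything in it is hypothetical: the `Outcome` class, the team labels, the economic values, and the method names are invented for illustration, and nothing here reflects how the Committee actually works. It only shows the logic described above, which is to rank outcomes by value and take the first one that some defensible method happens to produce.

```python
from dataclasses import dataclass, field


@dataclass
class Outcome:
    seeded_teams: tuple
    economic_value: float                      # made-up number for illustration
    defensible_methods: list = field(default_factory=list)


def choose_outcome(outcomes):
    """Walk down the outcomes from most to least valuable and return the
    first one backed by at least one method that would withstand scrutiny."""
    ranked = sorted(outcomes, key=lambda o: o.economic_value, reverse=True)
    for outcome in ranked:
        if outcome.defensible_methods:
            return outcome, outcome.defensible_methods[0]
    return None


# Hypothetical candidates, not real data
candidates = [
    Outcome(("Team A", "Team B"), 9.0),  # most lucrative, but no method justifies it
    Outcome(("Team A", "Team C"), 7.5, ["last two World Cups + rankings"]),
    Outcome(("Team B", "Team C"), 6.0, ["rankings only"]),
]

best, method = choose_outcome(candidates)
print(best.seeded_teams, "via", method)
```

Note that the method only enters at the very end, as a justification for an outcome that was already chosen.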

This time around, if the Committee had decided to give slightly more weight to FIFA rankings, the United States would likely have displaced Italy as the 8th seed. However, if the Committee had incorporated the teams' performance at the last three World Cups (as the Committees for 1998 and 2002 did), instead of just the last two, Italy might have finished even further ahead of the United States (in 1994 Italy made the final, while the United States lost in the Round of 16). The Committee may have had a choice among several methods that all produced the same outcome -- the same set of seeded teams -- and settled on a method that caused the United States to finish a close 9th, rather than a distant 9th or 10th, with an additional selling point being that the method was only a slight tweak from that used at the previous two World Cups. [I'm probably understating the political desire to maintain consistency over time, as significant change is always subject to greater critique than maintenance of the status quo.]

When the outcome of a particular process is automatic, any choice involving modification of that process is effectively a choice among the potential outcomes. The seeding of teams at the World Cup finals is such a process, so an analysis of the choice of methods is not complete unless it considers the possible outcomes as well.

Sunday, December 04, 2005

The Supermen among us

This Saturday afternoon I saw a firetruck back into its station. Traveling south on Ashland, Engine 30 briefly sounded its siren to stop traffic and then did a big S-move, positioning itself at a 90-degree angle to the street. The driver then hurled the engine, starting from a standstill, backwards across the street and through the doorway at over 20 miles per hour. I had two simultaneous thoughts: "Wow, I can't believe he did that!" and "That was pretty impressive." I guess I was awestruck. I'm not saying I was 15-point-comeback-against-Arizona awestruck, but I think that's the right word to describe those two quotes.

I don't know why the fireman driving the engine sped into the station. Maybe he wanted to clear the street as quickly as possible to allow traffic to continue moving. Maybe he had backed in the engine so many times that speed wasn't a concern. And maybe it just doesn't faze him at all because driving a truck in reverse pales in comparison to racing into a burning building.

I'm not saying firemen are my heroes. When I was little, my heroes consisted of astronauts, football players, and anyone that carried a lightsaber. And, of course, my dad. [Who occasionally carried a lightsaber.] But firemen are among the heroes of our society as a whole, and I certainly recognize and admire them for what they do. And when I think of heroes, I imagine that everything they do is extraordinary, even the most basic aspects of life. When a football player reads a book to kids, he does the evil wizard's voice perfectly. And when a fireman backs his car into the garage, he does it at break-neck speed.

It helps us believe. We rarely get to see a fireman do what makes him so special. So when we get the chance to see him do something completely ordinary, it bolsters our faith to see it done in an extraordinary way. Knowing that a potential hero can do what we probably couldn't do when it comes to ordinary things gives us hope that he can do what we almost certainly couldn't do when it comes to matters of life and death.

I'm glad I got to see what I did on Saturday. The older I get, the less I tend to rely on others. It feels good to be reminded that there are people in the world that can do the things I can't do, and that they're out there waiting to help me if I need it.