Tuesday, December 08, 2009

 

Some Things I Learned in the Past Week (Part I)


It turns out that men-on-base information isn't preserved in the Gameday database. I'd never noticed it before, but if you load up an old game like this one between the Angels and Blue Jays from 5/7/09 and look at any plate appearance other than the final one, you'll find that the Runners on: dialog never changes to reflect the PA you're looking at, nor do the little dots appear on the field schematic to indicate that a base is occupied.

The only place where that shows up, in fact, is in the gameday_Syn.xml file loaded by Gameday. There's a subtree in the file containing information about the next two batters and the current (final) pitch sequence, along with nodes for the three bases that carry Boolean attributes indicating whether each one is occupied. That came as something of a snag for me this past week, but it's an intriguing one.
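For anyone who wants to poke at the same thing, here's a rough sketch of reading base occupancy out of a subtree like that. I'm not reproducing the real layout of gameday_Syn.xml here: the element name `base`, the attributes `number` and `occupied`, and the `SAMPLE` document are all made up for illustration, so treat them as assumptions and swap in whatever the actual file uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element and attribute names below are invented
# for illustration and are NOT taken from the real gameday_Syn.xml.
SAMPLE = """
<game>
  <situation>
    <base number="1" occupied="true"/>
    <base number="2" occupied="false"/>
    <base number="3" occupied="true"/>
  </situation>
</game>
"""


def runners_on(xml_text):
    """Return the sorted list of base numbers flagged as occupied."""
    root = ET.fromstring(xml_text)
    occupied = []
    # Walk every <base> node anywhere in the tree and read its Boolean flag.
    for base in root.iter("base"):
        if base.get("occupied", "false").lower() == "true":
            occupied.append(int(base.get("number")))
    return sorted(occupied)


print(runners_on(SAMPLE))
```

Since the file only describes the final state of the game, a lookup like this gives you the runners for the last plate appearance and nothing earlier.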

In any case, I've got to do something far more interesting than table-lookup for this part of the project to work, but I have a plan. So far, it's looking good; I should have something quite amazing built by the time I hit the sheets tomorrow night.

Update: Some more stuff I learned this week: my data indicates that there were 283,862 plays or personnel changes made during the course of the 2009 MLB season. Of those, the most frequent were assisted groundball outs that didn't advance a runner, at 32,655, followed by fly ball outs that didn't advance a runner, at 31,267. The third most frequent play type was the swinging strikeout: 26,918 of them. There were 13,726 walks issued with 1st base unoccupied and 7,861 pop outs. The most common hits were bases-empty singles on line drives (7,830), followed by ground ball singles (7,172). There were 3,602 two-throw double plays with outs at 1st and 2nd. A total of 1,933 batters struck out on foul tips. There were 9,896 pitching changes not involving a defensive switch. One batter lined into an unassisted triple play. Six other players hit into a triple play, four on liners, two on ground balls. Fifteen players hit inside-the-park home runs. Of 2,300 different play-types (suppressing player and position names), 862 occurred only once.
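Here's a minimal sketch of how a tally like that could be put together. It is not the code I actually ran: the description strings, the roster of names, the position list, and the `PLAYER`/`POS` placeholders are all hypothetical stand-ins, and a real version would need a much more careful way of spotting names.

```python
import re
from collections import Counter

# Hypothetical position vocabulary; a real feed may word these differently.
POSITIONS = re.compile(
    r"\b(pitcher|catcher|first baseman|second baseman|third baseman|"
    r"shortstop|left fielder|center fielder|right fielder)\b"
)


def normalize(description, names):
    """Suppress player and position names so equivalent plays compare equal."""
    # Replace longer names first so a short name can't split a longer one.
    for name in sorted(names, key=len, reverse=True):
        description = description.replace(name, "PLAYER")
    return POSITIONS.sub("POS", description)


def play_type_counts(descriptions, names):
    """Count how often each normalized play-type occurs."""
    return Counter(normalize(d, names) for d in descriptions)


def singletons(counts):
    """Number of play-types that occurred exactly once."""
    return sum(1 for n in counts.values() if n == 1)


# Made-up example data, not real play-by-play text.
names = {"Smith", "Jones", "Lee"}
plays = [
    "Smith grounds out, shortstop Jones to first baseman Lee.",
    "Lee grounds out, shortstop Smith to first baseman Jones.",
    "Jones strikes out swinging.",
]
counts = play_type_counts(plays, names)
print(counts.most_common())
print(singletons(counts), "of", len(counts), "play-types occurred once")
```

With the season numbers above, the same singleton count works out to 862 of 2,300, or a bit over a third of all play-types.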

Amazingly, this information has non-trivial uses... Trust me!

(Re-ran the data set, so the numbers changed at 10:23 from an hour or two earlier.)
((Turns out those numbers were wrong, but I do have them correctly now.))

Final update: I noticed what seemed to be some odd inconsistencies with that dataset and figured it out. Those counts include spring training and WBC games, so those numbers (and some of the underlying data) are slightly screwy.
