It seems like it should be important to know how a certain batter has done against a certain pitcher. It is not. It is not important.
Oct 14, 2011 - The other day, Brewers manager Ron Roenicke was drafting a lineup to put up against Cardinals righty Chris Carpenter. When it came to picking a center fielder, Roenicke had three options: Nyjer Morgan, Carlos Gomez or Mark Kotsay. He wound up going with Kotsay - not just because Morgan seemed to be in a bit of a slump, but also because Kotsay is left-handed, and was 4-for-11 against Carpenter in his career.
When I read that explanation, I sighed. I imagine that many people sighed. Of course, Kotsay would go on to homer against Carpenter and draw a pair of walks, but that's kind of beside the point. In retrospect, I probably should have chosen a better example.
Maybe it's my imagination, but I feel like I'm being bombarded with microsplits this month more than ever before. By "microsplits", I'm referring to any small-sample statistical split in general, but specifically, for this post, I'm referring to batter vs. pitcher data. Batter vs. pitcher data is everywhere I turn. It's all over Twitter, from reputable and less-reputable sources alike. It's being cited by managers. It's being discussed on the air. All the time, it's being discussed on the air. This guy has this many hits in this many at bats against this other guy. Over and over.
In a vacuum, that's okay. Batter vs. pitcher data is information. There's nothing wrong with information. The problem is how that information is interpreted. That information is treated as though it's meaningful, where by "meaningful" I mean "predictive". A lot of people act as though, because a matchup went a certain way in the past, it will continue to go that way in the future.
And that isn't true. With batter vs. pitcher data, that isn't true. Dave Cameron wrote a good post about this at FanGraphs a short while back, and you should read it. In short: this data isn't predictive. Thorough examination has shown that this data isn't predictive. It's not that batter vs. pitcher data is completely, 100 percent irrelevant, but it has to be so heavily regressed that you might as well not have the data at all. You're better off looking at the observed overall performances by a given hitter and pitcher.
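A rough way to picture what "heavily regressed" means: treat a hitter's line against one pitcher as a small amount of evidence and blend it with a baseline expectation, such as the hitter's overall rate. The sketch below uses simple beta-binomial-style shrinkage, a weighted average where the baseline counts as `k` extra at-bats. The baseline of .260 and the weight `k = 500` are made-up illustrative assumptions, not figures from Cameron's post or any published study.

```python
def regressed_rate(hits: int, at_bats: int, baseline: float, k: float) -> float:
    """Shrink an observed hits/at_bats rate toward a baseline rate.

    k is a hypothetical prior weight: the baseline is treated as if it were
    k extra at-bats at the baseline rate. Larger k means heavier regression.
    """
    if at_bats < 0 or hits < 0 or hits > at_bats:
        raise ValueError("need 0 <= hits <= at_bats")
    if k <= 0:
        raise ValueError("k must be positive")
    return (hits + k * baseline) / (at_bats + k)


if __name__ == "__main__":
    # Kotsay's 4-for-11 against Carpenter, as cited above.
    # The .260 baseline and k = 500 are illustrative assumptions only.
    observed = 4 / 11
    estimate = regressed_rate(4, 11, baseline=0.260, k=500)
    print(f"raw: {observed:.3f}  regressed: {estimate:.3f}")
```

With those assumed inputs, the raw .364 collapses to about .262, barely off the baseline. That is the practical sense in which you might as well not have the matchup data at all.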
Okay, so for many of you, this isn't news. It isn't exactly an Internet revelation that batter vs. pitcher data is of little use. But I think it's worth considering why such data is still treated as significant, even though it's essentially been proven that it is not.
The first reason, and the main reason, is that, intuitively, batter vs. pitcher data seems perfect. It seems like exactly the data you should want. Let's say you're a manager putting together a starting lineup. The other team is starting a lefty on the mound. When you're making your lineup, you don't think about your hitters' overall performances - you think about their performances against lefties. You do that because it gives you a better idea of how they'll perform against this particular lefty. But what if they already have an established performance against this particular lefty? In theory, shouldn't that give you an even better idea of how they'll do? What better way to predict how someone will do against someone else than by examining how that specific matchup has gone in the past?
That isn't how it works. But it feels like that should be how it works. It makes so much intuitive sense that it can be hard to believe it doesn't make actual sense.
A second reason, and a lesser reason, is that I think people are wired to not care too much about sample size. It would be one thing if a batter had faced a pitcher 1,000 times. Then that information would be significant. More commonly, a batter has faced a pitcher 10 or 20 or 30 times, and so that information is not significant. The sample size is far too small, spread over too many years, for anything to be made of it.
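The sample-size problem can be put in numbers. If every at-bat is treated as an independent trial with a fixed hit probability p (a simplifying assumption; real at-bats aren't that tidy), the standard error of an observed average over n at-bats is sqrt(p(1 - p) / n). The sketch below compares matchup-sized samples with a 1,000-at-bat sample for a hypothetical true .260 hitter; the .260 figure is an assumption for illustration.

```python
import math


def batting_avg_std_error(p: float, n: int) -> float:
    """Standard error of an observed average over n independent at-bats.

    Assumes a simple binomial model: each at-bat is a hit with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    true_avg = 0.260  # hypothetical true talent, for illustration only
    for n in (10, 20, 30, 1000):
        se = batting_avg_std_error(true_avg, n)
        # Normal approximation (crude at small n): roughly 95% of observed
        # averages fall within about 2 standard errors; clamp to [0, 1].
        low = max(0.0, true_avg - 2 * se)
        high = min(1.0, true_avg + 2 * se)
        print(f"{n:>5} AB: SE = {se:.3f}, typical range {low:.3f} to {high:.3f}")
```

Under these assumptions, 20 at-bats give a standard error near .100, so a true .260 hitter can plausibly show up anywhere from roughly .064 to .456. At 1,000 at-bats the standard error shrinks to about .014.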
But it isn't instinctive to worry that a sample is too small. People make quick judgments based on very limited information all the time. Think about your opinions of people you've just met. Think about cities or countries you've visited once or twice. Yelp is a website built around members publishing reviews of establishments, often based on a single experience. That's crazy! But we're always doing it. We seldom wait for a sample to be big enough in life, and many people seldom wait for a sample to be big enough in baseball.
We want for batter vs. pitcher data to matter. It seems too perfect for it to not matter. It will never matter. Never, for as long as baseball is played as it's currently played. It's just a meaningless microsplit. There's that old joke about statheads worrying about how a batter does against lefties on Tuesday nights in domes between the fourth and sixth innings. The ingredients of the joke change, but the joke itself stays the same: statheads worry about ridiculous microsplits. In reality, it isn't the statheads who concern themselves with ridiculous microsplits.
Comments
The only worse thing is hearing "he's 2 for 13 against this pitcher, so he's due"
and knowing that they really think this increases the likelihood of getting a hit in this at-bat
by cfj3 on Oct 14, 2011 4:41 PM EDT
well technically
If the guy is better than a .167 hitter, regression would imply that his chances ARE better.
(I know what you meant.)
Purple Row - For all of your Colorado Rockies-related needs
by Andrew Martin on Oct 14, 2011 10:47 PM EDT
His chances of getting a hit are not improved by the fact that he has slumped recently.
by ColeFitz88 on Oct 14, 2011 11:09 PM EDT
but his chances are better than .167
follow @klett206
by Rochestie4ever on Oct 19, 2011 6:43 PM EDT
My second favorite abused stat
Batter v. Team, either in season or over career.
You don’t have to understand regression or small sample size to understand how meaningless that stat is. Hey look, this batter hits .300 against teams wearing blue and white!
HangingSliders.com
A Smart & Sassy Baseball Blog
@hangingsliders
facebook.com/hangingsliders
by Wendy Thurm on Oct 14, 2011 4:46 PM EDT
It feels like BvP splits have gotten completely out of control this postseason
It’s worse than the puns
by Poochie on Oct 14, 2011 4:55 PM EDT
The best* batter vs pitcher splits are the ones like “Smith is hitting .400 against Jones in his career,” and you find out he went 2 for 5 six years ago.
by Phrozen on Oct 14, 2011 7:03 PM EDT
Asterix fail.
by Phrozen on Oct 14, 2011 7:03 PM EDT
haha
by Rochestie4ever on Oct 19, 2011 6:43 PM EDT
I hate this joke so, so much.
Juan "Doesn't Cheat The Game" Perez, future CF for the World Champion San Francisco Giants.
"And besides, if I wanted to participate in a mindless patriotic ritual where my voice isn’t really heard, I would vote." - Chris Marcil
by marcello on Oct 15, 2011 3:17 AM EDT
The postseason is full of microsplits
Around the LCS or so, the networks actually stop showing you a guy’s regular season batting line and start showing you his postseason batting line.
Which is like five games’ worth. I mean, who gives a s**t? (Well, apparently, viewers do, so that they can pour pointless vituperation out on “chokers.” But I digress.)
The point, I think, is that while no information is literally of zero value, it requires a substantial opportunity cost to process information. Crappy information crowds out good information. Once you remember to factor that in, it really is literally true that there is “something wrong” with microsplits.
"We don't want our people to be preoccupied with seminude, crazy men jumping up and down who are chasing an inflated object," said Sheik Mohamed Osman Arus, head of operations for the Hizbul Islam insurgent group.
by PaulThomas on Oct 15, 2011 4:14 PM EDT
easy road...
hi jeff,
i think you took the easy road and stated the conventional SABR theory.
maybe you should try being a bit more contrarian sometimes (for this audience).
comments:
you have such a small sample size to determine a baseline, so how is there a sufficient sample size AFTER the baseline to conclude anything? and is the sample after the “baseline” (presumably 3 years) still accurate? i.e. are both players still in their prime, etc. (from year 1 to year 6)
then the next time he faces him hits a HR, is that just standard probability? or is this a good matchup for the hitter? perhaps the pitcher's out pitch is in the hitter's zone. the batter picks up the ball well and rarely chases this pitcher's balls out of the zone; the pitcher's fastball and offspeed pitches just happen to match well with the batter’s bat speed?
YOUR BIG FINISH:
>> We want for batter vs. pitcher data to matter. It seems too perfect for it to not
>> matter. It will never matter. Never, for as long as baseball is played as
>> it’s currently played. It’s just a meaningless microsplit.
anyway, enjoy the playoffs.
by no_name on Oct 15, 2011 4:48 PM EDT