A Dozen Brian Giles: Small Samples in Action

Posted By Kurt

A lot of complaints about play results or inaccurate statistics elicit the reply that it's just the result of viewing a small sample of games. I suspect that's not always a satisfying response, particularly when it's YOUR player who's underperforming over that small sample. I think we tend to ignore our players who are OVERPERFORMING, but that's another issue. The issue here is: is there evidence that players will approach expected levels of performance, given our observation that over relatively small numbers of games they appear not to?
One aspect of sampling theory that is misunderstood is that it is not so much that sample estimates will approach population (or true) values as the sample grows bigger, but that sample estimates will approach population values as results are aggregated over repeated samples. That is, it's not that you have to play a 1,000-game season for stats to even out, but that stats should "even out" over multiple replications of the same season, and multiple replications are exactly what you see when you click on the "leagues" button.
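For anyone who wants to see the idea in action, here is a rough sketch of it in Python. It is not how the game works internally; it assumes every at-bat is an independent coin flip with a fixed hit probability, and the .300 true average, the 200 at-bats per league (roughly 60 games), and the function name `simulate_leagues` are all made up for illustration.

```python
import random


def simulate_leagues(true_avg, at_bats, n_leagues, seed=0):
    """Simulate the same hitter in several leagues and pool the results.

    Assumption: each at-bat is an independent trial with hit
    probability true_avg. This is a toy model, not the game's engine.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits_per_league = [
        sum(rng.random() < true_avg for _ in range(at_bats))
        for _ in range(n_leagues)
    ]
    # Batting average in each individual league (the small samples).
    league_avgs = [hits / at_bats for hits in hits_per_league]
    # Batting average after pooling every league's hits and at-bats.
    pooled_avg = sum(hits_per_league) / (at_bats * n_leagues)
    return league_avgs, pooled_avg


league_avgs, pooled_avg = simulate_leagues(true_avg=0.300, at_bats=200, n_leagues=12)
print("lowest league: ", round(min(league_avgs), 3))
print("highest league:", round(max(league_avgs), 3))
print("pooled over 12:", round(pooled_avg, 3))
```

Under these assumptions the individual leagues can land well above or below the true average, while the pooled figure is built from twelve times as many at-bats and so is much less noisy.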
I picked on Brian Giles because his name has come up on this page a few times from owners complaining that he's underperforming. I pooled his stats from the first 12 leagues (through about 60 games), then created a seasonal average based on the 616 actual plate appearances he had last year. Giles in fact varied considerably from league to league. In league 9, he was putting up .246/.432/.362 numbers (Avg/Slg/OBP), but in league 11, it was .367/.742/.454.
But here are his actual '99 stats and his average PB stats over 12 leagues. I won't tell you which is which, because, as you'll see, it really doesn't matter a whole lot:
AB   H    2B  3B  HR  RBI  BB  K   Avg   Slg   OBP
521  164  33  3   39  115  95  80  .315  .614  .418
522  156  33  2   38  120  94  90  .299  .589  .406
Given differences in ballparks and the overall upgrade in pitching (true scrubs don't start), I'd say that what you get is a pretty acceptable statistical approximation.
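The pooling and scaling used for the 12-league line can be sketched roughly like this. The two league lines in the example are invented numbers, not Giles' real league stats, the function name `pool_and_prorate` is hypothetical, and plate appearances are simplified to AB + BB (which matches 521 + 95 = 616 in the table above).

```python
def pool_and_prorate(league_lines, target_pa):
    """Sum counting stats across leagues, then scale them to target_pa.

    Assumption: plate appearances are approximated as AB + BB.
    """
    keys = ("AB", "H", "BB")
    totals = {k: sum(line[k] for line in league_lines) for k in keys}
    pooled_pa = totals["AB"] + totals["BB"]
    scale = target_pa / pooled_pa
    # Scale each counting stat to a full season's worth of plate appearances.
    prorated = {k: round(v * scale) for k, v in totals.items()}
    # Rate stats come from the pooled totals, so scaling does not change them.
    prorated["Avg"] = round(totals["H"] / totals["AB"], 3)
    return prorated


# Invented partial-season lines for two leagues, for illustration only.
example_lines = [
    {"AB": 200, "H": 50, "BB": 40},
    {"AB": 210, "H": 70, "BB": 50},
]
season_line = pool_and_prorate(example_lines, target_pa=616)
print(season_line)  # {'AB': 505, 'H': 148, 'BB': 111, 'Avg': 0.293}
```

The same approach would extend to doubles, homers, and the rest by adding their keys; rounding the scaled counts is why a prorated line can be off by a hit or an at-bat here and there.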
A different issue is whether you should have to aggregate stats over a dozen trials to approximate reality. There's a case to be made that Giles should hit .300 with 39 HRs in every league. Mike's position is that this would be too predictable and, ultimately, too boring. This is arguable, but the more I think about it, the more I agree with his position. There's a pragmatic reason: the game's algorithms are not designed this way, and redesigning the game on such a fundamental basis would probably take years. But the philosophical reason is more compelling. If your team, say, has Brian Giles and A-Rod, and my team has Carlos Beltran and Miguel Tejada, the ODDS are that you will get considerably more production from those two spots than I will, and if our talent levels are similar elsewhere, you'll probably win more games. But if I KNOW that will happen, why bother to play the games? Maybe, just maybe, you get the Brian Giles who hits .246, and I get the Carlos Beltran who slugs .600, and I have a chance to edge you out for the playoffs. Isn't this the hope that motivates teams and fans of ALL teams each year? If every Indian and every White Sox hit their expected values exactly this year, there would have been no reason to begin the season. But just as you don't know which Brian Giles you'll get, the Indians don't know which Roberto or Sandy Alomar they will get, which David Justice, which Omar Vizquel. This way, there's still value in accumulating talent, but there's also a reason to play the games.