Friday, July 28, 2006
This blog is no longer active
I may start up a new blog later.
Sunday, April 02, 2006
My season predictions ....
I confess that I have left this article a little late, what with opening day a matter of only hours away! So rather than regurgitate the usual dross that is written about who’s going to win this or that pennant, I’ll do something a little different. I have taken $300 from my own pocket and bet on the losingest (is that really a word?) team in each division. Over the course of the season we’ll see how well I do and how much, if any, money I make. Sounds fun, right? Well, I thought so too until I handed 300 big ones over to some slightly overweight, cigar-chomping bookmaker. So, who did I plump for? Here is the list, with odds:
AL East: Orioles 3/1
AL Central: Royals 1/6
AL West: Mariners Evens
NL East: Nationals 2/1
NL Central: Reds Evens
NL West: Rockies 1/2
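A minimal sketch of what each bet would return. It assumes the $300 is split evenly at $50 per division and that the prices are standard fractional odds (profit per unit staked); the function names are purely illustrative.

```python
from fractions import Fraction

# Fractional odds as quoted in the list above: profit per unit staked.
ODDS = {
    "Orioles": Fraction(3, 1),
    "Royals": Fraction(1, 6),
    "Mariners": Fraction(1, 1),   # "Evens"
    "Nationals": Fraction(2, 1),
    "Reds": Fraction(1, 1),       # "Evens"
    "Rockies": Fraction(1, 2),
}

STAKE = 50  # assumption: $300 split evenly across the six bets


def profit_if_won(odds, stake=STAKE):
    """Profit on a winning bet, excluding the returned stake."""
    return float(odds * stake)


def net_result(last_place_teams, stake=STAKE):
    """Net profit or loss across all six bets, given the teams that finish last."""
    total = 0.0
    for team, odds in ODDS.items():
        if team in last_place_teams:
            total += profit_if_won(odds, stake)
        else:
            total -= stake
    return total


for team, odds in ODDS.items():
    print(f"{team}: ${profit_if_won(odds):.2f} profit on a ${STAKE} stake")
print(f"All six correct: ${net_result(set(ODDS)):.2f}")
print(f"None correct: ${net_result(set()):.2f}")
```

Under these assumptions the short-priced Royals bet adds very little even if it comes in, while the Orioles bet carries most of the upside.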
Let’s go through each in turn, starting with the AL East. The Os to lose! What is all that about? What about the Devil Rays? I admit, this is a close call, but despite having possibly the finest shortstop in all of baseball in Tejada there is nothing else. The only thing that can save this team is if Leo Mazzone works his magic and makes stars of a hitherto ropey pitching staff that has lost arguably its greatest asset in BJ Ryan. In my book the Rays have turned the corner, and with superstar talent coming through (Young, Upton, Crawford, Kazmir to name four) they could surprise everyone. In any case the Orioles’ odds looked too good to ignore.
In the AL Central the Royals virtually pick themselves – again! There is not much else to say except that any team which has a blog dedicated to a quest to lose fewer than 100 games is not in good shape (http://breaking-100.blogspot.com/).
Finally the AL West; this is a little more tricky but with only four teams I’ll use a process of elimination. The Angels and A’s are genuine contenders, not just for the division but the World Series, so we can ignore them. The Rangers have upgraded their rotation (Millwood) and shipped out the awfully overrated Soriano, so should finish with an even record at least. That leaves the Mariners, who despite having the most exciting young arm in baseball and adding the impressive Johjima, will still struggle. Beltre may regress towards the mean but will still struggle in Safeco’s cavernous outfield, and Seattle’s main acquisition, Jarrod Washburn, has a FIP of 5 – two full points above his ERA last year – indicating that 2005 was a fluke. Someone has got to lose and it should be the Mariners.
Phew, the AL summary is over – let’s move on to the NL, starting with the Central. Pirates or Reds, Pirates or Reds, Pirates or Reds – hmm, I’ll take the Reds thank you very much. Actually it isn’t as difficult as I made out – the Reds ranked bottom in every major statistical category last season, and projections for this season aren’t much different. I’ll bank my evens odds thank you very much.
Moving swiftly on to the NL East: this is a straight toss-up between the Marlins and the Nationals. Although the Marlins look marginally worse on paper I am still scarred by the last time the line-up was savaged in the off-season – within two years they won the World Series. Anyhow, the Nationals absolutely stank in the second half and haven’t improved (don’t get me started on the Soriano trade).
Finally let’s look at the dreaded NL West. A division which was so bad last year that only one team finished with a winning record, and that was more by luck than judgment. All permutations are possible but the Giants, Dodgers and Padres all look reasonably competitive on paper. That leaves a two-horse race for last; the Diamondbacks made a few smart offseason moves (Batista for instance) and the Rockies are, well, the Rockies. Actually it is not as close as I first thought; the Rockies will lose at a canter.
There we have it. $300 burned – I’ll review progress at the All Star break. Say a prayer or two for me, I might need it.
I suppose I can’t sign off without giving my tip to win. I’m going for the Indians. The White Sox were a shade lucky last year – all the evidence points to them being an 85-win team – and the Indians have a great rotation, good batting, and did well to acquire Marte, who should fill a hole at 3rd. They’ll take the Central and from there who knows. Once more it will be in the lap of those great October Gods.
Wednesday, March 29, 2006
Review: The Book - simply awesome
Baseball is a simple game: win games by outscoring your opponents. And you don’t need to watch too much baseball to know that managers will do pretty much anything to eke out that vital victory. That’s because the manager’s job is to make decisions and trade-offs that maximize the win (or run) potential at every possible juncture.
But are they actually making the right decisions? This is where The Book comes in. Using a variety of statistical techniques, and a truckload of data, the authors set out to debate some of the many myths which managers swear black and blue by. These debates are played out across a variety of chapters:
- Batting and pitching streaks
- Batter / pitcher match-ups
- Clutch hitting
- Batting order
- Starting pitchers
- Relief pitchers
- Sacrifice batting
- Intentional walking
- Base running
- Game theory (responding to your opponent’s actions)
Ok, I know what you are thinking. Many of these topics have been discussed before, so what is different about Tango, Lichtman and Dolphin’s approach? Well, amazingly our pen-toting trio manage to break new ground on pretty much every subject. Part of the joy of reading The Book is the feeling of discovering and learning alongside the authors, so I don’t want to give too much away, but here are a few tasty morsels:
- Sacrifice bunting can make sense in certain situations
- Disruptive running has an enormous negative influence on batting
- Pinch hitting for non-pitchers rarely pays off
And the learning continues across all the debates. The conclusions are summarized in boxes entitled “The Book says”, which contain the pithy takeaways that you’d do well to remember and reflect on when you are watching your next game.
Some reviews I have read commented that The Book is quite technical in nature. I disagree. Sure, you have to have an aptitude for learning, but the writing is so lucid and exact that a layman with a bit of time on his hands is perfectly capable of picking up the main points.
Another huge plus is the inclusion of a “Toolshed” chapter, as well as a detailed appendix on some of the statistical techniques used. In fact, reading through these sections was such an edifying experience that I have found that I consistently returned to many of the ideas to reaffirm my own thinking on the methods used. Topics such as regression to the mean and Markov chains are explained succinctly yet with a clarity that books like this so often lack.
The only slightly negative comment I can make is that some of the studies with small sample sizes seem slightly out of place with the overall ethos of the book, and here the authors struggle to establish firm conclusions while still persisting to dive deep into the data. Still, even these analyses are a joy to pore over and reinforce the central concept about drawing accurate conclusions from data.
In summary, if you are reading this review then buy The Book. To hear the opinions of three of the most respected sabermetricians in baseball is a joy and a privilege. It sets a standard of work for others to aspire to, and I can only hope that volume 2 isn’t another two years in the making.
Buy the book at http://www.insidethebook.com
Sunday, March 26, 2006
Comments on line drives
I'd like to thank the guys at BTF for linking to my article. I'm in the process of adding links to this site and I'll definitely be adding theirs - it is an invaluable resource for all sabermetricians.
Atlanta Braves: worth $400m?
Ultimately a baseball club is a business, just like any other company, and therefore we can value it using similar techniques. The method I'm going to introduce is called Discounted Cash Flow (DCF) analysis. Don't be put off by the name: although investment bankers are paid hundreds of thousands of dollars to perfect DCF, we will use it in its simplest form.
DCF values a business or investment by calculating the present value of all future cash flows. What does this mean? Suppose that I sell you a product, e.g. a financial option, which can be sold this time next year for $100. How much would you pay for it today? $100? Well, no - our trusty friend from the 1970s called inflation means that $100 today will be worth less this time next year. If inflation is 5%, $100 next year is worth about $95 today. Therefore the maximum that you'd pay in this instance is $95 - less if you wanted to make a profit.
A business is similar except rather than generating a single payment it spits out cash year after year. If we know how much money our business generates in all future years, and also what the inflation rate is for each year, we can work out the maximum that we would pay for it. Ok, so much for the theory, let's see how it works in practice. We need to work out what the present value of all those future cash flows is. A good starting point is to work out what the current year cash flow is. To do this we need to know revenues and costs. Revenue minus cost equals profit and, for this purpose, cash. Technical note: this excludes capital expenditure (adjusting for depreciation / amortization) and changes in working capital. This is not necessary for a high level exercise like this but is academically accurate if we were doing a full valuation.
For a public company, one listed on the stock exchange for instance, annual reports give us all the information we need. However, ball clubs are intensely private companies which are loath to reveal even a smattering of their financials. One approach is to ballpark (excuse the pun) estimate cash flow. We simply list all the revenues and costs and try to work out the size of each. Let's give this a go for revenue.
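The discounting step can be sketched in a couple of lines. This assumes the usual convention of dividing by (1 + rate) for each year of waiting; the function name is illustrative only.

```python
def present_value(future_amount, rate, years=1):
    """Discount a payment received `years` from now back to today's money."""
    return future_amount / (1 + rate) ** years


# $100 received in one year, discounted at 5%
print(round(present_value(100, 0.05), 2))      # 95.24
# The same $100 received in two years is worth less again
print(round(present_value(100, 0.05, 2), 2))   # 90.7
```

Dividing by 1.05 gives $95.24, which rounds to the $95 used in the example above.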
Revenues include tickets, concessions, car parking, advertising, TV / radio rights and merchandising, to name a few - I'm sure you can probably list many more. For the time being let's consider the first three. The Braves' attendance in 2005 was 2.6m. Estimating the average ticket price at $20 gives $52m sales from tickets. Adding in concessions - say a beer and a hotdog per person (total $10) - gives an extra $26m. Car parking? The Braves have 10,000 spaces. So let's say that 80% are filled up on average for each game. At $10 per car that is $80,000 per game, or roughly $6.5m over 81 home games. Just from gamedays we have about $85m. We could continue to go through this exercise and work out all other revenue sources - though we'd struggle a little with television revenue given that AOL are the current owners. Luckily Forbes produces an annual estimate of revenue for us. The last available data is 2004 where revenue was estimated at $162m. Given our initial gameday estimate of about $85m, $162m doesn't seem too crazy. Actually you could argue that it is a little on the low side given the intricacies of the TV contract (basically the Braves sell their TV rights at below fair value to inflate AOL's profit and reduce the Braves' - it is a complex subject that would almost deserve an article in its own right if it weren't so dull). For simplicity let's put a range on revenue of $160m-$180m - that seems about fair.
Ok, now on to costs which primarily consist of player and staff salaries, but also include general operational expenditure and minor league affiliates amongst others. This is easier to do - salaries are reported and fixed costs of running a ball club are easy to estimate. Again, for simplicity we'll take Forbes' figure of $146m. Let's do a quick sense check: payroll is about $90m leaving roughly $56m for other costs - seems about right (I'm happy to trust Forbes as I haven't got the time to work it out).
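The gameday estimate is only multiplication and addition, sketched here. The 81 home games figure is an assumption (a standard regular-season home schedule) and is not stated in the text; every other input is one of the rough guesses above.

```python
HOME_GAMES = 81  # assumption: standard regular-season home schedule


def gameday_revenue(attendance, ticket_price, concession_spend,
                    parking_spaces, fill_rate, parking_fee,
                    home_games=HOME_GAMES):
    """Rough season revenue in dollars from tickets, concessions and parking."""
    tickets = attendance * ticket_price
    concessions = attendance * concession_spend
    parking = parking_spaces * fill_rate * parking_fee * home_games
    return {
        "tickets": tickets,
        "concessions": concessions,
        "parking": parking,
        "total": tickets + concessions + parking,
    }


# Inputs quoted in the post: 2.6m fans, $20 ticket, $10 food and drink,
# 10,000 spaces at 80% full, $10 per car.
braves = gameday_revenue(2_600_000, 20, 10, 10_000, 0.8, 10)
for item, dollars in braves.items():
    print(f"{item}: ${dollars / 1e6:.1f}m")
```

With 81 home dates the parking line comes to about $6.5m for the season, so the gameday total lands in the mid-$80m range. Parking is the smallest of the three items, so the answer is not very sensitive to that guess.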
Going back to our earlier equation we estimate cash flow (revenue minus costs) at approximately $15m-$35m.
This is just for one year, what we need to do is to project this over all future years. Sounds tricky, right? Well here we can cheat a bit. Given that we are only looking for a ballpark estimate we can use a perpetuity valuation technique. This means that we will base future cash flows on current year cash and a constant future growth rate. The formula relies only on three inputs: annual cash flow (calculated above), the rate at which to discount future cash flow and the growth rate in annual cash flow.
The discount rate should reflect two things: general inflation and riskiness of investment - the higher the risk the more we want to discount cash flows far out in the future as there is less certainty over them. Baseball is a stable industry so is reasonably risk free. A discount rate of about 10% is probably about right. Now what about growth? Again, given the many approximations in this analysis we can be reasonably rough and ready here. We'll assume that ticket prices increase in-line with inflation: say 3%.
The formula for perpetuity value is:
value = cashflow x (1 + growth rate) / (discount rate - growth rate)
Now we are in a position to work out how much to pay for the Braves. Plugging in the numbers we get a range from $225m to $530m - which is a wide margin (it goes to show you need to make sure your assumptions are as accurate as possible at the outset). How does that number compare to what other people are saying? The Forbes value is $380m, which is pretty much bang in the middle of where we are. Moreover, investment analysts expect a winning bid for the Braves to be in the region of $400m-$450m.
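As a rough check on the plugging-in step, here is a sketch of the perpetuity formula using the inputs from the text: cash flow of $15m-$35m, a 10% discount rate and 3% growth. The function name is hypothetical.

```python
def perpetuity_value(cash_flow, discount_rate, growth_rate):
    """Value of a cash flow that grows at a constant rate forever."""
    if discount_rate <= growth_rate:
        raise ValueError("discount rate must exceed growth rate")
    return cash_flow * (1 + growth_rate) / (discount_rate - growth_rate)


# Cash flows in $m, rates as decimals
low = perpetuity_value(15, 0.10, 0.03)
high = perpetuity_value(35, 0.10, 0.03)
print(f"${low:.0f}m to ${high:.0f}m")

# Sensitivity: the same $25m of cash flow at a 9% or 11% discount rate
for rate in (0.09, 0.10, 0.11):
    print(rate, round(perpetuity_value(25, rate, 0.03)))
```

With these exact inputs the sketch gives roughly $221m to $515m, a touch below the range quoted above. Because the denominator is only 7 percentage points, a one-point change in the discount rate moves the value by well over 10%.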
Back to my original question: how much would I pay for the Braves? Well, unlike a normal business, baseball is a unique industry with a number of factors which conspire to push up value. Firstly, it is an effective monopoly which is obviously a bonus for any business (witness the fight put up by the Orioles when the Expos relocated). Secondly, baseball inspires passion and loyalty. People are prepared to pay way over the odds for this which skews even the most rigorous analysis. So based on the above, the general price of recent ball club transactions, and the fact that the Braves are a very well run organization, I'd say that a value in the region of $400m is fair. In fact, if I had that sort of money burning a hole in my pocket I'd be sitting with my feet up, smoking a cigar, watching my mighty Braves winning the World Series. Unfortunately I can only but dream!
Bonds to sue
I don't know what form the legal action will take. Speculation is that it will focus on how the authors gathered evidence and not on the content of the book. That is a shame. The allegations, if true, make a sham of Bonds' career. If the action is against the content then I hope he wins; for the sake of baseball Bonds needs to be clean. But the evidence looks damning. Fortunately Barry can probably afford a half-decent lawyer - he'll need it.
Coming up ....
Later today I hope to post an article on how much I'd pay for the Atlanta Braves - yes, this is the same piece I promised when I started this blog some 2 months ago! Shortly after I'll post my review of Tango's, MGL's and Dolphin's book, called The Book. Also not forgetting my season preview which will hopefully be ready before the season starts.
Happy reading and thanks for your support
Monday, March 06, 2006
Do K pitchers give up more fly balls?
So the question is: does a dominant strikeout pitcher give up more flyballs than the average pitcher?
OK this shouldn't be too difficult to answer. Firstly what I did was to analyse pbp data for all pitchers who recorded more than 50 BIP. Here is the plot of K9 against fly ball percentage (includes outfield flies and pop-ups):
Pretty conclusive. Rsq is only 0.03, indicating that there is essentially no linear relationship between the variables - though you don't need regression to work that out. OK, what about the second assertion: "how many extra walks were allowed [by a high strikeout pitcher]?". Again here is the data:
Given the first correlation no surprise again. The conclusion? Just because you are a strikeout pitcher it doesn't mean that you either (a) sacrifice control or (b) give up more fly balls. This makes sense: who accuses Johan Santana of lacking control or Roger Clemens of being a fly ball pitcher? No-one.
Sunday, February 26, 2006
Strikeout proficiency regressions ...
(1) BB / (K+BB)
(2) K / BFP
Here are the results (all significant to p<0.01):
Equations 2 and 3 give similar year-to-year correlations indicating that they both have similar predictive power. K/BFP is a purer measure of strikeout power while K-1.15BB includes an element of control. The correlation for BB/(K+BB) is substantially lower. Off the top of my head I can’t reason why this is. My guess is that this equation will be quite sensitive to changes in both BB and K. And because we have a sensitive denominator, this will increase the variance in the equation. Any other thoughts most welcome.
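For reference, a small sketch of how the three measures could be computed for one pitcher-season. The counts are hypothetical, not real data, and K-1.15BB is taken exactly as written, on raw counts, which is an assumption since the post does not say whether it was scaled per batter faced.

```python
def strikeout_measures(k, bb, bfp):
    """Return the three measures discussed above for one pitcher-season."""
    return {
        "bb_share": bb / (k + bb),     # (1) BB / (K + BB)
        "k_rate": k / bfp,             # (2) K / BFP
        "k_minus_bb": k - 1.15 * bb,   # (3) K - 1.15BB, raw counts (assumption)
    }


# Hypothetical pitcher-season: 200 strikeouts, 50 walks, 850 batters faced
example = strikeout_measures(k=200, bb=50, bfp=850)
for name, value in example.items():
    print(f"{name}: {value:.3f}")

# Ten extra walks move measure (1) much more, in relative terms, than measure (2)
shifted = strikeout_measures(k=200, bb=60, bfp=860)
print(round(shifted["bb_share"] / example["bb_share"], 3))
print(round(shifted["k_rate"] / example["k_rate"], 3))
```

The last two lines illustrate the sensitivity guess: a handful of extra walks changes BB/(K+BB) by around 15% here, while K/BFP barely moves.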
Thursday, February 23, 2006
Giving up line drives IS largely luck ...
OK, so what I did was to work out the line drive percentage for all pitchers who gave up more than 40 BIP in both 2004 & 2005. I then allocated a score from 1 to 6 based on where they ranked in line drive percentage. I did this for both 2004 and 2005. If a pitcher had a low line drive percentage he got a 1, if not his score would be closer to 6. Each group is the same size so you can envisage a 6 by 6 matrix representing the distribution where pitchers who gave up few line drives in 2004 AND 2005 would be in the top left and those particularly bad would be in the bottom right. If it's not clear hopefully the diagram below might help:
At this point you are probably wondering why I am bothering to run this categorisation. Why don't I just run a regression? Remember, what I am trying to detect here is the presence of an elite group of pitchers, hence why I am segmenting. Technically you could say I should be comparing this group with the REST of the population and not just the pitchers in the bottom right corner. If we find a difference between the extremes then let's come back to this.
So my hypothesis is that you may get an elite group of pitchers who don't give up many line drives and they reside in the top left corner of the diagram. Pitchers in this corner include: Mariano Rivera, Johan Santana, Tim Wakefield, Billy Wagner, AJ Burnett and Jose Contreras to name a few. Not a bad list. But in the other corner there were also some A-list names: Jason Isringhausen, Mark Prior and (ouch) Brad Lidge - hmm my hypothesis looks doomed!!
Anyway, to test this what I did was to run an independent sample t-test of this data using FIP (Fielding Independent Pitching - developed by Tangotiger) as the test variable. FIP is a good measure of how effective a pitcher is with defence controlled. The two groups were, group 1: where the pitchers had a rank of either 1 or 2 in both 2004 & 2005 and group 2: where pitchers had a rank of either 5 or 6 in both 2004 and 2005. Everyone else was excluded. Running the analysis it turned out that the test wasn't significant. In other words there was no significant difference between the two test groups in their FIP scores, so the data do not support my hypothesis.
Not a surprise I suppose given the low year-to-year correlations in line drive percentage and the observations above. But I was still curious whether people like Santana and Rivera, who distinguished themselves by having a low line drive % in both 2004 and 2005, did occupy an elite group of pitching. I segmented the existing groups into 4:
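The ranking step can be sketched as follows. The pitcher names and rates are made up for illustration, and the way ties and uneven group sizes are handled is my own assumption; the post only says that the six groups are the same size.

```python
def sextile_ranks(rates):
    """Map pitcher -> rank 1..6 by line drive %, where 1 is the lowest sixth."""
    ordered = sorted(rates, key=rates.get)
    n = len(ordered)
    return {name: (i * 6) // n + 1 for i, name in enumerate(ordered)}


def rank_matrix(ranks_2004, ranks_2005):
    """6x6 counts: rows are 2004 ranks, columns are 2005 ranks."""
    grid = [[0] * 6 for _ in range(6)]
    for name, r04 in ranks_2004.items():
        if name in ranks_2005:
            grid[r04 - 1][ranks_2005[name] - 1] += 1
    return grid


# Hypothetical line drive rates for 12 pitchers; the 2005 order is the
# exact reverse of 2004, the opposite of a persistent skill.
ld_2004 = {f"pitcher_{i}": 0.15 + 0.01 * i for i in range(12)}
ld_2005 = {f"pitcher_{i}": 0.26 - 0.01 * i for i in range(12)}

grid = rank_matrix(sextile_ranks(ld_2004), sextile_ranks(ld_2005))
for row in grid:
    print(row)
```

If line drive prevention were a repeatable skill the counts would pile up along the main diagonal, with the elite group in the top-left cell. In this made-up example they sit on the opposite diagonal instead.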
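For readers who want to see the mechanics, here is a sketch of a two-sample t statistic using only the standard library. It assumes the equal-variance (pooled) form of the test, which the post does not specify, and the FIP values are invented.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(sample_a)
              + (nb - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical FIP values for the low and high line drive groups
low_ld_group = [3.5, 4.0, 4.5]
high_ld_group = [4.0, 4.5, 5.0]

t, df = independent_t(low_ld_group, high_ld_group)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

The statistic is then compared with the critical value of the t distribution for that many degrees of freedom; a |t| well under 2 with samples this small would not be called significant.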
- Group 1: elite pitchers, ranked 1 in both 2004 and 2005
- Group 2: semi-elite, ranked 1 and 2 or 2 and 2
- Group 3: poor pitchers, ranked 5 and 6 or 5 and 5
- Group 4: worst, ranked 6 in both years
I then ran an ANOVA to compare the different samples. And no surprise the test failed - the overall mean of the data was a better fit to the data than the ANOVA model. Why am I boring you with all this? Well the one interesting thing I found was that there was a significant difference between the worst (group 4) and the rest. FIP for group 4 was almost a whole point higher. Now this is probably because the sample size was small (only 10 in group 3 vs 30 in other groups). But if this is confirmed with a larger dataset it opens the possibility that there are some pitchers who simply shouldn't be in the major leagues because they give up too many line drives - which as we know are expensive. I'd also like to look at line drive percentages for a couple of the elite pitchers like Santana to see if their line drive percentage regressed towards the mean. Given the findings above I would expect this to happen.
Wednesday, February 22, 2006
Are pop-up pitchers flyball pitchers?
There is a clear relationship but the Rsq is only 0.18 (significant at 0.01 level). This means that only 18% of the variance of flyballs is explained by this model (ie, pop-ups). (In case you were wondering including 2004 data gives a similar correlation).
Actually the correlation could be a little stronger than it first appears. Because both variables are a percentage of balls in play, if say, flyballs increase then there is less "room" for pop-ups to increase. This is why groundballs correlate inversely to fly-balls - if you don't have one you have the other (ignoring line drives).
Nothing surprising so far. Another way to look at this is to ask the contrarian question: do groundball pitchers give up fewer pop-ups than flyball pitchers? Given that flyballs and groundballs make up ~70% of batted balls we can simply categorise pitchers according to whether they give up a lot of groundballs or not. Then we can look for a difference in pop-ups in these two populations. Clear? Let's have a look at how it works in practice.
To categorise pitchers into those that give up groundballs and those that don't I'm simply going to cut the sample of pitchers in half. Those above the mean will go in the "groundball" group; those below go in "other". Then we can run an independent t-test on the two populations to see if there is a significant difference in pop-ups. And, again no surprise, there is a difference and calculating an "Rsq" for this gives 0.63, which as expected is much more pronounced. We could further control for line drives but since we know that (for pitchers) they are largely a random event I have ignored them.
So, what does all this mean? We know from linear weights and the run expectancy matrix that a pop-out is worth almost the same as a strikeout. This poses a wider question. We know from my last post that inducing groundballs is very effective for the fielding team because most of them (75%) are turned into outs and those that are not are predominantly singles. But our analysis here tells us that popups, which let's not forget are as valuable as strikeouts, are the domain of flyballers. I haven't run the analysis but it would be interesting to see if groundball or flyball pitchers have a higher propensity to strike out. Then we could use batted ball run value data to build the profile(s) of what an elite pitcher looks like.
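One common way to turn a t-test into an "Rsq"-style effect size is shown below. This is an assumption about the calculation, since the post does not say how the 0.63 was derived; here t is the test statistic and df the degrees of freedom (n_1 + n_2 - 2 for two groups).

```latex
r^2 = \frac{t^2}{t^2 + \mathit{df}},
\qquad \mathit{df} = n_1 + n_2 - 2
```

Read this way, the figure is the share of the variance in pop-up rate accounted for by which half of the sample a pitcher falls in.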
Monday, February 20, 2006
Hardball Times Annual 2006 & Batted Balls
I am not going to review the annual in detail except to say that all of the articles are of the highest quality and are extremely well written. What I want to do is spend some time discussing my thoughts on what I consider to be the most interesting part of the book, namely analysis of batted ball types. The boys at THT ordered up a special cut of batted ball data for the last three years from Baseball Info Solutions and carried out all sorts of clever whiz-bang analysis on it. If you are a regular reader of THT, or indeed other blogs like Sabernomics, then this won't be new. But it is only with the advent of THT annual 2006 that I have focused on the potential of batted ball data.
Of particular interest were the year-to-year correlations. Using this technique we can determine the extent to which a pitcher (or batter) has control over various events. For example, if we correlate groundballs per BIP (Ball in Play) for the entire population of pitchers in 2004 vs 2005 we get the chart below (thanks to Yahoo Stats Group for the play-by-play data for all charts in here):
No surprise. Pitchers who gave up a lot of groundballs in 2004 did so in 2005. The Rsq is 0.5, which says that 50% of the variance in 2005 performance is explained by performance in 2004 - which is reasonably high. That is why we refer to pitchers as groundball pitchers - Tim Hudson comes to mind. Now, where it gets more interesting is if we look at the same chart for line drives. Here it is:
As you can see the Rsq is very small: 0.01. In other words whether or not a pitcher gives up a line drive is largely luck. I bet you didn't know that (unless you read THT). Doing the same for batters shows a slightly larger Rsq (~0.1) between one year and the next for line drives. Hitters do show a small degree of skill in hitting line drives. THT Annual 2006 does this (and a lot more) for a range of different batter / pitcher events and I encourage you to have a look.
All very interesting but so what, you may ask? To really understand what is going on let's look at another article from, you've guessed it, THT Annual 2006 (no, I am not a contributor). This particular piece works out run value per batted ball above or below an average baseline. Here are some selected examples:
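A year-to-year Rsq of this kind is just the squared correlation between each pitcher's rate in one season and the next. The sketch below uses invented rates, and the function names are illustrative only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def year_to_year_rsq(rates_y1, rates_y2):
    """R-squared between two seasons, using only pitchers present in both."""
    common = sorted(set(rates_y1) & set(rates_y2))
    r = pearson_r([rates_y1[p] for p in common],
                  [rates_y2[p] for p in common])
    return r * r


# Hypothetical groundball rates per BIP, not real data
gb_2004 = {"A": 0.38, "B": 0.44, "C": 0.50, "D": 0.56, "E": 0.41}
gb_2005 = {"A": 0.40, "B": 0.42, "C": 0.52, "D": 0.53, "F": 0.47}

print(round(year_to_year_rsq(gb_2004, gb_2005), 2))
```

Pitchers who appear in only one season (E and F here) are dropped before correlating, which mirrors the minimum BIP cut-offs used in these posts.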
- Line drive: 0.356
- Outfield flyball: 0.035
- Groundball: -0.101
- Strikeout: -0.287
What this is saying is that if a batter slams a line drive then it contributes runs for the offense - the highest value event. This is because line drives only result in an out 25% of the time. And, remember, we said earlier that line drives are largely luck! Pretty amazing. Now take groundballs. Hitting a groundball is bad news. That is because it results in an out 75% of the time, and if it doesn't the chances are that you will only get to first base.
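To see how these run values add up over a season, here is a sketch that totals them for an invented pitcher. Only the four events listed above are included, so any other event type would need its own run value; the counts are hypothetical.

```python
# Run values per event relative to average, as listed above
RUN_VALUES = {
    "line_drive": 0.356,
    "outfield_fly": 0.035,
    "groundball": -0.101,
    "strikeout": -0.287,
}


def runs_vs_average(event_counts):
    """Total runs above (+) or below (-) average implied by the event counts."""
    return sum(RUN_VALUES[event] * n for event, n in event_counts.items())


# Hypothetical season for one pitcher
season = {"line_drive": 100, "outfield_fly": 150,
          "groundball": 250, "strikeout": 150}
print(round(runs_vs_average(season), 2))   # -27.45

# Swapping 20 line drives for 20 groundballs
better = dict(season, line_drive=80, groundball=270)
print(round(runs_vs_average(better) - runs_vs_average(season), 2))   # -9.14
```

From the pitcher's point of view a negative total is good. The second calculation shows why line drive prevention would be so valuable if it were a skill: each line drive turned into a groundball saves about 0.46 runs.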
Two things jump out at me that I want to look at further. Firstly, I want to dig into line drives a little deeper. If we can find some pitchers who consistently prevent line drives more than others then they should be more valuable. I guarantee you that if I was on the mound a lot more than 20% of my pitches would go for line drives. Secondly I want to use this to develop a measuring system for pitchers / batters. Now I know this has been done (check out J.C. Bradbury's PrOPS metric - http://www.sabernomics.com - all his work is excellent if you have time to peruse), but I am curious if we can find a new part of the player population that has been significantly under-valued. Also I'll continue to explore the batted ball data and post anything else that I discover.
Wednesday, February 15, 2006
Coming up on my blog ....
First of all I have just got my copy of The Hardball Times Annual. I love this publication, so in the next week I'll probably pick the most interesting article and give my comments.
Following that I want to do a piece on my team - the Atlanta Braves. One interesting episode surrounding the Braves at the moment is the rumoured pending sale. I want to look at how much the Braves are worth; or at least how much I would pay for them if I had the money - which in case you're wondering I don't. I hope to publish something in the next couple of weeks so keep your eyes peeled.
Following that I'll give a quick run through of how I think all the divisions will shape up as well as my predictions on who will win the batting title, home run race and Cy Young award. Yeah, I know everyone does this but I want to look at it from a contrarian position, particularly on division races - I'll focus exclusively on who I think will end up in last place! (Ok - I'll also give my thoughts on the winner). I'll be wrong - but I will bet $50 on each prediction and I'll let you know how I do at the end of the season.
I haven't really decided what else to write about - I guess it will be topics that interest me but I can't say what those are yet. As I publish my first pieces and get comments, that will probably define the direction of future work.
Let me know if there is anything you think I should consider ...
Saturday, February 11, 2006
On this blog I'll regularly be posting my views on all aspects of baseball. I love to use data so most of my posts will be based on analysing whatever data I can get my grubby hands on. The tools of my trade are Excel, Access and SPSS, and I'll get data from either Retrosheet or Yahoo Stats Software Group.
Obviously I want this blog to be read as widely as possible. I'll spend the first month or two developing some interesting articles and then I'll try to promote the site towards the start of the season. My goal is to write in an accessible but informative way. I want to focus on the output, and not the methodology of getting there - but I do recognise that this is important to the wider sabermetric community. Actually, I'll probably end up doing a baseball stats primer (perhaps in the 2006 off-season) so people can learn my approach and comment on how I do things.
In the next few days I'll be posting a list of some of the more detailed topics I'll be investigating over the coming months.
In the meantime if there are any issues you want me to comment on then please get in touch at firstname.lastname@example.org