“Do Spring Training Records Mean Anything?”
As a Giants fan, I am hoping right now that the answer is yes. But even before doing any research, there is the cautionary tale of 2005 spring training. I watched, excited, as the Giants went 20-12 that spring. Then the news of Barry Bonds’ injury hit the team, and the Giants finished below .500 for the first time since 1996. So it is clear that we need to be cautious before being hopeful (or pessimistic) about a team’s spring training record.
Several factors are at play here. First and foremost, teams tend to play their major league players more often in the latter part of spring training, so teams that end spring training on a high note should be more likely to succeed in the regular season. Second, the difference between runs scored and runs allowed tends to be a more accurate predictor of success than the win-loss record, especially with so few games played.
Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .208 (a) | .043 | .039 | .068075 |

a. Predictors: (Constant), Spring WPCT
Comparing only spring training winning percentage to regular season winning percentage, there appears to be a weak correlation: spring record accounts for just 4.3% of the variation in regular season winning percentage. That alone is not an indication that spring training records have no real predictive value. Later in this analysis I also use regular season runs scored and runs allowed as predictors, but it would be unfair to hold spring records to that standard, because regular season runs have an extremely high correlation with winning percentage.
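As a quick check on the 4.3% figure: R Square is the square of R, so the share of variation explained can be recomputed from the reported R values. A minimal Python sketch follows; the function name is only illustrative, and the second R value (.925) comes from the regular season runs regression reported later in this post.

```python
def variance_explained(r: float) -> float:
    """Share of variation explained by a regression whose multiple correlation is r."""
    return r ** 2


# R values as reported in the Model Summary tables of this post.
spring_only = variance_explained(0.208)  # spring WPCT -> regular season WPCT
real_runs = variance_explained(0.925)    # regular season runs -> regular season WPCT

print(f"spring WPCT only:    {spring_only:.1%}")
print(f"regular season runs: {real_runs:.1%}")
```

Because the R values in the tables are rounded to three decimals, squaring them can differ from the reported R Square in the last digit for some of the other models.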
Coefficients (a)

| Model | Term | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | .426 | .025 | | 17.241 | .000 |
| 1 | Spring WPCT | .141 | .046 | .208 | 3.071 | .002 |

a. Dependent Variable: Reg Pct
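To make the coefficients above concrete, here is a small Python sketch that recomputes the t statistic as B divided by its standard error and turns a spring winning percentage into a projected regular season record. The constants are copied from the tables in this post; the 162-game season length and the .727 input (the Giants' spring percentage quoted near the end of the post) are the only other inputs, and the function names are illustrative.

```python
GAMES_IN_SEASON = 162  # length of an MLB regular season

# Values copied from the first regression's tables in this post.
INTERCEPT = 0.426
SLOPE = 0.141
SLOPE_STD_ERROR = 0.046
STD_ERROR_OF_ESTIMATE = 0.068075


def t_statistic(coefficient: float, std_error: float) -> float:
    """t statistic for a single regression coefficient."""
    return coefficient / std_error


def projected_reg_pct(spring_wpct: float) -> float:
    """Regular season winning percentage predicted from spring WPCT alone."""
    return INTERCEPT + SLOPE * spring_wpct


t = t_statistic(SLOPE, SLOPE_STD_ERROR)
giants_pct = projected_reg_pct(0.727)
giants_wins = giants_pct * GAMES_IN_SEASON
typical_miss_in_wins = STD_ERROR_OF_ESTIMATE * GAMES_IN_SEASON

print(f"t for Spring WPCT: {t:.3f}")
print(f"projected Reg Pct: {giants_pct:.3f} (about {giants_wins:.0f} wins)")
print(f"typical miss:      about {typical_miss_in_wins:.0f} wins either way")
```

The recomputed t comes out slightly below the reported 3.071 because B and its standard error are rounded in the table. The size of the typical miss, roughly 11 wins, is a reminder of how loose a one-variable projection like this is.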
Despite the weak correlation, the relationship between spring training winning percentage and regular season winning percentage is statistically significant (p = .002). I will now add the difference between runs scored and runs allowed in spring training games (DIFF) to see how that changes the relationship between the variables. It is my belief that DIFF is more important than winning percentage in predicting regular season records.
Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .242 (a) | .058 | .049 | .067698 |

a. Predictors: (Constant), Difference, Spring WPCT
Coefficients (a)

| Model | Term | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | .484 | .040 | | 12.084 | .000 |
| 1 | Spring WPCT | .031 | .076 | .046 | .416 | .678 |
| 1 | Difference | .000 | .000 | .203 | 1.822 | .070 |

a. Dependent Variable: Reg Pct
Immediately, we can tell that my expectation was correct. Spring winning percentage is no longer significant at all (p = .678). DIFF is significant, but only at the p < 0.1 level (p = .070). We are now accounting for 5.8% of the variation in regular season winning percentage. Another interesting comparison would be spring DIFF against regular season DIFF (RDIFF), because that may lead to more significant results than a winning percentage derived from around 35 games.
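The phrase "significant at the p < 0.1 level" can be made explicit with a small helper that reports the strictest conventional threshold a p-value clears. This is only a sketch: the function name and the choice of thresholds (0.01, 0.05, 0.10) are assumptions for illustration, and the p-values are the Sig. column of the second regression.

```python
from typing import Optional


def strictest_level_passed(p_value: float,
                           levels=(0.01, 0.05, 0.10)) -> Optional[float]:
    """Return the smallest threshold that p_value falls under, or None if it clears none."""
    for alpha in sorted(levels):
        if p_value < alpha:
            return alpha
    return None


# Sig. values from the regression of Reg Pct on Spring WPCT and Difference.
p_values = {"Spring WPCT": 0.678, "Difference": 0.070}

for name, p in p_values.items():
    level = strictest_level_passed(p)
    verdict = f"significant at p < {level}" if level is not None else "not significant"
    print(f"{name}: p = {p:.3f} -> {verdict}")
```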
Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .209 (a) | .043 | .039 | 101.60093 |

a. Predictors: (Constant), Difference
Interestingly enough, there is not much difference between these numbers and the first numbers we derived from comparing the two winning percentages. Thus, it would make sense to assume that all four variables are tied together.
Coefficients (a)

| Model | Term | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | .101 | 7.014 | | .014 | .988 |
| 1 | Difference | .741 | .241 | .209 | 3.075 | .002 |

a. Dependent Variable: Reg Difference
Again, the Beta is nearly the same as the first calculation. I will now add the spring winning percentage into this calculation to see if it yields different results than our second regression did.
Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .214 (a) | .046 | .036 | 101.72719 |

a. Predictors: (Constant), Spring WPCT, Difference
In this case, the model explains 4.6% of the variation in RDIFF, which is lower than the 5.8% of regular season winning percentage (RWPCT) explained by the second regression.
Coefficients (a)

| Model | Term | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 41.676 | 60.172 | | .693 | .489 |
| 1 | Difference | .520 | .399 | .146 | 1.302 | .194 |
| 1 | Spring WPCT | 79.056 | 113.638 | .078 | .696 | .487 |

a. Dependent Variable: Reg Difference
This calculation shows that neither DIFF nor spring WPCT is significant when predicting RDIFF. This shows either that spring training performance lacks predictive power, or that RDIFF does not account for something that winning percentage does. Overall, it can be concluded that there is some value in the spring training run differential, but not in spring training winning percentage.
For comparison’s sake, how do real runs scored and allowed correlate with winning percentage?
Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .925 (a) | .856 | .855 | .026476 |

a. Predictors: (Constant), Reg Runs Scored, Reg Runs Allowed
This shows that 85.6% of WPCT’s variation can be explained by the number of runs teams score and allow.
Coefficients (a)

| Model | Term | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | .513 | .018 | | 28.839 | .000 |
| 1 | Reg Runs Allowed | .000 | .000 | .869 | 30.277 | .000 |
| 1 | Reg Runs Scored | .001 | .000 | .811 | 28.240 | .000 |

a. Dependent Variable: Reg Pct
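The Betas show that pitching (runs allowed, .869) is slightly more important than hitting (runs scored, .811), but both are very important. The constant of .513 is, strictly speaking, the predicted winning percentage when both run totals are zero; it would also be the prediction for a team with equal runs scored and allowed only if the two slopes cancel exactly, which the rounded B values shown here cannot confirm. The most important question now is what happens if we add spring training WPCT and DIFF to this model.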
Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .927 (a) | .859 | .856 | .026362 |

a. Predictors: (Constant), Spring WPCT, Reg Runs Allowed, Reg Runs Scored, Difference
This shows that there is very little change in the predictive value. The model now explains 85.9% of the variation in winning percentage, up from 85.6%, but the penalty for adding more variables (Adjusted R Square) removes that gain and brings us back to 85.6%.
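The Adjusted R Square penalty can be reproduced with the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of team-seasons and k the number of predictors. The post never states n, so the sketch below ASSUMES n = 210, a value inferred from the first regression's reported R (.208) and t (3.071). Treat that number, and the helper names, as illustrative rather than as the actual sample size.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjusted R-squared: shrinks R-squared as predictors are added."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


def implied_sample_size(r: float, t: float) -> float:
    """Invert t = r * sqrt(n - 2) / sqrt(1 - r^2), valid for a one-predictor regression."""
    return t ** 2 * (1 - r ** 2) / r ** 2 + 2


# ASSUMPTION: n is not given in the post; about 210 is what the first
# regression's rounded R and t imply.
N_ASSUMED = 210

runs_only = adjusted_r_squared(0.856, N_ASSUMED, 2)    # Reg Runs Scored, Reg Runs Allowed
runs_plus_spring = adjusted_r_squared(0.859, N_ASSUMED, 4)  # plus Spring WPCT, Difference

print(f"implied n from first regression: {implied_sample_size(0.208, 3.071):.1f}")
print(f"adjusted R Square, 2 predictors: {runs_only:.3f}")
print(f"adjusted R Square, 4 predictors: {runs_plus_spring:.3f}")
```

Under that assumed n, the formula lands on the same .855 and .856 that the two Model Summary tables report, so the two extra spring training variables buy about one thousandth of adjusted R Square.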
Coefficients (a)

| Model | Term | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | .522 | .023 | | 22.407 | .000 |
| 1 | Reg Runs Allowed | .000 | .000 | .861 | 29.695 | .000 |
| 1 | Reg Runs Scored | .001 | .000 | .802 | 27.496 | .000 |
| 1 | Difference | .000 | .000 | .069 | 1.591 | .113 |
| 1 | Spring WPCT | .017 | .029 | .024 | .562 | .575 |

a. Dependent Variable: Reg Pct
This shows that there is essentially no relationship between spring winning percentage and regular season winning percentage once regular season runs are in the model. However, even though DIFF is not significant, its p-value of .113 is close, just missing the p < 0.1 threshold. Therefore, since we do not yet know the runs scored and allowed for 2010 teams, DIFF is one of the better predictors that we have, given our limited information.
A cautionary note, however. I noticed that the ESPN winning percentages include games that do not count in the standings, such as games played against minor league teams. That may be one cause of the lack of significance between spring training records and regular season records. The Giants are 16-6 in games that count in the standings, and the team behind them (the Indians) is 12-6. However, ESPN lists the Indians with a winning percentage of .722 and the Giants with .727. This may have the effect of weakening all the correlations, especially if DIFF also includes the games against minor league teams.
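The mismatch is easy to see by recomputing winning percentage from the win-loss records in the standings. A minimal Python sketch, using only the records and ESPN percentages quoted above (the function name is illustrative):

```python
def winning_pct(wins: int, losses: int) -> float:
    """Winning percentage from a win-loss record (ties ignored)."""
    return wins / (wins + losses)


# Standings records and ESPN-listed percentages as quoted in this post.
teams = {
    "Giants": {"record": (16, 6), "espn_pct": 0.727},
    "Indians": {"record": (12, 6), "espn_pct": 0.722},
}

for name, info in teams.items():
    wins, losses = info["record"]
    from_standings = winning_pct(wins, losses)
    gap = info["espn_pct"] - from_standings
    print(f"{name}: standings {from_standings:.3f}, ESPN {info['espn_pct']:.3f}, gap {gap:+.3f}")
```

The Giants' figure matches their standings record, while the Indians' listed percentage is well above what 12-6 gives, which is the kind of discrepancy that would add noise to the spring training data.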
There is also this graph of spring training percentages compared to regular season percentages.
This is slightly encouraging for the Giants. It appears that the slight significance is driven mostly by correlation at the high end. Teams that win 70% or more of their spring training games seem to have a slight advantage over the others, but since there are only around 10 data points at that level, it is not too significant. Still, the general predictions I’m seeing for the Giants (around 80-90 wins) fit with where they would be based on their current spring training record. However, nothing is set in stone, and a Double Six Dollar Burger-induced injury to “Lard Lad” Sandoval would drop the Giants out of contention.