“Spring Training Records: In-Depth Analysis”
This is a more in-depth analysis that picks up where the prior article left off, after I toyed around in SPSS with the numbers I got from ESPN. If you aren’t into reading SPSS output and just want to see the conclusions, go to the bottom for the graphs.
Variables Entered/Removed^{b}

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Spring Wins^{a} | . | Enter |

a. All requested variables entered.
b. Dependent Variable: Reg Pct

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .266^{a} | .073 | .068 | .067330 |

a. Predictors: (Constant), Spring Wins
This means that there is a +.266 correlation between the number of spring wins and regular season winning percentage. I chose winning percentage over regular season wins because teams often miss a game, which would give a slightly less accurate picture of the predictive value of spring training. The R Square shows that spring wins account for 7.3% of the variation in a team’s winning percentage.
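For readers who do not use SPSS, here is a minimal sketch of how R and R Square come out of a one-predictor regression. The data below are hypothetical toy numbers, not the ESPN figures used in this article, and the function name `simple_regression` is my own for illustration.

```python
import math


def simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R is the Pearson correlation between x and y.
    r = sxy / math.sqrt(sxx * syy)

    # R Square is the share of variation in y explained by the fit: 1 - SSE / SST.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_square = 1 - sse / syy
    return intercept, slope, r, r_square


# Hypothetical toy data (NOT the article's data set).
spring_wins = [9, 12, 14, 15, 17, 18, 20, 24]
reg_pct = [0.562, 0.451, 0.488, 0.520, 0.475, 0.543, 0.506, 0.580]

intercept, slope, r, r_square = simple_regression(spring_wins, reg_pct)
print(f"R = {r:.3f}, R Square = {r_square:.3f}")
```

With a single predictor, R Square is simply R multiplied by itself, which is why a modest correlation translates into a small share of explained variation.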
Coefficients^{a}

| Model | Term | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | .413 | .022 | | 18.755 | .000 |
| 1 | Spring Wins | .006 | .001 | .270 | 4.054 | .000 |

a. Dependent Variable: Reg Pct
The constant means that a team with 0 spring wins would be expected to have a .413 winning percentage. For every spring training win, a team’s winning percentage rises by .006, so it would win 0.6 percentage points more of its games. This is significant beyond the P < 0.001 level, so it is clear that the number of spring wins has some meaning on average. The “winning percentages” listed by ESPN include games not played against major league teams, which explains their complete lack of significance in the previous model. The conclusion is that the number of games a team wins in spring training against major league opponents has some power to predict how well it will do. In the previous model, run differential showed some slight significance early on.
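To make the coefficients concrete, here is a small sketch that plugs the constant and the Spring Wins coefficient into the regression equation. The function name `predicted_reg_pct` is my own, the coefficients are the rounded values SPSS printed (so results are approximate), and the 162-game conversion assumes a full regular season.

```python
CONSTANT = 0.413       # B for (Constant) in the Spring Wins model
B_SPRING_WINS = 0.006  # B for Spring Wins, as rounded to three decimals by SPSS


def predicted_reg_pct(spring_wins):
    """Predicted regular season winning percentage from spring wins alone."""
    return CONSTANT + B_SPRING_WINS * spring_wins


for wins in (10, 15, 20, 25):
    pct = predicted_reg_pct(wins)
    # Assumes a standard 162-game regular season.
    print(f"{wins} spring wins -> {pct:.3f} ({pct * 162:.1f} wins over 162 games)")
```

Keep in mind that this model explains only about 7% of the variation, so any single team can land far from these predicted values.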
Variables Entered/Removed

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Difference, Spring Wins^{a} | . | Enter |

a. All requested variables entered.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .280^{a} | .079 | .070 | .067285 |

a. Predictors: (Constant), Difference, Spring Wins
While the R Square goes up to 7.9%, we are punished for adding another independent variable that is closely correlated with the first one: our Adjusted R Square (.070) is lower than the R Square for just Spring Wins (.073).
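The penalty for extra variables can be sketched with the standard Adjusted R Square formula. This assumes the sample size is the 210 data points mentioned later in this article, and it starts from the rounded R Square values SPSS printed, so the third decimal can differ slightly from the tables.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R Square for n observations and k predictors."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)


N = 210  # assumed sample size (the number of data points cited in the article)

one_predictor = adjusted_r_square(0.073, N, 1)   # Spring Wins only
two_predictors = adjusted_r_square(0.079, N, 2)  # Spring Wins + Difference

print(f"Spring Wins only:          {one_predictor:.4f}")
print(f"Spring Wins + Difference:  {two_predictors:.4f}")
```

The more predictors in the model (larger k), the more the raw R Square is shrunk, which is why a small bump in R Square can be mostly eaten up by the adjustment.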
Coefficients^{a}

| Model | Term | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | .440 | .032 | | 13.699 | .000 |
| 1 | Spring Wins | .004 | .002 | .188 | 1.908 | .061 |
| 1 | Difference | .000 | .000 | .111 | 1.131 | .265 |

a. Dependent Variable: Reg Pct
However, due to the correlation between these two variables, the significance of spring wins is no longer as potent. It is now significant only at the P < 0.1 level (P = .061), though that is close to P < 0.05. Thus it still appears that spring wins have some effect on a team’s performance. The constant states that a team with no spring wins and a run differential of zero would have a .440 winning percentage. Adding spring winning percentage (which includes games against non-major league teams) yields some interesting results.
Coefficients^{a}

| Model | Term | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | .465 | .040 | | 11.572 | .000 |
| 1 | Spring Wins | .006 | .003 | .277 | 2.158 | .032 |
| 1 | Difference | .000 | .000 | .161 | 1.431 | .154 |
| 1 | Spring WPCT | -.099 | .097 | -.148 | -1.025 | .306 |

a. Dependent Variable: Reg Pct
The winning percentage of a team in spring training has a slightly negative yet statistically insignificant relationship with regular season winning percentage once the number of wins against major league teams is accounted for. This suggests that poorer teams could be inflating their spring winning percentages, in ways not reflected in the W-L record, by playing minor league teams, but the effect is not statistically significant. To test this, I created my own formula to calculate the actual winning percentage of teams in spring training.
Variables Entered/Removed^{b}

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | SpringRealWin^{a} | . | Enter |

a. All requested variables entered.
b. Dependent Variable: Spring WPCT

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .979^{a} | .958 | .958 | .021186 |

a. Predictors: (Constant), SpringRealWin
Thus, almost all of a team’s spring winning percentage can be predicted by how it performs in games against major league teams, and it does not appear to be a case of poor teams getting lucky against minor league teams. The two have a +.979 correlation and a .958 R Square, meaning 95.8% of the variation in spring winning percentage counting non-major league games is explained by results against other major league teams.
There is another hypothesis that I would like to test: that the good teams will do better at the end of spring training, once they’ve reassigned or optioned their minor league players and get to play their major league rosters. This would be best represented by winning and losing streaks at the end of spring training. I recoded the streak as “Streak Code” to make it numerical, going from negative (losing streaks) to positive (winning streaks). If my theory is correct, there should be a positive correlation here.
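As a rough sketch of that kind of recoding, the snippet below turns a streak label into a signed number. The "W3" / "L2" label format and the function name `streak_code` are assumptions for illustration only; the exact coding used for the SPSS variable may have differed.

```python
def streak_code(streak):
    """Convert a streak label such as 'W3' or 'L2' into a signed integer.

    Winning streaks become positive, losing streaks negative.
    The label format is an assumption, not taken from the article's data.
    """
    streak = streak.strip().upper()
    kind, length = streak[0], int(streak[1:])
    if kind == "W":
        return length
    if kind == "L":
        return -length
    raise ValueError(f"unrecognized streak label: {streak!r}")


print([streak_code(s) for s in ["W3", "L2", "W1", "L5"]])  # [3, -2, 1, -5]
```

Coding the streaks this way puts them on a single scale, so one regression coefficient can capture both winning and losing streaks.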
Variables Entered/Removed^{b}

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Streak Code^{a} | . | Enter |

a. All requested variables entered.
b. Dependent Variable: Reg Pct
Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .211^{a} | .044 | .040 | .068485 |

a. Predictors: (Constant), Streak Code

Coefficients^{a}

| Model | Term | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | .499 | .005 | | 105.160 | .000 |
| 1 | Streak Code | .006 | .002 | .211 | 3.103 | .002 |

a. Dependent Variable: Reg Pct
It appears there is a somewhat weak correlation of +.211 based on scoring the streaks as negative or positive for losing and winning. This explains only 4.4% of the variation in regular season winning percentage. On its own, it is significant at the P < 0.005 level. But does this hold up when our other variables are added?
Coefficients^{a}

| Model | Term | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | .483 | .042 | | 11.589 | .000 |
| 1 | Streak Code | .004 | .002 | .132 | 1.799 | .075 |
| 1 | Spring Wins | .006 | .003 | .274 | 2.123 | .038 |
| 1 | Difference | .000 | .000 | .162 | 1.443 | .157 |
| 1 | Spring WPCT | -.138 | .099 | -.204 | -1.397 | .172 |

a. Dependent Variable: Reg Pct
The “Streak Code” does hold up as significant at the P < 0.1 level, but it is not nearly as important as Spring Wins when it comes to the overall calculation. However, teams that have large winning streaks at the end of spring training appear more likely to win more games in the regular season. It is just a .004 increase in winning percentage per game of streak, and this may be skewed by the teams that lose many in a row at the end of spring training as they face other teams’ superior rosters.
Coefficients^{a}

| Model | Term | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | .530 | .068 | | 7.775 | .000 |
| 1 | Games Back | .003 | .003 | .125 | .869 | .386 |
| 1 | Difference | .000 | .000 | .143 | 1.256 | .211 |
| 1 | Spring WPCT | -.205 | .127 | -.303 | -1.613 | .108 |
| 1 | Streak Code | .004 | .002 | .127 | 1.721 | .087 |
| 1 | Spring Wins | .006 | .003 | .278 | 2.142 | .033 |

a. Dependent Variable: Reg Pct
Adding yet another variable to the mix further increases the significance of spring wins. Nearly all of the variables in this regression could conceivably be considered significant, given that there are only 210 data points. If we had more years of information, it would be possible to find out for sure. Even though the run differential (Difference) has a P-value of .211, that has to be understood differently from some of the other variables because it is two-tailed, meaning the test allows for its influence being either negative or positive. If we only expect the effect to run in one direction, the one-tailed P-value is half as large. The same is true for Streak Code: even though the Beta shows it does not have a huge impact, it is rather significant once its two-tailed nature is taken into account. Spring winning percentage (including non-major league games) has a negative impact at this point because of its significant correlation with Spring Wins.
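The two-tailed point can be sketched as simple arithmetic. This assumes we expected a positive effect before looking at the data and that the estimates for Difference and Streak Code are positive, as their t values in the table suggest; the helper name `one_tailed_p` is my own.

```python
def one_tailed_p(two_tailed_p, t_value, expected_sign=1):
    """Convert a two-tailed P-value into a one-tailed one.

    If the estimate points in the expected direction, the one-tailed
    P-value is half the two-tailed value; otherwise it is one minus that half.
    """
    if t_value * expected_sign > 0:
        return two_tailed_p / 2
    return 1 - two_tailed_p / 2


# Sig. and t values taken from the last coefficients table.
print(f"Difference:  {one_tailed_p(0.211, 1.256):.4f}")
print(f"Streak Code: {one_tailed_p(0.087, 1.721):.4f}")
```

This shortcut only applies when the direction was hypothesized in advance; choosing the direction after seeing the sign would overstate the significance.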
Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .316^{a} | .100 | .078 | .067110 |

a. Predictors: (Constant), Spring Wins, Streak Code, Difference, Games Back, Spring WPCT
However, even with all of these variables, we’ve explained just 10% of the variation in regular season winning percentages, or 7.8% if we adjust for the number of variables used. At this point, we’ve exhausted all the predictive variables that are of any use to us. If we continue to add more, they just correlate too strongly with the current variables to be of any value. So spring training records are not meaningless, but they are not strong predictors in most cases.
Graphs and Charts:
This graph clearly shows a slight correlation between spring wins and regular season winning percentage. Most teams tend to be clustered between 10 and 20 wins in spring training. It is not until you get past 20 spring training wins that the teams have noticeably higher winning percentages. I also made a graph of the number of regular season wins, though it will likely be just about the same, with a few exceptions for teams that played extra games or skipped a game. It just makes it easier to recognize your favorite team on the chart if you’re a Tigers fan! :) It’s also easier to read the graph because the numbers are not as bunched up. However, I chose not to use regular season wins in any regression because of the missing games issue.
As one can clearly see, the graph is about the same, but it is easier to differentiate between teams because winning percentages can sit much closer together than actual win totals. Just for you Giants fans… that 9-win team in spring training with 91 regular season wins… that’s the Giants in 2004. So it’s always unpredictable, even if there is a slight correlation. However, the 2010 Giants won their 20th spring training game tonight and still have six to go. Only two teams since 2003 have won 24 or more spring training games. (Yes, I am an optimistic Giants fan, which is illegal on the Giants newsgroup.)
There are, of course, other factors to consider, such as run differential.
This shows most teams clustered between -25 and +25. Only when you get to +50 or beyond do the teams usually win often, and the same goes for -50 and losing. However, ESPN did not state whether this counted games against non-major league teams, so I will take it with a grain of salt.
The “streak code” or winning streak dot plot shows that most teams tend to end with small streaks, and that teams with large winning streaks tend to win a lot of games. The effect is less pronounced in the other direction: teams ending on losing streaks are not as likely to lose lots of games.
Overall, the baseball season is unpredictable, but I personally am confident that the Giants and the Indians will both do very well this year based on what I know from spring training, barring any severe injuries.