“Spring Training Records: In Depth Analysis”

 

  By Jesse Radin

 

This is a more in-depth analysis that picks up where the prior article left off, after I toyed around in SPSS with the numbers I got from ESPN.  If you aren’t into reading SPSS data output and just want to see the conclusions, go to the bottom for graphs.

 

Variables Entered/Removed (b)

Model   Variables Entered   Variables Removed   Method
1       Spring Wins (a)     .                   Enter

a. All requested variables entered.

b. Dependent Variable: Reg Pct

 

 

 

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .266 (a)   .073       .068                .067330

a. Predictors: (Constant), Spring Wins

 

 

This means that there is a +.266 correlation between the number of spring wins and regular season winning percentage. I chose winning percentage over regular season wins because teams often end up missing a game, so raw win totals would give a slightly less accurate picture of the predictive value of spring training. The R Square shows that spring wins account for 7.3% of the variation in a team’s winning percentage.

 

 

Coefficients (a)

                   Unstandardized          Standardized
                   Coefficients            Coefficients
Model              B        Std. Error     Beta             t          Sig.
1  (Constant)      .413     .022                            18.755     .000
   Spring Wins     .006     .001           .270             4.054      .000

a. Dependent Variable: Reg Pct

 

 

 

The constant means that a team with 0 spring wins would be predicted to have a .413 winning percentage. For every spring training win, the predicted winning percentage rises by .006, or 0.6 percentage points. This is significant beyond the P < 0.001 level, so it is clear that the number of spring wins has some meaning on average. The “winning percentages” listed by ESPN include games not played against major league teams, which explains their complete lack of significance in the previous model.  The conclusion is that the number of games a team wins in spring training against major league opponents has some power to predict how well it will do. In the previous model, run differential showed some slight significance early on.
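
To make the fitted line concrete, here is a minimal Python sketch that plugs spring win totals into the equation from the Coefficients table (constant .413, slope .006 per spring win). The function name and the conversion to a 162-game schedule are illustrative additions, and because the coefficients are rounded to three decimals the results are only rough.

```python
# Coefficients as printed in the SPSS output (rounded to three decimals).
INTERCEPT = 0.413   # predicted Reg Pct with 0 spring wins
SLOPE = 0.006       # change in Reg Pct per spring training win

GAMES_IN_SEASON = 162  # standard MLB regular season length


def predicted_reg_pct(spring_wins):
    """Predicted regular season winning percentage for a spring win total."""
    return INTERCEPT + SLOPE * spring_wins


if __name__ == "__main__":
    for wins in (10, 15, 20, 25):
        pct = predicted_reg_pct(wins)
        print(f"{wins} spring wins -> {pct:.3f} "
              f"(about {pct * GAMES_IN_SEASON:.1f} wins over {GAMES_IN_SEASON} games)")
```

Keep in mind that this model explains only 7.3% of the variation, so any single team can land far from its predicted value.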

 

 

Variables Entered/Removed

Model   Variables Entered             Variables Removed   Method
1       Difference, Spring Wins (a)   .                   Enter

a. All requested variables entered.

 

 

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .280 (a)   .079       .070                .067285

a. Predictors: (Constant), Difference, Spring Wins

 

 

While the R Square goes up to 7.9%, we are penalized for adding another independent variable that is closely correlated with the first one: the Adjusted R Square of .070 is lower than the .073 R Square for Spring Wins alone.
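
The penalty for extra predictors can be checked by hand. The sketch below uses the standard adjusted R Square formula, 1 - (1 - R²)(n - 1)/(n - p - 1), with n = 210 data points (the count given later in this article) and p predictors. The function name is illustrative, and the inputs are the rounded R Square values from the output, so the results match SPSS only to about three decimals.

```python
def adjusted_r_square(r_square, n_obs, n_predictors):
    """Standard adjusted R Square: shrinks R Square as predictors are added."""
    return 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_OBS = 210  # number of team-seasons, as stated later in the article

if __name__ == "__main__":
    # (label, R Square from the SPSS output, number of predictors)
    models = [
        ("Spring Wins only", 0.073, 1),
        ("Spring Wins + Difference", 0.079, 2),
    ]
    for label, r2, p in models:
        adj = adjusted_r_square(r2, N_OBS, p)
        print(f"{label}: R Square {r2:.3f}, adjusted {adj:.4f}")
```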

 

 

Coefficients (a)

                   Unstandardized          Standardized
                   Coefficients            Coefficients
Model              B        Std. Error     Beta             t          Sig.
1  (Constant)      .440     .032                            13.699     .000
   Spring Wins     .004     .002           .188             1.908      .061
   Difference      .000     .000           .111             1.131      .265

a. Dependent Variable: Reg Pct

 

 

However, due to the correlation between these two variables, the significance of spring wins is no longer as potent. It now sits at P = .061, which clears the P < 0.1 level and comes close to P < 0.05. So spring wins still appear to have some bearing on a team’s performance. The constant here states that a team with no spring wins and a run differential of zero would have a winning percentage of .440. Adding spring winning percentage (which includes games against non-major league teams) yields some interesting results.

 

 

Coefficients (a)

                   Unstandardized          Standardized
                   Coefficients            Coefficients
Model              B        Std. Error     Beta             t          Sig.
1  (Constant)      .465     .040                            11.572     .000
   Spring Wins     .006     .003           .277             2.158      .032
   Difference      .000     .000           .161             1.431      .154
   Spring WPCT     -.099    .097           -.148            -1.025     .306

a. Dependent Variable: Reg Pct

 

The winning percentage of a team in spring training has a slightly negative yet statistically insignificant relationship with regular season winning percentage once the number of wins against major league teams is accounted for.  This suggests that poorer teams could be inflating their winning percentages, in ways not reflected in the W-L record, by playing minor league teams, but again this is not statistically significant. To test this, I created my own formula to calculate the actual winning percentage of teams in spring training.

 

 

Variables Entered/Removed (b)

Model   Variables Entered   Variables Removed   Method
1       SpringRealWin (a)   .                   Enter

a. All requested variables entered.

b. Dependent Variable: Spring WPCT

 

 

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .979 (a)   .958       .958                .021186

a. Predictors: (Constant), SpringRealWin

 

 

 

Thus, almost all of a team’s spring winning percentage can be predicted by how it performs in games against major league teams. The two have a +.979 correlation and a .958 R Square, meaning 95.8% of the variation in spring winning percentage (counting non-major league opponents) is explained by results against other major league teams. It does not appear to be a case of poor teams getting lucky against minor league teams.

 

There is another hypothesis that I would like to test: that the good teams will do better at the end of spring training, when they’ve reassigned or optioned their minor league players and get to play their major league rosters. This would be best represented by winning and losing streaks at the end of spring training. I recoded the streaks into a numerical variable called “Streak Code,” going from negative for losing streaks to positive for winning streaks. If my theory is correct, there should be a positive correlation here.
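
One way to do this kind of recoding is sketched below in Python. It assumes the streak is listed as a letter followed by a count, such as "W3" or "L2", and maps it to +3 or -2. Both that input format and the function name are assumptions made for illustration; the actual recoding was done in SPSS and may have used a different scale.

```python
def streak_code(streak):
    """Convert a streak label like 'W3' or 'L2' into a signed integer.

    Winning streaks become positive, losing streaks negative.
    The 'W3'/'L2' label format is an assumption for this example.
    """
    label = streak.strip().upper()
    if len(label) < 2 or label[0] not in ("W", "L") or not label[1:].isdigit():
        raise ValueError(f"Unrecognized streak label: {streak!r}")
    length = int(label[1:])
    return length if label[0] == "W" else -length


if __name__ == "__main__":
    for label in ("W3", "L2", "W1", "L5"):
        print(label, "->", streak_code(label))
```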

 

 

Variables Entered/Removed (b)

Model   Variables Entered   Variables Removed   Method
1       Streak Code (a)     .                   Enter

a. All requested variables entered.

b. Dependent Variable: Reg Pct

 

 

 


 

 

 

 

 

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .211 (a)   .044       .040                .068485

a. Predictors: (Constant), Streak Code

 

 

 

 

 

Coefficients (a)

                   Unstandardized          Standardized
                   Coefficients            Coefficients
Model              B        Std. Error     Beta             t          Sig.
1  (Constant)      .499     .005                            105.160    .000
   Streak Code     .006     .002           .211             3.103      .002

a. Dependent Variable: Reg Pct

 

 

 

 

It appears there is a somewhat weak correlation of +.211 based on scoring the streaks as negative or positive for losing and winning. This explains only 4.4% of the variation in regular season winning percentage. On its own, it is significant at the P < 0.005 level. But does this hold up when our other variables are added?

 

 

 

Coefficients (a)

                   Unstandardized          Standardized
                   Coefficients            Coefficients
Model              B        Std. Error     Beta             t          Sig.
1  (Constant)      .483     .042                            11.589     .000
   Streak Code     .004     .002           .132             1.799      .075
   Spring Wins     .006     .003           .274             2.123      .038
   Difference      .000     .000           .162             1.443      .157
   Spring WPCT     -.138    .099           -.204            -1.397     .172

a. Dependent Variable: Reg Pct

 

 

The “Streak Code” does remain significant at the P < 0.1 level, but it is not nearly as important as Spring Wins in the overall calculation. Still, teams that have long winning streaks at the end of spring training appear more likely to win more games in the regular season. The increase is just .004 in winning percentage per unit of Streak Code, and it may be skewed by the teams that lose many in a row at the end of spring training as they face other teams’ superior rosters.

 

 

Coefficients (a)

                   Unstandardized          Standardized
                   Coefficients            Coefficients
Model              B        Std. Error     Beta             t          Sig.
1  (Constant)      .530     .068                            7.775      .000
   Games Back      -.003    .003           -.125            -.869      .386
   Difference      .000     .000           .143             1.256      .211
   Spring WPCT     -.205    .127           -.303            -1.613     .108
   Streak Code     .004     .002           .127             1.721      .087
   Spring Wins     .006     .003           .278             2.142      .033

a. Dependent Variable: Reg Pct

 

 

 

Adding yet another variable to the mix further increases the significance of spring wins. Nearly all of the variables in this regression could conceivably be considered significant, given that there are only 210 data points. If we had more years of information, it would be possible to find out for sure. Even though run differential (Difference) has a P-value of .211, that has to be read differently from some of the other variables because the test is two-tailed, meaning it allows for the influence being either negative or positive. The same is true for Streak Code: even though the Beta shows it does not have a huge impact, it is rather significant once its two-tailed nature is taken into account. Spring winning percentage (including non-major league games) has a negative coefficient at this point because of its strong correlation with Spring Wins.
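
The two-tailed point can be made concrete. When only one direction of effect is expected, and the estimate comes out in that direction, the one-tailed P-value is half the two-tailed value in the Sig. column. The Python sketch below applies that conversion to the values in the table above; the function name is illustrative, and treating a positive effect as the expected direction is an assumption of the example rather than something tested here.

```python
def one_tailed_p(two_tailed_p, t_stat, expected_sign=1):
    """Convert a two-tailed P-value to a one-tailed one.

    If the t statistic points in the expected direction, the one-tailed
    P-value is half the two-tailed value; otherwise it is 1 minus that half.
    """
    half = two_tailed_p / 2
    same_direction = (t_stat > 0) == (expected_sign > 0)
    return half if same_direction else 1 - half


if __name__ == "__main__":
    # (variable, t, two-tailed Sig.) taken from the five-variable table
    rows = [
        ("Difference", 1.256, 0.211),
        ("Streak Code", 1.721, 0.087),
        ("Spring Wins", 2.142, 0.033),
    ]
    for name, t_stat, sig in rows:
        print(f"{name}: two-tailed {sig:.3f}, one-tailed {one_tailed_p(sig, t_stat):.4f}")
```

A one-tailed reading is only appropriate when the direction was chosen before looking at the data.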

 

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .316 (a)   .100       .078                .067110

a. Predictors: (Constant), Spring Wins, Streak Code, Difference, Games Back, Spring WPCT

 

However, even with all of these variables, we’ve explained just 10% of the variation in regular season winning percentages, or 7.8% if we adjust for the number of variables used. At this point, we’ve exhausted all the predictive variables that are of any use to us. If we continue to add more, they just correlate way too strongly with the current variables to be of any value. So spring training records are not meaningless, but they are not strong predictors in most cases.

 

Graphs and Charts:

 

 

 

This graph shows a slight correlation between spring wins and regular season winning percentage. Most teams tend to be clustered between 10 and 20 wins in spring training. It is not until you get past 20 spring training wins that the teams have noticeably higher winning percentages. I also made a graph using the number of regular season wins, though it will likely be just about the same, with a few exceptions for teams that played extra games or skipped a game. It just makes it easier to recognize your favorite team on the chart if you’re a Tigers fan! :) It’s also easier to read the graph because the numbers are not as grouped up. However, I chose not to use regular season wins in any regression because of the missing games issue.

 

 

 

As one can clearly see, the graph is about the same but it is easier to differentiate between teams because of how close winning percentages can be compared to the actual number of wins. Just for you Giants fans… that 9 win team in spring training with 91 wins… that’s the Giants in 2004. So it’s always unpredictable, even if there is a slight correlation. However, the 2010 Giants won their 20th spring training game tonight and still have six to go. Only two teams since 2003 have won 24 or more spring training games.  (Yes, I am an optimistic Giants fan, which is illegal on the Giants newsgroup.)

 

There are, of course, other factors to consider, such as run differential.

 

This shows most teams clustered between -25 and +25.  Only when you get to +50 or beyond do the teams usually win often, and the same goes for -50 and losing. However, ESPN did not state whether this counted games against non-major league teams, so I will take it with a grain of salt.

The “streak code” or winning streak dot plot shows that most teams tend to end with small streaks, and that there is a tendency for teams with large winning streaks to win a lot of games. The reverse effect, teams ending on losing streaks going on to lose lots of games, is not as pronounced.

 

Overall, the baseball season is unpredictable, but I personally am confident that the Giants and the Indians will both do very well this year based on what I know from spring training, barring any severe injuries.

