“Do Spring Training Records Mean Anything?”

 

  By Jesse Radin

 

            As a Giants fan, I am hoping right now that the answer is yes. But even before doing any research, the 2005 spring training offers a cautionary tale. I watched, excited, as the Giants went 20-12 that spring. Then the news of Barry Bonds’ injury hit the team, and the Giants finished below .500 for the first time since 1996. So it is clear that we need to be cautious before getting hopeful (or pessimistic) about a team’s spring training record.

Several factors are at play here. First and foremost, teams tend to play their major league players more often in the latter part of spring training, so teams that end spring training on a high note would be more likely to succeed in the regular season. Secondly, the difference between runs scored and runs allowed tends to be a more accurate predictor of success, especially with so few games played.

 

Model Summary

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .208(a)  .043       .039                .068075

a. Predictors: (Constant), Spring WPCT

 

Initially, while comparing only spring training winning percentage to regular season winning percentage, there appears to be a weak correlation, one that accounts for only 4.3% of the variation in regular season winning percentage. Since we will also bring regular season runs scored and runs allowed into our model, this alone is not an indication that spring training records have no predictive value. It would be unfair to compare them against regular season runs scored and allowed, because those have an extremely high correlation with winning percentage.
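For readers who want to reproduce this kind of fit, here is a minimal sketch of one-variable ordinary least squares with the R Square computation. The data below are hypothetical placeholders, not the actual spring/regular season dataset.

```python
# Minimal one-variable OLS, returning slope, intercept, and R^2.
# The sample data are hypothetical, not the study's dataset.

def ols(x, y):
    """Return (slope, intercept, r_squared) for a one-variable OLS fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - (residual variation / total variation)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2

spring = [0.625, 0.452, 0.516, 0.548, 0.710]    # hypothetical spring WPCTs
regular = [0.540, 0.475, 0.510, 0.495, 0.580]   # hypothetical regular WPCTs
slope, intercept, r2 = ols(spring, regular)
```

An R Square of .043, as in the table above, means the fitted line explains 4.3% of the spread in the dependent variable.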

 

Coefficients(a)

Model            B      Std. Error   Beta    t        Sig.
1  (Constant)    .426   .025                 17.241   .000
   Spring WPCT   .141   .046         .208    3.071    .002

(B, Std. Error: unstandardized coefficients; Beta: standardized)
a. Dependent Variable: Reg Pct

 

However, despite the weak correlation, the relationship between spring training winning percentage and regular season winning percentage appears to be statistically significant. I will now add the difference between runs scored and runs allowed in spring training games (DIFF) to see how that changes the relationship between the variables. My belief is that DIFF is more important than winning percentage in predicting regular season records.
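This kind of two-predictor regression can be sketched with numpy's least-squares solver. The team values below are hypothetical placeholders, not the study's data.

```python
# Sketch of a multiple regression (intercept + two predictors) via
# numpy least squares. All numbers are hypothetical placeholders.
import numpy as np

def multiple_ols(X, y):
    """Fit y = b0 + b1*x1 + b2*x2 + ... ; return (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coefs
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return coefs, r2

spring_wpct = np.array([0.625, 0.452, 0.516, 0.548, 0.710, 0.400])
diff = np.array([25, -10, 4, 12, 40, -22])         # spring run differential
reg_wpct = np.array([0.540, 0.475, 0.510, 0.495, 0.580, 0.460])
X = np.column_stack([spring_wpct, diff])
coefs, r2 = multiple_ols(X, reg_wpct)
```

With real data, comparing each predictor's standardized Beta (as in the SPSS output) indicates which variable carries more of the relationship.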

 

Model Summary

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .242(a)  .058       .049                .067698

a. Predictors: (Constant), Difference, Spring WPCT

 

 

Coefficients(a)

Model            B      Std. Error   Beta    t        Sig.
1  (Constant)    .484   .040                 12.084   .000
   Spring WPCT   .031   .076         .046    .416     .678
   Difference    .000   .000         .203    1.822    .070

a. Dependent Variable: Reg Pct

 

Immediately, we can tell that my hunch was correct. Spring winning percentage is no longer significant at all, while DIFF is significant, though only at the P < 0.1 level. We are now accounting for 5.8% of the variation in regular season winning percentage. Another interesting comparison is spring DIFF against regular season DIFF, which may yield more significant results than a winning percentage derived from around 35 games.
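The "Sig." values I keep referring to come from t-tests on each coefficient. As a rough sketch of where the t column comes from (not the SPSS implementation, and with purely hypothetical inputs), the t-statistic for a simple-regression slope is the slope divided by its standard error:

```python
# t-statistic for a one-variable OLS slope: t = slope / se(slope),
# where se = sqrt(MSE / Sxx). Inputs here are hypothetical.
import math

def slope_t_stat(x, y):
    """Return (slope, t) for the slope of a simple OLS fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    mse = ss_res / (n - 2)        # residual variance estimate
    se = math.sqrt(mse / sxx)     # standard error of the slope
    return slope, slope / se
```

The larger the t-statistic in absolute value, the smaller the Sig. (p-value), which is why the .002 for spring WPCT in the first regression counted as significant while the .678 here does not.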

 

Model Summary

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .209(a)  .043       .039                101.60093

a. Predictors: (Constant), Difference

 

Interestingly enough, there is not much difference between these numbers and the first numbers we derived from comparing the two winning percentages. Thus, it would make sense to assume that all four variables are tied together.

 

Coefficients(a)

Model            B       Std. Error   Beta    t        Sig.
1  (Constant)    -.101   7.014                -.014    .988
   Difference    .741    .241         .209    3.075    .002

a. Dependent Variable: Reg Difference

 

 

Again, the Beta is nearly the same as in the first calculation. I will now add spring winning percentage to this regression to see whether it yields different results than our second regression did.

 

Model Summary

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .214(a)  .046       .036                101.72719

a. Predictors: (Constant), Spring WPCT, Difference

 

In this case, the model explains 4.6% of the variation in regular season DIFF, lower than the 5.8% of regular season winning percentage explained by the second regression.

 

Coefficients(a)

Model             B         Std. Error   Beta    t       Sig.
1  (Constant)     -41.676   60.172               -.693   .489
   Difference     .520      .399         .146    1.302   .194
   Spring WPCT    79.056    113.638      .078    .696    .487

a. Dependent Variable: Reg Difference

 

This calculation shows that neither DIFF nor spring WPCT is significant when predicting RDIFF. This either demonstrates the lack of predictive power of spring training performance, or suggests that RDIFF fails to capture something that winning percentage does. Overall, it can be concluded that there is some value in the runs scored/allowed differential in spring training, but not in spring training winning percentages.

For comparison’s sake, how do regular season runs scored and allowed correlate with winning percentage?

 

Model Summary

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .925(a)  .856       .855                .026476

a. Predictors: (Constant), Reg Runs Scored, Reg Runs Allowed

 

This shows that 85.6% of WPCT’s variation can be explained by the number of runs teams score and allow.

 

Coefficients(a)

Model                 B      Std. Error   Beta     t         Sig.
1  (Constant)         .513   .018                  28.839    .000
   Reg Runs Allowed   .000   .000         -.869    -30.277   .000
   Reg Runs Scored    .001   .000         .811     28.240    .000

a. Dependent Variable: Reg Pct

 

The Betas show that pitching (runs allowed) is slightly more important than hitting (runs scored), though both matter enormously. The intercept suggests a team with equal runs scored and allowed would have a .513 winning percentage. The most important question now is what happens if we add spring training WPCT and DIFF into this model.
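As a point of comparison (not part of the regression above), Bill James’ well-known Pythagorean expectation estimates winning percentage from run totals alone in closed form, and captures the same tight relationship. The run totals below are illustrative:

```python
# Bill James' Pythagorean expectation: estimated winning percentage
# from runs scored and allowed. Run totals here are illustrative.
def pythagorean_wpct(runs_scored, runs_allowed, exponent=2.0):
    """Estimated WPCT = RS^k / (RS^k + RA^k), classically with k = 2."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)

# A team scoring and allowing equal runs projects to exactly .500:
even_team = pythagorean_wpct(700, 700)   # 0.5
```

Note the regression's intercept of .513 differs slightly from the Pythagorean .500 for an even team, since the linear fit treats runs scored and allowed as separate variables rather than as a ratio.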

 

Model Summary

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .927(a)  .859       .856                .026362

a. Predictors: (Constant), Spring WPCT, Reg Runs Allowed, Reg Runs Scored, Difference

 

This shows very little change in predictive value. The model now explains 85.9% of the variation in winning percentage, but the penalty for adding more variables (Adjusted R Square) erases the gain and brings us back to 85.6%.

 

Coefficients(a)

Model                 B       Std. Error   Beta     t         Sig.
1  (Constant)         .522    .023                  22.407    .000
   Reg Runs Allowed   .000    .000         -.861    -29.695   .000
   Reg Runs Scored    .001    .000         .802     27.496    .000
   Difference         .000    .000         .069     1.591     .113
   Spring WPCT        -.017   .029         -.024    -.562     .575

a. Dependent Variable: Reg Pct

 

This shows that there is essentially no relationship between spring winning percentage and regular season winning percentage. DIFF is not significant either, but its Sig. of .113 puts it just outside the P < 0.1 threshold. Therefore, since we do not yet know the runs scored and allowed for the 2010 teams, spring DIFF is one of the better predictors we have, given our limited information.

A cautionary note, however. I noticed that the ESPN winning percentages for teams include games not counted in the standings, namely games played against minor league teams. That may be one cause of the lack of significance between spring training records and regular season records. The Giants are 16-6 in games that count in the standings, and the team behind them (the Indians) is 12-6. Yet the Indians have a listed winning percentage of .722 and the Giants .727. This may have the effect of weakening all the correlations, especially if DIFF includes the games against minor league teams.
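The discrepancy is easy to verify with quick winning-percentage arithmetic on the records cited above:

```python
# Quick check of the records cited above: 16-6 yields .727, while 12-6
# yields .667 -- so the Indians' listed .722 must include extra games
# (e.g., exhibitions against minor league squads).
def wpct(wins, losses):
    """Winning percentage over games with a decision."""
    return wins / (wins + losses)

giants = round(wpct(16, 6), 3)    # 0.727
indians = round(wpct(12, 6), 3)   # 0.667, not the .722 ESPN lists
```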

 

 

There is also this graph of spring training percentages compared to regular season percentages.

This is slightly encouraging for the Giants. It appears that the slight significance is driven mostly by correlation at the high end: teams that win 70% or more of their spring training games tend to have a slight advantage over the others, though with only around 10 data points at that level, the effect is not too significant. The general predictions I am seeing for the Giants (around 80-90 wins) fit with where they would be based on their current spring training record. However, nothing is set in stone, and a Double Six Dollar Burger-induced injury to Lard Lad Sandoval would drop the Giants out of contention.


Further analysis on this topic

 
