2010 BPR Correlation Analysis

As part of BPR F1’s ongoing season wrap-up, I thought it would be a worthwhile exercise to review how well the BPR POWER rating correlated with the actual on-track results during the 2010 Formula 1 season.

For the less mathematically inclined, correlation is a measure of the relationship between two sets of variables (a good, brief explanation of correlation can be found here).  In evaluating the POWER rating’s relationship to on-track results, I used one of the most common ways of expressing correlation: the correlation coefficient.  A correlation coefficient ranges from -1.0 to +1.0; the closer the coefficient is to +1 or -1, the more strongly the two sets of variables are related.

My main purpose in running a correlation analysis was to evaluate how well the BPR POWER rating predicted the next round’s results throughout the 2010 Formula 1 season.  It wouldn’t make sense to correlate the individual-round BPR ratings with the on-track results of the same event, as the BPR formula is based solely on on-track performance and is therefore inherently correlated with those results.  Instead, I ran the “CORREL” function in Microsoft Excel for the POWER rating preceding each round against the next round’s results.  To illustrate: the post-Bahrain POWER rating was correlated to the results of Australia, the post-Australia POWER rating was correlated to the results of Malaysia, and so on.  For those who are interested, Excel’s CORREL function divides the covariance of the two variable sets by the product of their standard deviations to arrive at its correlation coefficient output.
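
For anyone who wants to replicate the calculation outside of Excel, here is a minimal Python sketch of the same Pearson coefficient that CORREL computes.  The rating and finishing-position values are placeholders, not actual 2010 figures.

```python
# A minimal sketch of the lagged correlation described above. The function
# mirrors what Excel's CORREL computes: the covariance of the two data sets
# divided by the product of their standard deviations.

def pearson(x, y):
    """Pearson correlation coefficient: cov(x, y) / (sd_x * sd_y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = (sum((a - mean_x) ** 2 for a in x) / n) ** 0.5
    sd_y = (sum((b - mean_y) ** 2 for b in y) / n) ** 0.5
    return cov / (sd_x * sd_y)

# e.g., post-Bahrain POWER ratings for four entries against their
# finishing positions in Australia (placeholder values)
post_bahrain_power = [9.1, 8.7, 8.5, 7.9]
australia_finish = [1, 3, 2, 5]
print(pearson(post_bahrain_power, australia_finish))
```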

I evaluated the POWER rating’s correlation to the finishing positions of all 24 driver/entries at subsequent rounds in two ways: using the actual POWER rating, and using the rankings based on the POWER rating which are included in each BPR table posted on BPR F1.  The results of each correlation analysis are provided in the tables below, and the corresponding results were used to generate the line plot also displayed below.  Because a higher POWER rating corresponds to a lower (better) finishing position, the POWER rating’s coefficients come out negative; they were inverted to positive values to allow for continuity in graphing.
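
To illustrate the ranking-based variant and the sign flip, here is another short, hedged sketch; again, all values are placeholders.

```python
from scipy.stats import pearsonr

def to_ranks(values):
    """Rank values in descending order so the highest POWER rating gets rank 1."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

power = [9.1, 8.7, 8.5, 7.9]   # placeholder POWER ratings
finish = [1, 3, 2, 5]          # placeholder finishing positions

# A high POWER rating pairs with a low (good) finishing position, so the
# raw rating coefficient is negative; flip its sign for the line plot.
rating_coeff = -pearsonr(power, finish)[0]
ranking_coeff = pearsonr(to_ranks(power), finish)[0]
print(rating_coeff, ranking_coeff)
```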

BPR POWER RATING / RACE FINISH POSITION CORRELATION TABLE

BPR RANKING / RACE FINISH POSITION CORRELATION TABLE

COMBINED CORRELATION LINE PLOT

(Note that because there was no pre-round POWER rating for the season-opening Bahrain Grand Prix, that round does not have a corresponding correlation coefficient.  Asterisks next to round abbreviations indicate a year-to-year change in track configuration, and the letters in parentheses next to some round abbreviations indicate the track used for events which have alternated tracks in recent history.  Coefficient boxes shaded in light blue indicate a race that was run in wet conditions.)

With nearly identical average coefficients of 0.779 and 0.778, the POWER rating and its associated rankings are certainly correlated to finishing positions at consecutive races.  Take the correlations one step further and the POWER rating is indeed a “predictor” of future results.  However, the POWER rating was never designed to be a prediction model; its predictive ability stems solely from its reflection of current performance trends heading into each consecutive race.

With the foregoing in mind, I intend to adapt the existing BPR formula to create a predictor model for the 2011 Formula 1 season.  At this stage, the model will incorporate the POWER rating as well as a separate formula based on track-type performances which positively correlate with other rounds.  Including the second calculation is intended to address the simple fact that the Formula 1 calendar features very different tracks which tend to produce different results based on individual car and driver characteristics.  Although it’s relatively obvious that the results of the Monaco Grand Prix are unlikely to reflect the results of, say, the Italian Grand Prix at Monza, result relationships amongst other tracks aren’t as obvious.  Therefore, I ran a correlation of the BPR scores for every pair of rounds on the 2010 Formula 1 calendar to find which events’ results correlate (a sketch of that calculation follows below).  Remember that the BPR incorporates a lot of data for all 24 participating drivers at each round, so the correlation results should be fairly dependable.
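
As a rough sketch of that calculation, assuming a hypothetical table with one row per driver/entry and one column of BPR scores per round, pandas can produce the full round-versus-round coefficient matrix in one call:

```python
import pandas as pd

# Placeholder BPR scores: one row per driver/entry, one column per round.
bpr_by_round = pd.DataFrame({
    "BHR": [9.1, 8.7, 8.5, 7.9],
    "AUS": [8.9, 8.8, 8.2, 7.5],
    "MAL": [9.0, 8.5, 8.4, 7.7],
})

# Pairwise Pearson coefficients between every pair of rounds.
round_correlations = bpr_by_round.corr()

# Mask everything below the 0.900 cutoff highlighted in the table below.
strong_pairs = round_correlations[round_correlations >= 0.900]
print(strong_pairs)
```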

BPR RATING / BPR RATING CORRELATION TABLE

(Note the light-blue shading used once again to show races which ran in wet conditions, with darker blue shading indicating a strong correlation between two races which were run in the wet.)

I’ll let the reader digest some of the more interesting correlations in the above table and what they entail, but I highlighted all correlation coefficients of 0.900+ as being especially relevant.

Look for the upcoming BPR predictor at BPR F1 in addition to the regularly-posted BPR materials throughout the 2011 Formula 1 season.

posted by Trey Blincoe

Filed under 2010 Season, BPR

6 responses to “2010 BPR Correlation Analysis”

  1. Cliff

    I have tried to do similar analyses on NASCAR. Correlations are not nearly as good as for Formula 1: my best are in the 0.5 range. Some of your metrics are not available to me for NASCAR. Also, I have not come up with a way to model DNFs for NASCAR.

    Have you tried to do anything like this on NASCAR?
    How do you use reliability ratings?

    Thanks

    • I’ve not applied the BPR formula to NASCAR, but I do know that the same basic formula I use for F1 can be tweaked to address a wide range of motor racing series. In fact, the BPR formula was originally crafted to evaluate Le Mans-style endurance racing, which is a substantially different sport from the world of F1 racing. At various times I’ve used the BPR formula, with modifications, to evaluate the F1, FIA GT, ALMS, LMS, and DTM series. Over time I’ve found that the key to developing an effective performance rating system is evaluating what performance data ultimately determines race results in any particular series.

      As far as correlations are concerned, I’ve run the correlation analysis posted on this site to evaluate how effective the BPR POWER rating is at predicting future results. In arriving at the current BPR formula, I have spent a lot of time evaluating which metrics provide the strongest correlations. However, you’ll notice that the strength of the BPR’s correlations varies from race to race and from year to year, indicating the basic fact that the nature of the competition plays a significant role in determining correlation strength.

      In regard to the Reliability Rating, it’s probably the simplest part of the BPR formula: a straightforward ratio of laps completed to laps scheduled. The hard part is determining how the resulting ratio should be incorporated into the overall performance rating formula. The question you should be asking yourself is: how important is reliability in the series being evaluated?
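
      As a minimal sketch, with purely illustrative lap counts:

```python
def reliability_rating(laps_completed, laps_scheduled):
    """The Reliability Rating: laps completed as a fraction of laps scheduled."""
    return laps_completed / laps_scheduled

# e.g., a driver completing 49 of 58 scheduled laps
print(reliability_rating(49, 58))  # ~0.845
```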

      Hope some of that helps! If you’re interested in NASCAR performance ratings, I did a quick Google search and found that NASCAR actually produces its own ratings. There’s not much information provided regarding the composition of their formula, but it does appear they use a good amount of data. You may want to run a correlation analysis on their formula to get an idea of a ‘baseline’ correlation in NASCAR.

  2. Cliff

    Thanks for the reply. I use the NASCAR-published Driver Rating, and it includes most of what you use, but correlations are only about 0.50. The equivalent of reliability rating (laps completed/laps raced) for the year end is very closely correlated with Driver Rating, and therefore doesn’t seem to add much information. I am in the process of looking at it week to week, using both YTD reliability and a 4-week average.

    DNFs occur about 15% of the time, on average. If you look at the higher-rated drivers, their points-scored distribution is bimodal: it looks like two normal distributions, with and without a DNF. If I throw out the DNFs, points predictions are much more accurate (duh). My thought is to weight the predicted points scored, assuming no DNF, by the reliability; I can also add (1 − reliability) × average DNF points. Does this make sense to you? For simple DNFs, it doesn’t work any better than ignoring DNFs, but it could be different if I use percent of laps completed.
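
    In rough code form, the weighting I have in mind looks something like this (all numbers are just placeholders):

```python
def expected_points(reliability, points_if_finish, avg_dnf_points):
    """Blend finish-case and DNF-case points by the reliability figure."""
    return reliability * points_if_finish + (1 - reliability) * avg_dnf_points

# e.g., an 85% finishing rate, ~40 points when finishing, ~10 on a DNF
print(expected_points(0.85, 40.0, 10.0))  # 35.5
```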

    In process. Thanks for any ideas.

    • So just to make sure I understand: you are trying to create a points-prediction system using the existing NASCAR rating system, and part of that system will incorporate a driver’s reliability to discount his chances of scoring points?

      If that is the case, I can certainly understand the bimodal nature of the distributions with and without DNFs; as you’ll read in the explanation of the BPR, I ‘throw out’ DNFs for that very reason and use a system of accounting for laps completed. If prediction is indeed your goal, I would certainly incorporate a reliability system of some sort in conjunction with a regression model to arrive at your predicted results.

      One other piece of advice I can give you is to account for what you are trying to predict. I am not well-versed in the NASCAR points-scoring system, but even for the comparably basic F1 points system one must remember that points are a man-made fiction attached to what actually occurs in the real world. It is for this reason that I correlate the BPR POWER rating and its associated rankings to finishing position, not points scored. If you are truly interested in predicting points scored, you have to account for the premiums NASCAR places on certain performance goals vis-à-vis whether those goals ultimately show up or matter on-track; e.g., issuing points for laps led.

      Does that make sense?

  3. Cliff

    Yes, it all makes sense to me. I tried to incorporate the reliability (laps completed percentage), and had little luck. I did the expected value and got about the same result as fitting the entire database with DNFs included. The best option for reliability appears to be to fit points to reliability and use that as an additional estimate for points scored. Reliability is not highly correlated (~0.25) to points scored for the top 35 drivers, so it is of limited utility.

    The bonus system in NASCAR is equivalent to one position or so, and I don’t believe it upsets the predictions to include it. I wanted a measure that would show the differences between driver ratings in finer detail than just finishing position, since some drivers are only very slightly better than other drivers. In past years, I used finishing position, and the results of the correlation analysis were about the same.

    Thanks for your comments. I’m trying Neural Nets and Genetic Algorithms for now. Just for fun!!

    Cliff

    • Sounds interesting to me and it appears that you’re definitely on the right track. Do you post your results anywhere? I’d be interested to see how your system would apply in the world of NASCAR. Good luck with the genetic algorithms, you’re certainly more mathematically-minded than I am!
