OBJECTIVEPredict accurately the winner of the LVI Super Bowl (Bengals vs Rams).
This will be done through a statistical analysis of the regular season to come up with a forecast of “ratings” in order to give value to all the teams. After the analysis is done we will use Solver in order to accurately depict team ratings and the values that the Bengals and Rams possess, allowing us to determine who should win based on the performance of the regular season. The study will be done through Microsoft Excel.
We will being by importing the NFL weekly scores for the 2021 season. In order to do so we will acquire the information from the website Pro-Football-Reference, unfortunately since the csv version of the data is not Excel-friendly, I’ll use python code to screen scrape the website and get all the datapoints that provide value to our analysis (Its already provided in tables).
The preeminent website detailing the full statistical history of the NFL (including the APFA, AAFC and AFL). Pages for every season, team, player, coach and draft. Line scores for every game detailing every score in history, box scores since 1960 and play-by-play since 1994. Updated the morning after every NFL game.
The Python Code:
Now that we have created our csv file on a data analysis format (See screenshot below). We will use PowerQuery in order to format the text into a table-like structure for an easier data analysis.
After doing the PowerQuery steps, the final tables should look like this: By using a Pivot Table, we will divide the information in week, day, date, time, winner/tie, a column to determine the home/away team, loser/tie, points for winners, points for loser, yards for winers and loser and so on.
After this we will use the function IF (Selecting the in order to create our own table with the home team, visitor team, home points, visitor points).
Once we have the raw information of the teams, we’ll create a couple tables… The first one to create “Ratings” for the team, as this is the way we will determine the value of each team after the regular season and playoffs. We will use a couple of cells to give a “Home-Edge” value (As this year’s Super Bowl is being played at one of the team’s home field.
The most important table will be the one located in the middle, in which we will have: Home,Visitor,Hpts,Vpts, HomeMargin(Difference in points after game), a forecast and the Squared Error.
The Forecast formula will be the following: a Forecast (=HomeEdge+VLOOKUP(D7,TeamRatings,2,FALSE)-VLOOKUP(E7,TeamRatings,2,FALSE)
After creating the forecast, we will use the Solver function in order to determine every team’s ratings, this way we will determine the winner of the Super Bowl (The team with the best rating between the Bengals and the Rams). Solver will also be used to determine the HomeEdge, this is the solver formula:
Now that we’ve run the Solver formula, the ratings are the following:
The highest ratings are colored in green, meaning that the Bills, Cowboys, Buccaneers and Chiefs should’ve made the playoffs semifinals… But football is not based on math 100% (Which is not concerning).
As we know, the Los Angeles Rams will take on the Cincinnati Bengals in the LVI Super Bowl final on Sunday, February 13th.
As we can observe from the ratings, between the two teams….