Sourcing Sports Data Sets

Before you’re able to test any of your sports theories or hypotheses, you will need data.

The collection of sports data is a rapidly growing industry, driven by huge commercial demand for sports statistics. For years, betting companies were the only ones collecting sports-related data. Now media outlets, broadcasters, sports teams themselves, players’ agents and scouting divisions are all trying to obtain data and create data-driven insights.

For soccer, there are currently several companies that provide very detailed data sets for the most popular leagues. Opta Sports is one of the best-known companies and charges various fees for different levels of data. One of the main features of Opta Sports soccer data is that it includes full x,y pitch location co-ordinates for every on-the-ball event. Squawka and Deltatre are two other well-known companies that provide detailed sports data. Other companies operating within this space offer slightly different services, such as Stats Perform, which performs sophisticated analysis on these data sets and provides insights to over 1,800 global organisations.

There are also several websites providing free data sets for many of the leagues. Many of these websites started out by providing information to betting pundits, but are now providing a platform for many enthusiastic soccer fanatics to create personal blogs. The most popular websites are:

Additional sources

  • Kaggle : Programming website where users can access various data sets
  • GitHub : Another programming website that has various data sets available
  • Reddit : Many independent data collectors provide links to their data

Canadian Premier League

Canada reformatted its soccer league system in 2019. One of the new features of the Canadian Premier League (CPL), with help from its data partner Stats Perform, is that all league data is provided to the public for free. This initiative is intended to help develop interest in and knowledge of Canadian soccer. Oliver Gage, Head of On-Field Performance and Recruitment for the Canadian Premier League, explains: “The ability for Canadian fans, aspiring analysts, scouts and coaches to access this data on our league is an essential part of a wider strategy to promote this side of the game in Canada. Empowering a community with the ability to hone their analysis skills, will no doubt help our clubs, players and Canadian soccer as a whole in the long run. I look forward to seeing, questioning and promoting the articles and ideas which the release of this data will no-doubt encourage.”

This data set is a great place to start your analytics path. The data provides over 100 player-focused metrics. For outfield players, the stats go beyond basics such as minutes played and number of substitute appearances, and include figures such as passes attempted in the attacking third, aerial duel win percentage and fouls committed in the defending third. For goalkeepers, the data includes number of crosses caught, number of diving saves made and number of saves made with the body.

To obtain a copy of the data, visit the Centre Circle section of the CPL website. To increase awareness, the CPL is eager for people to share their data insights and findings on Twitter, using the official Centre Circle Data handle @canpldata and the hashtags #CCData and #CanPL.
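Once you have a copy of a player-level data set, a first analysis can be very small. The sketch below is a minimal illustration in Python, assuming the data has been saved as a CSV file. The column names (player, aerial_duels, aerial_duels_won) and the three sample rows are invented for this example and are not the CPL's actual field names or figures, so adjust them to match the file you download.

```python
import csv
import io

# Hypothetical sample in the shape a player-level export might take.
# Column names and values are invented for illustration only.
SAMPLE_CSV = """player,minutes_played,aerial_duels,aerial_duels_won
Player A,900,40,26
Player B,450,10,3
Player C,1200,0,0
"""


def load_rows(source):
    """Read CSV rows from any open text file object into a list of dicts."""
    return list(csv.DictReader(source))


def aerial_duel_win_pct(rows, min_duels=1):
    """Return {player: win percentage} for players with at least min_duels duels."""
    result = {}
    for row in rows:
        duels = int(row["aerial_duels"])
        if duels < min_duels:
            continue  # too few duels: skip, which also avoids dividing by zero
        won = int(row["aerial_duels_won"])
        result[row["player"]] = round(100 * won / duels, 1)
    return result


rows = load_rows(io.StringIO(SAMPLE_CSV))
win_pct = aerial_duel_win_pct(rows)

# Print players from highest to lowest win percentage.
for player, pct in sorted(win_pct.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{player}: {pct}%")
```

To run this against a real file, replace io.StringIO(SAMPLE_CSV) with an open file, for example open("players.csv", newline=""), where players.csv is a placeholder file name. Raising min_duels is a simple way to exclude players whose percentage rests on only a handful of duels.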

This article was written by David Martin. Check out his site ‘Actuaries FC’ where he’ll be writing more sports analytics content as well as writing with us.