In our interview with Wade Hobbs we cover all things sports analytics including the rising importance of data analytics in the roles of a performance analyst or sports scientist, the impact of coronavirus on the industry and tips & tricks to improve your data visualisation skills!
Tell us about your background
I completed a Sport and Exercise Science degree in 2012 and landed a Postgraduate Scholar position at the Australian Institute of Sport (AIS) in 2014. After a year working in the performance analysis team at the AIS, I began a PhD with the AIS and Basketball Australia investigating unpredictability in basketball. I finished the PhD in mid-2019 and began working for Triathlon Australia in September 2019.
Talk us through your area of focus in your PhD and work with Triathlon Australia
The focus of my PhD was understanding unpredictability in basketball offences. The coach of the Australian Opals at the time, Brendan Joyce, wanted to play in a way that was difficult to scout. This meant handing over control and decision making to the players in a ‘motion’ style offence. This change in the style of play should have made the team's offence unpredictable; however, there was no way of evaluating this or of understanding whether it led to better outcomes for the team.
Through a series of four studies, I first established a method of quantifying unpredictability across the court. I then created a measure of performance, termed spatial effectiveness, that quantified the probability of scoring across the court; tested whether there was a relationship between unpredictability and effectiveness; and finally examined whether there was a relationship between shot location and the unpredictability of ball movements in the 5-second window leading up to the shot. This type of spatio-temporal analysis required a significant up-skill in my analysis abilities, so I learnt to code in R and have been developing those skills ever since.
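To make the two core quantities a little more concrete, here is a minimal Python sketch. It is not the code from the thesis (that work was done in R), and several pieces are assumptions: shots are logged as hypothetical `(zone, made)` pairs, ball movements as `(time_in_seconds, zone)` pairs, spatial effectiveness is taken to be the share of shots made in each zone, and Shannon entropy of the zones the ball visits in the 5 seconds before a shot stands in as an assumed measure of unpredictability.

```python
import math
from collections import Counter, defaultdict


def spatial_effectiveness(shots):
    """Empirical probability of scoring per court zone.

    `shots` is a list of (zone, made) pairs; zone labels are
    hypothetical, not the court grid used in the PhD.
    """
    attempts = defaultdict(int)
    makes = defaultdict(int)
    for zone, made in shots:
        attempts[zone] += 1
        makes[zone] += int(made)
    return {zone: makes[zone] / attempts[zone] for zone in attempts}


def zones_before_shot(events, shot_time, window=5.0):
    """Zones the ball visited in the `window` seconds up to the shot."""
    return [zone for t, zone in events if shot_time - window <= t <= shot_time]


def shannon_entropy(zones):
    """Shannon entropy (bits) of a sequence of zones.

    Assumed stand-in for 'unpredictability': higher values mean the
    ball is spread more evenly across zones, so it is harder to scout.
    """
    counts = Counter(zones)
    total = len(zones)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


shots = [("paint", True), ("paint", False), ("corner3", True),
         ("corner3", False), ("corner3", False)]
print(spatial_effectiveness(shots))  # {'paint': 0.5, 'corner3': 0.3333333333333333}

events = [(0.0, "backcourt"), (3.0, "top"), (6.5, "wing"), (8.0, "corner")]
window = zones_before_shot(events, shot_time=8.0)
print(window)                              # ['top', 'wing', 'corner']
print(round(shannon_entropy(window), 3))   # 1.585
```

As a rough guide to the scale: a possession where the ball visits `top`, `wing`, `corner`, `top` scores 1.5 bits, while one that never leaves the top of the key scores 0, meaning it is perfectly predictable.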
The skills and experience I gained during my PhD set me up well for a role in sport as a data analyst, and fortunately, not long after submitting my thesis, I received a call from Triathlon Australia, who were looking for a data analyst. I have been with Triathlon Australia for about 10 months and have worked across a number of areas, including performance modelling, pathways, talent identification, and athlete training and health.
In the decade between when you started your Sport Science Degree in 2009 and finishing your PhD in sports analytics in 2019, what were some of the key changes you noticed in the industry of sports science and performance analysis?
It’s an interesting time to be in sports science in 2020 in Australia. When I started my degree, the established areas of physiology, biomechanics, skill acquisition and performance analysis each appeared equally viable to pursue as a career. In fact, I initially applied for the AIS postgraduate scholar program in 2013 and interviewed for positions in biomechanics, skill acquisition and performance analysis (I didn’t get any of them).
In 2020 a performance analyst without programming skills probably won’t have a long career. Biomechanics will likely be affected by more and better automation of motion analysis technology, shifting the day-to-day work of those professionals. Skill acquisition is slowly becoming more widespread in Australian sport, and there will always be a place for physiologists. However, it is clear that programming skills are becoming more ubiquitous among sports scientists, and in certain areas they are a prerequisite.
In short, it’s a familiar story: more data is being collected, new data sources are emerging, and the skills needed to handle the data are in strong demand. The flip side is that some skills that were previously in demand won’t be in future, such as Sportscode in performance analysis or Vicon processing in biomechanics.
Some of the projects you’ve worked on for your personal site seem really interesting! Can you talk us through one of them?
It’s been a while since I posted a new project on the website, but the most recent one was an exploration of tree-based machine learning techniques to predict tennis match outcomes. I wanted to learn how to use these types of predictive models, so I decided to teach myself and then share what I learnt in a blog post. Essentially, tree-based models take in the data (in this case a set of tennis match box score stats and the outcome of each match) and find the best splits in the various predictors (the box score stats) to predict the winner of future matches.
For example, the decision tree first split the data by break points converted: players who converted 3 or more break points won 78% of the time, compared with those who converted fewer than 3. The model is trained on a subset of the data so it can learn where to make the splits and what is most predictive of a win; it is then tested on unseen data to evaluate how accurately it predicts match outcomes. This means you can build various models with different settings and find which performs best. There are also ways to automatically search for the model configuration that gives the best prediction accuracy.
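A single split like the break-points example can be sketched in a few lines of Python. This is a hedged illustration, not the model from the blog post: it fits a one-split "decision stump" by choosing the threshold that minimises weighted Gini impurity on training matches, then measures accuracy on held-out matches. The `(break_points_converted, won)` records are made-up toy data, and Gini impurity is assumed as the split criterion (a common default in tree libraries, but the post's exact settings aren't stated here).

```python
from typing import List, Tuple

# Each record is (break_points_converted, won_match).
# Toy numbers for illustration only, not real tennis data or results.
Match = Tuple[int, bool]
Stump = Tuple[int, bool, bool]  # (threshold, label if x < t, label if x >= t)


def gini(labels: List[bool]) -> float:
    """Gini impurity of win/loss labels: 0 means every label is the same."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def majority(labels: List[bool]) -> bool:
    """Most common label; ties and empty lists default to a loss (False)."""
    return sum(labels) * 2 > len(labels)


def fit_stump(train: List[Match]) -> Stump:
    """Pick the threshold t minimising weighted Gini impurity of x < t vs x >= t."""
    best = None
    for t in sorted({x for x, _ in train}):
        left = [won for x, won in train if x < t]
        right = [won for x, won in train if x >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(train)
        if best is None or score < best[0]:
            best = (score, t, majority(left), majority(right))
    _, t, left_label, right_label = best
    return t, left_label, right_label


def predict(stump: Stump, x: int) -> bool:
    """Predict a win or loss from the number of break points converted."""
    t, left_label, right_label = stump
    return right_label if x >= t else left_label


def accuracy(stump: Stump, test: List[Match]) -> float:
    """Share of unseen matches whose outcome the stump predicts correctly."""
    return sum(predict(stump, x) == won for x, won in test) / len(test)


train = [(0, False), (1, False), (2, False), (2, True),
         (3, True), (4, True), (5, True), (1, True)]
test = [(0, False), (3, True), (4, False), (2, False)]

stump = fit_stump(train)
print(stump)                  # (3, False, True): predict a win at 3+ break points
print(accuracy(stump, test))  # 0.75
```

Full tree libraries repeat this search recursively across every predictor. Automatically finding the best configuration usually means cross-validated tuning of settings such as tree depth, for example with scikit-learn's `GridSearchCV` in Python or the tidymodels `tune` package in R.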
Any interesting visualisations for us less technical folk?
I posted it a while ago but I still like the visualisation of a rowing race I made at the start of this post. It is a gif showing the progression of the rowers in the race. It’s very simple but quickly conveys what happened in a race. It could be used in any race scenario and I’ve seen similar (better) visualisations used to summarise F1 races.
Which tools and/or methodologies have you mostly used for data analysis and modelling?
I almost exclusively use R for analysis and modelling. It’s a powerful tool built specifically for this purpose, and with the expansion of the ‘tidyverse’ (a set of packages that share a common language and can be used in combination), the learning curve is less steep than when I first started. I have used Python out of necessity but found it more difficult to use. Excel is OK if it’s all you know, but it can introduce errors and doesn’t scale well if you’re dealing with larger data sets or doing more complex analyses.
As far as methodologies go, it’s a familiar story when talking to people working with data: the bulk of the work is cleaning and manipulating data, and very little is fancy machine learning or predictive modelling. Often visualising the data is as far as the analysis needs to go, and only on rare occasions is more complex modelling useful (in my role at least).
Coronavirus has had a significant impact on the sporting industry, including the reduction in budgets and headcount for support staff. What are a couple of the key areas professionals in sports science and performance analysis can improve to make sure their services are still in demand?
I think anyone in performance analysis who is not learning to program in R or Python will be left behind. Broadly the role of a performance analyst is changing from primarily video analysis to data analysis. There will always be a need for video analysis and Sportscode is a very useful tool, but more and more analysis of performance involves analysing data and to do that well requires skills in programming. Therefore, in my opinion someone interested in performance analysis would be better served doing a degree in statistics or data science rather than sports science these days.
More specifically, simply understanding how to visualise data is a highly valuable skill, and I don’t necessarily mean knowing how to code a plot in R. Learning the general conventions and understanding, for example, what type of plot to use for the data you have and the message you are trying to deliver is highly impactful. This online textbook is a good resource to understand the theory of data visualisation. A recent blog post by Alice Sweeting nicely presented her thoughts on data visualisation in sports science.
If you could use your skillset and experience to work with any sporting team, who would you work with and why?
The sports I follow closely are NFL, NBA and cycling. I would quite happily work for one of the more analytics-driven NFL teams (Eagles, Ravens, etc.), NBA teams (Rockets, 76ers, Magic, etc.) or a World Tour cycling team. In saying that, Triathlon Australia is a great organisation, with a great high-performance team committed to data-supported decision making. I am lucky to have fallen into my current role, so it would take something pretty special to drag me away. Interestingly, there are still plenty of organisations in the NFL and NBA that do not embrace analytics and might have one or two token analysts, so working for a sport that is genuinely committed to using data is very rewarding.