In our chat with Jason, we discuss how he thought of starting his blog, different sports he’s investigated & suggestions on how to get started in sport analytics. Check it out!
Tell us about your background
My name is Jason Zivcovic and I’m a Data Scientist at the Reece Group. I came to the role on a really winding road. I didn’t want to go to university after high school, so I went and completed a trade (Landscape Gardening). I then realised I didn’t want to be a human bobcat, so I decided to go back and study, completing a Bachelor of Commerce and Finance.
I’d spent a bit of my early career in process improvement and had done my Lean and Six Sigma training, so statistics played a role there, but I never had the programming experience (save for some VBA code). That role led me to an operations role, and then finally into the analytics space, where I now am and will be for the long haul.
How did you get into data analysis in sport?
Growing up as a sports-mad child and teen, statistics were always in my world. But with the love of sport blinding me, I didn’t even realise it was a love of statistics and data. Once I’d decided that an analytics career was for me, I discovered edX and DataCamp and the programming-focused data analysis short courses they had available, and then applied what I learned to sports datasets.
How did “Don’t Blame the Data” come about?
The blog was really just an itch I wanted to scratch, for my own pleasure initially (and it probably still is, to be honest). I code a lot in R, so I thought I’d take advantage of the amazing blogdown package, which makes building a blog site relatively easy.
The first post I wrote was on Medium, just to dip my toes in. It was a satirical look at the 2019 Australian federal election, which went fairly well in terms of readers, so it gave me the confidence to start putting things down. Not really caring much for politics, I decided from there that my own blog site would predominantly focus on sports, but not just the on-field stuff.
The first post on Don’t Blame the Data was on “Bandwagon” jumpers in the AFL and tried to measure which sets of supporters are there only for the good times. It performed beyond my expectations, so it really spurred me on to continue.
I’ve since had my first child, so the posts have slowed down. We’re getting to a comfortable stage now, so my goals for Don’t Blame the Data are to start ramping up the content and homing in on specific sports, rather than focusing the blog on the more programming-heavy elements.
What areas of sport do you look into?
Sports (or leagues) of interest to me are the AFL, basketball (NBA mainly) and the NFL, and I will also look to incorporate the world game into my analyses. As I mentioned earlier, it’s not only the on-field elements I like to analyse. I love to look into the business side as well; crowd attendances, for example, are a real interest of mine.
What tools and modelling/analysis methods have you looked into?
I predominantly conduct and write my analyses using the R programming language. A lot of my analyses use regression for both explanation and prediction, and I’ve also introduced some unsupervised clustering techniques on the blog in an attempt to find similar FIFA 19 players!
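For readers who have not seen regression in action, here is a minimal sketch of the idea, fitting a straight line that relates a team's recent form to its next home crowd. It is not taken from Jason's posts: he works in R, whereas this uses only Python's standard library, and the numbers, variable names and the `fit_line` and `predict` helpers are all made up purely for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-style and variance-style sums (the 1/n factors cancel out).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up toy numbers, NOT real data: wins in a team's previous five games
# and the crowd (in thousands) at its next home game.
recent_wins = [0, 1, 2, 3, 4, 5]
crowd_thousands = [20, 24, 28, 32, 36, 40]

slope, intercept = fit_line(recent_wins, crowd_thousands)


def predict(wins):
    """Predicted crowd (in thousands) for a given number of recent wins."""
    return intercept + slope * wins


print(f"Each extra recent win is associated with {slope:.1f}k more fans")
print(f"Predicted crowd after 3 wins: {predict(3):.1f}k")
```

The slope is the "explanation" half (how much the crowd moves with each extra win) and `predict` is the "prediction" half. Real attendance data would be far noisier and would need more predictors than form alone.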
Any cool visualisations for us less technical folk?
Animations can be a really cool visualisation technique but the number of animated bar charts has gotten out of control. Sometimes the aesthetics of data visualisation seem to trump the explainability of data viz and I find it really tough to stomach!
In terms of who does some amazing visualisations, this website is one of the better ones out there. The visuals are beautiful and really aid the story that’s being told. From a sports perspective, FiveThirtyEight is hard to go past.
Where would you love to take the site in twelve months’ time?
I’d love for my posts to be updated regularly and for the site to be a regular go-to for readers. That’s all conditional on professional sport making its way back after the devastation of the coronavirus outbreak has subsided.
How do you think Sport Tech in Australia compares with the rest of the world?
I think we’ve got a long way to go. If we look at the power Champion Data hold (which by the way is probably fair enough as they’ve put in the time and effort to build their product to be able to monetise it), the analytics community is fairly limited in the work that can be done.
If we look at the NFL as one overseas example, granular data is made public through the popular data science website Kaggle in the league’s annual Big Data Bowl, and the public turns that data into suggestions for the league’s benefit (a number of rule changes have resulted). Not only does it democratise the data, but the leagues themselves get a massive benefit from leveraging the crowd. Sadly, commercial realities make that a long-term dream of mine here.
What advice do you have for the couch fan who wants to start playing around with some data?
Just jump in! There are datasets everywhere you look. Kaggle might be a good start, as it has not only datasets but also public notebooks containing analysis others have done, so you can follow their code. There are also some great packages in R (fitzRoy for AFL data and nflscrapR, to name a couple) that make getting data easy. From there, just start playing around and see what you can find!
Which sporting industry problem would you love to solve using data analysis?
Well, in this sobering time going through the coronavirus pandemic, it’s really hard to say. But I think coming out of it, it would be great to see the analytics community focus its efforts on helping leagues and teams rebound as quickly as possible to where they were before the world changed, both on and off the field.