Hello, and welcome to another data science project: How To Scrape Data From Understat.com?
I hope you all are cool 🙂
Today we will do an introductory study on football analytics and scrape our own data from understat.com. Many companies, such as StatsBomb, Wyscout and Opta, already collect the data generated in football matches. But having the freedom to scrape our own data lets us move faster in the studies we want to do: depending on our analytical approach, we can get exactly the data we want from the site we want. The technique we use for this is web scraping! We will use the BeautifulSoup and Requests libraries. Without further ado, let's move on to our study.
(1) Installing the libraries with pip
(2) Importing the libraries
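The screenshot's exact contents are not reproduced here, but the imports for this walkthrough probably look something like the sketch below. Including `json` and `pandas` is an assumption based on the later JSON and dataframe steps. The packages can be installed with `pip install requests beautifulsoup4` (plus `pandas` for the dataframe step).

```python
# Libraries used throughout this walkthrough.
# Install them first if needed: pip install requests beautifulsoup4 pandas
import json  # standard library: turns the embedded JSON text into Python objects

import pandas as pd  # tabular storage for the scraped shots
import requests  # sends the HTTP GET request for the match page
from bs4 import BeautifulSoup  # parses the returned HTML
```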
OK, now we assign the URL of the website we will get our data from to a variable and define the match ID variable. I'll explain the match ID in a moment.
(3) Assigning the URLs
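Assigning the URL and match ID might look like the minimal sketch below. The match ID is a hypothetical placeholder, not the ID of the Manchester City – Tottenham match; swap in the number from the match page you want.

```python
# Understat match pages live under this prefix; the match ID is appended to it.
base_url = "https://understat.com/match/"

# Hypothetical placeholder: replace it with the number at the end of the
# match URL you want to scrape.
match_id = "12345"

# Full address of the match page.
url = base_url + match_id
print(url)  # prints https://understat.com/match/12345
```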
The match ID, which we saw in the image above, is the number at the end of the URL when we open the relevant match page, in this case Manchester City – Tottenham.
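If you would rather paste the whole match URL than copy the number by hand, a small helper can pull the ID out of it. `match_id_from_url` is a hypothetical name, and the sketch assumes the address has the form `https://understat.com/match/<id>`, optionally followed by a slash or query string.

```python
from urllib.parse import urlparse


def match_id_from_url(match_url: str) -> str:
    """Return the trailing numeric ID of an Understat match URL (hypothetical helper)."""
    # urlparse separates the path from any query string or fragment.
    path = urlparse(match_url).path.rstrip("/")  # e.g. "/match/12345"
    segment = path.rsplit("/", 1)[-1]  # last path segment, e.g. "12345"
    if not segment.isdigit():
        raise ValueError(f"No numeric match ID found in {match_url!r}")
    return segment


print(match_id_from_url("https://understat.com/match/12345"))  # prints 12345
```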
We got the match ID from the URL. Next, we download the page for that match ID with the requests.get method and then parse its HTML elements with BeautifulSoup.
(4) Parsing the HTML
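A sketch of this step, under a few assumptions: `fetch_match_soup` and `parse_html` are hypothetical helper names, the timeout and `raise_for_status()` check are additions for robustness, and Python's built-in `"html.parser"` is used so no extra parser has to be installed (`lxml` would also work if available). Only `fetch_match_soup` touches the network.

```python
import requests
from bs4 import BeautifulSoup


def fetch_match_soup(match_id: str, timeout: float = 10.0) -> BeautifulSoup:
    """Download an Understat match page and parse it (needs network access)."""
    url = "https://understat.com/match/" + match_id
    response = requests.get(url, timeout=timeout)
    # Stop early on a 404/500 instead of parsing an error page.
    response.raise_for_status()
    return parse_html(response.text)


def parse_html(html: str) -> BeautifulSoup:
    """Parse raw HTML with Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")
```

Keeping the parsing in its own function means the rest of the pipeline can be tried on a saved copy of the page without sending requests each time.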
We are only interested in the shots taken during the match, so the next step extracts just the shot data from the page.
(5) Getting the shots
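On Understat match pages, the shot data is embedded as JavaScript inside a `<script>` tag (in the form `var shotsData = JSON.parse('...')`) rather than in the visible HTML. Instead of picking that tag by its position on the page, the sketch below looks for the script that mentions `shotsData`. This assumes the variable is still named that way, and `shots_script_text` is a hypothetical name.

```python
from bs4 import BeautifulSoup


def shots_script_text(soup: BeautifulSoup) -> str:
    """Return the text of the first <script> tag that defines shotsData."""
    for script in soup.find_all("script"):
        text = script.string  # None for empty tags such as <script src="...">
        if text and "shotsData" in text:
            return str(text)
    raise ValueError("No <script> containing 'shotsData' found")
```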
Let's see what's inside the strings:
(6) Converting
It looks very chaotic. Now let's bring this structure into a more regular form.
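The text looks chaotic because the JSON is passed to `JSON.parse` as a string whose special characters are hex-escaped (for example `\x22` for a double quote). A minimal clean-up sketch, assuming that layout: cut out the text between `JSON.parse('` and `')`, undo the escapes, and load the result with `json.loads`. `decode_shots_json` is a hypothetical name.

```python
import json


def decode_shots_json(script_text: str) -> dict:
    """Turn "var shotsData = JSON.parse('...')" into a Python dict."""
    start = script_text.index("('") + 2  # first character after JSON.parse('
    end = script_text.index("')", start)  # closing quote of the argument
    escaped = script_text[start:end]  # still hex-escaped, e.g. \x7B for "{"
    # unicode_escape turns sequences such as \x22 back into '"'. This
    # round-trip is only reliable for an ASCII payload; names containing raw
    # non-ASCII characters may need extra handling.
    decoded = escaped.encode("utf-8").decode("unicode_escape")
    return json.loads(decoded)
```

On a real match page, the resulting dictionary is expected to hold the home and away shots as lists under the keys "h" and "a".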
(7) Building a dataframe from the JSON data
(8) The final dataframe
We transform the organized data into a dataframe. We also define the column names of the dataframe we have created.
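One way to do this with pandas, as a sketch: combine the home and away shot lists, keep a handful of columns, and convert the numeric ones, which arrive as strings. The column names are the keys we expect in each shot record (X and Y for the shot location, xG for the expected-goals value, h_a for the side); treat them as assumptions and check them against your own data. `shots_to_dataframe` is a hypothetical name.

```python
import pandas as pd

# Keys kept from each shot record (assumed from Understat's JSON, not a
# documented schema).
COLUMNS = ["minute", "player", "h_a", "X", "Y", "xG", "result"]


def shots_to_dataframe(data: dict) -> pd.DataFrame:
    """Flatten the home ("h") and away ("a") shot lists into one table."""
    shots = data.get("h", []) + data.get("a", [])
    # With a list of dicts, `columns` selects and orders the keys to keep.
    df = pd.DataFrame(shots, columns=COLUMNS)
    # Numbers come through as strings; convert them for analysis.
    for col in ["minute", "X", "Y", "xG"]:
        df[col] = pd.to_numeric(df[col])
    return df
```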
We have now put the shot data scattered across the web page into a table. The most important variable here is result. Its unique values describe the outcome of each shot and include a goal (Goal), a shot blocked by an opposing player (BlockedShot), a shot saved by the goalkeeper (SavedShot) and a shot leaving the field of play (MissedShots).
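With the table in place, the outcomes can be tallied per side. The sketch below applies `pd.crosstab` to a small made-up dataframe; it assumes the scraped table has an h_a column (h for the home team, a for the away team) next to result. `summarize_results` is a hypothetical name.

```python
import pandas as pd


def summarize_results(df: pd.DataFrame) -> pd.DataFrame:
    """Count shot outcomes per side: rows are h/a, columns are result values."""
    return pd.crosstab(df["h_a"], df["result"])


# Made-up rows, for illustration only.
shots = pd.DataFrame(
    {
        "h_a": ["h", "h", "a", "a", "h"],
        "result": ["Goal", "BlockedShot", "SavedShot", "MissedShots", "MissedShots"],
    }
)
print(summarize_results(shots))
```

Passing `margins=True` to `pd.crosstab` appends row and column totals.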
Web scraping is a very useful method when we can't get a dataset from organizations such as StatsBomb that hold the data of sports competitions, or when we want to take a different analytical approach. In this study, we saw how simply football data can be obtained from a web page. In our next studies, we will focus on more comprehensive topics in football analytics. We have come to the end of our article; I hope it was a pleasant read for you.