Web scraping is a technique used to harvest and extract data from websites. For this exercise I use the tabulated information retrieved from the Newzoo.com web page here. The goal is to find, from a list of 20 ranked games, the top 3 publishers with the most game titles on that list. The restructured data is then charted for presentation. At the bottom of this page is the complete Jupyter Notebook result.
As it is a simple exercise, the task involves just a few steps, from data wrangling to visualization:
- Extract the table from a URL link.
- Examine the data type and the number of entries in the set.
- Clean up and recompile the data, replacing the unreadable entries.
- Extract a list of unique publishers.
- Group game titles by publisher and count the number of titles allocated to each publisher.
- Create separate lists for data and labels.
- Create the bar and pie charts with the new lists.
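
The cleaning, grouping and list-building steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual code. The rows are made-up placeholders standing in for the scraped table (they are not real Newzoo data), and the names `rows`, `clean_publisher`, `title_counts`, `labels` and `counts` are hypothetical. In a real run the table would more likely be loaded from the URL first, for example with pandas `read_html`.

```python
from collections import Counter

# Hypothetical placeholder rows standing in for the scraped table.
# These are NOT the real Newzoo rankings.
rows = [
    {"Rank": 1, "Game": "Game A", "Publisher": "Publisher X"},
    {"Rank": 2, "Game": "Game B", "Publisher": "Publisher Y "},  # stray space
    {"Rank": 3, "Game": "Game C", "Publisher": "Publisher X"},
    {"Rank": 4, "Game": "Game D", "Publisher": None},            # unreadable entry
    {"Rank": 5, "Game": "Game E", "Publisher": "Publisher Z"},
    {"Rank": 6, "Game": "Game F", "Publisher": "Publisher Y"},
    {"Rank": 7, "Game": "Game G", "Publisher": "Publisher X"},
    {"Rank": 8, "Game": "Game H", "Publisher": "Publisher Z"},
]


def clean_publisher(value):
    """Strip whitespace and replace missing or blank entries with 'Unknown'."""
    if value is None or not str(value).strip():
        return "Unknown"
    return str(value).strip()


# Clean up and recompile the data.
cleaned = [{**row, "Publisher": clean_publisher(row["Publisher"])} for row in rows]

# Extract a list of unique publishers.
unique_publishers = sorted({row["Publisher"] for row in cleaned})

# Group titles by publisher and count how many titles each one has.
title_counts = Counter(row["Publisher"] for row in cleaned)

# Keep the three publishers with the most titles.
top3 = title_counts.most_common(3)

# Separate lists for labels and data, ready for charting.
labels = [publisher for publisher, _ in top3]
counts = [count for _, count in top3]
```

With matplotlib, the two lists could then be passed to `plt.bar(labels, counts)` for the bar chart and `plt.pie(counts, labels=labels)` for the pie chart.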