Tamela Maciel

Logo

PhD astrophysicist. Data Scientist.
Space Communications Manager.
Physics editor. Science writer.

View My LinkedIn Profile

Find Me On Twitter

View My GitHub Profile

Clusters of Toronto Neighbourhoods
What are the characteristics of different neighbourhoods in the city of Toronto? Where are the cafes and fast food venues compared to the hotels and restaurants?

By Tamela Maciel, June 2020

ABOUT:

This jupyter notebook completes the first part of my IBM Data Science Professional Certificate through Coursera.

It uses the Foursquare API, BeautifulSoup for webscraping, and various python libraries to gather location data for the city of Toronto and compares and clusters various neighbourhoods using K-Means.

alt text

Methodology

First, the key geographic data was scrapped from a wikipedia table of Toronto boroughs, using BeautifulSoup, and from a csv file of postal codes, latitudes, and longitudes.

The top venues within 500 metres of each area lat/long were gathered using the Foursquare API. In order to build a full profile of each area, I dropped those areas that returned less than 10 venues, as these would be biaised heavily to the small number of venues within.

This left 47 areas to cluster. A K-means algorithm was used to cluster these areas based on the top 10 types of venue per each area. In order to select the best number of clusters, K, I iterated the clustering between 1-10 and selected the value K=3 according to the measure of inertia and the ‘elbow method’.

Python libraries

Pandas, Numpy, BeautifulSoup
KMeans from Sklearn.cluster
Folium, Matplotlib

The result

There are three natural clusters of different types of neighbourhoods. This is determined using the ‘elbow method’.

These can be profiled as follows by considering the types of common venues and their geographic location:

These three clusters are mapped using folium and coloured according to their cluster label below:

alt text

For more details and the jupyter notebook, see the GitHub repository.