K-means Clustering From Scratch In Python [Machine Learning Tutorial]
Dataquest

Published on Jul 11, 2022

In this project, we'll build a k-means clustering algorithm from scratch. Clustering is an unsupervised machine learning technique that can find patterns in your data. K-means is one of the most popular forms of clustering.

We'll write the algorithm in Python with pandas, then compare its results to the reference implementation in scikit-learn.

You can find the full project code here: https://github.com/dataquestio/projec...

You can download the data here: https://www.kaggle.com/datasets/stefa...

Project Steps
- Write out pseudocode for the algorithm
- Code the k-means algorithm
- Plot the clusters from the algorithm
- Compare performance to the scikit-learn algorithm
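
As a rough illustration of the coding step, here is a minimal sketch of k-means in Python with pandas and NumPy. It is not the code from the video: the function names (get_labels, new_centroids, k_means), the column names, and the synthetic two-blob data are all assumptions made for this example, and it skips the data cleaning, scaling, and plotting that the project covers.

```python
import numpy as np
import pandas as pd


def get_labels(data: pd.DataFrame, centroids: pd.DataFrame) -> pd.Series:
    """Assign each row to the index of its nearest centroid (Euclidean distance)."""
    # Shape (n_rows, k, n_features): every row minus every centroid.
    diffs = data.to_numpy()[:, None, :] - centroids.to_numpy()[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))
    return pd.Series(distances.argmin(axis=1), index=data.index)


def new_centroids(data: pd.DataFrame, labels: pd.Series, k: int,
                  old: pd.DataFrame) -> pd.DataFrame:
    """Move each centroid to the mean of the rows currently assigned to it."""
    means = data.groupby(labels).mean()
    # A cluster with no rows gets NaN here; keep its previous centroid instead.
    return means.reindex(range(k)).fillna(old)


def k_means(data: pd.DataFrame, k: int, max_iterations: int = 100, seed: int = 0):
    """Run k-means until the centroids stop moving or max_iterations is reached."""
    # Initialise the centroids as k randomly chosen rows of the data.
    centroids = data.sample(n=k, random_state=seed).reset_index(drop=True)
    for _ in range(max_iterations):
        labels = get_labels(data, centroids)
        updated = new_centroids(data, labels, k, centroids)
        if np.allclose(updated.to_numpy(), centroids.to_numpy()):
            break
        centroids = updated
    return get_labels(data, centroids), centroids


# Synthetic stand-in data: two tight, well-separated blobs of 20 rows each.
# The column names are hypothetical and not taken from the FIFA dataset.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(1.0, 0.3, size=(20, 2)),
    rng.normal(8.0, 0.3, size=(20, 2)),
])
players = pd.DataFrame(points, columns=["feature_a", "feature_b"])

labels, centroids = k_means(players, k=2)
print(centroids)
```

The sketch keeps one centroid per row so that the distance calculation can be done with a single NumPy broadcast. Cluster numbers are arbitrary, so which blob ends up as cluster 0 depends on the random initialisation.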

Chapters

00:00 Intro
00:37 k-means overview
02:51 Loading in and cleaning FIFA data
06:11 Scaling the data
10:31 Initialize random centroids
14:20 Finding cluster labels for each data point
19:29 Update centroid values
23:30 Plotting k-means iterations
28:24 Pulling the algorithm together
35:25 Comparing our implementation to scikit-learn
37:56 Conclusion and next steps
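
For the comparison chapter, one possible way to check a hand-written implementation against scikit-learn is sketched below. It assumes scikit-learn is installed, uses synthetic data rather than the FIFA dataset, and uses a placeholder array (own_labels) where the labels from your own implementation would go. Because cluster numbers are arbitrary, the labels are compared with the adjusted Rand index rather than element by element.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in data: two tight, well-separated blobs of 20 rows each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(1.0, 0.3, size=(20, 2)),
    rng.normal(8.0, 0.3, size=(20, 2)),
])

# Placeholder for the labels produced by a from-scratch implementation.
own_labels = np.repeat([0, 1], 20)

# Reference implementation from scikit-learn.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
sk_labels = model.fit_predict(X)

# The adjusted Rand index is 1.0 when two labelings describe the same grouping,
# even if the cluster numbers are swapped.
agreement = adjusted_rand_score(own_labels, sk_labels)

print(model.cluster_centers_)  # one row per cluster centre
print(model.inertia_)          # within-cluster sum of squared distances
print(agreement)
```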

------------------------------
Join 1M+ Dataquest learners today!
Master data skills and change your life.
Sign up for free: https://bit.ly/3O8MDef
