Simple Recommendation System with Euclidean Distance
Today I want to build a simple recommendation system using the following ratings from a music streaming app:
| User | Cherrybelle | Kangen Band | Netral | PAS Band | SM*SH | The Rain | Ungu |
|---|---|---|---|---|---|---|---|
| Agus | 4.0 | 4.5 | 2.5 | | 3.5 | | 5.0 |
| Andi | | 2.0 | 5.0 | 4.5 | | | |
| Angga | | | | | | 4.5 | |
| Indah | | 3.5 | 4.5 | 5.0 | | | 4.0 |
| Siti | 4.0 | 4.0 | | 1.0 | 5.0 | | 3.5 |
| Solihah | 4.0 | 4.0 | | 1.0 | 5.0 | | 3.5 |
Notes:
- Some parts of the task were not practiced in the references. But Google is always ready to help when you run into difficulties and errors.
- A distance exists only if both users have rated at least one common musician. If user X shares no rated musician with any other user, X has no distance to anyone.
- If two or more users are at the same distance from a particular user, select the first one that appears.
- It’s okay if the recommendation is an empty list because the closest neighbor has rated only musicians the user has already rated.
- It’s okay if the recommendation is an empty list because the user has no distance to anyone.
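Notes 2 and 5 can be checked directly on the ratings before any distance is computed: a user has no distance to anyone exactly when they share no rated musician with any other user. A minimal sketch, using hypothetical helper names `shares_a_musician` and `users_without_distance` that are not part of the task:

```python
# Ratings from the table above: user -> {musician: rating}
users = {
    'Agus': {'Cherrybelle': 4.0, 'Kangen Band': 4.5, 'Netral': 2.5, 'SM*SH': 3.5, 'Ungu': 5.0},
    'Andi': {'Kangen Band': 2.0, 'Netral': 5.0, 'PAS Band': 4.5},
    'Angga': {'The Rain': 4.5},
    'Indah': {'Kangen Band': 3.5, 'Netral': 4.5, 'PAS Band': 5.0, 'Ungu': 4.0},
    'Siti': {'Cherrybelle': 4.0, 'Kangen Band': 4.0, 'PAS Band': 1.0, 'SM*SH': 5.0, 'Ungu': 3.5},
    'Solihah': {'Cherrybelle': 4.0, 'Kangen Band': 4.0, 'PAS Band': 1.0, 'SM*SH': 5.0, 'Ungu': 3.5},
}


def shares_a_musician(ratings_a, ratings_b):
    """Hypothetical helper (Note 2): True if both users rated at least one common musician."""
    # dict key views support set intersection
    return bool(ratings_a.keys() & ratings_b.keys())


def users_without_distance(data):
    """Hypothetical helper (Note 5): users who share no rated musician with anyone else."""
    return [
        name for name in data
        if not any(shares_a_musician(data[name], data[other])
                   for other in data if other != name)
    ]


print(users_without_distance(users))  # ['Angga']
```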
Euclidean Distance Formula:
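The formula appears to have been an image that did not survive. Restricted to the musicians both users have rated, which is what the `euclidean` function in Step 2 computes, it can be written as follows; $r_{x,m}$ (user $x$'s rating of musician $m$) and $M_{xy}$ (the set of musicians rated by both $x$ and $y$) are notation introduced here:

$$
d(x, y) = \sqrt{\sum_{m \in M_{xy}} \left( r_{x,m} - r_{y,m} \right)^2}
$$

If $M_{xy}$ is empty, no distance is defined (Note 2). For example, the only musicians rated by both Agus and Andi are Kangen Band and Netral:

$$
d(\text{Agus}, \text{Andi}) = \sqrt{(4.5 - 2.0)^2 + (2.5 - 5.0)^2} = \sqrt{6.25 + 6.25} = \sqrt{12.5} \approx 3.5355
$$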
Functional Programming
Step 1: Input User Data
```python
# Each user maps musician -> rating
users = {'Agus': {'Cherrybelle': 4.0,
                  'Kangen Band': 4.5,
                  'Netral': 2.5,
                  'SM*SH': 3.5,
                  'Ungu': 5.0},
         'Andi': {'Kangen Band': 2.0,
                  'Netral': 5.0,
                  'PAS Band': 4.5},
         'Angga': {'The Rain': 4.5},
         'Indah': {'Kangen Band': 3.5,
                   'Netral': 4.5,
                   'PAS Band': 5.0,
                   'Ungu': 4.0},
         'Siti': {'Cherrybelle': 4.0,
                  'Kangen Band': 4.0,
                  'PAS Band': 1.0,
                  'SM*SH': 5.0,
                  'Ungu': 3.5},
         'Solihah': {'Cherrybelle': 4.0,
                     'Kangen Band': 4.0,
                     'PAS Band': 1.0,
                     'SM*SH': 5.0,
                     'Ungu': 3.5}}
names = ['Agus', 'Andi', 'Angga', 'Indah', 'Siti', 'Solihah']
```
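As a quick sanity check, the dictionary can be turned back into the rating table, and `names` can be derived from it instead of being typed by hand, since dicts preserve insertion order from Python 3.7 on. `rating_table` is a hypothetical helper introduced only for this sketch:

```python
users = {
    'Agus': {'Cherrybelle': 4.0, 'Kangen Band': 4.5, 'Netral': 2.5, 'SM*SH': 3.5, 'Ungu': 5.0},
    'Andi': {'Kangen Band': 2.0, 'Netral': 5.0, 'PAS Band': 4.5},
    'Angga': {'The Rain': 4.5},
    'Indah': {'Kangen Band': 3.5, 'Netral': 4.5, 'PAS Band': 5.0, 'Ungu': 4.0},
    'Siti': {'Cherrybelle': 4.0, 'Kangen Band': 4.0, 'PAS Band': 1.0, 'SM*SH': 5.0, 'Ungu': 3.5},
    'Solihah': {'Cherrybelle': 4.0, 'Kangen Band': 4.0, 'PAS Band': 1.0, 'SM*SH': 5.0, 'Ungu': 3.5},
}

# Same order as the table, because dicts keep insertion order (Python 3.7+)
names = list(users)


def rating_table(data):
    """Hypothetical helper: return (musicians, rows); a missing rating is None."""
    musicians = sorted({m for ratings in data.values() for m in ratings})
    rows = [[name] + [data[name].get(m) for m in musicians] for name in data]
    return musicians, rows


musicians, rows = rating_table(users)
print(' | '.join(['User'] + musicians))
for row in rows:
    # Print an empty cell where the user has not rated the musician
    print(' | '.join('' if v is None else str(v) for v in row))
```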
Step 2: Euclidean Distance Code
```python
# Import Library
from math import sqrt

# Euclidean Distance
def euclidean(rating1, rating2):
    """Distance over the musicians both users rated; None if there are none (Note 2)."""
    distance = 0.0
    key_available = False
    for key in rating1.keys():
        if key in rating2.keys():
            key_available = True
            distance += (rating1[key] - rating2[key]) ** 2
    distance = sqrt(distance)
    if key_available:
        return distance
    return None

# Check Distance Between Users
print('Andi -> Agus:', euclidean(users['Agus'], users['Andi']))
print('Indah -> Agus:', euclidean(users['Agus'], users['Indah']))
print('Siti -> Agus:', euclidean(users['Agus'], users['Siti']))
print('Angga -> Agus:', euclidean(users['Agus'], users['Angga']))
```
```
Andi -> Agus: 3.5355339059327378
Indah -> Agus: 2.449489742783178
Siti -> Agus: 2.179449471770337
Angga -> Agus: None
```
The distance between Angga and Agus is None because a distance exists only when both users have rated the same musician (Note 2). Angga and Agus have not rated any musician in common:
| User | Cherrybelle | Kangen Band | Netral | PAS Band | SM*SH | The Rain | Ungu |
|---|---|---|---|---|---|---|---|
| Agus | 4.0 | 4.5 | 2.5 | | 3.5 | | 5.0 |
| Angga | | | | | | 4.5 | |
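The same distance can be written more compactly with a set intersection of the rating keys and `math.dist` (available from Python 3.8). This is a sketch of an alternative, not the task's required implementation; `euclidean_shared` is a hypothetical name:

```python
import math

users = {
    'Agus': {'Cherrybelle': 4.0, 'Kangen Band': 4.5, 'Netral': 2.5, 'SM*SH': 3.5, 'Ungu': 5.0},
    'Andi': {'Kangen Band': 2.0, 'Netral': 5.0, 'PAS Band': 4.5},
    'Angga': {'The Rain': 4.5},
    'Siti': {'Cherrybelle': 4.0, 'Kangen Band': 4.0, 'PAS Band': 1.0, 'SM*SH': 5.0, 'Ungu': 3.5},
    'Solihah': {'Cherrybelle': 4.0, 'Kangen Band': 4.0, 'PAS Band': 1.0, 'SM*SH': 5.0, 'Ungu': 3.5},
}


def euclidean_shared(rating1, rating2):
    """Hypothetical alternative: Euclidean distance over co-rated musicians, None if none (Note 2)."""
    # Sort the shared musicians so both vectors list them in the same order
    shared = sorted(rating1.keys() & rating2.keys())
    if not shared:
        return None
    return math.dist([rating1[m] for m in shared], [rating2[m] for m in shared])


print('Andi -> Agus:', euclidean_shared(users['Agus'], users['Andi']))
print('Angga -> Agus:', euclidean_shared(users['Agus'], users['Angga']))  # None
```

Checking for an empty intersection up front makes the "no common musician" case explicit instead of tracking it with a flag inside the loop.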
Step 3: Nearest Neighbor with Euclidean Distance
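```python
# Nearest Neighbor
def NN(username, data):
    """Return (distance, user) pairs for every other user with a distance, closest first."""
    distances = []
    for user in data:
        if user != username:
            distance = euclidean(data[username], data[user])
            if distance is not None:
                distances.append((distance, user))
    # Sort on distance only; Python's sort is stable, so ties keep their order of appearance (Note 3)
    distances.sort(key=lambda pair: pair[0])
    return distances

# Check All of Agus's Neighbors
NN('Agus', users)
```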
```
[(2.179449471770337, 'Siti'),
 (2.179449471770337, 'Solihah'),
 (2.449489742783178, 'Indah'),
 (3.5355339059327378, 'Andi')]
```
```python
# Check The Closest Neighbor
for name in names:
    neighbor = NN(name, users)
    if not neighbor:
        print('Closest Neighbor for ' + name + ': Unavailable')
    else:
        print('Closest Neighbor for ' + name + ':', neighbor[0])
```
```
Closest Neighbor for Agus: (2.179449471770337, 'Siti')
Closest Neighbor for Andi: (1.6583123951777, 'Indah')
Closest Neighbor for Angga: Unavailable
Closest Neighbor for Indah: (1.6583123951777, 'Andi')
Closest Neighbor for Siti: (0.0, 'Solihah')
Closest Neighbor for Solihah: (0.0, 'Siti')
```
Siti and Solihah are at the same distance from Agus, and Siti, the first of the two to appear, is returned as Agus's closest neighbor (Note 3).
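Sorting plain `(distance, user)` tuples breaks ties by comparing the names, so the alphabetically earlier user wins. In this data that happens to agree with Note 3, because Siti is both listed before Solihah and alphabetically earlier. A small sketch with a hypothetical ordering shows where the two rules diverge, and why keying on the distance alone (or using `min`) honors "first to appear":

```python
# Two candidates at the same distance, in a hypothetical order where the
# alphabetically later name appears first.
candidates = [(2.179449471770337, 'Solihah'), (2.179449471770337, 'Siti')]

# Plain tuple sort: equal distances fall back to comparing names.
by_tuple = sorted(candidates)

# Key on distance only: Python's sort is stable, so ties keep their original order.
by_distance = sorted(candidates, key=lambda pair: pair[0])

# min() returns the first minimal item it encounters.
first_closest = min(candidates, key=lambda pair: pair[0])

print(by_tuple[0])     # (2.179449471770337, 'Siti')
print(by_distance[0])  # (2.179449471770337, 'Solihah')
print(first_closest)   # (2.179449471770337, 'Solihah')
```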
Step 4: Recommendation Function
```python
# Recommend Function
def recommend(username, data):
    """Recommend musicians rated by the closest neighbor but not yet by `username`."""
    unavailable = 'Sorry, recommendation for ' + username + ' is unavailable.'
    neighbors = NN(username, data)
    # Note 5: no distance to anyone, so there is no neighbor to learn from
    if not neighbors:
        return unavailable
    nearest = neighbors[0][1]
    nearestRatings = data[nearest]
    userRatings = data[username]
    recommendations = []
    for artist in nearestRatings:
        if artist not in userRatings:
            recommendations.append((nearestRatings[artist], artist))
    # Note 4: the neighbor may have rated only musicians the user already rated
    if not recommendations:
        return unavailable
    # Highest-rated suggestions first
    recommendations.sort(reverse=True)
    return recommendations

# Check The Recommendation
for name in names:
    print('Recommendation for ' + name + ':', recommend(name, users))
```
```
Recommendation for Agus: [(1.0, 'PAS Band')]
Recommendation for Andi: [(4.0, 'Ungu')]
Recommendation for Angga: Sorry, recommendation for Angga is unavailable.
Recommendation for Indah: Sorry, recommendation for Indah is unavailable.
Recommendation for Siti: Sorry, recommendation for Siti is unavailable.
Recommendation for Solihah: Sorry, recommendation for Solihah is unavailable.
```
The recommendation for Angga is unavailable because Angga has no distance to anyone (Note 5).
The recommendations for Indah, Siti, and Solihah are unavailable because each user's closest neighbor has rated only musicians that the user has already rated (Note 4).
| User | Cherrybelle | Kangen Band | Netral | PAS Band | SM*SH | The Rain | Ungu |
|---|---|---|---|---|---|---|---|
| Agus | 4.0 | 4.5 | 2.5 | | 3.5 | | 5.0 |
| Andi | | 2.0 | 5.0 | 4.5 | | | |
| Angga | | | | | | 4.5 | |
| Indah | | 3.5 | 4.5 | 5.0 | | | 4.0 |
| Siti | 4.0 | 4.0 | | 1.0 | 5.0 | | 3.5 |
| Solihah | 4.0 | 4.0 | | 1.0 | 5.0 | | 3.5 |
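Note 4 comes down to a set difference: the candidates are the musicians the nearest neighbor rated minus those the user already rated. A minimal sketch with a hypothetical helper `new_for`, using the neighbor pairs from Step 3:

```python
users = {
    'Agus': {'Cherrybelle': 4.0, 'Kangen Band': 4.5, 'Netral': 2.5, 'SM*SH': 3.5, 'Ungu': 5.0},
    'Andi': {'Kangen Band': 2.0, 'Netral': 5.0, 'PAS Band': 4.5},
    'Indah': {'Kangen Band': 3.5, 'Netral': 4.5, 'PAS Band': 5.0, 'Ungu': 4.0},
    'Siti': {'Cherrybelle': 4.0, 'Kangen Band': 4.0, 'PAS Band': 1.0, 'SM*SH': 5.0, 'Ungu': 3.5},
    'Solihah': {'Cherrybelle': 4.0, 'Kangen Band': 4.0, 'PAS Band': 1.0, 'SM*SH': 5.0, 'Ungu': 3.5},
}


def new_for(username, neighbor, data):
    """Hypothetical helper: musicians the neighbor rated that `username` has not rated yet."""
    return set(data[neighbor]) - set(data[username])


print(new_for('Agus', 'Siti', users))     # {'PAS Band'}
print(new_for('Indah', 'Andi', users))    # set()
print(new_for('Siti', 'Solihah', users))  # set()
```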
Object Oriented Programming
```python
# Object Oriented Programming Version
from math import sqrt  # module-level import so the methods can see sqrt


class recommender:
    def __init__(self, data):
        self.data = data

    def euclidean(self, username1, username2):
        """Distance over co-rated musicians; None if there are none (Note 2)."""
        rating1 = self.data[username1]
        rating2 = self.data[username2]
        distance = 0.0
        key_available = False
        for key in rating1.keys():
            if key in rating2.keys():
                key_available = True
                distance += (rating1[key] - rating2[key]) ** 2
        distance = sqrt(distance)
        if key_available:
            return distance
        return None

    def NN(self, username):
        """(distance, user) pairs for every other user with a distance, closest first."""
        distances = []
        for user in self.data:
            if user != username:
                distance = self.euclidean(username, user)
                if distance is not None:
                    distances.append((distance, user))
        distances.sort(key=lambda pair: pair[0])  # stable: ties keep their order (Note 3)
        return distances

    def recommend(self, username):
        unavailable = 'Sorry, recommendation for ' + username + ' is unavailable.'
        neighbors = self.NN(username)
        if not neighbors:  # Note 5: no distance to anyone
            return unavailable
        nearest = neighbors[0][1]
        nearestRatings = self.data[nearest]
        userRatings = self.data[username]
        recommendations = []
        for artist in nearestRatings:
            if artist not in userRatings:
                recommendations.append((nearestRatings[artist], artist))
        if not recommendations:  # Note 4: nothing new from the neighbor
            return unavailable
        recommendations.sort(reverse=True)
        return recommendations


# Check The OOP Version
users_r = recommender(users)
for name in names:
    print('Recommendation for ' + name + ':', users_r.recommend(name))
```
```
Recommendation for Agus: [(1.0, 'PAS Band')]
Recommendation for Andi: [(4.0, 'Ungu')]
Recommendation for Angga: Sorry, recommendation for Angga is unavailable.
Recommendation for Indah: Sorry, recommendation for Indah is unavailable.
Recommendation for Siti: Sorry, recommendation for Siti is unavailable.
Recommendation for Solihah: Sorry, recommendation for Solihah is unavailable.
```
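Notes 4 and 5 speak of an empty list, while both versions above return a message string instead. A possible variant, sketched here under the hypothetical name `ListRecommender`, always returns a list (possibly empty) and leaves the wording to the caller; it also picks the nearest neighbor with `min`, which keeps the first user on ties (Note 3):

```python
import math

users = {
    'Agus': {'Cherrybelle': 4.0, 'Kangen Band': 4.5, 'Netral': 2.5, 'SM*SH': 3.5, 'Ungu': 5.0},
    'Andi': {'Kangen Band': 2.0, 'Netral': 5.0, 'PAS Band': 4.5},
    'Angga': {'The Rain': 4.5},
    'Indah': {'Kangen Band': 3.5, 'Netral': 4.5, 'PAS Band': 5.0, 'Ungu': 4.0},
    'Siti': {'Cherrybelle': 4.0, 'Kangen Band': 4.0, 'PAS Band': 1.0, 'SM*SH': 5.0, 'Ungu': 3.5},
    'Solihah': {'Cherrybelle': 4.0, 'Kangen Band': 4.0, 'PAS Band': 1.0, 'SM*SH': 5.0, 'Ungu': 3.5},
}


class ListRecommender:
    """Hypothetical variant: returns [] when nothing can be recommended (Notes 4 and 5)."""

    def __init__(self, data):
        self.data = data

    def euclidean(self, username1, username2):
        rating1, rating2 = self.data[username1], self.data[username2]
        shared = [m for m in rating1 if m in rating2]
        if not shared:
            return None  # Note 2: no co-rated musician, no distance
        return math.sqrt(sum((rating1[m] - rating2[m]) ** 2 for m in shared))

    def nearest(self, username):
        """Closest neighbor's name, or None; min() keeps the first one on ties (Note 3)."""
        candidates = []
        for user in self.data:
            if user != username:
                distance = self.euclidean(username, user)
                if distance is not None:
                    candidates.append((distance, user))
        if not candidates:
            return None
        return min(candidates, key=lambda pair: pair[0])[1]

    def recommend(self, username):
        nearest = self.nearest(username)
        if nearest is None:
            return []  # Note 5: no distance to anyone
        user_ratings = self.data[username]
        picks = [(rating, artist) for artist, rating in self.data[nearest].items()
                 if artist not in user_ratings]
        return sorted(picks, reverse=True)  # may be empty (Note 4)


r = ListRecommender(users)
for name in users:
    picks = r.recommend(name)
    print(f'Recommendation for {name}:', picks or 'unavailable')
```

Returning a single type means callers can test the result with `if not picks` or pass it on to further processing without checking for a sentence.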