AIDM7380 Recommender Systems for Digital Media
AIDM7380 L2 – py_nb_implicitRatings-withExercises
# Install the required libraries with pip in the current Jupyter kernel
User Behaviour and the User-Item Matrix
Importing and knowing your data
# Import the libraries
import os, pathlib
import wget
import numpy as np
import pandas as pd
# Setup URL and path variables
baseURL = 'https://raw.githubusercontent.com/pmengoni/AIDM7380-2223S2/main/'
doc = 'collector_log.csv'
fullURL = baseURL + doc
# drivePath must already be defined (e.g. the Google Drive mount point in Colab)
dataPath = drivePath + '/MyDrive/Colab Notebooks/data'
# Create the path if it does not exist
if not os.path.exists(dataPath):
    path = pathlib.Path(dataPath)
    path.mkdir(parents=True, exist_ok=True)
else:
    print('The data path you selected already exists')
# Download the file
fileName = wget.download(fullURL, out=dataPath)
# Print the file name including the local path
print(fileName)
In [ ]:
evidence = pd.read_csv(fileName)
In [ ]:
# Check the type and take a glance at the head
print(type(evidence))
evidence.head(5)
Examining the attributes of the Data Frame (standard procedures)
df.shape (“dim” in R)
df.columns (check the variables, like “names” in R)
df.index (check the index of the “rows”)
df.info()
df.describe() (descriptive statistics for numerical variables)
In [ ]:
evidence.shape
# (the number of cases/observations, the number of variables)
In [ ]:
evidence.columns
In [ ]:
evidence.index
In [ ]:
evidence.info()
In [ ]:
evidence.describe()
In [ ]:
users = evidence.user_id.unique()
content = evidence.content_id.unique()
print(type(content))
print(len(content))
print(len(users))
Implicit Ratings
Binary Matrix
Let’s create a user-item binary matrix from the “buy” events.
In [ ]:
#Create a user-item binary matrix
uiBuyMatrix = pd.DataFrame(columns=content, index=users)
uiBuyMatrix.head(2)
In [ ]:
evidence.event.unique()
Select only the “buy” events
In [ ]:
buyEvidence = evidence[evidence['event'] == 'buy']
buyEvidence.head(5)
Create the user-item matrix uiBuyMatrix for the buy events
In [ ]:
for index, row in buyEvidence.iterrows():
    currentUser = row['user_id']
    currentContent = row['content_id']
    uiBuyMatrix.at[currentUser, currentContent] = 1
In [ ]:
print(uiBuyMatrix)
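The row-by-row loop above can be slow on large logs. As a minimal sketch of a vectorised alternative, the cell below builds the same kind of binary matrix with `pd.crosstab` on a small toy log. The names `toy`, `toyBuys` and `toyBuyMatrix` and the IDs are made up for illustration and are not part of the course data.

```python
import pandas as pd

# Toy log with hypothetical user and content IDs (not the course data)
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 3, 3],
    'content_id': ['a', 'b', 'a', 'c', 'c'],
    'event': ['buy', 'details', 'buy', 'buy', 'buy'],
})

toyBuys = toy[toy['event'] == 'buy']
# Count the buys per user-item pair
counts = pd.crosstab(toyBuys['user_id'], toyBuys['content_id'])
# Keep NaN where nothing was bought and cap repeated buys at 1
toyBuyMatrix = counts.where(counts > 0).clip(upper=1)
# Restore users and items that never appear in a buy event
toyBuyMatrix = toyBuyMatrix.reindex(index=toy['user_id'].unique(),
                                    columns=toy['content_id'].unique())
print(toyBuyMatrix)
```

As in the loop version, pairs without a buy stay NaN rather than 0, so "no evidence" is not confused with a negative signal.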
Behavioural Implicit Ratings
Using the formula introduced during lecture
$${IR}_{(i,u)} = \left(w_1*{\#event}_1\right)+\left(w_2*{\#event}_2\right)+\dots+\left(w_n*{\#event}_n\right)$$
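To make the formula concrete, here is a small worked example for a single user-item pair. The weights and event counts are illustrative values chosen for this sketch, not results from the log.

```python
# Illustrative weights and hypothetical event counts for one user-item pair
exampleWeights = {'details': 15, 'moreDetails': 50, 'buy': 100}
exampleCounts = {'details': 3, 'moreDetails': 1, 'buy': 1}

# IR = sum over event types of (weight * number of events of that type)
implicitRating = sum(exampleWeights[e] * exampleCounts[e] for e in exampleCounts)
print(implicitRating)  # 3*15 + 1*50 + 1*100 = 195
```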
In [ ]:
#Create a user-item matrix
uiMatrix = pd.DataFrame(columns=content, index=users)
uiMatrix.head(2)
Type of events recorded in the logs
In [ ]:
eventTypes = evidence.event.unique()
print(eventTypes)
Give a weight to each of them
In [ ]:
eventWeights = {
    'details': 15,
    'moreDetails': 50,
    'genreView': 0,
    'addToList': 0,
    'buy': 100}
Compute the Implicit Rating for each user-item combination.
Populate the user-item matrix uiMatrix with the IR values.
In [ ]:
# Iterate over the evidence
for index, row in evidence.iterrows():
    # Select the user and item involved
    currentUser = row['user_id']
    currentContent = row['content_id']
    # Extract the appropriate weight for the event
    w = eventWeights[row['event']]
    # Find the value, if any, already stored for the current user-item combination
    currentValue = uiMatrix.at[currentUser, currentContent]
    if np.isnan(currentValue):
        currentValue = 0
    # Compute the new value and update the user-item matrix
    updatedValue = currentValue + w #+ (1 * w)
    uiMatrix.at[currentUser, currentContent] = updatedValue
In [ ]:
print(uiMatrix)
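The same accumulation can also be expressed without an explicit loop. The sketch below, on a toy log with made-up IDs and a reduced weight table, maps each event to its weight and sums the weights per user-item pair with `pivot_table`. It is one possible formulation, not the notebook's reference solution.

```python
import pandas as pd

# Hypothetical weights and toy log, for illustration only
toyWeights = {'details': 15, 'moreDetails': 50, 'buy': 100}
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 2],
    'content_id': ['a', 'a', 'b', 'a'],
    'event': ['details', 'buy', 'details', 'moreDetails'],
})

# Attach the weight of each logged event
toy['weight'] = toy['event'].map(toyWeights)
# Sum the weights per user-item pair; pairs with no events stay NaN
toyMatrix = toy.pivot_table(index='user_id', columns='content_id',
                            values='weight', aggfunc='sum')
print(toyMatrix)
```

Summing one weight per logged event is equivalent to multiplying each weight by the number of events of that type, which is what the formula states.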
Exercise 1 (1′)
Update the user-item matrix by normalizing the values between 0 and 10. Note: NaN values should be maintained as NaN
Hint: the maximum value in the matrix is the following value
In [ ]:
np.nanmax(uiMatrix.values)
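One possible approach, shown on a toy matrix rather than on `uiMatrix`, is to divide by the maximum and multiply by 10. This sketch assumes that the lower end of the scale is 0 (the value of a pair with zero-weight events only); `toyMatrix` and `toyNormalized` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Toy implicit-rating matrix with one missing user-item pair
toyMatrix = pd.DataFrame({'a': [115.0, 50.0], 'b': [15.0, np.nan]}, index=[1, 2])

# Largest rating, ignoring NaN
maxValue = np.nanmax(toyMatrix.values)
# Rescale to the 0-10 range; arithmetic on NaN yields NaN, so gaps are preserved
toyNormalized = toyMatrix / maxValue * 10
print(toyNormalized)
```

A matrix created with `pd.DataFrame(columns=..., index=...)` and filled cell by cell has `object` dtype, so converting it with `.astype(float)` before the division is a sensible precaution.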
In [ ]:
Exercise 2 (1′)
Limit the number of relevant events to a specific threshold (e.g. 10).
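The exercise can be read in more than one way. The sketch below assumes that the cap applies to the count of each event type per user-item pair, so that events beyond the threshold no longer add to the rating. The toy log, the weights and the small threshold of 2 are made up for illustration.

```python
import pandas as pd

# Hypothetical weights, cap and toy log (not the course data)
toyWeights = {'details': 15, 'buy': 100}
threshold = 2  # kept small so the cap is visible on the toy data
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 2],
    'content_id': ['a', 'a', 'a', 'a', 'a'],
    'event': ['details', 'details', 'details', 'buy', 'details'],
})

# Count events per user, item and event type
counts = toy.groupby(['user_id', 'content_id', 'event']).size().reset_index(name='n')
# Cap each count at the threshold
counts['n'] = counts['n'].clip(upper=threshold)
# Weight the capped counts and sum them per user-item pair
counts['score'] = counts['n'] * counts['event'].map(toyWeights)
cappedMatrix = counts.pivot_table(index='user_id', columns='content_id',
                                  values='score', aggfunc='sum')
print(cappedMatrix)
```

Here user 1 viewed the details of item `a` three times, but only two views count: 2*15 + 1*100 = 130.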
In [ ]:
Exercise 3 (2′)
Add a decay threshold. Older events are not informative about the user’s behavior.
Check the sample Python function and adapt the code according to the following formulation.
Behavioural Implicit Ratings with Decay
We modify the formula introduced during lecture
$${IR}_{(i,u)} = \sum_{i=1}^n w_i*{\#event}_i = \left(w_1*{\#event}_1\right)+\left(w_2*{\#event}_2\right)+\dots+\left(w_n*{\#event}_n\right)$$
to
$${IRDecay}_{(i,u)} = \sum_{i=1}^n w_i*{\#event}_i*d\left({\#event}_i\right) = \left(w_1*{\#event}_1*d\left({\#event}_1\right)\right)+\left(w_2*{\#event}_2*d\left({\#event}_2\right)\right)+\dots+\left(w_n*{\#event}_n*d\left({\#event}_n\right)\right)$$
Computing decay
In [ ]:
from datetime import date, datetime, timedelta
def compute_decay(eventDate, decayDays):
    # Age of the event in whole periods of decayDays
    eventDay = datetime.strptime(eventDate, '%d/%m/%Y %H:%M').date()
    age = (date.today() - eventDay) // timedelta(days=decayDays)
    #print("Age of event:", age)
    # Simple decay; age is clamped to 1 to avoid dividing by zero for recent events
    decay = 1 / max(age, 1)
    #print("Decay factor:", decay)
    return decay
createdEvent = evidence.at[0, 'created']
thresholdDays = 2 # Number of days
decayFactor = compute_decay(createdEvent, thresholdDays)
print(decayFactor)
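As a minimal sketch of how a decay factor could enter the rating, the cell below assumes that every logged event contributes its weight multiplied by a decay computed from its own age. This is one reading of the IRDecay formulation, not a reference solution. The helper `toy_decay`, the fixed `referenceDate` (used instead of today's date so the result is reproducible) and the toy events are all hypothetical.

```python
from datetime import datetime, timedelta

def toy_decay(eventDate, referenceDate, decayDays):
    # Age in whole periods of decayDays, clamped to 1 so recent events keep full weight
    age = (referenceDate - eventDate) // timedelta(days=decayDays)
    return 1 / max(age, 1)

referenceDate = datetime(2023, 1, 31)
toyWeights = {'details': 15, 'buy': 100}
# Events of one user on one item: (event type, timestamp)
toyEvents = [
    ('details', datetime(2023, 1, 30)),  # 1 day old   -> age 0 -> decay 1
    ('details', datetime(2023, 1, 21)),  # 10 days old -> age 5 -> decay 0.2
    ('buy', datetime(2023, 1, 27)),      # 4 days old  -> age 2 -> decay 0.5
]

# Each event adds its weight scaled by its decay factor
irDecay = sum(toyWeights[e] * toy_decay(d, referenceDate, 2) for e, d in toyEvents)
print(irDecay)  # roughly 15*1 + 15*0.2 + 100*0.5 = 68
```

Without decay the same three events would give 15 + 15 + 100 = 130, so the older `details` view and the four-day-old purchase are clearly discounted.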