AIDM7380 Python Recommender Systems

AIDM7380 Recommender Systems for Digital Media
AIDM7380 L2 – py_nb_implicitRatings-withExercises

# Install libraries using pip package in the current Jupyter kernel

User Behaviour and the User-Item Matrix

Importing and getting to know your data

# Import the libraries
import os, pathlib

import numpy as np
import pandas as pd
import wget

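# Set up URL and path variables
baseURL = 'https://raw.githubusercontent.com/pmengoni/AIDM7380-2223S2/main/'
doc = 'collector_log.csv'
fullURL = baseURL + doc

# drivePath is not defined anywhere in this notebook; the value below is an
# assumption (the usual Google Drive mount point in Colab). Adjust as needed.
drivePath = '/content/drive'
dataPath = drivePath + '/MyDrive/Colab Notebooks/data'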

# Create the path if it does not exist
if not os.path.exists(dataPath):
    path = pathlib.Path(dataPath)
    path.mkdir(parents=True, exist_ok=True)
else:
    print('The data path you selected already exists')

# Download the file
fileName = wget.download(fullURL, out=dataPath)

# Print the file name including the local path
print(fileName)

In [ ]:

evidence = pd.read_csv(fileName)

In [ ]:

# Check the type and take a glance at the head
print(type(evidence))
evidence.head(5)

Examining the attributes of the Data Frame (standard procedures)
df.shape (“dim” in R)
df.columns (check the variables, like “names” in R)
df.index (check the index of the “rows”)
df.info()
df.describe() (descriptive statistics for numerical variables)

In [ ]:

evidence.shape
# (the number of cases/observations, the number of variables)

In [ ]:

evidence.columns

In [ ]:

evidence.index

In [ ]:

evidence.info()

In [ ]:

evidence.describe()

In [ ]:

users = evidence.user_id.unique()
content = evidence.content_id.unique()
print(type(content))
print(len(content))
print(len(users))

Implicit Ratings
Binary Matrix
Let’s create a user-item binary matrix from the “buy” events

In [ ]:

#Create a user-item binary matrix
uiBuyMatrix = pd.DataFrame(columns=content, index=users)
uiBuyMatrix.head(2)

In [ ]:

evidence.event.unique()

Select only the “buy” events

In [ ]:

buyEvidence = evidence[evidence['event'] == 'buy']
buyEvidence.head(5)

Create the user-item matrix uiBuyMatrix for the buy events

In [ ]:

for index, row in buyEvidence.iterrows():
    currentUser = row['user_id']
    currentContent = row['content_id']
    uiBuyMatrix.at[currentUser, currentContent] = 1

In [ ]:

print(uiBuyMatrix)
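
The row-by-row loop above can become slow on large logs. As a minimal sketch of a vectorised alternative, the cell below builds the same kind of binary matrix with `pd.crosstab` on a small made-up log. The toy data and the names `toyEvidence`, `toyBuys` and `toyBuyMatrix` are assumptions for illustration only; they are not part of the course data.

```python
import numpy as np
import pandas as pd

# Toy log (hypothetical data) with the same columns used in the notebook
toyEvidence = pd.DataFrame({
    'user_id':    [1, 1, 2, 3, 3],
    'content_id': ['A', 'B', 'A', 'C', 'C'],
    'event':      ['buy', 'details', 'buy', 'buy', 'buy'],
})

toyBuys = toyEvidence[toyEvidence['event'] == 'buy']

# Count buys per user-item pair, then turn any positive count into 1
counts = pd.crosstab(toyBuys['user_id'], toyBuys['content_id'])
toyBuyMatrix = (counts > 0).astype(float)

# Keep every user and item from the full log; pairs without a buy become NaN
toyBuyMatrix = toyBuyMatrix.where(toyBuyMatrix > 0).reindex(
    index=toyEvidence['user_id'].unique(),
    columns=toyEvidence['content_id'].unique(),
)
print(toyBuyMatrix)
```

As in the loop version, a cell holds 1 when at least one "buy" event exists for that user-item pair and NaN otherwise.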

Behavioural Implicit Ratings
Using the formula introduced during lecture

$${IR}_{(i,u)} = \left(w_1*{\#event}_1\right)+\left(w_2*{\#event}_2\right)+\dots+\left(w_n*{\#event}_n\right)$$

In [ ]:

#Create a user-item matrix
uiMatrix = pd.DataFrame(columns=content, index=users)
uiMatrix.head(2)

Type of events recorded in the logs

In [ ]:

eventTypes = evidence.event.unique()
print(eventTypes)

Give a weight to each of them

In [ ]:

eventWeights = {
    'details': 15,
    'moreDetails': 50,
    'genreView': 0,
    'addToList': 0,
    'buy': 100,
}

Compute the Implicit Rating for each user-item combination.
Populate the user-item matrix uiMatrix with the IR values.

In [ ]:

# Iterate over the evidence
for index, row in evidence.iterrows():
    # Select the user and item involved
    currentUser = row['user_id']
    currentContent = row['content_id']

    # Extract the appropriate weight for the event
    w = eventWeights[row['event']]

    # Fetch the value currently stored (if any) for this user-item combination
    currentValue = uiMatrix.at[currentUser, currentContent]
    if np.isnan(currentValue):
        currentValue = 0

    # Each log row is one event occurrence, so add its weight once
    updatedValue = currentValue + w
    uiMatrix.at[currentUser, currentContent] = updatedValue

In [ ]:

print(uiMatrix)
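
The accumulation above is a weighted count per user-item pair, so it can also be written as a grouped sum. The cell below is a sketch on a made-up log; `toyEvidence`, `toyWeights` and `toyMatrix` are hypothetical names, and the weights are copied from `eventWeights`.

```python
import pandas as pd

# Toy log (hypothetical data); weights copied from eventWeights above
toyEvidence = pd.DataFrame({
    'user_id':    [1, 1, 1, 2, 2],
    'content_id': ['A', 'A', 'B', 'A', 'B'],
    'event':      ['details', 'buy', 'genreView', 'moreDetails', 'details'],
})
toyWeights = {'details': 15, 'moreDetails': 50, 'genreView': 0,
              'addToList': 0, 'buy': 100}

# Map each event to its weight, then sum the weights per user-item pair
weighted = toyEvidence.assign(w=toyEvidence['event'].map(toyWeights))
toyMatrix = weighted.pivot_table(index='user_id', columns='content_id',
                                 values='w', aggfunc='sum')
print(toyMatrix)
```

A pair that only has zero-weight events (user 1 and item B here) ends up with 0 rather than NaN, which matches the behaviour of the loop.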

Exercise 1 (1′)
Update the user-item matrix by normalizing the values between 0 and 10. Note: NaN values must remain NaN.

Hint: the maximum value in the matrix is returned by the following cell.

In [ ]:

np.nanmax(uiMatrix.values)
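
One possible approach, sketched on a toy matrix rather than on `uiMatrix`, is to divide by the maximum and multiply by 10. This assumes all ratings are non-negative, so the minimum of the scale is 0. The names `toyMatrix` and `toyNormalized` and the values are made up for illustration. The `astype(float)` call is there because a DataFrame created empty and filled cell by cell typically has `object` dtype.

```python
import numpy as np
import pandas as pd

# Toy user-item matrix (hypothetical values); NaN marks "no evidence"
toyMatrix = pd.DataFrame(
    [[115.0, np.nan], [50.0, 230.0]],
    index=[1, 2], columns=['A', 'B'],
)

# Work in float so that arithmetic propagates NaN, then rescale to [0, 10]
maxValue = np.nanmax(toyMatrix.values.astype(float))
toyNormalized = toyMatrix.astype(float) / maxValue * 10
print(toyNormalized)
```

NaN divided by any number stays NaN, so missing cells are preserved without special handling.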

In [ ]:

Exercise 2 (1′)
Limit the number of relevant events to a specific threshold (e.g. 10).
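
One reading of this exercise is to cap how many events of the same type count towards a user-item pair. The sketch below follows that interpretation on a made-up log; `toyEvidence`, `toyWeights`, `maxEvents` and `toyCapped` are hypothetical names, and other interpretations of the threshold are possible.

```python
import pandas as pd

# Toy log (hypothetical): user 1 viewed the details of item A 12 times
toyEvidence = pd.DataFrame({
    'user_id':    [1] * 12 + [1, 2],
    'content_id': ['A'] * 12 + ['A', 'A'],
    'event':      ['details'] * 12 + ['buy', 'details'],
})
toyWeights = {'details': 15, 'buy': 100}
maxEvents = 10  # at most this many events of one type are counted

# Count events per (user, item, event type) and cap each count
counts = (toyEvidence
          .groupby(['user_id', 'content_id', 'event'])
          .size()
          .clip(upper=maxEvents)
          .rename('n')
          .reset_index())

# Weighted sum of the capped counts per user-item pair
counts['ir'] = counts['n'] * counts['event'].map(toyWeights)
toyCapped = counts.pivot_table(index='user_id', columns='content_id',
                               values='ir', aggfunc='sum')
print(toyCapped)
```

Here the 12 "details" events of user 1 count as 10, so repeated clicking cannot inflate the rating without bound.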

In [ ]:

Exercise 3 (2′)
Add a decay threshold: older events are less informative about the user’s behaviour.
Check the sample Python function and adapt the code according to the following formulation.

Behavioural Implicit Ratings with Decay
We modify the formula introduced during lecture

$${IR}_{(i,u)} = \sum_{i=1}^n w_i*{\#event}_i = \left(w_1*{\#event}_1\right)+\left(w_2*{\#event}_2\right)+\dots+\left(w_n*{\#event}_n\right)$$

to

$${IRDecay}_{(i,u)} = \sum_{i=1}^n w_i*{\#event}_i*d\left({\#event}_i\right) = \left(w_1*{\#event}_1*d\left({\#event}_1\right)\right)+\left(w_2*{\#event}_2*d\left({\#event}_2\right)\right)+\dots+\left(w_n*{\#event}_n*d\left({\#event}_n\right)\right)$$

Computing decay

In [ ]:

from datetime import date, datetime, timedelta

def compute_decay(eventDate, decayDays):
    # Age of the event, measured in whole periods of decayDays days
    eventDay = datetime.strptime(eventDate, '%d/%m/%Y %H:%M').date()
    age = (date.today() - eventDay) // timedelta(days=decayDays)
    # Simple decay; events younger than one period keep full weight,
    # which also avoids a division by zero when age is 0
    decay = 1 / max(age, 1)
    return decay

createdEvent = evidence.at[0, 'created']
thresholdDays = 2  # Number of days
decayFactor = compute_decay(createdEvent, thresholdDays)

print(decayFactor)
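
To see how the decay factor enters the IRDecay sum, here is a small worked sketch for a single user-item pair. It applies the decay to each logged event individually, which is one possible reading of the formula. The helper `decay_factor`, the toy events and the fixed `referenceDay` are assumptions for illustration; passing the reference date in, instead of calling `date.today()`, keeps the result reproducible. The `max(age, 1)` guard avoids a division by zero for events younger than `decayDays`.

```python
from datetime import date, datetime, timedelta

def decay_factor(eventDate, decayDays, today):
    """Simple 1/age decay, with age in whole periods of decayDays days."""
    eventDay = datetime.strptime(eventDate, '%d/%m/%Y %H:%M').date()
    age = (today - eventDay) // timedelta(days=decayDays)
    return 1 / max(age, 1)

toyWeights = {'details': 15, 'buy': 100}
# Hypothetical events for one user-item pair: (event type, timestamp)
toyEvents = [('details', '01/01/2023 10:00'),
             ('details', '09/01/2023 18:30'),
             ('buy',     '10/01/2023 09:15')]
referenceDay = date(2023, 1, 11)  # assumed "today" for this example

# Sum of weight * decay over all events of the pair
irDecay = sum(toyWeights[event] * decay_factor(ts, 2, referenceDay)
              for event, ts in toyEvents)
print(irDecay)
```

With a two-day period, the oldest "details" event is five periods old and contributes 15 * 1/5, while the two recent events keep their full weights.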