Exercise 1)
A. Retrieve the top 100 posts from a Reddit community of your choice. Print 10 observations
from the dataframe containing the scraped content and describe in a few sentences what
you observe (e.g. titles, content, upvotes).
B. Select one post from your dataframe that has at least 15 comments. Retrieve the top 15
comments for that post. Print the comments and, in a short paragraph, analyze the users’
discourse (what are the main topics of conversation?).
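One possible starting point for Exercise 1 A, as a minimal sketch: Reddit's public JSON listings nest each post under data.children[i].data, and the function below flattens such a payload into one row per post. The helper name listing_to_rows, the chosen fields, and the two-post sample_listing are illustrative assumptions, not real scraped data; in practice the payload would come from a listing endpoint or a client library such as PRAW.

```python
import json

# Hypothetical, hand-written payload that mimics the shape of a Reddit
# listing response; the posts are placeholders, not real scraped data.
sample_listing = json.loads("""
{
  "kind": "Listing",
  "data": {
    "children": [
      {"kind": "t3", "data": {"title": "First example post", "selftext": "",
                              "score": 120, "num_comments": 34}},
      {"kind": "t3", "data": {"title": "Second example post", "selftext": "Some body text",
                              "score": 15, "num_comments": 2}}
    ]
  }
}
""")

# Assumed subset of post fields to keep as columns.
FIELDS = ("title", "selftext", "score", "num_comments")


def listing_to_rows(listing, fields=FIELDS):
    """Flatten a listing payload into a list of dicts, one per post."""
    rows = []
    for child in listing["data"]["children"]:
        post = child["data"]
        # Missing fields become None so every row has the same columns.
        rows.append({name: post.get(name) for name in fields})
    return rows


rows = listing_to_rows(sample_listing)
for row in rows[:10]:  # print up to 10 observations
    print(row)
```

The resulting list of dicts can be passed to pandas.DataFrame(rows) to obtain the dataframe used in the later exercises.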
Exercise 2)
A. Use the dataframe generated for Exercise 1 A. Create a scatterplot to illustrate the
relationship between 2 continuous variables (e.g. number of comments and score). In a short
paragraph, describe what you observe.
B. Generate a wordcloud visualization for the variable containing the post titles. In a short
paragraph, describe what you observe.
Note: Make sure your visualizations have a title.
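A wordcloud sizes each word by how often it occurs, so the counting step behind Exercise 2 B can be sketched with the standard library alone. The titles, the small stopword set, and the helper name title_word_counts below are illustrative assumptions; a plotting library would then draw the cloud from these counts.

```python
import re
from collections import Counter

# Hypothetical titles standing in for the title column of the dataframe.
titles = [
    "Best hiking trails near the city",
    "Hiking gear for the winter",
    "Is winter hiking safe?",
]

# A deliberately tiny stopword set; real analyses use a fuller list.
STOPWORDS = {"the", "for", "is", "near", "a", "of", "and"}


def title_word_counts(titles, stopwords=STOPWORDS):
    """Count lowercase word occurrences across all titles, skipping stopwords."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z']+", title.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts


counts = title_word_counts(titles)
print(counts.most_common(5))
```

A frequency mapping like counts is the kind of input wordcloud libraries typically accept; the Python wordcloud package, for example, provides WordCloud.generate_from_frequencies.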
Exercise 3) – You will learn the skills to complete this exercise in WS6
A. Use a sentiment analysis algorithm to classify the titles or the content of the posts retrieved
in Exercise 1 A. Print 10 observations to highlight the output of the sentiment classification (e.g.
the column containing the results and the content classified).
Note: Posts often do not contain content, so if you have a lot of missing
values for the content variable, you might want to classify titles instead.
B. Reproduce the scatterplot from Exercise 2 A, but this time colour the data points by
sentiment category. In a few sentences, describe what you observe.
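The classification and colouring steps of Exercise 3 can be sketched as follows. This is a toy lexicon-based classifier, not the algorithm taught in WS6: the word lists, the helper classify_sentiment, the colour mapping, and the sample titles are all hypothetical. An established tool such as VADER would normally replace the classifier.

```python
import re

# Hypothetical mini-lexicons; far too small for real use.
POSITIVE = {"great", "love", "best", "amazing", "good"}
NEGATIVE = {"bad", "worst", "hate", "terrible", "broken"}

# Assumed mapping from sentiment category to a plot colour.
COLOURS = {"positive": "green", "neutral": "grey", "negative": "red"}


def classify_sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Placeholder titles standing in for the title column.
titles = ["I love this amazing game", "Worst update ever", "Patch notes for version 2"]
labels = [classify_sentiment(t) for t in titles]
point_colours = [COLOURS[label] for label in labels]

for title, label in zip(titles, labels):
    print(f"{label:>8}  {title}")
```

The point_colours list has one entry per post, so it can be supplied as the per-point colour argument of a scatterplot, for instance the c parameter of matplotlib's scatter.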