1 Assessment objective
The purpose of this assessment is to test the following learning outcomes: (1) demonstrating knowledge of a broader range of analytical techniques used in the field of Security and
Crime Science, (2) performing data science analyses on crime- and/or security-related
issues, (3) applying the data science pipeline to crime- and/or security-related issues, and (4)
interpreting and effectively reporting the results of said techniques.
Weight for the final grade: 70%
Page limit: 8 pages in the anonymised ACL long-paper format (see below). Please make
sure you do not change the template in any way, e.g. by increasing or decreasing the font size.
This assessment is the capstone project of the module. It requires you to address a research problem using the full data science workflow (e.g., collecting the data, processing the
data, building machine learning models, reporting on the findings and interpreting the
outcomes). You will write a report in a research paper format on your project (a template
will be provided), and you have to submit the R code needed to reproduce your findings.
After passing this assessment, you will have demonstrated the skills to solve a problem
using data science techniques.
2 Project topic
For this assessment, you will go through a full data science process to address research
questions you have about a topic of your own choice. Your project should (a) be related
to crime and/or security, (b) make use of all three areas taught in the module: web data
collection (text data), text mining and machine learning, and (c) be reproducible with
your code supplement and data.
In previous years, some students worked on topics related to:
• analysing crime and security-related discussions on Reddit
• popularity analysis (e.g. what makes a post popular) on Reddit
• exploring crime coverage patterns in newspapers
You are allowed to work on similar topics and develop research questions that predict popularity or identify and analyse crime/security-related topics and patterns. Creative ways
of addressing a problem and originality are highly valued in this assessment. We strongly
encourage you to do additional reading and recommend that you look at some of the
relevant natural language processing conferences such as ACL-IJCNLP and NAACL, including workshops (e.g. SocialNLP 2021), to help you identify topics of interest. However,
your work should not be a replication of previous studies by others.
3 Project feedback and ethics
This is a large project. To help you in the process, we will hold a 1-on-1 feedback
session and require you to submit a brief project description form before the feedback
session. During the feedback session, you will receive feedback from us and get help with
any questions you might have. This process will also help us to identify whether the project
has any ethical implications (see here). The project description form will be available
on Moodle. The feedback session will be held on 16 March 2023, and each student will
receive 10 minutes of feedback (time slots and locations to be arranged). Please submit
your project description by 13 March 2023, three days before the feedback session, so that we
can read it beforehand. While the project description submission and
attending the feedback session are mandatory, they will not be graded or count towards
your final mark.
3.1 Collecting data
You are required to collect your own text data for this project. You will need to take into
account the terms and conditions of the websites you intend to use for data collection.
We recommend that you use APIs to collect the data and avoid web scraping, as scraping
data requires ethics approval (please see the section under Step 1). Please make sure you speak to us before you commit to data collection.