ECON6087 2023 Spring Assignment 1
For Assignment 1, we will use a new corpus, the “A Million News Headlines” corpus,
which covers all the news headlines published by the Australian news source ABC
(Australian Broadcasting Corporation, http://www.abc.net.au) over a period of 19 years.
The data can be accessed from the following Kaggle page:
https://www.kaggle.com/datasets/therohk/million-headlines
You can learn more about the dataset and find some coding examples on the same page.
Please use this data to complete the following tasks:
1. Train word embeddings using word2vec on this corpus, and perform a sentiment
analysis based on the word embeddings and the “positivity” vector. We construct
this vector in the same way as Luca Bellodi (2022):
positivity = success + good + happy + perfect + important + worth + rich
             - failure - bad - sad - terrible - bad - regret - poor
• Use the pre-processing steps that you see fit;
• Decide on the number of dimensions, the number of iterations, and which model you
would like to train;
• Choose a reasonable distance (or similarity) measure;
• Find a reasonable way to aggregate the word-level sentiment scores to the
document level.
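One possible way to put the pieces of Task 1 together is sketched below. This is only an illustration, not the required solution. It assumes cosine similarity as the similarity measure and a simple mean over the words of a headline as the aggregation rule. The dictionary toy_vectors holds made-up 3-dimensional numbers that stand in for the vectors you would get from your own trained word2vec model, only a subset of the seed words is used, and the function names are hypothetical.

```python
import math

# Made-up 3-dimensional vectors standing in for trained word2vec embeddings.
# In your own work these would come from the model trained on the corpus.
toy_vectors = {
    "success": [0.9, 0.1, 0.0],
    "good": [0.8, 0.2, 0.1],
    "failure": [-0.9, 0.1, 0.0],
    "bad": [-0.8, 0.2, 0.1],
    "win": [0.7, 0.3, 0.0],
    "crash": [-0.6, 0.4, 0.1],
}

# Subset of the seed words from the positivity formula (illustration only).
POSITIVE = ["success", "good"]
NEGATIVE = ["failure", "bad"]


def build_positivity(vectors, positive, negative):
    """Add the positive seed vectors and subtract the negative ones."""
    dim = len(next(iter(vectors.values())))
    axis = [0.0] * dim
    for word in positive:
        if word in vectors:  # skip seed words missing from the vocabulary
            axis = [a + v for a, v in zip(axis, vectors[word])]
    for word in negative:
        if word in vectors:
            axis = [a - v for a, v in zip(axis, vectors[word])]
    return axis


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def headline_score(tokens, vectors, axis):
    """Mean cosine similarity of the in-vocabulary tokens to the positivity vector.

    Returns None when no token of the headline is in the vocabulary.
    """
    scores = [cosine(vectors[t], axis) for t in tokens if t in vectors]
    return sum(scores) / len(scores) if scores else None


positivity = build_positivity(toy_vectors, POSITIVE, NEGATIVE)
print(headline_score(["team", "win"], toy_vectors, positivity))
print(headline_score(["market", "crash"], toy_vectors, positivity))
```

Other choices are equally defensible, for example weighting words by frequency instead of taking a plain mean, or dropping stop words before scoring.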
2. Plot the article-level sentiment scores by year-month.
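Before plotting, the headline scores need to be grouped by year-month. A minimal sketch of that grouping step follows. It assumes each headline comes with a publication date stored as a yyyymmdd string and a score that may be None when no word was in the vocabulary. The sample rows and the function name monthly_means are hypothetical.

```python
from collections import defaultdict

# Hypothetical (publish_date, score) pairs; dates assumed to be yyyymmdd strings.
scored = [
    ("20030219", 0.25),
    ("20030225", 0.75),
    ("20030301", -0.5),
    ("20030315", None),  # headline with no in-vocabulary words
]


def monthly_means(rows):
    """Average the scores within each year-month, skipping missing scores."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of scores, count]
    for date, score in rows:
        if score is None:
            continue
        key = f"{date[:4]}-{date[4:6]}"  # e.g. "2003-02"
        totals[key][0] += score
        totals[key][1] += 1
    # Sort by key so the series is in chronological order.
    return {key: s / n for key, (s, n) in sorted(totals.items())}


series = monthly_means(scored)
print(series)
```

The keys and values of series can then be passed to the plotting library of your choice as the x and y values of a line plot.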
3. Try to construct sentiment scores toward different countries or international
organizations, such as “US”, “UK”, “Russia”, “Iran”, “NATO”, and “UN”.
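One simple approach to Task 3, sketched below under stated assumptions, is to average the scores of the headlines that mention each target. The alias sets, the sample headlines, and the function name target_sentiment are all hypothetical, and the headlines are assumed to be already tokenized and scored. Note that if the text is lowercased, the token "us" can also be the pronoun, so you may want to handle that case yourself.

```python
# Hypothetical alias sets for each target; extend or change them as you see fit.
TARGETS = {
    "US": {"us", "usa", "america"},
    "UK": {"uk", "britain"},
    "Russia": {"russia"},
    "Iran": {"iran"},
    "NATO": {"nato"},
    "UN": {"un"},
}

# Hypothetical (tokens, score) pairs for already scored headlines.
scored_headlines = [
    (["us", "signs", "trade", "deal"], 0.5),
    (["russia", "sanctions", "tighten"], -0.25),
    (["us", "russia", "talks", "stall"], -0.5),
    (["rain", "eases", "drought"], 0.75),
]


def target_sentiment(headlines, targets):
    """Mean score of the headlines mentioning each target; None if never mentioned."""
    sums = {name: [0.0, 0] for name in targets}  # name -> [sum of scores, count]
    for tokens, score in headlines:
        if score is None:
            continue
        token_set = set(tokens)
        for name, aliases in targets.items():
            if token_set & aliases:  # headline mentions at least one alias
                sums[name][0] += score
                sums[name][1] += 1
    return {name: (s / n if n else None) for name, (s, n) in sums.items()}


by_target = target_sentiment(scored_headlines, TARGETS)
print(by_target)
```

A headline that mentions two targets counts toward both. An alternative design is to compare each target's own word vector with the positivity vector directly, which measures how the word is used across the corpus rather than the tone of individual headlines.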
Please submit your Markdown files containing both the code that completes the above
tasks and the resulting plots as output. The deadline is 8 March, before class (6:15 pm).