2024/3/22 13:25 FINAL PROJECT :: Sentiment Analysis Tool and Text Analysis Report
FINAL PROJECT :: Sentiment Analysis Tool and Text Analysis Report
Sentiment Analysis Tool and Text Analysis Report
Release History
You are always responsible for the latest release of assignments. The release may be updated at any time to fix bugs or add clarification. Canvas will send out an announcement if and when the assignment is updated. Make sure that you are receiving these Canvas notifications. Get used to the idea that programs don’t have static requirements; programmers are expected to adapt to changing requirements. Reporting assignment bugs on Piazza is a good way to contribute to this course.
V002 : Added the rubric and rubric text
V001 : Initial release, expect a few bugs.
Table of Contents
Overview of Negations, Intensifiers, and Downtoners
Part 1 :: Writing the Improved Sentiment Analyzer
External Packages and an Introduction to Pip
Introduction to NLTK in Sentiment Analysis
Introduction to matplotlib
Introduction to Using Classes in Python for Organizational Purposes
Implementing the Sentiment Analyzer Class
Part 2 :: Analyzing External Datasets
Analyzing Project Gutenberg Texts: Unveiling Sentiment Arcs
Processing Texts into Chapters for Analysis
Completing the Sentiment Analysis Project
Part 3 :: Writing the Report
Rubric Text
In this project, you will develop a sentiment analysis tool using Python, then apply this tool to analyze text from a selection of curated datasets. The goal is to gain insights into the overall sentiment expressed within these texts and discuss your findings. You are also encouraged to extend the tool with additional features and include your extended code as an appendix to your report.
Part 1: Developing the Sentiment Analysis Tool
Your first task is to develop a sentiment analysis tool that computes the sentiment of text based on predefined lists of positive and negative words, handles negation, and adjusts sentiment scores based on the presence of intensifiers and downtoners. Detailed specifications as well as scaffolding for the project can be found below.
Specifications:
1. Basic Sentiment Analysis: Implement functionality to identify positive and negative keywords within the text.
2. Negation Handling: Include logic to handle negations, affecting the sentiment of words within a specified distance.
3. Modifiers: Account for intensifiers and downtoners that modify the intensity of sentiment expressions.
4. Compute Sentiment: The tool should be able to compute and report the overall sentiment of a given piece of text, indicating whether it is generally positive, negative, or neutral.
Requirements:
Use the Python programming language and the NLTK library for natural language processing tasks.
Write clear, well-documented code following best practices.
Ensure your program can process an input text and output the computed sentiment.
Part 2: Text Analysis Report
After developing your sentiment analysis tool, select a text dataset from the provided options to analyze. Use your tool to investigate the overall sentiment of the text and explore any interesting patterns or insights you can find.
Your report should include:
1. Introduction: Briefly introduce the sentiment analysis tool and the chosen dataset.
2. Methodology: Describe how you used the sentiment analysis tool to analyze the dataset.
3. Findings: Present your findings on the overall sentiment of the dataset. Discuss any trends, patterns, or anomalies you discovered.
4. Discussion: Reflect on the potential implications of your findings. Consider how the sentiment insights could be useful or relevant in real-world contexts.
5. Conclusion: Summarize your work and any conclusions you’ve drawn from the analysis.
Optional Extension:
You are encouraged to extend the functionality of your sentiment analysis tool. Potential extensions could involve improving the handling of negation and modifiers, implementing part-of-speech tagging to refine sentiment calculations, or any other enhancements you find interesting. If you choose to extend the program, include your extended code as an appendix to your report. Discuss the extensions you made and how they potentially improve upon the basic tool.
Submission Guidelines
Submit the source code for your sentiment analysis tool as a Python script.
Submit your text analysis report as a PDF document. If you extended the tool, ensure your extended code is included as an appendix.
Ensure your submission adheres to all specified requirements and is submitted by the project deadline.
This project offers you an opportunity to apply programming and natural language processing concepts in a practical context, enhancing your skills in data analysis and Python programming. Good luck, and we look forward to seeing your insights!
In sentiment analysis, understanding the interplay between negations, intensifiers, and sentiment words (positive and negative) is crucial for accurately capturing the nuances of language and emotion in text. Here’s a broad overview tailored to the scope of this project:
Overview of Negations, Intensifiers, and Downtoners
Positive and Negative Words
Positive Words: These are words that convey positive sentiment, emotions, or evaluations, such as “happy,” “love,” and “excellent.” They contribute positively to the sentiment score of a sentence or phrase.
Negative Words: Conversely, these words express negative sentiments, emotions, or evaluations, such as “sad,” “hate,” and “poor.” They contribute negatively to the sentiment score.
Negations
Negations are words or phrases that invert or negate the sentiment of the words that follow. Examples include “not,” “never,” and “no one.” Negations can dramatically change the meaning of a sentence. For instance, “I am happy” has a positive sentiment, but adding a negation as in “I am not happy” flips the sentiment to negative.
Intensifiers and Downtoners
Intensifiers: These are words that amplify the sentiment of the words they modify. They can increase the positive or negative impact of a sentiment word. Examples include “very,” “extremely,” and “absolutely.” For example, “happy” is positive, but “very happy” is more strongly positive.
Downtoners: These words decrease the intensity of the sentiment of the words they modify. They make the sentiment less extreme. Examples include “slightly,” “somewhat,” and “a bit.” For example, “sad” is negative, but “slightly sad” is less strongly negative.
Combining Elements in Sentiment Analysis
Understanding how these elements combine is key to sentiment analysis. For instance, an intensifier before a positive word can boost the overall positive sentiment, while a negation before a positive word can turn the sentiment negative. Similarly, an intensifier before a negation (e.g., “definitely not”) can make the negation more forceful, potentially leading to a stronger negative sentiment than the negation alone would imply.
In the context of this project, you’ll be coding a sentiment analysis class that takes these linguistic elements into account to calculate sentiment scores for given text. This involves identifying these elements in sentences, understanding their impact on sentiment individually and in combination, and applying this understanding algorithmically to assess the overall sentiment of text inputs accurately.
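To make these combinations concrete, here is a minimal sketch, not the provided scaffold. The word lists, the 1.5 and 0.5 multipliers, and the function name score_tokens are all assumptions chosen for illustration; the scaffold defines its own values and method names.

```python
# Hypothetical word lists and weights, chosen only for illustration.
POSITIVE = {"happy", "love", "excellent"}
NEGATIVE = {"sad", "hate", "poor"}
NEGATIONS = {"not", "never"}
INTENSIFIERS = {"very", "extremely", "absolutely"}
DOWNTONERS = {"slightly", "somewhat"}


def score_tokens(tokens):
    """Sum per-word scores, letting the word just before a keyword modify it."""
    total = 0.0
    for i, word in enumerate(tokens):
        if word in POSITIVE:
            score = 1.0
        elif word in NEGATIVE:
            score = -1.0
        else:
            continue  # not a sentiment keyword
        previous = tokens[i - 1] if i > 0 else ""
        if previous in NEGATIONS:
            score = -score   # "not happy" flips the sign
        elif previous in INTENSIFIERS:
            score *= 1.5     # "very happy" is stronger
        elif previous in DOWNTONERS:
            score *= 0.5     # "slightly sad" is weaker
        total += score
    return total


print(score_tokens("i am happy".split()))         # 1.0
print(score_tokens("i am not happy".split()))     # -1.0
print(score_tokens("i am very happy".split()))    # 1.5
print(score_tokens("i am slightly sad".split()))  # -0.5
```

This sketch looks only one word back and splits on whitespace, so it ignores punctuation and multi-word phrases such as “no one” or “a bit”.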
Part 1 :: Writing the Improved Sentiment Analyzer
Be sure to read through the instructions carefully. There are many moving parts to this project. You will be setting up external packages, writing your code in a scaffolded Python class, and working on a more advanced version of the sentiment analyzer. For this project, you will be able to use some external packages.
External Packages and an Introduction to Pip
pip is a package installer for Python that lets you easily manage the software packages used in your Python projects. It gives you access to a vast repository of libraries on the Python Package Index (PyPI), enabling you to add external packages to your environment with simple commands. Using pip, you can install, upgrade, and remove packages, streamlining the development process and ensuring you have the right tools at your disposal.
Installing Required Packages for Your Project
For this sentiment analysis project, you will need two packages: nltk for natural language processing tasks and matplotlib for plotting charts. These are the only two packages required, and allowed, for the graded coding portion of the assignment. You may need others based on specific requirements of your project. Here’s how to get started with pip and install the required packages.
Step 1: Ensure Pip is Installed
First, make sure you have pip installed. It usually comes with Python installations. To check if pip is installed, open a terminal or command prompt and type:
pip --version
If pip is installed, this command will return the version of pip you have. If not, you’ll need to install pip by downloading get-pip.py and running it with Python. Instructions for installing pip can be found at https://pip.pypa.io/en/stable/installation/.
Step 2: Upgrade Pip
It’s good practice to ensure your pip is up to date before installing new packages. To upgrade pip, run:
pip install --upgrade pip
Step 3: Install NLTK
With pip ready, you can install the Natural Language Toolkit ( nltk ) by running the following command:
pip install nltk
This command tells pip to download and install the latest version of nltk and its dependencies from PyPI.
Step 4: Verify Installation
After installation, you can verify that nltk was installed correctly by starting a Python shell and importing the library:
import nltk
If the import is successful without errors, nltk is correctly installed.
Step 5: Download NLTK Data
nltk requires specific data packages for different tasks. For tokenization, you’ll need the punkt package. To download it, run the following Python code:
import nltk
nltk.download('punkt')
This call downloads the punkt package through the NLTK downloader, which manages nltk data packages. You can also download other data packages required for your project in a similar manner.
Step 6: Install Matplotlib
After setting up NLTK for your sentiment analysis project, you will also need matplotlib for plotting and visualizing data, such as sentiment scores over time or across different text segments.
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It’s especially useful for this project if you decide to extend your analysis with visual representations of sentiment trends or comparisons.
To install Matplotlib, run the following command in your terminal or command prompt:
pip install matplotlib
This command tells pip to download and install the latest version of Matplotlib and its dependencies from PyPI.
Verify Matplotlib Installation
After installation, you can verify that Matplotlib was installed correctly by attempting to import it in a Python session:
import matplotlib.pyplot as plt
If this command executes without errors, Matplotlib is correctly installed and ready to be used in your project.
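If you would rather check both packages at once without triggering an ImportError, one option is the standard library's importlib.util.find_spec. The helper name is_installed below is hypothetical, and this is only a sketch of one way to do the check.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the package, without importing it."""
    return importlib.util.find_spec(package_name) is not None


for name in ("nltk", "matplotlib"):
    if is_installed(name):
        print(name, "-> installed")
    else:
        print(name, "-> missing (try: pip install " + name + ")")
```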
Using Matplotlib in Your Project
Matplotlib can be used to create a wide range of plots and charts. For sentiment analysis, you might use it to create line plots showing sentiment scores across different parts of a text or bar charts comparing overall sentiment across multiple texts.
Here’s a simple example of how to create a line plot with Matplotlib:
import matplotlib.pyplot as plt
# Example sentiment scores
sentiment_scores = [0.1, 0.5, -0.2, 0.4, 0.8]
# Create a line plot of sentiment scores
plt.plot(sentiment_scores, marker='o', linestyle='-', color='blue')
plt.title('Sentiment Scores Over Time')
plt.xlabel('Time')
plt.ylabel('Sentiment Score')
plt.grid(True)
plt.show()
This code plots a series of sentiment scores on a line chart, illustrating how you might visualize changes in sentiment over time within a text.
By following these steps, you can enhance your sentiment analysis project with the capability to visualize data, making your findings more accessible and understandable.
Additional Packages
If your project requires additional Python packages, you can install them using pip in the same way you installed nltk. Just replace nltk with the name of the package you need to install. For example:
pip install package_name
Replace package_name with the name of the package you wish to install.
Conclusion
By following these steps, you can set up your Python environment with the necessary packages for the sentiment analysis project. Using pip simplifies the management of external libraries, allowing you to focus on developing your project.
Introduction to NLTK in Sentiment Analysis
The Natural Language Toolkit (NLTK) is an essential library in Python for working with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources, such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning. In this project, NLTK plays a pivotal role in processing and analyzing text for sentiment analysis. Here’s an overview of the NLTK functions we’ve used so far, designed to give you a solid understanding without needing to search elsewhere.
Tokenization
Tokenization is the process of breaking up a string, text, or sentence into a list of words or sentences. It’s a fundamental step in text analysis that allows us to work with individual components such as words or sentences.
1. nltk.sent_tokenize(text) : This function splits a document or paragraph into sentences. It takes a string containing a text document as input and returns a list of sentences. The function uses an instance of PunktSentenceTokenizer from the nltk.tokenize.punkt module, which comes pre-trained for English by default.
import nltk
nltk.download('punkt') # Make sure to download the 'punkt' resource
sentences = nltk.sent_tokenize("This is an example. Here's another sentence.")
2. nltk.word_tokenize(text) : After splitting the text into sentences, we often need to further break down each sentence into words. This function takes a string (a sentence) and returns a list of word and punctuation tokens. In current NLTK releases it is a wrapper around NLTK's recommended word tokenizer, a Treebank-style tokenizer used together with the Punkt sentence tokenizer.
words = nltk.word_tokenize("This is an example sentence.")
NLTK Data Download
Before using certain NLTK functions like tokenization, it’s necessary to download specific resource files such as punkt. This resource file contains pre-trained models that assist in tokenization.
nltk.download('punkt') : This code snippet downloads the punkt package, which contains models for unsupervised machine learning tokenization. You only need to run this once to make the resources available for your NLTK installation.
import nltk
nltk.download('punkt')
Using NLTK in This Project
In this sentiment analysis project, you’re going to leverage NLTK’s tokenization capabilities to preprocess text data:
Sentence Tokenization: Use nltk.sent_tokenize to divide a piece of text into individual sentences. This is crucial for analyzing the sentiment of each sentence within a larger text.
Word Tokenization: Apply nltk.word_tokenize to break down each sentence into its constituent words. This step is necessary to examine each word’s sentiment and to identify negations, intensifiers, and downtoners within the context of the sentence.
By utilizing these functions, you’ll be able to dissect and analyze the text at a granular level, enabling effective sentiment analysis. Remember, the first step before using these tokenization functions is to ensure the punkt package is downloaded using nltk.download('punkt').
NLTK’s tokenization functions are straightforward to use but powerful in processing text for analysis. With these tools, you’re well-equipped to handle the text preprocessing needs of your sentiment analysis project.
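The two-step preprocessing described above (sentences first, then words) can be sketched as follows. The regex-based fallback functions are hypothetical stand-ins added only so the example runs when NLTK or its tokenizer data is unavailable; they are much cruder than NLTK's tokenizers and are not a substitute for them in the graded project.

```python
import re


def naive_sent_tokenize(text):
    """Rough stand-in for nltk.sent_tokenize: split after ., ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def naive_word_tokenize(sentence):
    """Rough stand-in for nltk.word_tokenize: words or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)


def tokenize(text):
    """Return a list of sentences, each given as a list of word tokens."""
    try:
        import nltk
        return [nltk.word_tokenize(s) for s in nltk.sent_tokenize(text)]
    except (ImportError, LookupError):
        # ImportError: nltk is not installed.
        # LookupError: nltk is installed but the tokenizer data is missing.
        return [naive_word_tokenize(s) for s in naive_sent_tokenize(text)]


for words in tokenize("I am not happy. The ending was excellent!"):
    print(words)
```

If NLTK raises a LookupError, its message names the missing resource to pass to nltk.download; the name can differ between NLTK versions.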
Introduction to matplotlib
matplotlib is one of the most widely used Python libraries for plotting. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK. Some of the reasons it’s considered easy for beginners include:
Simplicity: With matplotlib , you can create basic plots like line charts, scatter plots, and histograms with just a few lines of code.
Documentation and Tutorials: There’s a vast amount of documentation, tutorials, and examples available online that cover how to use matplotlib for various types of data visualization.
Flexibility: While simple to use for basic plots, matplotlib also supports a wide range of advanced plotting features and customization options as users become more comfortable with the library.
Example of a simple line plot with matplotlib :
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
# Create a line plot
plt.plot(x, y)
plt.xlabel('X Axis Label')
plt.ylabel('Y Axis Label')
plt.title('Simple Line Plot')
plt.show()
Introduction to Using Classes in Python for Organizational Purposes
In Python, classes are a fundamental aspect of code organization and structure. As you embark on developing your sentiment analysis tool, understanding how classes can be utilized as an organizational tool will be essential. Although we won’t dive deep into the broader concepts of Object-Oriented Programming (OOP), we’ll focus on how classes help organize code through encapsulation and their practical use in this project.
Encapsulation and Code Organization
Classes provide a way to bundle data (attributes) and functions (methods) into a single unit, known as encapsulation. This concept is crucial for several reasons:
Encapsulation helps in organizing code: By grouping related properties and behaviors, classes make code more readable and maintainable.
It reduces complexity: Working with encapsulated code means you can focus on how to use an object rather than how the object is implemented.
Improves code reusability: Once a class is written, it can be used in multiple projects without needing to rewrite code.
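As a small illustration of encapsulation, the sketch below bundles a keyword list (data) with a method that uses it. The WordCounter class is hypothetical and unrelated to the provided scaffold; it only shows the pattern of storing data in __init__ and reaching it through self.

```python
class WordCounter:
    """Hypothetical example: bundles a word list (data) with a method that uses it."""

    def __init__(self, keywords):
        # Store the data once; every method can reach it through self.
        self.keywords = set(word.lower() for word in keywords)

    def count(self, text):
        """Count how many whitespace-separated words in text are keywords."""
        return sum(1 for word in text.lower().split() if word in self.keywords)


positive_counter = WordCounter(["happy", "love", "excellent"])
print(positive_counter.count("I love this excellent book"))  # 2
```

Code that calls count does not need to know how the keywords are stored, which is the point of the bullets above.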
In the context of your sentiment analysis project, you’ll create a SentimentAnalyzer class. This class will encapsulate all the functionality needed to determine the sentiment of texts, such as handling positive and negative keywords, negation, and modifiers (intensifiers and downtoners).
How to Use the SentimentAnalyzer Class
The SentimentAnalyzer class will be the core of your sentiment_analyzer.py file. Here’s a simplified overview of how you’ll use this class in your project:
1. Initialization: When you create an instance of the SentimentAnalyzer , you’ll provide it with lists of positive and negative keywords. Optionally, you can also customize the lists of negation words, intensifiers, and downtoners.
from sentiment_analyzer import SentimentAnalyzer
analyzer = SentimentAnalyzer(positive_keywords, negative_keywords)
2. Analyzing Text: Once initialized, you can use the analyzer to assess the sentiment of texts. The class encapsulates all the logic for this process, so you don’t need to worry about the implementation details.
sentiment_result = analyzer.analyze_sentiment(text)
3. Reporting: The analyze_sentiment method returns a structured result that includes the overall sentiment, counts of positive and negative sentiments, and other details. You can use this information in your main.py to report on the sentiment of various texts.
Project Structure
For this project, you’ll be submitting two Python files:
1. sentiment_analyzer.py : This file contains the SentimentAnalyzer class. It encapsulates all the functionality for sentiment analysis, making your code organized and reusable. Don’t worry about creating the class structure, we have provided detailed scaffolding below.
2. main.py : This is your project’s main file, where you import and use the SentimentAnalyzer class to analyze texts from the selected datasets. This file will demonstrate how the class is utilized and will contain your analysis and findings.
Classes in Python, such as the SentimentAnalyzer , serve as powerful tools for organizing code, encapsulating complexity, and enhancing code reusability. By following this structure, you will create a well-organized project that effectively uses classes to accomplish its goals.
Detailed scaffolding with extensive documentation and built-in asserts to help you start testing has been provided for you. Download sentiment_analyzer.py (https://canvas.ucdavis.edu/courses/855436/files/23491827?wrap=1) (https://canvas.ucdavis.edu/courses/855436/files/23491827/download?download_frd=1) to get started.
Implementing the Sentiment Analyzer Class
Your task is to implement a Python class named SentimentAnalyzer that can analyze the sentiment of sentences based on positive and negative keywords, negations, and the impact of modifiers such as intensifiers and downtoners.
Objectives
1. Complete the SentimentAnalyzer Class: Utilizing the provided class scaffold, you will implement methods to analyze the sentiment of individual sentences. This includes identifying positive and negative keywords, handling negation words that invert sentiment, and adjusting sentiment scores with modifiers.
2. Testing Your Implementation: Validate the accuracy of your sentiment analyzer by writing tests that check its functionality against a series of assertions. These tests should cover a variety of scenarios including different combinations of keywords, negations, and modifiers. Upload your code to Gradescope when you think that it’s ready to pass broader tests.
Implementation Details
Keywords Identification: Your sentiment analyzer should correctly identify positive and negative keywords within sentences and adjust the sentiment score accordingly.
Negation Handling: Implement logic to detect negation words immediately preceding sentiment keywords. Negation words should invert the sentiment of the following keyword.
Modifiers Impact: Incorporate the ability to adjust sentiment scores based on the presence of intensifiers and downtoners. Intensifiers should amplify the sentiment effect, while downtoners should diminish it.
Combining Modifiers and Negation
When combining modifiers with negation, consider the immediate impact of the modifier on the negation or the sentiment word it precedes. This combination alters the sentiment score based on the type of modifier (intensifier or downtoner) and its placement relative to the negation and sentiment keyword. Here’s how you should handle it:
1. Intensifier with Negation (e.g., “definitely not good”): The intensifier amplifies the strength of the negation, leading to a stronger negation effect on the subsequent sentiment word. The score is adjusted more significantly than a simple negation.
2. Downtoner with Negation (e.g., “barely not good”): The downtoner reduces the impact of the negation on the sentiment word, leading to a softer negation effect. The adjustment to the sentiment score is less severe compared to a straightforward negation.
In both cases, apply the appropriate multiplier to the sentiment score for the word following the negation, reflecting the modifier’s impact. This approach ensures that the sentiment analysis accurately captures the nuanced meanings conveyed by combinations of modifiers and negations in sentences.
You do not have to handle negations that precede modifiers for this portion of the project.
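The modifier-plus-negation rule can be sketched as a look-back of up to two words from each sentiment keyword. This is only an illustration under stated assumptions: the word lists, the 1.5 and 0.5 multipliers, and the function name keyword_score are hypothetical, and the scaffold's own constants and method signatures take precedence.

```python
# Hypothetical values for illustration; use the ones defined in the scaffold.
POSITIVE = {"good"}
NEGATIVE = {"bad"}
NEGATIONS = {"not", "never"}
INTENSIFIERS = {"definitely", "very"}
DOWNTONERS = {"barely", "slightly"}
INTENSIFIER_MULTIPLIER = 1.5
DOWNTONER_MULTIPLIER = 0.5


def keyword_score(tokens, i):
    """Score the keyword at index i, looking back at up to two preceding words."""
    word = tokens[i]
    if word in POSITIVE:
        score = 1.0
    elif word in NEGATIVE:
        score = -1.0
    else:
        return 0.0  # not a sentiment keyword

    prev = tokens[i - 1] if i >= 1 else None
    prev2 = tokens[i - 2] if i >= 2 else None

    if prev in NEGATIONS:
        score = -score                       # negation inverts the keyword
        if prev2 in INTENSIFIERS:
            score *= INTENSIFIER_MULTIPLIER  # "definitely not good"
        elif prev2 in DOWNTONERS:
            score *= DOWNTONER_MULTIPLIER    # "barely not good"
    elif prev in INTENSIFIERS:
        score *= INTENSIFIER_MULTIPLIER      # "very good"
    elif prev in DOWNTONERS:
        score *= DOWNTONER_MULTIPLIER        # "slightly bad"
    return score


print(keyword_score("not good".split(), 1))             # -1.0
print(keyword_score("definitely not good".split(), 2))  # -1.5
print(keyword_score("barely not good".split(), 2))      # -0.5
```

The order negation-then-modifier (for example “not very good”) is deliberately left unhandled here, matching the note above.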
Parameters for Customization: Ensure your analyze_sentence_sentiment method respects the use_negation and use_modifiers boolean parameters. This allows for flexibility in the analysis process based on whether negations and modifiers should be considered.
Testing and Validation
Use the main method within the SentimentAnalyzer class as a template to test your implementation. The method should contain assertions that validate the functionality of your sentiment analyzer under various conditions.
Add additional test cases that challenge your sentiment analyzer with complex sentences. Include scenarios with multiple sentiment keywords, consecutive negations, and varied intensifiers and downtoners to ensure robustness.
Submission Guidelines
Submit the completed SentimentAnalyzer class ( sentiment_analyzer.py ). Make sure your class passes all the test cases you’ve written in the main method.
Document your implementation thoroughly. Include comments explaining your logic, especially for the sentiment analysis calculations and how negations and modifiers are handled.
This part of the project will lay the foundation for analyzing sentiment in text data. By completing this task, you will gain valuable experience in implementing logic for natural language processing, a key skill in the field of data science and machine learning.
Part 2 :: Analyzing External Datasets
Analyzing Project Gutenberg Texts: Unveiling Sentiment Arcs
Introduction to Project Gutenberg
Project Gutenberg is a digital library offering over 60,000 free eBooks, focusing on older works for which U.S. copyright has expired. For literary researchers and enthusiasts alike, it’s a valuable resource for accessing a wide array of classic texts. In this project, you will leverage texts from Project Gutenberg to conduct a sentiment analysis, providing insights into the emotional journey within literary works.
Understanding Story Arcs
A story arc is the transformation or journey a narrative undergoes from its beginning to conclusion, marked by significant events, character development, and thematic evolution. Sentiment arcs, a subset of story arcs, trace the emotional fluctuations throughout a story. They reveal how authors craft narratives to evoke emotional responses, build tension, and drive the story forward. Analyzing sentiment arcs helps us understand narrative techniques, thematic depth, and character progression.
Sentiment Analysis Overview
Using the SentimentAnalyzer class developed in Part One, you will compute sentiment scores for sentences within each chapter of your chosen texts. This involves evaluating whether sentences carry positive, negative, or neutral sentiments and aggregating these to understand the overarching emotional tone of each chapter.
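One way to aggregate sentence scores into a single value per chapter is a plain average. The sketch below assumes the sentence scores have already been produced by your analyzer; the function name chapter_scores, the sample numbers, and the choice of the mean (rather than a sum or another statistic) are illustrative assumptions.

```python
from statistics import mean


def chapter_scores(chapters):
    """Average the sentence scores of each chapter; empty chapters score 0.0."""
    return [mean(scores) if scores else 0.0 for scores in chapters]


# Hypothetical sentence scores for three chapters.
book = [
    [1.0, -1.0, 1.0, 1.0],  # chapter 1
    [-1.0, -1.0, 0.0],      # chapter 2
    [],                     # chapter 3 (no scored sentences)
]
print(chapter_scores(book))
```

Averaging keeps long and short chapters comparable, whereas a sum would favor chapters with more sentences.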
Moving Averages: Smoothing the Arc
A moving average is a statistical method used to analyze data points by creating a series of averages of different subsets of the full data set. In the context of sentiment arcs, applying a moving average smooths out short-term fluctuations and highlights longer-term trends or cycles.
Here’s how you can compute a simple moving average for your sentiment scores:
def compute_moving_average(scores, window_size=3):
    """
    Computes the moving average of sentiment scores.

    Args:
        scores (list of float): The list of sentiment scores.
        window_size (int): The number of scores to include in each average calculation.

    Returns:
        list of float: A list of the moving averages.
    """
    moving_averages = []
    for i in range(len(scores)):
        # Only average full windows; stop once fewer than window_size scores remain.
        if i + window_size <= len(scores):
            window_average = sum(scores[i:i + window_size]) / window_size
            moving_averages.append(window_average)
    return moving_averages
Adjust the window_size parameter based on your analysis needs. Smaller windows provide closer adherence to the original data, while larger windows offer smoother arcs.
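The effect of the window size is easiest to see on a small input. The block below restates the function compactly (same behavior: only full windows are averaged) so that it runs on its own; the sample scores are made up for illustration.

```python
def compute_moving_average(scores, window_size=3):
    """Average each run of window_size consecutive scores (full windows only)."""
    return [
        sum(scores[i:i + window_size]) / window_size
        for i in range(len(scores) - window_size + 1)
    ]


scores = [1, 2, 3, 4, 5, 6]
print(compute_moving_average(scores, window_size=2))  # [1.5, 2.5, 3.5, 4.5, 5.5]
print(compute_moving_average(scores, window_size=4))  # [2.5, 3.5, 4.5]
```

Note that the result has len(scores) - window_size + 1 entries, so a larger window produces a shorter as well as a smoother series.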
Plotting Sentiment Arcs
After computing the moving averages, plot these values to visualize the sentiment arc. The x-axis represents the chapters or segments, and the y-axis represents the sentiment score. This visual representation allows you to observe the emotional journey of the narrative and analyze how sentiments evolve over time.
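A plotting step along these lines might look like the sketch below. The function names arc_points and plot_arc, the axis labels, and the sample values are assumptions for illustration; matplotlib is imported inside plot_arc so the data-preparation helper can be used and tested without it.

```python
def arc_points(chapter_averages):
    """Pair each smoothed score with a 1-based position for the x-axis."""
    x = list(range(1, len(chapter_averages) + 1))
    return x, list(chapter_averages)


def plot_arc(chapter_averages, title, output_path):
    """Draw the sentiment arc and save it to output_path (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so arc_points works without it

    x, y = arc_points(chapter_averages)
    plt.figure()
    plt.plot(x, y, marker="o")
    plt.axhline(0, color="gray", linewidth=0.8)  # neutral-sentiment reference line
    plt.xlabel("Chapter (window position)")
    plt.ylabel("Smoothed sentiment score")
    plt.title(title)
    plt.savefig(output_path)
    plt.close()


# Hypothetical smoothed scores; replace with your own moving averages.
x, y = arc_points([0.4, 0.1, -0.3, -0.5, 0.2])
print(x)  # [1, 2, 3, 4, 5]
```

Calling plot_arc(y, "Sentiment Arc", "arc.png") would then write the figure to a file; use plt.show() instead of plt.savefig if you prefer an interactive window.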
Exploring Sentiment Arcs
Consider the following when analyzing sentiment arcs:
How do sentiment arcs reflect the narrative structure of the story? Are there clear patterns associated with key events or turning points in the plot?
How do different authors manipulate sentiment to enhance storytelling? Compare sentiment arcs across genres, authors, or periods to uncover stylistic