2024/3/29 11:00 Project 2B: Ngordnet (Wordnet) | CS 61B Spring 2024
https://sp24.datastructur.es/projects/proj2b/ 1/15
Projects / Project 2B: Ngordnet (Wordnet)
Each assignment will have an FAQ linked at the top. You can also access it by adding “/faq” to
the end of the URL. The FAQ for Project 2B is located here.
In this project, youʼll complete your implementation of the NGordnet tool.
As this is a quite new project, there may be occasional bugs or confusion with the spec. If you
notice anything of this sort, please post on Ed.
IMPORTANT NOTE: After you read the 2B spec, you may be tempted to start coding.
Donʼt do this!
Before implementing ANY code for 2B, please read the 2C spec, as your design may
change depending on 2C. You can find it here.
Then, complete the Project 2B/C: Checkpoint and Design Document before starting
When designing your project, think about all of the requirements in advance. Planning ahead
will ensure you donʼt need to rewrite all of your code when you get to certain points in the
If you find yourself copy-pasting or repeating a lot of the same code youʼve already written,
there is probably an opportunity to reuse it directly, or slightly modify it so you donʼt have to
repeat yourself as often.
Project 2B: Ngordnet (Wordnet)
Checkpoint & Design Doc Due 03/15/2024
Coding Due 04/01/2024
Design Notes
Project Setup
https://sp24.datastructur.es/
https://sp24.datastructur.es/projects/
https://sp24.datastructur.es/projects/proj2b/faq/
https://sp24.datastructur.es/projects/proj2c/
https://www.gradescope.com/courses/708063/assignments/4133684
https://www.gradescope.com/courses/708063/assignments/4187810
https://sp24.datastructur.es/projects/proj2b/faq/
2024/3/29 11:00 Project 2B: Ngordnet (Wordnet) | CS 61B Spring 2024
https://sp24.datastructur.es/projects/proj2b/ 2/15
THE SETUP FOR THIS PROJECT IS DIFFERENT THAN THE OTHER LABS / PROJECTS.
PLEASE DO NOT SKIP THIS STEP!
Once you are done, your proj2b directory should look like this:
IMPORTANT NOTE: You should really complete Project 2B/C: Checkpoint first before
starting coding, or even designing your project. We think this would be helpful for your
understanding of the project. We will also require you to submit a design document to
Gradescope. More details about the design document can be found in Deliverables and
Complete Project 2B/C: Checkpoint
After finishing the checkpoint, complete Design Document
This part of the project is designed for you to come up with efficient and correct design for
your implementation. The design you come up with will be very important to handle these
cases. Please read 2B & 2C spec carefully before starting your design document.
Skeleton Setup
Similar to other assignments in this class, run git pull skeleton main to get the skeleton
code for this project.
Download the data files for this project using this link and move them into your proj2b
folder on the same level as src .
│ ├── ngrams
│ └── wordnet
├── static
Getting Started
https://www.gradescope.com/courses/708063/assignments/4133684
https://docs.google.com/document/d/1Vx7QAz4HFN0rEFFEt5rocY2X5AWVcIFFpRmD8vhegOM/edit?usp=sharing
https://www.gradescope.com/courses/708063/assignments/4133684
https://www.gradescope.com/courses/708063/assignments/4187810
https://drive.google.com/file/d/160iHOqwR4FAghGshbnSNMwd0idrjZxTR/view?usp=sharing
2024/3/29 11:00 Project 2B: Ngordnet (Wordnet) | CS 61B Spring 2024
https://sp24.datastructur.es/projects/proj2b/ 3/15
The course staff has created a couple of introductory videos to the project and the starter
code available here. Bear in mind we have changed the structure of the project so some
information might be outdated!
Weʼve also created two wonderful tools that you can (and should!) use to explore the dataset,
see how the staff solution behaves for specific inputs, and get expected outputs for your unit
tests (see Testing Your Code). Weʼll link them here, as well as in other relevant parts of the
Wordnet Visualizer: Useful for visually understanding how synsets and hyponyms work and
testing different words/lists of words for potential test case inputs. Click on the “?” bubbles
to learn how to use the various features of this tool!
Staff Solution Webpage: Useful for generating expected outputs for different test case
inputs. Use this to write your unit tests!
Read through entire 2B/C spec and complete Project 2B/C: Checkpoint
Read through entire 2B/C spec and complete Design Document
Before we can incorporate WordNet into our project, we first need to understand the WordNet
WordNet is a “semantic lexicon for the English language” that is used extensively by
computational linguists and cognitive scientists; for example, it was a key component in IBM s̓
Watson. WordNet groups words into sets of synonyms called synsets and describes semantic
relationships between them. One such relationship is the is-a relationship, which connects a
hyponym (more specific synset) to a hypernym (more general synset). For example, “change”
is a hypernym of “demotion”, since “demotion” is-a (type of) “change”. “change” is in turn a
hyponym of “action”, since “change” is-a (type of) “action”. A visual depiction of some
hyponym relationships in English is given below:
Using the WordNet Dataset
https://www.qxbytes.com/wordnet/
https://ngordnet.datastructur.es/
https://www.gradescope.com/courses/708063/assignments/4133684
https://www.gradescope.com/courses/708063/assignments/4187810
http://en.wikipedia.org/wiki/WordNet
2024/3/29 11:00 Project 2B: Ngordnet (Wordnet) | CS 61B Spring 2024
https://sp24.datastructur.es/projects/proj2b/ 4/15
Each node in the graph above is a synset. Synsets consist of one or more words in English that
all have the same meaning. For example, one synset is “jump, parachuting” , which represents
the act of descending to the ground with a parachute. “jump, parachuting” is a hyponym of
“descent”, since “jump, parachuting” is-a “descent”.
Words in English may belong to multiple synsets. This is just another way of saying words may
have multiple meanings. For example, the word “jump” also belongs to the synset “jump, leap”
, which represents the more figurative notion of jumping (e.g. a jump in attendance) rather the
literal meaning of jump from the other synset (e.g. a jump over a puddle). The hypernym of the
synset “jump, leap” is “increase”, since “jump, leap” is-an “increase”. Of course, there are other
ways to “increase” something: for example, we can increase something through
“augmentation,” and thus it is no surprise that we have an arrow pointing downwards from
“increase” to “augmentation” in the diagram above.
Synsets may include not just words, but also what are known as collocations. You can think of
these as single words that occur next to each other so often that they are considered a single
word, e.g. nasal_decongestant . To avoid ambiguity, we will represent the constituent words of
collocations as being separated with an underscore _ instead of the usual convention in
English of separating them with spaces. For simplicity, we will refer to collocations as simply
“words” throughout this document.
A synset may be a hyponym of multiple synsets. For example, “actifed” is a hyponym of both
“antihistamine” and “ nasal_decongestant”, since “actifed” is both of these things.
If youʼre curious, you can browse the Wordnet database by using the web interface ,
though this is not necessary for this project.
In Project 2B, your primary task is to implement this button, which will require reading in a
different type of dataset and synthesizing the results with the dataset from Project 2A. Unlike
2A, it will be entirely up to you to decide what classes you need to support this task.
Hyponyms (Basic Case)
Setting up a HyponymsHandler
In your web browser, open the ngordnet.html file in the static folder. As a refresher, you
can find how to do that here in bullet point 1. Youʼll see that there is a new button:
“Hyponyms”. Note that there is also a new input box called k .
Try clicking the Hyponyms button. Youʼll see nothing happens (and if you open the
developer tools feature of your web browser, youʼll see that your browser shows an error).
http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&s=jump&i=6&h=000010000000000000000000#c
http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&s=jump&i=2&h=100000000000000000000000#c
http://en.wikipedia.org/wiki/Collocation
http://wordnetweb.princeton.edu/perl/webwn?s=nasal+decongestant+&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=
http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&r=1&s=sturgeon&i=3&h=1000#c
https://sp24.datastructur.es/projects/proj2a/#historytexthandler
2024/3/29 11:00 Project 2B: Ngordnet (Wordnet) | CS 61B Spring 2024
https://sp24.datastructur.es/projects/proj2b/ 5/15
If you see some error like “Could not load file some_file_here.txt ”, it probably means
that your project is not set up correctly. Be sure that you have the same structure as stated
in the Project Setup section.
Next, youʼll create a partial implementation of the Hyponyms button. For now, this button
Assume that the “words” entered is only a single word.
Ignore startYear, endYear, and k.
Return a string representation of a list of the hyponyms of the single word, including the
word itself. The list should be in alphabetical order, with no repeated words.
For example, suppose the WordNet dataset looks like the diagram below (given to you as the
input files synsets11.txt and hyponyms11.txt ). Suppose that the user enters “descent” and
clicks on the Hyponyms button.
In this case, the output of your handler should be the string representation of a list containing
“descent”, “jump” and “ parachuting”, i.e [descent, jump, parachuting] . Note that the words
are in alphabetical order.
Edit the file called HyponymsHandler to simply return the word “Hello!” when the user
clicks the Hyponyms button in the browser. Youʼll need to make the HyponymsHandler
class extend the NgordnetQueryHandler class. See your other Handler classes for
examples. Make sure when you register your handler that you use the string “hyponyms” as
the first argument to the register method, and not “hyponym”.
Start by opening your ngordnet.main.Main.java file.2
Once youʼve modified Main so that your new handler is registered to handle hyponyms
requests, start up Main and try clicking the Hyponyms button in your web browser again.
You should see text appear that says “Hello”.
Hyponyms Handler (Basic Case)
2024/3/29 11:00 Project 2B: Ngordnet (Wordnet) | CS 61B Spring 2024
https://sp24.datastructur.es/projects/proj2b/ 6/15
As another example, suppose weʼre using a bigger dataset such as the one below (given to you
as the input files synsets16.txt and hyponyms16.txt ):
Suppose the user enters “change” and clicks on the Hyponyms button. In this case, the
hyponyms are all the words in the blue nodes in the diagram below:
That is the output is [alteration, change, demotion, increase, jump, leap,
modification, saltation, transition, variation] . Note that even though “change”
belongs to two different synsets, it only appears once.
2024/3/29 11:00 Project 2B: Ngordnet (Wordnet) | CS 61B Spring 2024
https://sp24.datastructur.es/projects/proj2b/ 7/15
Note: Try not to overthink this. Specifically, observe that the output does not include:
Synonyms of synonyms (e.g. does not include “adjustment” )
Hyponyms of synonyms (e.g. does not include “conversion” )
Hyponyms of other definitions of hyponyms (e.g. does not include “flashback” , which is a
hyponym of another definition of “transition” )
Implement HyponymsHandler.java and any helper classes.
Note: Please read the tips below, since you shouldnʼt be writing all of your code in this
To complete this task, youʼll need to decide what classes you need to create to support the
HyponymsHandler . DO NOT DO ALL THE WORK IN HYPONYMS HANDLER. Instead, you
should have helper classes. For example, in 2A, to handle the “History” button, we created
an NGramMap class. Youʼll want to do something similar in 2B, with your own classes.
Youʼll also need to understand the input format of the WordNet dataset. This description is
given in the section below.
For this part, you may NOT import any existing graph library into your code. That is you
canʼt import, for example, the graph implementations from the optional Princeton
algorithms textbook. Instead, you should build your own graph class or classes.
Just like 2A̓s NGramMap, youʼll want your helper classes to only parse the input files once,
in the constructor. DO NOT CREATE METHODS WHICH HAVE TO READ THE ENTIRE
INPUT FILE EVERY TIME THEY ARE CALLED. This will be too slow!
We strongly recommend creating at least two classes for this part of the project as follows:
One which implements the idea of a directed graph. One which reads in the WordNet
dataset and constructs an instance of the directed graph class. This second class should
also be able to take a word and return its hyponyms. You may also want additional helper
classes that represent the idea of a traversal but this is not required – you can implement
your traversal within your graph class as well.
Donʼt worry about writing Truth tests yet, weʼll talk about how to do that later in the spec.
Simply use the web front end to check the two input examples (“descent” and “change”)
from the diagrams above for synsets16.txt and hyponyms16.txt .
2024/3/29 11:00 Project 2B: Ngordnet (Wordnet) | CS 61B Spring 2024
https://sp24.datastructur.es/projects/proj2b/ 8/15
While you can (and should) write unit tests for the helper classes/methods that you create
for this project, another good way to test and see what s̓ going on with your code is to
simply run Main.java , open ngordnet.html , enter some inputs into the boxes, and click
the “Hyponyms” button. You may find visual debugging can lead to some useful discoveries
in this project.
We now describe the two types of data files that store the WordNet dataset. These files are in
comma separated format, meaning that each line contains a sequence of fields, separated by
File Type #1: List of noun synsets. The file synsets.txt (and other smaller files with
synset in the name) lists all the synsets in WordNet. The first field is the synset id (an
integer), the second field is the synonym set (or synset), and the third field is its dictionary
definition. For example, the line
means that the synset { Goofy } has an id number of 6829, and its definition is “a cartoon
character created by Walt Disney”. The individual nouns that comprise a synset are
separated by spaces (and a synset element is not permitted to contain a space). The S
synset ids are numbered 0 through S − 1; the id numbers will appear consecutively in the
synset file. The id numbers are useful because they also appear in the hyponym files,
described as file type #2.
File Type #2: List of hyponyms. The file hyponyms.txt (and other smaller files with
hyponym in the name) contains the hyponym relationships: The first field is a synset id;
subsequent fields are the id numbers of the synset s̓ direct hyponyms. For example, the
following line
means that the synset 79537 (“viceroy vicereine”) has two hyponyms: 38611 (“exarch”) and
9007 (“Khedive”), representing that exarchs and Khedives are both types of viceroys (or
vicereine). The synsets are obtained from the corresponding lines in the file synsets.txt :
WordNet File Format
6829,Goofy,a cartoon character created by Walt Disney Copy
79537,38611,9007 Copy
79537,viceroy vicereine,governor of a country or province who rules…
38611,exarch,a viceroy who governed a large province in the Roman Empire
9007,Khedive,one of the Turkish viceroys who ruled Egypt between…
2024/3/29 11:00 Project 2B: Ngordnet (Wordnet) | CS 61B Spring 2024
https://sp24.datastructur.es/projects/proj2b/ 9/15
There may be more than one line that starts with the same synset ID. For example, in
hyponyms16.txt , we have
This indicates that both synsets 12 and 13 are direct hyponyms of synset 11. These two
could also have been combined on to one line, i.e. the line below would have the exact
same meaning, namely that synsets 12 and 13 are direct hyponyms of synset 11.
You might ask why there are two ways of specifying the same thing. Real world data is often
messy, and we have to deal with it.
To get the “Hyponyms” button working youʼll need to:
Develop a graph class. If you arenʼt familiar with this data structure, take a look at lectures
21 and 22. You should test this with operations that are independent of the given data files.
For example, our tests evaluated that our createNode and addEdge functions yielded
appropriate graphs by using our graph classes s̓ getNodes and neighbors functions. For
inspiration, you can check out Lecture 22 and 23.
Write code that converts the WordNet dataset files into a graph. This could be part of
your graph class, or it could be a class that uses your graph class.
Write code that takes a word, and uses a graph traversal to find all hyponyms of that word
in the given graph.
We strongly recommended writing tests that evaluate queries on the examples above (for
example, you might look at the hyponyms of “descent” in synsets11/hypernyms11, or the
hyponyms of “change” in synsets16/hypernyms16).
Tests should be written at a level of abstraction appropriate to what theyʼre evaluating. For
example, we have a class called TestGraph that evaluates various aspects of our Graph
Or as another example, our code has a class called TestWordNet containing the function
11,12,13 Copy
Suggested Steps to Take
public void testHyponymsSimple(){
https://docs.google.com/presentation/d/1Gyke5ZMrgcMuBTa7qHAoklWZB3yf_yqQAymK2nV-Nqs/edit
https://docs.google.com/presentation/d/1NKWnXSJ8pUn1E2Cw6zidxMRuXf_aINKCpW7XpEkyz6c/edit
2024/3/29 11:00 Project 2B: Ngordnet (Wordnet) | CS 61B Spring 2024
https://sp24.datastructur.es/projects/proj2b/ 10/15
Note that your WordNet class may not have the same functions as ours so the test shown will
probably not work verbatim with your code. Note that our test does NOT use an NGramMap
anywhere, nor is it using a HyponymsHandler , nor is it directly invoking an object of type
Graph . It is specifically tailored to testing the WordNet class. Relying on only browser tests
will be incredibly frustrating (and slow!). Use your JUnit skills to build confidence in the
foundational abstractions that you build (e.g. Graph, WordNet, etc.).
This project involves having to do all sorts of different lookups, graph operations, and data
processing operations. There is no one right way to do this.
Some example lookups that you might need to perform:
Given a word (e.g. “change”), what nodes contain that word?
Example in synsets16.txt: change is in synsets 2 and 8
Given an integer, what node goes with that index?
Necessary for processing hyponyms.txt. For example in hyponyms16.txt, we know that
the node with synset 8 points at synsets 9 and 10, so we need to be able to find node 8
to get its neighbors.
Given a node, what words are in that node?
Example in synsets16.txt: synset 11 contains alteration, modification, and adjustment
Some example graph operations you might need to perform:
Creating a node, e.g. each line of synsets16.txt contains the information for a node.
Adding an edge to a node, e.g. each line of hyponyms16.txt contains one or more edges
that should be added to the corresponding node.
Finding reachable vertices, e.g. the vertices reachable from vertex #7 in hyponyms16.txt are
7, 8, 9, 10.
Your life will be a lot easier if you select instance variables and/or data structures for your
classes that naturally help solve all six of the problems above.
Some example data processing operations:
Given a collection of things, how do you find all non-duplicate items? (Hint: There is a data
structure that makes this very easy and efficient). Donʼt be afraid to also Google
WordNet wn=new WordNet(“./data/wordnet/synsets11.txt”,”./data/wordnet/hyponyms1
assertThat(wn.hyponyms(“antihistamine”)).isEqualTo(Set.of(“antihistamine”,”acti
Design Tips
2024/3/29 11:00 Project 2B: Ngordnet (Wordnet) | CS 61B Spring 2024
https://sp24.datastructur.es/projects/proj2b/ 11/15
documentation for the data structure that you choose (e.g. if you choose to use a TreeMap
for whatever reason, feel free to look up “TreeMap methods java”, “Map methods java”, or
“Collection methods java”, etc).
Given a collection of things, how do you sort them? (Hint: Google how to sort the collection
that youʼre using)
Also, a reminder from proj2a: Deeply nested generics are a warning sign that you are doing
something too complicated. Either find a simpler way or create a helper class to help
manage the complexity. For example, if you find yourself trying to use something like
Map