UBC CPSC310-22W2: Intro to SE – Checkpoint 1

Learning Outcomes
Can model different data (a dataset, a query) into suitable data structures and reason about alternative representations.

Can understand and work with external libraries (fs-extra, JSZip) by following documentation.

Able to reason about and debug programs with asynchrony (Promises, async/await).

Able to handle exceptions properly for programs involving asynchrony.

Can parse and validate a query based on the given EBNF grammar.

Can extract a set of white-box tests for a particular implementation.

1. Checkpoint 1 Introduction
2. Grading
2.1 Teamwork Score
2.1.1 Labs (Scrum Attendance)
2.1.2 Weekly Reports (Contribution Statement)
2.2 Requirements
2.3 Submitting Your Work
2.4 AutoTest Feedback
Main Branch [Smoke Test Feedback]
Feature Branches [Build Feedback]
#check Feedback
3. Implementation
3.1 Repository Setup
Getting your team repository
Git Branches
3.2 Changes from C0
Reference Implementation Query Results Ordering Update
3.3 Implementing insightUBC
3.3.1 Dataset Processor: Modeling Sections Data
3.3.2 Query Engine
3.4 Advice
fs-extra Package Usage
Dividing Work
4. Resources
4.1 Getting started
4.2 Recommended Videos & Tutorials
Example repository
folder-test
4.3 Common Issues
“Implicitly has ‘any’ type”
IntelliJ/Webstorm Test Timeouts
IntelliJ/Webstorm Not Compiling Files

Change Log
[Tuesday Jan 24th] Section 2.4 AutoTest Feedback: Updated to include information about the #check command.

[Wednesday Jan 25th] Section 2.1.2 Weekly Reports: Updated with concrete link

1. Checkpoint 1 Introduction
In Checkpoint 0, you created a test suite against the insightUBC Section’s specification. In Checkpoint 1, you will design and build an implementation of that specification, and possibly write more (white-box) tests for your implementation.
Teams. This checkpoint and all future checkpoints will be completed in teams of two. You must use the same partner for the duration of the project; no changes will be permitted. If you do not have a partner, the TAs will help you find one during the first lab after Checkpoint 0. Your partner must be in your lab.
Labs. Labs are now mandatory and will remain so for the remainder of the term. During the labs, you and your partner will meet with your TA to discuss progress on the project and any issues you’ve come across. 

2. Grading
We will check for copied code on all submissions, so please make sure your work is your own. 
Your grade for Checkpoint 1 will come from AutoTest. We will run a private Client Test Suite against your implementation, and your grade will be calculated as:
 (number of passing tests) / (total number of tests)
For example, if our test suite has 10 tests and when we run them against your implementation 8 pass, your grade is 80% (8/10).

2.1 Teamwork Score
Now that you are working in a team, there is a teamwork score, which is a part of your overall project grade. The teamwork score consists of scrum attendance and weekly reports. See the Project Grading page for details.

2.1.1 Labs (Scrum Attendance)
Attending labs is required for the remainder of the term. During labs, you and your partner will meet with your TA to discuss progress on the project and any issues you come across. 

2.1.2 Weekly Reports (Contribution Statement)
A weekly scrum report will be required to detail your contribution to your group’s project. This report must be submitted at least 24 hours before your lab so your mentor TA can read it and prepare for the scrum! 
Link to the survey: https://ubc.ca1.qualtrics.com/jfe/form/SV_1ZywZT6exTiImQC 

2.2 Requirements
You cannot use any library package that is not already specified in this document or required for TypeScript compilation (e.g., type definitions). 

Your implementation must be in TypeScript.

You are not allowed to store the data in any external database; only disk storage is permitted. 

Do not store your datasets as static or global variables; keep them as members of a class. The reason for this is that we wipe added datasets between tests. This wipe won’t work if your data is stored statically, and one test behaving incorrectly may then cause future tests to fail as well, lowering your grade.
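For example, here is a minimal sketch of the difference (class and field names are illustrative):

// Avoid: static state survives across facade instances, so data
// added by one test leaks into the next even after the wipe.
class BadFacade {
    private static datasets: Map<string, unknown> = new Map();
}

// Prefer: instance state; each newly constructed facade starts empty.
class GoodFacade {
    private datasets: Map<string, unknown> = new Map();
}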

2.3 Submitting Your Work
We will grade every commit on the main branch of your repository. Your grade will be the maximum grade you received from all commits on the main branch made before the checkpoint deadline. 
From C1 and onwards, the main branch of your repo will be treated differently from the rest. Only commits merged to main are eligible to be assessed by the full private Client Test Suite used to register a grade. Your project is automatically graded every time you merge a PR to the main branch. 
The only timestamp AutoTest trusts is the timestamp associated with a git push event. This is the timestamp that will be used for determining whether a commit was made before a deadline (this is because commit timestamps can be modified on the client side, but push timestamps are recorded on the AutoTest server itself). Make sure you push before any deadline and know that any pushes after the deadline (even if some commits within that push appear to come from before the deadline) will not be considered.

2.4 AutoTest Feedback
For C1-C3 you cannot preview your grade before the deadline. Instead, to ensure you are confident about your implementation, you should:
Write your own tests and use them to ensure your implementation is accurate.

Review the Smoke Test feedback from AutoTest on the main branch.

Run @310-bot #check to evaluate the accuracy of your test suite. #check will let you know if any of your tests are written incorrectly, as it runs your tests against our implementation.

Main Branch [Smoke Test Feedback]
On the main branch, you will receive information on how your implementation performs against our Smoke Test Suite. You can request the Smoke Tests results by creating a comment with @310-bot #c1 on a commit. 
You can only receive Smoke Test Feedback once every 12 hours per person. If your code does not compile or lint (e.g., yarn tsc && yarn lint), AutoTest will not run the rest of the commit, and this will not use up your once-every-12-hours feedback; you can request again immediately afterwards.
The Smoke Test Suite consists of basic tests and is a subset of the private Client Test Suite. A smoke test is the most basic, simplest test possible, and you most likely already have similar tests in your own test suite.
If a smoke test is failing, it means that your own suite is not strong enough and should be strengthened: it should have caught the same bug, because the tests AutoTest runs are exactly like the ones you can write yourself. (NOTE: this does not suggest that you should randomly write more tests, but rather that you should continuously strengthen your test suite by examining the checkpoint specification).

Feature Branches [Build Feedback]
AutoTest will report any build, lint, or timeout failures on your feature branches. It should automatically report this feedback, but you can always request it without any restrictions with @310-bot #c1. 

#check Feedback
The #check command has been added to provide additional information about your test suite (not your implementation): 
Missing Files. Missing files will cause your build to error. 

Test Feedback. How your test performed against our implementation. This feedback is useful for determining if your tests are accurate (e.g. you are understanding the specification correctly).

Performance Hints. If your test suite takes too long, we will give you hints at ways to improve it, including how many slow addDataset calls and large queries you have, and any unhandled promises.

Ensure that your tests do not rely on your underlying implementation (i.e., do not reference any code in your src directory, as your src directory will be replaced when #check runs).
Like other commands (e.g. #c1), submissions are limited. Each team member can call #check once every 6 hours. For example, if Bob and Alice are on a team, then Bob can run it at 12pm and Alice at 1pm. However, now Bob must wait until 6pm and Alice until 7pm to call it again.

Timeouts can occur for a variety of reasons and they can be tricky to resolve. A timeout can occur because of (1) your test suite or (2) your implementation. 
Test Suite Issues:
AutoTest runs your tests against your implementation, so if your tests are slow enough you will hit a timeout. To determine if the timeout is caused by your test suite, we recommend running the @310-bot #check command. The section above outlines the type of feedback you will get.
Implementation Issues:
Below are some common reasons for timeouts to occur due to your implementation, ordered from most to least frequent:
Not handling promises. If you are unsure what it means to “handle” a promise, check out the Async Cookbook.
Ensure that all created promises resolve. Every path of every promise should eventually resolve or reject. Below is a promise that will time out.

return new Promise((resolve, reject) => {
    if (false) {
        resolve(3);
    }
    // When the condition is false, neither resolve nor reject is ever
    // called, so this promise never settles and awaiting it will time out.
});
Inefficient code. Common culprits are nested loops, redundant loops, and reading from or writing to disk too often. 

You can try to diagnose this by timing how long different parts of your implementation take using console.time(). Unzipping the zip file will take a long time, but that is not the cause of your timeout, since all teams unzip files the same way. 
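For instance, a minimal timing sketch (a fragment from inside an async method; processZip is a hypothetical helper name):

console.time("processZip");
const sections = await this.processZip(content); // time just this step
console.timeEnd("processZip"); // prints something like "processZip: 1523ms"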
Too much logging/printing (e.g., console.log(myGiantArray))

If you are producing more than 5 MB of output to the standard output you will receive a timeout error. It is easy to generate 100s of MB of output; please think carefully about your log statements.
OutOfMemory errors. Be careful about creating or pushing elements into data structures (e.g., Object, Array). Are you creating unnecessarily large/complex objects/arrays? 

Make sure you aren’t importing JSON from anywhere. You should NOT have the following as an import: import JSON from "json5".

After reviewing the common reasons above, we recommend chatting with a TA or creating a Piazza post.

3. Implementation
This checkpoint involves implementing the insightUBC project! Your implementation should follow the insightUBC Section’s Specification outlined in Checkpoint 0.

3.1 Repository Setup

Getting your team repository
You will be provisioned a new repository to use for C1 and onwards. You and your partner will share this common repository. A team creation tool will be available in Classy after the add/drop deadline. Once available, please specify your team through Classy at https://cs310.students.cs.ubc.ca/. We will create project repositories daily for the first week, so you will be provisioned your team repository within 24 hours of creating your team. All work must take place in this repo (including all future checkpoints). 

Git Branches
Now that you are working as a team, you will need to use git branches to coordinate implementation. The main branch will be protected so that you cannot commit or push to it directly; you can only merge changes via pull requests. Each pull request must also be approved by your partner. 
Using version control branches is a great way to make it easier to work with your partner, but it is important that you merge your branches into main periodically. Having more than 3 branches is considered an anti-pattern, and stale branches should be deleted. 

3.2 Changes from C0

From this checkpoint onwards, your project will be subject to a linter to ensure that your code adheres to a standard of maintainability. Quoting ESLint’s official documentation: “ESLint statically analyzes your code to quickly find problems.” i.e., it’s for protecting you from yourself. If your code has common readability, maintainability, or functionality errors, the project will not build. This ensures that common errors are easily caught before runtime by a linter, which will save you hours of debugging later.

Reference Implementation Query Results Ordering Update
We’ve updated our reference implementation to no longer return results in the same order as the Reference UI. What does this mean? 
When running your tests against our implementation using the command @310-bot #check, tests which used to pass might now fail. The order of the results in your test’s expected output might differ from the order our implementation returns, even though both contain the same items. If the assertions in your tests are too strict (i.e., they depend on the order being identical to that of the Reference UI), your tests will fail.
You will want to update your test assertions so they do not rely on order when it is not explicitly set. One solution is to have two instances of folderTest with two different result-assertion implementations: one which asserts order (for queries where ORDER is specified) and one which doesn’t (for queries where it is not), as sketched below.
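This sketch assumes the option is named assertOnResult (check the folder-test documentation for your version) and that runQuery is a hypothetical helper that invokes performQuery:

import {expect} from "chai";
import {folderTest} from "@ubccpsc310/folder-test";

// Hypothetical helper that runs a query through your facade.
declare function runQuery(input: unknown): Promise<unknown[]>;

// Queries that specify ORDER: the exact sequence matters.
folderTest<unknown, unknown[], Error>("ordered queries", runQuery, "./test/resources/queries/ordered", {
    assertOnResult: (actual, expected) => {
        expect(actual).to.deep.equal(expected);
    },
});

// Queries without ORDER: same items, in any sequence.
folderTest<unknown, unknown[], Error>("unordered queries", runQuery, "./test/resources/queries/unordered", {
    assertOnResult: (actual, expected) => {
        expect(actual).to.have.deep.members(expected); // ignores order
        expect(actual).to.have.length(expected.length); // guards against extras
    },
});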

3.3 Implementing insightUBC
There are two main parts to the implementation in C1: the Dataset Processor and the Query Engine. The Dataset Processor roughly corresponds to the addDataset method of InsightFacade, and the Query Engine to performQuery. Let’s first discuss the Dataset Processor.

3.3.1 Dataset Processor: Modeling Sections Data
In order for insightUBC to be able to answer all kinds of questions about datasets, it must first load and process the data from the given zip files. You will take the dataset zips you’ve seen in C0, check that they are indeed valid datasets, and convert them into data model(s) of your choice. There are many good ways to model a section. For example, you could represent each section, with its fields, as a TypeScript class, as sketched below. Try to reason about different representations, keeping in mind that your Query Engine will be working with this representation when answering queries. 
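A minimal sketch (field names follow the section keys from the C0 specification; the exact design is up to you):

// One possible representation of a section, using parameter properties.
export class Section {
    constructor(
        public readonly uuid: string,
        public readonly id: string,
        public readonly title: string,
        public readonly instructor: string,
        public readonly dept: string,
        public readonly year: number,
        public readonly avg: number,
        public readonly pass: number,
        public readonly fail: number,
        public readonly audit: number
    ) {}
}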

3.3.2 Query Engine
insightUBC also needs a Query Engine, so that it can answer questions about the datasets processed by the Dataset Processor. The Query Engine takes a JSON query, parses it, and validates that it is both syntactically and semantically correct. You will also implement the code to find the desired subset of all your datasets that matches a query.

Modeling Queries
As with a Section in a dataset, you will want to give a query a representation within your code. Coming up with a good model might take a couple of tries, but try to reduce a query into smaller parts that are more manageable. 
For example, consider a simple query in its original JSON format. One way you could model a query is as a recursive tree structure (i.e., an AST). One benefit of this representation is that it maps naturally onto the EBNF used to specify the grammar of a query.
At the top level, we have a query. A query consists of two sub-components: a WHERE block and an OPTIONS block. The WHERE block, in turn, consists of a single sub-component, such as an MComparator. Similarly, the OPTIONS block further decomposes into the COLUMNS and ORDER components.

This kind of decomposition is nice because it achieves separation of concerns. You could have a function that only cares about handling of the WHERE block, while another function is responsible for handling the OPTIONS block. The goal here is to make each function easier to reason about, code, and test.
A tip: if you’re going for such a representation of a query, your old friend recursion might come in handy! A sketch of one possible model follows.
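For illustration only (all type and field names here are invented, not mandated by the spec), such a model and its evaluation might look like:

type SectionData = Record<string, string | number>;

type Filter =
    | {kind: "AND" | "OR"; children: Filter[]}                 // LOGICCOMPARISON
    | {kind: "NOT"; child: Filter}                             // NEGATION
    | {kind: "GT" | "LT" | "EQ"; key: string; value: number}   // MCOMPARISON
    | {kind: "IS"; key: string; pattern: string};              // SCOMPARISON

interface QueryAST {
    where: Filter | null; // null when WHERE is empty (matches everything)
    columns: string[];    // OPTIONS.COLUMNS
    order?: string;       // OPTIONS.ORDER, when present
}

// Evaluating a filter against a section is then naturally recursive:
function matches(filter: Filter, section: SectionData): boolean {
    switch (filter.kind) {
        case "AND":
            return filter.children.every((child) => matches(child, section));
        case "OR":
            return filter.children.some((child) => matches(child, section));
        case "NOT":
            return !matches(filter.child, section);
        case "GT":
            return (section[filter.key] as number) > filter.value;
        case "LT":
            return (section[filter.key] as number) < filter.value;
        case "EQ":
            return section[filter.key] === filter.value;
        case "IS":
            // Real wildcard ("*") handling is omitted for brevity.
            return section[filter.key] === filter.pattern;
    }
}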

3.4 Advice

fs-extra Package Usage
It is common to misuse the fs-extra methods when reading and writing files to disk. Unfortunately, misuse can cause timing issues which may appear as tests failing on AutoTest (but not locally), failing only intermittently (e.g., every fifth run), or failing only in future checkpoints when things begin to slow down. These issues are tricky to diagnose but easy to fix and prevent! 
To avoid all of this pain please make sure to read the documentation carefully: fs-extra documentation. 
Below is an example of how not to use the package:
import * as fs from "fs-extra";

function createFile() {
    // Anti-pattern: writeJson returns a promise that is never awaited or
    // returned, so createFile "finishes" before the file is on disk.
    fs.writeJson("./package.json", {name: "fs-extra"});
}
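For contrast, a corrected sketch: await the promise (or return it) so callers know when the write has actually finished.

import * as fs from "fs-extra";

async function createFile(): Promise<void> {
    await fs.writeJson("./package.json", {name: "fs-extra"});
}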

A testing anti-pattern is to only have integration tests (i.e., tests that directly evaluate addDataset, removeDataset, listDatasets, and performQuery). A much more robust testing strategy, and one that makes it easier to implement new features and isolate failures, is to also write unit tests against the individual methods in your implementation.
To implement the API you will likely have to create your own additional methods and classes.
The best way to test your system is via your own unit test suite. This will be the quickest and easiest way to ensure your system is behaving correctly and to make sure regressions are not introduced as you proceed further in the project. A hypothetical example of such a unit test is shown below.
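Here, QueryValidator, its path, and isValid are assumed names; substitute the internal classes your own design actually has.

import {expect} from "chai";
import {QueryValidator} from "../src/controller/QueryValidator";

describe("QueryValidator (unit)", function () {
    it("rejects a query that is missing its WHERE block", function () {
        const validator = new QueryValidator();
        const invalidQuery = {OPTIONS: {COLUMNS: ["sections_avg"]}};
        expect(validator.isValid(invalidQuery)).to.be.false;
    });
});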

Dividing Work
Writing code may not be the hardest part about C1! It’s important to communicate well with your partner, make sure both of you have read the specification, and plan what responsibilities each person will undertake.
One way to “split” C1 is into the Dataset Processor (addDataset, removeDataset, listDatasets) and the Query Engine (performQuery). An issue you may experience while doing so is that the Query Engine depends on datasets loaded by the Dataset Processor to produce query results. If you find yourself waiting for your partner to implement addDataset, you should first work on validating the query structure, as this task does not depend on any datasets being loaded. Your team can also discuss ahead of time what the Sections data will look like (after being processed by the Dataset Processor), so you can test performQuery against some mocked Sections data, as sketched below. 
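In this sketch, the section shape and the QueryEngine name are illustrative:

// Agree on the processed section shape early, then hand-build a few fake
// sections so performQuery logic can be exercised before addDataset works.
interface MockSection {
    uuid: string;
    id: string;
    dept: string;
    avg: number;
}

const mockSections: MockSection[] = [
    {uuid: "1", id: "310", dept: "cpsc", avg: 78.5},
    {uuid: "2", id: "110", dept: "cpsc", avg: 71.2},
];

// e.g., const engine = new QueryEngine(mockSections);
//       expect(await engine.execute(query)).to.deep.equal(expectedRows);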

We also recommend pair programming, especially when implementing complex algorithms or while debugging. It is a great way to ensure that both of you have a shared understanding of the entire program, and to catch mistakes that would otherwise be missed.

4. Resources

4.1 Getting started
This specification might seem intimidating, but keep in mind that this project has the same interaction mechanism as most software systems:
It consumes input data (the zip file).

It transforms the data (according to the query).

It returns a result.

There is no best way to get started, but you can consider each of these in turn. Here are some possible options, which could be pursued in any order (or skipped entirely):
Start by looking at the data file we have provided and understand what kind of data you will be analyzing and manipulating. This will help you think through the types of data structures you may want to create (this is a precursor to step 1 above).

Look at the sample queries in the specification. From these queries, figure out how you would want the data arranged so that you can answer these queries (this is the precursor to step 2 above).

Ignoring the provided data, create some fake data (maybe for one section of one course). Write the portion of the system that queries this fake data (this is step 2 above).

Like the above, using some fake data and a fake query processor, write code that would return the fake data correctly and with the correct error codes (this is step 3 above).

Trying to keep all of the requirements in mind at once is going to be overwhelming. Tackling a single task that you can accomplish in an hour is going to be much more effective than worrying about the whole specification at once. Iteratively growing your project from one small task towards the next small task is going to be the best way to make forward progress.

4.2 Recommended Videos & Tutorials
The following resources have been created by course staff to assist you with the project.

TypeScript: an introduction to TypeScript.
Promises: an introduction to promises and their asynchronous properties.

Git Cookbook: learn the basics of Git.
Async Cookbook: learn about promises and the differences between synchronous and asynchronous code.

Example repository
Addition Calculator: a basic project that uses TypeScript, asynchronous code (Promises), folder-test, Mocha, Chai, Node/Yarn, and chai-as-promised. It provides an example of how to develop and test an asynchronous method. 

folder-test
folder-test Documentation
Addition Calculator: provides an example of how to use folder-test for an asynchronous method.

4.3 Common Issues

“Implicitly has ‘any’ type”
Take a look over here. When you use the Object type in TypeScript, the compiler doesn’t like not knowing what types to expect from the key/value pairs. The best solution is to use an interface (first solution in the link), but if you need a quick one-time solution you can also do a cast (second solution in the link). Keep in mind that ‘one-time fixes’ often end up being more than one time; doing it right the first time can save you a lot of hassle in the long run. Do not use the third solution (editing tsconfig).
You can also create quick inline interfaces for objects:
let stringToNumMap: {[id: string]: number} = {}; // an inline index signature
let maybeNum: any = 3;
let num: number = 1;
let sum = num + (maybeNum as number); // note: this "cast" exists only at compile time; it does not convert the value
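If the object’s shape is known ahead of time, a named interface (the first solution mentioned above) is cleaner; a small sketch with illustrative names:

// Declare the expected shape once; the compiler then knows the key/value
// types everywhere the parsed object is used.
interface CourseFile {
    result: Array<{[key: string]: string | number}>;
}

const rawJson = '{"result": [{"Avg": 71.1, "Subject": "cpsc"}]}';
const parsed = JSON.parse(rawJson) as CourseFile;
console.log(parsed.result[0].Avg); // 71.1, with known types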

IntelliJ/Webstorm Test Timeouts
The command we provide for you to execute your test suite, yarn test, has a timeout parameter that is set to 10 seconds. You can find where the command is defined in your package.json:
"test": "mocha --require ts-node/register --timeout 10000 --extension .spec.ts --recursive test",
When you execute your tests within IntelliJ by pressing the green “Run” arrow, it uses a different configuration. If you want to keep the increased timeout, you’ll need to manually update your Mocha configuration.
Open the Run/Debug Configuration Dialog as shown here. 

Click “Edit Configuration templates…” which appears at the bottom left of the dialog. 

Click on the Mocha Template.

Add --require ts-node/register --timeout 10000 to the “Extra Mocha options” field and click “Apply”.

Delete all old Mocha configurations. Now, when you create a new configuration, it should use this increased timeout.

IntelliJ/Webstorm Not Compiling Files
Make sure TypeScript is enabled by going to Preferences > Languages & Frameworks > TypeScript and checking that the ‘Enable TypeScript compiler’ checkbox is checked. If it is not, check it and you should be good to go.
Also, if you have set up Mocha in WebStorm, you can enable the Compile TypeScript option as a ‘Before launch’ step to make sure the project has always been compiled when you run your tests. If it’s still not compiling, you can always run yarn build manually, and check with a TA during lab or in office hours.