UCL CS 0019 Brad Karp
Individual Unassessed Coursework 2: Debugging Memory Allocator
Due date: 4 PM, 1st February 2024
Value: Unassessed (mark given but not part of module mark)
Introduction
C programmers (that would be us) allocate and free memory explicitly. This means we can write fast code for modern machines, because we have full control over memory. The bad news is that it’s all too easy to write programs that crash due to memory problems. But wait: as systems programmers, we can build tools to help us debug memory allocation problems. For instance, in this coursework, you will transform a simple memory allocator (e.g., implementation of malloc and friends) into a debugging memory allocator.
1. Transform the malloc library we give you into a debugging malloc library that:
• Tracks memory usage;
• Catches common programming errors (e.g., use after free, double free);
• Detects writing off the end of dynamically allocated memory (e.g., writing 65 bytes into a 64-byte piece of memory);
• Catches less common, somewhat devious, programming errors, as described in the remainder of this handout.
2. Augment your debugging malloc library with heavy hitter reporting, which tells a program- mer where most of the dynamically allocated memory is allocated.
While the above tasks may at first sound imposing, they are achievable in not all that much code. The remainder of this handout provides guidance in how to achieve them (as do the tests we provide for your implementation). Read this handout in its entirety carefully before you begin!
C memory allocation uses two basic functions, malloc and free. void *malloc(size t size)
Allocate size bytes of memory and return a pointer to it. This memory is not initialized (it can contain anything). Returns NULL if the allocation failed (because size was too big, or memory is exhausted, or for whatever other reason).
It is important to get started early—CW2 is not trivial! You will need the two weeks allotted to complete it.
Programming Help
void *free(void *ptr)
Free a single block of memory previously allocated by malloc. The rules of malloc and free are simple: Allocate once, then free once.
• Dynamically allocated memory remains active until explicitly freed with a call to free.
• A successful call to malloc(sz) returns a pointer (ptr) to “new” dynamically allocated memory. This means that the sz bytes of data starting at address ptr are guaranteed not to overlap with the program’s code, its global variables, its stack variables, or with any other active dynamically allocated memory.
• The pointer argument in free(ptr) must either equal NULL or be a pointer to active dynamically allocated memory. In particular:
– It is not OK to call free(ptr) if ptr points to the program’s code, or into its global variables, or into the stack.
– It is not OK to call free(ptr) unless ptr was returned by a previous call to malloc.
– It is not OK to call free(ptr) if ptr is currently inactive (i.e., free(ptr) was previously called with the same pointer argument, and the ptr memory block was not reused by another malloc()).
These errors are called invalid frees. The third error is also called a double free. Some notes on boundary cases:
• malloc(0) may return either NULL or a non-NULL pointer. If ptr = malloc(0) is not NULL, then ptr does not overlap with any other allocation and can be passed to free().
• free(NULL) is allowed. It does nothing.
• malloc(sz) returns memory whose alignment works for any object. (We’ll discuss align- ment in class; for a preview, see CS:APP/3e §3.9.3.) On x86-64 machines, this means that the address value returned by malloc() must be evenly divisible by 16. You should do this, too.
Two secondary memory allocation functions are also commonly used in C: calloc and realloc. The calloc function allocates memory and “clears” it so that all bytes in the allo- cated region contain zeroes. The realloc function can allocate, free, or resize memory depend- ing on its arguments. These functions work like this:
void *calloc(size_t nmemb, size_t sz) {
void *ptr = malloc(sz * nmemb);
if (ptr != NULL)
memset(ptr, 0, sz * nmemb); // set memory contents to 0
return ptr;
void *realloc(void *ptr, size_t sz) {
void *new_ptr = NULL;
if (sz != 0)
new_ptr = malloc(sz);
if (ptr != NULL && new_ptr != NULL) {
size_t old_sz = size of memory block allocated at ptr;
if (old_sz < sz)
memcpy(new_ptr, ptr, old_sz);
memcpy(new_ptr, ptr, sz);
free(ptr);
return new_ptr;
(N.B.: There’s actually a bug in that implementation of calloc! One of our tests would find
You will work on our replacements for these functions, which are called cs0019 malloc,
cs0019 free, cs0019 calloc, and cs0019 realloc. Our versions of these functions sim- ply call basic versions, base malloc and base free. Note that the cs0019 functions take extra arguments that the system versions don’t, namely a filename and a line number. Our header file, cs0019.h, uses macros so that calls in the test programs supply these arguments automati- cally. You’ll use filenames and line numbers to track where memory was allocated and to report where errors occur.
In addition to the debugging allocator, you must design and implement another useful tool, heavy hitter reports. You will design your solution, implement it, and test it.
Requirements
Your debugging allocator must support the following functionality. The code we hand out con- tains tests for all this functionality (though we may run further tests when grading). From easier to harder:
1. Overall statistics—how many allocations/frees, how many bytes have been allocated/freed, etc.
2. Secondary allocation functions (calloc and realloc) and integer overflow protection.
3. Invalid free detection.
4. Writing past the beginning/end of an allocation.
5. Reporting memory that has been allocated, but not freed.
6. Advanced reports and checking.
7. Heavy hitter reporting.
Further details on what you must implement for each of the above functionalities are provided below.
Finally, your debugging allocator also must perform acceptably—i.e., it must not inordinately slow the execution of programs that use it. For this coursework, we define “acceptable” to mean that the tests we provide (which invoke your debugging malloc) must each run to completion within 5 seconds. These test programs themselves take just a fraction of a second to run on their own (not counting time spent in your malloc implementation).
CS Help, Email: tutorcs@163.com
Getting Started
Before you follow the instructions below to retrieve the code for CW2, you MUST first complete the 0019 grading server registration process. You only need to complete that pro- cess once for the whole term, so if you did so before CW1, you can proceed straight to retrieving the code for CW2.
If you have not yet done so, STOP NOW, find the email you received with the subject line “your 0019 grading server token,” retrieve the instructions document at the link in that email, follow those instructions, and only thereafter proceed with the instructions below for retrieving the code for CW2.
All programming for this coursework must be done under Linux, with code compiled for an x86-64 CPU. Grades are determined using automated tests that the 0019 staff run on a grading server with an x86-64 CPU.1 Even small changes in the software environment (everything from OS version to compiler and library versions) and changes in CPU architecture (e.g., running on ARM vs. on x86-64) can cause the same source code to produce different results. As a consequence it is absolutely crucial that you do your work for the 0019 courseworks using the same exact software environment and CPU architecture that the grading server uses. Otherwise, there is a risk that when you test your code, you will see different behavior than the grading server does. We provide an “official,” supported development environment for the 0019 courseworks that you must use when building and running your 0019 CW code. This environment matches the one used on the 0019 grading server, and thus helps ensure that the behavior you see when you run tests yourself matches the behavior your code exhibits when run on the grading server.
We provide two main supported ways for you to work on the 0019 CW:
• by logging in remotely over ssh to 0019 Linux x86-64 machines the 0019 instructors provide that run the software for the supported development environment
• by running the supported development environment locally using Docker.
We explain both these methods below. You only need to use one! The Docker method is somewhat more effort to set up, but gives you a complete, shell-based Linux development environment for 0019 on your own machine, regardless of which OS (your “native OS”) you have installed on your machine to begin with. Our advice is that if you find you have difficulty getting Docker to work, rather than waste more valuable time on configuring your environment, you instead use the ssh remote development method, which requires less setup.
Some students’ personal machines have x86-64 CPUs, and some students’ personal machines have ARM CPUs (Apple M1, M2, or M3 CPUs). While the supported 0019 development envi- ronment produces code for x86-64 CPUs, both ways of developing (ssh and Docker) will work
1While you receive a grade on CW2 as feedback, CW2 is unassessed—it does not contribute to your mark in the module. CW3-CW5 will all be assessed, however, and are graded in the same way as CW2: using automated tests run on a grading server with an x86-64 CPU.
2There is one exception: if a submission produces the expected output strings for any part of the CW2 test suite, but the submitted code produces those output strings by means other than a good-faith attempt to implement the functionality required in this CW2 handout, that submission will receive zero marks, regardless of test results.
Results from the tests on the 0019 grading server are final; even if your code behaves differently in some other software environment or on some other CPU than in the supported 0019 development environment, your grade will be the result on the 0019 grading server.2
for you, even if your own machine has an ARM CPU. ssh simply has you develop on a server with an x86-64 CPU, whereas the Docker environment we provide accurately emulates an x86- 64 CPU if you have an ARM machine (as we explain in the information on Docker we provide below).
If while using the supported 0019 development environment (either via ssh or under the 0019 Docker setup), you consistently (i.e., for many runs) get different results than you do when your code is tested by the grading server (which we describe below), please contact the course staff via an Ed private message. We are happy to answer student questions about difficulties encountered when doing the coursework in the supported 0019 development environment, but we cannot support any other Linux installation.
Option 1: Using the 0019 x86-64 Linux Lab Machines over ssh
As we previously described in the handout for CW1, for the full duration of 0019, we provide a set of ten Linux machines with x86-64 CPUs that students can log into remotely via ssh. Regardless of the OS or CPU on your own personal machine, you can complete CW2 (and every CW in 0019) by logging into these 0019 Linux machines by ssh and building, testing, and debugging your code there. Please refer to the 0019 Courseworks web page for instructions on how to access the 0019 Linux machines by ssh:
http://www.cs.ucl.ac.uk/staff/B.Karp/0019/s2024/cw.html
Our intent in 0019 is that students build and run their code using the Linux shell command line, so that they build proficiency using the shell, which is an extremely powerful develop- ment environment favored by most of the most skilled system designers, and is convenient to use over ssh. That said, VS Code supports editing code locally in VS Code’s GUI on your own machine when developing over an ssh connection to a server. In this mode of use, VS Code applies edits to source code files stored on the server and can build and run your code on the server, all through VS Code’s GUI. You can find documentation for how to set up VS Code this way at https://code.visualstudio.com/docs/remote/ssh. Take particu- lar care that because UCL CS’s network requires you to ssh first to knuckles.cs.ucl.ac.uk and from there to one of the 0019 Linux machines, you will need to configure your machine to use ProxyCommand for VS Code to be able to “jump” through knuckles to reach an 0019 Linux machine, as explained in https://code.visualstudio.com/blogs/2019/ 10/03/remote-ssh-tips-and-tricks#_proxycommand.
Option 2: Setting Up and Developing Locally with Docker
The other option (again, you only need one!) is to install the supported 0019 development environment locally on your own personal machine. We use Docker to provide the supported local Linux development environment for the x86-64 CPU. Perhaps counterintuitively, Docker can provide this environment even on machines with ARM CPUs—it can compile C into x86-64 assembly and x86-64 machine code, and even run x86-64 machine code on ARM CPUs.
Docker performs well: Linux starts instantly in a Docker container, and a Linux Docker container tends to consume less CPU, disk storage, and RAM than a full-blown virtual machine image does. Using Docker will also give you the interesting experience of having a Linux shell- based development environment on your own machine.
All of this said, Docker takes a little effort to set up. If you are uncomfortable with the steps that follow, or if you encounter difficulty following them, we advise simply using ssh to complete the 0019 courseworks, as described in the previous section of this handout.
To use the supported 0019 development environment on your own machine under Docker, you must first download and install the Docker Desktop software, which is free for educational
use. You can find links to installer packages for the latest versions of Docker Desktop for Win- dows, Mac OS X, and Linux online at:
https://www.docker.com/get-started/
Download Docker Desktop on that page and install it. N.B. that there are different packages for Intel Macs and ARM Macs! Note further that for the entire rest of these instructions, you must have Docker Desktop not just installed, but running.
Windows users only:
To install Docker on Windows, you must first install Windows Subsystem for Linux version 2 (WSL2). If you already have WSL version 1, you need to update to WSL version 2. Microsoft explains how to check which version you are running and how to update at:
https://learn.microsoft.com/en-us/windows/wsl/
basic-commands
Next you need to retrieve configuration files for the 0019 Docker Linux development envi- ronment. To do so, use git to clone the repository with the following command:3
git clone https://github.com/UCLCS0019/cs0019-docker
Change directory into your local repo copy with the shell command cd cs0019-docker. Next you must customize the Docker image build instructions with your name and email address. To do so, use your favorite text editor to edit the file Dockerfile.x86-64 if you have an Intel CPU or the file Dockerfile.arm if you have an ARM CPU, as follows:
Find the line that begins ARG USER. Replace the user name 0019\ User with your (human) name, taking care to precede any space with a single backslash, as done in the example in the file (e.g., Jeremy\ Bentham).
FindthelinethatbeginsARG with your UCL email address.4
If your machine has an Intel x86-64 CPU, regardless of your native OS, type the command (note the period at the end, which is part of the command):
docker build -f Dockerfile.x86-64 -t cs0019:x86 .
If you have a Mac with an ARM CPU (Apple M1, M2, or M3), instead run the following command:
docker build -f Dockerfile.arm -t cs0019:arm .
The above command instructs Docker to build a Docker-compatible Linux image for your particular CPU. It will take up to 10 or so minutes to complete. Once the above command
3Throughout this handout, if you have set up your GitHub account to use ssh authentication, as the 0019 instructors recommend in Ed posts #5 and #14, then you can replace the https://github.com in all repo URLs throughout this handout with so that your git commmands authenticate to GitHub over ssh.
4If your name and/or UCL email address contain an apostrophe (you know who you are ;-)), you need to precede any apostrophe with a backslash!
completes you are ready to run the supported 0019 Linux development environment on your machine. Docker calls the running instantiation of an image a container.
One convenient feature of Docker is that you can give a Docker container (in your case, a running instance of the supported 0019 Linux development environment) access to a directory on disk in your machine’s native OS. You will store your CW2 GitHub repository (which contains your code for 0019 CW2) on disk in your machine’s native OS. You can then use any editor of your choice in your native OS to edit your code, and the Linux container running inside Docker can then read and write your repo directory’s contents, so that you can build and run your code in the Linux container. To configure this, you’ll need to choose a directory in your native OS’s file system that you’d like to share with your 0019 Linux Docker container. We recommend you create a directory for 0019 CWs in your home directory in your native OS; for example, if you’re in your home directory, you might type mkdir cs0019-cws in a Terminal window under Linux or Mac OS to create a directory named cs0019-cws in which you can store all your git repositories for the 0019 courseworks. Once you’ve chosen a name for this directory, make a note of the full pathname to it; you’ll need to supply it as part of an argument to Docker when you start your 0019 Linux Docker container. Let’s call the full pathname to this directory MY0019DIR.
You’re now ready to start a Docker container for the 0019 Linux image you built earlier. We provide a script that makes starting the 0019 Linux image under Docker less verbose. While in the same directory (cs0019-docker), type the shell command:
./cs0019-docker-linux -d MY0019DIR
(where, again, MY0019DIR is the full pathname to the directory in your native OS where you will keep your 0019 coursework repositories)
Windows users only:
You can’t run the above command directly in PowerShell. First run the bash Linux shell by typing bash at the PowerShell prompt. Then enter the above command at the bash command prompt. Also, when you specify MY0019DIR on the command line above, if there is a Windows drive letter in your pathname, do not use a colon in the pathname after the drive letter, and instead use backlashes to delimit the drive letter. In other words, don’t use a MY0019DIR argument that looks like C:\Mydata\...; instead use a MY0019DIR argument that looks like \C\Mydata\....
The above script instructs Docker to create and run a container from the image you built previously. If you look at the script’s contents, you will see that on the docker run line, the most significant command line options include telling Docker to connect the interactive shell in that container to your terminal so that you can use a Linux shell running in the container (-it); to delete the container when you exit the shell (--rm); and to make MY0019DIR in your native OS’s file system accessible within the Linux Docker container, where it will appear under /home/user/cs0019-cws (-v MY0019DIR:/home/user/cs0019-cws).
In your terminal window, you will immediately be given a shell prompt for a Linux shell running in your supported 0019 Linux development environment. You can now use this shell to do all the usual development commands you need (e.g., git to clone your repo, commit changes, and push them to GitHub, as we explain below; make to build your code; and commands to run the tests we provide with the various 0019 courseworks). When you’re done using the 0019 Linux development environment, just type exit at the shell prompt to exit the Docker container. You don’t need to worry about the container’s deletion when you exit it because all changes to your code (edits to source code, compilation results, etc.) are stored in your native OS’s file system. Whenever you want to use the 0019 development environment in a Docker container again, just run the cs0019-docker-linux script again.
ARM users only:
Your 0019 Linux development environment in a Docker container has been configured so that it cross-compiles C code into x86-64 assembly code. And your Docker container will automatically invoke an x86-64 CPU emulator to run x86-64 machine code. So when you build CW2 (and subsequent CWs), you will in fact be running x86-64 machine code on an emulated x86-64 CPU.
While the above all works seamlessly, gdb does not quite work seamlessly in this cross- development (ARM to x86-64) environment. If you want to run gdb on an executable that you build as part of CW2, you will need to do so using a gdb script we provide in the CW2 code we hand out. For example, if you want to run gdb on the executable file named test001, you may do so by typing the following commands:
• Run gdb at the shell prompt.
• At the (gdb) prompt, run the command source gdb.malloc. • At the (gdb) prompt, run the command run-malloc test001.
Hereafter, you may run whatever gdb commands you wish in the usual way (setting break- points, stepping through execution one line of C source a time, etc.). Substitute the name of any CW2 executable you want to debug for test001 in the above.
Keep in mind that if you find using gdb in this fashion inconvenient, you can always ssh to the 0019 x86-64 Linux machines we provide at UCL CS, clone your git repository there, and run gdb there on a native x86-64 box.
Continue below to obtain a copy of the CW2 code in your 0019 Linux development environ- ment.
Managing Your Code with git
For Courseworks 2 and later in CS 0019, you will manage the revisions of your code, including submitting it to the instructors for testing and grading, using the git source code control system and GitHub. git is useful for a great many things, from keeping the revision history of your code to making it easy to share your code on different machines (if you wind up wanting to use the Docker environment on your own box and also develop on the ssh-accessible 0019 Linux machines, for example, you can keep your multiple working copies in sync via your main repository on GitHub). If you’ve not used git before, you can find a wealth of documentation online; we offer only a bare-bones introduction below.
git manages a set of source code files you are working on in a repository. You keep a local copy of the repository on a machine where you are editing your code and testing it, and use git to keep your local copy synchronized with a “master” copy of the repository on a server. In CS 0019, you will use GitHub to host the master copy of your repository. As you do your work (adding code, fixing bugs, etc.) it is good practice to update the master copy of your repository on GitHub with the changes you have made locally. There are two steps to doing so: first, you commit the changes to indicate that they are ready for shipping to the master repository, and second, you push your committed changes to the master repository.
To start the coursework, though, you must first retrieve a copy of the files we provide for you to start from using the instructions below:
• First, set up your GitHub repository for your CW2 code by visiting the following GitHub URL:
https://classroom.github.com/a/mhVSWtW7
• If you’re not already logged into GitHub, you may be prompted for your GitHub username and password. Once you’ve logged into GitHub, you will see a page with a button to ac- cept GitHub Classroom assignment cs0019-2024-cw2-malloc-UNAME, where UNAME is your GitHub username. The full string formatted as above is the name of your repository, which we refer to below as “your repository name.” Press the Accept button.
• GitHub will ask you to reload the page to see when it has finished creating your repository. Once this is complete, click on the link to the repository name to see the contents of your new repository.
• In a shell on your chosen 0019 supported development environment (whether while logged in remotely by ssh to a 0019 x86-64 Linux machine, or in a local shell in your Docker 0019 Linux container), issue the command:
git clone https://github.com/UCLCS0019/YOUR-REPO-NAME (where YOUR-REPO-NAME is your repository name, as defined above)
You will be prompted for a username and password. Use your GitHub username. Do not use your GitHub account password, though; GitHub now requires that users use personal access tokens as passwords when retrieving repositories over HTTPS. Please refer to the full instructions on how to create a personal access token to learn how to create a personal access token.
Optional (Mac OS users using Docker only):
If you are using Docker on your Mac, and you have configured your GitHub account to use ssh key pair authentication, you may also retrieve the CW2 repo over ssh. (See the bottom of Ed post #5 for a link to instructions on how to set up key pair authentication for GitHub.) If you use the ssh authentication agent in Mac OS to store your ssh key, our script for running your 0019 Docker container will forward access to your ssh authenti- cation agent to Docker, so that commands you run inside the Docker 0019 Linux container have access to your ssh key.
Docker unfortunately does not support forwarding the Windows ssh authentication agent into a Docker container. A workaround may be to copy your ssh private key from your .ssh directory into your MY0019DIR, and then to start ssh-agent inside Docker, and run ssh-add on the copy of your ssh private key available inside your MY0019DIR.
• In the same shell in your chosen development environment, change directory to the local copy of your repository with the command:
cd YOUR-REPO-NAME
You will now have a local working copy of the CW2 repository in the environment where you will develop the code for your solution, and you are located in the working directory where the CW2 source code is.
All code you write for CW2 must go in the file cs0019.c. You will receive an initial version of this file (which you must extend to complete CW2) in your repository when you create it using the steps above.
程序代写 CS代考 加QQ: 749389476
As you write your code and improve it (e.g., by fixing bugs, adding functionality, etc.), you should get in the habit of syncing your changes to the master copy of your CW2 repository on GitHub. Doing so keeps the history of changes to your code, and so allows you to revert to an older version if you find that a change causes a regression. It also serves to back up your code on GitHub’s servers, so you won’t lose work if your local working copy is corrupted or lost. To bring GitHub up to date with changes to your local working copy, you must first use the git commit -a command (which will prompt you for a log message describing the reason for your commit, e.g., “fixed segfault on double free test”), and then the git push command to copy your changes to GitHub.
Debugging Allocator: Details
Implement the following function:
void cs0019 getstatistics(struct cs0019 statistics *stats)
Fill in the cs0019 statistics structure with overall statistics about memory allocations so far.
The cs0019 statistics structure is defined as follows:
struct cs0019_statistics {
unsigned long long nactive; // number of active allocations [#malloc - #free]
unsigned long long active_size; // number of bytes in active allocations
unsigned long long ntotal;
unsigned long long total_size;
unsigned long long nfail;
unsigned long long fail_size;
char* heap_min;
char* heap_max;
// number of allocations, total
// number of bytes in allocations, total
// number of failed allocation attempts
// number of bytes in failed allocation attempts
// smallest address in any region ever allocated
// largest address in any region ever allocated
Most of these statistics are easy to track, and you should tackle them first. You can pass tests 1–10 without per-allocation metadata. The hard one is active size: to track it, your free(ptr) implementation must find the number of bytes allocated for ptr.
The easiest, and probably best, way to do this is for your malloc code to allocate more space than the user requested. The first part of that space is used to store metadata about the allocation, including the allocated size. This metadata will not be a struct cs0019 statistics; it’ll be a structure you define yourself. Your malloc will initialize this metadata, and then return a pointer to the payload, which is the portion of the allocation following the metadata. Your free code will take the payload pointer as input, and then use address arithmetic to calculate the pointer to the corresponding metadata. This is possible because the metadata has fixed size. From that metadata it can read the size of the allocation. See CS:APP/3e Figure 9.35 “Format of a simple heap block” for an example of this type of layout.
If you don’t like this idea, you could create a list or hash table size for pointer that mapped pointer values to sizes. Your malloc code would add an entry to this data structure. Your free code would check this table and then remove the entry.
Other aspects of CW2 will require you to add more information to the metadata.
Run make check to test your work. Test programs test001.c through test012.c test your overall statistics functionality.