CIS 2400 Project #2: C – LC4 Disassembler
Part 1 (30%) Due Date: Friday 4/21 @11:59pm via gradescope
Part 2 (70%) Due Date: Wednesday 4/26 @11:59pm via gradescope
Which Lectures Should I Watch to Complete this Assignment?: Modules 13-14
YOU WILL USE DYNAMIC MEMORY FOR THIS ASSIGNMENT (LINKED LISTS + HASHTABLES)
READING: Chapter 19 discusses the heap. If you have purchased a C-programming book, then you’ll want to look to its chapter on dynamic memory.
Video Resources for this Assignment:
This assignment will be programmed in C and run on codio. The following resources can help:
• TUTORIAL-DEBUGGING in C: If you are getting segfaults during this HW assignment and are having trouble with your program, watch this video on canvas to learn how to use the GDB debugger: Files->Resources->Tutorials->Tutorial_Debugging_GDB.mp4
• TUTORIAL-MAKEFILES: If you are still struggling to understand Makefiles even after the last assignment as well as recitation, try this video: Files->Resources->Tutorials-> Tutorial_Makefiles.mp4
• TUTORIAL-DEBUGGING in C: Another wonderful tutorial on GDB. It shows how to use GDB to debug ‘infinite loops’ and segfault crashes using LAYOUT. This layout tool shows your code running and shows you where it crashes! https://www.youtube.com/watch?v=bWH-nL7v5F4
• TUTORIAL-VALGRIND: Learning how to use VALGRIND is required for this HW. A video overviewing it is located on canvas under: Files->Resources->Tutorials-> Tutorial_Debugging_Valgrind.mp4
Setting up Codio for this HW:
1) Login to codio.com using the login you created in the previous HW and find the assignment:
From the “Course” Dropdown box in the middle of the screen, select:
CIS 2400 – S23 – ONCAMPUS
From the “Module” Dropdown box in the middle of the screen, select:
C-Assignments
A list of assignments should appear; select:
C Programming HW – LC4 Disassembler
Note: There will be no late period for Part 1 of the assignment and no extensions given (for any reason) for Part 1 of the assignment, so plan accordingly.

OVERVIEW: The goal of this HW is for you to write a program that can open and read in a .OBJ file created by PennSim, parse it, and load it into a hash table of linked lists that will represent the 4 basic segments of LC4’s memory: User Program Memory, User Data Memory, OS Program Memory, and OS Data Memory. In the last HW you became very familiar with parsing an LC4 .OBJ file and then ‘simulating’ the LC4 machine. In this HW you will parse an LC4 .OBJ file and then convert it back to the assembly it came from! This is known as reverse assembling (sometimes referred to as disassembling).
RECALL: OBJECT FILE FORMAT
The following is the format for the binary .OBJ files created by PennSim from your .ASM files. It represents the contents of memory (both program and data) for your assembled LC-4 Assembly programs. In a .OBJ file, there are 3 basic sections indicated by 3 header “types” = CODE, DATA, SYMBOL. You will not see a “FILE” or “LINE NUMBER” header in the files used to test your code, so your code does not have to process those headers. You will see the following:
● Code: 3-word header (xCADE, <address>, <n>), n-word body comprising the instructions. This corresponds to the .CODE directive in assembly.
● Data: 3-word header (xDADA, <address>, <n>), n-word body comprising the initial data values. This corresponds to the .DATA directive in assembly.
● Symbol: 3-word header (xC3B7, <address>, <n>), n-character body comprising the symbol string. Note that each character in the file is 1 byte, not 2. There is no null terminator. Each symbol is its own section. These are generated when you create labels (such as “END”) in assembly.
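As a minimal sketch of the header layout above, the code below shows one way to assemble 16-bit words from the file's bytes and to recognize the three header types. It assumes each word is stored high byte first, which is what the sample dumps later in this handout (for example CA DE 00 00 00 0C) suggest. The names word_from_bytes, read_word, and header_kind are hypothetical and are not part of the skeleton code.

```c
#include <stdio.h>

/* Header type words from the object file format described above. */
#define HDR_CODE   0xCADE
#define HDR_DATA   0xDADA
#define HDR_SYMBOL 0xC3B7

/* Hypothetical helper: combine two bytes (high byte first) into a word. */
unsigned short word_from_bytes(unsigned char hi, unsigned char lo)
{
    return (unsigned short)((hi << 8) | lo);
}

/* Hypothetical helper: read one 16-bit word from an open file.
   Returns 0 on success, -1 on end of file or read error. */
int read_word(FILE *f, unsigned short *out)
{
    int hi = fgetc(f);
    int lo = fgetc(f);
    if (hi == EOF || lo == EOF) return -1;
    *out = word_from_bytes((unsigned char)hi, (unsigned char)lo);
    return 0;
}

/* Hypothetical helper: name the section that a header word introduces. */
const char *header_kind(unsigned short type)
{
    switch (type) {
    case HDR_CODE:   return "CODE";
    case HDR_DATA:   return "DATA";
    case HDR_SYMBOL: return "SYMBOL";
    default:         return "UNKNOWN";
    }
}
```

Calling read_word() three times in a row would yield the header type, the address, and the body length of one section, in that order.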
LINKED LIST NODE STRUCTURE / HASH TABLE BUCKETS:
In the file: lc4_memory.h, you’ll see the following structure defined:
struct row_of_memory {
    short unsigned int address ;
    char * label ;
    short unsigned int contents ;
    char * assembly ;
    struct row_of_memory *next ;
} ;
The structure is meant to model a row of the LC4’s memory: a 16-bit address, and its 16-bit contents. As you know, an address may also have a label associated with it. You will also recall that PennSim always shows the contents of memory in its “assembly” form. So PennSim reverse-assembles the contents and displays the assembly instruction itself (instead of the binary contents).
As part of this assignment, you’ll create a single 4-bucket hashtable that holds the head pointers of 4 linked lists of the type: row_of_memory (shown above). The buckets are meant to correspond to the address ranges of the 4 basic segments of LC4 memory: User Program Memory (x0000-x1FFF), User Data Memory (x2000-x7FFF), OS Program Memory (x8000-x9FFF), and finally OS Data Memory (xA000-xFFFF). As you read in a .OBJ file, you will store each instruction/data memory value into the appropriate bucket of the hashtable, depending upon the instruction/data memory value’s address.
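The address ranges above can be sketched as a small mapping function. The name bucket_for_address is hypothetical; the bucket numbering (0 for User Program Memory through 3 for OS Data Memory) follows the order used later in this handout. This is only an illustration of the range checks, not the required hashing function signature.

```c
/* Hypothetical helper: map a 16-bit LC4 address to a bucket index 0..3
   using the segment ranges listed above. */
int bucket_for_address(unsigned short address)
{
    if (address <= 0x1FFF) return 0;  /* User Program Memory: x0000-x1FFF */
    if (address <= 0x7FFF) return 1;  /* User Data Memory:    x2000-x7FFF */
    if (address <= 0x9FFF) return 2;  /* OS Program Memory:   x8000-x9FFF */
    return 3;                         /* OS Data Memory:      xA000-xFFFF */
}
```

Because the ranges are contiguous and checked in ascending order, each test only needs the upper bound of its segment.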

Part 1 (30%) Due Date: Friday 4/21 @11:59pm via gradescope
The first part of the assignment is intended to act as a milestone to help you get started on the project early. What is required for part1 is only to implement the basic linked list functionality. (DO NOT IMPLEMENT THE HASH TABLE FOR PART 1).
Open the file lc4_memory.h to see the structure declaration for the row_of_memory struct described earlier in this document. This is the type of a single node for the linked list that you will manage with the helper functions declared in lc4_memory.h. For part1 of the assignment you will exclusively work in the file: lc4_memory.c. You will notice several helper functions that you must implement as part of this project.
In the file lc4_memory_test.c, there is also a “main()” function that is intended to make only part1 of the project work. The main() function will need to be implemented for part1 of the project to provide a simple test of the add_to_list(), delete_from_list(), and print_list(). For part2 of the assignment, this file (lc4_memory_test.c) could be deleted, or simply not included in the future Makefile for the assignment. Before implementing anything, make certain you can compile the skeleton code provided without errors:
Compile only lc4_memory.c and lc4_memory_test.c, from a terminal window as follows:
clang lc4_memory.c lc4_memory_test.c -o lc4_memory_test
No Makefile is required for part1 as you are working on only 2 files. If you were to run ./lc4_memory_test, the program won’t do anything until you implement the functions.
Begin by implementing the function: add_to_list(). In lc4_memory.c you will see comments that will help you to implement the function. Make certain you have reviewed the slides on linked lists in module13 before you begin. The main difference between the add_to_list() function shown in lecture vs. this project is that you are not adding new nodes to the end of the list. Instead you are inserting new nodes to keep the linked list in ascending order, so this add_to_list() is a bit more challenging, as you must ‘perform surgery’ on the list regularly when inserting new nodes. It never hurts to start with a simple first version of your code that just adds nodes to the end of the list to get yourself started!
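The ‘surgery’ described above can be sketched as follows. This is an illustration under stated assumptions, not the required implementation: it assumes “ascending” refers to the address field, and the function name insert_sorted and its return values are hypothetical. Follow the comments in lc4_memory.c for the real add_to_list() behavior.

```c
#include <stdlib.h>

/* Same fields as the row_of_memory structure shown earlier. */
struct row_of_memory {
    short unsigned int address;
    char *label;
    short unsigned int contents;
    char *assembly;
    struct row_of_memory *next;
};

/* Hypothetical helper: insert a new node so the list stays sorted by
   ascending address. Returns 0 on success, -1 if malloc fails. */
int insert_sorted(struct row_of_memory **head,
                  short unsigned int address,
                  short unsigned int contents)
{
    struct row_of_memory *node = malloc(sizeof *node);
    if (node == NULL) return -1;
    node->address = address;
    node->contents = contents;
    node->label = NULL;
    node->assembly = NULL;

    /* Walk a pointer-to-pointer until it refers to the link where the
       new node belongs. The same code then handles an empty list and
       insertion at the head, in the middle, or at the tail. */
    struct row_of_memory **link = head;
    while (*link != NULL && (*link)->address < address)
        link = &(*link)->next;

    node->next = *link;
    *link = node;
    return 0;
}
```

Walking a pointer to the link, rather than a pointer to the node, avoids a special case for the head of the list.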
Once you have something going in add_to_list(), go back to main() in lc4_memory_test.c, create a local pointer to the linked list, and see if you can pass that to your add_to_list() (via pointer to a pointer) and see if your code is working. This takes time, so compile and test often!
Next, begin work on print_list(). For part1, use this example output to setup your print statements to display the entire linked list:
Node #0: address = 0x0000, contents = 0x1234, label = nil, assembly = nil
Node #1: address = 0x0001, contents = 0x5678, label = nil, assembly = nil
…
You will change this function’s formatting in part2, but for part1, you will partially implement it to display the linked list as shown above as a basic test to see if your linked list code works!
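One way to produce a line in the format shown above is sketched below. The helper name format_node is hypothetical, and writing into a buffer (rather than printing directly) is just a choice made here so the result is easy to check; print_list() itself would typically call printf() with the same format string.

```c
#include <stdio.h>

/* Hypothetical helper: format one node's fields in the part 1 layout.
   NULL strings are shown as "nil". Returns snprintf's result, which is
   the number of characters the full line needs. */
int format_node(char *buf, size_t size, int index,
                unsigned short address, unsigned short contents,
                const char *label, const char *assembly)
{
    return snprintf(buf, size,
                    "Node #%d: address = 0x%04X, contents = 0x%04X, "
                    "label = %s, assembly = %s",
                    index, (unsigned)address, (unsigned)contents,
                    label ? label : "nil",
                    assembly ? assembly : "nil");
}
```

The %04X conversion pads each value to four uppercase hex digits, which matches the 0x0000 style in the example output.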
For part1, you will only need to implement add_to_list(), delete_from_list(), and (partially) print_list().
Finally implement the function: delete_from_list() whose job is to delete a single node from the linked list based on the address supplied to the function. Pay careful attention to the comments in the function to properly implement and return the proper values.
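A minimal sketch of deleting a node by address is shown below. The name remove_by_address and the return values (0 when a node is removed, -1 when no node matches) are assumptions made for illustration; the comments in lc4_memory.c define what delete_from_list() must actually return.

```c
#include <stdlib.h>

/* Same fields as the row_of_memory structure shown earlier. */
struct row_of_memory {
    short unsigned int address;
    char *label;
    short unsigned int contents;
    char *assembly;
    struct row_of_memory *next;
};

/* Hypothetical helper: unlink and free the node with the given address.
   Returns 0 if a node was removed, -1 if no node matched (assumed
   convention, not taken from the skeleton code). */
int remove_by_address(struct row_of_memory **head,
                      short unsigned int address)
{
    struct row_of_memory **link = head;
    while (*link != NULL && (*link)->address != address)
        link = &(*link)->next;
    if (*link == NULL) return -1;

    struct row_of_memory *victim = *link;
    *link = victim->next;       /* bypass the node being removed */
    free(victim->label);        /* free(NULL) is a harmless no-op */
    free(victim->assembly);
    free(victim);
    return 0;
}
```

Freeing the label and assembly strings before the node itself is what keeps valgrind from reporting them as leaked.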
Follow the comments in the main() function in the lc4_memory_test.c file to implement the basic test cases listed that will ensure your code is working. Once you have things working properly on the screen, the next challenge is to run the memory-leak-error detection tool called: valgrind.
If you have not yet, watch the tutorial on valgrind located on canvas. After you’ve completed that tutorial, run valgrind on your code:
valgrind --leak-check=full ./lc4_memory_test
Valgrind should report 0 errors AND there should be no memory leaks prior to submission. The intent of working on part1 as we have is for you to work with valgrind to ensure your linked list is working properly in terms of how it’s handling memory. Even if your program produces the proper output on the screen, points will be lost if it leaks memory!
Note: we will run valgrind on your submission; if it leaks memory, you will lose many points on this assignment. So watch the VIDEO, learn how to use valgrind!!
Also note: If your code doesn’t compile or even run, you will lose most of the points of this assignment!
SUBMISSION of PART 1
Once you’ve completed part 1, upload lc4_memory.c onto gradescope. There may be an autotester that will use its own main() to test your lc4_memory.c linked list. This is why the lc4_memory.h file cannot be adjusted/added to/changed: the interface must match what our autotester expects.

Part 2 (70%) Due Date: Wednesday 4/26 @11:59pm via gradescope
FLOW CHART: Overview of Program Operation for Part 2 of the Project:
Be aware that you will be adding to lc4_memory.c in part2 and that main() will be located in a different file: lc4.c
1) Create 4-bucket hashtable.
2) Extract name of .OBJ files from command line.
3) Open .OBJ file in binary mode for reading.
4) Did FILE open? N: print error, return 1. Y: continue.
5) Read 2-byte header field; read 2-byte <address> field; read 2-byte <n> field.
6) At end of file? Y: close file (return 2 if error & free mem), then go to step 8. N: continue.
7) HEADER = CODE/DATA? Y: read <n>-word body; create & populate node, then insert into proper bucket in the hashtable. N: read <n>-byte body; search hashtable for address, update node to have ASCII label. Then return to step 5.
8) For each bucket of hashtable, search linked list for: OPCODE=0001 && NULL assembly field.
9) Did search return node? Y: inspect node returned from search; translate contents field into assembly instruction; allocate memory in the node to hold assembly instruction; then copy into assembly field; return to step 8. N: print table; free memory for table; return 0.

IMPLEMENTATION DETAILS:
Examine the project files lc4_hash.h and lc4_hash.c. In these files you will notice the structure that represents a hashtable with buckets of linked lists of the type: row_of_memory. You will also notice helper functions that are meant to manage the common tasks needed for a hashtable: create table, search table, print table, delete table. You will be implementing these helper functions as part of the assignment. One notable function that is missing is the “hashing function”; this is intentional. You’ll notice from the hashtable’s structure that there is a function pointer that must be assigned when the “create_table” function is invoked. This requires the caller of create_table() to provide a pointer to a hashing function they have created.
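To illustrate the idea of a table that stores a caller-supplied hashing function, here is a small sketch. Everything in it is hypothetical: the demo_table type, the demo_create/demo_destroy functions, the placeholder modulo hash, and the function pointer’s parameter list are invented for illustration. The real structure and signatures are the ones declared in lc4_hash.h and may differ.

```c
#include <stdlib.h>

/* Hypothetical table type for illustration only; the real structure is
   declared in lc4_hash.h and may differ. */
struct demo_table {
    int num_buckets;
    void **buckets;                                    /* list heads */
    int (*hash)(int num_buckets, unsigned short key);  /* from caller */
};

/* Hypothetical constructor: the caller passes in the hashing function,
   which is stored in the table for later use. */
struct demo_table *demo_create(int num_buckets,
                               int (*hash)(int, unsigned short))
{
    struct demo_table *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->buckets = calloc((size_t)num_buckets, sizeof *t->buckets);
    if (t->buckets == NULL) { free(t); return NULL; }
    t->num_buckets = num_buckets;
    t->hash = hash;
    return t;
}

/* Placeholder hash, used only to show a call through the pointer. */
int demo_modulo_hash(int num_buckets, unsigned short key)
{
    return key % num_buckets;
}

/* Hypothetical destructor: releases the bucket array and the table. */
void demo_destroy(struct demo_table *t)
{
    if (t == NULL) return;
    free(t->buckets);
    free(t);
}
```

A call such as t->hash(t->num_buckets, key) then selects a bucket without the table code knowing how the hash is computed, which is why the hashing function can live in a different file from the table helpers.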
From part 1 of the assignment, you have partially implemented most of the linked list helper functions in lc4_memory.c. The search() functions and the print() function will need to be implemented and adjusted for part 2 of the assignment.
Next, you will see the file called: lc4.c. It serves as the “main” for the entire program. The pointer to the hashtable must be stored in main(); in the provided lc4.c file you will see a pointer named memory that will do just that. You will use the hashtable helper functions to create the hashtable that the pointer memory will point to. You must also create a hashing function (in lc4.c) and pass a pointer to it when creating the hash table. The hashing function must take in a pointer to the hashtable and a “key”, which will be the memory address that the function must map to the proper bucket in the table. Recall from earlier in this document that the 4 buckets of the hash table are head pointers to linked lists. The buckets are meant to correspond to the address ranges of the 4 basic segments of LC4 memory: User Program Memory (x0000-x1FFF), User Data Memory (x2000-x7FFF), OS Program Memory (x8000-x9FFF), and finally OS Data Memory (xA000-xFFFF). Your hashing function will take in a memory address and determine the appropriate bucket given the ranges listed above.
Also in main(), you must extract the names of the .OBJ files the user passed in when they ran your program from the argv[] parameter. After parsing that, main() will call lc4_loader.c’s open_file() and hold a pointer to the open file. It will then call lc4_loader.c’s parse_file() to interpret the .OBJ file the user wishes to have your program process.
Lastly, main() will organize the reverse assembling of the file, print out the entries in the hash table, and finally delete the table when the program ends. These functions are described in greater detail below. The order of the function calls and their purpose is shown in comments in the lc4.c file that you will implement as part of this assignment.
NOTE: COMMENTS ARE REQUIRED IN YOUR CODE – POINTS WILL BE DEDUCTED IF MEANINGFUL COMMENTS ARE NOT INCLUDED. QUICK TIPS HERE: 13 Tips to Comment Your Code

Once you have properly implemented lc4.c and have it accept input from the command line, a user should be able to run your program as follows:
./lc4 output_filename.txt first.obj second.obj third.obj …
The first argument is the name of the file your program will produce as output. The rest are the names of one or more LC4 object files that have been assembled to be loaded into your simulator. So be aware that you need to read more than 1 object file (think of past HWs where you had: multiply.obj and os.obj loaded into the PennSim simulator).
If no file is passed in, your program should generate an error telling the user what went wrong, like this:
error1: usage: ./lc4

Problem 1) Implementing the LC4 Loader
Most of the work of your program will take place in the file called: lc4_loader.c. In this file, you will start by implementing the function: open_file() to take in the name of the file the user of your program has specified on the command line (see lc4_loader.h for the definition of open_file()). If the file exists, the function should return a handle to that open file; otherwise NULL should be returned.
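The behavior described above maps closely onto the standard fopen() call, as this short sketch shows. The name open_obj is hypothetical; the required name and signature are the ones in lc4_loader.h. The "rb" mode opens the file for reading in binary mode, and fopen() already returns NULL when the file cannot be opened.

```c
#include <stdio.h>

/* Hypothetical helper: open an object file for binary reading.
   Returns the open FILE handle, or NULL if the name is missing or the
   file cannot be opened. */
FILE *open_obj(const char *file_name)
{
    if (file_name == NULL) return NULL;
    return fopen(file_name, "rb");
}
```

The caller is responsible for checking the result against NULL before reading and for calling fclose() on the handle when done.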
Also in lc4_loader.c, you will implement a second function: parse_file() that will read in and parse the contents of the open .OBJ file as well as populate the hashtable’s linked lists as it reads the .OBJ file. The format of the .OBJ input file has been covered in lecture, but its layout has been reprinted above (see section: RECALL: OBJECT FILE FORMAT). As shown in the flowchart above, have the function read in the 3-word header from the file. Recall that all of the LC4 .OBJ file headers consist of 3 fields: header type, <address>, <n>. As you read in the first header in the file, store the address field and the <n> field into local variables. Then determine the type of header you have read in: CODE/DATA/SYMBOL. In this assignment, the symbol section won’t be ignored!
If you have read in a CODE header in the .OBJ file, from the file format for a .OBJ file, you’ll recall the body of the CODE section is <n>-words long. As an example, see the hex listed below; this is a sample CODE section. Notice the <n> field, which here is n=0x000C, or decimal 12. This indicates that the next 12 words in the .OBJ file are in fact 12 LC-4 instructions. Recall each instruction in LC4 is 1 word long.
CA DE 00 00 00 0C 90 00 D1 40 92 00 94 0A 25 00 0C 0C 66 00 48 01 72 00 10 21 14 BF 0F F8
From the example above, we see that the first LC-4 instruction in the 12-word body is: 9000 (that happens to be a CONST assembly instruction if you convert to binary). You would call the “add_entry_to_tbl()” function from lc4_hash.c. The hashing function for the table would determine this instruction belongs in bucket 0 (since its address is 0x0000, an address in User Program Memory). add_entry_to_tbl() would then invoke the linked list helper function: add_to_list(), which would allocate memory for a new node in your linked list to correspond to the first instruction (the section above: LINKED LIST NODE STRUCTURE, declares a structure that will serve as a blueprint for all your linked list nodes called: “row_of_memory”). As it is the first instruction in the body, and the address has been listed as 0000, you would populate the row_of_memory structure as follows.
address 0000 label NULL contents 9000 assembly NULL next NULL

In a loop, you must read in the remaining instructions from the .OBJ file, adding each entry to the hashtable (which will actually be adding to the bucket’s underlying linked list) for each instruction. As your helper functions allocate row_of_memory nodes, your linked list will look like this for the sample .OBJ file data from above:
Memory->BUCKET[0] Header pointer
address 0000 label NULL contents 9000 assembly NULL next
address 0001 label NULL contents D140 assembly NULL next
address 0002 label NULL contents 9200 assembly NULL next
Next node…
The procedure for reading in the USER DATA sections would be identical to reading in the CODE sections. These would become part of a separate linked list in bucket[1] of the hashtable. Similarly, OS Program Memory would go into bucket[2] of the hashtable following the same procedure as above, and finally OS Data Memory items would go into bucket[3] of the hashtable.
STORING SYMBOLS IN THE APPROPRIATE NODE:
C3 B7 00 00 00 04 49 4E 49 54
Imagine the .OBJ file also contains the above SYMBOL section. The address field is: 0x0000.
The symbol field itself is: 0x0004 bytes long. The next 4 bytes: 49 4E 49 54 are ASCII for: INIT. This means that the label for address: 0000 is INIT. Your program must search the hashtable based on address 0x0000, which will determine the bucket this address is in, then call the linked list “search_by_address()” function, which returns the node that is holding this instruction. Your job is to then populate the “label” field for the node. Note: the <n> field tells us exactly how much memory to malloc() to hold the string; however, you must add a byte to hold the NULL terminator, for 5 bytes in the case of: INIT. For the example above, the node: 0000 in your linked list would be updated as follows:
address 0000 label INIT contents 9000 assembly NULL next
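The “add a byte for the terminator” step can be sketched as below. The helper name make_label is hypothetical, and it takes the symbol bytes from a buffer for simplicity; in the loader the same bytes would come from the open .OBJ file.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy n raw symbol bytes into a newly allocated,
   NUL-terminated string. The caller owns the result and must free() it. */
char *make_label(const unsigned char *bytes, unsigned short n)
{
    char *label = malloc((size_t)n + 1);   /* +1 for the terminator */
    if (label == NULL) return NULL;
    memcpy(label, bytes, n);
    label[n] = '\0';                       /* the file stores no terminator */
    return label;
}
```

For the sample section above, the four bytes 49 4E 49 54 would produce a 5-byte allocation holding the string INIT.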
Once you have read the entire file; created and added the corresponding nodes to your linked list, close the file and return to main(). If you encounter an error in closing the file, before exiting, print an error, but also free() all the memory associated with the hashtable & linked lists prior to exiting the program. Once you are back in main() repeat this process to process the additional .OBJ files the user of your program has passed in.
Problem 2) Implementing the Reverse Assembler
In a new file: lc4_disassembler.c: implement the function reverse_assemble() that will take as input the populated “memory” hashtable (that parse_file() populated) – it will now contain the complete .OBJ’s contents. reverse_assemble() must translate the hex representation of nodes in the program memory buckets of the hashtable (not the data memory buckets) into their assembly equivalent. You will need to reference the LC4’s ISA to author this function. To simplify this problem a little, you DO NOT need to translate every single HEX instruction into its assembly equivalent. Only translate instructions with the OPCODE: 0001 (ADD REG, MUL, SUB, DIV, ADD IMM) and 0101 (the logic instructions).
As shown in the flowchart, this function will call your linked list’s “search_by_opcode()” helper function. Your search_by_opcode() function should take as input an OPCODE and return the first node in the linked list that matches the OPCODE passed in, but also has a NULL assembly field. When/if a node in your linked list is returned, you’ll need to examine the “contents” field of the node and translate the instruction into its assembly equivalent. Once you have translated the contents field into its ASCII assembly equivalent, allocate memory for it and store it as a string in the “assembly” field of the node. Repeat this process until all the nodes in the linked list with an OPCODE of 0001 or 0101 have their assembly fields properly translated.
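The search described above can be sketched as follows, assuming the opcode is passed in as a plain 4-bit number (for example 0x1 for 0001) and compared against the top four bits of the contents field. The name find_untranslated is hypothetical; the real search_by_opcode() declaration is in lc4_memory.h.

```c
#include <stddef.h>

/* Same fields as the row_of_memory structure shown earlier. */
struct row_of_memory {
    short unsigned int address;
    char *label;
    short unsigned int contents;
    char *assembly;
    struct row_of_memory *next;
};

/* Hypothetical helper: return the first node whose top four bits of
   contents equal opcode and whose assembly field is still NULL, or
   NULL if the list holds no such node. */
struct row_of_memory *find_untranslated(struct row_of_memory *head,
                                        short unsigned int opcode)
{
    for (struct row_of_memory *n = head; n != NULL; n = n->next) {
        if (((n->contents >> 12) & 0xF) == opcode && n->assembly == NULL)
            return n;
    }
    return NULL;
}
```

Requiring a NULL assembly field is what lets the caller loop: each translated node stops matching, so repeated calls eventually return NULL.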
As an example, the figure below shows a node on your list that has been “found” and returned when the search_by_opcode() function was called. From the contents field, we can see that the HEX code: 128B is 0001 001 010 001 011 in binary. From the ISA, we realize the sub-opcode reveals that this is actually a MULTIPLY instruction. We can then generate the string MUL R1, R2, R3 and store it back in the node in the assembly field. For this work, I strongly encourage you to investigate the switch() statement in C (any good book on C will help you understand how this works and why it is more practical than multiple if/else/else/else statements). I also remind you that you must allocate memory for strings before calling strcpy()!
NODE BEFORE
address 0009 label NULL contents 128B assembly NULL next
NODE AFTER UPDATE
address 0009 label NULL contents 128B assembly MUL R1, R2, R3 next
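The translation of 128B into MUL R1, R2, R3 can be sketched with shifts, masks, and a switch statement. The helper name decode_arith, its return convention, and the exact text used for the immediate form are assumptions made for illustration; check the bit fields against the LC4 ISA sheet before relying on them.

```c
#include <stdio.h>

/* Hypothetical helper: translate an opcode-0001 word into assembly text.
   Returns the number of characters in the text, or -1 if the word is
   not an arithmetic instruction. */
int decode_arith(unsigned short word, char *out, size_t size)
{
    if (((word >> 12) & 0xF) != 0x1) return -1;

    int rd = (word >> 9) & 0x7;   /* destination register, bits 11-9 */
    int rs = (word >> 6) & 0x7;   /* first source register, bits 8-6 */
    int rt = word & 0x7;          /* second source register, bits 2-0 */

    if (word & 0x20) {            /* bit 5 set: ADD with 5-bit immediate */
        int imm = word & 0x1F;
        if (imm & 0x10) imm -= 32;            /* sign-extend 5 bits */
        return snprintf(out, size, "ADD R%d, R%d, #%d", rd, rs, imm);
    }

    const char *name;
    switch ((word >> 3) & 0x7) {  /* sub-opcode, bits 5-3 */
    case 0:  name = "ADD"; break;
    case 1:  name = "MUL"; break;
    case 2:  name = "SUB"; break;
    default: name = "DIV"; break; /* only 3 remains when bit 5 is clear */
    }
    return snprintf(out, size, "%s R%d, R%d, R%d", name, rd, rs, rt);
}
```

Formatting into a local buffer first gives the string length, which is then the basis for the malloc() and strcpy() into the node’s assembly field.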

Problem 3) Putting it all together
As you may have realized, main() does 3 basic things: 1) Create and hold the pointer to your memory hashtable. 2) Call the parsing function in lc4_loader.c. 3) Call the disassembling function in lc4_disassembler.c. One last thing to do in main() is to call a function to print the contents of your linked list to the screen and to the output file listed as the first argument to your program. Call the print_table() function in lc4_hash.c; you will need to re-implement the printing helper function of the linked list (changing it from what you did in part 1) to display the contents of your lc4’s memory list like this:

INIT 0000

MUL R1, R2, R3
(and so on…)
Several things to note: There can be multiple CODE/DATA/SYMBOL sections in one .OBJ file. If there is more than one CODE section in a file, there is no guarantee that they are in order in terms of the address. In the file shown above, the CODE section starting at address 0000, came before the CODE section starting at address: 0010; there is no guarantee that this will always happen, your code must be able to handle that variation. Also, SYMBOL sections can come before CODE sections! What all of this means is that before one creates/allocates memory for a new node in the memory list, one should “search” the list to make certain it does not already exist. If it exists, update it, if not, create it and add it to the list!
Prior to exiting your program, you must properly “free” any memory that you allocated. We will be using a memory checking program known as valgrind to ensure your code properly releases all memory allocated on the heap! Simply run your program: lc4 as follows:
valgrind --leak-check=full ./lc4
Valgrind should report 0 errors AND there should be no memory leaks prior to submission.
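Releasing a whole linked list without leaks can be sketched as below. The name free_list and its return value are hypothetical; the point is the order of operations: save the next pointer, free the strings a node owns, then free the node itself.

```c
#include <stdlib.h>

/* Same fields as the row_of_memory structure shown earlier. */
struct row_of_memory {
    short unsigned int address;
    char *label;
    short unsigned int contents;
    char *assembly;
    struct row_of_memory *next;
};

/* Hypothetical helper: free every node and the strings it owns, then
   leave the caller's head pointer set to NULL. Returns the number of
   nodes freed. */
int free_list(struct row_of_memory **head)
{
    int freed = 0;
    while (*head != NULL) {
        struct row_of_memory *next = (*head)->next;  /* save before free */
        free((*head)->label);
        free((*head)->assembly);
        free(*head);
        *head = next;
        freed++;
    }
    return freed;
}
```

Running this once per bucket, and then freeing the table’s own allocations, is one way to reach a clean valgrind report.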
Note: we will run valgrind on your submission; if it leaks memory, you will lose many points on this assignment. So watch the VIDEO, learn how to use valgrind!!
Also note: If your code doesn’t compile or even run, you will lose most of the points of this assignment!

TESTING YOUR CODE
When writing such a large program, it is a good strategy to “unit test.” This means, as you create a small bit of working code, compile it and create a simple test for it. Run “valgrind” on the code, see if it leaks memory. Once you are certain it works, and doesn’t leak memory, go on to the next function, implement that, test it out.
DO NOT write the entire program, compile it, and then start testing it. You will never resolve all of your errors this way. You need to unit test your program as you go along or it will be impossible to debug.
Where to get input files?
The test files from the previous assignment are available in your codio environment for this assignment. However, remember that you can create your own .OBJ files using PennSim. Write a simple .ASM file and then assemble it into a .OBJ file. You can then use that .OBJ file as input to your disassembler!

STRUCTURING YOUR CODE:
Preloaded in codio, you’ll find some of the files named below. For the ones you don’t see on codio, you must create them and implement them as described in the assignment above.
lc4.c – must contain your main() function.
lc4_hash.c – must contain your hash table helper functions.
lc4_hash.h – must contain the declaration of your hash_table structure.
lc4_memory.c – must contain your linked list helper functions.
lc4_memory.h – must contain the declaration of your row_of_memory structure & linked list helper functions.
lc4_loader.h – contains your loader function declarations.
lc4_loader.c – must contain your .OBJ parsing function.
lc4_disassembler.h – contains your disassembler function declarations.
lc4_disassembler.c – must contain your disassembling function.
Makefile – must contain the targets listed in the skeleton Makefile.
You cannot alter any of the existing functions in the .h files.
1) SIGNIFICANT EXTRA CREDIT: A complete reverse assembler:
Finish the disassembler to translate any/all instructions in the ISA. Have your program still print the linked list to the screen, but also create a new output file: .asm. That file should contain only the assembly program that you disassembled. If it works, I should be able to load it into PennSim, assemble it, and reproduce the identical .OBJ file that your .ASM file was derived from! Don’t forget to add in the directives (.CODE, .DATA)…the ultimate test of your program will be getting it to assemble using PennSim!
2) Early Turn in EXTRA CREDIT:
IF you wish to receive EXTRA CREDIT for either part of the assignment,
you can turn your assignment in early! For each day of early turn in,
we will award you 2.5 points of extra credit (note: maximum of 10
extra credit points, per part, can be earned for early turn-in)

Directions on how to submit your work: You must submit two things for this HW:
1) An anti-plagiarism form in gradescope
2) Your codio work
• Submitting anti-plagiarism form in gradescope:
o Download the file: CIS2400_HW-Plagiarism_Signature.pdf from canvas
o This must be done for each electronic assignment in our course
o Print it out and sign the form
o Scan in the printed-out form (using your favorite app/scanner) and upload it to gradescope
o Codio submissions won’t be graded unless this form is submitted on gradescope
• Submitting in Codio:
o To manually submit your work, from the codio menu, choose: “EDUCATION”
§ From the education menu, choose: “Mark As Completed”, type yes, press OK
§ On our end of codio, we will see your project as “completed.” Then we can open it and grade your HW. You can still see your files, but you won’t be able to modify them any further after you mark your HW as completed.
§ Note the late policies that are outlined in the syllabus.
• Important Note on Plagiarism:
o We scan your HW files for plagiarism using various tools and means.
o If you are unaware of the plagiarism policy, make certain to check the syllabus.