
Introduction

Your assignment for this entire class is to devise your own architecture, hardware design, software design, and firmware for a very special-purpose RISC processor. All it needs to be able to do is run any of three assigned programs (more on these later). The more programs it can run successfully, the higher your grade for the course.

ISA Requirements

Your instruction set architecture shall feature fixed-length instructions (machine code) 9 bits wide and a data path 8 bits wide.

Given the tight limit on instruction bits, you need to consider the target programs and their needs carefully.
The best design will come from an iterative process of designing an ISA, then coding the programs, redesigning the ISA, etc.

Your ISA specification should describe:
• What operations it supports and what their respective opcodes are.
o For ideas, see the MIPS, ARM, RISC-V, and/or SPARC instruction lists
• How many instruction formats it supports and what they are
o In detail! How many bits for each field, where they are found in the instruction.
o Your instruction format description should be detailed enough that someone other than you could write an assembler (a program that creates machine code from assembly code) for it. (Again, refer to ARM or MIPS.)
• Number of registers: how many general-purpose and how many (if any) specialized.
• All internal data paths, ALU data ports, and data memories will be 8 bits wide.
• Addressing modes supported
o This applies to both memory instructions and branch instructions.
o How are addresses constructed or calculated? Lookup tables? Sign extension? Direct addressing? Indirect? Immediates?
The more time and care you put into your specification, the easier the rest of the project will be. This is the design element, and it is harder than it seems (you have a lot of options!).
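As an illustration of the level of detail a format description needs, here is a minimal Python sketch that packs and unpacks one hypothetical 9-bit format. The field split (3-bit opcode, 3-bit destination register, 3-bit source register) and the names `encode_r` and `decode_r` are assumptions made for this example only; your own formats will almost certainly differ.

```python
# Hypothetical R-type layout (an assumption, not a required format):
#   bits [8:6] opcode, bits [5:3] rd, bits [2:0] rs
OPCODE_BITS, REG_BITS = 3, 3

def encode_r(opcode: int, rd: int, rs: int) -> int:
    """Pack three fields into one 9-bit machine word."""
    for name, value, width in (("opcode", opcode, OPCODE_BITS),
                               ("rd", rd, REG_BITS),
                               ("rs", rs, REG_BITS)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (opcode << 6) | (rd << 3) | rs

def decode_r(word: int) -> tuple:
    """Split a 9-bit machine word back into (opcode, rd, rs)."""
    if not 0 <= word < (1 << 9):
        raise ValueError("machine word must be 9 bits")
    return (word >> 6) & 0b111, (word >> 3) & 0b111, word & 0b111

word = encode_r(opcode=0b010, rd=5, rs=1)
print(f"{word:09b}")  # 010101001
```

If someone else can write these two functions from your report alone, the format description is probably detailed enough.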

Some Things to Think About

For instructions to fit in a 9-bit field, the memory demands of these programs will have to be small. For example, you will have to be clever to support a conventional main memory of 256 bytes (8-bit address pointer – think ARM or MIPS, for example). You should consider how much data space you will need before you finalize your instruction format. Your instructions are stored in a separate program memory, so that your data addresses need be only big enough to hold data. Your data memory is byte-wide, i.e., loads and stores read and write exactly 8 bits (one byte). Your instruction memory is 9 bits wide, to hold your 9-bit machine code.

You will write and run three programs on your ISA. You should start the first program at machine code (instruction, program counter) address 0, and work your way up from there for the other two. The specification of your branch instructions will depend on where your programs reside in memory, so you should make sure they still work if the starting address changes a little, e.g., if you have to rewrite one of the programs and it causes the others to shift, as well. Hint: It is perfectly fine to put NO-OPs in your instruction memory, such as between programs. This approach will allow you to put all three programs in the same instruction memory later on in the quarter.
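One way to act on the NO-OP hint is to reserve a fixed region for each program and pad the unused slots, so that rewriting one program does not move the others. The sketch below assumes a NO-OP encoding of all zeros and a 256-word region per program; both values and the function name `build_instruction_memory` are hypothetical.

```python
NOP = 0b000000000          # assumed encoding of a no-op in your ISA
REGION_SIZE = 256          # assumed instruction slots reserved per program

def build_instruction_memory(programs):
    """Place each program at a fixed start address, padding with NOPs.

    programs: list of lists of 9-bit machine words.
    Returns (memory image, list of start addresses).
    """
    memory, starts = [], []
    for index, code in enumerate(programs):
        if len(code) > REGION_SIZE:
            raise ValueError(f"program {index} exceeds its {REGION_SIZE}-word region")
        starts.append(len(memory))
        memory.extend(code)
        memory.extend([NOP] * (REGION_SIZE - len(code)))
    return memory, starts

# Three placeholder programs of different lengths (contents are dummies).
image, starts = build_instruction_memory([[0b100000001] * 40,
                                          [0b100000010] * 100,
                                          [0b100000011] * 200])
print(starts)      # [0, 256, 512]
print(len(image))  # 768
```

With fixed start addresses, absolute branch targets inside one program stay valid when another program grows or shrinks within its own region.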


Architecture Limitations and Requirements

We shall impose the following constraints on your design, which will make the design a bit simpler:

1. Your core should have separate instruction memory and data memory.
2. You should assume single-address data memory. You can write and read in place, as in a D flip-flop/register, but you can’t simultaneously write to and read from separate locations.
3. Your instruction memory should not exceed 2^12 (4096) entries (12-bit program counter), but it can be larger if necessary.
4. Your data memory shall not exceed 256 entries (8-bit address pointer).
5. You will probably want an ARM or MIPS style register file (or whatever internal storage you support) that can write to only one register per instruction.
a. You may also have a multibit ALU condition/flag register (e.g., carry out or shift out, sign result, zero bit, etc., similar to ARM’s Z, N, C, and V status bits) that can be written at the same time as an 8-bit data register, if you want.
b. You may read up to two data registers per cycle.
c. Your register file will have no more than two data output ports and one data input port.
d. You may use separate pointers for reads and writes, if you wish.
e. Please restrict register file size to no more than 64 registers.
6. Manual loop unrolling of your code is not allowed – use at least some branch or jump instructions.
7. Your ALU instructions will be a subset of those in ARMsim, or of comparable complexity.
8. You may use lookup tables / decoders, but these are limited to 64 elements each (i.e., pointer width up to 6 bits). Outputs can be wide enough to match machine code (9 bits), program counter (up to 16 bits, probably fewer), or data stream (8 bits).
a. You may not, for example, build a big 512-element, 32-bit LUT to map your 9-bit machine codes into ARM- or MIPS-like wider microcode. (It was amusing the first time a team tried it, but it got old.)
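A permitted use of a lookup table is mapping a short index carried in a branch instruction to a full program counter value. The sketch below assumes a 6-bit index field and a 12-bit program counter; the table contents and the names `make_branch_lut` and `lookup_target` are hypothetical.

```python
LUT_INDEX_BITS = 6     # constraint: at most 64 entries per table
PC_BITS = 12           # assumed program counter width

def make_branch_lut(targets):
    """Validate a list of absolute branch targets for use as a LUT."""
    if len(targets) > (1 << LUT_INDEX_BITS):
        raise ValueError("a lookup table may hold at most 64 entries")
    for target in targets:
        if not 0 <= target < (1 << PC_BITS):
            raise ValueError(f"target {target} does not fit in {PC_BITS} bits")
    return list(targets)

def lookup_target(lut, index):
    """Return the absolute target selected by a 6-bit instruction field."""
    return lut[index & ((1 << LUT_INDEX_BITS) - 1)]

branch_lut = make_branch_lut([0, 17, 256, 300, 512])  # hypothetical targets
print(lookup_target(branch_lut, 3))  # 300
```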


Some Things to Think About

In addition to these constraints, the following suggestions will either improve your performance or greatly simplify your design effort:
1. In optimizing for performance, distinguish between what must be done in series vs. what can be done in parallel.
a. E.g. An instruction that does an add and a subtract (but neither depends on the output of the other) takes no longer than a simple add instruction.
b. Similarly, a branch instruction where the branch condition or target depends on a memory operation will make things more difficult later on.
2. Your primary goal is to execute the assigned programs accurately. Secondary goals are:
a. Minimize clock cycle count.
b. Minimize cycle time (short critical paths).
c. Simplify your processor hardware design.

Generic, general-purpose ISAs (that is, those that will execute other programs just as efficiently as those shown here) will be seriously frowned upon. We really want you to optimize a creative special purpose design for these programs only.

Top-Level Interface

Your microprocessor needs only four one-bit I/O ports: clock, reset, and start (request) inputs from the testbench, and a done (acknowledge) output back to the testbench.

We will use the start and done signals to drive your processor. During final testing, the sequence will be as follows:
1. The testbench will set the start and reset bits high.
a. Your processor must not write to data memory while the start bit is asserted.
2. The testbench will load operands into specified locations in the data memory.
3. The testbench will lower the start bit.
a. This should cause your processor to begin executing the first program.
4. When your program has run and your device has stored the result into the specified locations in data memory, your device should bring the done flag high.
5. The testbench will respond by reading and verifying your results.
6. The testbench will assert the start bit.
a. Your processor should deassert the done flag in response.
7. The testbench will load the next set of operands into the specified locations in data memory while the start bit is high.
8. The testbench will lower the start bit.
a. Your device should start running the second program.
9. When the second program completes, your processor should assert done.
10. The testbench will read and verify your results from the second program, then issue the final start command while loading the third set of operands into data memory. Your done flag at the end of this program will terminate simulation after the testbench reads and verifies your results.
11. Why both reset and start? reset=1 should force program counter to 0, and your first program should be stored starting at address 0 in your instruction memory. In contrast, start=1 will merely hold/pause your program counter between programs.
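The processor side of this handshake can be sketched as a small behavioral model. This is only one reading of the sequence above: the class name `HandshakeModel` is hypothetical, `program_finished` stands in for however your design detects the end of a program, and advancing the PC past the final instruction when done is raised is an assumption.

```python
class HandshakeModel:
    """Behavioral model of the PC-hold and done-flag behavior described above."""

    def __init__(self):
        self.pc = 0
        self.done = False

    def clock(self, reset, start, program_finished=False):
        """Advance one clock cycle.

        program_finished is a stand-in (an assumption) for however your
        design detects that the current program has stored its results.
        """
        if reset:
            self.pc = 0           # reset forces the PC to address 0
            self.done = False
        elif start:
            self.done = False     # start deasserts done and holds the PC
        elif self.done:
            pass                  # wait for the testbench to raise start
        elif program_finished:
            self.done = True      # results stored: raise done
            self.pc += 1          # assumed: next program follows directly
        else:
            self.pc += 1          # normal execution

cpu = HandshakeModel()
cpu.clock(reset=True, start=True)        # testbench raises reset and start
cpu.clock(reset=False, start=True)       # operands loaded while start is high
for _ in range(3):
    cpu.clock(reset=False, start=False)  # program 1 runs
cpu.clock(reset=False, start=False, program_finished=True)
print(cpu.pc, cpu.done)  # 4 True
```

Raising start again clears done and leaves the PC where it is, so the second program begins from that address once start is lowered.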

If you cannot get all three programs to run, separate testbenches for individual programs will also be provided, with correspondingly lower course grades awarded.


What must the processor do?

Your processor must be able to execute the following three programs.

int2float – Write a program that converts a 16-bit two’s complement number (8 bits integer + 8 bits fractional) into 16-bit IEEE floating point format, i.e.,
short X; // but fractional, not integer!
float Y = X; // actually, equivalent C code would need a special float_binary_16 format command

The operand, X, is found in memory locations 1 (most significant word of X) and 0 (least significant word of X). The result, Y, shall be written into locations 3 (MSW of Y) and 2 (LSW of Y).
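When checking your assembly code by hand, a software golden model is useful. The sketch below leans on Python's `struct` module, whose `e` format is IEEE half precision. The function name `int2float_reference` is hypothetical, and the rounding (round to nearest even, which is what `struct` does) is an assumption; the assignment may specify a different rule.

```python
import struct

def int2float_reference(msb: int, lsb: int) -> tuple:
    """Golden model: 8.8 two's complement fixed point -> IEEE binary16.

    Returns the (MSW, LSW) bytes of the half-precision result. Rounding is
    round-to-nearest-even because that is what struct's 'e' format does;
    the assignment may call for a different rounding rule.
    """
    raw = (msb << 8) | lsb
    if raw & 0x8000:                 # sign-extend the 16-bit pattern
        raw -= 0x10000
    value = raw / 256.0              # 8 fractional bits
    half_bits = struct.unpack("<H", struct.pack("<e", value))[0]
    return half_bits >> 8, half_bits & 0xFF

# 1.5 is 0x0180 in 8.8 fixed point and 0x3E00 in half precision.
print([hex(b) for b in int2float_reference(0x01, 0x80)])  # ['0x3e', '0x0']
# -1.0 is 0xFF00 in 8.8 fixed point and 0xBC00 in half precision.
print([hex(b) for b in int2float_reference(0xFF, 0x00)])  # ['0xbc', '0x0']
```

The first argument corresponds to memory location 1 and the second to location 0; the returned pair corresponds to locations 3 and 2.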

float2int — Write a program that converts a 16-bit IEEE format floating point number to 16-bit two’s complement fixed point 8 integer + 8 fractional, again i.e.:
short Y = X;

The operand, X, is found in memory locations 5 (MSW) and 4 (LSW). The result shall be written into locations 7 (MSW) and 6 (LSW).
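A matching golden model for the reverse conversion can be sketched the same way. Several behaviors here are assumptions that the text above does not settle: fractional bits below 1/256 are truncated toward zero, out-of-range values and infinities saturate, and NaN maps to zero. The function name `float2int_reference` is hypothetical.

```python
import math
import struct

def float2int_reference(msb: int, lsb: int) -> tuple:
    """Golden model: IEEE binary16 -> 8.8 two's complement fixed point.

    Assumptions (not stated in the assignment): fractional bits beyond
    1/256 are truncated toward zero, out-of-range values and infinities
    saturate, and NaN maps to zero.
    """
    value = struct.unpack("<e", struct.pack("<H", (msb << 8) | lsb))[0]
    if math.isnan(value):
        fixed = 0
    elif value >= 128.0:
        fixed = 0x7FFF               # largest positive 8.8 value
    elif value < -128.0:
        fixed = -0x8000              # most negative 8.8 value
    else:
        fixed = int(value * 256.0)   # int() truncates toward zero
    fixed &= 0xFFFF                  # two's complement bit pattern
    return fixed >> 8, fixed & 0xFF

print([hex(b) for b in float2int_reference(0x3E, 0x00)])  # 1.5  -> ['0x1', '0x80']
print([hex(b) for b in float2int_reference(0xBC, 0x00)])  # -1.0 -> ['0xff', '0x0']
```

Half precision can represent values up to 65504, far beyond the 8.8 range, so decide early how your program treats overflow and confirm the expected behavior against the provided testbench.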

float_add — Write a program that adds two 16-bit floating point numbers.
float Z = X+Y;

One 16-bit floating point operand will occupy data memory locations 9 (MSW) and 8 (LSW), whereas the other will occupy locations 11 (MSW) and 10 (LSW). Write the 16-bit floating point sum into locations 13 (MSW) and 12 (LSW).
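For float_add, a reference result can be computed by converting both operands to Python floats, adding, and converting back. The sum of two half-precision values is exact in double precision, so only one rounding occurs. The helper names are hypothetical, rounding is round to nearest even (the `struct` behavior), and mapping an overflowing sum to infinity is an assumption.

```python
import struct

def half_to_float(bits: int) -> float:
    """Interpret a 16-bit pattern as an IEEE binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def float_to_half(value: float) -> int:
    """Round a Python float to binary16 and return its 16-bit pattern."""
    try:
        return struct.unpack("<H", struct.pack("<e", value))[0]
    except OverflowError:
        # Finite result too large for binary16: assume it becomes infinity.
        return 0xFC00 if value < 0 else 0x7C00

def float_add_reference(x_bits: int, y_bits: int) -> int:
    """Golden model for Z = X + Y on 16-bit floating point patterns."""
    return float_to_half(half_to_float(x_bits) + half_to_float(y_bits))

# 1.5 (0x3E00) + 1.0 (0x3C00) = 2.5 (0x4100)
print(hex(float_add_reference(0x3E00, 0x3C00)))  # 0x4100
```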


What to Submit?

You will turn in milestone reports and (eventually) all your code.

Reports will address questions for each milestone. In describing your architecture, keep in mind that the person grading it has much less experience with your ISA than you do. It is your responsibility to make everything clear. One objective of this course is to help you improve your technical writing and reporting skills, which will benefit you richly in your career.

For each milestone, there will be a set of requirements and questions that direct the format of the writeup and make it easier to grade, but strive to create a report you can be proud of. 
Milestone 1 — The ISA
For the first milestone, you will design the instruction set architecture (ISA) for your processor. A quick reminder that an ISA is more than just an instruction set. It describes a fair bit about how the machine will work [at least from the programmer’s perspective]. It specifies how many registers are available, how memory operates, how addressing works, etc. Your ISA design will dictate your implementation — plan ahead!

Milestone 1 Objectives
For this milestone, you will design the instruction set and instruction formats for your processor. You will then write code for the three programs to run on your instruction set.

Milestone 1 Components
i. List the names of all members of your team, but only one copy of the report should be submitted.
1. Introduction.
i. This should include the name of your architecture (have fun with this), overall philosophy, specific goals strived for and achieved.
ii. Can you classify your machine in any of the classical ways (e.g., stack machine, accumulator, register, load-store)? If so, which? If not, devise a name for your class of machine.
2. Architectural Overview. This must be in picture form.
i. What are the major building blocks you expect your processor to be made up of?
NOTE: This is not your final processor design, rather an early rough draft of the major elements. Missing details and imprecision are okay at this stage, but you should continue to refine this picture as your design evolves. You will submit an updated diagram with every milestone.
3. Machine Specification
i. Instruction formats.
i. List all formats and an example of each. (ARM has R, I, and B type instructions, for example.)
ii. Operations.
i. List all instructions supported and their opcodes/formats.
iii. Internal operands.
i. How many registers are supported?
ii. Is there anything special about any of the registers, or are all of them general purpose?
iv. Control flow (branches).
i. What types of branches are supported?
ii. How are the target addresses calculated?
iii. What is the maximum branch distance supported?
v. Addressing modes.
i. What memory addressing modes are supported, e.g. direct, indirect?
ii. How are addresses calculated?
iii. Give examples.
4. Programmer’s Model [Lite]
i. How should a programmer think about how your machine operates?
ii. Give an example of an “assembly language” instruction in your machine, then translate it into machine code.
5. Program Implementations
For each program, give assembly instructions that will implement the program correctly. Make sure your assembly format is either very obvious or well described, and that the code is (very) well commented. If you also want to include machine code, the effort will not be wasted, since you will need it later. We shall not correct/grade the machine code. State any assumptions you make.
i. Program 1
ii. Program 2
iii. Program 3

What to Submit?
You will submit a written report that contains all of the required components of Milestone 1. It is your responsibility to make this report clear and well-organized.

Your report should be a single document, in .doc(x) or .pdf form.
• Exception: Please attach your program implementations as separate “source code” files.


Milestone 2 — 9-bit CPU: Register file, ALU, and fetch unit
In this milestone, you will design the top level, register file, control decoder, ALU (arithmetic logic unit), data memory, muxes (signal routing switches), lookup tables, and fetch unit (program counter plus instruction ROM) for your CPU.

For this and future designs, we want the highest level of your design to be a schematic and SystemVerilog code. You may either hand-draw the schematic or generate it using the Quartus RTL Viewer function.

Anything below that can be schematic (again either drawn or generated by Quartus) and SystemVerilog, or just SystemVerilog. The SystemVerilog files implement the symbols included in the block diagram file. Everyone will use Questa/ModelSim for simulation and Intel (formerly Altera) Quartus II for logic synthesis in the Cyclone IVE family, device EP4CE40F29C6.

In addition to connecting everything together at the top level, you will demonstrate the functionality of each component separately through schematic, SystemVerilog, and timing printouts.

Milestone 2 Objectives
The primary goal of this milestone is to show individual components operating as desired. All of the pieces of your processor will need to work in isolation before final integration.

CPU Design Refreshers and Helpful Tips
The fetch unit points to the current instruction in the instruction memory and determines the next value of the program counter (PC). It should look something like the following diagram:

The program counter is a state element (register) that outputs the address pointer of the next instruction. Instruction ROM is a Read Only Memory block that holds your 9-bit machine code. It does not have to hold your actual code (generated in Milestone 3) at this point, but if you have already written it, it might as well. It should hold something so we can see the effect of changing PCs while your processor runs. The next PC logic takes as input the previous PC and several other signals and calculates the next PC value.

The inputs to the next PC logic are:
• start – when asserted, it sets the PC to the starting address of your program.
• start_address – has the starting address of your program.
• branch – when asserted it indicates that the prior instruction was a branch.
• taken – [optional.. more on this in lecture] when the instruction is a branch, this signal when asserted indicates the branch was resolved as taken.
• target – [some options.. more on this in lecture] where this branch is going

On non-branch instructions, the next PC should be PC+1 (regardless of the value of taken). For branch instructions, the new PC is either PC+1 (branch not taken) or target (branch taken). If your branches are ALWAYS PC-relative, then you can redefine target to be a signed distance rather than an absolute address if you want. Make sure you tell us this is what you’re doing. (Note: ARM and MIPS increment their respective PCs by 4, simply by convention because their machine codes are 32 bits = 4 bytes wide. We’ll just increment by 1, for each 9-bit value of our machine code.)
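The next PC rules above can be sketched as a single function, which is also handy for generating expected values for your fetch unit timing diagrams. The 12-bit PC width and the `pc_relative` switch are assumptions; the other argument names follow the input list above.

```python
PC_BITS = 12   # assumed program counter width

def next_pc(pc, start, start_address, branch, taken, target, pc_relative=False):
    """Compute the next PC value from the inputs listed above."""
    mask = (1 << PC_BITS) - 1
    if start:
        return start_address & mask
    if branch and taken:
        # With PC-relative branches, target is a signed distance.
        return (pc + target if pc_relative else target) & mask
    return (pc + 1) & mask

# Non-branch instruction: taken is ignored, PC increments.
print(next_pc(10, False, 0, branch=False, taken=True, target=99))   # 11
# Taken branch with an absolute target.
print(next_pc(10, False, 0, branch=True, taken=True, target=99))    # 99
# Taken branch with a signed PC-relative distance.
print(next_pc(10, False, 0, branch=True, taken=True, target=-4, pc_relative=True))  # 6
```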

How to Present Your Implementation
You will demonstrate each element of your design in two ways.

First, with schematics such as the one shown above, plus your SystemVerilog code. Obviously, you must also show all relevant internal circuits with further SystemVerilog code.

Second, you must demonstrate correct operation of all ALU operations, register file functionality, and fetch unit functions with timing diagrams. An example of a (partial) timing diagram will be demonstrated in class; yours will be longer. The timing diagrams, for example, should demonstrate all ALU operations (this includes math to support load address computation, or any other computation required by your design), each with a couple of interesting inputs. Make sure any relevant corner/unusual cases are demonstrated. If you support instructions that do multiple computations at the same time, you need to demonstrate them happening at the same time. Note that you’re demonstrating ALU operations, not instructions. So, for example, instructions that do no computation (e.g., branch to address in register) need not be demonstrated. There will also be a timing diagram for the fetch unit, showing it doing everything interesting (increment, absolute jump/branch, conditional jump/branch, etc.). The schematics and timing diagrams will be difficult for us to understand without a great deal of annotation. Good organization of files and Verilog modules also helps.
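To annotate an ALU timing diagram you need the expected result and flags for each stimulus. A small software model can produce them. The operation names and flag set below are placeholders for whatever your ISA defines; this is a sketch of the idea, not a required ALU.

```python
def alu(op, a, b):
    """Reference model of a hypothetical 8-bit ALU.

    Returns (result, carry_or_shift_out, zero). The operation names are
    placeholders; substitute the operations your own ISA defines.
    """
    if op == "ADD":
        full = a + b
        carry = full >> 8
    elif op == "SUB":
        full = a - b
        carry = int(a < b)          # borrow
    elif op == "AND":
        full, carry = a & b, 0
    elif op == "LSL":
        full = a << 1
        carry = (a >> 7) & 1        # bit shifted out of the top
    elif op == "LSR":
        full = a >> 1
        carry = a & 1               # bit shifted out of the bottom
    else:
        raise ValueError(f"unknown operation {op}")
    result = full & 0xFF
    return result, carry, int(result == 0)

# Corner cases worth showing in a timing diagram or transcript.
for op, a, b in [("ADD", 0xFF, 0x01), ("SUB", 0x00, 0x01),
                 ("LSL", 0x80, 0), ("LSR", 0x01, 0)]:
    print(op, hex(a), hex(b), "->", alu(op, a, b))
```

Each printed tuple can be written next to the corresponding waveform so a reader can confirm the hardware matches.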

Milestone 2 Components
Your Milestone 2 report should add on to your Milestone 1 report (you are building your final report over time).

Your Milestone 2 report must include a changelog that indicates where any significant changes have been made since your Milestone 1 submission. Please restrict this to highlighting substantial architectural or operational changes. You may include a changelog per section, or a final changelog at the end, or something in between as best suits your report. You do not need a changelog for new sections.
• Some things, such as your Introduction, may have no changes; this is fine/expected.

Your Milestone 2 reports must add the following. You may add these to existing sections in your report or add new sections, as you deem appropriate:

• A list of ALU operations you will be demonstrating, including the instructions they are relevant to. Also, a brief description of the register file functionality is needed.

• Full Verilog models, hierarchically organized if your top level module contains subassembly modules, some of which contain smaller modules.

• Well-annotated timing diagrams or transcript (diagnostic print) listings from your module level Questa/ModelSim runs. It should be clear that your program counter / instruction memory (fetch unit) and ALU work. If your presentation leaves doubt, we’ll assume they don’t.

• Your Architectural Overview figure should be revised with more detail / needed updates.

Answer the following question:

• Will your ALU be used for non-arithmetic instructions (e.g., MIPS or ARM-like memory address pointer calculations, PC relative branch computations, etc.)? If so, how does that complicate your design?

What to Submit?
You will submit a written report that contains all of the required components of Milestones 1 and 2. It is your responsibility to make this report clear and well-organized.

Your report should be a single document, in .doc(x) or .pdf form.
• Exception: Include your program implementations as separate “source code” files. 
Milestone 3 — An Assembler & Early Integration
Assemblers convert human-readable assembly code to computer-readable machine code. Your code from Milestone 1 is the former, but your processors will need the latter.

Milestone 3 Objectives
Implement an assembler. Begin the process of integrating your processor components.

1. Write an assembler which converts your assembly code from Milestone 1 into 9-bit binary machine code. We will provide sample code, but you may use any language you wish. This should be a fairly simple string access, map, print sequence.
2. If you have not already done so in Milestone 2, write a top-level SystemVerilog model of your design which instantiates the ALU, fetch (program counter) unit, instruction memory (either inside fetch or separate), register file, data memory, control decoder, and any other blocks you need. This does not need to actually run the three programs yet — that will be the final piece — but it should compile cleanly in both Questa/ModelSim and Quartus II.
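The "string access, map, print" flow in item 1 can be sketched in a few lines of Python. This is not the course-provided sample code. The mnemonics, the `//` comment syntax, the register names, and the 3/3/3 field layout are all assumptions for illustration; replace them with your own ISA tables.

```python
# Hypothetical ISA tables (assumptions for illustration only).
OPCODES = {"add": 0b000, "sub": 0b001, "ld": 0b010, "st": 0b011}
REGISTERS = {f"r{i}": i for i in range(8)}

def assemble_line(line):
    """Translate one line such as 'add r1, r2' into a 9-bit string.

    Returns None for blank lines and comment-only lines.
    """
    code = line.split("//")[0].strip()        # drop trailing comments
    if not code:
        return None
    mnemonic, operands = code.split(None, 1)
    rd, rs = (name.strip() for name in operands.split(","))
    word = (OPCODES[mnemonic.lower()] << 6) | (REGISTERS[rd] << 3) | REGISTERS[rs]
    return f"{word:09b}"

def assemble(source):
    """Assemble a multi-line program into a list of 9-bit strings."""
    words = (assemble_line(line) for line in source.splitlines())
    return [word for word in words if word is not None]

program = """
ld  r1, r0      // load operand
add r1, r2      // accumulate
st  r1, r3
"""
print(assemble(program))  # ['010001000', '000001010', '011001011']
```

Writing one binary string per line gives a text file that SystemVerilog's `$readmemb` can load into an instruction ROM. A real assembler for this project would also need labels and branch target resolution, which this sketch omits.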

Milestone 3 Components
Your Milestone 3 report should add on to your Milestone 2 report.

Your Milestone 3 report must include a changelog that indicates where any significant changes have been made since your Milestone 2 submission.

Your Milestone 3 reports must add the following. You may add these to existing sections in your report or add new sections, as you deem appropriate:

• An example of input to and output from your assembler.
o [unlikely]: If your assembler does anything beyond what a ‘normal’ assembler would be expected to do, explain this as well.

• Your Architectural Overview figure should be revised with more detail / needed updates. [Might you be able to automate this drawing now?]

What to Submit?
You will submit a written report that contains all of the required components of Milestones 1, 2, and 3. It is your responsibility to make this report clear and well-organized.

Your report should be a single document, in PDF form.
• Exception: Attach your program implementations as separate “source code” files.