Embedded Systems Group (ES)

Abacus Processor Simulator

The Abacus simulator has several parameters that specify the pipeline and the memory architecture used for the simulation. This page explains these parameters and the output of the simulator. For a detailed list of the instructions of the Abacus processor, have a look at the Abacus reference card.

Pipeline Behavior

First of all, we may specify whether our processor has a pipeline at all or whether all instructions are executed within a single cycle. The latter is specified by the following parameter when set to "yes":
    #const SingleCycle    no  // whether everything is done in one stage
Setting SingleCycle to "yes" makes the following options for the specification of the pipeline pointless, since it specifies that there is no pipeline at all.

Pipeline Specification

If SingleCycle is set to "no", we consider a processor with a pipeline and should therefore specify the behavior of this pipeline (or accept the default values). The following parameters are available:
    #const PipeFetch       1  // pipeline stage where next program counter is set
    #const PipeDecode      2  // pipeline stage where operand registers are read
    #const PipeExecute     3  // pipeline stage where ALU results are produced
    #const PipeMemAcs      4  // pipeline stage where Load results are produced
    #const PipeRegWrite    5  // pipeline stage where registers are written
The above values are the defaults; they specify a five-stage pipeline with the mentioned stages as explained in the lecture.

Register Bypasses

In general, it is possible to read the value that is currently being written into a register by using a hardware bypass. Such a circuit first checks whether a value is currently being written to the specified register; if so, it takes that value, and otherwise it reads the value from the register. Processor pipelines usually use register bypasses, and we can specify this with
    #const RegBypass     yes  // whether register reads yield current writes
If we set RegBypass to "no", one further step is required until written values can be read from the register. This means that the number of nop operations is increased by one.
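The bypass check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the simulator's code; the function name read_register and the (index, value) representation of the write in flight are assumptions made for this example.

```python
def read_register(regs, index, pending_write=None, reg_bypass=True):
    """Return the value seen when reading register `index` in this cycle.

    pending_write is an optional (index, value) pair for the register write
    that happens in the same cycle (hypothetical representation).
    """
    if reg_bypass and pending_write is not None and pending_write[0] == index:
        return pending_write[1]   # bypass: take the value that is being written
    return regs[index]            # otherwise read the stored (old) value


regs = [0] * 8                    # Abacus has eight registers
in_flight = (3, 0x0013)           # register 3 is being written in this cycle
with_bypass = read_register(regs, 3, in_flight, reg_bypass=True)
without_bypass = read_register(regs, 3, in_flight, reg_bypass=False)
```

With the bypass, the reader already sees the new value; without it, the old register content is read, which is why one more step is needed.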

Forwarding

In general, the number of nops needed before issuing an instruction depends on whether and when the required operand registers will be written by instructions that are still in the pipeline. The closer these instructions are to the new instruction, the longer it takes until they leave the pipeline or reach a pipeline stage where they write the values. This number of nops can be decreased if we wait only until the values are produced rather than until they are written. This is done by forwarding, which is enabled with the following option:
    #const Forwarding    yes  // whether forwarding from PipeExecute and PipeMemAcs
If forwarding is used, the value of RegBypass does not matter, since we do not wait until values arrive in registers but pick them up even earlier. Then, the difference PipeExecute-PipeDecode determines the number of required nops instead of the difference PipeRegWrite-PipeDecode.
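As a rough illustration of how these stage differences translate into nops, consider the following Python sketch. It rests on assumptions that this page does not state: the nop count is taken to be the stage difference minus one (chosen so that the classic five-stage pipeline yields the familiar textbook counts), and load results are taken to be forwarded from PipeMemAcs. The function required_nops is hypothetical, and the simulator's exact rule may differ.

```python
def required_nops(pipe_decode, pipe_execute, pipe_mem_acs, pipe_reg_write,
                  forwarding=True, reg_bypass=True, producer_is_load=False):
    """Nops between a producing instruction and a directly dependent one.

    Assumption (not from the simulator): nops = stage difference - 1.
    """
    if forwarding:
        # Wait only until the value is produced; RegBypass does not matter.
        produce = pipe_mem_acs if producer_is_load else pipe_execute
        distance = produce - pipe_decode
    else:
        # Wait until the value is written to the register file.
        distance = pipe_reg_write - pipe_decode
        if not reg_bypass:
            distance += 1         # one further step without register bypass
    return max(0, distance - 1)


five_stage = dict(pipe_decode=2, pipe_execute=3, pipe_mem_acs=4, pipe_reg_write=5)
alu_fwd = required_nops(**five_stage, forwarding=True)
load_fwd = required_nops(**five_stage, forwarding=True, producer_is_load=True)
no_fwd = required_nops(**five_stage, forwarding=False, reg_bypass=True)
no_fwd_no_bypass = required_nops(**five_stage, forwarding=False, reg_bypass=False)
```

Under these assumptions, the default five-stage settings give 0 nops after an ALU instruction and 1 after a load with forwarding, 2 without forwarding but with register bypass, and 3 with neither.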

Branch Instructions

Branches add further difficulties. First of all, branches also need to check the values of registers and therefore have to wait until these values are available. Hence, all of the above parameters also affect the number of nops required before branch instructions can enter the pipeline. In addition, a branch writes to the pc, which is also a register. We have two alternatives here. In the first alternative, we execute the entire branch in stage PipeDecode and immediately write the pc using a bypass so that PipeFetch can immediately fetch the correct next instruction. Hence, the next instruction can directly follow the branch without any nop operations after the branch. This alternative is specified with the following parameter:
    #const BranchInDecode yes  // immediately write new PC in PipeDecode for Branches
If we set BranchInDecode to "no", we get the second alternative, where the branch condition is evaluated in PipeExecute. If Forwarding=yes holds, PipeFetch may get the right pc from PipeExecute; otherwise, PipeFetch has to wait until the branch reaches PipeRegWrite. Depending on whether RegBypass is set or not, it may then even take one further step until the right pc is found in its register.

For unconditional branches, i.e., jump instructions, we always assume that these are completely executed in PipeDecode so that no nop operations are required after jump instructions. This is convenient since we can then use j 1 as an abbreviation for nop.
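The case distinction for branches and jumps described above can be summarized in a small Python sketch. The function pc_available and its return format (stage, extra steps) are hypothetical and only restate the rules of this section; they are not taken from the simulator.

```python
def pc_available(pipe_decode, pipe_execute, pipe_reg_write,
                 branch_in_decode, forwarding, reg_bypass, is_jump=False):
    """Return (stage, extra_steps): where PipeFetch can obtain the new pc."""
    if is_jump or branch_in_decode:
        return pipe_decode, 0      # pc is written in PipeDecode via a bypass
    if forwarding:
        return pipe_execute, 0     # pc is forwarded from PipeExecute
    # Without forwarding, wait until the branch reaches PipeRegWrite;
    # without register bypass, this takes one further step.
    return pipe_reg_write, (0 if reg_bypass else 1)


stages = dict(pipe_decode=2, pipe_execute=3, pipe_reg_write=5)
in_decode = pc_available(**stages, branch_in_decode=True,
                         forwarding=True, reg_bypass=True)
forwarded = pc_available(**stages, branch_in_decode=False,
                         forwarding=True, reg_bypass=True)
slowest = pc_available(**stages, branch_in_decode=False,
                       forwarding=False, reg_bypass=False)
jump = pc_available(**stages, branch_in_decode=False,
                    forwarding=False, reg_bypass=False, is_jump=True)
```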

Default Pipeline Settings

When using the Abacus simulator in connection with the lectures, there are two typical default settings. First of all, if you just want single-cycle behavior, all you need to specify is the following parameter; the other pipeline parameters are then ignored:
    // -----------------------------------------------------------------------------
    // parameters for single-cycle (instruction set) simulation
    // -----------------------------------------------------------------------------
    #const SingleCycle    yes  // whether everything is done in one stage
The second typical setting simulates the execution on a pipelined processor with the five-stage pipeline, either with or without forwarding. The settings with forwarding are as follows; to disable forwarding, set Forwarding to no instead of yes:
    // ---------------------------------------------------------------------------------
    // parameters for classic five-stage pipeline with forwarding and register bypassing
    // ---------------------------------------------------------------------------------
    #const SingleCycle    no  // whether everything is done in one stage
    #const PipeFetch       1  // pipeline stage where next program counter is set
    #const PipeDecode      2  // pipeline stage where operand registers are read
    #const PipeExecute     3  // pipeline stage where ALU results are produced
    #const PipeMemAcs      4  // pipeline stage where Load results are produced
    #const PipeRegWrite    5  // pipeline stage where registers are written
    #const RegBypass     yes  // whether register reads yield current writes
    #const Forwarding    yes  // whether forwarding from PipeExecute and PipeMemAcs
    #const BranchInDecode no  // immediately write new PC in PipeDecode for Branches

You can also specify longer pipelines where the execution or the memory access may take more than one cycle. What matters for these pipelines are still the above five values. For instance, the following settings define a pipeline with 14 stages where four cycles are needed for execution and seven for memory access:

    // ---------------------------------------------------------------------------------
    // parameters for a pipeline with 14 stages using forwarding and register bypassing
    // ---------------------------------------------------------------------------------
    #const SingleCycle    no  // whether everything is done in one stage
    #const PipeFetch       1  // pipeline stage where next program counter is set
    #const PipeDecode      2  // pipeline stage where operand registers are read
    #const PipeExecute     6  // pipeline stage where ALU results are produced
    #const PipeMemAcs     13  // pipeline stage where Load results are produced
    #const PipeRegWrite   14  // pipeline stage where registers are written
    #const RegBypass     yes  // whether register reads yield current writes
    #const Forwarding    yes  // whether forwarding from PipeExecute and PipeMemAcs
    #const BranchInDecode no  // immediately write new PC in PipeDecode for Branches
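To see how the stage numbers relate to the cycles spent in each phase, the following Python sketch computes the differences between consecutive stage parameters. It assumes that each parameter names the last stage of its phase, which matches the four execution cycles and seven memory access cycles stated for the 14-stage example; the function stage_cycles is hypothetical.

```python
def stage_cycles(fetch, decode, execute, mem_acs, reg_write):
    """Cycles per phase, assuming each parameter is the last stage of its phase."""
    return {
        "fetch": fetch,
        "decode": decode - fetch,
        "execute": execute - decode,
        "memory access": mem_acs - execute,
        "register write": reg_write - mem_acs,
    }


long_pipe = stage_cycles(fetch=1, decode=2, execute=6, mem_acs=13, reg_write=14)
classic = stage_cycles(fetch=1, decode=2, execute=3, mem_acs=4, reg_write=5)
```

For the 14-stage settings, this gives 4 cycles for execution and 7 for memory access, and the cycles of all phases add up to 14.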

Cache/Memory Architecture

To specify the cache or memory architecture, further parameters can be specified:
  • DataWidth is the bitwidth of the registers of the specified Abacus processor. It must be a multiple of 8. The memory addresses of load and store operations are word addresses, where a word has the length of a register and is therefore determined by this parameter.
  • MemSize is the size in bytes of the main memory that is printed in the simulation run. Make sure that it is big enough so that all load and store operations can be successfully executed.
  • CacheSize is the size of the data cache in bytes.
  • BlockSize is the size of a block in the data cache in bytes. The cache is organized in blocks; therefore, CacheSize must be divisible by BlockSize, and BlockSize must be a multiple of DataWidth/8.
  • SetAssoc is the set associativity of the data cache. If SetAssoc=1 holds, there is no set associativity, i.e., each memory block can be at exactly one place in the cache. If SetAssoc is the number of blocks CacheSize/BlockSize, then every block can be at any place in the cache.
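The constraints listed above can be checked with a short Python sketch that also derives the resulting cache geometry. The function cache_geometry is hypothetical and not part of the simulator; the check that the number of cache blocks is divisible by SetAssoc is an additional assumption that is not stated above.

```python
def cache_geometry(data_width, mem_size, cache_size, block_size, set_assoc):
    """Check the parameter constraints and derive block and set counts."""
    if data_width % 8 != 0:
        raise ValueError("DataWidth must be a multiple of 8")
    word_bytes = data_width // 8
    if block_size % word_bytes != 0:
        raise ValueError("BlockSize must be a multiple of DataWidth/8")
    if cache_size % block_size != 0:
        raise ValueError("CacheSize must be divisible by BlockSize")
    cache_blocks = cache_size // block_size
    if cache_blocks % set_assoc != 0:   # assumption, not stated on this page
        raise ValueError("number of cache blocks must be divisible by SetAssoc")
    return {
        "word_bytes": word_bytes,
        "words_per_block": block_size // word_bytes,
        "memory_blocks": mem_size // block_size,
        "cache_blocks": cache_blocks,
        "sets": cache_blocks // set_assoc,
    }


geometry = cache_geometry(data_width=16, mem_size=96, cache_size=24,
                          block_size=6, set_assoc=2)
```

The values used here are those of the example in the Simulator Output section: they yield three words per block, 16 memory blocks, and four cache blocks in two sets.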

Simulator Output

The simulator will print the state of the computer system after execution of every instruction. Such a state of the computer system might look as follows:

----------------------------------------------------------------
step 67 at instruction 11 : bnz $5,-6
----------------------------------------------------------------
    mm : 05:04:03:02:01:00    bk[st] : v d tg 05:04:03:02:01:00    i : Reg[i]
     0 : 00.05.00.03.00.01     0[ 0] : 1 1  4 00.0E.00.0C.00.0A    0 : 00.00
     1 : 00.00.00.00.00.00     1[ 0] : 1 1  1 00.11.00.0F.00.0D    1 : 00.14
     2 : 00.00.00.00.00.00     2[ 1] : 1 1  0 00.0B.00.09.00.07    2 : 00.09
     3 : 00.00.00.00.00.00     3[ 1] : 1 1  4 00.00.00.12.00.10    3 : 00.13
     4 : 00.00.00.00.00.00                                         4 : 00.14
     5 : 00.00.00.00.00.00                                         5 : FF.FF
     6 : 00.02.00.00.00.00                                         6 : 00.00
     7 : 00.08.00.06.00.04                                         7 : 00.00
     8 : 00.00.00.00.00.00
     9 : 00.00.00.00.00.00
    10 : 00.00.00.00.00.00
    11 : 00.00.00.00.00.00
    12 : 00.00.00.00.00.00
    13 : 00.00.00.00.00.00
    14 : 00.00.00.00.00.00
    15 : 00.00.00.00.00.00

The three columns correspond to the main memory, the data cache and the register file. The content of the main memory is printed blockwise, so the leftmost column lists the block addresses. The address of a word inside a block can be calculated from the block address and the position inside the block, which is listed in the title of that table. For example, in the above case there is a data word 00.06 in block 7 at byte addresses 03 and 02, so the memory address of that word is 7 * 3 + 1 = 22 (since there are three words in a block and that word is the second one in the block).
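The address calculation from the example can be written as a small Python sketch, together with its inverse. The function names are hypothetical; positions inside a block are counted from 0, so the second word has position 1.

```python
def word_address(block, position, words_per_block):
    """Word address of the word at `position` (counted from 0) in `block`."""
    return block * words_per_block + position


def locate_word(address, words_per_block):
    """Inverse direction: return (block, position) for a word address."""
    return divmod(address, words_per_block)


address = word_address(block=7, position=1, words_per_block=3)
block, position = locate_word(address, words_per_block=3)
```

For the word 00.06 in block 7, this yields the word address 22, and locate_word maps 22 back to block 7, position 1.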

The second column shows the content of the data cache. It is also organized in blocks since memory transactions between the main memory and the data cache always refer to blocks. In the above example, there are four blocks, and the block addresses in the data cache are shown in the column titled bk. The number given in square brackets to the right of it denotes the set number. The blocks of a set can be rearranged to any place inside the same set, but cannot be moved to other sets. The next two bits v and d denote whether the cache block is valid (v) and whether it is dirty (d). Initially, all blocks are invalid; they become valid after loading a block from memory to the cache, and they become invalid after writing back the block. A block becomes dirty if it is written by a store instruction and has not yet been written back to the main memory. The column titled tg denotes the tag of the block address so that one can recompute the memory block address of the block in the cache. For example, the first block at cache block address 0 is in the set st=0 and has tag tg=4, so it refers to the memory block at address 2 * tg + st = 8 since there are 2 sets in the cache. The final, rightmost column of this table is the content of the block.
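The recomputation of the memory block address from tag and set number can be sketched in Python as follows. The function names are hypothetical; the inverse direction follows from the formula under the assumption that set numbers range from 0 to the number of sets minus one.

```python
def memory_block(tag, set_index, num_sets):
    """Memory block address of a cache block with the given tag and set."""
    return num_sets * tag + set_index


def tag_and_set(block, num_sets):
    """Inverse direction: return (tag, set) for a memory block address."""
    return divmod(block, num_sets)


# (set, tag) pairs of the four cache blocks in the example output above
cache_blocks = [(0, 4), (0, 1), (1, 0), (1, 4)]
addresses = [memory_block(tg, st, num_sets=2) for st, tg in cache_blocks]
```

For the four cache blocks of the example, this gives the memory block addresses 8, 2, 1 and 9.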

The rightmost column is simply the content of the register file. Abacus has eight registers with a bitwidth of DataWidth that are printed in that table. As can be seen, a register content has two bytes; hence, in the above example, DataWidth was 16, BlockSize was 6 bytes, MemSize was 16 * 6 = 96 bytes, CacheSize was 4 * 6 = 24 bytes, and SetAssoc was 2.