# **UC Berkeley CS61C Fall 2018 Final Exam Answers** | A mystery, byte-addressed cache has Tag:Index:Offset (T:I:O) = 9:4:3. For the computer, | | | | | | | | | |-----------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|-----------------------|------------------------------------------------------------------|--|--|--| | a) What is the <i>virtual address space</i> ? (select ONE) | | | | | | | | | | | | | explore that in parts | | ) | | | | | c) How | Associativity | cacne size, while pr | O Direct Mapped | ? (select ONE per ro | w) | | | | | | Block Size | ○ 4 bytes | 8 bytes | 12 bytes | ○ 3 bytes | | | | | | # of Blocks | ○ 8 blocks | ○ 16 blocks | O 4 blocks | 128 blocks | | | | | associa<br>here th<br>So the | We start with block size, which is a function of the offset (3 bits), so that's 8 bytes regardless of cache associativity configuration. Since we have 4 index bits, we use the 2 <sup>4</sup> = 16 indices to index into a set of things, here the <i>most</i> we can have (if we want to <i>maximize</i> cache size), which is 8-way set associative. So the number of total blocks is 2 <sup>4</sup> sets/cache * 2 <sup>3</sup> blocks/set = 2 <sup>7</sup> blocks/cache = 128 blocks in this 1 cache. d) How could we <i>minimize</i> cache size, while preserving T:I:O=9:4:3? (select ONE per row) | | | | | | | | | <b>a</b> , | Associativity | ○ 2-way set | Direct Mapped | ○ 8-way set | ○ 4-way set | | | | | | Block Size | ○ 4 bytes | 8 bytes | ○ 12 bytes | ○ 3 bytes | | | | | | # of Blocks | O 8 blocks | 16 blocks | ○ 32 blocks | ○ 64 blocks | | | | | here the So the e) Now seeing | ativity configuration. e least we can have number of total block we're working with a if we can lower our A | Since we have 4 ind<br>(to minimize cache s<br>ks is 2 <sup>4</sup> sets/cache * 2<br>a write-back, 1024B of<br>AMAT if our memory | lex bits, we use the 2 size), which is 1-way 20 blocks/set = 24 blocks/set access pattern is ite | e that has 8B blocks. | ks in this 1 cache. We're interested in ay with a fixed stride. | | | | | Modification | Hit Time | Miss Penalty | Miss Rate | Hardware Complexity | |--------------------------------------------------------------------|----------------------------|-----------------------------|---------------------------------------------------------------|---------------------| | Change block size from 8B to 16B, but keep the cache size the same | Olected Increase No effect | Increase Decrease No effect | <ul><li>Increase</li><li>Decrease</li><li>No effect</li></ul> | = | capacity misses. For each of the following modifications, mark how it changes each component of AMAT (Hit Time, Miss Penalty, Miss Rate) and the overall Hardware Complexity. | Change to 2-way Associativity (same cache & block size) | Decrease | O Increase O Decrease No effect | O Increase Decrease No effect | <ul><li>Increase</li><li>○ Decrease</li><li>○ No effect</li></ul> | |---------------------------------------------------------|----------|---------------------------------|-------------------------------|-------------------------------------------------------------------| |---------------------------------------------------------|----------|---------------------------------|-------------------------------|-------------------------------------------------------------------| A block size change simply involves changing the aspect ratio of the cache (so height [rows, indices, sets] \* columns [blocks size controlled by offset] = constant) -- doubling the width is halving the height. So you're giving a bit from I to O. Doesn't affect hit time (the time to a cache hit ... i.e., the time to determine if a cache hits, and the time to get the data back to the registers), but does increase miss penalty (how much data you have to bring from the lower level of the cache) and does reduce the miss rate because of the benefits of spatial locality (but not forever, you'll recall when we get TOO few blocks the miss rate actually goes up because of ping pong and other effects). Doesn't affect the hardware complexity in terms of needing a new comparator or logic block or anything else since the cache is still doing the same work. An associativity change (1-way to 2-way) might (incrementally) increase the hit time since the hardware now has to parallel-compare which of the two blocks in a set match and then route it accordingly; it's still going to be on the order of 1 cycle for a hit, so we accepted "increase" or "no effect". Since we're not changing block size it doesn't affect the miss penalty, but hopefully we now don't have as many conflict misses so our miss rate will decrease, and we certainly need more hardware to do the tag comparisons. #### M2) Floating down the C... [this is a 2-page question] (8 points = 1,1,2,1,1,1,1, 20 minutes) Consider an 8-bit "minifloat" SEEEMMMM (1 sign bit, 3 exponent bits, 4 mantissa bits). All other properties of IEEE754 apply (bias, denormalized numbers, ∞, NaNs, etc). The bias is -3. - a) How many minifloats are there in the range [1, 4)? (i.e., $1 \le f < 4$ ) Bias of -3 means the exponent can go from -3 to $4 \to to 2^3$ so we are in range. 1 and 4 are powers of 2, so that's two "ranges", and with MMMM = 16 mantissa values, that's **32** mantissa values. - b) What is the number line distance between 1 and the smallest minifloat bigger than 1? $\frac{1/16}{1}$ 1 is a special number since the exponent is 0 (after the bias is applied), thus it's $2^{0*}$ 1.MMMM $\rightarrow$ 1.MMMM (the binary shift left/right by $2^{\text{EXP-Bias}}$ goes away) $\rightarrow$ the least M is counting by 1/16 so the next number after 1.0000<sub>2</sub> is 1.0001<sub>2</sub> which is 1+1/16. - c) Write times2 in one line using integer operations. Assume the leftmost "E" bit of f (bolded above) is 0. minifloat times2(minifloat f) { return f \* 2.0; } times2: addi a0, a0, 0b00010000 = 0x10 ## Assume f is in the lowest byte of the register Ret We have to add one to the exponent to make it work, cool! | Consider the following code (and truncated ASCII table; as an example of how to read it, "G" is 0b0100 0111): | | | | | | |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|--|--| | <pre>uint8_t mystery (uint8_t *A) { return *A ? (*A &amp; mystery(A+1)) : 0xFF; }</pre> | | | | | | | d) Where is A stored? ( <i>not</i> what it points to, *A) code | | | | | | | e) [alternate exam] What is (char)mystery("FLOW")? | | | | | | | The code does an AND of all the characters bits, the upper bits are $100 \& 101 \rightarrow 100$ , and the lower bits are $1111 \&$ | | | | | | | 0111 & 0110 & 1100 $\rightarrow$ 0100, so it's 100 0010 $\rightarrow$ <b>D</b> | | | | | | | f) A two-character string is passed into <b>mystery</b> that makes it return the <b>uint8_t</b> value <b>0</b> (not the character " <b>0</b> "). The first character is " <b>M</b> " ["K" alternate exam], the second | | | | | | | character is a number from 1-9. Which? | | | | | | | \( \text{4} \cdot \text{5} \cdot \text{6} \) | | | | | | | ○7 | | | | | | | What number has no hits in sommen with Ma hits-100 1101 | | | | | | | Pe P | 5 - | | | | | 0 , | 100 | 0 1 | |-------------|-------|-----|-----|-----|-----|-----|-----|----------| | <u>'.</u> ] | b 4 + | b 3 | p 5 | b i | Row | 3 | 4 | 5 | | | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Р | | | 0 | 0 | 0 | | | 1 | Α. | Q | | | 0 | 0 | 1 | 0 | 2 | 2 | В | R | | | 0 | 0 | 1 | 1 | 3 | 3 | C | S | | | 0 | 1 | 0 | 0 | 4 | 4 | D | T | | | 0 | | 0 | 1 | 5 | 5 | Ε | U | | | 0 | 1 | 1 | 0 | 6 | 6 | F | <b>V</b> | | | 0 | _ | 1 | | 7 | 7 | G | W | | | ١ | 0 | 0 | 0 | 8 | 8 | н | X | | | _ | 0 | 0 | | 9 | 9 | 1 | Y | | | | 0 | 1 | 0 | 10 | | J | Z | | | 1 | 0 | I | 1 | 11 | ; | К | C | | | - | 1 | 0 | 0 | 12 | < | L | \ | | | 1 | 1 | 0 | ١ | 13 | # | М | כ | | | _ | 1 | 1 | 0 | 14 | > | 8 | ^ | | | | 1 | | | 15 | ? | 0 | | What number has no bits in common with M's bits=100 1101 $\rightarrow$ all numbers have the high nibble with no bits in common so it's only the bits that only have 1 in the 2s column, thus 0010 or 0000 (but 0 is not part of it), so it must be 0010 $\rightarrow$ **2**. (note the ASCII low nibble of a 0-9 number is the number itself) [Alternate exam] What number has no bits in common with K's bits=100 1011 $\rightarrow$ all numbers have the high nibble with no bits in common so it's only the bits that only have 1 in the 4s column, thus 0100 or 0000 (but 0 is not part of it), so it must be 0100 $\rightarrow$ **4**. (note the ASCII low nibble of a 0-9 number is the number itself) d) Incrementing or decrementing a variable is common: e.g., **sum** += **amount**, so we decide to make a new RISC-V instruction (based on I-format instructions), and reuse the unused bits from **rs1** to give to the immediate (**rd** will be read *and* written, don't touch funct3 or opcode). Using *only* that operation, what's the most **amount** that **sum** could now increase by (approx)? (select ONE for each column) | | )1 | <u></u> | <b></b> 4 | <b>8</b> | <u></u> 16 | ⊖ <blank></blank> | ○kilo | ○mega | ⊖giga | ⊖tera | Opeta | ○exa | |-----|-----------|---------|-----------|---------------|-----------------|-------------------|-------|-------|-------|-------|-------|-------| | ○32 | $\subset$ | 64 | <u></u> | $\bigcirc$ 25 | 56 <b>○</b> 512 | | Okibi | ○mebi | ⊝gibi | ⊜tebi | ○pebi | ○exbi | 12 bits of immediate + 5 more from register = 17, but it's signed so $-2^{16} \rightarrow 2^{16}$ - 1, approx $2^{16}$ which is **64 kibi**. # M3) Just one more thing...RISC-V self-modifying code! (8 points, 20 minutes) (this is meant to be a fairly hard problem, we recommend saving it until the end of the exam...) Your Phase I date was too late, so you can't get into the course you want. You need to hack CalCentral's server to enroll yourself! You find the following program running on the CalCentral server: The subroutine change\_reg allows a user to arbitrarily set the value of any registers they choose when the function is executed (similar to the debugger on Venus). os(char \*a0) runs the command at a0. Select as few registers as necessary, set to particular values to MAKE THE RISC-V CODE MODIFY ITSELF so the os function runs "/bin/sh" to hack into the CalCentral database. Please note: even though change\_reg can arbitrarily change any register it STILL follows the RISC-V calling convention. You CANNOT assume that the registers are initialized to zero on the launch of the program. Also, the assembler is NOT optimized. Hint: Think about where the change needs to happen, then what it should be. | Reg | Value to set it to (in HEX without leading zeros) | |------------|---------------------------------------------------| | ☐ a0 | 0x | | ☐ a1 | 0x | | ☐ a2 | 0x | | □ s0 | 0x | | □ s1 | 0x | | ☐ s2 | 0x | | t0 | 0x12 | | ☐ t1 | 0x | | <b>t</b> 2 | 0xA0 | | | Not Possible | We have to change "addi a0 x0 0x100" to "addi a0 x0 0x10A" since the next string starts right after the first, which has 9 characters and a trailing 0, so that's bytes 0-9, meaning byte 10, or 0x10A is the location of the string you need to pass to os in a0. We need to store it in byte 18 (4 words = 16 bytes to skip over and 2 bytes within the word to skip), and write A0 into the 18th = 0x12 byte to clobber the lower nibble of the immediate with A and keep rs1 to be 0, to make "addi a0 x0 0x100" become "addi a0 x0 0x100" ``` [what]t2 = 0xA0, [where]t0 = 0x12 ``` The alternate exam swapped t2,t0 for t1,t2, but otherwise it was the same. ## F1) VM... (20 points = 12+4+4, 30 minutes) a) What are the steps that occur for a memory read (when a **page fault** is encountered)? You may assume there's room in memory for a new page, and we're using LRU replacement. Assume there's no data cache. Mark the order of the required actions, there's at most one choice per #, and every row/col should have a #. | Below, $\bigcirc \Rightarrow$ Select ONE, $\square \Rightarrow$ Select ALL that apply | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |--------------------------------------------------------------------------------------------|------------|---------|------------|------------|------------|------------|------------|------------|------------|------------| | Request data using the ( Ophysical Ovirtual ) address | | 0 | 0 | 0 | 0 | $\circ$ | 0 | 0 | 0 | $\circ$ | | Access Page Table with OPPN OVPN Offset | 0 | 0 | 0 | | 0 | 0 | 0 | 0 | 0 | $\bigcirc$ | | Access TLB with OPPN OPPN OFfset | 0 | 0 | | 0 | 0 | 0 | 0 | 0 | 0 | $\bigcirc$ | | Adjust LRU bits in TLB Page Table Memory | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | $\bigcirc$ | | Split physical address into ( PPN+Offset VPN+Offset PPN+VPN ) fields | 0 | 0 | 0 | 0 | 0 | • | 0 | 0 | 0 | 0 | | Split virtual address into ( \times PPN+Offset \bigcup VPN+Offset \times PPN+VPN ) fields | 0 | • | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | Request new page from OS/Memory manager | $\bigcirc$ | 0 | 0 | 0 | | $\circ$ | 0 | 0 | 0 | $\circ$ | | Update Page Table with new PPN VPN Offset | 0 | 0 | 0 | 0 | 0 | 0 | | 0 | 0 | 0 | | Update TLB with new PPN VPN Offset | 0 | 0 | | 0 | 0 | | | | 0 | | | Return the data (this is the last thing we do) | $\bigcirc$ | $\circ$ | $\bigcirc$ | - \_1\_ Request data using the virtual address - \_2\_ Split virtual address into VPN, Offset fields - \_3\_ Access TLB with VPN - \_4\_ Access Page Table with VPN - \_5\_ Request new page from OS/Memory manager - \_6\_ Split physical address into PPN, Offset fields - \_7\_ Update page table with new PPN - \_8\_ Update TLB with new PPN,VPN - \_9\_ Adjust LRU bits on TLB - \_10\_ Return the data - b) Mark the following questions as either **True** or **False**: | True | <b>○False</b> | If we have a TLB which contains a number of entries equal to MEMORY_SIZE / PAGE_SIZE, every TLB miss will also be a page fault. | |--------------|---------------|------------------------------------------------------------------------------------------------------------------------------------------| | <b>○True</b> | False | If we change our TLB to direct-mapped, we're likely to see fewer TLB misses. | | ⊖True | False | Every TLB miss is equally expensive in terms of the amount of time it takes for us to resolve our virtual address to a physical address. | | <b>OTrue</b> | False | A virtual address will always resolve to the same physical address. | **\_\_True\_\_** If we have a TLB which contains a number of entries equal to MEMORY\_SIZE / PAGE\_SIZE, every TLB miss will also be a page fault. \_\_False\_\_ If we change our TLB to direct-mapped, we're likely to see fewer TLB misses. **\_\_False\_\_** Every TLB miss is equally expensive in terms of the amount of time it takes for us to resolve our virtual address to a physical address. (Page faults are much more expensive than Page-Table hits. Both are possible outcomes of a TLB miss.) \_\_False\_\_ A virtual address will always resolve to the same physical address. c) Consider a VM system on a RISC-V 32-bit machine with 2<sup>20</sup> page table rows, no TLB, and limited physical RAM, choose ONE of the following code snippets that **would always have the most page faults per memory access** by touching elements of a page-aligned **uint8\_t** array A, and choose ONE value of STRIDE (choose the minimum possible value that accomplishes it). Both A\_SIZE and STRIDE are powers of 2, and A\_SIZE > STRIDE. random(N) returns a random integer from 0 to N. ``` \bigcirc for (i = 0; i < STRIDE; i++)</pre> { A[random(A_SIZE)] = random(255); } \bigcirc for (i = 0; i < A_SIZE; i++) { A[random(STRIDE)] = random(255); } \bigcirc for (i = 0; i < A_SIZE; i++) { A[i] = random(STRIDE); } i < STRIDE; i++)</pre> \bigcirc for (i = 0; { A[i] = random(255); } \bigcirc for (i = 0; i < A_SIZE; i+=STRIDE) { A[i] = random(255); }</pre> for (i = STRIDE; i < A_SIZE; i++)</pre> { A[i] = A[i-STRIDE]; A[i-STRIDE] = A[i]; } \bigcirc 2^0 \bigcirc 2^1 \bigcirc 2^2 \bigcirc 2^3 \bigcirc 2^4 \bigcirc 2^5 \quad \bigcirc 2^6 \quad \bigcirc 2^7 \bigcirc 2^8 \bigcirc 2^9 \bigcirc 2^{10} ○2<sup>14</sup> \bigcirc 2^{15} 218 \bigcirc 2^{21} \bigcirc 2^{16} \bigcirc 2^{17} \bigcirc 2^{19} \bigcirc 2^{20} STRIDE: \bigcirc 2^{13} \bigcirc 2^{23} \bigcirc 2^{27} \bigcirc 2^{28} 229 \bigcirc 2^{26} \bigcirc 2^{30} \bigcirc 2^{31} \bigcirc 2^{32} ``` To pound on the memory system the most, you'd request a different page with every access. (the first two can't guarantee that). Stride should be page size, and since 32 bits virtual address = VPN (20 bits since $2^{20}$ page table rows) + offset, then offset = 12. Therefore stride should be page size = $2^{offset}$ = $2^{12}$ . The alternate exam had $2^{10}$ page table rows, so using the same reasoning, it'd be stride = page size = $2^{22}$ . #### F2) SDS (20 points = 8+5+7, 30 minutes) a) Transform the **fun** function below into the *fewest Boolean gates* that implement the same logic. You may use AND, OR, XOR and NOT gates. *Hint: start with the truth table.* ``` bool fun(bool A, bool B) { return (A == B) ? true : B; } ``` a) If you write out the truth table, it's | A | В | fun(A,B) | | | | |-------|-------|----------|--|--|--| | FALSE | FALSE | TRUE | | | | | FALSE | TRUE | TRUE | | | | | TRUE | FALSE | FALSE | | | | | TRUE | TRUE | TRUE | | | | ...and using sum of products is: !A!B + !AB + AB !A (!B + B) + AB distribution !A + AB complimentarity, identity **!A + B** [uniting theorem v.2: x + !xy = x + y (here x = !A, B = y)] (the alternate exam swapped A and B. so they would have A+!B (the bubble would be on B, not A below). b) The logic implementation of a state machine is shown in the figure below. How many states does this state machine have? (Assume that it always starts from Out0=0, Out1=0) Out0 ← xor(!Out0, Out1) Out1 ← Out0 | Out0 | Out1 | Binary | | | | |-------|-------|--------|--|--|--| | FALSE | FALSE | 0 | | | | | TRUE | FALSE | 2 | | | | | FALSE | TRUE | 1 | | | | | FALSE | FALSE | 0 | | | | | etc | etc | etc | | | | (Out0=TRUE, OUT1=TRUE) is never accessed. Number of states = $\bigcirc 1$ $\bigcirc 2$ $\bigcirc 3$ $\bigcirc 4$ $\bigcirc 5$ $\bigcirc 6$ $\bigcirc 7$ $\bigcirc 8$ $\bigcirc 9$ c) In the figure above, flip-flop clk-to-q delay is **40ps**, setup time is **30ps** and hold time is **30ps**. XOR delay is **20ps** and the inverter delay is **10ps**. What is the *maximum frequency* ( $F_{max}$ ) of operation? $F_{max} = 1/(t_{clk-q} + t_{inverter} + t_{xor} + t_{setup}) = 1/(100ps) = 10 \text{ GHz}$ . The alternate exam had double the delays and setup/hold times, so the Fmax would be 1/200ps = 5 GHz. ### F3) Datapathology [this is a 2-page question] (20 points = 4+10+6, 30 minutes) The datapath below implements the RV32I instruction set. We'd like to implement sign extension for loaded data, but our loaded data can come in different sizes (recall: 1b, 1h, 1w) and different intended signs (1bu vs. 1b and 1hu vs 1h). Each load instruction will retrieve the data from the memory and "right-aligns" the LSB of the byte or the half-word with the LSB of the word to form mem [31:0]. a) To correctly load the data into the registers, we've created two control signals SelH and SelW that perform sign extension of mem[31:0] to memx[31:0] (see below). SelH controls the half-word sign extension, while **SelW** controls sign extension in the two most significant bytes. What are the Boolean logic expressions for the four (0, 1, 2, 3) **SelW** cases in terms of Inst[14:12] bits to handle these five instructions (1b, 1h, 1w, 1bu and 1hu)? SelH has been done for you. In writing your answers, use the shorthands "I14" for Inst[14], "I13" for Inst[13] and "I12" for Inst[12]. You don't have to reduce the Boolean expressions to simplest form. (Hint: green card!) Answers (and simplified form) | SelW=3 | | | | |---------------------------------------|-------------------------|--|--| | SelW=2 ~ 14 • ~ 13 • 12 = ~ 14 • 12 | | | | | SelW=1 ~ 14 • ~ 13 • ~ 12 | | | | | SelW=0 | ~I14 • I13 • ~I12 = I13 | | | | SelH=2 | l14 • ∼l13 • ∼l12 | | | | SelH=1 | ~l14 • ~l13 • ~l12 | | | | SelH=0 | l13 + l12 | | | (Single-bit values mem[7] and mem[15] are wired to 8 or 16 outputs) Here's how we figured it out -- we made a table: INST 111 432 | Inst | funct3 | byte3+byte2 | 2 | Ī | byte1 | | byte0 | |------|--------|-------------|--------|---|-----------|--------|----------| | 1b | 000 | mem[7] | SelW=1 | | mem[7] | SelH=1 | mem[7:0] | | 1h | 001 | mem[15] | SelW=2 | | mem[15:8] | SelH=0 | mem[7:0] | | lw | 010 | mem[31:16] | SelW=0 | | mem[15:8] | SelH=0 | mem[7:0] | | 1bu | 100 | 0 | SelW=3 | | 0 | SelH=2 | mem[7:0] | | 1hu | 101 | 0 | SelW=3 | | mem[15:8] | SelH=0 | mem[7:0] | ... and then reversing the table for the cases based on the INST funct3 bits above yields the values above. # F3) Datapathology, continued (20 points = 4+10+6, 30 minutes) (this is the same diagram as on the previous page, with five stages of execution annotated) b) In the RISC-V datapath above, mark what is used by a jal instruction. (See green card for its effect...) | Select<br>one per<br>row | PCSel Mux: ASel Mux: BSel Mux: WBSel Mux: | "pc + 4" branch "pc" branch "imm" branch "pc + 4" branch | <ul><li> "alu" branch</li><li> Reg[rs1] branch</li><li> Reg[rs2] branch</li><li> "alu" branch</li></ul> | <pre> * (don't care) * (don't care) * (don't care) * mem" branch</pre> | ○ * (don't care) | |-----------------------------|-------------------------------------------|----------------------------------------------------------|---------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|------------------| | Select all<br>that<br>apply | Datapath units:<br>RegFile: | Branch Comp Read Reg[rs1] | Imm. Gen Read Reg[r | Load Ex | | | start: 1w t0, 0 (a0) Hazard (circle one): S D C None # of nop 2 t0 being written to and read from the next op causes a data hazard and 2-cycle stall (or introduction of 2 NOPs so the W and R of the register file lines up — thankfully we can read and write registers the same cycle) beq t0, x0, end Hazard (circle one): S D C None # of nop 2 We need to wait until after the EX stage to know whether t0==x0 before we can load the correct next instruction, so that causes a control hazard and a 2-cycle stall (or introduction of 2 NOPs) addi t0, t0, 2 Hazard (circle one): S D C None # of nop 2 t0 being written to and read from the next op causes a data hazard and 2-cycle stall (or introduction of 2 NOPs) so the W and R of the register file lines up — thankfully we can read and write registers the same cycle) sw t0, 0 (a0) Hazard (circle one): S D C None # of nop 2 end: inst 1 2 3 4 5 6 7 8 9 10 11 12 13 14 wt0 0 (a0) IF ID EX MEM WB t0 NOP NOP IF ID EX MEM WB addi t0, t0, 2 NOP NOP IF ID EX MEM WB t0 NOP NOP IF ID EX MEM WB wt0, 0 (a0) NOP NOP IF ID EX MEM WB wt0, 0 (a0) NOP NOP IF ID EX MEM WB will only send the following temperatures: {100.0, 100.1, 100.2, 100.4, 100.8, 101.6, 103.2, 106.4, and any time the temperature is not those exact values, it'll send whatever value is the closest one. What encoding/decoding scheme would you use for these numbers and how many total bits would you need: Scheme: Unsigned fixed point Bias fixed point 2s complement fixed point Other Bits: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Other (just have a lookup table) using 3 bits to choose from the 8 values b) True False A 9/9 ALU operation would cause an interrupt, dealt with by the trap handler. | (Assume a register car | า be w | ritten a | and rea | ad in th | ne sam | ne cycl | e, and | that th | ne Brai | nch Co | mp is | in the | EX sta | ige.) | |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------|-----------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------|----------------------------------------------------------------------------|---------------------------------------------------------------|-----------------------------------------------------------|----------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------|-------| | inst 1 2 3 4 5 6 7 8 9 10 11 12 13 14 lw t0 0(a0) IF ID EX MEM WB t0 beq t0,x0,end NOP NOP IF ID EX MEM WB t0 addi t0, t0, 2 NOP NOP IF ID EX MEM WB t0 NOP NOP IF ID EX MEM WB t0 NOP NOP IF ID EX MEM WB t0 NOP NOP IF ID EX MEM WB t0 NOP NOP IF ID EX MEM WB t0 NOP NOP IF ID EX MEM WB t0 addi t0, t0, 2 NOP NOP IF ID EX MEM WB t0 www WB F4) What's that smell?! Oh. it's Potpourri (20 points=2 each, 30 minutes) a) We build a small Internet-of-things device to measure dog body temperature and send it to a receiver. If will only send the following temperatures: {100.0, 100.1, 100.2, 100.4, 100.8, 101.6, 103.2, 106.4}, and any time the temperature is not those exact values, it'll send whatever value is the closest one. What encoding/decoding scheme would you use for these numbers and how many total bits would you need? Scheme: Unsigned fixed point Bias fixed point 2s complement fixed point Other Bits: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Other (just have a lookup table) using 3 bits to choose from the 8 values b) True False A 0/0 ALU operation would cause an interrupt, dealt with by the trap handler. | to being written to and so the W and R of the beq to, We need to wait until a instruction, so that cau addi to, to being written to and so the W and R of the sw to, | register x0, after the ses a t0, d read register | from the file line of t | ines up<br>stage to<br>I haza<br>he nex | t op ca<br>tha<br>Ha:<br>o know<br>rd and<br>Ha:<br>tt op ca<br>o tha | auses ankfully zard (control of the control | a data / we ca circle o her t0 /cle sta circle o a data / we ca | hazardan read<br>ne): S<br>==x0 l<br>all (or in<br>ne): S<br>hazardan read | d and d and d and d D C before ntrodu D C d and d and d and d | 2-cycle write re None we ca ction o None 2-cycle write re | e stall (egister # 0 n load of 2 NC # 0 e stall (egister | (or intrist the soft nope the coopes) of nope (or intrist the soft nope (or intrist the soft nope soft nope soft nope (or intrist the soft nope soft nope soft nope (or intrist the soft nope soft nope soft nope soft nope soft nope (or intrist not not not not not not not not not no | oduction oduction oductions ame constant of the th | on of 2 cycle) 2 next 2 on of 2 | | | beq t0,x0,end NOP NOP IF ID EX MEM WB addi t0, t0, 2 NOP NOP IF ID EX MEM WB t0 NOP NOP IF ID EX MEM WB www. Sw t0, 0(a0) NOP NOP IF ID EX MEM WB www. WB www. F4) What's that smell?! Oh. it's Potpourri (20 points=2 each, 30 minutes) a) We build a small Internet-of-things device to measure dog body temperature and send it to a receiver. It will only send the following temperatures: {100.0, 100.1, 100.2, 100.4, 100.8, 101.6, 103.2, 106.4}, and any time the temperature is not those exact values, it'll send whatever value is the closest one. What encoding/decoding scheme would you use for these numbers and how many total bits would you need? Scheme: Unsigned fixed point Bias fixed point 2s complement fixed point Other Bits: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Other (just have a lookup table) using 3 bits to choose from the 8 values b) True False A 0/0 ALU operation would cause an interrupt, dealt with by the trap handler. | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | | addi t0, t0, 2 NOP NOP IF ID EX MEM WB t0 | | IF | | | | | | | | | | | | | | | sw t0, 0 (a0) F4) What's that smell?! Oh, it's Potpourri (20 points=2 each, 30 minutes) a) We build a small Internet-of-things device to measure dog body temperature and send it to a receiver. It will only send the following temperatures: {100.0, 100.1, 100.2, 100.4, 100.8, 101.6, 103.2, 106.4}, and any time the temperature is not those exact values, it'll send whatever value is the closest one. What encoding/decoding scheme would you use for these numbers and how many total bits would you need? Scheme: Unsigned fixed point Bias fixed point 2s complement fixed point Other Bits: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Other (just have a lookup table) using 3 bits to choose from the 8 values b) True False A 8/8 ALU operation would cause an interrupt, dealt with by the trap handler. False, that's an exception | beq t0,x0,end | | NOP | NOP | IF | | EX | MEM | WB | | | | | | | | F4) What's that smell?! Oh, it's Potpourri (20 points=2 each, 30 minutes) a) We build a small Internet-of-things device to measure dog body temperature and send it to a receiver. It will only send the following temperatures: {100.0, 100.1, 100.2, 100.4, 100.8, 101.6, 103.2, 106.4}, and any time the temperature is not those exact values, it'll send whatever value is the closest one. What encoding/decoding scheme would you use for these numbers and how many total bits would you need? Scheme: Unsigned fixed point Bias fixed point 2s complement fixed point Other Bits: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Other (just have a lookup table) using 3 bits to choose from the 8 values b) True False A 0/0 ALU operation would cause an interrupt, dealt with by the trap handler. False, that's an exception | addi t0, t0, 2 | | | | | NOP | NOP | IF | | EX | MEM | | | | | | a) We build a small Internet-of-things device to measure dog body temperature and send it to a receiver. It will only send the following temperatures: {100.0, 100.1, 100.2, 100.4, 100.8, 101.6, 103.2, 106.4}, and any time the temperature is not those exact values, it'll send whatever value is the closest one. What encoding/decoding <i>scheme</i> would you use for these numbers and how many total <i>bits</i> would you need? Scheme: Unsigned fixed point Bias fixed point 2s complement fixed point Other Bits: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Other (just have a lookup table) using 3 bits to choose from the 8 values b) True False A 0/0 ALU operation would cause an <i>interrupt</i> , dealt with by the trap handler. False, that's an exception | sw t0, 0(a0) | | | | | | | | NOP | NOP | IF | t0 | EX | | WB | | False, DMA obviates the need for Programmed I/O where the CPU does the work | | | | | | | | | | | | | | | | c) If we convert the above datapath to a 5-stage pipeline with **no forwarding**, what types of hazards (S=structural, D=data, C=control) exist **after** each line in the following code; how many nops must we add? | • | work is another kind of parallelism; multiple nodes can talk to ne time, "sharing" the network. | |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | | ork is a "bus" that can only be driven by one source at a time | | e) First-fit Next-fit Best-fit would Next-fit, since it needs to spreads the writes out | make sense for allocating blocks on a Flash (SSD) drive. to have even read wear | | f) Control Datapath Memory Or Memory, with all the cache coherence issues we | nput Output causes the most headaches with multi-core. | | <pre>while (lock != 0); // spin // Lock == 0 now (unlocked) lock = 1; // set Lock (lock // access shared resource . lock = 0; // release lock ( False, it doesn't work in C, you need an assemble)</pre> | unLocked) ly-level ATOMIC test-and-set mechanism | | h) The code below was written to sum the n | umbers from <b>1</b> to <b>N</b> (always a positive number). Select ONE. | | <ul> <li>It works, but it has to be run on a machine with 16 physical cores</li> <li>It works, but it has to be run on a machine with 16 logical cores</li> <li>It works, but only when N is bigger than the number of physical cores</li> <li>It works, but only when N is bigger than the number of logical cores</li> <li>It has a race condition bug</li> <li>It has a deadlock bug</li> <li>It always works</li> </ul> | <pre>int sumup(int N) { int THREADS = 16, TOTAL = 0, sum[THREADS]; omp_set_THREADS(THREADS); for (int i=0;i<threads;i++) #pragma="" (int="" +="i;" for="" i="id;i&lt;=N;i+=THREADS)" id="omp_get_thread_num();" int="" omp="" parallel="" pre="" return="" sum[i]="0;" sum[id]="" total+="sum[id];" total;}<="" {="" }=""></threads;i++)></pre> | | Race condition bug (TOTAL is read and increme | ented by multiple threads) | | <ul><li>A compile error because the printf s</li><li>A compile error because the printf s</li><li>An infinite loop</li></ul> | o trailing \n (so the output doesn't get flushed) 654321 | **Infinite loop** (the unsigned number is never negative to break out of the loop)