Login: cs61c-\_\_\_\_ ### Spring 2012 CS61C Midterm | Your Name: | | | RC | 15/C | | | | |--------------|-------|-------|------|------|------|-----|--| | | | | | | | | | | Your TA: | Rimas | Scott | Alan | Eric | Paul | lan | | | | | | | | | | | | Login: cs61c | | | | | | | | | | | | | | | | | This exam is worth 100 points, or about 10% of your total course grade. The exam contains 8 questions. This booklet contains 12 numbered pages including the cover page. Put all answers on these pages, please; don't hand in stray pieces of paper. You will receive 1 point for properly filling out your name, TA, and login (login must be filled in properly on every page of the exam). This is the exam's 0<sup>th</sup> question. | Question | Points(Minutes) | Score | |----------|-----------------|-------| | 0 | 1 | | | 1 | 6(3) | | | 2 | 7(5) | | | 3 | 10(4) | | | 4 | 12(5) | | | 5 | 21(13) | | | 6 | 8(4) | | | 7 | 20(14) | | | 8 | 15(9) | | | Total | 100(56) | | # 1. True or False (1pt each) - a. T / E: A mapper can output at most one key value pair per key value pair given as input. - **b.** The shuffle phase cannot group key value pairs in such a way as to send (a,7) and (a,4) to separate instances of reduce in one round of map-reduce. - **c.** The is a summary complete even if one of the workers fails. - d. T / F: Both the map and reduce tasks can run on multiple machines to exploit parallelism. - e. T / E: SIMD is an example of data level parallelism that performs different operations on a single input data value to produce multiple output data values. - **f.** The amount of parallelism achieved by a single SSE instrinsic is limited by the width of registers. | Login: | cs61c- | | |--------|--------|--| | Login. | COOTC- | | 2. Potpourri $(\ \ _{\rho}\lambda)$ a) Which of the following is NOT a consequence of Moore's Law? - A) The same computer will get cheaper over time - B) Transistors get smaller over time - (C)) Transistors will become twice as energy-efficient every 18 months - D) Computer chips can get physically smaller over time | b) Moore's law states that the | transistors | per chip will | double | every | |--------------------------------|-------------|---------------|--------|-------| | 18 or 14 months. | (1 pt) | | (1 pt) | | (phc) Both the ARM and MIPS architectures were announced about 27 years ago. According to Moore's Law, approximately how many more times transistors per chip do we have today than when these RISC instruction sets were developed? - A) 64 times more transistors today - B) 1,000 times more transistors today - if wwo (C) 16,000 times more transistors today - 18 (D) 256,000 times more transistors today (2 pts) d) What is the formula that the assembler should use to calculate the value to place in the address field of a beq or bne machine language instruction? Assume DestAddress is the address of the destination if the branch is taken and PC points to the beq instruciton. For example, | Address<br>10000 | Label | Instruction<br>beq \$t1,\$t2, DestAddress | |------------------|--------------|-------------------------------------------| | 10080 | DestAddress: | <br>addu \$t1, \$\$t2, \$t3 | - A) DestAddress - B) DestAddress / 4 - C) DestAddress \* 4 - D) DestAddress PC - E) (DestAddress PC)/4 - F) DestAddress PC 4 - G)(DestAddress PC 4) / 4 - H) (DestAddress PC 4) \* 4 - I) None of the above ## 3. Memory Hierarchy LO POINTS, I PER QUESTION ### Short answer: | 1) Caches are designed to exploit TEMPORAL and SPATIAL locality in memory access patterns. [ GET 1/2 POINT (F ONE CORRECT] | |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 2) The level of the memory hierarchy has the highest cost per bit of storage. (closest/furthest from chip) | | 3) The level of the memory hierarchy has the highest capacity. (closest furthest from chip) | | 4) In a 16kB byte-addressed, direct mapped cache that uses 32-bit addresses and 64 byte cache lines, the address is divided into 6 offset bits, 8 index bits and 18 tag bits. [B tag bits.] | | 5) Holding capacitance and power constant, a CPU's voltage must be decreased by a factor of to allow for a doubling of frequency. | | Multiple choice: | | For each of the following questions, select the option which is most often true: A) increases B) decreases C) does not effect | | 6) Increasing the capacity of a cache its access time | | 7) Decreasing the capacity of a cache its miss rate | | 8) Decreasing the capacity of a cache its miss penalty | | 9) Increasing a cache's hit rate the effective average memory access time | | 10) Assuming a constant capacity, increasing a cache's block size (cache line size) | | SOR C the amount of tag storage required TOPPENAS ON IF TALICING ABOUT TAGAREA > B ORIF (1) TAG WIDTH > C | ## 4. Compiling Linking Interpreting etc. a) In one word each, name the most common producer and consumer of the following items: Choose from: linker loader compiler assembler programmer | (item) | This is the output of: | This is the input to: | |-------------------------|------------------------|-----------------------| | bne \$t0,\$s0,done | CO MPILER | ASSEMBLER | | char *s = "hello world" | PROGRAMMER | COMPILER | | app.o string.o | ASSEMBLER | LINKER | | firefox | LINKER | LOADER | b) Circle <u>all</u> the advantages of interpretation over compilation: - - Interpreted programs often have a higher CPI, resulting in better resource use. - Interpreted programs do not require compilation. - An interpreted interpreter can interpret itself, eliminating any need for compilers. - An interpreted program can run on many different platforms without modification.) - c) Circle *all* the reasons why many programs are compiled instead of interpreted: - Compilers produce programs that often execute faster. - It is easier to write code when using a compiler than when using an interpreter. - Interpreted programs cannot be scaled to many machines. - It's harder to reverse engineer a compiled program than an interpreted one. + 1 FOR CORRECT ANSWER -1 FOR INCORRECT ANSWER MIN IS O, MAX IS 2 Login: cs61c- ## 5. C Question - Graph Representation // vertex\_t declared elsewhere, sizeof(vertex\_t) == 16 // struct holding two pointers to vertex t's typedef struct edge { vertex t \*first; vertex t \*second; } edge t; // A graph is represented as an array of edge t pointers terminated by a null pointer (similar to how a c string is terminated by a null character) // The following pointer to a graph is declared: edge t\*\* G; a) Determine the value of the following expressions, assuming a 32-bit architecture and tightlypacked structs: $$sizeof(\&G) = 4$$ $sizeof(G[0]) = 4$ $$sizeof(G[U]) = \Upsilon$$ $$sizeof(**G) = \emptyset$$ sizeof(&G) = 4 Spts for exact answers sizeof(G[0]) = 4 Spts for exact answers redit if units given b) Draw a possible pointer diagram starting from G, assuming the graph it points to has exactly two edges and three vertices. G and the three vertices are given - your job is to fill in the rest. In a pointer diagram blocks of contiguous memory are drawn together, and pointers are drawn as arrows between the blocks. Make your diagram as clear as possible. c) How much space does it take to store the graph you just drew (including G, the pointer to the graph)? Show your work. d) Given print vertex(vertex t \*v), fill in the blanks in this function: For example, an output for a graph with three edges could be: (a,b) (b,c) (c,a) void print\_graph(edge\_t \*\*graph) { while ( \*graph ) > first ); print\_vertex( (\*graph ) > se and ); printf(","); printf(")\n"); (aph ++ 2 pts for pointer increment 2 pts for correct conditional statement 3 pts for print arg (all or nothing) 1 pt for "second" print arg penalties for introducing extra variables Login: cs61c-\_\_\_\_ ## 6. SIMD String Below is some MIPS from lecture for string copying. \$s1 holds the address of string we are copying, and \$s2 holds the address we are copying to. We'll define a new kind of string to allow faster copying through SIMD. Our new kind of string is much like the regular C strings except: - The string starts on a word-aligned address; in other words, the last two bits of its 32 bit address are 0s. - The string ends with a word-aligned 32 bit null value. Using the idea of SIMD, change at most four instructions to make a faster C string copy. To modify an instruction fill in the replacement instruction in the blank on the same line as the original instruction. Do not fill in a blank if you do not wish to modify its corresponding instruction. Loop: j Loop | lb \$t2,0(\$s1) | # \$t2 = *p | 1 w \$+2, 0(\$s1) | 05 | lh | |--------------------|-----------------------------|---------------------------------------|-----------------------------------------|----| | sb \$t2,0(\$s2) | # *q = \$t2 | Sw Sp2,0(\$52) | 0 | sh | | addi \$s1,\$s1,1 | # p = p + 1 | add: Ps/ \$1/4 | * * * * * * * * * * * * * * * * * * * * | | | addi \$s2,\$s2,1 | # q = q + 1 | addi \$52\$524 | | | | beq \$t2,\$zero,Ex | it # if *p == 0, go to Exit | · · · · · · · · · · · · · · · · · · · | | | | | | | | | # go to Loop Zpts a line 11 correct formula! Implies more than simply capying Login: cs61cfrom a sheet. Understanding of the formula must be demonstrated #### 7. How Fast Suppose you are running code on a machine with a two level data cache. It also has an instruction cache, which always hits, so we disregard it for our calculations. - L1\$ has a local hit rate of 50%, and hits in 1 cycle. - L2\$ has a local hit rate of 95%, and hits in 10 cycles. - Main memory has a hit rate of 100%, and hits in 100 cycles. Now, consider the following MIPS function: #begin function call jal foo foo: addiu \$sp, \$sp, -8 sw \$s0, 0(\$sp) sw \$s1, 4(\$sp) lw \$s0, 0(\$a0) lw \$s1, 0(\$a1) sw \$s1, 0(\$a0) sw \$s0, 0(\$a1) lw \$s0, 0(\$sp) lw \$s1, 4(\$sp) addiu \$sp, \$sp, 8 jr \$ra #end function call Swaps the values pointed to by Jao, Sal. Does not swap Sao, Sal. a. Briefly describe what foo does. 2 pts-correct b. Calculate the AMAT for the architecture provided (Reminder: we are ignoring instruction formula loads for our calculations). AMAT = 1 + 0.5(10 + .05(100)) = 8.5 Cycles 1pt-correct numerical substitution Ipt - correct answer/units c. Assuming that the program's CPI would be 1 in the absence of cache misses, calculate the CPI of a call to foo (note that the function call includes the jal). CPI = 1 + (=) (0.5(10+.05(100))) = 6 cycles/instruction 2pts - correct formula Ipt - correct substitution 1 pt - answer and units | -2 ots | per | incorre | 1 | line | |--------|-----|---------|---|------| | -2 pts | of | 0/8 | | | d. The following function is supposed to do the same thing as foo, but doesn't. Without adding any instructions, repair footoo so that it does. To modify an instruction fill in the replacement instruction in the blank on the same line as the original instruction. Do not fill in a blank if you do not wish to modify its corresponding instruction. | | jal footoo | #begin function call | | _ | |--------|-----------------|----------------------|-------------|-------| | | | ¥ | | | | | • | | | | | | | | | | | footoo | ): | | | | | | lw \$s0,0(\$a0) | | lw \$+0,0( | \$q0) | | | lw \$s1,0(\$a1) | | Iw 1+1, 0( | 1a1) | | | sw \$s1,0(\$a0) | | Sw sti oc | 1a0) | | | sw \$s0,0(\$a1) | | sw \$to, oc | du 1) | | | jr \$ra | #end function call | | - , | | | | | | | e. Assuming that a program spends 20% of it's time making calls to foo, what is the speedup of switching to footoo? The ratio of teads to loads and stores to other in structions is constant moving from foo to footoo, so CPI foo = CPI footoo. So we can apply Amdahl's law directly: $speedup = \frac{1}{(1-0.2) + 0.2} = \frac{10}{9} = 1.7$ certification #### 8. Ackermann The Ackermann function A is defined as follows: $$A(m,n) = \begin{cases} n+1 & \text{if } m = 0\\ A(m-1,1) & \text{if } m > 0 \text{ and } n = 0\\ A(m-1,A(m,n-1)) & \text{if } m > 0 \text{ and } n > 0. \end{cases}$$ Fill in the following C function so that it computes A(m, n). unsigned int A(unsigned int m, unsigned int n) { if $$(M=20)$$ } { return $N+1$ ; } else if $(N=20)$ } { return $A(M-1,1)$ ; } else { return $A(M-1,1)$ ; } $(M-1,1)$ ; } $(M-1,1)$ ; } $(M-1,1)$ ; } $(M-1,1)$ ; } $(M-1,1)$ ; Now you're going to translate that C into an equivalent MIPS function. We've built a skeleton once again, but you're going to have to fill in the blanks to flesh it out. A: a) addiu \$sp, \$sp, $$-|2| \leftarrow nob + |2|$$ sw \$s0, 0(\$sp) sw \$s1, 4(\$sp) sw \$ra, 8(\$sp) addu \$s0, \$a0, \$0 addu \$s1, \$a1, \$0 beg \$s0, \$0, L1 #continued on next page# Login: cs61c-\_\_\_\_ addu \$a0, \$s0, \$0 #does nothing, included for clarity addiu \$a1, \$s1, -1 jal A L1: $$(e,f)$$ addiu \$v0, $\frac{4s1}{ss1}$ , $\frac{1}{ss1}$ L2: addiu \$a0, \$s0 ,-1 addiu \$a1,\$0, 1 jal A j Exit Exit: 1 point for b, e, t, at's (must get born) 2 points for c, d, 9thti (if revosed -1)