CS250A Computer Systems Architecture Assignment 3

1. Tomasulo (30 points) 

A. Fill in the cycles each instruction spends in each stage (per Table 3.4 and the details on p. 97), showing the start/end cycles in each column. Initially, all registers are logical (not renamed) and the ROB is empty. In the "Explanations" column, show the assigned reservation station, the assigned ROB entry, and the reasons for any delays. Use the diagram in Fig. 3.5, with pipelined units whose latencies are 4 cycles for multiply (*) and 2 cycles for add (+). You may use other hardware structures to assist you.

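| Instr.        | DecRen | Disp | IS | EX | Comm | Explanations |
|---------------|--------|------|----|----|------|--------------|
| F3 <- F2 * F1 | 1/1    | 2/2  |    |    |      |              |
| F1 <- F3 + F4 |        |      |    |    |      |              |
| F4 <- F4 - F2 |        |      |    |    |      |              |
| F3 <- F3 * F2 |        |      |    |    |      |              |
| F2 <- F2 - F4 |        |      |    |    |      |              |
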
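The renaming part of the "Explanations" column can be sketched in C. This is a minimal sketch, not the Table 3.4 pipeline. It rests on these assumptions:

- FP registers F0-F7 are encoded as 0-7.
- The ROB is large enough that the five instructions take entries 1-5 in program order.
- Nothing commits while renaming.

The names `rename_all`, `Instr`, `Operand`, and `Renamed` are hypothetical. For each source operand, the sketch records whether the value comes from the register file or from an earlier instruction's ROB entry, which is the tag the reservation station must wait on.

```c
/* Hypothetical encoding (assumption): F0..F7 are numbered 0..7. */
#define NUM_FREGS 8
#define NO_TAG (-1)

typedef struct {
    int dst, src1, src2;   /* e.g. F3 <- F2 * F1 is {3, 2, 1} */
} Instr;

/* A renamed source: either a ROB tag (value still in flight) or an
   architectural register (value already in the register file). */
typedef struct {
    int rob_tag;   /* NO_TAG if the value is read from the register file */
    int reg;       /* architectural register number */
} Operand;

typedef struct {
    Operand s1, s2;
    int rob_entry; /* ROB entry allocated to this instruction */
} Renamed;

/* Rename n instructions in program order. rat[r] holds the ROB entry of
   the youngest in-flight producer of Fr, or NO_TAG ("all registers
   logical, ROB empty" at the start). Assumption: ROB entries are handed
   out as 1, 2, 3, ... and nothing commits during renaming. */
void rename_all(const Instr *in, Renamed *out, int n) {
    int rat[NUM_FREGS];
    for (int r = 0; r < NUM_FREGS; r++)
        rat[r] = NO_TAG;

    for (int i = 0; i < n; i++) {
        /* Read sources BEFORE updating the destination mapping, so that
           F4 <- F4 - F2 reads the old F4. */
        out[i].s1.reg = in[i].src1;
        out[i].s1.rob_tag = rat[in[i].src1];
        out[i].s2.reg = in[i].src2;
        out[i].s2.rob_tag = rat[in[i].src2];

        out[i].rob_entry = i + 1;
        rat[in[i].dst] = out[i].rob_entry; /* later readers wait on this tag */
    }
}
```

With the five instructions encoded as {dst, src1, src2}, the sketch tags F1 <- F3 + F4 and F3 <- F3 * F2 as waiting on ROB entry 1 for F3. It tags F2 <- F2 - F4 as waiting on ROB entry 3 for F4, and every other source reads the register file. The cycle numbers themselves still have to come from the stage rules in Table 3.4.
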
2. Cache Coherence (20 points) 

A bus-based multiprocessor system has 2 processors and a memory connected by a bus. It uses the MSI cache coherence protocol with write-invalidate. Caches are write-back. A modified line is written back to memory when its state changes. The cache whose request caused that state change is delayed until memory has been updated, and then it retries. Another cache can raise a bus signal called Abort to abort a cache's bus read, which is retried later; Abort is raised in the same bus cycle as the request. Initially all lines are invalid. If P1 and P2 attempt to access the bus simultaneously, P1 wins arbitration because it has higher priority.

Processors execute the following time-ordered (left to right) sequence of memory requests to address A: 

P1: R(A), P2: R(A), P2: W(A), P1: W(A), P2: R(A)

Show the protocol state transitions of line A in each cache after each processor or bus request. Each table row is one bus/coherence operation. (Add more rows to the table if necessary.)

| Bus Request | C1 | C2 |
|-------------|----|----|
| read A      |    | I  |
|             |    |    |
|             |    |    |
|             |    |    |
|             |    |    |
|             |    |    |

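The net MSI state of line A after each processor request can be sketched as a small state machine in C. This is a simplified model under stated assumptions. It collapses each request into one atomic step. It does not model the Abort signal, the retry, or the separate memory-update bus cycle, all of which the problem asks you to show as extra rows. The names `System`, `proc_read`, and `proc_write` are hypothetical.

```c
typedef enum { INV, SHD, MOD } MsiState;

/* Simplified two-cache model of one line (assumption: each request is
   atomic; Abort, retry and the memory-update cycle are not modeled). */
typedef struct {
    MsiState c[2];   /* state of line A in cache 0 (C1) and cache 1 (C2) */
    int writebacks;  /* write-backs of a Modified copy to memory */
} System;

/* Processor p reads line A. On a miss it issues a bus read; a Modified
   copy in the other cache is written back and downgraded to Shared. */
static void proc_read(System *s, int p) {
    int o = 1 - p;
    if (s->c[p] != INV)
        return;                       /* hit in S or M: no bus traffic */
    if (s->c[o] == MOD) {
        s->writebacks++;              /* owner writes the line back first */
        s->c[o] = SHD;
    }
    s->c[p] = SHD;
}

/* Processor p writes line A. Unless it already holds M, it issues a
   bus request that invalidates the other copy (write-invalidate). */
static void proc_write(System *s, int p) {
    int o = 1 - p;
    if (s->c[p] == MOD)
        return;                       /* hit in M: no bus traffic */
    if (s->c[o] == MOD)
        s->writebacks++;              /* owner writes back before invalidating */
    s->c[o] = INV;
    s->c[p] = MOD;
}
```

Running P1: R(A), P2: R(A), P2: W(A), P1: W(A), P2: R(A) through this model (P1 = index 0, P2 = index 1) leaves both copies in S. Along the way there are two write-backs of a Modified line, one on P1's write and one on P2's final read. In the full protocol described above, each of those write-backs is where the delay and retry appear.
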
3. Superscalar (35 points) 


Show the cycle-by-cycle execution of the code below on the DEC 21164 processor. Use a table with 5 columns, one for each 21164 functional unit (FU), showing the RTL for each operation. Use Rx and Fy for integer and floating-point registers, respectively, and choose your own register assignments. Show concurrent operations by placing them in the same row of the table. Explain any stalls or delays that may occur.

for (i = 1; i < 64; i++) {
    A[i] = B[i] + C[i];
    E[i] = A[i] * D[i];
}

Show the execution of 3 iterations. 
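Before building the 21164 table, it can help to find the data dependences within one iteration. The C sketch below computes the earliest cycle each operation could start if only true data dependences mattered. Its limits and assumptions:

- It ignores the 21164's issue rules and functional-unit assignment.
- The latencies used in the example are placeholders, not verified 21164 numbers.
- The compiler keeps A[i] in the register that received the sum instead of reloading it for the multiply.

The names `Op` and `schedule` are hypothetical.

```c
#define MAX_SRCS 2

/* One operation of the loop body (hypothetical representation). */
typedef struct {
    const char *name;
    int latency;         /* cycles until the result can be used (placeholder) */
    int nsrc;
    int src[MAX_SRCS];   /* indices of producer ops; producers come first */
} Op;

/* Earliest start/ready cycle of each op, assuming unlimited issue width
   and functional units, so only true dependences delay an op. ops[] must
   be in a topological (program) order. */
static void schedule(const Op *ops, int n, int *start, int *ready) {
    for (int i = 0; i < n; i++) {
        int t = 0;
        for (int k = 0; k < ops[i].nsrc; k++) {
            int p = ops[i].src[k];
            if (ready[p] > t)
                t = ready[p];        /* wait for the latest producer */
        }
        start[i] = t;
        ready[i] = t + ops[i].latency;
    }
}
```

With placeholder latencies of 2 cycles for loads and 4 cycles for both the FP add and FP multiply, the add starts at cycle 2. The multiply must wait for the add and cannot start until cycle 6. Since no iteration reads a value written by another (assuming the arrays do not overlap), the idle slots inside this chain are what overlapping consecutive iterations can fill.
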

4. LD/Store forwarding (15 points) 

Consider the load/store unit in Fig. 5.2. Redesign it to support store-to-load forwarding. You may add new datapaths and new fields to the SQ and LQ in the block diagram. Using RTL, show its operation from the time an address is generated in the AGU.
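The store-queue search that such a redesign needs can be sketched in C. This is one conservative policy, assumed for illustration and not the design in Fig. 5.2. When a load's address leaves the AGU, it searches older stores for the youngest one to the same address and forwards that store's data if the data is present. Program order is judged by a hypothetical sequence number `seq`. The load stalls if that store's data has not arrived or if any older store's address is still unknown; otherwise it reads the data cache. The sketch assumes aligned, word-sized accesses and ignores sequence-number wraparound. `SqEntry`, `sq_forward`, and `SQ_SIZE` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define SQ_SIZE 8   /* hypothetical store-queue size */

/* Hypothetical SQ entry; these are the extra fields forwarding needs. */
typedef struct {
    bool valid;
    bool addr_ready;  /* AGU has produced the store address */
    bool data_ready;  /* store data has arrived */
    uint32_t addr;
    uint32_t data;
    unsigned seq;     /* program-order sequence number (no wraparound) */
} SqEntry;

typedef enum { FWD_HIT, FWD_MISS, FWD_STALL } FwdResult;

/* Called when a load's address comes out of the AGU.
   Assumption: aligned word-sized accesses, so "match" means equal address.
   - youngest older store to the same address has data -> forward (HIT)
   - that store has no data yet, or an older store's address is unknown
     -> conservatively wait (STALL)
   - no older store matches -> read the data cache (MISS) */
FwdResult sq_forward(const SqEntry *sq, unsigned load_seq,
                     uint32_t load_addr, uint32_t *out) {
    int best = -1;
    for (int i = 0; i < SQ_SIZE; i++) {
        const SqEntry *e = &sq[i];
        if (!e->valid || e->seq >= load_seq)
            continue;                  /* ignore empty slots and younger stores */
        if (!e->addr_ready)
            return FWD_STALL;          /* older store address unknown */
        if (e->addr == load_addr && (best < 0 || e->seq > sq[best].seq))
            best = i;                  /* keep the youngest older match */
    }
    if (best < 0)
        return FWD_MISS;
    if (!sq[best].data_ready)
        return FWD_STALL;
    *out = sq[best].data;
    return FWD_HIT;
}
```

A common alternative is to let the load proceed past older stores with unknown addresses. Each store then checks the LQ for younger loads to the same address when its own address resolves. That requires extra LQ fields, such as the load's address and whether it has already obtained its data.
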
