Design Space Exploration
In this project, you are going to use SimpleScalar as the evaluation engine to perform a design space exploration, using the provided framework code, over an 18-dimensional processor pipeline and memory hierarchy design space (some of these dimensions are not independent). You will use a 5-benchmark suite as the workload.
1. Project Goal
- The “best” performing overall design (in terms of the geometric mean of normalized execution time across all benchmarks)
- The most energy-efficient design (as measured by the lowest geometric mean of normalized energy-delay product [units of energy delay product are joule-seconds] across all benchmarks)
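Both goals rank designs by a geometric mean of per-benchmark values normalized against the baseline. The following is a minimal sketch of that metric, not the framework's own code; the function names and the sample execution times are hypothetical.

```python
import math

def geomean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalized_geomean(design, baseline):
    """Geometric mean of per-benchmark design/baseline ratios (lower is better)."""
    return geomean([d / b for d, b in zip(design, baseline)])

# Hypothetical execution times (seconds) for a 5-benchmark suite.
baseline_times = [2.0, 4.0, 1.0, 8.0, 5.0]
design_times = [1.0, 4.0, 0.5, 8.0, 5.0]

score = normalized_geomean(design_times, baseline_times)
print(f"geomean normalized execution time: {score:.4f}")
```

The same helper applies to the energy-delay product goal if the per-benchmark EDP values are passed in place of the execution times.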
2. Background
2.1. SimpleScalar
This project heavily uses SimpleScalar but most of the interface is abstracted out by a simpler framework interface. Nevertheless, you can refer to this SimpleScalar guide for details about parameters passed to SimpleScalar.
2.2. Design Space Exploration
Given a set of design parameters, Design Space Exploration (DSE) involves probing various design points to find the most suitable design to meet required goals. Follow this quick reading about DSE before moving ahead.
DSE can be performed for different design goals. For example, one DSE may want to find the best performing design whereas another DSE may be aimed at finding the most energy efficient design. A more complex DSE may look for the best performing design given a fixed energy budget.
An exhaustive DSE simply tries out all possible combinations of parameter values to find the absolute best design. However, as the size of the design space increases, this approach quickly becomes infeasible. Consider a 10-dimensional design space with 5 possible values for each parameter and 2 minutes of simulation time to evaluate a given design point; an exhaustive search will take 5^10 × 2 min ≈ 37 years.
A more intelligent DSE employs heuristics to prune down the design space and to prioritize evaluation of more reasonable design points first. If the assumptions employed by the heuristics are correct, the DSE will still result in the best design. On the other hand, with a set of reasonably justified assumptions, a heuristic can result in a “good enough” design point.
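The estimate above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the function name is hypothetical and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # assumes a 365-day year

def exhaustive_search_years(dimensions, values_per_dimension, minutes_per_point):
    """Time to simulate every combination of parameter values, in years."""
    points = values_per_dimension ** dimensions
    return points * minutes_per_point / MINUTES_PER_YEAR

years = exhaustive_search_years(10, 5, 2)
print(f"{5 ** 10} design points, roughly {years:.0f} years")
```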
2.3. Energy-Delay Product
Design A takes 100 pJ to process an image in 100 ms, so its EDP is 10,000 units (pJ·ms). Design B takes 80 pJ to process an image in 2000 ms, so its EDP is 160,000 units. Design B clearly consumes less energy, but it performs poorly because it incurs far more execution time. EDP enables a more holistic design comparison: here Design A has the lower EDP.
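The two EDP values in this example can be reproduced as follows. The sketch keeps the example's units (energy in pJ, delay in ms), and the function name is hypothetical.

```python
def energy_delay_product(energy, delay):
    """EDP is simply energy multiplied by execution time."""
    return energy * delay

# Units follow the example: energy in pJ, delay in ms, so EDP is in pJ*ms.
edp_a = energy_delay_product(100, 100)
edp_b = energy_delay_product(80, 2000)

print(edp_a, edp_b, edp_b / edp_a)
```

Design B's EDP is 16 times that of Design A, even though B uses 20% less energy per image.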
3. Our Heuristic
3.1. Evaluate all possible design points by changing the value of this dimension only
3.2. Fix the value of this dimension by selecting the best design so far (consider the DSE goal)
3.3. Mark this dimension as explored
You should choose an unexplored dimension in step 3 based on the PSU ID numbers of the students in your group, as follows.
DSE dimensions can be categorized in four major classes as follows:
Compute the sum of your group's ID numbers modulo 24, then look at Table 1 and start from the first category in the corresponding row, then the second category, and so on.
For example, if your ID numbers are 9123456789 and 9111111111, the remainder of their sum divided by 24 is 12, so you should explore Core configs first, then BP configs, then Cache configs, and finally FPU configs.
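The lookup in this example can be sketched in a few lines. The sketch relies on the observation that the rows of Table 1 are the 24 permutations of (BP, Cache, Core, FPU) in lexicographic order; the function and constant names are hypothetical.

```python
from itertools import permutations

# Categories in alphabetical order, so permutations() yields the rows of
# Table 1 in the same order as the table (row 0 first, row 23 last).
CATEGORIES = ("BP", "Cache", "Core", "FPU")
EXPLORATION_ORDERS = list(permutations(CATEGORIES))

def exploration_order(id_numbers):
    """Return the category order for a group, given its members' ID numbers."""
    return EXPLORATION_ORDERS[sum(id_numbers) % 24]

order = exploration_order([9123456789, 9111111111])
print(order)  # ('Core', 'BP', 'Cache', 'FPU')
```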
Please note that the heuristic currently implemented in the generateNextConfigurationProposal function is a simple one, and you should extend it as explained above. The current implementation starts from the leftmost dimension, explores all possible options for that dimension, and then moves on to the next dimension until it reaches the rightmost dimension.
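The one-dimension-at-a-time idea can be sketched as below. This is not the framework's generateNextConfigurationProposal; it is a simplified illustration in which `evaluate` is a hypothetical cost function (lower is better), the search starts from index 0 in every dimension (an assumption), and validity checks are omitted. Passing a custom `order` mimics exploring dimensions in a category-specific sequence instead of left to right.

```python
def explore_one_dimension_at_a_time(options_per_dimension, evaluate, order=None):
    """Sweep one dimension at a time, fixing each to its best value so far."""
    best = [0] * len(options_per_dimension)  # assumed starting point
    best_cost = evaluate(best)
    if order is None:
        order = range(len(options_per_dimension))  # leftmost to rightmost
    for dim in order:
        for choice in range(options_per_dimension[dim]):
            candidate = list(best)
            candidate[dim] = choice  # change this dimension only
            cost = evaluate(candidate)
            if cost < best_cost:
                best, best_cost = candidate, cost
        # 'best' now holds the fixed value for this explored dimension
    return best, best_cost

# Toy cost function standing in for a simulation run (hypothetical).
TARGET = [2, 0, 1]

def toy_cost(config):
    return sum(abs(c - t) for c, t in zip(config, TARGET))

best, cost = explore_one_dimension_at_a_time([4, 2, 3], toy_cost)
print(best, cost)
```

With 4, 2, and 3 options per dimension, this evaluates at most 1 + 4 + 2 + 3 points instead of all 24 combinations, which is the saving the heuristic is after.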
4. Logistics
The set of possible points within the design space to be considered is constrained by the provided shell script wrapper runprojectsuite.sh. All allowed configuration parameters for each dimension of the design space are briefly described in the provided shell script. The runprojectsuite.sh shell script takes 18 integer arguments, one for each configuration dimension, each expressing the index of the selected parameter for that dimension. All reported results should be normalized against a baseline design whose configuration parameters are already hard-coded in the framework.
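As a rough illustration of the 18-index interface, the sketch below turns a list of indices into an argument list for the script. It does not run anything; the helper name and the `./runprojectsuite.sh` relative path are assumptions, and the framework itself normally performs this call.

```python
NUM_DIMENSIONS = 18  # one index per configuration dimension

def build_command(indices, script="./runprojectsuite.sh"):
    """Build the argument list for one design point (path is an assumption)."""
    if len(indices) != NUM_DIMENSIONS:
        raise ValueError(f"expected {NUM_DIMENSIONS} indices, got {len(indices)}")
    if any(not isinstance(i, int) or i < 0 for i in indices):
        raise ValueError("every index must be a non-negative integer")
    return [script] + [str(i) for i in indices]

command = build_command([0] * NUM_DIMENSIONS)
print(" ".join(command))
```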
Note that not all possible parameter settings represent a valid combination. One of your tasks will be to write a configuration validation function based upon restrictions described later in this document. Further, note that this design space is too large to efficiently search in an exhaustive manner. Hence, a heuristic will be developed to specify an order in which the design space will be explored.
The framework code will evaluate a fixed number of design points per run. This parameter cannot be changed. The key part of your task in this project is to implement a heuristic search function that selects the next design point to consider, given either a
Table 1. Exploration Orders based on PSU ID.
| (ID Number Sum) mod 24 | 1st | 2nd | 3rd | 4th |
|---|---|---|---|---|
| 0 | BP | Cache | Core | FPU |
| 1 | BP | Cache | FPU | Core |
| 2 | BP | Core | Cache | FPU |
| 3 | BP | Core | FPU | Cache |
| 4 | BP | FPU | Cache | Core |
| 5 | BP | FPU | Core | Cache |
| 6 | Cache | BP | Core | FPU |
| 7 | Cache | BP | FPU | Core |
| 8 | Cache | Core | BP | FPU |
| 9 | Cache | Core | FPU | BP |
| 10 | Cache | FPU | BP | Core |
| 11 | Cache | FPU | Core | BP |
| 12 | Core | BP | Cache | FPU |
| 13 | Core | BP | FPU | Cache |
| 14 | Core | Cache | BP | FPU |
| 15 | Core | Cache | FPU | BP |
| 16 | Core | FPU | BP | Cache |
| 17 | Core | FPU | Cache | BP |
| 18 | FPU | BP | Cache | Core |
| 19 | FPU | BP | Core | Cache |
| 20 | FPU | Cache | BP | Core |
| 21 | FPU | Cache | Core | BP |
| 22 | FPU | Core | BP | Cache |
| 23 | FPU | Core | Cache | BP |
The framework, as given, provides functionality to enforce several, but by no means all, of the validation constraints. It is your job to implement validation functions to enforce constraints described throughout this document.
5. Framework
Extract the project files archive and navigate to the project directory.
make clean
make
./DSE performance
Different components of the framework are invoked in the following order:
DSE (project binary) → runprojectsuite.sh (shell script) → SimpleScalar
6. Anticipated Steps
7. Submission Requirements
7.1. Project Report
7.2. Code Implementations
# Extract project files archive and navigate to project directory.
make
./DSE performance
./DSE energy
8. Modeling Considerations
The Instruction Count (IC) for each benchmark is a constant. Thus, for performance, you will be trying to optimize Instructions Per Cycle (IPC) and the Clock Cycle (CC) time. Unless specified otherwise, the following modeling considerations have already been implemented in the framework to calculate EDP. However, the provided information may be used for explaining design space exploration results.
8.1. Clock Cycle Time
• Dynamic, fetch width = 1: 115 ps + FPU delay
• In-order, fetch width = 1: 100 ps + FPU delay
• Dynamic, fetch width = 2: 125 ps + FPU delay
• In-order, fetch width = 2: 120 ps + FPU delay
• Dynamic, fetch width = 4: 150 ps + FPU delay
• In-order, fetch width = 4: 140 ps + FPU delay
• Dynamic, fetch width = 8: 175 ps + FPU delay
• In-order, fetch width = 8: 165 ps + FPU delay

• count = 1: 5 ps
• count = 2: 10 ps
• count = 4: 20 ps
• count = 8: 40 ps
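To see how these numbers combine, the sketch below computes a cycle time and the resulting execution time as IC / IPC × CC. It assumes the "count" list gives the FPU delay as a function of FPU count, which the text does not state explicitly; all names and the sample IC and IPC values are hypothetical.

```python
# Base cycle time in ps, keyed by (scheduling, fetch width), from the list above.
BASE_CYCLE_PS = {
    ("dynamic", 1): 115, ("in-order", 1): 100,
    ("dynamic", 2): 125, ("in-order", 2): 120,
    ("dynamic", 4): 150, ("in-order", 4): 140,
    ("dynamic", 8): 175, ("in-order", 8): 165,
}
# ASSUMPTION: the "count" list is the FPU delay in ps per FPU count.
FPU_DELAY_PS = {1: 5, 2: 10, 4: 20, 8: 40}

def cycle_time_ps(scheduling, fetch_width, fpu_count):
    """Clock cycle time = base time for the core type + FPU delay."""
    return BASE_CYCLE_PS[(scheduling, fetch_width)] + FPU_DELAY_PS[fpu_count]

def execution_time_s(instruction_count, ipc, cycle_ps):
    """Execution time = (IC / IPC) cycles * cycle time."""
    return instruction_count / ipc * cycle_ps * 1e-12

cc = cycle_time_ps("dynamic", 4, 2)        # 150 ps + 10 ps
print(cc, execution_time_s(1e9, 2.0, cc))  # hypothetical IC and IPC
```

A wider dynamic core may raise IPC, but it also lengthens every cycle, so the two effects have to be weighed against each other.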
8.2.1. Core Leakage Power
• Dynamic, fetch width = 1: 1.5 mW
• In-order, fetch width = 1: 1 mW
• Dynamic, fetch width = 2: 2 mW
• In-order, fetch width = 2: 1.5 mW
• Dynamic, fetch width = 4: 8 mW
• In-order, fetch width = 4: 7 mW
• Dynamic, fetch width = 8: 32 mW
• In-order, fetch width = 8: 30 mW

• count = 1: 0.25 mW
• count = 2: 0.50 mW
• count = 4: 1 mW
• count = 8: 2 mW
The following list comprises tuples of the format: [cache size or memory, access energy (pJ), leakage/refresh power (mW)]
• 8KB: 20 pJ, 0.125 mW
• 16KB: 28 pJ, 0.25 mW
• 32KB: 40 pJ, 0.5 mW
• 64KB: 56 pJ, 1 mW
• 128KB: 80 pJ, 2 mW
• 256KB: 112 pJ, 4 mW
• 512KB: 160 pJ, 8 mW
• 1024KB: 224 pJ, 16 mW
• 2048KB: 360 pJ, 32 mW
• Main Memory: 2 nJ, 512 mW
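One way to read these tuples is sketched below: the energy of a single structure is taken as its access count times the access energy, plus leakage power times execution time. This accounting is an assumption for illustration and may differ from what the framework does; the names and sample inputs are hypothetical.

```python
# size in KB -> (access energy in pJ, leakage power in mW), from the list above.
CACHE_ENERGY = {
    8: (20, 0.125), 16: (28, 0.25), 32: (40, 0.5), 64: (56, 1),
    128: (80, 2), 256: (112, 4), 512: (160, 8), 1024: (224, 16),
    2048: (360, 32),
}

def cache_energy_joules(size_kb, accesses, exec_time_s):
    """ASSUMED model: dynamic access energy plus leakage over the run."""
    access_pj, leak_mw = CACHE_ENERGY[size_kb]
    dynamic = accesses * access_pj * 1e-12   # pJ -> J
    leakage = leak_mw * 1e-3 * exec_time_s   # mW * s -> J
    return dynamic + leakage

# Hypothetical run: 1 million accesses to a 32KB cache over 10 ms.
print(cache_energy_joules(32, 1_000_000, 0.01))
```

In the list, leakage power doubles with every doubling of size while access energy grows more slowly, so larger caches cost more the longer a benchmark runs.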
• Dynamic, fetch width = 1: 10 pJ
• In-order, fetch width = 1: 8 pJ
• Dynamic, fetch width = 2: 12 pJ
• In-order, fetch width = 2: 10 pJ
• Dynamic, fetch width = 4: 18 pJ
• In-order, fetch width = 4: 14 pJ
• Dynamic, fetch width = 8: 27 pJ
• In-order, fetch width = 8: 20 pJ
8.3. Validation Constraints
1. The il1 (L1 instruction cache) block size must be at least the ifq (instruction fetch queue) size (e.g., for the baseline machine the ifqsize is set to 1 word (8B), so the il1 block size should be at least 8B). The dl1 (L1 data cache) should have the same block size as your il1.
2. The ul2 (unified L2 cache) block size must be at least twice your il1 (and dl1) block size with a maximum block size of 128B. Your ul2 must be at least twice as large as il1+dl1 in order to be inclusive.
4. ul2 size: Minimum = 32 KB; Maximum = 1 MB
5. The il1 sizes and il1 latencies are linked as follows (the same linkages hold for the dl1 size and dl1 latency):
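As a rough illustration of constraints 1, 2, and 4 above (the latency linkage of constraint 5 is left out), a validation check could look like the sketch below. It is not the framework's validation function; the function name, argument names, and the choice to pass sizes in bytes are assumptions.

```python
WORD_BYTES = 8   # the text gives 1 word = 8B
KB = 1024

def validate_caches(ifq_words, il1_block, dl1_block, ul2_block,
                    il1_size, dl1_size, ul2_size):
    """Return a list of violated constraints; an empty list means valid.

    Block sizes and cache sizes are in bytes, ifq size is in words.
    """
    problems = []
    if il1_block < ifq_words * WORD_BYTES:
        problems.append("il1 block size is smaller than the ifq size")
    if dl1_block != il1_block:
        problems.append("dl1 block size differs from il1 block size")
    if ul2_block < 2 * il1_block:
        problems.append("ul2 block size is less than twice the il1 block size")
    if ul2_block > 128:
        problems.append("ul2 block size exceeds 128B")
    if ul2_size < 2 * (il1_size + dl1_size):
        problems.append("ul2 is less than twice as large as il1+dl1")
    if not 32 * KB <= ul2_size <= 1024 * KB:
        problems.append("ul2 size is outside the 32 KB to 1 MB range")
    return problems

print(validate_caches(1, 8, 8, 16, 8 * KB, 8 * KB, 32 * KB))    # valid: []
print(validate_caches(4, 16, 16, 256, 8 * KB, 8 * KB, 32 * KB))
```

Returning the list of violations rather than a single boolean makes it easier to see which rule rejected a proposed design point.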
8.4. Miscellaneous Constraints
Appendices
In each cell, specify the parameter value followed by why this value guided the DSE closer to your optimization goal (for example: more ALUs allow extraction of more ILP and increase performance). Make sure the parameters are in the exact order in which they appear in runprojectsuite.sh.
| Parameter | Performance | EDP |
|---|---|---|
| Param1 (e.g. width) | Value = Why = | Value = Why = |
| Param2 (e.g. scheduling) | Value = Why = | Value = Why = |
| ... | ... | ... |
| ParamN | Value = Why = | Value = Why = |
D. Bar chart showing per-benchmark normalized energy-delay product and geomean normalized energy-delay product for the most energy-efficient design found
These four plots must be labelled in your report corresponding exactly to the numbering in the list above. Furthermore, the axes in the plots should be properly labelled.
For clarity in the written report, when listing the best design points, please do not represent them in terms of their index representations (e.g. 1 0 0 5 2 ...) and instead describe the actual value used for each dimension in a table or similar presentation.
Points will also be assigned for following the guidelines and adhering to appropriate levels of clarity, and style (and spelling, grammar, etc.) for a technical document.
Line 2 headers: bestTimeconfig, normalized EDP of bestTimeconfig, normalized Execution time of bestTimeconfig, absolute EDP of bestTimeconfig, absolute Execution time of bestTimeconfig, absolute Time of Bench 0 on bestTimeconfig, normalized Time of Bench 0 on bestTimeconfig, absolute Time of Bench 1 on bestTimeconfig, normalized Time of Bench 1 on bestTimeconfig, absolute Time of Bench 2 on bestTimeconfig, normalized Time of Bench 2 on bestTimeconfig, absolute Time of Bench 3 on bestTimeconfig, normalized Time of Bench 3 on bestTimeconfig, absolute Time of Bench 4 on bestTimeconfig, normalized Time of Bench 4 on bestTimeconfig