DSCC 201/401 Homework Assignment #2
Due: February 7, 2024 at 9 a.m. EST
Answers to these questions should be submitted via Blackboard. Questions 1-6 should be answered by all students (DSCC 201 and 401) and Question 7 should be answered by students registered in DSCC 401. Please upload a file containing your answers and explanations to Blackboard (Homework #2: Hardware for Data Science) as a Word document (docx), text file (txt), or PDF.
Imagine that you currently lead the data science department at an education consulting and testing firm that has been assigned to produce customized certification exams for the aviation industry. Your team has been given thousands of documents and other useful structured and unstructured data that you will use to build a large language model for generating the exam questions and answers. Because the project needs high performance computing resources, you would like to purchase infrastructure for it. You expect to receive new data monthly and would like to develop consistent workflows to generate pools of examination questions and other study materials for the future. Based on the project description and requirements, you recommend purchasing hardware for a Linux cluster that also has the capability of GPU acceleration. After some careful analysis, you and your team have decided to purchase 6 compute nodes that each contain 2 CPUs, 512 GB of RAM, 1 HDR InfiniBand card, and 4 Nvidia Hopper H100 GPUs, along with peripheral equipment. The quote has been uploaded to the Blackboard site and is available under the instructions for this homework assignment (HPC Quote.pdf). Please provide thorough and thoughtful explanations to the following questions.
Question 1: How many teraFLOPS of theoretical computing capacity will be provided by a Linux cluster with the 6 compute nodes as described in the quote? Provide a double-precision floating point (FP64) value. Please show your reasoning and how you derive your values. (Hint: The Intel Xeon Gold 6338 uses the Ice Lake microarchitecture.)
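As a reminder of the general form of this calculation, a theoretical peak is conventionally estimated as the product of the number of processing units, the cores per unit, the clock rate, and the floating point operations each core can complete per cycle. The Python sketch below illustrates only that arithmetic. The function name, parameter names, and example numbers are hypothetical placeholders and are deliberately not taken from the quote; the actual per-cycle FP64 throughput must be determined from the processor's microarchitecture.

```python
def peak_tflops(units, cores_per_unit, clock_ghz, flops_per_cycle):
    """Return a theoretical peak in teraFLOPS.

    units           -- number of identical processors (hypothetical parameter)
    cores_per_unit  -- cores in each processor
    clock_ghz       -- clock rate in GHz (1 GHz = 1e9 cycles per second)
    flops_per_cycle -- floating point operations per core per cycle
    """
    # cores * GHz * FLOPs/cycle gives gigaFLOPS; divide by 1000 for teraFLOPS.
    gflops = units * cores_per_unit * clock_ghz * flops_per_cycle
    return gflops / 1000.0


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT the values in the quote.
    example = peak_tflops(units=2, cores_per_unit=10, clock_ghz=2.0, flops_per_cycle=16)
    print(f"Example peak: {example:.2f} teraFLOPS")
```

The same function can be applied separately to the CPUs and to the GPUs (or a vendor-published GPU peak can be used directly), with the results summed across all nodes.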
Question 2: If the amount of memory was doubled from 512 GB to 1,024 GB for each of the compute nodes, would the theoretical computing capacity as calculated in Question 1 change? If so, what is the new calculated value? If not, then why not?
Question 3: After further review, your team decides that it may be worthwhile to downgrade the model of GPU card in each compute node. What would be the total computing capacity of the cluster (in teraFLOPS) if each of the Nvidia H100 GPUs were replaced with an Nvidia L40 GPU? Provide a value based on double-precision floating point calculations (FP64). What is the total amount of computing capacity provided by the H100 GPUs if single precision (FP32) is considered? How does this compare to the computing capacity (FP32) for the L40 GPUs? Please provide explicit numerical comparisons.
Question 4: If your team purchases these compute nodes, will this equipment create a complete Linux cluster solution? Does the system as presented in the quote have everything needed to connect and work together? What are the additional components that are included in the quote? Is there any additional hardware or software needed to make these servers into a complete Linux cluster? Please be detailed and specific in your answers.
Question 5: Recently, AMD announced a new Instinct MI300X GPU that provides additional acceleration for computational tasks over the Instinct MI250X GPU. Currently, computational acceleration for Frontier, the fastest supercomputer in the world, is provided by MI250X GPUs. If these GPUs were replaced by AMD’s new MI300X GPUs, what would be the estimated Rpeak value of the entire system? Explain how you derived your answer and state any assumptions you have made.
Question 6: According to the press release provided by Nvidia for the Eos supercomputer, the system is expected to have a performance of 18.4 exaFLOPS. A copy of this press release (press_release.pdf) has been uploaded to the Blackboard site and is available under the instructions for this homework assignment. Based on the data in the TOP500 list that we discussed in class, will this new Eos supercomputer surpass Frontier as the world’s fastest supercomputer and take the first position on the TOP500 list? Why or why not? Please make sure to explain your reasoning in detail.
Question 7: (DSCC 401 ONLY): Read the paper, "Benchmarking TPU, GPU, and CPU Platforms for Deep Learning," by Wang, Wei, and Brooks. A copy of this paper has been uploaded to the Blackboard site and is available under the instructions for this homework assignment (Wang_Wei_Brooks.pdf). Provide detailed answers to the following questions:
A. What is ParaDnn? How does it compare to LINPACK?
B. The authors of the paper had access to Google's TPU v3. What is the performance of this accelerator as mentioned in the paper? How does the theoretical performance of Google's TPU v3 compare with the performance of Nvidia’s H100 GPU?
C. What is NVLink? Did the authors measure the performance on multi-GPU systems that use PCIe or NVLink in this study? If so, how many GPUs did they use concurrently? If not, what was the explanation?