CS 521 — Machine Learning and Compilers
Spring Semester 2025
Goal
Downloading and building XLA
Download and check out the specific XLA version mentioned below. Note that this is a stable version that we will use for development.
- Install clang-17 and bazel-6.5.0.
- git clone https://github.com/openxla/xla.git
- cd xla; git checkout 759161517
- ./configure.py --backend=[CPU|CUDA] --clang_path=path/to/clang
- Build the run_hlo_module tool:
  bazel build --test_output=all --spawn_strategy=sandboxed //xla/tools:run_hlo_module
- Build and run the algebraic simplifier unit tests:
  bazel build --test_output=all --spawn_strategy=sandboxed //xla/hlo/transforms/simplifiers:algebraic_simplifier
  bazel test //xla/hlo/transforms/simplifiers:algebraic_simplifier_test
run_hlo_module is similar to the opt tool in LLVM. It lets you run arbitrary HLO computational graphs with different compiler optimization passes enabled. Moreover, it creates dummy tensor inputs and automatically checks that the computation is still correct after the transformations. For this MP, you only need this tool for the final part, for which we will give you HLO text (similar to LLVM IR bitcode) that can be fed to the tool.
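For reference, HLO text is a human-readable module format. The fragment below is a hypothetical minimal module (not one of the assignment inputs) of the kind run_hlo_module can consume; it adds an implicitly broadcast scalar to a rank-1 tensor:

```
HloModule add_scalar_example

ENTRY main {
  x = f32[4] parameter(0)
  y = f32[] parameter(1)
  b = f32[4] broadcast(y), dimensions={}
  ROOT sum = f32[4] add(x, b)
}
```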
Developing and using XLA
- XLA Algebraic Simplifier: xla/hlo/transforms/simplifiers/algebraic_simplifier.cc
- Unit tests for the pass: xla/hlo/transforms/simplifiers/algebraic_simplifier_test.cc
algebraic_simplifier.cc contains the tensor graph rewrite rules used by the XLA compiler. Rewrites are grouped into handler methods based on their base operator (e.g., HandleAdd()). For reference, consider the first rewrite in this file.
The print statement (VLOG) summarizes the rule: it permutes the addition of a scalar to a tensor. The important constructs to note are Match, m::, and ReplaceWithNewInstruction. These have their usual meanings: Match is a pattern matcher over the subgraph specified using m:: constructs, and XLA uses a builder API similar to LLVM's to create and delete instructions in the tensor computational graph. algebraic_simplifier_test.cc contains a test for each individual rewrite rule, which checks that the rewrite is matched and the transformation happens. Let's look at a test here.
This code has a few components. First, we create a tensor computational graph; that is done in the first few lines. Next, we run this graph through the simplifier (which contains many hundreds of rules) using the Simplifier.Run command. Then we check that the graph has been transformed; in this case, the graph should reduce to a single node, A.
Pay close attention to the EXPECT_EQ and ASSERT_TRUE macros. These come from GoogleTest: a test passes if its conditions are satisfied and fails otherwise. Familiarize yourself with the GoogleTest infrastructure to write more informative tests (for those coming from the Java world, it is quite similar to JUnit).
Tips for development:
- Familiarize yourself with the HLO manipulation APIs. The HLO documentation is a good place to start: https://openxla.org/xla.
- HLO operator semantics can be found at https://openxla.org/xla/operation_semantics. If you need more formal semantics, please read the TensorRight [Arora et al.(2025)] paper (only read it if you are familiar with formal definitions of semantics).
- Read existing algebraic simplifier rules to see how the pattern matching and replacement APIs work.
Handin
- the two files mentioned above
- a project report with answers to the specific questions
in a zip file. More instructions on where to submit will follow. We will run the unit tests using the above code, plus our own tests, to check the correctness of your code.
Part 1: Warmup Exercise (20 pts)
First, we will implement the following rewrite rule inside XLA's algebraic simplifier; ew denotes an element-wise operator. XLA supports three types of element-wise operators (binary, unary, comparison). Please read the operational semantics of XLA operators for the exact details (links given above).
Here, x and z are tensors of any rank; however, they must have the same rank and the same tensor sizes (how are these asserted? Hint: read the operator semantics). y is a scalar that is implicitly broadcast. The rule simply uses the distributive property of addition.
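As a quick sanity check of the algebra (this is plain Python, not XLA code), the sketch below assumes the rule has the shape ew(x, y) + ew(z, y) ⇒ ew(x + z, y), with ew instantiated as element-wise multiply; that shape and operator choice are assumptions, since the rule itself is stated in a figure:

```python
# Sanity check of a distributive rewrite, with ew assumed to be
# element-wise multiply: (x * y) + (z * y)  ==>  (x + z) * y,
# where y is a scalar implicitly broadcast over the tensor shape.
# Tensors are modeled as flat Python lists of the same size.

def lhs(x, z, y):
    # Two element-wise multiplies followed by an element-wise add.
    return [xi * y + zi * y for xi, zi in zip(x, z)]

def rhs(x, z, y):
    # One element-wise add followed by a single multiply.
    return [(xi + zi) * y for xi, zi in zip(x, z)]

x = [1.0, 2.0, 3.0]
z = [4.0, 5.0, 6.0]
y = 0.5

assert lhs(x, z, y) == rhs(x, z, y)  # -> [2.5, 3.5, 4.5]
```

Note that the rewrite also reduces the operation count (two multiplies and one add become one add and one multiply), which is why it is worth doing at all.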
Tasks:
1. Implement this rule in XLA’s algebraic simplifier. First, determine where to implement this rule (which handle function) and then add the code for matching and replacement inside that function.
Part 2: More Complicated Rewrite Rules (80 pts)
Consider the following rewrite rules. Please read the relevant papers (also covered in class) to figure out the semantics of the operators.
- Without graph rewriting: (A ⊙ ReduceSum(B)) ⊙ (ReduceSum(B) ⊙ C)
- With graph rewriting: A ⊙ Square(ReduceSum(B)) ⊙ C
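To see why this rewrite is sound, the sketch below checks the identity numerically, assuming ⊙ is element-wise multiply over same-shaped tensors (the handout leaves the operator semantics to the referenced papers, so that reading is an assumption):

```python
# Numeric check: (A ⊙ ReduceSum(B)) ⊙ (ReduceSum(B) ⊙ C)
# equals A ⊙ Square(ReduceSum(B)) ⊙ C when ⊙ is element-wise
# multiply and ReduceSum reduces B to a scalar broadcast over A and C.

def reduce_sum(b):
    return sum(b)

def without_rewrite(a, b, c):
    s = reduce_sum(b)
    left = [ai * s for ai in a]        # A ⊙ ReduceSum(B)
    right = [s * ci for ci in c]       # ReduceSum(B) ⊙ C
    return [l * r for l, r in zip(left, right)]

def with_rewrite(a, b, c):
    sq = reduce_sum(b) ** 2            # Square(ReduceSum(B))
    return [ai * sq * ci for ai, ci in zip(a, c)]

a = [1.0, 2.0]
b = [3.0, 4.0]
c = [5.0, 6.0]
assert without_rewrite(a, b, c) == with_rewrite(a, b, c)
```

Note that with arbitrary floating-point data the two sides can differ by rounding, since the rewrite reassociates the multiplies; exact equality holds here only because the inputs are small integer-valued floats. It also shows the payoff: the rewritten graph computes ReduceSum(B) once instead of twice.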
Tasks:
For each rewrite rule, do the following.
- Express these rewrite rules using XLA HLOs present in the XLA operation semantics page (Hint: XLA operators may be too general, you may need to add preconditions to the rule). In the report, you should write the converted rewrites in the form LHS ⇒pre RHS, where pre is the precondition. LHS, RHS and pre should have XLA HLOs and their attributes (if needed).
- Implement this rule in XLA’s algebraic simplifier. Follow guidance from the warmup exercise.
- Write a unit test to check if this algebraic simplification rule is applied correctly. Follow guidance from the warmup exercise.
Part 3: Understanding Complex Rewrites (extra credit: 20 pts)
- Convert the reshape decomposition pass (https://github.com/openxla/xla/blob/main/xla/hlo/transforms/expanders/reshape_decomposer.cc) into a rewrite rule and implement it inside the algebraic simplifier. Write unit tests to validate this rule.
- Are there any preconditions on the rule? What assumptions of the hardware does this rule consider? Include your response in the report.
- Using your answers above, explain this rule and argue for its validity. Include your response in the report.
References
[Arora et al.(2025)] Jai Arora, Sirui Lu, Devansh Jain, Tianfan Xu, Farzin Houshmand, Phitchaya Mangpo Phothilimthana, Mohsen Lesani, Praveen Narayanan, Karthik Srinivasa Murthy, Rastislav Bodik, Amit Sabne, and Charith Mendis. 2025. TensorRight: Automated Verification of Tensor Graph Rewrites. Proc. ACM Program. Lang. 9, POPL, Article 29 (Jan. 2025), 32 pages. https://doi.org/10.1145/3704865
[Jia et al.(2019)] Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia, and Alex Aiken. 2019. TASO: optimizing deep learning computation with automatic generation of graph substitutions. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (Huntsville, Ontario, Canada) (SOSP '19). Association for Computing Machinery, New York, NY, USA, 47–62. https://doi.org/10.1145/3341301.3359630
[Niu et al.(2021)] Wei Niu, Jiexiong Guan, Yanzhi Wang, Gagan Agrawal, and Bin Ren. 2021. DNNFusion: accelerating deep neural networks execution with advanced operator fusion. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (Virtual, Canada) (PLDI 2021). Association for Computing Machinery, New York, NY, USA, 883–898.