CS 521 — Machine Learning and Compilers
Spring Semester 2025
Goal
Downloading and building XLA
Download and check out the specific XLA version mentioned below. Note that this is a stable version that we will use for development.
- Install clang-17 and bazel-6.5.0.
- git clone https://github.com/openxla/xla.git
- cd xla; git checkout 759161517
- ./configure.py --backend=[CPU|CUDA] --clang_path=path/to/clang
- Build the run_hlo_module tool:
  bazel build --test_output=all --spawn_strategy=sandboxed //xla/tools:run_hlo_module
- Build and run the algebraic simplifier unit tests:
  bazel build --test_output=all --spawn_strategy=sandboxed //xla/hlo/transforms/simplifiers:algebraic_simplifier
  bazel test //xla/hlo/transforms/simplifiers:algebraic_simplifier_test
run_hlo_module is similar to the opt tool in LLVM. It lets you run arbitrary HLO computational graphs with different compiler optimization passes enabled. Moreover, it creates dummy tensor inputs and automatically checks that the computation is still correct after the transformations. For this MP, you only need this tool for the final part, for which we will give you HLO text (similar to LLVM IR bitcode) that can be fed to the tool.
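For reference, HLO text is a human-readable module format. The fragment below is a hypothetical minimal module (not one of the assignment inputs) of the kind run_hlo_module can consume; it adds an implicitly broadcast scalar to a rank-1 tensor:

```
HloModule add_scalar_example

ENTRY main {
  x = f32[4] parameter(0)
  y = f32[] parameter(1)
  b = f32[4] broadcast(y), dimensions={}
  ROOT sum = f32[4] add(x, b)
}
```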
Developing and using XLA
- XLA Algebraic Simplifier: xla/hlo/transforms/simplifiers/algebraic_simplifier.cc
- Unit tests for the pass: xla/hlo/transforms/simplifiers/algebraic_simplifier_test.cc
algebraic_simplifier.cc contains the tensor graph rewrite rules used by the XLA compiler. Rewrites are grouped into handler methods based on their base operator (e.g., HandleAdd()). For reference, consider the first rewrite in this file.
The print statement (VLOG) summarizes the rule: it permutes the addition of a scalar to a tensor. The important constructs to note are Match, m::, and ReplaceWithNewInstruction. These have their usual meanings: Match is a pattern matcher over the subgraph specified using m:: constructs, and XLA uses a builder API similar to LLVM's to create and delete instructions in the tensor computational graph. algebraic_simplifier_test.cc contains a test for each individual rewrite rule, which checks that the rewrite is matched and the transformation happens. Let's look at a test here.
This code has a few components. First, we create a tensor computational graph; that is done in the first few lines. Next, we run this graph through the simplifier (which contains many hundreds of rules) using the Simplifier.Run command. Then we check that the graph has been transformed; in this case, the graph should reduce to a single node, A.
Pay close attention to the EXPECT_EQ and ASSERT_TRUE macros. These come from GoogleTest: a test passes if its conditions are satisfied and fails otherwise. Familiarize yourself with the GoogleTest infrastructure to write more informative tests (for those coming from the Java world, it is quite similar to JUnit).
Tips for development:
- Familiarize yourself with the HLO manipulation APIs. The HLO documentation is a good place to start: https://openxla.org/xla.
- HLO operator semantics can be found at https://openxla.org/xla/operation_semantics. If you need more formal semantics, please read the TensorRight [Arora et al.(2025)] paper (only read it if you are familiar with formal definitions of semantics).
- Read existing algebraic simplifier rules to see how the pattern matching and replacement APIs work.
Handin
- the two files mentioned above
- a project report with answers to the specific questions
in a zip file. More instructions on where to submit will follow. We will run the unit tests using the above code, plus our own tests, to check the correctness of your code.
Part 1: Warmup Exercise (20 pts)
First, we will implement the following rewrite rule inside XLA's algebraic simplifier; ew denotes an element-wise operator. XLA supports three types of element-wise operators (binary, unary, comparison). Please read the operational semantics of XLA operators for the exact details (links given above).
Here, x and z are tensors of any rank; however, they must have the same rank and the same tensor sizes (how are these asserted? Hint: read the operator semantics). y is a scalar that is implicitly broadcast. The rule simply uses the distributive property of addition.
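As a quick sanity check of the algebra (this is plain Python, not XLA code), the sketch below assumes the rule has the shape ew(x, y) + ew(z, y) ⇒ ew(x + z, y), with ew instantiated as element-wise multiply; that shape and operator choice are assumptions, since the rule itself is stated in a figure:

```python
# Sanity check of a distributive rewrite, with ew assumed to be
# element-wise multiply: (x * y) + (z * y)  ==>  (x + z) * y,
# where y is a scalar implicitly broadcast over the tensor shape.
# Tensors are modeled as flat Python lists of the same size.

def lhs(x, z, y):
    # Two element-wise multiplies followed by an element-wise add.
    return [xi * y + zi * y for xi, zi in zip(x, z)]

def rhs(x, z, y):
    # One element-wise add followed by a single multiply.
    return [(xi + zi) * y for xi, zi in zip(x, z)]

x = [1.0, 2.0, 3.0]
z = [4.0, 5.0, 6.0]
y = 0.5

assert lhs(x, z, y) == rhs(x, z, y)  # -> [2.5, 3.5, 4.5]
```

Note that the rewrite also reduces the operation count (two multiplies and one add become one add and one multiply), which is why it is worth doing at all.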
Tasks:
1. Implement this rule in XLA’s algebraic simplifier. First, determine where to implement this rule (which handle function) and then add the code for matching and replacement inside that function.
Part 2: More Complicated Rewrite Rules (80 pts)
Consider the following rewrite rules. Please read the relevant papers (also covered in class) to figure out the semantics of the operators.
- Without graph rewriting: (A ⊙ ReduceSum(B)) ⊙ (ReduceSum(B) ⊙ C)
- With graph rewriting: A ⊙ Square(ReduceSum(B)) ⊙ C
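To see why this rewrite is sound, the sketch below checks the identity numerically, assuming ⊙ is element-wise multiply over same-shaped tensors (the handout leaves the operator semantics to the referenced papers, so that reading is an assumption):

```python
# Numeric check: (A ⊙ ReduceSum(B)) ⊙ (ReduceSum(B) ⊙ C)
# equals A ⊙ Square(ReduceSum(B)) ⊙ C when ⊙ is element-wise
# multiply and ReduceSum reduces B to a scalar broadcast over A and C.

def reduce_sum(b):
    return sum(b)

def without_rewrite(a, b, c):
    s = reduce_sum(b)
    left = [ai * s for ai in a]        # A ⊙ ReduceSum(B)
    right = [s * ci for ci in c]       # ReduceSum(B) ⊙ C
    return [l * r for l, r in zip(left, right)]

def with_rewrite(a, b, c):
    sq = reduce_sum(b) ** 2            # Square(ReduceSum(B))
    return [ai * sq * ci for ai, ci in zip(a, c)]

a = [1.0, 2.0]
b = [3.0, 4.0]
c = [5.0, 6.0]
assert without_rewrite(a, b, c) == with_rewrite(a, b, c)
```

Note that with arbitrary floating-point data the two sides can differ by rounding, since the rewrite reassociates the multiplies; exact equality holds here only because the inputs are small integer-valued floats. It also shows the payoff: the rewritten graph computes ReduceSum(B) once instead of twice.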
Tasks:
For each rewrite rule, do the following.
- Express these rewrite rules using XLA HLOs present in the XLA operation semantics page (Hint: XLA operators may be too general, you may need to add preconditions to the rule). In the report, you should write the converted rewrites in the form LHS ⇒pre RHS, where pre is the precondition. LHS, RHS and pre should have XLA HLOs and their attributes (if needed).
- Implement this rule in XLA’s algebraic simplifier. Follow guidance from the warmup exercise.
- Write a unit test to check if this algebraic simplification rule is applied correctly. Follow guidance from the warmup exercise.
Part 3: Understanding Complex Rewrites (extra credit: 20 pts)
- Convert the reshape decomposition pass (https://github.com/openxla/xla/blob/main/xla/hlo/transforms/expanders/reshape_decomposer.cc) into a rewrite rule and implement it inside the algebraic simplifier. Write unit tests to validate this rule.
- Are there any preconditions on the rule? What assumptions of the hardware does this rule consider? Include your response in the report.
- Using your answers above, explain this rule and argue for its validity. Include your response in the report.
References
[Arora et al.(2025)] Jai Arora, Sirui Lu, Devansh Jain, Tianfan Xu, Farzin Houshmand, Phitchaya Mangpo Phothilimthana, Mohsen Lesani, Praveen Narayanan, Karthik Srinivasa Murthy, Rastislav Bodik, Amit Sabne, and Charith Mendis. 2025. TensorRight: Automated Verification of Tensor Graph Rewrites. Proc. ACM Program. Lang. 9, POPL, Article 29 (Jan. 2025), 32 pages. https://doi.org/10.1145/3704865
[Jia et al.(2019)] Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia, and Alex Aiken. 2019. TASO: optimizing deep learning computation with automatic generation of graph substitutions. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (Huntsville, Ontario, Canada) (SOSP '19). Association for Computing Machinery, New York, NY, USA, 47–62. https://doi.org/10.1145/3341301.3359630
[Niu et al.(2021)] Wei Niu, Jiexiong Guan, Yanzhi Wang, Gagan Agrawal, and Bin Ren. 2021. DNNFusion: accelerating deep neural networks execution with advanced operator fusion. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (Virtual, Canada) (PLDI 2021). Association for Computing Machinery, New York, NY, USA, 883–898.