Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due
Assignment 2 Distributed Graph Analysis, Feature Engineering, and Graph Neural Networks
Important updates for the assignment will be listed here.
Summary
| Submission | Submit an electronic copy of all answers on Moodle (only the last submission will be used). |
| Required Files | A .pdf file is required. The file name should be ass2_zID.pdf |
| Deadline | 9 pm Friday 8 August (Sydney Time) |
| Marks | 20 marks (10% toward your total marks for this course) |
Late penalty. 5% of max marks will be deducted for each additional day (24 hours) after the specified submission time and date. No submission is accepted 5 days (120 hours) after the deadline.
Q1 (3 marks)
1.1 Are the graphs in Figure 1 and Figure 2 homomorphic? If so, demonstrate a matching instance. (1 mark)
1.2 Present all unique subgraphs in Figure 1 that are isomorphic to the graph in Figure 3. For example, {�1:�,�2:�,�3:�}, {�1:�,�2:�,�3:�}, and {�1:�,�2:�,�3:�} are all considered as the same subgraph abc. (2 marks)
Marking for Q1.1: 1 mark is given for the correct answer. 0 mark is given for all other cases.
Marking for Q1.2: 2 marks are given if the result subgraphs are correct, complete, and not redundant. Extra subgraphs and missing subgraphs will result in a loss of marks.
Q2 (5 marks)
2.1 Given the graph in Figure 4, compute the betweenness centrality and closeness centrality of nodes F and G. Round each value to two decimal places (e.g., 3.333 becomes 3.33; 3.7571 becomes 3.76). (2 marks)
2.2 Given the graphs G and G' in Figure 5, compute the Weisfeiler-Lehman (WL) kernel as follows:
- Compute the color count vector for each graph after color refinement converges.
- Compute the WL kernel value between G and G'.
Marking for Q2.1: 1 mark for correct betweenness centrality values; 1 mark for correct closeness centrality values.
Marking for Q2.2: 1 mark for each correct color count vector; 1 mark for the correct WL kernel value.
Q3 (8 marks)
Suppose we aim to count the number of triangles that each vertex belongs to in an undirected graph using Pregel, as shown in Figure 6. The simple implementation below works but may leave room for improvement. Your task is to analyze the code and, if appropriate, optimise it.
Note: self.tri_local stores the number of triangles the vertex is in. self.neighbors is an unordered list of neighbor vertex IDs.
3.1 Code optimisation (4 marks)
Examine the pseudo-code snippet above. What is the total CPU computation time complexity of the algorithm? What is the network cost complexity? (Network cost accounts for all messages and their payloads, not just the number of messages.) Can this algorithm be optimized (e.g., reduced CPU computation cost, lower network cost, earlier termination, etc.)?
- If your answer is yes, briefly describe the improvement and provide the key lines (pseudo-code is sufficient) that you would change / add. State the new time and network cost complexity.
- If your answer is no, give a short technical reason why the current design is already asymptotically optimal under the Pregel model.
3.2 Execution trace (2 marks)
Assume three Pregel workers with the following vertex-to-worker assignment:
• Vertices 1 and 5 → worker X
• Vertices 2 and 4 → worker Y
• Vertices 3 and 6 → worker Z
For each superstep produced by your algorithm (optimised or not), list
(i) the active vertices and
(ii) for every vertex, the messages sent/received (both the total count and the concrete payload).
3.3 Combiner feasibility (2 marks)
Can a Pregel combiner be applied to the triangle-counting algorithm?
- If yes, provide combiner pseudo-code and recalculate the number of messages per superstep under the vertex-to-worker assignment in 3.2.
- If no, give a concise argument why the semantics of this problem preclude a combiner.
Q3.1 (4 marks) 4 marks for a sound optimisation proposal with correct complexity analysis, or a convincing proof of optimality.
Q3.2 (2 marks) 2 marks for the exact set of active vertices and precise message traffic for every superstep.
Q3.3 (2 marks) 2 marks for either a valid combiner design with accurate message counts, or a clear explanation of why a combiner cannot be used.
Q4 (4 marks)
Given an undirected graph as shown in Figure 7, we aim to compute the output �1 of the first graph convolutional layer with self-loops using the Graph Attention Network (GAT) model. The goal is to transform the initial node embeddings from a dimension of 4 to a dimension of 5 through this layer. The equation can be written as:
ℎ�(�)=�(∑�∈�(�)∪{�}����(�)ℎ�(�−1)),
where ℎ�� indicates the ��-dimensional embedding of node � in layer �, and ��=[ℎ�1�,ℎ�2�,ℎ�3�,ℎ�4�,ℎ�5�,ℎ�6�]�. ���=1|�(�)∪{�}| is the weighting factor of node �'s message to node �. �(�)∈���∗��−1 denotes the weight matrix for the neighbours of � in layer �, �� denotes the dimension of the node embedding in layer �. �(⋅) denotes the ���� non-linear function. The initial embedding for all nodes is stacked in �0. �1 is the weight matrices. Self-loops are included in the calculation to ensure that the node's information is retained. Therefore, the term � is added to its set of neighbors, which can be expressed as {�}∪�(�). Round each value to two decimal places (e.g., 3.333 becomes 3.33; 3.7571 becomes 3.76).
| �0=(0.800.90−0.700.700.50−0.800.30−0.40−0.900.400.50−0.100.800.70−0.80−0.600.50−0.90−0.20−0.400.70−0.200.40−0.90) | �1=(11010110101101001011) |
Marking for Q4: 4 marks are given for the correct result. Incorrect values will result in a deduction of marks. A detailed and correct description of the calculation can earn marks for a valid attempt, even if the final result has major mistakes.