Assignment 2: Python Data Visualization and Dimensionality Reduction

Overview

The purpose of this assignment is to give you further hands-on practice with Python data visualization and PCA. Unlike other assignments, you must use Python and the provided Google Colab notebook for completing this assignment.
The assignment builds on the in-class activity for September 25. You will start by working through the Colab notebook in class, which provides a guided overview of matplotlib for visualization and scikit-learn for PCA (building on the external tutorials you completed on September 11). After completing this work, you will then use the end of the notebook to write code for a full PCA of an unpublished bioinformatics dataset.
At the end of the assignment, you have the option of doing a bonus activity involving UMAP. Successful completion of the bonus activity will earn you an additional 15 points. The bonus activity is optional, and you can still earn 100% on Assignment 2 without attempting it.
Detailed instructions are provided in the Colab notebook. You should follow those instructions when working on the in-class activity and the assignment.

Dataset

For part of the assignment, you will work with a real bioinformatics dataset about gene expression in endometrial cancer. This dataset is unpublished, and you will not find any reference to it online. It contains measurements of 50 genes related to fatty acid metabolism in 210 control or endometrial cancer samples (the exact gene names have been redacted for confidentiality reasons). Types of endometrial cancer represented include G1/G2/G3 endometrioid carcinoma, serous endometrial carcinoma, and clear cell endometrial carcinoma.
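
As a rough illustration of the shape of this kind of analysis, the sketch below runs PCA with scikit-learn on a synthetic matrix of the same dimensions (210 samples by 50 genes). The random data, the variable names, and the decision to standardize each gene before PCA are assumptions made for illustration only. They are not taken from the Colab notebook, which remains the authoritative source of instructions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the expression matrix: 210 samples x 50 genes.
# (Assumption: the real data is loaded differently in the notebook.)
rng = np.random.default_rng(0)
expression = rng.normal(size=(210, 50))

# Standardize each gene to zero mean and unit variance
# (an assumed preprocessing step, not a stated requirement).
scaled = StandardScaler().fit_transform(expression)

# Project the samples onto the first two principal components.
pca = PCA(n_components=2)
scores = pca.fit_transform(scaled)

print(scores.shape)                    # rows = samples, columns = components
print(pca.explained_variance_ratio_)   # fraction of variance captured by PC1 and PC2
```

Each row of `scores` gives one sample's coordinates on PC1 and PC2, which is what a PCA scatterplot displays.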

Materials to Submit

1) A copy of your modified Colab notebook with all of the “EXERCISES” completed and the full code for your PCA of the endometrial cancer dataset (and UMAP, if doing the optional bonus).

2) Your final PCA scatterplot for the penguins dataset in png format.

3) Your final PCA scatterplot for the endometrial cancer dataset in png format.

4) [OPTIONAL] Your final UMAP scatterplot in png format.

You will lose points if you do not follow the file format requirements!
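
One way to produce a scatterplot in png format is matplotlib's `savefig`, sketched below. The placeholder scores, group labels, and the file name `pca_scatter_example.png` are hypothetical; the notebook may prescribe different names or plot settings. The `Agg` backend line is only there so the sketch runs as a standalone script without a display, and would normally be left out inside Colab.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Colab
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data standing in for real PCA scores and sample groups.
rng = np.random.default_rng(0)
scores = rng.normal(size=(210, 2))
labels = rng.integers(0, 2, size=210)  # 0 or 1, hypothetical group labels

fig, ax = plt.subplots(figsize=(6, 5))
for group, name in [(0, "group 0"), (1, "group 1")]:
    mask = labels == group
    ax.scatter(scores[mask, 0], scores[mask, 1], label=name, alpha=0.7)
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.legend()

# The ".png" extension tells matplotlib to write a PNG file.
out_path = "pca_scatter_example.png"  # hypothetical file name
fig.savefig(out_path, dpi=300, bbox_inches="tight")
plt.close(fig)
```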

Grades will be based on a combination of your performance on the basic exercises (30 points) and on the PCA with the endometrial cancer dataset (70 points).
