Web Science (H) COMPSCI4077
Health Data Analysis
Date: 15th March 2024
Individual Assessment: Health Data Analysis
The coursework is marked out of 100 and is weighted at 20% of the final mark.
The coursework is due on Friday, March 15, 2024, at 4:30 PM (subject to LTC approval).
All submissions are through Moodle. Penalties will be applied for late submission.
Links to data from the SuicideWatch subreddit are given. Crawl the data and carry out a network-based analysis.
Your task is to develop a network analysis of this data set. I recommend that you use a Jupyter notebook and submit the code and outputs as an archive.
(i) Use the data to create graphs and visualisations. [25]
In the report, explain how you organised the data.
• Data preparation & statistics (10 marks)
Discuss the data pre-processing and data preparation, and justify your choices.
Discuss the data statistics: summarise the data here, as this description could be useful for later elaborations.
• Create a global interaction graph. Explain the start node and end nodes (5 marks)
• Visualisation of the network (10 marks)
Gephi can be used to build the visualisation. Marks breakdown for the visualisation:
Visualisation (5 marks)
Description and interpretation (5 marks)
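As an illustration of the interaction graph and its start and end nodes, the following is a minimal sketch only, not a required approach. It assumes the crawled data has already been reduced to hypothetical (replying author, author replied to) pairs; the names `replies`, `build_edges` and `edges_to_csv` are invented for this example. The replying author is treated as the start node and the author being replied to as the end node. The "[deleted]" filter is likewise an assumption about how removed accounts appear in the crawl. The output uses Source, Target and Weight column headers, which Gephi's spreadsheet import can read as an edge list.

```python
import csv
import io
from collections import Counter

# Hypothetical records: (author of the reply, author being replied to).
# Real crawled data will need its own extraction step to reach this shape.
replies = [
    ("user_a", "user_b"),
    ("user_c", "user_b"),
    ("user_a", "user_b"),
    ("user_b", "user_a"),
    ("[deleted]", "user_a"),  # assumed marker for a removed account
]


def build_edges(records, skip=("[deleted]", None)):
    """Count replies per directed (start, end) pair.

    Pairs with an unusable author and self-replies are dropped.
    """
    edges = Counter()
    for start, end in records:
        if start in skip or end in skip or start == end:
            continue
        edges[(start, end)] += 1
    return edges


def edges_to_csv(edges):
    """Serialise the weighted edge list as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (start, end), weight in sorted(edges.items()):
        writer.writerow([start, end, weight])
    return buf.getvalue()


edges = build_edges(replies)
csv_text = edges_to_csv(edges)
print(csv_text)
```

Writing `csv_text` to a file gives an edge list that can be loaded for visualisation. Whether edges should point from replier to original author or the other way round is a design choice to justify in the report.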
(ii) Carry out a graph/network analysis and understand the important properties of the graph. [45]
The analysis should be based on Lecture 6 (week starting 29/1/2023).
Identify the methodology you will use to analyse the data from a network perspective.
For each metric: description, pseudo-code and interpretation (15 marks)
Minimum of 3 metrics/approaches.
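To show what a metric with its description and interpretation might look like, here is a small sketch under stated assumptions. It is not a prescribed set of metrics. The three measures (in-degree centrality, density and reciprocity) are chosen only as examples, and the toy `edges` dictionary and function names are hypothetical. Each is computed directly from a directed edge list so that the definition stays visible.

```python
from collections import defaultdict

# Hypothetical directed, weighted edge list: (start, end) -> number of replies.
edges = {
    ("user_a", "user_b"): 2,
    ("user_b", "user_a"): 1,
    ("user_c", "user_b"): 1,
}


def node_set(edges):
    """Return every node that appears as a start or end node."""
    return sorted({node for edge in edges for node in edge})


def in_degree_centrality(edges):
    """Fraction of the other nodes that point at each node."""
    nodes = node_set(edges)
    n = len(nodes)
    counts = defaultdict(int)
    for _start, end in edges:
        counts[end] += 1
    if n < 2:
        return {node: 0.0 for node in nodes}
    return {node: counts[node] / (n - 1) for node in nodes}


def density(edges):
    """Existing directed edges divided by the n * (n - 1) possible ones."""
    n = len(node_set(edges))
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def reciprocity(edges):
    """Share of edges whose reverse edge is also present."""
    if not edges:
        return 0.0
    mutual = sum(1 for (start, end) in edges if (end, start) in edges)
    return mutual / len(edges)


centrality = in_degree_centrality(edges)
graph_density = density(edges)
graph_reciprocity = reciprocity(edges)
print(centrality, graph_density, graph_reciprocity)
```

In this toy graph, `user_b` receives replies from both other users and so has the highest in-degree centrality. An interpretation in the report would relate such values back to how users interact in the data.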
(iii) [Open creativity tasks]
Students are encouraged to explore further and describe the characteristics of the data and the graph. [20]
(iv) Report quality. [10]
a. Structuring and formatting (3 marks)
b. Articulation of ideas (3 marks)
c. Creativity in addressing the tasks (4 marks)