COMP1012 - Programming Fundamentals and Applications
Project - I
Posted - October 23, 2023 Deadline - November 30, 2023
Choosing your project
As part of your semester in COMP1012, you are expected to complete a fully fledged Python modeling project in a group of 2-3 students. All members of a group should participate in the project development and be responsible for some of the tasks (design, implementation, testing, report writing, etc.). Steps 1 – 3 are mandatory for everyone. In addition, you may choose one application topic (1 – 8) from the list below that interests you and explore it in depth. Topics are allocated on a first-come, first-served basis, and each application topic can be chosen by at most 13 student groups. Once chosen, a topic cannot be changed. To choose an application topic, register your team through this link and finalize the topic by 31st October 2023. Once you have registered your topic and team through the link, you should also send me an email ([email protected]) stating your application topic choice, so that conflicts can be resolved if too many groups register for a single topic. If you fail to register a group and application topic, you will be treated as having chosen to work on the project individually, and an application topic will be assigned to you at random.
By the project deadline (November 30, 2023), you will submit your work in as much detail as possible and present your findings in a LaTeX document. All LaTeX files, function files, datasets and other relevant material should be submitted.
No matter what you choose, the main goal is to design a solution to a question that you can answer using Python techniques and libraries, not just a question whose answer you can look up on the internet. If similarities are detected in the code, libraries used or reports, or if any other evidence of plagiarism is found, 50% of the project points will be deducted, and each member of the group will be graded on the remaining 50% of the points through a viva voce.
Project Timeline
The following is a tentative timeline to help you complete your project successfully.
• Week 1 - Nov 2023: A lightning pitch presentation (< 3 min, 5 slides max; slide 1 for step 1, slide 2 for steps 2 and 3, slides 3 & 4 for the application topic, and slide 5 for additional details)
• Week 4 - Nov 2023: Semester project due. Late submissions will not be considered.
Project requirements
Project Pitching PPT = Weightage 5% – Every group should give a 3-minute pitch during Week 1 of November 2023 to ensure that groups are not using the same concepts, libraries, etc. to solve the project. A PPT of at most 5 slides (excluding the cover slide) should be prepared and uploaded through this link on or before November 01, 2023. Projects that are similar to one another will not receive full points, and copying ideas from the pitching round will still cost the copier points.
Project Report = Weightage 40% – The report should be written as an IEEE LaTeX document that contains 1) a cover page with the title and the information of each team member, 2) introduction, 3) related work, 4) libraries/functions used and the design of the analytics program, 5) overview of the dataset, 6) algorithm with explanation, 7) pseudo code with explanation, 8) flowchart with explanation, 9) the analytics goal or hypothesis in each task, the corresponding analytics results, and meaningful simulation results that clearly convey your algorithm/code, 10) summary of findings and conclusions, 11) source code with clear instructions on how to execute your code, 12) USP/innovative work in your project, 13) references (optional), and 14) tasks performed by each member. There is no page limit, but avoid copying and pasting from the internet, as plagiarism will lead to a deduction of points.
Project execution = Weightage 55% – Your project code should execute successfully on any applicable dataset of images and text articles. Hence, it is strongly recommended that you find a readily available dataset rather than creating your own.
Locating data and models
You’ll need to spend some time figuring out what model is the right model to use or what data are available to answer your question. The internet is your friend for this part. You should be able to find the details you need to compute your model or the data you want to analyze on the web. These are some possible, non-exhaustive resources for locating data:
• Census and Statistics Department
• Data.gov - Hong Kong
• Kaggle
• FiveThirtyEight
• Centers for Disease Control and Prevention
• Food and Agriculture Organization
• UNICEF
• World Health Organization
• World Bank
You may also wish to explore some of the additional resources listed on this page:
https://www.dataquest.io/blog/free-datasets-for-projects/
For the models, you’ll need to do a bit of background research and determine which of the models are the most appropriate for your question.
VERY IMPORTANT NOTE: When you’re finding datasets online you should make sure to record exactly where you found the dataset and cite the source in your final project notebook. Additionally, if you use any code that you find online to complete part of your project you must give credit to the original source code and cite this in your project as well. Any code you use that is found online and not properly cited will be considered plagiarism and violates the academic integrity expected of you in this course.
Project Description
Fatigue is one of the factors leading to reduction in productivity, poor quality of work and increased risk of accidents in construction. Existing established methods of assessing fatigue include surveys and questionnaires, which are cumbersome to implement at construction sites. This project presents a novel approach for real time monitoring of physical fatigue in construction workers.
[Worker’s File Processor – Mandatory Functions] Step 1 – Weightage 10%:
Implement a speech-to-text processor. The processor should be able to process a speech audio file and save the result for every audio file to a new text file.
Read a video file, convert it to audio, and then convert the audio to a text file. The video file can be any real-time video fed to your processor. The processor converts each video file to an audio file and produces one speech (audio)-to-text article in English per video. All characters in the file have valid (extended) ASCII codes in the range [0, 255]. The articles are stored in the format shown below.
• Title: Speech-to-Text of Article of Worker 1
o …The content of the Speech-to-Text Article of Worker 1 …
• Title: Speech-to-Text of Article of Worker 2
o … The content of the Speech-to-Text Article of Worker 2 …
[Splitting Files – Mandatory Functions] Step 2 – Weightage 10%:
Imagine that all worker articles/files (speech-to-text) are stored in a single text file. Save the speech-to-text articles into separate worker files (i.e., one worker article per file). Each file is named after the title of the article, with the spaces replaced by underscores (i.e., ‘_’). Examples are given below. You can consider any data inside these files as “content”.
o Speech-to-Text_of_Article_of_Worker_1.txt
. Title: Speech-to-Text_of_Article_of_Worker_1\n
. … Content …
o Speech-to-Text_of_Article_of_Worker_2.txt
. Title: Speech-to-Text_of_Article_of_Worker_2\n
. … Content …
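One possible way to approach the split is sketched below. It assumes that every article in the combined file begins with a line starting with "Title:", and it keeps each article's text unchanged inside its file (adjust the title line if the underscored form shown above is expected). The function names `split_articles` and `write_articles` are hypothetical, chosen only for this illustration.

```python
from pathlib import Path


def split_articles(combined_text):
    """Return {filename: article_text} for each block that starts with 'Title:'."""
    articles = {}
    current = None
    for line in combined_text.splitlines(keepends=True):
        if line.startswith("Title:"):
            # Build the file name from the title: spaces become underscores.
            title = line[len("Title:"):].strip()
            current = title.replace(" ", "_") + ".txt"
            articles[current] = ""
        if current is not None:
            # Lines before the first title are ignored (assumption).
            articles[current] += line
    return articles


def write_articles(articles, out_dir):
    """Write every article to its own file inside out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for filename, text in articles.items():
        (out_path / filename).write_text(text, encoding="utf-8")
```

Keeping the splitting logic separate from the file writing makes it easy to check the file names and contents before anything is written to disk.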
[Creating a Dictionary - Mandatory Functions] Step 3 - Weightage 10%:
Generate a dictionary of the words used in the articles and save it as dictionary.txt. Each row of the file contains a word followed by its frequency (the number of times that the word appears in the articles). The words should be sorted in descending order of frequency. Note: upper and lower case are not distinguished when identifying a word (e.g., Dogs, dogS and doGs are all counted as the word dogs). Numbers are considered words (e.g., 23, 3.14). An example is shown below.
• hungry\t87345
• fly\t8967
•
• airplane\t975
• tired\t787
•
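A minimal sketch of the word count is given below. The tokenization rule is an assumption (a word is a run of letters or digits that may contain inner apostrophes or decimal points, so that 3.14 stays whole), ties are broken alphabetically for reproducible output, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: letters/digits, optionally joined by an inner "." or "'".
TOKEN = re.compile(r"[a-z0-9]+(?:[.'][a-z0-9]+)*")


def build_dictionary(texts):
    """Return (word, frequency) pairs sorted by descending frequency."""
    counts = Counter()
    for text in texts:
        # Lower-casing first makes Dogs, dogS and doGs the same word.
        counts.update(TOKEN.findall(text.lower()))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_dictionary(entries):
    """Format entries as 'word<TAB>frequency' rows for dictionary.txt."""
    return "".join(f"{word}\t{count}\n" for word, count in entries)
```

The string returned by `format_dictionary` can be written to dictionary.txt with an ordinary `open(..., "w")` call.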
[Helmet Detection] Application 1 - Weightage 25%:
To reduce the risk of head trauma to workers in high-risk workplaces such as construction sites, you should design a Python-based helmet detection technique. Convert the video files collected in step 1 into images and detect whether each construction worker is wearing a helmet. If the worker is wearing one, no action is needed; if not, detect the worker's number plate and send an alert notice to the worker. You should also report the detection accuracy of your model.
You can create your own dataset of images or use an online image dataset of your choice, but you need to specify clearly where you took it from, giving full credit to the user and source.
NOTE: Innovative ideas (not commonly available on Google) and a comparison of your code with traditional algorithms will earn you 10 extra BONUS points.
[Hand Gestures Detection] Application 2 - Weightage 25%:
Construction bots have recently been developed to detect construction workers' gestures and so improve their safety and productivity. One of the critical steps in making the bots work with human workers as teams is to provide a user-friendly interface that supports their mutual interactions on construction sites. From the videos in step 1, extract all the hand gestures (for example, the OK, STOP, ONE, TWO, THREE, FOUR, FIVE, LEFT, RIGHT, UP and DOWN signs). You should present the accuracy and the rates of successful and failed detections in a graph.
You can create your own dataset of images or use an online image dataset of your choice, but you need to specify clearly where you took it from, giving full credit to the user and source. Your code should be applicable to any real-time images/dataset used by anyone and should not be limited to your self-defined dataset.
NOTE: Innovative ideas (not commonly available on Google) and a comparison of your code with traditional algorithms will earn you 10 extra BONUS points.
[Monitoring Inattention by Sentiment Analysis] Application 3 - Weightage 25%:
Physical fatigue is frequent among heavy manual laborers such as construction workers, and it causes distraction that may lead to safety incidents. Sentiment analysis is the automated process of tagging data according to its sentiment, such as positive, negative or neutral. Based on step 3, categorize each word used by the worker as positive, negative or neutral. You should present the accuracy and the rates of successful and failed detections in a graph.
Find all the positive, negative and neutral words in each worker text article, calculate the summation of those words, and print the results on screen. The worker text articles should be shown in ascending order of the summations. Words such as happy, fresh and beautiful are considered positive; words such as tired, lonely and bad are considered negative; and words such as detached and boss are considered neutral. Based on which category of words is encountered most often, your program should conclude whether the worker is under stress (if more negative words are encountered) or normal. For example:
• WorkerTextArticle4\tp8n10u10
• WorkerTextArticle8\tp18n22u5
• WorkerTextArticle2\tp50n6u19
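The tally and ordering can be sketched as follows. The three word sets are hypothetical toy lexicons built only from the example words mentioned above; a real project would load a cited sentiment word list. The sketch also assumes that the "summation" is p + n + u (which agrees with the ordering of the example lines) and that "under stress" means more negative than positive words.

```python
import re

# Hypothetical toy lexicons; replace them with a cited word list.
POSITIVE = {"happy", "fresh", "beautiful"}
NEGATIVE = {"tired", "lonely", "bad"}
NEUTRAL = {"detached", "boss"}


def tally(text):
    """Count positive, negative and neutral words in one article."""
    words = re.findall(r"[a-z']+", text.lower())
    p = sum(w in POSITIVE for w in words)
    n = sum(w in NEGATIVE for w in words)
    u = sum(w in NEUTRAL for w in words)
    return p, n, u


def report(articles):
    """articles: {name: text}. Returns (line, status) pairs, ascending by p + n + u."""
    rows = []
    for name, text in articles.items():
        p, n, u = tally(text)
        # Assumption: stress means negative words outnumber positive ones.
        status = "stress" if n > p else "normal"
        rows.append((p + n + u, name, f"{name}\tp{p}n{n}u{u}", status))
    rows.sort()
    return [(line, status) for _, _, line, status in rows]
```

Each returned line follows the name, tab, p/n/u counts layout of the examples, so it can be printed directly.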
You can create your own dataset of positive, negative and neutral words or use an online dictionary/dataset of your choice, but you need to specify clearly where you took it from, giving full credit to the user and source. Your code should be applicable to any real-time images/dataset used by anyone and should not be limited to your self-defined dataset.
NOTE: Innovative ideas (not commonly available on Google) and a comparison of your code with traditional algorithms will earn you 10 extra BONUS points.
[Behavioral and Vehicle Analytics] Application 4 - Weightage 25%:
Driver behavior statistics. In this application you are required to conduct statistical analytics on driving behavior. The dataset should include, but is not limited to, the datetime, the car plate number, the cumulative number of overspeed and fatigue-driving events, and the total time of overspeed and neutral slide. The statistical results should be well organized and automatically saved to a file (e.g., txt, json, csv) with formatted output.
Driving speed analytics. You are required to use a diagram to plot the driving speed of each driver during the given period, and then discover and compare the characteristics and patterns of driving speed for each driver.
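As a rough illustration of the per-driver statistics, the sketch below aggregates records by car plate and formats the result as CSV text. The column names (`plate`, `speed_kmh`, `is_overspeed`, `is_fatigue`) are hypothetical and must be mapped to whatever your chosen dataset provides.

```python
import csv
import io
from collections import defaultdict

# Output columns of the summary file (hypothetical names).
FIELDS = ["plate", "overspeed_count", "fatigue_count", "mean_speed_kmh"]


def summarize(rows):
    """rows: dicts with keys plate, speed_kmh, is_overspeed, is_fatigue."""
    stats = defaultdict(lambda: {"n": 0, "over": 0, "fatigue": 0, "speed": 0.0})
    for row in rows:
        s = stats[row["plate"]]
        s["n"] += 1
        s["over"] += int(row["is_overspeed"])
        s["fatigue"] += int(row["is_fatigue"])
        s["speed"] += float(row["speed_kmh"])
    return [
        {
            "plate": plate,
            "overspeed_count": s["over"],
            "fatigue_count": s["fatigue"],
            "mean_speed_kmh": round(s["speed"] / s["n"], 2),
        }
        for plate, s in sorted(stats.items())
    ]


def to_csv(summary):
    """Format the summary as CSV text, ready to be written to a file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(summary)
    return buffer.getvalue()
```

The input rows can come from `csv.DictReader`, and the per-driver speed series needed for the plots can be collected in the same loop.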
You can find an online dataset of your choice, but you need to specify clearly where you took it from, giving full credit to the user and source. Your code should be applicable to any real-time images/dataset used by anyone and should not be limited to your self-defined dataset.
NOTE: Innovative ideas (not commonly available on Google) and a comparison of your code with traditional algorithms will earn you 10 extra BONUS points.
[Image Classification for Drowsiness Detection] Application 5 - Weightage 25%:
In this Python project, you will build a drowsiness detection system using the video from step 1. Countless construction workers drive on the highway day and night, and those traveling long distances suffer from lack of sleep, which makes it very dangerous to drive when feeling sleepy. Optionally, you can also detect whether the worker is happy, sad or neutral for comparison.
The objective of this Python project is to build a drowsiness detection system that will detect that a person’s eyes are closed for a few seconds. This system will alert the driver when drowsiness is detected. You should plot your accuracy and loss results, classification report and confusion matrix (if any). The results should be discussed thoroughly if you use classifiers/CNN models.
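For the evaluation part, a confusion matrix and the metrics derived from it can be computed in a few lines of plain Python. This is only a sketch: the label names in the usage note are hypothetical, and libraries such as scikit-learn provide equivalent, more complete reports.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def precision_recall(matrix, i):
    """Precision and recall for the class at index i."""
    tp = matrix[i][i]
    predicted = sum(row[i] for row in matrix)  # column sum
    actual = sum(matrix[i])                    # row sum
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall
```

For example, with the labels `["open", "closed"]` for the eye state, `confusion_matrix(y_true, y_pred, ["open", "closed"])` gives a 2 x 2 table that can be plotted or included in the report.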
NOTE: Innovative ideas (not commonly available on Google) and a comparison of your code with traditional algorithms will earn you 10 extra BONUS points.
[Wall/Road Crack Detection] Application 6 - Weightage 25%:
Currently, the structural condition of a building is still predominantly inspected manually. In simple terms, even nowadays, when a structure needs to be inspected for damage, a construction worker or an engineer manually checks all the surfaces and takes many photos while noting the position of any cracks, which puts the inspector's life at risk. Fortunately, in cases with accessibility issues, UAVs such as drones are now deployed to take the photos, but a person still has to spend hours checking every photo for signs of damage.
You should detect cracks on walls from a series of dataset images. You should also report the accuracy and the classification of the cracks, as well as the length and intensity (depth) of each crack. For example, you may need to apply masking, thinning and distance-transform operations to the original images to calculate the length and diameter of the cracks.
NOTE: Innovative ideas (not commonly available on Google) and a comparison of your code with traditional algorithms will earn you 10 extra BONUS points.
[Object Detection] Application 7 - Weightage 25%:
Although worker detection using computer vision (CV) has been studied for construction site safety, it is still challenging to identify workers who are obstructed or poorly visible. Referring to Step 1, you are going to build image object detection and video object detection. Given an input image dataset, your program should detect the customized objects. You should discuss your results in terms of accuracy and detection rate.
NOTE: Innovative ideas (not commonly available on Google) and a comparison of your code with traditional algorithms will earn you 10 extra BONUS points.
[Risk Management & Perception and Safety Assessment] Application 8 - Weightage 25%:
The fatality rate of construction workers is higher than that of workers in any other sector. To minimize risk and maximize safety, construction companies constantly seek different strategies. The criteria for determining whether a worker is at high risk, or more likely to have an accident, should be based on a logistic regression model, or any other model, in which the outcome variable is an accident at the building (say, in 2022). Independent variables that can be considered include building characteristics, the number of permits active at the site, the type of building permit, and whether the building had accidents prior to 2022. You should discuss your risk assessment through graphs, tables, simulations and figures.
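To make the modeling idea concrete, the sketch below fits a logistic regression with plain batch gradient descent. It is an illustration under stated assumptions, not a prescribed method: the function names, the learning rate and the number of epochs are hypothetical choices, and in practice a library such as scikit-learn or statsmodels would usually be used instead.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: p(accident) minus the label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_risk(w, b, x):
    """Estimated probability of an accident for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up toy rows for illustration only (NOT real data):
# features = [active permits at the site, accident before 2022 (0/1)]
X_TOY = [[1, 0], [2, 0], [1, 0], [5, 1], [6, 1], [4, 1], [2, 0], [5, 0]]
Y_TOY = [0, 0, 0, 1, 1, 1, 0, 1]
```

Calling `fit_logistic(X_TOY, Y_TOY)` returns the fitted weights and bias, and `predict_risk` then gives a probability that can be thresholded (for example at 0.5) to flag high-risk cases.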
NOTE: Innovative ideas (not commonly available on Google) and a comparison of your code with traditional algorithms will earn you 10 extra BONUS points.
How to Submit
Each group is required to submit a compressed file (.zip) containing the following items:
• Source code in a folder
• Analytics results in a folder
• A report in PDF format, together with all LaTeX files as described above in the project requirements
The zip file should be named with the team leader’s student ID number and uploaded to Blackboard.