Student Assignment Brief
Contents:
- Assignment Information
- Assignment Task
- Marking and Feedback
- Assessed Module Learning Outcomes
- Assignment Support and Academic Integrity
- Assessment Marking Criteria
Assignment Information
CW2 submission deadline: Thursday 1 August 2024 at 6pm (UK time)
Assignment Task
- CW1: Based on knowledge from Lectures/Labs 1-5. The CW asks you to develop a machine learning model to solve an NLP problem.
- CW2: Based on knowledge from Lectures/Labs 6-8. The CW asks you to develop a deep learning model to solve an NLP problem.
* You will learn the differences between a machine learning solution and a deep learning solution in the lectures and labs, so there is no need to worry at this moment :-) A brief illustrative sketch of the contrast is given below.
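Below is a minimal, hypothetical sketch (in Python) of the contrast, assuming a simple sentiment-style text classification task; the example texts and labels are invented and do not come from any CW dataset. The TF-IDF + LogisticRegression pipeline stands in for a CW1-style machine learning solution, and the comments indicate where a CW2-style deep learning model (e.g. a fine-tuned pretrained transformer) would take its place.

```python
# Illustrative sketch only: a CW1-style classical ML pipeline on an invented
# toy sentiment dataset. For CW2 the pipeline would be replaced by a deep
# learning model (e.g. a fine-tuned pretrained transformer).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Invented data standing in for whichever shared-task dataset you choose.
texts = ["great film, loved it", "terrible plot and acting",
         "a wonderful experience overall", "boring and far too long"]
labels = ["pos", "neg", "pos", "neg"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=0, stratify=labels)

# CW1-style solution: hand-crafted sparse features + a linear classifier.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# A CW2-style solution would train a neural model on the same splits and
# report the same metrics, so the two approaches can be compared fairly.
print(classification_report(y_test, model.predict(X_test), zero_division=0))
```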
NLP Problems and Datasets
- You are NOT ALLOWED to choose a dataset from Kaggle without permission; instead, you are encouraged to choose datasets from the three sources listed below.
- Source 1: You will be given a list of Shared Tasks (at the end of the assignment description) from events hosted by the Association for Computational Linguistics, such as the annual Semantic Evaluation (SemEval) workshop and the annual conference of ACL's Special Interest Group on Natural Language Learning (CoNLL). You are required to read the descriptions of the shared tasks, including the associated data, and choose ONE shared task whose Natural Language Processing (NLP) problem you will tackle. Note: for NLP problems covered in the labs, you are NOT ALLOWED to use the same method as in the labs; in principle, the datasets used in the labs are also NOT ALLOWED.
- Source 2: The NLPprogress website (http://nlpprogress.com) is an alternative catalogue of most (if not all) of the famous shared tasks for important computational linguistics/natural language processing problems of the past few decades. Warning: many of these NLP problems may be beyond your knowledge or too difficult for module coursework, so please MAKE SURE that you discuss the dataset/task with the module lecturer in advance. In contrast, the list of tasks in Source 1 has been carefully selected, and you can be confident that they are suitable tasks for your CW.
- Source 3: The Papers With Code website (https://paperswithcode.com/datasets?mod=texts) maintains a number of good datasets used by NLP researchers around the world, and a corresponding (potentially incomplete) benchmark for each dataset and relevant papers which use the dataset (e.g., https://paperswithcode.com/dataset/senseval-2-1 for word sense disambiguation, or https://paperswithcode.com/dataset/conll-2002 for named entity recognition).
Warning: the same warning as for Source 2 applies.
Optional CW Proposals
If you choose the NLP problem and dataset from the suggested list, you do not need a proposal.
If you choose an NLP problem and dataset from Source 2 or 3, then it is suggested that you write a freestyle proposal to the module leader, discussing the feasibility of the problem. The submission of the (optional) proposal should be no later than:
- Lecture 5 for CW1
- Lecture 9 for CW2
Developing Shared Task Solutions
Guides, Suggestions and Hints
Suggested NLP Shared Tasks from Source 1
- Chief Complaints (CC) texts from a hospital’s Emergency department for the development of a Gout Flare Early Alert.
- Website: https://github.com/ozborn/gout_chief_complaint_alert (where you can find the dataset link and the authors' code; part of it can be used to enhance your baseline, but the DL implementation is not usable, so you still need to implement that yourself.)
- Summary paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075438/
- Identification of Randomised Controlled Trials (RCT) (for performing better systematic literature review and evidence-based healthcare)
- Website: https://github.com/jennak22/Bat4RCT/blob/main/rct_data.zip
- Summary paper: https://doi.org/10.1371/journal.pone.0283342
- Clickbait Challenge at SemEval 2023 - Clickbait Spoiling
- Website: https://pan.webis.de/semeval23/pan23-web/clickbait-challenge.html
- Summary paper: https://aclanthology.org/2023.semeval-1.312/
- WASSA 2023 Shared Task on Empathy, Emotion and Personality Detection in Interactions (* includes both regression problems and classification problems):
- Website: https://codalab.lisn.upsaclay.fr/competitions/11167
- Summary paper: https://aclanthology.org/2023.wassa-1.44/
- SemEval-2022 Task 7: Identifying Plausible Clarifications of Implicit and Underspecified Phrases in Instructional Texts:
- Website: https://clarificationtask.github.io/
- Summary paper: https://aclanthology.org/2022.semeval-1.146/
- SemEval-2021 Task 1: Lexical Complexity Prediction:
- Website: https://sites.google.com/view/lcpsharedtask2021
- Summary paper: https://aclanthology.org/2021.semeval-1.1/
- SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles:
- Website: https://propaganda.qcri.org/semeval2020-task11/
- Summary paper: https://aclanthology.org/2020.semeval-1.186/
- OffensEval: Identifying and Categorizing Offensive Language in Social Media:
- Website: https://sites.google.com/site/offensevalsharedtask/
- Summary paper: https://aclanthology.org/2020.semeval-1.188/
- SemEval-2019 Task 5: Multilingual detection of hate speech against immigrants and women in Twitter (HatEval):
- Suggestion: Only use the English subset for CW
- Website: https://competitions.codalab.org/competitions/19935
- Summary paper: https://aclanthology.org/S19-2007/
- SemEval-2019 Task 4: Hyperpartisan News Detection:
- Website: https://pan.webis.de/semeval19/semeval19-web/
- Summary paper: https://aclanthology.org/S19-2145/
- SemEval-2019 Task 3: EmoContext Contextual Emotion Detection in Text:
- Website: https://competitions.codalab.org/competitions/19790
- Summary paper: https://aclanthology.org/S18-1007
- SemEval-2018 Task 4: Character Identification on Multiparty Dialogues:
- Website: https://competitions.codalab.org/competitions/17310
- Summary paper: https://aclanthology.org/S18-1007/
- SemEval-2017 Task 6: #HashtagWars: Learning a Sense of Humor:
- Website: https://alt.qcri.org/semeval2017/task6/
- Summary paper: https://aclanthology.org/S17-2004/
- SemEval-2017 Task 4: Sentiment Analysis in Twitter:
- Website: https://alt.qcri.org/semeval2017/task4/
- Summary paper: https://aclanthology.org/S17-2088/
- SemEval-2016 Task 6: Detecting Stance in Tweets:
- Website: https://alt.qcri.org/semeval2016/task6/
- Summary paper: https://aclanthology.org/S16-1003/
- SemEval-2016 Task 5: Aspect-Based Sentiment Analysis:
- Website: https://alt.qcri.org/semeval2016/task5/
- Summary paper: https://aclanthology.org/S16-1002/
- SemEval-2015 Task 13: Multilingual All-Words Sense Disambiguation and Entity Linking (* work on English WSD only; Entity Linking is not required):
- Website: https://alt.qcri.org/semeval2015/task13/
- Summary paper: https://aclanthology.org/S15-2049/, also https://aclanthology.org/S13-2040/
- Reading Option 1: Chapter 3 “A Comparison of Supervised ML Algorithms for WSD” in the PhD thesis titled “Machine Learning Techniques for Word Sense Disambiguation” (https://www.cs.upc.edu/~escudero/wsd/06-tesi.pdf)
- Reading Option 2: Chapter 7 “Supervised Corpus-Based Methods for WSD” in the edited book titled “Word Sense Disambiguation: Algorithms and Applications” (on Aula)
- Reading Option 3: Lecture slides “Word Sense Disambiguation” by Diana McCarthy (https://lct-master.org/files/WSD.pdf)
- SemEval-2015 Task 9: CLIPEval Implicit Polarity of Events:
- Website: https://alt.qcri.org/semeval2015/task9/
- Summary paper: https://aclanthology.org/S15-2077/
- The CoNLL-2014 Shared Task on Grammatical Error Correction:
- Website: https://www.comp.nus.edu.sg/~nlp/conll14st.html
- Summary paper: https://aclanthology.org/W14-1701/
- NUCLE Release 3.2 (the task data): to obtain the data, download the license form, print and sign it, and have a scanned PDF of the *signed* form ready; then provide your particulars (name, position, affiliation, and email address) and upload the scanned PDF through the license submission page. The organisers aim to send the NUCLE data within three working days.
- *SEM 2012 Shared Task: Resolving the Scope and Focus of Negation (* This is a hard task):
- Website: https://www.clips.ua.ac.be/sem2012-st-neg/
- Summary paper: https://aclanthology.org/S12-1035/
- CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition:
- Website: https://www.clips.uantwerpen.be/conll2003/ner/
- Summary paper: https://aclanthology.org/W03-0419/
- CoNLL-2001 Shared Task: Clause Identification:
- Website: https://www.clips.uantwerpen.be/conll2001/clauses/
- Summary paper: https://aclanthology.org/W03-0419/
- CoNLL-2000 Shared Task: Chunking (Hint: a sequence labelling task; see the sketch after this list):
- Website: https://www.clips.uantwerpen.be/conll2000/chunking/
- Summary paper: https://aclanthology.org/W00-0726/
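As mentioned in the CoNLL-2000 entry above, chunking is a sequence labelling task: every token receives a BIO chunk tag. The sketch below, which assumes NLTK's packaged copy of the CoNLL-2000 data, illustrates the framing with a deliberately naive majority-vote baseline (each POS tag is mapped to its most frequent chunk tag); it is an illustration of the task format, not a suggested CW solution, which should use a proper sequence model (e.g. a CRF, BiLSTM or transformer token classifier).

```python
# Sketch: CoNLL-2000 chunking as token-level sequence labelling with BIO tags.
from collections import Counter, defaultdict

import nltk

nltk.download("conll2000", quiet=True)
from nltk.corpus import conll2000
from nltk.chunk import tree2conlltags

# Each sentence becomes a list of (word, POS tag, BIO chunk tag) triples.
train = [tree2conlltags(t) for t in conll2000.chunked_sents("train.txt")]
test = [tree2conlltags(t) for t in conll2000.chunked_sents("test.txt")]

# Naive baseline: map every POS tag to its most frequent chunk tag.
counts = defaultdict(Counter)
for sent in train:
    for word, pos, chunk in sent:
        counts[pos][chunk] += 1
most_likely = {pos: c.most_common(1)[0][0] for pos, c in counts.items()}

# Token-level accuracy on the test split (chunk-level F1 is the usual metric).
total = correct = 0
for sent in test:
    for word, pos, chunk in sent:
        correct += int(most_likely.get(pos, "O") == chunk)
        total += 1
print(f"Majority-vote baseline token accuracy: {correct / total:.3f}")
```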
Other Suggested NLP Shared Tasks, Including Some from Source 2
- Intent Detection and Slot Filling (Hint: sequence labelling task):
- Website: http://nlpprogress.com/english/intent_detection_slot_filling.html
- Summary paper: https://aclanthology.org/S12-1035/
- HedgePeer: A Dataset for Uncertainty Detection in Peer Reviews (Hint: a span detection problem similar to span-based extractive QA; see the sketch after this list):
- Website: https://github.com/Tirthankar-Ghosal/HedgePeer-Dataset
- Summary paper: https://doi.org/10.1145/3529372.3533300
- Toxic Comment Classification Challenge: Identify and classify toxic online comments:
- Website: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
- Summary paper: https://aclanthology.org/N19-1144/
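As mentioned in the HedgePeer entry above, span detection can be framed like span-based extractive QA or sequence labelling: a span annotated with character offsets is converted into token-level BIO tags (or start/end positions). The sketch below illustrates that conversion; the sentence, the offsets and the B-HEDGE/I-HEDGE tag names are invented for illustration and are not taken from the HedgePeer data format.

```python
# Sketch: converting a character-offset span annotation into token-level BIO tags.
from typing import List, Tuple

def spans_to_bio(text: str, spans: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Whitespace-tokenise `text` and tag each token with B/I/O for the given spans."""
    tagged = []
    cursor = 0
    for token in text.split():
        start = text.index(token, cursor)
        end = start + len(token)
        cursor = end
        label = "O"
        for span_start, span_end in spans:
            if start >= span_start and end <= span_end:
                label = "B-HEDGE" if start == span_start else "I-HEDGE"
                break
        tagged.append((token, label))
    return tagged

# Invented example: the hedge span covers "might possibly" (characters 12-26).
review = "The results might possibly generalise to other domains."
print(spans_to_bio(review, [(12, 26)]))
# [('The', 'O'), ('results', 'O'), ('might', 'B-HEDGE'), ('possibly', 'I-HEDGE'), ...]
```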
Submission Instructions:
- Submit before 6pm UK time; late work will receive a mark of zero.
- University regulations regarding the so-called grace period may apply, if such a regulation exists.
- As the marking guideline shows, FOR EACH CW the submission will include one REPORT and one VIVA video, with a separate submission link for each. The viva video will not be marked separately; instead, its mark is part of the CW mark (10%).
- The source code, including the preprocessed dataset, instructions and any intermediate results, should be hosted on GitHub. The code MUST be annotated clearly so that readers can grasp the algorithmic idea and the data flow of your code. The GitHub link MUST be provided in the report.
- For written assignments, the file format should always be Microsoft Word and NOT PDF (a University requirement).
- Word Template: the ACL Paper Styles at https://github.com/acl-org/acl-style-files
- Abstract: Word limit – 150 words; Rely on the IMRaD structure – Introduction, Method, Results and Discussion/Conclusion; Make sure the abstract is relatively balanced across all four aspects.
- Introduction: Brief introduction of the problem and your paper; use illustrated examples from the dataset when you think it necessary to help readers understand the task
- Related Work: A short literature review of roughly the 10 most pertinent papers on the same problem
- Method: Describe your own approach in enough technical details; Use illustrations when necessary
- Experiments: Include a detailed description of the dataset, the experimental setups, the baseline methods you tested, the different variants you tried, etc.; Report results with clarity and succinctness
- Discussions: Critically analyse experimental results and appraise your own approach, the baselines and other competitors you find or test
- Conclusion: Summarise not only your methodology and methodological contributions, but also your main findings, conclusions, etc. and other important take-home messages.
- References: In the ACL reference style, which can be found in the Word template
When marking your CW report(s), the marking components we will focus on are roughly aligned to the aspects above, plus some additional aspects about writing and presentation. They are detailed below.
Marking Scheme for BOTH CW Tasks (mark out of 100%):
- 1) Introduction: 10
- 2) Related Work: 5 + 5
- 3) Technical Quality: 15 + 15 + 10
- 4) Viva, also Evidence: 5 + 5
- 5) Evaluation (with notes about valid code): 5 + 5
- 6) Presentation and Organisation: 5 + 5
- 7) Originality (* this is for you to get an 80/90+ mark, close to publication quality): 5 + 5
The component marks sum to 100.
Marking and Feedback
Assessed Module Learning Outcomes
- LO1: demonstrate understanding of linguistic concepts relevant to Natural Language Processing (NLP).
- LO2: formulate NLP tasks as learning and inference problems for machine learning and demonstrate understanding of underlying algorithms.
- LO3: select, apply, and critically evaluate an NLP method for a given task.
- LO4: apply computational skills to create NLP processing pipelines using existing NLP libraries and tools.
Assignment Support and Academic Integrity
Spelling, Punctuation, and Grammar:
You are expected to use effective, accurate, and appropriate language within this assessment task.
Academic Integrity:
The work you submit must be your own, or in the case of groupwork, that of your group. All sources of information need to be acknowledged and attributed; therefore, you must provide references for all sources of information and acknowledge any tools used in the production of your work, including Artificial Intelligence (AI). We use detection software and make routine checks for evidence of academic misconduct.
If you have a disability, long-term health condition, specific learning difference, mental health diagnosis or symptoms, and have discussed your support needs with health and wellbeing, you may be able to access support that will help with your studies.
If you feel you may benefit from additional support but have not disclosed a disability to the University, or have disclosed but are yet to discuss your support needs, it is important to let us know so we can provide the right support for your circumstances. Visit the Student Portal to find out more.
Administration of Assessment
Assessment Marking Criteria Generic Marking Rubric for PG Modules
Distinction - Outstanding work with high degree of rigour, creativity and critical/analytic skills. Near mastery of knowledge and subject-specific theories with originality and autonomy. Demonstrates outstanding ability to analyse and apply concepts within the complexities and uncertainties of the subject/discipline.
Innovative research with outstanding ability in the utilisation of research methodologies. Work consistently demonstrates creativity, originality and outstanding problem-solving skills. Work completed with high degree of accuracy, proficiency and autonomy. Outstanding communication and expression demonstrated throughout. Student demonstrates a very wide range of technical and/or artistic skills. With some amendments, the work may be considered for external publication/dissemination/presentation
Pass - Assessment demonstrates some advanced knowledge and understanding of the subject informed by current practice, scholarship and research. Work may be incomplete with some irrelevant material present. Sometimes demonstrates the ability to analyse and apply concepts within the complexities and uncertainties of the subject/discipline.
Acceptable research with evidence of basic ability in the utilisation of research methodologies. Demonstrates some originality, creativity and problem-solving skills but often with inconsistencies. Expression and presentation sufficient for accuracy and proficiency. Sufficient communication and expression with professional skill set. Student demonstrates some technical and/or artistic skills.
Fail - Clear failure demonstrating no understanding of relevant theories, concepts and issues, and no understanding of the area. Little or no relevant material may be present, informed by minimal sources. No evidence of ability in the utilisation of research methodologies. No evidence of originality, creativity, and problem-solving skills. Expression and presentation deficient in accuracy and proficiency. Insufficient communication and expression, with deficiencies in professional skill set. Student has clear deficiencies in the range of technical and/or artistic skills.