CE 314/887 Assignment 2 Text classification


December 2024

Deadline: please follow the deadline given on FASER.

Build a text classifier on the Emotions sentiment classification dataset, a collection of English Twitter messages meticulously annotated with six fundamental emotions: anger, fear, joy, love, sadness, and surprise. You may use any classification method except Naïve Bayes and rule-based methods, but you must train your model on the first 90% of the instances and test it on the last 10%. The Emotions dataset will be uploaded to the Moodle page for you to download.
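Because the split is defined by position (first 90% for training, last 10% for testing), the instances should be kept in file order and not shuffled before splitting. The sketch below reads the file and performs that split with the standard library only. It assumes a CSV with a header row and columns named `text` and `label`, with integer labels mapped as on the Kaggle dataset card (0 sadness, 1 joy, 2 love, 3 anger, 4 fear, 5 surprise); the function names are hypothetical, and the column names and mapping should be checked against the copy distributed on Moodle.

```python
import csv
from typing import List, Tuple

# Assumed label ids -> names, following the Kaggle dataset card;
# verify against the copy of the file distributed on Moodle.
LABEL_NAMES = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}


def load_emotions(path: str, text_col: str = "text",
                  label_col: str = "label") -> List[Tuple[str, int]]:
    """Read (text, label) pairs in file order from a CSV with a header row.

    The column names are assumptions; adjust them to match the actual file.
    Extra columns (e.g. an unnamed index column) are ignored.
    """
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for record in csv.DictReader(f):
            rows.append((record[text_col], int(record[label_col])))
    return rows


def ordered_split(instances):
    """Split without shuffling: first 90% for training, last 10% for testing."""
    cut = (len(instances) * 9) // 10  # integer arithmetic avoids float rounding
    return instances[:cut], instances[cut:]
```

Typical use would be `train, test = ordered_split(load_emotions("text.csv"))`, where the file name is only a placeholder.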

Dataset homepage: https://www.kaggle.com/datasets/nelgiriyewithana/emotions

Some tutorials can be found here:

https://www.kaggle.com/datasets/nelgiriyewithana/emotions/code?datasetId=4403839&sortBy=voteCount

Your code should include:

1: Read the file and split the instances into the training set and the test set.

2: Pre-process the text. You can choose whether to apply stemming, stop-word removal, or removal of non-alphabetical words. (Not all classification models need this step; it is fine to skip it if you think your model performs better without it, and you can justify that choice in the report.)

3: Analyse the features of the training set and report its linguistic features.

4: Build a text classification model, train your model on the training set and test your model on the test set.

5: Summarize the performance of your model (you can gain additional marks for graph visualizations).

6: (Optional) Speculate on how you could improve your work based on your proposed model.

7: Include a ‘readme’ file in your submission that states:

- Your Python version and the third-party libraries used in your assignment (with their versions).

- How to run your code.
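For the version information in the readme, a small standard-library sketch can print the Python version and the installed version of each library. The function name `environment_report` and the package list are hypothetical placeholders; running `pip freeze` is another common way to list installed versions.

```python
import platform
from importlib import metadata


def environment_report(packages):
    """Return the Python version and the installed version of each named package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Example package names only; list the libraries your code actually imports.
    for name, version in environment_report(["numpy", "scikit-learn", "nltk"]).items():
        print(f"{name} {version}")
```

The output lines can be pasted directly into the readme.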

After you have built the model and tested it on the test set, write a report (no longer than three A4 pages, in 11-point Arial) summarizing your work.

You can use existing algorithms from GitHub or Kaggle (please include the references in your code), but you must not directly copy and paste their code.

However, you are not allowed to use the Naïve Bayes algorithm or the VADER classifier, which were practised in Lab 4.
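As one illustration of a method that respects this restriction (it is neither Naïve Bayes, VADER, nor rule based), the pure-Python sketch below walks through steps 2 to 5 above with a multiclass perceptron over bag-of-words counts. The stop-word list, the names `preprocess`, `top_tokens_per_label`, `Perceptron` and `precision_recall_f1`, and the toy tweets are all hypothetical. On the full dataset, a library model such as scikit-learn's `LogisticRegression` or `LinearSVC` on `TfidfVectorizer` features would be a more typical and much faster choice, and `sklearn.metrics.classification_report` can cross-check the P/R/F numbers.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption); NLTK or scikit-learn ship
# fuller lists if you decide that stop-word removal helps your model.
STOP_WORDS = {"i", "a", "an", "the", "and", "to", "of", "is", "it", "that", "im"}


def preprocess(text, remove_stop_words=True):
    """Step 2: lower-case, drop apostrophes, keep alphabetical tokens, optionally drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower().replace("'", ""))
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def top_tokens_per_label(data, k=10):
    """Step 3: most frequent tokens per label, one simple linguistic feature to report."""
    counts = defaultdict(Counter)
    for tokens, label in data:
        counts[label].update(tokens)
    return {label: c.most_common(k) for label, c in counts.items()}


class Perceptron:
    """Step 4: multiclass perceptron over bag-of-words counts."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.weights = {label: Counter() for label in self.labels}

    def _score(self, features, label):
        weights = self.weights[label]
        return sum(weights[f] * count for f, count in features.items())

    def predict(self, tokens):
        features = Counter(tokens)
        # max() returns the first best label, so ties go to the earliest sorted label.
        return max(self.labels, key=lambda label: self._score(features, label))

    def fit(self, data, epochs=5):
        """On each mistake, add the example's counts to the gold label and subtract them from the guess."""
        for _ in range(epochs):
            for tokens, gold in data:
                guess = self.predict(tokens)
                if guess != gold:
                    features = Counter(tokens)
                    self.weights[gold].update(features)
                    self.weights[guess].subtract(features)
        return self


def precision_recall_f1(gold, predicted, labels):
    """Step 5: per-label precision, recall and F1, plus macro averages under "macro"."""
    results = {}
    pairs = list(zip(gold, predicted))
    for label in labels:
        tp = sum(1 for g, y in pairs if g == label and y == label)
        fp = sum(1 for g, y in pairs if g != label and y == label)
        fn = sum(1 for g, y in pairs if g == label and y != label)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        results[label] = (p, r, f)
    results["macro"] = tuple(
        sum(results[label][i] for label in labels) / len(labels) for i in range(3)
    )
    return results


if __name__ == "__main__":
    # Made-up toy tweets purely to show the calls; use the real 90%/10% split instead.
    train = [(preprocess(t), y) for t, y in [
        ("i feel so happy today", "joy"), ("this makes me furious", "anger"),
        ("happy happy day", "joy"), ("furious and angry", "anger")]]
    test = [(preprocess("so happy"), "joy"), (preprocess("angry and furious"), "anger")]
    print(top_tokens_per_label(train, k=3))
    model = Perceptron({y for _, y in train}).fit(train)
    predicted = [model.predict(tokens) for tokens, _ in test]
    print(precision_recall_f1([y for _, y in test], predicted, model.labels))
```

A plain perceptron is sensitive to the order of training instances; an averaged perceptron, or one of the scikit-learn linear models mentioned above, is often more stable.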

Suggestions for bonus points:

Include the necessary comments in your code.

Include proper references in your report.

Include graph visualizations in your report.

Investigate more evaluation methods: do not only report the precision, recall and F1 (P/R/F) scores, but also run your model multiple times and report the standard deviation of P/R/F. (I am sure you can find more evaluation methods.)

Write your report like a mini conference paper; you can learn from this paper:

•     Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical Attention Networks for Document Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480-1489, San Diego, California. Association for Computational Linguistics.
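For the bonus suggestion of running the model several times and reporting the standard deviation of P/R/F, one option under the fixed 90%/10% split is to vary only the random seed, for example the order in which the (unchanged) training instances are visited, which matters for order-sensitive learners such as a perceptron. The sketch assumes a hypothetical `train_and_score(rng)` function that you write yourself and that returns macro (P, R, F) on the last-10% test set; the helper names are likewise hypothetical, and `statistics.stdev` gives the sample standard deviation.

```python
import random
import statistics


def repeated_runs(train_and_score, seeds):
    """Call train_and_score(rng) once per seed and collect its (P, R, F) tuple.

    train_and_score is a hypothetical function you supply: it should use rng
    for every random choice (e.g. shuffling the visiting order of the first-90%
    training instances) and return macro P, R, F on the fixed last-10% test set.
    """
    return [train_and_score(random.Random(seed)) for seed in seeds]


def mean_and_std(runs):
    """Mean and sample standard deviation of each metric across runs (needs >= 2 runs)."""
    summary = {}
    for i, name in enumerate(("precision", "recall", "f1")):
        values = [run[i] for run in runs]
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary


if __name__ == "__main__":
    # Stand-in scorer for demonstration only; replace it with real training and evaluation.
    def fake_train_and_score(rng):
        f = 0.80 + rng.uniform(-0.02, 0.02)
        return (f, f, f)

    summary = mean_and_std(repeated_runs(fake_train_and_score, seeds=range(5)))
    for name, (mean, std) in summary.items():
        print(f"{name}: {mean:.3f} ± {std:.3f}")
```

Seeding a separate `random.Random` per run keeps every run reproducible; bootstrap resampling of the test predictions is another way to estimate variability without retraining.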

