Data science

Assignment

Due: 5pm EST, 2/21/2025


1. n-gram

Given the training data

John  read  a  book  by  Jane

John  read  another  book

I  read  a  different  book

(a) Calculate the bigram probabilities using maximum likelihood estimation (MLE) and fill out the table.

Bigram               Probability     Bigram                 Probability
P(John | <s>)                        P(another | read)
P(read | John)                       P(book | another)
P(a | read)                          P(</s> | book)
P(book | a)                          P(I | <s>)
P(by | book)                         P(read | I)
P(Jane | by)                         P(different | a)
P(</s> | Jane)                       P(book | different)

Here <s> and </s> denote the sentence-start and sentence-end markers.

(b) Calculate the probability of the sentence "John read a different book" using only the MLE bigram estimates.

(c) Calculate the probability of the sentence "Jane read a book" using only the MLE bigram estimates.
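
As a reminder of the mechanics, the MLE bigram estimate is P(w | prev) = count(prev w) / count(prev), and a sentence probability is the product of its bigram probabilities. The sketch below is one possible way to compute both. It assumes every sentence is padded with start and end markers, written here as <s> and </s>, and it runs on a made-up toy corpus rather than the training data above. The function names are illustrative only.

```python
from collections import Counter
from fractions import Fraction


def train_bigram_mle(sentences):
    """Return prob(word, prev) giving the MLE estimate count(prev word) / count(prev)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Assumption: pad each sentence with <s> and </s> markers.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        # Every token except </s> can act as a history (the conditioning word).
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob(word, prev):
        if history_counts[prev] == 0:
            return Fraction(0)  # unseen history: no estimate, treated as 0 here
        return Fraction(bigram_counts[(prev, word)], history_counts[prev])

    return prob


def sentence_probability(prob, sentence):
    """Multiply the bigram probabilities along the padded sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    result = Fraction(1)
    for prev, word in zip(tokens[:-1], tokens[1:]):
        result *= prob(word, prev)
    return result


# Toy corpus, invented for illustration (not the assignment data).
toy_corpus = ["the cat sat", "the dog sat", "a cat ran"]
prob = train_bigram_mle(toy_corpus)

print(prob("the", "<s>"))                         # 2/3
print(sentence_probability(prob, "the cat sat"))  # 1/6
print(sentence_probability(prob, "a dog sat"))    # 0, because "a dog" never occurs
```

Note that with pure MLE a single unseen bigram drives the whole sentence probability to zero, as the last line shows.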

2. Evaluation metrics on binary classification

Given the following output,

Actual Label    Predicted Label
0               0
1               1
0               1
0               1
1               1
0               0
1               1
0               1
1               0
0               1

(a) Draw the confusion matrix.

(b) Calculate the Accuracy, Precision, Recall, and F1 score.

(c) Why might using accuracy as the only metric not be ideal?
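
The definitions behind these tasks can be sketched in a few lines of Python. The example below is a minimal illustration, assuming label 1 is the positive class and using two made-up label lists rather than the output above. The helper names are hypothetical, and undefined ratios (zero denominators) are reported as 0.0 by convention here.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from the four confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN).
        "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
    }


# Toy labels, invented for illustration (not the assignment data).
toy_actual = [1, 1, 0, 0, 1, 0]
toy_predicted = [1, 0, 0, 1, 1, 0]
toy_counts = confusion_counts(toy_actual, toy_predicted)
toy_scores = binary_metrics(*toy_counts)
print(toy_counts)   # (2, 1, 1, 2)
print(toy_scores)

# A skewed toy set: 95 negatives, 5 positives, and a model that always says 0.
skewed_actual = [0] * 95 + [1] * 5
skewed_predicted = [0] * 100
skewed_scores = binary_metrics(*confusion_counts(skewed_actual, skewed_predicted))
print(skewed_scores["accuracy"], skewed_scores["recall"])
```

In the skewed case the accuracy is high even though no positive example is ever found, which is the kind of situation precision, recall, and F1 are meant to expose.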

3. Evaluation metrics on multiclass classification

Given the following confusion matrix of a multiclass classifier (rows are the classifier's output, columns are the true classes),

                             Truth
                  A     B     C     D     E     F
Classifier  A    95     1    13     0     1     0
            B     0     1     0     0     0     0
            C    10    90     0     1     0     0
            D     0     0     0    34     3     7
            E     0     1     2    13    26     5
            F     0     0     2    14     5    10

(a) Calculate the precision, recall, and F1 score for each of the classes A-F.

(b) Calculate the micro-average precision, recall, and F1 score.

(c) Calculate the macro-average precision, recall, and F1 score.
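
The difference between the two averages can be sketched as follows. Micro-averaging pools the TP, FP, and FN counts over all classes before computing the scores, while macro-averaging computes the scores per class and then takes their unweighted mean. The code assumes the same layout as the matrix above (rows are the classifier's output, columns are the truth), but runs on a made-up 3-class matrix. The function names are illustrative, and macro F1 is taken here as the mean of the per-class F1 scores, which is one common convention.

```python
def per_class_scores(matrix, labels):
    """Per-class TP/FP/FN and precision/recall/F1.

    Assumes matrix[i][j] = number of items predicted as labels[i]
    whose true class is labels[j].
    """
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[i]) - tp                   # rest of the predicted row
        fn = sum(row[i] for row in matrix) - tp    # rest of the truth column
        scores[label] = {
            "tp": tp,
            "fp": fp,
            "fn": fn,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
        }
    return scores


def micro_average(scores):
    """Pool the counts over all classes, then compute the scores once."""
    tp = sum(s["tp"] for s in scores.values())
    fp = sum(s["fp"] for s in scores.values())
    fn = sum(s["fn"] for s in scores.values())
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
    }


def macro_average(scores):
    """Unweighted mean of the per-class scores."""
    n = len(scores)
    return {
        key: sum(s[key] for s in scores.values()) / n
        for key in ("precision", "recall", "f1")
    }


# Toy 3-class matrix, invented for illustration (not the assignment data).
toy_labels = ["X", "Y", "Z"]
toy_matrix = [
    [5, 1, 0],  # predicted X
    [2, 3, 1],  # predicted Y
    [1, 0, 8],  # predicted Z
]
toy_scores = per_class_scores(toy_matrix, toy_labels)
toy_micro = micro_average(toy_scores)
toy_macro = macro_average(toy_scores)
print(toy_scores["X"])
print(toy_micro)
print(toy_macro)
```

When every item receives exactly one predicted class, each false positive for one class is a false negative for another, so micro-average precision, recall, and F1 all coincide.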

4. Text classification

The drug review dataset provides patient reviews of drugs together with a positive or negative rating reflecting overall patient satisfaction. The dataset consists of two files: drug review train .csv for training and drug review test .csv for testing. Both files contain plain-text, UTF-8-encoded samples in a tab-separated format with the following columns:

• Text

• Binary label (0 and 1)

(a) Use BernoulliNB to build a naïve Bayes classifier.

BernoulliNB    true positive    false positive    false negative    precision    recall    F1-score
positive
negative

(b) Repeat the process in Task (a), but use the SVM (SGDClassifier) model.

SGDClassifier  true positive    false positive    false negative    precision    recall    F1-score
positive
negative

(c) Upload the source code.
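
Tasks (a) and (b) differ only in the model, so one evaluation routine can serve both. The sketch below is one possible setup with scikit-learn, not a required design. It assumes label 1 means positive and 0 means negative, uses binary bag-of-words features from CountVectorizer, and runs on a handful of made-up reviews that stand in for the Text and label columns of the two files (loading the files is left out). The evaluate function and the toy lists are hypothetical names.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline


def evaluate(model, train_texts, train_labels, test_texts, test_labels):
    """Fit `model` on word-presence features and report per-class results."""
    # binary=True records word presence/absence, which suits BernoulliNB.
    pipeline = make_pipeline(CountVectorizer(binary=True), model)
    pipeline.fit(train_texts, train_labels)
    predicted = pipeline.predict(test_texts)

    # cm[i, j] = number of samples with actual label i predicted as label j.
    cm = confusion_matrix(test_labels, predicted, labels=[0, 1])
    precision, recall, f1, _ = precision_recall_fscore_support(
        test_labels, predicted, labels=[0, 1], zero_division=0
    )

    report = {}
    # Assumption: 0 is the negative class and 1 is the positive class.
    for name, k in (("negative", 0), ("positive", 1)):
        tp = int(cm[k, k])
        report[name] = {
            "true positive": tp,
            "false positive": int(cm[:, k].sum()) - tp,  # predicted k, actually other
            "false negative": int(cm[k, :].sum()) - tp,  # actually k, predicted other
            "precision": float(precision[k]),
            "recall": float(recall[k]),
            "F1-score": float(f1[k]),
        }
    return report


# Toy reviews, invented for illustration (not the drug review dataset).
toy_train_texts = [
    "works great no side effects",
    "terrible headache and nausea",
    "great relief highly recommend",
    "awful made me feel worse",
    "really helped my pain",
    "bad reaction stopped taking it",
]
toy_train_labels = [1, 0, 1, 0, 1, 0]
toy_test_texts = ["great relief from pain", "terrible nausea and worse"]
toy_test_labels = [1, 0]

nb_report = evaluate(BernoulliNB(), toy_train_texts, toy_train_labels,
                     toy_test_texts, toy_test_labels)
# loss="hinge" makes SGDClassifier a linear SVM; random_state fixes the shuffling.
svm_report = evaluate(SGDClassifier(loss="hinge", random_state=0),
                      toy_train_texts, toy_train_labels,
                      toy_test_texts, toy_test_labels)
print(nb_report)
print(svm_report)
```

Each report has one entry per row of the result tables, with the same six columns.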
