Empirical Methods

2024-11-25 Admin

Empirical Methods Final Report, deadline Thursday 5 December 2024

Electronic Submission on Learn, no cover sheet needed

Submit questions to [email protected] by noon Monday 2 December

Content:

Your individual report should address 3 research questions related to the general 2024 Empirical Methods topic, namely how language varies with context. Your report will provide the reader with an introduction and background to the topic of how speakers’ linguistic expressions vary depending on the context of use. Then you’ll lay out 3 specific research questions, one to represent each of the following:

1. An analysis with categorical dependent & independent variables (table & graph & chi-square statistic)

2. An analysis with a continuous dependent variable & a categorical independent variable (table & graph & t-statistic)

3. An analysis with continuous dependent & independent variables (graph only)
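As a rough illustration of the statistics named in items 1 and 2, the sketch below computes a Pearson chi-square statistic from a table of counts and a two-sample t-statistic. All numbers are made up, the function names are hypothetical, and the t-statistic uses Welch's unequal-variance form as an assumption; the course materials may ask for a different variant or reporting format.

```python
from math import sqrt
from statistics import mean, variance


def chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def welch_t(sample_a, sample_b):
    """Return Welch's t-statistic for two independent samples (assumed variant)."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Made-up counts: rows = first vs later mention, columns = indefinite vs definite
counts = [[30, 10], [10, 30]]
chi2, df = chi_square(counts)

# Made-up words-per-expression values for two groups of speakers
familiar = [2, 3, 2, 3]
unfamiliar = [4, 5, 4, 5]
t = welch_t(familiar, unfamiliar)

print(f"chi-square({df}) = {chi2:.2f}")
print(f"t = {t:.2f}")
```

The third analysis is graph only, so no statistic is sketched for it here.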

Data: For this report, use the dataset annotations-second-half.txt (or original-transcripts-second-half.txt for analysing disfluency), to be made available in week 11 at http://www.lel.ed.ac.uk/~hrohde/lel2b/maptask-project.html

Marking:

The points allocated for each section will be assessed with an essay-marking approach using the common marking scheme. See the details on the next few pages about what to include in each section. We’ll be looking to see how you use the tools and statistical techniques from class/tutorial as demonstrations of your understanding of the course content.

Word limit:

Overall limit is 1500 words. A rough breakdown is below, but the word counts in brackets are approximate and are just meant to give you a sense of the relative size of each section. The end-of-report bibliography does not count towards your total word count nor do tables/figures/captions. We’re not using a 10% leeway on the word count; 1500 is the max.

Structure:

This is an individual report, not a group report. The structure of the report will follow a standard format for an empirical writeup:

Section 1: Introduction (100 words, 10 points)

Section 2: Background and motivation (300 words, 15 points)

Section 3: Research questions and predictions (100 words, 10 points)

Section 4: Methods (200 words, 10 points)

Section 5: Results (600 words, 45 points)

5.1 Analysis of categorical variables [state which you’re working with]

5.2 Analysis of mix of continuous/categorical variables [state which ones]

5.3 Analysis of continuous variables [state which ones]

Section 6: Discussion (200 words, 10 points)

Section 1: Introduction (100 words, 10 points)

- Name the subfield of linguistics and the kinds of broad research questions therein.

- List research questions (independent of corpus/annotation process; stay high-level).

- Introduce terms (e.g., referring expression, definiteness, disfluency).

- Summarise very briefly topics in prior work (for example, what aspect of speech has been analysed or what methods have been used) to highlight why our study of unscripted dialogue is relevant and goes beyond prior work.

- Describe in very brief terms what our study uses as data and what features of speakers’ language are measured (again, stay high-level, no annotation details).

- Summarise the findings in a sentence or two.

Section 2: Background and motivation (300 words, 15 points)

- Use this section to (briefly) review the phenomenon of speakers' production choices. You should cite at least one general reference for the phenomenon (e.g., an encyclopedia entry), one reference for the theoretical background that motivates the research question (e.g., Gundel et al.), and one that points to the kind of ongoing work in this domain (one of the papers you’ll have read in week 10). You can then end this section with a description of how our class project goes beyond aspects of that prior work.

- The goal of this section is to highlight what has been done previously, not as an inventory of all prior work but as a targeted characterisation of related prior work that sets up our study as an interesting and novel alternative to what has gone before.

Section 3: Research questions and predictions (100 words, 10 points)

Before diving into the methods in Section 4, use this section to make explicit what questions you’re asking and how you’re addressing each question. For each research question, you should spell out what the abstract concept is that you're trying to measure (e.g., the use of different referring expressions or expression length) and what outcome will be explicitly measured (e.g., the annotation of speakers’ use of definites/indefinites or the number of words per referring expression). In addition, for each research question’s measured outcome, specify what the factor is that you’re measuring the impact of (e.g., does expression length vary across familiar versus unfamiliar pairs of speakers). Pin down what the dependent and independent variables are. Make the case for why the MapTask corpus is an appropriate dataset for these research questions. When laying out predictions, it’s best to highlight a plausible generalization or theoretical claim that has implications for the pattern of results one would expect to find.
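To make the link between an abstract concept and a measured outcome concrete, here is a minimal sketch that treats expression length (words per referring expression) as the dependent variable and speaker familiarity as the independent variable. The rows and the two-field layout are hypothetical; the actual annotation file may be organised differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical annotated rows: (referring expression, speaker familiarity)
rows = [
    ("the old mill", "familiar"),
    ("it", "familiar"),
    ("a big wooden bridge", "unfamiliar"),
    ("the bridge", "unfamiliar"),
]


def expression_length(expression):
    """Measured outcome: number of whitespace-separated words in the expression."""
    return len(expression.split())


# Group the dependent variable (length) by the independent variable (familiarity)
lengths_by_group = defaultdict(list)
for expression, familiarity in rows:
    lengths_by_group[familiarity].append(expression_length(expression))

mean_length = {group: mean(values) for group, values in lengths_by_group.items()}
print(mean_length)
```

Counting whitespace-separated words is only one possible operationalisation of length; whichever definition you choose should be stated explicitly alongside your variables.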

Section 4: Methods (200 words, 10 points)

Give the details of the corpus and the annotation process so that another researcher could replicate what you did. You'll have to make decisions about how to keep the section brief but also detailed. Avoid tiny details like "I downloaded the file from this URL" or "I opened the datafile as a spreadsheet in Google Sheets", but do include details of where this corpus is from and its properties (you can find the citation from the corpus’ 1991 release on the page about the corpus: https://groups.inf.ed.ac.uk/maptask/maptask-papers.html). List your annotation categories and what kinds of data were included/excluded. Give the reader a sense of the size of the resulting annotated dataset.

Section 5: Results (600 words, 45 points)

For each research question, describe the pattern you see in the data. It’s best to divide the Results section into 3 subsections, one for each research question, where you describe the pattern of results and walk the reader through the relevant evidence. When including tables/figures, label them and discuss them in the text to tell the reader what they show. In the adjacent paragraphs, describe the quantitative pattern so the reader can understand. Ideally, the descriptive statistics will be reported in ways that link the findings to the theoretical questions (e.g., "In keeping with the prediction that speakers would produce more indefinites when a referent is first introduced and more definites across subsequent utterances, Table X shows that the percentage of indefinites is higher for first mentions (XX%) than for later mentions (YY%) ..."). For the estimates of reliability (chi-square and t-statistic), use the format described in the week 10 and week 11 course materials and tutorials. Make sure that your tables and graphs have understandable headings and labels and, importantly, always include informative captions. Captions (and table contents and graph labels) don’t count towards the overall word count.
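The percentages in a sentence like the example above can be derived from raw annotation counts. The following sketch uses a handful of invented annotations and hypothetical category labels ("first"/"later", "indefinite"/"definite") purely to show the arithmetic; it does not reflect the real dataset.

```python
from collections import Counter

# Hypothetical annotations: (mention type, definiteness)
annotations = [
    ("first", "indefinite"), ("first", "indefinite"),
    ("first", "indefinite"), ("first", "definite"),
    ("later", "indefinite"), ("later", "definite"),
    ("later", "definite"), ("later", "definite"),
]
counts = Counter(annotations)


def percent_indefinite(mention):
    """Percentage of indefinites among all expressions with this mention type."""
    indefinite = counts[(mention, "indefinite")]
    total = indefinite + counts[(mention, "definite")]
    return 100 * indefinite / total


for mention in ("first", "later"):
    print(f"{mention} mentions: {percent_indefinite(mention):.1f}% indefinite")
```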

Section 6: Discussion (200 words, 10 points)

Here you should summarise the overall pattern of the results – no numbers, just prose to describe the findings. How do the findings fit with predictions from known generalisations? If there were unexpected findings or potential follow-ups, discuss those here and speculate about possible causes or possible extensions to this work.

Empirical writeups are necessarily a bit redundant, so at the very end of the writeup, your concluding paragraph will reiterate the points you made in the Introduction and describe the findings and how they link to theoretical generalisations in pragmatics.
