Empirical Methods Final Report, deadline Thursday 5 December 2024
Your individual report should address 3 research questions related to the general 2024 Empirical Methods topic, namely how language varies with context. Your report will provide the reader with an introduction and background to the topic of how speakers’ linguistic expressions vary depending on the context of use. Then you’ll lay out 3 specific research questions, one to represent each of the following:
1. An analysis with categorical dependent & independent variables (table & graph & chi-square statistic)
2. An analysis with a continuous dependent variable & a categorical independent variable (table & graph & t-statistic)
3. An analysis with continuous dependent & independent variables (graph only)
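As a rough illustration of the statistics named in analyses 1 and 2, the sketch below computes a chi-square statistic from a table of counts and a t-statistic for two groups, using only the Python standard library. Everything in it is an assumption made for illustration: the function names are hypothetical, the counts and word lengths are made up rather than taken from the MapTask data, and the t-statistic shown is Welch's unequal-variance version, which may not be the variant used in class. You will probably compute these with whatever tool the course provides.

```python
from math import sqrt
from statistics import mean, variance


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def welch_t(sample_a, sample_b):
    """Welch's t-statistic for two independent samples (assumed variant)."""
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se


# Made-up counts: rows = familiar / unfamiliar pairs, columns = definite / indefinite
counts = [[30, 10], [20, 20]]
chi2, df = chi_square(counts)

# Made-up values: number of words per referring expression in each group
familiar = [2, 3, 2, 4, 3]
unfamiliar = [4, 5, 3, 6, 5]
t = welch_t(familiar, unfamiliar)

print(f"chi-square = {chi2:.2f} (df = {df}), t = {t:.2f}")
```

The statistic on its own is not the whole analysis: each of analyses 1 and 2 still needs its table and graph, and analysis 3 needs only a graph.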
Data: For this report, use the dataset annotations-second-half.txt (or original-transcripts-second-half.txt for analysing disfluency), to be made available in week 11 at http://www.lel.ed.ac.uk/~hrohde/lel2b/maptask-project.html
Word limit:
The overall limit is 1500 words. A rough breakdown is given below; the word counts in brackets are approximate and are only meant to give you a sense of the relative size of each section. The end-of-report bibliography does not count towards your total word count, nor do tables, figures, or captions. We’re not using a 10% leeway on the word count; 1500 is the max.
This is an individual report, not a group report. The structure of the report will follow a standard format for an empirical writeup:
Section 1: Introduction (100 words, 10 points)
Section 2: Background and motivation (300 words, 15 points)
Section 3: Research questions and predictions (100 words, 10 points)
Section 4: Methods (200 words, 10 points)
Section 5: Results (600 words, 45 points)
5.1 Analysis of categorical variables [state which you’re working with]
5.2 Analysis of mix of continuous/categorical variables [state which ones]
5.3 Analysis of continuous variables [state which ones]
Section 6: Discussion (200 words, 10 points)
Section 1: Introduction (100 words, 10 points)
- Name the subfield of linguistics and the kinds of broad research questions therein.
- List research questions (independent of corpus/annotation process; stay high-level).
- Introduce terms (e.g., referring expression, definiteness, disfluency).
- Summarise very briefly topics in prior work (for example, what aspect of speech has been analysed or what methods have been used) to highlight why our study of unscripted dialogue is relevant and goes beyond prior work.
- Describe in very brief terms what our study uses as data and what features of speakers’ language are measured (again, stay high-level, no annotation details).
- Summarise the findings in a sentence or two.
Section 2: Background and motivation (300 words, 15 points)
- Use this section to (briefly) review the phenomenon of speakers’ production choices. You should cite at least one general reference for the phenomenon (e.g., an encyclopedia entry), one reference for the theoretical background that motivates the research question (e.g., Gundel et al.), and one that points to the kind of ongoing work in this domain (one of the papers you’ll have read in week 10), so that you can end this section with a description of how our class project goes beyond aspects of that prior work.
- The goal of this section is to highlight what has been done previously, not as an inventory of all prior work but as a targeted characterisation of related prior work that sets up our study as an interesting and novel alternative to what has gone before.
Section 3: Research Questions & Predictions (100 words, 10 points)
Before diving into the methods in Section 4, use this section to make explicit what questions you’re asking and how you’re addressing each question. For each research question, you should spell out what the abstract concept is that you're trying to measure (e.g., the use of different referring expressions or expression length) and what outcome will be explicitly measured (e.g., the annotation of speakers’ use of definites/indefinites or the number of words per referring expression). In addition, for each research question’s measured outcome, specify what the factor is that you’re measuring the impact of (e.g., does expression length vary across familiar versus unfamiliar pairs of speakers). Pin down what the dependent and independent variables are. Make the case for why the MapTask corpus is an appropriate dataset for these research questions. When laying out predictions, it’s best to highlight a plausible generalization or theoretical claim that has implications for the pattern of results one would expect to find.
Section 4: Methods (200 words, 10 points)
Give the details of the corpus and the annotation process so that another researcher could replicate what you did. You'll have to make decisions about how to keep the section brief but also detailed. Avoid tiny details like "I downloaded the file from this URL" or "I opened the datafile as a spreadsheet in Google Sheets", but do include details of where this corpus is from and its properties (you can find the citation from the corpus’ 1991 release on the page about the corpus: https://groups.inf.ed.ac.uk/maptask/maptask-papers.html). List your annotation categories and what kinds of data were included/excluded. Give the reader a sense of the size of the resulting annotated dataset.
Section 5: Results (600 words, 45 points)
Section 6: Discussion (200 words, 10 points)
Here you should summarise the overall pattern of the results – no numbers, just prose to describe the findings. How do the findings fit with predictions from known generalisations? If there were unexpected findings or potential follow-ups, discuss those here and speculate about possible causes or possible extensions to this work.