Economics 253 - Spring 2024 Fundamentals of Econometrics: Empirical Paper

Throughout the course we will learn the knowledge and skills needed to do empirical research. The structure of this assignment is designed to mimic a research contract with an explicit delivery schedule (the due dates of each part of the assignment), and an emphasis on clear presentation that can be understood by a general audience.

Topics and Data

Topic Selection

You are welcome to choose your own topic for this research project. An eligible topic is a testable hypothesis about an economic relationship. This means you must investigate how some cause or treatment (your x variable) affects or changes an outcome (your y variable).  I will evaluate your topic in your project proposals, but you are encouraged to discuss your potential topics with me before you submit your proposal.

Topic selection culminates in a research question for which there is available data to test empirically. Topics that result in ‘A’ papers tend to be those that are interesting to you and where the answer may change how you think about economic phenomena or inform policymakers in their decision-making. Uninteresting topics are usually those in which the cause-and-effect relationship is purely mechanical or entirely predictable, with little utility (e.g., the effect of three-point attempts on three-point shooting percentage).

Data Sources

This class will focus on econometric analysis of micro-level data, in which each data point is an individual person or observation (for example, each data point is a person who reports whether they are employed). Some data you encounter may be highly aggregated (e.g., each data point is a state's unemployment rate) or time-series financial data (e.g., the unemployment rate over the last few years). While you are welcome to use these types of data, I strongly encourage you to try to develop a topic using micro-data.

Data collection and assembly are the most difficult and time-intensive components of any empirical research project. You are welcome to find your own data using the library or another freely available data source. However, as this is not a requirement, you are encouraged to consider using one of a selection of sample datasets I will make available on Moodle. These are examples of large micro-level datasets commonly used in economic research. Although we will learn how to assemble and clean data, it is never too early to begin working on your data, and you should consult the Programming Resources section of the course Moodle page for extra resources on cleaning, assembling, and merging data.

An ‘A’ Paper Requires a Well Executed Identification Strategy

Your paper will be graded using the rubric attached at the end of this assignment. Papers are scored first into letter grade tiers based on the identification strategy, which is your choice of model and estimator to identify the economic relationship you are interested in. Merely including additional control variables without a clear identification strategy will result in a ‘C’ paper. Incorrect or insufficient application of one of the identification strategies we learn in class will result in a ‘B’ paper.

Only papers that correctly apply one of the identification strategies we learn in class can receive an ‘A-’ or ‘A’. Importantly, a correct application requires clear and accurate descriptions of your data and methods and correct interpretations of your findings. The identification strategies we will learn in class include:

• Fixed Effects for Panel Data

• Difference-in-Differences

• Regression Discontinuity

• Instrumental Variables

Disparities Estimation

Studies of disparities associated with gender, race/ethnicity, or other group distinctions are essential to understanding social issues and developing policy solutions. Econometric tools can assist researchers in measuring disparities that may be obfuscated by confounding relationships. Importantly, identifying disparities is not the same as determining the causes of disparities. You are encouraged to pursue research questions that interrogate the causes of disparities, such as discrimination or economic disadvantage. From a modeling perspective, the disparity being investigated would be the outcome in your model. A very well executed disparities estimation paper can receive an ‘A’ even though it is not causally identified.

Models of social or economic outcomes (e.g., a wage equation) in which group identifiers (e.g., gender or race/ethnicity) are used as treatments on the right-hand side are rife with opportunities for misinterpretation and often confuse correlation with causation. If you feel passionate about a research question that involves identifying or quantifying a disparity, please consult with Prof. Biener before proposing your topic to discuss how to conduct such a study appropriately so as to avoid these pitfalls.

Proposal

Due Friday, February 16

The proposal is a one page description of your research question, and must contain evidence that your project is feasible. The proposal must contain the following:

Clearly stated research question. For example, “I want to determine whether uninsured people were more likely to see a doctor after they became eligible for Medicaid under the Affordable Care Act.” A well-formed research question will articulate the dependent variable of interest (whether they saw a doctor), the independent variable of interest (eligible for Medicaid), and the population being studied (uninsured Americans). You must motivate your research question with some answer to the question “why should I care what you find?”

Initial literature review. For your proposal, you are required to cite at least one paper/publication/article that helps motivate your study. This can be an academic paper that used the same data to answer a similar question, or a press article that motivates your research question. Any motivation or context for your topic must build off of prior literature.

A description of the data being used. You must provide the name of the dataset and who collected/compiled the data.  If there are multiple sources of data, or data you will collect, you need to outline each source and how they are going to be linked together.

A simple population model equation that you intend to estimate. This equation must include at a minimum your independent (X) variable of interest and your dependent (Y) variable from your research question.  It is not important that you know exactly how you will measure these variables for your proposal, but you must be able to articulate the economic relationship you plan to study in terms of a simple bivariate model.
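As a rough illustration only, the Medicaid example above could be written as a simple bivariate population model. The variable names (SawDoctor, Eligible) and the subscript i for individuals are assumptions made for this sketch, not a required notation.

```latex
\[
\text{SawDoctor}_i = \beta_0 + \beta_1\,\text{Eligible}_i + u_i
\]
```

Here \(\beta_1\) is the parameter of interest (the difference in the probability of seeing a doctor associated with Medicaid eligibility), and \(u_i\) collects all other determinants of the outcome.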

The proposal is evaluated on two dimensions: completeness (did you provide all of the above information) and feasibility (is your project something you can finish successfully in the time allotted). If I deem a project unfeasible, I will allow a resubmission, and will provide guidance on how to overcome limitations of the first proposal. Critical comparison of alternative techniques on a particular specification, and/or presentation of a number of alternative specifications against a particular benchmark (such as a set of nested hypotheses) are allowable topics.

A complete and on-time proposal will receive full credit.  A late or incomplete proposal will receive an 85%. If no proposal is submitted, it will receive a 50%.

Data Readiness Report

Due Friday, March 22

The data readiness report will provide a description of your chosen dataset and how you have cleaned and edited the data so you can analyze it. The data readiness report will contain the following:

• A brief description of the data set you are using and the source of the data.

• A list of all the steps you have taken to clean or edit the data so that you can analyze it.

• A clear description of your independent variable (X) and dependent variable (Y) and the unit of observation (i.e., what a row of the data represents).

• A table of descriptive statistics for all the variables you plan to use in your analysis. Details on how to construct a table of descriptive statistics can be found in your Research Skills 2 Assignment.
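To make the contents of a descriptive statistics table concrete, here is a minimal Python sketch using only the standard library. It is not the course's STATA workflow (see the Research Skills 2 Assignment for that); the data rows, variable names, and the helper functions `describe` and `format_table` are all hypothetical and exist only for illustration.

```python
import statistics

def describe(rows, variables):
    """Return {variable: {"n", "mean", "sd", "min", "max"}} for numeric columns."""
    table = {}
    for var in variables:
        # Drop missing values (None) before computing each statistic.
        values = [row[var] for row in rows if row.get(var) is not None]
        table[var] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
            "min": min(values),
            "max": max(values),
        }
    return table

def format_table(table):
    """Lay the statistics out as fixed-width text, one row per variable."""
    lines = [f"{'Variable':<12}{'N':>6}{'Mean':>10}{'SD':>10}{'Min':>10}{'Max':>10}"]
    for var, s in table.items():
        lines.append(
            f"{var:<12}{s['n']:>6}{s['mean']:>10.3f}{s['sd']:>10.3f}"
            f"{s['min']:>10.3f}{s['max']:>10.3f}"
        )
    return "\n".join(lines)

# Hypothetical micro-data: each row is one person (the unit of observation).
rows = [
    {"employed": 1, "age": 25},
    {"employed": 0, "age": 40},
    {"employed": 1, "age": 31},
    {"employed": 1, "age": None},  # missing age is excluded from the age statistics
]
summary = describe(rows, ["employed", "age"])
print(format_table(summary))
```

Reporting N separately for each variable makes missing values visible, which is why the sketch counts observations per column rather than once for the whole dataset.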

The data readiness report will be scored on completeness, accuracy, and the quality of the table of descriptive statistics. A late or incomplete report will receive a 50%. If no report is submitted, it will receive a 0%.

Preliminary Findings Report

Due Friday, April 12

In practice, research contracts will have scheduled delivery of preliminary, or early, findings.  These are used to make adjustments to the research plan if needed and to test whether it is possible to finish the project on schedule.  Even among academic co-authors, writing up preliminary findings can be useful for conference presentations, or to highlight problems early so they can be addressed sooner rather than later.  Your second delivery is a short write up of your preliminary findings. Your preliminary findings have no page limit, and must include:

A restatement of your research question. I am looking to see how your research question has evolved from my feedback on your proposal. Only copy and paste here if I made no comments!

A brief restatement of the data. Tables of estimates or statistics should never be presented without some citation of the data source. Be sure to elaborate if data sources have changed since the proposal.

An explanation of the econometric model you used. Again, an equation is preferred, and you should explain clearly how you estimated your model, and what your identification strategy is.

A table of summary statistics. A good table of summary statistics will have mean, standard deviation, minimum and maximum values for all variables used in the analysis. Additionally, the table should be able to show the sample size for at least the largest sample analyzed.

Tables of preliminary results. Tables of regression output are rarely as detailed as those in STATA, and only require estimates for variables of interest, standard errors, some indicator of statistical significance, and the sample size for every estimate.  Additionally, write up an explanation of how you interpret your preliminary findings.

Next steps. Your preliminary analysis does not need to be perfect, or even use the method you will use in your final paper.  You must at least say how, having seen the preliminary results, you plan to move forward with the final analysis.
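The elements of a regression table described above (an estimate for the variable of interest, its standard error, some indicator of significance, and the sample size) can be sketched with a hand-computed bivariate OLS. This is a minimal Python sketch under stated assumptions, not the course's STATA workflow: the data are made up, the function name `bivariate_ols` is hypothetical, and the standard error assumes homoskedastic errors.

```python
import math

def bivariate_ols(x, y):
    """OLS of y on a constant and x; returns the pieces a results table needs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals, then residual variance with n - 2 degrees of freedom.
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(ssr / (n - 2) / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "se": se_slope,
        "t": slope / se_slope,  # t-statistic for H0: slope = 0
        "n": n,
    }

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = bivariate_ols(x, y)
print(f"x: {fit['slope']:.3f} ({fit['se']:.3f}), t = {fit['t']:.2f}, N = {fit['n']}")
```

The single printed row mirrors one line of a results table: the coefficient with its standard error in parentheses, followed by the sample size. Significance stars would additionally require comparing the t-statistic to a critical value, which is omitted here.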

The preliminary results are not a final delivery, which means your prose can be more casual, and there are no formatting requirements. However, this is still a delivery, and presentation matters! For example, screenshots or copy-and-paste versions of STATA output in paper drafts are considered poor presentation. Make your own tables and get creative!

The preliminary report will be scored on completeness, accuracy, and the quality of the tables of descriptive statistics and regression results. A late or incomplete report will receive a 50%. If no report is submitted, it will receive a 0%.

Final Paper Draft

Due Friday, May 3

The final paper draft is the final presentation of your work.   Thus,  it is your responsibility to ensure your work is easy to understand through clear explanation and polished formatting and presentation.  The format for the final paper is similar to that of an academic journal article, but will also resemble a final report often presented to clients who fund research. This style of writing up your work will always be acceptable in an academic or industry environment.  The format for the paper is as follows:

Title. All academic papers need titles.  These are often more advertisements for why a paper is interesting, and need not be a literal description of the paper. Get creative!

Abstract/Executive Summary. This is a brief summary of your paper's research question, data, method, and findings. It does not contain references to other articles unless absolutely necessary. It is limited to 100 words or fewer. Often, the abstract is the last part of the paper written.

Introduction. A brief introduction explains the research question, and its motivation. Why is your topic important? Why should I care?

Literature Review. Make reference to the previous literature on this topic and state both what previous literature has found and how your work contributes to that literature. 

  A literature review is an organized and synthesized summary of a number of academic articles related to a chosen topic.  A literature review should not be a series of smaller summaries of each separate article.  Rather what you want to do is to find common themes throughout the articles, identify areas of controversy, and possibly, formulate questions that need further research. You are building an account of what has been published by researchers. You must have at least three sources. It is allowable to combine your introduction and literature review into a single section if this improves the flow of your paper,  but all parts of the introduction and literature review must be included.

Methods. The methods section explains your theoretical model, explains clearly how you estimated your model, and states what your identification strategy is. You should again articulate the hypothesized cause-and-effect relationship between the independent variable(s) of interest and the dependent variable. Explain your other control variables and how they contribute to your identification strategy. If you have made any deliberate decisions about how to specify your model (interaction terms, categorical variables, etc.), you should defend your choices here.

Data. You must provide the name of the dataset and who collected/compiled the data. If there are multiple sources of data, or data you collected, you need to outline each source and how they were linked together. Discuss how you chose your analytical sample and in what sense your data are a random sample from a population. You should briefly discuss how your dependent and independent variables of interest are collected in the data.

Results. All results, even descriptive statistics, should only be presented in the results section. Thus, this section should have at minimum two tables: a table of descriptive statistics, akin to the table in your preliminary report, and at least one table of estimation results. Here, you interpret the significance, sign and magnitude of your estimated coefficients on variables of interest. If findings are contrary to their hypothesized significance, sign or magnitude, note this clearly. In addition, provide any evidence supporting your choice of a specific model or specification. For example, if using a log-linear model, show how the dependent variable is log-normally distributed. If using a difference-in-differences identification strategy, consider a figure showing the trends in the dependent variable for the treatment and control groups. If you omitted influential outliers, perhaps a histogram or residual plot can be used to defend omitting them. Any other tests, models, or figures that support your methods and conclusion should be included and discussed in this section. You have a license to be creative in your exploration here!

Discussion/Conclusions. Provide a brief summary (include a statement about your question of interest) and assessment of the models. Explain how your variable(s) of interest worked out and/or did not work out. Interpret the significance, sign and magnitude(s) of the estimated variable(s) of interest. Make a conclusion about the answer to your research question given your findings and offer reasons, if possible, for the results that did not work out. Include business or government policy recommendations if appropriate, or suggestions for further research.

References. The format for your references is: Author. Year.  “Title of article.”  Source.

For example,

Athey, Susan, and Guido Imbens. 2015. “A Measure of Robustness to Misspecification.” American Economic Review 105(5):476-80.

Inline citations should only reference the authors and year,

... as shown by Athey and Imbens (2015), econometric models...

When supporting a claim, append the citation to the end of the sentence in parentheses.

...are robust to misspecification (Athey and Imbens, 2015).

Appendices. Increasingly, there is an emphasis on transparency in econometric research and it is not unusual to supply data and code to enable other researchers to replicate your findings. You are required to provide the following appendices,

 Appendix A: STATA output and do file. You have an incentive to make your do file as concise and organized as possible. Embedding comments in your code is good practice, and can make it easier for others to understand your methods. STATA output should be unadulterated, and there is no limit on length. A do file must be able to run from start to finish without errors.

 Appendix B: Data. Provide full bibliographic details on each dataset and variable of interest you used.  I need to be able to find your data myself so that with the data I could replicate your results with your attached do-file. Do not attach a printout of your raw data.

Coefficient Interpretation. Coefficient interpretation is the most important single element of your paper after your identification strategy. Coefficient interpretations must reflect the sign and magnitude of the coefficient and correctly incorporate the units of x and y. Your paper should ultimately interpret the marginal effect of your x variable of interest, so pay special attention to how you specify your model (e.g., linear probability model, non-linear specification, interaction terms) so that you correctly compute the marginal effect you are reporting.

Consider a regression of a dummy variable for whether a student passed their course in econometrics on the number of classes they attended, with an estimated coefficient of 0.025. A very poor (likely ‘D’ paper) interpretation would be,

“A one unit increase in attendance causes a 0.025 change in passing,”

whereas an excellent ‘A’ paper interpretation would be,

“Attending an additional class is associated with a 2.5 percentage point increase in the likelihood of passing econometrics.”
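The arithmetic behind the ‘A’ interpretation can be sketched in a few lines of Python. In a linear probability model the coefficient is a change in a probability, so multiplying by 100 gives percentage points, which is not the same as a percent change relative to a baseline. The function name `interpret_lpm` is hypothetical, and the 50% baseline pass rate is an assumption chosen purely for illustration; only the 0.025 coefficient comes from the example above.

```python
def interpret_lpm(coefficient, baseline_rate):
    """Express a linear probability model coefficient two ways.

    coefficient   : change in Pr(y = 1) per one-unit increase in x
    baseline_rate : mean of y, used only to compute the relative change
    """
    percentage_points = coefficient * 100             # absolute change in probability
    percent_change = coefficient / baseline_rate * 100  # change relative to the baseline
    return percentage_points, percent_change

# 0.025 is the coefficient from the example; the 0.50 baseline is an assumed value.
pp, pct = interpret_lpm(0.025, baseline_rate=0.50)
print(f"{pp:.1f} percentage points, or {pct:.1f}% relative to the baseline")
```

Writing "a 2.5 percent increase" for this coefficient would be wrong unless the baseline pass rate happened to make the two quantities coincide, so state which one you are reporting.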

Presentation. Presentation is an essential component of doing research. Different journals, institutions or projects may have very different style guidelines.  In a contract setting, the researchers are often given significant leeway in how to arrange and present their findings. To mimic the latter setting there are not very specific formatting requirements, though some useful guidelines are included here.

– Your paper should have a cover page that includes the title, your name, and your abstract.

– Use page numbers.

– Upload your paper as a PDF and either append your appendices to the file or include them as separate submitted files. Do not embed any hyperlinks into your paper. Do not use generic file names, and include your name in the name of each submitted file.

– It is not acceptable to copy and paste any STATA output into your paper except for STATA-generated graphs. You must make your own tables. Please keep formatting decisions (font, spacing, titles and table notes) consistent across your tables. Every table should have table notes, so that a reader can understand the content of the table without referring to the text.

– Use whatever means you can to provide a clear structure to the paper. Headings and sub-headings are very useful (and something that you will observe in the papers that you read for this assignment). Anything you can do to make the structure of your paper and your argument easier to follow will help you convey your message more effectively.

– It is important that your writing is clear and concise. I (and your future bosses) will not be impressed by fancy language that hides poor reasoning. Clear writing is a job skill that commands a high premium.

– Your paper should be largely free of grammatical and spelling errors (no one is perfect!). If you are a non-native speaker of English, you may wish to consult with a writing associate or have a friend help you edit your paper. Grammar counts. I encourage you to trade papers with your peers and proofread them. This will be one of the most productive things you can do for your paper!

Academic Honesty

Cases of academic dishonesty will be dealt with according to College policy.  College policies are clearly detailed in the  “Student Handbook.” All intellectual work builds on the ideas of others; it is very important to provide appropriate references to the sources you consult, whether they are paraphrased or quoted directly.   Students are not allowed to use advanced automated tools (artificial intelligence or machine learning tools such as ChatGPT or Dall-E 2) on assignments in this course. Each student is expected to complete each assignment without substantive assistance from others, including automated tools.

