SESS0023 Applied Econometrics 2023-2024

This CW contributes 25% to your final mark.

Please read the Guidance on AI use in the assessment (available on Moodle) before you start working on this CW.

Group Coursework Description

NOTE: Affiliate students who are at SSEES for Term 1 only will be assessed by a single individual Coursework, which is uploaded on Moodle under a different link (see section ‘Assessment for Term 1 Affiliate Students’).

CONTENT:

1. General information.

2. Group and dataset allocation.

3. Submission.

4. Exercise 1. Modelling Educational Attainment.

4.1 Data and description of variables.

4.2 Questions for Exercise 1.

5. Exercise 2. Modelling AR(p) process.

6. Exemplary question for Exercise 1, its answers and presentation.

1. General information

You have to prepare a short project in which you demonstrate the ability to conduct basic empirical (comparative) analysis. The empirical part of the project has to be done using the statistical package Stata. The project must not exceed 2500 words and should include all relevant Stata tables and graphs. The project mark will contribute 25% towards the final overall mark.

There are TWO EXERCISES in the Coursework. The questions to be answered are identical for all of you. Each exercise is worth 50% of the total Coursework mark.

2. Group and dataset allocation

You’ll be working on the Coursework in GROUPS of up to three members. However, each of you will be allocated an individual dataset, and you will be asked to combine these datasets at some stage (see the description of the exercises in Sections 4 and 5).

You can find the allocation of the datasets for Exercises 1 & 2 in the PDF file on the course Moodle page under “Allocation of datasets for coursework”. In this file, you will find your student number along with the number of the dataset (data_1.dta, data_2.dta, etc.) for Exercise 1 and the time series variable (y1, y2, etc.) that you should use for Exercise 2. Please download the appropriate datasets from the corresponding “Datasets for coursework” folder.

NB: If you are not on the list, please email Dr Svetlana Makarova ([email protected]) immediately. In your message, indicate your student number, programme and year of study.

3. Submission

Download the Front Page (see Moodle, the CW-1 section) for the coursework, attach it to your coursework and:

1) indicate the student numbers of all group members;

2) fill in the table confirming the contribution of each member.

NB: If you have concerns regarding unequal contribution of group member(s), please email Dr Svetlana Makarova ([email protected]).

There are TWO parts in the submission:

Part 1: An electronic version (in Word or PDF format) must be uploaded to Turnitin via the link provided on the course Moodle webpage (do not forget to attach the Front Page).

Part 2: A log-file that contains all records of your work with the empirical data while preparing the Coursework Exercises must be uploaded to Moodle via the link provided on the course Moodle webpage. Please note that a do-file is not required; it won’t replace the log-file and won’t be counted as part of the submission.
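As a minimal sketch of how such a log-file can be produced, the Stata commands below open a plain-text log before any analysis and close it at the end. The file name cw_exercise1.log is a hypothetical example, not a required name.

```stata
* Start recording all commands and output to a plain-text log.
* The file name is a hypothetical example; "replace" overwrites an existing log.
log using "cw_exercise1.log", text replace

* ... run your analysis here ...

* Stop recording and close the log file.
log close
```

If you work over several sessions, the option append (in place of replace) adds new output to the end of an existing log rather than overwriting it.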

Deadline: 3 PM on Thursday 25 January 2024.

4. Exercise 1. (50%)

The dataset data_N.dta (where N is your number on the list) contains cross-section data. Variables are named identically in all datasets, but the datasets themselves are different. This means that the same questions should be answered using each individual dataset; however, the numerical results will differ and might therefore lead to a different interpretation of the empirical outputs.

4.1 Data and variables description

(see Ch. Dougherty, 2007, Introduction to Econometrics, 3rd ed., Oxford University Press)

The dataset is a subset of a major US database, the National Longitudinal Survey of Youth (NLSY79). Each dataset contains data for each respondent on the following variables (C indicates a continuous variable, D a dummy variable):

Personal variables

ID                     C Respondent identification number

FEMALE             D Sex of respondent (0 if male, 1 if female)

MALE                 D Sex of respondent (1 if male, 0 if female)

AGE                   C Age in 2002

HEIGHT85          C height in inches in 1985

WEIGHT85         C weight in pounds in 1985

WEIGHT02         C weight in pounds in 2002

S                       C years of schooling (highest grade completed as of 2002)

Ethnicity:

ETHBLACK          D black

ETHHISP            D hispanic

ETHWHITE         D non-black, non-hispanic

Highest educational qualification:

EDUCPROF          D Professional degree

EDUCPHD            D Doctorate

EDUCMAST          D Master’s degree

EDUCBA               D Bachelor’s degree

EDUCAA               D Associate’s (two-year college) degree

EDUCHSD             D High school diploma or equivalent

EDUCDO               D High school drop-out

Marital status

SINGLE                 D Single, never married

MARRIED              D Married, spouse present

DIVORCED             D Divorced or separated

Score on a component of the ASVAB battery (scaled with mean 50, standard deviation 10):

ASVAB02                 C arithmetic reasoning

ASVAB03                 C word knowledge

ASVAB04                  C paragraph comprehension

ASVAB05                  C Numerical operations (speed test)

ASVAB06                  C Coding speed (speed test)

ASVABC                    C composite of ASVAB02 (with double weight), ASVAB03 and ASVAB04

Faith

FAITHN                     D None

FAITHC                     D Catholic

FAITHJ                      D Jewish

FAITHP                      D Protestant

FAITHO                     D Other

Family background variables

SM                          C mother’s years of schooling

SF                           C father’s years of schooling

SIBLINGS                 C number of siblings

LIBRARY                   D Member of family possessed a library card when respondent was 14

POV78                      D Family living in poverty in 1978

Region of residence (census classification):

URBAN                      D living in an urban area

REGNE                       D north-east

REGNC                      D north-central

REGW                       D west

REGS                        D south

Work-related variables

EXP                           C      total years of work experience

EARNINGS                 C      current hourly earnings in 1996 constant dollars

HOURS                      C      hours worked per week

TENURE                     C      years worked with present employer

COLLBARG                  D      pay set by collective bargaining, 2002

Category of employment:

CATGOV                     D      Government

CATPRI                       D      Private sector

CATSE                        D      Self-employed

4.2 Questions for Exercise 1

In this exercise you will formulate and estimate a model explaining the average hourly wage rate, named EARNINGS. On what characteristics might personal earnings depend? In particular, do earnings depend on schooling (named S in the file), work experience (EXP), gender (MALE) and/or other factors?

Please address all of the following aspects (the percentage in brackets indicates how much each aspect contributes to the overall mark for Exercise 1):

1. (5%) Plot the scatter diagrams for earnings against experience and/or schooling based on each individual dataset. Explain the scatter diagram(s) you obtained. What other characteristics that are available in the file might affect earnings? Explain your choice (e.g. you might wish to plot scatter diagrams or compute a correlation matrix to support your conclusions).

2. (25%) Can an individual’s earnings be explained by work experience, years of schooling and gender? Choose one of the datasets (indicate the dataset name explicitly), regress EARNINGS on S, EXP and MALE, and interpret the regression results by answering the following questions:

i. Formulate an econometric model for explaining earnings by schooling, experience and gender. Present the Stata estimation output as a table and as an equation.

ii. Formulate and perform a test for the overall significance of the regression model. Explain your result. What is the R-squared for this model? Give its interpretation. Do you think it is high or low, and what does this mean?

iii. Test the significance of the individual coefficients. For this, formulate the appropriate null and alternative hypotheses for performing the t-test. Explain your conclusion. Give a precise interpretation of all the estimated coefficients.

iv. Perform a residual analysis for your model and explain your findings. In particular, comment on whether your model suffers from heteroscedasticity or not.

v. Follow the steps below to combine the datasets of all members of your group into a new dataset. Make sure that you delete duplicated observations.

a) Keep one individual dataset loaded in Stata’s memory (for example, data_100.dta).

b) From the drop-down menu choose: Data -> Combine datasets -> Append datasets and fill in the related fields in the pop-up window (for example, choose data_200.dta).

c) Sort the observations with the command:

sort ID

Open the data browser and check for duplicated observations.

d) If needed, delete the duplicated observations with the command:

duplicates drop ID, force

e) Save the newly created file using the drop-down menu: File -> Save as (for example, as combined_data.dta).

vi. Re-estimate the original model using the expanded dataset. Compare the estimation output with that obtained in point 2.iv above and explain possible differences. Analyse the residuals and compare your finding(s) with those in point 2.iv. Perform statistical testing for heteroscedasticity and explain your results. What are the consequences of your findings for hypothesis testing and parameter estimates? Suggest and implement remedies if heteroscedasticity is a problem for your model.

3. (15%; continue working with the combined dataset) To decide how the model for earnings above can be improved, answer the following questions:

i. Does this model allow for testing whether the marginal effect of schooling on earnings depends on gender? If your answer is ‘yes’, explain how you can test this. If your answer is ‘no’, suggest your own approach to answering this question (e.g. introducing more variables in the model, changing the functional form, etc.) and explain the results.

ii. Discuss briefly what other factors (out of those given in the dataset) might affect earnings. What other functional form might be suitable for modelling earnings (e.g. log-log or log-level)? Support your conclusions with brief quantitative and/or graphical evidence.

4. (3%) What are the overall conclusions of your investigation? Are there any policy conclusions which can be drawn from it?

5. The log file for this exercise accounts for 2% of the overall mark for this Exercise.
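Steps a) to e) in question 2.v, and the re-estimation in 2.vi, can also be run from the Stata command line. The sketch below assumes a group of two with the example files data_100.dta and data_200.dta. The Breusch-Pagan and White tests and the robust standard errors are shown as common options, not as the required approach.

```stata
* Load the first member's dataset (file names are examples only).
use "data_100.dta", clear

* Append the second member's dataset; repeat for a third member if needed.
append using "data_200.dta"

* Sort by respondent ID and report how many IDs occur more than once.
sort ID
duplicates report ID

* Keep one observation per ID, then save the combined file.
duplicates drop ID, force
save "combined_data.dta", replace

* Re-estimate the model on the combined data.
regress EARNINGS S EXP MALE

* Residuals against fitted values, for a visual check.
rvfplot, yline(0)

* Breusch-Pagan and White tests for heteroscedasticity.
estat hettest
estat imtest, white

* One possible remedy: heteroscedasticity-robust standard errors.
regress EARNINGS S EXP MALE, vce(robust)
```

Note that duplicates drop with the force option keeps the first observation for each ID even when the other variables differ, so it is worth inspecting the output of duplicates report before dropping anything.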

5. Exercise 2. (50%).

File: TS_Exercise_2.dta

The file contains 92 time series, but each of you needs ONLY TWO variables: the variable time, which indicates time at generic frequencies, and the variable yN, where N is your number on the list of allocated datasets and variables (e.g. if your number is 100, then you will be working with variable y100).

You may wish to delete all other ‘y-variables’ that you don’t need (use the Stata command ‘drop’ and save the new dataset as TS_yN.dta, e.g. TS_y100.dta if your number on the list is 100).
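A minimal sketch of this step is given below. It uses keep, which retains only the listed variables and so has the same effect here as dropping all the others; y100 is the example number used above.

```stata
* Load the full time series file.
use "TS_Exercise_2.dta", clear

* Retain only the time index and your own series (y100 is an example).
keep time y100

* Save the reduced dataset under the suggested name.
save "TS_y100.dta", replace
```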

All ‘y-variables’ are either stationary or are unit root processes that become stationary after taking the first difference. In this exercise you will be asked to perform visual analysis to decide whether a particular ‘y-variable’ that is allocated to you (denoted here as yN) is stationary or not, transform it into a stationary form if necessary and then fit an autoregressive process of a proper order to it.

Use the variables named yN1, yN2, yN3, where N1, N2, N3 are the corresponding numbers of your group members on the allocation list.

1. (48%) Answer the following questions:

i. Plot (separately) time series graphs and correlograms for variables yN1, yN2, yN3 and comment on their stationarity.

ii. If at least one of the variables yN1, yN2, yN3 is nonstationary, then choose it (say explicitly which variable has been chosen), generate its first difference and name it Z. Check Z for stationarity by using a time series graph and a correlogram; then go to question 1.iv. Otherwise, go to question 1.iii.

iii. If you decide that all the variables yN1 – yN3 are stationary, then choose one of them (it can be any variable, but indicate clearly which one you have chosen), rename it as Z and go to the next question.

iv. Fit an AR(p) model for variable Z and justify your choice of p.

2. The log file for this exercise accounts for 2% of the overall mark for this Exercise.
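The workflow in questions 1.i to 1.iv could be sketched as follows for a single series. The sketch assumes that the variable time is an integer index suitable for tsset, uses y100 as an example, and picks a maximum lag of 4 and p = 2 purely for illustration; the appropriate order must be justified from your own output.

```stata
* Declare the data to be a time series (assumes time is an integer index).
tsset time

* Time series graph and correlograms; repeat for each group member's series.
tsline y100
ac y100
pac y100

* If y100 looks nonstationary, take its first difference and inspect it again.
generate Z = D.y100
tsline Z
ac Z
pac Z

* Information criteria for candidate lag orders (maxlag(4) is illustrative).
varsoc Z, maxlag(4)

* Fit an AR(2) model as an example and show AIC/BIC for comparison.
arima Z, ar(1/2)
estat ic
```

Comparing estat ic across several candidate orders, together with the partial correlogram of Z, is one common way to support the choice of p.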

6. Exemplary question for Exercise 1, its answers and presentation

Below is an exemplary answer showing what is expected of you in terms of answering one particular question related to your results.

Question.

Fit educational attainment by regressing S on ASVABC and SM and interpret the regression coefficient on the respondent’s mother’s schooling.

Answer:

The table below gives the regression output (remember to choose Courier New 9 font to obtain proper formatting):

reg S ASVABC SM

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =  139.54
       Model |   1137.7605     2  568.880251           Prob > F      =  0.0000
    Residual |  2189.23765   537  4.07679264           R-squared     =  0.3420
-------------+------------------------------           Adj R-squared =  0.3395
       Total |  3326.99815   539  6.17253831           Root MSE      =  2.0191

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1138736   .0098421    11.57   0.000     .0945399    .1332074
          SM |   .2358294   .0361032     6.53   0.000     .1649085    .3067502
       _cons |   5.128721   .5209361     9.85   0.000     4.105398    6.152043

1. The estimated coefficient on mother’s schooling is 0.24, and it is significant at the 5% significance level because the p-value for this coefficient is less than 0.00001 (and hence < 0.05). We can therefore reject the null hypothesis that the coefficient on SM equals zero in favour of the alternative hypothesis that it is not equal to zero.

2. The magnitude of 0.24 indicates that the respondent’s schooling increases (on average) by 0.24 years for each additional year of the mother’s schooling, holding ASVABC constant.
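For reference, the fitted equation and the t-statistic for SM can be written out from the reported output as follows. This only restates the numbers in the table, with the coefficients in the equation rounded.

```latex
\hat{S} = 5.13 + 0.114\,\mathit{ASVABC} + 0.236\,\mathit{SM},
\qquad
H_0\colon \beta_{SM} = 0, \quad H_1\colon \beta_{SM} \neq 0,
\qquad
t_{SM} = \frac{0.2358294}{0.0361032} \approx 6.53
```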

