EC348: Assignment 1
In this assignment, you will apply your newly acquired skills to replicate selected results from a published paper. You are also expected to discuss the results. Your table format may differ from the table structure used in the paper, but your estimates should be as close as possible to the published ones.
The paper presents evidence of the short- and long-term impact of the COVID-19 crisis on India’s rural youth. It has two components. The first is a panel data component that reports the effects of the pandemic on job search, migration intentions, marriage, life satisfaction, and anxiety among more than 2000 vocational training graduates. More importantly, this section estimates how changes in salaried employment differ across subgroups (by gender, training completion, and caste). The second component provides evidence from an RCT. The authors randomly allocated the sample to a treatment arm (1122) and a control arm (1138), roughly half each. Individuals in the treatment arm were trained to search and apply for jobs on an online job search portal (Yuvasampark). Hence, exposure to the ‘Yuvasampark’ portal is the ‘treatment’.
You are asked to replicate:
1. Table 2: This table estimates salaried employment by the training status of the respondents (trained or training dropout) in each survey round. The outcome variable (salaried employment) has the following name in the dataset for each survey round:
a. Pre-lockdown 2020 - Salaried_Prelockdown
b. Jun-Jul 2020 - Salaried_Jun_Jul2020
c. Mar-Apr 2021 - Salaried_Mar_Apr2021
d. Nov-Dec 2021 - Salaried_Nov_Dec2021
The control variables are the same for all survey rounds because they were collected at baseline. The controls included in each column are as follows:
a. Column 1 - Training_complete (Trained=1, Dropout=0)
b. Column 2 - Training_complete and sector1-sector10
c. Column 3 - Training_complete, sector1-sector10, and individual-level controls (c_gender c_caste c_age_above_20 c_respondent_maritalstatus2 c_respondent_religion2 c_respondent_religion3 c_respondent_education1 c_respondent_education2 c_respondent_education4 c_Matric_exam c_Inter_exam c_respondent_migrate)
d. Column 4 - Training_complete, sector1-sector10, individual-level controls, and household characteristics (c_household_earning1 c_household_earning2 c_household_earning3 c_agriculture_land c_BPL_card c_RSBY_Card c_SHG_member c_MNREGA c_internet_use c_relatives_migrate c_difficulty_immediate_famil c_difficulty_future_family)
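The column structure of Table 2 is cumulative: each column keeps the regressors of the previous one and adds a new group. As a minimal sketch (in Python, which is an assumption; the same idea carries over to whatever software you use in the workshops), the 16 specifications (4 survey rounds x 4 columns) can be written down as formula strings. The helper names column_regressors and build_formulas are hypothetical; the variable names are the ones listed above. The sketch only builds the specifications. It does not choose an estimator or standard errors, which should follow the paper.

```python
# Outcome variable for each survey round (names as given in the assignment).
OUTCOMES = {
    "Pre-lockdown 2020": "Salaried_Prelockdown",
    "Jun-Jul 2020": "Salaried_Jun_Jul2020",
    "Mar-Apr 2021": "Salaried_Mar_Apr2021",
    "Nov-Dec 2021": "Salaried_Nov_Dec2021",
}

# sector1 ... sector10
SECTORS = [f"sector{i}" for i in range(1, 11)]

# Individual-level baseline controls (added in column 3).
INDIVIDUAL = (
    "c_gender c_caste c_age_above_20 c_respondent_maritalstatus2 "
    "c_respondent_religion2 c_respondent_religion3 c_respondent_education1 "
    "c_respondent_education2 c_respondent_education4 c_Matric_exam "
    "c_Inter_exam c_respondent_migrate"
).split()

# Household characteristics (added in column 4).
HOUSEHOLD = (
    "c_household_earning1 c_household_earning2 c_household_earning3 "
    "c_agriculture_land c_BPL_card c_RSBY_Card c_SHG_member c_MNREGA "
    "c_internet_use c_relatives_migrate c_difficulty_immediate_famil "
    "c_difficulty_future_family"
).split()


def column_regressors():
    """Return the cumulative regressor list for columns 1 to 4 (hypothetical helper)."""
    col1 = ["Training_complete"]
    col2 = col1 + SECTORS
    col3 = col2 + INDIVIDUAL
    col4 = col3 + HOUSEHOLD
    return {1: col1, 2: col2, 3: col3, 4: col4}


def build_formulas():
    """Map (survey round, column) to a formula string 'y ~ x1 + x2 + ...'."""
    formulas = {}
    for round_label, outcome in OUTCOMES.items():
        for col, regressors in column_regressors().items():
            formulas[(round_label, col)] = f"{outcome} ~ " + " + ".join(regressors)
    return formulas


if __name__ == "__main__":
    for (round_label, col), formula in build_formulas().items():
        print(f"{round_label}, column {col}: {formula}")
```

Each string uses the 'y ~ x1 + x2' syntax accepted by formula-based regression interfaces, for example statsmodels' smf.ols(formula, data=df), where df would be the assignment dataset loaded as a data frame.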
2. Panel B of Table 4: This table shows the impact of the randomized intervention (training on Yuvasampark) on the treatment group. The three outcome variables are Main_outcome_1_Job_applied, Main_outcome_2_Job_applications, and Main_outcome_3_Job_applications. The main explanatory variable is ‘treatment’. You do not need to generate the q-values. Your results may differ slightly from those in the paper.
3. Table A9 (from the online appendix of the paper): This is the balance table, which tests whether the randomization worked. As discussed in the lectures and previous workshops, a balance table typically reports the control mean, the treatment mean, the difference in means, and the p-value for the difference. The treatment variable in the dataset is ‘treatment’, and all variables with the prefix c_ are baseline variables. You do not need to generate Panel F.
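To make the four balance-table columns concrete, here is a minimal sketch in Python (an assumption; use whichever software the workshops use). For each variable with the c_ prefix it computes the control mean, the treatment mean, their difference, and a two-sided p-value. The p-value here comes from an unequal-variance z-test with a large-sample normal approximation, which is an assumption of this sketch and not necessarily the test used in the paper, so small discrepancies from Table A9 are possible. The function names balance_row and balance_table and the toy data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def balance_row(values, treatment):
    """Control mean, treatment mean, difference and p-value for one variable.

    values: list of numbers (None marks a missing value and is skipped)
    treatment: list of 0/1 indicators of the same length
    """
    control = [v for v, t in zip(values, treatment) if t == 0 and v is not None]
    treated = [v for v, t in zip(values, treatment) if t == 1 and v is not None]
    m_c, m_t = mean(control), mean(treated)
    diff = m_t - m_c
    # Standard error of the difference in means, allowing unequal variances.
    se = sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    if se == 0:
        # No variation in either arm: the test statistic is undefined.
        p_value = 1.0 if diff == 0 else 0.0
    else:
        # Two-sided p-value from the normal approximation (assumption of this sketch).
        p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return {
        "control_mean": m_c,
        "treatment_mean": m_t,
        "difference": diff,
        "p_value": p_value,
    }


def balance_table(rows, treatment_col="treatment", prefix="c_"):
    """Build one balance row per baseline variable.

    rows: list of dicts, one per respondent, keyed by variable name
    """
    treatment = [r[treatment_col] for r in rows]
    variables = [name for name in rows[0] if name.startswith(prefix)]
    return {v: balance_row([r.get(v) for r in rows], treatment) for v in variables}


if __name__ == "__main__":
    # Made-up toy data, for illustration only (not from the assignment dataset).
    toy = [
        {"treatment": 0, "c_gender": 0}, {"treatment": 0, "c_gender": 1},
        {"treatment": 0, "c_gender": 0}, {"treatment": 0, "c_gender": 1},
        {"treatment": 1, "c_gender": 1}, {"treatment": 1, "c_gender": 1},
        {"treatment": 1, "c_gender": 0}, {"treatment": 1, "c_gender": 0},
    ]
    for name, row in balance_table(toy).items():
        print(name, row)
```

If randomization worked, the differences should be small and most p-values should be large; with many baseline variables, a few small p-values can still arise by chance.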