ECO214
2nd SEMESTER 2023-24 Group Project
BSc Economics – Year 3
ECONOMETRICS II
Group Project
General guidelines
This group project accounts for 20% of your module mark. Each group is expected to conduct an empirical study based on panel data analysis. Please choose a socioeconomic phenomenon or relationship and conduct your investigation using real-world data (see the guidelines on choosing topics). As a stand-alone empirical study, the report is expected to follow the structure of a typical academic research paper (see the recommended structure in the guidelines on format). The submission will be checked for similarity with Turnitin. Cases of academic misconduct will be penalized according to university policy.
The source of the topics can be your own experience/knowledge (as an economist), textbook examples (with proper modification), or academic literature. You are free to choose your topics, but please bear in mind that (1) they must facilitate analyses using panel data, and (2) they must be properly motivated (i.e., you must explain why it is important or useful to study the problem). With this coursework assignment you are expected to develop a good understanding of the learning outcomes, enhance your software skills, and familiarize yourself with the literature and popular datasets. Please also note that even though the statistical methods and models presented in ECO214 are sufficient to produce many interesting results, you are free to use more advanced panel data methods if they provide additional information or fit your purpose.
Guidelines on choosing topics
Potentially, you may find research topics from the following sources.
1. The first source is your textbooks in other fields of study (micro/macroeconomics, labor economics, international economics, finance, etc.). These textbooks usually cover a wide range of economic or financial theories that you can test with real-world data. For example, you learned the concept of the production function in micro/macroeconomics, and you may want to estimate a parametric form using province- or city-level data on capital stock, labor input, and output. This is a typical panel-data application because regions differ in unobserved characteristics that affect output.
2. A second source of topics is the academic literature. Google Scholar is the best place to search the academic literature. Type a keyword and it will return hundreds of articles. For example, you may read an article arguing that urban land use is determined by income, population, and urban transportation conditions. Following this article, you can collect panel data from the China City Statistical Yearbook on (1) urban population, (2) per capita income, (3) transport infrastructure, and (4) urban land use, and analyze how the first three factors affect urban land use.
3. A third source is textbooks in econometrics. Most econometric textbooks emphasize empirical examples and exercises. Thus, they provide a large pool of potential topics. The easiest approach is to take one of the problems and apply the empirical model to your own data.
4. In addition, I encourage you to find your own topics through deep thinking, which produces interesting research questions. For example, you may model housing prices as jointly determined by demand and supply factors. However, there are many such factors. It is then your job to narrow them down to a few major ones and collect data accordingly. This cannot be done without deep thinking. Even if you adopt a research question raised by others, deep thinking will help you refine the question and generate new insights. For instance, in a model of Chinese housing prices, you may want to consider factors that others have overlooked but that may be important in the Chinese context, such as administrative hierarchy and geographical location. These factors may bring further insights into your results.
Below I give a few sample topics.
• Estimate aggregate production function using regional (province- or city-level) data.
• Estimate determinants of pollutant emissions using regional data.
• Estimate determinants of housing prices using regional data.
• Estimate β-convergence using national data.
• Estimate the effect of a training program using population survey data.
• Estimate the effect of having children on labor supply using population survey data.
Although there is no restriction on the scope of topics you may try, please adhere to the following principles to ensure that you obtain meaningful results from the analysis.
1. Please make sure you test an economic model, rather than an accounting identity. An economic model is a hypothetical functional form that describes how one variable is determined by other variables. The exact form of this function is unknown and must be estimated using real-world data. For instance, we often assume a Cobb-Douglas production function $Y = A K^{\alpha} L^{\beta}$, where Y stands for output (GDP or value added), K for capital stock, L for labor input, and A is called the total factor productivity (TFP). In this formulation, the parameters α and β are unknown and can be estimated using real-world data.
Accounting identities, on the other hand, are known formulas that must be universally true. This statement has two implications. First, the parameters of the formula are all known, so there is no need to estimate them. Second, the relationship must hold for any data set, apart from measurement error or statistical discrepancy. To illustrate, you all learned in Principles of Macroeconomics that Y = C + I + G + NX, where Y stands for GDP, C for personal consumption expenditures, I for private investment, G for government spending, and NX for net exports. This is an accounting identity because all output must be put to one of these four uses. Here we have a linear function in C, I, G, and NX, but their coefficients are known to be unity. Hence, it is meaningless to estimate this equation.
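As a rough sketch of how the Cobb-Douglas example might be taken to panel data, taking logarithms turns the model into a linear regression. The indices i (region) and t (year), the region-specific productivity term, and the error term are notation assumed here for illustration; they are not part of the formula above.

```latex
\begin{align*}
Y_{it} &= A_i \, K_{it}^{\alpha} \, L_{it}^{\beta} \, e^{u_{it}} \\
\ln Y_{it} &= a_i + \alpha \ln K_{it} + \beta \ln L_{it} + u_{it},
\qquad a_i \equiv \ln A_i
\end{align*}
```

In this form α and β are slope coefficients to be estimated, and a_i plays the role of an entity-specific effect. By contrast, a regression of Y on C, I, G, and NX can only return coefficients of one, up to measurement error.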
2. The empirical model must be properly justified. Please refer to economic theory and the empirical literature when you choose the variables (regressors), and carefully explain why you decide to use each explanatory variable and how it affects the dependent variable. Usually, the literature provides clear guidance on the mechanisms and factors that should be controlled for. The literature also helps you avoid some common mistakes in modeling. These include:
• Incomplete model: Some important factors are not considered. For example, in studying the determinants of regional output (GDP), labor, educational input, FDI, and public infrastructure are considered, but capital stock is missing.
• Duplicate measures: Two or more variables measure the same concept. For example, in studying determinants of housing price, regional per capita GDP and per capita disposable income are both used. They are very close to each other, and they both can be used to measure regional income.
• Poor or wrong measures: Sometimes multiple variables can be used to measure the same concept (factor), but some are better than others. For example, the quality of health services can be measured by quite a few variables, including hospital beds per person, per capita fiscal spending on medical services, etc., but the infant mortality rate is far more popular in the literature. As another example, if you plan to measure the level of regional income, per capita GDP is superior to aggregate GDP.
If you study a unique factor or mechanism that can hardly be found in the literature, then please think hard about the mechanism and measurement. Please be careful in developing your argument and choosing the measure.
3. Data must be available for all the variables in your model. Data availability is usually a major challenge for empirical studies. Using the Cobb-Douglas production function as an example, data on GDP (or value added) and labor input (employment) are usually relatively easy to obtain, but data on capital stock are seldom provided by statistical bureaus. If data on capital stock are unavailable, the estimation cannot be done in principle. In this case, a common strategy is to estimate the capital stock from investment data using the perpetual inventory method.
If your study employs country-level, province-level, or city-level aggregate data, please keep in mind that government agencies or international organizations are your only data sources. Please check their websites or publications (statistical yearbooks) to verify that the data you need are available. If you plan to collect data by a survey, please think carefully about implementation issues.
If data availability is a problem, you have three options. First, you can change your measurement. For instance, if you need data on the number of permanent residents in cities, but such information is not provided, you can use the number of registered residents instead. Second, you can modify your topic by using a different variable. As an example, you may want to estimate the aggregate production function. In that situation you need the productive capital stock of the city or province. Suppose that these data are unavailable but the statistical yearbooks do provide data on the capital stock of the secondary industry. You can then narrow down your topic to the production function of the secondary industry. In that case, you need to use industrial value added as the dependent variable. Third, if both options fail, you should consider a different topic for which data are available.
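The perpetual inventory method mentioned under principle 3 can be sketched as follows. This is only an illustration under stated assumptions: the function name, the depreciation rate, the growth rate, and the rule for the initial stock (first-year investment divided by growth plus depreciation) are choices made here, and the literature uses several variants. Investment is assumed to be in constant prices.

```python
def perpetual_inventory(investment, depreciation=0.05, growth=0.05,
                        initial_stock=None):
    """Build a capital stock series with K_t = (1 - d) * K_{t-1} + I_t.

    investment    : real investment by year, oldest first (hypothetical input)
    depreciation  : assumed annual depreciation rate d
    growth        : assumed investment growth rate g, used only for K_0
    initial_stock : K_0; if None, the assumed rule K_0 = I_0 / (g + d) is used
    """
    if not investment:
        raise ValueError("investment series is empty")
    if not 0 < depreciation <= 1:
        raise ValueError("depreciation must lie in (0, 1]")
    if initial_stock is None:
        if growth + depreciation <= 0:
            raise ValueError("growth + depreciation must be positive")
        initial_stock = investment[0] / (growth + depreciation)

    stocks = [initial_stock]
    # Each year, last year's stock depreciates and new investment is added.
    for inv in investment[1:]:
        stocks.append((1 - depreciation) * stocks[-1] + inv)
    return stocks


if __name__ == "__main__":
    # Made-up investment figures for one region, for illustration only.
    print(perpetual_inventory([100, 110, 120], depreciation=0.1, growth=0.1))
```

With the made-up figures above, the stocks work out to 500, 560, and 624. In a panel, the same calculation would be repeated for each province or city, and the report should state the depreciation rate and initial-stock rule that were assumed.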
Guidelines on using data
Please pay attention to the following issues when choosing your sample.
1. A large sample is always recommended. Although it was mentioned in the lecture that the minimum sample size could be as small as 50, in empirical studies it is highly recommended that you have far more data. A sample size of a few hundred or more is preferred.
2. Please make sure that the variables of interest vary enough over time. The fixed-effects estimator applies the entity-demeaning transformation to all variables. If a regressor varies little over time, the variance of its demeaned values will be small. Consequently, its slope coefficient will not be precisely estimated, and its standard error will be large. This weakens your interpretation, and the coefficient is unlikely to pass any test of statistical significance. In the most extreme case, the fixed-effects estimator cannot assess the effect of any factor that is constant over time.
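The entity-demeaning transformation described in point 2 can be illustrated with a small sketch. The function names and the toy data are hypothetical, and the slope formula covers only the single-regressor case; in practice you would use your statistical software's fixed-effects command.

```python
from collections import defaultdict


def entity_demean(entities, values):
    """Subtract each entity's own time average from its observations."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for e, v in zip(entities, values):
        sums[e] += v
        counts[e] += 1
    return [v - sums[e] / counts[e] for e, v in zip(entities, values)]


def within_slope(entities, x, y):
    """Fixed-effects (within) slope for one regressor: OLS on demeaned data."""
    xd = entity_demean(entities, x)
    yd = entity_demean(entities, y)
    sxx = sum(a * a for a in xd)
    if sxx == 0:
        # A time-invariant regressor is wiped out by the transformation.
        raise ValueError("regressor has no within-entity variation")
    return sum(a * b for a, b in zip(xd, yd)) / sxx


if __name__ == "__main__":
    # Toy panel: two entities observed for three periods each.
    ids = ["A", "A", "A", "B", "B", "B"]
    x = [1, 2, 3, 4, 5, 6]
    y = [12, 14, 16, 58, 60, 62]   # y = 2x plus a different intercept per entity
    z = [7, 7, 7, 3, 3, 3]         # constant over time within each entity

    print(within_slope(ids, x, y))  # slope on x after removing entity means
    print(entity_demean(ids, z))    # every demeaned value of z is zero
```

Because the demeaned values of z are all zero, its coefficient cannot be estimated. A regressor that changes only slightly over time leaves a small sum of squares in the denominator, which is why its estimate is imprecise.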
Aggregate socioeconomic data at the city-, province-, or country-level can be downloaded from online sources. Below are some frequently used ones.
Statistical yearbooks offered by CNKI (access from XJTLU library):
XJTLU library home -> Databases -> China Statistical Yearbooks Database
Data offered by the National Bureau of Statistics of China (register to download):
http://data.stats.gov.cn/index.htm
World Bank Open Data (all indicators):
https://data.worldbank.org/indicator?tab=all
IMF data:
https://www.imf.org/en/Data#global
Eurostat:
https://ec.europa.eu/eurostat/data/database
OECD.Stat:
https://stats.oecd.org/index.aspx?lang=en
The Penn World Table:
https://www.rug.nl/ggdc/productivity/pwt/?lang=en
A rich collection of online data sources (including U.S. labor survey data) compiled by the American Economic Association:
https://www.aeaweb.org/resources/data
Please note: Some data sources cannot be accessed from China; please find a technical solution.
Guidelines on designing the analysis
You are expected to employ appropriate methods (including those not covered by this module) in your empirical analysis. Although there is no fixed rule for good research design, high-quality studies share these common features:
1. The analytical framework is carefully chosen to answer the research question and to analyze the data.
2. Alternative model specifications or extensions of the model are explored to extract further information from the data, to address data problems, and to consolidate the main findings.
3. The results are interpreted and analyzed in detail.
Please avoid these common mistakes among past students:
1. Trying all the regression models or analytical methods learned in this module. Please bear in mind that your ultimate objective is to answer research questions. The coursework is not supposed to be an exercise on everything you have learned. Content unrelated to the research question damages the quality of your work.
2. Presenting the analytical results without much interpretation. It is the interpretation, not the numerical output generated by software, that answers the research question. Without proper interpretation, the results make little sense.
3. Copying the analytical framework of a past student's work that earned a high mark. That analysis serves its own research question and data, which are different from yours. Blindly copying other students’ analytical framework often results in a poor report.
Guidelines on format
1. I recommend a word count of 2,000. This is not mandatory: the mark is not explicitly linked to the word count.
2. I recommend the following structure for the report:
a. Title;
b. Motivation of research question;
c. Description of the data sources, variable measurement, and empirical model (including testable hypotheses, if any);
d. Presentation of the analysis, interpretation, and statistical inferences;
e. Discussion of results and conclusion;
f. References (if any);
g. Appendix (see below).
To give you a better idea of the exact form of a testable hypothesis, let’s consider the example of housing prices. It can be expressed as a sentence followed by an algebraic expression, such as “According to … (give economic theory or reference), migrants flowing into the city create demand for housing and push up housing prices, so the slope coefficient of migrant population in Equation (give number) is expected to be positive. That is, $\beta_{\text{migrant}} > 0$ in that equation.”
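One possible way to write the housing example formally is sketched below. The symbols are hypothetical and chosen only for illustration: P is the housing price, M the migrant population, X a vector of control variables, a_i an entity effect, and u the error term.

```latex
\begin{align*}
\ln P_{it} &= a_i + \beta_{\text{migrant}} M_{it} + \gamma' X_{it} + u_{it} \\
H_0 &: \beta_{\text{migrant}} \le 0
\qquad \text{against} \qquad
H_1 : \beta_{\text{migrant}} > 0
\end{align*}
```

Stating the hypothesis as a one-sided restriction on a named coefficient makes it clear which estimate and which test statistic in the results section speak to the claim.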
3. All Stata code and regression output must be reported in the appendix, placed at the end of the report. You should also include figures and tables in the main text, and tables should be formatted like those in the textbook (for example, Table 10.1). Please do not present tables or Stata code/output as screenshots.
4. Please use the accompanying MS Word template to prepare your final report. Please insert your digital signature as a picture on the cover page. Please do not alter the format (font, line spacing, page margin, etc.) of the first two pages of the document. Please submit your final report as an MS Word document. PDF files are not accepted.
Important Dates
Group formation deadline: March 3, 2024 (Sunday) 24:00
Event              Open date                       Close date
Report submission  March 4, 2024 (Monday) 00:00    April 14, 2024 (Sunday) 24:00
Peer assessment    March 4, 2024 (Monday) 00:00    April 14, 2024 (Sunday) 24:00
Marking components and weights
Component                                 Weight
Motivation of research question           10%
Data description                          15%
Empirical model and statistical method    25%
Presentation of results and inferences    30%
Discussion and conclusion                 10%
Structure, format, and writing            10%
Total                                     100%
This gives the base mark, which applies to all group members. After the presentation, group members are required to assess the performance of their peers. Individual marks will be jointly determined by the base mark and peer assessment. In principle, the individual mark can be higher or lower than the base mark, depending on one’s relative contribution. The weight (20%) determines the magnitude of these deviations. The algorithm used by the Learning Mall peer assessment function is described here:
http://webpaproject.lboro.ac.uk/academic-guidance/a-worked-example-of-the-scoring-a algorith/.
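To give a feel for how peer assessment can move an individual mark away from the base mark, here is a minimal sketch of a WebPA-style calculation. It rests on assumptions: the function names and the points in the example are made up, the 20% is treated as the peer assessment weighting, every member is assumed to submit an assessment, and the Learning Mall implementation may differ in its details.

```python
def webpa_scores(awarded):
    """Turn raw peer points into WebPA-style scores.

    awarded[assessor][member] = points the assessor gave that member.
    Each assessor's points are normalized to fractions that sum to one,
    and a member's score is the sum of the fractions received.
    """
    totals = {member: 0.0 for member in awarded}
    for assessor, marks in awarded.items():
        given = sum(marks.values())
        if given <= 0:
            raise ValueError(f"{assessor} awarded no points")
        for member, points in marks.items():
            totals[member] += points / given
    return totals


def individual_mark(base_mark, score, pa_weight=0.20):
    """Blend the base mark with the peer score (assumed weighting rule)."""
    mark = base_mark * (1 - pa_weight) + base_mark * pa_weight * score
    return min(mark, 100.0)  # assumed cap at full marks


if __name__ == "__main__":
    # Hypothetical three-person group; all assessors happen to agree.
    points = {
        "Ann": {"Ann": 4, "Bo": 4, "Cy": 2},
        "Bo":  {"Ann": 4, "Bo": 4, "Cy": 2},
        "Cy":  {"Ann": 4, "Bo": 4, "Cy": 2},
    }
    for name, score in webpa_scores(points).items():
        print(name, round(individual_mark(70, score), 1))
```

Under these assumptions a score of 1 means an average contribution and leaves the base mark unchanged. In the made-up example, a base mark of 70 becomes 72.8 for the two members with a score of 1.2 and 64.4 for the member with a score of 0.6.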
Please give your evaluations truthfully, with full respect for your teammates' effort. I will not override peer assessment results even if disputes arise after the release of component marks.