Assignment 1
ECN620
Deadline: February 11, 11:59pm
Instructions
1. Students should complete this assignment using R Markdown. All R work and answers should be saved in a single R Markdown file and submitted via D2L.
2. Some questions require essay-type answers. Including them in R Markdown is straightforward.
3. One of the first lines of your R script must be: setwd("yourdirectoryhere"). Once I change the path in the setwd() command to the directory where I have the data files for the assignment, R should execute the whole script smoothly.
4. Load all necessary packages at the beginning of your document.
5. Your R script should have all commands used to process the original and any additional data files, including commands used to convert data from Excel (or any other format) into R, changing variable names, etc.
6. Assignments should be submitted through the Assignment portal on D2L. Attachments sent via email will not be graded and will receive a zero mark.
7. Late submissions will be penalized as per the course policies outlined in the course syllabus.
8. THESE ARE QUESTIONS BASED ON THE MATERIAL COVERED IN THE FIRST TWO WEEKS OF CLASSES. AFTER OUR CLASS ON FEBRUARY 5, I WILL ADD MORE QUESTIONS ON THE TOPIC COVERED IN THAT CLASS.
Question 1
1.A
Load the data file “A1_Q1.xls” into R. The second sheet of the file contains detailed descriptions of the variables. This dataset is an extract from the Canadian Population Census for the year 1970; thus, each observation in this file represents an individual. Please note that many variables contain missing values.
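As a rough illustration of reading a two-sheet Excel file, here is a minimal sketch that assumes the readxl package is installed and that "A1_Q1.xls" sits in the working directory. The object names data_file, census, and codebook are hypothetical, and the file.exists() guard is there only so the sketch runs without the file present.

```r
# Minimal sketch: read the data sheet and the variable-description sheet.
# ASSUMPTIONS: readxl is installed; sheet 1 holds the data and sheet 2 the
# variable descriptions; the object names below are hypothetical.
data_file <- "A1_Q1.xls"

if (requireNamespace("readxl", quietly = TRUE) && file.exists(data_file)) {
  # List the sheet names to confirm which sheet is which
  print(readxl::excel_sheets(data_file))

  census   <- readxl::read_excel(data_file, sheet = 1)  # the observations
  codebook <- readxl::read_excel(data_file, sheet = 2)  # variable descriptions

  # Count missing values per variable before doing any analysis
  print(colSums(is.na(census)))
} else {
  message("File or package not found; set the working directory first.")
}
```

Checking the missing-value counts early helps decide where na.rm = TRUE or explicit filtering is needed later.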
1.B
Our main focus will be on analyzing individual income data. Complete the following tasks:
• Clean the data by removing records of individuals who had no income in 1970.
• Calculate the average monthly income and wage for each person in the dataset.
• Create a new factor variable Education that categorizes individuals according to their educational level, using verbal descriptions for each category instead of the numerical value. Use the following categories: “Less than high school” (<12 years of education), “High school” (=12 years), “Some post secondary” (>12 and <=15), “Post secondary” (=16) and “Graduate” (>16).
• Construct a table that displays the average monthly income and wage for individuals within each educational category of variable Education.
• Filter the dataset to include only individuals with a college education. Then, calculate the average monthly income for this group. How does it compare to the overall average?
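The cleaning, recoding, and tabulating steps above can be sketched on toy data with base R. This is only an illustration under stated assumptions: the column names educ_years, income, and wage are hypothetical (check the second sheet of the data file for the real names), income and wage are assumed to be annual amounts, and years of education are assumed to be whole numbers.

```r
# Toy data; column names are hypothetical, not taken from A1_Q1.xls.
toy <- data.frame(
  educ_years = c(8, 12, 14, 15, 16, 18, 10),
  income     = c(3600, 6000, 7200, 8400, 12000, 15000, 0),
  wage       = c(2400, 4800, 6000, 7200, 9600, 12000, 0)
)

# Keep only people with positive, non-missing income
toy <- toy[!is.na(toy$income) & toy$income > 0, ]

# ASSUMPTION: income and wage are annual, so monthly = annual / 12
toy$income_monthly <- toy$income / 12
toy$wage_monthly   <- toy$wage / 12

# cut() with right = TRUE builds intervals of the form (a, b].
# With whole-number years, (-Inf, 11] is the same as "< 12".
toy$Education <- cut(
  toy$educ_years,
  breaks = c(-Inf, 11, 12, 15, 16, Inf),
  labels = c("Less than high school", "High school",
             "Some post secondary", "Post secondary", "Graduate"),
  right = TRUE
)

# Average monthly income and wage within each education category
edu_table <- aggregate(cbind(income_monthly, wage_monthly) ~ Education,
                       data = toy, FUN = mean)
print(edu_table)
```

cut() is used here because it returns a factor with ordered, labelled levels in one step; nested ifelse() calls or dplyr::case_when() would be reasonable alternatives.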
1.C
• marst is a numeric variable that contains four marital status categories. Create a new variable called Married that is a factor variable with labels “Married” (when the person is married) or “Not married” (otherwise).
• How does the average monthly wage of married workers compare to that of workers who are not married?
• How do the average income and salary of retired individuals (65 and older) compare to those of working-age individuals?
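A two-level factor and the group comparisons above can be sketched as follows. The code that marks "married" in marst is not stated here, so married_code = 1 is purely an assumption to be checked against the variable descriptions; the columns age and wage_monthly are hypothetical names as well.

```r
# Toy data; values and column names other than marst are hypothetical.
toy <- data.frame(
  marst        = c(1, 2, 3, 4, 1, NA),
  age          = c(30, 45, 70, 66, 52, 28),
  wage_monthly = c(500, 400, 0, 100, 700, 300)
)

# ASSUMPTION: code 1 means "married"; verify in the variable descriptions.
married_code <- 1

# Missing marst stays NA rather than being counted as "Not married"
toy$Married <- factor(
  ifelse(toy$marst == married_code, "Married", "Not married"),
  levels = c("Married", "Not married")
)

# Average monthly wage by marital status
wage_by_married <- tapply(toy$wage_monthly, toy$Married, mean, na.rm = TRUE)
print(wage_by_married)

# Age groups: 65 and older versus everyone younger
toy$AgeGroup <- factor(
  ifelse(toy$age >= 65, "65 and older", "Under 65"),
  levels = c("Under 65", "65 and older")
)
wage_by_age <- tapply(toy$wage_monthly, toy$AgeGroup, mean, na.rm = TRUE)
print(wage_by_age)
```

tapply() drops observations whose grouping factor is NA, so the person with missing marst does not enter either marital-status average.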
1.D
• Using data from “A1_Q1.xls”, construct a scatter plot of the logarithm of monthly earnings against the Education variable, showing men and women in different marker colors.
• The file “A1_Q2.xlsx” contains information on the GDP for a group of countries over the period from 1960 to 2022, which includes gaps and numerous instances of missing data. Construct a graph that displays the average annual GDP per capita growth across all countries.
• Next, create a factor variable income_group to categorize all countries into three groups: low (with GDP per capita less than 1,000), medium (GDP per capita between 1,000 and 10,000), and high (GDP per capita more than 10,000). Construct a graph consisting of three panels, with each panel showing the average annual GDP growth for each income group.
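The growth calculation and the three-panel graph can be sketched in base R on toy data. The layout is an assumption: the sketch supposes the data have already been reshaped to one row per country and year, with hypothetical columns country, year, and gdp_pc. It also classifies each country-year by that year's GDP per capita, so a country may change groups over time; classifying by a fixed reference year is an equally plausible reading.

```r
# Toy panel in long format; column names are hypothetical.
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  year    = rep(2000:2003, times = 3),
  gdp_pc  = c(500, 520, NA, 560,
              2000, 2100, 2205, 2300,
              20000, 20400, 20808, 21000)
)
toy <- toy[order(toy$country, toy$year), ]

# Previous row's value within each country
lag1 <- function(x) c(NA, head(x, -1))
toy$gdp_lag  <- ave(toy$gdp_pc, toy$country, FUN = lag1)
toy$year_lag <- ave(toy$year, toy$country, FUN = lag1)

# Growth in percent, only when the previous row is exactly one year earlier
toy$growth <- ifelse(toy$year - toy$year_lag == 1,
                     100 * (toy$gdp_pc / toy$gdp_lag - 1), NA)

# Average growth across countries by year (rows with NA growth are dropped)
by_year <- aggregate(growth ~ year, data = toy, FUN = mean)
plot(by_year$year, by_year$growth, type = "b",
     xlab = "Year", ylab = "Average GDP per capita growth (%)")

# Income groups: low < 1,000; medium 1,000 to 10,000; high > 10,000
toy$income_group <- factor(
  ifelse(toy$gdp_pc < 1000, "low",
         ifelse(toy$gdp_pc <= 10000, "medium", "high")),
  levels = c("low", "medium", "high")
)

# One panel per income group
by_group <- aggregate(growth ~ year + income_group, data = toy, FUN = mean)
op <- par(mfrow = c(1, 3))
for (g in levels(toy$income_group)) {
  d <- by_group[by_group$income_group == g, ]
  plot(d$year, d$growth, type = "b", main = g,
       xlab = "Year", ylab = "Average growth (%)")
}
par(op)
```

Requiring consecutive years before computing growth keeps gaps in the series from producing multi-year changes that look like annual growth rates.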