STAT GR5293-004: Design and Analysis of Online Experiments Homework 3 Fall 2023

Due Nov 13 midnight (ET)

In this exercise, you will implement a simulated AA test for metric diagnostics and explore different variance estimation approaches. You need to submit both the report and your code.

The dataset contains simulated ad-click event logs. Each row represents a unique user and records that user's number of clicks and number of impressions. Most of the time you would have access to impression-level data, but in this case the data have already been aggregated to the user level for you. Suppose you are interested in understanding the behavior of click-through rate (CTR), defined as clicks/impressions. One way to check the behavior of this metric is to run a simulated AA test.
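A simulated AA test can be sketched as a loop: randomly split the users into two groups that receive identical treatment, test the CTR difference, and collect the p-values. The sketch below is a minimal illustration under stated assumptions, not the course's reference solution. The function names `simulate_aa` and `bernoulli_ctr` are hypothetical, the comparison is assumed to be a two-sided z-test with a normal approximation, the defaults (50/50 split, 1,000 repetitions) follow Problem 1's setup, and the data are synthetic, not the homework dataset.

```python
import math

import numpy as np


def bernoulli_ctr(clicks, imps):
    """Hypothetical estimator: single-average CTR with the Bernoulli
    variance p(1-p)/N, where N is the group's total number of impressions."""
    n = imps.sum()
    p = clicks.sum() / n
    return p, p * (1 - p) / n


def simulate_aa(clicks, imps, estimator, n_sims=1000, seed=0):
    """Run n_sims simulated AA tests on user-level data.

    Each run shuffles the users, puts the first half in group A and the rest
    in group B (a 50/50 split), and compares the groups with a two-sided
    z-test. estimator(clicks, imps) must return (estimate, variance) for one
    group. Returns the array of p-values.
    """
    clicks = np.asarray(clicks)
    imps = np.asarray(imps)
    rng = np.random.default_rng(seed)
    n_users = len(clicks)
    pvals = np.empty(n_sims)
    for i in range(n_sims):
        perm = rng.permutation(n_users)
        a, b = perm[: n_users // 2], perm[n_users // 2 :]
        est_a, var_a = estimator(clicks[a], imps[a])
        est_b, var_b = estimator(clicks[b], imps[b])
        z = (est_a - est_b) / math.sqrt(var_a + var_b)
        # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
        pvals[i] = math.erfc(abs(z) / math.sqrt(2))
    return pvals


# Hypothetical synthetic data (NOT the homework dataset): each user has their
# own click propensity, so clicks from the same user are correlated.
rng = np.random.default_rng(1)
n_users = 2000
imps = rng.integers(1, 50, size=n_users)
user_p = rng.beta(2, 30, size=n_users)
clicks = rng.binomial(imps, user_p)

pvals = simulate_aa(clicks, imps, bernoulli_ctr, n_sims=200)
print("share of p-values below 0.05:", np.mean(pvals < 0.05))
```

Passing the estimator in as a function keeps the split-and-test loop fixed while the variance approach changes, which is what the comparisons in Problems 1 and 2 require.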

Problem 1: Single Average CTR (use total number of clicks divided by total number of impressions)

Your goal is to implement a simulated AA test to assess the behavior of this metric (recall slide 15 in Lecture 5). Assume a 50/50 split for the simulated AA test and repeat the simulation 1,000 times. To conduct the simulated AA test, you need to estimate the variance of CTR; there are three variance estimation approaches you may consider:

Variance estimation approach 1:

The traditional approach: calculate the sample variance assuming each observation is independent.

For example, ad click events are assumed to be independent.
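Treating each impression as an independent observation, with clicks approximated as Bernoulli trials as in question 1.1, the variance of a group's CTR and the resulting test statistic can be written as below. This is a sketch that assumes a two-sample z-test with a normal approximation (the assignment does not fix the test); $C_g$ and $N_g$ are the total clicks and total impressions in group $g$, and $\Phi$ is the standard normal CDF.

```latex
\begin{aligned}
\hat p_g &= \frac{C_g}{N_g}, \qquad
\widehat{\operatorname{Var}}(\hat p_g) = \frac{\hat p_g\,(1-\hat p_g)}{N_g}, \qquad g \in \{A, B\},\\
z &= \frac{\hat p_A - \hat p_B}{\sqrt{\widehat{\operatorname{Var}}(\hat p_A) + \widehat{\operatorname{Var}}(\hat p_B)}}, \qquad
p\text{-value} = 2\bigl(1 - \Phi(|z|)\bigr).
\end{aligned}
```

Because $N_g$ counts impressions rather than users, this variance ignores any correlation among clicks from the same user.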

Variance estimation approach 2:

Implement the delta method to account for the possible correlation of click events from the same user (recall slide 39 in Lecture 3).
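A minimal delta-method sketch for the single-average CTR, treating users (not impressions) as the independent units. It uses the standard first-order Taylor approximation for a ratio of two means; the slide may present the same formula in different notation, and the function name `delta_method_ctr` is hypothetical.

```python
import numpy as np


def delta_method_ctr(clicks, imps):
    """Single-average CTR = sum(clicks) / sum(imps) with a delta-method variance.

    Users are the independent units, so correlation among one user's clicks
    is carried by the per-user totals. With x = clicks per user,
    y = impressions per user and n users:

        Var(mean(x) / mean(y)) ~= (1/n) * ( s_xx / mu_y**2
                                          - 2 * mu_x * s_xy / mu_y**3
                                          + mu_x**2 * s_yy / mu_y**4 )
    """
    x = np.asarray(clicks, dtype=float)
    y = np.asarray(imps, dtype=float)
    n = len(x)
    mu_x, mu_y = x.mean(), y.mean()
    cov = np.cov(x, y, ddof=1)  # 2x2 sample covariance matrix of (x, y)
    s_xx, s_xy, s_yy = cov[0, 0], cov[0, 1], cov[1, 1]
    ratio = mu_x / mu_y  # mathematically equal to sum(x) / sum(y)
    var = (s_xx / mu_y**2
           - 2 * mu_x * s_xy / mu_y**3
           + mu_x**2 * s_yy / mu_y**4) / n
    return ratio, var


# Sanity check: if every user has exactly one impression, the formula reduces
# to p(1-p)/(n-1), almost the Bernoulli variance p(1-p)/N.
clicks = np.array([1, 0, 0, 1, 0])
imps = np.ones(5, dtype=int)
ratio, var = delta_method_ctr(clicks, imps)
print(ratio, var)  # ratio = 0.4, var = 0.24 / 4 = 0.06 (up to rounding)
```

The sanity check shows why the two approaches diverge only when users have many impressions: with one impression per user, users and impressions coincide and there is no within-user correlation to capture.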

Variance estimation approach 3:

Implement the user-level bootstrap (recall slide 40 in Lecture 3)
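A minimal user-level bootstrap sketch: resample whole users with replacement so each user's clicks and impressions stay together, then take the variance of the recomputed metric. The names `user_bootstrap_var` and `single_avg_ctr` are hypothetical, the demo data are synthetic, and the `metric` argument is an assumption added so the same routine can also be applied to the double-average CTR in Problem 2.

```python
import numpy as np


def single_avg_ctr(clicks, imps):
    """Single-average CTR: total clicks / total impressions."""
    return clicks.sum() / imps.sum()


def user_bootstrap_var(clicks, imps, metric=single_avg_ctr, n_boot=1000, seed=0):
    """User-level bootstrap variance of a CTR-style metric.

    Resamples users (rows) with replacement, keeping each user's clicks and
    impressions together, recomputes the metric on every resample, and returns
    (metric on the original data, sample variance of the bootstrap replicates).
    """
    clicks = np.asarray(clicks)
    imps = np.asarray(imps)
    rng = np.random.default_rng(seed)
    n = len(clicks)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # n user indices, drawn with replacement
        reps[b] = metric(clicks[idx], imps[idx])
    return metric(clicks, imps), reps.var(ddof=1)


# Hypothetical data: 500 users with varying impressions and click propensities.
rng = np.random.default_rng(7)
imps = rng.integers(1, 40, size=500)
clicks = rng.binomial(imps, rng.beta(2, 30, size=500))
est, var = user_bootstrap_var(clicks, imps, n_boot=500)
print(f"CTR = {est:.4f}, bootstrap SE = {np.sqrt(var):.5f}")
```

Calling this inside each of the 1,000 AA simulations multiplies the cost by `n_boot` for both groups; a smaller `n_boot` per simulation is one possible compromise.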

1.1 Plot the distribution of p-values using variance estimation approach 1 and report what you observe. In this case, instead of implementing the traditional approach, you can further approximate each click event with a Bernoulli distribution, so that the total number of clicks follows a binomial distribution. You can then calculate the variance of CTR as p(1-p)/N, where p is your CTR and N is the total number of impressions.

1.2 Plot the distribution of p-values using variance estimation approach 2 and report what you observe.

1.3 Plot the distribution of p-values using variance estimation approach 3 and report what you observe.

1.4 Based on 1.1-1.3, which approaches pass the simulated AA test and which do not? Explain your reasoning.
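One way to judge whether an approach "passes" is to check that its AA p-values look Uniform(0, 1). The sketch below is an illustration only: it assumes two common diagnostics (the false-positive rate at 0.05 compared with its binomial noise band, and the Kolmogorov-Smirnov distance from the uniform CDF), and the function name `aa_pvalue_summary` is hypothetical.

```python
import numpy as np


def aa_pvalue_summary(pvals, alpha=0.05):
    """Summarize simulated AA p-values, which should look Uniform(0, 1)
    when the variance estimate is right.

    Returns (false-positive rate at alpha,
             approximate 95% range for that rate if p-values were uniform,
             Kolmogorov-Smirnov distance from Uniform(0, 1)).
    """
    p = np.sort(np.asarray(pvals, dtype=float))
    n = len(p)
    fpr = np.mean(p < alpha)
    # Binomial noise on the rejection count: alpha +/- 1.96 * sqrt(alpha(1-alpha)/n).
    half_width = 1.96 * np.sqrt(alpha * (1 - alpha) / n)
    band = (alpha - half_width, alpha + half_width)
    # KS statistic: largest gap between the empirical CDF and F(x) = x.
    i = np.arange(1, n + 1)
    ks = max(np.max(i / n - p), np.max(p - (i - 1) / n))
    return fpr, band, ks


# Uniform p-values, i.e. what a well-calibrated AA test should produce.
rng = np.random.default_rng(3)
fpr, band, ks = aa_pvalue_summary(rng.uniform(size=1000))
print(f"FPR = {fpr:.3f}, expected range ~ ({band[0]:.3f}, {band[1]:.3f}), KS = {ks:.3f}")
```

With 1,000 simulations the noise band for the false-positive rate is roughly 0.036 to 0.064, and the asymptotic 5% critical value of the KS distance is about 1.36/sqrt(1000), or roughly 0.043. A variance that is too small typically shows up as too many small p-values.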

Problem 2: Double Average CTR (calculate user level CTR first, and then take the average across all users)
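For the double-average CTR, the unit of analysis is the user, so one natural reading of approach 1 (an assumption, not stated in the assignment) is the ordinary sample variance of the per-user CTRs divided by the number of users. A minimal sketch with a hypothetical function name:

```python
import numpy as np


def double_avg_ctr(clicks, imps):
    """Double-average CTR and its approach-1 variance.

    Computes each user's CTR, then averages over users. Treating the n
    per-user CTRs as independent observations gives Var = s**2 / n,
    with s**2 their sample variance. Assumes every user has >= 1 impression.
    """
    user_ctr = np.asarray(clicks, dtype=float) / np.asarray(imps, dtype=float)
    n = len(user_ctr)
    return user_ctr.mean(), user_ctr.var(ddof=1) / n


# Three hypothetical users with per-user CTRs 0.5, 0.0 and 0.5.
est, var = double_avg_ctr([1, 0, 2], [2, 4, 4])
print(est, var)  # est = 1/3, var = (1/12) / 3 = 1/36 (up to rounding)
```

A user with zero impressions would cause a division by zero here, so such users would need to be filtered or handled explicitly.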

2.1 Plot the distribution of p-values using variance estimation approach 1 and report what you observe.

2.2 Plot the distribution of p-values using variance estimation approach 3 and report what you observe.

2.3 Based on 2.1-2.2, which approach passes the simulated AA test and which does not? Explain your reasoning.

2.4 Comment on the difference between single-average CTR and double-average CTR. Why is there a difference? What are the pros and cons of using each metric?
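A toy, hypothetical two-user example makes the definitional difference concrete: the single average weights every impression equally, while the double average weights every user equally.

```python
import numpy as np

# Two hypothetical users: a light user (1 impression, 1 click) and a heavy
# user (99 impressions, 1 click).
clicks = np.array([1, 1])
imps = np.array([1, 99])

single = clicks.sum() / imps.sum()  # 2 / 100 = 0.02: each impression weighs the same
double = np.mean(clicks / imps)     # (1 + 1/99) / 2 ~= 0.505: each user weighs the same
print(f"single-average CTR = {single:.3f}, double-average CTR = {double:.3f}")
```

Here the heavy user dominates the single average, while the light user's 100% CTR pulls the double average far above it.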
