CSCE-642 Reinforcement Learning
Assignments
Throughout the course you will implement various RL algorithms, namely Value Iteration, Asynchronous Value Iteration, Policy Iteration, Monte-Carlo Control, Monte-Carlo Control with Importance Sampling, Q-Learning, SARSA, Q-Learning with Approximation, Deep Q-Learning, REINFORCE, A2C, and DDPG. The implementation guidelines for each algorithm are given below.
All your implementations will be within the provided codebase. You may work on your implementations at any time (even ahead of the lectures) as long as you submit your solutions on time.
By submitting assignments, students acknowledge and agree that their submissions will be sent to a non-secured third-party server for plagiarism detection. You should not include any protected or private data in your submissions, as the processing of such data will occur on a non-secured server.
Set up
Instructions for setting up the assignment framework on your machine are provided here.
Tip: you might find it convenient to use PyCharm for easy debugging. Follow this guide for setting up and managing a virtual environment in PyCharm.
Executing
In order to execute the code, type python run.py along with the following arguments. If an argument is not provided, its default value is assumed.
- -s <solver> : Specify which RL algorithm to run. E.g., "-s vi" will execute the Value Iteration algorithm. Default=random control.
- -d <domain> : Chosen environment from OpenAI Gym. E.g., "-d Gridworld" will run the Gridworld domain. Default="Gridworld".
- -o <name> : The result output file name. Default="out.csv".
- -e <int> : Number of episodes for training. Default=500.
- -t <int> : Maximal number of steps per episode. Default=10,000.
- -l <[int,int,...,int]> : Structure of a deep neural net. E.g., "[10,15]" creates a net where the input layer is connected to a hidden layer of size 10, which is connected to a hidden layer of size 15, which is connected to the output. Default=[24,24].
- -a <alpha> : The learning rate (alpha). Default=0.5.
- -r <seed> : Seed integer for a random stream. Default=random value from [0, 9999999999].
- -g <gamma> : The discount factor (gamma). Default=1.00.
- -p <epsilon> : Initial epsilon for epsilon-greedy policies (can be set up to decay over time). Default=0.1.
- -P <epsilon> : The minimum value of epsilon at which decaying is terminated (can be zero). Default=0.1.
- -c <rate> : Epsilon decay rate (can be 1 = no decay). Default=0.99.
- -m <size> : Size of the replay memory. Default=500,000.
- -N <n> : Copy parameters from the trained approximator to the target approximator every n steps. Default=10,000.
- -b <size> : Size of the training mini batches. Default=32.
- --no-plots : Turn off the online plotting of the training curve.
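To make two of the options above more concrete, here is a minimal sketch of how a layer specification such as "[10,15]" and the three epsilon options could be interpreted. It is not taken from the provided codebase: the function names are hypothetical, and the schedule assumes epsilon is multiplied by the decay rate once per episode and clipped at the minimum, which may differ from what run.py actually does.

```python
from __future__ import annotations

import ast


def parse_layers(spec: str) -> list[int]:
    """Parse a layer specification such as "[10,15]" into hidden-layer sizes."""
    sizes = ast.literal_eval(spec)
    if not isinstance(sizes, list) or not all(
        isinstance(s, int) and s > 0 for s in sizes
    ):
        raise ValueError(f"expected a list of positive integers, got {spec!r}")
    return sizes


def weight_shapes(
    n_inputs: int, hidden: list[int], n_outputs: int
) -> list[tuple[int, int]]:
    """Return the (in, out) shape of each fully connected layer."""
    widths = [n_inputs, *hidden, n_outputs]
    return list(zip(widths[:-1], widths[1:]))


def epsilon_schedule(
    initial: float, minimum: float, decay: float, episodes: int
) -> list[float]:
    """Epsilon used in each episode.

    Assumption: multiplicative decay applied once per episode,
    never dropping below `minimum`.
    """
    values = []
    eps = initial
    for _ in range(episodes):
        values.append(eps)
        eps = max(minimum, eps * decay)
    return values


if __name__ == "__main__":
    # Hypothetical environment with 4 observations and 2 actions.
    print(weight_shapes(4, parse_layers("[10,15]"), 2))
    print(epsilon_schedule(initial=1.0, minimum=0.1, decay=0.5, episodes=6))
```

Under this assumed schedule, the default values (initial epsilon 0.1, minimum 0.1) keep epsilon constant at 0.1, so a larger initial value would be needed for the decay rate to have any effect.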
Autograder
To run the autograder, type python autograder.py <solver> . For instance, to run the autograder for Asynchronous Value Iteration, type python autograder.py avi . The autograder prints a total score out of 10 and reports any failed test cases.
Note: The autograder is still in development, so make sure to keep an eye on Campuswire for updates to the file.
Help
Questions, comments, and clarifications regarding the assignments should NOT be sent via email to the course staff. Please use the class discussion board on Campuswire instead.
Submission
Each algorithm that you are required to implement has a separate '.py' file associated with it. When submitting an implementation, upload only that (single) '.py' file, packaged as a zip file named <your UIN>.zip. For example, '123123123.zip'. Students auditing the class (not registered) are NOT allowed to upload a submission.
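If you prefer to build the archive from a script, the packaging step can be sketched as below. This is only an illustration: the helper name and the file name Value_Iteration.py are hypothetical, and placing the file at the root of the archive is an assumption, since the page does not specify the expected layout.

```python
import zipfile
from pathlib import Path


def package_submission(solution_file: str, uin: str, out_dir: str = ".") -> Path:
    """Write <uin>.zip containing only the given .py file; return the archive path."""
    source = Path(solution_file)
    if source.suffix != ".py":
        raise ValueError("a submission is a single .py file")
    archive = Path(out_dir) / f"{uin}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        # arcname drops the directory part, so the file sits at the archive root
        # (an assumption about the expected layout).
        zf.write(source, arcname=source.name)
    return archive


if __name__ == "__main__":
    # Hypothetical solver file name and the example UIN from the text above.
    print(package_submission("Value_Iteration.py", "123123123"))
```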