CS7641 Assignment 4
Markov Decision Processes
Fall 2024
1 Assignment Weight
The assignment is worth 15% of the total points.
Read everything below carefully as this assignment has changed term-over-term.
2 Objective
In some sense, we have spent the semester thinking about machine learning techniques for various forms of function approximation. It’s now time to think about using what we’ve learned in order to allow an agent of some kind to act in the world more directly. This assignment asks you to consider the application of some of the techniques we’ve learned from reinforcement learning to make decisions.
The same ground rules apply for programming languages and libraries. You may program in any language that you wish insofar as you feel the need to program. As always, it is your responsibility to make sure that we can actually recreate your narrative, if necessary.
Please note, this class implements changes to the assignments term-over-term as we are calibrating the course incrementally. Please read through everything, even if you are submitting work from a previous semester as the requirements will likely have changed.
3 Procedure
3.1 The Problems Given to You
1. Come up with two interesting MDPs. Explain why they are interesting. They don't need to be overly complicated or directly grounded in a real situation, but it will be worthwhile if your MDPs are inspired by some process you are interested in or are familiar with. It's OK to keep them somewhat simple. For the purposes of this assignment, though, make sure one MDP has a "small" number of states and the other MDP has a "large" number of states. The judgment and rationalization of what counts as "small" and "large" are up to you. For initial intuition, 200 states is not considered "large". Additionally, neither of the MDPs you choose should be a grid-world problem.
2. Solve each MDP using value iteration as well as policy iteration. How many iterations does it take to converge? Which one converges faster? Why? How did you choose to define convergence? Do they converge to the same answer? How did the number of states affect things, if at all?
3. Now pick your favorite reinforcement learning algorithm and use it to solve the two MDPs. How does it perform, especially in comparison to the cases above where you knew the model, rewards, and so on?
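As a concrete (and purely illustrative) starting point for encoding an MDP, one common convention is to store the dynamics as a transition tensor and a reward table. The states, actions, and numbers below are hypothetical placeholders, not part of the assignment:

```python
import numpy as np

# A hypothetical 3-state, 2-action MDP. P[s, a, s2] is the probability of
# moving from state s to state s2 under action a; R[s, a] is the expected
# immediate reward. All values here are made up for illustration.
n_states, n_actions = 3, 2
P = np.zeros((n_states, n_actions, n_states))
P[0, 0] = [0.9, 0.1, 0.0]
P[0, 1] = [0.1, 0.8, 0.1]
P[1, 0] = [0.0, 0.9, 0.1]
P[1, 1] = [0.0, 0.1, 0.9]
P[2, 0] = [0.0, 0.0, 1.0]
P[2, 1] = [0.0, 0.0, 1.0]
R = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# Sanity check: each (state, action) row must be a valid distribution.
assert np.allclose(P.sum(axis=2), 1.0)
```

Storing the model this way makes the planning algorithms (value and policy iteration) straightforward to express as array operations, and makes it easy to scale the same code from your "small" MDP to your "large" one.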
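To make the convergence questions above concrete, here is a minimal value iteration sketch, assuming the tensor encoding described earlier (`P` of shape `(S, A, S)`, `R` of shape `(S, A)`). The function name, the sup-norm stopping rule, and the tolerance are all choices of this sketch, not requirements; part of the assignment is justifying your own convergence criterion:

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6, max_iter=10_000):
    """Repeat Bellman optimality backups until the value function stabilizes.

    P: (S, A, S) transition probabilities; R: (S, A) expected rewards.
    Returns (V, greedy_policy, iterations). One common convergence
    definition (used here): sup-norm distance between successive V's.
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)
    for it in range(1, max_iter + 1):
        Q = R + gamma * (P @ V)      # Q[s, a] = R[s, a] + gamma * E[V(s')]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1), it
        V = V_new
    return V, Q.argmax(axis=1), max_iter
```

Policy iteration follows the same skeleton but alternates full policy evaluation (solving a linear system for the current policy's value) with greedy policy improvement; comparing the iteration counts of the two is exactly what the question above asks you to analyze.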
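For the model-free part, tabular Q-learning is one reasonable choice (you are free to pick any RL algorithm). The sketch below assumes a hypothetical simulator `sample_step(s, a) -> (next_state, reward)`; the key contrast with value and policy iteration is that the agent never sees `P` or `R` directly, only sampled transitions. All hyperparameter values are placeholders you would tune:

```python
import numpy as np

def q_learning(sample_step, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1, horizon=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    sample_step(s, a) -> (next_state, reward) is a model-free simulator.
    Returns the learned Q-table; the greedy policy is Q.argmax(axis=1).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = int(rng.integers(n_states))      # random start state
        for _ in range(horizon):
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(Q[s].argmax())
            s2, r = sample_step(s, a)
            # Temporal-difference update toward the bootstrapped target.
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q
```

Comparing the policy recovered from `Q` against the planning solutions (and the samples needed to get there) is one way to structure the model-free vs. model-based discussion the question asks for.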
Extra Credit Opportunity:
The analysis writeup is limited to 8 pages. The page limit includes your citations. Anything past 8 pages will not be read. Please keep your analysis as concise as possible while still covering the requirements of the assignment. As a final check during your submission process, download the submission to double-check that everything looks correct on Canvas. Try not to wait until the last minute to submit, as you will only be tempting Murphy's Law.
In addition, your report must be written in LaTeX on Overleaf. You can create an account with your Georgia Tech email (e.g., [email protected]). When submitting your report, you are required to include a 'READ ONLY' link to the Overleaf project. If a link is not provided in the report or the Canvas submission comment, 5 points will be deducted from your score. Do not share the project directly with the Instructor or TAs via email. For a starting template, please use the IEEE Conference template.
3.2 Acceptable Libraries
4 Submission Details
The due date is indicated on the Canvas page for this assignment. Make sure you have set your timezone in Canvas to ensure the deadline is accurate. We are in the Eastern Time Zone for the course.
Late Due Date [20 point penalty per day]: Indicated as “Until” on Canvas.
You must submit:
- A file named README.txt containing instructions for running your code. We need to be able to get to your code and your data. Providing entire libraries isn’t necessary when a URL would suffice; however, you should at least provide any files you found necessary to change and enough support and explanation so we can reproduce your results on a standard Linux machine.
- A file named yourgtaccount-analysis.pdf containing your writeup (GT account is what you log in with, not your all-digits ID). This file should not exceed 8 pages.
- A 'READ ONLY' link to your Overleaf project and a link to the final commit of your source code in your personal repository on Georgia Tech's private GitHub. These can be included in your README.txt or in a comment on your Canvas submission.
- A description of your MDPs and why they are interesting.
- A discussion of your experiments.
It might be difficult to generate the same kinds of graphs for this part of the assignment as you did in previous assignments; however, you should come up with some clever way to describe the kinds of results you produce. If you can achieve this visually, all the better. However, a note of caution: figures should remain legible at 100% zoom. Do not squish figures together such that axis labels drop to 8pt font or less. We are looking for a clear and concise demonstration of knowledge and synthesis of results.
Any paper that solely has figures without formal writing will not be graded. Be methodical with your space. You may submit the assignment as many times as you wish up to the due date, but we will only consider your last submission for grading purposes.
5 Feedback Requests
Previously, this class had a different rescore policy, which usually resulted in the same grade or a lower one. Often there is a disconnect between what may be important in the analysis and what may have been missed. For this reason, we will not be conducting any rescore requests this term.
6 Plagiarism and Proper Citation
In this course, we care very much about your analysis. It must be original. Original here means two things: 1) the text of the written report must be your own and 2) the exploration that leads to your analysis must be your own. Plagiarism typically refers to the former explicitly, but in this case it also refers to the latter explicitly.
It is well known that for this course we do not care about code. We are not interested in your working out the edge cases in k-NN, or proving your skills with Python. While there is some value in implementing algorithms yourself in general, here we are interested in your grokking the practice of ML itself. That practice is about the interaction of algorithms with data. As such, the vast majority of what you're going to learn in order to master the empirical practice of ML flows from doing your own analysis of the data, hyperparameters, and so on; hence, you are allowed to steal ML code from libraries but are not allowed to steal code written explicitly for this course, particularly those parts of code that automate exploration. You will be tempted to just run said code, which has already been overfit to the specific datasets used by that code, and will therefore learn very little.
If you are referring to information you got from a third-party source or paraphrasing another author, you need to cite them right where you do so and provide a reference at the end of the document [Col]. Furthermore, "if you use an author's specific word or words, you must place those words within quotation marks and you must credit the source." [Wis]. It is good style to use quotations sparingly. Obviously, you cannot quote other people's assignments and assume that is acceptable. Speaking of acceptable, citing is not a get-out-of-jail-free card. You cannot copy text willy-nilly, cite it all, and then claim it's not plagiarism just because you cited it.
Your README file will include pointers to any code and libraries you used.
7 Version Control
References
Original assignment by Charles Isbell. Updated for Spring 2024 by John Mansfield and Theodore LaGrow. Modified for LaTeX by John Mansfield.