CS7641 Assignment 4 Markov Decision Processes


CS7641 Assignment 4
Markov Decision Processes
Fall 2024

1 Assignment Weight

The assignment is worth 15% of the total points.

Read everything below carefully, as this assignment has changed term-over-term.

2 Objective

In some sense, we have spent the semester thinking about machine learning techniques for various forms of function approximation. It’s now time to think about using what we’ve learned in order to allow an agent of some kind to act in the world more directly. This assignment asks you to consider the application of some of the techniques we’ve learned from reinforcement learning to make decisions.

The same ground rules apply for programming languages and libraries. You may program in any language that you wish insofar as you feel the need to program. As always, it is your responsibility to make sure that we can actually recreate your narrative, if necessary.

Please note, this class implements changes to the assignments term-over-term as we are calibrating the course incrementally. Please read through everything, even if you are submitting work from a previous semester, as the requirements will likely have changed.

3 Procedure

3.1 The Problems Given to You

You are being asked to explore Markov Decision Processes (MDPs):

1. Come up with two interesting MDPs and explain why they are interesting. They don’t need to be overly complicated or directly grounded in a real situation, but it will be worthwhile if your MDPs are inspired by some process you are interested in or are familiar with. It’s ok to keep them somewhat simple. For the purposes of this assignment, though, make sure one MDP has a “small” number of states and the other has a “large” number of states. The judgement and rationalization of what counts as “small” and “large” is up to you; for initial intuition, 200 states is not considered “large”. Additionally, neither of the MDPs you choose should be a grid world problem.

2. Solve each MDP using value iteration as well as policy iteration. How many iterations does it take to converge? Which one converges faster? Why? How did you choose to define convergence? Do they converge to the same answer? How did the number of states affect things, if at all?

3. Now pick your favorite reinforcement learning algorithm and use it to solve the two MDPs. How does it perform, especially in comparison to the cases above where you knew the model, rewards, and so on?

What exploration strategies did you choose? Did some work better than others?
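The following is a minimal Python/NumPy sketch of the two planners from question 2, with one explicit convergence definition for each. It assumes a tabular MDP stored as dense arrays, P[a, s, s'] for transition probabilities and R[s, a] for expected rewards. These array conventions, the function names, and the 3-state machine-maintenance MDP at the bottom are illustrative assumptions, not the API of bettermdptools or BURLAP.

Value iteration stops when the largest change in V between sweeps falls below theta. This bounds the remaining error by the standard inequality ‖V_{k+1} − V*‖∞ ≤ γ/(1 − γ) · ‖V_{k+1} − V_k‖∞. Policy iteration stops when greedy improvement no longer changes any action.

```python
import numpy as np


def value_iteration(P, R, gamma=0.95, theta=1e-8, max_iters=100_000):
    """P has shape (A, S, S) with P[a, s, s'] = Pr(s' | s, a); R has shape (S, A)."""
    V = np.zeros(R.shape[0])
    for it in range(1, max_iters + 1):
        # One Bellman backup for every (s, a): R[s, a] + gamma * sum_s' P[a, s, s'] V[s']
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < theta:  # convergence: sup-norm change in V below theta
            break
    # Extract the greedy policy from the final value function
    policy = (R + gamma * np.einsum("ast,t->sa", P, V)).argmax(axis=1)
    return V, policy, it


def policy_iteration(P, R, gamma=0.95, max_iters=1_000):
    n_states = R.shape[0]
    idx = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    for it in range(1, max_iters + 1):
        # Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi
        P_pi = P[policy, idx, :]
        R_pi = R[idx, policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Greedy improvement; keep the old action unless another is strictly
        # better (by a small tolerance) so floating-point ties cannot cycle
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        best = Q.argmax(axis=1)
        improve = Q[idx, best] > Q[idx, policy] + 1e-10
        if not improve.any():  # convergence: policy is stable
            break
        policy = np.where(improve, best, policy)
    return V, policy, it


# Hypothetical 3-state machine-maintenance MDP (0=good, 1=worn, 2=broken;
# actions 0=run, 1=repair), used only to exercise the two solvers.
P = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],  # run
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],  # repair
])
R = np.array([[10.0, -5.0], [4.0, -5.0], [0.0, -5.0]])

V_vi, pi_vi, n_vi = value_iteration(P, R)
V_pi, pi_pi, n_pi = policy_iteration(P, R)
print(f"VI: {n_vi} iterations, PI: {n_pi} iterations")
print("same policy:", np.array_equal(pi_vi, pi_pi),
      "max |V_vi - V_pi|:", np.max(np.abs(V_vi - V_pi)))
```

On this toy problem, policy iteration needs only a couple of iterations while value iteration needs hundreds. Each policy-iteration step is more expensive, however, because it solves an S × S linear system. For a “large” MDP, the dense (A, S, S) transition array grows quadratically with the number of states, so a sparse representation may be needed.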

Extra Credit Opportunity:

There is an opportunity to earn 5 points of extra credit. Instead of only one reinforcement learning algorithm, you will need to implement Q-learning in addition to one of the following: SARSA, DQN, or any ablation from the 2017 Rainbow DQN study (Link to study). You will need to justify your choice of comparison. Briefly describe whatever it is that you do use. This is not mandatory and may require more time than allotted, especially since many of these methods have longer training times.
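The core difference between Q-learning and SARSA is the bootstrap target, and question 3 also asks about exploration strategies. The Python/NumPy sketch below runs either algorithm on one long simulated trajectory. It uses linearly decaying epsilon-greedy exploration and a per-pair step size alpha = n(s, a)^-0.6.

The 3-state maintenance MDP, the decay schedule, the step-size exponent, and all function names are illustrative assumptions, not recommended settings. The learner only sees transitions sampled by step(), which stands in for the case where the model is unknown.

```python
import numpy as np

# Hypothetical 3-state machine-maintenance MDP (0=good, 1=worn, 2=broken;
# actions 0=run, 1=repair). The learner never reads P or R directly; it only
# observes transitions sampled by step().
P = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],  # run
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],  # repair
])
R = np.array([[10.0, -5.0], [4.0, -5.0], [0.0, -5.0]])


def step(rng, s, a):
    """Sample (next_state, reward) from the hidden model."""
    return int(rng.choice(P.shape[1], p=P[a, s])), R[s, a]


def epsilon_greedy(rng, Q, s, eps):
    """Random action with probability eps, otherwise the greedy action."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(Q[s].argmax())


def td_control(method="q_learning", n_steps=100_000, gamma=0.95,
               eps_start=1.0, eps_end=0.05, decay_frac=0.4, seed=0):
    """Tabular Q-learning (off-policy) or SARSA (on-policy) on one long run."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    decay_steps = decay_frac * n_steps
    s = int(rng.integers(n_states))
    a = epsilon_greedy(rng, Q, s, eps_start)
    for t in range(n_steps):
        # Linear epsilon decay over the first decay_frac of steps, then held at eps_end
        eps = max(eps_end, eps_start - (eps_start - eps_end) * t / decay_steps)
        s_next, r = step(rng, s, a)
        a_next = epsilon_greedy(rng, Q, s_next, eps)
        if method == "sarsa":
            target = r + gamma * Q[s_next, a_next]  # action actually taken next
        else:
            target = r + gamma * Q[s_next].max()    # greedy action in s_next
        visits[s, a] += 1
        alpha = visits[s, a] ** -0.6                # per-pair decaying step size
        Q[s, a] += alpha * (target - Q[s, a])
        s, a = s_next, a_next
    return Q
```

One way to compare the model-free learners with the planners is to check td_control("q_learning").argmax(axis=1) and td_control("sarsa").argmax(axis=1) against the value-iteration policy on the same model. Q-learning estimates the optimal action values regardless of how it explores. SARSA estimates the values of the epsilon-greedy policy it is actually following, so the two can disagree when exploration is costly.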

The analysis write-up is limited to 8 pages, and the page limit includes your citations. Anything past 8 pages will not be read. Please keep your analysis as concise as possible while still covering the requirements of the assignment. As a final check during your submission process, download the submission from Canvas to double-check that everything looks correct. Try not to wait until the last minute to submit, as you will only be tempting Murphy’s Law.

In addition, your report must be written in LaTeX on Overleaf. You can create an account with your Georgia Tech email address. When submitting your report, you are required to include a ‘READ ONLY’ link to the Overleaf project. If a link is not provided in the report or in the Canvas submission comment, 5 points will be deducted from your score. Do not share the project directly with the Instructor or TAs via email. For a starting template, please use the IEEE Conference template.

3.2 Acceptable Libraries

The algorithms used in this assignment are relatively easy to implement, and existing implementations are easy to find too. Below are Java and Python examples.
• bettermdptools (python) https://github.com/jlm429/bettermdptools
• BURLAP (java) http://burlap.cs.brown.edu/

4 Submission Details

The due date is indicated on the Canvas page for this assignment. Make sure you have set your timezone in Canvas to ensure the deadline is accurate. We are in the Eastern Time Zone for the course.

Due Date: Indicated as “Due” on Canvas

Late Due Date [20 point penalty per day]: Indicated as “Until” on Canvas.

You must submit:

  • A file named README.txt containing instructions for running your code. We need to be able to get to your code and your data. Providing entire libraries isn’t necessary when a URL would suffice; however, you should at least provide any files you found necessary to change and enough support and explanation so we can reproduce your results on a standard Linux machine.
  • A file named yourgtaccount-analysis.pdf containing your writeup (GT account is what you log in with, not your all-digits ID). This file should not exceed 8 pages.
  • A ‘READ ONLY’ link to your Overleaf project and the final commit of your source code in your personal repository on Georgia Tech’s private GitHub. These can be included in your README.txt or added as a comment on your Canvas submission.
The file yourgtaccount-analysis.pdf should contain:
  • A description of your MDPs and why they are interesting.
  • A discussion of your experiments.

It might be difficult to generate the same kinds of graphs for this part of the assignment as you did in previous assignments; however, you should come up with some clever way to describe the kinds of results you produce. If you can achieve this visually, all the better. A note of caution, however: figures should remain legible at 100% zoom. Do not squish figures together in specific sections to the point where axis labels become 8pt font or smaller. We are looking for a clear and concise demonstration of knowledge and synthesis of results.

Any paper that solely has figures without formal writing will not be graded. Be methodical with your space. You may submit the assignment as many times as you wish up to the due date, but we will only consider your last submission for grading purposes.

5 Feedback Requests

When your assignment is scored, you will receive feedback explaining your errors and successes in some level of detail. This feedback is for your benefit, both on this assignment and for future assignments. It is considered a part of your learning goal to internalize this feedback. We strive to give meaningful feedback with a human interaction at scale, and we have a multitude of mechanisms behind the scenes to ensure grading consistency with meaningful feedback. This can be difficult, however, and feedback is sometimes not as clear as you need. If you are confused by a piece of feedback, please start a private thread on Ed and we will jump in to help clarify.

Previously, we have had a different rescore policy in this class which usually resulted in the same grade or lower. Many times there is a disconnect between what may be important or may have been missed in analysis. For this reason, we will not be conducting any rescore requests this term.

6 Plagiarism and Proper Citation

The easiest way to fail this class is to plagiarize. Using the analysis, code or graphs of others in this class is considered plagiarism. The assignments are designed to force you to immerse yourself in the empirical and engineering side of ML that one must master to be a viable practitioner and researcher. It is important that you understand why your algorithms work and how they are affected by your choices in data and hyperparameters. The phrase “as long as you participate in this journey of exploring, tuning, and analyzing” is key. We take this very seriously and you should too.

What is plagiarism?

If you copy any amount of text from other students, websites, or any other source without proper attribution, that is plagiarism. The most common form of plagiarism is copying definitions or explanations from Wikipedia or similar websites. We use an anti-cheat tool to find out which parts of the assignments are your own, and there is a near 100 percent chance we will find out if you copy or paraphrase text or plots from online articles, assignments of other students (even across sections and previous courses), or website repositories.

What does it mean to be original?

In this course, we care very much about your analysis. It must be original. Original here means two things: 1) the text of the written report must be your own and 2) the exploration that leads to your analysis must be your own. Plagiarism typically refers to the former explicitly, but in this case it also refers to the latter explicitly.

It is well known that for this course we do not care about code. We are not interested in your working out the edge cases in k-NN, or proving your skills with Python. While there is some value in implementing algorithms yourselves in general, here we are interested in your grokking the practice of ML itself. That practice is about the interaction of algorithms with data. As such, the vast majority of what you’re going to learn in order to master the empirical practice of ML flows from doing your own analysis of the data, hyperparameters, and so on. Hence, you are allowed to steal ML code from libraries but are not allowed to steal code written explicitly for this course, particularly those parts of code that automate exploration. You will be tempted to just run such code, which has already been overfit to the specific datasets it was written for, and you will therefore learn very little.

How to cite:

If you are referring to information you got from a third-party source or paraphrasing another author, you need to cite them right where you do so and provide a reference at the end of the document [Col]. Furthermore, “if you use an author’s specific word or words, you must place those words within quotation marks and you must credit the source.” [Wis]. It is good style to use quotations sparingly. Obviously, you cannot quote other people’s assignments and assume that is acceptable. Speaking of acceptable, citing is not a get-out-of-jail-free card. You cannot copy text willy-nilly, cite all of it, and then claim it’s not plagiarism just because you cited it.

Too many quotes of more than, say, two sentences will be considered plagiarism and a terminal lack of academic originality.

Your README file will include pointers to any code and libraries you used.

If we catch you...

We report all suspected cases of plagiarism to the Office of Student Integrity. Students who are under investigation are not allowed to drop from the course in question, and the consequences can be severe, ranging from a lowered grade to expulsion from the program.

7 Version Control

• 10/14/2024 - TJL final updates for Fall 2024 and posting to class.

References

[Col] Williams College. Citing Your Sources: Citing Basics. url: https://libguides.williams.edu/citing.
[Wis] University of Wisconsin - Madison. Quoting and Paraphrasing. url: https://writing.wisc.edu/handbook/assignments/quotingsources.

Original assignment by Charles Isbell. Updated for Spring 2024 by John Mansfield and Theodore LaGrow. Modified for LaTeX by John Mansfield.
