COMSC-205 FALL 2024 ASSIGNMENT 4


Due November 18 at 11:55pm

Learning Goals

● To get practice using a stack and a queue to solve a problem
● To learn a little about HTML
● To practice code-a-little / test-a-little approach to software development

Submission Checklist

Your final submission should have the following files:
● HTMLParser.java - Reads through an HTML file, reports any errors with mismatched opening and closing tags
● A README.md file that clearly states:
○ Your name and the name of the assignment
○ Attribution for any sources used
○ Any notes on questions, difficulties, or bugs
○ If you did the challenge
● An ImpactsReflection.pdf file which contains your answers to the impacts reflection.
● You do not need to submit the HTML files.

Background

LITS wants you to work on their web pages. They have someone else write the pages in HTML, but they want to make sure that the HTML uses tags correctly before those pages go live on the website.

What is HTML?

A markup language is used to annotate a text document with tags that describe how to format the document. Specifically, HyperText Markup Language (HTML) is the markup language used to specify formatting for web pages and is interpreted by Internet browsers to create the display.

HTML uses tags to specify the display properties for portions of the text. For example, text can be italicized using:

<em>Hello, world!</em>

The browser would display this text as:
Hello, world!

Tags are specified by using an opening tag <> and a matching closing tag </>. Between the <> in the opening tag, a command will specify what that tag means. The closing tag will have the same command following the /.

You may have encountered HTML when posting text on online forums. Many will have an option to create posts using HTML (and to show what your existing post looks like in HTML). For example, in Moodle, if you click the arrow for “Show/hide advanced buttons”, then the </> “HTML” button, you can see the HTML of your post. (This can be useful if your new paragraphs don’t look like they should; <br> makes a new line, while <p> makes a new line with extra vertical space!) Comparing the simple editor to the HTML editor, you can see how different tags change the text.

<p> creates a new paragraph with extra vertical space, while <br> creates a new line. <strong> makes text bold. <em> makes text italicized. <del> draws a line through text; <u> draws a line under text. <sup> makes a superscript (letters are small and up), while <sub> makes a subscript (letters are small and down). If you want to look more in depth, you can play with it yourself in a Moodle post.

HTML Tag Rules
Generally speaking, HTML tags follow these rules:
● Every opening tag needs a closing tag that matches (for example, every <em> needs a </em>)
● Some opening tags include optional attributes. For example, consider <p> versus <p dir="ltr" style="text-align: left;">. The latter contains optional attributes after the 'p'. Closing tags never contain attributes. The part before the first space is what needs to match; both examples above would match with the closing tag </p>.
● Some tags are empty tags (like <br>). They don’t have a closing tag. <br> is simply inserting a line break at a specific point in the text, not altering how some text is displayed.
● Opening and closing tags must be properly nested (for example, <strong><em></em></strong> is correct, but <strong><em></strong></em> is not)
● Capitalization does not matter (for example, <B></b> is correct, as are <b></B>, <B></B>, and <b></b>)

For this assignment, you must use our implementations of the MHCStackLL and MHCQueueLL classes that you did in the lab, rather than the built-in Stack and Queue classes. You can either use the ones that you created or the ones included in the starter code.
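
The attribute and capitalization rules above suggest reducing every tag to a bare, lowercase name before comparing. The following is a minimal sketch of one way to do that, not the required design; the class name TagNames and the methods nameOf and isClosing are hypothetical names chosen for illustration, and the sketch assumes the input is a complete tag that starts with < and ends with >.

```java
class TagNames {
    /**
     * Returns the lowercase command name of a raw tag such as
     * "<P dir=\"ltr\">" or "</EM>", without brackets, slash, or attributes.
     * Assumes rawTag starts with '<' and ends with '>'.
     */
    static String nameOf(String rawTag) {
        // Drop the surrounding '<' and '>'.
        String inner = rawTag.substring(1, rawTag.length() - 1).trim();
        // Drop the leading '/' of a closing tag.
        if (inner.startsWith("/")) {
            inner = inner.substring(1).trim();
        }
        // Keep only the part before the first space (attributes come after it).
        int space = inner.indexOf(' ');
        if (space != -1) {
            inner = inner.substring(0, space);
        }
        // Capitalization does not matter, so compare in lowercase.
        return inner.toLowerCase();
    }

    /** True when the raw tag is a closing tag such as "</p>". */
    static boolean isClosing(String rawTag) {
        return rawTag.startsWith("</");
    }

    public static void main(String[] args) {
        System.out.println(nameOf("<p dir=\"ltr\" style=\"text-align: left;\">")); // p
        System.out.println(nameOf("</P>"));                                        // p
        System.out.println(isClosing("</em>"));                                    // true
    }
}
```

With names normalized this way, <p dir="ltr" style="text-align: left;"> and </P> both reduce to p, so a plain equals() comparison is enough to decide whether they match.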

The Assignment

It can be tough to keep track of tags when manually writing them; Moodle has a built-in tag checker to make sure they match. This assignment asks you to create your own tag-checker.

How can a stack help with this?

Recall that a stack is a LIFO (Last-In-First-Out) data structure. We pop values in the opposite order that we push them, like a stack of plates. The last plate we put on the stack is the first plate that we remove from the stack.

To know if the tags used in HTML match, we want to check that the most recently opened tag is the one whose closing tag we see next, as noted above. Consider these strings:
● <em>Hello!</em> - When we see the open tag <em>, we can push that onto a stack. When we see </em>, we can compare that with the top of the stack to make sure that the last open tag we saw was <em>.
● <strong><em>Hello!</em></strong> - Push <strong> since it is an open tag. Then push <em> since it is an open tag. When we see </em>, it matches and we pop <em> from the stack. That leaves <strong> as the new top of the stack. When we see </strong> it matches, and we pop.
● <strong>Hello!</em> - Push <strong> since it is an open tag. When we see </em>, we compare it to the top of the stack. They do not match, so this is an error.
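
The three walkthroughs above can be sketched in code. This is only an illustration under stated assumptions: it uses java.util.ArrayDeque as a stand-in stack because the method names of MHCStackLL are not shown in this handout (your submission must use MHCStackLL), it assumes simple tags without attributes or mixed case, and the names StackMatchSketch and tagsMatch are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class StackMatchSketch {
    /** Returns true if every closing tag matches the most recent unclosed opening tag. */
    static boolean tagsMatch(List<String> tags) {
        // Stand-in for MHCStackLL; swap in the course class in real code.
        Deque<String> open = new ArrayDeque<>();
        for (String tag : tags) {
            if (tag.startsWith("</")) {
                // "</em>" -> "em"
                String name = tag.substring(2, tag.length() - 1);
                // Error if nothing is open, or if the top of the stack is a different tag.
                if (open.isEmpty() || !open.pop().equals(name)) {
                    return false;
                }
            } else {
                // "<em>" -> "em"; remember it until its closing tag shows up.
                open.push(tag.substring(1, tag.length() - 1));
            }
        }
        // Anything still on the stack was opened but never closed.
        return open.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(tagsMatch(List.of("<em>", "</em>")));                          // true
        System.out.println(tagsMatch(List.of("<strong>", "<em>", "</em>", "</strong>"))); // true
        System.out.println(tagsMatch(List.of("<strong>", "</em>")));                      // false
    }
}
```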

How can a queue help with this?

The first thing you will want to do is find all of the open and close tags in the HTML. A queue is a FIFO (First-In-First-Out) data structure and is good for keeping things in order. So, as you read the contents of a file, every time you find a tag (both open and close tags), you can enqueue it.
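
As a rough sketch of that idea, the code below scans one line for <...> spans with indexOf and adds each one to a queue. It uses java.util.ArrayDeque through the Queue interface as a stand-in, since the method names of MHCQueueLL are not shown in this handout (your submission must use MHCQueueLL); the names TagScanSketch and enqueueTags are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class TagScanSketch {
    /** Adds every "<...>" span found in the line to the queue, in left-to-right order. */
    static void enqueueTags(String line, Queue<String> tags) {
        int start = line.indexOf('<');
        while (start != -1) {
            int end = line.indexOf('>', start);
            if (end == -1) {
                break; // no closing '>' on this line, so there is nothing more to collect
            }
            tags.add(line.substring(start, end + 1));
            // Continue searching after the tag we just collected.
            start = line.indexOf('<', end + 1);
        }
    }

    public static void main(String[] args) {
        // Stand-in for MHCQueueLL; swap in the course class in real code.
        Queue<String> tags = new ArrayDeque<>();
        enqueueTags("<strong><em>Hello!</em></strong>", tags);
        enqueueTags("plain text, then a break<br>", tags);
        // remove() dequeues from the front, so tags come out in file order.
        while (!tags.isEmpty()) {
            System.out.println(tags.remove());
        }
    }
}
```

Calling enqueueTags once per line with the same queue keeps the tags in the order they appear in the file, even when an opening tag and its closing tag are on different lines.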

After you are done reading the file, you can dequeue the tags one at a time, using the stack as described above to check if they match.

HTML Tag-Checker

Starter code is provided that opens a file, and steps through each line of that file. The desired end result is for the program to print out a message stating whether or not the tags in the file are correctly matched. Examples of expected output are provided at the end of this document.

To build the HTML tag-checker, you need to add the following:

1. Each input file contains HTML tags and regular text. The first step is to go through the file and add all of the tags, both opening tags and closing tags, to a queue in the order that they appear in the file. Hint: the code to find tags will be similar to the code you wrote to find enzymes in the last programming assignment.
2. Once all tags from all lines are in the queue, the next step is to determine whether the tags are correctly matched. Use a stack as described above to do this.
3. When all tags in the file have been checked, the program should check if there are any remaining opening tags (that did not have a corresponding closing tag). If there are, the tag set isn’t valid; print out an error message, and stop processing the file.
4. If all the tags in the file have been processed and no errors were found, print out a message saying the file is valid HTML.

When running the program, the user should be able to give a filename as a command line argument. Example usage in Terminal (after java files have been compiled):

% javac *.java
% java HTMLParser file1.html

When testing your code, feel free to hard-code the filename, but make sure to change it back to use args[0] before you submit! Forget how to do command line arguments in VSCode? Check Lab 4!
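
As a small illustration of reading the filename from the command line, here is a sketch that checks whether an argument was given before using args[0]. The provided starter code may already handle this differently; the class name ArgsSketch, the method filenameFrom, and the usage message are hypothetical.

```java
class ArgsSketch {
    /** Returns the filename from the command line, or null if none was given. */
    static String filenameFrom(String[] args) {
        return args.length > 0 ? args[0] : null;
    }

    public static void main(String[] args) {
        String filename = filenameFrom(args);
        if (filename == null) {
            // Running without an argument would otherwise fail on args[0].
            System.out.println("Usage: java HTMLParser <file.html>");
            return;
        }
        System.out.println("Checking " + filename);
    }
}
```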

You can assume the following things are true about your HTML file:
● Each tag is on one line (a tag will not have a new line inside of it)
● Opening tags and their corresponding closing tags may appear on different lines
● Tags should match even if their capitalization is different (<B> matches with </b> and </B>, <b> matches with </b> and </B>)
● The only empty tags that you need to handle are: br, hr, img. You can assume that all other tags come in open-close pairs.
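
Since only three empty tags need handling, one option is to keep them in a small set and check each tag name against it before touching the stack. This is a sketch of that single choice, not the required design; it assumes the name has already been reduced to lowercase without brackets or attributes, and the names EmptyTagSketch and isEmptyTag are hypothetical.

```java
import java.util.Set;

class EmptyTagSketch {
    // The only empty tags the assignment asks you to handle.
    private static final Set<String> EMPTY_TAGS = Set.of("br", "hr", "img");

    /** Expects a lowercase tag name without brackets or attributes, e.g. "br". */
    static boolean isEmptyTag(String name) {
        return EMPTY_TAGS.contains(name);
    }

    public static void main(String[] args) {
        System.out.println(isEmptyTag("br")); // true
        System.out.println(isEmptyTag("p"));  // false
    }
}
```

A tag for which isEmptyTag returns true could simply be skipped, so it is never pushed and never waits for a closing tag.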

Design Decisions for you to make:

● How should you handle empty tags? (Remember empty tags are tags without a closing tag, not tags missing any characters like <>)

● What value should you place in the stack and queue? Should it include the < > characters for the tag or not? Should it use the case (uppercase/lowercase) that is used in the file or always use the same case?

Code-a-little / test-a-little

Remember that you should not attempt to write this all at once. Write a small amount of code and then test it to make sure that it works. For this assignment, you can create some HTML code and run your code from the command line. Some ideas of how to start:

1. Write the code to find the tags and simply output them to the screen as you find them.

2. Write the code to push open tags and pop closing tags and check them. First, just test it on correct HTML.

3. After it works on correct HTML, try examples with different types of errors and update the code to handle those.

4. Go back to the earlier description of the rules that HTML tags follow in the HTML Tag Rules section and update your code to handle those variations of tags (tags with attributes, empty tags, mixed case tags).

Optional Challenge: Meaningful error messages

Right now, your code should just print out that there is or is not an error. What if we wanted more helpful error messages? Try changing your code so it prints what kind of error is found: a tag mismatch (when the opening and closing tags don’t match), extra closing tag (when a closing tag exists, but there are no opening tags to match it with), or extra opening tag (when an opening tag is never closed).
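
The three error kinds map onto three different stack situations, which the sketch below tries to make concrete. It is an illustration under assumptions: java.util.ArrayDeque stands in for MHCStackLL (whose method names are not shown in this handout), the input is a list of already-normalized tag names where a closing tag is marked with a leading "/", and the names ErrorKindSketch and check are hypothetical. The message wording is modeled loosely on the sample output in this handout.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class ErrorKindSketch {
    /** Takes tag names in file order; closing tags are marked with a leading "/". */
    static String check(List<String> names) {
        // Stand-in for MHCStackLL; swap in the course class in real code.
        Deque<String> open = new ArrayDeque<>();
        for (String name : names) {
            if (!name.startsWith("/")) {
                open.push(name);
                continue;
            }
            String closing = name.substring(1);
            // Closing tag while the stack is empty: nothing to match it with.
            if (open.isEmpty()) {
                return "Error found: extra closing tag " + closing + " found";
            }
            // Closing tag differs from the most recent opening tag.
            String expected = open.pop();
            if (!expected.equals(closing)) {
                return "Error found: tag mismatch between " + expected + " and " + closing;
            }
        }
        // Tags left on the stack at the end were never closed.
        if (!open.isEmpty()) {
            return "Error found: extra opening tag(s) " + String.join(", ", open) + " found";
        }
        return "No errors found in html file";
    }

    public static void main(String[] args) {
        System.out.println(check(List.of("b", "/i"))); // tag mismatch
        System.out.println(check(List.of("/b")));      // extra closing tag
        System.out.println(check(List.of("p")));       // extra opening tag
        System.out.println(check(List.of("p", "/p"))); // no errors
    }
}
```

When several opening tags are left over, this sketch lists them in whatever order the stand-in stack iterates, which may differ from the order shown in the sample output.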

Documentation and Organization

Comments are required for this and all future assignments. Make sure every method has a doc comment, and that your code is well-organized and uses good style.

Reflection on Impacts

You should submit a PDF containing 2-3 short paragraphs of reflection or separate responses to each question. Submit these on Moodle.

In your reflection, answer each of the following prompts:

● What was the most challenging part of this assignment for you? Do you feel that you understand this aspect of the assignment well now?

● One benefit of tag checkers like the one you built in this assignment is that they make it easier for web developers to find and fix problems with websites as they write them. Other tools, like Adobe’s Dreamweaver or Google Docs, make it even easier by letting the web developer work with the formatted text directly instead of typing in the HTML code. Thinking about the broader impacts, how do tag checkers or tools like Dreamweaver and Google Docs change who gets to develop content on the internet?

● Although tag checkers can help, they are not always correct. Under what circumstances does the tag checker you wrote produce false or unhelpful errors? Even if your code works on all the test files, can you think of something an HTML file could contain (using just the tags introduced in this assignment) that could cause it to fail?

Hints and Advice

For this assignment, a high-level design is not provided by default. If you are struggling with where to start, or with where to go next, reach out to a TA or an instructor, and they can give you some steps to help.

As we have seen in previous labs and assignments, there are built-in String methods that may be useful for your implementation. Some of them are listed below; see the official Java documentation for details:

● indexOf( )
● toLowerCase( )
● startsWith( )
● substring( )

File Reading

Your program will be run with one command line argument that provides the file name of the HTML file to test, as shown in the starter code. Example files are provided with the starter code; other files will also be tested.

Evaluation

Your Java program will be evaluated on:
● Correctness - Do the constructor and methods behave consistently with the description above?
● Stacks - Does the program use a stack correctly?
● Queues - Does the program use a queue correctly?
● Documentation - Are there comments present, according to the Documentation Rubric?
● Style - Does your code follow the guidelines outlined in the Style Rubric?

Expected output

For the test files available on Moodle, here is the expected output of each. You don’t have to follow the exact format below, but you should include the same information (error line and tag name for each error, or a message saying there were no errors).
HTML Parser Output - standard
file1.html:
No errors found in html file
file2.html:
No errors found in html file
file3.html:
Error found
file4.html:
Error found
file5.html:
Error found
HTML Parser Output - with bonus error messages
file1.html:
No errors found in html file
file2.html:
No errors found in html file
file3.html:
Error found: tag mismatch between e and em
file4.html:
Error found: extra closing tag b found
file5.html:
Error found: extra opening tag(s) a, p, sup, and textarea found
