COS 217 ASSIGNMENT 1: A "DE-COMMENT" PROGRAM


Purpose

The purpose of this assignment is to help you learn or review (1) the fundamentals of the C programming language, (2) the details of the "de-commenting" task of the C preprocessor, and (3) how to use the Linux operating system and GNU software, especially bash and gcc217.

Students from past semesters reported taking, on average, 8 hours to complete this assignment.


Rules

This assignment is an individual assignment, not a partnered assignment.

Make sure you study the course Policies web page before doing this assignment or any of the COS 217 assignments. In particular, note that you may use a variety of "human" sources of information while doing assignments, including the course staff members, the Intro Lab TAs, and other current students via Ed.

Each assignment may have a challenge part. While doing the challenge part of an assignment, you are bound to observe the course policies regarding assignment conduct as given in the course Policies web page, plus one additional policy: you may not use any "human" sources of information. That is, you may not consult with the course's staff members, the Intro Lab TAs, other current students via Ed, or any other people while working on the challenge part of an assignment, except for clarification of requirements.

The challenge part tasks typically offer more than one viable path to success, so whereas the code modeled in lectures and precepts will get you the bulk of the way through each assignment, the challenge parts are your opportunity to venture outside that box and program creatively. There's also a practical component: these portions of assignments are your best way to demonstrate your mastery of the material. Some students have bristled at this idea, but it is important for the course staff to be able to observe that you are independently capable beyond cobbling together the results of serial office hours, Intro Lab TA hours, and Ed responses.

Fear not, however: you will have everything you need to complete the challenge, but you may need to think about arranging the code a bit differently or writing code that handles a more complicated job than what we have scaffolded for you in lecture demos or precept sample programs. In solving the challenge, you are permitted to use course material that has not (yet) been covered in readings, lectures, or precepts, but it is always possible to complete the challenge without reading ahead or outside the course materials.

Departing from the given materials does, though, make the challenge part more challenging, as its name implies. Thus, in each case, skipping the challenge offers a clean stopping point that you can use as a kind of relief valve if you find that an assignment is taking too much time.

For this assignment, avoiding the use of global variables (as described below) is the challenge part. That part is worth 2 percent of this assignment. So if you don't do the challenge part and all other parts of your assignment solution are perfect and submitted on time, then your grade for the assignment will be 98 percent. The weight of the challenge portion will vary by assignment.


Background

The C preprocessor is an important part of the C program build workflow. Given a C source code file, the C preprocessor performs three jobs:

  • Merge physical lines of source code into logical lines. That is, when the preprocessor detects a line that ends with the backslash character, it merges that physical line with the next physical line to form one logical line. More precisely, if the preprocessor detects a backslash character immediately followed by a newline character, then it simply removes both characters.

  • Remove comments from the source code (a.k.a. "de-comment").

  • Handle preprocessor directives (#define, #include, etc.) that reside in the source code.

The second of those jobs — the de-comment job — is more substantial than you might think. For example, when de-commenting a program the C preprocessor must be sensitive to:

  • The fact that a comment is a token delimiter. After removing a comment, the C preprocessor must make sure that a whitespace character is inserted in its place.

  • Line numbers. After removing a comment, the C preprocessor sometimes must insert blank lines in its place to preserve the original line numbering.

  • String and character literal boundaries. The preprocessor must not consider the character sequence /*...*/ to be a comment if it occurs inside a string literal ("...") or character literal ('...').
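
The first and third points above can be seen in ordinary C code. The sketch below is illustrative only; the function names are hypothetical and do not come from the assignment materials. In the first function a comment is the only thing separating two tokens, so if the comment were deleted outright instead of being replaced with a space, the declaration would collapse to intiSeven = 7; and fail to build. In the second function the characters /* and */ sit inside a string literal, so they are data, not a comment.

```c
#include <string.h>

/* Return 7. Inside the function body, a comment is the only thing
   between the tokens "int" and "iSeven". */
int sevenViaComment(void)
{
   int/* acts as whitespace */iSeven = 7;

   return iSeven;
}

/* Return the length of a string literal whose contents look like a
   comment. All seven characters between the quotes are kept. */
size_t lengthOfLiteralWithCommentText(void)
{
   return strlen("a/*b*/c");
}
```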


The Task

Your task is to compose a C program named decomment that performs a subset of the de-comment job of the C preprocessor, as defined below.


Functionality

Your program must be a Linux filter. A filter is a program that reads characters from the standard input stream, and writes a subset of those characters to the standard output stream. Specifically, your program must (1) read text (like the source code of a C program) from the standard input stream, (2) write that same text to the standard output stream with each C90 comment replaced with a space, as prescribed below. Further, your program must (3) write error and warning messages as appropriate to the standard error stream. A typical execution of your program from the shell might look like this:

./decomment <somefile.c >somefileWithoutComments.c 2>errorsAndWarnings

In the following examples a space character is shown as "s" and a newline character as "n".

Your program must replace each single-line comment with a space. Examples:

Standard Input Stream    Standard Output Stream    Standard Error Stream
abc/*def*/ghin           abcsghin
abc/*def*/sghin          abcssghin
abcs/*def*/ghin          abcssghin

Your program must define "comment" as in the C90 standard. In particular, your program must consider text of the form (/*...*/) to be a comment. It must not consider text of the form (//...) to be a comment. Example:

Standard Input Stream    Standard Output Stream    Standard Error Stream
abc//defn                abc//defn

Your program must allow a comment to span multiple lines. That is, your program must allow a comment to contain newline characters. Your program must replace each multi-line comment with a space, followed by newline characters as necessary to preserve the original line numbering. Examples:

Standard Input Stream         Standard Output Stream    Standard Error Stream
abc/*defnghi*/jklnmnon        abcsnjklnmnon
abc/*defnghinjkl*/mnonpqrn    abcsnnmnonpqrn

Your program must not recognize nested comments. Example:

Standard Input Stream       Standard Output Stream    Standard Error Stream
abc/*def/*ghi*/jkl*/mnon    abcsjkl*/mnon

Your program must handle C string literals. In particular, your program must not consider text of the form (/*...*/) that occurs within a string literal ("...") to be a comment. Examples:

Standard Input Stream     Standard Output Stream    Standard Error Stream
abc"def/*ghi*/jkl"mnon    abc"def/*ghi*/jkl"mnon
abc/*def"ghi"jkl*/mnon    abcsmnon
abc/*def"ghijkl*/mnon     abcsmnon

Similarly, your program must handle C character literals. In particular, your program must not consider text of the form (/*...*/) that occurs within a character literal ('...') to be a comment. Examples:

Standard Input Stream     Standard Output Stream    Standard Error Stream
abc'def/*ghi*/jkl'mnon    abc'def/*ghi*/jkl'mnon
abc/*def'ghi'jkl*/mnon    abcsmnon
abc/*def'ghijkl*/mnon     abcsmnon

Note that the C compiler would consider the first of those examples to be erroneous (multiple characters in a character literal). But many C preprocessors would not, and your program must not.

Your program must handle escaped characters within string literals. That is, when your program reads a backslash (\) while processing a string literal, your program must consider the next character to be an ordinary character that is devoid of any special meaning. In particular, your program must consider text of the form ("...\"...") to be a valid string literal which happens to contain the double quote character. Examples:

Standard Input Stream               Standard Output Stream              Standard Error Stream
abc"def\"ghi"jkln                   abc"def\"ghi"jkln
abc"def\'ghi"jkln                   abc"def\'ghi"jkln
abc"def\"ghi"jkl/*mno*/pqrn         abc"def\"ghi"jklspqrn
abc"def\\"ghi"jkl/*mno*/pqr"stun    abc"def\\"ghi"jkl/*mno*/pqr"stun

Similarly, your program must handle escaped characters within character literals. That is, when your program reads a backslash (\) while processing a character literal, your program must consider the next character to be an ordinary character that is devoid of any special meaning. In particular, your program must consider text of the form ('...\'...') to be a valid character literal which happens to contain the quote character. Examples:

Standard Input Stream               Standard Output Stream              Standard Error Stream
abc'def\'ghi'jkln                   abc'def\'ghi'jkln
abc'def\"ghi'jkln                   abc'def\"ghi'jkln
abc'def\'ghi'jkl/*mno*/pqrn         abc'def\'ghi'jklspqrn
abc'def\\'ghi'jkl/*mno*/pqr'stun    abc'def\\'ghi'jkl/*mno*/pqr'stun

Note that the C compiler would consider all of those examples to be erroneous (multiple characters in a character literal). But many C preprocessors would not, and your program must not.

Your program must handle newline characters in C string literals without generating errors or warnings. Examples:

Standard Input Stream             Standard Output Stream      Standard Error Stream
abc"defnghi"jkln                  abc"defnghi"jkln
abc"defnghinjkl"mno/*pqr*/stun    abc"defnghinjkl"mnosstun

Note that a C compiler would consider those examples to be erroneous (newline character in a string literal). But many C preprocessors would not, and your program must not.

Similarly, your program must handle newline characters in C character literals without generating errors or warnings. Examples:

Standard Input Stream             Standard Output Stream      Standard Error Stream
abc'defnghi'jkln                  abc'defnghi'jkln
abc'defnghinjkl'mno/*pqr*/stun    abc'defnghinjkl'mnosstun

Note that a C compiler would consider those examples to be erroneous (multiple characters in a character literal, newline character in a character literal). But many C preprocessors would not, and your program must not.

Your program must handle unterminated string and character literals without generating errors or warnings. Examples:

Standard Input Stream    Standard Output Stream    Standard Error Stream
abc"def/*ghi*/jkln       abc"def/*ghi*/jkln
abc'def/*ghi*/jkln       abc'def/*ghi*/jkln

Note that a C compiler would consider those examples to be erroneous (unterminated string literal, unterminated character literal, multiple characters in a character literal). But many C preprocessors would not, and your program must not.

Your program must detect an unterminated comment. If your program detects end-of-file before a comment is terminated, it must write this message to the standard error stream, where "X" is the number of the line on which the unterminated comment begins:

Error: line X: unterminated comment

Examples:

Standard Input Stream    Standard Output Stream    Standard Error Stream
abc/*defnghin            abcsnn                    Error:slines1:sunterminatedscommentn
abcndef/*ghinjkln        abcndefsnn                Error:slines2:sunterminatedscommentn
abc/*def/ghinjkln        abcsnn                    Error:slines1:sunterminatedscommentn
abc/*def*ghinjkln        abcsnn                    Error:slines1:sunterminatedscommentn
abc/*defnghi*n           abcsnn                    Error:slines1:sunterminatedscommentn
abc/*defnghi/n           abcsnn                    Error:slines1:sunterminatedscommentn

Your program (more precisely, its main function) must return EXIT_FAILURE if it is unsuccessful, that is, if it detects an unterminated comment and so is unable to remove comments properly. Otherwise it must return EXIT_SUCCESS or, equivalently, 0.
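
A minimal sketch of how the error message and the exit status can be tied together, assuming a hypothetical helper named reportUnterminated (the name and the FILE pointer parameter are illustrative choices, not requirements). The message format is the one prescribed above; EXIT_FAILURE comes from stdlib.h.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write the "unterminated comment" message for line iLine to psErr
   and return EXIT_FAILURE so the caller can use it as an exit
   status. (reportUnterminated is a hypothetical helper name.) */
int reportUnterminated(FILE *psErr, int iLine)
{
   assert(psErr != NULL);
   assert(iLine >= 1);

   fprintf(psErr, "Error: line %d: unterminated comment\n", iLine);
   return EXIT_FAILURE;
}
```

Under these assumptions, main could end with a statement such as return reportUnterminated(stderr, iStartLine); when the input ends inside a comment, and with return EXIT_SUCCESS; otherwise, where iStartLine is whatever variable holds the line on which the comment began.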

Your program must work for standard input lines of any length.

Your program may assume that logical lines are identical to physical lines in the standard input stream, i.e., your de-comment program need not perform the first of the three jobs described above in the "Background" section. The implication of this allowance is that your program may assume that the two-character sequence of the backslash character followed by the newline character does not occur in the standard input stream. Note, though, that this two-character sequence is not the same as the C character literal '\n', which is simply the (single) newline character.
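
The distinction can be checked with a small sketch (the names SPLICED_SUM, splicedSum, and newlineLiteralLength are hypothetical). The macro definition spans two physical lines joined by a backslash followed by a newline, which is the two-character sequence your program may assume is absent from its input. The escape sequence \n inside a literal, by contrast, denotes one newline character.

```c
#include <string.h>

/* The backslash-newline pair below is removed when physical lines
   are merged, so the macro body is one logical line. */
#define SPLICED_SUM (1 + \
                     2)

/* Return the value of SPLICED_SUM. */
int splicedSum(void)
{
   return SPLICED_SUM;
}

/* Return the length of the string literal "\n", which holds a single
   newline character (strlen does not count the terminating null
   character). */
size_t newlineLiteralLength(void)
{
   return strlen("\n");
}
```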


The Procedure

Create your program in the environment of your choice, staging the files to the armlab cluster to build, test, debug, and submit.


Step 1: Mirror/Import the COS 217 Decomment Repository and Clone Your New Repository

Note: you should complete this by attentively and assiduously following the precise instructions in Setup Step 5 from the Git and GitHub Primer document from the first precept.

Make and populate a repository for your work in this assignment. It should initially have the contents from our provided COS 217 assignment repository for this assignment (https://github.com/COS217/Decomment). These files include testing infrastructure (sampledecomment, testdecomment, testdecommentdiff), and many sample input files (those whose names end with .txt) to test your code. Step 5 will describe their utility in saving time and effort in testing.

Once you have your own repository, get one or more working copies of it: one on armlab (where you will build, test, and ultimately submit your code) and one on each machine where you will be editing your code locally (e.g., your laptop). If you will be developing directly on armlab, you only need one working copy.


Step 2: Design a DFA

Design a deterministic finite state automaton (DFA, alias FSM) that expresses the required de-commenting logic. The DFA concept is described in lectures, and in the Wikipedia Deterministic finite automaton page.

Design your DFA so it "accepts" a given sequence of characters if the sequence contains no unterminated comments. That is, when given a sequence of characters that does not contain an unterminated comment, your DFA must end in an "accepting" state. Conversely, design your DFA so it "rejects" a given sequence of characters if the sequence contains an unterminated comment. That is, when given a sequence of characters that contains an unterminated comment, your DFA must end in a "rejecting" state.

It is likely easiest to start off by using the traditional "labeled ovals and labeled arrows" notation. Let each oval represent a state. Give each state a meaningful descriptive name, and indicate whether it is an "accepting" state or a "rejecting" state. Let each arrow represent a transition from one state to another. Label each arrow with the single character, or class of characters, that causes the transition to occur.

Reminder: a DFA makes a transition on every character read, and has no "memory" — it can only know what state the previous reads resulted in, not what the characters read were.

Express as much of the de-commenting logic, in terms of state transitions, as you can within your DFA. The more complete your state transition logic is, the better your grade on the DFA will be. We also encourage (but do not require) you to express the action(s) that must occur (for example, "print the character") when the corresponding transition occurs.

To properly report unterminated comments, your program must contain logic to keep track of the current line number of the standard input stream. This is an action associated with a transition, and thus you are not required to express that logic in your DFA.

Convert your "labeled ovals and labeled arrows" DFA to a textual representation, placing the result in your project directory in a file named dfa. The document TextualDFAs and the example in nano.pdf contain examples. Make sure you indicate explicitly which state is the DFA's start state, and whether each state is an accepting state or a rejecting state. Each state should have a semantically meaningful name.

The file must be plain text and the name of the file must be dfa, not dfa.txt, not DFA, not Dfa.doc, etc.

Step 3: Create Source Code

Create source code in the working copy of your repository, specifically a file named decomment.c. The decomment.c program must implement your DFA.

If your DFA exits while in an "accepting" state, then your program's exit status must be EXIT_SUCCESS or 0. If your DFA exits while in a "rejecting" state, then your program's exit status must be EXIT_FAILURE. In other words, if your program detects no unterminated comments, then its exit status must be EXIT_SUCCESS or 0; if your program detects an unterminated comment, then its exit status must be EXIT_FAILURE.

Your program must not consist of one large main function. Instead your program must consist of multiple small functions, each of which performs a single well-defined task. In this program you must create one function to implement each state of your DFA, as described in lectures. Each function should have a function comment that specifies the behavior of the function, as specified in the Program Style section below.
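
As a sketch of the "one function per state" structure, here is a deliberately different toy DFA that only tracks whether it is inside a double-quoted region. It is not the decomment DFA: it ignores escapes, comments, and character literals, and all of its names are hypothetical. It runs over a string rather than the standard input stream so that it can be tried in isolation.

```c
#include <assert.h>
#include <stddef.h>

/* States of a toy DFA. OUTSIDE is the start state and the only
   accepting state; INSIDE is a rejecting state. */
enum QuoteState {OUTSIDE, INSIDE};

/* Handle character iChar while in state OUTSIDE. Return the next
   state. */
enum QuoteState handleOutside(int iChar)
{
   return (iChar == '"') ? INSIDE : OUTSIDE;
}

/* Handle character iChar while in state INSIDE. Return the next
   state. */
enum QuoteState handleInside(int iChar)
{
   return (iChar == '"') ? OUTSIDE : INSIDE;
}

/* Run the toy DFA over the string pcText. Return 1 if it ends in
   the accepting state (every quote is closed), or 0 otherwise. */
int acceptsText(const char *pcText)
{
   enum QuoteState eState = OUTSIDE;
   size_t i;

   assert(pcText != NULL);

   for (i = 0; pcText[i] != '\0'; i++)
   {
      /* Dispatch to the handler for the current state. */
      switch (eState)
      {
         case OUTSIDE:
            eState = handleOutside(pcText[i]);
            break;
         case INSIDE:
            eState = handleInside(pcText[i]);
            break;
      }
   }
   return eState == OUTSIDE;
}
```

In a filter, the dispatch loop would read characters from the standard input stream instead of a string, and each handler would also perform the output action attached to its transition.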

Generally, a (large) C program must consist of multiple source code files. For this assignment, though, do not split your source code into multiple files. Instead, place all source code in a single source code file. Subsequent assignments will ask you to compose programs consisting of multiple source code files.

We suggest that your program call the standard C getchar function to read characters from the standard input stream.
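
A minimal sketch of such a read loop, assuming a hypothetical helper named copyStream. It takes FILE pointers so that it can be exercised on any stream; getchar() is equivalent to getc(stdin), and putchar(c) is equivalent to putc(c, stdout). The detail worth noticing is that the character is stored in an int, not a char, so that EOF can be told apart from every valid character.

```c
#include <assert.h>
#include <stdio.h>

/* Copy every character from psIn to psOut until end-of-file. Return
   the number of characters copied. Reads from psIn and writes to
   psOut. (copyStream is a hypothetical name used for illustration.) */
long copyStream(FILE *psIn, FILE *psOut)
{
   int iChar;        /* int, not char, so EOF is representable */
   long lCount = 0;

   assert(psIn != NULL);
   assert(psOut != NULL);

   while ((iChar = getc(psIn)) != EOF)
   {
      putc(iChar, psOut);
      lCount++;
   }
   return lCount;
}
```

Calling copyStream(stdin, stdout) from main would give an identity filter, that is, a filter that writes exactly what it reads.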


Step 4: Build (Part 1)

Stage your source code to your project directory on armlab. Use the gcc217 command to build your program. At this point issue the "shortcut" gcc217 command to preprocess, compile, assemble, and link your program all at once.


Step 5: Execute

Execute your program multiple times on various input files that test all statements in your program.

As noted previously, we have provided files that you will find helpful:

  • sampledecomment is an executable version of a correct assignment solution. Your program must write exactly (character for character) the same data to the standard output stream and the standard error stream as sampledecomment does. Once you have built your program into the decomment executable, you could test your program on armlab using commands similar to these:

    ./sampledecomment <somefile.c > output1 2> errors1
    ./decomment <somefile.c > output2 2> errors2
    diff -y output1 output2
    diff -y errors1 errors2
    rm output1 errors1 output2 errors2

    The Linux diff program finds differences between two given files. The executions of the diff program shown above generate side-by-side listings of the two given files, and annotate the output to indicate differences. Feel free to ask questions in precept or on Ed if the annotations aren't clear. Suggestion: Widen your terminal window before issuing diff -y commands.

    If the output of the command diff -y output1 output2 shows differences between output1 and output2, then sampledecomment and your program have written different characters to the standard output stream. Similarly, if output of the command diff -y errors1 errors2 shows differences between errors1 and errors2, then sampledecomment and your program have written different characters to the standard error stream.

    Note that by default the output and error stream redirection will not clobber an existing file: trying to redirect to a file that already exists will fail and give the message "cannot overwrite existing file". This is a strong safety measure, but it can become annoying during repeated testing of the form shown above, because you would have to remove your output2 and errors2 files between every run. You can override this behavior, and thus avoid having to continually delete the produced files, by redirecting using >| instead of just > and 2>| instead of just 2>. Use caution when choosing this option, though, as clobbered files are not typically recoverable.

  • Although intended to operate on C source files, any plain text file can serve as the input to be redirected into your program! The repository provides many .txt files for this purpose.

  • testdecomment and testdecommentdiff are bash scripts that automate the testing process described above. Comments at the beginning of those files describe how to use them. After copying the scripts to your project directory on armlab, you may need to execute the commands chmod 700 testdecomment and chmod 700 testdecommentdiff to give them "executable" permissions.

You also must test your program against its own source code using a command sequence such as this:

./sampledecomment < decomment.c > output1 2> errors1
./decomment < decomment.c > output2 2> errors2
diff -y output1 output2
diff -y errors1 errors2
rm output1 errors1 output2 errors2

Repeat Steps 2, 3, 4 and 5 until your program handles all test files and its own source code properly.


Step 6: Build (Part 2)

Use the gcc217 command to build your program "the long way" by issuing distinct gcc217 commands to preprocess, compile, assemble, and link your program. Examine the intermediate results by issuing these commands:

emacs decomment.i
emacs decomment.s
emacs decomment.o
emacs decomment

You could also use more instead of emacs to view the files. The last two of these files will not be human-readable; this is not an error.

Step 7: Complete a readme File

Edit your copy of the given readme file by answering each question that is expressed therein.

One of the sections of your readme file requires you to list the authorized sources of information that you used to complete the assignment. Another section requires you to list the unauthorized sources of information that you used to complete the assignment. Your grader will not grade your submission unless you have completed those sections. To complete the "authorized sources" section of your readme file, copy the list of authorized sources given in the "Policies" web page to that section, and edit it as appropriate.

  • Descriptions of your code must not be in the readme file. Instead they must be integrated into your code as comments.

  • Your readme file must be a plain text file. Don't create your readme file using Microsoft Word or any other word processor.

  • The name of the file must be readme, not readme.txt, not README, not Readme, etc.

Step 8: Provide Feedback

Provide the instructors with your feedback on the assignment. To do that, issue this command on armlab:

FeedbackCOS217.py 1

and answer the questions that it asks. That command stores its questions and your answers in a file named feedback in your working directory, so you should issue the command from your project directory on armlab.


Step 9: Submit

Submit your work. Submit your dfa file, your decomment.c file, the files that gcc217 generated from it, your readme file, and your feedback file electronically by issuing these commands on armlab:

submit 1 dfa
submit 1 decomment.c
submit 1 decomment.i decomment.s decomment.o decomment
submit 1 readme feedback

WARNING: if you named your dfa file something other than dfa (typically dfa.txt or similar) and copied and pasted the first line above, your file will not be submitted! The submit command prints an error message, but invariably multiple students per term ignore it. Don't be one of them: name your file correctly and, at the very least, submit the filename that you actually have.

Remember, you could also issue submitandbackup X file1 file2 ... commands instead of submit X file1 file2 ... commands.

We can accept your files only if you submit them by executing submit (or submitandbackup) commands on armlab, and the timestamp on your file will be the time of your submit. In particular, we cannot accept your files via e-mail or via their presence in your Git repository. We cannot accept your DFA in any form other than as a file containing plain text.

Step 10: Finalize Your Repository

It is considered good practice to leave each working copy of your repository with a clean working tree up to date with the main branch. This means that you have no updates in the cloud that have not been pulled to the working copy, no commits in the working copy that have not been pushed to the cloud, no modified files in the working copy that have not been committed, and no new files in the working copy that are untracked by git.

As part of this, ensure that the final submitted version of your source code is the same as the version in your repository's working copy. This is only a concern if you are copying files around manually because for workflows that use a working copy of the repository as your project directory, satisfying the conditions of the first paragraph will automatically ensure this.


Program Style

In part, good program style is defined by the rules given in The Practice of Programming (Kernighan and Pike), as summarized by the Rules of Programming Style document. For this assignment we will pay particular attention to rules 1-24.

These more course-specific rules also apply:

  • Indentation: Indent using spaces, not tabs. The emacs editor does that when using the .emacs configuration file that we have provided. Most other editors have a way to configure this setting.

  • Line lengths: Limit line lengths in your source code to 72 characters. Doing that allows us to print your code in two columns (saving paper), or display it on one screen without scrolling horizontally. When using the .emacs configuration file that we have provided, emacs indicates lines that exceed 72 characters. Specifically, emacs uses a green background to mark the character in column 72, and a gray background to mark the character in column 73. So it's OK to see a green background, but a gray background indicates that the line is too long. Again, most other editors have a way to configure similar visual indicators.

  • Names: Use a clear and consistent style for variable and function names. One example of such a style is to prefix each variable name with characters that indicate its type. For example, the prefix c might indicate that the variable is of type char, i might indicate int, pc might mean char*, ui might mean unsigned int, etc. But it is fine to use another style — a style that does not include the type of a variable in its name — as long as the result is a clear and readable program.

  • Variable Comments: Compose a comment for each global variable. The comment must appear immediately before the definition/declaration of the global variable. Compose a comment for each local variable definition/declaration whose purpose is unclear. The comment must appear immediately before the definition/declaration of the local variable, or at the end of the line containing the definition/declaration.

  • Function Comments: Compose a comment for each function. A function definition/declaration's comment must immediately precede the function's definition/declaration. A function's comment must describe what the function does from the point of view of the function's callers. A function's comment must refer explicitly to the function's parameters (by name) and the function's return value. A function's comment must state how, if at all, the function changes global state, that is, if it updates global variables, reads from standard input or any other stream, and writes to standard output, standard error, or any other stream. In short, a function's comment must describe the flow of data into and out of the function. A function's comment must not describe how the function works. To describe how the function works, add local comments (that is, comments within the function definition) as necessary.
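
As an illustration of these comment rules, here is a small sketch with a hypothetical function, countNewlines, that is not part of the assignment. The function comment names the parameter, states the return value, and says what the function does to global state and streams, but does not describe how it works. The "how" appears only in local comments, and the type-prefix naming style is just one of the acceptable styles mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of newline characters in the string pcText.
   Does not read from or write to any stream and does not change
   any global state. */
size_t countNewlines(const char *pcText)
{
   size_t uCount = 0;
   size_t i;   /* index of the character being examined */

   assert(pcText != NULL);

   /* Scan until the terminating null character. */
   for (i = 0; pcText[i] != '\0'; i++)
      if (pcText[i] == '\n')
         uCount++;

   return uCount;
}
```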


Grading

Minimal requirements to receive credit for decomment.c:

  • decomment.c must build.

  • decomment.c must handle "normal" files (that is, files that contain no comments) properly.

To receive credit for the challenge part of the assignment, your program must write proper line numbers within its "unterminated comment" error messages, and must do so without using global variables. The next section of this assignment specification elaborates.

We will grade your work on two kinds of quality:

  • Quality from the user's point of view. From the user's point of view, your code has quality if it behaves as it must. The correct behavior of your program is defined by the previous sections of this assignment specification, and by the behavior of the given sampledecomment program.

  • Quality from the programmer's point of view. From the programmer's point of view, your code has quality if it is well styled and thereby easy to maintain. Good program style is defined by the previous section of this assignment specification.

To encourage good coding practices, we will deduct points if gcc217 generates warning messages.


Avoiding Global Variables

As noted above, when the standard input stream contains an unterminated comment, your decomment program must write an error message. The error message must contain the number of the line at which the unterminated comment begins. The challenge part of the assignment is to implement that logic without using global variables.

Suppose your program's main function calls some state handling function, and that main wishes to pass some values to the state handling function. The main function can do that using ordinary parameters.

But suppose the state handling function wishes to pass some values back to main. The state handling function can use its return value to pass the first value (for example, the next DFA state) back to main. But how can it pass additional values (for example, a line number) back to main?

One approach is to use global variables, where a global variable is one which is defined outside of all functions and accessible from any function. The state handling function could assign the additional values (for example, a line number) to global variables; main then could fetch those values by accessing the global variables.

However, as prescribed by Kernighan and Pike style rule 25, and for reasons that we will discuss later in the course, generally you should avoid using global variables. Instead all communication of values into and out of a function should occur via the function's parameters and its return value.

Thus, for the challenge portion of this assignment, in your decomment program you must avoid using global variables. You can do that using any one of these three approaches:

  • Express the program's logic such that no state handling function must pass more than one value back to main. You can do that by enhancing the logic in main.

  • Use pointer parameters in your state handling functions, as described in Section 11.4 of our King book.

  • Use another feature of the C programming language that allows you to communicate multiple pieces of information within the singular return value from a function. (By analogy: in Java you might think about returning an instance of a class that has multiple instance variables.)

Some notes:

  • The course has already covered all the material you would need to complete the first approach.

  • The COS 217 course has not covered pointers yet. So you'd need to work ahead of the pace of the course if you wanted to use the second approach.

  • The third approach is suggested even more vaguely than the second approach, so it would take not just working ahead but doing some investigative work to look through the future readings in the course.

  • Although you must avoid defining variables at the global level, it's fine to define data types at the global level. For example, code of this form:

    enum SomeEnum {SOMEVALUE1, SOMEVALUE2, ...};

    at the global level is fine.

  • As noted in the Grading section of this document, to be awarded points for the challenge part of the assignment your program must write proper line numbers within its "unterminated comment" error messages, and must do so without using global variables. However, if your program writes proper line numbers within its "unterminated comment" error messages but does so using global variables, then your grade will be penalized only 2 percent. We will not think less of you if you decide to accept the small deduction instead of doing that part of the assignment.
