EECS 481 Homework Assignment #1 — Test Coverage

In this assignment you will create three high-coverage test suites, one for each of three different programs spanning three different application domains and three different programming languages. You will then write a short report reflecting on the activity.

Two of the key properties found in industrial software engineering, but not in most undergraduate academic coursework, are: the program is large and you did not write the program. If you are hired by Microsoft as a software engineer, they will not ask you to write Office from scratch. Instead, as discussed in class, the vast majority of software engineering is the comprehension and maintenance of large legacy codebases (i.e., old code you did not write).

Thus, there is no particular assumption that you are familiar with any of the subject programs in this assignment. Indeed, that is the point: you will be applying testing concepts to unknown software.

You may work with a partner for this assignment. If you do, you must use the same partner for all sub-components of this assignment; if you don't, you must work alone for all sub-components. (For example, if you work alone for HW1a, you must also work alone for HW1b. If you work with XYZ for HW1a, you must also work with XYZ for HW1b unless XYZ drops the class.)

Only one partner needs to submit for any of the programming assignments on the autograder, but you do need to indicate your partner on that website (you will see an option to do so when you click on the assignment). Only one partner needs to submit the report on Gradescope, but you do need to use Gradescope's interface to select your partner. (Here is a video showing Gradescope partner selection.)

You may use files, benchmarks, or resources from the Internet or from the instructors (unlike in many other classes), provided you cite them in your written report. In particular, you may use generative AI or synthesis tools like ChatGPT, provided you use a free version and you cite that you used it and which version you used. The reason we require a free version is that we do not want students to feel any pressure to purchase something to improve their grades (cf. equity, accessibility, etc.). (If you use the same resource many times, such as StackOverflow, one citation per resource suffices.)

Assignment Flavor and Learning Goals

A recurring theme in this course is a focus on everything except writing code. Perhaps surprisingly, other elements, such as reading code, testing code, eliciting requirements, debugging code, and planning projects, are usually more important in industry than simply writing software. Your other electives cover programming — this one covers software engineering.

The assignments in this class are often more like "open ended puzzles". You won't be writing much code. Instead, you will be reading and using code written by other people. (Yes, that's annoying — but improving your skill at that is the point!)

Among other things, we hope this assignment will give students exposure to: statement (or line) coverage, branch coverage, legacy code bases, white-box testing, black-box testing, writing your own tests from scratch, slightly editing existing tests to improve coverage, thinking about high-coverage inputs from first principles, and using other available resources.

The parts of this assignment may be different from what you are used to. HW1a focuses on white-box testing and looking at the code. In HW1b, there is too much code for you to read it all, so you will have to think of other black-box approaches. HW1c will likely require a combination of efforts.

Installing, Compiling and Running Legacy Code

It is your responsibility to download, compile, and run the three subject programs. Getting the code to work is part of the assignment. You can post on the forum for help and compare notes bemoaning various architectures (e.g., Windows vs. Mac vs. Linux). Ultimately, however, it is your responsibility to read the documentation for these programs and utilities and use some elbow grease to make them work.

If you are having trouble (e.g., Windows is often a bit more complicated than Linux for these things), post on the forum. Other students should be able to offer help and advice.

For example, one TA reports that gcc may not easily support static linking on a Mac, but that it took fewer than two minutes to set everything up in Ubuntu.

If relevant, the grading server uses Python 2.7.12, gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609 and javac 1.8.0_151 for this assignment.

The TAs have suggested that some components of this assignment, such as HW1b, are very difficult to set up natively on a Mac. They recommend using a Linux virtual machine.

This Setup Is Tricky, Help!

If you don't have as much preparation in the systems and legacy code setup aspects of this assignment, I recommend checking out this information: it offers advice to help you decide on the best plan for this semester. Click for setup hints and information!

Linux Fundamentals

Many of the assignments are easier to complete on Linux or similar command-line systems. The 481 staff have prepared a video (also linked in HW0) that provides a gentle introduction to a number of Linux concepts. This video is optional, but you may find it useful if you are less familiar with the Linux environment. Some students do not need the video for HW0 but find it helpful for HW1.


A comment below the video contains timestamps for points of interest, such as:

  • terminals
  • environment variables
  • scripts in the shell
  • compiling programs from source
  • using ssh
The first half is more about terminals and shells (why do we have to type ./a.out? why can't we just say a.out?), and the second half is more about compiling programs from source. The video also covers elinks in particular, which is a terminal-based browser program.

Subject Programs and Metrics

There are three subject programs for this assignment. The programs differ in their characteristics, input formats, and coverage goals. Your work for each subject program is submitted separately to the grading server (as HW1a, HW1b and HW1c).

HW1a — AVL Trees (Python)

The first subject program is a simple Python implementation of an abstract data type: the AVL tree container data structure.

Test cases for this program take the form of sequences of instructions for manipulating a single global AVL tree:

  • i<number> — insert <number> into the tree
  • d<number> — delete <number> from the tree
  • p — print the current tree
For example, if the text file test1.avl contains i1 i2 i3 i4 i5 d4 p, then python avl.py test1.avl produces output resembling:

      2
     / \
    1   5
    /\  /\
       3
       /\

(That is, the tree has root 2 with left child 1 and right child 5, and 3 is the left child of 5.)
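
When designing additional tests from first principles, enumerate the cases the AVL code must handle: deleting absent keys, deleting leaves versus internal nodes, and deletions that force rebalancing rotations. For example (the file name here is made up), a test such as test2.avl containing:

  i2 i1 i4 i3 i5 d1 p

removes a node in a way that unbalances the tree and should force a rotation on deletion, a path that insertion-only tests never reach. Similarly, a command like d7 when 7 was never inserted exercises the key-not-found path.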

We compute coverage for this program using the coverage package for Python. Both statement and branch coverage are considered. (For more information about statement and branch coverage, read here.)
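
For local measurement with the coverage package (installable via pip install coverage), a plausible invocation, assuming your test files live in a tests/ directory, is:

  coverage erase
  coverage run --branch --append avl.py tests/test1.avl
  coverage run --branch --append avl.py tests/test2.avl
  coverage report -m

The --append flag accumulates results across runs so that the report reflects the whole suite, and -m lists the line numbers you have not yet covered.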

The reference avl.py implementation is available here. The program is about 300 lines long and is a self-contained single file.

A test suite for this application is a collection of .avl files. The reference starter test suite is available here.

Python Installation and Coverage Details

HW1b — PNG Graphics (C)

The second subject program is libpng, the reference library for the Portable Network Graphics (PNG) file format. It is used for image manipulation and conversion.

We will be using the developer-provided pngtest driver. It takes as input a single .png file.

We compute coverage for this program using the gcov test coverage program for gcc. Only statement coverage is considered.
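
The usual gcov workflow (a sketch only; exact build steps vary by platform and configuration) is to compile with instrumentation, run the driver on each test image, and then generate annotated per-file reports:

  ./configure CFLAGS="--coverage" LDFLAGS="--coverage"
  make
  ./pngtest some-image.png     # repeat for each .png in your suite
  gcov *.c                     # writes annotated .gcov listings per file

Each run of pngtest appends execution counts to .gcda files, so coverage accumulates across your whole suite; delete the .gcda files to start a fresh measurement.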

The reference implementation (version 1.6.34) is available here. It contains about 85,000 lines of code spread over about 70 files.

A test suite for this application is a collection of .png files. The reference implementation comes with over 150 such example files. (You really can use any images that came with libpng, plus any images from anywhere else, as part of your submitted answer; see below.) Feel free to create your own files with an image manipulation or conversion program, to use files or benchmarks you find on the Internet, or to use the files that come with the program.

The difficulty difference between HW1a and HW1b is large. (For example, students often report that HW1a takes "minutes" while HW1b takes "hours". In addition, many students find that the difficulty lies more in compilation and instrumentation than in crafting high-coverage images.)

C Installation and Coverage Details

HW1c — Data Visualization (Java)

The final subject program is the JFreeChart chart creation library. It is used to produce quality charts for a variety of applications and file formats.

We will be writing tests against its programmable API. Each test case will thus be a .java file that invokes methods from the chart-creation library.

We compute coverage for this program using the cobertura bytecode test coverage utility. Both statement and branch coverage are considered.
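
If you run cobertura locally, the standalone distribution follows an instrument / run / report cycle. A sketch (the directory names, test class, and jar locations are assumptions; the run step also needs cobertura's dependency jars on the classpath):

  # Rewrite compiled classes with coverage probes.
  cobertura-instrument.sh --destination instrumented build/classes
  # Run a test with instrumented classes ahead of the originals;
  # execution counts are written to cobertura.ser.
  java -cp instrumented:build/classes:cobertura.jar MyTest
  # Turn cobertura.ser into an HTML report, given the source directory.
  cobertura-report.sh --format html --destination coverage-report src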

The reference implementation (version 1.5.0) is available here. It contains about 300,000 lines of code spread over about 640 files.

A test suite for this application is a collection of .java files. The reference starter test suite is available here. (You can submit those tests or modifications to those tests as part of your answer, provided you properly credit them in the written report.) Note that our grading server does not support graphical user interfaces, so any tests that create Java/AWT displays or widgets or GUI elements are unlikely to work.
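
To give a flavor of what such a test can look like, here is a minimal sketch (the class name and data values are made up) that builds a chart and renders it off-screen, which avoids the GUI restriction mentioned above:

  import java.awt.image.BufferedImage;
  import org.jfree.chart.ChartFactory;
  import org.jfree.chart.JFreeChart;
  import org.jfree.data.general.DefaultPieDataset;

  public class PieTest {
      public static void main(String[] args) {
          // Build a tiny dataset and a pie chart from it.
          DefaultPieDataset dataset = new DefaultPieDataset();
          dataset.setValue("apples", 40);
          dataset.setValue("oranges", 60);
          JFreeChart chart = ChartFactory.createPieChart(
                  "Fruit", dataset, true, true, false);
          // Rendering to an off-screen image exercises the drawing code
          // without creating any AWT windows or widgets.
          BufferedImage image = chart.createBufferedImage(400, 300);
          System.out.println("rendered " + image.getWidth()
                  + "x" + image.getHeight());
      }
  }

Different chart types (pie, bar, XY, time series) live behind different ChartFactory methods, so varying the chart type is a cheap way to reach more of the library.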

Java Installation and Coverage Details

HW1d — Written Report

You must also write a short two-paragraph report reflecting on your experiences creating high-coverage tests for this assignment. (If you are working with a partner, indicate as much in the text or header. Recall from above that you need only submit one copy of the report, but if you both do, nothing bad happens. Select your partner on Gradescope.) Consider addressing points such as:

  • How did you go about manually creating or otherwise obtaining test cases?
  • Did you use or modify any of the tests we provided or tests from the Internet as part of your answer? (Reminder: You are allowed to do that in this class!)
  • What was harder than you expected or different than you expected?
  • What was similar or different between the three subject programs?
  • (If you used any external resources, such as image files from the Internet or help from Stack Overflow or a free instance of ChatGPT, list them in an optional third paragraph. There is no particular format; a list of URLs is fine. If you used the same general source multiple times, just cite it once.)

(Many students report stress or uncertainty about the length limit. The safest answer is to follow the restriction as written: the graders are free to dock points if your answer is too long. In practice there are ways to cheese the system (e.g., we don't give a word count), and I would expect three short paragraphs to read about the same as two longer ones. The real challenge is that we are asking you to be concise; in software engineering, your manager will typically not read overly-long reports. So the rule isn't strict, but it is judged by humans. You can write something longer, but you have no recourse if that subjective judgment says yours is too long. If you aren't comfortable with that gamble (e.g., if you are not a native English speaker or not confident in your prose), cleave to both the spirit and the letter of the law.)

Rubric:

  • 5 points — a two-paragraph report reflecting on your activities creating high-coverage test suites for this assignment, including comparisons and contrasts between the subjects and your experiences and expectations (with a third paragraph listing any external sources or files used)
  • 4 points — a reasonable report, but lacking a solid description of one or more aspects or significantly exceeding the length limit
  • 3 points — a brief report, detailing only half of the required information
  • 2 points — a report drawing only the bare minimum of contrasts and comparisons (e.g., between two aspects and no more)
  • 1 point — a terse or uninformative report, perhaps describing only one activity or subject

  • -1 point — English prose or grammatical errors
  • -1 point — Text does not include University email address(es) and/or does not make it clear whether you are working alone or with a partner
  • -all points — Submission uses uncited external resources (or paid resources)

There is no explicit format (e.g., for headings or citations) required. You may discuss other activities (e.g., validation or automation scripts you wrote, etc.) as well.

The grading staff may select a small number of excerpts from particularly high-quality or instructive reports and share them with the class. If your report is selected you will receive extra credit.

Recommended Approach

You will almost certainly need to inspect the source code of the various programs (i.e., white-box testing) to construct high-coverage test suites.

For each program, start with any test cases we have provided. Many students also start with image files (for HW1b) or coding hints (for HW1c) they find on the Internet. Because of limits on autograding submissions, and so that you learn how coverage computations work across various systems (a relevant skill for interviews, etc.), you will almost certainly want to install the coverage utilities yourself. As you read each project's source code and expand your test suite, recompute the coverage locally. When you perceive that it has gone up significantly, submit to the grading server for a more official check.

Note the increasing size and complexity of the subject programs. Splitting your time evenly between them is probably not the optimal allocation. The first project is due earlier to prevent students from putting things off until the last minute.

Submission

For each of the three programmatic components (HW1a, HW1b, HW1c), submit your (individual, not zipped) test cases via the autograder.io autograding server. You may submit multiple times (see the grading server for details). For the written component (HW1d), submit your PDF report via Gradescope.

The exact grading rubric (e.g., x points for y coverage) can be seen in the submission feedback details. As of 2019, the grading thresholds scaled roughly as follows:

  • HW1a — statement coverage 50% (one point) to 94% (full points); branch coverage 50% (one point) to 92% (full points)
  • HW1b — statement coverage 31% (one point) to 36% (full points)
  • HW1c — statement coverage 6.2% (one point) to 9.0% (full points); branch coverage 3.75% (one point) to 4.9% (full points)
However, the grading rubric on the autograder is officially correct; the rough ranges above are provided only as a planning convenience and may be slightly inaccurate.

Note that your local results may differ significantly from the autograder results; that is fine and expected. For example, you may observe 6.8% line coverage "at home" and 16.8% on the autograder. (Different compilers, header files, optimizations, etc., may result in different branch and line counts.) Your grade is based on the autograder results, regardless of what you see locally.

We always use your best autograder submission result (even if your latest result is worse) for your grade.

Note that you are typically limited to a maximum number of test cases (i.e., 50 files totalling 30 megabytes). If you find that you have created or found more than that, minimize your test suite by prioritizing those that most increase coverage. (You might, as an example, use a for loop to add one test to the suite at a time, noting the new coverage after each addition, and use that information to make your decision.)
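
For HW1b, such a greedy loop might look like the following sketch (the candidates/ and suite/ directory names are assumptions, and extract_percentage is a hypothetical helper for pulling the number out of gcov's output):

  #!/bin/sh
  # Greedy suite minimization: try each candidate once and record the
  # coverage of the suite with that candidate added.
  for t in candidates/*.png; do
      cp "$t" suite/
      find . -name '*.gcda' -delete      # reset gcov's counters
      for f in suite/*.png; do ./pngtest "$f" > /dev/null; done
      gcov *.c > summary.txt
      echo "$t => $(extract_percentage summary.txt)"
      # keep "$t" only if coverage rose; otherwise remove it from suite/
  done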

Note also that the grading server has resource usage (e.g., CPU time) caps. As a result, if you submit a test suite that is very large or long-running, we may not be able to evaluate your submission. Consider both quantity and quality.

Feel free to scour the web (e.g., Stack Overflow, etc.) or this webpage (e.g., the example tests shown above) or the tarballs (e.g., yes, you can submit pngtest.png or toucan.png if you want to) for ideas and example images to use directly as part of your answer (with or without modification) — just cite your sources (or URLs) in the report. However, submissions are limited to 50 test cases (so just finding a big repository of two hundred images may not immediately help you without additional work) totalling 30 megabytes. In addition, you may never submit another student's work (images or test selection) as your own.

Commentary

As befits a 400-level elective, this assignment is somewhat open-ended. There is no "one true path" to manually creating high-coverage test suites.

You may find yourself thinking something like "I know what to do at a high level, but I don't know specifically what to create here. I could try to read and understand the program to make tests that cause it to do different things, but reading and understanding the program takes a long time, especially since I didn't write this code and I don't really care about it and I'm only doing it because someone is telling me to." Such concerns are, in some sense, the point: they are indicative of industrial software engineering.

History suggests that HW1b is the hardest, HW1c is moderate, and HW1a is the easiest. While your results may vary, feel free to use this information when planning your time.

Finally, students sometimes wonder if it is possible to get 100% coverage on HW1a. While it should be possible to get 100% coverage for the general AVL tree algorithm, this particular implementation of the algorithm may well have branches that you are unable to reach — especially given this testing setup.
