Assignment 7
Huffman Coding
CSE 13S, Winter 2024
Document version 1 (changes in Section 14)
Design Draft due Wednesday, March 6th , 2024
Assignment due Friday, March 8th,2024
1 Introduction
Jessie’s partner, Summer has recently found themselves in a pickle. Due to the large influx of Tank pictures from Alissa, their computer’s storage is running low. After looking around at what files they have on their computer, they realized that they had multiple large books on their computer. They also had a lot of images of their dog, Toby. Unfortunately, they need to keep the books around, but want more space on their computer for pictures of Tank as well. Summer would like you to help them compress many of the things that were on their computer. Your task will be to write two programs that can losslessly compress and decompress any files given to them.
One can use Huffman Coding to compress a data file (introduction[1], original paper[2]). The key idea is to determine which bytes (“symbols”) of the input file are most common and switch their representations to use fewer bits. To compensate, the less common symbols will switch to representations that uses more bits. The overall result is usually that fewer total bits are needed to represent the entire file, meaning that the file has been compressed.
In this assignment, you will create two programs. The first is a data compressor, huff, which computes the Huffman code of an input file.
huff -i infile -o outfile
This data compressor:
. reads a stream of bytes from a binary input file using functions from stdio.h: fopen(), fgetc(), fseek(), and fclose().
. writes a stream of bits to a binary output file using functions from bitwriter.c, which you will write.
The second program is a data decompressor, dehuff, which converts a Huffman Coded input file back into its original form.
dehuff -i infile -o outfile
This data decompressor:
. reads a stream of bits from its input file using functions from bitreader.c, which you will write.
. writes a stream of bytes to its output file using functions from stdio.h: fopen(), fputc(), and fclose().
In addition to bitreader.c and bitwriter.c, you will be creating two other support modules: node.c for a binary tree and pq.c for apriority queue. Here is a checklist of the source code that you will be writing.
□ A “bit writer” abstract data type in bitwriter.c (see Section 2 and bitwriter.h). We have provided you with unit tests for this module in bwtest.c.
□ A “bit reader” abstract data type in bitreader.c (see Section 3 and bitreader.h). We have provided you with unit tests for this module in brtest.c.
□ A binary tree abstract data type in node.c (see Section 4 and node.h). We have provided you with unit tests for this module in nodetest.c.
□ A priority queue abstract data type in pq.c (see Section 5 and pq.h). We have provided you with unit tests for this module in pqtest.c.
□ A Huffman Coding data compressor (see Section 6). We have provided you with system tests in runtests.sh.
□ A Huffman Coding data decompressor (see Section 7). We have provided you with system tests in runtests.sh.
Your Makefile will build the four unit-test programs ( bwtest, brtest, nodetest, pqtest) and the data compressor/decompressor programs (huff, and dehuff).
2 Bit Writer
Previously you have used the fopen(), fclose(), and fputc() functions to write a stream of bytes to a binary file. This approach works well when a binary file’s format is defined as a sequence of bytes. However in this assignment, the .huff file format is defined as a sequence of bits. So to make creating a .huff file straightforward, you will write a set of bit-writing functions.
2.1 BitWriter Functions
These functions write a binary file one bit at a time. So in your huff.c program, you do not call the fputc() function directly. Instead you use these functions, which you will write:
. bit_write_open() calls fopen()
. bit_write_close() calls fclose()
. bit_write_bit() calls fputc() after it collects 8 bits
The functions just mentioned manage a byte buffer.
. bit_write_open() creates the buffer.
. bit_write_close() flushes the buffer and frees it.
. bit_write_bit() collects bits into the buffer and writes the full buffer by calling fputc().
It is important to understand that when your huff.c program needs to write 8-bit, 16-bit, and 32-bit data values, do not call the function fputc() directly. This is because these values almost always will not be aligned on byte boundaries within the binary file. Instead, call these functions:
. bit_write_uint8() calls bit_write_bit() 8 times
. bit_write_uint16() calls bit_write_bit() 16 times
. bit_write_uint32() calls bit_write_bit() 32 times
So, essentially, all of the bit-writing functions send their data through bit_write_bit(). The functions in this section use these data types.
/* bitwriter.h */ typedef struct BitWriter BitWriter; |
/* bitwriter.c */ #include "bitwriter.h" struct BitWriter { FILE *underlying_stream; uint8_t byte; uint8_t bit_position; /* }; |
Function descriptions are below.
BitWriter *bit_write_open(const char *filename);
Open binary or text file, filename for write using fopen() and return a pointer to a newly aloocated BitWriter struct. You must check all function return values and return NULL if any of them report a failure. The pseudocode is below.
def bit_write_open(filename): allocate a new BitWriter open the filename for writing as a binary file, storing the result in FILE *f store f in the BitWriter field underlying_stream clear the byte and bit_positions fields of the BitWriter to 0 if any step above causes an error: return NULL else: return a pointer to the new BitWriter |
void bit_write_close(BitWriter **pbuf);
Using values in the BitWriter pointed to by *pbuf, flush any data in the byte buffer, close underlying_stream, free the BitWriter object, and set the *pbuf pointer to NULL. You must check all function return values and report a fatal error if any of them report a failure.
def bit_write_close(BitWriter **pbuf): if *pbuf != NULL: if (*pbuf)->bit_position > 0: /* (*pbuf)->byte contains at least one bit that has not yet been written */ write the byte to the underlying_stream using fputc() close the underlying_stream free the BitWriter *pbuf = NULL |
void bit_write_bit(BitWriter *buf, uint8_t bit);
This is the main writing function. It writes a single bit, bit, using values in the BitWriter pointed to by buf. This function collects 8 bits into the buffer byte before writing it using fputc(). You must check all function return values and report a fatal error if any of them report a failure.
def bit_write_bit(buf, bit): if bit_position > 7: write the byte to the underlying_stream using fputc() clear the byte and bit_position fields of the BitWriter to 0 set the bit at bit_position of the byte to the value of bit bit_position += 1 |
void bit_write_uint8(BitWriter *buf, uint8_t x);
Write the 8 bits of function parameter x by calling bit_write_bit() 8 times. Start with the LSB (least- significant, or rightmost, bit) of x.
def bit_write_uint8(buf, x): for i = 0 to 7: write bit i of x using bit_write_bit() |
void bit_write_uint16(BitWriter *buf, uint16_t x);
Write the 16 bits of function parameter x by calling bit_write_bit() 16 times. Start with the LSB (least- significant, or rightmost, bit) of x.
def bit_write_unit16(buf, x): for i = 0 to 15: write bit i of x using bit_write_bit() |
void bit_write_uint32(BitWriter *buf, uint32_t x);
Write the 32 bits of function parameter x by calling bit_write_bit() 32 times. Start with the LSB (least- significant, or rightmost, bit) of x.
def bit_write_uint32(buf, x): for i = 0 to 31: write bit i of x using bit_write_bit() |