18-213/18-613, Summer 2024
Attack Lab: Understanding Buffer Overflow Bugs
Assigned: Thursday, May 30
Due: Thursday, June 6, 11:59PM EDT
Last Possible Time to Turn in: Sunday, June 9, 11:59PM EDT
• Gain more experience with debugging tools such as GDB and OBJDUMP.
Note: In this lab, you will gain firsthand experience with methods used to exploit security weaknesses in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature of these security weaknesses so that you can avoid them when you write system code. We do not condone the use of any other form of attack to gain unauthorized access to any system resources.
You can obtain your files from the Autolab site https://ics.autolabproject.com
After logging in to Autolab, select Attacklab -> Download handout. The Autolab server will build your files and return them to your browser in a tar file called targetk.tar, where k is the unique number of your target programs.
Save the targetk.tar file in a (protected) Andrew directory in which you plan to do your work. Then log in to a shark machine and give the command: tar -xvf targetk.tar. This will extract a directory targetk containing the files described below.
You should only download one set of files. If for some reason you download multiple targets, choose one target to work on and delete the rest.
README.txt: A file describing the contents of the directory
ctarget: An executable program vulnerable to code-injection attacks
rtarget: An executable program vulnerable to return-oriented-programming attacks
cookie.txt: An 8-digit hex code that you will use as a unique identifier in your attacks.
farm.c: The source code of your target’s “gadget farm,” which you will use in generating return-oriented programming attacks.
In the following instructions, we will assume that you have already copied the files to a protected local directory, and that you are executing the programs in that local directory.
2.2 Important Points
This function reads a byte sequence from standard input, terminated by either a newline ('\n') or end of file (EOF). Then it calls another function, process_line, passing it the bytes that were read. (It does not add a NUL terminator to the bytes it reads, so what it passes to process_line is not a “string.” Instead, it passes the number of bytes read as process_line’s second argument.) In the code sample, you can see that read_and_process_line stores the byte sequence in a local variable buf, an array of BUFFER_SIZE bytes. (BUFFER_SIZE is a compile-time constant, specific to your version of CTARGET and RTARGET.) Notice that the while loop does not stop when BUFFER_SIZE bytes have been read. This is the same bug that’s found in the C library function gets: it keeps reading data until end of line or file, possibly overrunning the bounds of the storage allocated for the data.
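For contrast with the unbounded loop described above, here is a minimal sketch of a reader that stops storing bytes once its buffer is full. The names read_line_bounded and DEMO_BUFFER_SIZE are hypothetical and are not taken from the lab's source; this shows one way a bounds check could look, not the target's actual code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical capacity for this sketch; the lab's BUFFER_SIZE differs per target. */
#define DEMO_BUFFER_SIZE 16

/*
 * Read bytes from `in` until newline or EOF, storing at most `cap` of them
 * in `buf`. Unlike a gets-style loop, bytes beyond `cap` are consumed but
 * discarded, so the storage is never overrun.
 * Returns the number of bytes stored; *truncated is set to 1 if any input
 * was dropped, and to 0 otherwise. No NUL terminator is added.
 */
size_t read_line_bounded(FILE *in, char *buf, size_t cap, int *truncated)
{
    size_t n = 0;
    int c;

    *truncated = 0;
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (n < cap)
            buf[n++] = (char)c;   /* safe: n < cap */
        else
            *truncated = 1;       /* would have overflowed; drop the byte */
    }
    return n;
}
```

A caller would pass stdin, a local array, and sizeof of that array, then treat a set truncated flag as an input error instead of continuing with corrupted state.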
    $ ./ctarget
    Cookie: 0x599051eb
    Type string: Keep it short!
    No exploit, read_and_process_line returned normally.
Typically an error occurs if you type a long string:
    $ ./ctarget
    Cookie: 0x599051eb
    Type string: This is not a very interesting string, but it is quite long
    Ouch!: You caused a segmentation fault!
    Better luck next time
(Note that the value on the Cookie: line will differ from yours.)
Program RTARGET will have the same behavior. As the error message indicates, overrunning the buffer typically causes the program state to be corrupted, leading to a memory access error. Your task is to be more clever with the strings you feed CTARGET and RTARGET so that they do more interesting things. These are called exploit strings.
-h: Print list of possible command line arguments
-i FILE: Supply input from a file, rather than from standard input
• HEX2RAW expects two-digit hex values separated by one or more white spaces. So if you want to create a byte with a hex value of 0, you need to write it as 00. To create the word 0xdeadbeef you should pass “ef be ad de” to HEX2RAW (note the reversal required for little-endian byte ordering).
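The byte reversal described above is easy to get wrong by hand. The following sketch prints a value in the order HEX2RAW input needs; the function name format_le_hex is an assumption made for illustration and is not part of the lab tools.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Write the `nbytes` low-order bytes of `value` (1..8) into `out` as
 * two-digit hex pairs separated by spaces, least significant byte first,
 * which is the order needed on a little-endian machine.
 * `out` must hold at least 3 * nbytes characters (including the NUL).
 */
void format_le_hex(uint64_t value, int nbytes, char *out)
{
    for (int i = 0; i < nbytes; i++) {
        unsigned byte = (unsigned)((value >> (8 * i)) & 0xff);
        /* "%02x" keeps the leading zero, so a zero byte prints as 00. */
        out += sprintf(out, i == 0 ? "%02x" : " %02x", byte);
    }
}
```

Calling format_le_hex(0xdeadbeef, 4, buf) fills buf with "ef be ad de"; asking for 8 bytes appends four 00 pairs, which is the form a 64-bit address takes on the stack.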
When you have correctly solved one of the levels, your target program will automatically send a notification to Autolab. For example:
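| Phase | Program | Level | Method | Function | Points |
|-------|---------|-------|--------|----------|--------|
| 1     | CTARGET | 1     | CI     | touch1   | 10     |
| 2     | CTARGET | 2     | CI     | touch2   | 25     |
| 3     | CTARGET | 3     | CI     | touch3   | 25     |
| 4     | RTARGET | 2     | ROP    | touch2   | 35     |
| 5     | RTARGET | 3     | ROP    | touch3   | 5      |

Figure 1: Summary of the lab phases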
Unlike the Bomb Lab, there is no penalty for making mistakes in this lab. Feel free to fire away at CTARGET and RTARGET with any strings you like.
Figure 1 summarizes the five phases of the lab. As can be seen, the first three involve code-injection (CI) attacks on CTARGET, while the last two involve return-oriented-programming (ROP) attacks on RTARGET.
4 Part I: Code Injection Attacks
TOUCH_FN touch1(void) {
Your task is to get CTARGET to execute the code for touch1 when read_and_process_line executes its return statement, rather than returning to test. Note that your exploit string may also corrupt parts of the stack not directly related to this stage, but this will not cause a problem, since touch1 causes the program to exit directly.
Your task is to get CTARGET to execute the code for touch2 rather than returning to test. In this case, however, you must make it appear to touch2 as if you have passed your cookie as its argument.
Phase 3 also involves a code injection attack, but this time you pass a string as the argument.
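Because the argument is a string and not a number, the exploit must carry the character codes of the cookie's digits, followed by a zero byte. The sketch below produces those bytes in hex form. It assumes the expected string is the eight lowercase hex digits with no "0x" prefix; check the hexmatch code in your own handout before relying on that. The function name cookie_string_hex is hypothetical.

```c
#include <stdio.h>

/*
 * Render `cookie` as eight lowercase hex digits plus a terminating NUL in
 * `text` (9 bytes), then write each of those 9 bytes as a two-digit hex pair
 * into `hex` (at least 27 bytes).
 * Assumption: the expected string has no "0x" prefix and uses lowercase.
 */
void cookie_string_hex(unsigned cookie, char text[9], char *hex)
{
    snprintf(text, 9, "%08x", cookie);
    for (int i = 0; i < 9; i++)   /* i == 8 emits the NUL terminator as 00 */
        hex += sprintf(hex, i == 0 ? "%02x" : " %02x",
                       (unsigned)(unsigned char)text[i]);
}
```

Note that the characters of a string are stored in reading order, so no byte reversal applies here, unlike a numeric word.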
Within the file ctarget there is code for functions hexmatch and touch3 having the following C representations:
• It marks the section of memory holding the stack as nonexecutable, so even if you could set the program counter to the start of your injected code, the program would fail with a segmentation fault.
Fortunately, clever people have devised strategies for getting useful things done in a program by executing existing code, rather than injecting new code. The most general form of this is referred to as return-oriented programming (ROP) [1, 2]. The strategy with ROP is to identify byte sequences within an existing program that consist of one or more instructions followed by the instruction ret. Such a segment is referred to as a gadget. Figure 2 illustrates how the stack can be set up to execute a sequence of n gadgets. In this figure, the stack contains a sequence of gadget addresses. Each gadget consists of a series of instruction bytes, with the final one being 0xc3, encoding the ret instruction. When the program executes a ret instruction starting with this configuration, it will initiate a chain of gadget executions, with the ret instruction at the end of each gadget causing the program to jump to the beginning of the next.
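The stack layout just described can be sketched as a small payload builder: filler bytes up to the saved return address, then a run of 8-byte words. The function name build_chain is an assumption for illustration, and the amount of padding your target needs has to be worked out from its own disassembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Lay out an overflow payload in `out`: `pad_len` filler bytes that reach
 * the saved return address, followed by `n` 8-byte words stored in
 * little-endian order. In a ROP chain each word is either a gadget address
 * or a data value consumed by a popq inside the previous gadget.
 * Returns the total payload length, or 0 if `out_cap` is too small.
 */
size_t build_chain(unsigned char *out, size_t out_cap, size_t pad_len,
                   const uint64_t *words, size_t n)
{
    size_t total = pad_len + 8 * n;
    if (total > out_cap)
        return 0;

    memset(out, 0x00, pad_len);              /* filler bytes */
    for (size_t i = 0; i < n; i++)
        for (int b = 0; b < 8; b++)          /* least significant byte first */
            out[pad_len + 8 * i + b] = (unsigned char)(words[i] >> (8 * b));
    return total;
}
```

Each ret pops the next word from the stack into the program counter, which is why the words appear in execution order directly after the padding.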
would have popq %rdi as its last instruction before ret. Fortunately, with a byte-oriented instruction set, such as x86-64, a gadget can often be found by extracting patterns from other parts of the instruction byte sequence.
For example, one version of rtarget contains code generated for the following C function:
The byte sequence 48 89 c7 encodes the instruction movq %rax, %rdi. (See Figure 3A for the encodings of useful movq instructions.) This sequence is followed by byte value c3, which encodes the ret instruction. The function starts at address 0x400f15, and the sequence starts on the fourth byte of the function. Thus, this code contains a gadget, having a starting address of 0x400f18, that will copy the 64-bit value in register %rax to register %rdi.
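The address arithmetic above (function start plus byte offset) can be sketched as a simple pattern search over a function's bytes. The function name find_gadget is hypothetical; in practice you would read the bytes and addresses from the disassembly of your own rtarget.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Search `code` (the bytes of a function loaded at address `base`) for the
 * byte pattern `pat` and return the address where it begins, or 0 if the
 * pattern does not occur.
 */
uint64_t find_gadget(const unsigned char *code, size_t code_len, uint64_t base,
                     const unsigned char *pat, size_t pat_len)
{
    if (pat_len == 0 || pat_len > code_len)
        return 0;
    for (size_t off = 0; off + pat_len <= code_len; off++)
        if (memcmp(code + off, pat, pat_len) == 0)
            return base + off;
    return 0;
}
```

For a 7-byte function at 0x400f15 whose fourth through seventh bytes are 48 89 c7 c3, the search returns 0x400f15 + 3 = 0x400f18, matching the gadget address given above.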
Important: The gadget farm is demarcated by functions start_farm and end_farm in your copy of rtarget. Do not attempt to construct gadgets from other portions of the program code.
A. Encodings of movq instructions

movq S, D

| Source S \ Destination D | %rax | %rcx | %rdx | %rbx | %rsp | %rbp | %rsi | %rdi |
|---|---|---|---|---|---|---|---|---|
| %rax | 48 89 c0 | 48 89 c1 | 48 89 c2 | 48 89 c3 | 48 89 c4 | 48 89 c5 | 48 89 c6 | 48 89 c7 |
| %rcx | 48 89 c8 | 48 89 c9 | 48 89 ca | 48 89 cb | 48 89 cc | 48 89 cd | 48 89 ce | 48 89 cf |
| %rdx | 48 89 d0 | 48 89 d1 | 48 89 d2 | 48 89 d3 | 48 89 d4 | 48 89 d5 | 48 89 d6 | 48 89 d7 |
| %rbx | 48 89 d8 | 48 89 d9 | 48 89 da | 48 89 db | 48 89 dc | 48 89 dd | 48 89 de | 48 89 df |
| %rsp | 48 89 e0 | 48 89 e1 | 48 89 e2 | 48 89 e3 | 48 89 e4 | 48 89 e5 | 48 89 e6 | 48 89 e7 |
| %rbp | 48 89 e8 | 48 89 e9 | 48 89 ea | 48 89 eb | 48 89 ec | 48 89 ed | 48 89 ee | 48 89 ef |
| %rsi | 48 89 f0 | 48 89 f1 | 48 89 f2 | 48 89 f3 | 48 89 f4 | 48 89 f5 | 48 89 f6 | 48 89 f7 |
| %rdi | 48 89 f8 | 48 89 f9 | 48 89 fa | 48 89 fb | 48 89 fc | 48 89 fd | 48 89 fe | 48 89 ff |
B. Encodings of popq instructions

| Operation \ Register R | %rax | %rcx | %rdx | %rbx | %rsp | %rbp | %rsi | %rdi |
|---|---|---|---|---|---|---|---|---|
| popq R | 58 | 59 | 5a | 5b | 5c | 5d | 5e | 5f |
C. Encodings of movl instructions

movl S, D

| Source S \ Destination D | %eax | %ecx | %edx | %ebx | %esp | %ebp | %esi | %edi |
|---|---|---|---|---|---|---|---|---|
| %eax | 89 c0 | 89 c1 | 89 c2 | 89 c3 | 89 c4 | 89 c5 | 89 c6 | 89 c7 |
| %ecx | 89 c8 | 89 c9 | 89 ca | 89 cb | 89 cc | 89 cd | 89 ce | 89 cf |
| %edx | 89 d0 | 89 d1 | 89 d2 | 89 d3 | 89 d4 | 89 d5 | 89 d6 | 89 d7 |
| %ebx | 89 d8 | 89 d9 | 89 da | 89 db | 89 dc | 89 dd | 89 de | 89 df |
| %esp | 89 e0 | 89 e1 | 89 e2 | 89 e3 | 89 e4 | 89 e5 | 89 e6 | 89 e7 |
| %ebp | 89 e8 | 89 e9 | 89 ea | 89 eb | 89 ec | 89 ed | 89 ee | 89 ef |
| %esi | 89 f0 | 89 f1 | 89 f2 | 89 f3 | 89 f4 | 89 f5 | 89 f6 | 89 f7 |
| %edi | 89 f8 | 89 f9 | 89 fa | 89 fb | 89 fc | 89 fd | 89 fe | 89 ff |
D. Encodings of 2-byte functional nop instructions

| Operation \ Register R | %al | %cl | %dl | %bl |
|---|---|---|---|---|
| andb R, R | 20 c0 | 20 c9 | 20 d2 | 20 db |
| orb R, R | 08 c0 | 08 c9 | 08 d2 | 08 db |
| cmpb R, R | 38 c0 | 38 c9 | 38 d2 | 38 db |
| testb R, R | 84 c0 | 84 c9 | 84 d2 | 84 db |
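The movq, movl, and popq entries in the tables above follow a regular pattern, which the sketch below reproduces so that a single entry can be checked without scanning the table. The enum and function names are assumptions made for illustration; only the register-to-register forms listed in the tables are covered.

```c
#include <stddef.h>

/* Register numbers in the column order used by the tables. */
enum reg { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI };

/*
 * Encode a register-to-register move. The last byte packs the two registers
 * as 0xc0 | (src << 3) | dst. movq (wide != 0) adds the 0x48 prefix; movl
 * omits it. Returns the number of bytes written to `out` (3 or 2).
 */
size_t encode_mov(int wide, enum reg src, enum reg dst, unsigned char out[3])
{
    size_t n = 0;
    if (wide)
        out[n++] = 0x48;
    out[n++] = 0x89;
    out[n++] = (unsigned char)(0xc0 | (src << 3) | dst);
    return n;
}

/* popq R is the single byte 0x58 + R. */
unsigned char encode_popq(enum reg r)
{
    return (unsigned char)(0x58 + r);
}
```

For example, encode_mov(1, RAX, RDI, out) yields 48 89 c7, the movq %rax, %rdi sequence, and encode_popq(RDI) yields 5f.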
Before you take on Phase 5, pause to consider what you have accomplished so far. In Phases 2 and 3, you caused a program to execute machine code of your own design. If CTARGET had been a network server, you could have injected your own code into a distant machine. In Phase 4, you circumvented two of the main devices modern systems use to thwart buffer overflow attacks. Although you did not inject your own code, you were able to inject a type of program that operates by stitching together sequences of existing code.
You have also gotten 95/100 points for the lab. That’s a good score. If you have other pressing obligations consider stopping right now.
Phase 5 requires you to do an ROP attack on RTARGET to invoke function touch3 with a pointer to a string representation of your cookie. That may not seem significantly more difficult than using an ROP attack to invoke touch2, except that we have made it so. Moreover, Phase 5 counts for only 5 points, which is not a true measure of the effort it will require. Think of it as more an extra credit problem for those who want to go beyond the normal expectations for the course.
To solve Phase 5, you can use gadgets in the region of the code in rtarget demarcated by functions start_farm and end_farm. In addition to the gadgets used in Phase 4, this expanded farm includes the encodings of different movl instructions, as shown in Figure 3C. The byte sequences in this part of the farm also contain 2-byte instructions that serve as functional nops, i.e., they do not change any register or memory values. These include instructions, shown in Figure 3D, such as andb %al,%al, that operate on the low-order bytes of some of the registers but do not change their values.
• Remember: Your exploit string must not contain the newline character (byte value 0x0a) at any intermediate position.
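A quick scan of the raw bytes can catch a stray 0x0a before you run the target. The sketch below is one way to do that check; the function name first_embedded_newline is hypothetical, and it assumes that a newline in the final position is harmless because it only ends the input line.

```c
#include <stddef.h>

/*
 * Return the index of the first 0x0a byte that appears before the final
 * position of `payload`, or -1 if there is none.
 */
long first_embedded_newline(const unsigned char *payload, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++)
        if (payload[i] == 0x0a)
            return (long)i;
    return -1;
}
```

A non-negative result points at the offending byte. An address or constant containing 0x0a would cut the input short at that point, so the rest of the exploit string would never reach the buffer.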
6 Logistical Notes
Whenever you correctly solve a phase, your target program will send a message to Autolab. The server will test your exploit string to make sure it really works. You can follow the results on the scoreboard (from Autolab, follow Attacklab -> View scoreboard). This Web page is updated every minute or so to show the progress for each target. You should be sure to check this page after your submission to make sure your string has been validated. (If you really solved the phase, your string should be valid. However, an attack that works on your copy of one of the targets may be rejected by the version on the server if it does not correctly implement the specified functionality.)
Each phase is graded individually. You do not need to do them in the specified order, but you will get credit only for the phases for which the server receives a valid message.
A Using HEX2RAW
The hex characters you pass to HEX2RAW should be separated by whitespace (blanks or newlines). We recommend separating different parts of your exploit string with newlines while you’re working on it. HEX2RAW supports C-style block comments, so you can mark off sections of your exploit string. For example:
Be sure to leave space around both the starting and ending comment strings (“/*”, “*/”), so that the comments will be properly ignored.
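To see why the spacing around comment markers matters, consider a converter that splits its input on whitespace and only then classifies each token. The sketch below works that way. It is not the HEX2RAW source; the function name hex_to_raw and its error convention are assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

// Convert whitespace-separated two-digit hex tokens in `text` to raw bytes.
// Tokens between a standalone comment-open token and a standalone
// comment-close token are skipped. Returns the number of bytes written to
// `out`, or -1 on a malformed token or if `out_cap` is exceeded.
long hex_to_raw(const char *text, unsigned char *out, size_t out_cap)
{
    size_t n = 0;
    int in_comment = 0;
    const char *p = text;

    while (*p != '\0') {
        while (isspace((unsigned char)*p))       // skip separators
            p++;
        if (*p == '\0')
            break;

        const char *tok = p;                     // token runs to next whitespace
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - tok);

        if (len == 2 && tok[0] == '/' && tok[1] == '*') {
            in_comment = 1;
        } else if (len == 2 && tok[0] == '*' && tok[1] == '/') {
            in_comment = 0;
        } else if (!in_comment) {
            if (len != 2 || !isxdigit((unsigned char)tok[0]) ||
                !isxdigit((unsigned char)tok[1]) || n == out_cap)
                return -1;                       // malformed token or no room
            char pair[3] = { tok[0], tok[1], '\0' };
            out[n++] = (unsigned char)strtoul(pair, NULL, 16);
        }
    }
    return (long)n;
}
```

With this design, a marker glued to its neighbor is never recognized as a comment delimiter, so the token is treated as malformed data. The real tool may react differently, but the underlying reason for the spacing rule is the same.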
If you generate a hex-formatted exploit string in the file exploit.txt, you can apply the raw string to CTARGET or RTARGET in several different ways:
    $ cat exploit.txt | ./hex2raw | ./ctarget
    $ ./hex2raw < exploit.txt > exploit-raw.txt
This approach can also be used when running from within GDB:
3. You can store the raw string in a file and provide the file name as a command-line argument:
    $ ./ctarget -i exploit-raw.txt
This approach also can be used when running from within GDB.
The code can contain a mixture of instructions and data. Anything to the right of a ‘#’ character is a comment.
The generated file example.d contains the following:
The lines at the bottom show the machine code generated from the assembly language instructions. Each line has a hexadecimal number on the left indicating the instruction’s starting address (starting with 0), while the hex digits after the ‘:’ character indicate the byte codes for the instruction. Thus, we can see that the instruction push $0xABCDEF has hex-formatted byte code 68 ef cd ab 00.
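The byte code quoted above can be reproduced by hand: one opcode byte followed by the 32-bit immediate in little-endian order. The sketch below encodes that one instruction form; the function name encode_push_imm32 is an assumption for illustration, and for anything beyond a spot check the assembler and OBJDUMP remain the reliable route.

```c
#include <stdint.h>

/*
 * Encode `push $imm` with a 32-bit immediate: opcode byte 0x68 followed by
 * the immediate in little-endian order. Writes 5 bytes to `out`.
 */
void encode_push_imm32(uint32_t imm, unsigned char out[5])
{
    out[0] = 0x68;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (unsigned char)(imm >> (8 * i));
}
```

For the immediate 0xABCDEF this writes 68 ef cd ab 00, matching the disassembly line described above.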