Defending Attacks Part 2: Kernel Exploit "DirtyCOW"

Some Background

    Code that runs in the kernel has more capabilities than code that runs in user mode. These include access to kernel data structures, access to the virtual address space of any process, and many other privileged abilities. Code written to "trick" the kernel into performing an unintended action (often leading to privilege escalation) is called a kernel exploit. In this post we'll examine a famous kernel exploit known as "Dirty COW", which exploits a race condition on a memory-mapped file to let a non-privileged user modify a file that should be read-only to them.


Important Concepts to Refresh

Pages

    In operating systems, memory is not managed one byte at a time. Rather, it is managed in a "unit of transfer" known as a page. Pages are essentially "blocks of memory", typically 4 KiB (4096 bytes) in size. Pages are used when referencing both virtual and physical memory. In this article, the address space of a process is made up of virtual pages (VP), while main memory and disk hold physical pages (PP). In Linux, VPs are mapped to PPs as shown below:

(The page colors have no significance other than to help distinguish adjacent pages)


One important detail is that pages in main memory are directly accessible by the process, whereas pages on disk first need to be brought into main memory before the process can access them. This is a small detail to remember when analyzing the exploit.
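To make the page arithmetic concrete, here is a minimal C sketch (not part of the exploit code) that splits a virtual address into the start of its page and an offset within that page. The helper names page_base, page_offset and system_page_size are my own, chosen purely for illustration:

```c
#include <stdint.h>
#include <unistd.h>

/* Round an address down to the start of the page that contains it.
 * page_size must be a power of two (4096 on most Linux systems). */
static uintptr_t page_base(uintptr_t addr, uintptr_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Offset of the address within its page. */
static uintptr_t page_offset(uintptr_t addr, uintptr_t page_size)
{
    return addr & (page_size - 1);
}

/* Ask the running system for its page size instead of assuming 4 KiB. */
static long system_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}
```

With 4 KiB pages, an address such as 0xb7fda123 lives in the page starting at 0xb7fda000, at offset 0x123 within it.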

System Call: mmap()

    In Linux, the system call mmap() is used to create a mapping between the virtual address space of the calling process and a file. The mapping can be of different sizes, but for the sake of simplicity this article will assume the file is small enough that it fits inside of one page. With this in mind, the following diagram is an example of what the environment might look like after a process calls mmap() to map "File 1", which again for our purposes will fit inside of 1 page:

(Again page color differences are only to help distinguish)


An important detail is that mmap() does not bring the physical page into main memory. The page will not be brought into main memory until the process actually tries to access the physical page, at which point this will cause a page fault and the page fault handler will bring the physical page into memory. When successful, mmap() returns the virtual address of the VP in the address space of the process (in the diagram above the virtual address of "VP File 1").

For more details on mmap() please see the man page as well as this helpful article covering memory mapped files.
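As a rough illustration of the call itself, the following sketch maps a whole file read-only and hands back the address of the mapping. It is my own minimal example rather than code from the exploit, and the function name map_file_readonly is hypothetical:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only and privately.
 * Returns the mapping's start address, or MAP_FAILED on error.
 * On success *len_out holds the file size in bytes. */
static void *map_file_readonly(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;   /* mmap() rejects zero-length mappings */
    }

    /* The kernel only records the mapping here; pages are faulted in
     * later, when the process first touches them. */
    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    close(fd);               /* the mapping stays valid after close() */
    if (addr != MAP_FAILED)
        *len_out = (size_t)st.st_size;
    return addr;
}
```

Reading the first byte through the returned pointer is what triggers the page fault described above; the caller should munmap() the region when done.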


Diving Into The Exploit

    To quickly reiterate, the idea behind the "Dirty COW" exploit is that there is a file owned by root that is read only to non-privileged users. This exploit allows a non-privileged user to modify that file. If an attacker were to utilize this exploit on the /etc/passwd file this could lead to immediate privilege escalation as I will demonstrate at the end of the article. For the initial walkthrough of the exploit we will assume a file "my_root_file.txt" has the following contents and permission:



    The full code of the exploit can be found here, and screenshots of this code will be referenced throughout the article. The first thing to make a note of are the global variables defined:


We will soon see that:

  • map - refers to the virtual address returned by mmap(), this will be the start address of the mapping of "my_root_file.txt"
  • f - refers to the file descriptor of the file to modify, in this case "my_root_file.txt" 
  • st - refers to the stat struct, and will be used to get stats, specifically size, of "my_root_file.txt"
  • name - is just the name of the file to modify "my_root_file.txt"

Jumping into the main function:


    First we use the system call open() to get a file descriptor of the file. We then call fstat() to read the stats of the file into the stat struct. Lastly we assign the name of the file to the global variable name. Note that the file must be opened with read only permissions since the user doesn't have write access. Next the main function utilizes mmap() as follows:



    Note, as written in the comment, MAP_PRIVATE must be used. This ensures that copy-on-write (COW) is used for the mapping, which will matter later on. You may notice that PROT_READ (making the mapping read-only) seems to contradict the fact that we want COW. Indeed, with this configuration, writing directly to the mapping, e.g. *((char*)map) = 'f', would produce a segfault. However, this exploit takes a different approach to writing, which we will see shortly. The main takeaway of the above image is that map now holds the virtual address of the page that maps to the underlying file on disk. It is a private mapping, so COW is used, and the protections are read-only. The following diagram summarizes what this might look like:



Again mmap() doesn't bring the file into main memory. So until the process tries to access the file in the code, the physical page will remain on disk.
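To see what copy-on-write means in practice on a normal, legitimate mapping, here is a small sketch of my own (not taken from the exploit). Unlike the exploit's mapping, it asks for PROT_WRITE so that a direct write is allowed, then checks that the file on disk is untouched. The function name is hypothetical:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write one byte through a private, writable mapping of `path`, then read
 * the same byte back from the file itself.
 * Returns 1 if the file kept its old byte (copy-on-write did its job),
 * 0 if the file changed, and -1 on error. The file must not be empty. */
static int private_write_leaves_file_untouched(const char *path, char new_byte)
{
    int fd = open(path, O_RDONLY);   /* a read-only fd is enough for MAP_PRIVATE */
    if (fd < 0)
        return -1;

    char *map = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    char old_byte = map[0];  /* first access faults the page into our address space */
    map[0] = new_byte;       /* write fault: the kernel gives us a private copy */

    char on_disk = 0;
    ssize_t n = read(fd, &on_disk, 1);   /* read the file directly, not the mapping */

    munmap(map, 1);
    close(fd);
    if (n != 1)
        return -1;
    return on_disk == old_byte ? 1 : 0;
}
```

On a correctly behaving kernel this returns 1: the write lands in the private copy and never reaches the file. "Dirty COW" is precisely a case where that guarantee was broken.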

For those interested in a more concrete view, here are the contents of the /proc/self/maps file for a process running the "Dirty COW" exploit:

(/proc/self/maps is a special file that lists the mappings in the current address space of a process)



You can see above that "my_root_file.txt" is mapped to the virtual page spanning 0xb7fda000-0xb7fdb000 after the call to mmap() as highlighted in the GDB view of "dirtycow.c".
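Each line of that mappings listing follows the layout "start-end perms offset dev inode path". The sketch below, which is my own and uses a hypothetical struct and function name, parses one such line so the address range and permissions can be inspected programmatically:

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of a /proc/<pid>/maps listing (only the fields used here). */
struct map_entry {
    unsigned long start;   /* first address of the mapping */
    unsigned long end;     /* one past the last mapped address */
    char perms[5];         /* e.g. "r--p": read-only, private */
    char path[256];        /* backing file, empty if anonymous */
};

/* Parse a single maps line. Returns 1 on success, 0 on malformed input.
 * Paths containing spaces are truncated at the first space. */
static int parse_maps_line(const char *line, struct map_entry *out)
{
    out->path[0] = '\0';
    /* Layout: start-end perms offset dev inode [path] */
    int n = sscanf(line, "%lx-%lx %4s %*x %*s %*u %255s",
                   &out->start, &out->end, out->perms, out->path);
    return n >= 3;
}
```

For the mapping discussed above, the range 0xb7fda000-0xb7fdb000 spans exactly 4096 bytes, i.e. one page, and a perms field ending in "p" marks the mapping as private.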

Now that we have an idea of what the current environment looks like, let's keep walking through the code. As mentioned earlier, "Dirty COW" takes advantage of a race condition. The next part of the code creates the two threads that will race against each other:



Let's first take a look at the function used by the second thread, "procselfmemThread":

(Note the "f" here is a local variable different from the global one mentioned earlier)




This function first gets a file descriptor to a special file called "/proc/self/mem". Without going into too much detail, the "/proc" directory is a pseudo filesystem, which is really just a fancy way of saying it contains files like a filesystem, but under the hood these files have special meanings. The "/proc/self/" directory contains files that expose information about, and interfaces to, the current process, and the "/proc/self/mem" file is an interface to the memory of the current process. For more details on the "/proc" directory please see here.

Shortly after "/proc/self/mem" is opened, the system call lseek() is used to jump to the map offset of this file. Recall that map was the global variable pertaining to the address of the virtual page that mapped to "my_root_file.txt". So really think of "/proc/self/mem" as the address space of the process, and lseek() simply moved the cursor to the page mapped to "my_root_file.txt".

Next, the code performs a write system call on "/proc/self/mem" at the address of the page corresponding to "my_root_file.txt".

To understand why this line matters, we need to understand a detail about how Linux handles the write() system call. The kernel allows for different file types to handle the write() system call differently. I am skipping over a fair amount of detail, but the most important part is that /proc/self/mem handles writes in a customized way.
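To see that a write to /proc/self/mem really does land in the process's own memory, here is a small, harmless sketch of my own. It targets an ordinary writable buffer rather than a file mapping, and the function name poke_self is hypothetical:

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy `len` bytes from `src` to address `dst` in this process by writing
 * to /proc/self/mem instead of storing through a pointer.
 * Returns 0 on success, -1 on error. */
static int poke_self(void *dst, const void *src, size_t len)
{
    int fd = open("/proc/self/mem", O_RDWR);
    if (fd < 0)
        return -1;

    /* File offsets in /proc/self/mem are virtual addresses. */
    if (lseek(fd, (off_t)(uintptr_t)dst, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    /* The kernel performs the memory access on our behalf. */
    ssize_t n = write(fd, src, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```

Calling poke_self(buf, "J", 1) on a char buffer holding "hello" should leave it holding "Jello" on systems where /proc/self/mem is writable; some sandboxes restrict it, in which case the function returns -1.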

Without getting into too much detail, a high level summary of what happens is as follows:

  1. The kernel tries to access the specified page (that corresponding to the file we want to write to), but causes a page fault because the page is not yet loaded into main memory. 
  2. Within the kernel, the page fault handler creates a read-only COW page and marks it dirty (hence the name) as well as private. The reasons for this are lengthy, so I will leave them as further reading for those interested; for an in-depth look, see here.
  3. Because the COW page is read only, the kernel "retries" accessing the COW page but this time only with read permission, which will succeed. 
  4. The COW page is what gets returned to the initial write system call, and in this special way of accessing the page (via a write to /proc/self/mem) the write is allowed.

The above steps are a bit convoluted, and require a few read throughs plus looking at some kernel code to understand. Again to get a really good in depth look see here. But really the big point is the kernel tries to initially access the page with a write, causing a COW page to be created and marked dirty. The kernel tries again to access the page, but this time only to read, and is given the COW page successfully which gets returned to the write system call. This bit about the second attempt being a read rather than a write is what makes this exploit possible.

With this in mind let's now take a step back and look at the other thread racing this one. That is "madviseThread".



The main idea of this thread is the madvise() system call, which gives the kernel advice on how to handle the memory in a given address range. In this case the address range is from map to map + 100, a.k.a. the first 100 bytes of the mapping of "my_root_file.txt". The flag MADV_DONTNEED tells the kernel that it can free any resources associated with this range. For more information see the man page. In other words, we are telling the kernel that it can free the very page of the mapped file that we want to write to. With this in mind, consider the following scenario:

  1. The kernel tries to write to the page pertaining to the file mapping causing a page fault and so a COW page of the file is made and marked dirty.
  2. The next step for the kernel will be to try to access the COW page, but this time only as a read. BUT WHAT IF A CONTEXT SWITCH OCCURS RIGHT BEFORE THIS?
  3. Supposing the context switch occurs and madvise() gets called on this page, the kernel will free the COW page, leaving only the original page on disk, which belongs to the true file and is not a copy.
  4. When another context switch occurs and the kernel resumes its attempt to access the COW page as a read, a second page fault occurs because madvise() caused the COW page to be dropped from memory. The page fault handler goes to the true underlying file and accesses it with a read. Because the user is allowed to read this file, there is no problem here, and so the handler returns the page corresponding to the true file.
  5. This page is what gets passed back to the write system call. So now the write system call modifies the page of the true file and therefore writes to the underlying file.

So the kernel has been tricked. It thought it was going to retrieve the COW page, but madvise snuck in right before this and so instead it ends up falling victim to an unintentional page fault, that brings in the page of the true file.
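Step 3 of the scenario above can be observed in isolation, without any race. The following sketch is my own illustration (the function name is hypothetical): it modifies a private mapping of a file, calls madvise() with MADV_DONTNEED, and reads the byte again to show that the private copy is gone and the file's original contents are back:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes madvise() in glibc headers */
#endif
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Modify a private mapping of `path`, call madvise(MADV_DONTNEED), and
 * read the same byte again.
 * Returns the byte seen after madvise(), or -1 on error. */
static int byte_after_dontneed(const char *path, char scribble)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    char *map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    map[0] = scribble;   /* write fault: we now own a private COW copy */

    /* Tell the kernel the range is not needed. On Linux this drops the
     * private copy, so the next access faults the file's page back in. */
    if (madvise(map, (size_t)page, MADV_DONTNEED) != 0) {
        munmap(map, (size_t)page);
        return -1;
    }

    int seen = (unsigned char)map[0];   /* page fault, served from the file */
    munmap(map, (size_t)page);
    return seen;
}
```

For a file that starts with "moo", byte_after_dontneed(path, 'X') should return 'm' rather than 'X'. This discard-and-refault behavior is what the madvise thread relies on to pull the rug out from under the pending write.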

The following sequence of images demonstrates what happens when the race condition succeeds, a.k.a. what I just outlined in steps 1-5 above.

Initially after calling mmap(), the mapping of the file is to the page on disk until the process tries to access the pages of the mapping:


We try to access the page using a write to "/proc/self/mem" triggering a page fault and a private copy to be created:

(This private copy does not ever get written back to the underlying file)
A context switch occurs and madvise() tells the kernel it can drop the copied page:




Another context switch occurs and the kernel tries to access the same page again, this time with read-only permissions. This triggers a page fault that brings in the true physical page, which then gets written to:

(When this page gets written to, it will affect the underlying file)




And the rest is history: the file gets written to illegally, unbeknownst to the kernel.


Applying "Dirty COW" to Privilege Escalation:

    Now let's see how "Dirty COW" can cause issues when used by an attacker. On some versions of Linux it's possible to store a password hash inside of the /etc/passwd file. This password hash will be used to check the user's password when they log in. So with "Dirty COW" we can write a new line to /etc/passwd that contains an entry for a new user, making sure they are a superuser, and set their password. First we compute the password hash using openssl. Let's say our new password will be "cow123":

(which is very insecure 😉 see my article on password attacks)



Openssl computes the password hash to be RtLR5toNaxpGk, and so the line we need to add to /etc/passwd will be "root2:RtLR5toNaxpGk:0:0:root:/root:/bin/bash". For more details about why this works see here.
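The reason such an entry is dangerous is its third field: a UID of 0 makes the account a superuser regardless of its name. From the defender's side, this is easy to check for. Below is a minimal sketch of my own (the function name is hypothetical) that flags any /etc/passwd line granting UID 0 to an account not named "root":

```c
#include <stdlib.h>
#include <string.h>

/* Return 1 if an /etc/passwd line grants UID 0 to an account that is not
 * named "root", 0 otherwise (including for malformed lines).
 * Field layout: name:password:UID:GID:comment:home:shell */
static int is_unexpected_uid0(const char *line)
{
    const char *name_end = strchr(line, ':');
    if (name_end == NULL)
        return 0;
    const char *pw_end = strchr(name_end + 1, ':');
    if (pw_end == NULL)
        return 0;

    char *uid_end = NULL;
    long uid = strtol(pw_end + 1, &uid_end, 10);
    if (uid_end == pw_end + 1 || *uid_end != ':')
        return 0;                       /* UID field missing or not numeric */

    size_t name_len = (size_t)(name_end - line);
    int is_root_name = (name_len == 4 && strncmp(line, "root", 4) == 0);
    return uid == 0 && !is_root_name;
}
```

Running a check like this over every line of /etc/passwd on a schedule is a cheap way to notice an injected superuser account, though it is no substitute for patching the kernel.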

Suppose we've located a cron job on a computer, that executes every minute as root:



Looking at the permissions of the script being run by the cron job we see that it's read only to non-root users:



Looking at the contents of the cron job it can be seen it's simply appending data to a log file:



Applying "Dirty COW" to overwrite this cron job, we can cause it to instead write data to the /etc/passwd file every minute as follows:



Note the second argument to "Dirty COW" has lots of whitespace appended to overwrite everything after the echo one liner to clear out any remaining parts of the file. As seen above, after executing dirtycow, the file mycronjob.sh now has the malicious one liner to append data to the end of /etc/passwd, and because this is a cron job run as root it will succeed.

After a minute has passed /etc/passwd contains the newly added user "root2" whose password hash is that of "cow123":


And I am able to log in as expected with root privileges:




Defending Against "Dirty COW" and Other Kernel Exploits

    When it comes to defending against kernel exploits like this, there is good news and bad news. The good news is that the fix is typically as simple as applying a patch to the kernel. The bad news is that these types of vulnerabilities often lie dormant for many years, and other than patching the kernel once the vulnerability is discovered, there really isn't much you can do. For instance, as seen in this Wikipedia article on "Dirty COW", this vulnerability (CVE-2016-5195) had existed since 2007 and wasn't patched until 2016. This means that for about 9 years anyone who could log into a Linux computer (excluding some distributions) could gain root access.

While it may sound like kernel exploits are the be-all and end-all, there are still things you can do to remain proactive and mitigate these types of attacks. These include but are not limited to:
  • Stay up to date on new CVEs (vulnerabilities) by reading security blogs, newsletters, and other reputable sources. The sooner you learn that a new vulnerability has been disclosed, the sooner you can apply the patch to any systems you have in production.
  • Don't provide low hanging fruit to attackers. An extensive list of protection tips can be found here.
  • Similar to the previous bullet point, but deserving specific attention, are "distroless" container images. By reducing the attack surface, they leave fewer tools and exploits for an attacker to rely on. Software security company Chainguard provides a Linux "undistribution" known as "Wolfi" that even goes so far as to exclude the Linux kernel, relying on the host or container runtime to provide one. Keep in mind that containers share the host's kernel, so the host itself still needs to be patched against kernel exploits. Even so, packaging an application in a "distroless" container is a great way to reduce the attack surface.
