To initialize or not to initialize - the dirty pipe vulnerability

What happened?

In February 2022, Max Kellermann reported a vulnerability in the Linux kernel that he had tracked down while investigating corrupted archives in a log system. The vulnerability made it possible to overwrite data in read-only files. The root cause was a reused buffer that was not initialized properly. Contributors introduced the problem in multiple steps, and only the last one made it exploitable.

  • In 2006, the splice() system call was introduced to make moving data more efficient by avoiding copies between kernel space and user space. When the destination is a pipe, splice() attaches pages of the source to the next free buffer of the pipe, and such a buffer must not be merged with later writes. More on this merging property later.

  • In 2020, after some refactoring, the merge property ended up stored as a flag in every buffer, and this flag is checked before data is appended to a buffer. However, the change was not propagated everywhere: the splice path never initialized the flags of the buffers it filled.

Pipes before they become dirty

Linux pipes are a form of Inter-Process Communication between related processes. Every pipe has an input and an output end, so one process can write to the pipe, and another one can read from it. For example, if a process creates a pipe (with the pipe() system call), it receives two file descriptors (fd[0] and fd[1] in the figure), which can be used to read and write data.

Figure 1. Workflow of a pipe, where the two ends of the pipe are handled by the same process
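As a minimal sketch of the situation in Figure 1, the following C helper creates a pipe, writes a short message into fd[1], and reads it back from fd[0] within the same process. The function name pipe_roundtrip() is our own and purely illustrative, and the message is assumed to be much smaller than the pipe capacity so that write() does not block.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Send `len` bytes through a fresh pipe and read them back in the same
 * process. Returns the number of bytes read, or -1 on error.
 * pipe_roundtrip() is a hypothetical helper, not part of any library.
 * `len` is assumed to be far below the pipe capacity.
 */
ssize_t pipe_roundtrip(const char *msg, size_t len, char *out, size_t cap)
{
    int fd[2];                            /* fd[0]: read end, fd[1]: write end */
    assert(msg != NULL && out != NULL && len <= cap);

    if (pipe(fd) == -1)
        return -1;

    ssize_t n = write(fd[1], msg, len);   /* data is queued inside the kernel */
    if (n > 0)
        n = read(fd[0], out, (size_t)n);  /* and comes out at the other end */

    close(fd[0]);
    close(fd[1]);
    return n;
}
```

In a real program the two descriptors usually end up in different processes after a fork(), but the kernel-side handling of the data is the same.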

For every pipe, the kernel keeps the data in a ring of buffers. Each buffer refers to a page that holds the data; for data spliced in from a file, this is a page in the page cache.
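The kernel excerpts in this post index the ring with expressions such as bufs[head & mask] and bufs[(head - 1) & mask]. The sketch below shows the arithmetic behind that pattern, assuming free-running unsigned counters and a ring size that is a power of two. The helper names ring_slot() and ring_usage() are hypothetical and not kernel functions.

```c
#include <assert.h>

/* Slot index for a free-running counter; ring_size must be a power of two. */
unsigned int ring_slot(unsigned int counter, unsigned int ring_size)
{
    assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
    return counter & (ring_size - 1);   /* same result as counter % ring_size */
}

/* Number of occupied slots; unsigned subtraction stays correct
 * even after the counters have wrapped around. */
unsigned int ring_usage(unsigned int head, unsigned int tail)
{
    return head - tail;
}
```

With this layout the ring is empty when head equals tail, and the most recently written slot is ring_slot(head - 1, ring_size), which is the buffer a small write tries to merge into.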

The write operation is implemented in the pipe_write() function, which has two main branches. For small writes into a non-empty pipe, pipe_write() tries to merge the data into the last buffer. This optimization avoids spending a whole new page on just a few bytes.

    struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
    int offset = buf->offset + buf->len;

    if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) && // can we merge?
        offset + chars <= PAGE_SIZE) { // does it fit?
      ret = pipe_buf_confirm(pipe, buf);
      if (ret)
        goto out;

      ret = copy_page_from_iter(buf->page, offset, chars, from);
Figure 2. Short writes from pipe_write() in pipe.c

Otherwise, pipe_write() takes a new buffer at the head of the pipe and fills it with the data. These new buffers are marked with the PIPE_BUF_FLAG_CAN_MERGE flag, so the next write may merge data into them. For a newly used buffer, the write function allocates a page and stores a reference to it in the buf structure.

    buf = &pipe->bufs[head & mask]; // get the buffer at the 'head' of the ring
    buf->page = page;
    buf->ops = &anon_pipe_buf_ops;
    buf->offset = 0;
    buf->len = 0;
    if (is_packetized(filp))
      buf->flags = PIPE_BUF_FLAG_PACKET;
    else
      buf->flags = PIPE_BUF_FLAG_CAN_MERGE; // allow small write merges

    pipe->tmp_page = NULL;

    copied = copy_page_from_iter(page, 0, PAGE_SIZE, from);
Figure 3. "New" buffer allocation from pipe_write() in pipe.c

In the pipe_read() function, the reader reads data from the pipe buffers until the pipe is empty. Once a buffer has been read out completely, its page is released, but the flags in the buf structure are not cleared.

    if (!buf->len) {
      pipe_buf_release(pipe, buf); // page released
      spin_lock_irq(&pipe->rd_wait.lock);
      tail++;
      pipe->tail = tail;
      spin_unlock_irq(&pipe->rd_wait.lock);
    }
Figure 4. Buffer release without flag clearing from pipe_read() in pipe.c

Splice the pipes

The splice() system call speeds up moving data between file descriptors by not transferring it back and forth between kernel and user space. If the destination is a pipe, splice() calls the copy_page_to_iter_pipe() function, which stores a reference to the source page in the page member of the buf struct. pipe_read() can then obtain the data from the original page without any bytes having been copied into the pipe.

    buf = &pipe->bufs[i_head & p_mask];
    if (off) {
      if (offset == off && buf->page == page) {
        /* merge with the last one */
        buf->len += bytes;
        i->iov_offset += bytes;
        goto out;
      }
      i_head++;
      buf = &pipe->bufs[i_head & p_mask];
    }
    if (pipe_full(i_head, p_tail, pipe->max_usage))
      return 0;

    buf->ops = &page_cache_pipe_buf_ops;
    get_page(page);
    buf->page = page; // use the source page for this buffer's backing page
    buf->offset = offset;
    buf->len = bytes;
Figure 5. Use the source page for "new" buffer from copy_page_to_iter_pipe() in iov_iter.c

Although the page reference of the source was written into the pipe’s buffer, the flags field of that buffer was not initialized.

Pipe becomes dirty

To understand the problem, we must place the previous building blocks in the appropriate order. First, we open a pipe and write some data to it. As a result, a page is allocated and referenced by the current pipe buffer (at the head of the pipe ring), and the PIPE_BUF_FLAG_CAN_MERGE flag is set in that buffer (marked in yellow).

Figure 6. Result of pipe write operation

We know that PIPE_BUF_FLAG_CAN_MERGE is not cleared during a read and is also left untouched by the splice operation. Let’s make the kernel reuse such a buffer. During pipe_read(), a depleted buffer becomes reusable, and since the buffers form a ring, later writes come around to it again. So we can set the PIPE_BUF_FLAG_CAN_MERGE flag on all buffers by filling the pipe completely and then draining it. Afterwards, the head and the tail of the ring point to the same position again.

Figure 7. State of the pipe buffers after filling up the whole pipe

After draining, the pipe is empty, but every buffer still has the PIPE_BUF_FLAG_CAN_MERGE flag set, which marks it as a buffer that small writes (below the page size) may be merged into.

Figure 8. State of the pipe buffers after draining the pipe
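The fill-and-drain step can be sketched in a few lines of C. This is a minimal illustration that assumes a freshly created, empty, blocking pipe and uses the Linux-specific F_GETPIPE_SZ query to learn the capacity. The helper name fill_and_drain() is hypothetical. On its own, the function only moves dummy bytes through the pipe.

```c
#define _GNU_SOURCE          /* F_GETPIPE_SZ is Linux-specific */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Fill the pipe completely, then read everything back.
 * Returns the pipe capacity in bytes, or -1 on error.
 * fill_and_drain() is a hypothetical helper; the pipe must be empty on entry.
 */
long fill_and_drain(const int p[2])
{
    static char chunk[4096];
    assert(p != NULL);

    long capacity = fcntl(p[1], F_GETPIPE_SZ);
    if (capacity <= 0)
        return -1;

    /* Fill: pipe_write() ends up using every slot of the ring. */
    for (long left = capacity; left > 0; ) {
        size_t n = left < (long)sizeof chunk ? (size_t)left : sizeof chunk;
        ssize_t w = write(p[1], chunk, n);
        if (w <= 0)
            return -1;
        left -= w;
    }

    /* Drain: pipe_read() releases every page again, so the pipe is empty. */
    for (long left = capacity; left > 0; ) {
        size_t n = left < (long)sizeof chunk ? (size_t)left : sizeof chunk;
        ssize_t r = read(p[0], chunk, n);
        if (r <= 0)
            return -1;
        left -= r;
    }
    return capacity;
}
```

After the call the pipe looks empty from user space. On an affected kernel, the stale flags inside the ring are the only trace left behind.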

Now splice() comes into the picture, since it can attach a page from a source (e.g., a file) to a pipe buffer so that the data can be read through the pipe.

Figure 9. Buffer state after the splice command

With the splice() call, some data (at least 1 byte) is transferred from the input to our pipe. The pipe buffer now references the page of the input data, which a pipe read can obtain. The flag of this buffer is still PIPE_BUF_FLAG_CAN_MERGE, so a subsequent pipe write will put its data into the page of the input, even if the input was opened read-only.

Figure 10. The moment when the file corruption happens
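To make the sequence easier to follow, here is a small user-space model of it. This is not kernel code and not an exploit: it is a toy simulation with a single buffer slot, a 16-byte "page", and names (toy_buf, toy_write, toy_drain, toy_splice, TOY_CAN_MERGE) that we made up for illustration. In the real kernel a non-mergeable write goes to the next ring slot, whereas the toy simply reuses its only slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16u
#define TOY_CAN_MERGE 0x10u   /* plays the role of PIPE_BUF_FLAG_CAN_MERGE */

struct toy_buf {
    char *page;               /* backing page, NULL once released */
    unsigned int offset, len, flags;
};

/* Simplified pipe_write(): merge if allowed, else attach a private page. */
void toy_write(struct toy_buf *buf, char *private_page,
               const char *data, unsigned int n)
{
    unsigned int end = buf->offset + buf->len;
    assert(n <= TOY_PAGE_SIZE);
    if (buf->page && (buf->flags & TOY_CAN_MERGE) && end + n <= TOY_PAGE_SIZE) {
        memcpy(buf->page + end, data, n);   /* appended to the existing page */
        buf->len += n;
        return;
    }
    buf->page = private_page;
    buf->offset = 0;
    buf->len = n;
    buf->flags = TOY_CAN_MERGE;
    memcpy(private_page, data, n);
}

/* Simplified pipe_read(): the page is released, the flags are left alone. */
void toy_drain(struct toy_buf *buf)
{
    buf->page = NULL;
    buf->len = 0;
}

/* Simplified splice: point the buffer at the file's page.
 * init_flags == false models the bug, true models the fix. */
void toy_splice(struct toy_buf *buf, char *file_page,
                unsigned int offset, unsigned int n, bool init_flags)
{
    buf->page = file_page;
    buf->offset = offset;
    buf->len = n;
    if (init_flags)
        buf->flags = 0;
}

/* Returns true if the "read-only" file page ended up modified. */
bool toy_scenario(bool init_flags)
{
    char private_page[TOY_PAGE_SIZE] = {0};
    char file_page[TOY_PAGE_SIZE] = "root:x:0:0";
    struct toy_buf buf = {0};

    toy_write(&buf, private_page, "A", 1);         /* sets TOY_CAN_MERGE */
    toy_drain(&buf);                               /* flag survives */
    toy_splice(&buf, file_page, 0, 1, init_flags); /* buffer -> file page */
    toy_write(&buf, private_page, "XX", 2);        /* merged if flag is stale */

    return memcmp(file_page, "root:x:0:0", 10) != 0;
}
```

In this model, toy_scenario(false) reports a modified file page because the second write is merged right behind the spliced byte, while toy_scenario(true) leaves the file page untouched.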

How does this affect you, and what should you do?

At the time of writing, Raspberry Pi OS ships with kernel version 5.10. We reproduced the vulnerability on the 32-bit version of Raspberry Pi OS.

As you can see in the image below, with the help of the provided PoC code (with minor modifications), we could switch from the pi user to the root user.

Figure 11. Running and testing the result of the PoC exploit

The result can be seen in the passwd file, showing that the root user has a new password:

Figure 12. Content of the passwd file, with the changed root password

The impact is wide-ranging and even extends to some Android versions. If you use a Linux distribution with an affected kernel version, you should update your system to a version where the patch has been applied.

Fortunately, a patch was proposed together with the report of the vulnerability. At the time of writing, the following kernel versions had been patched:

  • 5.16.11,
  • 5.15.25,
  • 5.10.102.
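As a rough aid, the list above can be turned into a version check. The sketch below is only a heuristic under stated assumptions: the three fixed versions come from the list, while treating 5.8 as the first exploitable release and 5.17 as the first mainline release containing the fix are assumptions not stated in this post. Distribution kernels often backport fixes without changing the upstream version number, so the vendor's advisory remains the authoritative source. The names dp_status and dirty_pipe_status() are hypothetical.

```c
#include <assert.h>

enum dp_status { DP_NOT_AFFECTED, DP_VULNERABLE, DP_PATCHED };

/*
 * Classify an upstream kernel version major.minor.patch.
 * Hypothetical helper; the version boundaries are assumptions, see above.
 */
enum dp_status dirty_pipe_status(int major, int minor, int patch)
{
    assert(major >= 0 && minor >= 0 && patch >= 0);

    if (major < 5 || (major == 5 && minor < 8))
        return DP_NOT_AFFECTED;   /* assumption: exploitable since 5.8 */
    if (major > 5 || minor >= 17)
        return DP_PATCHED;        /* assumption: fix included from 5.17 on */

    switch (minor) {
    case 10: return patch >= 102 ? DP_PATCHED : DP_VULNERABLE;
    case 15: return patch >= 25  ? DP_PATCHED : DP_VULNERABLE;
    case 16: return patch >= 11  ? DP_PATCHED : DP_VULNERABLE;
    default: return DP_VULNERABLE; /* other 5.8+ series: no fix listed here */
    }
}
```

The three numbers could be taken from the output of uname -r, keeping in mind that a vendor suffix after the version may indicate backported fixes.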

Secure coding lessons

Let’s look at some of the key lessons of this vulnerability. First of all, always initialize your variables! The commit that fixes the dirty pipe vulnerability shows that the flags have to be reset whenever a pipe buffer is filled through the splice path, because the buffer may still carry flags from an earlier use.

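    /* copy_page_to_iter_pipe() in iov_iter.c */
    buf->ops = &page_cache_pipe_buf_ops;
    buf->flags = 0; // added by the fix
    get_page(page);
    buf->page = page;

    /* push_pipe() in iov_iter.c */
    buf->ops = &default_pipe_buf_ops;
    buf->flags = 0; // added by the fix
    buf->page = page;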
Figure 13. The fix for dirty pipe
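One defensive idiom that makes this kind of omission harder is to reset the whole structure when a slot is reused, rather than assigning fields one by one. The sketch below illustrates the idea with a made-up demo_buf structure. It is not how the kernel patch is written (the patch sets flags explicitly, as shown above), only an illustration of the "always initialize" lesson in plain C99.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up structure for illustration; not the kernel's struct pipe_buffer. */
struct demo_buf {
    void *page;
    unsigned int offset, len;
    unsigned int flags;   /* imagine this field arrived in a later refactoring */
};

/* Field-by-field reuse: anything not listed keeps its old value. */
void reuse_fieldwise(struct demo_buf *buf, void *page, unsigned int len)
{
    assert(buf != NULL);
    buf->page = page;
    buf->offset = 0;
    buf->len = len;
    /* flags was forgotten here, so stale bits survive */
}

/* Whole-struct reuse: members not named in the compound literal
 * are zero-initialized, including fields added later. */
void reuse_reset(struct demo_buf *buf, void *page, unsigned int len)
{
    assert(buf != NULL);
    *buf = (struct demo_buf){ .page = page, .len = len };
}
```

With reuse_reset(), a field added in a future refactoring starts from zero by default, so forgetting it leads to a predictable state instead of a value left over from the previous user of the slot.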

Secondly, whenever you refactor code, be careful and precise so that you do not change its original behavior. In many cases, negligent refactoring leads to inconsistent or unpredictable behavior that is very difficult to notice. Diligent unit testing and an automated test suite are a great help.

Closing thoughts

The dirty pipe is a critical vulnerability that enables privilege escalation and has a high likelihood of being exploited. However, it is a local-only attack, which means that the attacker must already have local user access to the computer, making it a bit less severe.

Still, it took several commits, spread over years, for this bug to turn into an exploitable vulnerability, so we should be prepared for similar vulnerabilities in the future, just like in the case of the sudo bug.

This post couldn’t have come to life without the help of David Lukacs.
