Security Through Obscurity

Exploring Dynamic Dispatch in Rust

2017-03-07T00:00:00-06:00

Let me preface this by saying that I am a novice in the world of Rust (though I'm liking things so far!), so if I make technical mistakes please let me know and I will try to correct them. With that out of the way, let's get started.

My real motivation for taking a closer look at dynamic dispatch can be seen in the following code snippet. Suppose I want to create a struct CloningLab that contains a vector of trait objects (in this case, Mammal):

This works fine. You can iterate over the vector of subjects and call run or walk as you would expect. However, things break down when you try to add an additional trait to the trait object bounds like:

This fails with the following error:

error[E0225]: only the builtin traits can be used as closure or object bounds
 --> test1.rs:3:32
  |
3 |     subjects: Vec<Box<Mammal + Clone>>,
  |                                ^^^^^ non-builtin trait used as bounds

And I found this surprising. In my mind, a trait object with multiple bounds would be analogous to multiple inheritance in C++. I would expect the object to have a vpointer for each 'base', and to dispatch through the appropriate one. Given that Rust is still a somewhat young language, I could appreciate why the developers might not want to introduce that complexity immediately (being stuck with a poor design forever would be a high cost for little reward), but I wanted to work out exactly how such a system might work (or not work).

Vtables in Rust

As in C++, dynamic dispatch in Rust is achieved through a table of function pointers (described here in the Rust docs). According to that documentation, the memory layout of a Mammal trait object made from a Cat will consist of two pointers arranged like:

I was surprised to see that the data members of the object had an additional layer of indirection. This is unlike the (typical) C++ representation, which would look like this:

Here the vtable pointer comes first and the data members immediately follow it. The Rust approach is interesting. It incurs a cost when 'constructing' a trait object, unlike the C++ approach in which a cast to a base pointer is free (or just some addition for multiple inheritance). But this cost is very minor. The Rust approach has the benefit that an object does not have to store the vtable pointer if it is never used in a polymorphic context. I think it is fair to say that Rust encourages monomorphization (static dispatch through generics), so this is probably a good trade-off.
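To make the two layouts concrete, here is a small C++ sketch of both. It is only an illustration, not how rustc is implemented: `PlainCat`, `MammalVtable`, `MammalRef`, and the helper functions are hypothetical names, and the hand-rolled table leaves out the size, alignment, and drop entries that a real Rust vtable carries.

```cpp
#include <cassert>

// C++ style: the vptr is stored inside every object of the type.
struct VirtualCat {
    virtual ~VirtualCat() {}
    virtual int age_in_years() const { return age; }
    int age = 3;
};

// Rust style, emulated by hand: the object itself is plain data...
struct PlainCat {
    int age = 3;
};

// ...and the function table lives outside it (simplified to one entry).
struct MammalVtable {
    int (*age_in_years)(const void* self);
};

// Rough analogue of a trait object reference: data pointer + vtable pointer.
struct MammalRef {
    const void* data;
    const MammalVtable* vtable;
};

int plain_cat_age(const void* self) {
    return static_cast<const PlainCat*>(self)->age;
}

const MammalVtable kPlainCatAsMammal = { &plain_cat_age };

// "Constructing" the trait object is where the vtable pointer gets attached.
MammalRef as_mammal(const PlainCat& cat) {
    MammalRef ref = { &cat, &kPlainCatAsMammal };
    return ref;
}

// Dynamic dispatch: look up the function in the table, pass the data pointer.
int call_age(MammalRef m) {
    return m.vtable->age_in_years(m.data);
}
```

In this sketch `PlainCat` stays the size of its one `int`, while `VirtualCat` pays for a vptr in every instance whether or not it is ever used polymorphically.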

Trait Objects with Multiple Bounds

Returning to the original problem, let's consider how it is resolved in C++. If we have multiple traits (purely abstract classes) that we implement for some structure, then an instance of that structure will have the following layout (e.g., Mammal and Clone):

Notice that we now have multiple vtable pointers, one for each base class Cat inherits from (that contains virtual functions). To convert a Cat* to a Mammal*, we don't need to do anything, but to convert a Cat* to a Clone*, the compiler will add 8 bytes (assuming sizeof(void*) == 8) to the this pointer.
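The pointer adjustment can be observed directly. The following is a minimal sketch, assuming a compiler that places the first base at offset 0 (as GCC and clang do); the `Clone` class here is a hand-written stand-in for the Rust trait, and `offset_within` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

struct Mammal {
    virtual ~Mammal() {}
    virtual void run() {}
};

// Stand-in for Rust's Clone trait, written as an abstract class.
struct Clone {
    virtual ~Clone() {}
    virtual Clone* clone() const = 0;
};

// Cat has two polymorphic bases, so it carries two vptrs.
struct Cat : Mammal, Clone {
    Clone* clone() const override { return new Cat(*this); }
};

// Distance in bytes from the start of the Cat object to the given subobject.
std::ptrdiff_t offset_within(const Cat& cat, const void* subobject) {
    return static_cast<const char*>(subobject) -
           reinterpret_cast<const char*>(&cat);
}
```

Converting a `Cat*` to a `Mammal*` leaves the address unchanged, while converting to a `Clone*` moves it forward by one pointer, which is the 8 bytes mentioned above on a 64-bit target.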

It is easy to imagine a similar thing for rust:

So there are now two vtable pointers in the trait object. If the compiler needs to perform dynamic dispatch on a Mammal + Clone trait object, it can access the appropriate entry in the appropriate vtable and perform the call. Because Rust does not (yet) support struct inheritance, the problem of determining the correct subobject to pass as self does not exist. self will always be whatever is pointed at by the data pointer.

This seems like it would work well, but this approach also has some redundancy. We have multiple copies of the type's size, alignment, and drop pointer. We can eliminate this redundancy by combining the vtables. This is essentially what happens when you perform trait inheritance like:

Using trait inheritance in this way is a commonly suggested trick to get around the normal limitation of trait objects. The use of trait inheritance produces a single vtable without any redundancy. So the memory layout looks like:

Much simpler! And you can currently do this! Perhaps what we really want is for the compiler to generate a trait like this for us when we try to make a trait object with multiple bounds. But hold on, there are some significant limitations. Namely, you cannot convert a trait object of CloneMammal into a trait object of Clone. This seems like very strange behavior, but it is not hard to see why such a conversion won't work.

Suppose you attempt to write something like:

Line 10 must fail to compile because the compiler cannot possibly find the appropriate vtable to put in the trait object. It only knows that the object being referenced implements CloneMammal, but it doesn't know which concrete type it is. Of course, we can tell that it must be a Cat, but what if the code was something like:

The problem is more clear here. How can the compiler know what vtable to put in the trait object being constructed on line 17? If clone_mammal refers to a Cat, then it should be the Cat vtable for Clone. If it refers to a Dog then it should be the Dog vtable for Clone.

So the trait-inheritance approach has this limitation. You cannot convert a trait object into any other kind of trait object, even when the trait object you want is less specific than the one you already have.

The multiple vtable pointer approach seems like a good way forward to allowing trait objects with multiple bounds. It is trivial to convert to a less-bounded trait object with that setup. The vtable the compiler should use is simply whatever is already in the Clone vtable pointer slot (the second pointer in diagram 4).

Conclusions

I hope going through this was a useful exercise to some readers. It certainly helped me organize how I was thinking about trait objects. In practice, I think this is not really a pressing issue; the restriction was just surprising to me.

Reversing C++ Virtual Functions: Part 2

2017-01-24T00:00:00-06:00

In the previous part I described one approach to 'devirtualize' function calls in a small C++ program. Naturally there were several limitations to that approach, namely that it is very manual. If the target binary contains thousands of vtables, it is not practical to manually locate the tables and create these structures and relationships.

So, in this part I will go through a more precise description of the layout of vtables and how we can find them programmatically. I will also show how we can sometimes recover relationships between these vtables (and therefore, between the types they are associated with).

But first I need to describe the set of binaries this is applicable to. In the first part I mentioned that most things related to vtable layout were not specified in the standard, and so tended to vary from compiler to compiler. This is because the C++ standard needs to be applicable regardless of the underlying architecture. It would be unfortunate if the spec required some specific vtable layout that was inefficient on some architecture. The compiler developers for that architecture would be required to choose between performance and compliance (more than they already are).

However, because programs produced by different compilers frequently need to interact (most notably, for dynamic linking), compiler developers agreed to a kind of supplemental specification for things like vtable layout, exception implementation and others. The most common of these is the Itanium C++ ABI. This standard is implemented by GCC, clang, ICC, and many other compilers (but notably, not Visual Studio). The descriptions I give will be applicable to these compilers.

The Itanium ABI is also still ambiguous in some areas. For example, it does not state what segments should be used to store vtables. So I will further specify that I'm describing GCC's particular brand of Itanium. So in essence, I am describing the highlighted section:

Additionally, the following assumptions are made:

RTTI is disabled (if it were on, this would be much easier)
The program does not contain occurrences of virtual inheritance. Unfortunately, discussing this would dramatically increase the complexity of this topic, and because virtual inheritance is somewhat uncommon I didn't think it was worth it.
These are 32-bit binaries

More about vtable layout

Before we move forward, recall that in part 1, we described a vtable as a contiguous collection of function pointers in a data segment of the binary. We can also say that the array should only be referenced by its first element, because the other elements will be accessed as offsets into this array.

.rodata:08048D48 off_8048D48     dd offset sub_8048B6E
.rodata:08048D4C                 dd offset sub_8048BC2
.rodata:08048D50                 dd offset sub_8048BE0

This is a section from a binary that seems to fit that definition. It is an array of 3 function pointers in the '.rodata' segment, and only the pointer at 0x08048D48 is referenced. It turns out that this is a vtable, so maybe this heuristic is good enough? If we were to compile the following code:

We would expect there to be 5 vtables, one for Mammal, Cat, Dog, Bird, and Bat. But as you might have guessed, things aren't that simple. In fact there are 6 regions in the binary that meet the above criteria. It becomes clear why this happens when you consider the layout of an object with multiple inheritance.

Notice that Bat includes complete instances (called subobjects) of Bird and Mammal, as well as a vptr for each. These pointers point to different tables. So a type with multiple parents has a vtable in the binary for each one. The Itanium ABI refers to these as a "virtual table group".

Virtual Table Groups

A virtual table group consists of a primary table for the first parent type, and an arbitrary number of secondary tables, one for each parent type after the first. These tables will be adjacent in the binary, in the order the parent types were declared in the source. With this in mind, we would expect the vtable group for Bat to be something like:

| Offset | Description             | Bat's vtable for |
|--------+-------------------------+------------------|
|      0 | Address of Destructor 1 | Bird             |
|      4 | Address of Destructor 2 | Bird             |
|      8 | Address of Bat::fly     | Bird             |
|     12 | Address of Destructor 1 | Mammal           |
|     16 | Address of Destructor 2 | Mammal           |
|     20 | Address of Mammal::walk | Mammal           |

With each vtable taking 12 bytes. Recall from part 1 that there will be two destructors, and because Bat does not override walk, we would expect the walk from Mammal to appear in Bat's table. However, if we examine the binary we don't see any place with 6 consecutive function pointers in the .rodata segment.

If we look more closely at the Itanium specification, we can see why. A virtual table does not consist of just function pointers. In fact a vtable looks more like this:

Itanium vtable layout (without virtual inheritance)

The RTTI pointer will typically point to an RTTI struct (that is also described by the Itanium spec). However, because we are assuming RTTI is disabled, it will always be 0. The offset to top has a value equal to the number of bytes that must be added to the this pointer of some subobject to get back to the start of the complete object. This is probably a little confusing, so to clarify, imagine the following code:

These assignments to b and m are both valid. The first does not require any instructions. A Bat is a Bird, and because Bird is its first parent, the Bird subobject is at the very beginning of any Bat object. Thus, a pointer to a Bat is also a pointer to a Bird. This is just like normal, single inheritance.

However, the assignment to m does require work. The Mammal subobject inside a Bat is not at the beginning, so the compiler must insert some instructions to add to bat to make it point to its Mammal subobject. The value added will be the size of Bird (plus any alignment padding). The negative of this value will be stored in the Offset to Top field.
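The relationship between the pointer adjustment and the Offset to Top field can be checked at runtime. The following is a rough sketch that only works with compilers following the Itanium ABI (GCC, clang): `offset_to_top` is a hypothetical helper that reads the vptr out of a subobject and then peeks at the slot two pointer-sized words before the address it points to. This relies on ABI details and is not portable C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Bird   { virtual ~Bird() {}   virtual void fly() {} };
struct Mammal { virtual ~Mammal() {} virtual void walk() {} };
struct Bat : Bird, Mammal { void fly() override {} };

// Bytes the compiler adds to a Bat* to reach its Mammal subobject.
std::ptrdiff_t mammal_adjustment(Bat* bat) {
    Mammal* m = bat;  // the implicit conversion performs the adjustment
    return reinterpret_cast<char*>(m) - reinterpret_cast<char*>(bat);
}

// Itanium-ABI-only: the vptr is the first word of a polymorphic subobject,
// and "offset to top" sits two pointer-sized words before the address the
// vptr points to (the RTTI pointer is one word before it).
std::ptrdiff_t offset_to_top(const void* subobject) {
    const char* vptr;
    std::memcpy(&vptr, subobject, sizeof vptr);
    std::ptrdiff_t offset;
    std::memcpy(&offset, vptr - 2 * sizeof(void*), sizeof offset);
    return offset;
}
```

For a `Bat`, the Bird subobject should report an offset of 0 and the Mammal subobject should report the negative of `mammal_adjustment`, which is -4 on the 32-bit binaries discussed here and -8 on a typical 64-bit build.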

This Offset to Top component of the vtable allows us to easily identify vtable groups. A group will consist of those consecutive vtables that have decreasing values in the Offset to Top. Consider the following:

These are the 6 vtables found in the binary built from the above source. Notice that table 2 has a value of -4 (0xFFFFFFFC as a signed int) for its Offset to Top, and all other tables have a value of 0. Also, each RTTI pointer is 0, as we expected. The -4 tells us two things:

Table 2 is a secondary table in a vtable group (because offset to top is not 0)
The size of the type associated with table 1 is 4. Keep in mind that because tables 1 and 2 form a table group, the size of the type associated with just table 1 is actually the size of part of the object (i.e., a subobject).

Finding Vtables Programmatically

From the above, we can devise the following simple procedures to find vtable (groups) from a binary:

After running the above in the IDA python interpreter, you can execute find_tablegroups() to get a list of vtable group addresses. This could be combined with additional code to construct structures from each vtable, for example.
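Outside of IDA, the same heuristic can be sketched over a raw buffer of 32-bit words. This is a simplified illustration under the assumptions listed earlier (no RTTI, no virtual inheritance, 32-bit), not the IDA script itself: `Range`, `Table`, `find_tables`, and `group_tables` are hypothetical names, and the scan would misplace tables whose leading slots are null.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open address range of the code segment.
struct Range {
    std::uint32_t begin, end;
    bool contains(std::uint32_t a) const { return a >= begin && a < end; }
};

struct Table {
    std::size_t address_point;    // word index of the first function pointer
    std::int32_t offset_to_top;   // 0 for primary tables, negative otherwise
    std::size_t num_functions;
};

// A candidate is: offset to top (<= 0), a zero RTTI pointer, then one or
// more words that look like pointers into the code segment.
std::vector<Table> find_tables(const std::vector<std::uint32_t>& words,
                               Range text) {
    std::vector<Table> out;
    std::size_t i = 0;
    while (i + 2 < words.size()) {
        std::int32_t off = static_cast<std::int32_t>(words[i]);
        if (off <= 0 && words[i + 1] == 0 && text.contains(words[i + 2])) {
            std::size_t j = i + 2;
            while (j < words.size() && text.contains(words[j])) ++j;
            out.push_back(Table{i + 2, off, j - (i + 2)});
            i = j;  // continue after the last function pointer
        } else {
            ++i;
        }
    }
    return out;
}

// A group is a table followed by adjacent tables with a negative offset.
std::vector<std::vector<Table> > group_tables(const std::vector<Table>& tables) {
    std::vector<std::vector<Table> > groups;
    for (const Table& t : tables) {
        bool secondary = false;
        if (!groups.empty() && t.offset_to_top < 0) {
            const Table& prev = groups.back().back();
            // The two header words must start right where prev ends.
            secondary = (t.address_point - 2 ==
                         prev.address_point + prev.num_functions);
        }
        if (secondary) groups.back().push_back(t);
        else groups.push_back(std::vector<Table>(1, t));
    }
    return groups;
}
```

Feeding this the words of a read-only data segment plus the bounds of the code segment yields one inner vector per candidate table group, with the primary table first.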

However, just knowing where tablegroups are is not very useful. We need some information about the relationships between the types associated with the tables. Then, we will be able to generate a list of 'candidate' function calls for a virtual call-site, so long as we know the 'family' the type is associated with.

Recovering Type Relationships

The simplest approach to recovering these relationships is to recognize that two vtables sharing a function pointer are necessarily related. We cannot recover the nature of that relationship, but it is enough to determine that they are in the same family.
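As a minimal sketch of that check, assuming each vtable has already been extracted as a list of function addresses: `share_entry` is a hypothetical helper, and the `ignored` parameter is my own addition, because null slots and the shared pure-virtual stub can appear in tables of unrelated abstract types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Function addresses of one vtable, in slot order.
typedef std::vector<std::uint32_t> Vtable;

// True if the two tables have at least one function address in common.
// Addresses listed in `ignored` are skipped, since sharing them says
// nothing about whether the two types are related.
bool share_entry(const Vtable& a, const Vtable& b,
                 const std::vector<std::uint32_t>& ignored) {
    for (std::uint32_t fn : a) {
        bool skip = std::find(ignored.begin(), ignored.end(), fn) != ignored.end();
        bool in_b = std::find(b.begin(), b.end(), fn) != b.end();
        if (!skip && in_b) return true;
    }
    return false;
}
```

Running this pairwise over all tables and merging the pairs that match gives the 'families' described above.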

But we can go further by considering the behavior of constructors and destructors in C++. A constructor performs the following steps:

Invoke the parent class's constructors
Initialize the vptr(s) to point to this type's vtable(s)
Initialize the members of the object
Run whatever other code is in the constructor

The destructor performs essentially the opposite steps:

Set the vptr(s) to point to this type's vtable(s)
Run whatever other code is in the destructor
Destroy the members of the object
Invoke the parent class's destructor

Notice that the vptr is again set to point to the vtable. This seems odd until you consider that virtual function calls should still work during destruction.

Suppose we modified the Bird destructor so it called fly. If you were to destruct a Bat object (which in turn called the Bird destructor when the Bat one was finished), it should call Bird::fly not Bat::fly, because the object is no longer a Bat. For this to work, the Bird destructor must update the vptr.
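This behavior is easy to observe. The sketch below uses a simplified hierarchy in which Bat has only one parent, and a hypothetical global log (`g_calls`) to record which `fly` ran during destruction.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical global log so the destructors can record which fly() ran.
std::vector<std::string> g_calls;

struct Bird {
    // By the time this runs, the Bat part is gone, so fly() is Bird::fly.
    virtual ~Bird() { fly(); }
    virtual void fly() { g_calls.push_back("Bird::fly"); }
};

struct Bat : Bird {
    // The object is still a Bat here, so fly() is Bat::fly.
    ~Bat() override { fly(); }
    void fly() override { g_calls.push_back("Bat::fly"); }
};

// Destroys one Bat and returns the calls that were logged, in order.
std::vector<std::string> destroy_a_bat() {
    g_calls.clear();
    {
        Bat bat;
    }  // ~Bat runs first, then ~Bird
    return g_calls;
}
```

The Bat destructor logs `Bat::fly` and the Bird destructor that follows logs `Bird::fly`, which is the behavior the vptr reassignment exists to support.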

So, we know that each destructor will call the parent type's destructor, and we know that these destructors will reference the vtable (to assign it to the vptr). We can therefore reconstruct the inheritance hierarchy for a type by "following the destructors". Similar logic can be used for constructors as well.

Consider the first entry in the first vtable (which we would expect to be a destructor):

Notice that there are two assignments, and these are both address points of vtables. This is step 1 in the list above. This object does not seem to have any members, because it proceeds directly to step 4 and calls the two other destructors. We can confirm that these other functions are destructors because of their location in a vtable (at the start of table 6 and table 3). Doing this for the remaining tables tells us that the inheritance hierarchy was laid out like:

This matches the actual hierarchy from the source. There are two base classes and one class that has two parents.

Identifying Constructors

By similar reasoning, we can find the constructors associated with a vtable: they are the functions that assign a vtable address to a vptr and are not destructors. By applying this rule to the target, we discover that there are 5 such functions, one for each type:

| Constructor | Table     |
|-------------+-----------|
| sub_8048AEC | Table 1/2 |
| sub_8048A64 | Table 3   |
| sub_80489A8 | Table 4   |
| sub_80488EC | Table 5   |
| sub_8048864 | Table 6   |
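The rule used to build this table can be sketched as a simple filter. This assumes a prior analysis pass has already recorded which functions store a vtable address into a vptr, and it approximates "is a destructor" as "appears in some vtable", which only holds when the destructors are virtual. `find_constructors` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

typedef std::uint32_t Address;

// vtable_writers: function address -> vtable address it stores into a vptr.
// vtable_slots:   every function address that appears in some vtable.
// Returns the writers that are not referenced from any vtable; under the
// assumptions above these are the constructor candidates.
std::map<Address, Address> find_constructors(
        const std::map<Address, Address>& vtable_writers,
        const std::set<Address>& vtable_slots) {
    std::map<Address, Address> constructors;
    for (const auto& entry : vtable_writers) {
        if (vtable_slots.count(entry.first) == 0) {
            constructors.insert(entry);
        }
    }
    return constructors;
}
```

A function that writes a vtable address but is itself reachable from a vtable slot is treated as a destructor and dropped from the result.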

Devirtualize

With this, we can look at the decompiled body of main:

The virtual functions are clearly visible on lines 28 and 29. However, we can also identify constructors on lines 13, 16, 22, and 25 from the tables above. Using this knowledge, we can follow the process from part 1 to see the devirtualization:

In the above screenshot, I have set v0 to have type type_8048D40*. This is the type associated with table 1/2 and also with the constructor on line 13. Similarly, the constructor on line 16 is associated with table 5, which I have created a type for named type_8048D98 (these are the addresses at which the tables start; I could just as easily have called them table_5 or some such). The same thing could be done with v2 and v3 to see the alternate possibilities for lines 28 and 29.

So, while the original source contained strings that would make identifying types and methods easy, we did not need any of them to perform our "devirtualization".

Conclusions

This is still a very manual process, but we have come a bit further. We are now able to (approximately) automatically detect vtables. It is not hard to see how we will be able to automate the construction of the associated structures, and then perhaps the location of constructor calls. We could also imagine reconstructing type trees. In the next part, we will delve into this a bit more.

Reversing C++ Virtual Functions: Part 1

2016-12-17T00:00:00-06:00

There are a few posts in various parts of the internet discussing reverse engineering C++, and these often address virtual functions to a large or small extent. However, I wanted to take some time to write about dealing with virtual functions in large, ‘enterprisy’ code-bases. These can often include thousands of classes and massive type hierarchies, so I think it is worth describing some techniques for reversing them. But before that I’m going to go through some simpler cases. If you are already familiar with virtual function reversing, then you may want to proceed directly to part 2.

It’s also worth noting the following:

The code was compiled without RTTI (RTTI will be discussed later) and without exceptions
I’m using 32-bit x86 as the example platform
The binaries have been stripped
Most virtual function implementation details are not standardized and can vary from compiler to compiler. For this reason, we’re going to focus on the behavior of GCC.

So in general, the binaries we’re looking at have been compiled with g++ -m32 -fno-rtti -fno-exceptions -O1 file.cpp and then stripped with strip.

The Goal

In most cases, we cannot hope to “devirtualize” a virtual function call. The information needed to do that is just not present until runtime. Instead, the goal of this exercise will be to determine which function might be being called at a particular point. In later parts we will focus on narrowing down the possibilities.

The Basics

I’m assuming that you are familiar with writing C++ but maybe not with its implementation. So, let’s start by looking at how the compiler implements virtual functions. Suppose we have the following classes:

And we have some code that uses them:

Of course whether m is a Cat or Dog depends on the output of rand. The compiler cannot know this ahead of time, so how does it call the right function? The answer is that for each type having a virtual function, the compiler inserts a table of function pointers called a vtable into the resulting binary. Each instance of such a type is given an additional member called a vptr that points to the correct vtable for that object. Code to initialize this pointer with the right value will be added to the constructor.

Then, when the compiler needs to call a virtual function, it can just access the correct entry in the vtable for the object and call that. This means that the entries in the table must be in the same order for each related type (each class’s run could be at index 1, every walk at index 2, etc).
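The mechanism can be emulated by hand. The following is a rough sketch of what the compiler does on our behalf, not GCC's actual output: all of the names are hypothetical, every slot is given the same signature for simplicity, and the destructor slots that a real table would contain are left out.

```cpp
#include <cassert>
#include <cstring>

struct Object;                                // any Mammal-derived object
typedef const char* (*Method)(Object* self);  // simplified slot signature

struct Object {
    const Method* vptr;  // the hidden member the compiler would add
};

const char* cat_run(Object*)     { return "Cat::run"; }
const char* cat_walk(Object*)    { return "Cat::walk"; }
const char* dog_run(Object*)     { return "Dog::run"; }
const char* dog_walk(Object*)    { return "Dog::walk"; }
const char* mammal_move(Object*) { return "Mammal::move"; }

// Slots must appear in the same order in every related table.
enum Slot { kRun = 0, kWalk = 1, kMove = 2 };

// One table per type; neither type overrides move, so both share an entry.
const Method kCatVtable[] = { cat_run, cat_walk, mammal_move };
const Method kDogVtable[] = { dog_run, dog_walk, mammal_move };

// "Constructors": their only job here is to initialize the vptr.
Object make_cat() { Object o = { kCatVtable }; return o; }
Object make_dog() { Object o = { kDogVtable }; return o; }

// A virtual call: load the vptr, index into the table, call the result.
const char* virtual_call(Object* obj, Slot slot) {
    return obj->vptr[slot](obj);
}
```

The call site never names Cat or Dog; it only knows the slot index, which is why the tables of related types must agree on the order of their entries.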

So we would expect to find three tables in the binary for Mammal, Cat and Dog. We can locate them quickly by looking through .rodata for adjacent function offsets:

What about the main function? It decompiles to:

We can see that 4 bytes are being allocated in either branch. This makes sense, as the only data in the structure is the vptr added by the compiler. We can also see the virtual function calls on lines 15 and 17. In the first, the compiler is dereferencing (to get the vptr) and adding 12 to access the 4th entry in the vtable. Line 17 gets the 2nd entry in the table. The program then calls the function pointer it retrieved from the table.

Looking back at the tables, the 4th entries are sub_80487AA, sub_804877E, and ___cxa_pure_virtual. If we look at the bodies of the two “sub_” functions we see that they are the definitions of walk for Dog and Cat (shown in the pictures). By elimination, the ___cxa_pure_virtual function must belong to the vtable for Mammal. This makes sense, as Mammal has no definition of walk, and these “pure_virtual” entries are inserted by GCC when a function is (unsurprisingly) purely virtual. So, table 1 must be for Mammal objects, 2 is for Cats and table 3 is for Dogs.

But it seems strange that there are 5 entries in each vtable when there are only 4 virtual functions in play:

run
walk
move
the destructors

The additional entry is an ‘extra’ destructor. This is here because GCC will insert multiple destructors that are used in different circumstances. The first of these will simply destroy the members of the object. The second will also delete the memory that was allocated for the object (this is the version called in the example in line 17). In some cases there may be a 3rd version that is used in certain virtual-inheritance circumstances.

By looking back at the contents of the ‘sub_’ functions, we find the layout of the vtables are as follows:

| Offset | Pointer to  |
|--------+-------------|
|      0 | Destructor1 |
|      4 | Destructor2 |
|      8 | run         |
|     12 | walk        |
|     16 | move        |

However, notice that the first two entries in the Mammal table are zero. This is an eccentricity of newer versions of GCC. The compiler will replace the destructor entries with NULL pointers in classes that have a pure-virtual method (i.e., classes that are abstract).

With all this in mind, let’s do some renaming. Afterwards we’re left with:

Notice that because neither Cat nor Dog implemented move, they both inherited the definition from Mammal and so the move entries in their vtables are the same.

Structures

At this point it is useful to start defining some structures. We’ve already seen that the only member of the Mammal, Cat, and Dog structures will be their vptrs. So we can define these quickly:

The next step is a bit more complicated. We’re going to create a structure for each vtable. The objective here is to get the decompiler output to show us what function would actually be called if m had a particular type. We can then cycle through these possibilities and examine all of the options.

To achieve this, the members of this structure will have the name of the corresponding function it will point to, like so:

You will need to set the type of the vptr for each structure to be the corresponding Vtable type. For example, the type of the vptr for Cat should be CatVtable*. Additionally, I have set the type of each vtable entry to be a function pointer. This will help IDA show things correctly. So the type of the Dog__run element should be void (*) (Dog*) (because that is the signature of Dog__run).
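Written out as plain declarations, the structures for Cat might look roughly like the sketch below. The member names for the destructor slots and the `void` return types are assumptions on my part; only the `Type__method` naming pattern and the `CatVtable*` vptr type come from the description above.

```cpp
#include <cassert>
#include <cstddef>

struct Mammal;
struct Cat;

// One member per vtable slot, named after the function it points to.
struct CatVtable {
    void (*Cat__destructor1)(Cat*);
    void (*Cat__destructor2)(Cat*);
    void (*Cat__run)(Cat*);
    void (*Cat__walk)(Cat*);
    void (*Mammal__move)(Mammal*);  // inherited, so it is Mammal's function
};

// The object itself holds nothing but the vptr.
struct Cat {
    CatVtable* vptr;
};
```

With these in place, a call through the slot three pointers into the table (offset 12 on the 32-bit target) is displayed by name as Cat__walk rather than as a raw offset.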

If we go back to the decompiled code for main, we can now rename the local variable to m, and set its type to be Cat* or Dog*. Afterwards we see:

Now we can easily see the possible functions being called at the call-sites. If m is a Cat then line 15 will call Cat__walk, if it is a Dog then it will call Dog__walk. Obviously this was a simple example, but this is the general idea.

We could also set the type of m to be Mammal*, but we will see some problems if we do that:

Notice that if the real type of m was Mammal then the call at line 15 would be to a pure-virtual function. This should never happen. There's also a call to a null pointer at line 17 which would obviously cause issues. So we can conclude that m must not be a Mammal.

This may seem strange, because m is in fact declared as a Mammal*. However, that type is the compile-time type (a.k.a. the static type). We are interested in the dynamic type (or runtime type) of m, because this is what determines which function is called in a virtual function call. In fact, the dynamic type of an object can never be an abstract type. So if a given vtable contains one of the ___cxa_pure_virtual functions, then it is not a candidate and you can ignore it. We could have skipped creating a vtable structure for Mammal because it will never be used (but I hope seeing why was useful).

So the dynamic type will be Cat or Dog, and we know which functions will be called in either case by looking at their vtable entries. These are the basics of virtual function reverse engineering. In the next part we will go into how to deal with larger code bases and more complex scenarios.