Tuesday, October 9, 2018

BH18: Why so Spurious? How a Highly Error-Prone x86/x64 CPU "Feature" can be Abused to Achieve Local Privilege Escalation on Many Operating Systems

Nemanja Mulasmajic  and Nicolas Peterson are Anti-Cheat Engineers at Riot Games.

This is about a hardware feature available in Intel and ARM chips. The “feature” can be abused to achieve local privilege escalation.

CVE-2018-8897 – this is a local priv escalation – read and write kernel memory from usermode. Execute usermode code with kernel privileges. Affected Windows, Linux, MacOS, FreBSD and some Xen configurations.

To fully understand this, you’ll need to have some good assembly knowledge and privilege models. In the standard model, Ring 1 and 2 are really never used, just Ring 3 (least privileged) to Ring 0 (most) (it is a simplified view).

Hardware breakpoints cannot typically be sent by userland, though there are often ways to do it in syscalls. When an interrupt fires, it transfers execution to an interrupt handler. Lookup is based off of the interrupt descriptor table (IDT), which is registered by the OS.

Segmentation is a vestigial part of the x86 architecture now that everything leverages paging.  You can still set arbitrary base addresses.  The first 2 bits describe if you’re in kernel or user mode. Depending on the mode of execution, the GS base means different things (it holds data structures relevant to the mode of execution). If we’re coming from user mode, we need to call SWAPGS to update to the equiv in kernel mode.

MOV SS and POP SS force the processor to disable external interrupts, NMIs and pending debug exceptions until the boundary of the instruction following the SS load was reached. The intended purpose was to prevent an interrupt from firing immediately after loading SS but before loading a stack pointer.

It was discovered while building a VM detection mechanism, as VMs were being used to attack Anti-Cheat.. They thought – what if VMEXIT occurs during a  “blocking” period? Let’s follow the CPUID… They started thinking about what would happen if they did interrupts at unexpected times.
So, what happens? Why did his machine crash?  Before KiBreakpointTrap executes its first instructions, the pending #DB is fired (which was suppressed by MOV SS) and execution redirects to where KiBreakpointTrap, which sends execution back to where it *thought* it should go – kernel (though it had come from user mode).

Code can be found at github.com/nmulasmajic, if you aren’t passed, system will crash.  Showed demo of 2 lines of assembly code putting a VM into a deadlock.

They can avoid SWAPGS since Windows thinks they are coming from kernelmode.  WRGSBASE writes to the GSBASE address, so use that!

They fired a #DB exception at unexpected location, and then the kernel becomes confused. Handler thinks they are privileged, now they control GSBASE.  Now they just need to find instructions to capitalize on this…

Erroneously assumed there was no encoding for MOV SS, [RAX] only immediate. It doesn’t dereference memory, but POP SS does dereference stack memory. BUT… POP SS is only valid in 32-bit compatibility code segment. On Intel chips, SYSCALL cannot be used in compatibility mode. So… focusing on using INT # only.

With the goal of writing memory, found that if they caused a page fault (KiPageFualt) from kernelmode, they c ould call KeBugCHeckEx again.  This function dereferences GSBASE memory, which is under their control…

It clobbers surrounding memory. Had to make one CPU “stuck” to deal with writing to target location. Chose CPU1 since CPU0 had to service other incoming interrupts from APIC. CPU1 endlessly page faults, goes to the double fault handler when it runs out of stack space.

The goal was to load an unsigned driver. CPU0 does the driver loading. They attempted to send TLB shootdowns, forcing CPU0 to wait on the other CPUs by checking PacketBaerrier variable in its _KPCR. But, CPU1 is in a dead spin… will never respond. But, “luckily” there was a pointer leak in the +KPCR for any CPU, accessible from usermode. (the exploit does require a minimum of 2 CPUS).

It is complicated, and it took the researchers more than a month to make it work. So, they looked into the syscall handler – KiSystemCall64. They registered in the IA32_LSTAR MSR. SYSCALL, unlike INT #, will not immediately swap to kernel – actually made things easier. (Syscall funcions similar to Int 3)

Another cool demo J

A lot of this was patched in May. MS was very quick to respond, and most OSes should be patched by now. You can’t abuse SYSCALL anymore.

Lessons learned – want to make money on bug bounty? You need a cool name and a good graphic for your vuln (pay a designer!), and don’t forget a good soundtrack!