Tuesday, October 9, 2018

BH18: Why so Spurious? How a Highly Error-Prone x86/x64 CPU "Feature" can be Abused to Achieve Local Privilege Escalation on Many Operating Systems

Nemanja Mulasmajic  and Nicolas Peterson are Anti-Cheat Engineers at Riot Games.

This talk is about a hardware behavior present in Intel and AMD x86/x64 chips. The “feature” can be abused to achieve local privilege escalation.

CVE-2018-8897 – this is a local privilege escalation: read and write kernel memory from user mode and execute user-mode code with kernel privileges. It affected Windows, Linux, macOS, FreeBSD, and some Xen configurations.

To fully understand this, you’ll need some knowledge of assembly and the x86 privilege model. In the standard (simplified) model, Rings 1 and 2 are essentially never used; everything runs in Ring 3 (least privileged) or Ring 0 (most privileged).

Hardware breakpoints cannot typically be set directly from userland, though there are often syscalls that will do it for you. When an interrupt fires, it transfers execution to an interrupt handler; the lookup is based on the interrupt descriptor table (IDT), which is registered by the OS.

Segmentation is a vestigial part of the x86 architecture now that everything uses paging, but you can still set arbitrary base addresses. The low two bits of a segment selector indicate whether you’re in kernel or user mode. Depending on the mode of execution, the GS base means different things: it points to data structures relevant to that mode. If we’re coming from user mode, the kernel needs to execute SWAPGS to switch to the kernel-mode equivalent.
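As a rough illustration (generic x86-64 kernel-entry pseudocode, not the actual Windows handler), the SWAPGS decision looks something like this:

    ; Sketch of a typical x86-64 interrupt entry path (illustrative only).
    ; The CPU pushes SS, RSP, RFLAGS, CS, RIP; the low 2 bits of the saved CS
    ; selector give the previous privilege level (CPL).
    kernel_entry:
        test    word [rsp + 8], 3    ; CS sits at [rsp+8] when no error code is pushed
        jz      .from_kernel         ; CPL 0: GS base is already the kernel's
        swapgs                       ; CPL 3: switch from user GS base to kernel GS base
    .from_kernel:
        ; ... handler body accesses per-CPU data via gs:[...] ...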

MOV SS and POP SS force the processor to suppress external interrupts, NMIs, and pending debug exceptions until the boundary of the instruction following the SS load is reached. The intended purpose was to prevent an interrupt from firing immediately after loading SS but before loading a stack pointer.
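The idiom this deferral was designed around (a sketch, not from the talk) is the classic two-instruction stack switch:

    ; Legacy stack-switch idiom that the MOV SS / POP SS deferral protects.
    ; Without the deferral, an interrupt could arrive between these two
    ; instructions and run on the new SS with a stale stack pointer.
        mov     ss, ax       ; load the new stack segment; interrupts and #DB are held off...
        mov     rsp, rbx     ; ...until after this instruction loads the matching stack pointer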

The bug was discovered while building a VM detection mechanism, since VMs were being used to attack Anti-Cheat. They wondered: what if a VMEXIT occurs during this “blocking” period? Let’s follow the CPUID… That led them to think about what would happen if interrupts arrived at unexpected times.
So, what happens? Why did his machine crash? Before KiBreakpointTrap executes its first instruction, the pending #DB (which was suppressed by MOV SS) fires and execution redirects to the #DB handler. That handler assumes the exception came from where it *thought* it should have – the kernel – even though the original transition was from user mode.

Code can be found at github.com/nmulasmajic; if you aren’t patched, the system will crash. They showed a demo of two lines of assembly code putting a VM into a deadlock.
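The trigger is roughly the following (a hedged reconstruction based on the talk and public write-ups, not the exact demo code); it assumes a hardware data breakpoint has already been armed on the address held in RAX (e.g., via the debug registers through SetThreadContext):

    ; Sketch of the CVE-2018-8897 trigger. Assumes DR0 already points at the
    ; memory RAX references and is configured as a data breakpoint.
        mov     ss, [rax]    ; the read hits the breakpoint; the resulting #DB is
                             ; suppressed until the next instruction boundary
        int     3            ; the CPU enters the kernel via the IDT, and the pending
                             ; #DB fires before KiBreakpointTrap's first instruction --
                             ; so the #DB appears to come from kernel mode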

They can avoid SWAPGS since Windows thinks they are coming from kernel mode. WRGSBASE writes to the GS base, so use that!

They fired a #DB exception at an unexpected location, and the kernel became confused. The handler thinks they are privileged, and now they control GSBASE. Now they just need to find instructions to capitalize on this…

They had erroneously assumed there was no memory-operand encoding of MOV SS (no MOV SS, [RAX] – register/immediate forms only), meaning it wouldn’t dereference memory; POP SS, however, does dereference stack memory. BUT… POP SS is only valid in a 32-bit compatibility-mode code segment, and on Intel chips SYSCALL cannot be used in compatibility mode. So they focused on using INT # only.

With the goal of writing kernel memory, they found that if they caused a page fault (KiPageFault) from kernel mode, they could get KeBugCheckEx called. This function dereferences GSBASE memory, which is under their control…

This clobbers surrounding memory, so they had to make one CPU “stuck” to handle writing to the target location. They chose CPU1, since CPU0 had to service other incoming interrupts from the APIC. CPU1 endlessly page faults, then goes to the double-fault handler when it runs out of stack space.

The goal was to load an unsigned driver; CPU0 does the driver loading. They attempted to send TLB shootdowns, forcing CPU0 to wait on the other CPUs by checking the PacketBarrier variable in its _KPCR. But CPU1 is in a dead spin and will never respond. “Luckily,” though, there was a pointer leak of the _KPCR for any CPU, accessible from user mode. (The exploit does require a minimum of 2 CPUs.)

It is complicated, and it took the researchers more than a month to make it work. So they looked into the syscall handler, KiSystemCall64, which is registered in the IA32_LSTAR MSR. SYSCALL, unlike INT #, will not immediately swap to a kernel stack – which actually made things easier. (SYSCALL functions similarly to INT 3 here.)
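The SYSCALL-based variant (again a sketch of the publicly described technique, with the same breakpoint setup assumed) differs only in the second instruction. Because SYSCALL by itself does not switch stacks or GS, the deferred #DB lands at the start of KiSystemCall64 while RSP and GSBASE are still user-controlled:

    ; Sketch of the SYSCALL variant (same assumption: a data breakpoint armed on [rax]).
        mov     ss, [rax]    ; pend the #DB across the instruction boundary
        syscall              ; enter KiSystemCall64 at CPL 0; the deferred #DB fires
                             ; before the handler can SWAPGS or load a kernel stack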

Another cool demo :)

A lot of this was patched in May. MS was very quick to respond, and most OSes should be patched by now. You can’t abuse SYSCALL anymore.

Lessons learned – want to make money on bug bounty? You need a cool name and a good graphic for your vuln (pay a designer!), and don’t forget a good soundtrack!

BH18: How I Learned to Stop Worrying and Love the SBOM

Allan Friedman  | Director of Cybersecurity, NTIA / US Department of Commerce

Vendors need to understand what they are shipping to the customer, and need to understand the risks in what is going out the door. You cannot defend what you don’t know. Think about the ingredients list on a box of food – if you know you have an allergy, you can simply check the ingredients and make a decision. Why should the software/hardware we ship be any different?

There had been a bill before Congress requesting that there always be an SBOM (Software Bill of Materials) for anything the US Government buys – so they know what they are getting and how to take care of it. The bill was DOA, but things are changing…

The healthcare sector has started getting behind this, and now people at the FDA and in Washington are concerned about the supply chain. There should not be a healthcare way of doing this, an automotive way of doing this, a DoD way of doing this… there should be one way. That’s where the US Department of Commerce comes in: we don’t want this coming from a single sector.

Committees are the best way to do this – they are consensus-based. That means the process is stakeholder-driven and no single person can derail it. Think about it like “I push, but I don’t steer.”

We need software component transparency. We need to compile the data, share it, and use it. The committee kicked off on July 19 in DC. Some folks believe this is a solved problem, but how do we make sure the existing data is machine readable? We can’t just say ‘use grep’. Ideally it could hook into tools we are already using.

The first working group is tackling defining the problem. Another is working on case studies and the state of practice; others are working on standards and formats, a healthcare proof of concept, and more.

We need more people to understand and poke at the idea of software transparency – it has real potential to improve resiliency across different sectors.

BH18: Keynote! Parisa Tabriz

Jeff Moss, founder of Blackhat, opened the first session at the top of the conference, noting several countries have only one attendee here – Angola, Guadeloupe, Greece, and several others. About half of the world’s countries are represented this year! Blackhat continues to offer scholarships to encourage a younger audience – who might not otherwise be able to afford it – to attend. Over 200 scholarships were awarded this year!

To Jeff, it feels like the adversaries have strategies while we have tactics – and that’s creating a gap. Think about address spoofing – it’s allowed and turned on by default on popular mobile devices, though most consumers don’t know what it is or why they should turn it off.

With Adobe Flash going away, the belief out there is that this will increase SPAM and change that landscape. We need to think about that.

Parisa Tabriz, Director of Engineering, Google.
Parisa has worked as a pen tester, an engineer, and more recently as a manager. She has often felt she was playing a game of “whack-a-mole,” where the same vuln (or a trivial variation of another vuln) pops up over and over – how do we get away from this? We have to be more strategic in our defense.
Blockchain is not going to solve our security problems. (no matter what the vendors in the expo tell you…)

It is up to us to fix these issues. We can make great strides here – but we have to realize our current approach is insufficient.

We have to tackle the root cause, pick milestones and celebrate them, and build out our coalition. We need to invest in bold programs – building that coalition with people outside of the security landscape.

We cannot be satisfied with just fixing vulnerabilities. We need to explore the cause and effect – what causes these issues.

Imagine a remote code execution (RCE) bug is found in your code – yes, fix it, but also figure out why it was introduced (the 5 Whys).

Google started Project Zero in 2014 with the mission of making 0-day hard. The team treats Google products like third-party software and has found thousands of vulnerabilities – but they want to achieve the most defensive impact from any vulnerability they find.

The team found that vendor response varied wildly across the industry – and it never really aligned with consumer needs. There is a power imbalance between a security researcher and the big companies making the software. Project Zero set a 90-day disclosure timeline, which removed the negotiation between a researcher and the big company. A deadline-driven approach causes pain for larger organizations that need to make big changes – but it is leading to positive change at these companies. They are rallying and making the necessary fixes internally.

One vendor improved their patch response time by as much as 40%! 98% of issues are now fixed within the 90-day disclosure window – a huge change! Unsure what all of those changes are, but the guess is improved processes, dedicated security response teams, etc.

If you care about end user security, you need to be more open. More transparency in Project Zero has allowed for more collaboration.

We all need to increase collaboration – but this is hard with corporate legal, process and policies. It’s important that we work to change this culture.

The defenders are our unsung heroes – they don’t win awards, often are not even recognized at their office. If they do their job well, nobody notices.

We lose steam in distraction driven work environments. We have to project manage, and keep driving towards this goal.

We need to change the status quo – if you’re not upsetting anyone, then you’re not going to change the status quo.

One project Google is undertaking to change the world is moving people from HTTP to HTTPS on the web platform. Not just Google services, but the entire World Wide Web. We wanted to see a web that was secure by default – not opt-in secure. The old Chrome browser UI didn’t make it obvious to users which website was the more secure one – something to work on.

Browser standards come from many standards bodies – IETF, W3C, ISO, etc. – and then vendors build browsers on top of those using their own designs. Going to HTTPS is not as simple as flipping a switch: site owners need to worry about getting certificates, performance, managing the security, and so on.

They did not want to create warning fatigue, or have insecurity reported inconsistently (that is, a site reported as insecure on Chrome but secure on another browser).

They needed to roll out these changes gradually, with specific milestones to celebrate. It started with a TLS Haiku poetry competition, which led to brainstorming. They shared ideas publicly, got feedback from all over, and built support internally at Google to drive this. They published a paper on how best to warn users, and published papers on who was and was not using HTTPS.

They started a grassroots effort to help people migrate to HTTPS and celebrated big conversions publicly, recognizing good actors. Vendors were given a deadline to transition by, with clear milestones to work against, so they could move forward. They also had to work with certificate vendors to make it easier and cheaper to get certificates.

Team ate homemade HTTPS cake and pie! It is important to celebrate accomplishments, acknowledge the difficult work done. People need purpose – it will drive and unify them.

Chrome set out with an architecture that would protect your physical machine from a malicious site. But now, with lots of data out there in the cloud, cross-site data attacks have grown. Google’s Chrome team started the Site Isolation project in 2012 to prevent data from moving across sites that way.

We need to continue to invest in ambitious proactive defensive projects.

Projects can fail for a variety of reasons – management can kill the project, for example. The Site Isolation project was originally estimated to take a year, but it actually took six… schedule slip at that level puts a bulls-eye on you. Another issue could be lack of peer support – be a good team player and don’t be a jerk!