Wednesday, May 15, 2019

ICMC2019: Random Numbers, Entropy Sources, and You

John Kelsey, Computer Scientist, NIST, United States

SP 800-90B is about how random bits should be generated.

A DRBG should always sit between the entropy source and the attackers. The entropy source just gives you bits... with entropy (as per the source's promise).

SP 800-90B is not AIS31... though the two groups are talking.

Noise sources are where the entropy comes from; health tests verify the noise source and the conditioning.

Noise sources must be non-deterministic, and often use ring oscillators. You have to be able to describe the noise source in detail. This is complicated, as many vendors rely on someone else's noise source: they either don't know the details or there is an NDA around them - that won't get validated. The submitter also has to provide an entropy estimate and a justification for that estimate.

Health tests need to keep running after deployment, to verify the entropy source continues to work in the field.
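
The gist of a health test can be sketched in a few lines. This is a rough illustration of SP 800-90B's repetition count test (a run of identical samples that is too long, given the claimed min-entropy, signals a failed source); the function names and the 2^-20 false-alarm cutoff convention here are illustrative, not the normative spec text.

```python
import math

def rct_cutoff(h_min: float, alpha_exp: int = 20) -> int:
    # Illustrative cutoff: a run this long has probability <= 2**-alpha_exp
    # if each sample really carries h_min bits of min-entropy.
    return 1 + math.ceil(alpha_exp / h_min)

def repetition_count_test(samples, h_min: float) -> bool:
    """Return True if the stream passes (no run of identical samples
    reaches the cutoff); False means the noise source looks broken."""
    cutoff = rct_cutoff(h_min)
    run, last = 0, object()
    for s in samples:
        run = run + 1 if s == last else 1
        last = s
        if run >= cutoff:
            return False
    return True
```

In a real module this would run continuously on the raw noise-source output, with a failure raising an alarm rather than returning a boolean.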

Conditioning improves the distribution of the output. Conditioning functions are deterministic, so they cannot add entropy...

IID = Independent and Identically Distributed - each sample is independent of all the others, and independent of its position in the sequence of samples. NIST will run statistical tests to try to disprove the claim. If we can't disprove it, we assume it is true.
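
That disproof-oriented approach can be sketched as a permutation test: compute a statistic on the sequence, shuffle it many times, and see whether the original value looks extreme. A small p-value is evidence against IID; a large one merely fails to disprove it. This is a toy sketch, not NIST's actual test suite - the function names and the longest-run statistic are illustrative.

```python
import random

def longest_run(samples) -> int:
    """Length of the longest run of identical consecutive samples."""
    best, run, last = 0, 0, object()
    for s in samples:
        run = run + 1 if s == last else 1
        last = s
        best = max(best, run)
    return best

def permutation_test_iid(samples, stat=longest_run, trials=1000, seed=0) -> float:
    """Crude permutation test: fraction of shuffles whose statistic is at
    least as extreme as the observed one. Small p => evidence against IID."""
    rng = random.Random(seed)
    observed = stat(samples)
    shuffled = list(samples)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            extreme += 1
    return extreme / trials
```

A heavily clumped input (all zeros, then all ones) yields a tiny p-value, while an input indistinguishable from its own shuffles does not - matching the "we can only disprove, never prove" framing above.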

If you don't claim to be IID, NIST will apply many different entropy estimators against sequential datasets. They will look for things like bias after restart. You may get rejected, or a lower estimate of entropy, if issues are found. They would rather underestimate than overestimate!
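
As an example of such an estimator, here is a rough sketch in the spirit of SP 800-90B's most-common-value estimate: bound the probability of the most frequent sample from above with a confidence interval, and take -log2 of that bound. The upward bias on the probability is exactly what makes the entropy estimate conservative. Names and constants here are illustrative, not the normative procedure.

```python
import math
from collections import Counter

def mcv_min_entropy(samples) -> float:
    """Most-common-value min-entropy estimate (per-sample, in bits).

    Uses a 99.5% upper confidence bound on the most common value's
    probability, so the entropy is under- rather than over-estimated.
    """
    n = len(samples)
    p_hat = Counter(samples).most_common(1)[0][1] / n
    p_upper = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1 - p_hat) / (n - 1)))
    return -math.log2(p_upper)
```

A constant stream estimates 0 bits per sample; a stream close to uniform over 256 values estimates something near (but below) 8 bits.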

But... black box statistical tests can't reliably measure entropy.... Ideally you need to design it right and document it and share with NIST (where available).

Currently for conditioning you can choose: hash, HMAC, CMAC, CBC-MAC, or the DFs from 800-90A. You can also roll your own. Or just don't use conditioning at all.
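
A hash-based conditioner is essentially "compress the raw samples through an approved primitive". A minimal sketch using SHA-256 (illustrative only - not one of the named approved components, and the function name is mine): it is deterministic, so the output carries at most the entropy that went in, capped by the output length.

```python
import hashlib

def condition(raw: bytes, out_len: int = 32) -> bytes:
    """Compress raw noise-source output into out_len (<= 32) bytes.

    Deterministic: it can concentrate the entropy already present in
    `raw`, but can never add any. Output entropy is bounded by
    min(input entropy, 8 * out_len) bits.
    """
    return hashlib.sha256(raw).digest()[:out_len]
```

In practice you would feed the conditioner enough raw samples that the input entropy comfortably exceeds the output size.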

The problems: we can't impact performance too much, and we can't expect this level of expertise at the labs...

ICMC19: Emerging Cryptography Trends in the Internet of Things


Charles White, CTO, Fornetix, United States

In a connected world, we need to think about security in new ways. There are a lot of IoT devices out there sensing... reading... waiting....   Cryptography is very similar to IoT! In the IoT landscape, we're starting to hear about Root of Trust, Data-in-Motion Encryption and Data-at-Rest Encryption.

Sensing vs Acting - acting has more requirements for encryption and authentication. Cryptography is Identity, Authentication and Authorization. There aren't users, logins, passwords... these are small devices that have little or no human interaction.  Crypto has to be that user, per se.

It's all about the root of trust. When a device goes from the factory to someone's living room, the consumer needs to know it hasn't been tampered with. But crypto can also be used to establish sessions, exchange information and data securely, etc.

When we talk about IoT, there is a lot of data in motion. Hard drive encryption and radio encryption both use symmetric keys - this is something we already understand how to do. Protection needs to be balanced with other requirements, such as bandwidth and battery consumption.

We need to protect data at rest - but we also need to allow access. Think about a mechanic trying to access data from the CAN bus.

We can look at turning our challenges into opportunities. Can we align disparate technologies? Could we orchestrate utilization and product strategy?  What if we could do device attestation at scale?   And make the orchestration of root of trust widely available?
The next trend is how cryptography can orchestrate control and management. Need to rely on standards and interops, automating and simplifying.

Need to have a way to do key distribution and association for Narrow band IoT sensors, communications infrastructure and device management.

We already have the fundamentals and knowledge - need to apply to IoT in a way that makes sense.

ICMC19: FIPS 140-2 and the Cloud


Alan Halachmi, Sr. Manager, Solutions Architecture, Amazon, United States

FIPS 140-2 came out in May 2001... think about that, that was before Facebook, Gmail, etc - and way before cloud computing.

Right now validating on the cloud is impossible, as Level 1 requires single-operator mode - not how you will find things set up in the cloud. In fact, an IG on Operational Environment specifically notes that you cannot use things like AWS, MS Azure or Google Cloud.

But - someone like Amazon can validate one of their services, as they are the sole operator.

The security landscape is in constant flux - making it difficult to keep a module validated. Performance is often impacted in validated modules - which is not tenable for Amazon.

Amazon wanted a framework that would allow real-time advancement from validated environment to validated environment. They want to make it clear that it's a multi-party environment, and with that comes shared responsibility, but it would require minimal coordination and be applied consistently between different application models. As much as possible, they want to leverage existing investments.
There needs to be focus on automation and defining of relationships. Vendors need the ability to run their own FIPS 140 testing, so they can be assured that any changes they are making have not caused issues - then they can also test performance, etc. Fortunately, ACVP is creating a method for doing this automated testing! NIST approved!

We should look to another model for validation. Think about our history - humans used to come up with hypotheses and then prove them. After the 1970s, humans came up with the hypotheses and machines proved them. Could machines do both?

Think about the attack surface of your code - the most critical code is in small areas (hypervisor, kernel, etc.). Attackers have more time (OSes and machines are deployed for years) and have learned from what worked in the past. Can we use formal methods for verification? Amazon has done one for TLS - it's on GitHub.

ICMC19: Keynote: Mary Ann Davidson

The opening remarks included another great cartoon from Atsec, a tribute to the new ACVP (Automated Cryptographic Validation Protocol) - very funny!

Matt Keller gave us an update on the CMUF (Crypto Module User Forum). They have several working groups that are contributing to new implementation guidance from NIST. Their goals are to share information and help move the standards forward. NIST also comes and gives updates, and the forum provides a way to share ideas and suggestions on navigating a validation.

Mary Ann Davidson, Chief Security Officer, Oracle Corp., "Keeping Up with the Joneses". When Ms. Davidson started in security, nobody cared (except PTLGA - Paranoid Three Letter Government Agencies). There was hardly any 3rd party software, except for crypto. Nobody cared, so it was a quiet job - like the Maytag repair man....

But things are changing. SW and HW are ubiquitous - you can even have an Internet-connected fridge. 66% of application code is now open source... need to keep up and understand the landscape.

We need to keep up with new threats, market expectations, latest regulatory FDJ (framework du jour) and changes in the industry.

Hackers are moving towards hardware, so her ethical hacking team is focusing now on HW in addition to their SW work. HW hacking combined with IoT has greatly increased our area of attack.
Regulatory frameworks should not be tied to a specific technology or vendor. ("regulatory capture" - not a good thing).

Looking at market expectations - 3rd party code enables scarce resources to be used on innovation, not "building blocks" (ie cutting down trees to build your own house...). But, this creates a target for a hacker. Everyone loves free code, but nobody wants to invest in making it better.

Vendors need to know what is in their code - beware of 3rd party code that pulls in other code... need to understand it all. Should have fewer instances of 3rd party libraries in their code to minimize attack surfaces and simplify and lower cost of upgrade. That is, don't have 48 copies of one 3rd party library - have a central copy.

Department of Commerce (DoC) is working on a Software Bill of Materials - you will have to know, as a vendor, what is in your SW. But what does that buy you? Customers typically cannot replace third party libraries in code - they have a binary, or license forbids. Also, just having the vulnerable code doesn't mean you are using it in a vulnerable way. Lots of resources spent upgrading, even though it is irrelevant. Veracode noted that 95% of Java vulns in 3rd party code are NOT exploitable in the context of the application...

What could we do instead? Be honest with your customers. Describe how we use the code. Fix the worst issues the fastest. Need to have a way to teach the scanners about usage - i.e., not vulnerable as being used.

Changes in industry can be distracting: "On prem/waterfall is so last year...". Need to keep the meaningful aspects as we move on, timelines still matter. We have to think about how long things like validations take - can't do all SW at the same time. Need to do the most relevant and do it as efficiently as possible.

We need cloud agility in certifications. NIST has 2 working groups looking at doing FIPS for crypto in the cloud, but we need to do it faster.

Perfect storm of increased regulatory scrutiny and increased use of technology has led to greater risk management inquiries. Need to assess relevant risk management concerns. You wouldn't want or need to inspect a day care provider's vacation home... not relevant.

People have asked Ms Davidson for things like: "we have the right to pen test any system in your network" "need patching status of every system in your network" ... etc.

This is problematic because it's not germane to her particular risk management concerns. For example, she's often asked about "3 day patching" - even though the person asking knows it's not possible, but they still want it in a contract...

Mary Ann apparently makes a really good rhubarb crisp, but she's not going to force it as a standard... so don't ask her to do a non-standard certification, either. (though you may want to have some of her rhubarb crisp....)

Vendors need to be more public with what they are doing, otherwise customers will assume you're not doing something. Set up clear rules of engagement - makes the questions more relevant and the discussions more fruitful. Keep in mind that anything vague will be misinterpreted - needs to be challenged.

Remember - change is inevitable, embrace it and OWN it. Don't let others own the change agenda, or you won't like the result. Use only globally accepted standards where feasible instead of one-off "wants". Economics rule the world - know it, use it, own it!

Tuesday, October 9, 2018

BH18: Why so Spurious? How a Highly Error-Prone x86/x64 CPU "Feature" can be Abused to Achieve Local Privilege Escalation on Many Operating Systems

Nemanja Mulasmajic  and Nicolas Peterson are Anti-Cheat Engineers at Riot Games.

This is about a hardware feature available in Intel and AMD chips. The “feature” can be abused to achieve local privilege escalation.

CVE-2018-8897 – this is a local priv escalation – read and write kernel memory from usermode, and execute usermode code with kernel privileges. Affected Windows, Linux, MacOS, FreeBSD and some Xen configurations.

To fully understand this, you’ll need good knowledge of assembly and privilege models. In the standard model, Rings 1 and 2 are really never used, just Ring 3 (least privileged) and Ring 0 (most privileged) (a simplified view).

Hardware breakpoints cannot typically be set from userland, though there are often ways to do it via syscalls. When an interrupt fires, it transfers execution to an interrupt handler. Lookup is based off of the interrupt descriptor table (IDT), which is registered by the OS.

Segmentation is a vestigial part of the x86 architecture now that everything leverages paging.  You can still set arbitrary base addresses.  The first 2 bits describe if you’re in kernel or user mode. Depending on the mode of execution, the GS base means different things (it holds data structures relevant to the mode of execution). If we’re coming from user mode, we need to call SWAPGS to update to the equiv in kernel mode.

MOV SS and POP SS force the processor to disable external interrupts, NMIs and pending debug exceptions until the boundary of the instruction following the SS load was reached. The intended purpose was to prevent an interrupt from firing immediately after loading SS but before loading a stack pointer.

It was discovered while building a VM detection mechanism, as VMs were being used to attack Anti-Cheat. They thought – what if VMEXIT occurs during a “blocking” period? Let’s follow the CPUID… They started thinking about what would happen if they did interrupts at unexpected times.
So, what happens? Why did his machine crash? Before KiBreakpointTrap executes its first instruction, the pending #DB (which was suppressed by MOV SS) fires, and the #DB handler sends execution back to where it *thought* it came from – the kernel (though it had actually come from user mode).

Code can be found at github.com/nmulasmajic; if you aren’t patched, the system will crash. They showed a demo of 2 lines of assembly code putting a VM into a deadlock.

They can avoid SWAPGS since Windows thinks they are coming from kernel mode. WRGSBASE writes to the GS base address, so use that!

They fired a #DB exception at unexpected location, and then the kernel becomes confused. Handler thinks they are privileged, now they control GSBASE.  Now they just need to find instructions to capitalize on this…

They erroneously assumed there was no memory-operand encoding for MOV SS (e.g., MOV SS, [RAX]) – only immediate. MOV SS doesn’t dereference memory, but POP SS does dereference stack memory. BUT… POP SS is only valid in a 32-bit compatibility code segment, and on Intel chips SYSCALL cannot be used in compatibility mode. So… they focused on using INT # only.

With the goal of writing memory, they found that if they caused a page fault (KiPageFault) from kernel mode, they could call KeBugCheckEx again. This function dereferences GSBASE memory, which is under their control…

It clobbers surrounding memory. Had to make one CPU “stuck” to deal with writing to target location. Chose CPU1 since CPU0 had to service other incoming interrupts from APIC. CPU1 endlessly page faults, goes to the double fault handler when it runs out of stack space.

The goal was to load an unsigned driver. CPU0 does the driver loading. They attempted to send TLB shootdowns, forcing CPU0 to wait on the other CPUs by checking the PacketBarrier variable in its _KPCR. But CPU1 is in a dead spin… it will never respond. “Luckily”, though, there was a pointer leak in the _KPCR for any CPU, accessible from usermode. (The exploit does require a minimum of 2 CPUs.)

It is complicated, and it took the researchers more than a month to make it work. So, they looked into the syscall handler – KiSystemCall64, registered in the IA32_LSTAR MSR. SYSCALL, unlike INT #, will not immediately swap to kernel – which actually made things easier. (SYSCALL functions similarly to INT 3.)

Another cool demo!

A lot of this was patched in May. MS was very quick to respond, and most OSes should be patched by now. You can’t abuse SYSCALL anymore.

Lessons learned – want to make money on bug bounty? You need a cool name and a good graphic for your vuln (pay a designer!), and don’t forget a good soundtrack!

BH18: How I Learned to Stop Worrying and Love the SBOM

Allan Friedman  | Director of Cybersecurity, NTIA / US Department of Commerce

Vendors need to understand what they are shipping to the customer, need to understand the risks in what is going out the door. You cannot defend what you don’t know. Think about ingredients list on a box – if you know you have an allergy, you can simply check the ingredients and make a decision. Why should software/hardware we ship be any different?

There had been a bill before congress, requesting that there always be an SBOM (SW Bill of Materials) for anything the US Government buys – so they know what they are getting and how to take care of it. The bill was DoA, but things are changing…

The Healthcare Sector has started getting behind that. Now people at the FDA and in Washington are concerned about the supply chain. There should not be a health care way of doing this, an automotive way of doing this, a DoD way of doing this… there should be one way. That’s where the US Department of Commerce comes in. We don’t want this coming from a single sector.

Committees are the best way to do this – they are consensus based. That means it is stakeholder driven, no single person can derail. Think about it like “I push, but I don’t steer”.

We need Software Component Transparency. We need to compile the data, share it and use it.  Committee kicked off on July 19 in DC. Some folks believe this is a solved problem, but how do we make sure the existing data is machine readable? We can’t just say ‘use grep’. Ideally it could hook into tools we are already using.

First working group is tackling defining the problem. Another is working on case studies and state of practice. Others on standards and formats, healthcare proof of concept, and others.

We need more people to understand and poke at the idea of software transparency – it has real potential to improve resiliency across different sectors.

BH18: Keynote! Parisa Tabriz

Jeff Moss, founder of Black Hat, started out the first session at the top of the conference, noting several countries have only one person from their country here – Angola, Guadeloupe, Greece, and several others. About half of the world’s countries are represented here this year! Black Hat continues to offer scholarships to encourage a younger audience to attend, who may not be able to afford to. Over 200 scholarships were awarded this year!

To Jeff, it feels like the adversaries have strategies, and we have tactics – that’s creating a gap. Think about address spoofing – it’s allowed and turned on on popular mobile devices by default, though most consumers don’t know what it is and why they should turn it off.

With Adobe Flash going away, beliefs out there are this will increase SPAM and change that landscape. We need to think about that.

Parisa Tabriz, Director of Engineering, Google.
Parisa has worked as a pen tester, engineer and more recently as a manager. She has often felt she was playing a game of “whack-a-mole” – how do we get away from this? Where the same vuln (or a trivial variation of another vuln) pops up over and over. We have to be more strategic in our defense.
Blockchain is not going to solve our security problems. (no matter what the vendors in the expo tell you…)

It is up to us to fix these issues. We can make great strides here – but we have to realize our current approach is insufficient

We have to tackle the root cause, pick milestones and celebrate and build out your coalition.  We need to invest in bold programs – building that coalition with people outside of the security landscape.

We cannot be satisfied with just fixing vulnerabilities. We need to explore the cause and effect – what causes these issues.

Imagine a remote code execution (RCE) is found in your code – yes, fix it, but figure out why it was introduced (the 5 Whys)

Google started Project Zero – Make 0-Day Hard. Project Zero was formed in 2014 and treats Google products like 3rd party software. It has found thousands of vulnerabilities. But they want to achieve the most defensive impact from any vulnerabilities they find.

Team found that vendor response varied wildly in the industry – and it never really aligned with consumer needs. There is a power imbalance between security researcher and the big companies making the software. Project Zero has set a 90 day release time line, which has removed the negotiation between a researcher and the big company. A deadline driven approach causes pain for the larger organizations that need to make big changes – but it is leading to positive change at these companies. They are rallying and making the necessary fixes internally.

One vendor improved their patch response time by as much as 40%! 98% of the issues are fixed within the 90-day disclosure period – a huge change!  Unsure what all of those changes are, but guessing it’s improved processes, creating security response teams, etc.

If you care about end user security, you need to be more open. More transparency in Project Zero has allowed for more collaboration.

We all need to increase collaboration – but this is hard with corporate legal, process and policies. It’s important that we work to change this culture.

The defenders are our unsung heroes – they don’t win awards, often are not even recognized at their office. If they do their job well, nobody notices.

We lose steam in distraction driven work environments. We have to project manage, and keep driving towards this goal.

We need to change the status quo – if you’re not upsetting anyone, then you’re not going to change the status quo.

One project Google is doing to change the world is to move people away from HTTP and to HTTPS on the web platform. Not just Google services, but the entire world wide web. They wanted to see a web that was secure by default – not opt-in secure. The old Chrome UI didn’t make it obvious to users which site was the more secure one – something to work on.

Browser standards come from many standards bodies, like IETF, W3C, ISO, etc – and then people build browsers on top of those using their own designs. Going to HTTPS is not as simple as flipping a switch – need to worry about getting certificates, performance, managing the security, etc.

Did not want to create warning fatigue, or to have it be inconsistently reported (that is, a site reported as insecure on Chrome, but secure on another browser).

Needed to roll out these changes gradually, with specific milestones they could celebrate. Started with a TLS Haiku poetry competition, which led to brainstorming. Shared ideas publicly, got feedback from all over, and helped build support internally at Google to drive this. Published a paper on how best to warn users. Published papers on who was and was not using HTTPS.

Started a grass root effort to help people migrate to HTTPS. Celebrated big conversions publicly, recognizing good actors.  Vendors were given a deadline to transition to, with clear milestones to work against, and could move forward. Had to work with certificate vendors to make it easier and cheaper to get certificates.

Team ate homemade HTTPS cake and pie! It is important to celebrate accomplishments, acknowledge the difficult work done. People need purpose – it will drive and unify them.

Chrome set out with an architecture that would protect your physical machine from a malicious site. But now, with lots of data out there in the cloud, cross-site data attacks have grown. Google’s Chrome team started the Site Isolation project in 2012 to prevent data from moving that way.

We need to continue to invest in ambitious proactive defensive projects.

Projects can fail for a variety of reasons – management can kill the project, for example. The Site Isolation project was originally estimated to take a year, but it actually took six… a schedule delay at that level puts a bulls-eye on you. Another issue could be lack of peer support – be a good team player and don’t be a jerk!