Wednesday, July 26, 2017

BHUSA17: How We Created the First SHA-1 Collision and What It Means for Hash Security

Elie Bursztein

There are two other types of attacks, besides collision attack, and those are pre-image attacks (not feasible at this time).

When doing this attack, could not use bruteforce as you would need millions of years of bruteforcing with a GPU - we don't have that much time. Better to use cryptanalysis!

Before you can attack, you must understand how a hash function is constructed. In SHA1, we use a Merkle-Damgard construction. This works by taking the first block of the file, adding an IV and perform compression. Then go block by block until you get to the end and you have a short string of compression that is related to the file.

We got  a cool picture of what SHA1 compression looks like "unrolled". To do cryptanlysis, need to look at the messages differential path and the equation system (we have solved 16 steps and that makes it predictable to step 24).

It turns out it is easier to do collision wit two blocks instead of 1 block. Find two blocks that have almost the same hash for a near collision, then resolve the difference to get a full collision.

So - how do you exploit this? Look at a fixed prefix attack for SHA1. Finda prefix before you find a collision. If you carefully choose the prefix, you can improve the attack.

Let's take a look at real world attacks exploiting MD5. In 2009 there was an SSL certificate forgery. This was exploited by leveraging wildcard (instead of domain name) and adding the old public key and signature as a "comment" in the new certificate.

In 2012 there was massive malware called "Flame" that was used to spy on Iranian computers with fake windows update certificates. The collision in practice used 4 blocks, instead of 2. Shows that the attackers had their own cryptographer.

As of today, MD5 is broken - you can create collisions on your cell phone. For SH1 you still need a lot of time with current computing power.

So - how do create a new collision? Choose carefully what prefix you want, as you cannot change it after the fact. Through 2015 and 2016, worked on near-collision blocks. Used about 3000 CPUS for about 8 months to calculate a near collision. Throughout 2016 worked on a full collision attack and 2017 the attack was completed.

You have to find a tradeoff between failure and efficiency and continue to scale the computation. This team did it in 1 hour batches.

In 2016 found their first collision - had to spend a few days analyzing it, and they found a problem. It wasn't usable for finding a full collision.

The team had to make efficient use of GPUs. Work step by step to generate enough solutions for the next step, always try to work at the highest step and backtrack when pool empty.  Also, the work was parallelized: one thread and one solution.

The new attack is called Shattered - it takes 110 GPU for 1 year, which is less than 12milion years for bruteforce. We saw a demo of getting two different PDF files to hash to the same SHA1 hash (but still have different SHA2 hashes)

This is already being exploited, for example on WebKit.  A developer submitted a test to prove WebKit was resistant but tripped an unforeseen bug in SVN and took the site down.

The collision is in the wild - what to do with the legacy software? SHA1 is deeply integrated into GIT - how can they protect themselves? They can test for collisions and has neglible false positives.

The takeaway: SHA1 is dead. Do not do new deployments with it and try to move away from it. We need to continue doing counter-cryptanalysis and keep in mind hash diversity.