Thursday, November 5, 2015

ICMC15: Improved Approaches to Online Health Testing in SP800-90 RNGs

David Johnston, Hardware Security Architect, Intel

There are two basic types of RNGs: big and fast, small and slow.

You need online testing to be sure the output is actually random: a statistical test for the nondeterministic part, plus a logic integrity test for the deterministic part (BIST, SCAN, KAT).

You can only test for a broken state - strong bias or a strong serial correlation coefficient (SCC).

Min entropy tests are too slow and data hungry to run online. All patterns are equally likely (even all zeros), but some patterns are characteristic of a broken state: strong bias or a strong serial correlation coefficient (SCC).

A nice test for "broken - maybe": note the binomial distribution of short and long patterns in a number of fully random bits. Set bounds for each pattern count, then measure each over a sample and check they are all within the bounds. If any is outside, tag the sample as unhealthy. The bounds determine the false positive rate.
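To make the link between bounds and false positive rate concrete, here is a minimal sketch for the simplest pattern, the count of ones in an n-bit sample, which is exactly binomial for ideal random bits. The function name and the example bounds are illustrative assumptions, not values from the talk.

```python
from math import comb

def false_positive_rate(n_bits, lo, hi):
    """Chance that an ideal random n-bit sample has a ones count outside [lo, hi]."""
    inside = sum(comb(n_bits, k) for k in range(lo, hi + 1))
    return 1.0 - inside / 2 ** n_bits

# Hypothetical bounds on the ones count of a 256-bit sample.
for lo, hi in [(112, 144), (104, 152), (96, 160)]:
    print(lo, hi, false_positive_rate(256, lo, hi))
```

Widening the bounds lowers the false positive rate, at the cost of letting more biased samples through.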

The advantage: it's cheap. A shift register, 6 comparators and 6 counters. It spots all repeating patterns up to 6 bits in length and detects bias and correlation. The result is highly bimodal with stationary data of some bias and autocorrelation. Intel CPUs do this over 256-bit samples and aim for a 1% false positive rate.
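A software sketch of this kind of pattern-counting test might look like the following. The six patterns, the `k`-sigma bounds and all function names are assumptions chosen for illustration; the bounds use a binomial approximation even though overlapping windows are not strictly independent, and none of this is claimed to match the hardware's actual parameters.

```python
import math
import secrets

# Illustrative pattern set: six counters, one per pattern.
PATTERNS = ["1", "01", "010", "0110", "101", "1001"]

def count_pattern(bits, pattern):
    """Count (overlapping) occurrences of pattern in a string of '0'/'1'."""
    width = len(pattern)
    return sum(1 for i in range(len(bits) - width + 1)
               if bits[i:i + width] == pattern)

def bounds(n_bits, pattern, k=4.0):
    """Approximate mean +/- k*sigma bounds for the pattern count in ideal random bits."""
    windows = n_bits - len(pattern) + 1
    p = 0.5 ** len(pattern)
    mean = windows * p
    sigma = math.sqrt(windows * p * (1 - p))
    return mean - k * sigma, mean + k * sigma

def is_healthy(bits):
    """Tag a sample healthy only if every pattern count is within its bounds."""
    for pattern in PATTERNS:
        lo, hi = bounds(len(bits), pattern)
        if not lo <= count_pattern(bits, pattern) <= hi:
            return False
    return True

sample = format(secrets.randbits(256), "0256b")
print(is_healthy(sample))        # usually True for a good source
print(is_healthy("0" * 256))     # stuck-at-zero sample
print(is_healthy("01" * 128))    # strongly correlated sample
```

The stuck and alternating samples fail because their counts for "1" and "01" fall far outside the bounds, which is the bias and serial correlation detection described above.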

For an entropy source OHT, what's an error when all patterns are equally likely?

A lower false positive rate means you'll likely have a higher false negative rate. How do you know if the source is really broken or not?

There's a basic principle: never throw away entropy. - Margaret Salter. If you discard the unhealthy tagged samples, you reduce the entropy. If you accept unhealthy tagged samples, you risk false negatives. Simple: extract with output = MAC(last_output || MAC(Xi || ... || Xi+r)), where the run Xi ... Xi+r spans as many samples as are needed to contain the necessary number of healthy tagged samples. Also mixed in are the unhealthy samples, which aren't counted. Suspicious of MACing over a variable field length? See his references (#7).
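The extraction rule above can be sketched in a few lines. This is only an illustration under stated assumptions: HMAC-SHA256 stands in for whatever MAC the hardware uses, and the class name, the fixed `key` and the `healthy_needed` parameter are hypothetical.

```python
import hashlib
import hmac

def mac(key, data):
    # Stand-in MAC (assumption): HMAC-SHA256 from the standard library.
    return hmac.new(key, data, hashlib.sha256).digest()

class HealthTaggedPool:
    def __init__(self, key, healthy_needed=3):
        self.key = key
        self.healthy_needed = healthy_needed
        self.last_output = bytes(32)
        self.pending = []        # every sample since the last output
        self.healthy_count = 0   # only healthy tagged samples are counted

    def add_sample(self, sample, healthy):
        """Mix in a sample; return new output once enough healthy ones arrived."""
        self.pending.append(sample)   # unhealthy samples are mixed in too
        if healthy:
            self.healthy_count += 1
        if self.healthy_count < self.healthy_needed:
            return None
        inner = mac(self.key, b"".join(self.pending))
        self.last_output = mac(self.key, self.last_output + inner)
        self.pending.clear()
        self.healthy_count = 0
        return self.last_output

pool = HealthTaggedPool(key=b"k" * 32)
for sample, tag in [(b"a" * 32, True), (b"b" * 32, False),
                    (b"c" * 32, True), (b"d" * 32, True)]:
    print(pool.add_sample(sample, tag))
```

No sample is ever dropped: an unhealthy tag only delays the output, it never removes data from the MAC input.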

Over many samples, remember just the 1-bit tag per sample, which allows a test over lots of data without huge amounts of memory. Count the number N of healthy tags (the Healthy:Unhealthy ratio) in the last M samples. Intel CPUs have M=256, so the history statistic is over 64 Kibit of data.

If it drops below the threshold of goodness, detect that and take the source offline. Ideally, it never happens.
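A sliding window over the 1-bit tags is enough to implement this history check. The sketch below is an assumption-laden illustration: the `min_healthy` threshold of 128 is made up, and so is the choice to give no verdict until the window has filled.

```python
from collections import deque

class HealthHistory:
    def __init__(self, window=256, min_healthy=128):
        self.tags = deque(maxlen=window)   # one bit of memory per sample
        self.min_healthy = min_healthy     # hypothetical threshold

    def record(self, healthy):
        """Store one tag; return True while the source should stay online."""
        self.tags.append(bool(healthy))
        if len(self.tags) < self.tags.maxlen:
            return True   # assumption: no verdict until the window is full
        return sum(self.tags) >= self.min_healthy

# 256 tags, each covering a 256-bit sample, summarize 65536 bits (64 Kibit).
print(256 * 256)
```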

What makes pool feedback good? E.g., Intel's CPUs demand 768 bits of healthy entropy, MACed down to 256 bits of full entropy, but all intervening unhealthy samples are mixed in too, so no entropy is thrown away and occasional false positives don't raise an error response.

What's wrong with SP800-90C?

The ENRBG output offers a superset of the DRBG's cryptographic properties: full entropy versus prediction computational complexity, respectively. So, a general purpose RNG needs both: a DRBG for performance and an ENRBG for arbitrary strength keys and seeding.

The oversampling construction kills performance of DRBG output by forcing intervening reseeds. Unless you put in two DRBGs, doubling the area, doubling the failure rate...

You can add a BIW extractor, and a DRBG for XOR construction.

We want better reliability and performance without a DRBG. A modern full entropy ES+Extractor has higher bits/clock/um^2 than an AES-CTR-DRBG. The AES-CTR-DRBG's best case is 128 bits / 10 clocks / 31K gate equivalents = 4.13E-4 bits per clock per gate (asymptotic as the gen:update ratio -> 1.0).
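The efficiency figure is plain arithmetic on the three numbers quoted; a quick check, with a hypothetical helper name:

```python
def bits_per_clock_per_gate(bits, clocks, gate_equivalents):
    """Throughput per clock, normalized by circuit size."""
    return bits / clocks / gate_equivalents

# 128 bits per 10 clocks in 31K gate equivalents, as quoted in the talk.
print(f"{bits_per_clock_per_gate(128, 10, 31_000):.2e}")
```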

And then we have FIPS 140-2, which requires tests that go beyond SP 800-90A. Section 4.9.2 includes a data-modifying test that throws away data. Following it yields an output stream trivially distinguishable from random: no equal pairs of adjacent 16-bit values in 1 MByte of data means definitely not random, since 16 are expected. It creates the algebraic invariant Xi != Xi+1 for all output values, reducing entropy and helping algebraic attacks.
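The distinguisher is easy to demonstrate. The sketch below models the test as "discard any block equal to the previously output block", which is a simplification of the variant described in the talk; the function names are hypothetical, and it uses small 8-bit blocks only so that repeats show up quickly.

```python
import secrets

def discard_repeats(blocks):
    """Drop every block that equals the previously output block."""
    out = []
    for block in blocks:
        if out and block == out[-1]:
            continue   # data is thrown away here
        out.append(block)
    return out

def equal_adjacent_pairs(blocks):
    return sum(1 for a, b in zip(blocks, blocks[1:]) if a == b)

def expected_equal_pairs(n_blocks, block_bits):
    """Expected number of equal adjacent pairs in ideal random blocks."""
    return (n_blocks - 1) / 2 ** block_bits

raw = [secrets.randbits(8) for _ in range(100_000)]
filtered = discard_repeats(raw)
print("expected in random data:", expected_equal_pairs(len(raw), 8))
print("observed in raw stream: ", equal_adjacent_pairs(raw))
print("observed after the test:", equal_adjacent_pairs(filtered))
```

The filtered stream never contains an equal adjacent pair, so anyone who sees enough output can tell it apart from a truly random stream.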

Intel refused to put this in its silicon because it may be a back door. The risk to Intel of having a back door is greater than the cost of not being FIPS compliant.

ISO 19790-2012 removed this test - so, let's hurry up with FIPS 140-3.

But still, resource constrained devices can't have hardware FIPS compliant RNGs because of the DRBG requirement.

Pool structures that use health tagging allow appropriate adaptive responses to entropy source failure and degradation behaviour, and instantaneous response to instantaneous ES failure.

The DRBG requirements of SP800-90C lead to a reduction in reliability and/or efficiency of RNGs and prevent SP800-90 compliant full entropy hardware RNGs in resource constrained situations. FIPS 140-2 makes this worse.

Johnston has seen systems get the decision of when to block flipped - that is, blocking when they have high entropy and not blocking when their entropy is low. This is not a good plan, and has led to published papers.

The kernel's job is to do the mixing. He knows that Intel's entropy source doesn't have a back door, but not everyone else knows that - or knows that about their other sources - so, mixing is good.
Post by Valerie Fenwick, syndicated from Security, Beer, Theater and Biking!