* Re: random(4) changes
2016-04-27 0:23 ` George Spelvin
@ 2016-04-27 18:03 ` George Spelvin
2016-04-28 20:15 ` Stephan Mueller
2016-04-29 0:47 ` George Spelvin
2 siblings, 0 replies; 38+ messages in thread
From: George Spelvin @ 2016-04-27 18:03 UTC (permalink / raw)
To: andi
Cc: herbert, linux, linux-crypto, linux-kernel, sandyinchina,
smueller, tytso
Andi Kleen wrote:
> There is also the third problem of horrible scalability of /dev/random
> output on larger systems, for which patches are getting ignored.
I came up with some very pretty code to fix this, which unfortunately
tried to copy_to_user() with a lock held.
After all my attempts to fix that fatal flaw resulted in much uglier
code, I set it aside for a while in the hopes that inspiration would
strike... and it's still sitting unfinished. :-(
But I want to finish it, honest! This latest discussion has
made me acutely conscious of it.
The fact that the scope of changes just got bigger doesn't help of
course, but I *have* picked it up again.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-27 0:23 ` George Spelvin
2016-04-27 18:03 ` George Spelvin
@ 2016-04-28 20:15 ` Stephan Mueller
2016-04-29 7:29 ` George Spelvin
2016-04-29 0:47 ` George Spelvin
2 siblings, 1 reply; 38+ messages in thread
From: Stephan Mueller @ 2016-04-28 20:15 UTC (permalink / raw)
To: George Spelvin; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
On Tuesday, 26 April 2016, 20:23:46, George Spelvin wrote:
Hi George,
> > And considering that I only want to have 0.9 bits of entropy, why
> > should I not collapse it? The XOR operation does not destroy the existing
> > entropy, it only caps it to at most one bit of information theoretical
> > entropy.
>
> No. Absolutely, demonstrably false.
>
> The XOR operation certainly *does* destroy entropy.
> If you have 0.9 bits of entropy to start, you will have less
> after the XOR. It does NOT return min(input, 1) bits.
As I am having difficulties following your explanation, let us start at the
definition:
XOR is defined as an entropy preserving operation, provided the two arguments
to the XOR operation are statistically independent (let us remember that
caveat for later).
That means, the entropy behavior of H(A XOR B) is max(H(A), H(B)) if they are
independent. For example, A has 5 bits of entropy and B has 7 bits of entropy,
A XOR B has 7 bits of entropy. Similarly, if A has zero bits of entropy the
XORed result will still have 7 bits of entropy from B. That applies regardless
of the size of A or B, including one bit sized chunks. The same applies when
XORing more values:
A XOR B XOR C = (A XOR B) XOR C
Now, the entropy behaves like:
max(max(H(A), H(B)), H(C)) = max(H(A), H(B), H(C))
Now, with that definition, let us look at the LRNG method. The LRNG obtains a
time stamp and uses the low 32 bits of it. The LRNG now slices those 32 bits
up in individual bits, let us call them b0 through b31.
The LRNG XORs these individual bits together. This means:
b0 XOR b1 XOR b2 XOR ... XOR b31
This operation gives us one bit.
How is the entropy behaving here? Let us use the definition from above:
H(XORed bit) = max(H(b0), H(b1), ..., H(b31))
We know that each individual bit can hold at most one bit of entropy. Thus the formula
implies that the XOR operation in the LRNG can at most get one bit of entropy.
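The folding operation described above amounts to taking the parity of the 32-bit time stamp. An illustrative Python sketch (not the LRNG source; names are mine):

```python
def fold(ts):
    """XOR the 32 individual bits b0..b31 of a time stamp into one bit."""
    bit = 0
    for i in range(32):
        bit ^= (ts >> i) & 1
    return bit

# Equivalently, fold() is the parity of the 32-bit value:
assert fold(0xFFFFFFFF) == 0  # an even number of 1-bits XORs to 0
assert fold(0x00000001) == 1  # an odd number of 1-bits XORs to 1
```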
Given these findings, I now have to show and demonstrate that:
1. the individual bits of a given 32 bit time stamp are independent (or IID in
terms of NIST)
2. show that the maximum entropy of each of the individual bits is equal or
more to my entropy estimate I apply.
Regarding 1: The time stamp (or cycle counter) is a 32 bit value where each
of the bits does not depend on the other bits. When considering one and only
one time stamp value and we look at, say, the first 20 bits, there is no way
it is clear what the missing 12 bits will be. Note I am not saying that when
comparing two or more time stamps that one cannot deduce the bits! And here it
is clear that the bits within one given time stamp are independent, but
multiple time stamps are not independent. This finding is supported with
measurements given in 3.4.1 (I understand that the measurements are only
supportive and no proof). Figure 3.1 shows an (almost) rectangular
distribution which is the hint to an equidistribution which in turn supports
the finding that the individual bits within a time stamp are independent. In
addition, when you look at the Shannon/Min Entropy values (which do not give
an entropy estimate here, but only help in understanding the distribution!),
the values show that the distribution has hardly any discontinuities -- please
read the explanation surrounding the figure.
Regarding 2: I did numerous measurements that show that the low bits do have
close to one bit of entropy per data bit. If I may ask to consider section
3.4.1 again (please consider that I tried to break the logic by applying a
pathological generation of interrupts here to stimulate the worst case): The
entropy is not found in the absolute time stamps, but visible in the time
deltas (and the uncertainty of the variations of those). So I calculated the
time deltas from the collected set of time stamps of events. Now, when simply
using the four (you may also use three or perhaps five) lower bits of the time
delta values, we can calculate an interesting and very important Minimum
Entropy value: the Markov Min Entropy. Using the table 2, I calculated the
Markov Min Entropy of the data set of the 4 low bit time delta values. The
result shows that the 4 bit values still have 3.92 bits of entropy (about 0.98
bits of entropy per data bit). Ok, one worst case measurement may not be good
enough. So I continued on other environments with the same testing. Table 3
provides the results on those environments. And they have even more entropy
than the first measurement! So, with all the measurements I always see that
each of the four low bits has around 0.98 bits of entropy. Thus, with the XOR
value I can conclude that these measurements show that the XOR result will
have 0.98 bits of Markov Min Entropy based on these measurements.
Please note that I assume an entropy content of 256/288 (about 0.889) bits of
entropy per data bit, which is slightly less than 0.9. This lower level is
significantly less than the measured values -- a safety margin.
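For illustration, here is a Python sketch of a min entropy estimate over 4-bit samples. Note this is the plain i.i.d. min entropy, not the Markov variant used in the paper (which additionally conditions on the previous sample); the function name and data are mine:

```python
import math
from collections import Counter

def min_entropy(samples):
    """Min entropy of a sample set: -log2 of the most probable value."""
    counts = Counter(samples)
    p_max = max(counts.values()) / len(samples)
    return -math.log2(p_max)

# Perfectly uniform 4-bit deltas would yield exactly 4.0 bits; a measured
# 3.92 bits therefore means the distribution is nearly uniform.
uniform = list(range(16)) * 1000
assert abs(min_entropy(uniform) - 4.0) < 1e-9
```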
Although that marks the conclusion of the XOR folding assessment, let me
explain why this XOR folding operation provides another helping hand. The
measurement of the time deltas in 3.4.1, particular figure 3.2 shows that the
time delta has even 11 bits of ("regular") Min Entropy. So, don't I waste a
lot of entropy with the XOR folding? Apart from having more safety margins in
case the overall delta values have less variations than I measured in my worst
case testing, there is another factor at play:
As I have explained above, the XOR collapse is applied to the time stamps.
Those time stamps show statistical dependencies (and maybe even to a lesser
degree the time deltas have some dependencies too). We fold the time stamps
and then concatenate them -- concatenation is not affected by the statistical
dependencies. At one point in time we have a wrap-around in the entropy pool.
The current entropy pool is 4096 bits in size (which can be arbitrarily
changed to a minimum of 256 bits), so we wrap after 4096 received events. Now,
we XOR the bit from the first interrupt with the bit from the 4097th
interrupt. To ensure that the XOR operation is entropy preserving, these bits
must be statistically independent. And to ensure that, the collapsing of the
time stamp and the seeming loss of entropy helps here too! So, we give up
entropy to "buy" statistical independence to support the XOR operation here.
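The concatenate-and-wrap behaviour described here can be sketched in Python (illustrative only; the class, names, and bit-array representation are mine, not the LRNG code):

```python
POOL_BITS = 4096  # default pool size; configurable down to 256 bits

class BitPool:
    def __init__(self, bits=POOL_BITS):
        self.pool = [0] * bits
        self.count = 0

    def mix(self, folded_bit):
        # Event N and event N + pool-size land on the same position, so on
        # wrap-around their folded bits are XORed together.
        self.pool[self.count % len(self.pool)] ^= folded_bit
        self.count += 1

p = BitPool(bits=8)
for b in [1, 0, 1, 1, 0, 0, 1, 0,   # first pass fills the pool
          1]:                       # 9th event wraps and XORs into slot 0
    p.mix(b)
assert p.pool[0] == 0   # 1 XOR 1 == 0: the two events' bits combined
```

This is exactly where the statistical-independence requirement bites: the bit from event N and the bit from event N + 4096 meet in an XOR.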
With section 3.4.2 I apply a large array of statistical tests against a bit
stream of folded bits. All of those tests pass, indicating that the bit stream
behaves like White Noise without any whitening logic like LFSR or even
hashing. Thus, this testing supports my analysis from above.
The root cause for not applying an LFSR or another mix-in function is that
such LFSR is already a whitening logic. But I do have a whitening logic
already with the DRBG. So, to me, having a whitening logic whose output is used
by another whitener is akin to having to hide some deficiencies like skews
or other problems in your noise source. But I have nothing to hide at the
layer of the noise source.
Ciao
Stephan
^ permalink raw reply [flat|nested] 38+ messages in thread

* Re: random(4) changes
2016-04-28 20:15 ` Stephan Mueller
@ 2016-04-29 7:29 ` George Spelvin
2016-04-29 8:02 ` Stephan Mueller
0 siblings, 1 reply; 38+ messages in thread
From: George Spelvin @ 2016-04-29 7:29 UTC (permalink / raw)
To: linux, smueller; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
From smueller@chronox.de Fri Apr 29 04:56:49 2016
From: Stephan Mueller <smueller@chronox.de>
To: George Spelvin <linux@horizon.com>
Cc: herbert@gondor.apana.org.au, linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org, sandyinchina@gmail.com, tytso@mit.edu
Subject: Re: random(4) changes
Date: Thu, 28 Apr 2016 22:15:17 +0200
User-Agent: KMail/4.14.10 (Linux/4.4.7-300.fc23.x86_64; KDE/4.14.18; x86_64; ; )
In-Reply-To: <20160427002346.12354.qmail@ns.horizon.com>
References: <20160427002346.12354.qmail@ns.horizon.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
> On Tuesday, 26 April 2016, 20:23:46, George Spelvin wrote:
> Hi George,
> > And considering that I only want to have 0.9 bits of entropy, why
> > should I not collapse it? The XOR operation does not destroy the existing
> > entropy, it only caps it to at most one bit of information theoretical
> > entropy.
>
> No. Absolutely, demonstrably false.
>
> The XOR operation certainly *does* destroy entropy.
> If you have 0.9 bits of entropy to start, you will have less
> after the XOR. It does NOT return min(input, 1) bits.
> As I am having difficulties following your explanation, let us start at the
> definition:
> XOR is defined as an entropy preserving operation, provided the two arguments
> to the XOR operation are statistically independent (let us remember that
> caveat for later).
> That means, the entropy behavior of H(A XOR B) is max(H(A), H(B)) if they are
> independent.
Actually, no. If they're independent, it can be greater.
For example, if A has half a bit of entropy, and B has half a bit
(both in the Shannon sense), then A XOR B will have 0.713537
bits.
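The 0.713537 figure can be checked numerically. A Python sketch (the bisection merely inverts the binary entropy function; helper names are mine):

```python
import math

def H(p):
    """Binary Shannon entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def p_for_entropy(h):
    """Find p in [0, 1/2] with H(p) == h, by bisection (H is increasing there)."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if H(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = p_for_entropy(0.5)        # bias giving half a bit of Shannon entropy
p_xor = 2 * p * (1 - p)       # P(A XOR B = 1) for independent A, B
assert abs(H(p_xor) - 0.7135) < 1e-3
```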
> 1. the individual bits of a given 32 bit time stamp are independent
> (or IID in terms of NIST)
They're not independent, nor are they identically distributed.
If they were identically distributed, they'd all have identical
entropy. And there'd be no reason to stop at 32 bits. If the high
32 bits have the same entropy as the low 32 bits, shouldn't we use their
entropy too?
> 2. show that the maximum entropy of each of the individual bits is equal or
> more to my entropy estimate I apply.
I'm not sure what you mean here. When you say "maximum entropy" is that
a non-strict upper bound?
Or does that mean that at least one bit achieves that maximum?
That will be a much more interesting proof.
> Regarding 1: The time stamp (or cycle counter) is a 32 bit value where
> each of the bits does not depend on the other bits. When considering one
> and only one time stamp value and we look at, say, the first 20 bits,
> there is no way it is clear what the missing 12 bits will be.
If you deliberately exclude all external data, then a 32-bit
constant is random. (I suggest 17, the "most random number".)
But that's meaningless. When we talk about "entropy", we are talking
about an attacker's uncertainty about the value. Any other measure is
garbage in, and produces nothing but garbage out.
> Note I
> am not saying that when comparing two or more time stamps that one
> cannot deduce the bits! And here it is clear that the bits within
> one given time stamp are independent, but multiple time stamps are
> not independent. This finding is supported with measurements given in
> 3.4.1 (I understand that the measurements are only supportive and no
> proof). Figure 3.1 shows an (almost) rectangular distribution which is
> the hint to an equidistribution which in turn supports the finding that
> the individual bits within a time stamp are independent. In addition,
> when you look at the Shannon/Min Entropy values (which do not give an
> entropy estimate here, but only help in understanding the distribution!),
> the values show that the distribution has hardly any discontinuities --
> please read the explanation surrounding the figure.
^ permalink raw reply [flat|nested] 38+ messages in thread

* Re: random(4) changes
2016-04-29 7:29 ` George Spelvin
@ 2016-04-29 8:02 ` Stephan Mueller
2016-04-29 9:34 ` George Spelvin
0 siblings, 1 reply; 38+ messages in thread
From: Stephan Mueller @ 2016-04-29 8:02 UTC (permalink / raw)
To: George Spelvin; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
On Friday, 29 April 2016, 03:29:50, George Spelvin wrote:
Hi George,
> > 1. the individual bits of a given 32 bit time stamp are independent
> >
> > (or IID in terms of NIST)
>
> They're not independent, nor are they identically distributed.
That is an interesting statement: you say that the time stamp has holes in it,
i.e. some values have zero probability of being selected! Second, you imply
that when bit x of a given time stamp has some particular value, bit y can be
deduced from bit x.
I have not experienced that nor do I see any hints for that claim.
>
> If they were identically distributed, they'd all have identical
> entropy. And there'd be no reason to stop at 32 bits. If the high
> 32 bits have the same entropy as the low 32 bits, shouldn't we use their
> entropy too?
There is absolutely no limit to the 32 bits. We easily can take the high bits
too. But we know (as you mention below), an attacker has more and more
knowledge about the selected bits the higher the bit is as he can predict an
event with a certain degree of probability.
Thus, mixing in the high 32 bits does not hurt here from a mathematical point of
view. But from a technical point of view, it hurts: we know that these high 32 bits have
hardly any entropy relative to the attacker. Thus, we would mix in bits that
do not really help us in the entropy collection. But the processing still
requires CPU cycles -- for each interrupt. Thus, to prevent wasting CPU
cycles, I think that the high 32 bits should be discarded.
But if people say that they want them considered too, I have no problems in
adding them.
>
> > 2. show that the maximum entropy of each of the individual bits is equal
> > or
> >
> > more to my entropy estimate I apply.
>
> I'm not sure what you mean here. When you say "maximum entropy" is that
> a non-strict upper bound?
>
> Or does that mean that at least one bit achieves that maximum?
Exactly that -- I have to show that at least one bit out of the 32 bit value
reaches that maximum, i.e. one bit has more entropy than my entropy estimate.
>
> That will be a much more interesting proof.
>
> > Regarding 1: The time stamp (or cycle counter) is a 32 bit value where
> > each of the bits does not depend on the other bits. When considering one
> > and only one time stamp value and we look at, say, the first 20 bits,
> > there is no way it is clear what the missing 12 bits will be.
>
> If you deliberately exclude all external data, then a 32-bit
> constant is random. (I suggest 17, the "most random number".)
>
> But that's meaningless. When we talk about "entropy", we are talking
> about an attacker's uncertainty about the value. Any other measure is
> garbage in, and proiduces nothing but garbage out.
Correct. Please attack the, say, low 4 or 5 bits of a high-res timer so that
you can predict their values with a certain confidence for the observed events
(in the legacy /dev/random, that is a hdd event, a HID event and an interrupt
-- all of those events are user-triggerable). If you achieve that, you broke,
well, almost all timer based noise sources -- be it the legacy /dev/random, be
it OpenBSD, XNU, you name it.
Note, I thought I could attack the legacy /dev/random HID noise source using the
X11 logic: if one has access to the X11 server, one can see all HID events. I
measured its RDTSC time and obtained the respective RDTSC time from the legacy
/dev/random event processing. There are about 500,000,000 clock ticks in
variations between both measurements. For a ping flood from a VMM host to a
virtual machine guest, I get down to 11 bits variations. I even measured RDTSC
timers (see my Jitter RNG measurements) in a tight loop where interrupts are
immediately spotted -- the variations of those interrupts are also in
the vicinity of 10 or 11 bits.
Regardless of what I am doing, I do not see that I can get below 10 bits of
"accuracy" in predicting an RDTSC time stamp of an event processed by the
legacy /dev/random.
Maybe I am not smart enough for attacking the system. Maybe others are smarter
than me and find a way to attack it to get to 5 or 6 bits of accuracy. Yet,
this means there is way more entropy than I need -- this is my first safety
margin.
Ciao
Stephan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 8:02 ` Stephan Mueller
@ 2016-04-29 9:34 ` George Spelvin
2016-04-29 9:53 ` Stephan Mueller
0 siblings, 1 reply; 38+ messages in thread
From: George Spelvin @ 2016-04-29 9:34 UTC (permalink / raw)
To: linux, smueller; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
(Note that we have two chains of e-mails crossing mid-stream. I'm in
the middle of working on a much longer reply to your previous e-mail.)
>> They're not independent, nor are they identically distributed.
> That is an interesting statement: you say that the time stamp has holes
> in it, i.e. some values have zero probability of being selected!
That's not at all what I said. It may be true, depending on Intel's
TSC implementation, but I didn't say or imply it.
> Second, you imply that when bit x of a given time stamp has some
> particular value, bit y can be deduced from bit x.
Yes. For example, bit 30 can be deduced from bit 31, given our
assumption that the attacker has knowledge of previous timestamps, and
likely inter-interrupt times. If bit 31 has changed, bit 30 is almost
certainly zero. The bits are not independent.
The distribution of bit 31 is, with very high probability, equal to that
in the previous timestamp. Bit 0, not so much.
In other words, bits 31 and 0 have different distributions. They are
not identically distributed.
I gave this example in my previous e-mail
Message-ID: <20160429004748.9422.qmail@ns.horizon.com>
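The bit-31/bit-30 deduction can be checked by brute force. A Python sketch (the starting value and gap range are illustrative assumptions):

```python
# When bit 31 of a 32-bit counter flips between two nearby samples, the
# counter has just crossed a multiple of 2**31, so the low-order part of
# the new sample is small and bit 30 of it is almost certainly 0.
prev = 0x7FFFFF00
for delta in range(1, 4096):                  # plausible inter-sample gaps
    cur = (prev + delta) & 0xFFFFFFFF
    if (cur >> 31) & 1 != (prev >> 31) & 1:   # bit 31 changed ...
        assert (cur >> 30) & 1 == 0           # ... so bit 30 is 0
```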
>> If they were identically distributed, they'd all have identical
>> entropy. And there'd be no reason to stop at 32 bits. If the high
>> 32 bits have the same entropy as the low 32 bits, shouldn't we use their
>> entropy too?
> There is absolutely no limit to the 32 bits. We easily can take the high bits
> too. But we know (as you mention below), an attacker has more and more
> knowledge about the selected bits the higher the bit is as he can predict an
> event with a certain degree of probability.
Yes, an attacker has more information about higher bits.
This is the definition of NOT identically distributed!
*If* they were identically distributed, a suggestion I'm pointing
out the ridiculous implications of, then an attacker's knowledge
of each of them would be identical.
If that were the case (and it's not), then the high 32 bits would be as
good a source of entropy as the low 32 bits.
>>> 2. show that the maximum entropy of each of the individual bits is equal
>>> or more to my entropy estimate I apply.
>>
>> I'm not sure what you mean here. When you say "maximum entropy" is that
>> a non-strict upper bound?
>>
>> Or does that mean that at least one bit achieves that maximum?
> Exactly that -- I have to show that at least one bit out of the 32
> bit value reaches that maximum, i.e. one bit has more entropy than my
> entropy estimate.
That will be an interesting claim to argue for. Where do you make it?
>>> Regarding 1: The time stamp (or cycle counter) is a 32 bit value where
>>> each of the bits does not depend on the other bits. When considering one
>>> and only one time stamp value and we look at, say, the first 20 bits,
>>> there is no way it is clear what the missing 12 bits will be.
>> If you deliberately exclude all external data, then a 32-bit
>> constant is random. (I suggest 17, the "most random number".)
>>
>> But that's meaningless. When we talk about "entropy", we are talking
>> about an attacker's uncertainty about the value. Any other measure is
>> garbage in, and produces nothing but garbage out.
> Correct.
You mean that I'm correct that your description of the timestamp bits
as independent is meaningless?
> Maybe I am not smart enough for attacking the system. Maybe others are
> smarter than me and find a way to attack it to get to 5 or 6 bits of
> accuracy. Yet, this means there is way more entropy than I need --
> this is my first safety margin.
I agree that the amount of entropy per timing sample is almost
certainly much higher than the current /dev/random credits it for.
The hard part is proving it. All a statistical test can show is
that its model has a hard time predicting the output. It can't show
tha non-existence of a better model. That's why /dev/random is so
conservative.
Many years ago, when clock rates were below 1 GHz, I wrote a small kernel
module which disabled all other interrupts, and did nothing but take timer
interrupts and capture TSC values to RAM. (It stopped when the buffer
was full and let the system continue.) This was on a system with both
CPU and timer clocks generated from a single crystal by PLL.
I got a nice gaussian distribution of interrupt timings, relative
to a best-fit line, with a standard deviation of about 8 cycles.
If I wanted to repeat that these days, I'd have to either disable in
the BIOS, or take into account, spread-spectrum clocking. Modern clock
PLLs, to reduce EMI, deliberately modulate the CPU clock at about 30 kHz.
That adds a "ripple" with about 40 ns p-p to the TSC values relative to
a non-modulated external clock.
If I'm not careful, I could think that was 40 ns * 3.2 GHz = 128 cycles
of unpredictability when it's just a periodic pattern.
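The arithmetic behind that figure, as a quick sanity check:

```python
# Spread-spectrum clocking ripple expressed in CPU cycles.
ripple_s = 40e-9     # ~40 ns peak-to-peak TSC ripple from the ~30 kHz SSC
clock_hz = 3.2e9     # 3.2 GHz CPU clock
cycles = ripple_s * clock_hz
assert abs(cycles - 128.0) < 1e-6   # looks like 128 cycles of "jitter"
```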
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 9:34 ` George Spelvin
@ 2016-04-29 9:53 ` Stephan Mueller
2016-04-29 11:04 ` George Spelvin
0 siblings, 1 reply; 38+ messages in thread
From: Stephan Mueller @ 2016-04-29 9:53 UTC (permalink / raw)
To: George Spelvin; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
On Friday, 29 April 2016, 05:34:18, George Spelvin wrote:
Hi George,
> (Note that we have two chains of e-mails crossing mid-stream. I'm in
> the middle of working on a much longer reply to your previous e-mail.)
>
> >> They're not independent, nor are they identically distributed.
> >
> > That is an interesting statement: you say that the time stamp has holes
> > in it, i.e. some values have zero probability of being selected!
>
> That's not at all what I said. It may be true, depending on Intel's
> TSC implementation, but I didn't say or imply it.
>
> > Second, you imply that when bit x of a given time stamp has some
> > particular value, bit y can be deduced from bit x.
>
> Yes. For example, bit 30 can be deduced from bit 31, given our
> assumption that the attacker has knowledge of previous timestamps, and
> likely inter-interrupt times. If bit 31 has changed, bit 30 is almost
> certainly zero. The bits are not independent.
I think there is a slight mixup: IID is not related to an attacker predicting
things. IID is simply a statistical measure, it is either there or not. It
does not depend on an attacker (assuming that the attacker cannot change the
data). Note, the IID is only needed to claim that the XOR will be entropy
preserving.
The reason that the IID property on a statistical level is preserved is due to
the fact that an attacker can only observe the values, but not manipulate them
(i.e. set the bits in a time stamp depending on other bits in that very time
stamp).
Hence, the attacker may cause some bits to have zero or little entropy, but
he cannot change the statistical pattern of the bits. This is the key
requirement why the XOR can be applied here: statistical independent bits,
where some bits may not have any entropy.
The relativity of an attacker comes in when you want to determine how much
entropy a particular bit has. And here, the higher the bit, the lower the
entropy, as the attacker is more and more likely to guess the bit
correctly.
>
> The distribution of bit 31 is, with very high probability, equal to that
> in the previous timestamp. Bit 0, not so much.
>
> In other words, bits 31 and 0 have different distributions. They are
> not identically distributed.
>
> I gave this example in my previous e-mail
> Message-ID: <20160429004748.9422.qmail@ns.horizon.com>
>
> >> If they were identically distributed, they'd all have identical
> >> entropy. And there'd be no reason to stop at 32 bits. If the high
> >> 32 bits have the same entropy as the low 32 bits, shouldn't we use their
> >> entropy too?
> >
> > There is absolutely no limit to the 32 bits. We easily can take the high
> > bits too. But we know (as you mention below), an attacker has more and
> > more knowledge about the selected bits the higher the bit is as he can
> > predict an event with a certain degree of probability.
>
> Yes, an attacker has more information about higher bits.
>
> This is the definition of NOT identically distributed!
So, you are saying that by looking at data, you change their statistical
distribution?
>
> *If* they were identically distributed, a suggestion I'm pointing
> out the ridiculous implications of, then an attacker's knowledge
> of each of them would be identical.
Not at all, you mix the attacker's knowledge again with a pure statistical
property.
Ciao
Stephan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 9:53 ` Stephan Mueller
@ 2016-04-29 11:04 ` George Spelvin
2016-04-29 11:18 ` Stephan Mueller
0 siblings, 1 reply; 38+ messages in thread
From: George Spelvin @ 2016-04-29 11:04 UTC (permalink / raw)
To: linux, smueller; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
> I think there is a slight mixup: IID is not related to an attacker
> predicting things. IID is simply a statistical measure, it is either there
> or not. It does not depend on an attacker (assuming that the attacker
> cannot change the data). Note, the IID is only needed to claim that the
> XOR will be entropy preserving.
1. It DOES depend on the attacker. Any statement about independence
depends on the available knowledge.
2. XOR being entropy preserving depends on independence ONLY, it does
NOT depend on identical distribution. The latter is a red herring.
(An English metaphor for "irrelevant distraction.")
3. Precisely because the bits are not independent, XOR is not
guaranteed to be entropy-preserving (your sense) on real data.
To give a specific example, suppose that an attacker can predict that the
counter will be either x or x+1 on the upcoming sample. For simplicity,
assume the probabilities are exactly 50%, so there is one full bit of
entropy in the lsbit.
But if x ends in ..01, then x+1 ends in ..10, and they have the same
XOR, and the attacker knows (0 bits of entropy) the XOR of the bottom
two bits even though they know nothing about the bottom bit in isolation.
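The counterexample can be spelled out as a short Python check (the sample values of x are arbitrary illustrations):

```python
# The attacker knows the sample is either x or x+1, each with probability
# 1/2, where x ends in binary ...01. The lsbit then carries a full bit of
# entropy, yet the XOR of the bottom two bits is known with certainty:
def xor_lo2(v):
    return (v & 1) ^ ((v >> 1) & 1)

for x in (0b0101, 0b110001, (0xDEAD << 2) | 0b01):
    assert x & 0b11 == 0b01          # x ends in ...01
    assert xor_lo2(x) == 1           # ...01 -> 0 XOR 1 = 1
    assert xor_lo2(x + 1) == 1       # ...10 -> 1 XOR 0 = 1, same value
```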
>>> There is absolutely no limit to the 32 bits. We easily can take the high
>>> bits too. But we know (as you mention below), an attacker has more and
>>> more knowledge about the selected bits the higher the bit is as he can
>>> predict an event with a certain degree of probability.
>> Yes, an attacker has more information about higher bits.
>>
>> This is the definition of NOT identically distributed!
> So, you are saying that by looking at data, you change their statistical
> distribution?
Yes.
For example, if I have seen the previous sample and it is 0x00000000,
I know that the distribution of the msbit of the following sample
is heavily biased toward 0.
If I have seen the previous sample and it is 0x7fffffff, I know that the
distribution of the msbit is heavily biased toward 1.
If I had not looked at the preceding samples, I would not be able
to draw those conclusions.
Remember, the following sample doesn't have a distribution; it is a
future fact. The only thing that has a distribution is my advance
knowledge (prediction) of that fact.
>> *If* they were identically distributed, a suggestion I'm pointing
>> out the ridiculous implications of, then an attacker's knowledge
>> of each of them would be identical.
> Not at all, you mix the attacker's knowledge again with a pure statistical
> property.
I don't understand what a "pure statistical property" means.
The distribution of a single independent bit can be described
completely by giving the probability of it being 1.
In the absence of correlations (dependencies), this single number
completely describes the attacker's knowledge of the bit.
Several bits have identical distributions if and only if the
probability of their being 1 is identical.
This is the same as saying that the attacker's knowledge of the
bits is identical.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 11:04 ` George Spelvin
@ 2016-04-29 11:18 ` Stephan Mueller
2016-04-29 18:02 ` George Spelvin
0 siblings, 1 reply; 38+ messages in thread
From: Stephan Mueller @ 2016-04-29 11:18 UTC (permalink / raw)
To: George Spelvin; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
Am Freitag, 29. April 2016, 07:04:24 schrieb George Spelvin:
Hi George,
> > I think there is a slight mixup: IID is not related to an attacker
> > predicting things. IID is simply a statistical measure, it is either there
> > or not. It does not depend on an attacker (assuming that the attacker
> > cannot change the data). Note, the IID is only needed to claim that the
> > XOR will be entropy preserving.
>
> 1. It DOES depend on the attacker. Any statement about independence
> depends on the available knowledge.
> 2. XOR being entropy preserving depends on independence ONLY, it does
> NOT depend on identical distribution. The latter is a red herring.
> (An English metaphor for "irrelevant distraction.")
> 3. Precisely because the bits are not independent, XOR is not
> guaranteed to be entropy-preserving (your sense) on real data.
It seems we talk past each other. Your entire explanation refers to individual
bits that arrive sequentially, where the attacker can analyze each bit in
between and potentially adjust his attack. Sure, in this case they are not
independent, and I am not claiming that.
But one single time stamp is one value where the entire 32 bits are generated
in an atomic fashion to any observer -- there is no sequential obtaining of
its bits, analyzing them and then reacting on them. Each of those bits is set
independently of the others. So, when an attacker looks at it, the entire 32
bits are already there. Hence there is no changing in a distribution by simply
looking at it.
So, take one RDTSC value and slice it into the individual bits. You cannot
predict the 32nd bit from knowledge of the first 31 bits. You can only predict the
32nd bit if you know the previous time stamps.
Again, I know that when seeing two or more time stamps, they depend on each
other. And for processing these multiple time stamps I use concatenation
which is not affected by dependencies.
Ciao
Stephan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 11:18 ` Stephan Mueller
@ 2016-04-29 18:02 ` George Spelvin
2016-04-29 18:41 ` Stephan Mueller
0 siblings, 1 reply; 38+ messages in thread
From: George Spelvin @ 2016-04-29 18:02 UTC (permalink / raw)
To: linux, smueller; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
>> 1. It DOES depend on the attacker. Any statement about independence
>> depends on the available knowledge.
>> 2. XOR being entropy preserving depends on independence ONLY, it does
>> NOT depend on identical distribution. The latter is a red herring.
>> (An English metaphor for "irrelevant distraction.")
>> 3. Precisely because the bits are not independent, XOR is not
>> guaranteed to be entropy-preserving (your sense) on real data.
> It seems we talk past each other. Your entire explanation refers to
> individual bits that arrive sequentially, where the attacker can analyze
> each bit in between and potentially adjust his attack. Sure, in this case
> they are not independent, and I am not claiming that.
You can analyze it in two equivalent ways:
1. The attacker knows all previous timestamps, and we are trying to
quantify their uncertainty about the current timestamp word.
In this case, some of the attacker's knowledge will be about
correlations between the bits of that word.
I think this is the easier way to think about the problem and
the formulation I prefer. Especially because of the structure
of the work you do afterward, XORing those 32 bits.
2. An attacker knows all previous *bits*, and is presented with, and
tries to guess, the 32 bits of the timestamp one at a time.
In this case, information gleaned from previously-seen bits which
would be called correlations in option 1 get incorporated into the
predicted distribution of the current bit.
For measuring the source entropy, this is also a valid way to
proceed. It's like Shannon's early studies of the entropy of
English by letting readers read the first half of a novel and then
asking them to guess what follows, one letter at a time.
I think this form is less convenient in general, and it's particularly
annoying when subsequently computing the parity of a word, as it's
hard to talk about cancellations due to non-independent bits.
Entropy (Shannon, Renyi, and min-) is additive, meaning that the two
different ways of measuring will produce the same result.
Either way, word or bit at a time, we are trying to quantify the *new*
additional entropy contributed by the current sample. That means
we assume for the analysis of each sample that all previous samples
are known to the attacker.
> But one single time stamp is one value where the entire 32 bits are generated
> in an atomic fashion to any observer -- there is no sequential obtaining of
> its bits, analyzing it and then reacting on it.
I never suggested doing it any other way, although as I explained above
it's possible to do so.
I was only considering processing *words* sequentially.
Knowledge of the previous *words* affect the predicted distribution
of individual bits in the current word.
When working word-at-a-time like this, we also have to consider
cross-correlations among the bits of a word. The most general way to
express it is a set of 2^32 probabilities for each of the 2^32 possible
values. The awkwardness of this form is why it's sometimes useful to
think about smaller pieces.
For example, if p(9) = 0.5, p(10) = 0.5, and p(x) = 0 for all other x,
then we have 1 bit of entropy in the word.
We can analyze it a bit at a time, and proceed in one of two ways:
- If we go lsbit first, then we have 1 bit of entropy in bit 0, but
then zero entropy in bit 1.
- If we go msbit first, we have 1 bit of entropy in bit 1, but then
zero entropy in bit 0.
Either way, there's only 1 bit of entropy total because of the
correlation between the bits. Once we have seen one of the bits, the
entropy of the second one collapses to 0.
And either way, we have 1 bit of entropy in the word, but 0 bits of entropy
in the parity of the word.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 18:02 ` George Spelvin
@ 2016-04-29 18:41 ` Stephan Mueller
2016-04-29 20:08 ` George Spelvin
0 siblings, 1 reply; 38+ messages in thread
From: Stephan Mueller @ 2016-04-29 18:41 UTC (permalink / raw)
To: George Spelvin; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
Am Freitag, 29. April 2016, 14:02:48 schrieb George Spelvin:
Hi George,
> >> 1. It DOES depend on the attacker. Any statement about independence
> >>
> >> depends on the available knowledge.
> >>
> >> 2. XOR being entropy preserving depends on independence ONLY, it does
> >>
> >> NOT depend on identical distribution. The latter is a red herring.
> >> (An English metaphor for "irrelevant distraction.")
> >>
> >> 3. Precisely because the bits are not independent, XOR is not
> >>
> >> guaranteed to be entropy-preserving (your sense) on real data.
> >
> > It seems we talk past each other. Your entire explanation refers to
> > individual bits that come in sequentially where the attacker inbetween
> > can analyze it and potentially modify his attack. Sure, in this case
> > they are not independent and I am not claiming that.
>
> You can analyze it in two equivalent ways:
>
> 1. The attacker knows all previous timestamps, and we are trying to
> quantify their uncertainty about the current timestamp word.
>
> In this case, some of the attacker's knowledge will be about
> correlations between the bits of that word.
>
> I think this is the easier way to think about the problem and
> the formulation I prefer. Especially because of the structure
> of the work you do afterward, XORing those 32 bits.
>
> 2. An attacker knows all previous *bits*, and is presented with, and
> tries to guess, the 32 bits of the timestamp one at a time.
>
> In this case, information gleaned from previously-seen bits which
> would be called correlations in option 1 get incorporated into the
> predicted distribution of the current bit.
>
> For measuring the source entropy, this is also a valid way to
> proceed. It's like Shannon's early studies of the entropy of
> English by letting readers read the first half of a novel and then
> asking them to guess what follows, one letter at a time.
>
> I think this form is less convenient in general, and it's particularly
> annoying when subsequently computing the parity of a word, as it's
> hard to talk about cancellations due to non-independent bits.
>
> Entropy (Shannon, Renyi, and min-) is additive, meaning that the two
> different ways of measuring will produce the same result.
>
> Either way, word or bit at a time, we are trying to quantify the *new*
> additional entropy contributed by the current sample. That means
> we assume for the analysis of each sample that all previous samples
> are known to the attacker.
>
> > But one single time stamp is one value where the entire 32 bits are
> > generated in an atomic fashion to any observer -- there is no sequential
> > obtaining of its bits, analyzing it and then reacting on it.
>
> I never suggested doing it any other way, although as I explained above
> it's possible to do so.
>
> I was only considering processing *words* sequentially.
>
> Knowledge of the previous *words* affect the predicted distribution
> of individual bits in the current word.
What you write is all correct, and I concur. But you still do not address the
point I am making. In all your descriptions you always compare two or more
time stamps. And I have always concurred that those are dependent -- and XOR
may do whatever it likes to the entropy.
What I am saying is that the bits in one given time stamp are mutually
independent. I.e. bit 0 of one time stamp does not depend on bit 1 of that
very same time stamp.
I totally agree with you that bit 0 from time stamp 1 may tell you something
about bit 0 of time stamp 2. And therefore I am not considering multiple time
stamps for this first step.
Let me quote the definitions from SP800-90B:
IID: A sequence of random variables for which each element of the sequence has
the same probability distribution as the other values and all values are
mutually independent.
Independent: Two discrete random variables X and Y are (statistically)
independent if the probability that an observation of X will have a certain
value does not change, given knowledge of the value of an observation of Y
(and vice versa). When this is the case, the probability that the observed
values of X and Y will be x and y, respectively, is equal to the probability
that the observed value of X will be x (determined without regard for the
value of y) multiplied by the probability that the observed value of Y will be
y (determined without regard for the value of x).
All I am claiming and all I am saying is that the bits 0 through 31 in one
given time stamp are mutually independent. And thus the XOR of those
independent bits is appropriate.
>
>> When working word-at-a-time like this, we also have to consider
> cross-correlations among the bits of a word. The most general way to
Tell me where the correlations should be within one word. Where do you claim
they come from?
> express it is a set of 2^32 probabilities for each of the 2^32 possible
> values. The awkwardness of this form is why it's sometimes useful to
> think about smaller pieces.
>
> For example, if p(9) = 0.5, p(10) = 0.5, and p(x) = 0 for all other x,
> then we have 1 bit of entropy in the word.
>
> We can analyze it a bit at a time, and proceed in one of two ways:
>
> - If we go lsbit first, then we have 1 bit of entropy in bit 0, but
> then zero entropy in bit 1.
> - If we go msbit first, we have 1 bit of entropy in bit 1, but then
> zero entropy in bit 0.
This description makes no sense.
>
> Either way, there's only 1 bit of entropy total because of the
> correlation between the bits. Once we have seen one of the bits, the
> entropy of the second one collapses to 0.
Again, where is the correlation supposed to come from?
>
> And either way, we have 1 bit of entropy in the word, but 0 bits of entropy
> in the parity of the word.
Ciao
Stephan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 18:41 ` Stephan Mueller
@ 2016-04-29 20:08 ` George Spelvin
2016-04-29 21:54 ` Stephan Mueller
0 siblings, 1 reply; 38+ messages in thread
From: George Spelvin @ 2016-04-29 20:08 UTC (permalink / raw)
To: linux, smueller; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
> What I am saying is that the bits in one given time stamp are mutually
> independent. I.e. bit 0 of one time stamp does not depend on bit 1 of that
> very same time stamp.
And I'm saying that's wrong.
We are interested in the correlation from the point of view of someone
who knows all previous time stamps (and a lot else besides).
Based on this knowledge of what happened before, it is possible to
deduce a correlation.
> I totally agree with you that bit 0 from time stamp 1 may tell you something
> about bit 0 of time stamp 2. And therefore I am not considering multiple time
> stamps for this first step.
But also bits 0 and 1 of time stamp 1 will tell you something about the
correlation of bits 0 and 1 of time stamp 2.
To simplify the discussion, let's slow down the time stamp counter
to slightly more than 1 tick per interrupt.
Suppose time stamp 1 (I'll call it x) happens to end in the bits "10".
The attacker knows, based on the rates of the interrupts and TSC ticks,
that the next time stamp is probably x+1 or x+2. It might be x+3 or more,
but that's pretty unlikely.
This means that the lsbits are probably 11 or 00. So there's a strong
correlation between bit 0 and bit 1.
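The dependence can be made numerical. This Python sketch (my illustration, assuming the stated 50/50 split between x+1 and x+2) builds the attacker's joint distribution of the bottom two bits of the next stamp and applies the usual definition of independence:

```python
# x ends in ...10, so the next stamp ends in ...11 (x+1) or ...00 (x+2).
# Joint distribution of (bit1, bit0) of the next stamp, attacker's view:
joint = {(1, 1): 0.5, (0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0}

p_b1 = sum(p for (b1, _), p in joint.items() if b1 == 1)   # marginal P(bit1=1)
p_b0 = sum(p for (_, b0), p in joint.items() if b0 == 1)   # marginal P(bit0=1)
p_b1_given_b0 = joint[(1, 1)] / p_b0                       # P(bit1=1 | bit0=1)

print(p_b1)            # 0.5: unconditionally, bit 1 is a fair coin
print(p_b1_given_b0)   # 1.0: once bit 0 is seen to be 1, bit 1 is certain
assert p_b1 != p_b1_given_b0   # observing bit 0 changes P(bit 1): dependent
```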
> IID: A sequence of random variables for which each element of the
> sequence has the same probability distribution as the other values
> and all values are mutually independent.
>
> Independent: Two discrete random variables X and Y are (statistically)
> independent if the probability that an observation of X will have
> a certain value does not change, given knowledge of the value of
> an observation of Y (and vice versa). When this is the case, the
> probability that the observed values of X and Y will be x and y,
> respectively, is equal to the probability that the observed value of
> X will be x (determined without regard for the value of y) multiplied
> by the probability that the observed value of Y will be y (determined
> without regard for the value of x).
These are both exactly correct.
Notice in particular the statement that a probability (of X) can change
based on knowledge (of Y). The special case where it doesn't change is
called independence.
> All I am claiming and all I am saying is that the bits 0 through 31 in one
> given time stamp are mutually indpendent. And thus the XOR of those
> independent bits is appropriate.
And I'm saying that's flat out wrong. The bits are NOT mutually
independent. The correlation might be small in some cases, but
it's not exactly zero.
>> When working word-at-a-time like this, we also have to consider
>> cross-correlations among the bits of a word. The most general way to
> Tell me where the correlations should be within one word. Where do you claim
> they come from?
From all the other knowledge of the machine. Knowledge of previous
timestamps, kernel internals, interrupt service routine execution times,
interrupt mitigation timers in various hardware, periodic interrupts, etc.
>> For example, if p(9) = 0.5, p(10) = 0.5, and p(x) = 0 for all other x,
>> then we have 1 bit of entropy in the word.
>>
>> We can analyze it a bit at a time, and proceed in one of two ways:
>>
>> - If we go lsbit first, then we have 1 bit of entropy in bit 0, but
>> then zero entropy in bit 1.
>> - If we go msbit first, we have 1 bit of entropy in bit 1, but then
>> zero entropy in bit 0.
> This description makes no sense.
I tried to make it as simple as I could.
Let me try again.
Assume for the purpose of discussion that we are able to predict, by some
magical means irrelevant to the example, that the next timestamp will
definitely be either 9 or 10, with 50% probability of each.
This is obviously an extremely simplified example, but the basic logic
applies to more complex cases.
We can analyze the entropy of the following timestamp in three ways:
1) Consider the timestamp all at once. The entropy of the word is the
expected value of -log2(p[i]). This is p[9] * -log2(p[9]) +
p[10] * -log2(p[10]) = 0.5*1 + 0.5*1 = 1.0 bits of entropy.
2) Consider the timestamp a bit at a time, starting from the lsbit.
2a) We have no idea what bit 0 is, so the entropy of this bit is 1.0.
2b) Given that we have already seen bit 0, we know with certainty that
bit 1 is its complement, so bit 1 contributes zero entropy.
(See the definition of "independent" above. Having gained knowledge
of bit 0, the probability of bit 1 having a particular value has
changed. Thus bits 0 and 1 are not independent.)
2c) Bits 2..31 are all known ahead of time, so contribute zero entropy.
3) Consider the timestamp a bit at a time, starting from the msbit.
3a) We know bits 31..2 ahead of time, so they contribute zero entropy.
3b) We have no idea what bit 1 is, so it contributes 1.0 bits of entropy.
3c) Having seen bit 1, we know with certainty that bit 0 is its
complement, so bit 0 contributes zero entropy.
The point of the example is that regardless of the method used to
add it up, the total entropy is the same. We may not even attribute the
entropy to the same bit, but the total is still the same.
This additive property is what makes entropy a useful measurement.
The same logic applies to more complex cases, it just takes longer
to write out all the equations and show that the numbers add up the same.
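The bookkeeping can be checked mechanically. The Python sketch below (my illustration) computes the entropy of the magically-known {9, 10} distribution word-at-a-time and bit-at-a-time in both orders, and also shows the parity collapsing to zero entropy:

```python
from math import log2

def H(dist):
    """Shannon entropy (bits) of a dict {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

dist = {9: 0.5, 10: 0.5}       # the predicted distribution of the timestamp

# 1) Whole word at once:
print(H(dist))                 # 1.0

# 2)/3) Bit at a time: total entropy is the sum over bits of
# H(bit_i | bits revealed so far), in whatever order we reveal them.
def bitwise_entropy(dist, bit_order):
    total = 0.0
    seen = {v: () for v in dist}       # bits of each outcome revealed so far
    for i in bit_order:
        for prefix in set(seen.values()):
            sub = {v: p for v, p in dist.items() if seen[v] == prefix}
            norm = sum(sub.values())   # probability of seeing this prefix
            bitdist = {}
            for v, p in sub.items():
                b = (v >> i) & 1
                bitdist[b] = bitdist.get(b, 0.0) + p / norm
            total += norm * H(bitdist)
        seen = {v: seen[v] + ((v >> i) & 1,) for v in dist}
    return total

print(bitwise_entropy(dist, range(32)))            # lsbit first: 1.0
print(bitwise_entropy(dist, reversed(range(32))))  # msbit first: 1.0

# The parity, however, is 0 for both 9 (1001) and 10 (1010):
assert all(bin(v).count("1") % 2 == 0 for v in dist)   # 0 bits of entropy
```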
>> Either way, there's only 1 bit of entropy total because of the
>> correlation between the bits. Once we have seen one of the bits, the
>> entropy of the second one collapses to 0.
> Again, where is the correlation supposed to come from?
From our knowledge that the value will be either 9 (1001) or 10 (1010).
That's a severely simplified case, but it's meant to show what happens if
we have any knowledge of the range of possible values.
For another simplified example using larger numbers, suppose we know
that the next timestamp will be somewhere in the 2^17-tick range between
0xffff0000 and 0x0000ffff. In this case, bits 16..31 are guaranteed
perfectly correlated.
The same logic applies if we know that the next interrupt timestamp will
be x + delta, where delta is a Poisson-distributed value with expected
value lambda, the math is just a lot messier and the correlations are
a lot smaller. But still non-zero, so they're *not* independent.
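For the 2^17-tick window example, the perfect correlation of the top bits is easy to verify by enumeration (my Python sketch, not part of the original mail):

```python
# The window straddles the 32-bit wrap: 0xffff0000 .. 0xffffffff, then
# 0x00000000 .. 0x0000ffff -- 2^17 possible values in all.
window = list(range(0xffff0000, 0x100000000)) + list(range(0x10000))
assert len(window) == 1 << 17
for x in window:
    # Bits 16..31 move together: all ones before the wrap, all zeros after.
    assert (x >> 16) in (0x0000, 0xffff)
print("bits 16..31 are perfectly correlated over the whole window")
```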
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 20:08 ` George Spelvin
@ 2016-04-29 21:54 ` Stephan Mueller
2016-04-29 22:32 ` George Spelvin
0 siblings, 1 reply; 38+ messages in thread
From: Stephan Mueller @ 2016-04-29 21:54 UTC (permalink / raw)
To: George Spelvin; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
Am Freitag, 29. April 2016, 16:08:48 schrieb George Spelvin:
Hi George,
> > What I am saying is that the bits in one given time stamp are mutually
> > independent. I.e. bit 0 of one time stamp does not depend on bit 1 of that
> > very same time stamp.
>
> And I'm saying that's wrong.
I think we can agree that we disagree.
I am not sure whether you have a point or not. Still, I will get back to the
drawing board and think about the problem of how to efficiently collect
entropy.
Ciao
Stephan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 21:54 ` Stephan Mueller
@ 2016-04-29 22:32 ` George Spelvin
0 siblings, 0 replies; 38+ messages in thread
From: George Spelvin @ 2016-04-29 22:32 UTC (permalink / raw)
To: linux, smueller; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
> I think we can agree that we disagree.
Yes, we agree on that, at least!
The problem is, this is supposed to be a matter of fact, not opinion,
so there should be one right answer.
I suppose it's possible it's still an issue of terminology, but we've
exhausted that line of discussion.
> Though, I will get back to the drawing board and think about the problem
> of how to efficiently collect entropy.
For *collecting* it, there are two obvious sources that would be very
nice to use; I've just never had the courage to dig into the relevant
subsystems deep enough to add the hooks.
These could be turned on when entropy is required and turned off afterward:
1) The RTC periodic interrupt. This is invariably driven by
a separate 32.768 Hz crystal, so the jitter against the main
clock oscillator is a useful source.
A standard PC RTC can run at up to 8 kHz, and probably deliver
a few hundred bits/s.
2) Any available ADCs, especially audio ADCs. A 16-bit or better ADC is a
very rich source of entropy. We'd need a hook into the audio subsystem
which could activate the ADC when needed and, most importantly,
guarantee that the data did not go anywhere else.
Because we can't guarantee that the audio input is quiet when collected,
the entropy would have to be estimated a priori rather than deduced
from measurements. Some measurements would still be useful as a sanity
check to ensure the data isn't digital zero or something.
For storing it after collecting it, I still think the current CRC-based
scheme is pretty good. Note that cyclic XOR is a special case of this,
using the polynomial x^4096 - 1. The good properties of CRCs for
detection of hardware-type errors are exactly equivalent to the collision
resistance properties desired for an entropy pool.
A collision results in an undetected error in the former case, and loss
of entropy in the latter.
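To make the "special case" concrete: the toy Python below (my sketch, scaled down to a 32-bit pool with G(x) = x^32 + 1 rather than the 4096-bit pool above) shows that a CRC step with that degenerate polynomial is exactly a rotate-and-XOR, i.e. cyclic XOR mixing:

```python
import random

MASK32 = 0xffffffff

def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def crc_mix(state, byte, feedback):
    """Mix one byte into a 32-bit pool with a bitwise CRC over GF(2).

    `feedback` is the generator polynomial with its x^32 term dropped.
    """
    state ^= byte << 24
    for _ in range(8):
        if state & 0x80000000:
            state = ((state << 1) ^ feedback) & MASK32
        else:
            state = (state << 1) & MASK32
    return state

# With G(x) = x^32 + 1 (feedback = 1), x^32 = 1 mod G, so each CRC step
# is a one-bit rotation: the CRC degenerates into cyclic rotate-and-XOR.
random.seed(0)
for _ in range(1000):
    s, b = random.getrandbits(32), random.getrandbits(8)
    assert crc_mix(s, b, 1) == rotl32(s ^ (b << 24), 8)
```

A real CRC polynomial spreads the input across the whole pool instead of just rotating it, which is why collisions (and hence entropy loss) are rarer.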
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-27 0:23 ` George Spelvin
2016-04-27 18:03 ` George Spelvin
2016-04-28 20:15 ` Stephan Mueller
@ 2016-04-29 0:47 ` George Spelvin
2 siblings, 0 replies; 38+ messages in thread
From: George Spelvin @ 2016-04-29 0:47 UTC (permalink / raw)
To: smueller; +Cc: herbert, linux, linux-crypto, linux-kernel, sandyinchina, tytso
I'd like to apologise for the harsh tone of my earlier message.
I was frustrated, and it showed.
I hope I can be more gentle as I continue to elaborate on the shortcomings
of that scheme.
Concentrating entropy is hard. To paraphrase Princess Leia, "the more
you tighten your grip, the more entropy will slip through your fingers."
While it's true that the entropy of two independent bit strings is at
least as high as the entropy of either one, it's not necessarily much
higher, either.
I'm going to discuss two things here. First, what happens if you assume
that the input samples are made up of independent and identically
distributed (i.i.d.) random bits, and second why they're not i.i.d.
When talking about single bits, all that matters is the probabilities
of the two values. I'm going to assume that the bits are 1 with some
probability p <= 0.5, and 0 with probability p >= 0.5, but that's just
for convenience.
You can XOR any attacker-known pattern into those bits and it doesn't
change the entropy. What matters is the probability that they match an
attacker's prediction, and the probability that they don't match.
For example, if an attacker knows that a bit is 75% likely to match
some other bit, then I'll describe it with p=0.25.
i.i.d. bits of course all have the same entropy. Suppose that we have 32
bits, with 0.1 bits of entropy each, making 3.2 bits of entropy in total.
This is a pretty good sample; it should be easy to get one bit with high
entropy out of this, right?
Well, no.
0.1 bits of Shannon entropy means that a bit is 0 with probability 98.7%,
and 1 with probability p = 1.3%.
If you XOR two such bits together, the result is 1 with probability
2 * p * (1-p) (which is also less than 0.5).
Let's work out the entropy assuming we start with 0.1 of a bit of Shannon
entropy per bit, and 0.1 of a bit of min-entropy per bit, and then
form the XOR of increasingly large blocks:
Starting with 32/10 bits of Shannon entropy
1: p=0.012987 Shannon=0.100000 (sum 3.200000) Min=0.018859 (sum 0.603482)
2: p=0.025636 Shannon=0.172013 (sum 2.752203) Min=0.037468 (sum 0.599486)
4: p=0.049958 Shannon=0.286220 (sum 2.289760) Min=0.073937 (sum 0.591499)
8: p=0.094925 Shannon=0.452699 (sum 1.810795) Min=0.143891 (sum 0.575563)
16: p=0.171829 Shannon=0.661871 (sum 1.323741) Min=0.271999 (sum 0.543997)
32: p=0.284607 Shannon=0.861653 (sum 0.861653) Min=0.483192 (sum 0.483192)
The last line is the probability that the XOR of 32 such bits is 1.
28.46% is still some distance from 50/50, isn't it?
The min-entropy per bit comes close to doubling each time, since it
suffers less from saturation effects as it approaches 1 bit per bit,
but still 20% of the entropy is lost.
Starting with 32/10 bits of min-entropy
1: p=0.066967 Shannon=0.354502 (sum 11.344068) Min=0.100000 (sum 3.200000)
2: p=0.124965 Shannon=0.543466 (sum 8.695452) Min=0.192587 (sum 3.081394)
4: p=0.218697 Shannon=0.757782 (sum 6.062253) Min=0.356046 (sum 2.848372)
8: p=0.341738 Shannon=0.926472 (sum 3.705887) Min=0.603265 (sum 2.413061)
16: p=0.449906 Shannon=0.992747 (sum 1.985494) Min=0.862250 (sum 1.724500)
32: p=0.494981 Shannon=0.999927 (sum 0.999927) Min=0.985591 (sum 0.985591)
Because min-entropy is a more conservative estimate, the starting probability
is much closer to 50/50, and we got very close to unbiased. But it took
11 bits of Shannon entropy to do it!
Working the formula backward, we can deduce that to get 0.95 bits of
Shannon entropy by XORing 32 i.i.d. bits, each of the input bits needs
to have p = 0.020511, 0.1443 bits of entropy, a total of 4.62 bits.
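These numbers are easy to reproduce. The Python sketch below (my own; it starts from the mail's rounded bias p = 0.012987, so the last printed digits may differ slightly) regenerates the Shannon-start table:

```python
from math import log2

def shannon(p):
    """Shannon entropy (bits) of one bit that is 1 with probability p."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def min_ent(p):
    """Min-entropy (bits): -log2 of the most probable value's probability."""
    return -log2(max(p, 1 - p))

def xor2(p):
    """P(1) for the XOR of two independent bits that are each 1 w.p. p."""
    return 2 * p * (1 - p)

p = 0.012987          # bias giving ~0.1 bit of Shannon entropy per bit
for n in (1, 2, 4, 8, 16, 32):
    print(f"{n:2}: p={p:.6f} Shannon={shannon(p):.6f} Min={min_ent(p):.6f}")
    p = xor2(p)
```

Starting instead from p = 0.066967 (0.1 bit of min-entropy per bit) regenerates the second table, and p = 0.020511 checks the backward-worked 0.95-bit figure.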
In fact, if the maximum entropy per bit is low enough then the
probability of getting the all-zero word is greater than 50% and
that imposes an upper limit on the entropy produced by *any* hashing
scheme.
For 0.1 bits of Shannon entropy per bit, (1-p)^32 = 0.658164 so even if
you reduced to one bit using !, you couldn't get closer to 50:50
than that. (0.926565 bit Shannon, 0.603482 bits min-entropy.)
It turns out that lots of weakly random i.i.d. bits are a worst case for
this sort of reduction; if we divide the entropy in a "triangular" way, where
bit i gets (i+1) parts of the entropy (dividing by (32*33/2) to normalize)
then we get a final XOR of
p=0.338856 Shannon=0.923720 Min=0.596964
which is better than the p=0.284607 we got from i.i.d. bits.
Is it, then, perhaps the case that the i.i.d. assumption is not a good one?
The bit patterns we're talking about don't seem to resemble realistic
timestamps.
It's a very bad one, as I'll explain.
*Independence* can exist in a vacuum: not correlated with anything.
That's what /dev/random is trying to produce. But the seed material
is anything but.
There are two major sources of information to an attacker about the
seed entropy:
1) An attacker is assumed to know all prior seed inputs.
That's because we're trying to quantify the *additional* entropy
contributed by this sample. The difficulty of guessing the
previous seed material has already been accounted for in the
entropy assigned to it. So any correlation with any previous
seed material must be considered non-random.
2) An attacker is assumed to be able to run programs on the
machine collecting entropy. This is because we want different
users' random bits to be secure from each other.
For the second issue, note that on my Ivy Bridge, I can do rdtsc(),
subtract from the previous and compare to a threshold in 24 cycles.
If I notice a sudden jump in rdtsc() deltas, I know an interrupt
happened, and I know when it happened within a window of 24 clock
cycles.
There's some delay to run the interrupt handler, but that's
highly predictable.
If interrupt arrival time is quantized, by a lower APIC clock speed,
PCI bus clock speed, or the like, then it's possible there's less
than log2(24) = 4.6 bits of uncertainty there.
(It may be possible to ignore this threat during early boot when
nothing is running, to speed up entropy accumulation.)
Getting to the first, much more important case, this is why the bits in
a captured timestamp are neither independent nor identically distributed.
What we care about in those statements is what holds *relative to the knowledge of the
attacker*. Deterministic whitening is easy and completely pointless; the
only thing we're interested in measuring is what an attacker cannot know.
First of all, the correlation with the *previous* timestamp is stronger
the higher you go in the bits.
The chance of a higher-order bit having the predicted value is higher than
for a lower-order bit. Thus, the bits aren't identically distributed.
If bit 31 toggles, bit 30 is almost certainly 0 now. Thus, the bits
aren't independent.
There are many more counterexamples, but just those two will suffice
to show that the i.i.d. assumption is fatally flawed. It makes the math
easy, but that's not what the available entropy is like.
The way to think about handling entropy is minimizing collisions.
No collisions means no entropy loss. Collisions mean entropy loss.
This is not a huge problem when you're extracting from a pool and the
pool remains; all that happens is the remaining entropy stays in the pool.
Two possible inputs that result in the same pool state afterward is like
crossing the streams: it Would Be Bad. You can't eliminate the possibility
with a finite-sized pool, but you should work to minimize it.
^ permalink raw reply [flat|nested] 38+ messages in thread