* Re: random(4) changes
2016-04-27 0:23 ` George Spelvin
@ 2016-04-27 18:03 ` George Spelvin
2016-04-28 20:15 ` Stephan Mueller
2016-04-29 0:47 ` George Spelvin
2 siblings, 0 replies; 38+ messages in thread
From: George Spelvin @ 2016-04-27 18:03 UTC (permalink / raw)
To: andi
Cc: herbert, linux, linux-crypto, linux-kernel, sandyinchina,
smueller, tytso
Andi Kleen wrote:
> There is also the third problem of horrible scalability of /dev/random
> output on larger systems, for which patches are getting ignored.
I came up with some very pretty code to fix this, which unfortunately
tried to copy_to_user() with a lock held.
After all my attempts to fix that fatal flaw resulted in much uglier
code, I set it aside for a while in the hopes that inspiration would
strike... and it's still sitting unfinished. :-(
But I want to finish it, honest! This latest discussion has
made me acutely conscious of it.
The fact that the scope of changes just got bigger doesn't help of
course, but I *have* picked it up again.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-27 0:23 ` George Spelvin
2016-04-27 18:03 ` George Spelvin
@ 2016-04-28 20:15 ` Stephan Mueller
2016-04-29 7:29 ` George Spelvin
2016-04-29 0:47 ` George Spelvin
2 siblings, 1 reply; 38+ messages in thread
From: Stephan Mueller @ 2016-04-28 20:15 UTC (permalink / raw)
To: George Spelvin; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
On Tuesday, 26 April 2016, 20:23:46, George Spelvin wrote:
Hi George,
> > And considering that I only want to have 0.9 bits of entropy, why
> > should I not collapse it? The XOR operation does not destroy the existing
> > entropy, it only caps it to at most one bit of information theoretical
> > entropy.
>
> No. Absolutely, demonstrably false.
>
> The XOR operation certainly *does* destroy entropy.
> If you have 0.9 bits of entropy to start, you will have less
> after the XOR. It does NOT return min(input, 1) bits.
As I am having difficulties following your explanation, let us start at the
definition:
XOR is defined as an entropy preserving operation, provided the two arguments
to the XOR operation are statistically independent (let us remember that
caveat for later).
That means, the entropy behavior of H(A XOR B) is max(H(A), H(B)) if they are
independent. For example, A has 5 bits of entropy and B has 7 bits of entropy,
A XOR B has 7 bits of entropy. Similarly, if A has zero bits of entropy the
XORed result will still have 7 bits of entropy from B. That applies regardless
of the size of A or B, including one bit sized chunks. The same applies when
XORing more values:
A XOR B XOR C = (A XOR B) XOR C
Now, the entropy behaves like:
max(max(H(A), H(B)), H(C)) = max(H(A), H(B), H(C))
Now, with that definition, let us look at the LRNG method. The LRNG obtains a
time stamp and uses the low 32 bits of it. The LRNG now slices those 32 bits
up in individual bits, let us call them b0 through b31.
The LRNG XORs these individual bits together. This means:
b0 XOR b1 XOR b2 XOR ... XOR b31
This operation gives us one bit.
How is the entropy behaving here? Let us use the definition from above:
H(XORed bit) = max(H(b0), H(b1), ..., H(b31))
We know that each individual bit can hold at most one bit of entropy. Thus the formula
implies that the XOR operation in the LRNG can at most get one bit of entropy.
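The folding operation described above amounts to taking the parity of the 32-bit time stamp. An illustrative Python sketch (not the LRNG source; names are mine):

```python
def fold(ts):
    """XOR the 32 individual bits b0..b31 of a time stamp into one bit."""
    bit = 0
    for i in range(32):
        bit ^= (ts >> i) & 1
    return bit

# Equivalently, fold() is the parity of the 32-bit value:
assert fold(0xFFFFFFFF) == 0  # an even number of 1-bits XORs to 0
assert fold(0x00000001) == 1  # an odd number of 1-bits XORs to 1
```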
Given these findings, I now have to show and demonstrate that:
1. the individual bits of a given 32 bit time stamp are independent (or IID in
terms of NIST)
2. show that the maximum entropy of each of the individual bits is equal or
more to my entropy estimate I apply.
Regarding 1: The time stamp (or cycle counter) is a 32 bit value where each
of the bits does not depend on the other bits. When considering one and only
one time stamp value and we look at, say, the first 20 bits, there is no way
it is clear what the missing 12 bits will be. Note I am not saying that when
comparing two or more time stamps that one cannot deduce the bits! And here it
is clear that the bits within one given time stamp are independent, but
multiple time stamps are not independent. This finding is supported with
measurements given in 3.4.1 (I understand that the measurements are only
supportive and no proof). Figure 3.1 shows an (almost) rectangular
distribution which is the hint to an equidistribution which in turn supports
the finding that the individual bits within a time stamp are independent. In
addition, when you look at the Shannon/Min Entropy values (which do not give
an entropy estimate here, but only help in understanding the distribution!),
the values show that the distribution has hardly any discontinuities -- please
read the explanation surrounding the figure.
Regarding 2: I did numerous measurements that show that the low bits do have
close to one bit of entropy per data bit. If I may ask to consider section
3.4.1 again (please consider that I tried to break the logic by applying a
pathological generation of interrupts here to stimulate the worst case): The
entropy is not found in the absolute time stamps, but visible in the time
deltas (and the uncertainty of the variations of those). So I calculated the
time deltas from the collected set of time stamps of events. Now, when simply
using the four (you may also use three or perhaps five) lower bits of the time
delta values, we can calculate an interesting and very important Minimum
Entropy value: the Markov Min Entropy. Using the table 2, I calculated the
Markov Min Entropy of the data set of the 4 low bit time delta values. The
result shows that the 4 bit values still have 3.92 bits of entropy (about 0.98
bits of entropy per data bit). Ok, one worst case measurement may not be good
enough. So I continued on other environments with the same testing. Table 3
provides the results on those environments. And they have even more entropy
than the first measurement! So, with all the measurements I always see that
each of the four low bits has around 0.98 bits of entropy. Thus, with the XOR
value I can conclude that these measurements show that the XOR result will
have 0.98 bits of Markov Min Entropy based on these measurements.
Please note that I assume an entropy content of 256/288 (about 0.889) bits of
entropy per data bit, which is slightly less than 0.9. This lower level is
significantly less than the measured values -- a safety margin.
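For illustration, here is a Python sketch of a min entropy estimate over 4-bit samples. Note this is the plain i.i.d. min entropy, not the Markov variant used in the paper (which additionally conditions on the previous sample); the function name and data are mine:

```python
import math
from collections import Counter

def min_entropy(samples):
    """Min entropy of a sample set: -log2 of the most probable value."""
    counts = Counter(samples)
    p_max = max(counts.values()) / len(samples)
    return -math.log2(p_max)

# Perfectly uniform 4-bit deltas would yield exactly 4.0 bits; a measured
# 3.92 bits therefore means the distribution is nearly uniform.
uniform = list(range(16)) * 1000
assert abs(min_entropy(uniform) - 4.0) < 1e-9
```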
Although that marks the conclusion of the XOR folding assessment, let me
explain why this XOR folding operation provides another helping hand. The
measurement of the time deltas in 3.4.1, particular figure 3.2 shows that the
time delta has even 11 bits of ("regular") Min Entropy. So, don't I waste a
lot of entropy with the XOR folding? Apart from having more safety margins in
case the overall delta values have less variations than I measured in my worst
case testing, there is another factor at play:
As I have explained above, the XOR collapse is applied to the time stamps.
Those time stamps show statistical dependencies (and maybe even to a lesser
degree the time deltas have some dependencies too). We fold the time stamps
and then concatenate them -- concatenation is not affected by the statistical
dependencies. At one point in time we have a wrap-around in the entropy pool.
The current entropy pool is 4096 bits in size (which can be arbitrarily
changed to a minimum of 256 bits), so we wrap after 4096 received events. Now,
we XOR the bit from the first interrupt with the bit from the 4097th
interrupt. To ensure that the XOR operation is entropy preserving, these bits
must be statistically independent. And to ensure that, the collapsing of the
time stamp and the seeming loss of entropy helps here too! So, we give up
entropy to "buy" statistical independence to support the XOR operation here.
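The concatenate-and-wrap behaviour described here can be sketched in Python (illustrative only; the class, names, and bit-array representation are mine, not the LRNG code):

```python
POOL_BITS = 4096  # default pool size; configurable down to 256 bits

class BitPool:
    def __init__(self, bits=POOL_BITS):
        self.pool = [0] * bits
        self.count = 0

    def mix(self, folded_bit):
        # Event N and event N + pool-size land on the same position, so on
        # wrap-around their folded bits are XORed together.
        self.pool[self.count % len(self.pool)] ^= folded_bit
        self.count += 1

p = BitPool(bits=8)
for b in [1, 0, 1, 1, 0, 0, 1, 0,   # first pass fills the pool
          1]:                       # 9th event wraps and XORs into slot 0
    p.mix(b)
assert p.pool[0] == 0   # 1 XOR 1 == 0: the two events' bits combined
```

This is exactly where the statistical-independence requirement bites: the bit from event N and the bit from event N + 4096 meet in an XOR.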
With section 3.4.2 I apply a large array of statistical tests against a bit
stream of folded bits. All of those tests pass, indicating that the bit stream
behaves like White Noise without any whitening logic like LFSR or even
hashing. Thus, this testing supports my analysis from above.
The root cause for not applying an LFSR or another mix-in function is that
such LFSR is already a whitening logic. But I do have a whitening logic
already with the DRBG. So, to me, having a whitening logic whose output is used
by another whitener is akin to having to hide some deficiencies like skews
or other problems in your noise source. But I have nothing to hide at the
layer of the noise source.
Ciao
Stephan
^ permalink raw reply [flat|nested] 38+ messages in thread

* Re: random(4) changes
2016-04-28 20:15 ` Stephan Mueller
@ 2016-04-29 7:29 ` George Spelvin
2016-04-29 8:02 ` Stephan Mueller
0 siblings, 1 reply; 38+ messages in thread
From: George Spelvin @ 2016-04-29 7:29 UTC (permalink / raw)
To: linux, smueller; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
From smueller@chronox.de Fri Apr 29 04:56:49 2016
From: Stephan Mueller <smueller@chronox.de>
To: George Spelvin <linux@horizon.com>
Cc: herbert@gondor.apana.org.au, linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org, sandyinchina@gmail.com, tytso@mit.edu
Subject: Re: random(4) changes
Date: Thu, 28 Apr 2016 22:15:17 +0200
User-Agent: KMail/4.14.10 (Linux/4.4.7-300.fc23.x86_64; KDE/4.14.18; x86_64; ; )
In-Reply-To: <20160427002346.12354.qmail@ns.horizon.com>
References: <20160427002346.12354.qmail@ns.horizon.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
> On Tuesday, 26 April 2016, 20:23:46, George Spelvin wrote:
> Hi George,
> > And considering that I only want to have 0.9 bits of entropy, why
> > should I not collapse it? The XOR operation does not destroy the existing
> > entropy, it only caps it to at most one bit of information theoretical
> > entropy.
>
> No. Absolutely, demonstrably false.
>
> The XOR operation certainly *does* destroy entropy.
> If you have 0.9 bits of entropy to start, you will have less
> after the XOR. It does NOT return min(input, 1) bits.
> As I am having difficulties following your explanation, let us start at the
> definition:
> XOR is defined as an entropy preserving operation, provided the two arguments
> to the XOR operation are statistically independent (let us remember that
> caveat for later).
> That means, the entropy behavior of H(A XOR B) is max(H(A), H(B)) if they are
> independent.
Actually, no. If they're independent, it can be greater.
For example, if A has half a bit of entropy, and B has half a bit
(both in the Shannon sense), then A XOR B will have 0.713537
bits.
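The 0.713537 figure can be checked numerically. A Python sketch (the bisection merely inverts the binary entropy function; helper names are mine):

```python
import math

def H(p):
    """Binary Shannon entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def p_for_entropy(h):
    """Find p in [0, 1/2] with H(p) == h, by bisection (H is increasing there)."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if H(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = p_for_entropy(0.5)        # bias giving half a bit of Shannon entropy
p_xor = 2 * p * (1 - p)       # P(A XOR B = 1) for independent A, B
assert abs(H(p_xor) - 0.7135) < 1e-3
```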
> 1. the individual bits of a given 32 bit time stamp are independent
> (or IID in terms of NIST)
They're not independent, nor are they identically distributed.
If they were identically distributed, they'd all have identical
entropy. And there'd be no reason to stop at 32 bits. If the high
32 bits have the same entropy as the low 32 bits, shouldn't we use their
entropy too?
> 2. show that the maximum entropy of each of the individual bits is equal or
> more to my entropy estimate I apply.
I'm not sure what you mean here. When you say "maximum entropy" is that
a non-strict upper bound?
Or does that mean that at least one bit achieves that maximum?
That will be a much more interesting proof.
> Regarding 1: The time stamp (or cycle counter) is a 32 bit value where
> each of the bits does not depend on the other bits. When considering one
> and only one time stamp value and we look at, say, the first 20 bits,
> there is no way it is clear what the missing 12 bits will be.
If you deliberately exclude all external data, then a 32-bit
constant is random. (I suggest 17, the "most random number".)
But that's meaningless. When we talk about "entropy", we are talking
about an attacker's uncertainty about the value. Any other measure is
garbage in, and produces nothing but garbage out.
> Note I
> am not saying that when comparing two or more time stamps that one
> cannot deduce the bits! And here it is clear that the bits within
> one given time stamp are independent, but multiple time stamps are
> not independent. This finding is supported with measurements given in
> 3.4.1 (I understand that the measurements are only supportive and no
> proof). Figure 3.1 shows an (almost) rectangular distribution which is
> the hint to an equidistribution which in turn supports the finding that
> the individual bits within a time stamp are independent. In addition,
> when you look at the Shannon/Min Entropy values (which do not give an
> entropy estimate here, but only help in understanding the distribution!),
> the values show that the distribution has hardly any discontinuities --
> please read the explanation surrounding the figure.
^ permalink raw reply [flat|nested] 38+ messages in thread

* Re: random(4) changes
2016-04-29 7:29 ` George Spelvin
@ 2016-04-29 8:02 ` Stephan Mueller
2016-04-29 9:34 ` George Spelvin
0 siblings, 1 reply; 38+ messages in thread
From: Stephan Mueller @ 2016-04-29 8:02 UTC (permalink / raw)
To: George Spelvin; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
On Friday, 29 April 2016, 03:29:50, George Spelvin wrote:
Hi George,
> > 1. the individual bits of a given 32 bit time stamp are independent
> >
> > (or IID in terms of NIST)
>
> They're not independent, nor are they identically distributed.
That is an interesting statement: you say that the time stamp has holes in it,
i.e. some values have zero probability of being selected! Second, you imply
that when bit x of a given time stamp has some particular value, bit y can be
deduced from bit x.
I have not experienced that nor do I see any hints for that claim.
>
> If they were identically distributed, they'd all have identical
> entropy. And there'd be no reason to stop at 32 bits. If the high
> 32 bits have the same entropy as the low 32 bits, shouldn't we use their
> entropy too?
There is absolutely no limit to the 32 bits. We easily can take the high bits
too. But we know (as you mention below), an attacker has more and more
knowledge about the selected bits the higher the bit is as he can predict an
event with a certain degree of probability.
Thus, mixing in the high 32 bits does not hurt here from a mathematical point of
view. But from a technical point of view, it hurts: we know that these high 32 bits have
hardly any entropy relative to the attacker. Thus, we would mix in bits that
do not really help us in the entropy collection. But the processing still
requires CPU cycles -- for each interrupt. Thus, to prevent wasting CPU
cycles, I think that the high 32 bits should be discarded.
But if people say that they want them considered too, I have no problems in
adding them.
>
> > 2. show that the maximum entropy of each of the individual bits is equal
> > or
> >
> > more to my entropy estimate I apply.
>
> I'm not sure what you mean here. When you say "maximum entropy" is that
> a non-strict upper bound?
>
> Or does that mean that at least one bit achieves that maximum?
Exactly that -- I have to show that at least one bit out of the 32 bit value
reaches that maximum, i.e. one bit has more entropy than my entropy estimate.
>
> That will be a much more interesting proof.
>
> > Regarding 1: The time stamp (or cycle counter) is a 32 bit value where
> > each of the bits does not depend on the other bits. When considering one
> > and only one time stamp value and we look at, say, the first 20 bits,
> > there is no way it is clear what the missing 12 bits will be.
>
> If you deliberately exclude all external data, then a 32-bit
> constant is random. (I suggest 17, the "most random number".)
>
> But that's meaningless. When we talk about "entropy", we are talking
> about an attacker's uncertainty about the value. Any other measure is
> garbage in, and proiduces nothing but garbage out.
Correct. Please attack the, say, low 4 or 5 bits of a high-res timer so that
you can predict their values with a certain confidence for the observed events
(in the legacy /dev/random, that is a hdd event, a HID event and an interrupt
-- all of those events are user-triggerable). If you achieve that, you broke,
well, almost all timer based noise sources -- be it the legacy /dev/random, be
it OpenBSD, XNU, you name it.
Note, I thought I could attack the legacy /dev/random HID noise source using the
X11 logic: if one has access to the X11 server, one can see all HID events. I
measured its RDTSC time and obtained the respective RDTSC time from the legacy
/dev/random event processing. There are about 500,000,000 clock ticks in
variations between both measurements. For a ping flood from a VMM host to a
virtual machine guest, I get down to 11 bits variations. I even measured RDTSC
timers (see my Jitter RNG measurements) in a tight loop where interrupts are
immediately spotted -- the variations of those interrupts are also in
the vicinity of 10 or 11 bits.
Regardless of what I am doing, I do not see that I can get below 10 bits of
"accuracy" in predicting an RDTSC time stamp of an event processed by the
legacy /dev/random.
Maybe I am not smart enough for attacking the system. Maybe others are smarter
than me and find a way to attack it to get to 5 or 6 bits of accuracy. Yet,
this means there is way more entropy than I need -- this is my first safety
margin.
Ciao
Stephan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 8:02 ` Stephan Mueller
@ 2016-04-29 9:34 ` George Spelvin
2016-04-29 9:53 ` Stephan Mueller
0 siblings, 1 reply; 38+ messages in thread
From: George Spelvin @ 2016-04-29 9:34 UTC (permalink / raw)
To: linux, smueller; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
(Note that we have two chains of e-mails crossing mid-stream. I'm in
the middle of working on a much longer reply to your previous e-mail.)
>> They're not independent, nor are they identically distributed.
> That is an interesting statement: you say that the time stamp has holes
> in it, i.e. some values have zero probability of being selected!
That's not at all what I said. It may be true, depending on Intel's
TSC implementation, but I didn't say or imply it.
> Second, you imply that when bit x of a given time stamp has some
> particular value, bit y can be deduced from bit x.
Yes. For example, bit 30 can be deduced from bit 31, given our
assumption that the attacker has knowledge of previous timestamps, and
likely inter-interrupt times. If bit 31 has changed, bit 30 is almost
certainly zero. The bits are not independent.
The distribution of bit 31 is, with very high probability, equal to that
in the previous timestamp. Bit 0, not so much.
In other words, bits 31 and 0 have different distributions. They are
not identically distributed.
I gave this example in my previous e-mail
Message-ID: <20160429004748.9422.qmail@ns.horizon.com>
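The bit-31/bit-30 deduction can be checked by brute force. A Python sketch (the starting value and gap range are illustrative assumptions):

```python
# When bit 31 of a 32-bit counter flips between two nearby samples, the
# counter has just crossed a multiple of 2**31, so the low-order part of
# the new sample is small and bit 30 of it is almost certainly 0.
prev = 0x7FFFFF00
for delta in range(1, 4096):                  # plausible inter-sample gaps
    cur = (prev + delta) & 0xFFFFFFFF
    if (cur >> 31) & 1 != (prev >> 31) & 1:   # bit 31 changed ...
        assert (cur >> 30) & 1 == 0           # ... so bit 30 is 0
```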
>> If they were identically distributed, they'd all have identical
>> entropy. And there'd be no reason to stop at 32 bits. If the high
>> 32 bits have the same entropy as the low 32 bits, shouldn't we use their
>> entropy too?
> There is absolutely no limit to the 32 bits. We easily can take the high bits
> too. But we know (as you mention below), an attacker has more and more
> knowledge about the selected bits the higher the bit is as he can predict an
> event with a certain degree of probability.
Yes, an attacker has more information about higher bits.
This is the definition of NOT identically distributed!
*If* they were identically distributed, a suggestion I'm pointing
out the ridiculous implications of, then an attacker's knowledge
of each of them would be identical.
If that were the case (and it's not), then the high 32 bits would be as
good a source of entropy as the low 32 bits.
>>> 2. show that the maximum entropy of each of the individual bits is equal
>>> or more to my entropy estimate I apply.
>>
>> I'm not sure what you mean here. When you say "maximum entropy" is that
>> a non-strict upper bound?
>>
>> Or does that mean that at least one bit achieves that maximum?
> Exactly that -- I have to show that at least one bit out of the 32
> bit value reaches that maximum, i.e. one bit has more entropy than my
> entropy estimate.
That will be an interesting claim to argue for. Where do you make it?
>>> Regarding 1: The time stamp (or cycle counter) is a 32 bit value where
>>> each of the bits does not depend on the other bits. When considering one
>>> and only one time stamp value and we look at, say, the first 20 bits,
>>> there is no way it is clear what the missing 12 bits will be.
>> If you deliberately exclude all external data, then a 32-bit
>> constant is random. (I suggest 17, the "most random number".)
>>
>> But that's meaningless. When we talk about "entropy", we are talking
>> about an attacker's uncertainty about the value. Any other measure is
>> garbage in, and produces nothing but garbage out.
> Correct.
You mean that I'm correct that your description of the timestamp bits
as independent is meaningless?
> Maybe I am not smart enough for attacking the system. Maybe others are
> smarter than me and find a way to attack it to get to 5 or 6 bits of
> accuracy. Yet, this means there is way more entropy than I need --
> this is my first safety margin.
I agree that the amount of entropy per timing sample is almost
certainly much higher than the current /dev/random credits it for.
The hard part is proving it. All a statistical test can show is
that its model has a hard time predicting the output. It can't show
tha non-existence of a better model. That's why /dev/random is so
conservative.
Many years ago, when clock rates were below 1 GHz, I wrote a small kernel
module which disabled all other interrupts, and did nothing but take timer
interrupts and capture TSC values to RAM. (It stopped when the buffer
was full and let the system continue.) This was on a system with both
CPU and timer clocks generated from a single crystal by PLL.
I got a nice gaussian distribution of interrupt timings, relative
to a best-fit line, with a standard deviation of about 8 cycles.
If I wanted to repeat that these days, I'd have to either disable in
the BIOS, or take into account, spread-spectrum clocking. Modern clock
PLLs, to reduce EMI, deliberately modulate the CPU clock at about 30 kHz.
That adds a "ripple" with about 40 ns p-p to the TSC values relative to
a non-modulated external clock.
If I'm not careful, I could think that was 40 ns * 3.2 GHz = 128 cycles
of unpredictability when it's just a periodic pattern.
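The arithmetic behind that figure, as a quick sanity check:

```python
# Spread-spectrum clocking ripple expressed in CPU cycles.
ripple_s = 40e-9     # ~40 ns peak-to-peak TSC ripple from the ~30 kHz SSC
clock_hz = 3.2e9     # 3.2 GHz CPU clock
cycles = ripple_s * clock_hz
assert abs(cycles - 128.0) < 1e-6   # looks like 128 cycles of "jitter"
```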
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 9:34 ` George Spelvin
@ 2016-04-29 9:53 ` Stephan Mueller
2016-04-29 11:04 ` George Spelvin
0 siblings, 1 reply; 38+ messages in thread
From: Stephan Mueller @ 2016-04-29 9:53 UTC (permalink / raw)
To: George Spelvin; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
On Friday, 29 April 2016, 05:34:18, George Spelvin wrote:
Hi George,
> (Note that we have two chains of e-mails crossing mid-stream. I'm in
> the middle of working on a much longer reply to your previous e-mail.)
>
> >> They're not independent, nor are they identically distributed.
> >
> > That is an interesting statement: you say that the time stamp has holes
> > in it, i.e. some values have zero probability of being selected!
>
> That's not at all what I said. It may be true, depending on Intel's
> TSC implementation, but I didn't say or imply it.
>
> > Second, you imply that when bit x of a given time stamp has some
> > particular value, bit y can be deduced from bit x.
>
> Yes. For example, bit 30 can be deduced from bit 31, given our
> assumption that the attacker has knowledge of previous timestamps, and
> likely inter-interrupt times. If bit 31 has changed, bit 30 is almost
> certainly zero. The bits are not independent.
I think there is a slight mixup: IID is not related to an attacker predicting
things. IID is simply a statistical measure, it is either there or not. It
does not depend on an attacker (assuming that the attacker cannot change the
data). Note, the IID is only needed to claim that the XOR will be entropy
preserving.
The reason that the IID property on a statistical level is preserved is due to
the fact that an attacker can only observe the values, but not manipulate them
(i.e. set the bits in a time stamp depending on other bits in that very time
stamp).
Hence, the attacker may cause some bits to have zero or little entropy, but
he cannot change the statistical pattern of the bits. This is the key
requirement why the XOR can be applied here: statistical independent bits,
where some bits may not have any entropy.
The relativity of an attacker comes in when you want to determine how much
entropy a particular bit has. And here, the higher the bit, the lower the
entropy, as the attacker is more and more likely to guess the bit
correctly.
>
> The distribution of bit 31 is, with very high probability, equal to that
> in the previous timestamp. Bit 0, not so much.
>
> In other words, bits 31 and 0 have different distributions. They are
> not identically distributed.
>
> I gave this example in my previous e-mail
> Message-ID: <20160429004748.9422.qmail@ns.horizon.com>
>
> >> If they were identically distributed, they'd all have identical
> >> entropy. And there'd be no reason to stop at 32 bits. If the high
> >> 32 bits have the same entropy as the low 32 bits, shouldn't we use their
> >> entropy too?
> >
> > There is absolutely no limit to the 32 bits. We easily can take the high
> > bits too. But we know (as you mention below), an attacker has more and
> > more knowledge about the selected bits the higher the bit is as he can
> > predict an event with a certain degree of probability.
>
> Yes, an attacker has more information about higher bits.
>
> This is the definition of NOT identically distributed!
So, you are saying that by looking at data, you change their statistical
distribution?
>
> *If* they were identically distributed, a suggestion I'm pointing
> out the ridiculous implications of, then an attacker's knowledge
> of each of them would be identical.
Not at all, you mix the attacker's knowledge again with a pure statistical
property.
Ciao
Stephan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 9:53 ` Stephan Mueller
@ 2016-04-29 11:04 ` George Spelvin
2016-04-29 11:18 ` Stephan Mueller
0 siblings, 1 reply; 38+ messages in thread
From: George Spelvin @ 2016-04-29 11:04 UTC (permalink / raw)
To: linux, smueller; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
> I think there is a slight mixup: IID is not related to an attacker
> predicting things. IID is simply a statistical measure, it is either there
> or not. It does not depend on an attacker (assuming that the attacker
> cannot change the data). Note, the IID is only needed to claim that the
> XOR will be entropy preserving.
1. It DOES depend on the attacker. Any statement about independence
depends on the available knowledge.
2. XOR being entropy preserving depends on independence ONLY, it does
NOT depend on identical distribution. The latter is a red herring.
(An English metaphor for "irrelevant distraction.")
3. Precisely because the bits are not independent, XOR is not
guaranteed to be entropy-preserving (your sense) on real data.
To give a specific example, suppose that an attacker can predict that the
counter will be either x or x+1 on the upcoming sample. For simplicity,
assume the probabilities are exactly 50%, so there is one full bit of
entropy in the lsbit.
But if x ends in ..01, then x+1 ends in ..10, and they have the same
XOR, and the attacker knows (0 bits of entropy) the XOR of the bottom
two bits even though they know nothing about the bottom bit in isolation.
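The counterexample can be spelled out as a short Python check (the sample values of x are arbitrary illustrations):

```python
# The attacker knows the sample is either x or x+1, each with probability
# 1/2, where x ends in binary ...01. The lsbit then carries a full bit of
# entropy, yet the XOR of the bottom two bits is known with certainty:
def xor_lo2(v):
    return (v & 1) ^ ((v >> 1) & 1)

for x in (0b0101, 0b110001, (0xDEAD << 2) | 0b01):
    assert x & 0b11 == 0b01          # x ends in ...01
    assert xor_lo2(x) == 1           # ...01 -> 0 XOR 1 = 1
    assert xor_lo2(x + 1) == 1       # ...10 -> 1 XOR 0 = 1, same value
```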
>>> There is absolutely no limit to the 32 bits. We easily can take the high
>>> bits too. But we know (as you mention below), an attacker has more and
>>> more knowledge about the selected bits the higher the bit is as he can
>>> predict an event with a certain degree of probability.
>> Yes, an attacker has more information about higher bits.
>>
>> This is the definition of NOT identically distributed!
> So, you are saying that by looking at data, you change their statistical
> distribution?
Yes.
For example, if I have seen the previous sample and it is 0x00000000,
I know that the distribution of the msbit of the following sample
is heavily biased toward 0.
If I have seen the previous sample and it is 0x7fffffff, I know that the
distribution of the msbit is heavily biased toward 1.
If I had not looked at the preceding samples, I would not be able
to draw those conclusions.
Remember, the following sample doesn't have a distribution; it is a
future fact. The only thing that has a distribution is my advance
knowledge (prediction) of that fact.
>> *If* they were identically distributed, a suggestion I'm pointing
>> out the ridiculous implications of, then an attacker's knowledge
>> of each of them would be identical.
> Not at all, you mix the attacker's knowledge again with a pure statistical
> property.
I don't understand what a "pure statistical property" means.
The distribution of a single independent bit can be described
completely by giving the probability of it being 1.
In the absence of correlations (dependencies), this single number
completely describes the attacker's knowledge of the bit.
Several bits have identical distributions if and only if the
probability of their being 1 is identical.
This is the same as saying that the attacker's knowledge of the
bits is identical.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 11:04 ` George Spelvin
@ 2016-04-29 11:18 ` Stephan Mueller
2016-04-29 18:02 ` George Spelvin
0 siblings, 1 reply; 38+ messages in thread
From: Stephan Mueller @ 2016-04-29 11:18 UTC (permalink / raw)
To: George Spelvin; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
Am Freitag, 29. April 2016, 07:04:24 schrieb George Spelvin:
Hi George,
> > I think there is a slight mixup: IID is not related to an attacker
> > predicting things. IID is simply a statistical measure, it is either there
> > or not. It does not depend on an attacker (assuming that the attacker
> > cannot change the data). Note, the IID is only needed to claim that the
> > XOR will be entropy preserving.
>
> 1. It DOES depend on the attacker. Any statement about independence
> depends on the available knowledge.
> 2. XOR being entropy preserving depends on independence ONLY, it does
> NOT depend on identical distribution. The latter is a red herring.
> (An English metaphor for "irrelevant distraction.")
> 3. Precisely because the bits are not independent, XOR is not
> guaranteed to be entropy-preserving (your sense) on real data.
It seems we talk past each other. Your entire explanation refers to individual
bits that arrive sequentially, where the attacker can analyze each bit in
between and potentially adjust his attack. Sure, in this case they are not
independent, and I am not claiming that.
But one single time stamp is one value where the entire 32 bits are generated
in an atomic fashion to any observer -- there is no sequential obtaining of
its bits, analyzing them and then reacting on them. Each of those bits is set
independently of the others. So, when an attacker looks at it, the entire 32
bits are already there. Hence there is no changing in a distribution by simply
looking at it.
So, take one RDTSC value and slice it into the individual bits. You cannot
predict the 32nd bit from knowledge of the first 31 bits. You can only predict the
32nd bit if you know the previous time stamps.
Again, I know that when seeing two or more time stamps, they depend on each
other. And for processing these multiple time stamps I use concatenation
which is not affected by dependencies.
Ciao
Stephan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 11:18 ` Stephan Mueller
@ 2016-04-29 18:02 ` George Spelvin
2016-04-29 18:41 ` Stephan Mueller
0 siblings, 1 reply; 38+ messages in thread
From: George Spelvin @ 2016-04-29 18:02 UTC (permalink / raw)
To: linux, smueller; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
>> 1. It DOES depend on the attacker. Any statement about independence
>> depends on the available knowledge.
>> 2. XOR being entropy preserving depends on independence ONLY, it does
>> NOT depend on identical distribution. The latter is a red herring.
>> (An English metaphor for "irrelevant distraction.")
>> 3. Precisely because the bits are not independent, XOR is not
>> guaranteed to be entropy-preserving (your sense) on real data.
> It seems we talk past each other. Your entire explanation refers to
> individual bits that arrive sequentially, where the attacker can analyze
> each bit in between and potentially adjust his attack. Sure, in this case
> they are not independent, and I am not claiming that.
You can analyze it in two equivalent ways:
1. The attacker knows all previous timestamps, and we are trying to
quantify their uncertainty about the current timestamp word.
In this case, some of the attacker's knowledge will be about
correlations between the bits of that word.
I think this is the easier way to think about the problem and
the formulation I prefer. Especially because of the structure
of the work you do afterward, XORing those 32 bits.
2. An attacker knows all previous *bits*, and is presented with, and
tries to guess, the 32 bits of the timestamp one at a time.
In this case, information gleaned from previously-seen bits which
would be called correlations in option 1 get incorporated into the
predicted distribution of the current bit.
For measuring the source entropy, this is also a valid way to
proceed. It's like Shannon's early studies of the entropy of
English by letting readers read the first half of a novel and then
asking them to guess what follows, one letter at a time.
I think this form is less convenient in general, and it's particularly
annoying when subsequently computing the parity of a word, as it's
hard to talk about cancellations due to non-independent bits.
Entropy (Shannon, Renyi, and min-) is additive, meaning that the two
different ways of measuring will produce the same result.
Either way, word or bit at a time, we are trying to quantify the *new*
additional entropy contributed by the current sample. That means
we assume for the analysis of each sample that all previous samples
are known to the attacker.
> But one single time stamp is one value where the entire 32 bits are generated
> in an atomic fashion to any observer -- there is no sequential obtaining of
> its bits, analyzing it and then reacting on it.
I never suggested doing it any other way, although as I explained above
it's possible to do so.
I was only considering processing *words* sequentially.
Knowledge of the previous *words* affect the predicted distribution
of individual bits in the current word.
When working word-at-a-time like this, we also have to consider
cross-correlations among the bits of a word. The most general way to
express it is a set of 2^32 probabilities for each of the 2^32 possible
values. The awkwardness of this form is why it's sometimes useful to
think about smaller pieces.
For example, if p(9) = 0.5, p(10) = 0.5, and p(x) = 0 for all other x,
then we have 1 bit of entropy in the word.
We can analyze it a bit at a time, and proceed in one of two ways:
- If we go lsbit first, then we have 1 bit of entropy in bit 0, but
then zero entropy in bit 1.
- If we go msbit first, we have 1 bit of entropy in bit 1, but then
zero entropy in bit 0.
Either way, there's only 1 bit of entropy total because of the
correlation between the bits. Once we have seen one of the bits, the
entropy of the second one collapses to 0.
And either way, we have 1 bit of entropy in the word, but 0 bits of entropy
in the parity of the word.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 18:02 ` George Spelvin
@ 2016-04-29 18:41 ` Stephan Mueller
2016-04-29 20:08 ` George Spelvin
0 siblings, 1 reply; 38+ messages in thread
From: Stephan Mueller @ 2016-04-29 18:41 UTC (permalink / raw)
To: George Spelvin; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
Am Freitag, 29. April 2016, 14:02:48 schrieb George Spelvin:
Hi George,
> >> 1. It DOES depend on the attacker. Any statement about independence
> >>
> >> depends on the available knowledge.
> >>
> >> 2. XOR being entropy preserving depends on independence ONLY, it does
> >>
> >> NOT depend on identical distribution. The latter is a red herring.
> >> (An English metaphor for "irrelevant distraction.")
> >>
> >> 3. Precisely because the bits are not independent, XOR is not
> >>
> >> guaranteed to be entropy-preserving (your sense) on real data.
> >
> > It seems we talk past each other. Your entire explanation refers to
> > individual bits that come in sequentially where the attacker inbetween
> > can analyze it and potentially modify his attack. Sure, in this case
> > they are not independent and I am not claiming that.
>
> You can analyze it in two equivalent ways:
>
> 1. The attacker knows all previous timestamps, and we are trying to
> quantify their uncertainty about the current timestamp word.
>
> In this case, some of the attacker's knowledge will be about
> correlations between the bits of that word.
>
> I think this is the easier way to think about the problem and
> the formulation I prefer. Especially because of the structure
> of the work you do afterward, XORing those 32 bits.
>
> 2. An attacker knows all previous *bits*, and is presented with, and
> tries to guess, the 32 bits of the timestamp one at a time.
>
> In this case, information gleaned from previously-seen bits which
> would be called correlations in option 1 get incorporated into the
> predicted distribution of the current bit.
>
> For measuring the source entropy, this is also a valid way to
> proceed. It's like Shannon's early studies of the entropy of
> English by letting readers read the first half of a novel and then
> asking them to guess what follows, one letter at a time.
>
> I think this form is less convenient in general, and it's particularly
> annoying when subsequently computing the parity of a word, as it's
> hard to talk about cancellations due to non-independent bits.
>
> Entropy (Shannon, Renyi, and min-) is additive, meaning that the two
> different ways of measuring will produce the same result.
>
> Either way, word or bit at a time, we are trying to quantify the *new*
> additional entropy contributed by the current sample. That means
> we assume for the analysis of each sample that all previous samples
> are known to the attacker.
>
> > But one single time stamp is one value where the entire 32 bits are
> > generated in an atomic fashion to any observer -- there is no sequential
> > obtaining of its bits, analyzing it and then reacting on it.
>
> I never suggested doing it any other way, although as I explained above
> it's possible to do so.
>
> I was only considering processing *words* sequentially.
>
> Knowledge of the previous *words* affect the predicted distribution
> of individual bits in the current word.
What you write is all correct, and I concur. But you still do not address the
point I am making. In all your descriptions you always compare two or more
time stamps. And I have always concurred that those are dependent -- and XOR
may do whatever it likes to the entropy.
What I am saying is that the bits in one given time stamp are mutually
independent. I.e. bit 0 of one time stamp does not depend on bit 1 of that
very same time stamp.
I totally agree with you that bit 0 from time stamp 1 may tell you something
about bit 0 of time stamp 2. And therefore I am not considering multiple time
stamps for this first step.
Let me quote the definitions from SP800-90B:
IID: A sequence of random variables for which each element of the sequence has
the same probability distribution as the other values and all values are
mutually independent.
Independent: Two discrete random variables X and Y are (statistically)
independent if the probability that an observation of X will have a certain
value does not change, given knowledge of the value of an observation of Y
(and vice versa). When this is the case, the probability that the observed
values of X and Y will be x and y, respectively, is equal to the probability
that the observed value of X will be x (determined without regard for the
value of y) multiplied by the probability that the observed value of Y will be
y (determined without regard for the value of x).
All I am claiming and all I am saying is that the bits 0 through 31 in one
given time stamp are mutually independent. And thus the XOR of those
independent bits is appropriate.
>
>> When working word-at-a-time like this, we also have to consider
> cross-correlations among the bits of a word. The most general way to
Tell me where the correlations should be within one word. Where do you claim
they come from?
> express it is a set of 2^32 probabilities for each of the 2^32 possible
> values. The awkwardness of this form is why it's sometimes useful to
> think about smaller pieces.
>
> For example, if p(9) = 0.5, p(10) = 0.5, and p(x) = 0 for all other x,
> then we have 1 bit of entropy in the word.
>
> We can analyze it a bit at a time, and proceed in one of two ways:
>
> - If we go lsbit first, then we have 1 bit of entropy in bit 0, but
> then zero entropy in bit 1.
> - If we go msbit first, we have 1 bit of entropy in bit 1, but then
> zero entropy in bit 0.
This description makes no sense.
>
> Either way, there's only 1 bit of entropy total because of the
> correlation between the bits. Once we have seen one of the bits, the
> entropy of the second one collapses to 0.
Again, where is the correlation supposed to come from?
>
> And either way, we have 1 bit of entropy in the word, but 0 bits of entropy
> in the parity of the word.
Ciao
Stephan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 18:41 ` Stephan Mueller
@ 2016-04-29 20:08 ` George Spelvin
2016-04-29 21:54 ` Stephan Mueller
0 siblings, 1 reply; 38+ messages in thread
From: George Spelvin @ 2016-04-29 20:08 UTC (permalink / raw)
To: linux, smueller; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
> What I am saying is that the bits in one given time stamp are mutually
> independent. I.e. bit 0 of one time stamp does not depend on bit 1 of that
> very same time stamp.
And I'm saying that's wrong.
We are interested in the correlation from the point of view of someone
who knows all previous time stamps (and a lot else besides).
Based on this knowledge of what happened before, it is possible to
deduce a correlation.
> I totally agree with you that bit 0 from time stamp 1 may tell you something
> about bit 0 of time stamp 2. And therefore I am not considering multiple time
> stamps for this first step.
But also bits 0 and 1 of time stamp 1 will tell you something about the
correlation of bits 0 and 1 of time stamp 2.
To simplify the discussion, let's slow down the time stamp counter
to slightly more than 1 tick per interrupt.
Suppose time stamp 1 (I'll call it x) happens to end in the bits "10".
The attacker knows, based on the rates of the interrupts and TSC ticks,
that the next time stamp is probably x+1 or x+2. It might be x+3 or more,
but that's pretty unlikely.
This means that the lsbits are probably 11 or 00. So there's a strong
correlation between bit 0 and bit 1.
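The dependence can be made numerical. This Python sketch (my illustration, assuming the stated 50/50 split between x+1 and x+2) builds the attacker's joint distribution of the bottom two bits of the next stamp and applies the usual definition of independence:

```python
# x ends in ...10, so the next stamp ends in ...11 (x+1) or ...00 (x+2).
# Joint distribution of (bit1, bit0) of the next stamp, attacker's view:
joint = {(1, 1): 0.5, (0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0}

p_b1 = sum(p for (b1, _), p in joint.items() if b1 == 1)   # marginal P(bit1=1)
p_b0 = sum(p for (_, b0), p in joint.items() if b0 == 1)   # marginal P(bit0=1)
p_b1_given_b0 = joint[(1, 1)] / p_b0                       # P(bit1=1 | bit0=1)

print(p_b1)            # 0.5: unconditionally, bit 1 is a fair coin
print(p_b1_given_b0)   # 1.0: once bit 0 is seen to be 1, bit 1 is certain
assert p_b1 != p_b1_given_b0   # observing bit 0 changes P(bit 1): dependent
```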
> IID: A sequence of random variables for which each element of the
> sequence has the same probability distribution as the other values
> and all values are mutually independent.
>
> Independent: Two discrete random variables X and Y are (statistically)
> independent if the probability that an observation of X will have
> a certain value does not change, given knowledge of the value of
> an observation of Y (and vice versa). When this is the case, the
> probability that the observed values of X and Y will be x and y,
> respectively, is equal to the probability that the observed value of
> X will be x (determined without regard for the value of y) multiplied
> by the probability that the observed value of Y will be y (determined
> without regard for the value of x).
These are both exactly correct.
Notice in particular the statement that a probability (of X) can change
based on knowledge (of Y). The special case where it doesn't change is
called independence.
> All I am claiming and all I am saying is that the bits 0 through 31 in one
> given time stamp are mutually indpendent. And thus the XOR of those
> independent bits is appropriate.
And I'm saying that's flat out wrong. The bits are NOT mutually
independent. The correlation might be small in some cases, but
it's not exactly zero.
>> When working word-at-a-time like this, we also have to consider
>> cross-correlations among the bits of a word. The most general way to
> Tell me where the correlations should be within one word. Where do you claim
> they come from?
From all the other knowledge of the machine. Knowledge of previous
timestamps, kernel internals, interrupt service routine execution times,
interrupt mitigation timers in various hardware, periodic interrupts, etc.
>> For example, if p(9) = 0.5, p(10) = 0.5, and p(x) = 0 for all other x,
>> then we have 1 bit of entropy in the word.
>>
>> We can analyze it a bit at a time, and proceed in one of two ways:
>>
>> - If we go lsbit first, then we have 1 bit of entropy in bit 0, but
>> then zero entropy in bit 1.
>> - If we go msbit first, we have 1 bit of entropy in bit 1, but then
>> zero entropy in bit 0.
> This description makes no sense.
I tried to make it as simple as I could.
Let me try again.
Assume for the purpose of discussion that we are able to predict, by some
magical means irrelevant to the example, that the next timestamp will
definitely be either 9 or 10, with 50% probability of each.
This is obviously an extremely simplified example, but the basic logic
applies to more complex cases.
We can analyze the entropy of the following timestamp in three ways:
1) Consider the timestamp all at once. The entropy of the word is the
expected value of -log2(p[i]). This is p[9] * -log2(p[9]) +
p[10] * -log2(p[10]) = 0.5*1 + 0.5*1 = 1.0 bits of entropy.
2) Consider the timestamp a bit at a time, starting from the lsbit.
2a) We have no idea what bit 0 is, so the entropy of this bit is 1.0.
2b) Given that we have already seen bit 0, we know with certainty that
bit 1 is its complement, so bit 1 contributes zero entropy.
(See the definition of "independent" above. Having gained knowledge
of bit 0, the probability of bit 1 having a particular value has
changed. Thus bits 0 and 1 are not independent.)
2c) Bits 2..31 are all known ahead of time, so contribute zero entropy.
3) Consider the timestamp a bit at a time, starting from the msbit.
3a) We know bits 31..2 ahead of time, so they contribute zero entropy.
3b) We have no idea what bit 1 is, so it contributes 1.0 bits of entropy.
3c) Having seen bit 1, we know with certainty that bit 0 is its
complement, so bit 0 contributes zero entropy.
The point of the example is that regardless of the method used to
add it up, the total entropy is the same. We may not even attribute the
entropy to the same bit, but the total is still the same.
This additive property is what makes entropy a useful measurement.
The same logic applies to more complex cases, it just takes longer
to write out all the equations and show that the numbers add up the same.
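The bookkeeping can be checked mechanically. The Python sketch below (my illustration) computes the entropy of the magically-known {9, 10} distribution word-at-a-time and bit-at-a-time in both orders, and also shows the parity collapsing to zero entropy:

```python
from math import log2

def H(dist):
    """Shannon entropy (bits) of a dict {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

dist = {9: 0.5, 10: 0.5}       # the predicted distribution of the timestamp

# 1) Whole word at once:
print(H(dist))                 # 1.0

# 2)/3) Bit at a time: total entropy is the sum over bits of
# H(bit_i | bits revealed so far), in whatever order we reveal them.
def bitwise_entropy(dist, bit_order):
    total = 0.0
    seen = {v: () for v in dist}       # bits of each outcome revealed so far
    for i in bit_order:
        for prefix in set(seen.values()):
            sub = {v: p for v, p in dist.items() if seen[v] == prefix}
            norm = sum(sub.values())   # probability of seeing this prefix
            bitdist = {}
            for v, p in sub.items():
                b = (v >> i) & 1
                bitdist[b] = bitdist.get(b, 0.0) + p / norm
            total += norm * H(bitdist)
        seen = {v: seen[v] + ((v >> i) & 1,) for v in dist}
    return total

print(bitwise_entropy(dist, range(32)))            # lsbit first: 1.0
print(bitwise_entropy(dist, reversed(range(32))))  # msbit first: 1.0

# The parity, however, is 0 for both 9 (1001) and 10 (1010):
assert all(bin(v).count("1") % 2 == 0 for v in dist)   # 0 bits of entropy
```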
>> Either way, there's only 1 bit of entropy total because of the
>> correlation between the bits. Once we have seen one of the bits, the
>> entropy of the second one collapses to 0.
> Again, where is the correlation supposed to come from?
From our knowledge that the value will be either 9 (1001) or 10 (1010).
That's a severely simplified case, but it's meant to show what happens if
we have any knowledge of the range of possible values.
For another simplified example using larger numbers, suppose we know
that the next timestamp will be somewhere in the 2^17-tick range between
0xffff0000 and 0x0000ffff. In this case, bits 16..31 are guaranteed
perfectly correlated.
The same logic applies if we know that the next interrupt timestamp will
be x + delta, where delta is a Poisson-distributed value with expected
value lambda, the math is just a lot messier and the correlations are
a lot smaller. But still non-zero, so they're *not* independent.
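For the 2^17-tick window example, the perfect correlation of the top bits is easy to verify by enumeration (my Python sketch, not part of the original mail):

```python
# The window straddles the 32-bit wrap: 0xffff0000 .. 0xffffffff, then
# 0x00000000 .. 0x0000ffff -- 2^17 possible values in all.
window = list(range(0xffff0000, 0x100000000)) + list(range(0x10000))
assert len(window) == 1 << 17
for x in window:
    # Bits 16..31 move together: all ones before the wrap, all zeros after.
    assert (x >> 16) in (0x0000, 0xffff)
print("bits 16..31 are perfectly correlated over the whole window")
```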
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 20:08 ` George Spelvin
@ 2016-04-29 21:54 ` Stephan Mueller
2016-04-29 22:32 ` George Spelvin
0 siblings, 1 reply; 38+ messages in thread
From: Stephan Mueller @ 2016-04-29 21:54 UTC (permalink / raw)
To: George Spelvin; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
Am Freitag, 29. April 2016, 16:08:48 schrieb George Spelvin:
Hi George,
> > What I am saying is that the bits in one given time stamp are mutually
> > independent. I.e. bit 0 of one time stamp does not depend on bit 1 of that
> > very same time stamp.
>
> And I'm saying that's wrong.
I think we can agree that we disagree.
I am not sure whether you have a point or not. Still, I will get back to the
drawing board and think about the problem of how to efficiently collect
entropy.
Ciao
Stephan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-29 21:54 ` Stephan Mueller
@ 2016-04-29 22:32 ` George Spelvin
0 siblings, 0 replies; 38+ messages in thread
From: George Spelvin @ 2016-04-29 22:32 UTC (permalink / raw)
To: linux, smueller; +Cc: herbert, linux-crypto, linux-kernel, sandyinchina, tytso
> I think we can agree that we disagree.
Yes, we agree on that, at least!
The problem is, this is supposed to be a matter of fact, not opinion,
so there should be one right answer.
I suppose it's possible it's still an issue of terminology, but we've
exhausted that line of discussion.
> Though, I will get back to the drawing board and think about the problem
> of how to efficiently collect entropy.
For *collecting* it, there are two obvious sources that would be very
nice to use; I've just never had the courage to dig into the relevant
subsystems deep enough to add the hooks.
These could be turned on when entropy is required and turned off afterward:
1) The RTC periodic interrupt. This is invariably driven by
a separate 32.768 Hz crystal, so the jitter against the main
clock oscillator is a useful source.
A standard PC RTC can run at up to 8 kHz, and probably deliver
a few hundred bits/s.
2) Any available ADCs, especially audio ADCs. A 16-bit or better ADC is a
very rich source of entropy. We'd need a hook into the audio subsystem
which could activate the ADC when needed and, most importantly,
guarantee that the data did not go anywhere else.
Because we can't guarantee that the audio input is quiet when collected,
the entropy would have to be estimated a priori rather than deduced
from measurements. Some measurements would still be useful as a sanity
check to ensure the data isn't digital zero or something.
For storing it after collecting it, I still think the current CRC-based
scheme is pretty good. Note that cyclic XOR is a special case of this,
using the polynomial x^4096 - 1. The good properties of CRCs for
detection of hardware-type errors are exactly equivalent to the collision
resistance properties desired for an entropy pool.
A collision results in an undetected error in the former case, and loss
of entropy in the latter.
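To make the "special case" concrete: the toy Python below (my sketch, scaled down to a 32-bit pool with G(x) = x^32 + 1 rather than the 4096-bit pool above) shows that a CRC step with that degenerate polynomial is exactly a rotate-and-XOR, i.e. cyclic XOR mixing:

```python
import random

MASK32 = 0xffffffff

def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def crc_mix(state, byte, feedback):
    """Mix one byte into a 32-bit pool with a bitwise CRC over GF(2).

    `feedback` is the generator polynomial with its x^32 term dropped.
    """
    state ^= byte << 24
    for _ in range(8):
        if state & 0x80000000:
            state = ((state << 1) ^ feedback) & MASK32
        else:
            state = (state << 1) & MASK32
    return state

# With G(x) = x^32 + 1 (feedback = 1), x^32 = 1 mod G, so each CRC step
# is a one-bit rotation: the CRC degenerates into cyclic rotate-and-XOR.
random.seed(0)
for _ in range(1000):
    s, b = random.getrandbits(32), random.getrandbits(8)
    assert crc_mix(s, b, 1) == rotl32(s ^ (b << 24), 8)
```

A real CRC polynomial spreads the input across the whole pool instead of just rotating it, which is why collisions (and hence entropy loss) are rarer.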
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: random(4) changes
2016-04-27 0:23 ` George Spelvin
2016-04-27 18:03 ` George Spelvin
2016-04-28 20:15 ` Stephan Mueller
@ 2016-04-29 0:47 ` George Spelvin
2 siblings, 0 replies; 38+ messages in thread
From: George Spelvin @ 2016-04-29 0:47 UTC (permalink / raw)
To: smueller; +Cc: herbert, linux, linux-crypto, linux-kernel, sandyinchina, tytso
I'd like to apologise for the harsh tone of my earlier message.
I was frustrated, and it showed.
I hope I can be more gentle as I continue to elaborate on the shortcomings
of that scheme.
Concentrating entropy is hard. To paraphrase Princess Leia, "the more
you tighten your grip, the more entropy will slip through your fingers."
While it's true that the entropy of two independent bit strings is at
least as high as the entropy of either one, it's not necessarily much
higher, either.
I'm going to discuss two things here. First, what happens if you assume
that the input samples are made up of independent and identically
distributed (i.i.d.) random bits, and second why they're not i.i.d.
When talking about single bits, all that matters is the probabilities
of the two values. I'm going to assume that the bits are 1 with some
probability p <= 0.5, and 0 with probability p >= 0.5, but that's just
for convenience.
You can XOR any attacker-known pattern into those bits and it doesn't
change the entropy. What matters is the probability that they match an
attacker's prediction, and the probability that they don't match.
For example, if an attacker knows that a bit is 75% likely to match
some other bit, then I'll describe it with p=0.25.
i.i.d. bits of course all have the same entropy. Suppose that we have 32
bits, with 0.1 bits of entropy each, making 3.2 bits of entropy in total.
This is a pretty good sample; it should be easy to get one bit with high
entropy out of this, right?
Well, no.
0.1 bits of Shannon entropy means that a bit is 0 with probability 98.7%,
and 1 with probability p = 1.3%.
If you XOR two such bits together, the result is 1 with probability
2 * p * (1-p) (which is also less than 0.5).
Let's work out the entropy assuming we start with 0.1 of a bit of Shannon
entropy per bit, and 0.1 of a bit of min-entropy per bit, and then
form the XOR of increasingly large blocks:
Starting with 32/10 bits of Shannon entropy
1: p=0.012987 Shannon=0.100000 (sum 3.200000) Min=0.018859 (sum 0.603482)
2: p=0.025636 Shannon=0.172013 (sum 2.752203) Min=0.037468 (sum 0.599486)
4: p=0.049958 Shannon=0.286220 (sum 2.289760) Min=0.073937 (sum 0.591499)
8: p=0.094925 Shannon=0.452699 (sum 1.810795) Min=0.143891 (sum 0.575563)
16: p=0.171829 Shannon=0.661871 (sum 1.323741) Min=0.271999 (sum 0.543997)
32: p=0.284607 Shannon=0.861653 (sum 0.861653) Min=0.483192 (sum 0.483192)
The last line is the probability that the XOR of 32 such bits is 1.
28.46% is still some distance from 50/50, isn't it?
The min-entropy per bit comes close to doubling each time, since it
suffers less from saturation effects as it approaches 1 bit per bit,
but still 20% of the entropy is lost.
Starting with 32/10 bits of min-entropy
1: p=0.066967 Shannon=0.354502 (sum 11.344068) Min=0.100000 (sum 3.200000)
2: p=0.124965 Shannon=0.543466 (sum 8.695452) Min=0.192587 (sum 3.081394)
4: p=0.218697 Shannon=0.757782 (sum 6.062253) Min=0.356046 (sum 2.848372)
8: p=0.341738 Shannon=0.926472 (sum 3.705887) Min=0.603265 (sum 2.413061)
16: p=0.449906 Shannon=0.992747 (sum 1.985494) Min=0.862250 (sum 1.724500)
32: p=0.494981 Shannon=0.999927 (sum 0.999927) Min=0.985591 (sum 0.985591)
Because min-entropy is a more conservative estimate, the starting probability
is much closer to 50/50, and we got very close to unbiased. But it took
11 bits of Shannon entropy to do it!
Working the formula backward, we can deduce that to get 0.95 bits of
Shannon entropy by XORing 32 i.i.d. bits, each of the input bits needs
to have p = 0.020511, 0.1443 bits of entropy, a total of 4.62 bits.
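These numbers are easy to reproduce. The Python sketch below (my own; it starts from the mail's rounded bias p = 0.012987, so the last printed digits may differ slightly) regenerates the Shannon-start table:

```python
from math import log2

def shannon(p):
    """Shannon entropy (bits) of one bit that is 1 with probability p."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def min_ent(p):
    """Min-entropy (bits): -log2 of the most probable value's probability."""
    return -log2(max(p, 1 - p))

def xor2(p):
    """P(1) for the XOR of two independent bits that are each 1 w.p. p."""
    return 2 * p * (1 - p)

p = 0.012987          # bias giving ~0.1 bit of Shannon entropy per bit
for n in (1, 2, 4, 8, 16, 32):
    print(f"{n:2}: p={p:.6f} Shannon={shannon(p):.6f} Min={min_ent(p):.6f}")
    p = xor2(p)
```

Starting instead from p = 0.066967 (0.1 bit of min-entropy per bit) regenerates the second table, and p = 0.020511 checks the backward-worked 0.95-bit figure.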
In fact, if the maximum entropy per bit is low enough then the
probability of getting the all-zero word is greater than 50% and
that imposes an upper limit on the entropy produced by *any* hashing
scheme.
For 0.1 bits of Shannon entropy per bit, (1-p)^32 = 0.658164 so even if
you reduced to one bit using !, you couldn't get closer to 50:50
than that. (0.926565 bit Shannon, 0.603482 bits min-entropy.)
It turns out that lots of weakly random i.i.d. bits are a worst case for
this sort of reduction; if we divide the entropy in a "triangular" way, where
bit i gets (i+1) parts of the entropy (dividing by (32*33/2) to normalize)
then we get a final XOR of
p=0.338856 Shannon=0.923720 Min=0.596964
which is better than the p=0.284607 we got from i.i.d. bits.
Is it, then, perhaps the case that the i.i.d. assumption is not a good one?
The bit patterns we're talking about don't seem to resemble realistic
timestamps.
It's a very bad one, as I'll explain.
*Independence* can exist in a vacuum: not correlated with anything.
That's what /dev/random is trying to produce. But the seed material
is anything but.
There are two major sources of information to an attacker about the
seed entropy:
1) An attacker is assumed to know all prior seed inputs.
That's because we're trying to quantify the *additional* entropy
contributed by this sample. The difficulty of guessing the
previous seed material has already been accounted for in the
entropy assigned to it. So any correlation with any previous
seed material must be considered non-random.
2) An attacker is assumed to be able to run programs on the
machine collecting entropy. This is because we want different
users' random bits to be secure from each other.
For the second issue, note that on my Ivy Bridge, I can do rdtsc(),
subtract from the previous and compare to a threshold in 24 cycles.
If I notice a sudden jump in rdtsc() deltas, I know an interrupt
happened, and I know when it happened within a window of 24 clock
cycles.
There's some delay to run the interrupt handler, but that's
highly predictable.
If interrupt arrival time is quantized, by a lower APIC clock speed,
PCI bus clock speed, or the like, then it's possible there's less
than log2(24) = 4.6 bits of uncertainty there.
(It may be possible to ignore this threat during early boot when
nothing is running, to speed up entropy accumulation.)
Getting to the first, much more important case, this is why the bits in
a captured timestamp are neither independent nor identically distributed.
What we care about in those statements is what holds *relative to the knowledge of the
attacker*. Deterministic whitening is easy and completely pointless; the
only thing we're interested in measuring is what an attacker cannot know.
First of all, the correlation with the *previous* timestamp is stronger
the higher you go in the bits.
The chance of a higher-order bit having the predicted value is higher than
for a lower-order bit. Thus, the bits aren't identically distributed.
If bit 31 toggles, bit 30 is almost certainly 0 now. Thus, the bits
aren't independent.
There are many more counterexamples, but just those two will suffice
to show that the i.i.d. assumption is fatally flawed. It makes the math
easy, but that's not what the available entropy is like.
The way to think about handling entropy is minimizing collisions.
No collisions means no entropy loss. Collisions mean entropy loss.
This is not a huge problem when you're extracting from a pool and the
pool remains; all that happens is the remaining entropy stays in the pool.
Two possible inputs that result in the same pool state afterward is like
crossing the streams: it Would Be Bad. You can't eliminate the possibility
with a finite-sized pool, but you should work to minimize it.
^ permalink raw reply [flat|nested] 38+ messages in thread