public inbox for linux-kernel@vger.kernel.org
* Entropy from disks
@ 2002-10-29 17:10 Oliver Xymoron
  2002-10-29 18:02 ` Chris Friesen
  0 siblings, 1 reply; 5+ messages in thread
From: Oliver Xymoron @ 2002-10-29 17:10 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel

I just noticed:

http://linux.bkbits.net:8080/linux-2.5/cset@1.858?nav=index.html|ChangeSet@-2d

Since you're touching the disk entropy code, please read this, which
is one of the few papers on the subject:

http://world.std.com/~dtd/random/forward.ps

The current Linux PRNG is playing fast and loose here, adding entropy
based on the resolution of the TSC, while the physical turbulence
processes that actually produce entropy are happening at a scale of
seconds. On a GHz processor, if it takes 4 microseconds to return a
disk result from on-disk cache, /dev/random will get a 12-bit credit.
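The arithmetic behind that credit can be sketched as follows (a hypothetical helper, not the kernel's actual accounting): 4 microseconds at 1 GHz spans roughly 4000 TSC ticks, and floor(log2(4000)) is 11, i.e. roughly a 12-bit credit.

```python
import math

def naive_credit_bits(latency_s, clock_hz):
    """Credit log2 of the number of timer ticks the event spans.

    Hypothetical sketch of the flawed reasoning above: a finer timer
    inflates the credit even though the physical turbulence varies on
    a far coarser timescale.
    """
    ticks = latency_s * clock_hz
    return math.floor(math.log2(ticks))

print(naive_credit_bits(4e-6, 1e9))  # ~4000 ticks -> 11, i.e. a ~12-bit credit
```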

My entropy patches had each entropy source (not just disks) allocate
its own state object, and declare its timing "granularity".
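That idea can be sketched like this (names and numbers are illustrative, not the actual patch interface): each source carries its own state and a declared granularity, and credits are computed against that granularity rather than against raw TSC ticks.

```python
import math
from dataclasses import dataclass

@dataclass
class EntropySource:
    """Illustrative per-source state; not the real patch's interface."""
    name: str
    granularity_s: float  # coarsest timescale believed unpredictable

    def credit_bits(self, interarrival_s):
        # Count ticks of the declared granularity, not raw timer ticks.
        ticks = interarrival_s / self.granularity_s
        return math.floor(math.log2(ticks)) if ticks >= 2 else 0

disk = EntropySource("disk", granularity_s=1.0)  # turbulence acts over seconds
print(disk.credit_bits(8.0))   # 8 granularity ticks -> 3 bits
print(disk.credit_bits(4e-6))  # cached hit, far below granularity -> 0 bits
```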

There's a further problem with disk timing samples that makes them less
than useful in typical headless server use (i.e., where it matters): the
server does its best to reveal disk latency to clients, which is easily
measurable within the auto-correlating domain of disk turbulence.

-- 
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Entropy from disks
  2002-10-29 17:10 Entropy from disks Oliver Xymoron
@ 2002-10-29 18:02 ` Chris Friesen
  2002-10-29 19:08   ` Oliver Xymoron
  2002-11-05 23:55   ` David Schwartz
  0 siblings, 2 replies; 5+ messages in thread
From: Chris Friesen @ 2002-10-29 18:02 UTC (permalink / raw)
  To: Oliver Xymoron; +Cc: Alexander Viro, linux-kernel

Oliver Xymoron wrote:

I'm not an expert in this field, so bear with me if I make any blunders 
obvious to one trained in information theory.

> The current Linux PRNG is playing fast and loose here, adding entropy
> based on the resolution of the TSC, while the physical turbulence
> processes that actually produce entropy are happening at a scale of
> seconds. On a GHz processor, if it takes 4 microseconds to return a
> disk result from on-disk cache, /dev/random will get a 12-bit credit.

In the paper the accuracy of measurement is 1 ms.  Current hardware has 
TSC precision of nanoseconds, or about six orders of magnitude more 
accuracy.  Doesn't this mean that we can pump many more bits into the 
algorithm and get out many more than the 100 bits/min that the setup in 
the paper achieves?


> My entropy patches had each entropy source (not just disks) allocate
> its own state object, and declare its timing "granularity".

Is it that straightforward? In the paper they go through a number of 
stages designed to remove correlated data.  From what I remember of the 
Linux PRNG disk-related stuff, it is not that sophisticated.


> There's a further problem with disk timing samples that makes them less
> than useful in typical headless server use (ie where it matters): the
> server does its best to reveal disk latency to clients, easily
> measurable within the auto-correlating domain of disk turbulence.

If this is truly a concern, what about having a separate disk used for 
nothing but generating randomness?

Chris




-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com



* Re: Entropy from disks
  2002-10-29 18:02 ` Chris Friesen
@ 2002-10-29 19:08   ` Oliver Xymoron
  2002-11-05 23:55   ` David Schwartz
  1 sibling, 0 replies; 5+ messages in thread
From: Oliver Xymoron @ 2002-10-29 19:08 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Alexander Viro, linux-kernel

On Tue, Oct 29, 2002 at 01:02:39PM -0500, Chris Friesen wrote:
> Oliver Xymoron wrote:
> 
> I'm not an expert in this field, so bear with me if I make any blunders 
> obvious to one trained in information theory.
> 
> >The current Linux PRNG is playing fast and loose here, adding entropy
> >based on the resolution of the TSC, while the physical turbulence
> >processes that actually produce entropy are happening at a scale of
> >seconds. On a GHz processor, if it takes 4 microseconds to return a
> >disk result from on-disk cache, /dev/random will get a 12-bit credit.
> 
> In the paper the accuracy of measurement is 1ms.  Current hardware has 
> tsc precision of nanoseconds, or about 6 orders of magnitude more 
> accuracy.  Doesn't this mean that we can pump in many more bits into the 
>  algorithm and get out many more than the 100bits/min that the setup in 
> the paper achieves?

No, it doesn't scale quite like that. The autocorrelation property
says such bits are poorly distributed - the turbulence has an inherent
timescale at which things become hard to predict. Over short
timescales they become easier to predict. We can posit that we can
increase the measurement accuracy to a degree where we begin to see
some sort of underlying noise caused by Brownian motion and physical
processes of that sort, but then you'd have to sit down and model it
like this paper's done rather than just handwaving and saying 12 LSBs
of the TSC are real noise. 

But that's all irrelevant because the TSC resolution isn't really
improving our measurement accuracy anything like that much. Nothing
the processor can measure comes in faster than the FSB clock rate,
which is a fixed lock-step multiple of the processor clock, which is
generally a fixed lock-step multiple of a 33.3MHz crystal somewhere.
The disk's microcontrollers, if not taking a clock signal directly
from the motherboard, are likely in sympathetic synchrony with them,
and are even then likely divided down a bit. Framing requirements for
disk and bus protocols lower this further. 
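The quantization argument can be made concrete (numbers are illustrative): if the measured event is phase-locked to a ~33.3 MHz crystal, every TSC reading within one bus period collapses to the same value, so the extra low-order TSC bits are predictable rather than random.

```python
cpu_hz = 1_000_000_000  # apparent TSC resolution
bus_hz = 33_333_333     # crystal the event is actually locked to

divisor = cpu_hz // bus_hz  # TSC ticks per bus tick

def effective_reading(tsc):
    # All readings within one bus period quantize to the same value,
    # so the low ~log2(divisor) bits of the TSC carry no entropy.
    return (tsc // divisor) * divisor

print(divisor)                                   # 30
print(effective_reading(123_456_789) % divisor)  # 0: low bits are determined
```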

> >My entropy patches had each entropy source (not just disks) allocate
> >its own state object, and declare its timing "granularity".
> 
> Is it that straightforward? In the paper they go through a number of 
> stages designed to remove correlated data.  From what I remember of the 
> linux prng disk-related stuff it is not that sophisticated.

No, it's not that straightforward. And no, the current code doesn't
even address cache effects. Hell, it doesn't even know about
solid-state devices. Note that SHA will do just as good a job of
removing correlation as an FFT; the problem is overestimating the
input entropy.

I think the prudent approach here is to punt and declare disks
untrusted, especially in light of the headless server attack model
where we intentionally leak entropy. It's still possible to use this
data for /dev/urandom (which my patches allow).
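One way to express that split (a sketch of the general policy, not the patch itself): untrusted samples are stirred into the pool without crediting any entropy, which is what an ordinary write to /dev/urandom does on Linux. The device path is a parameter here so the helper can be pointed anywhere.

```python
import os

def stir_without_credit(sample: bytes, dev: str = "/dev/urandom") -> int:
    """Mix untrusted bytes into the pool without crediting entropy.

    On Linux, ordinary writes to /dev/urandom (or /dev/random) stir the
    pool but leave the entropy count untouched, matching the "usable
    for /dev/urandom only" policy above. Returns bytes written.
    """
    fd = os.open(dev, os.O_WRONLY)
    try:
        return os.write(fd, sample)
    finally:
        os.close(fd)
```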

> >There's a further problem with disk timing samples that makes them less
> >than useful in typical headless server use (ie where it matters): the
> >server does its best to reveal disk latency to clients, easily
> >measurable within the auto-correlating domain of disk turbulence.
> 
> If this is truly a concern, what about having a separate disk used for 
> nothing but generating randomness?

Doable, and probably best done with a userspace app feeding /dev/random.
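Such a feeder would gather timings from the dedicated disk and credit them through the RNDADDENTROPY ioctl described in random(4). The sketch below shows only the request packing; the entropy estimate and the sampling loop are the hard parts, and the ioctl itself needs root.

```python
import fcntl, os, struct

RNDADDENTROPY = 0x40085203  # _IOW('R', 0x03, int[2])

def pack_pool_info(entropy_bits: int, data: bytes) -> bytes:
    # struct rand_pool_info { int entropy_count; int buf_size; __u32 buf[]; };
    return struct.pack("ii", entropy_bits, len(data)) + data

def add_entropy(data: bytes, entropy_bits: int) -> None:
    # Requires privilege; credits entropy_bits and mixes data into the pool.
    fd = os.open("/dev/random", os.O_WRONLY)
    try:
        fcntl.ioctl(fd, RNDADDENTROPY, pack_pool_info(entropy_bits, data))
    finally:
        os.close(fd)
```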

-- 
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 


* Re: Entropy from disks
  2002-10-29 18:02 ` Chris Friesen
  2002-10-29 19:08   ` Oliver Xymoron
@ 2002-11-05 23:55   ` David Schwartz
  1 sibling, 0 replies; 5+ messages in thread
From: David Schwartz @ 2002-11-05 23:55 UTC (permalink / raw)
  To: cfriesen, Oliver Xymoron; +Cc: Alexander Viro, linux-kernel


On Tue, 29 Oct 2002 13:02:39 -0500, Chris Friesen wrote:
>Oliver Xymoron wrote:

>I'm not an expert in this field, so bear with me if I make any blunders
>obvious to one trained in information theory.

>>The current Linux PRNG is playing fast and loose here, adding entropy
>>based on the resolution of the TSC, while the physical turbulence
>>processes that actually produce entropy are happening at a scale of
>>seconds. On a GHz processor, if it takes 4 microseconds to return a
>>disk result from on-disk cache, /dev/random will get a 12-bit credit.

>In the paper the accuracy of measurement is 1ms.  Current hardware has
>tsc precision of nanoseconds, or about 6 orders of magnitude more
>accuracy.  Doesn't this mean that we can pump in many more bits into the
>algorithm and get out many more than the 100bits/min that the setup in
>the paper achieves?

	In theory, if there's any real physical randomness in a timing source, the 
more accurately you measure the timing, the more bits you get.

>>My entropy patches had each entropy source (not just disks) allocate
>>its own state object, and declare its timing "granularity".
>
>Is it that straightforward? In the paper they go through a number of
>stages designed to remove correlated data.  From what I remember of the
>linux prng disk-related stuff it is not that sophisticated.

	It doesn't matter. Any processing you do on data that may contain randomness 
can only remove the randomness from it. There is no deterministic processing 
method that can increase the randomness of a sample, only concentrate it in 
fewer bits.
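A classic illustration of concentrating (never creating) randomness is the von Neumann extractor: given independent but biased bits, it emits fewer, unbiased bits and discards the rest.

```python
def von_neumann(bits):
    """Map bit pairs 01 -> 0, 10 -> 1; discard 00 and 11.

    The output is shorter than the input but unbiased if the input bits
    are independent: processing concentrates entropy, it cannot add any.
    """
    out = []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out

print(von_neumann([0, 1, 1, 0, 1, 1, 0, 0, 0, 1]))  # [0, 1, 0]
```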

>>There's a further problem with disk timing samples that makes them less
>>than useful in typical headless server use (ie where it matters): the
>>server does its best to reveal disk latency to clients, easily
>>measurable within the auto-correlating domain of disk turbulence.

>If this is truly a concern, what about having a separate disk used for
>nothing but generating randomness?

	Or just delay by one extra clock cycle if the TSC is odd. This will make the 
LSB of the TSC totally opaque at a negligible cost. One perfect bit per disk 
access is enough for most reasonable applications that really only needed 
cryptographically-secure randomness to begin with.

	DS




* Re: Entropy from disks
@ 2002-11-06  0:27 Oliver Xymoron
  0 siblings, 0 replies; 5+ messages in thread
From: Oliver Xymoron @ 2002-11-06  0:27 UTC (permalink / raw)
  To: David Schwartz; +Cc: cfriesen, Alexander Viro, linux-kernel

On Tue, Nov 05, 2002 at 03:55:57PM -0800, David Schwartz wrote:
> 
> On Tue, 29 Oct 2002 13:02:39 -0500, Chris Friesen wrote:
> >Oliver Xymoron wrote:
> 
> >I'm not an expert in this field, so bear with me if I make any blunders
> >obvious to one trained in information theory.
> 
> >>The current Linux PRNG is playing fast and loose here, adding entropy
> >>based on the resolution of the TSC, while the physical turbulence
> >>processes that actually produce entropy are happening at a scale of
> >>seconds. On a GHz processor, if it takes 4 microseconds to return a
> >>disk result from on-disk cache, /dev/random will get a 12-bit credit.
> 
> >In the paper the accuracy of measurement is 1ms.  Current hardware has
> >tsc precision of nanoseconds, or about 6 orders of magnitude more
> >accuracy.  Doesn't this mean that we can pump in many more bits into the
> >algorithm and get out many more than the 100bits/min that the setup in
> >the paper achieves?
> 
> 	In theory, if there's any real physical randomness in a timing source, the 
> more accurately you measure the timing, the more bits you get.

Not if the timing source is clocked at a substantially slower speed
than the measurement clock and is phase locked. Which is the case.

-- 
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 


end of thread, other threads:[~2002-11-06  0:20 UTC | newest]

2002-10-29 17:10 Entropy from disks Oliver Xymoron
2002-10-29 18:02 ` Chris Friesen
2002-10-29 19:08   ` Oliver Xymoron
2002-11-05 23:55   ` David Schwartz
2002-11-06  0:27 Oliver Xymoron
