All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH,RFC] random: collect cpu randomness
@ 2014-02-02 20:36 Jörn Engel
  2014-02-02 21:25 ` Stephan Mueller
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Jörn Engel @ 2014-02-02 20:36 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: H. Peter Anvin, Linux Kernel Developers List, macro, ralf,
	dave.taht, blogic, andrewmcgr, smueller, geert, tg

Collects entropy from random behaviour all modern cpus exhibit.  The
scheduler and slab allocator are instrumented for this purpose.  How
much randomness can be gathered is clearly hardware-dependent and hard
to estimate.  Therefore the entropy estimate is zero, but random bits
still get mixed into the pools.

Performance overhead seems low.  I did 20 kernel compiles with
allnoconfig, everything precompiled and warm caches.  Difference was in
the noise and on average performance was actually better when collecting
randomness.

To judge the benefits I ran this in kvm with instrumentation to print
out the various input values.  Entropy was estimated by booting ten
times, collecting the output for each boot, doing a diff between any two
and counting the diff hunks.  Numbers like "46 .. 215" means the two
most similar runs had 46 hunks, the two most dissimilar ones had 215.

Running kvm with -smp 4 gave the following results:
                 slab+sched     slab            sched
input[0]         46 .. 215      202 .. 342        0 ..   0
input[1]         76 .. 180        0 ..   4        0 ..   0
input[2]        147 .. 388        4 ..  44      286 .. 464
input[3]         56 .. 185        0 ..   2        9 ..  40
jiffies           8 ..  67       10 ..  28       19 ..  50
caller          114 .. 306       25 .. 178      219 .. 422
val              54 .. 254       15 .. 106      175 .. 321
&input           50 .. 246        0 ..  59      178 .. 387

First column collected entropy from both the slab allocator and the
scheduler, second and third only collected from one source.  I only used
the first 512 values per cpu.  Therefore the combined numbers can be
lower than the individual ones - there was more entropy collected, it
was just swamped by non-entropy.

Lots of entropy gets collected and about half of it stems from the
uninitialized valued of input[] on the stack.

Rerunning only the first column with -smp 1 was more sobering:
                slab+sched
input[0]          0 ..  13
input[1]          0 ..  13
input[2]          0 ..  14
input[3]          0 ..  11
jiffies           2 ..  22
caller            0 ..  19
val               0 ..  16
&input            0 ..   4

Every single input value contains entropy in some runs, but the only one
that was an entropy guarantee was jiffies.  In other words, a
low-precision clock.  This clearly shows how important it is to have a
high-precision clock, which would gather significantly more entropy.

Measuring the randomness from random_get_entropy() with above approach
failed because there was so much randomness.  All numbers in all runs
were different.  Taking the delta between the numbers, again almost all
numbers were different with at most 1 identical delta per 1000.
Compared to a high-precision clock, no other input comes within two
orders of magnitude.

An interesting aside is that the cpu actually functions as a
high-precision clock.  This partially explains why the -smp 4 number are
so much better - any drift between the four cpu-clocks gets turned into
randomness.  The other explanation is that kvm is actually collecting
randomness from the host system.  I tried to minimize that effect by not
touching my notebook while running these tests, but a hardware test box
would clearly yield more meaningful results.

I invite others to also test whether the patch collects useful
randomness or causes a measureable performance loss on any hardware or
workload.  Preferrably not using my methods, in order to avoid
systematic errors.

Signed-off-by: Joern Engel <joern@logfs.org>
---
 drivers/char/random.c  | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/random.h |  2 ++
 kernel/sched/core.c    |  1 +
 mm/slab.c              |  1 +
 4 files changed, 65 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 429b75bb60e8..693dea730a3e 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -587,6 +587,30 @@ static void fast_mix(struct fast_pool *f, __u32 input[4])
 }
 
 /*
+ * An even faster variant of fast_mix, albeit with worse mixing.
+ */
+static void weak_mix(struct fast_pool *f, __u32 input[4])
+{
+	__u32 acc, carry = f->pool[3] >> 25;
+
+	acc = (f->pool[0] << 7) ^ input[0] ^ carry;
+	carry = f->pool[0] >> 25;
+	f->pool[0] = acc;
+
+	acc = (f->pool[1] << 7) ^ input[1] ^ carry;
+	carry = f->pool[1] >> 25;
+	f->pool[1] = acc;
+
+	acc = (f->pool[2] << 7) ^ input[2] ^ carry;
+	carry = f->pool[2] >> 25;
+	f->pool[2] = acc;
+
+	acc = (f->pool[3] << 7) ^ input[3] ^ carry;
+	//carry = f->pool[3] >> 25;
+	f->pool[3] = acc;
+}
+
+/*
  * Credit (or debit) the entropy store with n bits of entropy.
  * Use credit_entropy_bits_safe() if the value comes from userspace
  * or otherwise should be checked for extreme values.
@@ -833,6 +857,43 @@ void add_input_randomness(unsigned int type, unsigned int code,
 }
 EXPORT_SYMBOL_GPL(add_input_randomness);
 
+static DEFINE_PER_CPU(struct fast_pool, cpu_randomness);
+
+void __add_cpu_randomness(void *caller, void *val)
+{
+	struct entropy_store	*r;
+	struct fast_pool	*fast_pool = &__get_cpu_var(cpu_randomness);
+	unsigned long		now = jiffies;
+	__u32			input[4], cycles = random_get_entropy();
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wuninitialized"
+	input[0] ^= cycles ^ jiffies;
+	input[1] ^= (unsigned long)caller;
+	input[2] ^= (unsigned long)val;
+	input[3] ^= (unsigned long)&input;
+#pragma GCC diagnostic pop
+
+	weak_mix(fast_pool, input);
+
+	if ((fast_pool->count++ & 1023) &&
+	    !time_after(now, fast_pool->last + HZ))
+		return;
+
+	fast_pool->last = now;
+	fast_pool->count = 1;
+
+	r = nonblocking_pool.initialized ? &input_pool : &nonblocking_pool;
+	__mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool), NULL);
+}
+
+void add_cpu_randomness(void *caller, void *val)
+{
+	preempt_disable();
+	__add_cpu_randomness(caller, val);
+	preempt_enable();
+}
+
 static DEFINE_PER_CPU(struct fast_pool, irq_randomness);
 
 void add_interrupt_randomness(int irq, int irq_flags)
diff --git a/include/linux/random.h b/include/linux/random.h
index 4002b3df4c85..ce0ccdcd1d63 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -12,6 +12,8 @@
 extern void add_device_randomness(const void *, unsigned int);
 extern void add_input_randomness(unsigned int type, unsigned int code,
 				 unsigned int value);
+extern void __add_cpu_randomness(void *caller, void *val);
+extern void add_cpu_randomness(void *caller, void *val);
 extern void add_interrupt_randomness(int irq, int irq_flags);
 
 extern void get_random_bytes(void *buf, int nbytes);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a88f4a485c5e..7af6389f9b9e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2511,6 +2511,7 @@ need_resched:
 	rq = cpu_rq(cpu);
 	rcu_note_context_switch(cpu);
 	prev = rq->curr;
+	__add_cpu_randomness(__builtin_return_address(1), prev);
 
 	schedule_debug(prev);
 
diff --git a/mm/slab.c b/mm/slab.c
index eb043bf05f4c..ea5a30d44ad1 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3587,6 +3587,7 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
 	trace_kmalloc(caller, ret,
 		      size, cachep->size, flags);
 
+	add_cpu_randomness(__builtin_return_address(2), ret);
 	return ret;
 }
 
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread
* [PATCH] random: mix all saved registers into entropy pool
@ 2014-05-19 21:17 Jörn Engel
  2014-05-19 21:23 ` Jörn Engel
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Jörn Engel @ 2014-05-19 21:17 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: H. Peter Anvin, lkml

The single biggest entropy source is a high-resolution timer running
asynchronous to the triggering event.  That leaves systems without a
useful get_cycles() implementation with significantly less entropy.
Significant means orders of magnitude, not a few percent.

An alternate high-resolution timer is the register content at the time
of an interrupt.  While not monotonic, it is guaranteed to change every
few clock cycles and very unlikely to repeat the same pattern.  Not
useful for interpretation as time, but we only care about some bits
of the "clock" flipping in an unpredictable fashion.

Experimentation show this to be an excellent entropy source.  Doing 1000
boottests with kvm and dumping a hash of the registers for the first
1024 interrupts each, >40% of all hashes were unique and >80% of all
hashes occurred less than once 1% of the time.

Even assuming that only unique register hashes yield any entropy at all
and only one bit each, we would gather >400 bits of entropy early in
boot and another 40 bits/s from the timer interrupt during runtime.  I
would feel confident with this amount of entropy in the absence of any
other source.

Repeating the test on real hardware, albeit with fewer repetitions
yields roughly the same results, slightly above 40% unique hashes.

Signed-off-by: Joern Engel <joern@logfs.org>
---
 drivers/char/random.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 62 insertions(+), 4 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 429b75bb60e8..b4a8968fe28d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -553,6 +553,8 @@ static void mix_pool_bytes(struct entropy_store *r, const void *in,
 
 struct fast_pool {
 	__u32		pool[4];
+	u32		last_shift;
+	u32		regs_count;
 	unsigned long	last;
 	unsigned short	count;
 	unsigned char	rotate;
@@ -835,6 +837,61 @@ EXPORT_SYMBOL_GPL(add_input_randomness);
 
 static DEFINE_PER_CPU(struct fast_pool, irq_randomness);
 
+/*
+ * Ratelimit to a steady state of about once per jiffy.  A naïve approach
+ * would be to return 1 every time jiffies changes.  But we want to avoid
+ * being closely coupled to the timer interrupt.  So instead we increment
+ * a counter on every call and shift it right every time jiffies changes.
+ * If the counter is a power of two, return false;
+ *
+ * Effect is that some time after a jiffies change and cutting the counter
+ * in half we reach another power of two and return false.  But the
+ * likelihood of this happening is about the same at any time within a
+ * jiffies interval.
+ */
+static inline int ratelimited(struct fast_pool *p)
+{
+	int ret = !(p->regs_count == 0 || is_power_of_2(p->regs_count));
+
+	p->regs_count++;
+	if (p->last_shift != (u32)jiffies) {
+		p->regs_count >>= min(31u, (u32)jiffies - p->last_shift);
+		p->last_shift = (u32)jiffies;
+	}
+	return ret;
+}
+
+#define BOOT_IRQS 1024
+
+/*
+ * The single biggest entropy source is a high-resolution timer running
+ * asynchronous to the triggering event.  That leaves systems without a
+ * useful get_cycles() implementation with significantly less entropy.
+ * Significant means orders of magnitude, not a few percent.
+ *
+ * An alternate high-resolution timer is the register content at the time
+ * of an interrupt.  While not monotonic, it is guaranteed to change every
+ * few clock cycles and very unlikely to repeat the same pattern.  Not
+ * useful for interpretation as time, but we only care about some bits
+ * of the "clock" flipping in an unpredictable fashion.
+ */
+static void mix_regs(struct pt_regs *regs, struct fast_pool *fast_pool)
+{
+	struct entropy_store	*r;
+	/* Two variables avoid decrementing by two without using atomics */
+	static int boot_count = BOOT_IRQS;
+	int in_boot = boot_count;
+
+	if (in_boot) {
+		boot_count = in_boot - 1;
+	} else if (ratelimited(fast_pool))
+		return;
+
+	/* During bootup we alternately feed both pools */
+	r = (in_boot & 1) ? &nonblocking_pool : &input_pool;
+	__mix_pool_bytes(r, regs, sizeof(*regs), NULL);
+}
+
 void add_interrupt_randomness(int irq, int irq_flags)
 {
 	struct entropy_store	*r;
@@ -845,13 +902,14 @@ void add_interrupt_randomness(int irq, int irq_flags)
 	__u32			input[4], c_high, j_high;
 	__u64			ip;
 
+	mix_regs(regs, fast_pool);
 	c_high = (sizeof(cycles) > 4) ? cycles >> 32 : 0;
 	j_high = (sizeof(now) > 4) ? now >> 32 : 0;
-	input[0] = cycles ^ j_high ^ irq;
-	input[1] = now ^ c_high;
+	input[0] ^= cycles ^ j_high ^ irq;
+	input[1] ^= now ^ c_high;
 	ip = regs ? instruction_pointer(regs) : _RET_IP_;
-	input[2] = ip;
-	input[3] = ip >> 32;
+	input[2] ^= ip;
+	input[3] ^= ip >> 32;
 
 	fast_mix(fast_pool, input);
 
-- 
2.0.0.rc0.1.g7b2ba98


^ permalink raw reply related	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2014-06-12 20:27 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-02 20:36 [PATCH,RFC] random: collect cpu randomness Jörn Engel
2014-02-02 21:25 ` Stephan Mueller
2014-02-03  1:24   ` Jörn Engel
2014-02-03  1:28     ` H. Peter Anvin
2014-02-03 13:36     ` Stephan Mueller
2014-02-03  1:39   ` Theodore Ts'o
2014-02-03  3:35     ` Jörn Engel
2014-02-03 12:54     ` Thorsten Glaser
2014-02-03 13:06     ` Stephan Mueller
2014-02-03 15:50 ` Jörn Engel
2014-02-03 16:37   ` Theodore Ts'o
2014-02-03 18:48     ` Jörn Engel
2014-03-23 18:00       ` [PATCH] random: mix all saved registers into entropy pool Jörn Engel
2014-02-03 21:54     ` [PATCH,RFC] random: collect cpu randomness Maciej W. Rozycki
2014-02-03 22:44       ` Theodore Ts'o
2014-02-06 22:20 ` Kees Cook
2014-02-06 22:21   ` Dave Taht
2014-02-07  7:44   ` Jörn Engel
2014-02-20  9:50 ` Paolo Bonzini
  -- strict thread matches above, loose matches on Subject: below --
2014-05-19 21:17 [PATCH] random: mix all saved registers into entropy pool Jörn Engel
2014-05-19 21:23 ` Jörn Engel
2014-05-19 22:18   ` H. Peter Anvin
2014-05-19 22:39     ` Jörn Engel
2014-05-19 23:05       ` H. Peter Anvin
2014-05-19 23:18         ` Jörn Engel
2014-05-20 12:12 ` Andi Kleen
2014-05-20 20:08   ` Jörn Engel
2014-05-21 19:39     ` Andi Kleen
2014-05-21 20:29       ` Jörn Engel
2014-05-21 20:38       ` Jörn Engel
2014-06-04 23:17 ` Jörn Engel
2014-06-10 16:14 ` Theodore Ts'o
2014-06-11  0:10   ` Jörn Engel
2014-06-11 15:27     ` Theodore Ts'o
2014-06-12 20:25       ` Jörn Engel
2014-06-12 20:05     ` Jörn Engel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.