* [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long

From: Jason A. Donenfeld @ 2016-12-14  3:10 UTC
To: Netdev, David Miller, Linus Torvalds, kernel-hardening@lists.openwall.com,
    LKML, George Spelvin, Scott Bauer, Andi Kleen, Andy Lutomirski, Greg KH,
    Eric Biggers, linux-crypto, Ted Tso
Cc: Jason A. Donenfeld, Jean-Philippe Aumasson

This duplicates the current algorithm for get_random_int/long, but uses
siphash24 instead. This comes with several benefits. It's certainly
faster and more cryptographically secure than MD5. This patch also
hashes the pid, entropy, and timestamp as fixed width fields, in order
to increase diffusion.

The previous md5 algorithm used a per-cpu md5 state, which caused
successive calls to the function to chain upon each other. While it's
not entirely clear that this kind of chaining is absolutely necessary
when using a secure PRF like siphash24, it can't hurt, and the timing of
the call chain does add a degree of natural entropy. So, in keeping with
this design, instead of the massive per-cpu 64-byte md5 state, there is
instead a per-cpu previously returned value for chaining.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
---
 drivers/char/random.c | 50 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 19 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..25f96f074da5 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -262,6 +262,7 @@
 #include <linux/syscalls.h>
 #include <linux/completion.h>
 #include <linux/uuid.h>
+#include <linux/siphash.h>
 #include <crypto/chacha20.h>
 
 #include <asm/processor.h>
@@ -2042,7 +2043,7 @@ struct ctl_table random_table[] = {
 };
 #endif  /* CONFIG_SYSCTL */
 
-static u32 random_int_secret[MD5_MESSAGE_BYTES / 4] ____cacheline_aligned;
+static u8 random_int_secret[SIPHASH24_KEY_LEN];
 
 int random_int_secret_init(void)
 {
@@ -2050,8 +2051,7 @@ int random_int_secret_init(void)
         return 0;
 }
 
-static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
-                __aligned(sizeof(unsigned long));
+static DEFINE_PER_CPU(u64, get_random_int_chaining);
 
 /*
  * Get a random word for internal kernel use only. Similar to urandom but
@@ -2061,19 +2061,25 @@ static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
  */
 unsigned int get_random_int(void)
 {
-        __u32 *hash;
+        uint64_t *chaining;
         unsigned int ret;
+        struct {
+                uint64_t chaining;
+                unsigned long ts;
+                unsigned long entropy;
+                pid_t pid;
+        } __packed combined;
 
         if (arch_get_random_int(&ret))
                 return ret;
 
-        hash = get_cpu_var(get_random_int_hash);
-
-        hash[0] += current->pid + jiffies + random_get_entropy();
-        md5_transform(hash, random_int_secret);
-        ret = hash[0];
-        put_cpu_var(get_random_int_hash);
-
+        chaining = &get_cpu_var(get_random_int_chaining);
+        combined.chaining = *chaining;
+        combined.ts = jiffies;
+        combined.entropy = random_get_entropy();
+        combined.pid = current->pid;
+        ret = *chaining = siphash24((u8 *)&combined, sizeof(combined), random_int_secret);
+        put_cpu_var(chaining);
         return ret;
 }
 EXPORT_SYMBOL(get_random_int);
@@ -2083,19 +2089,25 @@ EXPORT_SYMBOL(get_random_int);
  */
 unsigned long get_random_long(void)
 {
-        __u32 *hash;
+        uint64_t *chaining;
         unsigned long ret;
+        struct {
+                uint64_t chaining;
+                unsigned long ts;
+                unsigned long entropy;
+                pid_t pid;
+        } __packed combined;
 
         if (arch_get_random_long(&ret))
                 return ret;
 
-        hash = get_cpu_var(get_random_int_hash);
-
-        hash[0] += current->pid + jiffies + random_get_entropy();
-        md5_transform(hash, random_int_secret);
-        ret = *(unsigned long *)hash;
-        put_cpu_var(get_random_int_hash);
-
+        chaining = &get_cpu_var(get_random_int_chaining);
+        combined.chaining = *chaining;
+        combined.ts = jiffies;
+        combined.entropy = random_get_entropy();
+        combined.pid = current->pid;
+        ret = *chaining = siphash24((u8 *)&combined, sizeof(combined), random_int_secret);
+        put_cpu_var(chaining);
         return ret;
 }
 EXPORT_SYMBOL(get_random_long);
-- 
2.11.0
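For readers following along without the patched tree, the siphash24() called
above is short enough to transcribe in full. Below is a minimal userspace
sketch of SipHash-2-4, a straight transcription of the reference algorithm;
it assumes a little-endian host (loads done via memcpy) and is only an
illustration, not the kernel's linux/siphash.h implementation:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ROTL64(x, b) ((uint64_t)(((x) << (b)) | ((x) >> (64 - (b)))))

/* One SipHash round: nothing but 64-bit additions, rotations, and xors. */
#define SIPROUND do { \
        v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32); \
        v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2; \
        v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0; \
        v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32); \
} while (0)

/* SipHash-2-4: 2 rounds per message word, 4 finalization rounds. */
static uint64_t siphash24(const uint8_t *data, size_t len, const uint8_t key[16])
{
        uint64_t k0, k1, m, v0, v1, v2, v3;
        uint64_t b = (uint64_t)len << 56;       /* message length in the top byte */

        memcpy(&k0, key, 8);                    /* little-endian host assumed */
        memcpy(&k1, key + 8, 8);
        v0 = 0x736f6d6570736575ULL ^ k0;
        v1 = 0x646f72616e646f6dULL ^ k1;
        v2 = 0x6c7967656e657261ULL ^ k0;
        v3 = 0x7465646279746573ULL ^ k1;

        for (; len >= 8; len -= 8, data += 8) {
                memcpy(&m, data, 8);
                v3 ^= m;
                SIPROUND; SIPROUND;
                v0 ^= m;
        }
        m = 0;
        memcpy(&m, data, len);                  /* 0..7 trailing bytes */
        b |= m;
        v3 ^= b;
        SIPROUND; SIPROUND;
        v0 ^= b;
        v2 ^= 0xff;
        SIPROUND; SIPROUND; SIPROUND; SIPROUND;
        return v0 ^ v1 ^ v2 ^ v3;
}

int main(void)
{
        /* Test vector from the SipHash paper: key 00..0f, message 00..0e. */
        uint8_t key[16], msg[15];
        int i;

        for (i = 0; i < 16; i++)
                key[i] = i;
        for (i = 0; i < 15; i++)
                msg[i] = i;
        /* Expected output: a129ca6149be45e5 */
        printf("%016llx\n", (unsigned long long)siphash24(msg, sizeof(msg), key));
        return 0;
}

The paper's test vector is included as a sanity check; note how the whole
function is add-rotate-xor from start to finish, a point Ted picks up on
below.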
* Re: [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long

From: Theodore Ts'o @ 2016-12-14 16:37 UTC
To: Jason A. Donenfeld
Cc: Netdev, David Miller, Linus Torvalds, kernel-hardening@lists.openwall.com,
    LKML, George Spelvin, Scott Bauer, Andi Kleen, Andy Lutomirski, Greg KH,
    Eric Biggers, linux-crypto, Jean-Philippe Aumasson

On Wed, Dec 14, 2016 at 04:10:37AM +0100, Jason A. Donenfeld wrote:
> This duplicates the current algorithm for get_random_int/long, but uses
> siphash24 instead. This comes with several benefits. It's certainly
> faster and more cryptographically secure than MD5. This patch also
> hashes the pid, entropy, and timestamp as fixed width fields, in order
> to increase diffusion.
>
> The previous md5 algorithm used a per-cpu md5 state, which caused
> successive calls to the function to chain upon each other. While it's
> not entirely clear that this kind of chaining is absolutely necessary
> when using a secure PRF like siphash24, it can't hurt, and the timing of
> the call chain does add a degree of natural entropy. So, in keeping with
> this design, instead of the massive per-cpu 64-byte md5 state, there is
> instead a per-cpu previously returned value for chaining.
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>

The original reason for get_random_int was because the original
urandom algorithms were too slow. When we moved to chacha20, which is
much faster, I didn't think to revisit get_random_int() and
get_random_long().

One somewhat undesirable aspect of the current algorithm is that we
never change random_int_secret. So I've been toying with the
following, which is 4 times faster than md5. (I haven't tried
benchmarking against siphash yet.)

[    3.606139] random benchmark!!
[    3.606276] get_random_int # cycles: 326578
[    3.606317] get_random_int_new # cycles: 95438
[    3.607423] get_random_bytes # cycles: 2653388

						- Ted

P.S. It's interesting to note that siphash24 and chacha20 are both
add-rotate-xor based algorithms.
diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..be172ea75799 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1681,6 +1681,38 @@ static int rand_initialize(void)
 }
 early_initcall(rand_initialize);
 
+static unsigned int get_random_int_new(void);
+
+static int rand_benchmark(void)
+{
+        cycles_t start, finish;
+        int i, out;
+
+        pr_crit("random benchmark!!\n");
+        start = get_cycles();
+        for (i = 0; i < 1000; i++) {
+                get_random_int();
+        }
+        finish = get_cycles();
+        pr_err("get_random_int # cycles: %llu\n", finish - start);
+
+        start = get_cycles();
+        for (i = 0; i < 1000; i++) {
+                get_random_int_new();
+        }
+        finish = get_cycles();
+        pr_err("get_random_int_new # cycles: %llu\n", finish - start);
+
+        start = get_cycles();
+        for (i = 0; i < 1000; i++) {
+                get_random_bytes(&out, sizeof(out));
+        }
+        finish = get_cycles();
+        pr_err("get_random_bytes # cycles: %llu\n", finish - start);
+        return 0;
+}
+device_initcall(rand_benchmark);
+
 #ifdef CONFIG_BLOCK
 void rand_initialize_disk(struct gendisk *disk)
 {
@@ -2064,8 +2096,10 @@ unsigned int get_random_int(void)
         __u32 *hash;
         unsigned int ret;
 
+#if 0        /* force slow path */
         if (arch_get_random_int(&ret))
                 return ret;
+#endif
 
         hash = get_cpu_var(get_random_int_hash);
 
@@ -2100,6 +2134,38 @@ unsigned long get_random_long(void)
 }
 EXPORT_SYMBOL(get_random_long);
 
+struct random_buf {
+        __u8 buf[CHACHA20_BLOCK_SIZE];
+        int ptr;
+};
+
+static DEFINE_PER_CPU(struct random_buf, batched_entropy);
+
+static void get_batched_entropy(void *buf, int n)
+{
+        struct random_buf *p;
+
+        p = &get_cpu_var(batched_entropy);
+
+        if ((p->ptr == 0) ||
+            (p->ptr + n >= CHACHA20_BLOCK_SIZE)) {
+                extract_crng(p->buf);
+                p->ptr = 0;
+        }
+        BUG_ON(n > CHACHA20_BLOCK_SIZE);
+        memcpy(buf, p->buf + p->ptr, n);        /* read at the current offset, not the buffer start */
+        p->ptr += n;
+        put_cpu_var(batched_entropy);
+}
+
+static unsigned int get_random_int_new(void)
+{
+        int ret;
+
+        get_batched_entropy(&ret, sizeof(ret));
+        return ret;
+}
+
 /**
  * randomize_page - Generate a random, page aligned address
  * @start:        The smallest acceptable address the caller will take.
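The numbers above come from the get_cycles() loops in rand_benchmark(). The
same measurement pattern translates directly to userspace; the sketch below
uses clock_gettime(CLOCK_MONOTONIC) as a stand-in for get_cycles(), with a
placeholder xorshift generator where the function under test would go:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/* Placeholder generator; swap in whatever you actually want to measure. */
static unsigned int generator(void)
{
        static unsigned int x = 1;

        x ^= x << 13; x ^= x >> 17; x ^= x << 5;        /* xorshift32 */
        return x;
}

int main(void)
{
        enum { ITERS = 1000 };          /* same iteration count as rand_benchmark() */
        volatile unsigned int sink;     /* keep the calls from being optimized away */
        uint64_t start, finish;
        int i;

        start = now_ns();
        for (i = 0; i < ITERS; i++)
                sink = generator();
        finish = now_ns();
        (void)sink;
        printf("generator # ns for %d calls: %llu\n", ITERS,
               (unsigned long long)(finish - start));
        return 0;
}

Wall-clock nanoseconds are not cycle counts, so numbers from this harness are
only comparable with each other, not with the dmesg figures in this thread.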
* Re: [kernel-hardening] Re: [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long

From: Jason A. Donenfeld @ 2016-12-14 17:58 UTC
To: kernel-hardening, Theodore Ts'o, Jason A. Donenfeld, Netdev, David Miller,
    Linus Torvalds, LKML, George Spelvin, Scott Bauer, Andi Kleen,
    Andy Lutomirski, Greg KH, Eric Biggers, Linux Crypto Mailing List,
    Jean-Philippe Aumasson

Hey Ted,

On Wed, Dec 14, 2016 at 5:37 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> One somewhat undesirable aspect of the current algorithm is that we
> never change random_int_secret.

Why exactly would this be a problem? So long as the secret is kept
secret, the PRF is secure. If an attacker can read arbitrary kernel
memory, there are much, much bigger issues to be concerned about. As
well, the "chaining" variable I introduce ensures that the random
numbers are, per-cpu, related to the uniqueness of timing of
subsequent calls.

> So I've been toying with the
> following, which is 4 times faster than md5. (I haven't tried
> benchmarking against siphash yet.)
>
> [    3.606139] random benchmark!!
> [    3.606276] get_random_int # cycles: 326578
> [    3.606317] get_random_int_new # cycles: 95438
> [    3.607423] get_random_bytes # cycles: 2653388

Cool, I'll benchmark it against the siphash implementation. I like
what you did with batching up lots of chacha output and doling it out
bit by bit. I suspect this will be quite fast, because with chacha20
you get an entire block.

> P.S. It's interesting to note that siphash24 and chacha20 are both
> add-rotate-xor based algorithms.

Quite! Lots of nice shiny things are turning out to be ARX -- ChaCha,
BLAKE2, SipHash, NORX. The simplicity is really appealing.

Jason
* Re: Re: [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long

From: Jason A. Donenfeld @ 2016-12-14 19:12 UTC
To: kernel-hardening, Theodore Ts'o, Jason A. Donenfeld, Netdev, David Miller,
    Linus Torvalds, LKML, George Spelvin, Scott Bauer, Andi Kleen,
    Andy Lutomirski, Greg KH, Eric Biggers, Linux Crypto Mailing List,
    Jean-Philippe Aumasson

Hi again,

On Wed, Dec 14, 2016 at 5:37 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> [    3.606139] random benchmark!!
> [    3.606276] get_random_int # cycles: 326578
> [    3.606317] get_random_int_new # cycles: 95438
> [    3.607423] get_random_bytes # cycles: 2653388

Looks to me like my siphash implementation is much faster for
get_random_long, and more or less tied for get_random_int:

[    1.729370] random benchmark!!
[    1.729710] get_random_long # cycles: 349771
[    1.730128] get_random_long_chacha # cycles: 359660
[    1.730457] get_random_long_siphash # cycles: 94255
[    1.731307] get_random_bytes # cycles: 1354894
[    1.731707] get_random_int # cycles: 305640
[    1.732095] get_random_int_chacha # cycles: 80726
[    1.732425] get_random_int_siphash # cycles: 94265
[    1.733278] get_random_bytes # cycles: 1315873

Given the increasing usage of get_random_long for ASLR and related, I
think this makes the siphash approach worth pursuing. The chacha
approach is also not significantly different from the md5 approach in
terms of speed for get_random_long. Additionally, since siphash is a
PRF, I think this opens up a big window for optimizing it even
further.

Benchmark here:
https://git.zx2c4.com/linux-dev/commit/?h=rng-bench

Jason
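One concrete shape that "optimizing it even further" can take: because these
callers always hash the same fixed-width fields, the length loop and tail
handling of the general siphash24() can be dropped and the rounds fully
unrolled. The following is a sketch of such a fixed-input specialization,
reusing the ROTL64/SIPROUND macros from the earlier sketch; the three-word
signature is illustrative only, not the kernel's eventual interface:

#include <stdint.h>

#define ROTL64(x, b) ((uint64_t)(((x) << (b)) | ((x) >> (64 - (b)))))
#define SIPROUND do { \
        v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32); \
        v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2; \
        v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0; \
        v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32); \
} while (0)

/*
 * SipHash-2-4 over exactly three u64 words: no loop, no tail handling,
 * and the length byte folded in as a compile-time constant.
 */
static uint64_t siphash_3u64(uint64_t a, uint64_t b, uint64_t c,
                             const uint64_t key[2])
{
        uint64_t v0 = 0x736f6d6570736575ULL ^ key[0];
        uint64_t v1 = 0x646f72616e646f6dULL ^ key[1];
        uint64_t v2 = 0x6c7967656e657261ULL ^ key[0];
        uint64_t v3 = 0x7465646279746573ULL ^ key[1];
        const uint64_t last = 24ULL << 56;      /* fixed 24-byte message length */

        v3 ^= a; SIPROUND; SIPROUND; v0 ^= a;
        v3 ^= b; SIPROUND; SIPROUND; v0 ^= b;
        v3 ^= c; SIPROUND; SIPROUND; v0 ^= c;
        v3 ^= last; SIPROUND; SIPROUND; v0 ^= last;
        v2 ^= 0xff;
        SIPROUND; SIPROUND; SIPROUND; SIPROUND;
        return v0 ^ v1 ^ v2 ^ v3;
}

For a 24-byte little-endian input this computes exactly the same value as
the general siphash24() above, so a caller hashing fixed-width fields can
switch to it without changing outputs.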
* Re: Re: [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long

From: Jason A. Donenfeld @ 2016-12-15  1:19 UTC
To: kernel-hardening, Theodore Ts'o, Jason A. Donenfeld, Netdev, David Miller,
    Linus Torvalds, LKML, George Spelvin, Scott Bauer, Andi Kleen,
    Andy Lutomirski, Greg KH, Eric Biggers, Linux Crypto Mailing List,
    Jean-Philippe Aumasson

Hey Ted,

On Wed, Dec 14, 2016 at 8:12 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> I think this opens up a big window for optimizing it even
> further.

I optimized it a bit further and siphash is now the clear winner over
chacha:

[    1.784801] random benchmark!!
[    1.785161] get_random_long # cycles: 415983
[    1.785595] get_random_long_chacha # cycles: 242047
[    1.785997] get_random_long_siphash # cycles: 137130
[    1.787450] get_random_bytes # cycles: 1452985
[    1.787947] get_random_int # cycles: 343323
[    1.788282] get_random_int_chacha # cycles: 170767
[    1.788656] get_random_int_siphash # cycles: 86384
[    1.789764] get_random_bytes # cycles: 2279519

And even still, there is more that could be optimized. Therefore, I'll
continue to keep this patch in the series and will CC you on the next
patch set that goes out.

Jason