* [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long

From: Jason A. Donenfeld @ 2016-12-14  3:10 UTC
To: Netdev, David Miller, Linus Torvalds, kernel-hardening@lists.openwall.com,
    LKML, George Spelvin, Scott Bauer, Andi Kleen, Andy Lutomirski, Greg KH,
    Eric Biggers, linux-crypto, Ted Tso
Cc: Jason A. Donenfeld, Jean-Philippe Aumasson

This duplicates the current algorithm for get_random_int/long, but uses
siphash24 instead. This comes with several benefits. It's certainly
faster and more cryptographically secure than MD5. This patch also
hashes the pid, entropy, and timestamp as fixed width fields, in order
to increase diffusion.

The previous md5 algorithm used a per-cpu md5 state, which caused
successive calls to the function to chain upon each other. While it's
not entirely clear that this kind of chaining is absolutely necessary
when using a secure PRF like siphash24, it can't hurt, and the timing of
the call chain does add a degree of natural entropy. So, in keeping with
this design, instead of the massive per-cpu 64-byte md5 state, there is
instead a per-cpu previously returned value for chaining.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
---
 drivers/char/random.c | 50 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 19 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..25f96f074da5 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -262,6 +262,7 @@
 #include <linux/syscalls.h>
 #include <linux/completion.h>
 #include <linux/uuid.h>
+#include <linux/siphash.h>
 #include <crypto/chacha20.h>
 
 #include <asm/processor.h>
@@ -2042,7 +2043,7 @@ struct ctl_table random_table[] = {
 };
 #endif  /* CONFIG_SYSCTL */
 
-static u32 random_int_secret[MD5_MESSAGE_BYTES / 4] ____cacheline_aligned;
+static u8 random_int_secret[SIPHASH24_KEY_LEN];
 
 int random_int_secret_init(void)
 {
@@ -2050,8 +2051,7 @@ int random_int_secret_init(void)
         return 0;
 }
 
-static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
-                __aligned(sizeof(unsigned long));
+static DEFINE_PER_CPU(u64, get_random_int_chaining);
 
 /*
  * Get a random word for internal kernel use only. Similar to urandom but
@@ -2061,19 +2061,25 @@ static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
  */
 unsigned int get_random_int(void)
 {
-        __u32 *hash;
+        uint64_t *chaining;
         unsigned int ret;
+        struct {
+                uint64_t chaining;
+                unsigned long ts;
+                unsigned long entropy;
+                pid_t pid;
+        } __packed combined;
 
         if (arch_get_random_int(&ret))
                 return ret;
 
-        hash = get_cpu_var(get_random_int_hash);
-
-        hash[0] += current->pid + jiffies + random_get_entropy();
-        md5_transform(hash, random_int_secret);
-        ret = hash[0];
-        put_cpu_var(get_random_int_hash);
-
+        chaining = &get_cpu_var(get_random_int_chaining);
+        combined.chaining = *chaining;
+        combined.ts = jiffies;
+        combined.entropy = random_get_entropy();
+        combined.pid = current->pid;
+        ret = *chaining = siphash24((u8 *)&combined, sizeof(combined), random_int_secret);
+        put_cpu_var(chaining);
         return ret;
 }
 EXPORT_SYMBOL(get_random_int);
@@ -2083,19 +2089,25 @@ EXPORT_SYMBOL(get_random_int);
  */
 unsigned long get_random_long(void)
 {
-        __u32 *hash;
+        uint64_t *chaining;
         unsigned long ret;
+        struct {
+                uint64_t chaining;
+                unsigned long ts;
+                unsigned long entropy;
+                pid_t pid;
+        } __packed combined;
 
         if (arch_get_random_long(&ret))
                 return ret;
 
-        hash = get_cpu_var(get_random_int_hash);
-
-        hash[0] += current->pid + jiffies + random_get_entropy();
-        md5_transform(hash, random_int_secret);
-        ret = *(unsigned long *)hash;
-        put_cpu_var(get_random_int_hash);
-
+        chaining = &get_cpu_var(get_random_int_chaining);
+        combined.chaining = *chaining;
+        combined.ts = jiffies;
+        combined.entropy = random_get_entropy();
+        combined.pid = current->pid;
+        ret = *chaining = siphash24((u8 *)&combined, sizeof(combined), random_int_secret);
+        put_cpu_var(chaining);
         return ret;
 }
 EXPORT_SYMBOL(get_random_long);
-- 
2.11.0
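For readers following along without the patched tree, the siphash24() called
above is short enough to transcribe in full. Below is a minimal userspace
sketch of SipHash-2-4, a straight transcription of the reference algorithm;
it assumes a little-endian host (loads done via memcpy) and is only an
illustration, not the kernel's linux/siphash.h implementation:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ROTL64(x, b) ((uint64_t)(((x) << (b)) | ((x) >> (64 - (b)))))

/* One SipHash round: nothing but 64-bit additions, rotations, and xors. */
#define SIPROUND do { \
        v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32); \
        v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2; \
        v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0; \
        v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32); \
} while (0)

/* SipHash-2-4: 2 rounds per message word, 4 finalization rounds. */
static uint64_t siphash24(const uint8_t *data, size_t len, const uint8_t key[16])
{
        uint64_t k0, k1, m, v0, v1, v2, v3;
        uint64_t b = (uint64_t)len << 56;       /* message length in the top byte */

        memcpy(&k0, key, 8);                    /* little-endian host assumed */
        memcpy(&k1, key + 8, 8);
        v0 = 0x736f6d6570736575ULL ^ k0;
        v1 = 0x646f72616e646f6dULL ^ k1;
        v2 = 0x6c7967656e657261ULL ^ k0;
        v3 = 0x7465646279746573ULL ^ k1;

        for (; len >= 8; len -= 8, data += 8) {
                memcpy(&m, data, 8);
                v3 ^= m;
                SIPROUND; SIPROUND;
                v0 ^= m;
        }
        m = 0;
        memcpy(&m, data, len);                  /* 0..7 trailing bytes */
        b |= m;
        v3 ^= b;
        SIPROUND; SIPROUND;
        v0 ^= b;
        v2 ^= 0xff;
        SIPROUND; SIPROUND; SIPROUND; SIPROUND;
        return v0 ^ v1 ^ v2 ^ v3;
}

int main(void)
{
        /* Test vector from the SipHash paper: key 00..0f, message 00..0e. */
        uint8_t key[16], msg[15];
        int i;

        for (i = 0; i < 16; i++)
                key[i] = i;
        for (i = 0; i < 15; i++)
                msg[i] = i;
        /* Expected output: a129ca6149be45e5 */
        printf("%016llx\n", (unsigned long long)siphash24(msg, sizeof(msg), key));
        return 0;
}

The paper's test vector is included as a sanity check; note how the whole
function is add-rotate-xor from start to finish, a point Ted picks up on
below.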
* Re: [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long

From: Theodore Ts'o @ 2016-12-14 16:37 UTC
To: Jason A. Donenfeld
Cc: Netdev, David Miller, Linus Torvalds, kernel-hardening@lists.openwall.com,
    LKML, George Spelvin, Scott Bauer, Andi Kleen, Andy Lutomirski, Greg KH,
    Eric Biggers, linux-crypto, Jean-Philippe Aumasson

On Wed, Dec 14, 2016 at 04:10:37AM +0100, Jason A. Donenfeld wrote:
> This duplicates the current algorithm for get_random_int/long, but uses
> siphash24 instead. This comes with several benefits. It's certainly
> faster and more cryptographically secure than MD5. This patch also
> hashes the pid, entropy, and timestamp as fixed width fields, in order
> to increase diffusion.
>
> The previous md5 algorithm used a per-cpu md5 state, which caused
> successive calls to the function to chain upon each other. While it's
> not entirely clear that this kind of chaining is absolutely necessary
> when using a secure PRF like siphash24, it can't hurt, and the timing of
> the call chain does add a degree of natural entropy. So, in keeping with
> this design, instead of the massive per-cpu 64-byte md5 state, there is
> instead a per-cpu previously returned value for chaining.
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>

The original reason for get_random_int was because the original
urandom algorithms were too slow. When we moved to chacha20, which is
much faster, I didn't think to revisit get_random_int() and
get_random_long().

One somewhat undesirable aspect of the current algorithm is that we
never change random_int_secret. So I've been toying with the
following, which is 4 times faster than md5. (I haven't tried
benchmarking against siphash yet.)

[    3.606139] random benchmark!!
[    3.606276] get_random_int # cycles: 326578
[    3.606317] get_random_int_new # cycles: 95438
[    3.607423] get_random_bytes # cycles: 2653388

						- Ted

P.S. It's interesting to note that siphash24 and chacha20 are both
add-rotate-xor based algorithms.
diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..be172ea75799 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1681,6 +1681,38 @@ static int rand_initialize(void)
 }
 early_initcall(rand_initialize);
 
+static unsigned int get_random_int_new(void);
+
+static int rand_benchmark(void)
+{
+        cycles_t start, finish;
+        int i, out;
+
+        pr_crit("random benchmark!!\n");
+        start = get_cycles();
+        for (i = 0; i < 1000; i++) {
+                get_random_int();
+        }
+        finish = get_cycles();
+        pr_err("get_random_int # cycles: %llu\n", finish - start);
+
+        start = get_cycles();
+        for (i = 0; i < 1000; i++) {
+                get_random_int_new();
+        }
+        finish = get_cycles();
+        pr_err("get_random_int_new # cycles: %llu\n", finish - start);
+
+        start = get_cycles();
+        for (i = 0; i < 1000; i++) {
+                get_random_bytes(&out, sizeof(out));
+        }
+        finish = get_cycles();
+        pr_err("get_random_bytes # cycles: %llu\n", finish - start);
+        return 0;
+}
+device_initcall(rand_benchmark);
+
 #ifdef CONFIG_BLOCK
 void rand_initialize_disk(struct gendisk *disk)
 {
@@ -2064,8 +2096,10 @@ unsigned int get_random_int(void)
         __u32 *hash;
         unsigned int ret;
 
+#if 0        /* force slow path */
         if (arch_get_random_int(&ret))
                 return ret;
+#endif
 
         hash = get_cpu_var(get_random_int_hash);
 
@@ -2100,6 +2134,38 @@ unsigned long get_random_long(void)
 }
 EXPORT_SYMBOL(get_random_long);
 
+struct random_buf {
+        __u8 buf[CHACHA20_BLOCK_SIZE];
+        int ptr;
+};
+
+static DEFINE_PER_CPU(struct random_buf, batched_entropy);
+
+static void get_batched_entropy(void *buf, int n)
+{
+        struct random_buf *p;
+
+        p = &get_cpu_var(batched_entropy);
+
+        if ((p->ptr == 0) ||
+            (p->ptr + n >= CHACHA20_BLOCK_SIZE)) {
+                extract_crng(p->buf);
+                p->ptr = 0;
+        }
+        BUG_ON(n > CHACHA20_BLOCK_SIZE);
+        memcpy(buf, p->buf + p->ptr, n);        /* read at the current offset, not the buffer start */
+        p->ptr += n;
+        put_cpu_var(batched_entropy);
+}
+
+static unsigned int get_random_int_new(void)
+{
+        int ret;
+
+        get_batched_entropy(&ret, sizeof(ret));
+        return ret;
+}
+
 /**
  * randomize_page - Generate a random, page aligned address
  * @start:        The smallest acceptable address the caller will take.
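The numbers above come from the get_cycles() loops in rand_benchmark(). The
same measurement pattern translates directly to userspace; the sketch below
uses clock_gettime(CLOCK_MONOTONIC) as a stand-in for get_cycles(), with a
placeholder xorshift generator where the function under test would go:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/* Placeholder generator; swap in whatever you actually want to measure. */
static unsigned int generator(void)
{
        static unsigned int x = 1;

        x ^= x << 13; x ^= x >> 17; x ^= x << 5;        /* xorshift32 */
        return x;
}

int main(void)
{
        enum { ITERS = 1000 };          /* same iteration count as rand_benchmark() */
        volatile unsigned int sink;     /* keep the calls from being optimized away */
        uint64_t start, finish;
        int i;

        start = now_ns();
        for (i = 0; i < ITERS; i++)
                sink = generator();
        finish = now_ns();
        (void)sink;
        printf("generator # ns for %d calls: %llu\n", ITERS,
               (unsigned long long)(finish - start));
        return 0;
}

Wall-clock nanoseconds are not cycle counts, so numbers from this harness are
only comparable with each other, not with the dmesg figures in this thread.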
* Re: [kernel-hardening] Re: [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long

From: Jason A. Donenfeld @ 2016-12-14 17:58 UTC
To: kernel-hardening, Theodore Ts'o, Jason A. Donenfeld, Netdev, David Miller,
    Linus Torvalds, LKML, George Spelvin, Scott Bauer, Andi Kleen,
    Andy Lutomirski, Greg KH, Eric Biggers, Linux Crypto Mailing List,
    Jean-Philippe Aumasson

Hey Ted,

On Wed, Dec 14, 2016 at 5:37 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> One somewhat undesirable aspect of the current algorithm is that we
> never change random_int_secret.

Why exactly would this be a problem? So long as the secret is kept
secret, the PRF is secure. If an attacker can read arbitrary kernel
memory, there are much, much bigger issues to be concerned about. As
well, the "chaining" variable I introduce ensures that the random
numbers are, per-cpu, related to the uniqueness of timing of
subsequent calls.

> So I've been toying with the
> following, which is 4 times faster than md5. (I haven't tried
> benchmarking against siphash yet.)
>
> [    3.606139] random benchmark!!
> [    3.606276] get_random_int # cycles: 326578
> [    3.606317] get_random_int_new # cycles: 95438
> [    3.607423] get_random_bytes # cycles: 2653388

Cool, I'll benchmark it against the siphash implementation. I like
what you did with batching up lots of chacha output and doling it out
bit by bit. I suspect this will be quite fast, because with chacha20
you get an entire block.

> P.S. It's interesting to note that siphash24 and chacha20 are both
> add-rotate-xor based algorithms.

Quite! Lots of nice shiny things are turning out to be ARX -- ChaCha,
BLAKE2, SipHash, NORX. The simplicity is really appealing.

Jason
* Re: Re: [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long

From: Jason A. Donenfeld @ 2016-12-14 19:12 UTC
To: kernel-hardening, Theodore Ts'o, Jason A. Donenfeld, Netdev, David Miller,
    Linus Torvalds, LKML, George Spelvin, Scott Bauer, Andi Kleen,
    Andy Lutomirski, Greg KH, Eric Biggers, Linux Crypto Mailing List,
    Jean-Philippe Aumasson

Hi again,

On Wed, Dec 14, 2016 at 5:37 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> [    3.606139] random benchmark!!
> [    3.606276] get_random_int # cycles: 326578
> [    3.606317] get_random_int_new # cycles: 95438
> [    3.607423] get_random_bytes # cycles: 2653388

Looks to me like my siphash implementation is much faster for
get_random_long, and more or less tied for get_random_int:

[    1.729370] random benchmark!!
[    1.729710] get_random_long # cycles: 349771
[    1.730128] get_random_long_chacha # cycles: 359660
[    1.730457] get_random_long_siphash # cycles: 94255
[    1.731307] get_random_bytes # cycles: 1354894
[    1.731707] get_random_int # cycles: 305640
[    1.732095] get_random_int_chacha # cycles: 80726
[    1.732425] get_random_int_siphash # cycles: 94265
[    1.733278] get_random_bytes # cycles: 1315873

Given the increasing usage of get_random_long for ASLR and related, I
think this makes the siphash approach worth pursuing. The chacha
approach is also not significantly different from the md5 approach in
terms of speed for get_random_long. Additionally, since siphash is a
PRF, I think this opens up a big window for optimizing it even
further.

Benchmark here:
https://git.zx2c4.com/linux-dev/commit/?h=rng-bench

Jason
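One concrete shape that "optimizing it even further" can take: because these
callers always hash the same fixed-width fields, the length loop and tail
handling of the general siphash24() can be dropped and the rounds fully
unrolled. The following is a sketch of such a fixed-input specialization,
reusing the ROTL64/SIPROUND macros from the earlier sketch; the three-word
signature is illustrative only, not the kernel's eventual interface:

#include <stdint.h>

#define ROTL64(x, b) ((uint64_t)(((x) << (b)) | ((x) >> (64 - (b)))))
#define SIPROUND do { \
        v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32); \
        v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2; \
        v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0; \
        v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32); \
} while (0)

/*
 * SipHash-2-4 over exactly three u64 words: no loop, no tail handling,
 * and the length byte folded in as a compile-time constant.
 */
static uint64_t siphash_3u64(uint64_t a, uint64_t b, uint64_t c,
                             const uint64_t key[2])
{
        uint64_t v0 = 0x736f6d6570736575ULL ^ key[0];
        uint64_t v1 = 0x646f72616e646f6dULL ^ key[1];
        uint64_t v2 = 0x6c7967656e657261ULL ^ key[0];
        uint64_t v3 = 0x7465646279746573ULL ^ key[1];
        const uint64_t last = 24ULL << 56;      /* fixed 24-byte message length */

        v3 ^= a; SIPROUND; SIPROUND; v0 ^= a;
        v3 ^= b; SIPROUND; SIPROUND; v0 ^= b;
        v3 ^= c; SIPROUND; SIPROUND; v0 ^= c;
        v3 ^= last; SIPROUND; SIPROUND; v0 ^= last;
        v2 ^= 0xff;
        SIPROUND; SIPROUND; SIPROUND; SIPROUND;
        return v0 ^ v1 ^ v2 ^ v3;
}

For a 24-byte little-endian input this computes exactly the same value as
the general siphash24() above, so a caller hashing fixed-width fields can
switch to it without changing outputs.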
* Re: Re: [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long

From: Jason A. Donenfeld @ 2016-12-15  1:19 UTC
To: kernel-hardening, Theodore Ts'o, Jason A. Donenfeld, Netdev, David Miller,
    Linus Torvalds, LKML, George Spelvin, Scott Bauer, Andi Kleen,
    Andy Lutomirski, Greg KH, Eric Biggers, Linux Crypto Mailing List,
    Jean-Philippe Aumasson

Hey Ted,

On Wed, Dec 14, 2016 at 8:12 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> I think this opens up a big window for optimizing it even
> further.

I optimized it a bit further and siphash is now the clear winner over
chacha:

[    1.784801] random benchmark!!
[    1.785161] get_random_long # cycles: 415983
[    1.785595] get_random_long_chacha # cycles: 242047
[    1.785997] get_random_long_siphash # cycles: 137130
[    1.787450] get_random_bytes # cycles: 1452985
[    1.787947] get_random_int # cycles: 343323
[    1.788282] get_random_int_chacha # cycles: 170767
[    1.788656] get_random_int_siphash # cycles: 86384
[    1.789764] get_random_bytes # cycles: 2279519

And even still, there is more that could be optimized. Therefore, I'll
continue to keep this patch in the series and will CC you on the next
patch set that goes out.

Jason