Linux cryptographic layer development

Linux cryptographic layer development
 help / color / mirror / Atom feed

* Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform
From: Hannes Frederic Sowa @ 2016-12-14 20:27 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: David Miller, David Laight, Netdev, kernel-hardening, Andi Kleen,
	LKML, Linux Crypto Mailing List
In-Reply-To: <CAHmME9rp2oSCo0eu92jKm00S0eJHz65bJKXRpeS7=_EV6zZNYw@mail.gmail.com>

Hey Jason,

On 14.12.2016 20:38, Jason A. Donenfeld wrote:
> On Wed, Dec 14, 2016 at 8:22 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> I don't think this helps. Did you test it? I don't see reason why
>> padding could be left out between `d' and `end' because of the flexible
>> array member?
> 
> Because the type u8 doesn't require any alignment requirements, it can
> nestle right up there cozy with the u16:
> 
> zx2c4@thinkpad ~ $ cat a.c
> #include <stdint.h>
> #include <stdio.h>
> #include <stddef.h>
> int main()
> {
>        struct {
>                uint64_t a;
>                uint32_t b;
>                uint32_t c;
>                uint16_t d;
>                char x[];
>        } a;
>        printf("%zu\n", sizeof(a));
>        printf("%zu\n", offsetof(typeof(a), x));
>        return 0;
> }
> zx2c4@thinkpad ~ $ gcc a.c
> zx2c4@thinkpad ~ $ ./a.out
> 24
> 18

Sorry, I misread the patch. You are using offsetof. In this case remove
the char x[] and just use offsetofend because it is misleading
otherwise. Should work like that though.

What I don't really understand is that the addition of this complexity
actually reduces the performance, as you have to take the "if (left)"
branch during hashing and causes you to make a load_unaligned_zeropad.

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform
From: Tom Herbert @ 2016-12-14 20:12 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: David Laight, Netdev, kernel-hardening, Andi Kleen, LKML,
	Linux Crypto Mailing List
In-Reply-To: <CAHmME9pEM=cDC5S=j1BU2oCF8-WdnbRfiVojcet4rXcRLcpJRw@mail.gmail.com>

On Wed, Dec 14, 2016 at 4:53 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hi David,
>
> On Wed, Dec 14, 2016 at 10:51 AM, David Laight <David.Laight@aculab.com> wrote:
>> From: Jason A. Donenfeld
>>> Sent: 14 December 2016 00:17
>>> This gives a clear speed and security improvement. Rather than manually
>>> filling MD5 buffers, we simply create a layout by a simple anonymous
>>> struct, for which gcc generates rather efficient code.
>> ...
>>> +     const struct {
>>> +             struct in6_addr saddr;
>>> +             struct in6_addr daddr;
>>> +             __be16 sport;
>>> +             __be16 dport;
>>> +     } __packed combined = {
>>> +             .saddr = *(struct in6_addr *)saddr,
>>> +             .daddr = *(struct in6_addr *)daddr,
>>> +             .sport = sport,
>>> +             .dport = dport
>>> +     };
>>
>> You need to look at the effect of marking this (and the other)
>> structures 'packed' on architectures like sparc64.
>
> In all current uses of __packed in the code, I think the impact is
> precisely zero, because all structures have members in descending
> order of size, with each member being a perfect multiple of the one
> below it. The __packed is therefore just there for safety, in case
> somebody comes in and screws everything up by sticking a u8 in
> between. In that case, it wouldn't be desirable to hash the structure
> padding bits. In the worst case, I don't believe the impact would be
> worse than a byte-by-byte memcpy, which is what the old code did. But
> anyway, these structures are already naturally packed anyway, so the
> present impact is nil.
>
If you pad the data structure to 64 bits then we can call the version
of siphash that only deals in 64 bit words. Writing a zero in the
padding will be cheaper than dealing with odd lengths in siphash24.

Tom

> Jason

^ permalink raw reply

* Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-14 19:47 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Netdev, kernel-hardening, LKML, Linux Crypto Mailing List,
	Jean-Philippe Aumasson, Daniel J . Bernstein, Linus Torvalds,
	Eric Biggers
In-Reply-To: <9fea41e0-fd55-7328-e2f4-73eb2e7f7a98@stressinduktion.org>

Hi Hannes,

On Wed, Dec 14, 2016 at 4:09 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Yes, numbers would be very usable here. I am mostly concerned about
> small plastic router cases. E.g. assume you double packet processing
> time with a change of the hashing function at what point is the actual
> packet processing more of an attack vector than the hashtable?

I agree. Looks like Tom did some very quick benchmarks. I'll do some
more precise benchmarks myself when we graduate from looking at md5
replacement (the easy case) to looking at jhash replacement (the
harder case).

>> With that said, siphash is here to replace uses of jhash where
>> hashtable poisoning vulnerabilities make it necessary. Where there's
>> no significant security improvement, if there's no speed improvement
>> either, then of course nothing's going to change.
>
> It still changes currently well working source. ;-)

I mean if siphash doesn't make things better in someway, we'll just
continue using jhash, so no source change or anything. In other words:
evolutionary conservative approach rather than hasty "replace 'em
all!" tomfoolery.

> MD5 is considered broken because its collision resistance is broken?
> SipHash doesn't even claim to have collision resistance (which we don't
> need here)?

Not just that, but it's not immediately clear to me that using MD5 as
a PRF the way it is now with md5_transform is even a straightforwardly
good idea.

> But I agree, certainly it could be a nice speed-up!

The benchmarks for the secure sequence number generation and the rng
are indeed really promising.

> I think you mean non-linearity.

Yea of course, editing typo, sorry.

> In general I am in favor to switch to siphash, but it would be nice to
> see some benchmarks with the specific kernel implementation also on some
> smaller 32 bit CPUs and especially without using any SIMD instructions
> (which might have been used in paper comparison).

Sure, agreed. Each proposed jhash replacement will need to be
benchmarked on little MIPS machines and x86 monsters alike, with
patches indicating PPS before and after.

Jason

^ permalink raw reply

* Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform
From: Jason A. Donenfeld @ 2016-12-14 19:38 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: David Miller, David Laight, Netdev, kernel-hardening, Andi Kleen,
	LKML, Linux Crypto Mailing List
In-Reply-To: <0e708ba2-6a4e-013e-597a-62ab32cc240b@stressinduktion.org>

Hi Hannes,

On Wed, Dec 14, 2016 at 8:22 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> I don't think this helps. Did you test it? I don't see reason why
> padding could be left out between `d' and `end' because of the flexible
> array member?

Because the type u8 doesn't require any alignment requirements, it can
nestle right up there cozy with the u16:

zx2c4@thinkpad ~ $ cat a.c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>
int main()
{
       struct {
               uint64_t a;
               uint32_t b;
               uint32_t c;
               uint16_t d;
               char x[];
       } a;
       printf("%zu\n", sizeof(a));
       printf("%zu\n", offsetof(typeof(a), x));
       return 0;
}
zx2c4@thinkpad ~ $ gcc a.c
zx2c4@thinkpad ~ $ ./a.out
24
18

Jason

^ permalink raw reply

* Re: [PATCH v3 1/3] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-14 19:35 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Netdev, kernel-hardening, LKML, Linux Crypto Mailing List,
	Jean-Philippe Aumasson, Daniel J . Bernstein, Linus Torvalds,
	Eric Biggers, David Laight
In-Reply-To: <CALx6S35UgTyqkYUjS5gYFH4HnjW974WQ_JiDXxgb9rZ7gnY52Q@mail.gmail.com>

Hi Tom,

On Wed, Dec 14, 2016 at 8:18 PM, Tom Herbert <tom@herbertland.com> wrote:
> "super fast" is relative. My quick test shows that this faster than
> Toeplitz (good, but not exactly hard to achieve), but is about 4x
> slower than jhash.

Fast relative to other cryptographically secure PRFs.

>> SipHash isn't just some new trendy hash function. It's been around for a
>> while, and there really isn't anything that comes remotely close to
>> being useful in the way SipHash is. With that said, why do we need this?
> I don't think we need advertising nor a lesson on hashing. It would be
> much more useful if you just point us to the paper on siphash (which I
> assume I http://cr.yp.to/siphash/siphash-20120918.pdf ?).

Ugh. Sorry. It definitely wasn't my intention to give an uninvited
lesson or an annoying advert. For the former, I didn't want to make
any expectations about fields of knowledge, because I honest have no
idea. For the latter, I wrote that sentence to indicate that siphash
isn't just some newfangled hipster function, but something useful and
well established. I didn't mean it as a form of advertising. My
apologies if I've offended your sensibilities.

That cr.yp.to link is fine, or https://131002.net/siphash/siphash.pdf I believe.

> Key rotation is important anyway, without any key rotation even if the
> key is compromised in siphash by some external means we would have an
> insecure hash until the system reboots.

I'm a bit surprised to read this. I've never designed a system to be
secure even in the event of remote arbitrary kernel memory disclosure,
and I wasn't aware this was generally considered an architectural
requirement or Linux.

In any case, if you want this, I suppose you can have it with siphash too.

> Maybe so, but we need to do due diligence before considering adopting
> siphash as the primary hashing in the network stack. Consider that we
> may very well perform a hash over L4 tuples on _every_ packet. We've
> done a good job at limiting this to be at most one hash per packet,
> but nevertheless the performance of the hash function must be take
> into account.

I agree with you. It seems like each case is going to needed to be
measured on a case by case basis. In this series I make the first use
of siphash in the secure sequence generation and get_random_int/long,
where siphash replaces md5, so there's a pretty clear performance in.
But for the jhash replacements indeed things are going to need to be
individually evaluated.

> 1) My quick test shows siphash is about four times more expensive than
> jhash. On my test system, computing a hash over IPv4 tuple (two 32 bit
> addresses and 2 16 bit source ports) is 6.9 nsecs in Jenkins hash, 33
> nsecs with siphash. Given that we have eliminated most of the packet
> header hashes this might be tolerable, but still should be looking at
> ways to optimize.
> 2) I like moving to use u64 (quad words) in the hash, this is an
> improvement over Jenkins which is based on 32 bit words. If we put
> this in the kernel we probably want to have several variants of
> siphash for specific sizes (e.g. siphash1, siphash2, siphash2,
> siphashn for hash over one, two, three, or n sixty four bit words).

I think your suggestion for (2) will contribute to further
optimizations for (1). In v2, I had another patch in there adding
siphash_1word, siphash_2words, etc, like jhash, but I implemented it
by taking u32 variables and then just concatenating these into a
buffer and passing them to the main siphash function. I removed it
from v3 because I thought that these kind of missed the whole point.
In particular:

a) siphash24_1word, siphash24_2words, siphash24_3words, etc should
take u64, not u32, since that's what siphash operates on natively
b) Rather than concatenating them in a buffer, I should write
specializations of the siphash24 function _especially_ for these size
inputs to avoid the copy and to reduce the book keeping.

I'll add these functions to v4 implemented like that.

Thanks for the useful feedback and benchmarks!

Jason

^ permalink raw reply

* Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform
From: Hannes Frederic Sowa @ 2016-12-14 19:22 UTC (permalink / raw)
  To: Jason A. Donenfeld, David Miller
  Cc: David Laight, Netdev, kernel-hardening, Andi Kleen, LKML,
	Linux Crypto Mailing List
In-Reply-To: <CAHmME9pz4XudfeqhKBwFNDmp7AYuNwbnevMqB3e6ScPDnUnq9g@mail.gmail.com>

On 14.12.2016 19:06, Jason A. Donenfeld wrote:
> Hi David,
> 
> On Wed, Dec 14, 2016 at 6:56 PM, David Miller <davem@davemloft.net> wrote:
>> Just marking the structure __packed, whether necessary or not, makes
>> the compiler assume that the members are not aligned and causes
>> byte-by-byte accesses to be performed for words.
>> Never, _ever_, use __packed unless absolutely necessary, it pessimizes
>> the code on cpus that require proper alignment of types.
> 
> Oh, jimminy cricket, I did not realize that it made assignments
> byte-by-byte *always*. So what options am I left with? What
> immediately comes to mind are:
> 
> 1)
> 
> struct {
>     u64 a;
>     u32 b;
>     u32 c;
>     u16 d;
>     u8 end[];

I don't think this helps. Did you test it? I don't see reason why
padding could be left out between `d' and `end' because of the flexible
array member?

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH v3 1/3] siphash: add cryptographically secure hashtable function
From: Tom Herbert @ 2016-12-14 19:18 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Netdev, kernel-hardening, LKML, linux-crypto,
	Jean-Philippe Aumasson, Daniel J . Bernstein, Linus Torvalds,
	Eric Biggers, David Laight
In-Reply-To: <20161214184605.24006-1-Jason@zx2c4.com>

On Wed, Dec 14, 2016 at 10:46 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> SipHash is a 64-bit keyed hash function that is actually a
> cryptographically secure PRF, like HMAC. Except SipHash is super fast,
> and is meant to be used as a hashtable keyed lookup function.
>
"super fast" is relative. My quick test shows that this faster than
Toeplitz (good, but not exactly hard to achieve), but is about 4x
slower than jhash.

> SipHash isn't just some new trendy hash function. It's been around for a
> while, and there really isn't anything that comes remotely close to
> being useful in the way SipHash is. With that said, why do we need this?
>
I don't think we need advertising nor a lesson on hashing. It would be
much more useful if you just point us to the paper on siphash (which I
assume I http://cr.yp.to/siphash/siphash-20120918.pdf ?).

> There are a variety of attacks known as "hashtable poisoning" in which an
> attacker forms some data such that the hash of that data will be the
> same, and then preceeds to fill up all entries of a hashbucket. This is
> a realistic and well-known denial-of-service vector.
>
> Linux developers already seem to be aware that this is an issue, and
> various places that use hash tables in, say, a network context, use a
> non-cryptographically secure function (usually jhash) and then try to
> twiddle with the key on a time basis (or in many cases just do nothing
> and hope that nobody notices). While this is an admirable attempt at
> solving the problem, it doesn't actually fix it. SipHash fixes it.
>
Key rotation is important anyway, without any key rotation even if the
key is compromised in siphash by some external means we would have an
insecure hash until the system reboots.

> (It fixes it in such a sound way that you could even build a stream
> cipher out of SipHash that would resist the modern cryptanalysis.)
>
> There are a modicum of places in the kernel that are vulnerable to
> hashtable poisoning attacks, either via userspace vectors or network
> vectors, and there's not a reliable mechanism inside the kernel at the
> moment to fix it. The first step toward fixing these issues is actually
> getting a secure primitive into the kernel for developers to use. Then
> we can, bit by bit, port things over to it as deemed appropriate.
>
> Secondly, a few places are using MD5 for creating secure sequence
> numbers, port numbers, or fast random numbers. Siphash is a faster, more
> fittting, and more secure replacement for MD5 in those situations.
>
> Dozens of languages are already using this internally for their hash
> tables. Some of the BSDs already use this in their kernels. SipHash is
> a widely known high-speed solution to a widely known problem, and it's
> time we catch-up.
>
Maybe so, but we need to do due diligence before considering adopting
siphash as the primary hashing in the network stack. Consider that we
may very well perform a hash over L4 tuples on _every_ packet. We've
done a good job at limiting this to be at most one hash per packet,
but nevertheless the performance of the hash function must be take
into account.

A few points:

1) My quick test shows siphash is about four times more expensive than
jhash. On my test system, computing a hash over IPv4 tuple (two 32 bit
addresses and 2 16 bit source ports) is 6.9 nsecs in Jenkins hash, 33
nsecs with siphash. Given that we have eliminated most of the packet
header hashes this might be tolerable, but still should be looking at
ways to optimize.
2) I like moving to use u64 (quad words) in the hash, this is an
improvement over Jenkins which is based on 32 bit words. If we put
this in the kernel we probably want to have several variants of
siphash for specific sizes (e.g. siphash1, siphash2, siphash2,
siphashn for hash over one, two, three, or n sixty four bit words).
3) I also tested siphash distribution and Avalanche Effect (these
really should have been covered in the paper :-( ). Both of these are
good with siphash, in fact almost have identical characteristics to
Jenkins hash.

Tom

> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
> Cc: Daniel J. Bernstein <djb@cr.yp.to>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Eric Biggers <ebiggers3@gmail.com>
> Cc: David Laight <David.Laight@aculab.com>
> ---
> Changes from v2->v3:
>
>   - There is now a fast aligned version of the function and a not-as-fast
>     unaligned version. The requirements for each have been documented in
>     a docbook-style comment. As well, the header now contains a constant
>     for the expected alignment.
>
>   - The test suite has been updated to check both the unaligned and aligned
>     version of the function.
>
>  include/linux/siphash.h |  30 ++++++++++
>  lib/Kconfig.debug       |   6 +-
>  lib/Makefile            |   5 +-
>  lib/siphash.c           | 153 ++++++++++++++++++++++++++++++++++++++++++++++++
>  lib/test_siphash.c      |  85 +++++++++++++++++++++++++++
>  5 files changed, 274 insertions(+), 5 deletions(-)
>  create mode 100644 include/linux/siphash.h
>  create mode 100644 lib/siphash.c
>  create mode 100644 lib/test_siphash.c
>
> diff --git a/include/linux/siphash.h b/include/linux/siphash.h
> new file mode 100644
> index 000000000000..82dc1a911a2e
> --- /dev/null
> +++ b/include/linux/siphash.h
> @@ -0,0 +1,30 @@
> +/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>
> + *
> + * This file is provided under a dual BSD/GPLv2 license.
> + *
> + * SipHash: a fast short-input PRF
> + * https://131002.net/siphash/
> + */
> +
> +#ifndef _LINUX_SIPHASH_H
> +#define _LINUX_SIPHASH_H
> +
> +#include <linux/types.h>
> +
> +enum siphash_lengths {
> +       SIPHASH24_KEY_LEN = 16,
> +       SIPHASH24_ALIGNMENT = 8
> +};
> +
> +u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN]);
> +
> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +static inline u64 siphash24_unaligned(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
> +{
> +       return siphash24(data, len, key);
> +}
> +#else
> +u64 siphash24_unaligned(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN]);
> +#endif
> +
> +#endif /* _LINUX_SIPHASH_H */
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index e6327d102184..32bbf689fc46 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1843,9 +1843,9 @@ config TEST_HASH
>         tristate "Perform selftest on hash functions"
>         default n
>         help
> -         Enable this option to test the kernel's integer (<linux/hash,h>)
> -         and string (<linux/stringhash.h>) hash functions on boot
> -         (or module load).
> +         Enable this option to test the kernel's integer (<linux/hash.h>),
> +         string (<linux/stringhash.h>), and siphash (<linux/siphash.h>)
> +         hash functions on boot (or module load).
>
>           This is intended to help people writing architecture-specific
>           optimized versions.  If unsure, say N.
> diff --git a/lib/Makefile b/lib/Makefile
> index 50144a3aeebd..71d398b04a74 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -22,7 +22,8 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
>          flex_proportions.o ratelimit.o show_mem.o \
>          is_single_threaded.o plist.o decompress.o kobject_uevent.o \
> -        earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o win_minmax.o
> +        earlycpio.o seq_buf.o siphash.o \
> +        nmi_backtrace.o nodemask.o win_minmax.o
>
>  lib-$(CONFIG_MMU) += ioremap.o
>  lib-$(CONFIG_SMP) += cpumask.o
> @@ -44,7 +45,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
>  obj-y += kstrtox.o
>  obj-$(CONFIG_TEST_BPF) += test_bpf.o
>  obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
> -obj-$(CONFIG_TEST_HASH) += test_hash.o
> +obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o
>  obj-$(CONFIG_TEST_KASAN) += test_kasan.o
>  obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
>  obj-$(CONFIG_TEST_LKM) += test_module.o
> diff --git a/lib/siphash.c b/lib/siphash.c
> new file mode 100644
> index 000000000000..32acdc26234f
> --- /dev/null
> +++ b/lib/siphash.c
> @@ -0,0 +1,153 @@
> +/* Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
> + * Copyright (C) 2012-2014 Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
> + * Copyright (C) 2012-2014 Daniel J. Bernstein <djb@cr.yp.to>
> + *
> + * This file is provided under a dual BSD/GPLv2 license.
> + *
> + * SipHash: a fast short-input PRF
> + * https://131002.net/siphash/
> + */
> +
> +#include <linux/siphash.h>
> +#include <linux/kernel.h>
> +#include <asm/unaligned.h>
> +
> +#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
> +#include <linux/dcache.h>
> +#include <asm/word-at-a-time.h>
> +#endif
> +
> +#define SIPROUND \
> +       do { \
> +       v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32); \
> +       v2 += v3; v3 = rol64(v3, 16); v3 ^= v2; \
> +       v0 += v3; v3 = rol64(v3, 21); v3 ^= v0; \
> +       v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32); \
> +       } while(0)
> +
> +static inline u16 le16_to_cpuvp(const void *p)
> +{
> +       return le16_to_cpup(p);
> +}
> +static inline u32 le32_to_cpuvp(const void *p)
> +{
> +       return le32_to_cpup(p);
> +}
> +static inline u64 le64_to_cpuvp(const void *p)
> +{
> +       return le64_to_cpup(p);
> +}
> +
> +/**
> + * siphash24 - compute 64-bit siphash24 PRF value
> + * @data: buffer to hash, must be aligned to SIPHASH24_ALIGNMENT
> + * @size: size of @data
> + * @key: key buffer of size SIPHASH24_KEY_LEN, must be aligned to SIPHASH24_ALIGNMENT
> + */
> +u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
> +{
> +       u64 v0 = 0x736f6d6570736575ULL;
> +       u64 v1 = 0x646f72616e646f6dULL;
> +       u64 v2 = 0x6c7967656e657261ULL;
> +       u64 v3 = 0x7465646279746573ULL;
> +       u64 b = ((u64)len) << 56;
> +       u64 k0 = le64_to_cpuvp(key);
> +       u64 k1 = le64_to_cpuvp(key + sizeof(u64));
> +       u64 m;
> +       const u8 *end = data + len - (len % sizeof(u64));
> +       const u8 left = len & (sizeof(u64) - 1);
> +       v3 ^= k1;
> +       v2 ^= k0;
> +       v1 ^= k1;
> +       v0 ^= k0;
> +       for (; data != end; data += sizeof(u64)) {
> +               m = le64_to_cpuvp(data);
> +               v3 ^= m;
> +               SIPROUND;
> +               SIPROUND;
> +               v0 ^= m;
> +       }
> +#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
> +       if (left)
> +               b |= le64_to_cpu((__force __le64)(load_unaligned_zeropad(data) & bytemask_from_count(left)));
> +#else
> +       switch (left) {
> +       case 7: b |= ((u64)data[6]) << 48;
> +       case 6: b |= ((u64)data[5]) << 40;
> +       case 5: b |= ((u64)data[4]) << 32;
> +       case 4: b |= le32_to_cpuvp(data); break;
> +       case 3: b |= ((u64)data[2]) << 16;
> +       case 2: b |= le16_to_cpuvp(data); break;
> +       case 1: b |= data[0];
> +       }
> +#endif
> +       v3 ^= b;
> +       SIPROUND;
> +       SIPROUND;
> +       v0 ^= b;
> +       v2 ^= 0xff;
> +       SIPROUND;
> +       SIPROUND;
> +       SIPROUND;
> +       SIPROUND;
> +       return (v0 ^ v1) ^ (v2 ^ v3);
> +}
> +EXPORT_SYMBOL(siphash24);
> +
> +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +/**
> + * siphash24 - compute 64-bit siphash24 PRF value, without alignment requirements
> + * @data: buffer to hash
> + * @size: size of @data
> + * @key: key buffer of size SIPHASH24_KEY_LEN
> + */
> +u64 siphash24_unaligned(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
> +{
> +       u64 v0 = 0x736f6d6570736575ULL;
> +       u64 v1 = 0x646f72616e646f6dULL;
> +       u64 v2 = 0x6c7967656e657261ULL;
> +       u64 v3 = 0x7465646279746573ULL;
> +       u64 b = ((u64)len) << 56;
> +       u64 k0 = get_unaligned_le64(key);
> +       u64 k1 = get_unaligned_le64(key + sizeof(u64));
> +       u64 m;
> +       const u8 *end = data + len - (len % sizeof(u64));
> +       const u8 left = len & (sizeof(u64) - 1);
> +       v3 ^= k1;
> +       v2 ^= k0;
> +       v1 ^= k1;
> +       v0 ^= k0;
> +       for (; data != end; data += sizeof(u64)) {
> +               m = get_unaligned_le64(data);
> +               v3 ^= m;
> +               SIPROUND;
> +               SIPROUND;
> +               v0 ^= m;
> +       }
> +#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
> +       if (left)
> +               b |= le64_to_cpu((__force __le64)(load_unaligned_zeropad(data) & bytemask_from_count(left)));
> +#else
> +       switch (left) {
> +       case 7: b |= ((u64)data[6]) << 48;
> +       case 6: b |= ((u64)data[5]) << 40;
> +       case 5: b |= ((u64)data[4]) << 32;
> +       case 4: b |= get_unaligned_le32(data); break;
> +       case 3: b |= ((u64)data[2]) << 16;
> +       case 2: b |= get_unaligned_le16(data); break;
> +       case 1: b |= data[0];
> +       }
> +#endif
> +       v3 ^= b;
> +       SIPROUND;
> +       SIPROUND;
> +       v0 ^= b;
> +       v2 ^= 0xff;
> +       SIPROUND;
> +       SIPROUND;
> +       SIPROUND;
> +       SIPROUND;
> +       return (v0 ^ v1) ^ (v2 ^ v3);
> +}
> +EXPORT_SYMBOL(siphash24_unaligned);
> +#endif
> diff --git a/lib/test_siphash.c b/lib/test_siphash.c
> new file mode 100644
> index 000000000000..69ac94dec366
> --- /dev/null
> +++ b/lib/test_siphash.c
> @@ -0,0 +1,85 @@
> +/* Test cases for siphash.c
> + *
> + * Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
> + *
> + * This file is provided under a dual BSD/GPLv2 license.
> + *
> + * SipHash: a fast short-input PRF
> + * https://131002.net/siphash/
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/siphash.h>
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <linux/errno.h>
> +#include <linux/module.h>
> +
> +/* Test vectors taken from official reference source available at:
> + *     https://131002.net/siphash/siphash24.c
> + */
> +static const u64 test_vectors[64] = {
> +       0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL, 0x0d6c8009d9a94f5aULL,
> +       0x85676696d7fb7e2dULL, 0xcf2794e0277187b7ULL, 0x18765564cd99a68dULL,
> +       0xcbc9466e58fee3ceULL, 0xab0200f58b01d137ULL, 0x93f5f5799a932462ULL,
> +       0x9e0082df0ba9e4b0ULL, 0x7a5dbbc594ddb9f3ULL, 0xf4b32f46226bada7ULL,
> +       0x751e8fbc860ee5fbULL, 0x14ea5627c0843d90ULL, 0xf723ca908e7af2eeULL,
> +       0xa129ca6149be45e5ULL, 0x3f2acc7f57c29bdbULL, 0x699ae9f52cbe4794ULL,
> +       0x4bc1b3f0968dd39cULL, 0xbb6dc91da77961bdULL, 0xbed65cf21aa2ee98ULL,
> +       0xd0f2cbb02e3b67c7ULL, 0x93536795e3a33e88ULL, 0xa80c038ccd5ccec8ULL,
> +       0xb8ad50c6f649af94ULL, 0xbce192de8a85b8eaULL, 0x17d835b85bbb15f3ULL,
> +       0x2f2e6163076bcfadULL, 0xde4daaaca71dc9a5ULL, 0xa6a2506687956571ULL,
> +       0xad87a3535c49ef28ULL, 0x32d892fad841c342ULL, 0x7127512f72f27cceULL,
> +       0xa7f32346f95978e3ULL, 0x12e0b01abb051238ULL, 0x15e034d40fa197aeULL,
> +       0x314dffbe0815a3b4ULL, 0x027990f029623981ULL, 0xcadcd4e59ef40c4dULL,
> +       0x9abfd8766a33735cULL, 0x0e3ea96b5304a7d0ULL, 0xad0c42d6fc585992ULL,
> +       0x187306c89bc215a9ULL, 0xd4a60abcf3792b95ULL, 0xf935451de4f21df2ULL,
> +       0xa9538f0419755787ULL, 0xdb9acddff56ca510ULL, 0xd06c98cd5c0975ebULL,
> +       0xe612a3cb9ecba951ULL, 0xc766e62cfcadaf96ULL, 0xee64435a9752fe72ULL,
> +       0xa192d576b245165aULL, 0x0a8787bf8ecb74b2ULL, 0x81b3e73d20b49b6fULL,
> +       0x7fa8220ba3b2eceaULL, 0x245731c13ca42499ULL, 0xb78dbfaf3a8d83bdULL,
> +       0xea1ad565322a1a0bULL, 0x60e61c23a3795013ULL, 0x6606d7e446282b93ULL,
> +       0x6ca4ecb15c5f91e1ULL, 0x9f626da15c9625f3ULL, 0xe51b38608ef25f57ULL,
> +       0x958a324ceb064572ULL
> +};
> +
> +static int __init siphash_test_init(void)
> +{
> +       u8 in[64] __aligned(SIPHASH24_ALIGNMENT);
> +       u8 k[16] __aligned(SIPHASH24_ALIGNMENT);
> +       u8 in_unaligned[65];
> +       u8 k_unaligned[65];
> +       u8 i;
> +       int ret = 0;
> +
> +       for (i = 0; i < 16; ++i) {
> +               k[i] = i;
> +               k_unaligned[i + 1] = i;
> +       }
> +       for (i = 0; i < 64; ++i) {
> +               in[i] = i;
> +               in_unaligned[i + 1] = i;
> +               if (siphash24(in, i, k) != test_vectors[i]) {
> +                       pr_info("self-test aligned %u: FAIL\n", i + 1);
> +                       ret = -EINVAL;
> +               }
> +               if (siphash24_unaligned(in_unaligned + 1, i, k_unaligned + 1) != test_vectors[i]) {
> +                       pr_info("self-test unaligned %u: FAIL\n", i + 1);
> +                       ret = -EINVAL;
> +               }
> +       }
> +       if (!ret)
> +               pr_info("self-tests: pass\n");
> +       return ret;
> +}
> +
> +static void __exit siphash_test_exit(void)
> +{
> +}
> +
> +module_init(siphash_test_init);
> +module_exit(siphash_test_exit);
> +
> +MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
> +MODULE_LICENSE("Dual BSD/GPL");
> --
> 2.11.0
>

^ permalink raw reply

* Re: [PATCH v2] crypto: sun4i-ss: support the Security System PRNG
From: PrasannaKumar Muralidharan @ 2016-12-14 19:17 UTC (permalink / raw)
  To: Herbert Xu
  Cc: linux-kernel, Chen-Yu Tsai, Corentin Labbe, linux-crypto,
	maxime.ripard, davem, linux-arm-kernel
In-Reply-To: <20161214050551.GD9592@gondor.apana.org.au>

>> I have found two solutions:
>
> No we already have algif_rng so let's not confuse things even
> further by making hwrng take PRNGs.

Even if both the solutions could not be adopted I think there must be
a way for applications to use similar API to get true rng or prng.
Given the case that no user complained about prng data when using
/dev/hwrng is it safe to assume that the random data generated is
acceptable for users? If so, the drivers can be left without any
modification.

Should there be a mandate that driver will be accepted only when it
passes 'rngtest'. This will make sure that prng drivers won't get
added in future.

Regards,
PrasannaKumar

^ permalink raw reply

* Re: Re: [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long
From: Jason A. Donenfeld @ 2016-12-14 19:12 UTC (permalink / raw)
  To: kernel-hardening, Theodore Ts'o, Jason A. Donenfeld, Netdev,
	David Miller, Linus Torvalds, LKML, George Spelvin, Scott Bauer,
	Andi Kleen, Andy Lutomirski, Greg KH, Eric Biggers,
	Linux Crypto Mailing List, Jean-Philippe Aumasson
In-Reply-To: <20161214163731.luj2dzmnihcuhn5p@thunk.org>

Hi again,

On Wed, Dec 14, 2016 at 5:37 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> [    3.606139] random benchmark!!
> [    3.606276] get_random_int # cycles: 326578
> [    3.606317] get_random_int_new # cycles: 95438
> [    3.607423] get_random_bytes # cycles: 2653388

Looks to me like my siphash implementation is much faster for
get_random_long, and more or less tied for get_random_int:

[    1.729370] random benchmark!!
[    1.729710] get_random_long # cycles: 349771
[    1.730128] get_random_long_chacha # cycles: 359660
[    1.730457] get_random_long_siphash # cycles: 94255
[    1.731307] get_random_bytes # cycles: 1354894
[    1.731707] get_random_int # cycles: 305640
[    1.732095] get_random_int_chacha # cycles: 80726
[    1.732425] get_random_int_siphash # cycles: 94265
[    1.733278] get_random_bytes # cycles: 1315873

Given the increasing usage of get_random_long for ASLR and related, I
think this makes the siphash approach worth pursuing. The chacha
approach is also not significantly different from the md5 approach in
terms of speed for get_rand_long. Additionally, since siphash is a
PRF, I think this opens up a big window for optimizing it even
further.

Benchmark here:
    https://git.zx2c4.com/linux-dev/commit/?h=rng-bench

Jason

^ permalink raw reply

* [PATCH v3 3/3] random: use siphash24 instead of md5 for get_random_int/long
From: Jason A. Donenfeld @ 2016-12-14 18:46 UTC (permalink / raw)
  To: Netdev, kernel-hardening, LKML, linux-crypto
  Cc: Jason A. Donenfeld, Jean-Philippe Aumasson, Ted Tso
In-Reply-To: <20161214184605.24006-1-Jason@zx2c4.com>

This duplicates the current algorithm for get_random_int/long, but uses
siphash24 instead. This comes with several benefits. It's certainly
faster and more cryptographically secure than MD5. This patch also
hashes the pid, entropy, and timestamp as fixed width fields, in order
to increase diffusion.

The previous md5 algorithm used a per-cpu md5 state, which caused
successive calls to the function to chain upon each other. While it's
not entirely clear that this kind of chaining is absolutely necessary
when using a secure PRF like siphash24, it can't hurt, and the timing of
the call chain does add a degree of natural entropy. So, in keeping with
this design, instead of the massive per-cpu 64-byte md5 state, there is
instead a per-cpu previously returned value for chaining.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Ted Tso <tytso@mit.edu>
---
Changes from v2->v3:

  - Structs are no longer packed, to mitigate slow byte-by-byte assignment.

 drivers/char/random.c | 52 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 19 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..b1c2e3b26430 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -262,6 +262,7 @@
 #include <linux/syscalls.h>
 #include <linux/completion.h>
 #include <linux/uuid.h>
+#include <linux/siphash.h>
 #include <crypto/chacha20.h>
 
 #include <asm/processor.h>
@@ -2042,7 +2043,7 @@ struct ctl_table random_table[] = {
 };
 #endif 	/* CONFIG_SYSCTL */
 
-static u32 random_int_secret[MD5_MESSAGE_BYTES / 4] ____cacheline_aligned;
+static u8 random_int_secret[SIPHASH24_KEY_LEN] __aligned(SIPHASH24_ALIGNMENT);
 
 int random_int_secret_init(void)
 {
@@ -2050,8 +2051,7 @@ int random_int_secret_init(void)
 	return 0;
 }
 
-static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
-		__aligned(sizeof(unsigned long));
+static DEFINE_PER_CPU(u64, get_random_int_chaining);
 
 /*
  * Get a random word for internal kernel use only. Similar to urandom but
@@ -2061,19 +2061,26 @@ static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
  */
 unsigned int get_random_int(void)
 {
-	__u32 *hash;
 	unsigned int ret;
+	struct {
+		u64 chaining;
+		unsigned long ts;
+		unsigned long entropy;
+		pid_t pid;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined;
+	u64 *chaining;
 
 	if (arch_get_random_int(&ret))
 		return ret;
 
-	hash = get_cpu_var(get_random_int_hash);
-
-	hash[0] += current->pid + jiffies + random_get_entropy();
-	md5_transform(hash, random_int_secret);
-	ret = hash[0];
-	put_cpu_var(get_random_int_hash);
-
+	chaining = get_cpu_ptr(&get_random_int_chaining);
+	combined.chaining = *chaining;
+	combined.ts = jiffies;
+	combined.entropy = random_get_entropy();
+	combined.pid = current->pid;
+	ret = *chaining = siphash24((u8 *)&combined, offsetof(typeof(combined), end), random_int_secret);
+	put_cpu_ptr(chaining);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_int);
@@ -2083,19 +2090,26 @@ EXPORT_SYMBOL(get_random_int);
  */
 unsigned long get_random_long(void)
 {
-	__u32 *hash;
 	unsigned long ret;
+	struct {
+		u64 chaining;
+		unsigned long ts;
+		unsigned long entropy;
+		pid_t pid;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined;
+	u64 *chaining;
 
 	if (arch_get_random_long(&ret))
 		return ret;
 
-	hash = get_cpu_var(get_random_int_hash);
-
-	hash[0] += current->pid + jiffies + random_get_entropy();
-	md5_transform(hash, random_int_secret);
-	ret = *(unsigned long *)hash;
-	put_cpu_var(get_random_int_hash);
-
+	chaining = get_cpu_ptr(&get_random_int_chaining);
+	combined.chaining = *chaining;
+	combined.ts = jiffies;
+	combined.entropy = random_get_entropy();
+	combined.pid = current->pid;
+	ret = *chaining = siphash24((u8 *)&combined, offsetof(typeof(combined), end), random_int_secret);
+	put_cpu_ptr(chaining);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_long);
-- 
2.11.0

^ permalink raw reply related

* [PATCH v3 2/3] secure_seq: use siphash24 instead of md5_transform
From: Jason A. Donenfeld @ 2016-12-14 18:46 UTC (permalink / raw)
  To: Netdev, kernel-hardening, LKML, linux-crypto
  Cc: Jason A. Donenfeld, Andi Kleen, David Miller, David Laight
In-Reply-To: <20161214184605.24006-1-Jason@zx2c4.com>

This gives a clear speed and security improvement. Siphash is both
faster and is more solid crypto than the aging MD5.

Rather than manually filling MD5 buffers, we simply create
a layout by a simple anonymous struct, for which gcc generates
rather efficient code.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Laight <David.Laight@aculab.com>
---
Changes from v2->v3:

  - Structs are no longer packed, to mitigate slow byte-by-byte assignment.
  - A typo has been fixed in the port number assignment.

 net/core/secure_seq.c | 166 ++++++++++++++++++++++++++------------------------
 1 file changed, 85 insertions(+), 81 deletions(-)

diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 88a8e429fc3e..00eb141c981b 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -1,3 +1,5 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved. */
+
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/cryptohash.h>
@@ -8,14 +10,14 @@
 #include <linux/ktime.h>
 #include <linux/string.h>
 #include <linux/net.h>
-
+#include <linux/siphash.h>
 #include <net/secure_seq.h>
 
 #if IS_ENABLED(CONFIG_IPV6) || IS_ENABLED(CONFIG_INET)
+#include <linux/in6.h>
 #include <net/tcp.h>
-#define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4)
 
-static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
+static u8 net_secret[SIPHASH24_KEY_LEN] __aligned(SIPHASH24_ALIGNMENT);
 
 static __always_inline void net_secret_init(void)
 {
@@ -44,44 +46,41 @@ static u32 seq_scale(u32 seq)
 u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 				 __be16 sport, __be16 dport, u32 *tsoff)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
-	u32 i;
-
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 sport;
+		__be16 dport;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.sport = sport,
+		.dport = dport
+	};
+	u64 hash;
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32)daddr[i];
-	secret[4] = net_secret[4] +
-		(((__force u16)sport << 16) + (__force u16)dport);
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	*tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-	return seq_scale(hash[0]);
+	hash = siphash24((const u8 *)&combined, offsetof(typeof(combined), end), net_secret);
+	*tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+	return seq_scale(hash);
 }
 EXPORT_SYMBOL(secure_tcpv6_sequence_number);
 
 u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
-	u32 i;
-
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 dport;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.dport = dport
+	};
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32) daddr[i];
-	secret[4] = net_secret[4] + (__force u32)dport;
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	return hash[0];
+	return siphash24((const u8 *)&combined, offsetof(typeof(combined), end), net_secret);
 }
 EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 #endif
@@ -91,33 +90,39 @@ EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 			       __be16 sport, __be16 dport, u32 *tsoff)
 {
-	u32 hash[MD5_DIGEST_WORDS];
-
+	const struct {
+		__be32 saddr;
+		__be32 daddr;
+		__be16 sport;
+		__be16 dport;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined = {
+		.saddr = saddr,
+		.daddr = daddr,
+		.sport = sport,
+		.dport = dport
+	};
+	u64 hash;
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	*tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-	return seq_scale(hash[0]);
+	hash = siphash24((const u8 *)&combined, offsetof(typeof(combined), end), net_secret);
+	*tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+	return seq_scale(hash);
 }
 
 u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
 {
-	u32 hash[MD5_DIGEST_WORDS];
-
+	const struct {
+		__be32 saddr;
+		__be32 daddr;
+		__be16 dport;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined = {
+		.saddr = saddr,
+		.daddr = daddr,
+		.dport = dport
+	};
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = (__force u32)dport ^ net_secret[14];
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	return hash[0];
+	return siphash24((const u8 *)&combined, offsetof(typeof(combined), end), net_secret);
 }
 EXPORT_SYMBOL_GPL(secure_ipv4_port_ephemeral);
 #endif
@@ -126,21 +131,23 @@ EXPORT_SYMBOL_GPL(secure_ipv4_port_ephemeral);
 u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
 				__be16 sport, __be16 dport)
 {
-	u32 hash[MD5_DIGEST_WORDS];
+	const struct {
+		__be32 saddr;
+		__be32 daddr;
+		__be16 sport;
+		__be16 dport;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined = {
+		.saddr = saddr,
+		.daddr = daddr,
+		.sport = sport,
+		.dport = dport
+	};
 	u64 seq;
-
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	seq = hash[0] | (((u64)hash[1]) << 32);
+	seq = siphash24((const u8 *)&combined, offsetof(typeof(combined), end), net_secret);
 	seq += ktime_get_real_ns();
 	seq &= (1ull << 48) - 1;
-
 	return seq;
 }
 EXPORT_SYMBOL(secure_dccp_sequence_number);
@@ -149,26 +156,23 @@ EXPORT_SYMBOL(secure_dccp_sequence_number);
 u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
 				  __be16 sport, __be16 dport)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 sport;
+		__be16 dport;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.sport = sport,
+		.dport = dport
+	};
 	u64 seq;
-	u32 i;
-
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32)daddr[i];
-	secret[4] = net_secret[4] +
-		(((__force u16)sport << 16) + (__force u16)dport);
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	seq = hash[0] | (((u64)hash[1]) << 32);
+	seq = siphash24((const u8 *)&combined, offsetof(typeof(combined), end), net_secret);
 	seq += ktime_get_real_ns();
 	seq &= (1ull << 48) - 1;
-
 	return seq;
 }
 EXPORT_SYMBOL(secure_dccpv6_sequence_number);
-- 
2.11.0

^ permalink raw reply related

* [PATCH v3 1/3] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-14 18:46 UTC (permalink / raw)
  To: Netdev, kernel-hardening, LKML, linux-crypto
  Cc: Jason A. Donenfeld, Jean-Philippe Aumasson, Daniel J . Bernstein,
	Linus Torvalds, Eric Biggers, David Laight
In-Reply-To: <20161214035927.30004-1-Jason@zx2c4.com>

SipHash is a 64-bit keyed hash function that is actually a
cryptographically secure PRF, like HMAC. Except SipHash is super fast,
and is meant to be used as a hashtable keyed lookup function.

SipHash isn't just some new trendy hash function. It's been around for a
while, and there really isn't anything that comes remotely close to
being useful in the way SipHash is. With that said, why do we need this?

There are a variety of attacks known as "hashtable poisoning" in which an
attacker forms some data such that the hash of that data will be the
same, and then preceeds to fill up all entries of a hashbucket. This is
a realistic and well-known denial-of-service vector.

Linux developers already seem to be aware that this is an issue, and
various places that use hash tables in, say, a network context, use a
non-cryptographically secure function (usually jhash) and then try to
twiddle with the key on a time basis (or in many cases just do nothing
and hope that nobody notices). While this is an admirable attempt at
solving the problem, it doesn't actually fix it. SipHash fixes it.

(It fixes it in such a sound way that you could even build a stream
cipher out of SipHash that would resist the modern cryptanalysis.)

There are a modicum of places in the kernel that are vulnerable to
hashtable poisoning attacks, either via userspace vectors or network
vectors, and there's not a reliable mechanism inside the kernel at the
moment to fix it. The first step toward fixing these issues is actually
getting a secure primitive into the kernel for developers to use. Then
we can, bit by bit, port things over to it as deemed appropriate.

Secondly, a few places are using MD5 for creating secure sequence
numbers, port numbers, or fast random numbers. Siphash is a faster, more
fittting, and more secure replacement for MD5 in those situations.

Dozens of languages are already using this internally for their hash
tables. Some of the BSDs already use this in their kernels. SipHash is
a widely known high-speed solution to a widely known problem, and it's
time we catch-up.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Daniel J. Bernstein <djb@cr.yp.to>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
---
Changes from v2->v3:

  - There is now a fast aligned version of the function and a not-as-fast
    unaligned version. The requirements for each have been documented in
    a docbook-style comment. As well, the header now contains a constant
    for the expected alignment.

  - The test suite has been updated to check both the unaligned and aligned
    version of the function.

 include/linux/siphash.h |  30 ++++++++++
 lib/Kconfig.debug       |   6 +-
 lib/Makefile            |   5 +-
 lib/siphash.c           | 153 ++++++++++++++++++++++++++++++++++++++++++++++++
 lib/test_siphash.c      |  85 +++++++++++++++++++++++++++
 5 files changed, 274 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/siphash.h
 create mode 100644 lib/siphash.c
 create mode 100644 lib/test_siphash.c

diff --git a/include/linux/siphash.h b/include/linux/siphash.h
new file mode 100644
index 000000000000..82dc1a911a2e
--- /dev/null
+++ b/include/linux/siphash.h
@@ -0,0 +1,30 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#ifndef _LINUX_SIPHASH_H
+#define _LINUX_SIPHASH_H
+
+#include <linux/types.h>
+
+enum siphash_lengths {
+	SIPHASH24_KEY_LEN = 16,
+	SIPHASH24_ALIGNMENT = 8
+};
+
+u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN]);
+
+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+static inline u64 siphash24_unaligned(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
+{
+	return siphash24(data, len, key);
+}
+#else
+u64 siphash24_unaligned(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN]);
+#endif
+
+#endif /* _LINUX_SIPHASH_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e6327d102184..32bbf689fc46 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1843,9 +1843,9 @@ config TEST_HASH
 	tristate "Perform selftest on hash functions"
 	default n
 	help
-	  Enable this option to test the kernel's integer (<linux/hash,h>)
-	  and string (<linux/stringhash.h>) hash functions on boot
-	  (or module load).
+	  Enable this option to test the kernel's integer (<linux/hash.h>),
+	  string (<linux/stringhash.h>), and siphash (<linux/siphash.h>)
+	  hash functions on boot (or module load).
 
 	  This is intended to help people writing architecture-specific
 	  optimized versions.  If unsure, say N.
diff --git a/lib/Makefile b/lib/Makefile
index 50144a3aeebd..71d398b04a74 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -22,7 +22,8 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
 	 flex_proportions.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-	 earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o win_minmax.o
+	 earlycpio.o seq_buf.o siphash.o \
+	 nmi_backtrace.o nodemask.o win_minmax.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
@@ -44,7 +45,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
 obj-y += kstrtox.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
 obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
-obj-$(CONFIG_TEST_HASH) += test_hash.o
+obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o
 obj-$(CONFIG_TEST_KASAN) += test_kasan.o
 obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
 obj-$(CONFIG_TEST_LKM) += test_module.o
diff --git a/lib/siphash.c b/lib/siphash.c
new file mode 100644
index 000000000000..32acdc26234f
--- /dev/null
+++ b/lib/siphash.c
@@ -0,0 +1,153 @@
+/* Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ * Copyright (C) 2012-2014 Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
+ * Copyright (C) 2012-2014 Daniel J. Bernstein <djb@cr.yp.to>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+#include <asm/unaligned.h>
+
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+#include <linux/dcache.h>
+#include <asm/word-at-a-time.h>
+#endif
+
+#define SIPROUND \
+	do { \
+	v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32); \
+	v2 += v3; v3 = rol64(v3, 16); v3 ^= v2; \
+	v0 += v3; v3 = rol64(v3, 21); v3 ^= v0; \
+	v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32); \
+	} while(0)
+
+static inline u16 le16_to_cpuvp(const void *p)
+{
+	return le16_to_cpup(p);
+}
+static inline u32 le32_to_cpuvp(const void *p)
+{
+	return le32_to_cpup(p);
+}
+static inline u64 le64_to_cpuvp(const void *p)
+{
+	return le64_to_cpup(p);
+}
+
+/**
+ * siphash24 - compute 64-bit siphash24 PRF value
+ * @data: buffer to hash, must be aligned to SIPHASH24_ALIGNMENT
+ * @size: size of @data
+ * @key: key buffer of size SIPHASH24_KEY_LEN, must be aligned to SIPHASH24_ALIGNMENT
+ */
+u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
+{
+	u64 v0 = 0x736f6d6570736575ULL;
+	u64 v1 = 0x646f72616e646f6dULL;
+	u64 v2 = 0x6c7967656e657261ULL;
+	u64 v3 = 0x7465646279746573ULL;
+	u64 b = ((u64)len) << 56;
+	u64 k0 = le64_to_cpuvp(key);
+	u64 k1 = le64_to_cpuvp(key + sizeof(u64));
+	u64 m;
+	const u8 *end = data + len - (len % sizeof(u64));
+	const u8 left = len & (sizeof(u64) - 1);
+	v3 ^= k1;
+	v2 ^= k0;
+	v1 ^= k1;
+	v0 ^= k0;
+	for (; data != end; data += sizeof(u64)) {
+		m = le64_to_cpuvp(data);
+		v3 ^= m;
+		SIPROUND;
+		SIPROUND;
+		v0 ^= m;
+	}
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+	if (left)
+		b |= le64_to_cpu((__force __le64)(load_unaligned_zeropad(data) & bytemask_from_count(left)));
+#else
+	switch (left) {
+	case 7: b |= ((u64)data[6]) << 48;
+	case 6: b |= ((u64)data[5]) << 40;
+	case 5: b |= ((u64)data[4]) << 32;
+	case 4: b |= le32_to_cpuvp(data); break;
+	case 3: b |= ((u64)data[2]) << 16;
+	case 2: b |= le16_to_cpuvp(data); break;
+	case 1: b |= data[0];
+	}
+#endif
+	v3 ^= b;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= b;
+	v2 ^= 0xff;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	return (v0 ^ v1) ^ (v2 ^ v3);
+}
+EXPORT_SYMBOL(siphash24);
+
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+/**
+ * siphash24 - compute 64-bit siphash24 PRF value, without alignment requirements
+ * @data: buffer to hash
+ * @size: size of @data
+ * @key: key buffer of size SIPHASH24_KEY_LEN
+ */
+u64 siphash24_unaligned(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
+{
+	u64 v0 = 0x736f6d6570736575ULL;
+	u64 v1 = 0x646f72616e646f6dULL;
+	u64 v2 = 0x6c7967656e657261ULL;
+	u64 v3 = 0x7465646279746573ULL;
+	u64 b = ((u64)len) << 56;
+	u64 k0 = get_unaligned_le64(key);
+	u64 k1 = get_unaligned_le64(key + sizeof(u64));
+	u64 m;
+	const u8 *end = data + len - (len % sizeof(u64));
+	const u8 left = len & (sizeof(u64) - 1);
+	v3 ^= k1;
+	v2 ^= k0;
+	v1 ^= k1;
+	v0 ^= k0;
+	for (; data != end; data += sizeof(u64)) {
+		m = get_unaligned_le64(data);
+		v3 ^= m;
+		SIPROUND;
+		SIPROUND;
+		v0 ^= m;
+	}
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+	if (left)
+		b |= le64_to_cpu((__force __le64)(load_unaligned_zeropad(data) & bytemask_from_count(left)));
+#else
+	switch (left) {
+	case 7: b |= ((u64)data[6]) << 48;
+	case 6: b |= ((u64)data[5]) << 40;
+	case 5: b |= ((u64)data[4]) << 32;
+	case 4: b |= get_unaligned_le32(data); break;
+	case 3: b |= ((u64)data[2]) << 16;
+	case 2: b |= get_unaligned_le16(data); break;
+	case 1: b |= data[0];
+	}
+#endif
+	v3 ^= b;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= b;
+	v2 ^= 0xff;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	return (v0 ^ v1) ^ (v2 ^ v3);
+}
+EXPORT_SYMBOL(siphash24_unaligned);
+#endif
diff --git a/lib/test_siphash.c b/lib/test_siphash.c
new file mode 100644
index 000000000000..69ac94dec366
--- /dev/null
+++ b/lib/test_siphash.c
@@ -0,0 +1,85 @@
+/* Test cases for siphash.c
+ *
+ * Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+
+/* Test vectors taken from official reference source available at:
+ *     https://131002.net/siphash/siphash24.c
+ */
+static const u64 test_vectors[64] = {
+	0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL, 0x0d6c8009d9a94f5aULL,
+	0x85676696d7fb7e2dULL, 0xcf2794e0277187b7ULL, 0x18765564cd99a68dULL,
+	0xcbc9466e58fee3ceULL, 0xab0200f58b01d137ULL, 0x93f5f5799a932462ULL,
+	0x9e0082df0ba9e4b0ULL, 0x7a5dbbc594ddb9f3ULL, 0xf4b32f46226bada7ULL,
+	0x751e8fbc860ee5fbULL, 0x14ea5627c0843d90ULL, 0xf723ca908e7af2eeULL,
+	0xa129ca6149be45e5ULL, 0x3f2acc7f57c29bdbULL, 0x699ae9f52cbe4794ULL,
+	0x4bc1b3f0968dd39cULL, 0xbb6dc91da77961bdULL, 0xbed65cf21aa2ee98ULL,
+	0xd0f2cbb02e3b67c7ULL, 0x93536795e3a33e88ULL, 0xa80c038ccd5ccec8ULL,
+	0xb8ad50c6f649af94ULL, 0xbce192de8a85b8eaULL, 0x17d835b85bbb15f3ULL,
+	0x2f2e6163076bcfadULL, 0xde4daaaca71dc9a5ULL, 0xa6a2506687956571ULL,
+	0xad87a3535c49ef28ULL, 0x32d892fad841c342ULL, 0x7127512f72f27cceULL,
+	0xa7f32346f95978e3ULL, 0x12e0b01abb051238ULL, 0x15e034d40fa197aeULL,
+	0x314dffbe0815a3b4ULL, 0x027990f029623981ULL, 0xcadcd4e59ef40c4dULL,
+	0x9abfd8766a33735cULL, 0x0e3ea96b5304a7d0ULL, 0xad0c42d6fc585992ULL,
+	0x187306c89bc215a9ULL, 0xd4a60abcf3792b95ULL, 0xf935451de4f21df2ULL,
+	0xa9538f0419755787ULL, 0xdb9acddff56ca510ULL, 0xd06c98cd5c0975ebULL,
+	0xe612a3cb9ecba951ULL, 0xc766e62cfcadaf96ULL, 0xee64435a9752fe72ULL,
+	0xa192d576b245165aULL, 0x0a8787bf8ecb74b2ULL, 0x81b3e73d20b49b6fULL,
+	0x7fa8220ba3b2eceaULL, 0x245731c13ca42499ULL, 0xb78dbfaf3a8d83bdULL,
+	0xea1ad565322a1a0bULL, 0x60e61c23a3795013ULL, 0x6606d7e446282b93ULL,
+	0x6ca4ecb15c5f91e1ULL, 0x9f626da15c9625f3ULL, 0xe51b38608ef25f57ULL,
+	0x958a324ceb064572ULL
+};
+
+static int __init siphash_test_init(void)
+{
+	u8 in[64] __aligned(SIPHASH24_ALIGNMENT);
+	u8 k[16] __aligned(SIPHASH24_ALIGNMENT);
+	u8 in_unaligned[65];
+	u8 k_unaligned[65];
+	u8 i;
+	int ret = 0;
+
+	for (i = 0; i < 16; ++i) {
+		k[i] = i;
+		k_unaligned[i + 1] = i;
+	}
+	for (i = 0; i < 64; ++i) {
+		in[i] = i;
+		in_unaligned[i + 1] = i;
+		if (siphash24(in, i, k) != test_vectors[i]) {
+			pr_info("self-test aligned %u: FAIL\n", i + 1);
+			ret = -EINVAL;
+		}
+		if (siphash24_unaligned(in_unaligned + 1, i, k_unaligned + 1) != test_vectors[i]) {
+			pr_info("self-test unaligned %u: FAIL\n", i + 1);
+			ret = -EINVAL;
+		}
+	}
+	if (!ret)
+		pr_info("self-tests: pass\n");
+	return ret;
+}
+
+static void __exit siphash_test_exit(void)
+{
+}
+
+module_init(siphash_test_init);
+module_exit(siphash_test_exit);
+
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
+MODULE_LICENSE("Dual BSD/GPL");
-- 
2.11.0

^ permalink raw reply related

* (unknown), 
From: witt.kohl @ 2016-12-14 18:22 UTC (permalink / raw)
  To: linux-crypto

[-- Attachment #1: MICROSOFT_4388381_linux-crypto.zip --]
[-- Type: application/zip, Size: 3096 bytes --]

^ permalink raw reply

* Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform
From: Jason A. Donenfeld @ 2016-12-14 18:06 UTC (permalink / raw)
  To: David Miller
  Cc: David Laight, Netdev, kernel-hardening, Andi Kleen, LKML,
	Linux Crypto Mailing List
In-Reply-To: <20161214.125612.1361254098267633173.davem@davemloft.net>

Hi David,

On Wed, Dec 14, 2016 at 6:56 PM, David Miller <davem@davemloft.net> wrote:
> Just marking the structure __packed, whether necessary or not, makes
> the compiler assume that the members are not aligned and causes
> byte-by-byte accesses to be performed for words.
> Never, _ever_, use __packed unless absolutely necessary, it pessimizes
> the code on cpus that require proper alignment of types.

Oh, jimminy cricket, I did not realize that it made assignments
byte-by-byte *always*. So what options am I left with? What
immediately comes to mind are:

1)

struct {
    u64 a;
    u32 b;
    u32 c;
    u16 d;
    u8 end[];
} a = {
    .a = a,
    .b = b,
    .c = c,
    .d = d
};
siphash24(&a, offsetof(typeof(a), end), key);

2)

u8 bytes[sizeof(u64) + sizeof(u32) * 2 + sizeof(u16)];
*(u64 *)&bytes[0] = a;
*(u32 *)&bytes[sizeof(u64)] = b;
*(u32 *)&bytes[sizeof(u64) + sizeof(u32)] = c;
*(u16 *)&bytes[sizeof(u64) + sizeof(u32) * 2] = d;
siphash24(bytes, sizeof(bytes), key);


Personally I find (1) a bit neater than (2). What's your opinion?

Jason

^ permalink raw reply

* Re: [kernel-hardening] Re: [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long
From: Jason A. Donenfeld @ 2016-12-14 17:58 UTC (permalink / raw)
  To: kernel-hardening, Theodore Ts'o, Jason A. Donenfeld, Netdev,
	David Miller, Linus Torvalds, LKML, George Spelvin, Scott Bauer,
	Andi Kleen, Andy Lutomirski, Greg KH, Eric Biggers,
	Linux Crypto Mailing List, Jean-Philippe Aumasson
In-Reply-To: <20161214163731.luj2dzmnihcuhn5p@thunk.org>

Hey Ted,

On Wed, Dec 14, 2016 at 5:37 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> One somewhat undesirable aspect of the current algorithm is that we
> never change random_int_secret.

Why exactly would this be a problem? So long as the secret is kept
secret, the PRF is secure. If an attacker can read arbitrary kernel
memory, there are much much bigger issues to be concerned about. As
well, the "chaining" variable I introduce ensures that the random
numbers are, per-cpu, related to the uniqueness of timing of
subsequent calls.

> So I've been toying with the
> following, which is 4 times faster than md5.  (I haven't tried
> benchmarking against siphash yet.)
>
> [    3.606139] random benchmark!!
> [    3.606276] get_random_int # cycles: 326578
> [    3.606317] get_random_int_new # cycles: 95438
> [    3.607423] get_random_bytes # cycles: 2653388

Cool, I'll benchmark it against the siphash implementation. I like
what you did with batching up lots of chacha output, and doling it out
bit by bit. I suspect this will be quite fast, because with chacha20
you get an entire block.

> P.S.  It's interesting to note that siphash24 and chacha20 are both
> add-rotate-xor based algorithms.

Quite! Lots of nice shiny things are turning out be be ARX -- ChaCha,
BLAKE2, Siphash, NORX. The simplicity is really appealing.

Jason

^ permalink raw reply

* Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform
From: David Miller @ 2016-12-14 17:56 UTC (permalink / raw)
  To: Jason
  Cc: David.Laight, netdev, kernel-hardening, ak, linux-kernel,
	linux-crypto
In-Reply-To: <CAHmME9pEM=cDC5S=j1BU2oCF8-WdnbRfiVojcet4rXcRLcpJRw@mail.gmail.com>

From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Wed, 14 Dec 2016 13:53:10 +0100

> In all current uses of __packed in the code, I think the impact is
> precisely zero, because all structures have members in descending
> order of size, with each member being a perfect multiple of the one
> below it. The __packed is therefore just there for safety, in case
> somebody comes in and screws everything up by sticking a u8 in
> between.

Just marking the structure __packed, whether necessary or not, makes
the compiler assume that the members are not aligned and causes
byte-by-byte accesses to be performed for words.

Never, _ever_, use __packed unless absolutely necessary, it pessimizes
the code on cpus that require proper alignment of types.

^ permalink raw reply

* Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform
From: Jason A. Donenfeld @ 2016-12-14 17:49 UTC (permalink / raw)
  To: David Laight
  Cc: Hannes Frederic Sowa, Netdev, kernel-hardening@lists.openwall.com,
	Andi Kleen, LKML, Linux Crypto Mailing List
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB023F7BD@AcuExch.aculab.com>

On Wed, Dec 14, 2016 at 3:47 PM, David Laight <David.Laight@aculab.com> wrote:
> Just remove the __packed and ensure that the structure is 'nice'.
> This includes ensuring there is no 'tail padding'.
> In some cases you'll need to put the port number into a 32bit field.

I'd rather not. There's no point in wasting extra cycles on hashing
useless data, just for some particular syntactic improvement. __packed
__aligned(8) will do what we want perfectly, I think.

> I'd also require that the key be aligned.

Yep, I'll certainly do this for the siphash24_aligned version in the v3.

^ permalink raw reply

* Re: [PATCH v2] keys/encrypted: Fix two crypto-on-the-stack bugs
From: David Howells @ 2016-12-14 17:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: dhowells, Andy Lutomirski, linux-kernel@vger.kernel.org, USB list,
	keyrings, Eric Biggers, linux-crypto, Herbert Xu, Stephan Mueller
In-Reply-To: <CALCETrWWt33=03gbBs73NkBOunZU+req1BLgUNcEMfh3T1AXUA@mail.gmail.com>

Andy Lutomirski <luto@amacapital.net> wrote:

> David, are these encrypted keys ever exported anywhere?  If not, could
> the code use a mode that doesn't need padding?

ecryptfs uses them, I think.

David

^ permalink raw reply

* Re: [PATCH] keys/encrypted: Fix two crypto-on-the-stack bugs
From: Joerg Roedel @ 2016-12-14 16:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Laight, David Woodhouse, Linus Torvalds, Ingo Molnar,
	Andy Lutomirski, linux-kernel@vger.kernel.org,
	linux-usb@vger.kernel.org, dhowells@redhat.com,
	keyrings@vger.kernel.org, Eric Biggers,
	linux-crypto@vger.kernel.org, Herbert Xu, Stephan Mueller
In-Reply-To: <CALCETrWsTKq0NOpwiJtB50OU7w99-m82NhPG_Uxs2Fqbpz0LLA@mail.gmail.com>

On Tue, Dec 13, 2016 at 08:40:00AM -0800, Andy Lutomirski wrote:
> But I think this is rather silly.  Joerg, Linus, etc: would it be okay
> to change lib/dma-debug.c to allow DMA *from* rodata?

Yeah, this would be fine for DMA_TO_DEVICE mappings. At least I can't
think of a reason right now to not allow it, in the end its also
read-only memory for the CPU, so it can be readable by devices too.
There is no danger of race conditions like on stacks or data leaks, as
there is only compile-time data in rodata.

	Joerg

^ permalink raw reply

* Re: [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long
From: Theodore Ts'o @ 2016-12-14 16:37 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Netdev, David Miller, Linus Torvalds,
	kernel-hardening@lists.openwall.com, LKML, George Spelvin,
	Scott Bauer, Andi Kleen, Andy Lutomirski, Greg KH, Eric Biggers,
	linux-crypto, Jean-Philippe Aumasson
In-Reply-To: <20161214031037.25498-1-Jason@zx2c4.com>

On Wed, Dec 14, 2016 at 04:10:37AM +0100, Jason A. Donenfeld wrote:
> This duplicates the current algorithm for get_random_int/long, but uses
> siphash24 instead. This comes with several benefits. It's certainly
> faster and more cryptographically secure than MD5. This patch also
> hashes the pid, entropy, and timestamp as fixed width fields, in order
> to increase diffusion.
> 
> The previous md5 algorithm used a per-cpu md5 state, which caused
> successive calls to the function to chain upon each other. While it's
> not entirely clear that this kind of chaining is absolutely necessary
> when using a secure PRF like siphash24, it can't hurt, and the timing of
> the call chain does add a degree of natural entropy. So, in keeping with
> this design, instead of the massive per-cpu 64-byte md5 state, there is
> instead a per-cpu previously returned value for chaining.
> 
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>

The original reason for get_random_int was because the original
urandom algorithms were too slow.  When we moved to chacha20, which is
must faster, I didn't think to revisit get_random_int() and
get_random_long().

One somewhat undesirable aspect of the current algorithm is that we
never change random_int_secret.  So I've been toying with the
following, which is 4 times faster than md5.  (I haven't tried
benchmarking against siphash yet.)

[    3.606139] random benchmark!!
[    3.606276] get_random_int # cycles: 326578
[    3.606317] get_random_int_new # cycles: 95438
[    3.607423] get_random_bytes # cycles: 2653388

     	       			  	  - Ted

P.S.  It's interesting to note that siphash24 and chacha20 are both
add-rotate-xor based algorithms.

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..be172ea75799 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1681,6 +1681,38 @@ static int rand_initialize(void)
 }
 early_initcall(rand_initialize);
 
+static unsigned int get_random_int_new(void);
+
+static int rand_benchmark(void)
+{
+	cycles_t start,finish;
+	int i, out;
+
+	pr_crit("random benchmark!!\n");
+	start = get_cycles();
+	for (i = 0; i < 1000; i++) {
+		get_random_int();
+	}
+	finish = get_cycles();
+	pr_err("get_random_int # cycles: %llu\n", finish - start);
+
+	start = get_cycles();
+	for (i = 0; i < 1000; i++) {
+		get_random_int_new();
+	}
+	finish = get_cycles();
+	pr_err("get_random_int_new # cycles: %llu\n", finish - start);
+
+	start = get_cycles();
+	for (i = 0; i < 1000; i++) {
+		get_random_bytes(&out, sizeof(out));
+	}
+	finish = get_cycles();
+	pr_err("get_random_bytes # cycles: %llu\n", finish - start);
+	return 0;
+}
+device_initcall(rand_benchmark);
+
 #ifdef CONFIG_BLOCK
 void rand_initialize_disk(struct gendisk *disk)
 {
@@ -2064,8 +2096,10 @@ unsigned int get_random_int(void)
 	__u32 *hash;
 	unsigned int ret;
 
+#if 0	// force slow path
 	if (arch_get_random_int(&ret))
 		return ret;
+#endif
 
 	hash = get_cpu_var(get_random_int_hash);
 
@@ -2100,6 +2134,38 @@ unsigned long get_random_long(void)
 }
 EXPORT_SYMBOL(get_random_long);
 
+struct random_buf {
+	__u8 buf[CHACHA20_BLOCK_SIZE];
+	int ptr;
+};
+
+static DEFINE_PER_CPU(struct random_buf, batched_entropy);
+
+static void get_batched_entropy(void *buf, int n)
+{
+	struct random_buf *p;
+
+	p = &get_cpu_var(batched_entropy);
+
+	if ((p->ptr == 0) ||
+	    (p->ptr + n >= CHACHA20_BLOCK_SIZE)) {
+		extract_crng(p->buf);
+		p->ptr = 0;
+	}
+	BUG_ON(n > CHACHA20_BLOCK_SIZE);
+	memcpy(buf, p->buf, n);
+	p->ptr += n;
+	put_cpu_var(batched_entropy);
+}
+
+static unsigned int get_random_int_new(void)
+{
+	int	ret;
+
+	get_batched_entropy(&ret, sizeof(ret));
+	return ret;
+}
+
 /**
  * randomize_page - Generate a random, page aligned address
  * @start:	The smallest acceptable address the caller will take.

^ permalink raw reply related

* Re: [PATCH v2] keys/encrypted: Fix two crypto-on-the-stack bugs
From: Andy Lutomirski @ 2016-12-14 16:22 UTC (permalink / raw)
  To: David Howells
  Cc: Andy Lutomirski, linux-kernel@vger.kernel.org, USB list, keyrings,
	Eric Biggers, linux-crypto, Herbert Xu, Stephan Mueller
In-Reply-To: <18039.1481704679@warthog.procyon.org.uk>

On Wed, Dec 14, 2016 at 12:37 AM, David Howells <dhowells@redhat.com> wrote:
> Andy Lutomirski <luto@amacapital.net> wrote:
>
>> > -       sg_set_buf(&sg_out[1], pad, sizeof pad);
>> > +       sg_set_buf(&sg_out[1], empty_zero_page, 16);
>>
>> My fix here is obviously bogus (I meant to use ZERO_PAGE(0)), but what
>> exactly is the code trying to do?  The old code makes no sense.  It's
>> setting the *output* buffer to zeroed padding.
>
> Padding goes into the encrypt function and is going to come out of the decrypt
> function.  Possibly derived_key_decrypt() should be checking that the padding
> that comes out is actually a bunch of zeros.  Maybe we don't actually need to
> get the padding out, but I'm not sure whether the crypto layer will
> malfunction if we don't give it a buffer for the padding.

It was the memset that threw me for a loop.

David, are these encrypted keys ever exported anywhere?  If not, could
the code use a mode that doesn't need padding?

--Andy

>
> David



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply

* Re: [Qemu-devel] [PATCH v7 1/1] crypto: add virtio-crypto driver
From: Halil Pasic @ 2016-12-14 15:38 UTC (permalink / raw)
  To: Gonglei, linux-kernel, qemu-devel, virtio-dev, virtualization,
	linux-crypto
  Cc: weidong.huang, claudio.fontana, mst, luonengjun, hanweidong,
	xin.zeng, xuquan8, wanzongshun, stefanha, jianjay.zhou,
	cornelia.huck, longpeng2, arei.gonglei, davem, wu.wubin, herbert
In-Reply-To: <1481716249-185392-2-git-send-email-arei.gonglei@huawei.com>



On 12/14/2016 12:50 PM, Gonglei wrote:
> diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c
> new file mode 100644
> index 0000000..c0854a1
> --- /dev/null
> +++ b/drivers/crypto/virtio/virtio_crypto_core.c
> @@ -0,0 +1,474 @@
[..]
> +
> +static void virtcrypto_dataq_callback(struct virtqueue *vq)
> +{
> +	struct virtio_crypto *vcrypto = vq->vdev->priv;
> +	struct virtio_crypto_request *vc_req;
> +	unsigned long flags;
> +	unsigned int len;
> +	struct ablkcipher_request *ablk_req;
> +	int error;
> +
> +	spin_lock_irqsave(&vcrypto->lock, flags);

Would it make sense to use a per virtqueue lock
like in virtio_blk for example instead of locking on the whole
device? OK, it seems you use only one dataqueue, so it
may not be that relevant.

> +	do {
> +		virtqueue_disable_cb(vq);
> +		while ((vc_req = virtqueue_get_buf(vq, &len)) != NULL) {
> +			if (vc_req->type == VIRTIO_CRYPTO_SYM_OP_CIPHER) {
> +				switch (vc_req->status) {
> +				case VIRTIO_CRYPTO_OK:
> +					error = 0;
> +					break;
> +				case VIRTIO_CRYPTO_INVSESS:
> +				case VIRTIO_CRYPTO_ERR:
> +					error = -EINVAL;
> +					break;
> +				case VIRTIO_CRYPTO_BADMSG:
> +					error = -EBADMSG;
> +					break;
> +				default:
> +					error = -EIO;
> +					break;
> +				}
> +				ablk_req = vc_req->ablkcipher_req;
> +				virtcrypto_clear_request(vc_req);
> +
> +				spin_unlock_irqrestore(&vcrypto->lock, flags);
> +				/* Finish the encrypt or decrypt process */
> +				ablk_req->base.complete(&ablk_req->base, error);
> +				spin_lock_irqsave(&vcrypto->lock, flags);
> +			}
> +		}
> +	} while (!virtqueue_enable_cb(vq));
> +	spin_unlock_irqrestore(&vcrypto->lock, flags);
> +}

^ permalink raw reply

* Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Hannes Frederic Sowa @ 2016-12-14 15:09 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Netdev, kernel-hardening, LKML, Linux Crypto Mailing List,
	Jean-Philippe Aumasson, Daniel J . Bernstein, Linus Torvalds,
	Eric Biggers
In-Reply-To: <CAHmME9qA6qKdp+qoih2Je4fxU+4E6=Gp7CVfhYU7VbOr6HJ=0Q@mail.gmail.com>

Hello,

On 14.12.2016 14:10, Jason A. Donenfeld wrote:
> On Wed, Dec 14, 2016 at 12:21 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> Can you show or cite benchmarks in comparison with jhash? Last time I
>> looked, especially for short inputs, siphash didn't beat jhash (also on
>> all the 32 bit devices etc.).
> 
> I assume that jhash is likely faster than siphash, but I wouldn't be
> surprised if with optimization we can make siphash at least pretty
> close on 64-bit platforms. (I'll do some tests though; maybe I'm wrong
> and jhash is already slower.)

Yes, numbers would be very usable here. I am mostly concerned about
small plastic router cases. E.g. assume you double packet processing
time with a change of the hashing function at what point is the actual
packet processing more of an attack vector than the hashtable?

> With that said, siphash is here to replace uses of jhash where
> hashtable poisoning vulnerabilities make it necessary. Where there's
> no significant security improvement, if there's no speed improvement
> either, then of course nothing's going to change.

It still changes currently well working source. ;-)

> I should have mentioned md5_transform in this first message too, as
> two other patches in this series actually replace md5_transform usage
> with siphash. I think in this case, siphash is a clear performance
> winner (and security winner) over md5_transform. So if the push back
> against replacing jhash usages is just too high, at the very least it
> remains useful already for the md5_transform usage.

MD5 is considered broken because its collision resistance is broken?
SipHash doesn't even claim to have collision resistance (which we don't
need here)?

But I agree, certainly it could be a nice speed-up!

>> This pretty much depends on the linearity of the hash function? I don't
>> think a crypto secure hash function is needed for a hash table. Albeit I
>> agree that siphash certainly looks good to be used here.
> 
> In order to prevent the aforementioned poisoning attacks, a PRF with
> perfect linearity is required, which is what's achieved when it's a
> cryptographically secure one. Check out section 7 of
> https://131002.net/siphash/siphash.pdf .

I think you mean non-linearity. Otherwise I agree that siphash is
certainly a better suited hashing algorithm as far as I know. But it
would be really interesting to compare some performance numbers. Hard to
say anything without them.

>> I am pretty sure that SipHash still needs a random key per hash table
>> also. So far it was only the choice of hash function you are questioning.
> 
> Siphash needs a random secret key, yes. The point is that the hash
> function remains secure so long as the secret key is kept secret.
> Other functions can't make the same guarantee, and so nervous periodic
> key rotation is necessary, but in most cases nothing is done, and so
> things just leak over time.
> 
> 
>> Hmm, I tried to follow up with all the HashDoS work and so far didn't
>> see any HashDoS attacks against the Jenkins/SpookyHash family.
>>
>> If this is an issue we might need to also put those changes into stable.
> 
> jhash just isn't secure; it's not a cryptographically secure PRF. If
> there hasn't already been an academic paper put out there about it
> this year, let's make this thread 1000 messages long to garner
> attention, and next year perhaps we'll see one. No doubt that
> motivated government organizations, defense contractors, criminals,
> and other netizens have already done research in private. Replacing
> insecure functions with secure functions is usually a good thing.

I think this is a weak argument.

In general I am in favor to switch to siphash, but it would be nice to
see some benchmarks with the specific kernel implementation also on some
smaller 32 bit CPUs and especially without using any SIMD instructions
(which might have been used in paper comparison).

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH 3/3] crypto: brcm: Add Broadcom SPU driver DT entry.
From: Rob (William) Rice @ 2016-12-14 15:00 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Herbert Xu, David S. Miller, Rob Herring,
	Mark Rutland, linux-crypto, devicetree, linux-kernel, Ray Jui,
	Scott Branden, Jon Mason, bcm-kernel-feedback-list,
	Catalin Marinas, Will Deacon, linux-arm-kernel, Steve Lin
In-Reply-To: <201612110810.BEVbRp0B%fengguang.wu@intel.com>

I will rebase to Herbert's latest when I submit v3 to address Mark 
Rutland's DT comments (and any others that come in).

Rob


On 12/10/2016 7:14 PM, kbuild test robot wrote:
> Hi Rob,
>
> [auto build test ERROR on cryptodev/master]
> [also build test ERROR on v4.9-rc8]
> [cannot apply to next-20161209]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url:    https://github.com/0day-ci/linux/commits/Rob-Rice/crypto-brcm-DT-documentation-for-Broadcom-SPU-driver/20161202-010038
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
> config: arm64-defconfig (attached as .config)
> compiler: aarch64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>          wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # save the attached .config to linux build tree
>          make.cross ARCH=arm64
>
> All errors (new ones prefixed by >>):
>
>>> ERROR: Input tree has errors, aborting (use -f to force output)
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* RE: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform
From: David Laight @ 2016-12-14 14:47 UTC (permalink / raw)
  To: 'Jason A. Donenfeld', Hannes Frederic Sowa
  Cc: Netdev, kernel-hardening@lists.openwall.com, Andi Kleen, LKML,
	Linux Crypto Mailing List
In-Reply-To: <CAHmME9o4NVi-MPeURio1Ga58rnW6JAGQdTg6scd+K3EZEf3RNA@mail.gmail.com>

From: Jason A. Donenfeld
> Sent: 14 December 2016 13:44
> To: Hannes Frederic Sowa
> > __packed not only removes all padding of the struct but also changes the
> > alignment assumptions for the whole struct itself. The rule, the struct
> > is aligned by its maximum alignment of a member is no longer true. That
> > said, the code accessing this struct will change (not on archs that can
> > deal efficiently with unaligned access, but on others).
> 
> That's interesting. There currently aren't any alignment requirements
> in siphash because we use the unaligned helper functions, but as David
> pointed out in another thread, maybe that too should change. In that
> case, we'd have an aligned-only version of the function that requires
> 8-byte aligned input. Perhaps the best way to go about that would be
> to just mark the struct as __packed __aligned(8). Or, I guess, since
> 64-bit accesses gets split into two on 32-bit, that'd be best descried
> as __packed __aligned(sizeof(long)). Would that be an acceptable
> solution?

Just remove the __packed and ensure that the structure is 'nice'.
This includes ensuring there is no 'tail padding'.
In some cases you'll need to put the port number into a 32bit field.

I'd also require that the key be aligned.
It probably ought to be a named structure type with two 64bit members
(or with an array member that has two elements).

	David


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox