Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

All of lore.kernel.org
 help / color / mirror / Atom feed

From: tytso@mit.edu
To: Stephan Mueller <smueller@chronox.de>
Cc: linux-kernel@vger.kernel.org, herbert@gondor.apana.org.au,
	andi@firstfloor.org, sandyinchina@gmail.com,
	cryptography@lakedaemon.net, jsd@av8n.com, hpa@zytor.com,
	linux-crypto@vger.kernel.org
Subject: Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG
Date: Wed, 4 May 2016 17:30:41 +0000	[thread overview]
Message-ID: <20160504173041.GB3901@thunk.org> (raw)
In-Reply-To: <6266306.8u6rsYMSBi@positron.chronox.de>

On Tue, May 03, 2016 at 10:50:25AM +0200, Stephan Mueller wrote:
> > +/*
> > + * crng_init =  0 --> Uninitialized
> > + *		2 --> Initialized
> > + *		3 --> Initialized from input_pool
> > + */
> > +static int crng_init = 0;
> 
> shouldn't that be an atomic_t ?

The crng_init variable only gets incremented (it progresses from
0->1->2->3) and it's protected by the primary_crng->lock spinlock.
There are a few places where we read it without first taking the lock,
but that's where we depend on it getting incremented and where if we
race with another CPU which has just bumped the crng_init status, it's
not critical.  (Or we can take the lock and then recheck the crng_init
if we really need to be sure.)

> > +static void _initialize_crng(struct crng_state *crng)
> > +{
> > +	int		i;
> > +	unsigned long	rv;
> 
> Why do you use unsigned long here? I thought the state[i] is unsigned int.

Because it gets filled in via arch_get_random_seed_long(&rv) and
arch_get_random_log(&rv).  If that means we get 64 bits and we only
use 32 bits, that's OK....

> Would it make sense to add the ChaCha20 self test vectors from RFC7539 here to 
> test that the ChaCha20 works?

We're not doing that for any of the other ciphers and hashes.  We
didn't do that for SHA1, and the chacha20 code where I took this from
didn't check for this as well.  What threat are you most worried
about.  We don't care about chacha20 being exactly chacha20, so long
as it's cryptographically strong.  In fact I think I removed a
potential host order byteswap in the set key operation specifically
because we don't care and interop.

If this is required for FIPS mode, we can add that later.  I was
primarily trying to keep the patch small to be easier for people to
look at it, so I've deliberately left off bells and whistles that
aren't strictly needed to show that the new approach is sound.

> > +	if (crng_init++ >= 2)
> > +		wake_up_interruptible(&crng_init_wait);
> 
> Don't we have a race here with the crng_init < 3 check in crng_reseed 
> considering multi-core systems?

No, because we are holding the primary_crng->lock spinlock.  I'll add
a comment explaining the locking protections which is intended for
crng_init where we declare it.

> > +	if (num < 16 || num > 32) {
> > +		WARN_ON(1);
> > +		pr_err("crng_reseed: num is %d?!?\n", num);
> > +	}
> > +	num_words = (num + 3) / 4;
> > +	for (i = 0; i < num_words; i++)
> > +		primary_crng.state[i+4] ^= tmp[i];
> > +	primary_crng.init_time = jiffies;
> > +	if (crng_init < 3) {
> 
> Shouldn't that one be if (crng_init < 3 && num >= 16) ?

I'll just change the above WRN_ON test to be:

     BUG_ON(num < 16 || num > 32);

This really can't happen, and I had it as a WARN_ON with a printk for
debugging purpose in case I was wrong about how the code works.

> > +		crng_init = 3;
> > +		process_random_ready_list();
> > +		wake_up_interruptible(&crng_init_wait);
> > +		pr_notice("random: crng_init 3\n");
> 
> Would it make sense to be more descriptive here to allow readers of dmesg to 
> understand the output?

Yes, what we're currently printing during the initialization:

random: crng_init 1
random: crng_init 2
random: crng_init 3

was again mostly for debugging purposes.  What I'm thinking about
doing is changing crng_init 2 and 3 messages to be:

random: crng fast init done
random: crng conservative init done

> > +	}
> > +	ret = 1;
> > +out:
> > +	spin_unlock_irqrestore(&primary_crng.lock, flags);
> 
> memzero_explicit of tmp?

Good point, I've added a call to memzero_explicit().

> > +static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
> > +{
> > +	unsigned long v, flags;
> > +	struct crng_state *crng = &primary_crng;
> > +
> > +	if (crng_init > 2 &&
> > +	    time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL))
> > +		crng_reseed(&input_pool);
> > +	spin_lock_irqsave(&crng->lock, flags);
> > +	if (arch_get_random_long(&v))
> > +		crng->state[14] ^= v;
> > +	chacha20_block(&crng->state[0], out);
> > +	if (crng->state[12] == 0)
> > +		crng->state[13]++;

> What is the purpose to only cover the 2nd 32 bit value of the nonce with 
> arch_get_random?
> 
> state[12]++? Or why do you increment the nonce?

In Chacha20, its output is a funcrtion of the state array, where
state[0..3] is a constant specified by the Chacha20 definition,
state[4..11] is the Key, and state[12..15] is the IV.  The
chacha20_block() function increments state[12] each time it is called,
so state[12] is being used as the block counter.  The increment of
state[13] is used to make state[13] to be the upper 32-bits of the
block counter.  We *should* be reseeding more often than 2**32 calls
to chacha20_block(), and the increment is just to be safe in case
something goes wronng and we're not reseeding.

We're using crng[14] to be contain the output of RDRAND, so this is
where we mix in the contribution from a CPU-level hwrng.  Previously
we called RDRAND multiple times and XOR'ed the results into the
output.  This is a bit faster and is more comforting for paranoiacs
who are concerned that Intel might have down something shifty enough
to be able to read from memory and change the output of RDRAND to
force a particular output from the RNG.

Ware using state[15] to contain the NUMA node id.  This was because
originally the used used the same key bytes (state[4..11]) between
different NUMA state arrays, so state[15] was used to guarantee that
state arrays would be different between different NUMA node state
arrays.  Now that we are deriving the key from a primary_crng, we
could use state[15] for something else, including as a second
destination from RDRAND.

> Now I have to wear my (ugly) FIPS heat: we need that code from the current 
> implementation here:
> 
>                 if (fips_enabled) {
>                         spin_lock_irqsave(&r->lock, flags);
>                         if (!memcmp(tmp, r->last_data, EXTRACT_SIZE))
>                                 panic("Hardware RNG duplicated output!\n");
>                         memcpy(r->last_data, tmp, EXTRACT_SIZE);
>                         spin_unlock_irqrestore(&r->lock, flags);
>                 }
> 

I'll add FIPS support as a separate patch.  I personally consider FIPS
support to be a distraction, and it's only useful for people who need
to sell to governments (mostly US governments).

> > -	if (unlikely(nonblocking_pool.initialized == 0))
> > -		printk_once(KERN_NOTICE "random: %s urandom read "
> > -			    "with %d bits of entropy available\n",
> > -			    current->comm, nonblocking_pool.entropy_total);
> > -
> > +	crng_wait_ready();
> 
> Just for clarification: are you now blocking /dev/urandom until the CRNG is 
> filled? That would be a big win.

Until the the fast init state, yes.  In practice we are blocking until
128 interrupts have occurred, which during boot is hapens quickly
enough that even on a simple KVM system, this happens before userspace
starts up.  There *is* a risk here, though.  Imagine a embedded
platform with very few interrupt-driven devices so device probing
isn't enough to make the CRNG ready when the initial ramdisk starts
up, and the init program tries to read from /dev/urandom --- which
then blocks, potentially indefinitely.

If that happens, then we will have broken userspace, and we may need
to revert this part of the change.  But I'm willing to take the risk,
in hopes that such super-simplisitic devices don't exist in real life,
and if they do, the IOT devices will probably be blithely ignoring
cryptographic concerns so much that they aren't using /dev/urandom
anyway.  :-)

>Would it make sense to add another chacha20_block() call here at the end?
>Note, the one thing about the SP800-90A DRBG I really like is the enhanced
>backward secrecy support which is implemented by "updating" the internal state
>(the key / state) used for one or more random number generation rounds after
>one request for random numbers is satisfied.
>
>This means that even if the state becomes known or the subsequent caller
>manages to deduce the state of the RNG to some degree of confidence, he cannot
>backtrack the already generated random numbers.

That's a good idea; being able to prevent back-tracking attacks is
a good thing.  I'll add this in the next version.

Cheers,

						- Ted

next prev parent reply	other threads:[~2016-05-04 17:30 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-02  6:26 [RFC PATCH 0/3] random: replace urandom pool with a CRNG Theodore Ts'o
2016-05-02  6:26 ` [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG Theodore Ts'o
2016-05-03  8:50   ` Stephan Mueller
2016-05-04 16:54     ` Jeffrey Walton
2016-05-04 17:30     ` tytso [this message]
2016-05-04 17:52       ` H. Peter Anvin
2016-05-03  9:36   ` Stephan Mueller
2016-05-04  6:24     ` Stephan Mueller
2016-05-04 14:40   ` Jeffrey Walton
2016-05-04 17:49     ` tytso
2016-05-04 18:22       ` Jeffrey Walton
2016-05-04 18:29         ` H. Peter Anvin
2016-05-04 19:07           ` tytso
2016-05-04 20:53             ` H. Peter Anvin
2016-05-04 21:42             ` John Denker
2016-05-04 21:52               ` better patch for linux/bitops.h John Denker
2016-05-05  1:35                 ` Jeffrey Walton
2016-05-05  2:41                   ` H. Peter Anvin
2016-05-05  2:54                     ` Jeffrey Walton
2016-05-05  3:08                       ` H. Peter Anvin
2016-05-05  3:30                         ` Jeffrey Walton
2016-05-05  3:50                           ` Theodore Ts'o
2016-05-05  4:03                             ` Jeffrey Walton
2016-05-05  6:35                               ` H. Peter Anvin
2016-05-05 16:15                                 ` UB in general ... and linux/bitops.h in particular John Denker
2016-05-05 17:32                                   ` Andi Kleen
2016-05-06  2:25                                   ` Jeffrey Walton
2016-05-05 21:34                             ` better patch for linux/bitops.h Sandy Harris
2016-05-05 22:18                               ` tytso
2016-05-05 22:22                                 ` H. Peter Anvin
2016-05-05 22:38                                 ` H. Peter Anvin
2016-05-06  0:13                                 ` H. Peter Anvin
2016-05-04 21:56               ` [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG H. Peter Anvin
2016-05-04 22:06                 ` linux/bitops.h John Denker
2016-05-04 23:06                   ` linux/bitops.h Andi Kleen
2016-05-05  0:13                     ` linux/bitops.h John Denker
2016-05-05  1:20                     ` linux/bitops.h Jeffrey Walton
2016-05-05  1:27                       ` linux/bitops.h H. Peter Anvin
2016-05-05  0:30                   ` linux/bitops.h H. Peter Anvin
2016-05-05  0:48                     ` linux/bitops.h Linus Torvalds
2016-05-06 20:08                       ` linux/bitops.h Sasha Levin
2016-05-06 20:07                     ` linux/bitops.h Sasha Levin
2016-05-06 20:25                       ` linux/bitops.h H. Peter Anvin
2016-05-06 20:30                       ` linux/bitops.h H. Peter Anvin
2016-05-02  6:26 ` [PATCH 2/3] random: make /dev/urandom scalable for silly userspace programs Theodore Ts'o
2016-05-02  7:00   ` Stephan Mueller
2016-05-02 12:50     ` Theodore Ts'o
2016-05-02 13:48       ` Theodore Ts'o
2016-05-02 13:53         ` Stephan Mueller
2016-05-02  6:26 ` [PATCH 3/3] random: add interrupt callback to VMBus IRQ handler Theodore Ts'o
2016-05-02  9:00   ` Jeffrey Walton
2016-05-02  9:14     ` Stephan Mueller
2016-05-02 12:56       ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160504173041.GB3901@thunk.org \
    --to=tytso@mit.edu \
    --cc=andi@firstfloor.org \
    --cc=cryptography@lakedaemon.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=hpa@zytor.com \
    --cc=jsd@av8n.com \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sandyinchina@gmail.com \
    --cc=smueller@chronox.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.