From: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org, patches@lists.linux.dev,
	linux-crypto@vger.kernel.org, linux-api@vger.kernel.org,
	x86@kernel.org, Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
	Carlos O'Donell <carlos@redhat.com>,
	Florian Weimer <fweimer@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>, Jann Horn <jannh@google.com>,
	Christian Brauner <brauner@kernel.org>,
	David Hildenbrand <dhildenb@redhat.com>
Subject: Re: [PATCH v16 4/5] random: introduce generic vDSO getrandom() implementation
Date: Fri, 7 Jun 2024 18:32:58 +0200
Message-ID: <ZmM2Olwd4hr0teMT@zx2c4.com>
In-Reply-To: <874ja73xx7.ffs@tglx>

On Wed, Jun 05, 2024 at 11:03:00PM +0200, Thomas Gleixner wrote:
> Jason!
Thomas!

> Can you please split the required defines into a separate header
> preferably in include/vdso/ and include that from crypto/chacha.h

Sure. It only actually uses two straightforward constants from there.

> > +			u32	key[CHACHA_KEY_SIZE / sizeof(u32)];
> 
> CHACHA_STATE_WORDS ?

Nah, that's for CHACHA_BLOCK_SIZE / sizeof(u32), but here it's
CHACHA_KEY_SIZE.

> 
> > +		};
> > +		u8		batch_key[CHACHA_BLOCK_SIZE * 2];
> 
> What does the u8 buy here over a simple unsigned int?
> 
> > +	bool 			in_use;

It means the structure can be more compact, because `pos` and the
`in_use` boolean can be packed closer together.


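For reference, the layout under discussion looks roughly like this (a
reconstruction from the quoted patch hunks, not the verbatim kernel struct;
member order and the sketch's name are assumptions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The two constants pulled in from crypto/chacha.h. */
#define CHACHA_KEY_SIZE		32
#define CHACHA_BLOCK_SIZE	64

/* Sketch of the state struct: batch_key overlays both the leftover batch
 * and the key in one union, and keeping the small trailing members as
 * u8/bool lets them sit in adjacent bytes. */
struct vgetrandom_state_sketch {
	union {
		struct {
			uint8_t		batch[CHACHA_BLOCK_SIZE * 2 - CHACHA_KEY_SIZE];
			uint32_t	key[CHACHA_KEY_SIZE / sizeof(uint32_t)];
		};
		uint8_t		batch_key[CHACHA_BLOCK_SIZE * 2];
	};
	uint64_t	generation;
	uint8_t		pos;
	bool		in_use;
};
```

With a u8 `pos`, `pos` and `in_use` occupy consecutive bytes at the tail of
the struct; an unsigned int would force alignment padding between them.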
> > diff --git a/include/vdso/types.h b/include/vdso/types.h
> > new file mode 100644
> > index 000000000000..ce131463aeff
> > --- /dev/null
> > +++ b/include/vdso/types.h
> > @@ -0,0 +1,35 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> 
> Why does this need an extra header when it's clearly getrandom specific?
> Please put this into getrandom.h

From your followup, I just killed the whole thing and now use u64.

> > +/**
> > + * __cvdso_getrandom_data - Generic vDSO implementation of getrandom() syscall.
> > + * @rng_info:		Describes state of kernel RNG, memory shared with kernel.
> > + * @buffer:		Destination buffer to fill with random bytes.
> > + * @len:		Size of @buffer in bytes.
> > + * @flags:		Zero or more GRND_* flags.
> > + * @opaque_state:	Pointer to an opaque state area.
> > + *
> > + * This implements a "fast key erasure" RNG using ChaCha20, in the same way that the kernel's
> > + * getrandom() syscall does. It periodically reseeds its key from the kernel's RNG, at the same
> > + * schedule that the kernel's RNG is reseeded. If the kernel's RNG is not ready, then this always
> > + * calls into the syscall.
> > + *
> > + * @opaque_state *must* be allocated using the vgetrandom_alloc() syscall.  Unless external locking
> > + * is used, one state must be allocated per thread, as it is not safe to call this function
> > + * concurrently with the same @opaque_state. However, it is safe to call this using the same
> > + * @opaque_state that is shared between main code and signal handling code, within the same thread.
> > + *
> > + * Returns the number of random bytes written to @buffer, or a negative value indicating an error.
> > + */
> > +static __always_inline ssize_t
> > +__cvdso_getrandom_data(const struct vdso_rng_data *rng_info, void *buffer, size_t len,
> > +		       unsigned int flags, void *opaque_state)
> > +{
> > +	ssize_t ret = min_t(size_t, INT_MAX & PAGE_MASK /* = MAX_RW_COUNT */, len);
> 
> We really need to allow reading almost 2GB of random data in one go?

It just copies the precise semantics of the syscall by bounding the
requested length. The idea is for this to have basically identical
semantics to the syscall (while being way faster).

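Concretely, the clamp evaluates to MAX_RW_COUNT. A userspace sketch,
assuming the common 4 KiB page size (in the kernel, PAGE_MASK is
~(PAGE_SIZE - 1)):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed 4 KiB pages; the mask clears the low page-offset bits. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))

/* Bound a request the same way read()/getrandom() do: never more than
 * MAX_RW_COUNT = INT_MAX & PAGE_MASK bytes in a single call. */
static size_t clamp_len(size_t len)
{
	size_t max_rw_count = INT_MAX & SKETCH_PAGE_MASK; /* 0x7ffff000 */
	return len < max_rw_count ? len : max_rw_count;
}
```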
> > +	/*
> > +	 * @state->in_use is basic reentrancy protection against this running in a signal handler
> > +	 * with the same @opaque_state, but obviously not atomic wrt multiple CPUs or more than one
> > +	 * level of reentrancy. If a signal interrupts this after reading @state->in_use, but before
> > +	 * writing @state->in_use, there is still no race, because the signal handler will run to
> > +	 * its completion before returning execution.
> 
> Can you please add an explanation that the syscall does not touch the
> state and just fills the buffer?

Will do.

> > +	 */
> > +	in_use = READ_ONCE(state->in_use);
> > +	if (unlikely(in_use))
> > +		goto fallback_syscall;
> > +	WRITE_ONCE(state->in_use, true);
> > +
> > +retry_generation:
> > +	/*
> > +	 * @rng_info->generation must always be read here, as it serializes @state->key with the
> > +	 * kernel's RNG reseeding schedule.
> > +	 */
> > +	current_generation = READ_ONCE(rng_info->generation);
> > +
> > +	/*
> > +	 * If @state->generation doesn't match the kernel RNG's generation, then it means the
> > +	 * kernel's RNG has reseeded, and so @state->key is reseeded as well.
> > +	 */
> > +	if (unlikely(state->generation != current_generation)) {
> > +		/*
> > +		 * Write the generation before filling the key, in case of fork. If there is a fork
> > +		 * just after this line, the two forks will get different random bytes from the
> 
> the two forks? You mean the parent and the child, no?

Yes, nice catch, thanks.

> > +		 * syscall, which is good. However, were this line to occur after the getrandom
> > +		 * syscall, then both child and parent could have the same bytes and the same
> > +		 * generation counter, so the fork would not be detected. Therefore, write
> > +		 * @state->generation before the call to the getrandom syscall.
> > +		 */
> > +		WRITE_ONCE(state->generation, current_generation);
> > +
> > +		/* Prevent the syscall from being reordered wrt current_generation. */
> > +		barrier();
> > +
> > +		/* Reseed @state->key using fresh bytes from the kernel. */
> > +		if (getrandom_syscall(state->key, sizeof(state->key), 0) != sizeof(state->key)) {
> > +			/*
> > +			 * If the syscall failed to refresh the key, then @state->key is now
> > +			 * invalid, so invalidate the generation so that it is not used again, and
> > +			 * fallback to using the syscall entirely.
> > +			 */
> > +			WRITE_ONCE(state->generation, 0);
> > +
> > +			/*
> > +			 * Set @state->in_use to false only after the last write to @state in the
> > +			 * line above.
> > +			 */
> > +			WRITE_ONCE(state->in_use, false);
> 
> So here you rely on the compiler not reordering vs. WRITE_ONCE(),
> i.e. volatile, but above you have a barrier() to prevent the write being
> reordered vs. the syscall, confused.
> 
> But even when the compiler does not reorder, what prevents a weakly
> ordered CPU from doing so?

The issue isn't that this code will race between CPUs -- that's
explicitly disallowed by the design. The issue is that this code is
signal-reentrant, so it's mostly a matter of the compiler ordering the
instructions correctly. In addition, the write of current_generation to
state->generation must come before the syscall fires.

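To illustrate the point, here is a userspace sketch of the guard, with a
recursive call standing in for a signal arriving mid-function and volatile
standing in for READ_ONCE()/WRITE_ONCE() (the names and the counter are
hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal sketch of the single-thread, signal-reentrancy guard: volatile
 * accesses only need to stop the compiler from reordering, because a
 * signal handler runs to completion on the same CPU before the
 * interrupted code resumes. */
struct state { volatile bool in_use; };

static long fallback_syscall_calls;	/* counts trips to the slow path */

static long getrandom_sketch(struct state *s, int depth)
{
	if (s->in_use) {		/* READ_ONCE(state->in_use) */
		fallback_syscall_calls++;	/* would do the real syscall */
		return 0;
	}
	s->in_use = true;		/* WRITE_ONCE(state->in_use, true) */

	if (depth > 0)			/* simulate a signal arriving here */
		getrandom_sketch(s, depth - 1);

	s->in_use = false;		/* last write, after all state updates */
	return 1;
}
```

The "signal" sees in_use set and falls back to the syscall path instead of
touching the state, which is exactly the property being discussed.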
> > +	if (!len) {
> > +		/* Prevent the loop from being reordered wrt ->generation. */
> > +		barrier();
> 
> Same question as above.

This came out of discussions here:
https://lore.kernel.org/all/878rjlr85s.fsf@oldenburg.str.redhat.com/
And on IRC with Jann.

> > +	/* Refill the batch and then overwrite the key, in order to preserve forward secrecy. */
> 
> 'and then overwrite'?
> 
> Isn't this overwriting it implicitely because batch_key and key are at
> the same place in the union?

Yes, I'll remove the `then`.
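For readers following along, the fast key erasure step being discussed works
roughly like this (a sketch: a dummy expansion function stands in for
ChaCha20, and an explicit memcpy stands in for the implicit overwrite
through the batch_key union):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KEY_SIZE	32
#define BLOCK_SIZE	64

/* Dummy stand-in for the ChaCha20 block function: expands key into out.
 * Not cryptographic -- illustration only. */
static void prf_expand(const uint8_t key[KEY_SIZE], uint8_t *out, size_t len)
{
	for (size_t i = 0; i < len; i++)
		out[i] = (uint8_t)(key[i % KEY_SIZE] + i);
}

/* Fast key erasure: refilling the batch also replaces the key with the
 * first KEY_SIZE bytes of fresh output, so past outputs cannot be
 * recovered from the current state. */
static void refill_batch(uint8_t batch_key[BLOCK_SIZE * 2])
{
	uint8_t out[BLOCK_SIZE * 2];

	prf_expand(batch_key, out, sizeof(out));	/* old key -> new stream */
	memcpy(batch_key, out, sizeof(out));		/* new key + new batch */
}
```

In the real code there is no second copy: because key and batch_key alias in
the union, writing the fresh stream into batch_key overwrites the key in
place.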

Thanks for the review!

Jason
