qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: len.brown@intel.com, linux-hyperv@vger.kernel.org,
	colmmacc@amazon.com, berrange@redhat.com, adrian@parity.io,
	kvm@vger.kernel.org, jannh@google.com,
	gregkh@linuxfoundation.org, linux-pm@vger.kernel.org,
	rafael@kernel.org, linux-kernel@vger.kernel.org,
	linux@dominikbrodowski.net, qemu-devel@nongnu.org,
	graf@amazon.com, linux-crypto@vger.kernel.org, pavel@ucw.cz,
	tytso@mit.edu, mikelley@microsoft.com, lersek@redhat.com,
	arnd@arndb.de
Subject: Re: propagating vmgenid outward and upward
Date: Tue, 1 Mar 2022 11:21:38 -0500	[thread overview]
Message-ID: <20220301111459-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <Yh4+9+UpanJWAIyZ@zx2c4.com>

On Tue, Mar 01, 2022 at 04:42:47PM +0100, Jason A. Donenfeld wrote:
> Hey folks,
> 
> Having finally wrapped up development of the initial vmgenid driver, I
> thought I'd pull together some thoughts on vmgenid, notification, and
> propagating, from disjointed conversations I've had with a few of you
> over the last several weeks.
> 
> The basic problem is: VMs can be cloned, forked, rewound, or
> snapshotted, and when this happens, a) the RNG needs to reseed itself,
> and b) cryptographic algorithms that are not reuse resistant need to
> reinitialize in one way or another. For 5.18, we're handling (a) via the
> new vmgenid driver, which implements a spec from Microsoft, whereby the
> driver receives ACPI notifications when a 16 byte unique value changes.
> 
> The vmgenid driver basically works, though it is racy, because that ACPI
> notification can arrive after the system is already running again. This
> race is even worse on Windows, where they kick the notification into a
> worker thread, which then publishes it upward elsewhere to another async
> mechanism, and eventually it hits the RNG and various userspace apps.
> On Linux it's not that bad -- we reseed immediately upon receiving the
> notification -- but it still inherits this same "push"-model deficiency,
> which a "pull"-model would not have.
> 
> If we had a "pull" model, rather than just expose a 16-byte unique
> identifier, the vmgenid virtual hardware would _also_ expose a
> word-sized generation counter, which would be incremented every time the
> unique ID changed. Then, every time we would touch the RNG, we'd simply
> do an inexpensive check of this memremap()'d integer, and reinitialize
> with the unique ID if the integer changed. In this way, the race would
> be entirely eliminated. We would then be able to propagate this outwards
> to other drivers, by just exporting an extern symbol, in the manner of
> `jiffies`, and propagate it upwards to userspace, by putting it in the
> vDSO, in the manner of gettimeofday. And like that, there'd be no
> terrible async thing and things would work pretty easily.

I am not sure what the difference is though. So we have a 16 byte unique
value and you would prefer a dword counter. How is the former not a
superset of the later?  I'm not sure how safe it is to expose it to
userspace specifically, but rest of text talks about exposing it to a
kernel driver so maybe not an issue? So what makes interrupt driven
required, and why not just remap and read existing vmgenid in the pull
manner?  What did I miss?


> But that's not what we have, because Microsoft didn't collaborate with
> anybody on this, and now it's implemented in several hypervisors. Given
> that I'm already spending considerable time working on the RNG, entirely
> without funding, somehow I'm not super motivated to lead a
> cross-industry political effort to change Microsoft's vmgenid spec.
> Maybe somebody else has an appetite for this, but either way, those
> changes would be several years off at best.
> 
> So given we have a "push"-model mechanism, there are two problems to
> tackle, perhaps in the same way, perhaps in a different way:
> 
> A) Outwards propagation toward other kernel drivers: in this case, I
>    have in mind WireGuard, naturally, which very much needs to clear its
>    existing sessions when VMs are forked.
> 
> B) Upwards propagation to userspace: in this case, we handle the
>    concerns of the Amazon engineers on this thread who broached this
>    topic a few years ago, in which s2n, their TLS library, wants to
>    reinitialize its userspace RNG (a silly thing, but I digress) and
>    probably clear session keys too, for the same good reason as
>    WireGuard.
> 
> For (A), at least wearing my WireGuard-maintainer hat, there is an easy
> way and there is a "race-free" way. I use scare quotes there because
> we're still in a "push"-model, which means it's still racy no matter
> what.
> 
> The faux "race-free" way involves having `extern u32 rng_vm_generation;`
> or similar in random.h, and then everything that generates a session key
> would snapshot this value, and every time a session key is used, a
> comparison would be made. This works, but given that we're going to be
> racy no matter what, I think I'd prefer avoiding the extra code in the
> hot path and extra per-session storage. It seems like that'd involve a
> lot of fiddly engineering for no real world benefit.
> 
> The easy way, and the way that I think I prefer, would be to just have a
> sync notifier_block for this, just like we have with
> register_pm_notifier(). From my perspective, it'd be simplest to just
> piggy back on the already existing PM notifier with an extra event,
> PM_POST_VMFORK, which would join the existing set of 7, following
> PM_POST_RESTORE. I think that'd be coherent. However, if the PM people
> don't want to play ball, we could always come up with our own
> notifier_block. But I don't see the need. Plus, WireGuard *already*
> uses the PM notifier for clearing keys, so code-wise for my use case,
> that'd amount adding another case for PM_POST_VMFORK, in addition to the
> currently existing PM_HIBERNATION_PREPARE and PM_SUSPEND_PREPARE cases,
> which all would be treated the same way. Ezpz. So if that sounds like an
> interesting thing to the PM people, I think I'd like to propose a patch
> for that, possibly even for 5.18, given that it'd be very straight-
> forward.
> 
> For (B), it's a little bit trickier. But I think our options follow the
> same rubric. We can expose a generation counter in the vDSO, with
> semantics akin to the extern integer I described above. Or we could
> expose that counter in a file that userspace could poll() on and receive
> notifications that way. Or perhaps a third way. I'm all ears here.
> Alex's team from Amazon last year proposed something similar to the vDSO
> idea, except using mmap on a sysfs file, though from what I can tell,
> that wound up being kind of complicated. Due to the fact that we're
> _already_ racy, I think I'm most inclined at this point toward the
> poll() approach for the same reasons as I prefer a notifier_block. But
> on userspace I could be convinced otherwise, and I'd be interested in
> totally different ideas here too.
> 
> Another thing I should note is that, while I'm not currently leaning
> toward it, the vDSO approach also ties into interesting discussions
> about userspace RNGs (generally a silly idea), and their need for things
> like fork detection and also learning when the kernel RNG was last
> reseeded. So cracking open the vDSO book might invite all sorts of other
> interesting questions and discussions, which may be productive or may be
> a humongous distraction. (Also, again, I'm not super enthusiastic about
> userspace RNGs.)
> 
> Also, there is an interesting question to decide with regards to
> userspace, which is whether the vmgenid driver should expose its unique
> ID to userspace, as Alex requested on an earlier thread. I am actually
> sort of opposed to this. That unique ID may or may not be secret and
> entropic; if it isn't, the crypto is designed to not be impacted
> negatively, but if it is, we should keep it secret. So, rather, I think
> the correct flow is that userspace simply calls getrandom() upon
> learning that the VM forked, which is guaranteed to have been
> reinitialized already by add_vmfork_randomness(), and that will
> guarantee a value that is unique to the VM, without having to actually
> expose that value.
> 
> So, anyway, this is more or less where my thinking on this matter is.
> Would be happy to hear some fresh ideas here too.
> 
> Regards,
> Jason



  parent reply	other threads:[~2022-03-01 16:23 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-01 15:42 propagating vmgenid outward and upward Jason A. Donenfeld
2022-03-01 16:15 ` Laszlo Ersek
2022-03-01 16:28   ` Jason A. Donenfeld
2022-03-01 17:17     ` Michael S. Tsirkin
2022-03-01 18:37       ` Jason A. Donenfeld
2022-03-02  7:42         ` Michael S. Tsirkin
2022-03-02  7:48           ` Michael S. Tsirkin
2022-03-02  8:30         ` Michael S. Tsirkin
2022-03-02 11:26           ` Jason A. Donenfeld
2022-03-02 12:58             ` Michael S. Tsirkin
2022-03-02 13:55               ` Jason A. Donenfeld
2022-03-02 14:46                 ` Michael S. Tsirkin
2022-03-02 15:14                   ` Jason A. Donenfeld
2022-03-02 15:20                     ` Michael S. Tsirkin
2022-03-02 15:36                       ` Jason A. Donenfeld
2022-03-02 16:22                         ` Michael S. Tsirkin
2022-03-02 16:32                           ` Jason A. Donenfeld
2022-03-02 17:27                             ` Michael S. Tsirkin
2022-03-03 13:07                             ` Michael S. Tsirkin
2022-03-02 16:29                         ` Michael S. Tsirkin
2022-03-01 16:21 ` Michael S. Tsirkin [this message]
2022-03-01 16:35   ` Jason A. Donenfeld
2022-03-01 18:01 ` Greg KH
2022-03-01 18:24   ` Jason A. Donenfeld
2022-03-01 19:41     ` Greg KH
2022-03-01 23:12       ` Jason A. Donenfeld
2022-03-02 14:35 ` Jason A. Donenfeld
2022-03-09 10:10 ` Alexander Graf
2022-03-09 22:02   ` Jason A. Donenfeld
2022-03-10 11:18     ` Alexander Graf
2022-03-20 22:53       ` Michael S. Tsirkin
2022-04-19 15:12       ` Jason A. Donenfeld
2022-04-19 16:43         ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220301111459-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=Jason@zx2c4.com \
    --cc=adrian@parity.io \
    --cc=arnd@arndb.de \
    --cc=berrange@redhat.com \
    --cc=colmmacc@amazon.com \
    --cc=graf@amazon.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=len.brown@intel.com \
    --cc=lersek@redhat.com \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux@dominikbrodowski.net \
    --cc=mikelley@microsoft.com \
    --cc=pavel@ucw.cz \
    --cc=qemu-devel@nongnu.org \
    --cc=rafael@kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).