Re: propagating vmgenid outward and upward

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: "Laszlo Ersek" <lersek@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"KVM list" <kvm@vger.kernel.org>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	linux-hyperv@vger.kernel.org,
	"Linux Crypto Mailing List" <linux-crypto@vger.kernel.org>,
	"Alexander Graf" <graf@amazon.com>,
	"Michael Kelley (LINUX)" <mikelley@microsoft.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	adrian@parity.io, "Daniel P. Berrangé" <berrange@redhat.com>,
	"Dominik Brodowski" <linux@dominikbrodowski.net>,
	"Jann Horn" <jannh@google.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	"Brown, Len" <len.brown@intel.com>, "Pavel Machek" <pavel@ucw.cz>,
	"Linux PM" <linux-pm@vger.kernel.org>,
	"Colm MacCarthaigh" <colmmacc@amazon.com>,
	"Theodore Ts'o" <tytso@mit.edu>, "Arnd Bergmann" <arnd@arndb.de>
Subject: Re: propagating vmgenid outward and upward
Date: Wed, 2 Mar 2022 07:58:33 -0500	[thread overview]
Message-ID: <20220302074503-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CAHmME9pf-bjnZuweoLqoFEmPy1OK7ogEgGEAva1T8uVTufhCuw@mail.gmail.com>

On Wed, Mar 02, 2022 at 12:26:27PM +0100, Jason A. Donenfeld wrote:
> Hey Michael,
> 
> Thanks for the benchmark.
> 
> On Wed, Mar 2, 2022 at 9:30 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > So yes, the overhead is higher by 50% which seems a lot but it's from a
> > very small number, so I don't see why it's a show stopper, it's not by a
> > factor of 10 such that we should sacrifice safety by default. Maybe a
> > kernel flag that removes the read replacing it with an interrupt will
> > do.
> >
> > In other words, premature optimization is the root of all evil.
> 
> Unfortunately I don't think it's as simple as that for several reasons.
> 
> First, I'm pretty confident a beefy Intel machine can mostly hide
> non-dependent comparisons in the memory access and have the problem
> mostly go away. But this is much less the case on, say, an in-order
> MIPS32r2, which isn't just "some crappy ISA I'm using for the sake of
> argument," but actually the platform on which a lot of networking and
> WireGuard stuff runs, so I do care about it. There, we have 4
> reads/comparisons which can't pipeline nearly as well.

Sure. Want to try running some benchmarks on that platform?
Presumably you have access to such a box, right?


> There's also the atomicity aspect, which I think makes your benchmark
> not quite accurate. Those 16 bytes could change between the first and
> second word (or between the Nth and N+1th word for N<=3 on 32-bit).
> What if in that case the word you read second doesn't change, but the
> word you read first did? So then you find yourself having to do a
> hi-lo-hi dance.
> And then consider the 32-bit case, where that's even
> more annoying. This is just one of those things that comes up when you
> compare the semantics of a "large unique ID" and "word-sized counter",
> as general topics. (My suggestion is that vmgenid provide both.)

I don't see how this matters for any applications at all. Feel free to
present a case that would be race free with a word but not a 16
byte value, I could not imagine one. It's human to err of course.

>
> Finally, there's a slightly storage aspect, where adding 16 bytes to a
> per-key struct is a little bit heavier than adding 4 bytes and might
> bust a cache line without sufficient care, care which always has some
> cost in one way or another.
> 
> So I just don't know if it's realistic to impose a 16-byte per-packet
> comparison all the time like that. I'm familiar with WireGuard
> obviously, but there's also cifs and maybe even wifi and bluetooth,
> and who knows what else, to care about too. Then there's the userspace
> discussion. I can't imagine a 16-byte hotpath comparison being
> accepted as implementable.

I think this hinges on benchmarking results. Want to start with
my silly benchmark at least? If you can't measure an order of
magnitude gain then I think any effect on wireguard will be in the
noise.


> > And I feel if linux
> > DTRT and reads the 16 bytes then hypervisor vendors will be motivated to
> > improve and add a 4 byte unique one. As long as linux is interrupt
> > driven there's no motivation for change.
> 
> I reeeeeally don't want to get pulled into the politics of this on the
> hypervisor side. I assume an improved thing would begin with QEMU and
> Firecracker or something collaborating because they're both open
> source and Amazon people seem interested.

I think it would begin with a benchmark showing there's even any
measureable performance to be gained by switching the semantics.

> And then pressure builds for
> Microsoft and VMware to do it on their side. And then we get this all
> nicely implemented in the kernel. In the meantime, though, I'm not
> going to refuse to address the problem entirely just because the
> virtual hardware is less than perfect; I'd rather make the most with
> what we've got while still being somewhat reasonable from an
> implementation perspective.
> 
> Jason

Right but given you are trading security off for performance, it matters
a lot what the performance gain is.

-- 
MST

WARNING: multiple messages have this Message-ID (diff)

From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: "Brown, Len" <len.brown@intel.com>,
	linux-hyperv@vger.kernel.org,
	"Colm MacCarthaigh" <colmmacc@amazon.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	adrian@parity.io, "KVM list" <kvm@vger.kernel.org>,
	"Jann Horn" <jannh@google.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Linux PM" <linux-pm@vger.kernel.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Dominik Brodowski" <linux@dominikbrodowski.net>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	"Alexander Graf" <graf@amazon.com>,
	"Linux Crypto Mailing List" <linux-crypto@vger.kernel.org>,
	"Pavel Machek" <pavel@ucw.cz>, "Theodore Ts'o" <tytso@mit.edu>,
	"Michael Kelley (LINUX)" <mikelley@microsoft.com>,
	"Laszlo Ersek" <lersek@redhat.com>,
	"Arnd Bergmann" <arnd@arndb.de>
Subject: Re: propagating vmgenid outward and upward
Date: Wed, 2 Mar 2022 07:58:33 -0500	[thread overview]
Message-ID: <20220302074503-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CAHmME9pf-bjnZuweoLqoFEmPy1OK7ogEgGEAva1T8uVTufhCuw@mail.gmail.com>

On Wed, Mar 02, 2022 at 12:26:27PM +0100, Jason A. Donenfeld wrote:
> Hey Michael,
> 
> Thanks for the benchmark.
> 
> On Wed, Mar 2, 2022 at 9:30 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > So yes, the overhead is higher by 50% which seems a lot but it's from a
> > very small number, so I don't see why it's a show stopper, it's not by a
> > factor of 10 such that we should sacrifice safety by default. Maybe a
> > kernel flag that removes the read replacing it with an interrupt will
> > do.
> >
> > In other words, premature optimization is the root of all evil.
> 
> Unfortunately I don't think it's as simple as that for several reasons.
> 
> First, I'm pretty confident a beefy Intel machine can mostly hide
> non-dependent comparisons in the memory access and have the problem
> mostly go away. But this is much less the case on, say, an in-order
> MIPS32r2, which isn't just "some crappy ISA I'm using for the sake of
> argument," but actually the platform on which a lot of networking and
> WireGuard stuff runs, so I do care about it. There, we have 4
> reads/comparisons which can't pipeline nearly as well.

Sure. Want to try running some benchmarks on that platform?
Presumably you have access to such a box, right?


> There's also the atomicity aspect, which I think makes your benchmark
> not quite accurate. Those 16 bytes could change between the first and
> second word (or between the Nth and N+1th word for N<=3 on 32-bit).
> What if in that case the word you read second doesn't change, but the
> word you read first did? So then you find yourself having to do a
> hi-lo-hi dance.
> And then consider the 32-bit case, where that's even
> more annoying. This is just one of those things that comes up when you
> compare the semantics of a "large unique ID" and "word-sized counter",
> as general topics. (My suggestion is that vmgenid provide both.)

I don't see how this matters for any applications at all. Feel free to
present a case that would be race free with a word but not a 16
byte value, I could not imagine one. It's human to err of course.

>
> Finally, there's a slightly storage aspect, where adding 16 bytes to a
> per-key struct is a little bit heavier than adding 4 bytes and might
> bust a cache line without sufficient care, care which always has some
> cost in one way or another.
> 
> So I just don't know if it's realistic to impose a 16-byte per-packet
> comparison all the time like that. I'm familiar with WireGuard
> obviously, but there's also cifs and maybe even wifi and bluetooth,
> and who knows what else, to care about too. Then there's the userspace
> discussion. I can't imagine a 16-byte hotpath comparison being
> accepted as implementable.

I think this hinges on benchmarking results. Want to start with
my silly benchmark at least? If you can't measure an order of
magnitude gain then I think any effect on wireguard will be in the
noise.


> > And I feel if linux
> > DTRT and reads the 16 bytes then hypervisor vendors will be motivated to
> > improve and add a 4 byte unique one. As long as linux is interrupt
> > driven there's no motivation for change.
> 
> I reeeeeally don't want to get pulled into the politics of this on the
> hypervisor side. I assume an improved thing would begin with QEMU and
> Firecracker or something collaborating because they're both open
> source and Amazon people seem interested.

I think it would begin with a benchmark showing there's even any
measureable performance to be gained by switching the semantics.

> And then pressure builds for
> Microsoft and VMware to do it on their side. And then we get this all
> nicely implemented in the kernel. In the meantime, though, I'm not
> going to refuse to address the problem entirely just because the
> virtual hardware is less than perfect; I'd rather make the most with
> what we've got while still being somewhat reasonable from an
> implementation perspective.
> 
> Jason

Right but given you are trading security off for performance, it matters
a lot what the performance gain is.

-- 
MST

next prev parent reply	other threads:[~2022-03-02 12:58 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-01 15:42 propagating vmgenid outward and upward Jason A. Donenfeld
2022-03-01 16:15 ` Laszlo Ersek
2022-03-01 16:28   ` Jason A. Donenfeld
2022-03-01 16:28     ` Jason A. Donenfeld
2022-03-01 17:17     ` Michael S. Tsirkin
2022-03-01 17:17       ` Michael S. Tsirkin
2022-03-01 18:37       ` Jason A. Donenfeld
2022-03-01 18:37         ` Jason A. Donenfeld
2022-03-02  7:42         ` Michael S. Tsirkin
2022-03-02  7:42           ` Michael S. Tsirkin
2022-03-02  7:48           ` Michael S. Tsirkin
2022-03-02  7:48             ` Michael S. Tsirkin
2022-03-02  8:30         ` Michael S. Tsirkin
2022-03-02  8:30           ` Michael S. Tsirkin
2022-03-02 11:26           ` Jason A. Donenfeld
2022-03-02 11:26             ` Jason A. Donenfeld
2022-03-02 12:58             ` Michael S. Tsirkin [this message]
2022-03-02 12:58               ` Michael S. Tsirkin
2022-03-02 13:55               ` Jason A. Donenfeld
2022-03-02 13:55                 ` Jason A. Donenfeld
2022-03-02 14:46                 ` Michael S. Tsirkin
2022-03-02 14:46                   ` Michael S. Tsirkin
2022-03-02 15:14                   ` Jason A. Donenfeld
2022-03-02 15:14                     ` Jason A. Donenfeld
2022-03-02 15:20                     ` Michael S. Tsirkin
2022-03-02 15:20                       ` Michael S. Tsirkin
2022-03-02 15:36                       ` Jason A. Donenfeld
2022-03-02 15:36                         ` Jason A. Donenfeld
2022-03-02 16:22                         ` Michael S. Tsirkin
2022-03-02 16:22                           ` Michael S. Tsirkin
2022-03-02 16:32                           ` Jason A. Donenfeld
2022-03-02 16:32                             ` Jason A. Donenfeld
2022-03-02 17:27                             ` Michael S. Tsirkin
2022-03-02 17:27                               ` Michael S. Tsirkin
2022-03-03 13:07                             ` Michael S. Tsirkin
2022-03-03 13:07                               ` Michael S. Tsirkin
2022-03-02 16:29                         ` Michael S. Tsirkin
2022-03-02 16:29                           ` Michael S. Tsirkin
2022-03-01 16:21 ` Michael S. Tsirkin
2022-03-01 16:21   ` Michael S. Tsirkin
2022-03-01 16:35   ` Jason A. Donenfeld
2022-03-01 16:35     ` Jason A. Donenfeld
2022-03-01 18:01 ` Greg KH
2022-03-01 18:01   ` Greg KH
2022-03-01 18:24   ` Jason A. Donenfeld
2022-03-01 18:24     ` Jason A. Donenfeld
2022-03-01 19:41     ` Greg KH
2022-03-01 19:41       ` Greg KH
2022-03-01 23:12       ` Jason A. Donenfeld
2022-03-01 23:12         ` Jason A. Donenfeld
2022-03-02 14:35 ` Jason A. Donenfeld
2022-03-09 10:10 ` Alexander Graf
2022-03-09 22:02   ` Jason A. Donenfeld
2022-03-09 22:02     ` Jason A. Donenfeld
2022-03-10 11:18     ` Alexander Graf
2022-03-20 22:53       ` Michael S. Tsirkin
2022-03-20 22:53         ` Michael S. Tsirkin
2022-04-19 15:12       ` Jason A. Donenfeld
2022-04-19 15:12         ` Jason A. Donenfeld
2022-04-19 16:43         ` Michael S. Tsirkin
2022-04-19 16:43           ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220302074503-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=Jason@zx2c4.com \
    --cc=adrian@parity.io \
    --cc=arnd@arndb.de \
    --cc=berrange@redhat.com \
    --cc=colmmacc@amazon.com \
    --cc=graf@amazon.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=len.brown@intel.com \
    --cc=lersek@redhat.com \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux@dominikbrodowski.net \
    --cc=mikelley@microsoft.com \
    --cc=pavel@ucw.cz \
    --cc=qemu-devel@nongnu.org \
    --cc=rafael@kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.