Re: propagating vmgenid outward and upward

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: "Laszlo Ersek" <lersek@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"KVM list" <kvm@vger.kernel.org>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	linux-hyperv@vger.kernel.org,
	"Linux Crypto Mailing List" <linux-crypto@vger.kernel.org>,
	"Alexander Graf" <graf@amazon.com>,
	"Michael Kelley (LINUX)" <mikelley@microsoft.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	adrian@parity.io, "Daniel P. Berrangé" <berrange@redhat.com>,
	"Dominik Brodowski" <linux@dominikbrodowski.net>,
	"Jann Horn" <jannh@google.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	"Brown, Len" <len.brown@intel.com>, "Pavel Machek" <pavel@ucw.cz>,
	"Linux PM" <linux-pm@vger.kernel.org>,
	"Colm MacCarthaigh" <colmmacc@amazon.com>,
	"Theodore Ts'o" <tytso@mit.edu>, "Arnd Bergmann" <arnd@arndb.de>
Subject: Re: propagating vmgenid outward and upward
Date: Wed, 2 Mar 2022 03:30:06 -0500	[thread overview]
Message-ID: <20220302031738-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CAHmME9qieLUDVoPYZPo=N8NCL1T-RzQ4p7kCFv3PKFUkhWZPsw@mail.gmail.com>

On Tue, Mar 01, 2022 at 07:37:06PM +0100, Jason A. Donenfeld wrote:
> Hi Michael,
> 
> On Tue, Mar 1, 2022 at 6:17 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > Hmm okay, so it's a performance optimization... some batching then? Do
> > you really need to worry about every packet? Every 64 packets not
> > enough?  Packets are after all queued at NICs etc, and VM fork can
> > happen after they leave wireguard ...
> 
> Unfortunately, yes, this is an "every packet" sort of thing -- if the
> race is to be avoided in a meaningful way. It's really extra bad:
> ChaCha20 and AES-CTR work by xoring a secret stream of bytes with
> plaintext to produce a ciphertext. If you use that same secret stream
> and xor it with a second plaintext and transmit that too, an attacker
> can combine the two different ciphertexts to learn things about the
> original plaintext.
> 
> But, anyway, it seems like the race is here to stay given what we have
> _currently_ available with the virtual hardware. That's why I'm
> focused on trying to get something going that's the least bad with
> what we've currently got, which is racy by design. How vitally
> important is it to have something that doesn't race in the far future?
> I don't know, really. It seems plausible that that ACPI notifier
> triggers so early that nothing else really even has a chance, so the
> race concern is purely theoretical. But I haven't tried to measure
> that so I'm not sure.
> 
> Jason


I got curious, and wrote a dumb benchmark:


#include <stdio.h>
#include <assert.h>
#include <limits.h>
#include <string.h>

struct lng {
	unsigned long long l1;
	unsigned long long l2;
};

struct shrt {
	unsigned long s;
};


struct lng l = { 1, 2 };
struct shrt s = { 3 };

static void test1(volatile struct shrt *sp)
{
	if (sp->s != s.s) {
		printf("short mismatch!\n");
		s.s = sp->s;
	}
}
static void test2(volatile struct lng *lp)
{
	if (lp->l1 != l.l1 || lp->l2 != l.l2) {
		printf("long mismatch!\n");
		l.l1 = lp->l1;
		l.l2 = lp->l2;
	}
}

int main(int argc, char **argv)
{
	volatile struct shrt sv = { 4 };
	volatile struct lng lv = { 5, 6 };

	if (argc > 1) {
		printf("test 1\n");
		for (int i = 0; i < 10000000; ++i) 
			test1(&sv);
	} else {
		printf("test 2\n");
		for (int i = 0; i < 10000000; ++i)
			test2(&lv);
	}
	return 0;
}


Results (built with -O2, nothing fancy):

[mst@tuck ~]$ perf stat -r 1000 ./a.out 1 > /dev/null

 Performance counter stats for './a.out 1' (1000 runs):

              5.12 msec task-clock:u              #    0.945 CPUs utilized            ( +-  0.07% )
                 0      context-switches:u        #    0.000 /sec                   
                 0      cpu-migrations:u          #    0.000 /sec                   
                52      page-faults:u             #   10.016 K/sec                    ( +-  0.07% )
        20,190,800      cycles:u                  #    3.889 GHz                      ( +-  0.01% )
        50,147,371      instructions:u            #    2.48  insn per cycle           ( +-  0.00% )
        20,032,224      branches:u                #    3.858 G/sec                    ( +-  0.00% )
             1,604      branch-misses:u           #    0.01% of all branches          ( +-  0.26% )

        0.00541882 +- 0.00000847 seconds time elapsed  ( +-  0.16% )

[mst@tuck ~]$ perf stat -r 1000 ./a.out > /dev/null

 Performance counter stats for './a.out' (1000 runs):

              7.75 msec task-clock:u              #    0.947 CPUs utilized            ( +-  0.12% )
                 0      context-switches:u        #    0.000 /sec                   
                 0      cpu-migrations:u          #    0.000 /sec                   
                52      page-faults:u             #    6.539 K/sec                    ( +-  0.07% )
        30,205,916      cycles:u                  #    3.798 GHz                      ( +-  0.01% )
        80,147,373      instructions:u            #    2.65  insn per cycle           ( +-  0.00% )
        30,032,227      branches:u                #    3.776 G/sec                    ( +-  0.00% )
             1,621      branch-misses:u           #    0.01% of all branches          ( +-  0.23% )

        0.00817982 +- 0.00000965 seconds time elapsed  ( +-  0.12% )


So yes, the overhead is higher by 50% which seems a lot but it's from a
very small number, so I don't see why it's a show stopper, it's not by a
factor of 10 such that we should sacrifice safety by default. Maybe a
kernel flag that removes the read replacing it with an interrupt will
do.

In other words, premature optimization is the root of all evil.

-- 
MST

WARNING: multiple messages have this Message-ID (diff)

From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: "Brown, Len" <len.brown@intel.com>,
	linux-hyperv@vger.kernel.org,
	"Colm MacCarthaigh" <colmmacc@amazon.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	adrian@parity.io, "KVM list" <kvm@vger.kernel.org>,
	"Jann Horn" <jannh@google.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Linux PM" <linux-pm@vger.kernel.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Dominik Brodowski" <linux@dominikbrodowski.net>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	"Alexander Graf" <graf@amazon.com>,
	"Linux Crypto Mailing List" <linux-crypto@vger.kernel.org>,
	"Pavel Machek" <pavel@ucw.cz>, "Theodore Ts'o" <tytso@mit.edu>,
	"Michael Kelley (LINUX)" <mikelley@microsoft.com>,
	"Laszlo Ersek" <lersek@redhat.com>,
	"Arnd Bergmann" <arnd@arndb.de>
Subject: Re: propagating vmgenid outward and upward
Date: Wed, 2 Mar 2022 03:30:06 -0500	[thread overview]
Message-ID: <20220302031738-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CAHmME9qieLUDVoPYZPo=N8NCL1T-RzQ4p7kCFv3PKFUkhWZPsw@mail.gmail.com>

On Tue, Mar 01, 2022 at 07:37:06PM +0100, Jason A. Donenfeld wrote:
> Hi Michael,
> 
> On Tue, Mar 1, 2022 at 6:17 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > Hmm okay, so it's a performance optimization... some batching then? Do
> > you really need to worry about every packet? Every 64 packets not
> > enough?  Packets are after all queued at NICs etc, and VM fork can
> > happen after they leave wireguard ...
> 
> Unfortunately, yes, this is an "every packet" sort of thing -- if the
> race is to be avoided in a meaningful way. It's really extra bad:
> ChaCha20 and AES-CTR work by xoring a secret stream of bytes with
> plaintext to produce a ciphertext. If you use that same secret stream
> and xor it with a second plaintext and transmit that too, an attacker
> can combine the two different ciphertexts to learn things about the
> original plaintext.
> 
> But, anyway, it seems like the race is here to stay given what we have
> _currently_ available with the virtual hardware. That's why I'm
> focused on trying to get something going that's the least bad with
> what we've currently got, which is racy by design. How vitally
> important is it to have something that doesn't race in the far future?
> I don't know, really. It seems plausible that that ACPI notifier
> triggers so early that nothing else really even has a chance, so the
> race concern is purely theoretical. But I haven't tried to measure
> that so I'm not sure.
> 
> Jason


I got curious, and wrote a dumb benchmark:


#include <stdio.h>
#include <assert.h>
#include <limits.h>
#include <string.h>

struct lng {
	unsigned long long l1;
	unsigned long long l2;
};

struct shrt {
	unsigned long s;
};


struct lng l = { 1, 2 };
struct shrt s = { 3 };

static void test1(volatile struct shrt *sp)
{
	if (sp->s != s.s) {
		printf("short mismatch!\n");
		s.s = sp->s;
	}
}
static void test2(volatile struct lng *lp)
{
	if (lp->l1 != l.l1 || lp->l2 != l.l2) {
		printf("long mismatch!\n");
		l.l1 = lp->l1;
		l.l2 = lp->l2;
	}
}

int main(int argc, char **argv)
{
	volatile struct shrt sv = { 4 };
	volatile struct lng lv = { 5, 6 };

	if (argc > 1) {
		printf("test 1\n");
		for (int i = 0; i < 10000000; ++i) 
			test1(&sv);
	} else {
		printf("test 2\n");
		for (int i = 0; i < 10000000; ++i)
			test2(&lv);
	}
	return 0;
}


Results (built with -O2, nothing fancy):

[mst@tuck ~]$ perf stat -r 1000 ./a.out 1 > /dev/null

 Performance counter stats for './a.out 1' (1000 runs):

              5.12 msec task-clock:u              #    0.945 CPUs utilized            ( +-  0.07% )
                 0      context-switches:u        #    0.000 /sec                   
                 0      cpu-migrations:u          #    0.000 /sec                   
                52      page-faults:u             #   10.016 K/sec                    ( +-  0.07% )
        20,190,800      cycles:u                  #    3.889 GHz                      ( +-  0.01% )
        50,147,371      instructions:u            #    2.48  insn per cycle           ( +-  0.00% )
        20,032,224      branches:u                #    3.858 G/sec                    ( +-  0.00% )
             1,604      branch-misses:u           #    0.01% of all branches          ( +-  0.26% )

        0.00541882 +- 0.00000847 seconds time elapsed  ( +-  0.16% )

[mst@tuck ~]$ perf stat -r 1000 ./a.out > /dev/null

 Performance counter stats for './a.out' (1000 runs):

              7.75 msec task-clock:u              #    0.947 CPUs utilized            ( +-  0.12% )
                 0      context-switches:u        #    0.000 /sec                   
                 0      cpu-migrations:u          #    0.000 /sec                   
                52      page-faults:u             #    6.539 K/sec                    ( +-  0.07% )
        30,205,916      cycles:u                  #    3.798 GHz                      ( +-  0.01% )
        80,147,373      instructions:u            #    2.65  insn per cycle           ( +-  0.00% )
        30,032,227      branches:u                #    3.776 G/sec                    ( +-  0.00% )
             1,621      branch-misses:u           #    0.01% of all branches          ( +-  0.23% )

        0.00817982 +- 0.00000965 seconds time elapsed  ( +-  0.12% )


So yes, the overhead is higher by 50% which seems a lot but it's from a
very small number, so I don't see why it's a show stopper, it's not by a
factor of 10 such that we should sacrifice safety by default. Maybe a
kernel flag that removes the read replacing it with an interrupt will
do.

In other words, premature optimization is the root of all evil.

-- 
MST

next prev parent reply	other threads:[~2022-03-02  8:30 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-01 15:42 propagating vmgenid outward and upward Jason A. Donenfeld
2022-03-01 16:15 ` Laszlo Ersek
2022-03-01 16:28   ` Jason A. Donenfeld
2022-03-01 16:28     ` Jason A. Donenfeld
2022-03-01 17:17     ` Michael S. Tsirkin
2022-03-01 17:17       ` Michael S. Tsirkin
2022-03-01 18:37       ` Jason A. Donenfeld
2022-03-01 18:37         ` Jason A. Donenfeld
2022-03-02  7:42         ` Michael S. Tsirkin
2022-03-02  7:42           ` Michael S. Tsirkin
2022-03-02  7:48           ` Michael S. Tsirkin
2022-03-02  7:48             ` Michael S. Tsirkin
2022-03-02  8:30         ` Michael S. Tsirkin [this message]
2022-03-02  8:30           ` Michael S. Tsirkin
2022-03-02 11:26           ` Jason A. Donenfeld
2022-03-02 11:26             ` Jason A. Donenfeld
2022-03-02 12:58             ` Michael S. Tsirkin
2022-03-02 12:58               ` Michael S. Tsirkin
2022-03-02 13:55               ` Jason A. Donenfeld
2022-03-02 13:55                 ` Jason A. Donenfeld
2022-03-02 14:46                 ` Michael S. Tsirkin
2022-03-02 14:46                   ` Michael S. Tsirkin
2022-03-02 15:14                   ` Jason A. Donenfeld
2022-03-02 15:14                     ` Jason A. Donenfeld
2022-03-02 15:20                     ` Michael S. Tsirkin
2022-03-02 15:20                       ` Michael S. Tsirkin
2022-03-02 15:36                       ` Jason A. Donenfeld
2022-03-02 15:36                         ` Jason A. Donenfeld
2022-03-02 16:22                         ` Michael S. Tsirkin
2022-03-02 16:22                           ` Michael S. Tsirkin
2022-03-02 16:32                           ` Jason A. Donenfeld
2022-03-02 16:32                             ` Jason A. Donenfeld
2022-03-02 17:27                             ` Michael S. Tsirkin
2022-03-02 17:27                               ` Michael S. Tsirkin
2022-03-03 13:07                             ` Michael S. Tsirkin
2022-03-03 13:07                               ` Michael S. Tsirkin
2022-03-02 16:29                         ` Michael S. Tsirkin
2022-03-02 16:29                           ` Michael S. Tsirkin
2022-03-01 16:21 ` Michael S. Tsirkin
2022-03-01 16:21   ` Michael S. Tsirkin
2022-03-01 16:35   ` Jason A. Donenfeld
2022-03-01 16:35     ` Jason A. Donenfeld
2022-03-01 18:01 ` Greg KH
2022-03-01 18:01   ` Greg KH
2022-03-01 18:24   ` Jason A. Donenfeld
2022-03-01 18:24     ` Jason A. Donenfeld
2022-03-01 19:41     ` Greg KH
2022-03-01 19:41       ` Greg KH
2022-03-01 23:12       ` Jason A. Donenfeld
2022-03-01 23:12         ` Jason A. Donenfeld
2022-03-02 14:35 ` Jason A. Donenfeld
2022-03-09 10:10 ` Alexander Graf
2022-03-09 22:02   ` Jason A. Donenfeld
2022-03-09 22:02     ` Jason A. Donenfeld
2022-03-10 11:18     ` Alexander Graf
2022-03-20 22:53       ` Michael S. Tsirkin
2022-03-20 22:53         ` Michael S. Tsirkin
2022-04-19 15:12       ` Jason A. Donenfeld
2022-04-19 15:12         ` Jason A. Donenfeld
2022-04-19 16:43         ` Michael S. Tsirkin
2022-04-19 16:43           ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220302031738-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=Jason@zx2c4.com \
    --cc=adrian@parity.io \
    --cc=arnd@arndb.de \
    --cc=berrange@redhat.com \
    --cc=colmmacc@amazon.com \
    --cc=graf@amazon.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=len.brown@intel.com \
    --cc=lersek@redhat.com \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux@dominikbrodowski.net \
    --cc=mikelley@microsoft.com \
    --cc=pavel@ucw.cz \
    --cc=qemu-devel@nongnu.org \
    --cc=rafael@kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.