From: "Michael S. Tsirkin" <mst@redhat.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>,
Jann Horn <jannh@google.com>, Willy Tarreau <w@1wt.eu>,
Colm MacCarthaigh <colmmacc@amazon.com>,
"Catangiu, Adrian Costin" <acatan@amazon.com>,
"Theodore Y. Ts'o" <tytso@mit.edu>,
Eric Biggers <ebiggers@kernel.org>,
"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
kernel list <linux-kernel@vger.kernel.org>,
"open list:VIRTIO GPU DRIVER" <virtualization@lists.linux-foundation.org>,
"Graf (AWS), Alexander" <graf@amazon.de>,
"Woodhouse, David" <dwmw@amazon.co.uk>,
bonzini@gnu.org, "Singh, Balbir" <sblbir@amazon.com>,
"Weiss, Radu" <raduweis@amazon.com>,
oridgar@gmail.com, ghammer@redhat.com,
Jonathan Corbet <corbet@lwn.net>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Qemu Developers <qemu-devel@nongnu.org>,
KVM list <kvm@vger.kernel.org>, Michal Hocko <mhocko@kernel.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Pavel Machek <pavel@ucw.cz>,
Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
Date: Mon, 19 Oct 2020 11:00:45 -0400
Message-ID: <20201019105118-mutt-send-email-mst@kernel.org>
In-Reply-To: <CALCETrUeRAhmEFR6EFXz8HzDYd2doZ2TMyZmu1pU_-yAPA6KDw@mail.gmail.com>

On Sun, Oct 18, 2020 at 09:14:00AM -0700, Andy Lutomirski wrote:
> On Sun, Oct 18, 2020 at 8:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Sun, Oct 18, 2020 at 08:54:36AM -0700, Andy Lutomirski wrote:
> > > On Sun, Oct 18, 2020 at 8:52 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Sat, Oct 17, 2020 at 03:24:08PM +0200, Jason A. Donenfeld wrote:
> > > > > 4c. The guest kernel maintains an array of physical addresses that are
> > > > > MADV_WIPEONFORK. The hypervisor knows about this array and its
> > > > > location through whatever protocol, and before resuming a
> > > > > moved/snapshotted/duplicated VM, it takes the responsibility for
> > > > > memzeroing this memory. The huge pro here would be that this
> > > > > eliminates all races, and reduces complexity quite a bit, because the
> > > > > hypervisor can perfectly synchronize its bringup (and SMP bringup)
> > > > > with this, and it can even optimize things like on-disk memory
> > > > > snapshots to simply not write out those pages to disk.
> > > > >
> > > > > A 4c-like approach seems like it'd be a lot of bang for the buck -- we
> > > > > reuse the existing mechanism (MADV_WIPEONFORK), so there's no new
> > > > > userspace API to deal with, and it'd be race free, and eliminate a lot
> > > > > of kernel complexity.
> > > >
> > > > Clearly this has a chance to break applications, right?
> > > > If there's an app that uses this as a non-system-calls way
> > > > to find out whether there was a fork, it will break
> > > > when wipe triggers without a fork ...
> > > > For example, imagine:
> > > >
> > > > MADV_WIPEONFORK
> > > > copy secret data to MADV_DONTFORK
> > > > fork
> > > >
> > > >
> > > > used to work, with this change it gets 0s instead of the secret data.
> > > >
> > > >
> > > > I am also not sure it's wise to expose each guest process
> > > > to the hypervisor like this. E.g. each process needs a
> > > > guest physical address of its own then. This is a finite resource.
> > > >
> > > >
> > > > The mmap interface proposed here is somewhat baroque, but it is
> > > > certainly simple to implement ...
> > >
> > > Wipe on fork/vmgenid/whatever could end up being much more problematic
> > > than it naively appears -- it could be wiped in the middle of a read.
> > > Either the API needs to handle this cleanly, or we need something more
> > > aggressive like signal-on-fork.
> > >
> > > --Andy
> >
> >
> > Right, it's not on fork, it's actually when process is snapshotted.
> >
> > If we assume it's CRIU we care about, then I
> > wonder what's wrong with something like
> > MADV_CHANGEONPTRACE_SEIZE
> > and basically say it's X bytes which change the value...
>
> I feel like we may be approaching this from the wrong end. Rather
> than saying "what data structure can the kernel expose that might
> plausibly be useful", how about we try identifying some specific
> userspace needs and see what a good solution could look like. I can
> identify two major cryptographic use cases:

Well, I'm aware of a non-cryptographic use case:
https://bugzilla.redhat.com/show_bug.cgi?id=1118834
It seems to just ask for a way for the guest to detect that
a VM clone has been triggered.
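
For reference, the syscall-free fork detection pattern mentioned in the
quoted text can be sketched roughly as below. This is only an
illustration, assuming Linux 4.14 or later and a libc that exposes
MADV_WIPEONFORK; the helper names (canary_create, canary_wiped,
demo_fork_detection) are made up for the example and come from no
existing library.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one private anonymous page, mark it
 * MADV_WIPEONFORK and store a nonzero marker in it.
 * Returns NULL if the mapping or the madvise() call fails.
 */
static volatile int *canary_create(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	if (page <= 0)
		return NULL;
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	if (madvise(p, (size_t)page, MADV_WIPEONFORK) != 0) {
		munmap(p, (size_t)page);
		return NULL;
	}
	*(volatile int *)p = 1;	/* the kernel zeroes this in any child */
	return p;
}

/* Nonzero if the marker is gone, which today can only mean "forked". */
static int canary_wiped(const volatile int *canary)
{
	return *canary == 0;
}

/*
 * Fork once. Returns 1 if the child saw a wiped canary while the
 * parent still sees the marker, 0 if not, -1 on any error.
 */
static int demo_fork_detection(void)
{
	volatile int *canary = canary_create();
	int status;
	pid_t pid;

	if (!canary)
		return -1;
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(canary_wiped(canary) ? 0 : 1);
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
	       !canary_wiped(canary);
}
```

If the hypervisor also zeroed such pages on snapshot or clone, as in
the 4c proposal, canary_wiped() could become true in a process that
never forked, which is the breakage described in the quoted text.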
--
MST