qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Steven Sistare <steven.sistare@oracle.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Zeng" <jason.zeng@linux.intel.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Juan Quintela" <quintela@redhat.com>,
	qemu-devel@nongnu.org, "Markus Armbruster" <armbru@redhat.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>
Subject: Re: [PATCH V3 00/22] Live Update
Date: Tue, 18 May 2021 10:57:50 +0100	[thread overview]
Message-ID: <YKOPnkefxgRZ8PoQ@work-vm> (raw)
In-Reply-To: <29356c3a-0b3f-2773-4f9f-18ff4ed4d5bb@oracle.com>

* Steven Sistare (steven.sistare@oracle.com) wrote:
> On 5/14/2021 7:53 AM, Stefan Hajnoczi wrote:
> > On Thu, May 13, 2021 at 04:21:15PM -0400, Steven Sistare wrote:
> >> On 5/12/2021 12:42 PM, Stefan Hajnoczi wrote:
> >>> On Fri, May 07, 2021 at 05:24:58AM -0700, Steve Sistare wrote:
> >>>> Provide the cprsave and cprload commands for live update.  These save and
> >>>> restore VM state, with minimal guest pause time, so that qemu may be updated
> >>>> to a new version in between.
> >>>>
> >>>> cprsave stops the VM and saves vmstate to an ordinary file.  It supports two
> >>>> modes: restart and reboot.  For restart, cprsave exec's the qemu binary (or
> >>>> /usr/bin/qemu-exec if it exists) with the same argv.  qemu restarts in a
> >>>> paused state and waits for the cprload command.
> >>>
> >>> I think cprsave/cprload could be generalized by using QMP to stash the
> >>> file descriptors. The 'getfd' QMP command already exists and QEMU code
> >>> already opens fds passed using this mechanism.
> >>>
> >>> I haven't checked but it may be possible to drop some patches by reusing
> >>> QEMU's monitor file descriptor passing since the code already knows how
> >>> to open from 'getfd' fds.
> >>>
> >>> The reason why using QMP is interesting is because it eliminates the
> >>> need for execve(2). QEMU may be unable to execute a program due to
> >>> chroot, seccomp, etc.
> >>>
> >>> QMP would enable cprsave/cprload to work both with and without
> >>> execve(2).
> >>>
> >>> One tricky thing with this approach might be startup ordering: how to
> >>> get fds via the QMP monitor in the new process before processing the
> >>> entire command-line.
> >>
> >> Early on I experimented with a similar approach.  Old qemu passed descriptors to an
> >> escrow process and exited; new qemu started and retrieved the descriptors from escrow.
> >> vfio mostly worked after I hacked the kernel to suppress the original-pid owner check.
> >> I suspect my recent vfio extensions would smooth the rough edges.
> > 
> > I wonder about the reason for VFIO's pid limitation, maybe because it
> > pins pages from the original process?
> 
> The dma unmap code verifies that the requesting task is the same as the task that mapped
> the pages.  We could add an ioctl that passes ownership to a new task.  We would also need
> to fix locked memory accounting, which is associated with the mm of the original task.

> > Is this VFIO pid limitation the main reason why you chose to make QEMU
> > execve(2) the new binary?
> 
> That is one.  Plus, re-attaching to named shared memory for pc.ram causes the vfio conflict
> errors I mentioned in the previous email.  We would need to suppress redundant dma map calls,
> but allow legitimate dma maps and unmaps in response to the ongoing address space changes and
> diff callbacks caused by some drivers. It would be messy and fragile. In general, it felt like 
> I was working against vfio rather than with it.

OK the weirdness of vfio helps explain a bit about why you're doing it
this way; can you help separate some difference between restart and
reboot for me though:

In 'reboot' mode; where the guest must do suspend in it's drivers, how
much of these vfio requirements are needed?  I guess the memfd use
for the anonymous areas isn't any use for reboot mode.

You mention cprsave calls VFIO_DMA_UNMAP_FLAG_VADDR - after that does
vfio still care about the currently-anonymous areas?

> Another big reason is a requirement to preserve anonymous memory for legacy qemu updates (via
> code injection which I briefly mentioned in KVM forum).  If we extend cpr to allow updates 
> without exec, I still need the exec option.

Can you explain what that code injection mechanism is for those of us
who didn't see that?

Dave

> >> However, the main issue is that guest ram must be backed by named shared memory, and
> >> we would need to add code to support shared memory for all the secondary memory objects.
> >> That makes it less interesting for us at this time; we care about updating legacy qemu 
> >> instances with anonymous guest memory.
> > 
> > Thanks for explaining this more in the other sub-thread. The secondary
> > memory objects you mentioned are relatively small so I don't think
> > saving them in the traditional way is a problem.
> > 
> > Two approaches for zero-copy memory migration fit into QEMU's existing
> > migration infrastructure:
> > 
> > - Marking RAM blocks that are backed by named memory (tmpfs, hugetlbfs,
> >   etc) so they are not saved into the savevm file. The existing --object
> >   memory-backend-file syntax can be used.
> > 
> > - Extending the live migration protocol to detect when file descriptor
> >   passing is available (i.e. UNIX domain socket migration) and using
> >   that for memory-backend-* objects that have fds.
> > 
> > Either of these approaches would handle RAM with existing savevm/migrate
> > commands.
> 
> Yes, but the vfio issues would still need to be solved, and we would need new
> command line options to back existing and future secondary memory objects with 
> named shared memory.
> 
> > The remaining issue is how to migrate VFIO and other file descriptors
> > that cannot be reopened by the new process. As mentioned, QEMU already
> > has file descriptor passing support in the QMP monitor and support for
> > opening passed file descriptors (see qemu_open_internal(),
> > monitor_fd_param(), and socket_get_fd()).
> > 
> > The advantage of integrating live update functionality into the existing
> > savevm/migrate commands is that it will work in more use cases with
> > less code duplication/maintenance/bitrot prevention than the
> > special-case cprsave command in this patch series.
> > 
> > Maybe there is a fundamental technical reason why live update needs to
> > be different from QEMU's existing migration commands but I haven't
> > figured it out yet.
> 
> vfio and anonymous memory.
> 
> Regarding code duplication, I did consider whether to extend the migration
> syntax and implementation versus creating something new.  Those functions
> handle stuff like bdrv snapshot, aio, and migration which are n/a for the cpr
> use case, and the cpr functions handle state that is n/a for the migration case.
> I judged that handling both in the same functions would be less readable and
> maintainable.  After feedback during the V1 review, I simplified the cprsave
> code by by calling qemu_save_device_state, as Xen does, thus eliminating any
> interaction with the migration code.
> 
> Regarding bit rot, I still need to add a cpr test to the test suite, when the 
> review is more complete and folks agree on the final form of the functionality.
> 
> I do like the idea of supporting update without exec, but as a future project, 
> and not at the expense of dropping update with exec.
> 
> - Steve
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



  parent reply	other threads:[~2021-05-18 10:27 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-07 12:24 [PATCH V3 00/22] Live Update Steve Sistare
2021-05-07 12:24 ` [PATCH V3 01/22] as_flat_walk Steve Sistare
2021-05-07 12:25 ` [PATCH V3 02/22] qemu_ram_volatile Steve Sistare
2021-05-07 12:25 ` [PATCH V3 03/22] oslib: qemu_clr_cloexec Steve Sistare
2021-05-07 12:25 ` [PATCH V3 04/22] util: env var helpers Steve Sistare
2021-05-07 12:25 ` [PATCH V3 05/22] machine: memfd-alloc option Steve Sistare
2021-05-07 12:25 ` [PATCH V3 06/22] vl: add helper to request re-exec Steve Sistare
2021-05-07 14:31   ` Eric Blake
2021-05-13 20:19     ` Steven Sistare
2021-05-14  8:18       ` Daniel P. Berrangé
2021-05-12 16:27   ` Stefan Hajnoczi
2021-05-13 20:20     ` Steven Sistare
2021-05-07 12:25 ` [PATCH V3 07/22] cpr Steve Sistare
2021-05-12 16:19   ` Stefan Hajnoczi
2021-05-13 20:21     ` Steven Sistare
2021-05-14 11:28       ` Stefan Hajnoczi
2021-05-14 15:14         ` Steven Sistare
2021-05-18 13:42           ` Stefan Hajnoczi
2021-05-07 12:25 ` [PATCH V3 08/22] cpr: QMP interfaces Steve Sistare
2021-06-04 13:59   ` Eric Blake
2021-06-07 17:19     ` Steven Sistare
2021-05-07 12:25 ` [PATCH V3 09/22] cpr: HMP interfaces Steve Sistare
2021-05-07 12:25 ` [PATCH V3 10/22] pci: export functions for cpr Steve Sistare
2021-05-07 12:25 ` [PATCH V3 11/22] vfio-pci: refactor " Steve Sistare
2021-05-19 22:38   ` Alex Williamson
2021-05-21 13:33     ` Steven Sistare
2021-05-21 21:07       ` Alex Williamson
2021-05-21 21:18         ` Steven Sistare
2021-05-07 12:25 ` [PATCH V3 12/22] vfio-pci: cpr part 1 Steve Sistare
2021-05-21 22:24   ` Alex Williamson
2021-05-24 18:29     ` Steven Sistare
2021-06-11 18:15       ` Steven Sistare
2021-06-11 19:43         ` Steven Sistare
2021-05-07 12:25 ` [PATCH V3 13/22] vfio-pci: cpr part 2 Steve Sistare
2021-05-21 22:24   ` Alex Williamson
2021-05-24 18:31     ` Steven Sistare
2021-05-07 12:25 ` [PATCH V3 14/22] vhost: reset vhost devices upon cprsave Steve Sistare
2021-05-07 12:25 ` [PATCH V3 15/22] hostmem-memfd: cpr support Steve Sistare
2021-05-07 12:25 ` [PATCH V3 16/22] chardev: cpr framework Steve Sistare
2021-05-07 14:33   ` Eric Blake
2021-05-13 20:19     ` Steven Sistare
2021-05-07 12:25 ` [PATCH V3 17/22] chardev: cpr for simple devices Steve Sistare
2021-05-07 12:25 ` [PATCH V3 18/22] chardev: cpr for pty Steve Sistare
2021-05-07 12:25 ` [PATCH V3 19/22] chardev: cpr for sockets Steve Sistare
2021-05-07 12:25 ` [PATCH V3 20/22] cpr: only-cpr-capable option Steve Sistare
2021-05-07 12:25 ` [PATCH V3 21/22] cpr: maintainers Steve Sistare
2021-05-07 12:25 ` [PATCH V3 22/22] simplify savevm Steve Sistare
2021-05-07 13:00 ` [PATCH V3 00/22] Live Update no-reply
2021-05-13 20:42   ` Steven Sistare
2021-05-12 16:42 ` Stefan Hajnoczi
2021-05-13 20:21   ` Steven Sistare
2021-05-14 11:53     ` Stefan Hajnoczi
2021-05-14 15:15       ` Steven Sistare
2021-05-17 11:40         ` Stefan Hajnoczi
2021-05-17 19:10           ` Alex Williamson
2021-05-18 13:39             ` Stefan Hajnoczi
2021-05-18 15:48               ` Steven Sistare
2021-05-18  9:57         ` Dr. David Alan Gilbert [this message]
2021-05-18 16:00           ` Steven Sistare
2021-05-18 19:23             ` Dr. David Alan Gilbert
2021-05-18 20:01               ` Alex Williamson
2021-05-18 20:14               ` Steven Sistare
2021-05-20 13:00                 ` [PATCH V3 00/22] Live Update [reboot] Dr. David Alan Gilbert
2021-05-21 14:55                   ` Steven Sistare
2021-06-15 19:14                     ` Dr. David Alan Gilbert
2021-06-24 15:05                       ` Steven Sistare
2021-07-06 17:31                         ` Steven Sistare
2021-05-20 13:13                 ` [PATCH V3 00/22] Live Update [restart] Dr. David Alan Gilbert
2021-05-21 14:56                   ` Steven Sistare
2021-05-24 10:39                     ` Dr. David Alan Gilbert
2021-06-02 13:51                       ` Steven Sistare
2021-06-03 19:36                         ` Dr. David Alan Gilbert
2021-06-03 20:44                           ` Daniel P. Berrangé
2021-06-07 16:40                             ` [PATCH V3 00/22] Live Update [restart] : exec Steven Sistare
2021-06-14 14:31                               ` Steven Sistare
2021-06-14 14:36                                 ` Daniel P. Berrangé
2021-06-15 19:05                               ` Dr. David Alan Gilbert
2021-06-07 18:08                           ` [PATCH V3 00/22] Live Update [restart] : code replication Steven Sistare
2021-06-14 14:33                             ` Steven Sistare
2021-05-19 16:43 ` [PATCH V3 00/22] Live Update Steven Sistare
2021-06-02 15:19   ` Steven Sistare

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YKOPnkefxgRZ8PoQ@work-vm \
    --to=dgilbert@redhat.com \
    --cc=alex.bennee@linaro.org \
    --cc=alex.williamson@redhat.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=jason.zeng@linux.intel.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=philmd@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=steven.sistare@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).