qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Michael Roth <michael.roth@amd.com>
To: <qemu-devel@nongnu.org>
Cc: qemu-stable@nongnu.org, "David Hildenbrand" <david@redhat.com>,
	"Wei Wang" <wei.w.wang@intel.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Alexander Duyck" <alexander.duyck@gmail.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Peter Xu" <peterx@redhat.com>
Subject: [PATCH 42/64] virtio-balloon: don't start free page hinting if postcopy is possible
Date: Tue, 19 Oct 2021 09:09:22 -0500	[thread overview]
Message-ID: <20211019140944.152419-43-michael.roth@amd.com> (raw)
In-Reply-To: <20211019140944.152419-1-michael.roth@amd.com>

From: David Hildenbrand <david@redhat.com>

Postcopy never worked properly with 'free-page-hint=on', as there are
at least two issues:

1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE
   and consequently won't release free pages back to the OS once
   migration finishes.

   The issue is that for postcopy, we won't do a final bitmap sync while
   the guest is stopped on the source and
   virtio_balloon_free_page_hint_notify() will only call
   virtio_balloon_free_page_done() on the source during
   PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to
   the destination.

2) Once the VM touches a page on the destination that has been excluded
   from migration on the source via qemu_guest_free_page_hint() while
   postcopy is active, that thread will stall until postcopy finishes
   and all threads are woken up. (with older Linux kernels that won't
   retry faults when woken up via userfaultfd, we might actually get a
   SEGFAULT)

   The issue is that the source will refuse to migrate any pages that
   are not marked as dirty in the dirty bmap -- for example, because the
   page might just have been sent. Consequently, the faulting thread will
   stall, waiting for the page to be migrated -- which could take quite
   a while and result in guest OS issues.

While we could fix 1) comparatively easily, 2) is harder to get right and
might require more involved RAM migration changes on source and destination
[1].

As it never worked properly, let's not start free page hinting in the
precopy notifier if the postcopy migration capability was enabled to fix
it easily. Capabilities cannot be enabled once migration is already
running.

Note 1: in the future we might either adjust migration code on the source
        to track pages that have actually been sent or adjust
        migration code on source and destination  to eventually send
        pages multiple times from the source and and deal with pages
        that are sent multiple times on the destination.

Note 2: virtio-mem has similar issues, however, access to "unplugged"
        memory by the guest is very rare and we would have to be very
        lucky for it to happen during migration. The spec states
        "The driver SHOULD NOT read from unplugged memory blocks ..."
        and "The driver MUST NOT write to unplugged memory blocks".
        virtio-mem will move away from virtio_balloon_free_page_done()
        soon and handle this case explicitly on the destination.

[1] https://lkml.kernel.org/r/e79fd18c-aa62-c1d8-c7f3-ba3fc2c25fc8@redhat.com

Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Cc: qemu-stable@nongnu.org
Cc: Wei Wang <wei.w.wang@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210708095339.20274-2-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit fd51e54fa10221e5a8add894c38cc1cf199f4bc4)
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 hw/virtio/virtio-balloon.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index d120bf8f43..4cdbe6b540 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -30,6 +30,7 @@
 #include "trace.h"
 #include "qemu/error-report.h"
 #include "migration/misc.h"
+#include "migration/migration.h"
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
@@ -662,6 +663,18 @@ virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data)
         return 0;
     }
 
+    /*
+     * Pages hinted via qemu_guest_free_page_hint() are cleared from the dirty
+     * bitmap and will not get migrated, especially also not when the postcopy
+     * destination starts using them and requests migration from the source; the
+     * faulting thread will stall until postcopy migration finishes and
+     * all threads are woken up. Let's not start free page hinting if postcopy
+     * is possible.
+     */
+    if (migrate_postcopy_ram()) {
+        return 0;
+    }
+
     switch (pnd->reason) {
     case PRECOPY_NOTIFY_SETUP:
         precopy_enable_free_page_optimization();
-- 
2.25.1



  parent reply	other threads:[~2021-10-19 14:53 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-19 14:08 [PATCH 00/64] Patch Round-up for stable 6.0.1, freeze on 2021-10-26 Michael Roth
2021-10-19 14:08 ` [PATCH 01/64] multi-process: Initialize variables declared with g_auto* Michael Roth
2021-10-19 14:08 ` [PATCH 02/64] linux-user/aarch64: Enable hwcap for RND, BTI, and MTE Michael Roth
2021-10-19 14:08 ` [PATCH 03/64] docs/system: Document the removal of "compat" property for POWER CPUs Michael Roth
2021-10-19 14:08 ` [PATCH 04/64] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB Michael Roth
2021-10-19 14:08 ` [PATCH 05/64] migration/rdma: Fix cm_event used before being initialized Michael Roth
2021-10-19 14:08 ` [PATCH 06/64] target/i386: Exit tb after wrmsr Michael Roth
2021-10-19 14:08 ` [PATCH 07/64] target/ppc: Fix load endianness for lxvwsx/lxvdsx Michael Roth
2021-10-19 14:08 ` [PATCH 08/64] vl: allow not specifying size in -m when using -M memory-backend Michael Roth
2021-10-19 14:08 ` [PATCH 09/64] target/xtensa: fix access ring in l32ex Michael Roth
2021-10-19 14:08 ` [PATCH 10/64] qemu-option: support accept-any QemuOptsList in qemu_opts_absorb_qdict Michael Roth
2021-10-19 14:08 ` [PATCH 11/64] qemu-config: load modules when instantiating option groups Michael Roth
2021-10-19 14:08 ` [PATCH 12/64] qemu-config: parse configuration files to a QDict Michael Roth
2021-10-19 14:08 ` [PATCH 13/64] vl: plumb keyval-based options into -readconfig Michael Roth
2021-10-19 14:08 ` [PATCH 14/64] vl: plug -object back " Michael Roth
2021-10-19 14:08 ` [PATCH 15/64] sockets: update SOCKET_ADDRESS_TYPE_FD listen(2) backlog Michael Roth
2021-10-19 14:08 ` [PATCH 16/64] hmp: Fix loadvm to resume the VM on success instead of failure Michael Roth
2021-10-19 14:08 ` [PATCH 17/64] configure: fix detection of gdbus-codegen Michael Roth
2021-10-19 14:08 ` [PATCH 18/64] vhost-vdpa: don't initialize backend_features Michael Roth
2021-10-19 14:08 ` [PATCH 19/64] esp: only assert INTR_DC interrupt flag if selection fails Michael Roth
2021-10-19 14:09 ` [PATCH 20/64] esp: only set ESP_RSEQ at the start of the select sequence Michael Roth
2021-10-19 14:09 ` [PATCH 21/64] runstate: Initialize Error * to NULL Michael Roth
2021-10-19 14:09 ` [PATCH 22/64] vfio: Fix unregister SaveVMHandler in vfio_migration_finalize Michael Roth
2021-10-19 14:09 ` [PATCH 23/64] vl: Fix an assert failure in error path Michael Roth
2021-10-19 14:09 ` [PATCH 24/64] tcg/sparc: Fix temp_allocate_frame vs sparc stack bias Michael Roth
2021-10-19 14:09 ` [PATCH 25/64] tcg: Allocate sufficient storage in temp_allocate_frame Michael Roth
2021-10-19 14:09 ` [PATCH 26/64] hw/pci-host/q35: Ignore write of reserved PCIEXBAR LENGTH field Michael Roth
2021-10-19 14:09 ` [PATCH 27/64] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device Michael Roth
2021-10-19 14:09 ` [PATCH 28/64] crypto/tlscreds: Introduce qcrypto_tls_creds_check_endpoint() helper Michael Roth
2021-10-19 14:09 ` [PATCH 29/64] block/nbd: Use qcrypto_tls_creds_check_endpoint() Michael Roth
2021-10-19 14:09 ` [PATCH 30/64] qemu-nbd: " Michael Roth
2021-10-19 14:09 ` [PATCH 31/64] chardev/socket: " Michael Roth
2021-10-19 14:09 ` [PATCH 32/64] migration/tls: " Michael Roth
2021-10-19 14:09 ` [PATCH 33/64] ui/vnc: " Michael Roth
2021-10-19 14:09 ` [PATCH 34/64] crypto: Make QCryptoTLSCreds* structures private Michael Roth
2021-10-19 14:09 ` [PATCH 35/64] yank: Unregister function when using TLS migration Michael Roth
2021-10-19 14:09 ` [PATCH 36/64] tests: acpi: prepare for changing DSDT tables Michael Roth
2021-10-19 14:09 ` [PATCH 37/64] acpi: pc: revert back to v5.2 PCI slot enumeration Michael Roth
2021-10-19 14:09 ` [PATCH 38/64] tests: acpi: pc: update expected DSDT blobs Michael Roth
2021-10-19 14:09 ` [PATCH 39/64] hw/block/nvme: align with existing style Michael Roth
2021-10-19 14:09 ` [PATCH 40/64] hw/nvme: fix missing check for PMR capability Michael Roth
2021-10-19 14:09 ` [PATCH 41/64] hw/nvme: fix pin-based interrupt behavior (again) Michael Roth
2021-10-19 14:09 ` Michael Roth [this message]
2021-10-19 14:09 ` [PATCH 43/64] hw/net/can: sja1000 fix buff2frame_bas and buff2frame_pel when dlc is out of std CAN 8 bytes Michael Roth
2021-10-19 14:09 ` [PATCH 44/64] hw/sd/sdcard: Document out-of-range addresses for SEND_WRITE_PROT Michael Roth
2021-10-19 14:09 ` [PATCH 45/64] hw/sd/sdcard: Fix assertion accessing out-of-range addresses with CMD30 Michael Roth
2021-10-19 14:09 ` [PATCH 46/64] audio: Never send migration section Michael Roth
2021-10-19 14:09 ` [PATCH 47/64] target/arm: Don't skip M-profile reset entirely in user mode Michael Roth
2021-10-19 14:09 ` [PATCH 48/64] virtio-net: fix use after unmap/free for sg Michael Roth
2021-10-19 14:09 ` [PATCH 49/64] qemu-nbd: Change default cache mode to writeback Michael Roth
2021-10-19 14:09 ` [PATCH 50/64] hmp: Unbreak "change vnc" Michael Roth
2021-10-19 14:09 ` [PATCH 51/64] virtio-mem-pci: Fix memory leak when creating MEMORY_DEVICE_SIZE_CHANGE event Michael Roth
2021-10-19 14:09 ` [PATCH 52/64] uas: add stream number sanity checks Michael Roth
2021-10-19 14:09 ` [PATCH 53/64] usb/redir: avoid dynamic stack allocation (CVE-2021-3527) Michael Roth
2021-10-19 14:09 ` [PATCH 54/64] usb: limit combined packets to 1 MiB (CVE-2021-3527) Michael Roth
2021-10-19 14:09 ` [PATCH 55/64] vhost-user-gpu: fix memory disclosure in virgl_cmd_get_capset_info (CVE-2021-3545) Michael Roth
2021-10-19 14:09 ` [PATCH 56/64] vhost-user-gpu: fix resource leak in 'vg_resource_create_2d' (CVE-2021-3544) Michael Roth
2021-10-19 14:09 ` [PATCH 57/64] vhost-user-gpu: fix memory leak in vg_resource_attach_backing (CVE-2021-3544) Michael Roth
2021-10-19 14:09 ` [PATCH 58/64] vhost-user-gpu: fix memory leak while calling 'vg_resource_unref' (CVE-2021-3544) Michael Roth
2021-10-19 14:09 ` [PATCH 59/64] vhost-user-gpu: fix memory leak in 'virgl_cmd_resource_unref' (CVE-2021-3544) Michael Roth
2021-10-19 14:09 ` [PATCH 60/64] vhost-user-gpu: fix memory leak in 'virgl_resource_attach_backing' (CVE-2021-3544) Michael Roth
2021-10-19 14:09 ` [PATCH 61/64] vhost-user-gpu: fix OOB write in 'virgl_cmd_get_capset' (CVE-2021-3546) Michael Roth
2021-10-19 14:09 ` [PATCH 62/64] hw/rdma: Fix possible mremap overflow in the pvrdma device (CVE-2021-3582) Michael Roth
2021-10-19 14:09 ` [PATCH 63/64] pvrdma: Ensure correct input on ring init (CVE-2021-3607) Michael Roth
2021-10-19 14:09 ` [PATCH 64/64] pvrdma: Fix the ring init error flow (CVE-2021-3608) Michael Roth
2021-10-19 14:43 ` [PATCH 00/64] Patch Round-up for stable 6.0.1, freeze on 2021-10-26 Ani Sinha
2021-10-19 14:45   ` Michael S. Tsirkin
2021-10-19 18:22   ` Michael Roth
2021-10-19 23:05     ` Ani Sinha
2021-10-19 14:52 ` Christian Schoenebeck
2021-10-19 15:26   ` Greg Kurz
2021-10-19 15:37     ` Christian Schoenebeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211019140944.152419-43-michael.roth@amd.com \
    --to=michael.roth@amd.com \
    --cc=alexander.duyck@gmail.com \
    --cc=david@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=mst@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-stable@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=wei.w.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).