qemu-devel.nongnu.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: qemu-devel@nongnu.org
Cc: Michal Privoznik <mprivozn@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	David Hildenbrand <david@redhat.com>
Subject: [PULL v3 40/55] virtio-mem: Support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE
Date: Fri, 7 Jan 2022 20:05:43 -0500
Message-ID: <20220108003423.15830-41-mst@redhat.com>
In-Reply-To: <20220108003423.15830-1-mst@redhat.com>

From: David Hildenbrand <david@redhat.com>

With VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we signal the VM that reading
unplugged memory is not supported. We have to fail feature negotiation
in case the guest does not support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE.

First, VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE is required to cleanly handle
memory backends (or architectures) that lack support for the shared
zeropage in the hypervisor. Without the shared zeropage, even reading an
unpopulated virtual memory location can populate real memory and
consequently consume memory in the hypervisor. We have a guaranteed shared
zeropage only on MAP_PRIVATE anonymous memory.

Second, we want VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE to be the default
long-term, as even populating the shared zeropage can be problematic: for
example, without THP support (possible) or without support for the shared
huge zeropage with THP (unlikely), the PTE page tables that hold the shared
zeropage entries can consume a significant amount of memory that cannot be
reclaimed easily.

Third, other optimizations and features (e.g., protection of unplugged
memory, reducing the total memory slot size and bitmap sizes) will
require VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE.

We really only support x86 targets with virtio-mem for now (and
Linux similarly only supports x86), but that might change soon, so prepare
for different targets already.

Add a new "unplugged-inaccessible" tristate property for x86 targets:
- "off" will keep VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE unset and legacy
  guests working.
- "on" will set VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE and stop legacy guests
  from using the device.
- "auto" selects the default based on support for the shared zeropage.

Warn if the property is set to "off" although the memory backend lacks
support for the shared zeropage.

For existing compat machines, the property will default to "off", so as
not to change the behavior, while possibly warning about a problematic setup.
Short-term, we'll set the property default to "auto" for new QEMU machines.
Mid-term, we'll set the property default to "on" for new QEMU machines.
Long-term, we'll deprecate the parameter and disallow legacy
guests completely.

The property has to match on the migration source and destination. "auto"
will result in the same VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE setting as long
as the QEMU command lines (esp. memdev) match, so "auto" is good enough
for migration purposes and the parameter doesn't have to be migrated
explicitly.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211217134039.29670-3-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/virtio-mem.h |  8 +++++
 hw/virtio/virtio-mem.c         | 63 ++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
index 0ac7bcb3b6..7745cfc1a3 100644
--- a/include/hw/virtio/virtio-mem.h
+++ b/include/hw/virtio/virtio-mem.h
@@ -30,6 +30,7 @@ OBJECT_DECLARE_TYPE(VirtIOMEM, VirtIOMEMClass,
 #define VIRTIO_MEM_REQUESTED_SIZE_PROP "requested-size"
 #define VIRTIO_MEM_BLOCK_SIZE_PROP "block-size"
 #define VIRTIO_MEM_ADDR_PROP "memaddr"
+#define VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP "unplugged-inaccessible"
 #define VIRTIO_MEM_PREALLOC_PROP "prealloc"
 
 struct VirtIOMEM {
@@ -63,6 +64,13 @@ struct VirtIOMEM {
     /* block size and alignment */
     uint64_t block_size;
 
+    /*
+     * Whether we indicate VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE to the guest.
+     * For !x86 targets this will always be "on" and consequently indicate
+     * VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE.
+     */
+    OnOffAuto unplugged_inaccessible;
+
     /* whether to prealloc memory when plugging new blocks */
     bool prealloc;
 
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index ab975ff566..fb6687d4c7 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -32,6 +32,14 @@
 #include CONFIG_DEVICES
 #include "trace.h"
 
+/*
+ * We only had legacy x86 guests that did not support
+ * VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE. Other targets don't have legacy guests.
+ */
+#if defined(TARGET_X86_64) || defined(TARGET_I386)
+#define VIRTIO_MEM_HAS_LEGACY_GUESTS
+#endif
+
 /*
  * Let's not allow blocks smaller than 1 MiB, for example, to keep the tracking
  * bitmap small.
@@ -110,6 +118,19 @@ static uint64_t virtio_mem_default_block_size(RAMBlock *rb)
     return MAX(page_size, VIRTIO_MEM_MIN_BLOCK_SIZE);
 }
 
+#if defined(VIRTIO_MEM_HAS_LEGACY_GUESTS)
+static bool virtio_mem_has_shared_zeropage(RAMBlock *rb)
+{
+    /*
+     * We only have a guaranteed shared zeropage on ordinary MAP_PRIVATE
+     * anonymous RAM. In any other case, reading unplugged memory *can*
+     * populate a fresh page, consuming actual memory.
+     */
+    return !qemu_ram_is_shared(rb) && rb->fd < 0 &&
+           qemu_ram_pagesize(rb) == qemu_real_host_page_size;
+}
+#endif /* VIRTIO_MEM_HAS_LEGACY_GUESTS */
+
 /*
  * Size the usable region bigger than the requested size if possible. Esp.
  * Linux guests will only add (aligned) memory blocks in case they fully
@@ -683,15 +704,29 @@ static uint64_t virtio_mem_get_features(VirtIODevice *vdev, uint64_t features,
                                         Error **errp)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
+    VirtIOMEM *vmem = VIRTIO_MEM(vdev);
 
     if (ms->numa_state) {
 #if defined(CONFIG_ACPI)
         virtio_add_feature(&features, VIRTIO_MEM_F_ACPI_PXM);
 #endif
     }
+    assert(vmem->unplugged_inaccessible != ON_OFF_AUTO_AUTO);
+    if (vmem->unplugged_inaccessible == ON_OFF_AUTO_ON) {
+        virtio_add_feature(&features, VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE);
+    }
     return features;
 }
 
+static int virtio_mem_validate_features(VirtIODevice *vdev)
+{
+    if (virtio_host_has_feature(vdev, VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE) &&
+        !virtio_vdev_has_feature(vdev, VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE)) {
+        return -EFAULT;
+    }
+    return 0;
+}
+
 static void virtio_mem_system_reset(void *opaque)
 {
     VirtIOMEM *vmem = VIRTIO_MEM(opaque);
@@ -746,6 +781,29 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     rb = vmem->memdev->mr.ram_block;
     page_size = qemu_ram_pagesize(rb);
 
+#if defined(VIRTIO_MEM_HAS_LEGACY_GUESTS)
+    switch (vmem->unplugged_inaccessible) {
+    case ON_OFF_AUTO_AUTO:
+        if (virtio_mem_has_shared_zeropage(rb)) {
+            vmem->unplugged_inaccessible = ON_OFF_AUTO_OFF;
+        } else {
+            vmem->unplugged_inaccessible = ON_OFF_AUTO_ON;
+        }
+        break;
+    case ON_OFF_AUTO_OFF:
+        if (!virtio_mem_has_shared_zeropage(rb)) {
+            warn_report("'%s' property set to 'off' with a memdev that does"
+                        " not support the shared zeropage.",
+                        VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP);
+        }
+        break;
+    default:
+        break;
+    }
+#else /* VIRTIO_MEM_HAS_LEGACY_GUESTS */
+    vmem->unplugged_inaccessible = ON_OFF_AUTO_ON;
+#endif /* VIRTIO_MEM_HAS_LEGACY_GUESTS */
+
     /*
      * If the block size wasn't configured by the user, use a sane default. This
      * allows using hugetlbfs backends of any page size without manual
@@ -1141,6 +1199,10 @@ static Property virtio_mem_properties[] = {
     DEFINE_PROP_BOOL(VIRTIO_MEM_PREALLOC_PROP, VirtIOMEM, prealloc, false),
     DEFINE_PROP_LINK(VIRTIO_MEM_MEMDEV_PROP, VirtIOMEM, memdev,
                      TYPE_MEMORY_BACKEND, HostMemoryBackend *),
+#if defined(VIRTIO_MEM_HAS_LEGACY_GUESTS)
+    DEFINE_PROP_ON_OFF_AUTO(VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP, VirtIOMEM,
+                            unplugged_inaccessible, ON_OFF_AUTO_OFF),
+#endif
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1279,6 +1341,7 @@ static void virtio_mem_class_init(ObjectClass *klass, void *data)
     vdc->unrealize = virtio_mem_device_unrealize;
     vdc->get_config = virtio_mem_get_config;
     vdc->get_features = virtio_mem_get_features;
+    vdc->validate_features = virtio_mem_validate_features;
     vdc->vmsd = &vmstate_virtio_mem_device;
 
     vmc->fill_device_info = virtio_mem_fill_device_info;
-- 
MST



Thread overview: 65+ messages
2022-01-08  1:03 [PULL v3 00/55] virtio,pci,pc: features,fixes,cleanups Michael S. Tsirkin
2022-01-08  1:03 ` [PULL v3 01/55] virtio-mem: Don't skip alignment checks when warning about block size Michael S. Tsirkin
2022-01-08  1:03 ` [PULL v3 02/55] acpi: validate hotplug selector on access Michael S. Tsirkin
2022-01-08  1:03 ` [PULL v3 03/55] virtio: introduce macro IRTIO_CONFIG_IRQ_IDX Michael S. Tsirkin
2022-01-08  1:03 ` [PULL v3 04/55] virtio-pci: decouple notifier from interrupt process Michael S. Tsirkin
2022-01-08  1:03 ` [PULL v3 05/55] virtio-pci: decouple the single vector from the " Michael S. Tsirkin
2022-01-08  1:03 ` [PULL v3 06/55] vhost: introduce new VhostOps vhost_set_config_call Michael S. Tsirkin
2022-01-08  1:03 ` [PULL v3 07/55] vhost-vdpa: add support for config interrupt Michael S. Tsirkin
2022-01-08  1:03 ` [PULL v3 08/55] virtio: add support for configure interrupt Michael S. Tsirkin
2022-01-08  1:03 ` [PULL v3 09/55] vhost: " Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 10/55] virtio-net: " Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 11/55] virtio-mmio: " Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 12/55] virtio-pci: " Michael S. Tsirkin
2022-01-09  6:17   ` Volker Rümelin
2022-01-09 16:11     ` Michael S. Tsirkin
2022-01-09 17:52       ` Volker Rümelin
2022-01-09 18:01         ` Michael S. Tsirkin
2022-01-09 18:54           ` Volker Rümelin
2022-01-09 20:19             ` Volker Rümelin
2022-01-09 13:33   ` Cédric Le Goater
2022-01-09 15:54     ` Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 13/55] trace-events,pci: unify trace events format Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 14/55] vhost-user-blk: reconnect on any error during realize Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 15/55] chardev/char-socket: tcp_chr_recv: don't clobber errno Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 16/55] chardev/char-socket: tcp_chr_sync_read: " Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 17/55] vhost-backend: avoid overflow on memslots_limit Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 18/55] vhost-backend: stick to -errno error return convention Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 19/55] vhost-vdpa: " Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 20/55] vhost-user: " Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 21/55] vhost: " Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 22/55] vhost-user-blk: propagate error return from generic vhost Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 23/55] pci: Export the pci_intx() function Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 24/55] pcie_aer: Don't trigger a LSI if none are defined Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 25/55] smbios: Rename SMBIOS_ENTRY_POINT_* enums Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 26/55] hw/smbios: Use qapi for SmbiosEntryPointType Michael S. Tsirkin
2022-01-08  1:04 ` [PULL v3 27/55] hw/i386: expose a "smbios-entry-point-type" PC machine property Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 28/55] hw/vhost-user-blk: turn on VIRTIO_BLK_F_SIZE_MAX feature for virtio blk device Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 29/55] util/oslib-posix: Let touch_all_pages() return an error Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 30/55] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 31/55] util/oslib-posix: Introduce and use MemsetContext for touch_all_pages() Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 32/55] util/oslib-posix: Don't create too many threads with small memory or little pages Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 33/55] util/oslib-posix: Avoid creating a single thread with MADV_POPULATE_WRITE Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 34/55] util/oslib-posix: Support concurrent os_mem_prealloc() invocation Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 35/55] util/oslib-posix: Forward SIGBUS to MCE handler under Linux Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 36/55] virtio-mem: Support "prealloc=on" option Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 37/55] virtio: signal after wrapping packed used_idx Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 38/55] MAINTAINERS: Add a separate entry for acpi/VIOT tables Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 39/55] linux-headers: sync VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE Michael S. Tsirkin
2022-01-08  1:05 ` Michael S. Tsirkin [this message]
2022-01-08  1:05 ` [PULL v3 41/55] virtio-mem: Set "unplugged-inaccessible=auto" for the 7.0 machine on x86 Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 42/55] intel-iommu: correctly check passthrough during translation Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 43/55] acpi: fix QEMU crash when started with SLIC table Michael S. Tsirkin
2022-01-08  1:05 ` [PULL v3 44/55] tests: acpi: whitelist expected blobs before changing them Michael S. Tsirkin
2022-01-08  1:06 ` [PULL v3 45/55] tests: acpi: add SLIC table test Michael S. Tsirkin
2022-01-08  1:06 ` [PULL v3 46/55] tests: acpi: SLIC: update expected blobs Michael S. Tsirkin
2022-01-08  1:06 ` [PULL v3 47/55] acpihp: simplify acpi_pcihp_disable_root_bus Michael S. Tsirkin
2022-01-08  1:06 ` [PULL v3 48/55] hw/i386/pc: Add missing property descriptions Michael S. Tsirkin
2022-01-08  1:06 ` [PULL v3 49/55] docs: reSTify virtio-balloon-stats documentation and move to docs/interop Michael S. Tsirkin
2022-01-08  1:06 ` [PULL v3 50/55] hw/scsi/vhost-scsi: don't leak vqs on error Michael S. Tsirkin
2022-01-08  1:06 ` [PULL v3 51/55] hw/scsi/vhost-scsi: don't double close vhostfd " Michael S. Tsirkin
2022-01-08  1:06 ` [PULL v3 52/55] virtio/vhost-vsock: don't double close vhostfd, remove redundant cleanup Michael S. Tsirkin
2022-01-08  1:06 ` [PULL v3 53/55] tests: acpi: prepare for updated TPM related tables Michael S. Tsirkin
2022-01-08  1:06 ` [PULL v3 54/55] acpi: tpm: Add missing device identification objects Michael S. Tsirkin
2022-01-08  1:06 ` [PULL v3 55/55] tests: acpi: Add updated TPM related tables Michael S. Tsirkin
2022-01-08  5:32 ` [PULL v3 00/55] virtio,pci,pc: features,fixes,cleanups Richard Henderson