From: "Michael S. Tsirkin" <mst@redhat.com>
To: qemu-devel@nongnu.org
Cc: Michal Privoznik <mprivozn@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	David Hildenbrand <david@redhat.com>
Subject: [PULL v2 36/55] virtio-mem: Support "prealloc=on" option
Date: Fri, 7 Jan 2022 06:04:52 -0500
Message-ID: <20220107102526.39238-37-mst@redhat.com>
In-Reply-To: <20220107102526.39238-1-mst@redhat.com>

From: David Hildenbrand <david@redhat.com>

For scarce memory resources, such as hugetlb, we want to be able to
preallocate the memory so that the VM does not crash later on access.
Otherwise, a simple user error could easily exhaust such resources and
crash the VM -- pretty much undesired.

For ordinary memory devices, such as DIMMs, we preallocate memory via the
memory backend for such use cases; however, with virtio-mem we're dealing
with sparse memory backends; preallocating the whole memory backend
destroys the whole purpose of virtio-mem.

Instead, we want to preallocate memory dynamically, when actually exposing
it to the VM. If preallocation fails, plugging the memory fails gracefully
and the user is warned.
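
The plug path described above can be sketched in isolation as follows. This
is only an illustrative model, not the QEMU code: fake_prealloc(),
fake_discard(), the page counters and the notifier_fails flag are
hypothetical stand-ins for os_mem_prealloc(), ram_block_discard_range() and
virtio_mem_notify_plug(). It shows the ordering (preallocate first, then
notify), the warn-once behaviour, and the discard on any failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of a scarce page pool (e.g. free huge pages). */
static uint64_t pages_available = 4;
static uint64_t pages_in_use;
static int warnings_printed;

/* Stand-in for os_mem_prealloc(): fails if the pool is exhausted. */
static int fake_prealloc(uint64_t pages)
{
    if (pages > pages_available - pages_in_use) {
        return -ENOMEM;
    }
    pages_in_use += pages;
    return 0;
}

/* Stand-in for ram_block_discard_range(): give populated pages back. */
static void fake_discard(uint64_t pages)
{
    assert(pages <= pages_in_use);
    pages_in_use -= pages;
}

/*
 * Model of plugging one block: preallocate (if enabled), then notify.
 * On any failure, discard whatever got populated and report -EBUSY.
 */
int plug_block(uint64_t pages, bool prealloc, bool notifier_fails)
{
    uint64_t populated = 0;
    int ret = 0;

    if (prealloc) {
        if (fake_prealloc(pages)) {
            static bool warned;

            /* Warn only once to avoid flooding the log. */
            if (!warned) {
                fprintf(stderr, "warning: preallocation failed\n");
                warnings_printed++;
                warned = true;
            }
            ret = -EBUSY;
        } else {
            populated = pages;
        }
    }
    if (!ret && notifier_fails) {
        /* Stand-in for a failing virtio_mem_notify_plug(). */
        ret = -EBUSY;
    }
    if (ret) {
        fake_discard(populated);
        return -EBUSY;
    }
    return 0;
}
```

In this model, a plug request that exceeds the remaining pool returns -EBUSY
and leaves the pool untouched, so a later, smaller request can still succeed.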

A common use case for hugetlb will be using "reserve=off,prealloc=off" for
the memory backend and "prealloc=on" for the virtio-mem device. This
way, no huge pages will be reserved for the process, but we can recover
if there are no actual huge pages when plugging memory. Libvirt is
already prepared for this.

Note that preallocation cannot protect from the OOM killer -- which
holds true for any kind of preallocation in QEMU. It's primarily useful
only for scarce memory resources such as hugetlb, or shared file-backed
memory. It's of little use for ordinary anonymous memory that can be
swapped, KSM merged, ... but we won't forbid it.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211217134611.31172-9-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/virtio-mem.h |  4 ++++
 hw/virtio/virtio-mem.c         | 39 ++++++++++++++++++++++++++++++----
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
index a5dd6a493b..0ac7bcb3b6 100644
--- a/include/hw/virtio/virtio-mem.h
+++ b/include/hw/virtio/virtio-mem.h
@@ -30,6 +30,7 @@ OBJECT_DECLARE_TYPE(VirtIOMEM, VirtIOMEMClass,
 #define VIRTIO_MEM_REQUESTED_SIZE_PROP "requested-size"
 #define VIRTIO_MEM_BLOCK_SIZE_PROP "block-size"
 #define VIRTIO_MEM_ADDR_PROP "memaddr"
+#define VIRTIO_MEM_PREALLOC_PROP "prealloc"
 
 struct VirtIOMEM {
     VirtIODevice parent_obj;
@@ -62,6 +63,9 @@ struct VirtIOMEM {
     /* block size and alignment */
     uint64_t block_size;
 
+    /* whether to prealloc memory when plugging new blocks */
+    bool prealloc;
+
     /* notifiers to notify when "size" changes */
     NotifierList size_change_notifiers;
 
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 341c3fa2c1..ab975ff566 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -429,10 +429,40 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa,
             return -EBUSY;
         }
         virtio_mem_notify_unplug(vmem, offset, size);
-    } else if (virtio_mem_notify_plug(vmem, offset, size)) {
-        /* Could be a mapping attempt resulted in memory getting populated. */
-        ram_block_discard_range(vmem->memdev->mr.ram_block, offset, size);
-        return -EBUSY;
+    } else {
+        int ret = 0;
+
+        if (vmem->prealloc) {
+            void *area = memory_region_get_ram_ptr(&vmem->memdev->mr) + offset;
+            int fd = memory_region_get_fd(&vmem->memdev->mr);
+            Error *local_err = NULL;
+
+            os_mem_prealloc(fd, area, size, 1, &local_err);
+            if (local_err) {
+                static bool warned;
+
+                /*
+                 * Warn only once, we don't want to fill the log with these
+                 * warnings.
+                 */
+                if (!warned) {
+                    warn_report_err(local_err);
+                    warned = true;
+                } else {
+                    error_free(local_err);
+                }
+                ret = -EBUSY;
+            }
+        }
+        if (!ret) {
+            ret = virtio_mem_notify_plug(vmem, offset, size);
+        }
+
+        if (ret) {
+            /* Could be preallocation or a notifier populated memory. */
+            ram_block_discard_range(vmem->memdev->mr.ram_block, offset, size);
+            return -EBUSY;
+        }
     }
     virtio_mem_set_bitmap(vmem, start_gpa, size, plug);
     return 0;
@@ -1108,6 +1138,7 @@ static void virtio_mem_instance_init(Object *obj)
 static Property virtio_mem_properties[] = {
     DEFINE_PROP_UINT64(VIRTIO_MEM_ADDR_PROP, VirtIOMEM, addr, 0),
     DEFINE_PROP_UINT32(VIRTIO_MEM_NODE_PROP, VirtIOMEM, node, 0),
+    DEFINE_PROP_BOOL(VIRTIO_MEM_PREALLOC_PROP, VirtIOMEM, prealloc, false),
     DEFINE_PROP_LINK(VIRTIO_MEM_MEMDEV_PROP, VirtIOMEM, memdev,
                      TYPE_MEMORY_BACKEND, HostMemoryBackend *),
     DEFINE_PROP_END_OF_LIST(),
-- 
MST


