qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Raphael Norwitz <raphael.norwitz@nutanix.com>
To: mst@redhat.com, qemu-devel@nongnu.org
Cc: raphael.s.norwitz@gmail.com,
	Raphael Norwitz <raphael.norwitz@nutanix.com>
Subject: [RFC PATCH 3/3] Introduce Configurable Number of Memory Slots Exposed by vhost-user:
Date: Mon,  9 Dec 2019 02:00:47 -0500	[thread overview]
Message-ID: <1575874847-5792-4-git-send-email-raphael.norwitz@nutanix.com> (raw)
In-Reply-To: <1575874847-5792-1-git-send-email-raphael.norwitz@nutanix.com>

The current vhost-user implementation in Qemu imposes a limit on the
maximum number of memory slots exposed to a VM using a vhost-user
device. This change provides a new protocol feature
VHOST_USER_F_CONFIGURE_SLOTS which, when enabled, lifts this limit
and allows a VM with a vhost-user device to expose a configurable
number of memory slots, up to the maximum supported by the platform.
Existing backends are unaffected.

This feature works by using three new messages,
VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and
VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the
number of memory slots the backend is willing to accept. Then,
when the memory tables are set or updated, a series of
VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages are sent
to transmit the regions to map and/or unmap instead of trying to
send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE message.

The vhost_user struct maintains a shadow state of the VM’s memory
regions. When the memory tables are modified, the
vhost_user_set_mem_table() function compares the new device memory state
to the shadow state and only sends regions which need to be unmapped or
mapped in. The regions which must be unmapped are sent first, followed
by the new regions to be mapped in. After all the messages have been sent,
the shadow state is set to the current virtual device state.

The current feature implementation does not work with postcopy migration
and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has
also been negotiated.

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
---
 docs/interop/vhost-user.rst |  43 ++++++++
 hw/virtio/vhost-user.c      | 251 ++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 273 insertions(+), 21 deletions(-)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 7827b71..855a072 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -785,6 +785,7 @@ Protocol features
   #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
   #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
   #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
+  #define VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS   13
 
 Master message types
 --------------------
@@ -1190,6 +1191,48 @@ Master message types
   ancillary data. The GPU protocol is used to inform the master of
   rendering state and updates. See vhost-user-gpu.rst for details.
 
+``VHOST_USER_GET_MAX_MEM_SLOTS``
+  :id: 34
+  :equivalent ioctl: N/A
+  :slave payload: u64
+
+  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
+  successfully negotiated, this message is submitted by master to the
+  slave. The slave should return the message with a u64 payload
+  containing the maximum number of memory slots for QEMU to expose to
+  the guest. This message is not supported with postcopy migration or if
+  the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
+
+``VHOST_USER_ADD_MEM_REG``
+  :id: 35
+  :equivalent ioctl: N/A
+  :slave payload: memory region
+
+  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
+  successfully negotiated, this message is submitted by master to the slave.
+  The message payload contains a memory region descriptor struct, describing
+  a region of guest memory which the slave device must map in. When the
+  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully
+  negotiated, along with the VHOST_USER_REM_MEM_REG message, this message is
+  used to set and update the memory tables of the slave device. This message
+  is not supported with postcopy migration or if the
+  VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
+
+``VHOST_USER_REM_MEM_REG``
+  :id: 36
+  :equivalent ioctl: N/A
+  :slave payload: memory region
+
+  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
+  successfully negotiated, this message is submitted by master to the slave.
+  The message payload contains a memory region descriptor struct, describing
+  a region of guest memory which the slave device must unmap. When the
+  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully
+  negotiated, along with the VHOST_USER_ADD_MEM_REG message, this message is
+  used to set and update the memory tables of the slave device. This message
+  is not supported with postcopy migration or if the
+  VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
+
 Slave message types
 -------------------
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 2134e81..3432462 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -35,11 +35,29 @@
 #include <linux/userfaultfd.h>
 #endif
 
-#define VHOST_MEMORY_MAX_NREGIONS    8
+#define VHOST_MEMORY_LEGACY_NREGIONS    8
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
 #define VHOST_USER_SLAVE_MAX_FDS     8
 
 /*
+ * Set maximum number of RAM slots supported to
+ * the maximum number supported by the target
+ * hardware plaform.
+ */
+#if defined(TARGET_X86) || defined(TARGET_X86_64) || \
+    defined(TARGET_ARM) || defined(TARGET_ARM_64)
+#include "hw/acpi/acpi.h"
+#define VHOST_USER_MAX_RAM_SLOTS ACPI_MAX_RAM_SLOTS
+
+#elif defined(TARGET_PPC) || defined(TARGET_PPC_64)
+#include "hw/ppc/spapr.h"
+#define VHOST_USER_MAX_RAM_SLOTS SPAPR_MAX_RAM_SLOTS
+
+#else
+#define VHOST_USER_MAX_RAM_SLOTS 512
+#endif
+
+/*
  * Maximum size of virtio device config space
  */
 #define VHOST_USER_MAX_CONFIG_SIZE 256
@@ -58,6 +76,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
     VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
     VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
+    VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS = 13,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -98,6 +117,9 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_INFLIGHT_FD = 31,
     VHOST_USER_SET_INFLIGHT_FD = 32,
     VHOST_USER_GPU_SET_SOCKET = 33,
+    VHOST_USER_GET_MAX_MEM_SLOTS = 34,
+    VHOST_USER_ADD_MEM_REG = 35,
+    VHOST_USER_REM_MEM_REG = 36,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -119,9 +141,14 @@ typedef struct VhostUserMemoryRegion {
 typedef struct VhostUserMemory {
     uint32_t nregions;
     uint32_t padding;
-    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
+    VhostUserMemoryRegion regions[VHOST_MEMORY_LEGACY_NREGIONS];
 } VhostUserMemory;
 
+typedef struct VhostUserMemRegMsg {
+    uint32_t padding;
+    VhostUserMemoryRegion region;
+} VhostUserMemRegMsg;
+
 typedef struct VhostUserLog {
     uint64_t mmap_size;
     uint64_t mmap_offset;
@@ -180,6 +207,7 @@ typedef union {
         struct vhost_vring_state state;
         struct vhost_vring_addr addr;
         VhostUserMemory memory;
+        VhostUserMemRegMsg mem_reg;
         VhostUserLog log;
         struct vhost_iotlb_msg iotlb;
         VhostUserConfig config;
@@ -208,7 +236,7 @@ struct vhost_user {
     int slave_fd;
     NotifierWithReturn postcopy_notifier;
     struct PostCopyFD  postcopy_fd;
-    uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
+    uint64_t           postcopy_client_bases[VHOST_USER_MAX_RAM_SLOTS];
     /* Length of the region_rb and region_rb_offset arrays */
     size_t             region_rb_len;
     /* RAMBlock associated with a given region */
@@ -220,6 +248,10 @@ struct vhost_user {
 
     /* True once we've entered postcopy_listen */
     bool               postcopy_listen;
+
+    /* Our current regions */
+    int num_shadow_regions;
+    VhostUserMemoryRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS];
 };
 
 static bool ioeventfd_enabled(void)
@@ -368,7 +400,7 @@ int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd)
 static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
                                    struct vhost_log *log)
 {
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int fds[VHOST_USER_MAX_RAM_SLOTS];
     size_t fd_num = 0;
     bool shmfd = virtio_has_feature(dev->protocol_features,
                                     VHOST_USER_PROTOCOL_F_LOG_SHMFD);
@@ -426,7 +458,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
                                      &offset);
         fd = memory_region_get_fd(mr);
         if (fd > 0) {
-            if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) {
+            if (*fd_num == VHOST_MEMORY_LEGACY_NREGIONS) {
                 error_report("Failed preparing vhost-user memory table msg");
                 return -1;
             }
@@ -472,7 +504,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
                                              struct vhost_memory *mem)
 {
     struct vhost_user *u = dev->opaque;
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int fds[VHOST_MEMORY_LEGACY_NREGIONS];
     size_t fd_num = 0;
     VhostUserMsg msg_reply;
     int region_i, msg_i;
@@ -521,7 +553,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
     }
 
     memset(u->postcopy_client_bases, 0,
-           sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
+           sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS);
 
     /* They're in the same order as the regions that were sent
      * but some of the regions were skipped (above) if they
@@ -562,18 +594,151 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
     return 0;
 }
 
+static inline bool reg_equal(VhostUserMemoryRegion *shadow_reg,
+                             struct vhost_memory_region *vdev_reg)
+{
+    if (shadow_reg->guest_phys_addr == vdev_reg->guest_phys_addr &&
+        shadow_reg->userspace_addr == vdev_reg->userspace_addr &&
+        shadow_reg->memory_size == vdev_reg->memory_size) {
+        return true;
+    } else {
+        return false;
+    }
+}
+
+static int vhost_user_send_add_remove_regions(struct vhost_dev *dev,
+                                              struct vhost_memory *mem,
+                                              VhostUserMsg *msg)
+{
+    struct vhost_user *u = dev->opaque;
+    int i, j, fd;
+    bool found[VHOST_USER_MAX_RAM_SLOTS] = {};
+    bool matching = false;
+    struct vhost_memory_region *reg;
+    ram_addr_t offset;
+    MemoryRegion *mr;
+
+    /*
+     * Ensure the VHOST_USER_PROTOCOL_F_REPLY_ACK has not been
+     * negotiated and no postcopy migration is in progress.
+     */
+    assert(!virtio_has_feature(dev->protocol_features,
+                               VHOST_USER_PROTOCOL_F_REPLY_ACK));
+    if (u->postcopy_listen && u->postcopy_fd.handler) {
+        error_report("Postcopy migration is not supported when the "
+                     "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature "
+                     "has been negotiated");
+        return -1;
+    }
+
+    msg->hdr.size = sizeof(msg->payload.mem_reg.padding);
+    msg->hdr.size += sizeof(VhostUserMemoryRegion);
+
+    /*
+     * Send VHOST_USER_REM_MEM_REG for memory regions in our shadow state
+     * which are not found not in the device's memory state.
+     */
+    for (i = 0; i < u->num_shadow_regions; ++i) {
+        reg = dev->mem->regions;
+
+        for (j = 0; j < dev->mem->nregions; j++) {
+            reg = dev->mem->regions + j;
+
+            assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
+            mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
+                                         &offset);
+            fd = memory_region_get_fd(mr);
+
+            if (reg_equal(&u->shadow_regions[i], reg)) {
+                matching = true;
+                found[j] = true;
+                break;
+            }
+        }
+
+        if (fd > 0 && !matching) {
+            msg->hdr.request = VHOST_USER_REM_MEM_REG;
+            msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
+            msg->payload.mem_reg.region.memory_size = reg->memory_size;
+            msg->payload.mem_reg.region.guest_phys_addr =
+                reg->guest_phys_addr;
+            msg->payload.mem_reg.region.mmap_offset = offset;
+
+            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
+                return -1;
+            }
+        }
+    }
+
+    /*
+     * Send messages to add regions present in the device which are not
+     * in our shadow state.
+     */
+    for (i = 0; i < dev->mem->nregions; ++i) {
+        reg = dev->mem->regions + i;
+
+        /*
+         * If the region was in both the shadow and vdev state we don't
+         * need to send a VHOST_USER_ADD_MEM_REG message for it.
+         */
+        if (found[i]) {
+            continue;
+        }
+
+        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
+        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
+                                     &offset);
+        fd = memory_region_get_fd(mr);
+
+        if (fd > 0) {
+            msg->hdr.request = VHOST_USER_ADD_MEM_REG;
+            msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
+            msg->payload.mem_reg.region.memory_size = reg->memory_size;
+            msg->payload.mem_reg.region.guest_phys_addr =
+                reg->guest_phys_addr;
+            msg->payload.mem_reg.region.mmap_offset = offset;
+
+            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
+                return -1;
+            }
+        }
+    }
+
+    /* Make our shadow state match the updated device state. */
+    u->num_shadow_regions = dev->mem->nregions;
+    for (i = 0; i < dev->mem->nregions; ++i) {
+        reg = dev->mem->regions + i;
+        u->shadow_regions[i].guest_phys_addr = reg->guest_phys_addr;
+        u->shadow_regions[i].userspace_addr = reg->userspace_addr;
+        u->shadow_regions[i].memory_size = reg->memory_size;
+    }
+
+    return 0;
+}
+
 static int vhost_user_set_mem_table(struct vhost_dev *dev,
                                     struct vhost_memory *mem)
 {
     struct vhost_user *u = dev->opaque;
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int fds[VHOST_MEMORY_LEGACY_NREGIONS];
     size_t fd_num = 0;
     bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
     bool reply_supported = virtio_has_feature(dev->protocol_features,
                                               VHOST_USER_PROTOCOL_F_REPLY_ACK);
+    bool config_slots =
+        virtio_has_feature(dev->protocol_features,
+                           VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS);
 
     if (do_postcopy) {
-        /* Postcopy has enough differences that it's best done in it's own
+        if (config_slots) {
+            error_report("Postcopy migration not supported with "
+                          "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature "
+                          "enabled.");
+            return -1;
+        }
+
+        /*
+         * Postcopy has enough differences that it's best done in it's own
          * version
          */
         return vhost_user_set_mem_table_postcopy(dev, mem);
@@ -587,17 +752,22 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
         msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
     }
 
-    if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
-                                          false) < 0) {
-        return -1;
-    }
-
-    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
-        return -1;
-    }
+    if (config_slots && !reply_supported) {
+        if (vhost_user_send_add_remove_regions(dev, mem, &msg) < 0) {
+            return -1;
+        }
+    } else {
+        if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
+                                              false) < 0) {
+            return -1;
+        }
+        if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
+            return -1;
+        }
 
-    if (reply_supported) {
-        return process_message_reply(dev, &msg);
+        if (reply_supported) {
+            return process_message_reply(dev, &msg);
+        }
     }
 
     return 0;
@@ -762,7 +932,7 @@ static int vhost_set_vring_file(struct vhost_dev *dev,
                                 VhostUserRequest request,
                                 struct vhost_vring_file *file)
 {
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int fds[VHOST_USER_MAX_RAM_SLOTS];
     size_t fd_num = 0;
     VhostUserMsg msg = {
         .hdr.request = request,
@@ -1496,7 +1666,46 @@ static int vhost_user_get_vq_index(struct vhost_dev *dev, int idx)
 
 static int vhost_user_memslots_limit(struct vhost_dev *dev)
 {
-    return VHOST_MEMORY_MAX_NREGIONS;
+    VhostUserMsg msg = {
+        .hdr.request = VHOST_USER_GET_MAX_MEM_SLOTS,
+        .hdr.flags = VHOST_USER_VERSION,
+    };
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS)) {
+        return VHOST_MEMORY_LEGACY_NREGIONS;
+    }
+
+    if (virtio_has_feature(dev->protocol_features,
+                           VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
+        error_report("The VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol "
+                     "feature is not supported when the "
+                     "VHOST_USER_PROTOCOL_F_REPLY_ACK features has also "
+                     "been negotiated");
+        return -1;
+    }
+
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        return -1;
+    }
+
+    if (vhost_user_read(dev, &msg) < 0) {
+        return -1;
+    }
+
+    if (msg.hdr.request != VHOST_USER_GET_MAX_MEM_SLOTS) {
+        error_report("Received unexpected msg type. Expected %d received %d",
+                     VHOST_USER_GET_MAX_MEM_SLOTS, msg.hdr.request);
+        return -1;
+    }
+
+    if (msg.hdr.size != sizeof(msg.payload.u64)) {
+        error_report("Received bad msg size");
+        return -1;
+    }
+
+    return MIN(MAX(msg.payload.u64, VHOST_MEMORY_LEGACY_NREGIONS),
+               VHOST_USER_MAX_RAM_SLOTS);
 }
 
 static bool vhost_user_requires_shm_log(struct vhost_dev *dev)
-- 
1.8.3.1



  parent reply	other threads:[~2019-12-17  1:57 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-09  7:00 [RFC PATCH 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
2019-12-09  7:00 ` [RFC PATCH 1/3] Fixed Error Handling in vhost_user_set_mem_table_postcopy Raphael Norwitz
2020-01-14  7:07   ` Michael S. Tsirkin
2020-01-16  3:16     ` Raphael Norwitz
2019-12-09  7:00 ` [RFC PATCH 2/3] vhost-user: Refactor vhost_user_set_mem_table Functions Raphael Norwitz
2019-12-09  7:00 ` Raphael Norwitz [this message]
2020-01-14  7:12   ` [RFC PATCH 3/3] Introduce Configurable Number of Memory Slots Exposed by vhost-user: Michael S. Tsirkin
2020-01-16  3:23     ` Raphael Norwitz
2020-01-22  8:57       ` Michael S. Tsirkin
2020-01-13 19:43 ` [RFC PATCH 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1575874847-5792-4-git-send-email-raphael.norwitz@nutanix.com \
    --to=raphael.norwitz@nutanix.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=raphael.s.norwitz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).