* [RFC PATCH 0/3] vhost-user: Lift Max Ram Slots Limitation @ 2019-12-09 7:00 Raphael Norwitz 2019-12-09 7:00 ` [RFC PATCH 1/3] Fixed Error Handling in vhost_user_set_mem_table_postcopy Raphael Norwitz ` (3 more replies) 0 siblings, 4 replies; 10+ messages in thread From: Raphael Norwitz @ 2019-12-09 7:00 UTC (permalink / raw) To: mst, qemu-devel; +Cc: raphael.s.norwitz, Raphael Norwitz In QEMU today, a VM with a vhost-user device can hot add memory a maximum of 8 times. See these threads, among others: [1] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01046.html https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01236.html [2] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04656.html This RFC/patch set introduces a new protocol feature VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS which, when enabled, lifts the restriction on the maximum number RAM slots imposed by vhost-user. The patch consists of 3 changes: 1. Fixed Error Handling in vhost_user_set_mem_table_postcopy: This is a bug fix in the postcopy migration path 2. vhost-user: Refactor vhost_user_set_mem_table Functions: This is a non-functional change refractoring the vhost_user_set_mem_table and vhost_user_set_mem_table_postcopy functions such that the feature can be more cleanly added. 3. Introduce Configurable Number of Memory Slots Exposed by vhost-user: This change introduces the new protocol feature VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS. The implementation details are explained in more detail in the commit messages, but at a high level the new protocol feature works as follows: - If the VHOST_USER_PROTCOL_F_CONFIGURE_SLOTS feature is enabled, QEMU will send multiple VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages to map and unmap individual memory regions instead of one large VHOST_USER_SET_MEM_TABLE message containing all memory regions. - The vhost-user struct maintains a ’shadow state’ of memory regions already sent to the guest. Each time vhost_user_set_mem_table is called, the shadow state is compared with the new device state. A VHOST_USER_REM_MEM_REG will be sent for each region in the shadow state not in the device state. Then, a VHOST_USER_ADD_MEM_REG will be sent for each region in the device state but not the shadow state. After these messages have been sent, the shadow state will be updated to reflect the new device state. The VHOST_USER_SET_MEM_TABLE message was not reused because as the number of regions grows, the message becomes very large. In practice, such large messages caused problems (truncated messages) and in the past it seems the community has opted for smaller fixed size messages where possible. VRINGs, for example, are sent to the backend individually instead of in one massive message. Current Limitations: - postcopy migration is not supported when the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS has been negotiated. - VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS cannot be negotiated when VHOST_USER_PROTOCOL_F_REPLY_ACK has also been negotiated. Both of these limitations are due to resource contraints. They are not imposed for technical reasons. Questions: - In the event transmitting a VHOST_USER_ADD_MEM_REG or VHOST_USER_REM_REG message fails, is there any reason the error handling should differ from when transmitting VHOST_USER_SET_MEM_TABLE message fails? - Is there a cleaner way to ensure to ensure a postcopy migration cannot be started with this protocol feature enabled? Best, Raphael Raphael Norwitz (3): Fixed Error Handling in vhost_user_set_mem_table_postcopy vhost-user: Refactor vhost_user_set_mem_table Functions Introduce Configurable Number of Memory Slots Exposed by vhost-user: docs/interop/vhost-user.rst | 43 +++++ hw/virtio/vhost-user.c | 384 +++++++++++++++++++++++++++++++++----------- 2 files changed, 335 insertions(+), 92 deletions(-) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC PATCH 1/3] Fixed Error Handling in vhost_user_set_mem_table_postcopy 2019-12-09 7:00 [RFC PATCH 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz @ 2019-12-09 7:00 ` Raphael Norwitz 2020-01-14 7:07 ` Michael S. Tsirkin 2019-12-09 7:00 ` [RFC PATCH 2/3] vhost-user: Refactor vhost_user_set_mem_table Functions Raphael Norwitz ` (2 subsequent siblings) 3 siblings, 1 reply; 10+ messages in thread From: Raphael Norwitz @ 2019-12-09 7:00 UTC (permalink / raw) To: mst, qemu-devel; +Cc: raphael.s.norwitz, Raphael Norwitz The current vhost_user_set_mem_table_postcopy() implementation populates each region of the VHOST_USER_SET_MEM_TABLE message without first checking if there are more than VHOST_MEMORY_MAX_NREGIONS already populated. This can cause memory corruption and potentially a crash if too many regions are added to the message during the postcopy step. Additionally, after populating each region, the current implementation asserts that the current region index is less than VHOST_MEMORY_MAX_NREGIONS. Thus, even if the aforementioned bug is fixed by moving the existing assert up, too many hot-adds during the postcopy step will bring down qemu instead of gracefully propogating up the error as in vhost_user_set_mem_table(). This change cleans up error handling in vhost_user_set_mem_table_postcopy() such that it handles an unsupported number of memory hot-adds like vhost_user_set_mem_table(), gracefully propogating an error up instead of corrupting memory and crashing qemu. Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com> --- hw/virtio/vhost-user.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 02a9b25..f74ff3b 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -441,6 +441,10 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, &offset); fd = memory_region_get_fd(mr); if (fd > 0) { + if (fd_num == VHOST_MEMORY_MAX_NREGIONS) { + error_report("Failed preparing vhost-user memory table msg"); + return -1; + } trace_vhost_user_set_mem_table_withfd(fd_num, mr->name, reg->memory_size, reg->guest_phys_addr, @@ -453,7 +457,6 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr; msg.payload.memory.regions[fd_num].mmap_offset = offset; - assert(fd_num < VHOST_MEMORY_MAX_NREGIONS); fds[fd_num++] = fd; } else { u->region_rb_offset[i] = 0; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 1/3] Fixed Error Handling in vhost_user_set_mem_table_postcopy 2019-12-09 7:00 ` [RFC PATCH 1/3] Fixed Error Handling in vhost_user_set_mem_table_postcopy Raphael Norwitz @ 2020-01-14 7:07 ` Michael S. Tsirkin 2020-01-16 3:16 ` Raphael Norwitz 0 siblings, 1 reply; 10+ messages in thread From: Michael S. Tsirkin @ 2020-01-14 7:07 UTC (permalink / raw) To: Raphael Norwitz; +Cc: raphael.s.norwitz, qemu-devel On Mon, Dec 09, 2019 at 02:00:45AM -0500, Raphael Norwitz wrote: > The current vhost_user_set_mem_table_postcopy() implementation > populates each region of the VHOST_USER_SET_MEM_TABLE > message without first checking if there are more than > VHOST_MEMORY_MAX_NREGIONS already populated. This can > cause memory corruption and potentially a crash if too many > regions are added to the message during the postcopy step. > > Additionally, after populating each region, the current > implementation asserts that the current region index is less than > VHOST_MEMORY_MAX_NREGIONS. Thus, even if the aforementioned > bug is fixed by moving the existing assert up, too many hot-adds > during the postcopy step will bring down qemu instead of > gracefully propogating up the error as in > vhost_user_set_mem_table(). > > This change cleans up error handling in > vhost_user_set_mem_table_postcopy() such that it handles > an unsupported number of memory hot-adds like > vhost_user_set_mem_table(), gracefully propogating an error > up instead of corrupting memory and crashing qemu. > > Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com> Unfortunately all the nice error handling is mostly futile since vhost_commit does: r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem); if (r < 0) { VHOST_OPS_DEBUG("vhost_set_mem_table failed"); } goto out; so the reason there's an assert is that we really must never hit this path at all. So moving assert up seems cleaner. > --- > hw/virtio/vhost-user.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > index 02a9b25..f74ff3b 100644 > --- a/hw/virtio/vhost-user.c > +++ b/hw/virtio/vhost-user.c > @@ -441,6 +441,10 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, > &offset); > fd = memory_region_get_fd(mr); > if (fd > 0) { > + if (fd_num == VHOST_MEMORY_MAX_NREGIONS) { > + error_report("Failed preparing vhost-user memory table msg"); > + return -1; > + } > trace_vhost_user_set_mem_table_withfd(fd_num, mr->name, > reg->memory_size, > reg->guest_phys_addr, > @@ -453,7 +457,6 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, > msg.payload.memory.regions[fd_num].guest_phys_addr = > reg->guest_phys_addr; > msg.payload.memory.regions[fd_num].mmap_offset = offset; > - assert(fd_num < VHOST_MEMORY_MAX_NREGIONS); > fds[fd_num++] = fd; > } else { > u->region_rb_offset[i] = 0; > -- > 1.8.3.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 1/3] Fixed Error Handling in vhost_user_set_mem_table_postcopy 2020-01-14 7:07 ` Michael S. Tsirkin @ 2020-01-16 3:16 ` Raphael Norwitz 0 siblings, 0 replies; 10+ messages in thread From: Raphael Norwitz @ 2020-01-16 3:16 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: qemu-devel Makes sense - will fix On Tue, Jan 14, 2020 at 02:07:03AM -0500, Michael S. Tsirkin wrote: > > On Mon, Dec 09, 2019 at 02:00:45AM -0500, Raphael Norwitz wrote: > > The current vhost_user_set_mem_table_postcopy() implementation > > populates each region of the VHOST_USER_SET_MEM_TABLE > > message without first checking if there are more than > > VHOST_MEMORY_MAX_NREGIONS already populated. This can > > cause memory corruption and potentially a crash if too many > > regions are added to the message during the postcopy step. > > > > Additionally, after populating each region, the current > > implementation asserts that the current region index is less than > > VHOST_MEMORY_MAX_NREGIONS. Thus, even if the aforementioned > > bug is fixed by moving the existing assert up, too many hot-adds > > during the postcopy step will bring down qemu instead of > > gracefully propogating up the error as in > > vhost_user_set_mem_table(). > > > > This change cleans up error handling in > > vhost_user_set_mem_table_postcopy() such that it handles > > an unsupported number of memory hot-adds like > > vhost_user_set_mem_table(), gracefully propogating an error > > up instead of corrupting memory and crashing qemu. > > > > Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com> > > Unfortunately all the nice error handling is mostly > futile since vhost_commit does: > > r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem); > if (r < 0) { > VHOST_OPS_DEBUG("vhost_set_mem_table failed"); > } > goto out; > > so the reason there's an assert is that we really must never > hit this path at all. So moving assert up seems cleaner. > > > --- > > hw/virtio/vhost-user.c | 5 ++++- > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > index 02a9b25..f74ff3b 100644 > > --- a/hw/virtio/vhost-user.c > > +++ b/hw/virtio/vhost-user.c > > @@ -441,6 +441,10 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, > > &offset); > > fd = memory_region_get_fd(mr); > > if (fd > 0) { > > + if (fd_num == VHOST_MEMORY_MAX_NREGIONS) { > > + error_report("Failed preparing vhost-user memory table msg"); > > + return -1; > > + } > > trace_vhost_user_set_mem_table_withfd(fd_num, mr->name, > > reg->memory_size, > > reg->guest_phys_addr, > > @@ -453,7 +457,6 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, > > msg.payload.memory.regions[fd_num].guest_phys_addr = > > reg->guest_phys_addr; > > msg.payload.memory.regions[fd_num].mmap_offset = offset; > > - assert(fd_num < VHOST_MEMORY_MAX_NREGIONS); > > fds[fd_num++] = fd; > > } else { > > u->region_rb_offset[i] = 0; > > -- > > 1.8.3.1 > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC PATCH 2/3] vhost-user: Refactor vhost_user_set_mem_table Functions 2019-12-09 7:00 [RFC PATCH 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz 2019-12-09 7:00 ` [RFC PATCH 1/3] Fixed Error Handling in vhost_user_set_mem_table_postcopy Raphael Norwitz @ 2019-12-09 7:00 ` Raphael Norwitz 2019-12-09 7:00 ` [RFC PATCH 3/3] Introduce Configurable Number of Memory Slots Exposed by vhost-user: Raphael Norwitz 2020-01-13 19:43 ` [RFC PATCH 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz 3 siblings, 0 replies; 10+ messages in thread From: Raphael Norwitz @ 2019-12-09 7:00 UTC (permalink / raw) To: mst, qemu-devel; +Cc: raphael.s.norwitz, Raphael Norwitz vhost_user_set_mem_table() and vhost_user_set_mem_table_postcopy() have gotten convoluted, and have some identical code. This change moves the logic populating the VhostUserMemory struct and fds array from vhost_user_set_mem_table() and vhost_user_set_mem_table_postcopy() to a new function, vhost_user_fill_set_mem_table_msg(). No functionality is impacted. Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com> --- hw/virtio/vhost-user.c | 144 +++++++++++++++++++++++-------------------------- 1 file changed, 66 insertions(+), 78 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index f74ff3b..2134e81 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -405,76 +405,97 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base, return 0; } -static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, - struct vhost_memory *mem) +static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u, + struct vhost_dev *dev, + VhostUserMsg *msg, + int *fds, size_t *fd_num, + bool postcopy) { - struct vhost_user *u = dev->opaque; - int fds[VHOST_MEMORY_MAX_NREGIONS]; int i, fd; - size_t fd_num = 0; - VhostUserMsg msg_reply; - int region_i, msg_i; + ram_addr_t offset; + MemoryRegion *mr; + struct vhost_memory_region *reg; - VhostUserMsg msg = { - .hdr.request = VHOST_USER_SET_MEM_TABLE, - .hdr.flags = VHOST_USER_VERSION, - }; - - if (u->region_rb_len < dev->mem->nregions) { - u->region_rb = g_renew(RAMBlock*, u->region_rb, dev->mem->nregions); - u->region_rb_offset = g_renew(ram_addr_t, u->region_rb_offset, - dev->mem->nregions); - memset(&(u->region_rb[u->region_rb_len]), '\0', - sizeof(RAMBlock *) * (dev->mem->nregions - u->region_rb_len)); - memset(&(u->region_rb_offset[u->region_rb_len]), '\0', - sizeof(ram_addr_t) * (dev->mem->nregions - u->region_rb_len)); - u->region_rb_len = dev->mem->nregions; - } + msg->hdr.request = VHOST_USER_SET_MEM_TABLE; for (i = 0; i < dev->mem->nregions; ++i) { - struct vhost_memory_region *reg = dev->mem->regions + i; - ram_addr_t offset; - MemoryRegion *mr; + reg = dev->mem->regions + i; assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, &offset); fd = memory_region_get_fd(mr); if (fd > 0) { - if (fd_num == VHOST_MEMORY_MAX_NREGIONS) { + if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) { error_report("Failed preparing vhost-user memory table msg"); return -1; } - trace_vhost_user_set_mem_table_withfd(fd_num, mr->name, - reg->memory_size, - reg->guest_phys_addr, - reg->userspace_addr, offset); - u->region_rb_offset[i] = offset; - u->region_rb[i] = mr->ram_block; - msg.payload.memory.regions[fd_num].userspace_addr = + if (postcopy) { + trace_vhost_user_set_mem_table_withfd(*fd_num, mr->name, + reg->memory_size, + reg->guest_phys_addr, + reg->userspace_addr, + offset); + u->region_rb_offset[i] = offset; + u->region_rb[i] = mr->ram_block; + } + msg->payload.memory.regions[*fd_num].userspace_addr = reg->userspace_addr; - msg.payload.memory.regions[fd_num].memory_size = reg->memory_size; - msg.payload.memory.regions[fd_num].guest_phys_addr = + msg->payload.memory.regions[*fd_num].memory_size = + reg->memory_size; + msg->payload.memory.regions[*fd_num].guest_phys_addr = reg->guest_phys_addr; - msg.payload.memory.regions[fd_num].mmap_offset = offset; - fds[fd_num++] = fd; - } else { + msg->payload.memory.regions[*fd_num].mmap_offset = offset; + fds[(*fd_num)++] = fd; + } else if (postcopy) { u->region_rb_offset[i] = 0; u->region_rb[i] = NULL; } } - msg.payload.memory.nregions = fd_num; + msg->payload.memory.nregions = *fd_num; - if (!fd_num) { + if (!*fd_num) { error_report("Failed initializing vhost-user memory map, " "consider using -object memory-backend-file share=on"); return -1; } - msg.hdr.size = sizeof(msg.payload.memory.nregions); - msg.hdr.size += sizeof(msg.payload.memory.padding); - msg.hdr.size += fd_num * sizeof(VhostUserMemoryRegion); + msg->hdr.size = sizeof(msg->payload.memory.nregions); + msg->hdr.size += sizeof(msg->payload.memory.padding); + msg->hdr.size += *fd_num * sizeof(VhostUserMemoryRegion); + + return 1; +} + +static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, + struct vhost_memory *mem) +{ + struct vhost_user *u = dev->opaque; + int fds[VHOST_MEMORY_MAX_NREGIONS]; + size_t fd_num = 0; + VhostUserMsg msg_reply; + int region_i, msg_i; + + VhostUserMsg msg = { + .hdr.flags = VHOST_USER_VERSION, + }; + + if (u->region_rb_len < dev->mem->nregions) { + u->region_rb = g_renew(RAMBlock*, u->region_rb, dev->mem->nregions); + u->region_rb_offset = g_renew(ram_addr_t, u->region_rb_offset, + dev->mem->nregions); + memset(&(u->region_rb[u->region_rb_len]), '\0', + sizeof(RAMBlock *) * (dev->mem->nregions - u->region_rb_len)); + memset(&(u->region_rb_offset[u->region_rb_len]), '\0', + sizeof(ram_addr_t) * (dev->mem->nregions - u->region_rb_len)); + u->region_rb_len = dev->mem->nregions; + } + + if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num, + true) < 0) { + return -1; + } if (vhost_user_write(dev, &msg, fds, fd_num) < 0) { return -1; @@ -546,7 +567,6 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev, { struct vhost_user *u = dev->opaque; int fds[VHOST_MEMORY_MAX_NREGIONS]; - int i, fd; size_t fd_num = 0; bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler; bool reply_supported = virtio_has_feature(dev->protocol_features, @@ -560,7 +580,6 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev, } VhostUserMsg msg = { - .hdr.request = VHOST_USER_SET_MEM_TABLE, .hdr.flags = VHOST_USER_VERSION, }; @@ -568,42 +587,11 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev, msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK; } - for (i = 0; i < dev->mem->nregions; ++i) { - struct vhost_memory_region *reg = dev->mem->regions + i; - ram_addr_t offset; - MemoryRegion *mr; - - assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); - mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, - &offset); - fd = memory_region_get_fd(mr); - if (fd > 0) { - if (fd_num == VHOST_MEMORY_MAX_NREGIONS) { - error_report("Failed preparing vhost-user memory table msg"); - return -1; - } - msg.payload.memory.regions[fd_num].userspace_addr = - reg->userspace_addr; - msg.payload.memory.regions[fd_num].memory_size = reg->memory_size; - msg.payload.memory.regions[fd_num].guest_phys_addr = - reg->guest_phys_addr; - msg.payload.memory.regions[fd_num].mmap_offset = offset; - fds[fd_num++] = fd; - } - } - - msg.payload.memory.nregions = fd_num; - - if (!fd_num) { - error_report("Failed initializing vhost-user memory map, " - "consider using -object memory-backend-file share=on"); + if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num, + false) < 0) { return -1; } - msg.hdr.size = sizeof(msg.payload.memory.nregions); - msg.hdr.size += sizeof(msg.payload.memory.padding); - msg.hdr.size += fd_num * sizeof(VhostUserMemoryRegion); - if (vhost_user_write(dev, &msg, fds, fd_num) < 0) { return -1; } -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC PATCH 3/3] Introduce Configurable Number of Memory Slots Exposed by vhost-user: 2019-12-09 7:00 [RFC PATCH 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz 2019-12-09 7:00 ` [RFC PATCH 1/3] Fixed Error Handling in vhost_user_set_mem_table_postcopy Raphael Norwitz 2019-12-09 7:00 ` [RFC PATCH 2/3] vhost-user: Refactor vhost_user_set_mem_table Functions Raphael Norwitz @ 2019-12-09 7:00 ` Raphael Norwitz 2020-01-14 7:12 ` Michael S. Tsirkin 2020-01-13 19:43 ` [RFC PATCH 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz 3 siblings, 1 reply; 10+ messages in thread From: Raphael Norwitz @ 2019-12-09 7:00 UTC (permalink / raw) To: mst, qemu-devel; +Cc: raphael.s.norwitz, Raphael Norwitz The current vhost-user implementation in Qemu imposes a limit on the maximum number of memory slots exposed to a VM using a vhost-user device. This change provides a new protocol feature VHOST_USER_F_CONFIGURE_SLOTS which, when enabled, lifts this limit and allows a VM with a vhost-user device to expose a configurable number of memory slots, up to the maximum supported by the platform. Existing backends are unaffected. This feature works by using three new messages, VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the number of memory slots the backend is willing to accept. Then, when the memory tables are set or updated, a series of VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages are sent to transmit the regions to map and/or unmap instead of trying to send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE message. The vhost_user struct maintains a shadow state of the VM’s memory regions. When the memory tables are modified, the vhost_user_set_mem_table() function compares the new device memory state to the shadow state and only sends regions which need to be unmapped or mapped in. The regions which must be unmapped are sent first, followed by the new regions to be mapped in. After all the messages have been sent, the shadow state is set to the current virtual device state. The current feature implementation does not work with postcopy migration and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated. Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com> --- docs/interop/vhost-user.rst | 43 ++++++++ hw/virtio/vhost-user.c | 251 ++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 273 insertions(+), 21 deletions(-) diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst index 7827b71..855a072 100644 --- a/docs/interop/vhost-user.rst +++ b/docs/interop/vhost-user.rst @@ -785,6 +785,7 @@ Protocol features #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD 10 #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11 #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12 + #define VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS 13 Master message types -------------------- @@ -1190,6 +1191,48 @@ Master message types ancillary data. The GPU protocol is used to inform the master of rendering state and updates. See vhost-user-gpu.rst for details. +``VHOST_USER_GET_MAX_MEM_SLOTS`` + :id: 34 + :equivalent ioctl: N/A + :slave payload: u64 + + When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been + successfully negotiated, this message is submitted by master to the + slave. The slave should return the message with a u64 payload + containing the maximum number of memory slots for QEMU to expose to + the guest. This message is not supported with postcopy migration or if + the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated. + +``VHOST_USER_ADD_MEM_REG`` + :id: 35 + :equivalent ioctl: N/A + :slave payload: memory region + + When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been + successfully negotiated, this message is submitted by master to the slave. + The message payload contains a memory region descriptor struct, describing + a region of guest memory which the slave device must map in. When the + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully + negotiated, along with the VHOST_USER_REM_MEM_REG message, this message is + used to set and update the memory tables of the slave device. This message + is not supported with postcopy migration or if the + VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated. + +``VHOST_USER_REM_MEM_REG`` + :id: 36 + :equivalent ioctl: N/A + :slave payload: memory region + + When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been + successfully negotiated, this message is submitted by master to the slave. + The message payload contains a memory region descriptor struct, describing + a region of guest memory which the slave device must unmap. When the + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully + negotiated, along with the VHOST_USER_ADD_MEM_REG message, this message is + used to set and update the memory tables of the slave device. This message + is not supported with postcopy migration or if the + VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated. + Slave message types ------------------- diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 2134e81..3432462 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -35,11 +35,29 @@ #include <linux/userfaultfd.h> #endif -#define VHOST_MEMORY_MAX_NREGIONS 8 +#define VHOST_MEMORY_LEGACY_NREGIONS 8 #define VHOST_USER_F_PROTOCOL_FEATURES 30 #define VHOST_USER_SLAVE_MAX_FDS 8 /* + * Set maximum number of RAM slots supported to + * the maximum number supported by the target + * hardware plaform. + */ +#if defined(TARGET_X86) || defined(TARGET_X86_64) || \ + defined(TARGET_ARM) || defined(TARGET_ARM_64) +#include "hw/acpi/acpi.h" +#define VHOST_USER_MAX_RAM_SLOTS ACPI_MAX_RAM_SLOTS + +#elif defined(TARGET_PPC) || defined(TARGET_PPC_64) +#include "hw/ppc/spapr.h" +#define VHOST_USER_MAX_RAM_SLOTS SPAPR_MAX_RAM_SLOTS + +#else +#define VHOST_USER_MAX_RAM_SLOTS 512 +#endif + +/* * Maximum size of virtio device config space */ #define VHOST_USER_MAX_CONFIG_SIZE 256 @@ -58,6 +76,7 @@ enum VhostUserProtocolFeature { VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10, VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11, VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12, + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS = 13, VHOST_USER_PROTOCOL_F_MAX }; @@ -98,6 +117,9 @@ typedef enum VhostUserRequest { VHOST_USER_GET_INFLIGHT_FD = 31, VHOST_USER_SET_INFLIGHT_FD = 32, VHOST_USER_GPU_SET_SOCKET = 33, + VHOST_USER_GET_MAX_MEM_SLOTS = 34, + VHOST_USER_ADD_MEM_REG = 35, + VHOST_USER_REM_MEM_REG = 36, VHOST_USER_MAX } VhostUserRequest; @@ -119,9 +141,14 @@ typedef struct VhostUserMemoryRegion { typedef struct VhostUserMemory { uint32_t nregions; uint32_t padding; - VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS]; + VhostUserMemoryRegion regions[VHOST_MEMORY_LEGACY_NREGIONS]; } VhostUserMemory; +typedef struct VhostUserMemRegMsg { + uint32_t padding; + VhostUserMemoryRegion region; +} VhostUserMemRegMsg; + typedef struct VhostUserLog { uint64_t mmap_size; uint64_t mmap_offset; @@ -180,6 +207,7 @@ typedef union { struct vhost_vring_state state; struct vhost_vring_addr addr; VhostUserMemory memory; + VhostUserMemRegMsg mem_reg; VhostUserLog log; struct vhost_iotlb_msg iotlb; VhostUserConfig config; @@ -208,7 +236,7 @@ struct vhost_user { int slave_fd; NotifierWithReturn postcopy_notifier; struct PostCopyFD postcopy_fd; - uint64_t postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS]; + uint64_t postcopy_client_bases[VHOST_USER_MAX_RAM_SLOTS]; /* Length of the region_rb and region_rb_offset arrays */ size_t region_rb_len; /* RAMBlock associated with a given region */ @@ -220,6 +248,10 @@ struct vhost_user { /* True once we've entered postcopy_listen */ bool postcopy_listen; + + /* Our current regions */ + int num_shadow_regions; + VhostUserMemoryRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS]; }; static bool ioeventfd_enabled(void) @@ -368,7 +400,7 @@ int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd) static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base, struct vhost_log *log) { - int fds[VHOST_MEMORY_MAX_NREGIONS]; + int fds[VHOST_USER_MAX_RAM_SLOTS]; size_t fd_num = 0; bool shmfd = virtio_has_feature(dev->protocol_features, VHOST_USER_PROTOCOL_F_LOG_SHMFD); @@ -426,7 +458,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u, &offset); fd = memory_region_get_fd(mr); if (fd > 0) { - if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) { + if (*fd_num == VHOST_MEMORY_LEGACY_NREGIONS) { error_report("Failed preparing vhost-user memory table msg"); return -1; } @@ -472,7 +504,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, struct vhost_memory *mem) { struct vhost_user *u = dev->opaque; - int fds[VHOST_MEMORY_MAX_NREGIONS]; + int fds[VHOST_MEMORY_LEGACY_NREGIONS]; size_t fd_num = 0; VhostUserMsg msg_reply; int region_i, msg_i; @@ -521,7 +553,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, } memset(u->postcopy_client_bases, 0, - sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS); + sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS); /* They're in the same order as the regions that were sent * but some of the regions were skipped (above) if they @@ -562,18 +594,151 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, return 0; } +static inline bool reg_equal(VhostUserMemoryRegion *shadow_reg, + struct vhost_memory_region *vdev_reg) +{ + if (shadow_reg->guest_phys_addr == vdev_reg->guest_phys_addr && + shadow_reg->userspace_addr == vdev_reg->userspace_addr && + shadow_reg->memory_size == vdev_reg->memory_size) { + return true; + } else { + return false; + } +} + +static int vhost_user_send_add_remove_regions(struct vhost_dev *dev, + struct vhost_memory *mem, + VhostUserMsg *msg) +{ + struct vhost_user *u = dev->opaque; + int i, j, fd; + bool found[VHOST_USER_MAX_RAM_SLOTS] = {}; + bool matching = false; + struct vhost_memory_region *reg; + ram_addr_t offset; + MemoryRegion *mr; + + /* + * Ensure the VHOST_USER_PROTOCOL_F_REPLY_ACK has not been + * negotiated and no postcopy migration is in progress. + */ + assert(!virtio_has_feature(dev->protocol_features, + VHOST_USER_PROTOCOL_F_REPLY_ACK)); + if (u->postcopy_listen && u->postcopy_fd.handler) { + error_report("Postcopy migration is not supported when the " + "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature " + "has been negotiated"); + return -1; + } + + msg->hdr.size = sizeof(msg->payload.mem_reg.padding); + msg->hdr.size += sizeof(VhostUserMemoryRegion); + + /* + * Send VHOST_USER_REM_MEM_REG for memory regions in our shadow state + * which are not found not in the device's memory state. + */ + for (i = 0; i < u->num_shadow_regions; ++i) { + reg = dev->mem->regions; + + for (j = 0; j < dev->mem->nregions; j++) { + reg = dev->mem->regions + j; + + assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); + mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, + &offset); + fd = memory_region_get_fd(mr); + + if (reg_equal(&u->shadow_regions[i], reg)) { + matching = true; + found[j] = true; + break; + } + } + + if (fd > 0 && !matching) { + msg->hdr.request = VHOST_USER_REM_MEM_REG; + msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr; + msg->payload.mem_reg.region.memory_size = reg->memory_size; + msg->payload.mem_reg.region.guest_phys_addr = + reg->guest_phys_addr; + msg->payload.mem_reg.region.mmap_offset = offset; + + if (vhost_user_write(dev, msg, &fd, 1) < 0) { + return -1; + } + } + } + + /* + * Send messages to add regions present in the device which are not + * in our shadow state. + */ + for (i = 0; i < dev->mem->nregions; ++i) { + reg = dev->mem->regions + i; + + /* + * If the region was in both the shadow and vdev state we don't + * need to send a VHOST_USER_ADD_MEM_REG message for it. + */ + if (found[i]) { + continue; + } + + assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); + mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, + &offset); + fd = memory_region_get_fd(mr); + + if (fd > 0) { + msg->hdr.request = VHOST_USER_ADD_MEM_REG; + msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr; + msg->payload.mem_reg.region.memory_size = reg->memory_size; + msg->payload.mem_reg.region.guest_phys_addr = + reg->guest_phys_addr; + msg->payload.mem_reg.region.mmap_offset = offset; + + if (vhost_user_write(dev, msg, &fd, 1) < 0) { + return -1; + } + } + } + + /* Make our shadow state match the updated device state. */ + u->num_shadow_regions = dev->mem->nregions; + for (i = 0; i < dev->mem->nregions; ++i) { + reg = dev->mem->regions + i; + u->shadow_regions[i].guest_phys_addr = reg->guest_phys_addr; + u->shadow_regions[i].userspace_addr = reg->userspace_addr; + u->shadow_regions[i].memory_size = reg->memory_size; + } + + return 0; +} + static int vhost_user_set_mem_table(struct vhost_dev *dev, struct vhost_memory *mem) { struct vhost_user *u = dev->opaque; - int fds[VHOST_MEMORY_MAX_NREGIONS]; + int fds[VHOST_MEMORY_LEGACY_NREGIONS]; size_t fd_num = 0; bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler; bool reply_supported = virtio_has_feature(dev->protocol_features, VHOST_USER_PROTOCOL_F_REPLY_ACK); + bool config_slots = + virtio_has_feature(dev->protocol_features, + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS); if (do_postcopy) { - /* Postcopy has enough differences that it's best done in it's own + if (config_slots) { + error_report("Postcopy migration not supported with " + "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature " + "enabled."); + return -1; + } + + /* + * Postcopy has enough differences that it's best done in it's own * version */ return vhost_user_set_mem_table_postcopy(dev, mem); @@ -587,17 +752,22 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev, msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK; } - if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num, - false) < 0) { - return -1; - } - - if (vhost_user_write(dev, &msg, fds, fd_num) < 0) { - return -1; - } + if (config_slots && !reply_supported) { + if (vhost_user_send_add_remove_regions(dev, mem, &msg) < 0) { + return -1; + } + } else { + if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num, + false) < 0) { + return -1; + } + if (vhost_user_write(dev, &msg, fds, fd_num) < 0) { + return -1; + } - if (reply_supported) { - return process_message_reply(dev, &msg); + if (reply_supported) { + return process_message_reply(dev, &msg); + } } return 0; @@ -762,7 +932,7 @@ static int vhost_set_vring_file(struct vhost_dev *dev, VhostUserRequest request, struct vhost_vring_file *file) { - int fds[VHOST_MEMORY_MAX_NREGIONS]; + int fds[VHOST_USER_MAX_RAM_SLOTS]; size_t fd_num = 0; VhostUserMsg msg = { .hdr.request = request, @@ -1496,7 +1666,46 @@ static int vhost_user_get_vq_index(struct vhost_dev *dev, int idx) static int vhost_user_memslots_limit(struct vhost_dev *dev) { - return VHOST_MEMORY_MAX_NREGIONS; + VhostUserMsg msg = { + .hdr.request = VHOST_USER_GET_MAX_MEM_SLOTS, + .hdr.flags = VHOST_USER_VERSION, + }; + + if (!virtio_has_feature(dev->protocol_features, + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS)) { + return VHOST_MEMORY_LEGACY_NREGIONS; + } + + if (virtio_has_feature(dev->protocol_features, + VHOST_USER_PROTOCOL_F_REPLY_ACK)) { + error_report("The VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol " + "feature is not supported when the " + "VHOST_USER_PROTOCOL_F_REPLY_ACK features has also " + "been negotiated"); + return -1; + } + + if (vhost_user_write(dev, &msg, NULL, 0) < 0) { + return -1; + } + + if (vhost_user_read(dev, &msg) < 0) { + return -1; + } + + if (msg.hdr.request != VHOST_USER_GET_MAX_MEM_SLOTS) { + error_report("Received unexpected msg type. Expected %d received %d", + VHOST_USER_GET_MAX_MEM_SLOTS, msg.hdr.request); + return -1; + } + + if (msg.hdr.size != sizeof(msg.payload.u64)) { + error_report("Received bad msg size"); + return -1; + } + + return MIN(MAX(msg.payload.u64, VHOST_MEMORY_LEGACY_NREGIONS), + VHOST_USER_MAX_RAM_SLOTS); } static bool vhost_user_requires_shm_log(struct vhost_dev *dev) -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 3/3] Introduce Configurable Number of Memory Slots Exposed by vhost-user: 2019-12-09 7:00 ` [RFC PATCH 3/3] Introduce Configurable Number of Memory Slots Exposed by vhost-user: Raphael Norwitz @ 2020-01-14 7:12 ` Michael S. Tsirkin 2020-01-16 3:23 ` Raphael Norwitz 0 siblings, 1 reply; 10+ messages in thread From: Michael S. Tsirkin @ 2020-01-14 7:12 UTC (permalink / raw) To: Raphael Norwitz; +Cc: raphael.s.norwitz, qemu-devel On Mon, Dec 09, 2019 at 02:00:47AM -0500, Raphael Norwitz wrote: > The current vhost-user implementation in Qemu imposes a limit on the > maximum number of memory slots exposed to a VM using a vhost-user > device. This change provides a new protocol feature > VHOST_USER_F_CONFIGURE_SLOTS which, when enabled, lifts this limit > and allows a VM with a vhost-user device to expose a configurable > number of memory slots, up to the maximum supported by the platform. > Existing backends are unaffected. > > This feature works by using three new messages, > VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and > VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the > number of memory slots the backend is willing to accept. Then, > when the memory tables are set or updated, a series of > VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages are sent > to transmit the regions to map and/or unmap instead of trying to > send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE message. > > The vhost_user struct maintains a shadow state of the VM’s memory > regions. When the memory tables are modified, the > vhost_user_set_mem_table() function compares the new device memory state > to the shadow state and only sends regions which need to be unmapped or > mapped in. The regions which must be unmapped are sent first, followed > by the new regions to be mapped in. After all the messages have been sent, > the shadow state is set to the current virtual device state. > > The current feature implementation does not work with postcopy migration > and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has > also been negotiated. > > Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com> > --- > docs/interop/vhost-user.rst | 43 ++++++++ > hw/virtio/vhost-user.c | 251 ++++++++++++++++++++++++++++++++++++++++---- > 2 files changed, 273 insertions(+), 21 deletions(-) > > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst > index 7827b71..855a072 100644 > --- a/docs/interop/vhost-user.rst > +++ b/docs/interop/vhost-user.rst > @@ -785,6 +785,7 @@ Protocol features > #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD 10 > #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11 > #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12 > + #define VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS 13 > > Master message types > -------------------- > @@ -1190,6 +1191,48 @@ Master message types > ancillary data. The GPU protocol is used to inform the master of > rendering state and updates. See vhost-user-gpu.rst for details. > > +``VHOST_USER_GET_MAX_MEM_SLOTS`` > + :id: 34 > + :equivalent ioctl: N/A > + :slave payload: u64 > + > + When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been > + successfully negotiated, this message is submitted by master to the > + slave. The slave should return the message with a u64 payload > + containing the maximum number of memory slots for QEMU to expose to > + the guest. This message is not supported with postcopy migration or if > + the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated. > + > +``VHOST_USER_ADD_MEM_REG`` > + :id: 35 > + :equivalent ioctl: N/A > + :slave payload: memory region > + > + When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been > + successfully negotiated, this message is submitted by master to the slave. > + The message payload contains a memory region descriptor struct, describing > + a region of guest memory which the slave device must map in. When the > + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully > + negotiated, along with the VHOST_USER_REM_MEM_REG message, this message is > + used to set and update the memory tables of the slave device. This message > + is not supported with postcopy migration or if the > + VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated. > + > +``VHOST_USER_REM_MEM_REG`` > + :id: 36 > + :equivalent ioctl: N/A > + :slave payload: memory region > + > + When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been > + successfully negotiated, this message is submitted by master to the slave. > + The message payload contains a memory region descriptor struct, describing > + a region of guest memory which the slave device must unmap. When the > + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully > + negotiated, along with the VHOST_USER_ADD_MEM_REG message, this message is > + used to set and update the memory tables of the slave device. This message > + is not supported with postcopy migration or if the > + VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated. > + > Slave message types > ------------------- > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > index 2134e81..3432462 100644 > --- a/hw/virtio/vhost-user.c > +++ b/hw/virtio/vhost-user.c > @@ -35,11 +35,29 @@ > #include <linux/userfaultfd.h> > #endif > > -#define VHOST_MEMORY_MAX_NREGIONS 8 > +#define VHOST_MEMORY_LEGACY_NREGIONS 8 > #define VHOST_USER_F_PROTOCOL_FEATURES 30 > #define VHOST_USER_SLAVE_MAX_FDS 8 > > /* > + * Set maximum number of RAM slots supported to > + * the maximum number supported by the target > + * hardware plaform. > + */ > +#if defined(TARGET_X86) || defined(TARGET_X86_64) || \ > + defined(TARGET_ARM) || defined(TARGET_ARM_64) > +#include "hw/acpi/acpi.h" > +#define VHOST_USER_MAX_RAM_SLOTS ACPI_MAX_RAM_SLOTS > + > +#elif defined(TARGET_PPC) || defined(TARGET_PPC_64) > +#include "hw/ppc/spapr.h" > +#define VHOST_USER_MAX_RAM_SLOTS SPAPR_MAX_RAM_SLOTS > + > +#else > +#define VHOST_USER_MAX_RAM_SLOTS 512 > +#endif > + > +/* > * Maximum size of virtio device config space > */ > #define VHOST_USER_MAX_CONFIG_SIZE 256 > @@ -58,6 +76,7 @@ enum VhostUserProtocolFeature { > VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10, > VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11, > VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12, > + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS = 13, > VHOST_USER_PROTOCOL_F_MAX > }; > > @@ -98,6 +117,9 @@ typedef enum VhostUserRequest { > VHOST_USER_GET_INFLIGHT_FD = 31, > VHOST_USER_SET_INFLIGHT_FD = 32, > VHOST_USER_GPU_SET_SOCKET = 33, > + VHOST_USER_GET_MAX_MEM_SLOTS = 34, > + VHOST_USER_ADD_MEM_REG = 35, > + VHOST_USER_REM_MEM_REG = 36, > VHOST_USER_MAX > } VhostUserRequest; > > @@ -119,9 +141,14 @@ typedef struct VhostUserMemoryRegion { > typedef struct VhostUserMemory { > uint32_t nregions; > uint32_t padding; > - VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS]; > + VhostUserMemoryRegion regions[VHOST_MEMORY_LEGACY_NREGIONS]; > } VhostUserMemory; > > +typedef struct VhostUserMemRegMsg { > + uint32_t padding; > + VhostUserMemoryRegion region; > +} VhostUserMemRegMsg; > + > typedef struct VhostUserLog { > uint64_t mmap_size; > uint64_t mmap_offset; > @@ -180,6 +207,7 @@ typedef union { > struct vhost_vring_state state; > struct vhost_vring_addr addr; > VhostUserMemory memory; > + VhostUserMemRegMsg mem_reg; > VhostUserLog log; > struct vhost_iotlb_msg iotlb; > VhostUserConfig config; > @@ -208,7 +236,7 @@ struct vhost_user { > int slave_fd; > NotifierWithReturn postcopy_notifier; > struct PostCopyFD postcopy_fd; > - uint64_t postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS]; > + uint64_t postcopy_client_bases[VHOST_USER_MAX_RAM_SLOTS]; > /* Length of the region_rb and region_rb_offset arrays */ > size_t region_rb_len; > /* RAMBlock associated with a given region */ > @@ -220,6 +248,10 @@ struct vhost_user { > > /* True once we've entered postcopy_listen */ > bool postcopy_listen; > + > + /* Our current regions */ > + int num_shadow_regions; > + VhostUserMemoryRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS]; > }; > > static bool ioeventfd_enabled(void) > @@ -368,7 +400,7 @@ int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd) > static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base, > struct vhost_log *log) > { > - int fds[VHOST_MEMORY_MAX_NREGIONS]; > + int fds[VHOST_USER_MAX_RAM_SLOTS]; > size_t fd_num = 0; > bool shmfd = virtio_has_feature(dev->protocol_features, > VHOST_USER_PROTOCOL_F_LOG_SHMFD); > @@ -426,7 +458,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u, > &offset); > fd = memory_region_get_fd(mr); > if (fd > 0) { > - if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) { > + if (*fd_num == VHOST_MEMORY_LEGACY_NREGIONS) { > error_report("Failed preparing vhost-user memory table msg"); > return -1; > } > @@ -472,7 +504,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, > struct vhost_memory *mem) > { > struct vhost_user *u = dev->opaque; > - int fds[VHOST_MEMORY_MAX_NREGIONS]; > + int fds[VHOST_MEMORY_LEGACY_NREGIONS]; > size_t fd_num = 0; > VhostUserMsg msg_reply; > int region_i, msg_i; > @@ -521,7 +553,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, > } > > memset(u->postcopy_client_bases, 0, > - sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS); > + sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS); > > /* They're in the same order as the regions that were sent > * but some of the regions were skipped (above) if they > @@ -562,18 +594,151 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, > return 0; > } > > +static inline bool reg_equal(VhostUserMemoryRegion *shadow_reg, > + struct vhost_memory_region *vdev_reg) > +{ > + if (shadow_reg->guest_phys_addr == vdev_reg->guest_phys_addr && > + shadow_reg->userspace_addr == vdev_reg->userspace_addr && > + shadow_reg->memory_size == vdev_reg->memory_size) { > + return true; > + } else { > + return false; > + } > +} > + > +static int vhost_user_send_add_remove_regions(struct vhost_dev *dev, > + struct vhost_memory *mem, > + VhostUserMsg *msg) > +{ > + struct vhost_user *u = dev->opaque; > + int i, j, fd; > + bool found[VHOST_USER_MAX_RAM_SLOTS] = {}; > + bool matching = false; > + struct vhost_memory_region *reg; > + ram_addr_t offset; > + MemoryRegion *mr; > + > + /* > + * Ensure the VHOST_USER_PROTOCOL_F_REPLY_ACK has not been > + * negotiated and no postcopy migration is in progress. > + */ > + assert(!virtio_has_feature(dev->protocol_features, > + VHOST_USER_PROTOCOL_F_REPLY_ACK)); > + if (u->postcopy_listen && u->postcopy_fd.handler) { > + error_report("Postcopy migration is not supported when the " > + "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature " > + "has been negotiated"); > + return -1; > + } > + > + msg->hdr.size = sizeof(msg->payload.mem_reg.padding); > + msg->hdr.size += sizeof(VhostUserMemoryRegion); > + > + /* > + * Send VHOST_USER_REM_MEM_REG for memory regions in our shadow state > + * which are not found not in the device's memory state. > + */ > + for (i = 0; i < u->num_shadow_regions; ++i) { > + reg = dev->mem->regions; > + > + for (j = 0; j < dev->mem->nregions; j++) { > + reg = dev->mem->regions + j; > + > + assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); > + mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, > + &offset); > + fd = memory_region_get_fd(mr); > + > + if (reg_equal(&u->shadow_regions[i], reg)) { > + matching = true; > + found[j] = true; > + break; > + } > + } > + > + if (fd > 0 && !matching) { > + msg->hdr.request = VHOST_USER_REM_MEM_REG; > + msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr; > + msg->payload.mem_reg.region.memory_size = reg->memory_size; > + msg->payload.mem_reg.region.guest_phys_addr = > + reg->guest_phys_addr; > + msg->payload.mem_reg.region.mmap_offset = offset; > + > + if (vhost_user_write(dev, msg, &fd, 1) < 0) { > + return -1; > + } > + } > + } > + > + /* > + * Send messages to add regions present in the device which are not > + * in our shadow state. > + */ > + for (i = 0; i < dev->mem->nregions; ++i) { > + reg = dev->mem->regions + i; > + > + /* > + * If the region was in both the shadow and vdev state we don't > + * need to send a VHOST_USER_ADD_MEM_REG message for it. > + */ > + if (found[i]) { > + continue; > + } > + > + assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); > + mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, > + &offset); > + fd = memory_region_get_fd(mr); > + > + if (fd > 0) { > + msg->hdr.request = VHOST_USER_ADD_MEM_REG; > + msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr; > + msg->payload.mem_reg.region.memory_size = reg->memory_size; > + msg->payload.mem_reg.region.guest_phys_addr = > + reg->guest_phys_addr; > + msg->payload.mem_reg.region.mmap_offset = offset; > + > + if (vhost_user_write(dev, msg, &fd, 1) < 0) { > + return -1; > + } > + } > + } > + > + /* Make our shadow state match the updated device state. */ > + u->num_shadow_regions = dev->mem->nregions; > + for (i = 0; i < dev->mem->nregions; ++i) { > + reg = dev->mem->regions + i; > + u->shadow_regions[i].guest_phys_addr = reg->guest_phys_addr; > + u->shadow_regions[i].userspace_addr = reg->userspace_addr; > + u->shadow_regions[i].memory_size = reg->memory_size; > + } > + > + return 0; > +} > + > static int vhost_user_set_mem_table(struct vhost_dev *dev, > struct vhost_memory *mem) > { > struct vhost_user *u = dev->opaque; > - int fds[VHOST_MEMORY_MAX_NREGIONS]; > + int fds[VHOST_MEMORY_LEGACY_NREGIONS]; > size_t fd_num = 0; > bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler; > bool reply_supported = virtio_has_feature(dev->protocol_features, > VHOST_USER_PROTOCOL_F_REPLY_ACK); > + bool config_slots = > + virtio_has_feature(dev->protocol_features, > + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS); > > if (do_postcopy) { > - /* Postcopy has enough differences that it's best done in it's own > + if (config_slots) { > + error_report("Postcopy migration not supported with " > + "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature " > + "enabled."); > + return -1; > + } > + > + /* > + * Postcopy has enough differences that it's best done in it's own > * version > */ > return vhost_user_set_mem_table_postcopy(dev, mem); > @@ -587,17 +752,22 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev, > msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK; > } > > - if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num, > - false) < 0) { > - return -1; > - } > - > - if (vhost_user_write(dev, &msg, fds, fd_num) < 0) { > - return -1; > - } > + if (config_slots && !reply_supported) { > + if (vhost_user_send_add_remove_regions(dev, mem, &msg) < 0) { > + return -1; > + } > + } else { > + if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num, > + false) < 0) { > + return -1; > + } > + if (vhost_user_write(dev, &msg, fds, fd_num) < 0) { > + return -1; > + } > > - if (reply_supported) { > - return process_message_reply(dev, &msg); > + if (reply_supported) { > + return process_message_reply(dev, &msg); > + } > } > > return 0; > @@ -762,7 +932,7 @@ static int vhost_set_vring_file(struct vhost_dev *dev, > VhostUserRequest request, > struct vhost_vring_file *file) > { > - int fds[VHOST_MEMORY_MAX_NREGIONS]; > + int fds[VHOST_USER_MAX_RAM_SLOTS]; > size_t fd_num = 0; > VhostUserMsg msg = { > .hdr.request = request, > @@ -1496,7 +1666,46 @@ static int vhost_user_get_vq_index(struct vhost_dev *dev, int idx) > > static int vhost_user_memslots_limit(struct vhost_dev *dev) > { > - return VHOST_MEMORY_MAX_NREGIONS; > + VhostUserMsg msg = { > + .hdr.request = VHOST_USER_GET_MAX_MEM_SLOTS, > + .hdr.flags = VHOST_USER_VERSION, > + }; > + > + if (!virtio_has_feature(dev->protocol_features, > + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS)) { > + return VHOST_MEMORY_LEGACY_NREGIONS; > + } > + > + if (virtio_has_feature(dev->protocol_features, > + VHOST_USER_PROTOCOL_F_REPLY_ACK)) { I'd just assume that VHOST_USER_GET_MAX_MEM_SLOTS always gets a response. > + error_report("The VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol " > + "feature is not supported when the " > + "VHOST_USER_PROTOCOL_F_REPLY_ACK features has also " > + "been negotiated"); > + return -1; > + } > + > + if (vhost_user_write(dev, &msg, NULL, 0) < 0) { > + return -1; > + } This will send the message each time e.g. memory hotplug triggers. Why not just get this value during init time? Also, any way we can cut down number of round trips? Can we combine this value in a single message with some other stuff we are fetching on startup? > + > + if (vhost_user_read(dev, &msg) < 0) { > + return -1; > + } > + > + if (msg.hdr.request != VHOST_USER_GET_MAX_MEM_SLOTS) { > + error_report("Received unexpected msg type. Expected %d received %d", > + VHOST_USER_GET_MAX_MEM_SLOTS, msg.hdr.request); > + return -1; > + } > + > + if (msg.hdr.size != sizeof(msg.payload.u64)) { > + error_report("Received bad msg size"); > + return -1; > + } > + > + return MIN(MAX(msg.payload.u64, VHOST_MEMORY_LEGACY_NREGIONS), > + VHOST_USER_MAX_RAM_SLOTS); What's this trying to do? Pls add code comments. > } > > static bool vhost_user_requires_shm_log(struct vhost_dev *dev) > -- > 1.8.3.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 3/3] Introduce Configurable Number of Memory Slots Exposed by vhost-user: 2020-01-14 7:12 ` Michael S. Tsirkin @ 2020-01-16 3:23 ` Raphael Norwitz 2020-01-22 8:57 ` Michael S. Tsirkin 0 siblings, 1 reply; 10+ messages in thread From: Raphael Norwitz @ 2020-01-16 3:23 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: qemu-devel On Tue, Jan 14, 2020 at 02:12:46AM -0500, Michael S. Tsirkin wrote: > > On Mon, Dec 09, 2019 at 02:00:47AM -0500, Raphael Norwitz wrote: > > The current vhost-user implementation in Qemu imposes a limit on the > > maximum number of memory slots exposed to a VM using a vhost-user > > device. This change provides a new protocol feature > > VHOST_USER_F_CONFIGURE_SLOTS which, when enabled, lifts this limit > > and allows a VM with a vhost-user device to expose a configurable > > number of memory slots, up to the maximum supported by the platform. > > Existing backends are unaffected. > > > > This feature works by using three new messages, > > VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and > > VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the > > number of memory slots the backend is willing to accept. Then, > > when the memory tables are set or updated, a series of > > VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages are sent > > to transmit the regions to map and/or unmap instead of trying to > > send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE message. > > > > The vhost_user struct maintains a shadow state of the VM’s memory > > regions. When the memory tables are modified, the > > vhost_user_set_mem_table() function compares the new device memory state > > to the shadow state and only sends regions which need to be unmapped or > > mapped in. The regions which must be unmapped are sent first, followed > > by the new regions to be mapped in. After all the messages have been sent, > > the shadow state is set to the current virtual device state. > > > > The current feature implementation does not work with postcopy migration > > and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has > > also been negotiated. > > > > Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com> > > --- > > docs/interop/vhost-user.rst | 43 ++++++++ > > hw/virtio/vhost-user.c | 251 ++++++++++++++++++++++++++++++++++++++++---- > > 2 files changed, 273 insertions(+), 21 deletions(-) > > > > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst > > index 7827b71..855a072 100644 > > --- a/docs/interop/vhost-user.rst > > +++ b/docs/interop/vhost-user.rst > > @@ -785,6 +785,7 @@ Protocol features > > #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD 10 > > #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11 > > #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12 > > + #define VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS 13 > > > > Master message types > > -------------------- > > @@ -1190,6 +1191,48 @@ Master message types > > ancillary data. The GPU protocol is used to inform the master of > > rendering state and updates. See vhost-user-gpu.rst for details. > > > > +``VHOST_USER_GET_MAX_MEM_SLOTS`` > > + :id: 34 > > + :equivalent ioctl: N/A > > + :slave payload: u64 > > + > > + When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been > > + successfully negotiated, this message is submitted by master to the > > + slave. The slave should return the message with a u64 payload > > + containing the maximum number of memory slots for QEMU to expose to > > + the guest. This message is not supported with postcopy migration or if > > + the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated. > > + > > +``VHOST_USER_ADD_MEM_REG`` > > + :id: 35 > > + :equivalent ioctl: N/A > > + :slave payload: memory region > > + > > + When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been > > + successfully negotiated, this message is submitted by master to the slave. > > + The message payload contains a memory region descriptor struct, describing > > + a region of guest memory which the slave device must map in. When the > > + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully > > + negotiated, along with the VHOST_USER_REM_MEM_REG message, this message is > > + used to set and update the memory tables of the slave device. This message > > + is not supported with postcopy migration or if the > > + VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated. > > + > > +``VHOST_USER_REM_MEM_REG`` > > + :id: 36 > > + :equivalent ioctl: N/A > > + :slave payload: memory region > > + > > + When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been > > + successfully negotiated, this message is submitted by master to the slave. > > + The message payload contains a memory region descriptor struct, describing > > + a region of guest memory which the slave device must unmap. When the > > + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully > > + negotiated, along with the VHOST_USER_ADD_MEM_REG message, this message is > > + used to set and update the memory tables of the slave device. This message > > + is not supported with postcopy migration or if the > > + VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated. > > + > > Slave message types > > ------------------- > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > index 2134e81..3432462 100644 > > --- a/hw/virtio/vhost-user.c > > +++ b/hw/virtio/vhost-user.c > > @@ -35,11 +35,29 @@ > > #include <linux/userfaultfd.h> > > #endif > > > > -#define VHOST_MEMORY_MAX_NREGIONS 8 > > +#define VHOST_MEMORY_LEGACY_NREGIONS 8 > > #define VHOST_USER_F_PROTOCOL_FEATURES 30 > > #define VHOST_USER_SLAVE_MAX_FDS 8 > > > > /* > > + * Set maximum number of RAM slots supported to > > + * the maximum number supported by the target > > + * hardware plaform. > > + */ > > +#if defined(TARGET_X86) || defined(TARGET_X86_64) || \ > > + defined(TARGET_ARM) || defined(TARGET_ARM_64) > > +#include "hw/acpi/acpi.h" > > +#define VHOST_USER_MAX_RAM_SLOTS ACPI_MAX_RAM_SLOTS > > + > > +#elif defined(TARGET_PPC) || defined(TARGET_PPC_64) > > +#include "hw/ppc/spapr.h" > > +#define VHOST_USER_MAX_RAM_SLOTS SPAPR_MAX_RAM_SLOTS > > + > > +#else > > +#define VHOST_USER_MAX_RAM_SLOTS 512 > > +#endif > > + > > +/* > > * Maximum size of virtio device config space > > */ > > #define VHOST_USER_MAX_CONFIG_SIZE 256 > > @@ -58,6 +76,7 @@ enum VhostUserProtocolFeature { > > VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10, > > VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11, > > VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12, > > + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS = 13, > > VHOST_USER_PROTOCOL_F_MAX > > }; > > > > @@ -98,6 +117,9 @@ typedef enum VhostUserRequest { > > VHOST_USER_GET_INFLIGHT_FD = 31, > > VHOST_USER_SET_INFLIGHT_FD = 32, > > VHOST_USER_GPU_SET_SOCKET = 33, > > + VHOST_USER_GET_MAX_MEM_SLOTS = 34, > > + VHOST_USER_ADD_MEM_REG = 35, > > + VHOST_USER_REM_MEM_REG = 36, > > VHOST_USER_MAX > > } VhostUserRequest; > > > > @@ -119,9 +141,14 @@ typedef struct VhostUserMemoryRegion { > > typedef struct VhostUserMemory { > > uint32_t nregions; > > uint32_t padding; > > - VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS]; > > + VhostUserMemoryRegion regions[VHOST_MEMORY_LEGACY_NREGIONS]; > > } VhostUserMemory; > > > > +typedef struct VhostUserMemRegMsg { > > + uint32_t padding; > > + VhostUserMemoryRegion region; > > +} VhostUserMemRegMsg; > > + > > typedef struct VhostUserLog { > > uint64_t mmap_size; > > uint64_t mmap_offset; > > @@ -180,6 +207,7 @@ typedef union { > > struct vhost_vring_state state; > > struct vhost_vring_addr addr; > > VhostUserMemory memory; > > + VhostUserMemRegMsg mem_reg; > > VhostUserLog log; > > struct vhost_iotlb_msg iotlb; > > VhostUserConfig config; > > @@ -208,7 +236,7 @@ struct vhost_user { > > int slave_fd; > > NotifierWithReturn postcopy_notifier; > > struct PostCopyFD postcopy_fd; > > - uint64_t postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS]; > > + uint64_t postcopy_client_bases[VHOST_USER_MAX_RAM_SLOTS]; > > /* Length of the region_rb and region_rb_offset arrays */ > > size_t region_rb_len; > > /* RAMBlock associated with a given region */ > > @@ -220,6 +248,10 @@ struct vhost_user { > > > > /* True once we've entered postcopy_listen */ > > bool postcopy_listen; > > + > > + /* Our current regions */ > > + int num_shadow_regions; > > + VhostUserMemoryRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS]; > > }; > > > > static bool ioeventfd_enabled(void) > > @@ -368,7 +400,7 @@ int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd) > > static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base, > > struct vhost_log *log) > > { > > - int fds[VHOST_MEMORY_MAX_NREGIONS]; > > + int fds[VHOST_USER_MAX_RAM_SLOTS]; > > size_t fd_num = 0; > > bool shmfd = virtio_has_feature(dev->protocol_features, > > VHOST_USER_PROTOCOL_F_LOG_SHMFD); > > @@ -426,7 +458,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u, > > &offset); > > fd = memory_region_get_fd(mr); > > if (fd > 0) { > > - if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) { > > + if (*fd_num == VHOST_MEMORY_LEGACY_NREGIONS) { > > error_report("Failed preparing vhost-user memory table msg"); > > return -1; > > } > > @@ -472,7 +504,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, > > struct vhost_memory *mem) > > { > > struct vhost_user *u = dev->opaque; > > - int fds[VHOST_MEMORY_MAX_NREGIONS]; > > + int fds[VHOST_MEMORY_LEGACY_NREGIONS]; > > size_t fd_num = 0; > > VhostUserMsg msg_reply; > > int region_i, msg_i; > > @@ -521,7 +553,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, > > } > > > > memset(u->postcopy_client_bases, 0, > > - sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS); > > + sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS); > > > > /* They're in the same order as the regions that were sent > > * but some of the regions were skipped (above) if they > > @@ -562,18 +594,151 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, > > return 0; > > } > > > > +static inline bool reg_equal(VhostUserMemoryRegion *shadow_reg, > > + struct vhost_memory_region *vdev_reg) > > +{ > > + if (shadow_reg->guest_phys_addr == vdev_reg->guest_phys_addr && > > + shadow_reg->userspace_addr == vdev_reg->userspace_addr && > > + shadow_reg->memory_size == vdev_reg->memory_size) { > > + return true; > > + } else { > > + return false; > > + } > > +} > > + > > +static int vhost_user_send_add_remove_regions(struct vhost_dev *dev, > > + struct vhost_memory *mem, > > + VhostUserMsg *msg) > > +{ > > + struct vhost_user *u = dev->opaque; > > + int i, j, fd; > > + bool found[VHOST_USER_MAX_RAM_SLOTS] = {}; > > + bool matching = false; > > + struct vhost_memory_region *reg; > > + ram_addr_t offset; > > + MemoryRegion *mr; > > + > > + /* > > + * Ensure the VHOST_USER_PROTOCOL_F_REPLY_ACK has not been > > + * negotiated and no postcopy migration is in progress. > > + */ > > + assert(!virtio_has_feature(dev->protocol_features, > > + VHOST_USER_PROTOCOL_F_REPLY_ACK)); > > + if (u->postcopy_listen && u->postcopy_fd.handler) { > > + error_report("Postcopy migration is not supported when the " > > + "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature " > > + "has been negotiated"); > > + return -1; > > + } > > + > > + msg->hdr.size = sizeof(msg->payload.mem_reg.padding); > > + msg->hdr.size += sizeof(VhostUserMemoryRegion); > > + > > + /* > > + * Send VHOST_USER_REM_MEM_REG for memory regions in our shadow state > > + * which are not found not in the device's memory state. > > + */ > > + for (i = 0; i < u->num_shadow_regions; ++i) { > > + reg = dev->mem->regions; > > + > > + for (j = 0; j < dev->mem->nregions; j++) { > > + reg = dev->mem->regions + j; > > + > > + assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); > > + mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, > > + &offset); > > + fd = memory_region_get_fd(mr); > > + > > + if (reg_equal(&u->shadow_regions[i], reg)) { > > + matching = true; > > + found[j] = true; > > + break; > > + } > > + } > > + > > + if (fd > 0 && !matching) { > > + msg->hdr.request = VHOST_USER_REM_MEM_REG; > > + msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr; > > + msg->payload.mem_reg.region.memory_size = reg->memory_size; > > + msg->payload.mem_reg.region.guest_phys_addr = > > + reg->guest_phys_addr; > > + msg->payload.mem_reg.region.mmap_offset = offset; > > + > > + if (vhost_user_write(dev, msg, &fd, 1) < 0) { > > + return -1; > > + } > > + } > > + } > > + > > + /* > > + * Send messages to add regions present in the device which are not > > + * in our shadow state. > > + */ > > + for (i = 0; i < dev->mem->nregions; ++i) { > > + reg = dev->mem->regions + i; > > + > > + /* > > + * If the region was in both the shadow and vdev state we don't > > + * need to send a VHOST_USER_ADD_MEM_REG message for it. > > + */ > > + if (found[i]) { > > + continue; > > + } > > + > > + assert((uintptr_t)reg->userspace_addr == reg->userspace_addr); > > + mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr, > > + &offset); > > + fd = memory_region_get_fd(mr); > > + > > + if (fd > 0) { > > + msg->hdr.request = VHOST_USER_ADD_MEM_REG; > > + msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr; > > + msg->payload.mem_reg.region.memory_size = reg->memory_size; > > + msg->payload.mem_reg.region.guest_phys_addr = > > + reg->guest_phys_addr; > > + msg->payload.mem_reg.region.mmap_offset = offset; > > + > > + if (vhost_user_write(dev, msg, &fd, 1) < 0) { > > + return -1; > > + } > > + } > > + } > > + > > + /* Make our shadow state match the updated device state. */ > > + u->num_shadow_regions = dev->mem->nregions; > > + for (i = 0; i < dev->mem->nregions; ++i) { > > + reg = dev->mem->regions + i; > > + u->shadow_regions[i].guest_phys_addr = reg->guest_phys_addr; > > + u->shadow_regions[i].userspace_addr = reg->userspace_addr; > > + u->shadow_regions[i].memory_size = reg->memory_size; > > + } > > + > > + return 0; > > +} > > + > > static int vhost_user_set_mem_table(struct vhost_dev *dev, > > struct vhost_memory *mem) > > { > > struct vhost_user *u = dev->opaque; > > - int fds[VHOST_MEMORY_MAX_NREGIONS]; > > + int fds[VHOST_MEMORY_LEGACY_NREGIONS]; > > size_t fd_num = 0; > > bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler; > > bool reply_supported = virtio_has_feature(dev->protocol_features, > > VHOST_USER_PROTOCOL_F_REPLY_ACK); > > + bool config_slots = > > + virtio_has_feature(dev->protocol_features, > > + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS); > > > > if (do_postcopy) { > > - /* Postcopy has enough differences that it's best done in it's own > > + if (config_slots) { > > + error_report("Postcopy migration not supported with " > > + "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature " > > + "enabled."); > > + return -1; > > + } > > + > > + /* > > + * Postcopy has enough differences that it's best done in it's own > > * version > > */ > > return vhost_user_set_mem_table_postcopy(dev, mem); > > @@ -587,17 +752,22 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev, > > msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK; > > } > > > > - if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num, > > - false) < 0) { > > - return -1; > > - } > > - > > - if (vhost_user_write(dev, &msg, fds, fd_num) < 0) { > > - return -1; > > - } > > + if (config_slots && !reply_supported) { > > + if (vhost_user_send_add_remove_regions(dev, mem, &msg) < 0) { > > + return -1; > > + } > > + } else { > > + if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num, > > + false) < 0) { > > + return -1; > > + } > > + if (vhost_user_write(dev, &msg, fds, fd_num) < 0) { > > + return -1; > > + } > > > > - if (reply_supported) { > > - return process_message_reply(dev, &msg); > > + if (reply_supported) { > > + return process_message_reply(dev, &msg); > > + } > > } > > > > return 0; > > @@ -762,7 +932,7 @@ static int vhost_set_vring_file(struct vhost_dev *dev, > > VhostUserRequest request, > > struct vhost_vring_file *file) > > { > > - int fds[VHOST_MEMORY_MAX_NREGIONS]; > > + int fds[VHOST_USER_MAX_RAM_SLOTS]; > > size_t fd_num = 0; > > VhostUserMsg msg = { > > .hdr.request = request, > > @@ -1496,7 +1666,46 @@ static int vhost_user_get_vq_index(struct vhost_dev *dev, int idx) > > > > static int vhost_user_memslots_limit(struct vhost_dev *dev) > > { > > - return VHOST_MEMORY_MAX_NREGIONS; > > + VhostUserMsg msg = { > > + .hdr.request = VHOST_USER_GET_MAX_MEM_SLOTS, > > + .hdr.flags = VHOST_USER_VERSION, > > + }; > > + > > + if (!virtio_has_feature(dev->protocol_features, > > + VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS)) { > > + return VHOST_MEMORY_LEGACY_NREGIONS; > > + } > > + > > + if (virtio_has_feature(dev->protocol_features, > > + VHOST_USER_PROTOCOL_F_REPLY_ACK)) { > > I'd just assume that VHOST_USER_GET_MAX_MEM_SLOTS always > gets a response. > > Makes sense > > + error_report("The VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol " > > + "feature is not supported when the " > > + "VHOST_USER_PROTOCOL_F_REPLY_ACK features has also " > > + "been negotiated"); > > + return -1; > > + } > > + > > + if (vhost_user_write(dev, &msg, NULL, 0) < 0) { > > + return -1; > > + } > > This will send the message each time e.g. memory hotplug > triggers. Why not just get this value during init time? > Also, any way we can cut down number of round trips? > Can we combine this value in a single message with > some other stuff we are fetching on startup? > Agreed, sending a VHOST_USER_GET_MEMSLOTS message on every hot-add is unnessesary. I can think of a couple ways to get number of memslots without adding any additional round trips. I don’t like either of them though.` 1. Only 14 of the 64 bits are used in the VHOST_USER_GET_FEATURES message are used to store feature flags. If CONFIGURE SLOTS is enabled, we could use the remaining 64 - VHOST_USER_PROTOCOL_F_MAX bits to store the maximum number of memory slots. We would only need around 10 bits to express a reasonable number of slots so that still leaves room for plenty of future features with VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS negotiated. 2. We could always have it sent from the backend with an existing VHOST_USER_GET_* message in vhost_user_backend_init(). The best option I see is using the VHOST_USER_GET_QUEUE_NUM if the VHOST_USER_PROTOCOL_F_MQ is negotiated. This has the obvious drawback of requiring VHOST_USER_PROTOCOL_F_MQ to negotiate VHOST_USER_PROTOCOL_F_CONFIGURE_SOTS. I don’t see another option without such a limitation. Both (1) and (2) seem hacky. I think it’s preferable to keep the VHOST_USER_GET_MAX_MEM_SLOTS message but send it once on init and cache the response in the vhost-user struct. Do you agree? > > + > > + if (vhost_user_read(dev, &msg) < 0) { > > + return -1; > > + } > > + > > + if (msg.hdr.request != VHOST_USER_GET_MAX_MEM_SLOTS) { > > + error_report("Received unexpected msg type. Expected %d received %d", > > + VHOST_USER_GET_MAX_MEM_SLOTS, msg.hdr.request); > > + return -1; > > + } > > + > > + if (msg.hdr.size != sizeof(msg.payload.u64)) { > > + error_report("Received bad msg size"); > > + return -1; > > + } > > + > > + return MIN(MAX(msg.payload.u64, VHOST_MEMORY_LEGACY_NREGIONS), > > + VHOST_USER_MAX_RAM_SLOTS); > > What's this trying to do? Pls add code comments. > Should be MIN(msg.payload.u64, VHOST_USER_MAX_RAM_SLOTS) - will fix > > } > > > > static bool vhost_user_requires_shm_log(struct vhost_dev *dev) > > -- > > 1.8.3.1 > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 3/3] Introduce Configurable Number of Memory Slots Exposed by vhost-user: 2020-01-16 3:23 ` Raphael Norwitz @ 2020-01-22 8:57 ` Michael S. Tsirkin 0 siblings, 0 replies; 10+ messages in thread From: Michael S. Tsirkin @ 2020-01-22 8:57 UTC (permalink / raw) To: Raphael Norwitz; +Cc: qemu-devel On Wed, Jan 15, 2020 at 07:23:14PM -0800, Raphael Norwitz wrote: > > > + error_report("The VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol " > > > + "feature is not supported when the " > > > + "VHOST_USER_PROTOCOL_F_REPLY_ACK features has also " > > > + "been negotiated"); > > > + return -1; > > > + } > > > + > > > + if (vhost_user_write(dev, &msg, NULL, 0) < 0) { > > > + return -1; > > > + } > > > > This will send the message each time e.g. memory hotplug > > triggers. Why not just get this value during init time? > > Also, any way we can cut down number of round trips? > > Can we combine this value in a single message with > > some other stuff we are fetching on startup? > > > > Agreed, sending a VHOST_USER_GET_MEMSLOTS message on every hot-add is > unnessesary. I can think of a couple ways to get number of memslots without > adding any additional round trips. I don’t like either of them though.` > > 1. > Only 14 of the 64 bits are used in the VHOST_USER_GET_FEATURES message are > used to store feature flags. If CONFIGURE SLOTS is enabled, we could use > the remaining 64 - VHOST_USER_PROTOCOL_F_MAX bits to store the maximum number > of memory slots. We would only need around 10 bits to express a reasonable > number of slots so that still leaves room for plenty of future features with > VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS negotiated. > > 2. > We could always have it sent from the backend with an existing VHOST_USER_GET_* > message in vhost_user_backend_init(). The best option I see is using the > VHOST_USER_GET_QUEUE_NUM if the VHOST_USER_PROTOCOL_F_MQ is negotiated. This has > the obvious drawback of requiring VHOST_USER_PROTOCOL_F_MQ to negotiate > VHOST_USER_PROTOCOL_F_CONFIGURE_SOTS. I don’t see another option without such a > limitation. > > Both (1) and (2) seem hacky. I think it’s preferable to keep the > VHOST_USER_GET_MAX_MEM_SLOTS message but send it once on init and cache the > response in the vhost-user struct. > > Do you agree? Makes sense. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 0/3] vhost-user: Lift Max Ram Slots Limitation 2019-12-09 7:00 [RFC PATCH 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz ` (2 preceding siblings ...) 2019-12-09 7:00 ` [RFC PATCH 3/3] Introduce Configurable Number of Memory Slots Exposed by vhost-user: Raphael Norwitz @ 2020-01-13 19:43 ` Raphael Norwitz 3 siblings, 0 replies; 10+ messages in thread From: Raphael Norwitz @ 2020-01-13 19:43 UTC (permalink / raw) To: mst, qemu-devel Ping On Mon, Dec 09, 2019 at 02:00:44AM -0500, Raphael Norwitz wrote: > > In QEMU today, a VM with a vhost-user device can hot add memory a > maximum of 8 times. See these threads, among others: > > [1] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01046.html > https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01236.html > > [2] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04656.html > > This RFC/patch set introduces a new protocol feature > VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS which, when enabled, lifts the > restriction on the maximum number RAM slots imposed by vhost-user. > > The patch consists of 3 changes: > 1. Fixed Error Handling in vhost_user_set_mem_table_postcopy: > This is a bug fix in the postcopy migration path > 2. vhost-user: Refactor vhost_user_set_mem_table Functions: > This is a non-functional change refractoring the > vhost_user_set_mem_table and vhost_user_set_mem_table_postcopy > functions such that the feature can be more cleanly added. > 3. Introduce Configurable Number of Memory Slots Exposed by vhost-user: > This change introduces the new protocol feature > VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS. > > The implementation details are explained in more detail in the commit > messages, but at a high level the new protocol feature works as follows: > - If the VHOST_USER_PROTCOL_F_CONFIGURE_SLOTS feature is enabled, QEMU will > send multiple VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG > messages to map and unmap individual memory regions instead of one large > VHOST_USER_SET_MEM_TABLE message containing all memory regions. > - The vhost-user struct maintains a ’shadow state’ of memory regions > already sent to the guest. Each time vhost_user_set_mem_table is called, > the shadow state is compared with the new device state. A > VHOST_USER_REM_MEM_REG will be sent for each region in the shadow state > not in the device state. Then, a VHOST_USER_ADD_MEM_REG will be sent > for each region in the device state but not the shadow state. After > these messages have been sent, the shadow state will be updated to > reflect the new device state. > > The VHOST_USER_SET_MEM_TABLE message was not reused because as the number of > regions grows, the message becomes very large. In practice, such large > messages caused problems (truncated messages) and in the past it seems the > community has opted for smaller fixed size messages where possible. VRINGs, > for example, are sent to the backend individually instead of in one massive > message. > > Current Limitations: > - postcopy migration is not supported when the > VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS has been negotiated. > - VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS cannot be negotiated when > VHOST_USER_PROTOCOL_F_REPLY_ACK has also been negotiated. > > Both of these limitations are due to resource contraints. They are not > imposed for technical reasons. > > Questions: > - In the event transmitting a VHOST_USER_ADD_MEM_REG or > VHOST_USER_REM_REG message fails, is there any reason the error handling > should differ from when transmitting VHOST_USER_SET_MEM_TABLE message fails? > - Is there a cleaner way to ensure to ensure a postcopy migration cannot be > started with this protocol feature enabled? > > Best, > Raphael > > Raphael Norwitz (3): > Fixed Error Handling in vhost_user_set_mem_table_postcopy > vhost-user: Refactor vhost_user_set_mem_table Functions > Introduce Configurable Number of Memory Slots Exposed by vhost-user: > > docs/interop/vhost-user.rst | 43 +++++ > hw/virtio/vhost-user.c | 384 +++++++++++++++++++++++++++++++++----------- > 2 files changed, 335 insertions(+), 92 deletions(-) > > -- > 1.8.3.1 > > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-01-22 8:58 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-12-09 7:00 [RFC PATCH 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz 2019-12-09 7:00 ` [RFC PATCH 1/3] Fixed Error Handling in vhost_user_set_mem_table_postcopy Raphael Norwitz 2020-01-14 7:07 ` Michael S. Tsirkin 2020-01-16 3:16 ` Raphael Norwitz 2019-12-09 7:00 ` [RFC PATCH 2/3] vhost-user: Refactor vhost_user_set_mem_table Functions Raphael Norwitz 2019-12-09 7:00 ` [RFC PATCH 3/3] Introduce Configurable Number of Memory Slots Exposed by vhost-user: Raphael Norwitz 2020-01-14 7:12 ` Michael S. Tsirkin 2020-01-16 3:23 ` Raphael Norwitz 2020-01-22 8:57 ` Michael S. Tsirkin 2020-01-13 19:43 ` [RFC PATCH 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).