* [PATCH v13 0/5] Support add/remove memory region and get-max-slots
@ 2026-05-14 22:46 pravin.bathija
2026-05-14 22:46 ` [PATCH v13 1/5] vhost: add user to mailmap and define to vhost hdr pravin.bathija
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: pravin.bathija @ 2026-05-14 22:46 UTC (permalink / raw)
To: dev, fengchengwen, stephen, maxime.coquelin; +Cc: pravin.bathija, thomas
From: Pravin M Bathija <pravin.bathija@dell.com>
This is version v13 of the patchset; it incorporates the
recommendations made by Chengwen Feng.
Changes made to patch 3/5 and 4/5
* Relocated function remove_guest_pages from patch 3/5 to 4/5.
Changes made to patch 2/5
* Renamed VhostUserSingleMemReg to VhostUserMemRegMsg and memory_single
to memreg.
This implementation has been extensively tested by doing Read/Write I/O
from multiple instances of fio + libblkio (front-end) talking to
spdk/dpdk (back-end) based drives. Tested with qemu front-end talking to
dpdk testpmd (back-end) performing add/removal of memory regions. Also
tested post-copy live migration after doing add_memory_region.
Version Log:
Version v13 (Current version): Incorporate code review suggestions from
Chengwen Feng as described above.
Version v12: Incorporate code review suggestions from Maxime Coquelin
and ai-code-review.
Changes made to patch 3/5
Refactored async_dma_map() to delegate to async_dma_map_region(),
eliminating code duplication between the two functions.
Restored original comments in async_dma_map_region() explaining why
ENODEV and EINVAL errors are ignored (these were stripped in v10).
Reverted unnecessary changes to vhost_user_postcopy_register() --
removed the host_user_addr == 0 checks and reg_msg_index indirection
that were added in v10, since this function is only called from
vhost_user_set_mem_table() where regions are always contiguous.
Version v11: Incorporate code review suggestions from Stephen Hemminger.
Change made to patch 4/5
Fix incomplete cleanup in vhost_user_add_mem_reg() when
vhost_user_mmap_region() fails after the mmap succeeds (e.g.
add_guest_pages() realloc failure). The error path now
calls remove_guest_pages() and free_mem_region() to undo the mapping
and stale guest-page entries, preventing a leaked mmap and slot reuse
corruption. The plain close(fd) path is kept for pre-mmap failures.
Version v10: Incorporate code review suggestions from Stephen Hemminger.
Change made to patch 4/5
Moved dev_invalidate_vrings after free_mem_region, array compaction, and
nregions decrement. This ensures translate_ring_addresses only sees
surviving memory regions, preventing vring pointers from resolving into
a region that is about to be unmapped.
Version v9: Incorporate code review suggestions from Stephen Hemminger.
Changes made to patch 3/5
Restored max_guest_pages initial value to hardcoded 8 instead of
VHOST_MEMORY_MAX_NREGIONS, matching upstream semantics.
Changes made to patch 4/5
Added close(reg->fd) and reg->fd = -1 before goto close_msg_fds in the
mmap failure path to fix fd leak after fd was moved from ctx->fds[0].
Converted dev_invalidate_vrings from a plain function to a macro +
implementation function pair, accepting message ID as a parameter so
the static_assert reports the correct handler at each call site.
Updated dev_invalidate_vrings call in add_mem_reg to pass
VHOST_USER_ADD_MEM_REG as message ID.
Updated dev_invalidate_vrings call in rem_mem_reg to pass
VHOST_USER_REM_MEM_REG as message ID.
Version v8: Incorporate code review suggestions from Stephen Hemminger.
Rewrote async_dma_map_region() to iterate guest pages by host
address range matching.
Changed dev_invalidate_vrings() to accept a double pointer to
propagate pointer updates.
Added new function remove_guest_pages().
Narrowed the add_mem_reg error path to clean up only the single
failed region instead of destroying all existing regions.
Version v7: Incorporate code review suggestions from Maxime Coquelin.
Add debug messages to vhost_postcopy_register function.
Version v6: Added the enablement of this feature as a final patch in
this patch-set and other code optimizations as suggested by Maxime
Coquelin.
Version v5: removed the patch that increased the number of memory regions
from 8 to 128. This will be submitted as a separate feature at a later
point after incorporating additional optimizations. Also includes code
optimizations as suggested by Chengwen Feng.
Version v4: code optimizations as suggested by Chengwen Feng.
Version v3: code optimizations as suggested by Maxime Coquelin
and Thomas Monjalon.
Version v2: code optimizations as suggested by Maxime Coquelin.
Version v1: Initial patch set.
Pravin M Bathija (5):
vhost: add user to mailmap and define to vhost hdr
vhost_user: header defines for add/rem mem region
vhost_user: support function defines for back-end
vhost_user: Function defs for add/rem mem regions
vhost_user: enable configure memory slots
.mailmap | 1 +
lib/vhost/rte_vhost.h | 4 +
lib/vhost/vhost_user.c | 418 +++++++++++++++++++++++++++++++++++------
lib/vhost/vhost_user.h | 10 +
4 files changed, 371 insertions(+), 62 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v13 1/5] vhost: add user to mailmap and define to vhost hdr
2026-05-14 22:46 [PATCH v13 0/5] Support add/remove memory region and get-max-slots pravin.bathija
@ 2026-05-14 22:46 ` pravin.bathija
2026-05-15 0:20 ` fengchengwen
2026-05-14 22:46 ` [PATCH v13 2/5] vhost_user: header defines for add/rem mem region pravin.bathija
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: pravin.bathija @ 2026-05-14 22:46 UTC (permalink / raw)
To: dev, fengchengwen, stephen, maxime.coquelin; +Cc: pravin.bathija, thomas
From: Pravin M Bathija <pravin.bathija@dell.com>
- add user to mailmap file.
- define a bit VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS that
indicates whether the capability to add/remove memory regions is
supported. This is part of the overall add/remove memory region
support in this patchset.
Signed-off-by: Pravin M Bathija <pravin.bathija@dell.com>
---
.mailmap | 1 +
lib/vhost/rte_vhost.h | 4 ++++
2 files changed, 5 insertions(+)
diff --git a/.mailmap b/.mailmap
index 0e0d83e1c6..cc44e27036 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1295,6 +1295,7 @@ Prateek Agarwal <prateekag@cse.iitb.ac.in>
Prathisna Padmasanan <prathisna.padmasanan@intel.com>
Praveen Kaligineedi <pkaligineedi@google.com>
Praveen Shetty <praveen.shetty@intel.com>
+Pravin M Bathija <pravin.bathija@dell.com>
Pravin Pathak <pravin.pathak.dev@gmail.com> <pravin.pathak@intel.com>
Prince Takkar <ptakkar@marvell.com>
Priyalee Kushwaha <priyalee.kushwaha@intel.com>
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index 2f7c4c0080..a7f9700538 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -109,6 +109,10 @@ extern "C" {
#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
#endif
+#ifndef VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
+#define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15
+#endif
+
#ifndef VHOST_USER_PROTOCOL_F_STATUS
#define VHOST_USER_PROTOCOL_F_STATUS 16
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v13 2/5] vhost_user: header defines for add/rem mem region
2026-05-14 22:46 [PATCH v13 0/5] Support add/remove memory region and get-max-slots pravin.bathija
2026-05-14 22:46 ` [PATCH v13 1/5] vhost: add user to mailmap and define to vhost hdr pravin.bathija
@ 2026-05-14 22:46 ` pravin.bathija
2026-05-14 22:46 ` [PATCH v13 3/5] vhost_user: support function defines for back-end pravin.bathija
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: pravin.bathija @ 2026-05-14 22:46 UTC (permalink / raw)
To: dev, fengchengwen, stephen, maxime.coquelin; +Cc: pravin.bathija, thomas
From: Pravin M Bathija <pravin.bathija@dell.com>
The changes in this file add the request enum values for supporting
add/remove memory regions. The vhost-user front-end sends messages
such as get max memory slots, add memory region and remove memory
region; these changes cover their definitions on the vhost-user
back-end. The changes also add the data structure definition of the
memory region to be added/removed, and extend VhostUserMsg to
include it.
Signed-off-by: Pravin M Bathija <pravin.bathija@dell.com>
---
lib/vhost/vhost_user.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/lib/vhost/vhost_user.h b/lib/vhost/vhost_user.h
index ef486545ba..6435816534 100644
--- a/lib/vhost/vhost_user.h
+++ b/lib/vhost/vhost_user.h
@@ -67,6 +67,9 @@ typedef enum VhostUserRequest {
VHOST_USER_POSTCOPY_END = 30,
VHOST_USER_GET_INFLIGHT_FD = 31,
VHOST_USER_SET_INFLIGHT_FD = 32,
+ VHOST_USER_GET_MAX_MEM_SLOTS = 36,
+ VHOST_USER_ADD_MEM_REG = 37,
+ VHOST_USER_REM_MEM_REG = 38,
VHOST_USER_SET_STATUS = 39,
VHOST_USER_GET_STATUS = 40,
} VhostUserRequest;
@@ -91,6 +94,11 @@ typedef struct VhostUserMemory {
VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
} VhostUserMemory;
+typedef struct VhostUserMemRegMsg {
+ uint64_t padding;
+ VhostUserMemoryRegion region;
+} VhostUserMemRegMsg;
+
typedef struct VhostUserLog {
uint64_t mmap_size;
uint64_t mmap_offset;
@@ -186,6 +194,7 @@ typedef struct __rte_packed_begin VhostUserMsg {
struct vhost_vring_state state;
struct vhost_vring_addr addr;
VhostUserMemory memory;
+ VhostUserMemRegMsg memreg;
VhostUserLog log;
struct vhost_iotlb_msg iotlb;
VhostUserCryptoSessionParam crypto_session;
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v13 3/5] vhost_user: support function defines for back-end
2026-05-14 22:46 [PATCH v13 0/5] Support add/remove memory region and get-max-slots pravin.bathija
2026-05-14 22:46 ` [PATCH v13 1/5] vhost: add user to mailmap and define to vhost hdr pravin.bathija
2026-05-14 22:46 ` [PATCH v13 2/5] vhost_user: header defines for add/rem mem region pravin.bathija
@ 2026-05-14 22:46 ` pravin.bathija
2026-05-15 0:33 ` fengchengwen
2026-05-14 22:46 ` [PATCH v13 4/5] vhost_user: Function defs for add/rem mem regions pravin.bathija
2026-05-14 22:46 ` [PATCH v13 5/5] vhost_user: enable configure memory slots pravin.bathija
4 siblings, 1 reply; 10+ messages in thread
From: pravin.bathija @ 2026-05-14 22:46 UTC (permalink / raw)
To: dev, fengchengwen, stephen, maxime.coquelin; +Cc: pravin.bathija, thomas
From: Pravin M Bathija <pravin.bathija@dell.com>
Here we define support functions that are called from the various
vhost-user back-end message handlers such as set memory table, get
memory slots, add memory region and remove memory region. These are
essentially common helpers to unmap a set of memory regions, free a
single memory region, align memory addresses, and DMA map/unmap a
single memory region.
Signed-off-by: Pravin M Bathija <pravin.bathija@dell.com>
---
lib/vhost/vhost_user.c | 89 ++++++++++++++++++++++++++++--------------
1 file changed, 60 insertions(+), 29 deletions(-)
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 4bfb13fb98..0ee3fe7a5e 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -171,20 +171,27 @@ get_blk_size(int fd)
return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize;
}
-static void
-async_dma_map(struct virtio_net *dev, bool do_map)
+static int
+async_dma_map_region(struct virtio_net *dev, struct rte_vhost_mem_region *reg, bool do_map)
{
- int ret = 0;
uint32_t i;
- struct guest_page *page;
+ int ret;
+ uint64_t reg_start = reg->host_user_addr;
+ uint64_t reg_end = reg_start + reg->size;
+
+ for (i = 0; i < dev->nr_guest_pages; i++) {
+ struct guest_page *page = &dev->guest_pages[i];
+
+ /* Only process pages belonging to this region */
+ if (page->host_user_addr < reg_start ||
+ page->host_user_addr >= reg_end)
+ continue;
- if (do_map) {
- for (i = 0; i < dev->nr_guest_pages; i++) {
- page = &dev->guest_pages[i];
+ if (do_map) {
ret = rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
- page->host_user_addr,
- page->host_iova,
- page->size);
+ page->host_user_addr,
+ page->host_iova,
+ page->size);
if (ret) {
/*
* DMA device may bind with kernel driver, in this case,
@@ -199,33 +206,57 @@ async_dma_map(struct virtio_net *dev, bool do_map)
* normal case in async path. This is a workaround.
*/
if (rte_errno == ENODEV)
- return;
+ return 0;
/* DMA mapping errors won't stop VHOST_USER_SET_MEM_TABLE. */
VHOST_CONFIG_LOG(dev->ifname, ERR, "DMA engine map failed");
+ return -1;
}
- }
-
- } else {
- for (i = 0; i < dev->nr_guest_pages; i++) {
- page = &dev->guest_pages[i];
+ } else {
ret = rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD,
- page->host_user_addr,
- page->host_iova,
- page->size);
+ page->host_user_addr,
+ page->host_iova,
+ page->size);
if (ret) {
/* like DMA map, ignore the kernel driver case when unmap. */
if (rte_errno == EINVAL)
- return;
+ return 0;
VHOST_CONFIG_LOG(dev->ifname, ERR, "DMA engine unmap failed");
+ return -1;
}
}
}
+
+ return 0;
+}
+
+static void
+async_dma_map(struct virtio_net *dev, bool do_map)
+{
+ uint32_t i;
+ struct rte_vhost_mem_region *reg;
+
+ for (i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
+ reg = &dev->mem->regions[i];
+ if (reg->host_user_addr == 0)
+ continue;
+ async_dma_map_region(dev, reg, do_map);
+ }
}
static void
-free_mem_region(struct virtio_net *dev)
+free_mem_region(struct rte_vhost_mem_region *reg)
+{
+ if (reg != NULL && reg->mmap_addr) {
+ munmap(reg->mmap_addr, reg->mmap_size);
+ close(reg->fd);
+ memset(reg, 0, sizeof(struct rte_vhost_mem_region));
+ }
+}
+
+static void
+free_all_mem_regions(struct virtio_net *dev)
{
uint32_t i;
struct rte_vhost_mem_region *reg;
@@ -236,12 +267,10 @@ free_mem_region(struct virtio_net *dev)
if (dev->async_copy && rte_vfio_is_enabled("vfio"))
async_dma_map(dev, false);
- for (i = 0; i < dev->mem->nregions; i++) {
+ for (i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
reg = &dev->mem->regions[i];
- if (reg->host_user_addr) {
- munmap(reg->mmap_addr, reg->mmap_size);
- close(reg->fd);
- }
+ if (reg->mmap_addr)
+ free_mem_region(reg);
}
}
@@ -255,7 +284,7 @@ vhost_backend_cleanup(struct virtio_net *dev)
vdpa_dev->ops->dev_cleanup(dev->vid);
if (dev->mem) {
- free_mem_region(dev);
+ free_all_mem_regions(dev);
rte_free(dev->mem);
dev->mem = NULL;
}
@@ -704,7 +733,7 @@ numa_realloc(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
vhost_devices[dev->vid] = dev;
mem_size = sizeof(struct rte_vhost_memory) +
- sizeof(struct rte_vhost_mem_region) * dev->mem->nregions;
+ sizeof(struct rte_vhost_mem_region) * VHOST_MEMORY_MAX_NREGIONS;
mem = rte_realloc_socket(dev->mem, mem_size, 0, node);
if (!mem) {
VHOST_CONFIG_LOG(dev->ifname, ERR,
@@ -808,8 +837,10 @@ hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
uint32_t i;
uintptr_t hua = (uintptr_t)ptr;
- for (i = 0; i < mem->nregions; i++) {
+ for (i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
r = &mem->regions[i];
+ if (r->host_user_addr == 0)
+ continue;
if (hua >= r->host_user_addr &&
hua < r->host_user_addr + r->size) {
return get_blk_size(r->fd);
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v13 4/5] vhost_user: Function defs for add/rem mem regions
2026-05-14 22:46 [PATCH v13 0/5] Support add/remove memory region and get-max-slots pravin.bathija
` (2 preceding siblings ...)
2026-05-14 22:46 ` [PATCH v13 3/5] vhost_user: support function defines for back-end pravin.bathija
@ 2026-05-14 22:46 ` pravin.bathija
2026-05-15 1:04 ` fengchengwen
2026-05-14 22:46 ` [PATCH v13 5/5] vhost_user: enable configure memory slots pravin.bathija
4 siblings, 1 reply; 10+ messages in thread
From: pravin.bathija @ 2026-05-14 22:46 UTC (permalink / raw)
To: dev, fengchengwen, stephen, maxime.coquelin; +Cc: pravin.bathija, thomas
From: Pravin M Bathija <pravin.bathija@dell.com>
These changes cover the function definitions for the add/remove memory
region calls, which are invoked on receiving the corresponding
vhost-user messages from the front-end (e.g. QEMU). In addition to
testing with the QEMU front-end, testing has also been performed with
a libblkio front-end and an spdk/dpdk back-end, doing I/O through a
libblkio-based device driver to spdk-based drives.
There are also changes for set_mem_table and new definition for get memory
slots. Our changes optimize the set memory table call to use common support
functions. A new vhost_user_initialize_memory() function is introduced to
factor out the common memory initialization logic from the function
vhost_user_set_mem_table(), which is now called from both the SET_MEM_TABLE
message handler and the ADD_MEM_REG handler (for the first region).
The get memory slots message is how the vhost-user front-end queries
the back-end for the number of memory slots available for
registration. In addition, a support function to invalidate vrings is
defined, which is used by the add/remove memory region handlers. The
helper function remove_guest_pages() is also defined here and is
called from vhost_user_add_mem_reg().
Signed-off-by: Pravin M Bathija <pravin.bathija@dell.com>
---
lib/vhost/vhost_user.c | 329 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 296 insertions(+), 33 deletions(-)
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 0ee3fe7a5e..fdcb7e0158 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -71,6 +71,9 @@ VHOST_MESSAGE_HANDLER(VHOST_USER_SET_FEATURES, vhost_user_set_features, false, t
VHOST_MESSAGE_HANDLER(VHOST_USER_SET_OWNER, vhost_user_set_owner, false, true) \
VHOST_MESSAGE_HANDLER(VHOST_USER_RESET_OWNER, vhost_user_reset_owner, false, false) \
VHOST_MESSAGE_HANDLER(VHOST_USER_SET_MEM_TABLE, vhost_user_set_mem_table, true, true) \
+VHOST_MESSAGE_HANDLER(VHOST_USER_GET_MAX_MEM_SLOTS, vhost_user_get_max_mem_slots, false, false) \
+VHOST_MESSAGE_HANDLER(VHOST_USER_ADD_MEM_REG, vhost_user_add_mem_reg, true, true) \
+VHOST_MESSAGE_HANDLER(VHOST_USER_REM_MEM_REG, vhost_user_rem_mem_reg, true, true) \
VHOST_MESSAGE_HANDLER(VHOST_USER_SET_LOG_BASE, vhost_user_set_log_base, true, true) \
VHOST_MESSAGE_HANDLER(VHOST_USER_SET_LOG_FD, vhost_user_set_log_fd, true, true) \
VHOST_MESSAGE_HANDLER(VHOST_USER_SET_VRING_NUM, vhost_user_set_vring_num, false, true) \
@@ -1167,6 +1170,24 @@ add_guest_pages(struct virtio_net *dev, struct rte_vhost_mem_region *reg,
return 0;
}
+static void
+remove_guest_pages(struct virtio_net *dev, struct rte_vhost_mem_region *reg)
+{
+ uint64_t reg_start = reg->host_user_addr;
+ uint64_t reg_end = reg_start + reg->size;
+ uint32_t i, j = 0;
+
+ for (i = 0; i < dev->nr_guest_pages; i++) {
+ if (dev->guest_pages[i].host_user_addr >= reg_start &&
+ dev->guest_pages[i].host_user_addr < reg_end)
+ continue;
+ if (j != i)
+ dev->guest_pages[j] = dev->guest_pages[i];
+ j++;
+ }
+ dev->nr_guest_pages = j;
+}
+
#ifdef RTE_LIBRTE_VHOST_DEBUG
/* TODO: enable it only in debug mode? */
static void
@@ -1413,6 +1434,52 @@ vhost_user_mmap_region(struct virtio_net *dev,
return 0;
}
+static int
+vhost_user_initialize_memory(struct virtio_net **pdev)
+{
+ struct virtio_net *dev = *pdev;
+ int numa_node = SOCKET_ID_ANY;
+
+ if (dev->mem != NULL) {
+ VHOST_CONFIG_LOG(dev->ifname, ERR,
+ "memory already initialized, free it first");
+ return -1;
+ }
+
+ /*
+ * If VQ 0 has already been allocated, try to allocate on the same
+ * NUMA node. It can be reallocated later in numa_realloc().
+ */
+ if (dev->nr_vring > 0)
+ numa_node = dev->virtqueue[0]->numa_node;
+
+ dev->nr_guest_pages = 0;
+ if (dev->guest_pages == NULL) {
+ dev->max_guest_pages = 8;
+ dev->guest_pages = rte_zmalloc_socket(NULL,
+ dev->max_guest_pages *
+ sizeof(struct guest_page),
+ RTE_CACHE_LINE_SIZE,
+ numa_node);
+ if (dev->guest_pages == NULL) {
+ VHOST_CONFIG_LOG(dev->ifname, ERR,
+ "failed to allocate memory for dev->guest_pages");
+ return -1;
+ }
+ }
+
+ dev->mem = rte_zmalloc_socket("vhost-mem-table", sizeof(struct rte_vhost_memory) +
+ sizeof(struct rte_vhost_mem_region) * VHOST_MEMORY_MAX_NREGIONS, 0, numa_node);
+ if (dev->mem == NULL) {
+ VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to allocate memory for dev->mem");
+ rte_free(dev->guest_pages);
+ dev->guest_pages = NULL;
+ return -1;
+ }
+
+ return 0;
+}
+
static int
vhost_user_set_mem_table(struct virtio_net **pdev,
struct vhu_msg_context *ctx,
@@ -1421,7 +1488,6 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
struct virtio_net *dev = *pdev;
struct VhostUserMemory *memory = &ctx->msg.payload.memory;
struct rte_vhost_mem_region *reg;
- int numa_node = SOCKET_ID_ANY;
uint64_t mmap_offset;
uint32_t i;
bool async_notify = false;
@@ -1466,39 +1532,13 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_flush_all(dev);
- free_mem_region(dev);
+ free_all_mem_regions(dev);
rte_free(dev->mem);
dev->mem = NULL;
}
- /*
- * If VQ 0 has already been allocated, try to allocate on the same
- * NUMA node. It can be reallocated later in numa_realloc().
- */
- if (dev->nr_vring > 0)
- numa_node = dev->virtqueue[0]->numa_node;
-
- dev->nr_guest_pages = 0;
- if (dev->guest_pages == NULL) {
- dev->max_guest_pages = 8;
- dev->guest_pages = rte_zmalloc_socket(NULL,
- dev->max_guest_pages *
- sizeof(struct guest_page),
- RTE_CACHE_LINE_SIZE,
- numa_node);
- if (dev->guest_pages == NULL) {
- VHOST_CONFIG_LOG(dev->ifname, ERR,
- "failed to allocate memory for dev->guest_pages");
- goto close_msg_fds;
- }
- }
-
- dev->mem = rte_zmalloc_socket("vhost-mem-table", sizeof(struct rte_vhost_memory) +
- sizeof(struct rte_vhost_mem_region) * memory->nregions, 0, numa_node);
- if (dev->mem == NULL) {
- VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to allocate memory for dev->mem");
- goto free_guest_pages;
- }
+ if (vhost_user_initialize_memory(pdev) < 0)
+ goto close_msg_fds;
for (i = 0; i < memory->nregions; i++) {
reg = &dev->mem->regions[i];
@@ -1562,11 +1602,9 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
return RTE_VHOST_MSG_RESULT_OK;
free_mem_table:
- free_mem_region(dev);
+ free_all_mem_regions(dev);
rte_free(dev->mem);
dev->mem = NULL;
-
-free_guest_pages:
rte_free(dev->guest_pages);
dev->guest_pages = NULL;
close_msg_fds:
@@ -1574,6 +1612,231 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
return RTE_VHOST_MSG_RESULT_ERR;
}
+
+static int
+vhost_user_get_max_mem_slots(struct virtio_net **pdev __rte_unused,
+ struct vhu_msg_context *ctx,
+ int main_fd __rte_unused)
+{
+ uint32_t max_mem_slots = VHOST_MEMORY_MAX_NREGIONS;
+
+ ctx->msg.payload.u64 = (uint64_t)max_mem_slots;
+ ctx->msg.size = sizeof(ctx->msg.payload.u64);
+ ctx->fd_num = 0;
+
+ return RTE_VHOST_MSG_RESULT_REPLY;
+}
+
+static void
+_dev_invalidate_vrings(struct virtio_net **pdev)
+{
+ struct virtio_net *dev = *pdev;
+ uint32_t i;
+
+ for (i = 0; i < dev->nr_vring; i++) {
+ struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+ if (!vq)
+ continue;
+
+ if (vq->desc || vq->avail || vq->used) {
+ vq_assert_lock(dev, vq);
+
+ /*
+ * If the memory table got updated, the ring addresses
+ * need to be translated again as virtual addresses have
+ * changed.
+ */
+ vring_invalidate(dev, vq);
+
+ translate_ring_addresses(&dev, &vq);
+ }
+ }
+
+ *pdev = dev;
+}
+
+/*
+ * Macro wrapper that performs the compile-time lock assertion with the
+ * correct message ID at the call site, then calls the implementation.
+ */
+#define dev_invalidate_vrings(pdev, id) do { \
+ static_assert(id ## _LOCK_ALL_QPS, \
+ #id " handler is not declared as locking all queue pairs"); \
+ _dev_invalidate_vrings(pdev); \
+} while (0)
+
+static int
+vhost_user_add_mem_reg(struct virtio_net **pdev,
+ struct vhu_msg_context *ctx,
+ int main_fd __rte_unused)
+{
+ uint32_t i;
+ struct virtio_net *dev = *pdev;
+ struct VhostUserMemoryRegion *region = &ctx->msg.payload.memreg.region;
+
+ /* convert first region add to normal memory table set */
+ if (dev->mem == NULL) {
+ if (vhost_user_initialize_memory(pdev) < 0)
+ goto close_msg_fds;
+ }
+
+ /* make sure new region will fit */
+ if (dev->mem->nregions >= VHOST_MEMORY_MAX_NREGIONS) {
+ VHOST_CONFIG_LOG(dev->ifname, ERR, "too many memory regions already (%u)",
+ dev->mem->nregions);
+ goto close_msg_fds;
+ }
+
+ /* make sure supplied memory fd present */
+ if (ctx->fd_num != 1) {
+ VHOST_CONFIG_LOG(dev->ifname, ERR, "fd count makes no sense (%u)", ctx->fd_num);
+ goto close_msg_fds;
+ }
+
+ /* Make sure no overlap in guest virtual address space */
+ for (i = 0; i < dev->mem->nregions; i++) {
+ struct rte_vhost_mem_region *current_region = &dev->mem->regions[i];
+ uint64_t current_region_guest_start = current_region->guest_user_addr;
+ uint64_t current_region_guest_end = current_region_guest_start
+ + current_region->size - 1;
+ uint64_t proposed_region_guest_start = region->userspace_addr;
+ uint64_t proposed_region_guest_end = proposed_region_guest_start
+ + region->memory_size - 1;
+
+ if (!((proposed_region_guest_end < current_region_guest_start) ||
+ (proposed_region_guest_start > current_region_guest_end))) {
+ VHOST_CONFIG_LOG(dev->ifname, ERR,
+ "requested memory region overlaps with another region");
+ VHOST_CONFIG_LOG(dev->ifname, ERR,
+ "\tRequested region address:0x%" PRIx64,
+ region->userspace_addr);
+ VHOST_CONFIG_LOG(dev->ifname, ERR,
+ "\tRequested region size:0x%" PRIx64,
+ region->memory_size);
+ VHOST_CONFIG_LOG(dev->ifname, ERR,
+ "\tOverlapping region address:0x%" PRIx64,
+ current_region->guest_user_addr);
+ VHOST_CONFIG_LOG(dev->ifname, ERR,
+ "\tOverlapping region size:0x%" PRIx64,
+ current_region->size);
+ goto close_msg_fds;
+ }
+ }
+
+ /* New region goes at the end of the contiguous array */
+ struct rte_vhost_mem_region *reg = &dev->mem->regions[dev->mem->nregions];
+
+ reg->guest_phys_addr = region->guest_phys_addr;
+ reg->guest_user_addr = region->userspace_addr;
+ reg->size = region->memory_size;
+ reg->fd = ctx->fds[0];
+ ctx->fds[0] = -1;
+
+ if (vhost_user_mmap_region(dev, reg, region->mmap_offset) < 0) {
+ VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to mmap region");
+ if (reg->mmap_addr) {
+ /* mmap succeeded but a later step (e.g. add_guest_pages)
+ * failed; undo the mapping and any guest-page entries.
+ */
+ remove_guest_pages(dev, reg);
+ free_mem_region(reg);
+ } else {
+ close(reg->fd);
+ reg->fd = -1;
+ }
+ goto close_msg_fds;
+ }
+
+ dev->mem->nregions++;
+
+ if (dev->async_copy && rte_vfio_is_enabled("vfio")) {
+ if (async_dma_map_region(dev, reg, true) < 0)
+ goto free_new_region;
+ }
+
+ if (dev->postcopy_listening) {
+ /*
+ * Cannot use vhost_user_postcopy_register() here because it
+ * reads ctx->msg.payload.memory (SET_MEM_TABLE layout), but
+ * ADD_MEM_REG uses the memreg payload. Register the
+ * single new region directly instead.
+ */
+ if (vhost_user_postcopy_region_register(dev, reg) < 0)
+ goto free_new_region;
+ }
+
+ dev_invalidate_vrings(pdev, VHOST_USER_ADD_MEM_REG);
+ dev = *pdev;
+ dump_guest_pages(dev);
+
+ return RTE_VHOST_MSG_RESULT_OK;
+
+free_new_region:
+ if (dev->async_copy && rte_vfio_is_enabled("vfio"))
+ async_dma_map_region(dev, reg, false);
+ remove_guest_pages(dev, reg);
+ free_mem_region(reg);
+ dev->mem->nregions--;
+close_msg_fds:
+ close_msg_fds(ctx);
+ return RTE_VHOST_MSG_RESULT_ERR;
+}
+
+static int
+vhost_user_rem_mem_reg(struct virtio_net **pdev,
+ struct vhu_msg_context *ctx,
+ int main_fd __rte_unused)
+{
+ uint32_t i;
+ struct virtio_net *dev = *pdev;
+ struct VhostUserMemoryRegion *region = &ctx->msg.payload.memreg.region;
+
+ if (dev->mem == NULL || dev->mem->nregions == 0) {
+ VHOST_CONFIG_LOG(dev->ifname, ERR, "no memory regions to remove");
+ close_msg_fds(ctx);
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ for (i = 0; i < dev->mem->nregions; i++) {
+ struct rte_vhost_mem_region *current_region = &dev->mem->regions[i];
+
+ /*
+ * According to the vhost-user specification:
+ * The memory region to be removed is identified by its GPA,
+ * user address and size. The mmap offset is ignored.
+ */
+ if (region->userspace_addr == current_region->guest_user_addr
+ && region->guest_phys_addr == current_region->guest_phys_addr
+ && region->memory_size == current_region->size) {
+ if (dev->async_copy && rte_vfio_is_enabled("vfio"))
+ async_dma_map_region(dev, current_region, false);
+ remove_guest_pages(dev, current_region);
+ free_mem_region(current_region);
+
+ /* Compact the regions array to keep it contiguous */
+ if (i < dev->mem->nregions - 1) {
+ memmove(&dev->mem->regions[i],
+ &dev->mem->regions[i + 1],
+ (dev->mem->nregions - 1 - i) *
+ sizeof(struct rte_vhost_mem_region));
+ memset(&dev->mem->regions[dev->mem->nregions - 1],
+ 0, sizeof(struct rte_vhost_mem_region));
+ }
+
+ dev->mem->nregions--;
+ dev_invalidate_vrings(pdev, VHOST_USER_REM_MEM_REG);
+ dev = *pdev;
+ close_msg_fds(ctx);
+ return RTE_VHOST_MSG_RESULT_OK;
+ }
+ }
+
+ VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to find region");
+ close_msg_fds(ctx);
+ return RTE_VHOST_MSG_RESULT_ERR;
+}
+
static bool
vq_is_ready(struct virtio_net *dev, struct vhost_virtqueue *vq)
{
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v13 5/5] vhost_user: enable configure memory slots
2026-05-14 22:46 [PATCH v13 0/5] Support add/remove memory region and get-max-slots pravin.bathija
` (3 preceding siblings ...)
2026-05-14 22:46 ` [PATCH v13 4/5] vhost_user: Function defs for add/rem mem regions pravin.bathija
@ 2026-05-14 22:46 ` pravin.bathija
4 siblings, 0 replies; 10+ messages in thread
From: pravin.bathija @ 2026-05-14 22:46 UTC (permalink / raw)
To: dev, fengchengwen, stephen, maxime.coquelin; +Cc: pravin.bathija, thomas
From: Pravin M Bathija <pravin.bathija@dell.com>
This patch enables the configure memory slots feature by adding
VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS to the
VHOST_USER_PROTOCOL_FEATURES header define.
Signed-off-by: Pravin M Bathija <pravin.bathija@dell.com>
---
lib/vhost/vhost_user.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/vhost/vhost_user.h b/lib/vhost/vhost_user.h
index 6435816534..732aa4dc02 100644
--- a/lib/vhost/vhost_user.h
+++ b/lib/vhost/vhost_user.h
@@ -32,6 +32,7 @@
(1ULL << VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD) | \
(1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \
(1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT) | \
+ (1ULL << VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS) | \
(1ULL << VHOST_USER_PROTOCOL_F_STATUS))
typedef enum VhostUserRequest {
--
2.43.0
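The effect of adding one bit to the VHOST_USER_PROTOCOL_FEATURES mask, as the hunk above does, can be illustrated with a minimal sketch. DEMO_PROTOCOL_FEATURES and supports_configure_mem_slots() are hypothetical names; only the bit positions come from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as defined in lib/vhost/rte_vhost.h by this series. */
#define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15
#define VHOST_USER_PROTOCOL_F_STATUS 16

/* Illustrative mask mirroring the VHOST_USER_PROTOCOL_FEATURES change:
 * each supported capability contributes one bit to the advertised mask. */
#define DEMO_PROTOCOL_FEATURES \
	((1ULL << VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS) | \
	 (1ULL << VHOST_USER_PROTOCOL_F_STATUS))

/* Test whether the negotiated feature mask carries the capability. */
static int supports_configure_mem_slots(uint64_t negotiated)
{
	return (negotiated >> VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS) & 1;
}
```

Until this bit is in the mask, a front-end never sees the capability during GET_PROTOCOL_FEATURES negotiation, which is why the earlier patches are inert without this one.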
* Re: [PATCH v13 1/5] vhost: add user to mailmap and define to vhost hdr
2026-05-14 22:46 ` [PATCH v13 1/5] vhost: add user to mailmap and define to vhost hdr pravin.bathija
@ 2026-05-15 0:20 ` fengchengwen
0 siblings, 0 replies; 10+ messages in thread
From: fengchengwen @ 2026-05-15 0:20 UTC (permalink / raw)
To: pravin.bathija, dev, stephen, maxime.coquelin; +Cc: thomas
On 5/15/2026 6:46 AM, pravin.bathija@dell.com wrote:
> From: Pravin M Bathija <pravin.bathija@dell.com>
>
> - add user to mailmap file.
> - define a bit-field called VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
> that depicts if the feature/capability to add/remove memory regions
> is supported. This is a part of the overall support for add/remove
> memory region feature in this patchset.
>
> Signed-off-by: Pravin M Bathija <pravin.bathija@dell.com>
Please carry the Reviewed-by/Acked-by tags for this commit in the new version.
* Re: [PATCH v13 3/5] vhost_user: support function defines for back-end
2026-05-14 22:46 ` [PATCH v13 3/5] vhost_user: support function defines for back-end pravin.bathija
@ 2026-05-15 0:33 ` fengchengwen
0 siblings, 0 replies; 10+ messages in thread
From: fengchengwen @ 2026-05-15 0:33 UTC (permalink / raw)
To: pravin.bathija, dev, stephen, maxime.coquelin; +Cc: thomas
Please use "vhost" as the commit title prefix (use git log/blame for reference),
the same as the other commits in this patchset.
How about: vhost: refactor memory helper functions
On 5/15/2026 6:46 AM, pravin.bathija@dell.com wrote:
> From: Pravin M Bathija <pravin.bathija@dell.com>
>
> Here we define support functions which are called from the various
> vhost-user back-end message functions like set memory table, get
> memory slots, add memory region, remove memory region. These are
> essentially common functions to unmap a set of memory regions,
> perform register copy, align memory addresses and dma map/unmap a
> single memory region.
Too much detail; how about:
Extract reusable helper routines for vhost-user backend memory operations:
split DMA map/unmap into per-region logic, decouple and rework memory
region free routines, and iterate over VHOST_MEMORY_MAX_NREGIONS
uniformly across related functions to simplify code reuse.
As above fixed:
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
>
> Signed-off-by: Pravin M Bathija <pravin.bathija@dell.com>
> ---
> lib/vhost/vhost_user.c | 89 ++++++++++++++++++++++++++++--------------
> 1 file changed, 60 insertions(+), 29 deletions(-)
>
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 4bfb13fb98..0ee3fe7a5e 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -171,20 +171,27 @@ get_blk_size(int fd)
> return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize;
> }
>
> -static void
> -async_dma_map(struct virtio_net *dev, bool do_map)
> +static int
> +async_dma_map_region(struct virtio_net *dev, struct rte_vhost_mem_region *reg, bool do_map)
> {
> - int ret = 0;
> uint32_t i;
> - struct guest_page *page;
> + int ret;
> + uint64_t reg_start = reg->host_user_addr;
> + uint64_t reg_end = reg_start + reg->size;
> +
> + for (i = 0; i < dev->nr_guest_pages; i++) {
> + struct guest_page *page = &dev->guest_pages[i];
> +
> + /* Only process pages belonging to this region */
> + if (page->host_user_addr < reg_start ||
> + page->host_user_addr >= reg_end)
> + continue;
>
> - if (do_map) {
> - for (i = 0; i < dev->nr_guest_pages; i++) {
> - page = &dev->guest_pages[i];
> + if (do_map) {
> ret = rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
> - page->host_user_addr,
> - page->host_iova,
> - page->size);
> + page->host_user_addr,
> + page->host_iova,
> + page->size);
> if (ret) {
> /*
> * DMA device may bind with kernel driver, in this case,
> @@ -199,33 +206,57 @@ async_dma_map(struct virtio_net *dev, bool do_map)
> * normal case in async path. This is a workaround.
> */
> if (rte_errno == ENODEV)
> - return;
> + return 0;
>
> /* DMA mapping errors won't stop VHOST_USER_SET_MEM_TABLE. */
> VHOST_CONFIG_LOG(dev->ifname, ERR, "DMA engine map failed");
> + return -1;
> }
> - }
> -
> - } else {
> - for (i = 0; i < dev->nr_guest_pages; i++) {
> - page = &dev->guest_pages[i];
> + } else {
> ret = rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD,
> - page->host_user_addr,
> - page->host_iova,
> - page->size);
> + page->host_user_addr,
> + page->host_iova,
> + page->size);
> if (ret) {
> /* like DMA map, ignore the kernel driver case when unmap. */
> if (rte_errno == EINVAL)
> - return;
> + return 0;
>
> VHOST_CONFIG_LOG(dev->ifname, ERR, "DMA engine unmap failed");
> + return -1;
> }
> }
> }
> +
> + return 0;
> +}
> +
> +static void
> +async_dma_map(struct virtio_net *dev, bool do_map)
> +{
> + uint32_t i;
> + struct rte_vhost_mem_region *reg;
> +
> + for (i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
> + reg = &dev->mem->regions[i];
> + if (reg->host_user_addr == 0)
> + continue;
> + async_dma_map_region(dev, reg, do_map);
> + }
> }
>
> static void
> -free_mem_region(struct virtio_net *dev)
> +free_mem_region(struct rte_vhost_mem_region *reg)
> +{
> + if (reg != NULL && reg->mmap_addr) {
> + munmap(reg->mmap_addr, reg->mmap_size);
> + close(reg->fd);
> + memset(reg, 0, sizeof(struct rte_vhost_mem_region));
> + }
> +}
> +
> +static void
> +free_all_mem_regions(struct virtio_net *dev)
> {
> uint32_t i;
> struct rte_vhost_mem_region *reg;
> @@ -236,12 +267,10 @@ free_mem_region(struct virtio_net *dev)
> if (dev->async_copy && rte_vfio_is_enabled("vfio"))
> async_dma_map(dev, false);
>
> - for (i = 0; i < dev->mem->nregions; i++) {
> + for (i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
> reg = &dev->mem->regions[i];
> - if (reg->host_user_addr) {
> - munmap(reg->mmap_addr, reg->mmap_size);
> - close(reg->fd);
> - }
> + if (reg->mmap_addr)
> + free_mem_region(reg);
> }
> }
>
> @@ -255,7 +284,7 @@ vhost_backend_cleanup(struct virtio_net *dev)
> vdpa_dev->ops->dev_cleanup(dev->vid);
>
> if (dev->mem) {
> - free_mem_region(dev);
> + free_all_mem_regions(dev);
> rte_free(dev->mem);
> dev->mem = NULL;
> }
> @@ -704,7 +733,7 @@ numa_realloc(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> vhost_devices[dev->vid] = dev;
>
> mem_size = sizeof(struct rte_vhost_memory) +
> - sizeof(struct rte_vhost_mem_region) * dev->mem->nregions;
> + sizeof(struct rte_vhost_mem_region) * VHOST_MEMORY_MAX_NREGIONS;
> mem = rte_realloc_socket(dev->mem, mem_size, 0, node);
> if (!mem) {
> VHOST_CONFIG_LOG(dev->ifname, ERR,
> @@ -808,8 +837,10 @@ hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
> uint32_t i;
> uintptr_t hua = (uintptr_t)ptr;
>
> - for (i = 0; i < mem->nregions; i++) {
> + for (i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
> r = &mem->regions[i];
> + if (r->host_user_addr == 0)
> + continue;
> if (hua >= r->host_user_addr &&
> hua < r->host_user_addr + r->size) {
> return get_blk_size(r->fd);
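The per-region page filter introduced by async_dma_map_region() in the hunk above can be sketched in isolation. pages_in_region() below is an illustrative stand-in that only counts matching pages instead of issuing the VFIO DMA map/unmap calls.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the guest_page bookkeeping entry. */
struct guest_page { uint64_t host_user_addr; uint64_t size; };

/* Mirror of the filter in async_dma_map_region(): a page belongs to the
 * region when its host user address falls in [reg_start, reg_end). */
static unsigned int
pages_in_region(const struct guest_page *pages, unsigned int n,
		uint64_t reg_start, uint64_t reg_size)
{
	uint64_t reg_end = reg_start + reg_size;
	unsigned int i, count = 0;

	for (i = 0; i < n; i++) {
		if (pages[i].host_user_addr < reg_start ||
		    pages[i].host_user_addr >= reg_end)
			continue;
		/* here the real code would DMA map or unmap the page */
		count++;
	}
	return count;
}
```

Restricting the walk to one region is what lets ADD_MEM_REG/REM_MEM_REG map and unmap incrementally instead of redoing the whole table.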
* Re: [PATCH v13 4/5] vhost_user: Function defs for add/rem mem regions
2026-05-14 22:46 ` [PATCH v13 4/5] vhost_user: Function defs for add/rem mem regions pravin.bathija
@ 2026-05-15 1:04 ` fengchengwen
0 siblings, 0 replies; 10+ messages in thread
From: fengchengwen @ 2026-05-15 1:04 UTC (permalink / raw)
To: pravin.bathija, dev, stephen, maxime.coquelin; +Cc: thomas
On 5/15/2026 6:46 AM, pravin.bathija@dell.com wrote:
> From: Pravin M Bathija <pravin.bathija@dell.com>
>
> These changes cover the function definition for add/remove memory
> region calls which are invoked on receiving vhost user message from
> vhost user front-end (e.g. Qemu). In our case, in addition to testing
> with qemu front-end, the testing has also been performed with libblkio
> front-end and spdk/dpdk back-end. We did I/O using libblkio based device
> driver, to spdk based drives.
> There are also changes for set_mem_table and new definition for get memory
> slots. Our changes optimize the set memory table call to use common support
> functions. A new vhost_user_initialize_memory() function is introduced to
> factor out the common memory initialization logic from the function
> vhost_user_set_mem_table(), which is now called from both the SET_MEM_TABLE
> message handler and the ADD_MEM_REG handler (for the first region).
> Message get memory slots is how the vhost-user front-end queries the
> vhost-user back-end about the number of memory slots available to be
> registered by the back-end. In addition support function to invalidate
> vring is also defined which is used in add/remove memory region functions.
> The helper function remove_guest_pages is also defined here which is called
> from vhost_user_add_mem_reg.
Too much detail, which I think adds noise; how about:
vhost: add mem region add/remove handlers
Add support for VHOST_USER_ADD_MEM_REG, VHOST_USER_REM_MEM_REG and
VHOST_USER_GET_MAX_MEM_SLOTS. Refactor memory initialization into
common helper and add supporting functions for dynamic memory management.
Signed-off-by: Pravin M Bathija <pravin.bathija@dell.com>
>
> Signed-off-by: Pravin M Bathija <pravin.bathija@dell.com>
> ---
> lib/vhost/vhost_user.c | 329 ++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 296 insertions(+), 33 deletions(-)
>
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 0ee3fe7a5e..fdcb7e0158 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -71,6 +71,9 @@ VHOST_MESSAGE_HANDLER(VHOST_USER_SET_FEATURES, vhost_user_set_features, false, t
> VHOST_MESSAGE_HANDLER(VHOST_USER_SET_OWNER, vhost_user_set_owner, false, true) \
> VHOST_MESSAGE_HANDLER(VHOST_USER_RESET_OWNER, vhost_user_reset_owner, false, false) \
> VHOST_MESSAGE_HANDLER(VHOST_USER_SET_MEM_TABLE, vhost_user_set_mem_table, true, true) \
> +VHOST_MESSAGE_HANDLER(VHOST_USER_GET_MAX_MEM_SLOTS, vhost_user_get_max_mem_slots, false, false) \
> +VHOST_MESSAGE_HANDLER(VHOST_USER_ADD_MEM_REG, vhost_user_add_mem_reg, true, true) \
> +VHOST_MESSAGE_HANDLER(VHOST_USER_REM_MEM_REG, vhost_user_rem_mem_reg, true, true) \
> VHOST_MESSAGE_HANDLER(VHOST_USER_SET_LOG_BASE, vhost_user_set_log_base, true, true) \
> VHOST_MESSAGE_HANDLER(VHOST_USER_SET_LOG_FD, vhost_user_set_log_fd, true, true) \
> VHOST_MESSAGE_HANDLER(VHOST_USER_SET_VRING_NUM, vhost_user_set_vring_num, false, true) \
> @@ -1167,6 +1170,24 @@ add_guest_pages(struct virtio_net *dev, struct rte_vhost_mem_region *reg,
> return 0;
> }
>
> +static void
> +remove_guest_pages(struct virtio_net *dev, struct rte_vhost_mem_region *reg)
> +{
> + uint64_t reg_start = reg->host_user_addr;
> + uint64_t reg_end = reg_start + reg->size;
> + uint32_t i, j = 0;
> +
> + for (i = 0; i < dev->nr_guest_pages; i++) {
> + if (dev->guest_pages[i].host_user_addr >= reg_start &&
> + dev->guest_pages[i].host_user_addr < reg_end)
> + continue;
> + if (j != i)
> + dev->guest_pages[j] = dev->guest_pages[i];
> + j++;
> + }
> + dev->nr_guest_pages = j;
> +}
> +
> #ifdef RTE_LIBRTE_VHOST_DEBUG
> /* TODO: enable it only in debug mode? */
> static void
> @@ -1413,6 +1434,52 @@ vhost_user_mmap_region(struct virtio_net *dev,
> return 0;
> }
>
> +static int
> +vhost_user_initialize_memory(struct virtio_net **pdev)
> +{
> + struct virtio_net *dev = *pdev;
> + int numa_node = SOCKET_ID_ANY;
> +
> + if (dev->mem != NULL) {
> + VHOST_CONFIG_LOG(dev->ifname, ERR,
> + "memory already initialized, free it first");
> + return -1;
> + }
> +
> + /*
> + * If VQ 0 has already been allocated, try to allocate on the same
> + * NUMA node. It can be reallocated later in numa_realloc().
> + */
> + if (dev->nr_vring > 0)
> + numa_node = dev->virtqueue[0]->numa_node;
> +
> + dev->nr_guest_pages = 0;
> + if (dev->guest_pages == NULL) {
> + dev->max_guest_pages = 8;
> + dev->guest_pages = rte_zmalloc_socket(NULL,
> + dev->max_guest_pages *
> + sizeof(struct guest_page),
> + RTE_CACHE_LINE_SIZE,
> + numa_node);
> + if (dev->guest_pages == NULL) {
> + VHOST_CONFIG_LOG(dev->ifname, ERR,
> + "failed to allocate memory for dev->guest_pages");
> + return -1;
> + }
> + }
> +
> + dev->mem = rte_zmalloc_socket("vhost-mem-table", sizeof(struct rte_vhost_memory) +
> + sizeof(struct rte_vhost_mem_region) * VHOST_MEMORY_MAX_NREGIONS, 0, numa_node);
> + if (dev->mem == NULL) {
> + VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to allocate memory for dev->mem");
> + rte_free(dev->guest_pages);
> + dev->guest_pages = NULL;
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> static int
> vhost_user_set_mem_table(struct virtio_net **pdev,
> struct vhu_msg_context *ctx,
> @@ -1421,7 +1488,6 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
> struct virtio_net *dev = *pdev;
> struct VhostUserMemory *memory = &ctx->msg.payload.memory;
> struct rte_vhost_mem_region *reg;
> - int numa_node = SOCKET_ID_ANY;
> uint64_t mmap_offset;
> uint32_t i;
> bool async_notify = false;
> @@ -1466,39 +1532,13 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
> if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> vhost_user_iotlb_flush_all(dev);
>
> - free_mem_region(dev);
> + free_all_mem_regions(dev);
This change should be done in commit 3/5; I suspect commit 3/5 may fail to compile without it.
Please make sure each commit compiles on its own, so that git bisect and
troubleshooting keep working.
> rte_free(dev->mem);
> dev->mem = NULL;
> }
>
> - /*
> - * If VQ 0 has already been allocated, try to allocate on the same
> - * NUMA node. It can be reallocated later in numa_realloc().
> - */
> - if (dev->nr_vring > 0)
> - numa_node = dev->virtqueue[0]->numa_node;
> -
> - dev->nr_guest_pages = 0;
> - if (dev->guest_pages == NULL) {
> - dev->max_guest_pages = 8;
> - dev->guest_pages = rte_zmalloc_socket(NULL,
> - dev->max_guest_pages *
> - sizeof(struct guest_page),
> - RTE_CACHE_LINE_SIZE,
> - numa_node);
> - if (dev->guest_pages == NULL) {
> - VHOST_CONFIG_LOG(dev->ifname, ERR,
> - "failed to allocate memory for dev->guest_pages");
> - goto close_msg_fds;
> - }
> - }
> -
> - dev->mem = rte_zmalloc_socket("vhost-mem-table", sizeof(struct rte_vhost_memory) +
> - sizeof(struct rte_vhost_mem_region) * memory->nregions, 0, numa_node);
> - if (dev->mem == NULL) {
> - VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to allocate memory for dev->mem");
> - goto free_guest_pages;
> - }
> + if (vhost_user_initialize_memory(pdev) < 0)
> + goto close_msg_fds;
>
> for (i = 0; i < memory->nregions; i++) {
> reg = &dev->mem->regions[i];
> @@ -1562,11 +1602,9 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
> return RTE_VHOST_MSG_RESULT_OK;
>
> free_mem_table:
> - free_mem_region(dev);
> + free_all_mem_regions(dev);
Same, it should be done in commit 3/5
> rte_free(dev->mem);
> dev->mem = NULL;
> -
> -free_guest_pages:
> rte_free(dev->guest_pages);
> dev->guest_pages = NULL;
> close_msg_fds:
> @@ -1574,6 +1612,231 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
> return RTE_VHOST_MSG_RESULT_ERR;
> }
>
> +
> +static int
> +vhost_user_get_max_mem_slots(struct virtio_net **pdev __rte_unused,
> + struct vhu_msg_context *ctx,
> + int main_fd __rte_unused)
> +{
> + uint32_t max_mem_slots = VHOST_MEMORY_MAX_NREGIONS;
> +
> + ctx->msg.payload.u64 = (uint64_t)max_mem_slots;
> + ctx->msg.size = sizeof(ctx->msg.payload.u64);
> + ctx->fd_num = 0;
> +
> + return RTE_VHOST_MSG_RESULT_REPLY;
> +}
> +
> +static void
> +_dev_invalidate_vrings(struct virtio_net **pdev)
It seems that there is no such naming convention in vhost.
> +{
> + struct virtio_net *dev = *pdev;
> + uint32_t i;
> +
> + for (i = 0; i < dev->nr_vring; i++) {
> + struct vhost_virtqueue *vq = dev->virtqueue[i];
> +
> + if (!vq)
> + continue;
> +
> + if (vq->desc || vq->avail || vq->used) {
> + vq_assert_lock(dev, vq);
> +
> + /*
> + * If the memory table got updated, the ring addresses
> + * need to be translated again as virtual addresses have
> + * changed.
> + */
> + vring_invalidate(dev, vq);
> +
> + translate_ring_addresses(&dev, &vq);
> + }
> + }
> +
> + *pdev = dev;
why do this?
> +}
> +
> +/*
> + * Macro wrapper that performs the compile-time lock assertion with the
> + * correct message ID at the call site, then calls the implementation.
> + */
> +#define dev_invalidate_vrings(pdev, id) do { \
> + static_assert(id ## _LOCK_ALL_QPS, \
> + #id " handler is not declared as locking all queue pairs"); \
> + _dev_invalidate_vrings(pdev); \
> +} while (0)
> +
> +static int
> +vhost_user_add_mem_reg(struct virtio_net **pdev,
> + struct vhu_msg_context *ctx,
> + int main_fd __rte_unused)
> +{
> + uint32_t i;
> + struct virtio_net *dev = *pdev;
> + struct VhostUserMemoryRegion *region = &ctx->msg.payload.memreg.region;
Local variables should be arranged in descending order of length.
struct VhostUserMemoryRegion *region = &ctx->msg.payload.memreg.region;
struct virtio_net *dev = *pdev;
uint32_t i;
> +
> + /* convert first region add to normal memory table set */
> + if (dev->mem == NULL) {
> + if (vhost_user_initialize_memory(pdev) < 0)
> + goto close_msg_fds;
> + }
> +
> + /* make sure new region will fit */
> + if (dev->mem->nregions >= VHOST_MEMORY_MAX_NREGIONS) {
> + VHOST_CONFIG_LOG(dev->ifname, ERR, "too many memory regions already (%u)",
> + dev->mem->nregions);
> + goto close_msg_fds;
> + }
> +
> + /* make sure supplied memory fd present */
> + if (ctx->fd_num != 1) {
> + VHOST_CONFIG_LOG(dev->ifname, ERR, "fd count makes no sense (%u)", ctx->fd_num);
> + goto close_msg_fds;
> + }
> +
> + /* Make sure no overlap in guest virtual address space */
> + for (i = 0; i < dev->mem->nregions; i++) {
> + struct rte_vhost_mem_region *current_region = &dev->mem->regions[i];
> + uint64_t current_region_guest_start = current_region->guest_user_addr;
> + uint64_t current_region_guest_end = current_region_guest_start
> + + current_region->size - 1;
> + uint64_t proposed_region_guest_start = region->userspace_addr;
> + uint64_t proposed_region_guest_end = proposed_region_guest_start
> + + region->memory_size - 1;
why not use shorter names?
> +
> + if (!((proposed_region_guest_end < current_region_guest_start) ||
> + (proposed_region_guest_start > current_region_guest_end))) {
> + VHOST_CONFIG_LOG(dev->ifname, ERR,
> + "requested memory region overlaps with another region");
> + VHOST_CONFIG_LOG(dev->ifname, ERR,
> + "\tRequested region address:0x%" PRIx64,
> + region->userspace_addr);
> + VHOST_CONFIG_LOG(dev->ifname, ERR,
> + "\tRequested region size:0x%" PRIx64,
> + region->memory_size);
> + VHOST_CONFIG_LOG(dev->ifname, ERR,
> + "\tOverlapping region address:0x%" PRIx64,
> + current_region->guest_user_addr);
> + VHOST_CONFIG_LOG(dev->ifname, ERR,
> + "\tOverlapping region size:0x%" PRIx64,
> + current_region->size);
> + goto close_msg_fds;
> + }
> + }
> +
> + /* New region goes at the end of the contiguous array */
> + struct rte_vhost_mem_region *reg = &dev->mem->regions[dev->mem->nregions];
> +
> + reg->guest_phys_addr = region->guest_phys_addr;
> + reg->guest_user_addr = region->userspace_addr;
> + reg->size = region->memory_size;
> + reg->fd = ctx->fds[0];
> + ctx->fds[0] = -1;
> +
> + if (vhost_user_mmap_region(dev, reg, region->mmap_offset) < 0) {
> + VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to mmap region");
> + if (reg->mmap_addr) {
> + /* mmap succeeded but a later step (e.g. add_guest_pages)
> + * failed; undo the mapping and any guest-page entries.
> + */
> + remove_guest_pages(dev, reg);
> + free_mem_region(reg);
> + } else {
> + close(reg->fd);
> + reg->fd = -1;
> + }
> + goto close_msg_fds;
> + }
> +
> + dev->mem->nregions++;
> +
> + if (dev->async_copy && rte_vfio_is_enabled("vfio")) {
> + if (async_dma_map_region(dev, reg, true) < 0)
> + goto free_new_region;
I pointed this out in v12, maybe not clearly enough, so again:
the goto will invoke async_dma_map_region(dev, reg, false), which should
not be invoked in this branch (the mapping never succeeded here).
> + }
> +
> + if (dev->postcopy_listening) {
> + /*
> + * Cannot use vhost_user_postcopy_register() here because it
> + * reads ctx->msg.payload.memory (SET_MEM_TABLE layout), but
> + * ADD_MEM_REG uses the memreg payload. Register the
> + * single new region directly instead.
> + */
> + if (vhost_user_postcopy_region_register(dev, reg) < 0)
> + goto free_new_region;
> + }
> +
> + dev_invalidate_vrings(pdev, VHOST_USER_ADD_MEM_REG);
> + dev = *pdev;
What is the meaning of this? dev was already set from *pdev at the beginning.
I also pointed this out in v12; I don't know what is happening here.
> + dump_guest_pages(dev);
> +
> + return RTE_VHOST_MSG_RESULT_OK;
> +
> +free_new_region:
> + if (dev->async_copy && rte_vfio_is_enabled("vfio"))
> + async_dma_map_region(dev, reg, false);
> + remove_guest_pages(dev, reg);
> + free_mem_region(reg);
> + dev->mem->nregions--;
> +close_msg_fds:
> + close_msg_fds(ctx);
> + return RTE_VHOST_MSG_RESULT_ERR;
> +}
> +
> +static int
> +vhost_user_rem_mem_reg(struct virtio_net **pdev,
> + struct vhu_msg_context *ctx,
> + int main_fd __rte_unused)
> +{
> + uint32_t i;
> + struct virtio_net *dev = *pdev;
> + struct VhostUserMemoryRegion *region = &ctx->msg.payload.memreg.region;
> +
> + if (dev->mem == NULL || dev->mem->nregions == 0) {
> + VHOST_CONFIG_LOG(dev->ifname, ERR, "no memory regions to remove");
> + close_msg_fds(ctx);
> + return RTE_VHOST_MSG_RESULT_ERR;
> + }
> +
> + for (i = 0; i < dev->mem->nregions; i++) {
> + struct rte_vhost_mem_region *current_region = &dev->mem->regions[i];
> +
> + /*
> + * According to the vhost-user specification:
> + * The memory region to be removed is identified by its GPA,
> + * user address and size. The mmap offset is ignored.
> + */
> + if (region->userspace_addr == current_region->guest_user_addr
> + && region->guest_phys_addr == current_region->guest_phys_addr
> + && region->memory_size == current_region->size) {
> + if (dev->async_copy && rte_vfio_is_enabled("vfio"))
> + async_dma_map_region(dev, current_region, false);
> + remove_guest_pages(dev, current_region);
> + free_mem_region(current_region);
> +
> + /* Compact the regions array to keep it contiguous */
> + if (i < dev->mem->nregions - 1) {
> + memmove(&dev->mem->regions[i],
> + &dev->mem->regions[i + 1],
> + (dev->mem->nregions - 1 - i) *
> + sizeof(struct rte_vhost_mem_region));
> + memset(&dev->mem->regions[dev->mem->nregions - 1],
> + 0, sizeof(struct rte_vhost_mem_region));
> + }
> +
> + dev->mem->nregions--;
> + dev_invalidate_vrings(pdev, VHOST_USER_REM_MEM_REG);
> + dev = *pdev;
I still don't understand what this assignment is for.
> + close_msg_fds(ctx);
> + return RTE_VHOST_MSG_RESULT_OK;
> + }
> + }
> +
> + VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to find region");
> + close_msg_fds(ctx);
> + return RTE_VHOST_MSG_RESULT_ERR;
> +}
> +
> static bool
> vq_is_ready(struct virtio_net *dev, struct vhost_virtqueue *vq)
> {
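The guest-virtual-address overlap test in vhost_user_add_mem_reg() quoted above reduces to an inclusive-interval check. regions_overlap() below is a hypothetical standalone version of that condition.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the ADD_MEM_REG overlap test: ends are inclusive
 * (start + size - 1), and two ranges overlap unless one ends strictly
 * before the other begins. Assumes size >= 1 and no address wraparound. */
static int
regions_overlap(uint64_t a_start, uint64_t a_size,
		uint64_t b_start, uint64_t b_size)
{
	uint64_t a_end = a_start + a_size - 1;
	uint64_t b_end = b_start + b_size - 1;

	return !((a_end < b_start) || (a_start > b_end));
}
```

With inclusive ends, two regions that merely touch back-to-back (one ends at 0x1FFF, the next starts at 0x2000) are correctly accepted as non-overlapping.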
Thread overview: 10+ messages
2026-05-14 22:46 [PATCH v13 0/5] Support add/remove memory region and get-max-slots pravin.bathija
2026-05-14 22:46 ` [PATCH v13 1/5] vhost: add user to mailmap and define to vhost hdr pravin.bathija
2026-05-15 0:20 ` fengchengwen
2026-05-14 22:46 ` [PATCH v13 2/5] vhost_user: header defines for add/rem mem region pravin.bathija
2026-05-14 22:46 ` [PATCH v13 3/5] vhost_user: support function defines for back-end pravin.bathija
2026-05-15 0:33 ` fengchengwen
2026-05-14 22:46 ` [PATCH v13 4/5] vhost_user: Function defs for add/rem mem regions pravin.bathija
2026-05-15 1:04 ` fengchengwen
2026-05-14 22:46 ` [PATCH v13 5/5] vhost_user: enable configure memory slots pravin.bathija