public inbox for dev@dpdk.org
 help / color / mirror / Atom feed
* [PATCH v7 0/5] Support add/remove memory region & get-max-slots
@ 2026-02-11 10:24 pravin.bathija
  2026-02-11 10:24 ` [PATCH v7 1/5] vhost: add user to mailmap and define to vhost hdr pravin.bathija
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: pravin.bathija @ 2026-02-11 10:24 UTC (permalink / raw)
  To: dev; +Cc: pravin.bathija, maxime.coquelin, fengchengwen

From: Pravin M Bathija <pravin.bathija@dell.com>

This is version v7 of the patchset; it incorporates the
recommendations made by Maxime Coquelin and Chengwen Feng.
The patchset adds support for adding and removing memory
regions, querying the maximum number of memory slots, and other
changes to the vhost-user messages. These messages are sent from a
vhost-user front-end (qemu or libblkio) to a vhost-user back-end
(dpdk, spdk). Common support functions for these message handlers
have been implemented to keep the code optimized, and the older
back-end functions have been reworked to use them.
This implementation has been extensively tested by doing read/write
I/O from multiple instances of fio + libblkio (front-end) talking to
spdk/dpdk (back-end) based drives. It was also tested with a qemu
front-end talking to a dpdk + testpmd back-end while adding and
removing memory regions, and post-copy live migration was verified
after an add_memory_region.

Version Log:
Version v7 (current version): incorporate code review suggestions
from Maxime Coquelin; add debug messages to the
vhost_user_postcopy_register function.
Version v6: added the enablement of this feature as a final patch
in this patch set, plus other code optimizations suggested by
Maxime Coquelin.
Version v5: removed the patch that increased the number of memory
regions from 8 to 128; it will be submitted as a separate feature
later, after incorporating additional optimizations. Also includes
code optimizations suggested by Chengwen Feng.
Version v4: code optimizations as suggested by Chengwen Feng.
Version v3: code optimizations as suggested by Maxime Coquelin
and Thomas Monjalon.
Version v2: code optimizations as suggested by Maxime Coquelin.
Version v1: initial patch set.

Pravin M Bathija (5):
  vhost: add user to mailmap and define to vhost hdr
  vhost_user: header defines for add/rem mem region
  vhost_user: support function defines for back-end
  vhost_user: Function defs for add/rem mem regions
  vhost_user: enable configure memory slots

 .mailmap               |   1 +
 lib/vhost/rte_vhost.h  |   4 +
 lib/vhost/vhost_user.c | 392 ++++++++++++++++++++++++++++++++++++-----
 lib/vhost/vhost_user.h |  10 ++
 4 files changed, 365 insertions(+), 42 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v7 1/5] vhost: add user to mailmap and define to vhost hdr
  2026-02-11 10:24 [PATCH v7 0/5] Support add/remove memory region & get-max-slots pravin.bathija
@ 2026-02-11 10:24 ` pravin.bathija
  2026-02-11 10:24 ` [PATCH v7 2/5] vhost_user: header defines for add/rem mem region pravin.bathija
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: pravin.bathija @ 2026-02-11 10:24 UTC (permalink / raw)
  To: dev; +Cc: pravin.bathija, maxime.coquelin, fengchengwen

From: Pravin M Bathija <pravin.bathija@dell.com>

- add user to the mailmap file.
- define a protocol feature bit, VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS,
  which indicates whether the feature/capability to add and remove
  memory regions is supported. This is part of the overall add/remove
  memory region support in this patchset.

Signed-off-by: Pravin M Bathija <pravin.bathija@dell.com>
---
 .mailmap              | 1 +
 lib/vhost/rte_vhost.h | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/.mailmap b/.mailmap
index fc53ed2a55..1ca895ef1e 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1286,6 +1286,7 @@ Prateek Agarwal <prateekag@cse.iitb.ac.in>
 Prathisna Padmasanan <prathisna.padmasanan@intel.com>
 Praveen Kaligineedi <pkaligineedi@google.com>
 Praveen Shetty <praveen.shetty@intel.com>
+Pravin M Bathija <pravin.bathija@dell.com>
 Pravin Pathak <pravin.pathak.dev@gmail.com> <pravin.pathak@intel.com>
 Prince Takkar <ptakkar@marvell.com>
 Priyalee Kushwaha <priyalee.kushwaha@intel.com>
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index 2f7c4c0080..a7f9700538 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -109,6 +109,10 @@ extern "C" {
 #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
 #endif
 
+#ifndef VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
+#define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15
+#endif
+
 #ifndef VHOST_USER_PROTOCOL_F_STATUS
 #define VHOST_USER_PROTOCOL_F_STATUS 16
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v7 2/5] vhost_user: header defines for add/rem mem region
  2026-02-11 10:24 [PATCH v7 0/5] Support add/remove memory region & get-max-slots pravin.bathija
  2026-02-11 10:24 ` [PATCH v7 1/5] vhost: add user to mailmap and define to vhost hdr pravin.bathija
@ 2026-02-11 10:24 ` pravin.bathija
  2026-02-11 10:24 ` [PATCH v7 3/5] vhost_user: support function defines for back-end pravin.bathija
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: pravin.bathija @ 2026-02-11 10:24 UTC (permalink / raw)
  To: dev; +Cc: pravin.bathija, maxime.coquelin, fengchengwen

From: Pravin M Bathija <pravin.bathija@dell.com>

The changes in this file add the message request enums for
supporting add/remove memory regions. The vhost-user front-end
sends messages such as get max memory slots, add memory region
and remove memory region; the changes here cover their handling
on the vhost-user back-end. The changes also define the data
structure describing the single memory region to be added or
removed, and extend VhostUserMsg to carry it.

Signed-off-by: Pravin M Bathija <pravin.bathija@dell.com>
---
 lib/vhost/vhost_user.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/vhost/vhost_user.h b/lib/vhost/vhost_user.h
index ef486545ba..f8d921f7f1 100644
--- a/lib/vhost/vhost_user.h
+++ b/lib/vhost/vhost_user.h
@@ -67,6 +67,9 @@ typedef enum VhostUserRequest {
 	VHOST_USER_POSTCOPY_END = 30,
 	VHOST_USER_GET_INFLIGHT_FD = 31,
 	VHOST_USER_SET_INFLIGHT_FD = 32,
+	VHOST_USER_GET_MAX_MEM_SLOTS = 36,
+	VHOST_USER_ADD_MEM_REG = 37,
+	VHOST_USER_REM_MEM_REG = 38,
 	VHOST_USER_SET_STATUS = 39,
 	VHOST_USER_GET_STATUS = 40,
 } VhostUserRequest;
@@ -91,6 +94,11 @@ typedef struct VhostUserMemory {
 	VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
 } VhostUserMemory;
 
+typedef struct VhostUserSingleMemReg {
+	uint64_t padding;
+	VhostUserMemoryRegion region;
+} VhostUserSingleMemReg;
+
 typedef struct VhostUserLog {
 	uint64_t mmap_size;
 	uint64_t mmap_offset;
@@ -186,6 +194,7 @@ typedef struct __rte_packed_begin VhostUserMsg {
 		struct vhost_vring_state state;
 		struct vhost_vring_addr addr;
 		VhostUserMemory memory;
+		VhostUserSingleMemReg memory_single;
 		VhostUserLog    log;
 		struct vhost_iotlb_msg iotlb;
 		VhostUserCryptoSessionParam crypto_session;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v7 3/5] vhost_user: support function defines for back-end
  2026-02-11 10:24 [PATCH v7 0/5] Support add/remove memory region & get-max-slots pravin.bathija
  2026-02-11 10:24 ` [PATCH v7 1/5] vhost: add user to mailmap and define to vhost hdr pravin.bathija
  2026-02-11 10:24 ` [PATCH v7 2/5] vhost_user: header defines for add/rem mem region pravin.bathija
@ 2026-02-11 10:24 ` pravin.bathija
  2026-02-11 10:24 ` [PATCH v7 4/5] vhost_user: Function defs for add/rem mem regions pravin.bathija
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: pravin.bathija @ 2026-02-11 10:24 UTC (permalink / raw)
  To: dev; +Cc: pravin.bathija, maxime.coquelin, fengchengwen

From: Pravin M Bathija <pravin.bathija@dell.com>

Here we define support functions which are called from the various
vhost-user back-end message handlers, such as set memory table, get
max memory slots, add memory region and remove memory region. These
are common helpers to initialize memory, unmap a set of memory
regions, copy region state, align memory addresses and DMA map/unmap
a single memory region.

Signed-off-by: Pravin M Bathija <pravin.bathija@dell.com>
---
 lib/vhost/vhost_user.c | 150 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 140 insertions(+), 10 deletions(-)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 4bfb13fb98..e2831b7625 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -171,6 +171,80 @@ get_blk_size(int fd)
 	return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize;
 }
 
+static void
+async_dma_map_region(struct virtio_net *dev, struct rte_vhost_mem_region *reg, bool do_map)
+{
+	uint32_t i;
+	int ret = 0;
+	int found = 0;
+	struct guest_page *page;
+
+	uint64_t reg_size = reg->size;
+	uint64_t host_user_addr  = reg->host_user_addr;
+
+	for (i = 0; i < dev->nr_guest_pages; i++) {
+		page = &dev->guest_pages[i];
+		if (host_user_addr == page->host_user_addr) {
+			found = 1;
+			break;
+		}
+	}
+
+	if (!found) {
+		VHOST_CONFIG_LOG(dev->ifname, ERR, "DMA engine map: region not found");
+		return;
+	}
+
+	if (do_map) {
+		while (reg_size) {
+			page = &dev->guest_pages[i];
+			ret = rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
+					page->host_user_addr,
+					page->host_iova,
+					page->size);
+			if (ret) {
+				if (rte_errno == ENODEV)
+					return;
+
+				/* DMA mapping errors won't stop VHOST_USER_SET_MEM_TABLE. */
+				VHOST_CONFIG_LOG(dev->ifname, ERR, "DMA engine map failed");
+			}
+
+			reg_size -= page->size;
+			++i;
+			if (i >= dev->nr_guest_pages) {
+				/* shouldn't get here */
+				VHOST_CONFIG_LOG(dev->ifname, ERR, "DMA engine map: No more pages");
+				return;
+			}
+		}
+	} else {
+		while (reg_size) {
+			page = &dev->guest_pages[i];
+			ret = rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD,
+					page->host_user_addr,
+					page->host_iova,
+					page->size);
+			if (ret) {
+				if (rte_errno == EINVAL)
+					return;
+
+				/* DMA mapping errors won't stop VHOST_USER_SET_MEM_TABLE. */
+				VHOST_CONFIG_LOG(dev->ifname, ERR, "DMA engine unmap failed");
+			}
+
+			reg_size -= page->size;
+			++i;
+			if (i >= dev->nr_guest_pages) {
+				/* shouldn't get here */
+				VHOST_CONFIG_LOG(dev->ifname, ERR, "DMA engine unmap: No more pages");
+				return;
+			}
+		}
+	}
+
+}
+
 static void
 async_dma_map(struct virtio_net *dev, bool do_map)
 {
@@ -225,7 +299,17 @@ async_dma_map(struct virtio_net *dev, bool do_map)
 }
 
 static void
-free_mem_region(struct virtio_net *dev)
+free_mem_region(struct rte_vhost_mem_region *reg)
+{
+	if (reg != NULL && reg->host_user_addr) {
+		munmap(reg->mmap_addr, reg->mmap_size);
+		close(reg->fd);
+		memset(reg, 0, sizeof(struct rte_vhost_mem_region));
+	}
+}
+
+static void
+free_all_mem_regions(struct virtio_net *dev)
 {
 	uint32_t i;
 	struct rte_vhost_mem_region *reg;
@@ -236,12 +320,10 @@ free_mem_region(struct virtio_net *dev)
 	if (dev->async_copy && rte_vfio_is_enabled("vfio"))
 		async_dma_map(dev, false);
 
-	for (i = 0; i < dev->mem->nregions; i++) {
+	for (i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
 		reg = &dev->mem->regions[i];
-		if (reg->host_user_addr) {
-			munmap(reg->mmap_addr, reg->mmap_size);
-			close(reg->fd);
-		}
+		if (reg->mmap_addr)
+			free_mem_region(reg);
 	}
 }
 
@@ -255,7 +337,7 @@ vhost_backend_cleanup(struct virtio_net *dev)
 		vdpa_dev->ops->dev_cleanup(dev->vid);
 
 	if (dev->mem) {
-		free_mem_region(dev);
+		free_all_mem_regions(dev);
 		rte_free(dev->mem);
 		dev->mem = NULL;
 	}
@@ -704,7 +786,7 @@ numa_realloc(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
 	vhost_devices[dev->vid] = dev;
 
 	mem_size = sizeof(struct rte_vhost_memory) +
-		sizeof(struct rte_vhost_mem_region) * dev->mem->nregions;
+		sizeof(struct rte_vhost_mem_region) * VHOST_MEMORY_MAX_NREGIONS;
 	mem = rte_realloc_socket(dev->mem, mem_size, 0, node);
 	if (!mem) {
 		VHOST_CONFIG_LOG(dev->ifname, ERR,
@@ -808,8 +890,10 @@ hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
 	uint32_t i;
 	uintptr_t hua = (uintptr_t)ptr;
 
-	for (i = 0; i < mem->nregions; i++) {
+	for (i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
 		r = &mem->regions[i];
+		if (r->host_user_addr == 0)
+			continue;
 		if (hua >= r->host_user_addr &&
 			hua < r->host_user_addr + r->size) {
 			return get_blk_size(r->fd);
@@ -1246,10 +1330,14 @@ vhost_user_postcopy_register(struct virtio_net *dev, int main_fd,
 	 * DPDK's virtual address with Qemu, so that Qemu can
 	 * retrieve the region offset when handling userfaults.
 	 */
+	int reg_msg_index = 0;
 	memory = &ctx->msg.payload.memory;
 	for (i = 0; i < memory->nregions; i++) {
 		reg = &dev->mem->regions[i];
-		memory->regions[i].userspace_addr = reg->host_user_addr;
+		if (reg->host_user_addr == 0)
+			continue;
+		memory->regions[reg_msg_index].userspace_addr = reg->host_user_addr;
+		reg_msg_index++;
 	}
 
 	/* Send the addresses back to qemu */
@@ -1278,6 +1366,8 @@ vhost_user_postcopy_register(struct virtio_net *dev, int main_fd,
 	/* Now userfault register and we can use the memory */
 	for (i = 0; i < memory->nregions; i++) {
 		reg = &dev->mem->regions[i];
+		if (reg->host_user_addr == 0)
+			continue;
 		if (vhost_user_postcopy_region_register(dev, reg) < 0)
 			return -1;
 	}
@@ -1382,6 +1472,46 @@ vhost_user_mmap_region(struct virtio_net *dev,
 	return 0;
 }
 
+static int
+vhost_user_initialize_memory(struct virtio_net **pdev)
+{
+	struct virtio_net *dev = *pdev;
+	int numa_node = SOCKET_ID_ANY;
+
+	/*
+	 * If VQ 0 has already been allocated, try to allocate on the same
+	 * NUMA node. It can be reallocated later in numa_realloc().
+	 */
+	if (dev->nr_vring > 0)
+		numa_node = dev->virtqueue[0]->numa_node;
+
+	dev->nr_guest_pages = 0;
+	if (dev->guest_pages == NULL) {
+		dev->max_guest_pages = VHOST_MEMORY_MAX_NREGIONS;
+		dev->guest_pages = rte_zmalloc_socket(NULL,
+					dev->max_guest_pages *
+					sizeof(struct guest_page),
+					RTE_CACHE_LINE_SIZE,
+					numa_node);
+		if (dev->guest_pages == NULL) {
+			VHOST_CONFIG_LOG(dev->ifname, ERR,
+				"failed to allocate memory for dev->guest_pages");
+			return -1;
+		}
+	}
+
+	dev->mem = rte_zmalloc_socket("vhost-mem-table", sizeof(struct rte_vhost_memory) +
+		sizeof(struct rte_vhost_mem_region) * VHOST_MEMORY_MAX_NREGIONS, 0, numa_node);
+	if (dev->mem == NULL) {
+		VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to allocate memory for dev->mem");
+		rte_free(dev->guest_pages);
+		dev->guest_pages = NULL;
+		return -1;
+	}
+
+	return 0;
+}
+
 static int
 vhost_user_set_mem_table(struct virtio_net **pdev,
 			struct vhu_msg_context *ctx,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v7 4/5] vhost_user: Function defs for add/rem mem regions
  2026-02-11 10:24 [PATCH v7 0/5] Support add/remove memory region & get-max-slots pravin.bathija
                   ` (2 preceding siblings ...)
  2026-02-11 10:24 ` [PATCH v7 3/5] vhost_user: support function defines for back-end pravin.bathija
@ 2026-02-11 10:24 ` pravin.bathija
  2026-02-11 10:24 ` [PATCH v7 5/5] vhost_user: enable configure memory slots pravin.bathija
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: pravin.bathija @ 2026-02-11 10:24 UTC (permalink / raw)
  To: dev; +Cc: pravin.bathija, maxime.coquelin, fengchengwen

From: Pravin M Bathija <pravin.bathija@dell.com>

These changes define the handlers for the add/remove memory region
messages, which are invoked on receiving the corresponding vhost-user
message from the front-end (e.g. qemu). In addition to testing with a
qemu front-end, testing has also been performed with a libblkio
front-end and an spdk/dpdk back-end, doing I/O through a
libblkio-based device driver to spdk-based drives. The changes also
rework set_mem_table to use the common support functions and add a
handler for get max memory slots, the message with which the
vhost-user front-end queries the back-end for the number of memory
slots available for registration. In addition, a support function to
invalidate the vrings is defined, which is used by the add/remove
memory region handlers.

Signed-off-by: Pravin M Bathija <pravin.bathija@dell.com>
---
 lib/vhost/vhost_user.c | 242 +++++++++++++++++++++++++++++++++++------
 1 file changed, 210 insertions(+), 32 deletions(-)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index e2831b7625..746b458d76 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -71,6 +71,9 @@ VHOST_MESSAGE_HANDLER(VHOST_USER_SET_FEATURES, vhost_user_set_features, false, t
 VHOST_MESSAGE_HANDLER(VHOST_USER_SET_OWNER, vhost_user_set_owner, false, true) \
 VHOST_MESSAGE_HANDLER(VHOST_USER_RESET_OWNER, vhost_user_reset_owner, false, false) \
 VHOST_MESSAGE_HANDLER(VHOST_USER_SET_MEM_TABLE, vhost_user_set_mem_table, true, true) \
+VHOST_MESSAGE_HANDLER(VHOST_USER_GET_MAX_MEM_SLOTS, vhost_user_get_max_mem_slots, false, false) \
+VHOST_MESSAGE_HANDLER(VHOST_USER_ADD_MEM_REG, vhost_user_add_mem_reg, true, true) \
+VHOST_MESSAGE_HANDLER(VHOST_USER_REM_MEM_REG, vhost_user_rem_mem_reg, false, true) \
 VHOST_MESSAGE_HANDLER(VHOST_USER_SET_LOG_BASE, vhost_user_set_log_base, true, true) \
 VHOST_MESSAGE_HANDLER(VHOST_USER_SET_LOG_FD, vhost_user_set_log_fd, true, true) \
 VHOST_MESSAGE_HANDLER(VHOST_USER_SET_VRING_NUM, vhost_user_set_vring_num, false, true) \
@@ -1520,7 +1523,6 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
 	struct virtio_net *dev = *pdev;
 	struct VhostUserMemory *memory = &ctx->msg.payload.memory;
 	struct rte_vhost_mem_region *reg;
-	int numa_node = SOCKET_ID_ANY;
 	uint64_t mmap_offset;
 	uint32_t i;
 	bool async_notify = false;
@@ -1565,39 +1567,13 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
 		if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
 			vhost_user_iotlb_flush_all(dev);
 
-		free_mem_region(dev);
+		free_all_mem_regions(dev);
 		rte_free(dev->mem);
 		dev->mem = NULL;
 	}
 
-	/*
-	 * If VQ 0 has already been allocated, try to allocate on the same
-	 * NUMA node. It can be reallocated later in numa_realloc().
-	 */
-	if (dev->nr_vring > 0)
-		numa_node = dev->virtqueue[0]->numa_node;
-
-	dev->nr_guest_pages = 0;
-	if (dev->guest_pages == NULL) {
-		dev->max_guest_pages = 8;
-		dev->guest_pages = rte_zmalloc_socket(NULL,
-					dev->max_guest_pages *
-					sizeof(struct guest_page),
-					RTE_CACHE_LINE_SIZE,
-					numa_node);
-		if (dev->guest_pages == NULL) {
-			VHOST_CONFIG_LOG(dev->ifname, ERR,
-				"failed to allocate memory for dev->guest_pages");
-			goto close_msg_fds;
-		}
-	}
-
-	dev->mem = rte_zmalloc_socket("vhost-mem-table", sizeof(struct rte_vhost_memory) +
-		sizeof(struct rte_vhost_mem_region) * memory->nregions, 0, numa_node);
-	if (dev->mem == NULL) {
-		VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to allocate memory for dev->mem");
-		goto free_guest_pages;
-	}
+	if (vhost_user_initialize_memory(pdev) < 0)
+		goto close_msg_fds;
 
 	for (i = 0; i < memory->nregions; i++) {
 		reg = &dev->mem->regions[i];
@@ -1661,11 +1637,175 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
 	return RTE_VHOST_MSG_RESULT_OK;
 
 free_mem_table:
-	free_mem_region(dev);
+	free_all_mem_regions(dev);
 	rte_free(dev->mem);
 	dev->mem = NULL;
+	rte_free(dev->guest_pages);
+	dev->guest_pages = NULL;
+close_msg_fds:
+	close_msg_fds(ctx);
+	return RTE_VHOST_MSG_RESULT_ERR;
+}
+
+
+static int
+vhost_user_get_max_mem_slots(struct virtio_net **pdev __rte_unused,
+			struct vhu_msg_context *ctx,
+			int main_fd __rte_unused)
+{
+	uint32_t max_mem_slots = VHOST_MEMORY_MAX_NREGIONS;
+
+	ctx->msg.payload.u64 = (uint64_t)max_mem_slots;
+	ctx->msg.size = sizeof(ctx->msg.payload.u64);
+	ctx->fd_num = 0;
+
+	return RTE_VHOST_MSG_RESULT_REPLY;
+}
+
+static void
+dev_invalidate_vrings(struct virtio_net *dev)
+{
+	uint32_t i;
+
+	for (i = 0; i < dev->nr_vring; i++) {
+		struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+		if (!vq)
+			continue;
+
+		if (vq->desc || vq->avail || vq->used) {
+			/* vhost_user_lock_all_queue_pairs locked all qps */
+			VHOST_USER_ASSERT_LOCK(dev, vq, VHOST_USER_ADD_MEM_REG);
+
+			/*
+			 * If the memory table got updated, the ring addresses
+			 * need to be translated again as virtual addresses have
+			 * changed.
+			 */
+			vring_invalidate(dev, vq);
 
-free_guest_pages:
+			translate_ring_addresses(&dev, &vq);
+		}
+	}
+}
+
+static int
+vhost_user_add_mem_reg(struct virtio_net **pdev,
+			struct vhu_msg_context *ctx,
+			int main_fd)
+{
+	uint32_t i;
+	struct virtio_net *dev = *pdev;
+	struct VhostUserMemoryRegion *region = &ctx->msg.payload.memory_single.region;
+
+	/* convert first region add to normal memory table set */
+	if (dev->mem == NULL) {
+		if (vhost_user_initialize_memory(pdev) < 0)
+			goto close_msg_fds;
+	}
+
+	/* make sure new region will fit */
+	if (dev->mem->nregions >= VHOST_MEMORY_MAX_NREGIONS) {
+		VHOST_CONFIG_LOG(dev->ifname, ERR, "too many memory regions already (%u)",
+									dev->mem->nregions);
+		goto close_msg_fds;
+	}
+
+	/* make sure supplied memory fd present */
+	if (ctx->fd_num != 1) {
+		VHOST_CONFIG_LOG(dev->ifname, ERR, "fd count makes no sense (%u)", ctx->fd_num);
+		goto close_msg_fds;
+	}
+
+	/* Make sure no overlap in guest virtual address space */
+	if (dev->mem != NULL && dev->mem->nregions > 0)	{
+		for (i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
+			struct rte_vhost_mem_region *current_region = &dev->mem->regions[i];
+
+			if (current_region->mmap_size == 0)
+				continue;
+
+			uint64_t current_region_guest_start = current_region->guest_user_addr;
+			uint64_t current_region_guest_end = current_region_guest_start
+								+ current_region->mmap_size - 1;
+			uint64_t proposed_region_guest_start = region->userspace_addr;
+			uint64_t proposed_region_guest_end = proposed_region_guest_start
+								+ region->memory_size - 1;
+			bool overlap = false;
+
+			overlap = !((proposed_region_guest_end < current_region_guest_start) ||
+				(proposed_region_guest_start > current_region_guest_end));
+
+			if (overlap) {
+				VHOST_CONFIG_LOG(dev->ifname, ERR,
+					"requested memory region overlaps with another region");
+				VHOST_CONFIG_LOG(dev->ifname, ERR,
+					"\tRequested region address:0x%" PRIx64,
+					region->userspace_addr);
+				VHOST_CONFIG_LOG(dev->ifname, ERR,
+					"\tRequested region size:0x%" PRIx64,
+					region->memory_size);
+				VHOST_CONFIG_LOG(dev->ifname, ERR,
+					"\tOverlapping region address:0x%" PRIx64,
+					current_region->guest_user_addr);
+				VHOST_CONFIG_LOG(dev->ifname, ERR,
+					"\tOverlapping region size:0x%" PRIx64,
+					current_region->mmap_size);
+				goto close_msg_fds;
+			}
+
+		}
+	}
+
+	/* find a new region and set it like memory table set does */
+	struct rte_vhost_mem_region *reg = NULL;
+
+	for (i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
+		if (dev->mem->regions[i].guest_user_addr == 0) {
+			reg = &dev->mem->regions[i];
+			break;
+		}
+	}
+	if (reg == NULL) {
+		VHOST_CONFIG_LOG(dev->ifname, ERR, "no free memory region");
+		goto close_msg_fds;
+	}
+
+	reg->guest_phys_addr = region->guest_phys_addr;
+	reg->guest_user_addr = region->userspace_addr;
+	reg->size            = region->memory_size;
+	reg->fd              = ctx->fds[0];
+
+	if (vhost_user_mmap_region(dev, reg, region->mmap_offset) < 0) {
+		VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to mmap region");
+		goto close_msg_fds;
+	}
+
+	dev->mem->nregions++;
+
+	if (dev->async_copy && rte_vfio_is_enabled("vfio"))
+		async_dma_map_region(dev, reg, true);
+
+	VHOST_CONFIG_LOG(dev->ifname, INFO,
+		"ADD_MEM_REG: calling postcopy_register for memory region - "
+		"guest_addr=0x%" PRIx64 ", size=0x%" PRIx64,
+		reg->guest_user_addr, reg->size);
+
+	if (vhost_user_postcopy_register(dev, main_fd, ctx) < 0)
+		goto free_mem_table;
+
+	VHOST_CONFIG_LOG(dev->ifname, INFO,
+		"ADD_MEM_REG: calling postcopy_register success");
+
+	dev_invalidate_vrings(dev);
+	dump_guest_pages(dev);
+
+	return RTE_VHOST_MSG_RESULT_OK;
+
+free_mem_table:
+	free_all_mem_regions(dev);
+	rte_free(dev->mem);
+	dev->mem = NULL;
 	rte_free(dev->guest_pages);
 	dev->guest_pages = NULL;
 close_msg_fds:
@@ -1673,6 +1813,44 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
 	return RTE_VHOST_MSG_RESULT_ERR;
 }
 
+static int
+vhost_user_rem_mem_reg(struct virtio_net **pdev,
+			struct vhu_msg_context *ctx,
+			int main_fd __rte_unused)
+{
+	uint32_t i;
+	struct virtio_net *dev = *pdev;
+	struct VhostUserMemoryRegion *region = &ctx->msg.payload.memory_single.region;
+
+	if (dev->mem != NULL && dev->mem->nregions > 0) {
+		for (i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
+			struct rte_vhost_mem_region *current_region = &dev->mem->regions[i];
+
+			if (current_region->guest_user_addr == 0)
+				continue;
+
+			/*
+			 * According to the vhost-user specification:
+			 * The memory region to be removed is identified by its guest address,
+			 * user address and size. The mmap offset is ignored.
+			 */
+			if (region->userspace_addr == current_region->guest_user_addr
+				&& region->guest_phys_addr == current_region->guest_phys_addr
+				&& region->memory_size == current_region->size) {
+				if (dev->async_copy && rte_vfio_is_enabled("vfio"))
+					async_dma_map_region(dev, current_region, false);
+				dev_invalidate_vrings(dev);
+				free_mem_region(current_region);
+				dev->mem->nregions--;
+				return RTE_VHOST_MSG_RESULT_OK;
+			}
+		}
+	}
+
+	VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to find region");
+	return RTE_VHOST_MSG_RESULT_ERR;
+}
+
 static bool
 vq_is_ready(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v7 5/5] vhost_user: enable configure memory slots
  2026-02-11 10:24 [PATCH v7 0/5] Support add/remove memory region & get-max-slots pravin.bathija
                   ` (3 preceding siblings ...)
  2026-02-11 10:24 ` [PATCH v7 4/5] vhost_user: Function defs for add/rem mem regions pravin.bathija
@ 2026-02-11 10:24 ` pravin.bathija
  2026-03-29 19:46 ` [PATCH v7 0/5] Support add/remove memory region & get-max-slots Stephen Hemminger
  2026-03-31  3:26 ` Stephen Hemminger
  6 siblings, 0 replies; 10+ messages in thread
From: pravin.bathija @ 2026-02-11 10:24 UTC (permalink / raw)
  To: dev; +Cc: pravin.bathija, maxime.coquelin, fengchengwen

From: Pravin M Bathija <pravin.bathija@dell.com>

This patch enables the configure memory slots feature by adding
VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS to the
VHOST_USER_PROTOCOL_FEATURES mask in the vhost-user header.

Signed-off-by: Pravin M Bathija <pravin.bathija@dell.com>
---
 lib/vhost/vhost_user.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/vhost/vhost_user.h b/lib/vhost/vhost_user.h
index f8d921f7f1..5a0e747b58 100644
--- a/lib/vhost/vhost_user.h
+++ b/lib/vhost/vhost_user.h
@@ -32,6 +32,7 @@
 					 (1ULL << VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT) | \
+					 (1ULL << VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_STATUS))
 
 typedef enum VhostUserRequest {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v7 0/5] Support add/remove memory region & get-max-slots
  2026-02-11 10:24 [PATCH v7 0/5] Support add/remove memory region & get-max-slots pravin.bathija
                   ` (4 preceding siblings ...)
  2026-02-11 10:24 ` [PATCH v7 5/5] vhost_user: enable configure memory slots pravin.bathija
@ 2026-03-29 19:46 ` Stephen Hemminger
  2026-04-02  7:44   ` Bathija, Pravin
  2026-03-31  3:26 ` Stephen Hemminger
  6 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2026-03-29 19:46 UTC (permalink / raw)
  To: pravin.bathija; +Cc: dev, maxime.coquelin, fengchengwen

On Wed, 11 Feb 2026 10:24:28 +0000
<pravin.bathija@dell.com> wrote:

> From: Pravin M Bathija <pravin.bathija@dell.com>
> 
> This is version v7 of the patchset and it incorporates the
> recommendations made by Maxime Coquelin and Feng Cheng Wen.
> The patchset includes support for adding and removal of memory
> regions, getting max memory slots and other changes to vhost-user
> messages.  These messages are sent from vhost-user front-end (qemu
> or libblkio) to a vhost-user back-end (dpdk, spdk). Support functions
> for these message functions have been implemented in the interest of
> writing optimized code. Older functions, part of vhost-user back-end
> have also been optimized using these newly defined support functions.
> This implementation has been extensively tested by doing Read/Write
> I/O from multiple instances of fio + libblkio (front-end) talking to
> spdk/dpdk (back-end) based drives. Tested with qemu front-end
> talking to dpdk + testpmd (back-end) performing add/removal of memory
> regions. Also tested post-copy live migration after doing
> add_memory_region.
> 
> Version Log:
> Version v7 (Current version): Incorporate code review suggestions
> from Maxime Coquelin. Add debug messages to vhost_postcopy_register
> function.
> Version v6: Added the enablement of this feature
> as a final patch in this patch-set and other code optimizations as
> suggested by Maxime Coquelin.
> Version v5: removed the patch that increased the
> number of memory regions from 8 to 128. This will be submitted as a
> separate feature at a later point after incorporating additional
> optimizations. Also includes code optimizations as suggested by Feng
> Cheng Wen.
> Version v4: code optimizations as suggested by Feng Cheng Wen.
> Version v3: code optimizations as suggested by Maxime Coquelin
> and Thomas Monjalon.
> Version v2: code optimizations as suggested by Maxime Coquelin.
> Version v1: Initial patch set.
> 
> Pravin M Bathija (5):
>   vhost: add user to mailmap and define to vhost hdr
>   vhost_user: header defines for add/rem mem region
>   vhost_user: support function defines for back-end
>   vhost_user: Function defs for add/rem mem regions
>   vhost_user: enable configure memory slots
> 
>  .mailmap               |   1 +
>  lib/vhost/rte_vhost.h  |   4 +
>  lib/vhost/vhost_user.c | 392 ++++++++++++++++++++++++++++++++++++-----
>  lib/vhost/vhost_user.h |  10 ++
>  4 files changed, 365 insertions(+), 42 deletions(-)
> 

Ran this patch set through AI review (Gemini this time).
It found lots of issues...

✦ I have performed a comprehensive review of the vhost_user patchset and identified several critical correctness issues that violate DPDK
  guidelines.

  Critical Correctness Bugs

   1. Destructive Error Path in vhost_user_add_mem_reg:
      In Patch 4/5, the error path for vhost_user_add_mem_reg (label free_mem_table) calls free_all_mem_regions(dev) and wipes dev->mem and
  dev->guest_pages. This is catastrophic: if registration of a single new memory region fails, the entire existing memory configuration for the
  running VM is destroyed, leading to immediate failure of all I/O. It should only clean up the specifically failed region.

   2. Broken Address Mapping in async_dma_map_region:
      In Patch 3/5, async_dma_map_region assumes that if the first page of a region is found at index i in dev->guest_pages, all subsequent
  pages for that region are at i+1, i+2, etc. This assumption is invalid because dev->guest_pages is a global array for the device and is
  frequently sorted by address (guest_page_addrcmp). Pages from multiple regions will be interleaved, and the contiguous index assumption will
  lead to mapping incorrect memory or walking off the end of the array.

   3. Stale Guest Page Entries in vhost_user_rem_mem_reg:
      In Patch 4/5, vhost_user_rem_mem_reg removes a region from dev->mem but fails to remove the corresponding entries from dev->guest_pages.
  This leaves stale mappings in the translation cache, which will cause incorrect address translations, potential memory corruption, or crashes
  when the guest reuses those physical addresses.

   4. Vhost-User Protocol Violation:
      In Patch 2/5, VhostUserSingleMemReg is defined with a uint64_t padding before the VhostUserMemoryRegion. The vhost-user specification for
  VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG expects the payload to be exactly VhostUserMemoryRegion. This extra padding will cause the
  back-end to misalign all fields, leading to incorrect address and size interpretations.

   5. Unsafe Assumption in async_dma_map_region Error Handling:
      In Patch 3/5, async_dma_map_region logs DMA mapping/unmapping failures but continues execution. For memory registration, a DMA mapping
  failure must be treated as a fatal error for that operation to ensure the DMA engine never attempts to access unmapped memory.

  Style and Process Observations

   * Low Memory Slot Limit: The implementation returns VHOST_MEMORY_MAX_NREGIONS (8) for VHOST_USER_GET_MAX_MEM_SLOTS. While technically
     correct per the current DPDK defines, 8 slots is extremely restrictive for dynamic memory hotplug, which this feature is intended to
     support.
   * Missing Locking Assertions: In dev_invalidate_vrings, VHOST_USER_ASSERT_LOCK is called with VHOST_USER_ADD_MEM_REG. This macro typically
     expects a string description of the operation for logging, rather than an enum value.

  I recommend a significant redesign of the memory management logic to properly handle sparse region arrays and incremental guest page updates
  before this patchset can be accepted.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v7 0/5] Support add/remove memory region & get-max-slots
  2026-02-11 10:24 [PATCH v7 0/5] Support add/remove memory region & get-max-slots pravin.bathija
                   ` (5 preceding siblings ...)
  2026-03-29 19:46 ` [PATCH v7 0/5] Support add/remove memory region & get-max-slots Stephen Hemminger
@ 2026-03-31  3:26 ` Stephen Hemminger
  2026-04-02  8:07   ` Bathija, Pravin
  6 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2026-03-31  3:26 UTC (permalink / raw)
  To: pravin.bathija; +Cc: dev, maxime.coquelin, fengchengwen

On Wed, 11 Feb 2026 10:24:28 +0000
<pravin.bathija@dell.com> wrote:

> From: Pravin M Bathija <pravin.bathija@dell.com>
> 
> This is version v7 of the patchset and it incorporates the
> recommendations made by Maxime Coquelin and Feng Cheng Wen.
> The patchset includes support for adding and removal of memory
> regions, getting max memory slots and other changes to vhost-user
> messages.  These messages are sent from vhost-user front-end (qemu
> or libblkio) to a vhost-user back-end (dpdk, spdk). Support functions
> for these message functions have been implemented in the interest of
> writing optimized code. Older functions, part of vhost-user back-end
> have also been optimized using these newly defined support functions.
> This implementation has been extensively tested by doing Read/Write
> I/O from multiple instances of fio + libblkio (front-end) talking to
> spdk/dpdk (back-end) based drives. Tested with qemu front-end
> talking to dpdk + testpmd (back-end) performing add/removal of memory
> regions. Also tested post-copy live migration after doing
> add_memory_region.
> 
> Version Log:
> Version v7 (Current version): Incorporate code review suggestions
> from Maxime Coquelin. Add debug messages to vhost_postcopy_register
> function.
> Version v6: Added the enablement of this feature
> as a final patch in this patch-set and other code optimizations as
> suggested by Maxime Coquelin.
> Version v5: removed the patch that increased the
> number of memory regions from 8 to 128. This will be submitted as a
> separate feature at a later point after incorporating additional
> optimizations. Also includes code optimizations as suggested by Feng
> Cheng Wen.
> Version v4: code optimizations as suggested by Feng Cheng Wen.
> Version v3: code optimizations as suggested by Maxime Coquelin
> and Thomas Monjalon.
> Version v2: code optimizations as suggested by Maxime Coquelin.
> Version v1: Initial patch set.
> 
> Pravin M Bathija (5):
>   vhost: add user to mailmap and define to vhost hdr
>   vhost_user: header defines for add/rem mem region
>   vhost_user: support function defines for back-end
>   vhost_user: Function defs for add/rem mem regions
>   vhost_user: enable configure memory slots
> 
>  .mailmap               |   1 +
>  lib/vhost/rte_vhost.h  |   4 +
>  lib/vhost/vhost_user.c | 392 ++++++++++++++++++++++++++++++++++++-----
>  lib/vhost/vhost_user.h |  10 ++
>  4 files changed, 365 insertions(+), 42 deletions(-)
> 

Lots more stuff found by AI code review, this is not ready.

Review of [PATCH v7 1-5/5] vhost: configure memory slots support
Author: Pravin M Bathija <pravin.bathija@dell.com>

This patch series adds support for the VHOST_USER_F_CONFIGURE_MEM_SLOTS
protocol feature, enabling add/remove individual memory regions instead
of requiring a full SET_MEM_TABLE each time.  The approach is sound and
the feature is needed, but several correctness bugs need to be addressed.

Patch 3/5 -- vhost_user: support function defines for back-end
--------------------------------------------------------------------

Error: dev_invalidate_vrings loses *pdev update after numa_realloc

  translate_ring_addresses() calls numa_realloc(), which can reallocate
  the dev struct and update *pdev.  In the original set_mem_table the
  caller writes "*pdev = dev;" after the call to observe this.

  dev_invalidate_vrings() takes a plain "struct virtio_net *dev" and
  passes "&dev, &vq" to translate_ring_addresses -- but dev is a local
  copy.  If numa_realloc reallocates, the caller's *pdev (in
  vhost_user_add_mem_reg) is never updated, and all subsequent accesses
  use a stale/freed pointer -- a use-after-free.

  The function should take struct virtio_net **pdev (matching the
  pattern used in set_mem_table) and propagate updates back.

Error: async_dma_map_region can underflow reg_size

  The loop "reg_size -= page->size" uses unsigned subtraction. If
  page->size does not evenly divide reg->size (which can happen with
  misaligned guest pages), reg_size wraps to a huge value and the loop
  runs out of bounds.  The original async_dma_map avoids this by
  iterating over nr_guest_pages directly.

  Suggest checking "reg_size >= page->size" before subtracting, or
  better yet iterating like the original does.

Warning: free_all_mem_regions iterates all VHOST_MEMORY_MAX_NREGIONS
  but the original free_mem_region iterated only dev->mem->nregions

  This is intentional for the sparse-slot design, but free_mem_region()
  checks "reg->host_user_addr" while free_all_mem_regions checks
  "reg->mmap_addr" as the sentinel.  These should be consistent.
  A region that was partially initialized (mmap_addr set but
  host_user_addr not yet set, or vice versa) would behave
  differently depending on which sentinel is checked.

Warning: vhost_user_initialize_memory unconditionally allocates
  dev->mem without freeing any prior allocation

  If dev->mem is non-NULL when vhost_user_initialize_memory is called,
  the old allocation is leaked.  set_mem_table frees dev->mem before
  calling it, but add_mem_reg only checks "dev->mem == NULL" and skips
  the call otherwise.  This is safe as coded, but the function itself
  is fragile -- consider adding an assertion or early return if
  dev->mem is already set.

Info: vhost_user_postcopy_register changes use reg_msg_index but the
  second loop (postcopy_region_register) still iterates dev->mem->regions
  with nregions as the bound and just skips zeros.  This is correct
  but the two loops use different iteration strategies which is
  confusing.  Consider making both iterate VHOST_MEMORY_MAX_NREGIONS
  with a skip-zero check for consistency.


Patch 4/5 -- vhost_user: Function defs for add/rem mem regions
--------------------------------------------------------------------

Error: vhost_user_add_mem_reg does not set ctx->fds[0] = -1 after
  assigning reg->fd = ctx->fds[0]

  In the original set_mem_table, after "reg->fd = ctx->fds[i]" the
  code sets "ctx->fds[i] = -1" to prevent the fd from being double-
  closed if the error path calls close_msg_fds().  The new add_mem_reg
  omits this.  If vhost_user_mmap_region or vhost_user_postcopy_register
  fails, the error path calls close_msg_fds(ctx) which closes
  ctx->fds[0] -- but reg->fd still holds the same value.  Later,
  free_all_mem_regions will close(reg->fd) again -- double close.

  Fix: add "ctx->fds[0] = -1;" after "reg->fd = ctx->fds[0];".
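
A minimal sketch of the fd-ownership pattern being suggested (simplified stand-in types and a counter instead of a real close(), not the actual DPDK code):

```c
#include <assert.h>

#define MAX_FDS 8

struct msg_ctx { int fds[MAX_FDS]; };
struct mem_region_fd { int fd; };

static int closed_count; /* counts close() calls in this sketch */

/* Simplified mirror of close_msg_fds(): close every fd the message
 * context still owns (slots holding -1 are skipped). */
static void
close_msg_fds(struct msg_ctx *ctx)
{
	int i;

	for (i = 0; i < MAX_FDS; i++) {
		if (ctx->fds[i] >= 0) {
			closed_count++; /* close(ctx->fds[i]) in real code */
			ctx->fds[i] = -1;
		}
	}
}

/* The suggested fix: after storing the fd in the region, mark the
 * slot -1 so an error path calling close_msg_fds() cannot close the
 * same fd a second time; region cleanup alone then owns reg->fd. */
static void
take_region_fd(struct mem_region_fd *reg, struct msg_ctx *ctx)
{
	reg->fd = ctx->fds[0];
	ctx->fds[0] = -1; /* ownership transferred to the region */
}
```

With the `-1` marker in place, an error path can safely call close_msg_fds() without touching the fd now held by the region.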

Error: vhost_user_add_mem_reg error path frees all memory on
  single-region failure

  If postcopy_register fails after successfully mmap'ing one region
  (when other regions already exist), the "free_mem_table" label
  calls free_all_mem_regions + rte_free(dev->mem) + rte_free
  (dev->guest_pages), destroying ALL previously registered regions.
  This is disproportionate -- the error should only unmap the single
  region that was just added and decrement nregions.

Error: overlap check uses mmap_size for existing region but
  memory_size for proposed region

  The overlap check compares:
    current_region_guest_end = guest_user_addr + mmap_size - 1
    proposed_region_guest_end = userspace_addr + memory_size - 1

  mmap_size = memory_size + mmap_offset (set by vhost_user_mmap_region).
  So the existing region's range is inflated by the mmap_offset,
  potentially producing false overlaps.  Use current_region->size
  (which corresponds to memory_size) instead of mmap_size for a
  correct comparison.

Error: vhost_user_add_mem_reg uses guest_user_addr == 0 as "slot
  is empty" but guest_user_addr 0 is a valid guest virtual address

  The free-slot search does:
    if (dev->mem->regions[i].guest_user_addr == 0)
  
  Guest address 0 is valid (common in some VM configurations).
  If a region is mapped at guest VA 0, it would appear as a free
  slot.  The original code uses compact packing (nregions as the
  count), avoiding this problem.  Consider using a dedicated
  "in_use" flag or checking mmap_addr == NULL instead (mmap
  never returns NULL on success, it returns MAP_FAILED).

Warning: vhost_user_rem_mem_reg matches on guest_phys_addr but the
  spec says "identified by guest address, user address and size"

  The comment in the code says "identified by its guest address,
  user address and size. The mmap offset is ignored." but the
  comparison also checks guest_phys_addr (which is the GPA, not
  the guest user address).  The spec says matching is by
  userspace_addr (GVA), guest_phys_addr (GPA), and memory_size.
  So the match is actually correct in practice, but the comment
  is misleading -- it should mention GPA too, or remove the
  comment about what the spec says.

Warning: nregions becomes inconsistent with actual slot usage

  After add_mem_reg, nregions is incremented.  After rem_mem_reg,
  nregions is decremented.  But regions are not packed -- removing
  a region from the middle leaves a hole.  Functions like
  qva_to_vva, hva_to_gpa, and rte_vhost_va_from_guest_pa all
  iterate "i < mem->nregions" and index "mem->regions[i]"
  sequentially.  With sparse slots these functions will miss
  regions that are stored at indices >= nregions.

  Example: 3 regions at slots 0,1,2 (nregions=3).  Remove slot 1
  (memset to 0, nregions=2).  Now qva_to_vva iterates slots 0,1
  only -- slot 2 (still valid) is never checked.  Address
  translations for memory in the third region will fail.

  This is a correctness bug that will cause packet processing
  failures.  Either pack the array on removal (shift later entries
  down) or change all iteration to use VHOST_MEMORY_MAX_NREGIONS
  with skip-zero checks.
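
To illustrate the second option, here is a sketch of a sparse-safe translation scan (hypothetical simplified types and function name, not the real qva_to_vva signature):

```c
#include <stdint.h>

#define VHOST_MEMORY_MAX_NREGIONS 8

struct mem_region {
	uint64_t guest_user_addr; /* front-end (QEMU) virtual address */
	uint64_t host_user_addr;  /* back-end virtual address */
	uint64_t size;
};

struct mem_table {
	uint32_t nregions;
	struct mem_region regions[VHOST_MEMORY_MAX_NREGIONS];
};

/* Sparse-safe variant: scan every slot and skip empty ones instead of
 * trusting nregions as a dense upper bound. */
static uint64_t
qva_to_vva_sparse(const struct mem_table *mem, uint64_t qva)
{
	uint32_t i;

	for (i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
		const struct mem_region *r = &mem->regions[i];

		if (r->size == 0)
			continue; /* hole left by a REM_MEM_REG */
		if (qva >= r->guest_user_addr &&
		    qva < r->guest_user_addr + r->size)
			return qva - r->guest_user_addr + r->host_user_addr;
	}
	return 0; /* not found */
}
```

With a hole at slot 1 and a live region at slot 2 (nregions == 2), the dense loop would miss slot 2; the full scan above still finds it.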

Warning: dev_invalidate_vrings does not update *pdev in add_mem_reg

  Related to the Error in patch 3 -- even if dev_invalidate_vrings
  is fixed to take **pdev, the caller vhost_user_add_mem_reg must
  also write "*pdev = dev;" to propagate the possibly-reallocated
  pointer back through the message handler framework.


Patch 5/5 -- vhost_user: enable configure memory slots
--------------------------------------------------------------------

Warning: the feature is enabled unconditionally but several
  address translation functions (qva_to_vva, hva_to_gpa,
  rte_vhost_va_from_guest_pa) have not been updated to handle
  sparse region arrays.  Enabling the feature flag before fixing
  the nregions iteration issue means any front-end that uses
  ADD_MEM_REG/REM_MEM_REG will hit broken address translations.


Summary
--------------------------------------------------------------------

The most critical issues to fix before this can be merged:

1. nregions vs sparse-slot iteration mismatch -- this will cause
   address translation failures at runtime after any REM_MEM_REG.
   All loops that iterate mem->regions must either use compact
   packing or be updated to scan all VHOST_MEMORY_MAX_NREGIONS
   slots with skip-zero checks.

2. dev_invalidate_vrings / translate_ring_addresses *pdev propagation
   -- use-after-free if numa_realloc fires.

3. Missing ctx->fds[0] = -1 in add_mem_reg -- double-close on
   error paths.

4. Overlap check using mmap_size instead of size -- false positive
   overlaps.

5. Error path in add_mem_reg destroys all regions instead of just
   the failed one.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v7 0/5] Support add/remove memory region & get-max-slots
  2026-03-29 19:46 ` [PATCH v7 0/5] Support add/remove memory region & get-max-slots Stephen Hemminger
@ 2026-04-02  7:44   ` Bathija, Pravin
  0 siblings, 0 replies; 10+ messages in thread
From: Bathija, Pravin @ 2026-04-02  7:44 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev@dpdk.org, maxime.coquelin@redhat.com, fengchengwen@huawei.com,
	Thomas Monjalon

Dear Stephen,

Thank you for reviewing the code and providing feedback. Responses are inline.


Internal Use - Confidential
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Sunday, March 29, 2026 12:46 PM
> To: Bathija, Pravin <Pravin.Bathija@dell.com>
> Cc: dev@dpdk.org; maxime.coquelin@redhat.com;
> fengchengwen@huawei.com
> Subject: Re: [PATCH v7 0/5] Support add/remove memory region & get-max-
> slots
>
>
> [EXTERNAL EMAIL]
>
> On Wed, 11 Feb 2026 10:24:28 +0000
> <pravin.bathija@dell.com> wrote:
>
> > From: Pravin M Bathija <pravin.bathija@dell.com>
> >
> > This is version v7 of the patchset and it incorporates the
> > recommendations made by Maxime Coquelin and Feng Cheng Wen.
> > The patchset includes support for adding and removal of memory
> > regions, getting max memory slots and other changes to vhost-user
> > messages.  These messages are sent from vhost-user front-end (qemu or
> > libblkio) to a vhost-user back-end (dpdk, spdk). Support functions for
> > these message functions have been implemented in the interest of
> > writing optimized code. Older functions, part of vhost-user back-end
> > have also been optimized using these newly defined support functions.
> > This implementation has been extensively tested by doing Read/Write
> > I/O from multiple instances of fio + libblkio (front-end) talking to
> > spdk/dpdk (back-end) based drives. Tested with qemu front-end talking
> > to dpdk + testpmd (back-end) performing add/removal of memory regions.
> > Also tested post-copy live migration after doing add_memory_region.
> >
> > Version Log:
> > Version v7 (Current version): Incorporate code review suggestions from
> > Maxime Coquelin. Add debug messages to vhost_postcopy_register
> > function.
> > Version v6: Added the enablement of this feature as a final patch in
> > this patch-set and other code optimizations as suggested by Maxime
> > Coquelin.
> > Version v5: removed the patch that increased the number of memory
> > regions from 8 to 128. This will be submitted as a separate feature at
> > a later point after incorporating additional optimizations. Also
> > includes code optimizations as suggested by Feng Cheng Wen.
> > Version v4: code optimizations as suggested by Feng Cheng Wen.
> > Version v3: code optimizations as suggested by Maxime Coquelin and
> > Thomas Monjalon.
> > Version v2: code optimizations as suggested by Maxime Coquelin.
> > Version v1: Initial patch set.
> >
> > Pravin M Bathija (5):
> >   vhost: add user to mailmap and define to vhost hdr
> >   vhost_user: header defines for add/rem mem region
> >   vhost_user: support function defines for back-end
> >   vhost_user: Function defs for add/rem mem regions
> >   vhost_user: enable configure memory slots
> >
> >  .mailmap               |   1 +
> >  lib/vhost/rte_vhost.h  |   4 +
> >  lib/vhost/vhost_user.c | 392
> > ++++++++++++++++++++++++++++++++++++-----
> >  lib/vhost/vhost_user.h |  10 ++
> >  4 files changed, 365 insertions(+), 42 deletions(-)
> >
>
> Ran this patch set through AI review (Gemini this time).
> It found lots of issues...
>
> ✦ I have performed a comprehensive review of the vhost_user patchset and
> identified several critical correctness issues that violate DPDK
>   guidelines.
>
>   Critical Correctness Bugs
>
>    1. Destructive Error Path in vhost_user_add_mem_reg:
>       In Patch 4/5, the error path for vhost_user_add_mem_reg (label
> free_mem_table) calls free_all_mem_regions(dev) and wipes dev->mem and
>   dev->guest_pages. This is catastrophic: if registration of a single new memory
> region fails, the entire existing memory configuration for the
>   running VM is destroyed, leading to immediate failure of all I/O. It should only
> clean up the specifically failed region.

Fixed. Replaced the free_mem_table label with free_new_region, which only cleans up the single failed region: DMA unmaps the
region, removes its guest page entries, calls free_mem_region on the single region, and decrements nregions. Existing regions
are untouched.

>
>    2. Broken Address Mapping in async_dma_map_region:
>       In Patch 3/5, async_dma_map_region assumes that if the first page of a
> region is found at index i in dev->guest_pages, all subsequent
>   pages for that region are at i+1, i+2, etc. This assumption is invalid because
> dev->guest_pages is a global array for the device and is
>   frequently sorted by address (guest_page_addrcmp). Pages from multiple
> regions will be interleaved, and the contiguous index assumption will
>   lead to mapping incorrect memory or walking off the end of the array.

Fixed. Rewrote async_dma_map_region to iterate all dev->guest_pages entries and select only those whose host_user_addr falls
within [reg->host_user_addr, reg->host_user_addr + reg->size). This eliminates the contiguous-index assumption entirely.
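
The selection logic looks roughly like this (simplified stand-in types, not the real DPDK structures):

```c
#include <stdint.h>
#include <stddef.h>

struct guest_page { uint64_t host_user_addr; uint64_t size; };
struct mem_region { uint64_t host_user_addr; uint64_t size; };

/* Select pages by host-VA range instead of assuming the pages of one
 * region sit at consecutive indices of the sorted, interleaved
 * dev->guest_pages array. */
static size_t
count_region_pages(const struct guest_page *pages, size_t nr_pages,
		   const struct mem_region *reg)
{
	size_t i, count = 0;

	for (i = 0; i < nr_pages; i++) {
		if (pages[i].host_user_addr >= reg->host_user_addr &&
		    pages[i].host_user_addr <
				reg->host_user_addr + reg->size)
			count++; /* DMA-map this page in the real code */
	}
	return count;
}
```

Because each region occupies one contiguous host-VA range, the range test is sufficient even when entries from several regions interleave in the array.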

>
>    3. Stale Guest Page Entries in vhost_user_rem_mem_reg:
>       In Patch 4/5, vhost_user_rem_mem_reg removes a region from dev->mem
> but fails to remove the corresponding entries from dev->guest_pages.
>   This leaves stale mappings in the translation cache, which will cause incorrect
> address translations, potential memory corruption, or crashes
>   when the guest reuses those physical addresses.

Fixed. Added a new remove_guest_pages() function that compacts the dev->guest_pages array by removing all entries whose
host_user_addr falls within the removed region's range. rem_mem_reg now calls remove_guest_pages() before free_mem_region().

>
>    4. Vhost-User Protocol Violation:
>       In Patch 2/5, VhostUserSingleMemReg is defined with a uint64_t padding
> before the VhostUserMemoryRegion. The vhost-user specification for
>   VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG expects
> the payload to be exactly VhostUserMemoryRegion. This extra padding will
> cause the
>   back-end to misalign all fields, leading to incorrect address and size
> interpretations.

This is how I understand the code. The uint64_t padding (8 bytes) corresponds to the same space occupied by nregions (uint32_t) + padding
(uint32_t) in VhostUserMemory. This matches QEMU's wire format (VhostUserMemRegMsg in hw/virtio/vhost-user.c), which also has
8 bytes of padding before the region descriptor. The padding ensures the VhostUserMemoryRegion data starts at byte offset 8 in
the payload, consistent with the SET_MEM_TABLE layout. Hence I don't see a protocol violation, and no change was made.
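
A layout sketch of the two payloads (simplified mirrors of the structures under discussion, keeping only the fields relevant to the offsets):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint64_t guest_phys_addr;
	uint64_t memory_size;
	uint64_t userspace_addr;
	uint64_t mmap_offset;
} VhostUserMemoryRegion;

#define VHOST_MEMORY_MAX_NREGIONS 8

/* SET_MEM_TABLE payload: region descriptors start at byte offset 8. */
typedef struct {
	uint32_t nregions;
	uint32_t padding;
	VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
} VhostUserMemory;

/* ADD/REM_MEM_REG payload: the uint64_t padding occupies the same
 * 8 bytes as nregions + padding above, so the region descriptor
 * again starts at byte offset 8. */
typedef struct {
	uint64_t padding;
	VhostUserMemoryRegion region;
} VhostUserSingleMemReg;
```

The offsetof checks below confirm both payloads place the region descriptor at the same offset, which is the point of the padding.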

>
>    5. Unsafe Assumption in async_dma_map_region Error Handling:
>       In Patch 3/5, async_dma_map_region logs DMA mapping/unmapping
> failures but continues execution. For memory registration, a DMA mapping
>   failure must be treated as a fatal error for that operation to ensure the DMA
> engine never attempts to access unmapped memory.
>

Fixed. async_dma_map_region now returns int (-1 on failure, 0 on success) instead of void. The caller in
vhost_user_add_mem_reg checks the return value and jumps to free_new_region on failure.

>   Style and Process Observations
>
>    * Low Memory Slot Limit: The implementation returns
> VHOST_MEMORY_MAX_NREGIONS (8) for
> VHOST_USER_GET_MAX_MEM_SLOTS. While technically
>      correct per the current DPDK defines, 8 slots is extremely restrictive for
> dynamic memory hotplug, which this feature is intended to
>      support.
>    * Missing Locking Assertions: In dev_invalidate_vrings,
> VHOST_USER_ASSERT_LOCK is called with VHOST_USER_ADD_MEM_REG. This
> macro typically
>      expects a string description of the operation for logging, rather than an
> enum value.

The macro uses C token-pasting (id ## _LOCK_ALL_QPS) which resolves to VHOST_USER_ADD_MEM_REG_LOCK_ALL_QPS — a compile-time
boolean constant generated by the VHOST_MESSAGE_HANDLER registration. It is not string-based. This is the same pattern used
throughout the file (e.g. VHOST_USER_SET_FEATURES, VHOST_USER_SET_MEM_TABLE, VHOST_USER_SET_VRING_ADDR). No change needed.
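
A hypothetical miniature of that pattern (not the real macro, which takes more arguments; this only shows the token-pasting mechanism):

```c
#include <assert.h>

/* Each handler registration emits a per-message constant; the assert
 * macro reaches it by pasting the message id, not by formatting a
 * string description. */
#define VHOST_MESSAGE_HANDLER(id, lock_all_qps) \
	static const int id ## _LOCK_ALL_QPS = lock_all_qps;

VHOST_MESSAGE_HANDLER(VHOST_USER_ADD_MEM_REG, 1)
VHOST_MESSAGE_HANDLER(VHOST_USER_GET_FEATURES, 0)

#define VHOST_USER_ASSERT_LOCK(id) \
	do { \
		if (id ## _LOCK_ALL_QPS) { \
			/* verify all queue-pair locks are held */ \
		} \
	} while (0)
```

Passing the enum-like identifier is therefore exactly what the macro expects: `id ## _LOCK_ALL_QPS` resolves at compile time to the registered constant.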

>
>   I recommend a significant redesign of the memory management logic to
> properly handle sparse region arrays and incremental guest page updates
>   before this patchset can be accepted.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v7 0/5] Support add/remove memory region & get-max-slots
  2026-03-31  3:26 ` Stephen Hemminger
@ 2026-04-02  8:07   ` Bathija, Pravin
  0 siblings, 0 replies; 10+ messages in thread
From: Bathija, Pravin @ 2026-04-02  8:07 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev@dpdk.org, maxime.coquelin@redhat.com, fengchengwen@huawei.com

Dear Stephen,

Thank you again for taking the time to review my patch-set. I have incorporated your suggestions and submitted patch-set v8.
The responses to your comments are inline.


> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday, March 30, 2026 8:26 PM
> To: Bathija, Pravin <Pravin.Bathija@dell.com>
> Cc: dev@dpdk.org; maxime.coquelin@redhat.com;
> fengchengwen@huawei.com
> Subject: Re: [PATCH v7 0/5] Support add/remove memory region & get-max-
> slots
>
>
> [EXTERNAL EMAIL]
>
> On Wed, 11 Feb 2026 10:24:28 +0000
> <pravin.bathija@dell.com> wrote:
>
> > From: Pravin M Bathija <pravin.bathija@dell.com>
> >
> > This is version v7 of the patchset and it incorporates the
> > recommendations made by Maxime Coquelin and Feng Cheng Wen.
> > The patchset includes support for adding and removal of memory
> > regions, getting max memory slots and other changes to vhost-user
> > messages.  These messages are sent from vhost-user front-end (qemu or
> > libblkio) to a vhost-user back-end (dpdk, spdk). Support functions for
> > these message functions have been implemented in the interest of
> > writing optimized code. Older functions, part of vhost-user back-end
> > have also been optimized using these newly defined support functions.
> > This implementation has been extensively tested by doing Read/Write
> > I/O from multiple instances of fio + libblkio (front-end) talking to
> > spdk/dpdk (back-end) based drives. Tested with qemu front-end talking
> > to dpdk + testpmd (back-end) performing add/removal of memory regions.
> > Also tested post-copy live migration after doing add_memory_region.
> >
> > Version Log:
> > Version v7 (Current version): Incorporate code review suggestions from
> > Maxime Coquelin. Add debug messages to vhost_postcopy_register
> > function.
> > Version v6: Added the enablement of this feature as a final patch in
> > this patch-set and other code optimizations as suggested by Maxime
> > Coquelin.
> > Version v5: removed the patch that increased the number of memory
> > regions from 8 to 128. This will be submitted as a separate feature at
> > a later point after incorporating additional optimizations. Also
> > includes code optimizations as suggested by Feng Cheng Wen.
> > Version v4: code optimizations as suggested by Feng Cheng Wen.
> > Version v3: code optimizations as suggested by Maxime Coquelin and
> > Thomas Monjalon.
> > Version v2: code optimizations as suggested by Maxime Coquelin.
> > Version v1: Initial patch set.
> >
> > Pravin M Bathija (5):
> >   vhost: add user to mailmap and define to vhost hdr
> >   vhost_user: header defines for add/rem mem region
> >   vhost_user: support function defines for back-end
> >   vhost_user: Function defs for add/rem mem regions
> >   vhost_user: enable configure memory slots
> >
> >  .mailmap               |   1 +
> >  lib/vhost/rte_vhost.h  |   4 +
> >  lib/vhost/vhost_user.c | 392
> > ++++++++++++++++++++++++++++++++++++-----
> >  lib/vhost/vhost_user.h |  10 ++
> >  4 files changed, 365 insertions(+), 42 deletions(-)
> >
>
> Lots more stuff found by AI code review, this is not ready.
>
> Review of [PATCH v7 1-5/5] vhost: configure memory slots support
> Author: Pravin M Bathija <pravin.bathija@dell.com>
>
> This patch series adds support for the
> VHOST_USER_F_CONFIGURE_MEM_SLOTS protocol feature, enabling
> add/remove individual memory regions instead of requiring a full
> SET_MEM_TABLE each time.  The approach is sound and the feature is needed,
> but several correctness bugs need to be addressed.
>
> Patch 3/5 -- vhost_user: support function defines for back-end
> --------------------------------------------------------------------
>
> Error: dev_invalidate_vrings loses *pdev update after numa_realloc
>
>   translate_ring_addresses() calls numa_realloc(), which can reallocate
>   the dev struct and update *pdev.  In the original set_mem_table the
>   caller writes "*pdev = dev;" after the call to observe this.
>
>   dev_invalidate_vrings() takes a plain "struct virtio_net *dev" and
>   passes "&dev, &vq" to translate_ring_addresses -- but dev is a local
>   copy.  If numa_realloc reallocates, the caller's *pdev (in
>   vhost_user_add_mem_reg) is never updated, and all subsequent accesses
>   use a stale/freed pointer -- a use-after-free.
>
>   The function should take struct virtio_net **pdev (matching the
>   pattern used in set_mem_table) and propagate updates back.
>

Fixed. Changed dev_invalidate_vrings signature to take struct virtio_net **pdev. The function dereferences *pdev locally,
passes &dev to translate_ring_addresses, and writes *pdev = dev at the end to propagate any reallocation. Both callers
(add_mem_reg and rem_mem_reg) now pass pdev and update their local dev = *pdev afterward.
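
The shape of the fix, as a self-contained sketch (the numa_realloc simulation and field names are invented for illustration; only the double-pointer pattern matters):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct virtio_net { int numa_node; };

/* Stand-in for numa_realloc(): may move the device struct, as the
 * real function can when the preferred NUMA node changes. */
static void
numa_realloc_sim(struct virtio_net **pdev, int node)
{
	struct virtio_net *new_dev = malloc(sizeof(*new_dev));

	memcpy(new_dev, *pdev, sizeof(*new_dev));
	new_dev->numa_node = node;
	free(*pdev);
	*pdev = new_dev;
}

/* Fixed pattern: take **pdev, work on a local copy, and write the
 * (possibly reallocated) pointer back before returning so the caller
 * never keeps a stale pointer. */
static void
dev_invalidate_vrings(struct virtio_net **pdev)
{
	struct virtio_net *dev = *pdev;

	numa_realloc_sim(&dev, 1); /* may move dev */
	*pdev = dev;               /* propagate to the caller */
}
```

The callers then repeat the same step one level up (`dev = *pdev;` after the call, and `*pdev = dev;` before returning to the message-handler framework).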

> Error: async_dma_map_region can underflow reg_size
>
>   The loop "reg_size -= page->size" uses unsigned subtraction. If
>   page->size does not evenly divide reg->size (which can happen with
>   misaligned guest pages), reg_size wraps to a huge value and the loop
>   runs out of bounds.  The original async_dma_map avoids this by
>   iterating over nr_guest_pages directly.
>
>   Suggest checking "reg_size >= page->size" before subtracting, or
>   better yet iterating like the original does.
>

Fixed. The rewrite of async_dma_map_region eliminates reg_size tracking entirely: it now iterates all dev->guest_pages entries
and selects only those whose host_user_addr falls within [reg->host_user_addr, reg->host_user_addr + reg->size). With no
running subtraction there is nothing that can underflow, and the contiguous-index assumption is gone as well.

> Warning: free_all_mem_regions iterates all
> VHOST_MEMORY_MAX_NREGIONS
>   but the original free_mem_region iterated only dev->mem->nregions
>
>   This is intentional for the sparse-slot design, but free_mem_region()
>   checks "reg->host_user_addr" while free_all_mem_regions checks
>   "reg->mmap_addr" as the sentinel.  These should be consistent.
>   A region that was partially initialized (mmap_addr set but
>   host_user_addr not yet set, or vice versa) would behave
>   differently depending on which sentinel is checked.
>

Fixed. Changed free_mem_region to check reg->mmap_addr instead of reg->host_user_addr, making both functions consistent.

> Warning: vhost_user_initialize_memory unconditionally allocates
>   dev->mem without freeing any prior allocation
>
>   If dev->mem is non-NULL when vhost_user_initialize_memory is called,
>   the old allocation is leaked.  set_mem_table frees dev->mem before
>   calling it, but add_mem_reg only checks "dev->mem == NULL" and skips
>   the call otherwise.  This is safe as coded, but the function itself
>   is fragile -- consider adding an assertion or early return if
>   dev->mem is already set.
>

Fixed. Added a guard at the top of vhost_user_initialize_memory: if dev->mem != NULL, log an error and return -1. This makes
the function safe against accidental double-initialization.

> Info: vhost_user_postcopy_register changes use reg_msg_index but the
>   second loop (postcopy_region_register) still iterates dev->mem->regions
>   with nregions as the bound and just skips zeros.  This is correct
>   but the two loops use different iteration strategies which is
>   confusing.  Consider making both iterate VHOST_MEMORY_MAX_NREGIONS
>   with a skip-zero check for consistency.

Fixed differently. vhost_user_add_mem_reg no longer calls vhost_user_postcopy_register() at all. That function reads
ctx->msg.payload.memory (the SET_MEM_TABLE union member), but ADD_MEM_REG uses the memory_single payload — reading the wrong
union member would yield garbage for nregions. Instead, add_mem_reg now calls vhost_user_postcopy_region_register(dev, reg)
directly for the single new region, bypassing the message-based iteration entirely.

>
>
> Patch 4/5 -- vhost_user: Function defs for add/rem mem regions
> --------------------------------------------------------------------
>
> Error: vhost_user_add_mem_reg does not set ctx->fds[0] = -1 after
>   assigning reg->fd = ctx->fds[0]
>
>   In the original set_mem_table, after "reg->fd = ctx->fds[i]" the
>   code sets "ctx->fds[i] = -1" to prevent the fd from being double-
>   closed if the error path calls close_msg_fds().  The new add_mem_reg
>   omits this.  If vhost_user_mmap_region or vhost_user_postcopy_register
>   fails, the error path calls close_msg_fds(ctx) which closes
>   ctx->fds[0] -- but reg->fd still holds the same value.  Later,
>   free_all_mem_regions will close(reg->fd) again -- double close.
>
>   Fix: add "ctx->fds[0] = -1;" after "reg->fd = ctx->fds[0];".
>

Fixed. Added ctx->fds[0] = -1; immediately after reg->fd = ctx->fds[0]; to prevent close_msg_fds from double-closing the fd on
error paths.

> Error: vhost_user_add_mem_reg error path frees all memory on
>   single-region failure
>
>   If postcopy_register fails after successfully mmap'ing one region
>   (when other regions already exist), the "free_mem_table" label
>   calls free_all_mem_regions + rte_free(dev->mem) + rte_free
>   (dev->guest_pages), destroying ALL previously registered regions.
>   This is disproportionate -- the error should only unmap the single
>   region that was just added and decrement nregions.
>

Fixed. Replaced the free_mem_table label with free_new_region, which only cleans up the single failed region: DMA unmaps the
region, removes its guest page entries, calls free_mem_region on the single region, and decrements nregions. Existing regions
are untouched.

> Error: overlap check uses mmap_size for existing region but
>   memory_size for proposed region
>
>   The overlap check compares:
>     current_region_guest_end = guest_user_addr + mmap_size - 1
>     proposed_region_guest_end = userspace_addr + memory_size - 1
>
>   mmap_size = memory_size + mmap_offset (set by
> vhost_user_mmap_region).
>   So the existing region's range is inflated by the mmap_offset,
>   potentially producing false overlaps.  Use current_region->size
>   (which corresponds to memory_size) instead of mmap_size for a
>   correct comparison.
>

Fixed. Changed the overlap check to use current_region->size (which corresponds to memory_size from the protocol) instead of
current_region->mmap_size, in both the range calculation and the error log message.

> Error: vhost_user_add_mem_reg uses guest_user_addr == 0 as "slot
>   is empty" but guest_user_addr 0 is a valid guest virtual address
>
>   The free-slot search does:
>     if (dev->mem->regions[i].guest_user_addr == 0)
>
>   Guest address 0 is valid (common in some VM configurations).
>   If a region is mapped at guest VA 0, it would appear as a free
>   slot.  The original code uses compact packing (nregions as the
>   count), avoiding this problem.  Consider using a dedicated
>   "in_use" flag or checking mmap_addr == NULL instead (mmap
>   never returns NULL on success, it returns MAP_FAILED).
>

Fixed. The regions array is now kept compact (contiguous from index 0 to nregions-1) via memmove compaction in rem_mem_reg.
New regions are always placed at regions[dev->mem->nregions] — no free-slot search is needed, eliminating the guest_user_addr
== 0 sentinel entirely.

> Warning: vhost_user_rem_mem_reg matches on guest_phys_addr but the
>   spec says "identified by guest address, user address and size"
>
>   The comment in the code says "identified by its guest address,
>   user address and size. The mmap offset is ignored." but the
>   comparison also checks guest_phys_addr (which is the GPA, not
>   the guest user address).  The spec says matching is by
>   userspace_addr (GVA), guest_phys_addr (GPA), and memory_size.
>   So the match is actually correct in practice, but the comment
>   is misleading -- it should mention GPA too, or remove the
>   comment about what the spec says.
>

Fixed. Updated the comment to: "The memory region to be removed is identified by its GPA, user address and size."

> Warning: nregions becomes inconsistent with actual slot usage
>
>   After add_mem_reg, nregions is incremented.  After rem_mem_reg,
>   nregions is decremented.  But regions are not packed -- removing
>   a region from the middle leaves a hole.  Functions like
>   qva_to_vva, hva_to_gpa, and rte_vhost_va_from_guest_pa all
>   iterate "i < mem->nregions" and index "mem->regions[i]"
>   sequentially.  With sparse slots these functions will miss
>   regions that are stored at indices >= nregions.
>
>   Example: 3 regions at slots 0,1,2 (nregions=3).  Remove slot 1
>   (memset to 0, nregions=2).  Now qva_to_vva iterates slots 0,1
>   only -- slot 2 (still valid) is never checked.  Address
>   translations for memory in the third region will fail.
>
>   This is a correctness bug that will cause packet processing
>   failures.  Either pack the array on removal (shift later entries
>   down) or change all iteration to use VHOST_MEMORY_MAX_NREGIONS
>   with skip-zero checks.
>

Fixed. rem_mem_reg now compacts the regions array after removal using memmove to shift later entries down, then zeroes the
trailing slot. This keeps the array contiguous from index 0 to nregions-1, so all existing iteration using i < nregions
continues to work correctly.

> Warning: dev_invalidate_vrings does not update *pdev in add_mem_reg
>
>   Related to the Error in patch 3 -- even if dev_invalidate_vrings
>   is fixed to take **pdev, the caller vhost_user_add_mem_reg must
>   also write "*pdev = dev;" to propagate the possibly-reallocated
>   pointer back through the message handler framework.
>

Fixed. Changed dev_invalidate_vrings signature to take struct virtio_net **pdev. The function dereferences *pdev locally,
passes &dev to translate_ring_addresses, and writes *pdev = dev at the end to propagate any reallocation. Both callers
(add_mem_reg and rem_mem_reg) now pass pdev and update their local dev = *pdev afterward.

>
> Patch 5/5 -- vhost_user: enable configure memory slots
> --------------------------------------------------------------------
>
> Warning: the feature is enabled unconditionally but several
>   address translation functions (qva_to_vva, hva_to_gpa,
>   rte_vhost_va_from_guest_pa) have not been updated to handle
>   sparse region arrays.  Enabling the feature flag before fixing
>   the nregions iteration issue means any front-end that uses
>   ADD_MEM_REG/REM_MEM_REG will hit broken address translations.
>

Resolved by the compaction fix in patch 4 (comment #18). Since the regions array is always kept contiguous, these functions
work correctly without modification. No changes needed in patch 5.

>
> Summary
> --------------------------------------------------------------------
>
> The most critical issues to fix before this can be merged:
>
> 1. nregions vs sparse-slot iteration mismatch -- this will cause
>    address translation failures at runtime after any REM_MEM_REG.
>    All loops that iterate mem->regions must either use compact
>    packing or be updated to scan all VHOST_MEMORY_MAX_NREGIONS
>    slots with skip-zero checks.
>
> 2. dev_invalidate_vrings / translate_ring_addresses *pdev propagation
>    -- use-after-free if numa_realloc fires.
>
> 3. Missing ctx->fds[0] = -1 in add_mem_reg -- double-close on
>    error paths.
>
> 4. Overlap check using mmap_size instead of size -- false positive
>    overlaps.
>
> 5. Error path in add_mem_reg destroys all regions instead of just
>    the failed one.

Additional fix found during testing: VHOST_USER_REM_MEM_REG was registered with accepts_fd = false.

The vhost-user protocol sends an FD with REM_MEM_REG messages. The handler was registered with accepts_fd = false, causing
validate_msg_fds to reject every REM_MEM_REG message with "expect 0 FDs, received 1". Fixed by changing the registration to
accepts_fd = true and adding close_msg_fds(ctx) calls in all exit paths of rem_mem_reg to properly close the received FD.





Thread overview: 10+ messages
2026-02-11 10:24 [PATCH v7 0/5] Support add/remove memory region & get-max-slots pravin.bathija
2026-02-11 10:24 ` [PATCH v7 1/5] vhost: add user to mailmap and define to vhost hdr pravin.bathija
2026-02-11 10:24 ` [PATCH v7 2/5] vhost_user: header defines for add/rem mem region pravin.bathija
2026-02-11 10:24 ` [PATCH v7 3/5] vhost_user: support function defines for back-end pravin.bathija
2026-02-11 10:24 ` [PATCH v7 4/5] vhost_user: Function defs for add/rem mem regions pravin.bathija
2026-02-11 10:24 ` [PATCH v7 5/5] vhost_user: enable configure memory slots pravin.bathija
2026-03-29 19:46 ` [PATCH v7 0/5] Support add/remove memory region & get-max-slots Stephen Hemminger
2026-04-02  7:44   ` Bathija, Pravin
2026-03-31  3:26 ` Stephen Hemminger
2026-04-02  8:07   ` Bathija, Pravin
