From: Stephen Hemminger <stephen@networkplumber.org>
To: <pravin.bathija@dell.com>
Cc: <dev@dpdk.org>, <fengchengwen@huawei.com>,
<maxime.coquelin@redhat.com>, <thomas@monjalon.net>
Subject: Re: [PATCH v13 0/5] Support add/remove memory region and get-max-slots
Date: Mon, 18 May 2026 10:13:44 -0700 [thread overview]
Message-ID: <20260518101344.7a1de0ef@phoenix.local> (raw)
In-Reply-To: <20260514020157.1937404-1-pravin.bathija@dell.com>
On Thu, 14 May 2026 02:01:52 +0000
<pravin.bathija@dell.com> wrote:
> From: Pravin M Bathija <pravin.bathija@dell.com>
>
> This is version v13 of the patchset and it incorporates the
> recommendations made by Fengcheng Wen.
>
> Changes made to patch 3/5 and 4/5
> * Relocated function remove_guest_pages from patch 3/5 to 4/5.
> * Renamed VhostUserSingleMemReg to VhostUserMemRegMsg and memory_single
> to memreg.
>
> This implementation has been extensively tested by doing Read/Write I/O
> from multiple instances of fio + libblkio (front-end) talking to
> spdk/dpdk (back-end) based drives. Tested with qemu front-end talking to
> dpdk testpmd (back-end) performing add/removal of memory regions. Also
> tested post-copy live migration after doing add_memory_region.
>
> Version Log:
> Version v13 (Current version): Incorporate code review suggestions from
> Fengcheng Wen as described above.
> Version v12: Incorporate code review suggestions from Maxime Coquelin
> and ai-code-review.
> Changes made to patch 3/5
> Refactored async_dma_map() to delegate to async_dma_map_region(),
> eliminating code duplication between the two functions.
> Restored original comments in async_dma_map_region() explaining why
> ENODEV and EINVAL errors are ignored (these were stripped in v10)
> Reverted unnecessary changes to vhost_user_postcopy_register() --
> removed the host_user_addr == 0 checks and reg_msg_index indirection
> that were added in v10, since this function is only called from
> vhost_user_set_mem_table() where regions are always contiguous.
>
> Version v11: Incorporate code review suggestions from Stephen Hemminger.
> Change made to patch 4/5
> Fix incomplete cleanup in vhost_user_add_mem_reg() when
> vhost_user_mmap_region() fails after the mmap succeeds (e.g.
> add_guest_pages() realloc failure) realloc failure). The error path now
> calls remove_guest_pages() and free_mem_region() to undo the mapping
> and stale guest-page entries, preventing a leaked mmap and slot reuse
> corruption. The plain close(fd) path is kept for pre-mmap failures.
>
> Version v10: Incorporate code review suggestions from Stephen Hemminger.
> Change made to patch 4/5
> Moved dev_invalidate_vrings after free_mem_region, array compaction, and
> nregions decrement. This ensures translate_ring_addresses only sees
> surviving memory regions, preventing vring pointers from resolving into
> a region that is about to be unmapped.
>
> Version v9: Incorporate code review suggestions from Stephen Hemminger.
> Changes made to patch 3/5
> Restored max_guest_pages initial value to hardcoded 8 instead of
> VHOST_MEMORY_MAX_NREGIONS, matching upstream semantics.
> Changes made to patch 4/5
> Added close(reg->fd) and reg->fd = -1 before goto close_msg_fds in the
> mmap failure path to fix fd leak after fd was moved from ctx->fds[0].
> Converted dev_invalidate_vrings from a plain function to a macro +
> implementation function pair, accepting message ID as a parameter so
> the static_assert reports the correct handler at each call site.
> Updated dev_invalidate_vrings call in add_mem_reg to pass
> VHOST_USER_ADD_MEM_REG as message ID.
> Updated dev_invalidate_vrings call in rem_mem_reg to pass
> VHOST_USER_REM_MEM_REG as message ID.
>
> Version v8: Incorporate code review suggestions from Stephen Hemminger.
> rewrite async_dma_map_region function to iterate guest pages by host
> address range matching
> change function dev_invalidate_vrings to accept a double pointer to
> propagate pointer updates
> new function remove_guest_pages was added
> add_mem_reg error path was narrowed to only clean up the single failed
> region instead of destroting all existing regions
>
> Version v7: Incorporate code review suggestions from Maxime Coquelin.
> Add debug messages to vhost_postcopy_register function.
>
> Version v6: Added the enablement of this feature as a final patch in
> this patch-set and other code optimizations as suggested by Maxime
> Coquelin.
>
> Version v5: removed the patch that increased the number of memory regions
> from 8 to 128. This will be submitted as a separate feature at a later
> point after incorporating additional optimizations. Also includes code
> optimizations as suggested by Feng Cheng Wen.
>
> Version v4: code optimizations as suggested by Feng Cheng Wen.
>
> Version v3: code optimizations as suggested by Maxime Coquelin
> and Thomas Monjalon.
>
> Version v2: code optimizations as suggested by Maxime Coquelin.
>
> Version v1: Initial patch set.
>
> Pravin M Bathija (5):
> vhost: add user to mailmap and define to vhost hdr
> vhost_user: header defines for add/rem mem region
> vhost_user: support function defines for back-end
> vhost_user: Function defs for add/rem mem regions
> vhost_user: enable configure memory slots
>
> .mailmap | 1 +
> lib/vhost/rte_vhost.h | 4 +
> lib/vhost/vhost_user.c | 418 +++++++++++++++++++++++++++++++++++------
> lib/vhost/vhost_user.h | 10 +
> 4 files changed, 371 insertions(+), 62 deletions(-)
>
Some useful AI feedback
Review of [PATCH v13 0-5/5] vhost: configure memory slots support
Author: Pravin M Bathija <pravin.bathija@dell.com>
This revision addresses essentially every correctness issue raised in
the v7-v12 reviews:
- ctx->fds[0] = -1 ownership transfer is now done before mmap, and
the mmap-failure path closes reg->fd explicitly when mmap never
set reg->mmap_addr.
- _dev_invalidate_vrings now takes struct virtio_net **pdev and
writes back *pdev = dev at the end, so a numa_realloc inside
translate_ring_addresses propagates correctly. Both call sites
refresh "dev = *pdev;" afterwards.
- The dev_invalidate_vrings() macro now takes the message id and
uses static_assert(id ## _LOCK_ALL_QPS, ...), matching the
existing VHOST_USER_ASSERT_LOCK pattern. Works for both
VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG call sites.
- Overlap check in vhost_user_add_mem_reg uses guest address
space (guest_user_addr, size / userspace_addr, memory_size),
no longer mmap_size.
- free_new_region undoes only the failed region: async DMA unmap,
remove_guest_pages, free_mem_region(reg), nregions--.
- async_dma_map_region iterates dev->nr_guest_pages and filters
by [reg_start, reg_end), eliminating the prior reg_size
underflow loop.
- The regions array is kept contiguous via memmove on REM_MEM_REG,
so existing iterators that walk mem->nregions remain correct.
- max_guest_pages is back to 8 in vhost_user_initialize_memory.
One protocol-level issue remains worth raising.
Patch 4/5 -- vhost_user: Function defs for add/rem mem regions
--------------------------------------------------------------------
Warning: ADD_MEM_REG does not send the host_user_addr reply
Per the vhost-user spec for VHOST_USER_ADD_MEM_REG, the back-end
is expected to reply with the same message format and the
userspace_addr field replaced by the host userspace address that
the region was mapped into. The handler returns
RTE_VHOST_MSG_RESULT_OK with no reply constructed, so the
dispatcher does not call send_vhost_reply().
For postcopy migration this matters in particular: the original
vhost_user_postcopy_register() does two things -- exchange the
host_user_addr with the front-end and wait for an ack, then
register the regions with userfaultfd. The patch only does the
userfaultfd registration via vhost_user_postcopy_region_register().
The in-code comment notes the payload-layout mismatch with
vhost_user_postcopy_register() but stops there.
Without the address reply, QEMU will not know the back-end's
mapping for regions added via ADD_MEM_REG, so the userfaultfd
handling on the QEMU side cannot resolve faults in those
regions. Postcopy migration combined with the
CONFIGURE_MEM_SLOTS feature will not work.
Suggested fix: construct a memreg-payload reply with
region->userspace_addr replaced by reg->host_user_addr and
return RTE_VHOST_MSG_RESULT_REPLY. At minimum, refuse
ADD_MEM_REG when dev->postcopy_listening is set, so that the
combination fails cleanly rather than silently mis-mapping.
Info: vhost_user_rem_mem_reg does not validate ctx->fd_num
The handler is registered with accepts_fd = true and does not
call validate_msg_fds(). The trailing close_msg_fds(ctx) cleans
up whatever fds were passed, so this is not a leak, but a
malformed message with an unexpected fd count is silently
accepted. The other accepts_fd handlers in this file validate
fd_num explicitly.
Info: vhost_user_get_max_mem_slots cast is unnecessary
ctx->msg.payload.u64 = (uint64_t)max_mem_slots;
max_mem_slots is uint32_t and the assignment widens
automatically; the cast can be dropped. Minor.
Reviewed-by would be appropriate once the postcopy reply is
addressed (or the combination is rejected). The rest of the
series looks correct.
next prev parent reply other threads:[~2026-05-18 17:13 UTC|newest]
Thread overview: 91+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-14 2:01 [PATCH v13 0/5] Support add/remove memory region and get-max-slots pravin.bathija
2026-05-14 2:01 ` [PATCH v13 1/5] vhost: add user to mailmap and define to vhost hdr pravin.bathija
2026-05-14 2:01 ` [PATCH v13 2/5] vhost_user: header defines for add/rem mem region pravin.bathija
2026-05-14 2:01 ` [PATCH v13 3/5] vhost_user: support function defines for back-end pravin.bathija
2026-05-14 2:01 ` [PATCH v13 4/5] vhost_user: Function defs for add/rem mem regions pravin.bathija
2026-05-14 2:01 ` [PATCH v13 5/5] vhost_user: enable configure memory slots pravin.bathija
2026-05-16 2:55 ` [PATCH v14 00/11] net/sxe2: fix logic errors and address feedback liujie5
2026-05-16 2:55 ` [PATCH v14 01/11] mailmap: add Jie Liu liujie5
2026-05-16 2:55 ` [PATCH v14 02/11] doc: add sxe2 guide and release notes liujie5
2026-05-16 2:55 ` [PATCH v14 03/11] common/sxe2: add sxe2 basic structures liujie5
2026-05-16 2:55 ` [PATCH v14 04/11] drivers: add base driver skeleton liujie5
2026-05-16 2:55 ` [PATCH v14 05/11] drivers: add base driver probe skeleton liujie5
2026-05-16 2:55 ` [PATCH v14 06/11] drivers: support PCI BAR mapping liujie5
2026-05-16 2:55 ` [PATCH v14 07/11] common/sxe2: add ioctl interface for DMA map and unmap liujie5
2026-05-16 2:55 ` [PATCH v14 08/11] net/sxe2: support queue setup and control liujie5
2026-05-16 2:55 ` [PATCH v14 09/11] drivers: add data path for Rx and Tx liujie5
2026-05-16 2:55 ` [PATCH v14 10/11] net/sxe2: add vectorized " liujie5
2026-05-16 2:55 ` [PATCH v14 11/11] net/sxe2: implement Tx done cleanup liujie5
2026-05-16 7:46 ` [PATCH v15 00/11] net/sxe2: fix logic errors and address feedback liujie5
2026-05-16 7:46 ` [PATCH v15 01/11] mailmap: add Jie Liu liujie5
2026-05-16 7:46 ` [PATCH v15 02/11] doc: add sxe2 guide and release notes liujie5
2026-05-16 7:46 ` [PATCH v15 03/11] common/sxe2: add sxe2 basic structures liujie5
2026-05-16 7:46 ` [PATCH v15 04/11] drivers: add base driver skeleton liujie5
2026-05-16 7:46 ` [PATCH v15 05/11] drivers: add base driver probe skeleton liujie5
2026-05-16 7:46 ` [PATCH v15 06/11] drivers: support PCI BAR mapping liujie5
2026-05-16 7:46 ` [PATCH v15 07/11] common/sxe2: add ioctl interface for DMA map and unmap liujie5
2026-05-16 7:46 ` [PATCH v15 08/11] net/sxe2: support queue setup and control liujie5
2026-05-16 7:46 ` [PATCH v15 09/11] drivers: add data path for Rx and Tx liujie5
2026-05-16 7:46 ` [PATCH v15 10/11] net/sxe2: add vectorized " liujie5
2026-05-16 7:46 ` [PATCH v15 11/11] net/sxe2: implement Tx done cleanup liujie5
2026-05-18 9:13 ` [PATCH v16 00/11] net/sxe2: fix logic errors and address feedback liujie5
2026-05-18 9:13 ` [PATCH v16 01/11] mailmap: add Jie Liu liujie5
2026-05-18 9:13 ` [PATCH v16 02/11] doc: add sxe2 guide and release notes liujie5
2026-05-18 9:13 ` [PATCH v16 03/11] common/sxe2: add sxe2 basic structures liujie5
2026-05-18 9:13 ` [PATCH v16 04/11] drivers: add base driver skeleton liujie5
2026-05-18 9:13 ` [PATCH v16 05/11] drivers: add base driver probe skeleton liujie5
2026-05-18 9:14 ` [PATCH v16 06/11] drivers: support PCI BAR mapping liujie5
2026-05-18 9:14 ` [PATCH v16 07/11] common/sxe2: add ioctl interface for DMA map and unmap liujie5
2026-05-18 9:14 ` [PATCH v16 08/11] net/sxe2: support queue setup and control liujie5
2026-05-18 9:14 ` [PATCH v16 09/11] drivers: add data path for Rx and Tx liujie5
2026-05-18 9:14 ` [PATCH v16 10/11] net/sxe2: add vectorized " liujie5
2026-05-18 9:14 ` [PATCH v16 11/11] net/sxe2: implement Tx done cleanup liujie5
2026-05-19 3:01 ` [PATCH v17 00/11] net/sxe2: fix logic errors and address feedback liujie5
2026-05-19 3:01 ` [PATCH v17 01/11] mailmap: add Jie Liu liujie5
2026-05-19 3:01 ` [PATCH v17 02/11] doc: add sxe2 guide and release notes liujie5
2026-05-19 3:01 ` [PATCH v17 03/11] common/sxe2: add sxe2 basic structures liujie5
2026-05-19 3:01 ` [PATCH v17 04/11] drivers: add base driver skeleton liujie5
2026-05-19 3:01 ` [PATCH v17 05/11] drivers: add base driver probe skeleton liujie5
2026-05-19 3:01 ` [PATCH v17 06/11] drivers: support PCI BAR mapping liujie5
2026-05-19 3:01 ` [PATCH v17 07/11] common/sxe2: add ioctl interface for DMA map and unmap liujie5
2026-05-19 17:41 ` Stephen Hemminger
2026-05-19 3:01 ` [PATCH v17 08/11] net/sxe2: support queue setup and control liujie5
2026-05-19 3:01 ` [PATCH v17 09/11] drivers: add data path for Rx and Tx liujie5
2026-05-19 3:01 ` [PATCH v17 10/11] net/sxe2: add vectorized " liujie5
2026-05-19 3:01 ` [PATCH v17 11/11] net/sxe2: implement Tx done cleanup liujie5
2026-05-19 14:47 ` [PATCH v18 00/11] net/sxe2: fix logic errors and address feedback liujie5
2026-05-19 14:48 ` [PATCH v18 01/11] mailmap: add Jie Liu liujie5
2026-05-19 14:48 ` [PATCH v18 02/11] doc: add sxe2 guide and release notes liujie5
2026-05-19 14:48 ` [PATCH v18 03/11] common/sxe2: add sxe2 basic structures liujie5
2026-05-19 14:48 ` [PATCH v18 04/11] drivers: add base driver skeleton liujie5
2026-05-19 14:48 ` [PATCH v18 05/11] drivers: add base driver probe skeleton liujie5
2026-05-19 14:48 ` [PATCH v18 06/11] drivers: support PCI BAR mapping liujie5
2026-05-19 14:48 ` [PATCH v18 07/11] common/sxe2: add ioctl interface for DMA map and unmap liujie5
2026-05-19 14:48 ` [PATCH v18 08/11] net/sxe2: support queue setup and control liujie5
2026-05-19 14:48 ` [PATCH v18 09/11] drivers: add data path for Rx and Tx liujie5
2026-05-19 14:48 ` [PATCH v18 10/11] net/sxe2: add vectorized " liujie5
2026-05-19 14:48 ` [PATCH v18 11/11] net/sxe2: implement Tx done cleanup liujie5
2026-05-20 2:17 ` [PATCH v19 00/11]net/sxe2: fix logic errors and address feedback liujie5
2026-05-20 2:17 ` [PATCH v19 01/11] mailmap: add Jie Liu liujie5
2026-05-20 2:18 ` [PATCH v19 02/11] doc: add sxe2 guide and release notes liujie5
2026-05-20 2:18 ` [PATCH v19 03/11] common/sxe2: add sxe2 basic structures liujie5
2026-05-20 2:18 ` [PATCH v19 04/11] drivers: add base driver skeleton liujie5
2026-05-20 2:18 ` [PATCH v19 05/11] drivers: add base driver probe skeleton liujie5
2026-05-20 2:18 ` [PATCH v19 06/11] drivers: support PCI BAR mapping liujie5
2026-05-20 2:18 ` [PATCH v19 07/11] common/sxe2: add ioctl interface for DMA map and unmap liujie5
2026-05-20 2:18 ` [PATCH v19 08/11] net/sxe2: support queue setup and control liujie5
2026-05-20 2:18 ` [PATCH v19 09/11] drivers: add data path for Rx and Tx liujie5
2026-05-20 2:18 ` [PATCH v19 10/11] net/sxe2: add vectorized " liujie5
2026-05-20 2:18 ` [PATCH v19 11/11] net/sxe2: implement Tx done cleanup liujie5
2026-05-20 15:38 ` [PATCH v19 00/11]net/sxe2: fix logic errors and address feedback Stephen Hemminger
2026-05-21 15:16 ` Thomas Monjalon
2026-05-26 14:13 ` Thomas Monjalon
2026-06-01 9:51 ` Thomas Monjalon
2026-06-01 10:14 ` liujie5
2026-05-20 0:32 ` [PATCH v18 00/11] net/sxe2: " Stephen Hemminger
2026-05-19 14:16 ` [PATCH v17 " Stephen Hemminger
2026-05-18 17:25 ` [PATCH v16 " Stephen Hemminger
2026-05-18 0:02 ` [PATCH v15 " Stephen Hemminger
2026-05-18 17:13 ` Stephen Hemminger [this message]
2026-05-20 2:36 ` [PATCH v13 0/5] Support add/remove memory region and get-max-slots Bathija, Pravin
-- strict thread matches above, loose matches on Subject: below --
2026-05-14 22:46 pravin.bathija
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260518101344.7a1de0ef@phoenix.local \
--to=stephen@networkplumber.org \
--cc=dev@dpdk.org \
--cc=fengchengwen@huawei.com \
--cc=maxime.coquelin@redhat.com \
--cc=pravin.bathija@dell.com \
--cc=thomas@monjalon.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.