DPDK-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: <pravin.bathija@dell.com>
Cc: <dev@dpdk.org>, <fengchengwen@huawei.com>,
	<maxime.coquelin@redhat.com>, <thomas@monjalon.net>
Subject: Re: [PATCH v13 0/5] Support add/remove memory region and get-max-slots
Date: Mon, 18 May 2026 10:13:44 -0700	[thread overview]
Message-ID: <20260518101344.7a1de0ef@phoenix.local> (raw)
In-Reply-To: <20260514020157.1937404-1-pravin.bathija@dell.com>

On Thu, 14 May 2026 02:01:52 +0000
<pravin.bathija@dell.com> wrote:

> From: Pravin M Bathija <pravin.bathija@dell.com>
> 
> This is version v13 of the patchset and it incorporates the
> recommendations made by Fengcheng Wen.
> 
> Changes made to patch 3/5 and 4/5
> * Relocated function remove_guest_pages from patch 3/5 to 4/5.
> * Renamed VhostUserSingleMemReg to VhostUserMemRegMsg and memory_single
>   to memreg.
> 
> This implementation has been extensively tested by doing Read/Write I/O
> from multiple instances of fio + libblkio (front-end) talking to
> spdk/dpdk (back-end) based drives. Tested with qemu front-end talking to
> dpdk testpmd (back-end) performing add/removal of memory regions. Also
> tested post-copy live migration after doing add_memory_region.
> 
> Version Log:
> Version v13 (Current version): Incorporate code review suggestions from
> Fengcheng Wen as described above.
> Version v12: Incorporate code review suggestions from Maxime Coquelin
> and ai-code-review.
> Changes made to patch 3/5
> Refactored async_dma_map() to delegate to async_dma_map_region(),
> eliminating code duplication between the two functions.
> Restored original comments in async_dma_map_region() explaining why
> ENODEV and EINVAL errors are ignored (these were stripped in v10)
> Reverted unnecessary changes to vhost_user_postcopy_register() --
> removed the host_user_addr == 0 checks and reg_msg_index indirection
> that were added in  v10, since this function is only called from
> vhost_user_set_mem_table() where regions are always contiguous.
> 
> Version v11: Incorporate code review suggestions from Stephen Hemminger.
> Change made to patch 4/5
> Fix incomplete cleanup in vhost_user_add_mem_reg() when
> vhost_user_mmap_region() fails after the mmap succeeds (e.g.
> add_guest_pages() realloc failure) realloc failure). The error path now
> calls remove_guest_pages() and free_mem_region() to undo the mapping
> and stale guest-page entries, preventing a leaked mmap and slot reuse
> corruption. The plain close(fd) path is kept for pre-mmap failures.
> 
> Version v10: Incorporate code review suggestions from Stephen Hemminger.
> Change made to patch 4/5
> Moved dev_invalidate_vrings after free_mem_region, array compaction, and
> nregions decrement. This ensures translate_ring_addresses only sees
> surviving memory regions, preventing vring pointers from resolving into
> a region that is about to be unmapped.
> 
> Version v9: Incorporate code review suggestions from Stephen Hemminger.
> Changes made to patch 3/5
> Restored max_guest_pages initial value to hardcoded 8 instead of
> VHOST_MEMORY_MAX_NREGIONS, matching upstream semantics.
> Changes made to patch 4/5
> Added close(reg->fd) and reg->fd = -1 before goto close_msg_fds in the
> mmap failure path to fix fd leak after fd was moved from ctx->fds[0].
> Converted dev_invalidate_vrings from a plain function to a macro +
> implementation function pair, accepting message ID as a parameter so
> the static_assert reports the correct handler at each call site.
> Updated dev_invalidate_vrings call in add_mem_reg to pass
> VHOST_USER_ADD_MEM_REG as message ID.
> Updated dev_invalidate_vrings call in rem_mem_reg to pass
> VHOST_USER_REM_MEM_REG as message ID.
> 
> Version v8:  Incorporate code review suggestions from Stephen Hemminger.
> rewrite async_dma_map_region function to iterate guest pages by host
> address range matching
> change function dev_invalidate_vrings to accept a double pointer to
> propagate pointer updates
> new function remove_guest_pages was added
> add_mem_reg error path was narrowed to only clean up the single failed
> region instead of destroting all existing regions
> 
> Version v7: Incorporate code review suggestions from Maxime Coquelin.
> Add debug messages to vhost_postcopy_register function.
> 
> Version v6: Added the enablement of this feature as a final patch in
> this patch-set and other code optimizations as suggested by Maxime
> Coquelin.
> 
> Version v5: removed the patch that increased the number of memory regions
> from 8 to 128. This will be submitted as a separate feature at a later
> point after incorporating additional optimizations. Also includes code
> optimizations as suggested by Feng Cheng Wen.
> 
> Version v4: code optimizations as suggested by Feng Cheng Wen.
> 
> Version v3: code optimizations as suggested by Maxime Coquelin
> and Thomas Monjalon.
> 
> Version v2: code optimizations as suggested by Maxime Coquelin.
> 
> Version v1: Initial patch set.
> 
> Pravin M Bathija (5):
>   vhost: add user to mailmap and define to vhost hdr
>   vhost_user: header defines for add/rem mem region
>   vhost_user: support function defines for back-end
>   vhost_user: Function defs for add/rem mem regions
>   vhost_user: enable configure memory slots
> 
>  .mailmap               |   1 +
>  lib/vhost/rte_vhost.h  |   4 +
>  lib/vhost/vhost_user.c | 418 +++++++++++++++++++++++++++++++++++------
>  lib/vhost/vhost_user.h |  10 +
>  4 files changed, 371 insertions(+), 62 deletions(-)
> 

Some useful AI feedback

Review of [PATCH v13 0-5/5] vhost: configure memory slots support
Author: Pravin M Bathija <pravin.bathija@dell.com>

This revision addresses essentially every correctness issue raised in
the v7-v12 reviews:

  - ctx->fds[0] = -1 ownership transfer is now done before mmap, and
    the mmap-failure path closes reg->fd explicitly when mmap never
    set reg->mmap_addr.
  - _dev_invalidate_vrings now takes struct virtio_net **pdev and
    writes back *pdev = dev at the end, so a numa_realloc inside
    translate_ring_addresses propagates correctly. Both call sites
    refresh "dev = *pdev;" afterwards.
  - The dev_invalidate_vrings() macro now takes the message id and
    uses static_assert(id ## _LOCK_ALL_QPS, ...), matching the
    existing VHOST_USER_ASSERT_LOCK pattern. Works for both
    VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG call sites.
  - Overlap check in vhost_user_add_mem_reg uses guest address
    space (guest_user_addr, size / userspace_addr, memory_size),
    no longer mmap_size.
  - free_new_region undoes only the failed region: async DMA unmap,
    remove_guest_pages, free_mem_region(reg), nregions--.
  - async_dma_map_region iterates dev->nr_guest_pages and filters
    by [reg_start, reg_end), eliminating the prior reg_size
    underflow loop.
  - The regions array is kept contiguous via memmove on REM_MEM_REG,
    so existing iterators that walk mem->nregions remain correct.
  - max_guest_pages is back to 8 in vhost_user_initialize_memory.

One protocol-level issue remains worth raising.


Patch 4/5 -- vhost_user: Function defs for add/rem mem regions
--------------------------------------------------------------------

Warning: ADD_MEM_REG does not send the host_user_addr reply

  Per the vhost-user spec for VHOST_USER_ADD_MEM_REG, the back-end
  is expected to reply with the same message format and the
  userspace_addr field replaced by the host userspace address that
  the region was mapped into. The handler returns
  RTE_VHOST_MSG_RESULT_OK with no reply constructed, so the
  dispatcher does not call send_vhost_reply().

  For postcopy migration this matters in particular: the original
  vhost_user_postcopy_register() does two things -- exchange the
  host_user_addr with the front-end and wait for an ack, then
  register the regions with userfaultfd. The patch only does the
  userfaultfd registration via vhost_user_postcopy_region_register().
  The in-code comment notes the payload-layout mismatch with
  vhost_user_postcopy_register() but stops there.

  Without the address reply, QEMU will not know the back-end's
  mapping for regions added via ADD_MEM_REG, so the userfaultfd
  handling on the QEMU side cannot resolve faults in those
  regions. Postcopy migration combined with the
  CONFIGURE_MEM_SLOTS feature will not work.

  Suggested fix: construct a memreg-payload reply with
  region->userspace_addr replaced by reg->host_user_addr and
  return RTE_VHOST_MSG_RESULT_REPLY. At minimum, refuse
  ADD_MEM_REG when dev->postcopy_listening is set, so that the
  combination fails cleanly rather than silently mis-mapping.


Info: vhost_user_rem_mem_reg does not validate ctx->fd_num

  The handler is registered with accepts_fd = true and does not
  call validate_msg_fds(). The trailing close_msg_fds(ctx) cleans
  up whatever fds were passed, so this is not a leak, but a
  malformed message with an unexpected fd count is silently
  accepted. The other accepts_fd handlers in this file validate
  fd_num explicitly.


Info: vhost_user_get_max_mem_slots cast is unnecessary

  ctx->msg.payload.u64 = (uint64_t)max_mem_slots;

  max_mem_slots is uint32_t and the assignment widens
  automatically; the cast can be dropped. Minor.


Reviewed-by would be appropriate once the postcopy reply is
addressed (or the combination is rejected). The rest of the
series looks correct.


  parent reply	other threads:[~2026-05-18 17:13 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-14  2:01 [PATCH v13 0/5] Support add/remove memory region and get-max-slots pravin.bathija
2026-05-14  2:01 ` [PATCH v13 1/5] vhost: add user to mailmap and define to vhost hdr pravin.bathija
2026-05-14  2:01 ` [PATCH v13 2/5] vhost_user: header defines for add/rem mem region pravin.bathija
2026-05-14  2:01 ` [PATCH v13 3/5] vhost_user: support function defines for back-end pravin.bathija
2026-05-14  2:01 ` [PATCH v13 4/5] vhost_user: Function defs for add/rem mem regions pravin.bathija
2026-05-14  2:01 ` [PATCH v13 5/5] vhost_user: enable configure memory slots pravin.bathija
2026-05-16  2:55   ` [PATCH v14 00/11] net/sxe2: fix logic errors and address feedback liujie5
2026-05-16  2:55     ` [PATCH v14 01/11] mailmap: add Jie Liu liujie5
2026-05-16  2:55     ` [PATCH v14 02/11] doc: add sxe2 guide and release notes liujie5
2026-05-16  2:55     ` [PATCH v14 03/11] common/sxe2: add sxe2 basic structures liujie5
2026-05-16  2:55     ` [PATCH v14 04/11] drivers: add base driver skeleton liujie5
2026-05-16  2:55     ` [PATCH v14 05/11] drivers: add base driver probe skeleton liujie5
2026-05-16  2:55     ` [PATCH v14 06/11] drivers: support PCI BAR mapping liujie5
2026-05-16  2:55     ` [PATCH v14 07/11] common/sxe2: add ioctl interface for DMA map and unmap liujie5
2026-05-16  2:55     ` [PATCH v14 08/11] net/sxe2: support queue setup and control liujie5
2026-05-16  2:55     ` [PATCH v14 09/11] drivers: add data path for Rx and Tx liujie5
2026-05-16  2:55     ` [PATCH v14 10/11] net/sxe2: add vectorized " liujie5
2026-05-16  2:55     ` [PATCH v14 11/11] net/sxe2: implement Tx done cleanup liujie5
2026-05-16  7:46       ` [PATCH v15 00/11] net/sxe2: fix logic errors and address feedback liujie5
2026-05-16  7:46         ` [PATCH v15 01/11] mailmap: add Jie Liu liujie5
2026-05-16  7:46         ` [PATCH v15 02/11] doc: add sxe2 guide and release notes liujie5
2026-05-16  7:46         ` [PATCH v15 03/11] common/sxe2: add sxe2 basic structures liujie5
2026-05-16  7:46         ` [PATCH v15 04/11] drivers: add base driver skeleton liujie5
2026-05-16  7:46         ` [PATCH v15 05/11] drivers: add base driver probe skeleton liujie5
2026-05-16  7:46         ` [PATCH v15 06/11] drivers: support PCI BAR mapping liujie5
2026-05-16  7:46         ` [PATCH v15 07/11] common/sxe2: add ioctl interface for DMA map and unmap liujie5
2026-05-16  7:46         ` [PATCH v15 08/11] net/sxe2: support queue setup and control liujie5
2026-05-16  7:46         ` [PATCH v15 09/11] drivers: add data path for Rx and Tx liujie5
2026-05-16  7:46         ` [PATCH v15 10/11] net/sxe2: add vectorized " liujie5
2026-05-16  7:46         ` [PATCH v15 11/11] net/sxe2: implement Tx done cleanup liujie5
2026-05-18  9:13           ` [PATCH v16 00/11] net/sxe2: fix logic errors and address feedback liujie5
2026-05-18  9:13             ` [PATCH v16 01/11] mailmap: add Jie Liu liujie5
2026-05-18  9:13             ` [PATCH v16 02/11] doc: add sxe2 guide and release notes liujie5
2026-05-18  9:13             ` [PATCH v16 03/11] common/sxe2: add sxe2 basic structures liujie5
2026-05-18  9:13             ` [PATCH v16 04/11] drivers: add base driver skeleton liujie5
2026-05-18  9:13             ` [PATCH v16 05/11] drivers: add base driver probe skeleton liujie5
2026-05-18  9:14             ` [PATCH v16 06/11] drivers: support PCI BAR mapping liujie5
2026-05-18  9:14             ` [PATCH v16 07/11] common/sxe2: add ioctl interface for DMA map and unmap liujie5
2026-05-18  9:14             ` [PATCH v16 08/11] net/sxe2: support queue setup and control liujie5
2026-05-18  9:14             ` [PATCH v16 09/11] drivers: add data path for Rx and Tx liujie5
2026-05-18  9:14             ` [PATCH v16 10/11] net/sxe2: add vectorized " liujie5
2026-05-18  9:14             ` [PATCH v16 11/11] net/sxe2: implement Tx done cleanup liujie5
2026-05-19  3:01               ` [PATCH v17 00/11] net/sxe2: fix logic errors and address feedback liujie5
2026-05-19  3:01                 ` [PATCH v17 01/11] mailmap: add Jie Liu liujie5
2026-05-19  3:01                 ` [PATCH v17 02/11] doc: add sxe2 guide and release notes liujie5
2026-05-19  3:01                 ` [PATCH v17 03/11] common/sxe2: add sxe2 basic structures liujie5
2026-05-19  3:01                 ` [PATCH v17 04/11] drivers: add base driver skeleton liujie5
2026-05-19  3:01                 ` [PATCH v17 05/11] drivers: add base driver probe skeleton liujie5
2026-05-19  3:01                 ` [PATCH v17 06/11] drivers: support PCI BAR mapping liujie5
2026-05-19  3:01                 ` [PATCH v17 07/11] common/sxe2: add ioctl interface for DMA map and unmap liujie5
2026-05-19 17:41                   ` Stephen Hemminger
2026-05-19  3:01                 ` [PATCH v17 08/11] net/sxe2: support queue setup and control liujie5
2026-05-19  3:01                 ` [PATCH v17 09/11] drivers: add data path for Rx and Tx liujie5
2026-05-19  3:01                 ` [PATCH v17 10/11] net/sxe2: add vectorized " liujie5
2026-05-19  3:01                 ` [PATCH v17 11/11] net/sxe2: implement Tx done cleanup liujie5
2026-05-19 14:47                   ` [PATCH v18 00/11] net/sxe2: fix logic errors and address feedback liujie5
2026-05-19 14:48                     ` [PATCH v18 01/11] mailmap: add Jie Liu liujie5
2026-05-19 14:48                     ` [PATCH v18 02/11] doc: add sxe2 guide and release notes liujie5
2026-05-19 14:48                     ` [PATCH v18 03/11] common/sxe2: add sxe2 basic structures liujie5
2026-05-19 14:48                     ` [PATCH v18 04/11] drivers: add base driver skeleton liujie5
2026-05-19 14:48                     ` [PATCH v18 05/11] drivers: add base driver probe skeleton liujie5
2026-05-19 14:48                     ` [PATCH v18 06/11] drivers: support PCI BAR mapping liujie5
2026-05-19 14:48                     ` [PATCH v18 07/11] common/sxe2: add ioctl interface for DMA map and unmap liujie5
2026-05-19 14:48                     ` [PATCH v18 08/11] net/sxe2: support queue setup and control liujie5
2026-05-19 14:48                     ` [PATCH v18 09/11] drivers: add data path for Rx and Tx liujie5
2026-05-19 14:48                     ` [PATCH v18 10/11] net/sxe2: add vectorized " liujie5
2026-05-19 14:48                     ` [PATCH v18 11/11] net/sxe2: implement Tx done cleanup liujie5
2026-05-20  2:17                       ` [PATCH v19 00/11]net/sxe2: fix logic errors and address feedback liujie5
2026-05-20  2:17                         ` [PATCH v19 01/11] mailmap: add Jie Liu liujie5
2026-05-20  2:18                         ` [PATCH v19 02/11] doc: add sxe2 guide and release notes liujie5
2026-05-20  2:18                         ` [PATCH v19 03/11] common/sxe2: add sxe2 basic structures liujie5
2026-05-20  2:18                         ` [PATCH v19 04/11] drivers: add base driver skeleton liujie5
2026-05-20  2:18                         ` [PATCH v19 05/11] drivers: add base driver probe skeleton liujie5
2026-05-20  2:18                         ` [PATCH v19 06/11] drivers: support PCI BAR mapping liujie5
2026-05-20  2:18                         ` [PATCH v19 07/11] common/sxe2: add ioctl interface for DMA map and unmap liujie5
2026-05-20  2:18                         ` [PATCH v19 08/11] net/sxe2: support queue setup and control liujie5
2026-05-20  2:18                         ` [PATCH v19 09/11] drivers: add data path for Rx and Tx liujie5
2026-05-20  2:18                         ` [PATCH v19 10/11] net/sxe2: add vectorized " liujie5
2026-05-20  2:18                         ` [PATCH v19 11/11] net/sxe2: implement Tx done cleanup liujie5
2026-05-20  0:32                     ` [PATCH v18 00/11] net/sxe2: fix logic errors and address feedback Stephen Hemminger
2026-05-19 14:16                 ` [PATCH v17 " Stephen Hemminger
2026-05-18 17:25             ` [PATCH v16 " Stephen Hemminger
2026-05-18  0:02         ` [PATCH v15 " Stephen Hemminger
2026-05-18 17:13 ` Stephen Hemminger [this message]
2026-05-20  2:36   ` [PATCH v13 0/5] Support add/remove memory region and get-max-slots Bathija, Pravin
  -- strict thread matches above, loose matches on Subject: below --
2026-05-14 22:46 pravin.bathija

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260518101344.7a1de0ef@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=dev@dpdk.org \
    --cc=fengchengwen@huawei.com \
    --cc=maxime.coquelin@redhat.com \
    --cc=pravin.bathija@dell.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox