DPDK-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: <pravin.bathija@dell.com>
Cc: <dev@dpdk.org>, <fengchengwen@huawei.com>,
	<maxime.coquelin@redhat.com>, <thomas@monjalon.net>
Subject: Re: [PATCH v15 0/5] Support add/remove memory region and get-max-slots
Date: Fri, 5 Jun 2026 09:45:33 -0700	[thread overview]
Message-ID: <20260605094533.26cd079c@phoenix.local> (raw)
In-Reply-To: <20260604235723.1046607-1-pravin.bathija@dell.com>

On Thu, 4 Jun 2026 23:57:18 +0000
<pravin.bathija@dell.com> wrote:

> From: Pravin M Bathija <pravin.bathija@dell.com>
> 
> This is version v15 of the patchset and it incorporates the
> recommendations made by Maxime Coquelin.
> 
> Patch 4/5
> - Changed VHOST_USER_REM_MEM_REG handler declaration from
>   accepts_fd=true to accepts_fd=false, as the remove request does not
>   expect FDs in ancillary data.
> - Removed all close_msg_fds(ctx) calls from vhost_user_rem_mem_reg(), no
>   longer needed since the handler is declared as not accepting FDs.
> - Removed validate_msg_fds(dev, ctx, 0) check from
>   vhost_user_rem_mem_reg(), as FD validation is now handled generically
>   by the framework. 
> - Added targeted IOTLB cache invalidation in vhost_user_rem_mem_reg()
>   using vhost_user_iotlb_cache_remove() for the removed region's GPA
>   range, instead of the nuclear iotlb_flush_all() used by set_mem_table.
> 
> This implementation has been extensively tested by doing Read/Write I/O
> from multiple instances of fio + libblkio (front-end) talking to
> spdk/dpdk (back-end) based drives. Tested with qemu front-end talking to
> dpdk testpmd (back-end) performing add/removal of memory regions. Also
> tested post-copy live migration after doing add_memory_region.
> 
> Version Log:
> Version v15 (Current version): Incorporate code review suggestions from
> Maxime Coquelin as described above.
> 
> Version v14: Incorporate code review suggestions from Stephen Hemminger
> and Fengcheng Wen.
> Changes from Fengcheng Wen review:
> Patch 3/5
> - Moved free_all_mem_regions() call sites in vhost_user_set_mem_table()
>   from patch 4/5 to patch 3/5 so each commit compiles independently
> Patch 4/5
> - Renamed _dev_invalidate_vrings() to vhost_user_invalidate_vrings() to
>   follow vhost naming convention
> -  Added comment explaining *pdev propagation through
>    translate_ring_addresses / numa_realloc()
> - Reordered local variables in vhost_user_add_mem_reg() and
>   vhost_user_rem_mem_reg() by descending line length
> - Shortened overlap check variable names (current_region_guest_start/end
>   --> cur_start/end, proposed_region_guest_start/end -> new_start/end)  
> - Fixed DMA error path in vhost_user_add_mem_reg(): added
>   free_new_region_no_dma label so async_dma_map_region(false) is not
>   called when the map itself failed.
> Changes from Stephen Hemminger review:
> Patch 4/5
> - vhost_user_add_mem_reg() now constructs a reply with the back-end's
>   host mapping address in userspace_addr and returns
>   RTE_VHOST_MSG_RESULT_REPLY per the vhost-user spec
> - Added validate_msg_fds(dev, ctx, 0) in vhost_user_rem_mem_reg() to
>   reject malformed messages with unexpected file descriptors
> - Dropped unnecessary (uint64_t) cast in vhost_user_get_max_mem_slots()
> 
> Version v13: Incorporate code review suggestions from Fengcheng Wen
> Patch 2/5
> Renamed VhostUserSingleMemReg to VhostUserMemRegMsg and memory_single
> to memreg
> Patches 3/5 and 4/5
> Relocated function remove_guest_pages from patch 3/5 to 4/5
> 
> Version v12: Incorporate code review suggestions from Maxime Coquelin
> and ai-code-review.
> Patch 3/5
> Refactored async_dma_map() to delegate to async_dma_map_region(),
> eliminating code duplication between the two functions.
> Restored original comments in async_dma_map_region() explaining why
> ENODEV and EINVAL errors are ignored (these were stripped in v10)
> Reverted unnecessary changes to vhost_user_postcopy_register() --
> removed the host_user_addr == 0 checks and reg_msg_index indirection
> that were added in  v10, since this function is only called from
> vhost_user_set_mem_table() where regions are always contiguous.
> 
> Version v11: Incorporate code review suggestions from Stephen Hemminger.
> Patch 4/5
> Fix incomplete cleanup in vhost_user_add_mem_reg() when
> vhost_user_mmap_region() fails after the mmap succeeds (e.g.
> add_guest_pages() realloc failure) realloc failure). The error path now
> calls remove_guest_pages() and free_mem_region() to undo the mapping
> and stale guest-page entries, preventing a leaked mmap and slot reuse
> corruption. The plain close(fd) path is kept for pre-mmap failures.
> 
> Version v10: Incorporate code review suggestions from Stephen Hemminger.
> Patch 4/5
> Moved dev_invalidate_vrings after free_mem_region, array compaction, and
> nregions decrement. This ensures translate_ring_addresses only sees
> surviving memory regions, preventing vring pointers from resolving into
> a region that is about to be unmapped.
> 
> Version v9: Incorporate code review suggestions from Stephen Hemminger.
> Patch 3/5
> Restored max_guest_pages initial value to hardcoded 8 instead of
> VHOST_MEMORY_MAX_NREGIONS, matching upstream semantics.
> Patch 4/5
> Added close(reg->fd) and reg->fd = -1 before goto close_msg_fds in the
> mmap failure path to fix fd leak after fd was moved from ctx->fds[0].
> Converted dev_invalidate_vrings from a plain function to a macro +
> implementation function pair, accepting message ID as a parameter so
> the static_assert reports the correct handler at each call site.
> Updated dev_invalidate_vrings call in add_mem_reg to pass
> VHOST_USER_ADD_MEM_REG as message ID.
> Updated dev_invalidate_vrings call in rem_mem_reg to pass
> VHOST_USER_REM_MEM_REG as message ID.
> 
> Version v8:  Incorporate code review suggestions from Stephen Hemminger.
> rewrite async_dma_map_region function to iterate guest pages by host
> address range matching
> change function dev_invalidate_vrings to accept a double pointer to
> propagate pointer updates
> new function remove_guest_pages was added
> add_mem_reg error path was narrowed to only clean up the single failed
> region instead of destroting all existing regions
> 
> Version v7: Incorporate code review suggestions from Maxime Coquelin.
> Add debug messages to vhost_postcopy_register function.
> 
> Version v6: Added the enablement of this feature as a final patch in
> this patch-set and other code optimizations as suggested by Maxime
> Coquelin.
> 
> Version v5: removed the patch that increased the number of memory regions
> from 8 to 128. This will be submitted as a separate feature at a later
> point after incorporating additional optimizations. Also includes code
> optimizations as suggested by Feng Cheng Wen.
> 
> Version v4: code optimizations as suggested by Feng Cheng Wen.
> 
> Version v3: code optimizations as suggested by Maxime Coquelin
> and Thomas Monjalon.
> 
> Version v2: code optimizations as suggested by Maxime Coquelin.
> 
> Version v1: Initial patch set.
> 
> Pravin M Bathija (5):
>   vhost: add user to mailmap and define to vhost hdr
>   vhost: header defines for add/rem mem region
>   vhost: refactor memory helper functions
>   vhost: add mem region add/remove handlers
>   vhost: enable configure memory slots
> 
>  .mailmap               |   1 +
>  lib/vhost/rte_vhost.h  |   4 +
>  lib/vhost/vhost_user.c | 425 +++++++++++++++++++++++++++++++++++------
>  lib/vhost/vhost_user.h |  10 +
>  4 files changed, 378 insertions(+), 62 deletions(-)
> 

I don't think this is ready to merge based on AI review.
Did AI review with Opus 4.8 on a chat which has past context.

Summary of v15 findings


New in v15 (both patch 4/5, both errors):

    Use-after-free on the reply path: reg points into dev->mem->regions[], but dev_invalidate_vrings() -> translate_ring_addresses() -> numa_realloc() can relocate dev->mem. dev is refreshed via *pdev, reg is not, then reg->host_user_addr is read for the reply. Re-derive reg (or capture host_user_addr) after dev = *pdev.
    ADD_MEM_REG reply sent unconditionally: handler always returns RESULT_REPLY, but the spec makes the mapping-address reply postcopy- only. In non-postcopy mode this desyncs the channel (no REPLY_ACK: the front-end never reads it; with REPLY_ACK: it expects a u64 ack, not a memreg). Gate the reply on dev->postcopy_listening, else return RESULT_OK -- same as SET_MEM_TABLE.

Carried over from v13 (now in a different form):

    The v13 Warning (missing postcopy mapping-address reply) is addressed but mis-gated; correct fix is the conditional reply above. Until then postcopy correctness still isn't right.


      parent reply	other threads:[~2026-06-05 16:45 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-04 23:57 [PATCH v15 0/5] Support add/remove memory region and get-max-slots pravin.bathija
2026-06-04 23:57 ` [PATCH v15 1/5] vhost: add user to mailmap and define to vhost hdr pravin.bathija
2026-06-04 23:57 ` [PATCH v15 2/5] vhost: header defines for add/rem mem region pravin.bathija
2026-06-04 23:57 ` [PATCH v15 3/5] vhost: refactor memory helper functions pravin.bathija
2026-06-04 23:57 ` [PATCH v15 4/5] vhost: add mem region add/remove handlers pravin.bathija
2026-06-05 11:32   ` Maxime Coquelin
2026-06-04 23:57 ` [PATCH v15 5/5] vhost: enable configure memory slots pravin.bathija
2026-06-05 13:14 ` [PATCH v15 0/5] Support add/remove memory region and get-max-slots Maxime Coquelin
2026-06-05 16:45 ` Stephen Hemminger [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260605094533.26cd079c@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=dev@dpdk.org \
    --cc=fengchengwen@huawei.com \
    --cc=maxime.coquelin@redhat.com \
    --cc=pravin.bathija@dell.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox