public inbox for dev@dpdk.org
From: Stephen Hemminger <stephen@networkplumber.org>
To: Long Li <longli@microsoft.com>
Cc: dev@dpdk.org, Wei Hu <weh@microsoft.com>,
	stable@dpdk.org, Dariusz Sosnowski <dsosnowski@nvidia.com>,
	Viacheslav Ovsiienko <viacheslavo@nvidia.com>,
	Bing Zhao <bingz@nvidia.com>, Ori Kam <orika@nvidia.com>,
	Suanming Mou <suanmingm@nvidia.com>,
	Matan Azrad <matan@nvidia.com>
Subject: Re: [PATCH v5 0/7] multi-process and VF hotplug fixes
Date: Fri, 27 Feb 2026 16:41:48 -0800
Message-ID: <20260227164148.470898ef@phoenix.local>
In-Reply-To: <20260227015928.14338-1-longli@microsoft.com>

On Thu, 26 Feb 2026 17:59:20 -0800
Long Li <longli@microsoft.com> wrote:

> Fix several issues with VF hotplug and multi-process support in
> netvsc, mana, mlx5, and mlx4 drivers:
> 
> - Fix race conditions between VSP notifications and DPDK device events
>   during VF add/remove, with proper locking of VF-related fields
> - Add multi-process communication infrastructure for coordinating VF
>   removal across primary and secondary processes
> - Fix Protection Domain resource leak on device close in mana
> - Fix devargs memory leak during VF hotplug in netvsc
> - Fix fast-path ops (rte_eth_fp_ops) setup in secondary processes for
>   mana, mlx5, and mlx4, ensuring burst function pointers are restored
>   after STOP->START cycles
> 
> v5:
> - Patches 5,6,7: Also restore rte_eth_fp_ops burst function pointers
>   (rx_pkt_burst, tx_pkt_burst) in START_RXTX handler, not just queue
>   data pointers. Without this, after a STOP->START cycle the secondary
>   process burst pointers remain set to dummy functions.
> 
> v4:
> - Patch 1: Check hn_vf_add() return value in netvsc_hotplug_retry
> - Patch 1: Track fresh_attach to avoid tearing down original VF
>   attachment when configure/start fails on an -EEXIST path
> - Patch 2: Move counter decrement and netvsc_uninit_once() after device
>   cleanup in eth_hn_remove() to prevent use-after-free of shared data
> - Patch 2: Clear netvsc_shared_data on init failure paths to prevent
>   dangling pointer
> 
> v3:
> - Fix review comments from v2
> 
> v2:
> - Initial rework of VF add/remove locking
> 
> Long Li (7):
>   net/netvsc: fix race conditions on VF add/remove events
>   net/netvsc: add multi-process VF device removal support
>   net/mana: fix PD resource leak on device close
>   net/netvsc: fix devargs memory leak on hotplug
>   net/mana: fix fast-path ops setup in secondary process
>   net/mlx5: fix fast-path ops setup in secondary process
>   net/mlx4: fix fast-path ops setup in secondary process
> 
>  drivers/net/mana/mana.c             |  14 ++
>  drivers/net/mana/mp.c               |   8 +
>  drivers/net/mlx4/mlx4_mp.c          |   6 +
>  drivers/net/mlx5/linux/mlx5_mp_os.c |   6 +
>  drivers/net/netvsc/hn_ethdev.c      | 300 +++++++++++++++++++++++++++-
>  drivers/net/netvsc/hn_nvs.h         |   6 +
>  drivers/net/netvsc/hn_rxtx.c        |  40 ++--
>  drivers/net/netvsc/hn_var.h         |   1 +
>  drivers/net/netvsc/hn_vf.c          | 148 ++++++++------
>  9 files changed, 437 insertions(+), 92 deletions(-)
> 

Looks okay to me, but the AI review feedback raised a couple of questions.
If those are not a concern, I will take the series as is for this release.

The AI summary was:

Patch 1: hn_vf_add_unlocked(): when hn_nvs_set_datapath() fails at
switch_data_path after a fresh attach, the VF is not detached (there is
no goto detach). This leaves the VF in an inconsistent state: attached,
but with the data path never switched to it.
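
To make the first point concrete, here is a minimal, self-contained sketch
of the unwind pattern the review seems to be asking for. It is not the
netvsc code: struct vf_ctx, vf_attach(), vf_detach(), vf_switch_datapath()
and vf_add() are hypothetical stand-ins, and the fail_switch flag only
simulates hn_nvs_set_datapath() returning an error.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's per-device VF state. */
struct vf_ctx {
	bool attached;     /* a VF is currently attached */
	bool datapath_vf;  /* data path has been switched to the VF */
};

/* Hypothetical attach helper: reports -EEXIST if already attached. */
static int vf_attach(struct vf_ctx *c)
{
	if (c->attached)
		return -EEXIST;
	c->attached = true;
	return 0;
}

static void vf_detach(struct vf_ctx *c)
{
	c->attached = false;
}

/* fail_switch simulates the data path switch failing. */
static int vf_switch_datapath(struct vf_ctx *c, bool fail_switch)
{
	if (fail_switch)
		return -EIO;
	c->datapath_vf = true;
	return 0;
}

static int vf_add(struct vf_ctx *c, bool fail_switch)
{
	bool fresh_attach = false;
	int ret;

	ret = vf_attach(c);
	if (ret == 0)
		fresh_attach = true;
	else if (ret != -EEXIST)
		return ret;

	ret = vf_switch_datapath(c, fail_switch);
	if (ret != 0)
		goto detach;

	return 0;

detach:
	/* Undo only an attachment made by this call, never an older one. */
	if (fresh_attach)
		vf_detach(c);
	return ret;
}
```

With this shape, a failed switch after a fresh attach leaves the context
detached again, while a failure on the -EEXIST path keeps the original
attachment, which matches the fresh_attach tracking described in the v4
notes.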

Patch 2: netvsc_uninit_once(): the primary can free the shared memzone
while secondaries still reference netvsc_shared_data, leaving them with
a dangling pointer. The secondary_cnt check is local to the calling
process, so it does not reflect remote secondary processes.
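
One possible direction for the second point, sketched under assumptions:
keep the user count inside the shared region itself, so that every process
sees the same value and only the last user releases the memory. The names
struct shared_data, shared_get() and shared_put() are made up for
illustration, and whether a shared counter or an MP message handshake is
the right fix for netvsc is a question for the driver author.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout of data placed in a region mapped by all processes. */
struct shared_data {
	atomic_uint refcnt;  /* one reference per process using the data */
};

/* Called by each process (primary or secondary) when it starts using it. */
static void shared_get(struct shared_data *s)
{
	atomic_fetch_add(&s->refcnt, 1);
}

/*
 * Called by each process when it is done. Returns true only to the caller
 * that dropped the last reference; that caller may then free the region.
 */
static bool shared_put(struct shared_data *s)
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	return atomic_fetch_sub(&s->refcnt, 1) == 1;
}
```

A process that gets false back from shared_put() must stop using its
pointer but must not free the region, since another process still holds a
reference.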

Warnings (should consider)

Patch 1: Potential deadlock in hn_vf_close(): it holds the write lock
while calling rte_eth_dev_callback_unregister(), which synchronously
waits for in-progress callbacks. Those callbacks may themselves try to
acquire the write lock via hn_remove_delayed().
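
The usual way out of this kind of lock-ordering problem is to drop the
lock before making the call that waits on callbacks. The sketch below is a
single-threaded toy model, not the driver: struct dev, removal_cb(),
cb_unregister_sync() and vf_close() are hypothetical, the lock is a plain
atomic_flag, and the callback uses a trylock so that a conflict shows up
as a flag instead of a hang.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct dev {
	atomic_flag vf_lock;  /* stand-in for the VF write lock */
	bool vf_attached;
	bool cb_registered;
	bool cb_got_lock;     /* set if the callback could take the lock */
};

static bool dev_trylock(struct dev *d)
{
	return !atomic_flag_test_and_set(&d->vf_lock);
}

static void dev_lock(struct dev *d)
{
	while (!dev_trylock(d))
		;  /* spin until the lock is free */
}

static void dev_unlock(struct dev *d)
{
	atomic_flag_clear(&d->vf_lock);
}

/* Stand-in for an event callback that needs the write lock. */
static void removal_cb(struct dev *d)
{
	if (dev_trylock(d)) {
		d->cb_got_lock = true;
		d->vf_attached = false;
		dev_unlock(d);
	}
}

/* Models a synchronous unregister: the pending callback runs to the end. */
static void cb_unregister_sync(struct dev *d)
{
	if (d->cb_registered) {
		removal_cb(d);
		d->cb_registered = false;
	}
}

static void vf_close(struct dev *d)
{
	bool need_unregister;

	dev_lock(d);
	need_unregister = d->cb_registered;
	dev_unlock(d);  /* release before the call that waits on callbacks */

	if (need_unregister)
		cb_unregister_sync(d);

	dev_lock(d);
	d->vf_attached = false;  /* finish teardown under the lock */
	dev_unlock(d);
}
```

If vf_close() kept the lock across cb_unregister_sync(), the callback in
this model would never set cb_got_lock; with a blocking lock in a real
driver that situation would be the deadlock the review describes.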

There were a couple more things but these were just AI being overly paranoid.



Thread overview (12+ messages):
2026-02-27  1:59 [PATCH v5 0/7] multi-process and VF hotplug fixes Long Li
2026-02-27  1:59 ` [PATCH v5 1/7] net/netvsc: fix race conditions on VF add/remove events Long Li
2026-02-27  1:59 ` [PATCH v5 2/7] net/netvsc: add multi-process VF device removal support Long Li
2026-02-27  1:59 ` [PATCH v5 3/7] net/mana: fix PD resource leak on device close Long Li
2026-02-27  1:59 ` [PATCH v5 4/7] net/netvsc: fix devargs memory leak on hotplug Long Li
2026-02-27  1:59 ` [PATCH v5 5/7] net/mana: fix fast-path ops setup in secondary process Long Li
2026-02-27  1:59 ` [PATCH v5 6/7] net/mlx5: " Long Li
2026-02-27  1:59 ` [PATCH v5 7/7] net/mlx4: " Long Li
2026-02-28  0:41 ` Stephen Hemminger [this message]
2026-02-28  1:36   ` [EXTERNAL] Re: [PATCH v5 0/7] multi-process and VF hotplug fixes Long Li
2026-02-28 17:03 ` Stephen Hemminger
2026-03-01  5:00   ` [EXTERNAL] " Long Li
