From: Stephen Hemminger <stephen@networkplumber.org>
To: Long Li <longli@microsoft.com>
Cc: dev@dpdk.org, Wei Hu <weh@microsoft.com>,
stable@dpdk.org, Dariusz Sosnowski <dsosnowski@nvidia.com>,
Viacheslav Ovsiienko <viacheslavo@nvidia.com>,
Bing Zhao <bingz@nvidia.com>, Ori Kam <orika@nvidia.com>,
Suanming Mou <suanmingm@nvidia.com>,
Matan Azrad <matan@nvidia.com>
Subject: Re: [PATCH v5 0/7] multi-process and VF hotplug fixes
Date: Fri, 27 Feb 2026 16:41:48 -0800
Message-ID: <20260227164148.470898ef@phoenix.local>
In-Reply-To: <20260227015928.14338-1-longli@microsoft.com>
On Thu, 26 Feb 2026 17:59:20 -0800
Long Li <longli@microsoft.com> wrote:
> Fix several issues with VF hotplug and multi-process support in
> netvsc, mana, mlx5, and mlx4 drivers:
>
> - Fix race conditions between VSP notifications and DPDK device events
> during VF add/remove, with proper locking of VF-related fields
> - Add multi-process communication infrastructure for coordinating VF
> removal across primary and secondary processes
> - Fix Protection Domain resource leak on device close in mana
> - Fix devargs memory leak during VF hotplug in netvsc
> - Fix fast-path ops (rte_eth_fp_ops) setup in secondary processes for
> mana, mlx5, and mlx4, ensuring burst function pointers are restored
> after STOP->START cycles
>
> v5:
> - Patches 5,6,7: Also restore rte_eth_fp_ops burst function pointers
> (rx_pkt_burst, tx_pkt_burst) in START_RXTX handler, not just queue
> data pointers. Without this, after a STOP->START cycle the secondary
> process burst pointers remain set to dummy functions.
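For readers following the v5 note, here is a minimal, self-contained sketch of the STOP->START behaviour it describes. It is not the driver code: fp_ops_sketch, port_sketch, handle_stop_rxtx() and handle_start_rxtx() are hypothetical names, and the struct is only loosely modelled on rte_eth_fp_ops. The point is that the START handler has to restore the burst function pointers as well as the queue data pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*burst_fn)(void *queue, void **pkts, uint16_t n);

/* Hypothetical per-port fast-path slot, loosely modelled on rte_eth_fp_ops. */
struct fp_ops_sketch {
    burst_fn rx_pkt_burst;
    burst_fn tx_pkt_burst;
    void **rxq_data;
    void **txq_data;
};

/* Hypothetical per-port queue arrays owned by the driver. */
struct port_sketch {
    void *rxqs[1];
    void *txqs[1];
};

/* Dummy burst: processes nothing, used while the port is stopped. */
static uint16_t dummy_burst(void *q, void **pkts, uint16_t n)
{
    (void)q; (void)pkts; (void)n;
    return 0;
}

/* Mock "real" burst functions: pretend every packet was handled. */
static uint16_t real_rx_burst(void *q, void **pkts, uint16_t n)
{
    (void)q; (void)pkts;
    return n;
}

static uint16_t real_tx_burst(void *q, void **pkts, uint16_t n)
{
    (void)q; (void)pkts;
    return n;
}

/* STOP handler in the secondary: park the burst pointers on the dummy. */
static void handle_stop_rxtx(struct fp_ops_sketch *ops)
{
    assert(ops != NULL);
    ops->rx_pkt_burst = dummy_burst;
    ops->tx_pkt_burst = dummy_burst;
}

/*
 * START handler in the secondary: restore the queue data pointers AND
 * the burst function pointers. Restoring only the data pointers would
 * leave the dummy functions installed after a STOP->START cycle.
 */
static void handle_start_rxtx(struct fp_ops_sketch *ops,
                              struct port_sketch *port)
{
    assert(ops != NULL && port != NULL);
    ops->rxq_data = port->rxqs;
    ops->txq_data = port->txqs;
    ops->rx_pkt_burst = real_rx_burst;
    ops->tx_pkt_burst = real_tx_burst;
}
```

Calling handle_start_rxtx(), then handle_stop_rxtx(), then handle_start_rxtx() again should leave both burst pointers on the real functions, which is the property the v5 change is after.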
>
> v4:
> - Patch 1: Check hn_vf_add() return value in netvsc_hotplug_retry
> - Patch 1: Track fresh_attach to avoid tearing down original VF
> attachment when configure/start fails on an -EEXIST path
> - Patch 2: Move counter decrement and netvsc_uninit_once() after device
> cleanup in eth_hn_remove() to prevent use-after-free of shared data
> - Patch 2: Clear netvsc_shared_data on init failure paths to prevent
> dangling pointer
>
> v3:
> - Fix review comments from v2
>
> v2:
> - Initial rework of VF add/remove locking
>
> Long Li (7):
> net/netvsc: fix race conditions on VF add/remove events
> net/netvsc: add multi-process VF device removal support
> net/mana: fix PD resource leak on device close
> net/netvsc: fix devargs memory leak on hotplug
> net/mana: fix fast-path ops setup in secondary process
> net/mlx5: fix fast-path ops setup in secondary process
> net/mlx4: fix fast-path ops setup in secondary process
>
> drivers/net/mana/mana.c | 14 ++
> drivers/net/mana/mp.c | 8 +
> drivers/net/mlx4/mlx4_mp.c | 6 +
> drivers/net/mlx5/linux/mlx5_mp_os.c | 6 +
> drivers/net/netvsc/hn_ethdev.c | 300 +++++++++++++++++++++++++++-
> drivers/net/netvsc/hn_nvs.h | 6 +
> drivers/net/netvsc/hn_rxtx.c | 40 ++--
> drivers/net/netvsc/hn_var.h | 1 +
> drivers/net/netvsc/hn_vf.c | 148 ++++++++------
> 9 files changed, 437 insertions(+), 92 deletions(-)
>
Looks okay to me, though the AI review feedback raised a couple of questions.
If it is OK, I will take it as is for this release.
The AI summary was:
Patch 1: hn_vf_add_unlocked(): when hn_nvs_set_datapath() fails at
switch_data_path after a fresh attach, the VF is not detached (no goto
detach). This leaves inconsistent state.

Patch 2: netvsc_uninit_once(): primary can free the shared memzone
while secondaries still reference netvsc_shared_data, causing a
dangling pointer. The local-only secondary_cnt check doesn't reflect
remote secondary processes.
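To make the Patch 1 item concrete, here is a rough sketch of the unwind it is asking about. Everything in it is a stand-in: vf_state, mock_set_datapath() and vf_add_sketch() are hypothetical and do not reflect the real hn_vf.c code. It only shows the shape: a failure after a fresh attach jumps to a detach label, while a pre-existing attachment is left alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device VF state (not the netvsc structs). */
struct vf_state {
    bool attached;     /* a VF port is bound to this device */
    bool datapath_vf;  /* traffic is steered to the VF */
};

/* Mock of the datapath switch; fail_switch simulates the host rejecting it. */
static int mock_set_datapath(struct vf_state *s, bool fail_switch)
{
    if (fail_switch)
        return -1;
    s->datapath_vf = true;
    return 0;
}

/*
 * Attach a VF and switch the datapath to it. If this call performed the
 * attach (fresh_attach) and a later step fails, undo the attach so the
 * caller never sees "attached but datapath not switched".
 */
static int vf_add_sketch(struct vf_state *s, bool fail_switch)
{
    bool fresh_attach = false;
    int ret;

    if (!s->attached) {
        s->attached = true;
        fresh_attach = true;
    }

    ret = mock_set_datapath(s, fail_switch);
    if (ret != 0)
        goto detach;

    assert(s->attached && s->datapath_vf);
    return 0;

detach:
    /* Roll back only what this call created; keep an earlier attachment. */
    if (fresh_attach)
        s->attached = false;
    return ret;
}
```

Whether the real code needs this depends on what the caller does with the error, so treat it as an illustration of the question rather than a proposed fix.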
Warnings (should consider)
Patch 1: Potential deadlock in hn_vf_close(): holding the write lock
while calling rte_eth_dev_callback_unregister(), which synchronously
waits for in-progress callbacks that may themselves try to acquire the
write lock via hn_remove_delayed().
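The lock-ordering concern in that warning can be modelled in a few lines. This is a single-threaded toy, not the driver: a boolean stands in for the write lock, and dev_sketch, mock_callback(), mock_unregister_sync() and both close variants are hypothetical names. A callback that "would block forever" is reported as a false return instead of actually hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a flag stands in for the VF write lock. */
struct dev_sketch {
    bool lock_held;      /* true while the "write lock" is held */
    bool vf_registered;  /* event callback is registered */
    bool callback_ran;   /* the in-progress callback completed */
};

/* Models a callback that needs the write lock; false means it would block. */
static bool mock_callback(struct dev_sketch *d)
{
    if (d->lock_held)
        return false;    /* the closing thread still holds the lock */
    d->lock_held = true;
    d->callback_ran = true;
    d->lock_held = false;
    return true;
}

/* Models an unregister that waits for an in-progress callback to finish. */
static bool mock_unregister_sync(struct dev_sketch *d)
{
    return mock_callback(d);
}

/* Problem shape: unregister is called with the lock still held. */
static bool close_holding_lock(struct dev_sketch *d)
{
    bool ok;

    d->lock_held = true;
    d->vf_registered = false;
    ok = mock_unregister_sync(d);   /* callback cannot take the lock */
    d->lock_held = false;
    return ok;
}

/* One possible shape: update state under the lock, unregister after unlock. */
static bool close_sketch(struct dev_sketch *d)
{
    bool need_unregister;

    d->lock_held = true;
    need_unregister = d->vf_registered;
    d->vf_registered = false;
    d->lock_held = false;

    assert(!d->lock_held);
    return need_unregister ? mock_unregister_sync(d) : true;
}
```

In this model close_holding_lock() reports the stall and close_sketch() does not. Whether dropping the lock first is safe in the real driver depends on what else the lock protects across the unregister call, which this toy does not capture.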
There were a couple more things, but those were just the AI being overly paranoid.
Thread overview: 12+ messages
2026-02-27 1:59 [PATCH v5 0/7] multi-process and VF hotplug fixes Long Li
2026-02-27 1:59 ` [PATCH v5 1/7] net/netvsc: fix race conditions on VF add/remove events Long Li
2026-02-27 1:59 ` [PATCH v5 2/7] net/netvsc: add multi-process VF device removal support Long Li
2026-02-27 1:59 ` [PATCH v5 3/7] net/mana: fix PD resource leak on device close Long Li
2026-02-27 1:59 ` [PATCH v5 4/7] net/netvsc: fix devargs memory leak on hotplug Long Li
2026-02-27 1:59 ` [PATCH v5 5/7] net/mana: fix fast-path ops setup in secondary process Long Li
2026-02-27 1:59 ` [PATCH v5 6/7] net/mlx5: " Long Li
2026-02-27 1:59 ` [PATCH v5 7/7] net/mlx4: " Long Li
2026-02-28 0:41 ` Stephen Hemminger [this message]
2026-02-28 1:36 ` [EXTERNAL] Re: [PATCH v5 0/7] multi-process and VF hotplug fixes Long Li
2026-02-28 17:03 ` Stephen Hemminger
2026-03-01 5:00 ` [EXTERNAL] " Long Li