From: Stephen Hemminger <stephen@networkplumber.org>
To: longli@linux.microsoft.com
Cc: dev@dpdk.org, Wei Hu <weh@microsoft.com>,
stable@dpdk.org, Dariusz Sosnowski <dsosnowski@nvidia.com>,
Viacheslav Ovsiienko <viacheslavo@nvidia.com>,
Bing Zhao <bingz@nvidia.com>, Ori Kam <orika@nvidia.com>,
Suanming Mou <suanmingm@nvidia.com>,
Matan Azrad <matan@nvidia.com>, Long Li <longli@microsoft.com>
Subject: Re: [PATCH v2 2/8] net/netvsc: fix race conditions on VF add/remove events
Date: Mon, 23 Feb 2026 09:43:58 -0800 [thread overview]
Message-ID: <20260223094358.58f5b0f4@phoenix.local> (raw)
In-Reply-To: <20260221024540.659098-2-longli@linux.microsoft.com>
On Fri, 20 Feb 2026 18:45:21 -0800
longli@linux.microsoft.com wrote:
> From: Long Li <longli@microsoft.com>
>
> Netvsc gets notification from VSP on VF add/remove over VMBUS, but the
> timing may not match the DPDK sequence of device events triggered from
> uevents from kernel.
>
> Remove the retry logic from the code when attach to VF and rely on DPDK
> event to attach to VF. With this change, both the notifications from VSP
> and the DPDK will attempt a VF attach.
>
> Also implement locking when checking on all VF related fields.
>
> Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> Cc: stable@dpdk.org
>
> Signed-off-by: Long Li <longli@microsoft.com>
AI review spotted related issue.
**Patch 2 (net/netvsc: fix race conditions on VF add/remove events)** — the most complex patch in the series.
**What it fixes correctly:**
The old Tx/Rx paths had a TOCTOU race: they checked `vf_vsc_switched` without the lock, acquired the lock, then re-checked. A VF remove could complete between the first check and the lock acquisition. The new code takes the read lock *before* any VF state checks — correct fix. The lock is properly released on both paths.
The upgrade of `hn_vf_close()` from read lock to write lock is also a real bug fix, since it modifies `vf_attached` and calls `rte_eth_dev_close()`.
Moving callback registration into `hn_vf_attach()` with proper rollback (via the new `hn_vf_detach()` helper) is a good structural improvement that ties callback lifetime to VF attach/detach lifecycle.
The unconditional clear of `vf_vsc_switched` in `hn_vf_remove_unlocked()` is correct — if the VF is being removed, the switched flag must be cleared regardless of whether `hn_nvs_set_datapath(SYNTHETIC)` succeeded.
**One potential concern (~60% confidence):**
If the VF is successfully configured and started but `hn_nvs_set_datapath(VF)` fails at the `switch_data_path:` label, the function returns an error but leaves the VF started and attached. The callers don't clean this up. This is a pre-existing design issue the patch doesn't worsen, and the hypervisor may retry, but it could confuse subsequent add/remove cycles.
next prev parent reply other threads:[~2026-02-23 17:44 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-21 2:45 [PATCH v2 1/8] net/netvsc: secondary ignore promiscuous enable/disable longli
2026-02-21 2:45 ` [PATCH v2 2/8] net/netvsc: fix race conditions on VF add/remove events longli
2026-02-23 17:40 ` Stephen Hemminger
2026-02-23 17:43 ` Stephen Hemminger [this message]
2026-02-23 22:31 ` [EXTERNAL] " Long Li
2026-02-21 2:45 ` [PATCH v2 3/8] net/netvsc: add multi-process VF device removal support longli
2026-02-23 17:45 ` Stephen Hemminger
2026-02-23 22:32 ` [EXTERNAL] " Long Li
2026-02-21 2:45 ` [PATCH v2 4/8] net/mana: fix PD resource leak on device close longli
2026-02-21 2:45 ` [PATCH v2 5/8] net/netvsc: fix devargs memory leak on hotplug longli
2026-02-21 2:45 ` [PATCH v2 6/8] net/mana: fix fast-path ops setup in secondary process longli
2026-02-21 2:45 ` [PATCH v2 7/8] net/mlx5: " longli
2026-02-21 2:45 ` [PATCH v2 8/8] net/mlx4: " longli
2026-02-21 5:16 ` [PATCH v2 1/8] net/netvsc: secondary ignore promiscuous enable/disable Stephen Hemminger
2026-02-23 23:25 ` [EXTERNAL] " Long Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260223094358.58f5b0f4@phoenix.local \
--to=stephen@networkplumber.org \
--cc=bingz@nvidia.com \
--cc=dev@dpdk.org \
--cc=dsosnowski@nvidia.com \
--cc=longli@linux.microsoft.com \
--cc=longli@microsoft.com \
--cc=matan@nvidia.com \
--cc=orika@nvidia.com \
--cc=stable@dpdk.org \
--cc=suanmingm@nvidia.com \
--cc=viacheslavo@nvidia.com \
--cc=weh@microsoft.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox