public inbox for dev@dpdk.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: longli@linux.microsoft.com
Cc: dev@dpdk.org, Wei Hu <weh@microsoft.com>,
	stable@dpdk.org, Dariusz Sosnowski <dsosnowski@nvidia.com>,
	Viacheslav Ovsiienko <viacheslavo@nvidia.com>,
	Bing Zhao <bingz@nvidia.com>, Ori Kam <orika@nvidia.com>,
	Suanming Mou <suanmingm@nvidia.com>,
	Matan Azrad <matan@nvidia.com>, Long Li <longli@microsoft.com>
Subject: Re: [PATCH v2 2/8] net/netvsc: fix race conditions on VF add/remove events
Date: Mon, 23 Feb 2026 09:43:58 -0800	[thread overview]
Message-ID: <20260223094358.58f5b0f4@phoenix.local> (raw)
In-Reply-To: <20260221024540.659098-2-longli@linux.microsoft.com>

On Fri, 20 Feb 2026 18:45:21 -0800
longli@linux.microsoft.com wrote:

> From: Long Li <longli@microsoft.com>
> 
> Netvsc gets notification from VSP on VF add/remove over VMBUS, but the
> timing may not match the DPDK sequence of device events triggered from
> uevents from kernel.
> 
> Remove the retry logic from the code when attach to VF and rely on DPDK
> event to attach to VF. With this change, both the notifications from VSP
> and the DPDK will attempt a VF attach.
> 
> Also implement locking when checking on all VF related fields.
> 
> Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Long Li <longli@microsoft.com>

AI review spotted related issue.

**Patch 2 (net/netvsc: fix race conditions on VF add/remove events)** — the most complex patch in the series.

**What it fixes correctly:**

The old Tx/Rx paths had a TOCTOU race: they checked `vf_vsc_switched` without the lock, acquired the lock, then re-checked. A VF remove could complete between the first check and the lock acquisition. The new code takes the read lock *before* any VF state checks — correct fix. The lock is properly released on both paths.

The upgrade of `hn_vf_close()` from read lock to write lock is also a real bug fix, since it modifies `vf_attached` and calls `rte_eth_dev_close()`.

Moving callback registration into `hn_vf_attach()` with proper rollback (via the new `hn_vf_detach()` helper) is a good structural improvement that ties callback lifetime to VF attach/detach lifecycle.

The unconditional clear of `vf_vsc_switched` in `hn_vf_remove_unlocked()` is correct — if the VF is being removed, the switched flag must be cleared regardless of whether `hn_nvs_set_datapath(SYNTHETIC)` succeeded.

**One potential concern (~60% confidence):**

If the VF is successfully configured and started but `hn_nvs_set_datapath(VF)` fails at the `switch_data_path:` label, the function returns an error but leaves the VF started and attached. The callers don't clean this up. This is a pre-existing design issue the patch doesn't worsen, and the hypervisor may retry, but it could confuse subsequent add/remove cycles.

  parent reply	other threads:[~2026-02-23 17:44 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-21  2:45 [PATCH v2 1/8] net/netvsc: secondary ignore promiscuous enable/disable longli
2026-02-21  2:45 ` [PATCH v2 2/8] net/netvsc: fix race conditions on VF add/remove events longli
2026-02-23 17:40   ` Stephen Hemminger
2026-02-23 17:43   ` Stephen Hemminger [this message]
2026-02-23 22:31     ` [EXTERNAL] " Long Li
2026-02-21  2:45 ` [PATCH v2 3/8] net/netvsc: add multi-process VF device removal support longli
2026-02-23 17:45   ` Stephen Hemminger
2026-02-23 22:32     ` [EXTERNAL] " Long Li
2026-02-21  2:45 ` [PATCH v2 4/8] net/mana: fix PD resource leak on device close longli
2026-02-21  2:45 ` [PATCH v2 5/8] net/netvsc: fix devargs memory leak on hotplug longli
2026-02-21  2:45 ` [PATCH v2 6/8] net/mana: fix fast-path ops setup in secondary process longli
2026-02-21  2:45 ` [PATCH v2 7/8] net/mlx5: " longli
2026-02-21  2:45 ` [PATCH v2 8/8] net/mlx4: " longli
2026-02-21  5:16 ` [PATCH v2 1/8] net/netvsc: secondary ignore promiscuous enable/disable Stephen Hemminger
2026-02-23 23:25   ` [EXTERNAL] " Long Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260223094358.58f5b0f4@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=bingz@nvidia.com \
    --cc=dev@dpdk.org \
    --cc=dsosnowski@nvidia.com \
    --cc=longli@linux.microsoft.com \
    --cc=longli@microsoft.com \
    --cc=matan@nvidia.com \
    --cc=orika@nvidia.com \
    --cc=stable@dpdk.org \
    --cc=suanmingm@nvidia.com \
    --cc=viacheslavo@nvidia.com \
    --cc=weh@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox