All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: longli@linux.microsoft.com
Cc: dev@dpdk.org, Wei Hu <weh@microsoft.com>,
	stable@dpdk.org, Dariusz Sosnowski <dsosnowski@nvidia.com>,
	Viacheslav Ovsiienko <viacheslavo@nvidia.com>,
	Bing Zhao <bingz@nvidia.com>, Ori Kam <orika@nvidia.com>,
	Suanming Mou <suanmingm@nvidia.com>,
	Matan Azrad <matan@nvidia.com>, Long Li <longli@microsoft.com>
Subject: Re: [PATCH v2 2/8] net/netvsc: fix race conditions on VF add/remove events
Date: Mon, 23 Feb 2026 09:43:58 -0800	[thread overview]
Message-ID: <20260223094358.58f5b0f4@phoenix.local> (raw)
In-Reply-To: <20260221024540.659098-2-longli@linux.microsoft.com>

On Fri, 20 Feb 2026 18:45:21 -0800
longli@linux.microsoft.com wrote:

> From: Long Li <longli@microsoft.com>
> 
> Netvsc gets notification from VSP on VF add/remove over VMBUS, but the
> timing may not match the DPDK sequence of device events triggered from
> uevents from kernel.
> 
> Remove the retry logic from the code when attach to VF and rely on DPDK
> event to attach to VF. With this change, both the notifications from VSP
> and the DPDK will attempt a VF attach.
> 
> Also implement locking when checking on all VF related fields.
> 
> Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Long Li <longli@microsoft.com>

AI review spotted related issue.

**Patch 2 (net/netvsc: fix race conditions on VF add/remove events)** — the most complex patch in the series.

**What it fixes correctly:**

The old Tx/Rx paths had a TOCTOU race: they checked `vf_vsc_switched` without the lock, acquired the lock, then re-checked. A VF remove could complete between the first check and the lock acquisition. The new code takes the read lock *before* any VF state checks — correct fix. The lock is properly released on both paths.

The upgrade of `hn_vf_close()` from read lock to write lock is also a real bug fix, since it modifies `vf_attached` and calls `rte_eth_dev_close()`.

Moving callback registration into `hn_vf_attach()` with proper rollback (via the new `hn_vf_detach()` helper) is a good structural improvement that ties callback lifetime to VF attach/detach lifecycle.

The unconditional clear of `vf_vsc_switched` in `hn_vf_remove_unlocked()` is correct — if the VF is being removed, the switched flag must be cleared regardless of whether `hn_nvs_set_datapath(SYNTHETIC)` succeeded.

**One potential concern (~60% confidence):**

If the VF is successfully configured and started but `hn_nvs_set_datapath(VF)` fails at the `switch_data_path:` label, the function returns an error but leaves the VF started and attached. The callers don't clean this up. This is a pre-existing design issue the patch doesn't worsen, and the hypervisor may retry, but it could confuse subsequent add/remove cycles.

  parent reply	other threads:[~2026-02-23 17:44 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-21  2:45 [PATCH v2 1/8] net/netvsc: secondary ignore promiscuous enable/disable longli
2026-02-21  2:45 ` [PATCH v2 2/8] net/netvsc: fix race conditions on VF add/remove events longli
2026-02-23 17:40   ` Stephen Hemminger
2026-02-23 17:43   ` Stephen Hemminger [this message]
2026-02-23 22:31     ` [EXTERNAL] " Long Li
2026-02-21  2:45 ` [PATCH v2 3/8] net/netvsc: add multi-process VF device removal support longli
2026-02-23 17:45   ` Stephen Hemminger
2026-02-23 22:32     ` [EXTERNAL] " Long Li
2026-02-21  2:45 ` [PATCH v2 4/8] net/mana: fix PD resource leak on device close longli
2026-02-21  2:45 ` [PATCH v2 5/8] net/netvsc: fix devargs memory leak on hotplug longli
2026-02-21  2:45 ` [PATCH v2 6/8] net/mana: fix fast-path ops setup in secondary process longli
2026-02-21  2:45 ` [PATCH v2 7/8] net/mlx5: " longli
2026-02-21  2:45 ` [PATCH v2 8/8] net/mlx4: " longli
2026-02-21  5:16 ` [PATCH v2 1/8] net/netvsc: secondary ignore promiscuous enable/disable Stephen Hemminger
2026-02-23 23:25   ` [EXTERNAL] " Long Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260223094358.58f5b0f4@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=bingz@nvidia.com \
    --cc=dev@dpdk.org \
    --cc=dsosnowski@nvidia.com \
    --cc=longli@linux.microsoft.com \
    --cc=longli@microsoft.com \
    --cc=matan@nvidia.com \
    --cc=orika@nvidia.com \
    --cc=stable@dpdk.org \
    --cc=suanmingm@nvidia.com \
    --cc=viacheslavo@nvidia.com \
    --cc=weh@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.