Re: [PATCH v2 0/2] net/netvsc: runtime queue reconfiguration and fixes

public inbox for dev@dpdk.org
 help / color / mirror / Atom feed

From: Stephen Hemminger <stephen@networkplumber.org>
To: Long Li <longli@microsoft.com>
Cc: dev@dpdk.org, weh@microsoft.com, stable@dpdk.org
Subject: Re: [PATCH v2 0/2] net/netvsc: runtime queue reconfiguration and fixes
Date: Mon, 2 Mar 2026 11:38:45 -0800	[thread overview]
Message-ID: <20260302113845.1dc61524@phoenix.local> (raw)
In-Reply-To: <20260226042414.965994-1-longli@microsoft.com>

On Wed, 25 Feb 2026 20:24:12 -0800
Long Li <longli@microsoft.com> wrote:

> This series adds support for runtime queue count reconfiguration in the
> netvsc PMD, along with a prerequisite bug fix.
> 
> Patch 1 fixes a pre-existing subchannel resource leak in
> eth_hn_dev_uninit() where subchannels are never closed before the
> primary channel.
> 
> Patch 2 adds runtime queue count reconfiguration via port
> stop/configure/start, with full NVS/RNDIS session teardown and reinit
> when the queue count changes.
> 
> v2:
> - Split subchannel leak fix into separate patch with Fixes tag (patch 1)
> - Fix reinit_failed recovery: re-map device before chan_open when device
>   is unmapped to prevent undefined behavior on unmapped ring buffers
> - Move hn_rndis_conf_offload() to after reinit block so offload config
>   targets the final RNDIS session instead of being lost on teardown
> - Use write lock in hn_vf_tx/rx_queue_release() to prevent race with
>   concurrent fast-path readers holding read lock
> - Reset RSS indirection table to queue 0 in subchan_cleanup error path
> - Fix multi-line comment style to follow DPDK convention
> 
> Long Li (2):
>   net/netvsc: fix subchannel leak on device removal
>   net/netvsc: support runtime queue count reconfiguration
> 
>  drivers/net/netvsc/hn_ethdev.c | 181 +++++++++++++++++++++++++++++++--
>  drivers/net/netvsc/hn_vf.c     |  16 +--
>  2 files changed, 181 insertions(+), 16 deletions(-)
> 

After a couple of cycles with Claude, reduced the issues to more concise summary.
Work is still needed.

---

## Patch 1: "fix subchannel leak on device removal"

Adds a loop in `eth_hn_dev_uninit` to close subchannels before closing the primary channel. This fixes a real resource leak I didn't explicitly call out (subchannels allocated in `hn_subchan_configure` were never closed on removal).

## Patch 2: "support runtime queue count reconfiguration"

Major rework of `hn_dev_configure` to support changing queue counts at runtime, plus TX drain in `hn_dev_stop`, plus write-lock fix in VF queue release.

---

## Re-evaluation of my original findings

### Finding #1: Double-free of `hv->primary` — STILL PRESENT

Neither patch changes this. The flow is still:

1. `eth_hn_dev_uninit` → `hn_dev_close` → `hn_dev_free_queues` which calls `hn_rx_queue_free(rxq, false)` — this frees `hv->primary` via `hn_rx_queue_free_common`
2. `eth_hn_dev_uninit` then does `rte_free(hv->primary)` at line 1448 — double-free

### Finding #2: Init error path leaks `hv->primary` and `channels[0]` — STILL PRESENT

The `failed:` label in `eth_hn_dev_init` still only calls `hn_chim_uninit` and `hn_detach`. It never frees `hv->primary` or closes `hv->channels[0]`.

### Finding #3: `hv->primary` alloc failure leaks `channels[0]` — STILL PRESENT

Lines 1376-1380 are untouched.

### Finding #4: `max_chan <= 0` jumps to `failed` with `err` possibly zero — STILL PRESENT

Lines 1406-1407 are untouched. `err` is 0 from the successful `hn_rndis_get_eaddr` call, so the `failed:` path tears down the device and returns success.

### Finding #5: `hn_dev_start` leaks event callback on rxfilter failure — STILL PRESENT

Lines 1035-1047 are untouched.

---

## New issues introduced by Patch 2

### New Issue A: `hn_dev_configure` reinit path doesn't update `hv->primary->chan`

When the reinit path closes the primary channel and opens a new one (lines in the patch around `rte_vmbus_chan_close(hv->channels[0])` ... `rte_vmbus_chan_open(hv->vmbus, &hv->channels[0])`), the `hv->primary` rx queue struct still holds a stale `chan` pointer from the old channel. `hv->primary->chan` was set during `hn_rx_queue_alloc` at init time and is never updated here. The next control operation using `hv->primary` (which uses `hn_primary_chan(hv)` indirectly through `hn_nvs_execute` → `rxq->ring_lock`) would operate on the new channel correctly because `hn_primary_chan()` reads from `hv->channels[0]`, but `hv->primary->chan` is stale. This matters if anything uses `rxq->chan` directly for the primary queue.

### New Issue B: `hn_chim_uninit`/`hn_chim_init` not called during reinit

The reinit path in `hn_dev_configure` calls `hn_detach` (which calls `hn_nvs_detach` → `hn_nvs_disconn_rxbuf` + `hn_nvs_disconn_chim`) and then `hn_attach` (which calls `hn_nvs_attach` → `hn_nvs_conn_chim`). After reinit, `hv->chim_cnt` and `hv->chim_szmax` get updated by `hn_nvs_conn_chim`, but the chimney bitmap (`hv->chim_bmap`) still reflects the old allocation state. If `chim_cnt` changes across the reinit, the bitmap is the wrong size. This could lead to out-of-bounds bitmap access.

### New Issue C: `reinit_failed` recovery doesn't restore `hv->primary->rxbuf_info`

When `hn_attach` succeeds in the reinit path, `hn_nvs_conn_rxbuf` allocates a *new* `rxbuf_info` for `hv->primary`. But `hn_detach` (called earlier) doesn't free the old `rxbuf_info`. So the old one leaks, and the pointer is overwritten. In the recovery path, if `hn_attach` succeeds there too, another `rxbuf_info` is allocated, leaking the previous one.

### New Issue D: VF queue release NULL-sets VF queues but VF setup doesn't re-set them

In `hn_vf_tx_queue_release`, the patch adds `vf_dev->data->tx_queues[queue_id] = NULL`. This is good for preventing use-after-free. However, the corresponding `hn_vf_tx_queue_setup` calls `rte_eth_tx_queue_setup` on the VF which internally sets the VF's queue pointer. So this should be fine on re-setup — not a bug, just noting the asymmetry is intentional.

---

## Summary

All five original findings remain unfixed by these patches. The patches address different (legitimate) problems: subchannel leaks and runtime queue reconfig. The most critical outstanding issue is still the double-free of `hv->primary` (#1), and the new reinit path in Patch 2 introduces potential concerns around stale chimney bitmap state (#B) and leaked `rxbuf_info` (#C).

next prev parent reply	other threads:[~2026-03-02 19:38 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-26  4:24 [PATCH v2 0/2] net/netvsc: runtime queue reconfiguration and fixes Long Li
2026-02-26  4:24 ` [PATCH v2 1/2] net/netvsc: fix subchannel leak on device removal Long Li
2026-02-26  4:24 ` [PATCH v2 2/2] net/netvsc: support runtime queue count reconfiguration Long Li
2026-03-02 19:38 ` Stephen Hemminger [this message]
2026-03-03  2:55   ` [EXTERNAL] Re: [PATCH v2 0/2] net/netvsc: runtime queue reconfiguration and fixes Long Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260302113845.1dc61524@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=dev@dpdk.org \
    --cc=longli@microsoft.com \
    --cc=stable@dpdk.org \
    --cc=weh@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox