[PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears

* [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears
@ 2026-05-06  2:05 Long Li
  2026-05-06  2:05 ` [PATCH v2 2/7] net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug Long Li
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Long Li @ 2026-05-06  2:05 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

After PCI rescan on Azure, the MANA kernel driver can take over 100
seconds to probe and create the /sys/bus/pci/devices/<dev>/net directory.
The previous fixed retry limit (NETVSC_MAX_HOTADD_RETRY=10, ~12 seconds)
was insufficient, causing VF re-attach to fail with 'Failed to parse PCI
device' on systems with slow MANA driver initialization.

Replace the fixed retry limit with an indefinite retry that only gives up
when the PCI device itself disappears from sysfs. This is safe because:

- The retry uses rte_eal_alarm callbacks which are serialized on the EAL
  interrupt thread, preventing races with VF remove or device close paths.
- Device close (eth_hn_dev_uninit) explicitly cancels all pending hotplug
  alarms via rte_eal_alarm_cancel and frees the context.
- If the PCI device is removed while retrying, access() detects the
  missing sysfs path and stops immediately.

A periodic NOTICE log every 30 retries (~30s) provides visibility into
long waits without flooding the log at DEBUG level.

Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
v2:
- Added detailed comment explaining why indefinite retry is
  safe (PCI sysfs disappearance is the termination condition)

 drivers/net/netvsc/hn_ethdev.c | 39 +++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index b8880edb4c..34040b3e57 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -89,8 +89,8 @@ struct netvsc_mp_param {
 #define NETVSC_ARG_TXBREAK "tx_copybreak"
 #define NETVSC_ARG_RX_EXTMBUF_ENABLE "rx_extmbuf_enable"
 
-/* The max number of retry when hot adding a VF device */
-#define NETVSC_MAX_HOTADD_RETRY 10
+/* Retry interval for hot-add VF device (microseconds) */
+#define NETVSC_HOTADD_RETRY_INTERVAL 1000000
 
 struct hn_xstats_name_off {
 	char name[RTE_ETH_XSTATS_NAME_SIZE];
@@ -622,19 +622,38 @@ static void netvsc_hotplug_retry(void *args)
 	PMD_DRV_LOG(DEBUG, "%s: retry count %d",
 		    __func__, hot_ctx->eal_hot_plug_retry);
 
-	if (hot_ctx->eal_hot_plug_retry++ > NETVSC_MAX_HOTADD_RETRY) {
-		PMD_DRV_LOG(NOTICE, "Failed to parse PCI device retry=%d",
-			    hot_ctx->eal_hot_plug_retry);
+	hot_ctx->eal_hot_plug_retry++;
+
+	/* Check if PCI device still exists — if it disappeared, give up.
+	 * Otherwise keep retrying indefinitely until the net directory
+	 * appears.  This is safe because:
+	 * - MANA driver probe can take >100s after PCI rescan
+	 * - The retry uses rte_eal_alarm callbacks serialized on the
+	 *   EAL interrupt thread, preventing races with device close
+	 * - Device close cancels pending alarms and frees the context
+	 * - If the PCI device is removed, the access() check below
+	 *   detects the missing sysfs path and stops immediately
+	 */
+	snprintf(buf, sizeof(buf), "/sys/bus/pci/devices/%s", d->name);
+	if (access(buf, F_OK) != 0) {
+		PMD_DRV_LOG(NOTICE,
+			    "PCI device %s no longer exists, giving up after %d retries",
+			    d->name, hot_ctx->eal_hot_plug_retry);
 		goto free_hotadd_ctx;
 	}
 
 	snprintf(buf, sizeof(buf), "/sys/bus/pci/devices/%s/net", d->name);
 	di = opendir(buf);
 	if (!di) {
-		PMD_DRV_LOG(DEBUG, "%s: can't open directory %s, "
-			    "retrying in 1 second", __func__, buf);
-		/* The device is still being initialized, retry after 1 second */
-		rte_eal_alarm_set(1000000, netvsc_hotplug_retry, hot_ctx);
+		if (hot_ctx->eal_hot_plug_retry % 30 == 0)
+			PMD_DRV_LOG(NOTICE,
+				    "%s: waiting for %s (retry %d, %ds elapsed)",
+				    __func__, buf, hot_ctx->eal_hot_plug_retry,
+				    hot_ctx->eal_hot_plug_retry);
+		else
+			PMD_DRV_LOG(DEBUG, "%s: can't open directory %s, "
+				    "retrying in 1 second", __func__, buf);
+		rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL, netvsc_hotplug_retry, hot_ctx);
 		return;
 	}
 
@@ -758,7 +777,7 @@ netvsc_hotadd_callback(const char *device_name, enum rte_dev_event_type type,
 			rte_spinlock_lock(&hv->hotadd_lock);
 			LIST_INSERT_HEAD(&hv->hotadd_list, hot_ctx, list);
 			rte_spinlock_unlock(&hv->hotadd_lock);
-			rte_eal_alarm_set(1000000, netvsc_hotplug_retry, hot_ctx);
+			rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL, netvsc_hotplug_retry, hot_ctx);
 			return;
 		}
 
-- 
2.43.0
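
Editorial sketch, not part of the patch: the termination rule described in
this commit message can be isolated as a small decision function. The names
hotadd_action, hotadd_next_action and hotadd_log_is_notice are hypothetical.
The real logic lives in netvsc_hotplug_retry(), which uses opendir() rather
than stat() and reschedules itself with rte_eal_alarm_set() instead of
returning a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical names, for illustration only. */
enum hotadd_action {
	HOTADD_GIVE_UP,	/* PCI device vanished from sysfs: stop retrying */
	HOTADD_RETRY,	/* device present, net/ directory not there yet */
	HOTADD_PROCEED,	/* net/ directory exists: go on to scan interfaces */
};

/*
 * pci_path stands for /sys/bus/pci/devices/<dev>,
 * net_path for /sys/bus/pci/devices/<dev>/net.
 */
static enum hotadd_action
hotadd_next_action(const char *pci_path, const char *net_path)
{
	struct stat st;

	assert(pci_path != NULL && net_path != NULL);

	/* The only give-up condition: the PCI device itself is gone. */
	if (access(pci_path, F_OK) != 0)
		return HOTADD_GIVE_UP;

	/* Kernel driver still probing: keep waiting, with no upper bound. */
	if (stat(net_path, &st) != 0 || !S_ISDIR(st.st_mode))
		return HOTADD_RETRY;

	return HOTADD_PROCEED;
}

/* Mirrors the "NOTICE every 30 retries, DEBUG otherwise" rule. */
static bool
hotadd_log_is_notice(int retry)
{
	return retry % 30 == 0;
}
```

In the patch the retry counter is incremented before the check, so the
first NOTICE appears at retry 30, roughly 30 seconds in with the 1-second
NETVSC_HOTADD_RETRY_INTERVAL.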



* [PATCH v2 2/7] net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug
  2026-05-06  2:05 [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
@ 2026-05-06  2:05 ` Long Li
  2026-05-06  2:05 ` [PATCH v2 3/7] net/netvsc: retry full probe when IB device not ready during hotplug Long Li
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-06  2:05 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

When the MANA VF net directory appears after PCI rescan, udev may rename
the interface (e.g. eth1 → ens1) before DPDK can query its MAC address
via SIOCGIFHWADDR. The ioctl fails because the interface name is stale
during the rename window.

Instead of giving up when SIOCGIFHWADDR fails, close the directory and
schedule another retry. The next attempt will re-read the directory with
the updated interface name (e.g. ens1 instead of eth1) and succeed.

This was observed on Azure VMs where the MANA kernel driver takes >30
seconds to probe after PCI rescan, and udev renames the interface
immediately after registration.

Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
v2:
- Changed SIOCGIFHWADDR retry log level from NOTICE to DEBUG
- Updated comment to reference PCI device check as safety bound

 drivers/net/netvsc/hn_ethdev.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 34040b3e57..1fa64cab18 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -673,10 +673,18 @@ static void netvsc_hotplug_retry(void *args)
 		ret = ioctl(s, SIOCGIFHWADDR, &req);
 		close(s);
 		if (ret == -1) {
-			PMD_DRV_LOG(ERR,
-				    "Failed to send SIOCGIFHWADDR for device %s",
+			/* Interface may be renamed by udev (e.g. eth1 → ens1).
+			 * Retry from the top — the PCI device check above
+			 * ensures we stop if the device disappears.
+			 */
+			PMD_DRV_LOG(DEBUG,
+				    "Failed to send SIOCGIFHWADDR for device %s, "
+				    "interface may be renaming, retrying",
 				    dir->d_name);
-			break;
+			closedir(di);
+			rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
+					  netvsc_hotplug_retry, hot_ctx);
+			return;
 		}
 		if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER)
 			continue;
-- 
2.43.0



* [PATCH v2 3/7] net/netvsc: retry full probe when IB device not ready during hotplug
  2026-05-06  2:05 [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
  2026-05-06  2:05 ` [PATCH v2 2/7] net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug Long Li
@ 2026-05-06  2:05 ` Long Li
  2026-05-06  2:05 ` [PATCH v2 4/7] net/netvsc: add debug logging for VF hotplug retry Long Li
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-06  2:05 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

When rte_eal_hotplug_add returns -ENODEV during VF hot-add, it means the
MANA IB/verbs device is not yet registered by the mana_ib kernel module.
This happens because the mana_ib auxiliary driver probes asynchronously
after the MANA net driver creates the network interface.

On Azure VMs, the gap between netdev registration and IB device
registration can be several seconds. Previously, netvsc would log the
error and give up after finding the matching MAC address.

Now, on -ENODEV, restart the full retry loop from the PCI device
existence check. This re-scans the net directory to pick up any
interface renames (e.g. eth1 -> ens1) and retries until the IB device
is ready.

The -EEXIST return (device already probed by another netvsc port on the
same PCI device) is handled silently, as hn_vf_add will find the
already-probed VF port.

Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
v2:
- Restored ERR log level for non-ENODEV/non-EEXIST failures
  from rte_eal_hotplug_add (was incorrectly lowered to NOTICE)
- Added comment explaining why ENODEV retry is safe
  (bounded by PCI sysfs disappearance)

 drivers/net/netvsc/hn_ethdev.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 1fa64cab18..130fea38ab 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -716,17 +716,36 @@ static void netvsc_hotplug_retry(void *args)
 			 * parent device, restore its args.
 			 */
 			ret = rte_eal_hotplug_add(d->bus->name, d->name, drv_str ? drv_str : "");
-			if (ret) {
-				PMD_DRV_LOG(ERR,
-					    "Failed to add PCI device %s",
+			free(drv_str);
+
+			if (ret == -ENODEV) {
+				/* IB device not ready yet (mana_ib not probed).
+				 * Restart the full retry from the PCI device
+				 * check so we re-verify the device exists and
+				 * get fresh interface names after any renames.
+				 * This retries indefinitely — the PCI sysfs
+				 * check at the top of this function ensures
+				 * we stop if the device disappears.
+				 */
+				PMD_DRV_LOG(NOTICE,
+					    "IB device not ready for %s, "
+					    "restarting probe in 1 second",
 					    d->name);
+				closedir(di);
+				rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
+						  netvsc_hotplug_retry,
+						  hot_ctx);
+				return;
 			}
 
-			free(drv_str);
+			if (ret && ret != -EEXIST)
+				PMD_DRV_LOG(ERR,
+					    "Failed to add PCI device %s (ret=%d)",
+					    d->name, ret);
 
 			ret = hn_vf_add(dev, hv);
 			if (ret)
-				PMD_DRV_LOG(ERR, "Failed to add VF in hotplug retry: %d", ret);
+				PMD_DRV_LOG(ERR, "Failed to add VF: %d", ret);
 			break;
 		}
 	}
-- 
2.43.0
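
Editorial sketch, not part of the patch: the three-way handling of the
rte_eal_hotplug_add() return value described above, written as a pure
function. The enum and function names are hypothetical, and the sketch
assumes the return value is 0 or a negative errno.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names, for illustration only. */
enum probe_outcome {
	PROBE_RETRY,		/* -ENODEV: IB device not registered yet */
	PROBE_CONTINUE,		/* 0 or -EEXIST: go on to hn_vf_add() */
	PROBE_CONTINUE_LOG_ERR,	/* other error: log at ERR, still try hn_vf_add() */
};

static enum probe_outcome
classify_hotplug_add(int ret)
{
	/* Assumption: the call reports 0 on success, negative on error. */
	assert(ret <= 0);

	if (ret == -ENODEV)
		return PROBE_RETRY;
	if (ret == 0 || ret == -EEXIST)
		return PROBE_CONTINUE;
	return PROBE_CONTINUE_LOG_ERR;
}
```

Only PROBE_RETRY reschedules the alarm. The other two outcomes fall through
to hn_vf_add(), matching the control flow in the diff.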



* [PATCH v2 4/7] net/netvsc: add debug logging for VF hotplug retry
  2026-05-06  2:05 [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
  2026-05-06  2:05 ` [PATCH v2 2/7] net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug Long Li
  2026-05-06  2:05 ` [PATCH v2 3/7] net/netvsc: retry full probe when IB device not ready during hotplug Long Li
@ 2026-05-06  2:05 ` Long Li
  2026-05-06  2:05 ` [PATCH v2 5/7] net/netvsc: retry when no matching MAC found in net directory Long Li
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-06  2:05 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li

Add detailed DEBUG-level logging at every decision point in the
netvsc_hotplug_retry function to diagnose VF re-attach failures:

- Log each interface found in the net/ directory
- Log when sa_family is not ARPHRD_ETHER
- Log MAC address comparison details on mismatch using
  RTE_ETHER_ADDR_PRT_FMT for consistency with the rest of DPDK
- Log when the retry loop exits (with retry count)

Per-iteration trace uses DEBUG level to avoid flooding the log on
multi-NIC VMs with indefinite retries; NOTICE is reserved for
one-shot state transitions.

These logs help correlate DPDK hotplug retry behavior with kernel
dmesg timestamps to identify timing issues during VF re-attach
after PCI rescan on Azure.

Signed-off-by: Long Li <longli@microsoft.com>
---
v2:
- Changed all per-iteration log calls from NOTICE to DEBUG to
  avoid flooding logs on multi-NIC VMs with indefinite retries
- Replaced manual MAC format with RTE_ETHER_ADDR_PRT_FMT and
  RTE_ETHER_ADDR_BYTES macros
- Removed redundant else after break in MAC match block
- Renamed patch from "NOTICE-level" to "debug logging"

 drivers/net/netvsc/hn_ethdev.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 130fea38ab..d1c12ca9d5 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -662,6 +662,11 @@ static void netvsc_hotplug_retry(void *args)
 		if (!strcmp(dir->d_name, ".") || !strcmp(dir->d_name, ".."))
 			continue;
 
+		PMD_DRV_LOG(DEBUG,
+			    "%s: checking interface %s in %s (retry %d)",
+			    __func__, dir->d_name, buf,
+			    hot_ctx->eal_hot_plug_retry);
+
 		/* trying to get mac address if this is a network device*/
 		s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
 		if (s == -1) {
@@ -686,8 +691,12 @@ static void netvsc_hotplug_retry(void *args)
 					  netvsc_hotplug_retry, hot_ctx);
 			return;
 		}
-		if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER)
+		if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER) {
+			PMD_DRV_LOG(DEBUG,
+				    "%s: device %s sa_family=%d not ARPHRD_ETHER, skipping",
+				    __func__, dir->d_name, req.ifr_hwaddr.sa_family);
 			continue;
+		}
 
 		memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data,
 		       RTE_DIM(eth_addr.addr_bytes));
@@ -748,12 +757,23 @@ static void netvsc_hotplug_retry(void *args)
 				PMD_DRV_LOG(ERR, "Failed to add VF: %d", ret);
 			break;
 		}
+
+		PMD_DRV_LOG(DEBUG,
+			    "%s: MAC mismatch for %s: got "
+			    RTE_ETHER_ADDR_PRT_FMT
+			    " expected " RTE_ETHER_ADDR_PRT_FMT,
+			    __func__, dir->d_name,
+			    RTE_ETHER_ADDR_BYTES(&eth_addr),
+			    RTE_ETHER_ADDR_BYTES(dev->data->mac_addrs));
 	}
 
 free_hotadd_ctx:
 	if (di)
 		closedir(di);
 
+	PMD_DRV_LOG(NOTICE, "%s: retry loop exiting for device %s (retry %d)",
+		    __func__, d->name, hot_ctx->eal_hot_plug_retry);
+
 	rte_spinlock_lock(&hv->hotadd_lock);
 	LIST_REMOVE(hot_ctx, list);
 	rte_spinlock_unlock(&hv->hotadd_lock);
-- 
2.43.0



* [PATCH v2 5/7] net/netvsc: retry when no matching MAC found in net directory
  2026-05-06  2:05 [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
                   ` (2 preceding siblings ...)
  2026-05-06  2:05 ` [PATCH v2 4/7] net/netvsc: add debug logging for VF hotplug retry Long Li
@ 2026-05-06  2:05 ` Long Li
  2026-05-06  2:05 ` [PATCH v2 6/7] net/netvsc: forward per-queue stats from VF device Long Li
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-06  2:05 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

On multi-NIC Azure VMs, a single MANA PCI device (7870:00:00.0) hosts
multiple VF interfaces. After PCI rescan, these interfaces register
at different times — the management NIC's VF appears first, followed
by the test NIC's VF.

Previously, when netvsc_hotplug_retry scanned the net/ directory and
found interfaces with non-matching MACs, it would exit the readdir
loop and free the hotadd context, permanently giving up. The matching
VF interface had not appeared yet.

Now, when the readdir loop ends without finding a matching MAC (dir
is NULL after loop), schedule another retry instead of giving up.
This uses a separate mac_retry counter (limit 120, ~2 minutes) so
the main retry loop remains unlimited.

Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
v2:
- Fixed commit message: "limit 30, ~30 seconds" corrected to
  "limit 120, ~2 minutes" to match NETVSC_MAX_MAC_RETRY=120
- Changed mac_retry log level from NOTICE to DEBUG
- Changed implicit pointer checks to explicit NULL comparisons
  (if (di) -> if (di != NULL), if (!dir) -> if (dir == NULL))

 drivers/net/netvsc/hn_ethdev.c | 36 +++++++++++++++++++++++++++++++++-
 drivers/net/netvsc/hn_var.h    |  1 +
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index d1c12ca9d5..08465489f2 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -92,6 +92,12 @@ struct netvsc_mp_param {
 /* Retry interval for hot-add VF device (microseconds) */
 #define NETVSC_HOTADD_RETRY_INTERVAL 1000000
 
+/* Max retries when net/ directory exists but no matching MAC found.
+ * On multi-NIC PCI devices, a second VF may register later.
+ * 120 retries = ~2 minutes.
+ */
+#define NETVSC_MAX_MAC_RETRY 120
+
 struct hn_xstats_name_off {
 	char name[RTE_ETH_XSTATS_NAME_SIZE];
 	unsigned int offset;
@@ -767,8 +773,36 @@ static void netvsc_hotplug_retry(void *args)
 			    RTE_ETHER_ADDR_BYTES(dev->data->mac_addrs));
 	}
 
+	/* If we opened the net directory but didn't find a matching MAC,
+	 * the VF interface may not have appeared yet (e.g. on a multi-NIC
+	 * PCI device, the second VF registers later). Retry.
+	 */
+	if (di != NULL) {
+		closedir(di);
+		di = NULL;
+		if (dir == NULL) {
+			/* readdir returned NULL — loop ended without match */
+			hot_ctx->mac_retry++;
+			if (hot_ctx->mac_retry < NETVSC_MAX_MAC_RETRY) {
+				PMD_DRV_LOG(DEBUG,
+					    "%s: no matching MAC found in %s, "
+					    "retrying in 1 second (mac_retry %d/%d)",
+					    __func__, buf,
+					    hot_ctx->mac_retry,
+					    NETVSC_MAX_MAC_RETRY);
+				rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
+						  netvsc_hotplug_retry,
+						  hot_ctx);
+				return;
+			}
+			PMD_DRV_LOG(NOTICE,
+				    "%s: no matching MAC found after %d retries, giving up",
+				    __func__, hot_ctx->mac_retry);
+		}
+	}
+
 free_hotadd_ctx:
-	if (di)
+	if (di != NULL)
 		closedir(di);
 
 	PMD_DRV_LOG(NOTICE, "%s: retry loop exiting for device %s (retry %d)",
diff --git a/drivers/net/netvsc/hn_var.h b/drivers/net/netvsc/hn_var.h
index ef55dee28e..574b909c82 100644
--- a/drivers/net/netvsc/hn_var.h
+++ b/drivers/net/netvsc/hn_var.h
@@ -127,6 +127,7 @@ struct hv_hotadd_context {
 	struct hn_data *hv;
 	struct rte_devargs da;
 	int eal_hot_plug_retry;
+	int mac_retry;
 };
 
 struct hn_data {
-- 
2.43.0



* [PATCH v2 6/7] net/netvsc: forward per-queue stats from VF device
  2026-05-06  2:05 [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
                   ` (3 preceding siblings ...)
  2026-05-06  2:05 ` [PATCH v2 5/7] net/netvsc: retry when no matching MAC found in net directory Long Li
@ 2026-05-06  2:05 ` Long Li
  2026-05-06  2:05 ` [PATCH v2 7/7] net/netvsc: handle VF recovery events for service reset Long Li
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-06  2:05 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

hn_vf_stats_get was ignoring the qstats parameter (__rte_unused),
calling rte_eth_stats_get which only collects aggregate stats. This
meant per-queue stats (rx_q0_good_packets, tx_q0_good_packets, etc.)
were always zero when VF datapath was active, even though the
underlying MANA driver populates them in its stats_get callback.

Call the VF device's stats_get op directly with the qstats pointer
so per-queue counters are forwarded through netvsc to the xstats
telemetry output.

Fixes: dc7680e8597c ("net/netvsc: support integrated VF")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
v2:
- Added comment explaining why dev_ops->stats_get is called
  directly (per-queue stats not available via public API)
- Added -ENOTSUP fallback to rte_eth_stats_get for VF drivers
  that do not support per-queue stats
- Changed implicit NULL check to explicit (stats_get != NULL)

 drivers/net/netvsc/hn_vf.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/netvsc/hn_vf.c b/drivers/net/netvsc/hn_vf.c
index 1fcc65a712..49e6a5b283 100644
--- a/drivers/net/netvsc/hn_vf.c
+++ b/drivers/net/netvsc/hn_vf.c
@@ -749,7 +749,7 @@ void hn_vf_rx_queue_release(struct hn_data *hv, uint16_t queue_id)
 
 int hn_vf_stats_get(struct rte_eth_dev *dev,
 		    struct rte_eth_stats *stats,
-		    struct eth_queue_stats *qstats __rte_unused)
+		    struct eth_queue_stats *qstats)
 {
 	struct hn_data *hv = dev->data->dev_private;
 	struct rte_eth_dev *vf_dev;
@@ -757,8 +757,22 @@ int hn_vf_stats_get(struct rte_eth_dev *dev,
 
 	rte_rwlock_read_lock(&hv->vf_lock);
 	vf_dev = hn_get_vf_dev(hv);
-	if (vf_dev)
-		ret = rte_eth_stats_get(vf_dev->data->port_id, stats);
+	if (vf_dev) {
+		/* Call dev_ops->stats_get directly instead of the public
+		 * rte_eth_stats_get API because we need to forward the
+		 * per-queue stats (qstats) which the public API does not
+		 * support.  Fall back to the public API if the VF driver
+		 * does not implement stats_get or returns -ENOTSUP.
+		 */
+		if (vf_dev->dev_ops->stats_get != NULL) {
+			ret = vf_dev->dev_ops->stats_get(vf_dev, stats, qstats);
+			if (ret == -ENOTSUP)
+				ret = rte_eth_stats_get(vf_dev->data->port_id,
+							stats);
+		} else {
+			ret = rte_eth_stats_get(vf_dev->data->port_id, stats);
+		}
+	}
 	rte_rwlock_read_unlock(&hv->vf_lock);
 	return ret;
 }
-- 
2.43.0



* [PATCH v2 7/7] net/netvsc: handle VF recovery events for service reset
  2026-05-06  2:05 [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
                   ` (4 preceding siblings ...)
  2026-05-06  2:05 ` [PATCH v2 6/7] net/netvsc: forward per-queue stats from VF device Long Li
@ 2026-05-06  2:05 ` Long Li
  2026-05-07  2:49 ` [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Stephen Hemminger
  2026-05-15 19:28 ` [PATCH v3 0/7] net/netvsc: fix VF hotplug and service reset handling Long Li
  7 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-06  2:05 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li

Register callbacks for RTE_ETH_EVENT_ERR_RECOVERING,
RTE_ETH_EVENT_RECOVERY_SUCCESS, and RTE_ETH_EVENT_RECOVERY_FAILED
events on the VF port to handle MANA service resets.

- On ERR_RECOVERING: switch data path to synthetic but keep the
  VF device attached in DPDK
- On RECOVERY_SUCCESS: switch data path back to VF
- On RECOVERY_FAILED: do full VF removal (same as INTR_RMV)
- Unregister all recovery callbacks during detach, removal, and
  close

This ensures that during a service reset (kernel suspend/resume
without PCI remove), netvsc keeps the VF attached and seamlessly
switches back to it after recovery, without requiring a PCI
hot-add event.

This change is compatible with the current behavior when no
service reset messages are received.

Signed-off-by: Long Li <longli@microsoft.com>
---
v2:
- Added dev_started check in recovery_success callback,
  mirroring hn_vf_add_unlocked to avoid switching data path
  to VF when device is stopped
- Added vf_attached guard in recovery_failed callback to
  prevent spurious removal after concurrent INTR_RMV
- Added comment in recovering callback explaining why direct
  vf_lock acquisition is safe (vs deferred in rmv callback)

 drivers/net/netvsc/hn_vf.c | 159 +++++++++++++++++++++++++++++++++++++
 1 file changed, 159 insertions(+)

diff --git a/drivers/net/netvsc/hn_vf.c b/drivers/net/netvsc/hn_vf.c
index 49e6a5b283..52e1bb7413 100644
--- a/drivers/net/netvsc/hn_vf.c
+++ b/drivers/net/netvsc/hn_vf.c
@@ -50,6 +50,13 @@ static int hn_vf_match(const struct rte_eth_dev *dev)
 }
 
 
+static int hn_eth_recovering_callback(uint16_t port_id,
+	enum rte_eth_event_type event, void *cb_arg, void *out);
+static int hn_eth_recovery_success_callback(uint16_t port_id,
+	enum rte_eth_event_type event, void *cb_arg, void *out);
+static int hn_eth_recovery_failed_callback(uint16_t port_id,
+	enum rte_eth_event_type event, void *cb_arg, void *out);
+
 /*
  * Attach new PCI VF device and return the port_id
  */
@@ -111,7 +118,56 @@ static int hn_vf_attach(struct rte_eth_dev *dev, struct hn_data *hv)
 		return ret;
 	}
 
+	/* Register recovery event callbacks for service reset handling */
+	ret = rte_eth_dev_callback_register(hv->vf_ctx.vf_port,
+					    RTE_ETH_EVENT_ERR_RECOVERING,
+					    hn_eth_recovering_callback, hv);
+	if (ret) {
+		PMD_DRV_LOG(ERR,
+			    "Registering recovering callback failed for vf port %d ret %d",
+			    port, ret);
+		goto err_recovering;
+	}
+
+	ret = rte_eth_dev_callback_register(hv->vf_ctx.vf_port,
+					    RTE_ETH_EVENT_RECOVERY_SUCCESS,
+					    hn_eth_recovery_success_callback, hv);
+	if (ret) {
+		PMD_DRV_LOG(ERR,
+			    "Registering recovery success callback failed for vf port %d ret %d",
+			    port, ret);
+		goto err_recovery_success;
+	}
+
+	ret = rte_eth_dev_callback_register(hv->vf_ctx.vf_port,
+					    RTE_ETH_EVENT_RECOVERY_FAILED,
+					    hn_eth_recovery_failed_callback, hv);
+	if (ret) {
+		PMD_DRV_LOG(ERR,
+			    "Registering recovery failed callback failed for vf port %d ret %d",
+			    port, ret);
+		goto err_recovery_failed;
+	}
+
 	return 0;
+
+err_recovery_failed:
+	rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+					RTE_ETH_EVENT_RECOVERY_SUCCESS,
+					hn_eth_recovery_success_callback, hv);
+err_recovery_success:
+	rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+					RTE_ETH_EVENT_ERR_RECOVERING,
+					hn_eth_recovering_callback, hv);
+err_recovering:
+	rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+					RTE_ETH_EVENT_INTR_RMV,
+					hn_eth_rmv_event_callback, hv);
+	hv->vf_ctx.vf_attached = false;
+	hv->vf_ctx.vf_port = 0;
+	if (rte_eth_dev_owner_unset(port, hv->owner.id) < 0)
+		PMD_DRV_LOG(ERR, "Failed to unset owner for port %d", port);
+	return ret;
 }
 
 static void hn_vf_remove_unlocked(struct hn_data *hv);
@@ -143,6 +199,12 @@ static void hn_remove_delayed(void *args)
 		PMD_DRV_LOG(ERR,
 			    "rte_eth_dev_callback_unregister failed ret=%d",
 			    ret);
+	rte_eth_dev_callback_unregister(port_id, RTE_ETH_EVENT_ERR_RECOVERING,
+					hn_eth_recovering_callback, hv);
+	rte_eth_dev_callback_unregister(port_id, RTE_ETH_EVENT_RECOVERY_SUCCESS,
+					hn_eth_recovery_success_callback, hv);
+	rte_eth_dev_callback_unregister(port_id, RTE_ETH_EVENT_RECOVERY_FAILED,
+					hn_eth_recovery_failed_callback, hv);
 
 	/* Detach and release port_id from system */
 	ret = rte_eth_dev_stop(port_id);
@@ -187,6 +249,85 @@ int hn_eth_rmv_event_callback(uint16_t port_id,
 	return 0;
 }
 
+/*
+ * Handle VF error recovery event from MANA PMD.
+ * Switch data path to synthetic but keep the VF attached.
+ *
+ * Unlike hn_eth_rmv_event_callback (which defers via rte_eal_alarm_set
+ * to break potential lock-order coupling), we acquire vf_lock directly
+ * here.  This is safe because the MANA PMD fires recovery events from
+ * its interrupt handler context without holding any lock that could
+ * overlap with vf_lock.
+ */
+static int
+hn_eth_recovering_callback(uint16_t port_id,
+			   enum rte_eth_event_type event __rte_unused,
+			   void *cb_arg, void *out __rte_unused)
+{
+	struct hn_data *hv = cb_arg;
+
+	PMD_DRV_LOG(NOTICE, "VF port %u recovering from error", port_id);
+
+	rte_rwlock_write_lock(&hv->vf_lock);
+	hn_vf_remove_unlocked(hv);
+	rte_rwlock_write_unlock(&hv->vf_lock);
+
+	return 0;
+}
+
+/*
+ * Handle VF recovery success event from MANA PMD.
+ * Switch data path back to VF.
+ */
+static int
+hn_eth_recovery_success_callback(uint16_t port_id,
+				 enum rte_eth_event_type event __rte_unused,
+				 void *cb_arg, void *out __rte_unused)
+{
+	struct hn_data *hv = cb_arg;
+	struct rte_eth_dev *dev = &rte_eth_devices[hv->port_id];
+	int ret;
+
+	PMD_DRV_LOG(NOTICE, "VF port %u recovery succeeded", port_id);
+
+	rte_rwlock_write_lock(&hv->vf_lock);
+	/* Only switch data path to VF if the netvsc device is started,
+	 * mirroring the check in hn_vf_add_unlocked.  If the device was
+	 * stopped during recovery, defer to hn_vf_start().
+	 */
+	if (dev->data->dev_started &&
+	    hv->vf_ctx.vf_attached && !hv->vf_ctx.vf_vsc_switched) {
+		ret = hn_nvs_set_datapath(hv, NVS_DATAPATH_VF);
+		if (ret)
+			PMD_DRV_LOG(ERR, "Failed to switch to VF after recovery");
+		else
+			hv->vf_ctx.vf_vsc_switched = true;
+	}
+	rte_rwlock_write_unlock(&hv->vf_lock);
+
+	return 0;
+}
+
+/*
+ * Handle VF recovery failure event from MANA PMD.
+ * VF is unusable, do full removal.
+ */
+static int
+hn_eth_recovery_failed_callback(uint16_t port_id,
+				enum rte_eth_event_type event __rte_unused,
+				void *cb_arg, void *out __rte_unused)
+{
+	struct hn_data *hv = cb_arg;
+
+	PMD_DRV_LOG(NOTICE, "VF port %u recovery failed, removing", port_id);
+
+	/* Guard against concurrent INTR_RMV that already detached the VF */
+	if (hv->vf_ctx.vf_attached)
+		rte_eal_alarm_set(1, hn_remove_delayed, hv);
+
+	return 0;
+}
+
 static int hn_setup_vf_queues(int port, struct rte_eth_dev *dev)
 {
 	struct hn_rx_queue *rx_queue;
@@ -247,6 +388,12 @@ static void hn_vf_detach(struct hn_data *hv)
 
 	rte_eth_dev_callback_unregister(port, RTE_ETH_EVENT_INTR_RMV,
 					hn_eth_rmv_event_callback, hv);
+	rte_eth_dev_callback_unregister(port, RTE_ETH_EVENT_ERR_RECOVERING,
+					hn_eth_recovering_callback, hv);
+	rte_eth_dev_callback_unregister(port, RTE_ETH_EVENT_RECOVERY_SUCCESS,
+					hn_eth_recovery_success_callback, hv);
+	rte_eth_dev_callback_unregister(port, RTE_ETH_EVENT_RECOVERY_FAILED,
+					hn_eth_recovery_failed_callback, hv);
 
 	if (rte_eth_dev_owner_unset(port, hv->owner.id) < 0)
 		PMD_DRV_LOG(ERR, "Failed to unset owner for port %d", port);
@@ -630,6 +777,18 @@ int hn_vf_close(struct rte_eth_dev *dev)
 						RTE_ETH_EVENT_INTR_RMV,
 						hn_eth_rmv_event_callback,
 						hv);
+		rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+						RTE_ETH_EVENT_ERR_RECOVERING,
+						hn_eth_recovering_callback,
+						hv);
+		rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+						RTE_ETH_EVENT_RECOVERY_SUCCESS,
+						hn_eth_recovery_success_callback,
+						hv);
+		rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+						RTE_ETH_EVENT_RECOVERY_FAILED,
+						hn_eth_recovery_failed_callback,
+						hv);
 		rte_eal_alarm_cancel(hn_remove_delayed, hv);
 		ret = rte_eth_dev_close(hv->vf_ctx.vf_port);
 		hv->vf_ctx.vf_attached = false;
-- 
2.43.0



* Re: [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears
  2026-05-06  2:05 [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
                   ` (5 preceding siblings ...)
  2026-05-06  2:05 ` [PATCH v2 7/7] net/netvsc: handle VF recovery events for service reset Long Li
@ 2026-05-07  2:49 ` Stephen Hemminger
  2026-05-15 19:45   ` [EXTERNAL] " Long Li
  2026-05-15 19:28 ` [PATCH v3 0/7] net/netvsc: fix VF hotplug and service reset handling Long Li
  7 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2026-05-07  2:49 UTC (permalink / raw)
  To: Long Li; +Cc: dev, Wei Hu, stable

On Tue,  5 May 2026 19:05:22 -0700
Long Li <longli@microsoft.com> wrote:

> After PCI rescan on Azure, the MANA kernel driver can take over 100
> seconds to probe and create the /sys/bus/pci/devices/<dev>/net directory.
> The previous fixed retry limit (NETVSC_MAX_HOTADD_RETRY=10, ~12 seconds)
> was insufficient, causing VF re-attach to fail with 'Failed to parse PCI
> device' on systems with slow MANA driver initialization.
> 
> Replace the fixed retry limit with an indefinite retry that only gives up
> when the PCI device itself disappears from sysfs. This is safe because:
> 
> - The retry uses rte_eal_alarm callbacks which are serialized on the EAL
>   interrupt thread, preventing races with VF remove or device close paths.
> - Device close (eth_hn_dev_uninit) explicitly cancels all pending hotplug
>   alarms via rte_eal_alarm_cancel and frees the context.
> - If the PCI device is removed while retrying, access() detects the
>   missing sysfs path and stops immediately.
> 
> A periodic NOTICE log every 30 retries (~30s) provides visibility into
> long waits without flooding the log at DEBUG level.
> 
> Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> Cc: stable@dpdk.org
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
Better but still seeing AI review warnings.

Reviewed the v2 7-patch series against upstream drivers/net/netvsc/.
Patches 1, 2, 3, and 5 are clean. Findings on the rest:

- Patch 4: the new "retry loop exiting" NOTICE fires on every
  termination, including the success path, so every successful VF
  re-attach produces a spurious NOTICE.
- Patch 6: two warnings.
  (a) Reaching directly into vf_dev->dev_ops->stats_get works only
      because eth_stats_qstats_get() has already memset the buffers
      before invoking netvsc's callback. That is an undocumented
      dependency on the caller.
  (b) The else fallback to rte_eth_stats_get() is dead code: it
      returns -ENOTSUP for the same reason the direct call was skipped.
- Patch 7: the recovering and recovery_success callbacks acquire
  vf_lock directly from event-callback context. This departs from the
  existing INTR_RMV pattern, which defers work via rte_eal_alarm_set
  precisely to avoid cross-driver lock-order assumptions. The unlocked
  vf_attached read in recovery_failed is a benign race; dropping the
  guard would simplify it.


* [PATCH v3 0/7] net/netvsc: fix VF hotplug and service reset handling
  2026-05-06  2:05 [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
                   ` (6 preceding siblings ...)
  2026-05-07  2:49 ` [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Stephen Hemminger
@ 2026-05-15 19:28 ` Long Li
  2026-05-15 19:28   ` [PATCH v3 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
                     ` (6 more replies)
  7 siblings, 7 replies; 17+ messages in thread
From: Long Li @ 2026-05-15 19:28 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li

This series fixes several issues in the netvsc PMD's VF hot-plug retry
logic and adds support for MANA service reset (suspend/resume) recovery.

Patches 1-5 fix the VF hot-add retry path to handle Azure-specific
timing issues: slow MANA driver probe (>100s), udev interface renames,
asynchronous mana_ib registration, and multi-NIC staggered VF
appearance.

Patch 6 fixes per-queue stats forwarding from VF to netvsc.

Patch 7 adds recovery event handling for MANA service resets, where
the kernel suspends/resumes the VF without PCI remove.

v3:
- Patch 1: wrapped rte_eal_alarm_set lines to fix checkpatch
  line-length warning
- Patch 4: changed "retry loop exiting" log from NOTICE to DEBUG
  to avoid noise on every successful VF re-attach
- Patch 6: removed dead -ENOTSUP fallback to rte_eth_stats_get,
  replaced with direct -ENOTSUP return; documented caller contract
  for zeroed buffers
- Patch 7: deferred all recovery callbacks via rte_eal_alarm_set
  consistent with INTR_RMV pattern; dropped unlocked vf_attached
  guard in recovery_failed; cancel new alarms in hn_vf_close

v2:
- Patch 1: added comment explaining why indefinite retry is safe
- Patch 2: changed SIOCGIFHWADDR retry log to DEBUG
- Patch 3: restored ERR log for non-ENODEV/EEXIST failures
- Patch 4: changed per-iteration logs from NOTICE to DEBUG;
  used RTE_ETHER_ADDR_PRT_FMT macros
- Patch 5: fixed commit message (limit 120 not 30); changed
  mac_retry log to DEBUG; explicit NULL comparisons
- Patch 6: added comment for direct dev_ops call; added -ENOTSUP
  fallback
- Patch 7: added dev_started check in recovery_success; added
  vf_attached guard in recovery_failed

Long Li (7):
  net/netvsc: retry VF hotplug indefinitely until PCI device disappears
  net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug
  net/netvsc: retry full probe when IB device not ready during hotplug
  net/netvsc: add debug logging for VF hotplug retry
  net/netvsc: retry when no matching MAC found in net directory
  net/netvsc: forward per-queue stats from VF device
  net/netvsc: handle VF recovery events for service reset

 drivers/net/netvsc/hn_ethdev.c | 142 +++++++++++++++++++++----
 drivers/net/netvsc/hn_var.h    |   1 +
 drivers/net/netvsc/hn_vf.c     | 182 ++++++++++++++++++++++++++++++++-
 3 files changed, 302 insertions(+), 23 deletions(-)

-- 
2.43.0



* [PATCH v3 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears
  2026-05-15 19:28 ` [PATCH v3 0/7] net/netvsc: fix VF hotplug and service reset handling Long Li
@ 2026-05-15 19:28   ` Long Li
  2026-05-15 19:28   ` [PATCH v3 2/7] net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug Long Li
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-15 19:28 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

After PCI rescan on Azure, the MANA kernel driver can take over 100
seconds to probe and create the /sys/bus/pci/devices/<dev>/net directory.
The previous fixed retry limit (NETVSC_MAX_HOTADD_RETRY=10, ~12 seconds)
was insufficient, causing VF re-attach to fail with 'Failed to parse PCI
device' on systems with slow MANA driver initialization.

Replace the fixed retry limit with an indefinite retry that only gives up
when the PCI device itself disappears from sysfs. This is safe because:

- The retry uses rte_eal_alarm callbacks which are serialized on the EAL
  interrupt thread, preventing races with VF remove or device close paths.
- Device close (eth_hn_dev_uninit) explicitly cancels all pending hotplug
  alarms via rte_eal_alarm_cancel and frees the context.
- If the PCI device is removed while retrying, access() detects the
  missing sysfs path and stops immediately.

A NOTICE log every 30 retries (~30s) provides visibility into long
waits; the per-retry message stays at DEBUG so the log is not flooded.

Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
v3:
- Wrapped rte_eal_alarm_set lines to stay within 100 columns
  (checkpatch line-length warning)
v2:
- Added detailed comment explaining why indefinite retry is
  safe (PCI sysfs disappearance is the termination condition)
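
For reference, the termination rule described above can be sketched
outside DPDK. This is only an illustration under stated assumptions, not
the driver code: the enum, both function names, and the sysfs_root
parameter are hypothetical and exist so the logic can be tried against a
scratch directory.

```c
#include <dirent.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical names for the three outcomes of one retry pass. */
enum hotadd_state {
	HOTADD_GONE,	/* PCI device left sysfs: stop retrying */
	HOTADD_WAIT,	/* device present, net/ not created yet: retry */
	HOTADD_READY	/* net/ exists: go on to scan its interfaces */
};

/*
 * sysfs_root is a parameter only so the sketch can be pointed at a
 * scratch directory; the patch uses /sys/bus/pci/devices directly.
 */
static enum hotadd_state
hotadd_probe_state(const char *sysfs_root, const char *pci_name)
{
	char path[512];
	DIR *di;

	snprintf(path, sizeof(path), "%s/%s", sysfs_root, pci_name);
	if (access(path, F_OK) != 0)
		return HOTADD_GONE;

	snprintf(path, sizeof(path), "%s/%s/net", sysfs_root, pci_name);
	di = opendir(path);
	if (di == NULL)
		return HOTADD_WAIT;

	closedir(di);
	return HOTADD_READY;
}

/* NOTICE on every 30th retry, DEBUG otherwise (one retry per second). */
static int
hotadd_log_at_notice(int retry)
{
	return retry % 30 == 0;
}
```

Only HOTADD_GONE ends the loop; HOTADD_WAIT corresponds to re-arming
the alarm with NETVSC_HOTADD_RETRY_INTERVAL.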

 drivers/net/netvsc/hn_ethdev.c | 41 +++++++++++++++++++++++++---------
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index b8880edb4c..85e500c178 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -89,8 +89,8 @@ struct netvsc_mp_param {
 #define NETVSC_ARG_TXBREAK "tx_copybreak"
 #define NETVSC_ARG_RX_EXTMBUF_ENABLE "rx_extmbuf_enable"
 
-/* The max number of retry when hot adding a VF device */
-#define NETVSC_MAX_HOTADD_RETRY 10
+/* Retry interval for hot-add VF device (microseconds) */
+#define NETVSC_HOTADD_RETRY_INTERVAL 1000000
 
 struct hn_xstats_name_off {
 	char name[RTE_ETH_XSTATS_NAME_SIZE];
@@ -622,19 +622,39 @@ static void netvsc_hotplug_retry(void *args)
 	PMD_DRV_LOG(DEBUG, "%s: retry count %d",
 		    __func__, hot_ctx->eal_hot_plug_retry);
 
-	if (hot_ctx->eal_hot_plug_retry++ > NETVSC_MAX_HOTADD_RETRY) {
-		PMD_DRV_LOG(NOTICE, "Failed to parse PCI device retry=%d",
-			    hot_ctx->eal_hot_plug_retry);
+	hot_ctx->eal_hot_plug_retry++;
+
+	/* Check if PCI device still exists — if it disappeared, give up.
+	 * Otherwise keep retrying indefinitely until the net directory
+	 * appears.  This is safe because:
+	 * - MANA driver probe can take >100s after PCI rescan
+	 * - The retry uses rte_eal_alarm callbacks serialized on the
+	 *   EAL interrupt thread, preventing races with device close
+	 * - Device close cancels pending alarms and frees the context
+	 * - If the PCI device is removed, the access() check below
+	 *   detects the missing sysfs path and stops immediately
+	 */
+	snprintf(buf, sizeof(buf), "/sys/bus/pci/devices/%s", d->name);
+	if (access(buf, F_OK) != 0) {
+		PMD_DRV_LOG(NOTICE,
+			    "PCI device %s no longer exists, giving up after %d retries",
+			    d->name, hot_ctx->eal_hot_plug_retry);
 		goto free_hotadd_ctx;
 	}
 
 	snprintf(buf, sizeof(buf), "/sys/bus/pci/devices/%s/net", d->name);
 	di = opendir(buf);
 	if (!di) {
-		PMD_DRV_LOG(DEBUG, "%s: can't open directory %s, "
-			    "retrying in 1 second", __func__, buf);
-		/* The device is still being initialized, retry after 1 second */
-		rte_eal_alarm_set(1000000, netvsc_hotplug_retry, hot_ctx);
+		if (hot_ctx->eal_hot_plug_retry % 30 == 0)
+			PMD_DRV_LOG(NOTICE,
+				    "%s: waiting for %s (retry %d, %ds elapsed)",
+				    __func__, buf, hot_ctx->eal_hot_plug_retry,
+				    hot_ctx->eal_hot_plug_retry);
+		else
+			PMD_DRV_LOG(DEBUG, "%s: can't open directory %s, "
+				    "retrying in 1 second", __func__, buf);
+		rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
+				  netvsc_hotplug_retry, hot_ctx);
 		return;
 	}
 
@@ -758,7 +778,8 @@ netvsc_hotadd_callback(const char *device_name, enum rte_dev_event_type type,
 			rte_spinlock_lock(&hv->hotadd_lock);
 			LIST_INSERT_HEAD(&hv->hotadd_list, hot_ctx, list);
 			rte_spinlock_unlock(&hv->hotadd_lock);
-			rte_eal_alarm_set(1000000, netvsc_hotplug_retry, hot_ctx);
+			rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
+					  netvsc_hotplug_retry, hot_ctx);
 			return;
 		}
 
-- 
2.43.0



* [PATCH v3 2/7] net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug
  2026-05-15 19:28 ` [PATCH v3 0/7] net/netvsc: fix VF hotplug and service reset handling Long Li
  2026-05-15 19:28   ` [PATCH v3 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
@ 2026-05-15 19:28   ` Long Li
  2026-05-15 19:28   ` [PATCH v3 3/7] net/netvsc: retry full probe when IB device not ready during hotplug Long Li
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-15 19:28 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

When the MANA VF net directory appears after PCI rescan, udev may rename
the interface (e.g. eth1 → ens1) before DPDK can query its MAC address
via SIOCGIFHWADDR. The ioctl fails because the interface name is stale
during the rename window.

Instead of giving up when SIOCGIFHWADDR fails, close the directory and
schedule another retry. The next attempt will re-read the directory with
the updated interface name (e.g. ens1 instead of eth1) and succeed.

This was observed on Azure VMs where the MANA kernel driver takes >30
seconds to probe after PCI rescan, and udev renames the interface
immediately after registration.

Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
v3: no change
v2:
- Changed SIOCGIFHWADDR retry log level from NOTICE to DEBUG
- Updated comment to reference PCI device check as safety bound

 drivers/net/netvsc/hn_ethdev.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 85e500c178..096489d66d 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -674,10 +674,18 @@ static void netvsc_hotplug_retry(void *args)
 		ret = ioctl(s, SIOCGIFHWADDR, &req);
 		close(s);
 		if (ret == -1) {
-			PMD_DRV_LOG(ERR,
-				    "Failed to send SIOCGIFHWADDR for device %s",
+			/* Interface may be renamed by udev (e.g. eth1 → ens1).
+			 * Retry from the top — the PCI device check above
+			 * ensures we stop if the device disappears.
+			 */
+			PMD_DRV_LOG(DEBUG,
+				    "Failed to send SIOCGIFHWADDR for device %s, "
+				    "interface may be renaming, retrying",
 				    dir->d_name);
-			break;
+			closedir(di);
+			rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
+					  netvsc_hotplug_retry, hot_ctx);
+			return;
 		}
 		if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER)
 			continue;
-- 
2.43.0



* [PATCH v3 3/7] net/netvsc: retry full probe when IB device not ready during hotplug
  2026-05-15 19:28 ` [PATCH v3 0/7] net/netvsc: fix VF hotplug and service reset handling Long Li
  2026-05-15 19:28   ` [PATCH v3 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
  2026-05-15 19:28   ` [PATCH v3 2/7] net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug Long Li
@ 2026-05-15 19:28   ` Long Li
  2026-05-15 19:28   ` [PATCH v3 4/7] net/netvsc: add debug logging for VF hotplug retry Long Li
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-15 19:28 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

When rte_eal_hotplug_add returns -ENODEV during VF hot-add, it means the
MANA IB/verbs device is not yet registered by the mana_ib kernel module.
This happens because the mana_ib auxiliary driver probes asynchronously
after the MANA net driver creates the network interface.

On Azure VMs, the gap between netdev registration and IB device
registration can be several seconds. Previously, netvsc would log the
error and give up after finding the matching MAC address.

Now, on -ENODEV, restart the full retry loop from the PCI device
existence check. This re-scans the net directory to pick up any
interface renames (e.g. eth1 -> ens1) and retries until the IB device
is ready.

The -EEXIST return (device already probed by another netvsc port on the
same PCI device) is handled silently, as hn_vf_add will find the
already-probed VF port.

Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
v3: no change
v2:
- Restored ERR log level for non-ENODEV/non-EEXIST failures
  from rte_eal_hotplug_add
- Added comment explaining why ENODEV retry is safe
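
For reference, the return-code handling described above reduces to a
small classification. The sketch below is illustrative only; the enum
and the function name are hypothetical and do not appear in the patch.

```c
#include <errno.h>

/* Hypothetical actions after rte_eal_hotplug_add() returns. */
enum hotplug_action {
	HP_PROCEED,		/* continue to hn_vf_add */
	HP_PROCEED_LOG_ERR,	/* log at ERR, then continue to hn_vf_add */
	HP_RETRY		/* restart the whole probe after the interval */
};

static enum hotplug_action
classify_hotplug_ret(int ret)
{
	if (ret == -ENODEV)
		return HP_RETRY;	/* IB device not registered yet */
	if (ret == 0 || ret == -EEXIST)
		return HP_PROCEED;	/* added, or already probed */
	return HP_PROCEED_LOG_ERR;
}
```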

 drivers/net/netvsc/hn_ethdev.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 096489d66d..9e4fc33949 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -717,17 +717,36 @@ static void netvsc_hotplug_retry(void *args)
 			 * parent device, restore its args.
 			 */
 			ret = rte_eal_hotplug_add(d->bus->name, d->name, drv_str ? drv_str : "");
-			if (ret) {
-				PMD_DRV_LOG(ERR,
-					    "Failed to add PCI device %s",
+			free(drv_str);
+
+			if (ret == -ENODEV) {
+				/* IB device not ready yet (mana_ib not probed).
+				 * Restart the full retry from the PCI device
+				 * check so we re-verify the device exists and
+				 * get fresh interface names after any renames.
+				 * This retries indefinitely — the PCI sysfs
+				 * check at the top of this function ensures
+				 * we stop if the device disappears.
+				 */
+				PMD_DRV_LOG(NOTICE,
+					    "IB device not ready for %s, "
+					    "restarting probe in 1 second",
 					    d->name);
+				closedir(di);
+				rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
+						  netvsc_hotplug_retry,
+						  hot_ctx);
+				return;
 			}
 
-			free(drv_str);
+			if (ret && ret != -EEXIST)
+				PMD_DRV_LOG(ERR,
+					    "Failed to add PCI device %s (ret=%d)",
+					    d->name, ret);
 
 			ret = hn_vf_add(dev, hv);
 			if (ret)
-				PMD_DRV_LOG(ERR, "Failed to add VF in hotplug retry: %d", ret);
+				PMD_DRV_LOG(ERR, "Failed to add VF: %d", ret);
 			break;
 		}
 	}
-- 
2.43.0



* [PATCH v3 4/7] net/netvsc: add debug logging for VF hotplug retry
  2026-05-15 19:28 ` [PATCH v3 0/7] net/netvsc: fix VF hotplug and service reset handling Long Li
                     ` (2 preceding siblings ...)
  2026-05-15 19:28   ` [PATCH v3 3/7] net/netvsc: retry full probe when IB device not ready during hotplug Long Li
@ 2026-05-15 19:28   ` Long Li
  2026-05-15 19:28   ` [PATCH v3 5/7] net/netvsc: retry when no matching MAC found in net directory Long Li
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-15 19:28 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li

Add detailed DEBUG-level logging at every decision point in the
netvsc_hotplug_retry function to diagnose VF re-attach failures:

- Log each interface found in the net/ directory
- Log when sa_family is not ARPHRD_ETHER
- Log MAC address comparison details on mismatch using
  RTE_ETHER_ADDR_PRT_FMT for consistency with the rest of DPDK
- Log when the retry loop exits (with retry count)

Per-iteration trace uses DEBUG level to avoid flooding the log on
multi-NIC VMs with indefinite retries; NOTICE is reserved for
one-shot state transitions.

These logs help correlate DPDK hotplug retry behavior with kernel
dmesg timestamps to identify timing issues during VF re-attach
after PCI rescan on Azure.

Signed-off-by: Long Li <longli@microsoft.com>
---
v3:
- Changed "retry loop exiting" log from NOTICE to DEBUG to
  avoid noise on every successful VF re-attach
v2:
- Changed per-iteration logs from NOTICE to DEBUG
- Used RTE_ETHER_ADDR_PRT_FMT macros
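
For reference, the shape of the MAC mismatch message can be tried
without DPDK headers. MAC_PRT_FMT, MAC_BYTES, struct mac_addr and
format_mac_mismatch are local stand-ins invented for this sketch; they
only imitate the format-string-plus-byte-list pattern and are not the
DPDK definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for an Ethernet address (hypothetical). */
struct mac_addr {
	uint8_t addr_bytes[6];
};

/* Local stand-ins for a format string and its matching byte list. */
#define MAC_PRT_FMT "%02X:%02X:%02X:%02X:%02X:%02X"
#define MAC_BYTES(m) \
	(m)->addr_bytes[0], (m)->addr_bytes[1], (m)->addr_bytes[2], \
	(m)->addr_bytes[3], (m)->addr_bytes[4], (m)->addr_bytes[5]

/* Build the mismatch line; adjacent string literals are concatenated. */
static int
format_mac_mismatch(char *out, size_t len, const char *ifname,
		    const struct mac_addr *got,
		    const struct mac_addr *expected)
{
	return snprintf(out, len,
			"MAC mismatch for %s: got " MAC_PRT_FMT
			" expected " MAC_PRT_FMT,
			ifname, MAC_BYTES(got), MAC_BYTES(expected));
}
```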

 drivers/net/netvsc/hn_ethdev.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 9e4fc33949..16fb2b344d 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -663,6 +663,11 @@ static void netvsc_hotplug_retry(void *args)
 		if (!strcmp(dir->d_name, ".") || !strcmp(dir->d_name, ".."))
 			continue;
 
+		PMD_DRV_LOG(DEBUG,
+			    "%s: checking interface %s in %s (retry %d)",
+			    __func__, dir->d_name, buf,
+			    hot_ctx->eal_hot_plug_retry);
+
 		/* trying to get mac address if this is a network device*/
 		s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
 		if (s == -1) {
@@ -687,8 +692,12 @@ static void netvsc_hotplug_retry(void *args)
 					  netvsc_hotplug_retry, hot_ctx);
 			return;
 		}
-		if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER)
+		if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER) {
+			PMD_DRV_LOG(DEBUG,
+				    "%s: device %s sa_family=%d not ARPHRD_ETHER, skipping",
+				    __func__, dir->d_name, req.ifr_hwaddr.sa_family);
 			continue;
+		}
 
 		memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data,
 		       RTE_DIM(eth_addr.addr_bytes));
@@ -749,12 +758,23 @@ static void netvsc_hotplug_retry(void *args)
 				PMD_DRV_LOG(ERR, "Failed to add VF: %d", ret);
 			break;
 		}
+
+		PMD_DRV_LOG(DEBUG,
+			    "%s: MAC mismatch for %s: got "
+			    RTE_ETHER_ADDR_PRT_FMT
+			    " expected " RTE_ETHER_ADDR_PRT_FMT,
+			    __func__, dir->d_name,
+			    RTE_ETHER_ADDR_BYTES(&eth_addr),
+			    RTE_ETHER_ADDR_BYTES(dev->data->mac_addrs));
 	}
 
 free_hotadd_ctx:
 	if (di)
 		closedir(di);
 
+	PMD_DRV_LOG(DEBUG, "%s: retry loop exiting for device %s (retry %d)",
+		    __func__, d->name, hot_ctx->eal_hot_plug_retry);
+
 	rte_spinlock_lock(&hv->hotadd_lock);
 	LIST_REMOVE(hot_ctx, list);
 	rte_spinlock_unlock(&hv->hotadd_lock);
-- 
2.43.0



* [PATCH v3 5/7] net/netvsc: retry when no matching MAC found in net directory
  2026-05-15 19:28 ` [PATCH v3 0/7] net/netvsc: fix VF hotplug and service reset handling Long Li
                     ` (3 preceding siblings ...)
  2026-05-15 19:28   ` [PATCH v3 4/7] net/netvsc: add debug logging for VF hotplug retry Long Li
@ 2026-05-15 19:28   ` Long Li
  2026-05-15 19:28   ` [PATCH v3 6/7] net/netvsc: forward per-queue stats from VF device Long Li
  2026-05-15 19:28   ` [PATCH v3 7/7] net/netvsc: handle VF recovery events for service reset Long Li
  6 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-15 19:28 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

On multi-NIC Azure VMs, a single MANA PCI device (7870:00:00.0) hosts
multiple VF interfaces. After PCI rescan, these interfaces register
at different times — the management NIC's VF appears first, followed
by the test NIC's VF.

Previously, when netvsc_hotplug_retry scanned the net/ directory and
found interfaces with non-matching MACs, it would exit the readdir
loop and free the hotadd context, permanently giving up. The matching
VF interface had not appeared yet.

Now, when the readdir loop ends without finding a matching MAC (dir
is NULL after loop), schedule another retry instead of giving up.
This uses a separate mac_retry counter (limit 120, ~2 minutes) so
the main retry loop remains unlimited.

Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
v3: no change
v2:
- Fixed commit message (limit 120 not 30)
- Changed mac_retry log to DEBUG
- Explicit NULL comparisons
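
For reference, the bounded counter can be modelled in isolation to see
how many retries it allows. The struct and function names below are
hypothetical; only the increment-then-compare order and the limit of
120 follow the patch.

```c
#define MAX_MAC_RETRY 120	/* mirrors NETVSC_MAX_MAC_RETRY */

/* Hypothetical stand-in for the two counters in the hot-add context. */
struct hotadd_ctx {
	int eal_hot_plug_retry;	/* unbounded; ends when the PCI device goes */
	int mac_retry;		/* bounded; counts scans with no MAC match */
};

/* Return nonzero if another scan should be scheduled. */
static int
mac_retry_again(struct hotadd_ctx *ctx)
{
	ctx->mac_retry++;
	return ctx->mac_retry < MAX_MAC_RETRY;
}
```

Because the counter is incremented before the comparison, a fresh
context schedules 119 further scans and gives up on the 120th miss.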

 drivers/net/netvsc/hn_ethdev.c | 36 +++++++++++++++++++++++++++++++++-
 drivers/net/netvsc/hn_var.h    |  1 +
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 16fb2b344d..72743872bb 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -92,6 +92,12 @@ struct netvsc_mp_param {
 /* Retry interval for hot-add VF device (microseconds) */
 #define NETVSC_HOTADD_RETRY_INTERVAL 1000000
 
+/* Max retries when net/ directory exists but no matching MAC found.
+ * On multi-NIC PCI devices, a second VF may register later.
+ * 120 retries = ~2 minutes.
+ */
+#define NETVSC_MAX_MAC_RETRY 120
+
 struct hn_xstats_name_off {
 	char name[RTE_ETH_XSTATS_NAME_SIZE];
 	unsigned int offset;
@@ -768,8 +774,36 @@ static void netvsc_hotplug_retry(void *args)
 			    RTE_ETHER_ADDR_BYTES(dev->data->mac_addrs));
 	}
 
+	/* If we opened the net directory but didn't find a matching MAC,
+	 * the VF interface may not have appeared yet (e.g. on a multi-NIC
+	 * PCI device, the second VF registers later). Retry.
+	 */
+	if (di != NULL) {
+		closedir(di);
+		di = NULL;
+		if (dir == NULL) {
+			/* readdir returned NULL — loop ended without match */
+			hot_ctx->mac_retry++;
+			if (hot_ctx->mac_retry < NETVSC_MAX_MAC_RETRY) {
+				PMD_DRV_LOG(DEBUG,
+					    "%s: no matching MAC found in %s, "
+					    "retrying in 1 second (mac_retry %d/%d)",
+					    __func__, buf,
+					    hot_ctx->mac_retry,
+					    NETVSC_MAX_MAC_RETRY);
+				rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
+						  netvsc_hotplug_retry,
+						  hot_ctx);
+				return;
+			}
+			PMD_DRV_LOG(NOTICE,
+				    "%s: no matching MAC found after %d retries, giving up",
+				    __func__, hot_ctx->mac_retry);
+		}
+	}
+
 free_hotadd_ctx:
-	if (di)
+	if (di != NULL)
 		closedir(di);
 
 	PMD_DRV_LOG(DEBUG, "%s: retry loop exiting for device %s (retry %d)",
diff --git a/drivers/net/netvsc/hn_var.h b/drivers/net/netvsc/hn_var.h
index ef55dee28e..574b909c82 100644
--- a/drivers/net/netvsc/hn_var.h
+++ b/drivers/net/netvsc/hn_var.h
@@ -127,6 +127,7 @@ struct hv_hotadd_context {
 	struct hn_data *hv;
 	struct rte_devargs da;
 	int eal_hot_plug_retry;
+	int mac_retry;
 };
 
 struct hn_data {
-- 
2.43.0



* [PATCH v3 6/7] net/netvsc: forward per-queue stats from VF device
  2026-05-15 19:28 ` [PATCH v3 0/7] net/netvsc: fix VF hotplug and service reset handling Long Li
                     ` (4 preceding siblings ...)
  2026-05-15 19:28   ` [PATCH v3 5/7] net/netvsc: retry when no matching MAC found in net directory Long Li
@ 2026-05-15 19:28   ` Long Li
  2026-05-15 19:28   ` [PATCH v3 7/7] net/netvsc: handle VF recovery events for service reset Long Li
  6 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-15 19:28 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

hn_vf_stats_get was ignoring the qstats parameter (__rte_unused),
calling rte_eth_stats_get which only collects aggregate stats. This
meant per-queue stats (rx_q0_good_packets, tx_q0_good_packets, etc.)
were always zero when VF datapath was active, even though the
underlying MANA driver populates them in its stats_get callback.

Call the VF device's stats_get op directly with the qstats pointer
so per-queue counters are forwarded through netvsc to the xstats
telemetry output.

Fixes: dc7680e8597c ("net/netvsc: support integrated VF")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
v3:
- Removed dead -ENOTSUP fallback to rte_eth_stats_get,
  replaced with direct -ENOTSUP return
- Documented caller contract for zeroed buffers
v2:
- Added comment for direct dev_ops call
- Added -ENOTSUP fallback
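
For reference, the forwarding pattern can be shown with mock types. All
type and function names in this sketch are hypothetical stand-ins, not
the ethdev structures; the point is only the NULL check on the op and
passing the per-queue buffer through unchanged.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mock aggregate and per-queue counters (hypothetical). */
struct port_stats { uint64_t ipackets; };
struct queue_stats { uint64_t q_ipackets[4]; };

struct vf_dev;
typedef int (*stats_get_t)(struct vf_dev *dev, struct port_stats *stats,
			   struct queue_stats *qstats);

/* Mock VF device carrying only the op this sketch needs. */
struct vf_dev { stats_get_t stats_get; };

/* Forward both buffers to the VF op, or report that it is missing. */
static int
forward_stats(struct vf_dev *vf, struct port_stats *stats,
	      struct queue_stats *qstats)
{
	if (vf->stats_get == NULL)
		return -ENOTSUP;
	return vf->stats_get(vf, stats, qstats);
}

/* Fake VF op that fills one aggregate and one per-queue counter. */
static int
fake_vf_stats_get(struct vf_dev *dev, struct port_stats *stats,
		  struct queue_stats *qstats)
{
	(void)dev;
	stats->ipackets = 7;
	qstats->q_ipackets[0] = 7;
	return 0;
}
```

As in the patch, the sketch assumes the caller has zeroed both buffers
before the call.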

 drivers/net/netvsc/hn_vf.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/net/netvsc/hn_vf.c b/drivers/net/netvsc/hn_vf.c
index 1fcc65a712..c83cc973fd 100644
--- a/drivers/net/netvsc/hn_vf.c
+++ b/drivers/net/netvsc/hn_vf.c
@@ -749,7 +749,7 @@ void hn_vf_rx_queue_release(struct hn_data *hv, uint16_t queue_id)
 
 int hn_vf_stats_get(struct rte_eth_dev *dev,
 		    struct rte_eth_stats *stats,
-		    struct eth_queue_stats *qstats __rte_unused)
+		    struct eth_queue_stats *qstats)
 {
 	struct hn_data *hv = dev->data->dev_private;
 	struct rte_eth_dev *vf_dev;
@@ -757,8 +757,19 @@ int hn_vf_stats_get(struct rte_eth_dev *dev,
 
 	rte_rwlock_read_lock(&hv->vf_lock);
 	vf_dev = hn_get_vf_dev(hv);
-	if (vf_dev)
-		ret = rte_eth_stats_get(vf_dev->data->port_id, stats);
+	if (vf_dev) {
+		/* Call dev_ops->stats_get directly instead of the public
+		 * rte_eth_stats_get API because we need to forward the
+		 * per-queue stats (qstats) which the public API does not
+		 * support.  The caller (eth_stats_qstats_get) has already
+		 * zeroed stats and qstats before invoking this callback.
+		 */
+		if (vf_dev->dev_ops->stats_get != NULL)
+			ret = vf_dev->dev_ops->stats_get(vf_dev, stats,
+							 qstats);
+		else
+			ret = -ENOTSUP;
+	}
 	rte_rwlock_read_unlock(&hv->vf_lock);
 	return ret;
 }
-- 
2.43.0



* [PATCH v3 7/7] net/netvsc: handle VF recovery events for service reset
  2026-05-15 19:28 ` [PATCH v3 0/7] net/netvsc: fix VF hotplug and service reset handling Long Li
                     ` (5 preceding siblings ...)
  2026-05-15 19:28   ` [PATCH v3 6/7] net/netvsc: forward per-queue stats from VF device Long Li
@ 2026-05-15 19:28   ` Long Li
  6 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-15 19:28 UTC (permalink / raw)
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li

Register callbacks for RTE_ETH_EVENT_ERR_RECOVERING,
RTE_ETH_EVENT_RECOVERY_SUCCESS, and RTE_ETH_EVENT_RECOVERY_FAILED
events on the VF port to handle MANA service resets.

- On ERR_RECOVERING: defer data path switch to synthetic via
  rte_eal_alarm_set, keeping VF attached in DPDK
- On RECOVERY_SUCCESS: defer data path switch back to VF
- On RECOVERY_FAILED: do full VF removal (same as INTR_RMV)
- Unregister all recovery callbacks and cancel pending alarms
  during detach, removal, and close

All recovery callbacks defer work via rte_eal_alarm_set, consistent
with the existing INTR_RMV pattern, to avoid cross-driver lock-order
assumptions in event-callback context.

This ensures that during a service reset (kernel suspend/resume
without PCI remove), netvsc keeps the VF attached and seamlessly
switches back to it after recovery, without requiring a PCI
hot-add event.

This change is compatible with the current behavior when no
service reset messages are received.

Signed-off-by: Long Li <longli@microsoft.com>
---
v3:
- Deferred recovering and recovery_success callbacks via
  rte_eal_alarm_set, consistent with INTR_RMV pattern
- Dropped unlocked vf_attached guard in recovery_failed
- Cancel new deferred alarms in hn_vf_close
v2:
- Added dev_started check in recovery_success
- Added vf_attached guard in recovery_failed
- Added comment explaining direct lock acquisition
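
For reference, the condition that gates the switch back to the VF after
RECOVERY_SUCCESS can be isolated as a predicate. The struct and
function names are hypothetical; the three flags mirror dev_started,
vf_attached and vf_vsc_switched as used in the deferred handler.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state read under vf_lock. */
struct vf_state {
	bool dev_started;	/* netvsc port is started */
	bool vf_attached;	/* VF is still attached in DPDK */
	bool vf_vsc_switched;	/* data path already on the VF */
};

/* True only when switching the data path to the VF is both safe and needed. */
static bool
should_switch_to_vf(const struct vf_state *s)
{
	return s->dev_started && s->vf_attached && !s->vf_vsc_switched;
}
```

If the port was stopped during recovery the predicate is false, and the
switch is left to the next start of the port.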

 drivers/net/netvsc/hn_vf.c | 165 +++++++++++++++++++++++++++++++++++++
 1 file changed, 165 insertions(+)

diff --git a/drivers/net/netvsc/hn_vf.c b/drivers/net/netvsc/hn_vf.c
index c83cc973fd..27c46ce404 100644
--- a/drivers/net/netvsc/hn_vf.c
+++ b/drivers/net/netvsc/hn_vf.c
@@ -50,6 +50,13 @@ static int hn_vf_match(const struct rte_eth_dev *dev)
 }
 
 
+static int hn_eth_recovering_callback(uint16_t port_id,
+	enum rte_eth_event_type event, void *cb_arg, void *out);
+static int hn_eth_recovery_success_callback(uint16_t port_id,
+	enum rte_eth_event_type event, void *cb_arg, void *out);
+static int hn_eth_recovery_failed_callback(uint16_t port_id,
+	enum rte_eth_event_type event, void *cb_arg, void *out);
+
 /*
  * Attach new PCI VF device and return the port_id
  */
@@ -111,7 +118,56 @@ static int hn_vf_attach(struct rte_eth_dev *dev, struct hn_data *hv)
 		return ret;
 	}
 
+	/* Register recovery event callbacks for service reset handling */
+	ret = rte_eth_dev_callback_register(hv->vf_ctx.vf_port,
+					    RTE_ETH_EVENT_ERR_RECOVERING,
+					    hn_eth_recovering_callback, hv);
+	if (ret) {
+		PMD_DRV_LOG(ERR,
+			    "Registering recovering callback failed for vf port %d ret %d",
+			    port, ret);
+		goto err_recovering;
+	}
+
+	ret = rte_eth_dev_callback_register(hv->vf_ctx.vf_port,
+					    RTE_ETH_EVENT_RECOVERY_SUCCESS,
+					    hn_eth_recovery_success_callback, hv);
+	if (ret) {
+		PMD_DRV_LOG(ERR,
+			    "Registering recovery success callback failed for vf port %d ret %d",
+			    port, ret);
+		goto err_recovery_success;
+	}
+
+	ret = rte_eth_dev_callback_register(hv->vf_ctx.vf_port,
+					    RTE_ETH_EVENT_RECOVERY_FAILED,
+					    hn_eth_recovery_failed_callback, hv);
+	if (ret) {
+		PMD_DRV_LOG(ERR,
+			    "Registering recovery failed callback failed for vf port %d ret %d",
+			    port, ret);
+		goto err_recovery_failed;
+	}
+
 	return 0;
+
+err_recovery_failed:
+	rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+					RTE_ETH_EVENT_RECOVERY_SUCCESS,
+					hn_eth_recovery_success_callback, hv);
+err_recovery_success:
+	rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+					RTE_ETH_EVENT_ERR_RECOVERING,
+					hn_eth_recovering_callback, hv);
+err_recovering:
+	rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+					RTE_ETH_EVENT_INTR_RMV,
+					hn_eth_rmv_event_callback, hv);
+	hv->vf_ctx.vf_attached = false;
+	hv->vf_ctx.vf_port = 0;
+	if (rte_eth_dev_owner_unset(port, hv->owner.id) < 0)
+		PMD_DRV_LOG(ERR, "Failed to unset owner for port %d", port);
+	return ret;
 }
 
 static void hn_vf_remove_unlocked(struct hn_data *hv);
@@ -143,6 +199,12 @@ static void hn_remove_delayed(void *args)
 		PMD_DRV_LOG(ERR,
 			    "rte_eth_dev_callback_unregister failed ret=%d",
 			    ret);
+	rte_eth_dev_callback_unregister(port_id, RTE_ETH_EVENT_ERR_RECOVERING,
+					hn_eth_recovering_callback, hv);
+	rte_eth_dev_callback_unregister(port_id, RTE_ETH_EVENT_RECOVERY_SUCCESS,
+					hn_eth_recovery_success_callback, hv);
+	rte_eth_dev_callback_unregister(port_id, RTE_ETH_EVENT_RECOVERY_FAILED,
+					hn_eth_recovery_failed_callback, hv);
 
 	/* Detach and release port_id from system */
 	ret = rte_eth_dev_stop(port_id);
@@ -187,6 +249,89 @@ int hn_eth_rmv_event_callback(uint16_t port_id,
 	return 0;
 }
 
+/*
+ * Deferred handler for VF error recovery event.
+ * Switch data path to synthetic but keep the VF attached.
+ */
+static void hn_recovering_delayed(void *args)
+{
+	struct hn_data *hv = args;
+
+	rte_rwlock_write_lock(&hv->vf_lock);
+	hn_vf_remove_unlocked(hv);
+	rte_rwlock_write_unlock(&hv->vf_lock);
+}
+
+static int
+hn_eth_recovering_callback(uint16_t port_id,
+			   enum rte_eth_event_type event __rte_unused,
+			   void *cb_arg, void *out __rte_unused)
+{
+	struct hn_data *hv = cb_arg;
+
+	PMD_DRV_LOG(NOTICE, "VF port %u recovering from error", port_id);
+	rte_eal_alarm_set(1, hn_recovering_delayed, hv);
+
+	return 0;
+}
+
+/*
+ * Deferred handler for VF recovery success event.
+ * Switch data path back to VF.
+ */
+static void hn_recovery_success_delayed(void *args)
+{
+	struct hn_data *hv = args;
+	struct rte_eth_dev *dev = &rte_eth_devices[hv->port_id];
+	int ret;
+
+	rte_rwlock_write_lock(&hv->vf_lock);
+	/* Only switch data path to VF if the netvsc device is started,
+	 * mirroring the check in hn_vf_add_unlocked.  If the device was
+	 * stopped during recovery, defer to hn_vf_start().
+	 */
+	if (dev->data->dev_started &&
+	    hv->vf_ctx.vf_attached && !hv->vf_ctx.vf_vsc_switched) {
+		ret = hn_nvs_set_datapath(hv, NVS_DATAPATH_VF);
+		if (ret)
+			PMD_DRV_LOG(ERR,
+				    "Failed to switch to VF after recovery");
+		else
+			hv->vf_ctx.vf_vsc_switched = true;
+	}
+	rte_rwlock_write_unlock(&hv->vf_lock);
+}
+
+static int
+hn_eth_recovery_success_callback(uint16_t port_id,
+				 enum rte_eth_event_type event __rte_unused,
+				 void *cb_arg, void *out __rte_unused)
+{
+	struct hn_data *hv = cb_arg;
+
+	PMD_DRV_LOG(NOTICE, "VF port %u recovery succeeded", port_id);
+	rte_eal_alarm_set(1, hn_recovery_success_delayed, hv);
+
+	return 0;
+}
+
+/*
+ * Handle VF recovery failure event from MANA PMD.
+ * VF is unusable, do full removal.
+ */
+static int
+hn_eth_recovery_failed_callback(uint16_t port_id,
+				enum rte_eth_event_type event __rte_unused,
+				void *cb_arg, void *out __rte_unused)
+{
+	struct hn_data *hv = cb_arg;
+
+	PMD_DRV_LOG(NOTICE, "VF port %u recovery failed, removing", port_id);
+	rte_eal_alarm_set(1, hn_remove_delayed, hv);
+
+	return 0;
+}
+
 static int hn_setup_vf_queues(int port, struct rte_eth_dev *dev)
 {
 	struct hn_rx_queue *rx_queue;
@@ -247,6 +392,12 @@ static void hn_vf_detach(struct hn_data *hv)
 
 	rte_eth_dev_callback_unregister(port, RTE_ETH_EVENT_INTR_RMV,
 					hn_eth_rmv_event_callback, hv);
+	rte_eth_dev_callback_unregister(port, RTE_ETH_EVENT_ERR_RECOVERING,
+					hn_eth_recovering_callback, hv);
+	rte_eth_dev_callback_unregister(port, RTE_ETH_EVENT_RECOVERY_SUCCESS,
+					hn_eth_recovery_success_callback, hv);
+	rte_eth_dev_callback_unregister(port, RTE_ETH_EVENT_RECOVERY_FAILED,
+					hn_eth_recovery_failed_callback, hv);
 
 	if (rte_eth_dev_owner_unset(port, hv->owner.id) < 0)
 		PMD_DRV_LOG(ERR, "Failed to unset owner for port %d", port);
@@ -630,7 +781,21 @@ int hn_vf_close(struct rte_eth_dev *dev)
 						RTE_ETH_EVENT_INTR_RMV,
 						hn_eth_rmv_event_callback,
 						hv);
+		rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+						RTE_ETH_EVENT_ERR_RECOVERING,
+						hn_eth_recovering_callback,
+						hv);
+		rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+						RTE_ETH_EVENT_RECOVERY_SUCCESS,
+						hn_eth_recovery_success_callback,
+						hv);
+		rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+						RTE_ETH_EVENT_RECOVERY_FAILED,
+						hn_eth_recovery_failed_callback,
+						hv);
 		rte_eal_alarm_cancel(hn_remove_delayed, hv);
+		rte_eal_alarm_cancel(hn_recovering_delayed, hv);
+		rte_eal_alarm_cancel(hn_recovery_success_delayed, hv);
 		ret = rte_eth_dev_close(hv->vf_ctx.vf_port);
 		hv->vf_ctx.vf_attached = false;
 	}
-- 
2.43.0


* RE: [EXTERNAL] Re: [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears
  2026-05-07  2:49 ` [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Stephen Hemminger
@ 2026-05-15 19:45   ` Long Li
  0 siblings, 0 replies; 17+ messages in thread
From: Long Li @ 2026-05-15 19:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev@dpdk.org, Wei Hu, stable@dpdk.org



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday, May 6, 2026 7:50 PM
> To: Long Li <longli@microsoft.com>
> Cc: dev@dpdk.org; Wei Hu <weh@microsoft.com>; stable@dpdk.org
> Subject: [EXTERNAL] Re: [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely
> until PCI device disappears
> 
> On Tue,  5 May 2026 19:05:22 -0700
> Long Li <longli@microsoft.com> wrote:
> 
> > After PCI rescan on Azure, the MANA kernel driver can take over 100
> > seconds to probe and create the /sys/bus/pci/devices/<dev>/net directory.
> > The previous fixed retry limit (NETVSC_MAX_HOTADD_RETRY=10, ~12
> > seconds) was insufficient, causing VF re-attach to fail with 'Failed
> > to parse PCI device' on systems with slow MANA driver initialization.
> >
> > Replace the fixed retry limit with an indefinite retry that only gives
> > up when the PCI device itself disappears from sysfs. This is safe because:
> >
> > - The retry uses rte_eal_alarm callbacks which are serialized on the EAL
> >   interrupt thread, preventing races with VF remove or device close paths.
> > - Device close (eth_hn_dev_uninit) explicitly cancels all pending hotplug
> >   alarms via rte_eal_alarm_cancel and frees the context.
> > - If the PCI device is removed while retrying, access() detects the
> >   missing sysfs path and stops immediately.
> >
> > A periodic NOTICE log every 30 retries (~30s) provides visibility into
> > long waits without flooding the log at DEBUG level.
> >
> > Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> > Cc: stable@dpdk.org
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> Better but still seeing AI review warnings.

I have sent v3.

Thanks,
Long
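
The termination rule in the quoted commit message (keep retrying while the PCI device is still in sysfs, give up once it disappears) can be sketched as below. This is a minimal illustration, not the driver code: hotadd_check(), hotadd_should_log() and the HOTADD_* names are hypothetical, and only access() and snprintf() are real library calls. The 30-retry logging cadence is taken from the commit message.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical outcome codes; not taken from the netvsc driver. */
enum hotadd_step {
	HOTADD_READY,   /* <dev>/net exists, VF attach can proceed */
	HOTADD_RETRY,   /* PCI device present, net directory not created yet */
	HOTADD_GIVE_UP, /* PCI device directory is gone, stop retrying */
};

/* Decide what one retry iteration should do for a PCI sysfs
 * directory such as "/sys/bus/pci/devices/<dev>".
 */
static enum hotadd_step
hotadd_check(const char *pci_dir)
{
	char net_dir[512];
	int n;

	/* Termination condition: the device vanished from sysfs. */
	if (access(pci_dir, F_OK) != 0)
		return HOTADD_GIVE_UP;

	n = snprintf(net_dir, sizeof(net_dir), "%s/net", pci_dir);
	if (n < 0 || (size_t)n >= sizeof(net_dir))
		return HOTADD_GIVE_UP;	/* path too long to check */

	/* The kernel driver may still be probing; retry until net/ shows up. */
	return access(net_dir, F_OK) == 0 ? HOTADD_READY : HOTADD_RETRY;
}

/* True on every 30th retry, matching the NOTICE cadence described above. */
static bool
hotadd_should_log(unsigned int retry_count)
{
	return retry_count > 0 && retry_count % 30 == 0;
}
```

In a real alarm callback, HOTADD_RETRY would re-arm the alarm and HOTADD_GIVE_UP would free the hotplug context; both steps are left out here.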

> 
> Reviewed the v2 7-patch series against upstream drivers/net/netvsc/. Patches 1,
> 2, 3, and 5 are clean. Findings on the rest:
> Patch 4 — the new "retry loop exiting" NOTICE fires on every termination
> including the success path, producing a noise alert on every successful VF re-
> attach.
> Patch 6 — two warnings: (a) reaching directly into vf_dev->dev_ops->stats_get
> works only because eth_stats_qstats_get() already memset the buffers before
> invoking netvsc's callback, an undocumented dependency on the caller; (b) the
> else fallback to rte_eth_stats_get() is dead code — it returns -ENOTSUP for the
> same reason as the direct call.
> Patch 7 — the recovering and recovery_success callbacks acquire vf_lock directly
> from event-callback context, departing from the existing INTR_RMV pattern that
> defers work via rte_eal_alarm_set precisely to avoid cross-driver lock-order
> assumptions. The unlocked vf_attached read in recovery_failed is a benign race
> that can be simplified by dropping the guard.
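
The deferral pattern the Patch 7 note refers to (the event callback only schedules work, and the deferred handler is the only place that takes the lock) can be sketched without DPDK as follows. Everything here is a stand-in: defer_work() models rte_eal_alarm_set(), the lock_held flag models vf_lock, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the alarm queue and vf_lock; all names are hypothetical. */
typedef void (*deferred_fn)(void *arg);

struct deferred_work {
	deferred_fn fn;
	void *arg;
};

#define MAX_PENDING 8
static struct deferred_work pending[MAX_PENDING];
static size_t n_pending;

struct vf_state {
	bool lock_held;   /* models vf_lock */
	bool vf_switched; /* models vf_vsc_switched */
};

/* Queue work for later; returns -1 when the queue is full. */
static int
defer_work(deferred_fn fn, void *arg)
{
	if (n_pending == MAX_PENDING)
		return -1;
	pending[n_pending].fn = fn;
	pending[n_pending].arg = arg;
	n_pending++;
	return 0;
}

/* Deferred handler: the only place that takes the lock. */
static void
recovering_deferred(void *arg)
{
	struct vf_state *st = arg;

	assert(!st->lock_held);	/* would deadlock if run in callback context */
	st->lock_held = true;
	st->vf_switched = false;	/* fall back to the synthetic path */
	st->lock_held = false;
}

/* Event callback: touches no lock, only schedules the handler. */
static int
recovering_event(void *cb_arg)
{
	return defer_work(recovering_deferred, cb_arg);
}

/* Run queued work, as the alarm thread would. */
static void
run_deferred(void)
{
	struct deferred_work batch[MAX_PENDING];
	size_t i, n = n_pending;

	/* Snapshot first so handlers may safely queue new work. */
	for (i = 0; i < n; i++)
		batch[i] = pending[i];
	n_pending = 0;
	for (i = 0; i < n; i++)
		batch[i].fn(batch[i].arg);
}
```

Because recovering_event() never touches the lock, it is safe to call even while the caller of the event already holds it; the state change happens only once run_deferred() executes the handler.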


Thread overview: 17+ messages
2026-05-06  2:05 [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
2026-05-06  2:05 ` [PATCH v2 2/7] net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug Long Li
2026-05-06  2:05 ` [PATCH v2 3/7] net/netvsc: retry full probe when IB device not ready during hotplug Long Li
2026-05-06  2:05 ` [PATCH v2 4/7] net/netvsc: add debug logging for VF hotplug retry Long Li
2026-05-06  2:05 ` [PATCH v2 5/7] net/netvsc: retry when no matching MAC found in net directory Long Li
2026-05-06  2:05 ` [PATCH v2 6/7] net/netvsc: forward per-queue stats from VF device Long Li
2026-05-06  2:05 ` [PATCH v2 7/7] net/netvsc: handle VF recovery events for service reset Long Li
2026-05-07  2:49 ` [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Stephen Hemminger
2026-05-15 19:45   ` [EXTERNAL] " Long Li
2026-05-15 19:28 ` [PATCH v3 0/7] net/netvsc: fix VF hotplug and service reset handling Long Li
2026-05-15 19:28   ` [PATCH v3 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
2026-05-15 19:28   ` [PATCH v3 2/7] net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug Long Li
2026-05-15 19:28   ` [PATCH v3 3/7] net/netvsc: retry full probe when IB device not ready during hotplug Long Li
2026-05-15 19:28   ` [PATCH v3 4/7] net/netvsc: add debug logging for VF hotplug retry Long Li
2026-05-15 19:28   ` [PATCH v3 5/7] net/netvsc: retry when no matching MAC found in net directory Long Li
2026-05-15 19:28   ` [PATCH v3 6/7] net/netvsc: forward per-queue stats from VF device Long Li
2026-05-15 19:28   ` [PATCH v3 7/7] net/netvsc: handle VF recovery events for service reset Long Li
