[PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears

DPDK-dev Archive on lore.kernel.org

* [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears
@ 2026-05-05 20:44 Long Li
  2026-05-05 20:44 ` [PATCH 2/7] net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug Long Li
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Long Li @ 2026-05-05 20:44 UTC
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

After PCI rescan on Azure, the MANA kernel driver can take over 100
seconds to probe and create the /sys/bus/pci/devices/<dev>/net directory.
The previous fixed retry limit (NETVSC_MAX_HOTADD_RETRY=10, ~12 seconds)
was insufficient, causing VF re-attach to fail with 'Failed to parse PCI
device' on systems with slow MANA driver initialization.

Replace the fixed retry limit with an indefinite retry that only gives up
when the PCI device itself disappears from sysfs. This is safe because:

- The retry uses rte_eal_alarm callbacks which are serialized on the EAL
  interrupt thread, preventing races with VF remove or device close paths.
- Device close (eth_hn_dev_uninit) explicitly cancels all pending hotplug
  alarms via rte_eal_alarm_cancel and frees the context.
- If the PCI device is removed while retrying, access() detects the
  missing sysfs path and stops immediately.

A NOTICE log every 30 retries (~30s) provides visibility into long waits,
while per-retry messages stay at DEBUG level to avoid flooding the log.

Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/net/netvsc/hn_ethdev.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index b8880edb4c..61e5aa464d 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -89,8 +89,8 @@ struct netvsc_mp_param {
 #define NETVSC_ARG_TXBREAK "tx_copybreak"
 #define NETVSC_ARG_RX_EXTMBUF_ENABLE "rx_extmbuf_enable"
 
-/* The max number of retry when hot adding a VF device */
-#define NETVSC_MAX_HOTADD_RETRY 10
+/* Retry interval for hot-add VF device (microseconds) */
+#define NETVSC_HOTADD_RETRY_INTERVAL 1000000
 
 struct hn_xstats_name_off {
 	char name[RTE_ETH_XSTATS_NAME_SIZE];
@@ -622,19 +622,32 @@ static void netvsc_hotplug_retry(void *args)
 	PMD_DRV_LOG(DEBUG, "%s: retry count %d",
 		    __func__, hot_ctx->eal_hot_plug_retry);
 
-	if (hot_ctx->eal_hot_plug_retry++ > NETVSC_MAX_HOTADD_RETRY) {
-		PMD_DRV_LOG(NOTICE, "Failed to parse PCI device retry=%d",
-			    hot_ctx->eal_hot_plug_retry);
+	hot_ctx->eal_hot_plug_retry++;
+
+	/* Check if PCI device still exists — if it disappeared, give up.
+	 * Otherwise keep retrying until the net directory appears
+	 * (MANA driver probe can take >100s after PCI rescan).
+	 */
+	snprintf(buf, sizeof(buf), "/sys/bus/pci/devices/%s", d->name);
+	if (access(buf, F_OK) != 0) {
+		PMD_DRV_LOG(NOTICE,
+			    "PCI device %s no longer exists, giving up after %d retries",
+			    d->name, hot_ctx->eal_hot_plug_retry);
 		goto free_hotadd_ctx;
 	}
 
 	snprintf(buf, sizeof(buf), "/sys/bus/pci/devices/%s/net", d->name);
 	di = opendir(buf);
 	if (!di) {
-		PMD_DRV_LOG(DEBUG, "%s: can't open directory %s, "
-			    "retrying in 1 second", __func__, buf);
-		/* The device is still being initialized, retry after 1 second */
-		rte_eal_alarm_set(1000000, netvsc_hotplug_retry, hot_ctx);
+		if (hot_ctx->eal_hot_plug_retry % 30 == 0)
+			PMD_DRV_LOG(NOTICE,
+				    "%s: waiting for %s (retry %d, %ds elapsed)",
+				    __func__, buf, hot_ctx->eal_hot_plug_retry,
+				    hot_ctx->eal_hot_plug_retry);
+		else
+			PMD_DRV_LOG(DEBUG, "%s: can't open directory %s, "
+				    "retrying in 1 second", __func__, buf);
+		rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL, netvsc_hotplug_retry, hot_ctx);
 		return;
 	}
 
@@ -758,7 +771,7 @@ netvsc_hotadd_callback(const char *device_name, enum rte_dev_event_type type,
 			rte_spinlock_lock(&hv->hotadd_lock);
 			LIST_INSERT_HEAD(&hv->hotadd_list, hot_ctx, list);
 			rte_spinlock_unlock(&hv->hotadd_lock);
-			rte_eal_alarm_set(1000000, netvsc_hotplug_retry, hot_ctx);
+			rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL, netvsc_hotplug_retry, hot_ctx);
 			return;
 		}
 
-- 
2.43.0



* [PATCH 2/7] net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug
  2026-05-05 20:44 [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
@ 2026-05-05 20:44 ` Long Li
  2026-05-05 20:44 ` [PATCH 3/7] net/netvsc: retry full probe when IB device not ready during hotplug Long Li
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Long Li @ 2026-05-05 20:44 UTC
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

When the MANA VF net directory appears after PCI rescan, udev may rename
the interface (e.g. eth1 → ens1) before DPDK can query its MAC address
via SIOCGIFHWADDR. The ioctl fails because the interface name is stale
during the rename window.

Instead of giving up when SIOCGIFHWADDR fails, close the directory and
schedule another retry. The next attempt will re-read the directory with
the updated interface name (e.g. ens1 instead of eth1) and succeed.

This was observed on Azure VMs where the MANA kernel driver takes >30
seconds to probe after PCI rescan, and udev renames the interface
immediately after registration.

Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/net/netvsc/hn_ethdev.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 61e5aa464d..8bb2df3c19 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -667,10 +667,17 @@ static void netvsc_hotplug_retry(void *args)
 		ret = ioctl(s, SIOCGIFHWADDR, &req);
 		close(s);
 		if (ret == -1) {
-			PMD_DRV_LOG(ERR,
-				    "Failed to send SIOCGIFHWADDR for device %s",
+			/* Interface may be renamed by udev (e.g. eth1 → ens1),
+			 * retry instead of giving up.
+			 */
+			PMD_DRV_LOG(NOTICE,
+				    "Failed to send SIOCGIFHWADDR for device %s, "
+				    "interface may be renaming, retrying",
 				    dir->d_name);
-			break;
+			closedir(di);
+			rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
+					  netvsc_hotplug_retry, hot_ctx);
+			return;
 		}
 		if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER)
 			continue;
-- 
2.43.0



* [PATCH 3/7] net/netvsc: retry full probe when IB device not ready during hotplug
  2026-05-05 20:44 [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
  2026-05-05 20:44 ` [PATCH 2/7] net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug Long Li
@ 2026-05-05 20:44 ` Long Li
  2026-05-05 20:44 ` [PATCH 4/7] net/netvsc: add NOTICE-level debug logging for VF hotplug retry Long Li
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Long Li @ 2026-05-05 20:44 UTC
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

When rte_eal_hotplug_add returns -ENODEV during VF hot-add, it means the
MANA IB/verbs device is not yet registered by the mana_ib kernel module.
This happens because the mana_ib auxiliary driver probes asynchronously
after the MANA net driver creates the network interface.

On Azure VMs, the gap between netdev registration and IB device
registration can be several seconds. Previously, netvsc would log the
error and give up after finding the matching MAC address.

Now, on -ENODEV, restart the full retry loop from the PCI device
existence check. This re-scans the net directory to pick up any
interface renames (e.g. eth1 -> ens1) and retries until the IB device
is ready.

The -EEXIST return (device already probed by another netvsc port on the
same PCI device) is handled silently, as hn_vf_add will find the
already-probed VF port.

Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/net/netvsc/hn_ethdev.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 8bb2df3c19..5d1ef10eff 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -709,17 +709,33 @@ static void netvsc_hotplug_retry(void *args)
 			 * parent device, restore its args.
 			 */
 			ret = rte_eal_hotplug_add(d->bus->name, d->name, drv_str ? drv_str : "");
-			if (ret) {
-				PMD_DRV_LOG(ERR,
-					    "Failed to add PCI device %s",
+			free(drv_str);
+
+			if (ret == -ENODEV) {
+				/* IB device not ready yet (mana_ib not probed).
+				 * Restart the full retry from PCI device check
+				 * so we re-verify the device and get fresh
+				 * interface names after any renames.
+				 */
+				PMD_DRV_LOG(NOTICE,
+					    "IB device not ready for %s, "
+					    "restarting probe in 1 second",
 					    d->name);
+				closedir(di);
+				rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
+						  netvsc_hotplug_retry,
+						  hot_ctx);
+				return;
 			}
 
-			free(drv_str);
+			if (ret && ret != -EEXIST)
+				PMD_DRV_LOG(NOTICE,
+					    "Failed to add PCI device %s (ret=%d)",
+					    d->name, ret);
 
 			ret = hn_vf_add(dev, hv);
 			if (ret)
-				PMD_DRV_LOG(ERR, "Failed to add VF in hotplug retry: %d", ret);
+				PMD_DRV_LOG(ERR, "Failed to add VF: %d", ret);
 			break;
 		}
 	}
-- 
2.43.0



* [PATCH 4/7] net/netvsc: add NOTICE-level debug logging for VF hotplug retry
  2026-05-05 20:44 [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
  2026-05-05 20:44 ` [PATCH 2/7] net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug Long Li
  2026-05-05 20:44 ` [PATCH 3/7] net/netvsc: retry full probe when IB device not ready during hotplug Long Li
@ 2026-05-05 20:44 ` Long Li
  2026-05-05 20:44 ` [PATCH 5/7] net/netvsc: retry when no matching MAC found in net directory Long Li
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Long Li @ 2026-05-05 20:44 UTC
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li

Add detailed NOTICE-level logging at every decision point in the
netvsc_hotplug_retry function to diagnose VF re-attach failures:

- Log each interface found in the net/ directory
- Log when sa_family is not ARPHRD_ETHER
- Log MAC address comparison details on mismatch
- Log when the retry loop exits (with retry count)

These logs help correlate DPDK hotplug retry behavior with kernel
dmesg timestamps to identify timing issues during VF re-attach
after PCI rescan on Azure.

Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/net/netvsc/hn_ethdev.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 5d1ef10eff..124af4f1a1 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -656,6 +656,11 @@ static void netvsc_hotplug_retry(void *args)
 		if (!strcmp(dir->d_name, ".") || !strcmp(dir->d_name, ".."))
 			continue;
 
+		PMD_DRV_LOG(NOTICE,
+			    "%s: checking interface %s in %s (retry %d)",
+			    __func__, dir->d_name, buf,
+			    hot_ctx->eal_hot_plug_retry);
+
 		/* trying to get mac address if this is a network device*/
 		s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
 		if (s == -1) {
@@ -679,8 +684,12 @@ static void netvsc_hotplug_retry(void *args)
 					  netvsc_hotplug_retry, hot_ctx);
 			return;
 		}
-		if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER)
+		if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER) {
+			PMD_DRV_LOG(NOTICE,
+				    "%s: device %s sa_family=%d not ARPHRD_ETHER, skipping",
+				    __func__, dir->d_name, req.ifr_hwaddr.sa_family);
 			continue;
+		}
 
 		memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data,
 		       RTE_DIM(eth_addr.addr_bytes));
@@ -737,6 +746,20 @@ static void netvsc_hotplug_retry(void *args)
 			if (ret)
 				PMD_DRV_LOG(ERR, "Failed to add VF: %d", ret);
 			break;
+		} else {
+			PMD_DRV_LOG(NOTICE,
+				    "%s: MAC mismatch for %s: got %02x:%02x:%02x:%02x:%02x:%02x "
+				    "expected %02x:%02x:%02x:%02x:%02x:%02x",
+				    __func__, dir->d_name,
+				    eth_addr.addr_bytes[0], eth_addr.addr_bytes[1],
+				    eth_addr.addr_bytes[2], eth_addr.addr_bytes[3],
+				    eth_addr.addr_bytes[4], eth_addr.addr_bytes[5],
+				    dev->data->mac_addrs->addr_bytes[0],
+				    dev->data->mac_addrs->addr_bytes[1],
+				    dev->data->mac_addrs->addr_bytes[2],
+				    dev->data->mac_addrs->addr_bytes[3],
+				    dev->data->mac_addrs->addr_bytes[4],
+				    dev->data->mac_addrs->addr_bytes[5]);
 		}
 	}
 
@@ -744,6 +767,9 @@ static void netvsc_hotplug_retry(void *args)
 	if (di)
 		closedir(di);
 
+	PMD_DRV_LOG(NOTICE, "%s: retry loop exiting for device %s (retry %d)",
+		    __func__, d->name, hot_ctx->eal_hot_plug_retry);
+
 	rte_spinlock_lock(&hv->hotadd_lock);
 	LIST_REMOVE(hot_ctx, list);
 	rte_spinlock_unlock(&hv->hotadd_lock);
-- 
2.43.0



* [PATCH 5/7] net/netvsc: retry when no matching MAC found in net directory
  2026-05-05 20:44 [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
                   ` (2 preceding siblings ...)
  2026-05-05 20:44 ` [PATCH 4/7] net/netvsc: add NOTICE-level debug logging for VF hotplug retry Long Li
@ 2026-05-05 20:44 ` Long Li
  2026-05-05 20:44 ` [PATCH 6/7] net/netvsc: forward per-queue stats from VF device Long Li
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Long Li @ 2026-05-05 20:44 UTC
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

On multi-NIC Azure VMs, a single MANA PCI device (7870:00:00.0) hosts
multiple VF interfaces. After PCI rescan, these interfaces register
at different times — the management NIC's VF appears first, followed
by the test NIC's VF.

Previously, when netvsc_hotplug_retry scanned the net/ directory and
found interfaces with non-matching MACs, it would exit the readdir
loop and free the hotadd context, permanently giving up. The matching
VF interface had not appeared yet.

Now, when the readdir loop ends without finding a matching MAC (dir
is NULL after loop), schedule another retry instead of giving up.
This uses a separate mac_retry counter (limit 30, ~30 seconds) so
the main retry loop remains unlimited.

Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/net/netvsc/hn_ethdev.c | 34 ++++++++++++++++++++++++++++++++++
 drivers/net/netvsc/hn_var.h    |  1 +
 2 files changed, 35 insertions(+)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 124af4f1a1..8d9d6bbe8b 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -92,6 +92,12 @@ struct netvsc_mp_param {
 /* Retry interval for hot-add VF device (microseconds) */
 #define NETVSC_HOTADD_RETRY_INTERVAL 1000000
 
+/* Max retries when net/ directory exists but no matching MAC found.
+ * On multi-NIC PCI devices, a second VF may register later.
+ * 120 retries = ~2 minutes.
+ */
+#define NETVSC_MAX_MAC_RETRY 120
+
 struct hn_xstats_name_off {
 	char name[RTE_ETH_XSTATS_NAME_SIZE];
 	unsigned int offset;
@@ -763,6 +769,34 @@ static void netvsc_hotplug_retry(void *args)
 		}
 	}
 
+	/* If we opened the net directory but didn't find a matching MAC,
+	 * the VF interface may not have appeared yet (e.g. on a multi-NIC
+	 * PCI device, the second VF registers later). Retry.
+	 */
+	if (di) {
+		closedir(di);
+		di = NULL;
+		if (!dir) {
+			/* readdir returned NULL — loop ended without match */
+			hot_ctx->mac_retry++;
+			if (hot_ctx->mac_retry < NETVSC_MAX_MAC_RETRY) {
+				PMD_DRV_LOG(NOTICE,
+					    "%s: no matching MAC found in %s, "
+					    "retrying in 1 second (mac_retry %d/%d)",
+					    __func__, buf,
+					    hot_ctx->mac_retry,
+					    NETVSC_MAX_MAC_RETRY);
+				rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
+						  netvsc_hotplug_retry,
+						  hot_ctx);
+				return;
+			}
+			PMD_DRV_LOG(NOTICE,
+				    "%s: no matching MAC found after %d retries, giving up",
+				    __func__, hot_ctx->mac_retry);
+		}
+	}
+
 free_hotadd_ctx:
 	if (di)
 		closedir(di);
diff --git a/drivers/net/netvsc/hn_var.h b/drivers/net/netvsc/hn_var.h
index ef55dee28e..574b909c82 100644
--- a/drivers/net/netvsc/hn_var.h
+++ b/drivers/net/netvsc/hn_var.h
@@ -127,6 +127,7 @@ struct hv_hotadd_context {
 	struct hn_data *hv;
 	struct rte_devargs da;
 	int eal_hot_plug_retry;
+	int mac_retry;
 };
 
 struct hn_data {
-- 
2.43.0



* [PATCH 6/7] net/netvsc: forward per-queue stats from VF device
  2026-05-05 20:44 [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
                   ` (3 preceding siblings ...)
  2026-05-05 20:44 ` [PATCH 5/7] net/netvsc: retry when no matching MAC found in net directory Long Li
@ 2026-05-05 20:44 ` Long Li
  2026-05-05 20:44 ` [PATCH 7/7] net/netvsc: handle VF recovery events for service reset Long Li
  2026-05-05 22:03 ` [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Stephen Hemminger
  6 siblings, 0 replies; 9+ messages in thread
From: Long Li @ 2026-05-05 20:44 UTC
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li, stable

hn_vf_stats_get was ignoring the qstats parameter (__rte_unused),
calling rte_eth_stats_get which only collects aggregate stats. This
meant per-queue stats (rx_q0_good_packets, tx_q0_good_packets, etc.)
were always zero when VF datapath was active, even though the
underlying MANA driver populates them in its stats_get callback.

Call the VF device's stats_get op directly with the qstats pointer
so per-queue counters are forwarded through netvsc to the xstats
telemetry output.

Fixes: dc7680e8597c ("net/netvsc: support integrated VF")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/net/netvsc/hn_vf.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/netvsc/hn_vf.c b/drivers/net/netvsc/hn_vf.c
index 1fcc65a712..497f747aab 100644
--- a/drivers/net/netvsc/hn_vf.c
+++ b/drivers/net/netvsc/hn_vf.c
@@ -749,7 +749,7 @@ void hn_vf_rx_queue_release(struct hn_data *hv, uint16_t queue_id)
 
 int hn_vf_stats_get(struct rte_eth_dev *dev,
 		    struct rte_eth_stats *stats,
-		    struct eth_queue_stats *qstats __rte_unused)
+		    struct eth_queue_stats *qstats)
 {
 	struct hn_data *hv = dev->data->dev_private;
 	struct rte_eth_dev *vf_dev;
@@ -757,8 +757,12 @@ int hn_vf_stats_get(struct rte_eth_dev *dev,
 
 	rte_rwlock_read_lock(&hv->vf_lock);
 	vf_dev = hn_get_vf_dev(hv);
-	if (vf_dev)
-		ret = rte_eth_stats_get(vf_dev->data->port_id, stats);
+	if (vf_dev) {
+		if (vf_dev->dev_ops->stats_get)
+			ret = vf_dev->dev_ops->stats_get(vf_dev, stats, qstats);
+		else
+			ret = rte_eth_stats_get(vf_dev->data->port_id, stats);
+	}
 	rte_rwlock_read_unlock(&hv->vf_lock);
 	return ret;
 }
-- 
2.43.0



* [PATCH 7/7] net/netvsc: handle VF recovery events for service reset
  2026-05-05 20:44 [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
                   ` (4 preceding siblings ...)
  2026-05-05 20:44 ` [PATCH 6/7] net/netvsc: forward per-queue stats from VF device Long Li
@ 2026-05-05 20:44 ` Long Li
  2026-05-05 22:03 ` [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Stephen Hemminger
  6 siblings, 0 replies; 9+ messages in thread
From: Long Li @ 2026-05-05 20:44 UTC
  To: dev, Wei Hu, Stephen Hemminger; +Cc: Long Li

Register callbacks for RTE_ETH_EVENT_ERR_RECOVERING,
RTE_ETH_EVENT_RECOVERY_SUCCESS, and RTE_ETH_EVENT_RECOVERY_FAILED
events on the VF port to handle MANA service resets.

- On ERR_RECOVERING: switch data path to synthetic but keep the
  VF device attached in DPDK
- On RECOVERY_SUCCESS: switch data path back to VF
- On RECOVERY_FAILED: do full VF removal (same as INTR_RMV)
- Unregister all recovery callbacks during detach, removal, and
  close

This ensures that during a service reset (kernel suspend/resume
without PCI remove), netvsc keeps the VF attached and seamlessly
switches back to it after recovery, without requiring a PCI
hot-add event.

This change is compatible with the current behavior when no
service reset messages are received.

Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/net/netvsc/hn_vf.c | 144 +++++++++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)

diff --git a/drivers/net/netvsc/hn_vf.c b/drivers/net/netvsc/hn_vf.c
index 497f747aab..f9b04fa4dc 100644
--- a/drivers/net/netvsc/hn_vf.c
+++ b/drivers/net/netvsc/hn_vf.c
@@ -50,6 +50,13 @@ static int hn_vf_match(const struct rte_eth_dev *dev)
 }
 
 
+static int hn_eth_recovering_callback(uint16_t port_id,
+	enum rte_eth_event_type event, void *cb_arg, void *out);
+static int hn_eth_recovery_success_callback(uint16_t port_id,
+	enum rte_eth_event_type event, void *cb_arg, void *out);
+static int hn_eth_recovery_failed_callback(uint16_t port_id,
+	enum rte_eth_event_type event, void *cb_arg, void *out);
+
 /*
  * Attach new PCI VF device and return the port_id
  */
@@ -111,7 +118,56 @@ static int hn_vf_attach(struct rte_eth_dev *dev, struct hn_data *hv)
 		return ret;
 	}
 
+	/* Register recovery event callbacks for service reset handling */
+	ret = rte_eth_dev_callback_register(hv->vf_ctx.vf_port,
+					    RTE_ETH_EVENT_ERR_RECOVERING,
+					    hn_eth_recovering_callback, hv);
+	if (ret) {
+		PMD_DRV_LOG(ERR,
+			    "Registering recovering callback failed for vf port %d ret %d",
+			    port, ret);
+		goto err_recovering;
+	}
+
+	ret = rte_eth_dev_callback_register(hv->vf_ctx.vf_port,
+					    RTE_ETH_EVENT_RECOVERY_SUCCESS,
+					    hn_eth_recovery_success_callback, hv);
+	if (ret) {
+		PMD_DRV_LOG(ERR,
+			    "Registering recovery success callback failed for vf port %d ret %d",
+			    port, ret);
+		goto err_recovery_success;
+	}
+
+	ret = rte_eth_dev_callback_register(hv->vf_ctx.vf_port,
+					    RTE_ETH_EVENT_RECOVERY_FAILED,
+					    hn_eth_recovery_failed_callback, hv);
+	if (ret) {
+		PMD_DRV_LOG(ERR,
+			    "Registering recovery failed callback failed for vf port %d ret %d",
+			    port, ret);
+		goto err_recovery_failed;
+	}
+
 	return 0;
+
+err_recovery_failed:
+	rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+					RTE_ETH_EVENT_RECOVERY_SUCCESS,
+					hn_eth_recovery_success_callback, hv);
+err_recovery_success:
+	rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+					RTE_ETH_EVENT_ERR_RECOVERING,
+					hn_eth_recovering_callback, hv);
+err_recovering:
+	rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+					RTE_ETH_EVENT_INTR_RMV,
+					hn_eth_rmv_event_callback, hv);
+	hv->vf_ctx.vf_attached = false;
+	hv->vf_ctx.vf_port = 0;
+	if (rte_eth_dev_owner_unset(port, hv->owner.id) < 0)
+		PMD_DRV_LOG(ERR, "Failed to unset owner for port %d", port);
+	return ret;
 }
 
 static void hn_vf_remove_unlocked(struct hn_data *hv);
@@ -143,6 +199,12 @@ static void hn_remove_delayed(void *args)
 		PMD_DRV_LOG(ERR,
 			    "rte_eth_dev_callback_unregister failed ret=%d",
 			    ret);
+	rte_eth_dev_callback_unregister(port_id, RTE_ETH_EVENT_ERR_RECOVERING,
+					hn_eth_recovering_callback, hv);
+	rte_eth_dev_callback_unregister(port_id, RTE_ETH_EVENT_RECOVERY_SUCCESS,
+					hn_eth_recovery_success_callback, hv);
+	rte_eth_dev_callback_unregister(port_id, RTE_ETH_EVENT_RECOVERY_FAILED,
+					hn_eth_recovery_failed_callback, hv);
 
 	/* Detach and release port_id from system */
 	ret = rte_eth_dev_stop(port_id);
@@ -187,6 +249,70 @@ int hn_eth_rmv_event_callback(uint16_t port_id,
 	return 0;
 }
 
+/*
+ * Handle VF error recovery event from MANA PMD.
+ * Switch data path to synthetic but keep the VF attached.
+ */
+static int
+hn_eth_recovering_callback(uint16_t port_id,
+			   enum rte_eth_event_type event __rte_unused,
+			   void *cb_arg, void *out __rte_unused)
+{
+	struct hn_data *hv = cb_arg;
+
+	PMD_DRV_LOG(NOTICE, "VF port %u recovering from error", port_id);
+
+	rte_rwlock_write_lock(&hv->vf_lock);
+	hn_vf_remove_unlocked(hv);
+	rte_rwlock_write_unlock(&hv->vf_lock);
+
+	return 0;
+}
+
+/*
+ * Handle VF recovery success event from MANA PMD.
+ * Switch data path back to VF.
+ */
+static int
+hn_eth_recovery_success_callback(uint16_t port_id,
+				 enum rte_eth_event_type event __rte_unused,
+				 void *cb_arg, void *out __rte_unused)
+{
+	struct hn_data *hv = cb_arg;
+	int ret;
+
+	PMD_DRV_LOG(NOTICE, "VF port %u recovery succeeded", port_id);
+
+	rte_rwlock_write_lock(&hv->vf_lock);
+	if (hv->vf_ctx.vf_attached && !hv->vf_ctx.vf_vsc_switched) {
+		ret = hn_nvs_set_datapath(hv, NVS_DATAPATH_VF);
+		if (ret)
+			PMD_DRV_LOG(ERR, "Failed to switch to VF after recovery");
+		else
+			hv->vf_ctx.vf_vsc_switched = true;
+	}
+	rte_rwlock_write_unlock(&hv->vf_lock);
+
+	return 0;
+}
+
+/*
+ * Handle VF recovery failure event from MANA PMD.
+ * VF is unusable, do full removal.
+ */
+static int
+hn_eth_recovery_failed_callback(uint16_t port_id,
+				enum rte_eth_event_type event __rte_unused,
+				void *cb_arg, void *out __rte_unused)
+{
+	struct hn_data *hv = cb_arg;
+
+	PMD_DRV_LOG(NOTICE, "VF port %u recovery failed, removing", port_id);
+	rte_eal_alarm_set(1, hn_remove_delayed, hv);
+
+	return 0;
+}
+
 static int hn_setup_vf_queues(int port, struct rte_eth_dev *dev)
 {
 	struct hn_rx_queue *rx_queue;
@@ -247,6 +373,12 @@ static void hn_vf_detach(struct hn_data *hv)
 
 	rte_eth_dev_callback_unregister(port, RTE_ETH_EVENT_INTR_RMV,
 					hn_eth_rmv_event_callback, hv);
+	rte_eth_dev_callback_unregister(port, RTE_ETH_EVENT_ERR_RECOVERING,
+					hn_eth_recovering_callback, hv);
+	rte_eth_dev_callback_unregister(port, RTE_ETH_EVENT_RECOVERY_SUCCESS,
+					hn_eth_recovery_success_callback, hv);
+	rte_eth_dev_callback_unregister(port, RTE_ETH_EVENT_RECOVERY_FAILED,
+					hn_eth_recovery_failed_callback, hv);
 
 	if (rte_eth_dev_owner_unset(port, hv->owner.id) < 0)
 		PMD_DRV_LOG(ERR, "Failed to unset owner for port %d", port);
@@ -630,6 +762,18 @@ int hn_vf_close(struct rte_eth_dev *dev)
 						RTE_ETH_EVENT_INTR_RMV,
 						hn_eth_rmv_event_callback,
 						hv);
+		rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+						RTE_ETH_EVENT_ERR_RECOVERING,
+						hn_eth_recovering_callback,
+						hv);
+		rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+						RTE_ETH_EVENT_RECOVERY_SUCCESS,
+						hn_eth_recovery_success_callback,
+						hv);
+		rte_eth_dev_callback_unregister(hv->vf_ctx.vf_port,
+						RTE_ETH_EVENT_RECOVERY_FAILED,
+						hn_eth_recovery_failed_callback,
+						hv);
 		rte_eal_alarm_cancel(hn_remove_delayed, hv);
 		ret = rte_eth_dev_close(hv->vf_ctx.vf_port);
 		hv->vf_ctx.vf_attached = false;
-- 
2.43.0



* Re: [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears
  2026-05-05 20:44 [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
                   ` (5 preceding siblings ...)
  2026-05-05 20:44 ` [PATCH 7/7] net/netvsc: handle VF recovery events for service reset Long Li
@ 2026-05-05 22:03 ` Stephen Hemminger
  2026-05-06  2:06   ` [EXTERNAL] " Long Li
  6 siblings, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2026-05-05 22:03 UTC
  To: Long Li; +Cc: dev, Wei Hu, stable

On Tue,  5 May 2026 13:44:50 -0700
Long Li <longli@microsoft.com> wrote:

> After PCI rescan on Azure, the MANA kernel driver can take over 100
> seconds to probe and create the /sys/bus/pci/devices/<dev>/net directory.
> The previous fixed retry limit (NETVSC_MAX_HOTADD_RETRY=10, ~12 seconds)
> was insufficient, causing VF re-attach to fail with 'Failed to parse PCI
> device' on systems with slow MANA driver initialization.
> 
> Replace the fixed retry limit with an indefinite retry that only gives up
> when the PCI device itself disappears from sysfs. This is safe because:
> 
> - The retry uses rte_eal_alarm callbacks which are serialized on the EAL
>   interrupt thread, preventing races with VF remove or device close paths.
> - Device close (eth_hn_dev_uninit) explicitly cancels all pending hotplug
>   alarms via rte_eal_alarm_cancel and frees the context.
> - If the PCI device is removed while retrying, access() detects the
>   missing sysfs path and stops immediately.
> 
> A NOTICE log every 30 retries (~30s) provides visibility into long waits,
> while per-retry messages stay at DEBUG level to avoid flooding the log.
> 
> Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> Cc: stable@dpdk.org
> Signed-off-by: Long Li <longli@microsoft.com>
> ---

Since this series touches lots of stuff, decided to do a more thorough than
normal AI review

Review of patch series: net/netvsc VF hotplug improvements (7 patches)

Series targets stale interface names, slow MANA driver probe, multi-VF
PCI devices, and adds service-reset event handling. Reviewed against
upstream main as of clone date.

=========================================================================
[PATCH 3/7] net/netvsc: retry full probe when IB device not ready during
                       hotplug
=========================================================================

Warning:

  drivers/net/netvsc/hn_ethdev.c (around the new ENODEV/EEXIST handling)

  The original code logged failures from rte_eal_hotplug_add at
  PMD_DRV_LOG(ERR). The patch lowers all non-ENODEV, non-EEXIST returns
  to PMD_DRV_LOG(NOTICE):

      if (ret && ret != -EEXIST)
              PMD_DRV_LOG(NOTICE,
                          "Failed to add PCI device %s (ret=%d)",
                          d->name, ret);

  -ENOMEM, -EBUSY, -EIO and similar errors from rte_eal_hotplug_add are
  legitimate failures that operators need to see. Recommend keeping ERR
  for unrecognized error codes; only -ENODEV and -EEXIST are now expected
  outcomes worth a quieter log level.
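
  A minimal sketch of the level split recommended above, not the patch's
  code. The enum and the function name are hypothetical stand-ins so the
  example builds without DPDK; in the driver the result would select
  between a quiet path, PMD_DRV_LOG(NOTICE, ...) and PMD_DRV_LOG(ERR, ...).
  It assumes the hot-add call returns 0 or a negative errno value.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for log levels; not DPDK definitions. */
enum hotadd_sketch_level {
	HOTADD_SKETCH_NONE,   /* nothing to report */
	HOTADD_SKETCH_NOTICE, /* expected, transient condition */
	HOTADD_SKETCH_ERR     /* real failure the operator should see */
};

/* Map the hot-add return value to the level it should be logged at. */
static enum hotadd_sketch_level
hotadd_sketch_log_level(int ret)
{
	assert(ret <= 0); /* assumption: 0 on success, negative errno on error */

	if (ret == 0 || ret == -EEXIST)
		return HOTADD_SKETCH_NONE;   /* success, or already probed */
	if (ret == -ENODEV)
		return HOTADD_SKETCH_NOTICE; /* IB device not ready yet, retried */
	return HOTADD_SKETCH_ERR;            /* -ENOMEM, -EBUSY, -EIO, ... */
}
```

  With this split, only the two outcomes the series treats as expected
  are logged below ERR.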

=========================================================================
[PATCH 4/7] net/netvsc: add NOTICE-level debug logging for VF hotplug
                       retry
=========================================================================

Warning:

  drivers/net/netvsc/hn_ethdev.c, multiple sites in netvsc_hotplug_retry

  The new log calls inside the readdir() loop fire at NOTICE for every
  directory entry on every retry pass:

    PMD_DRV_LOG(NOTICE, "%s: checking interface %s in %s (retry %d)", ...)
    PMD_DRV_LOG(NOTICE, "%s: device %s sa_family=%d not ARPHRD_ETHER, ...")
    PMD_DRV_LOG(NOTICE, "%s: MAC mismatch for %s: got ... expected ...")

  Combined with patch 1 (indefinite retries) and patch 5 (up to 120
  mac_retry passes), a multi-NIC VM can produce hundreds of NOTICE lines
  per VF re-attach. Per-iteration trace belongs at DEBUG; reserve NOTICE
  for one-shot state transitions (loop start, match found, give-up).
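
  One possible throttle, sketched under the assumption that the
  every-30-retries rule from patch 1 is reused for per-pass messages.
  All names here are hypothetical; the point is only to show how few
  NOTICE lines remain when per-iteration trace drops to DEBUG.

```c
#include <assert.h>

/* Hypothetical stand-ins for log levels; not DPDK definitions. */
enum trace_sketch_level { TRACE_SKETCH_DEBUG, TRACE_SKETCH_NOTICE };

/* Same period patch 1 uses for its "waiting for ..." message. */
#define TRACE_SKETCH_NOTICE_PERIOD 30

/* Level for a per-pass message: NOTICE on every 30th retry, DEBUG otherwise. */
static enum trace_sketch_level
trace_sketch_level_for(int retry)
{
	if (retry % TRACE_SKETCH_NOTICE_PERIOD == 0)
		return TRACE_SKETCH_NOTICE;
	return TRACE_SKETCH_DEBUG;
}

/* Count how many NOTICE lines a run of `passes` retries would emit. */
static int
trace_sketch_count_notice(int passes)
{
	int retry, count = 0;

	assert(passes >= 0);
	for (retry = 1; retry <= passes; retry++)
		if (trace_sketch_level_for(retry) == TRACE_SKETCH_NOTICE)
			count++;
	return count;
}
```

  Over a full 120-pass mac_retry budget this emits a NOTICE on retries
  30, 60, 90 and 120 only, instead of one or more per directory entry
  per pass.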

Info:

  drivers/net/netvsc/hn_ethdev.c, MAC mismatch log

  The format string and byte arguments expand the MAC inline:

      "%02x:%02x:%02x:%02x:%02x:%02x ..."
      eth_addr.addr_bytes[0], ..., eth_addr.addr_bytes[5],
      dev->data->mac_addrs->addr_bytes[0], ...

  rte_ether.h provides RTE_ETHER_ADDR_PRT_FMT and RTE_ETHER_ADDR_BYTES()
  for exactly this; using them is shorter and consistent with the rest
  of DPDK.
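
  A minimal sketch of the macro-based form follows. To stay compilable
  without DPDK headers it defines local stand-ins (SKETCH_ADDR_PRT_FMT,
  SKETCH_ADDR_BYTES, struct sketch_ether_addr) shaped like the rte_ether.h
  macros; in the driver the real RTE_ETHER_ADDR_PRT_FMT and
  RTE_ETHER_ADDR_BYTES() would be used instead. The message text is
  illustrative only.

```c
#include <stdint.h>
#include <stdio.h>

/* Local stand-in for struct rte_ether_addr (six address bytes). */
struct sketch_ether_addr {
	uint8_t addr_bytes[6];
};

/* Stand-ins shaped like RTE_ETHER_ADDR_PRT_FMT / RTE_ETHER_ADDR_BYTES(). */
#define SKETCH_ADDR_PRT_FMT "%02X:%02X:%02X:%02X:%02X:%02X"
#define SKETCH_ADDR_BYTES(a) \
	(a)->addr_bytes[0], (a)->addr_bytes[1], (a)->addr_bytes[2], \
	(a)->addr_bytes[3], (a)->addr_bytes[4], (a)->addr_bytes[5]

/* Format a "got ... expected ..." mismatch message into buf. */
static int
format_mac_mismatch(char *buf, size_t len,
		    const struct sketch_ether_addr *got,
		    const struct sketch_ether_addr *expected)
{
	return snprintf(buf, len,
			"MAC mismatch: got " SKETCH_ADDR_PRT_FMT
			" expected " SKETCH_ADDR_PRT_FMT,
			SKETCH_ADDR_BYTES(got), SKETCH_ADDR_BYTES(expected));
}
```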

Info:

  drivers/net/netvsc/hn_ethdev.c, MAC match block

  The new "} else {" follows an if-branch that already ends in break, so
  the else is redundant — the log can sit unindented after the closing
  brace of the if.
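
  For illustration, a standalone sketch of the control flow without the
  else. The function name, the candidate array and the mismatch counter
  are hypothetical; the counter stands in for the mismatch log call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAC_LEN 6

/*
 * Return the index of the first candidate equal to `want`, or -1.
 * `mismatches` counts the entries that would produce a mismatch log.
 */
static int
find_matching_mac(const uint8_t cand[][SKETCH_MAC_LEN], size_t n,
		  const uint8_t want[SKETCH_MAC_LEN],
		  unsigned int *mismatches)
{
	int found = -1;
	size_t i;

	*mismatches = 0;
	for (i = 0; i < n; i++) {
		if (memcmp(cand[i], want, SKETCH_MAC_LEN) == 0) {
			found = (int)i;
			break;
		}
		/* No else needed: the branch above always leaves the loop. */
		(*mismatches)++;
	}
	return found;
}
```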

=========================================================================
[PATCH 5/7] net/netvsc: retry when no matching MAC found in net directory
=========================================================================

Error:

  Commit message vs. code disagree on the retry budget.

  Commit message:
      "This uses a separate mac_retry counter (limit 30, ~30 seconds) so
       the main retry loop remains unlimited."

  Code:
      /* Max retries when net/ directory exists but no matching MAC found.
       * On multi-NIC PCI devices, a second VF may register later.
       * 120 retries = ~2 minutes.
       */
      #define NETVSC_MAX_MAC_RETRY 120

  Either the commit message or the macro value needs to be corrected so
  they agree. With NETVSC_HOTADD_RETRY_INTERVAL = 1s, 120 retries is
  ~2 min, not ~30 s.
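
  The arithmetic can be pinned down at compile time so that the comment,
  the macro and the commit message cannot drift apart silently. This is
  a standalone sketch: the SKETCH_* names are stand-ins, the interval is
  expressed in seconds as stated above, and the C11 _Static_assert is a
  suggestion rather than something the patch contains.

```c
/* Values as quoted in the review: 1 s between retries, 120 retries max. */
#define SKETCH_RETRY_INTERVAL_SEC 1u
#define SKETCH_MAX_MAC_RETRY 120u

/* Fails to compile if the documented ~2 minute budget stops matching. */
_Static_assert(SKETCH_MAX_MAC_RETRY * SKETCH_RETRY_INTERVAL_SEC == 120u,
	       "MAC retry budget no longer matches the documented 2 minutes");

/* Worst-case wait before the "no matching MAC" path gives up, in seconds. */
static unsigned int
mac_retry_budget_sec(unsigned int max_retry, unsigned int interval_sec)
{
	return max_retry * interval_sec;
}
```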

Info:

  drivers/net/netvsc/hn_ethdev.c, in the new "no MAC match" block

      if (di) {
              closedir(di);
              di = NULL;

  DPDK style requires explicit pointer comparison: "if (di != NULL)".

=========================================================================
[PATCH 6/7] net/netvsc: forward per-queue stats from VF device
=========================================================================

Warning:

  drivers/net/netvsc/hn_vf.c, hn_vf_stats_get

      if (vf_dev->dev_ops->stats_get)
              ret = vf_dev->dev_ops->stats_get(vf_dev, stats, qstats);
      else
              ret = rte_eth_stats_get(vf_dev->data->port_id, stats);

  Reaching directly into another ethdev's dev_ops->stats_get is not a
  pattern used anywhere else in DPDK (bonding, failsafe, etc. all go
  through rte_eth_stats_get). The shortcut bypasses three things the
  public wrapper does:

    - RTE_ETH_VALID_PORTID_OR_ERR_RET on the VF port_id
    - memset(stats, 0) and stats->rx_nombuf = data->rx_mbuf_alloc_failed
    - eth_err() return-code translation

  The memset is harmless here because the outer ethdev wrapper has
  already zeroed stats/qstats for the netvsc port. The rx_nombuf
  attribution, however, changes: previously stats->rx_nombuf was
  pre-loaded with the VF's data->rx_mbuf_alloc_failed before the VF's
  callback ran; now it's whatever was set for the netvsc port. Worth
  documenting in the commit message if intentional.

  Also, the dev_ops pointer test uses an implicit comparison:

      if (vf_dev->dev_ops->stats_get)

  Per DPDK style this should be "!= NULL".

  A cleaner long-term fix would be to extend the ethdev API so per-queue
  stats can be forwarded through the public path, or wrap this dispatch
  in a small helper with a comment explaining why the public API is
  insufficient.

Info:

  drivers/net/netvsc/hn_vf.c

  Some VF drivers may return -ENOTSUP or partial data when called with a
  non-NULL qstats. The patch unconditionally propagates that return,
  which could turn a previously-successful stats_get into a failure for
  netvsc users on certain VF drivers. A graceful fallback to
  rte_eth_stats_get on -ENOTSUP would preserve backward compatibility.
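
  A rough model of such a helper, combining the explicit NULL test with
  the -ENOTSUP fallback, is sketched below. Everything here is a
  hypothetical stand-in (the sketch_* types, the callbacks, and
  sketch_public_stats_get, which plays the role of rte_eth_stats_get);
  it only demonstrates the dispatch order, not the real ethdev
  structures.

```c
#include <errno.h>
#include <stddef.h>

/* Minimal stand-ins; the real ethdev structures are much larger. */
struct sketch_stats { unsigned long ipackets; };
struct sketch_qstats { unsigned long q_ipackets[4]; };

struct sketch_dev;
typedef int (*sketch_stats_get_t)(struct sketch_dev *dev,
				  struct sketch_stats *stats,
				  struct sketch_qstats *qstats);

struct sketch_dev {
	sketch_stats_get_t stats_get;	/* may be NULL */
	unsigned long rx_count;
};

/* Stand-in for the public stats path (no per-queue data). */
static int
sketch_public_stats_get(struct sketch_dev *dev, struct sketch_stats *stats)
{
	stats->ipackets = dev->rx_count;
	return 0;
}

/* Example callback of a driver that rejects per-queue requests. */
static int
sketch_cb_unsupported(struct sketch_dev *dev, struct sketch_stats *stats,
		      struct sketch_qstats *qstats)
{
	(void)dev;
	(void)stats;
	(void)qstats;
	return -ENOTSUP;
}

/* Example callback of a driver that fills per-queue data. */
static int
sketch_cb_per_queue(struct sketch_dev *dev, struct sketch_stats *stats,
		    struct sketch_qstats *qstats)
{
	stats->ipackets = dev->rx_count;
	if (qstats != NULL)
		qstats->q_ipackets[0] = dev->rx_count;
	return 0;
}

/*
 * Try the per-queue callback first; fall back to the public path when the
 * callback is absent or reports -ENOTSUP, so basic stats keep working.
 */
static int
sketch_vf_stats_get(struct sketch_dev *vf, struct sketch_stats *stats,
		    struct sketch_qstats *qstats)
{
	int ret = -ENOTSUP;

	if (vf->stats_get != NULL)
		ret = vf->stats_get(vf, stats, qstats);
	if (ret == -ENOTSUP)
		ret = sketch_public_stats_get(vf, stats);
	return ret;
}
```

  Other negative return codes are passed through unchanged, so genuine
  failures from the VF driver remain visible to the caller.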

=========================================================================
[PATCH 7/7] net/netvsc: handle VF recovery events for service reset
=========================================================================

Warning:

  drivers/net/netvsc/hn_vf.c, hn_eth_recovery_success_callback

      rte_rwlock_write_lock(&hv->vf_lock);
      if (hv->vf_ctx.vf_attached && !hv->vf_ctx.vf_vsc_switched) {
              ret = hn_nvs_set_datapath(hv, NVS_DATAPATH_VF);
              ...
      }

  hn_vf_add_unlocked guards the equivalent NVS_DATAPATH_VF switch with
  "if (dev->data->dev_started)" precisely to avoid routing traffic to
  the VF before queues are configured. The new callback omits that
  check. If the user calls rte_eth_dev_stop() on the netvsc port during
  the ERR_RECOVERING -> RECOVERY_SUCCESS window, this callback will
  switch the host data path to a VF whose DPDK queues may have been
  torn down. Recommend mirroring the dev_started check.
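
  The intended condition can be written as a single predicate. The
  sketch below is standalone: struct sketch_vf_state is a hypothetical
  flattened view of hv->vf_ctx and dev->data, and locking is not
  modelled.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the state the callback consults. */
struct sketch_vf_state {
	bool vf_attached;
	bool vf_vsc_switched;
	bool dev_started;
};

/*
 * Return true only when the data path may be switched to the VF:
 * the VF is attached, not yet switched, and the netvsc port is started.
 */
static bool
may_switch_to_vf(const struct sketch_vf_state *st)
{
	return st->vf_attached && !st->vf_vsc_switched && st->dev_started;
}
```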

Warning:

  drivers/net/netvsc/hn_vf.c, hn_eth_recovering_callback and
                              hn_eth_recovery_success_callback

  Both new callbacks acquire hv->vf_lock as a writer directly in the
  event-callback context. The existing hn_eth_rmv_event_callback
  intentionally does not take vf_lock — it defers via:

      rte_eal_alarm_set(1, hn_remove_delayed, hv);

  to break a possible lock-order coupling with the calling driver (the
  PMD that fires rte_eth_dev_callback_process may itself be holding an
  internal lock). Taking vf_lock directly inside the recovery callbacks
  introduces a different lock-ordering invariant from the rest of the
  file. Consider either deferring through rte_eal_alarm_set as well, or
  adding a comment explaining why direct acquisition is safe in the
  MANA -> netvsc invocation path.

Info:

  drivers/net/netvsc/hn_vf.c, hn_eth_recovery_failed_callback

      rte_eal_alarm_set(1, hn_remove_delayed, hv);

  The state of hv->vf_ctx.vf_attached is not checked. If RECOVERY_FAILED
  arrives after a concurrent INTR_RMV has already detached the VF, the
  alarm will fire hn_remove_delayed against an already-removed port,
  generating a spurious "Start to remove port 0" log and downstream
  unregister failures. The damage is limited because hn_remove_delayed
  takes the lock and re-checks state, but the noise is avoidable with a
  vf_attached guard before scheduling the alarm.
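
  A minimal model of that guard is shown below. It is standalone and
  hypothetical throughout: struct sketch_hv stands in for the driver
  state, and a counter replaces the rte_eal_alarm_set() call. How the
  read of vf_attached would be synchronised with vf_lock in the real
  callback is an open question that this sketch does not answer.

```c
#include <stdbool.h>

struct sketch_hv {
	bool vf_attached;
	unsigned int alarms_scheduled;	/* counts would-be alarm requests */
};

/* Stand-in for scheduling the delayed removal. */
static void
sketch_schedule_remove(struct sketch_hv *hv)
{
	hv->alarms_scheduled++;
}

/* Recovery-failed handler: schedule removal only while a VF is attached. */
static void
sketch_recovery_failed(struct sketch_hv *hv)
{
	if (!hv->vf_attached)
		return;
	sketch_schedule_remove(hv);
}
```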

=========================================================================
General notes on the series
=========================================================================

Patches 1, 2, 3, and 5 together leave netvsc_hotplug_retry with three
retry paths, two of which are unbounded:

  - opendir failures retry indefinitely (patch 1, capped only by PCI
    device disappearance)
  - SIOCGIFHWADDR failures and -ENODEV from rte_eal_hotplug_add reschedule
    without incrementing any counter (patches 2, 3)
  - "no matching MAC" retries are bounded by mac_retry / 120 (patch 5)

Only patch 5's path has a hard upper bound. The other paths rely on the
PCI device eventually disappearing if the VF never comes up. On a
permanently broken VF that stays present in sysfs but never registers
a netdev, the EAL alarm thread will spin forever. Worth confirming that
this is the intended behaviour, or adding an overall ceiling (e.g., a
much higher cap than 30s but still finite) for the opendir / ioctl /
ENODEV paths.
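
As a rough illustration of such a ceiling, the standalone sketch below
keeps the per-path MAC counter and adds one overall counter shared by
all paths. Only the 120-retry MAC cap comes from the series; the value
600, the enum, the struct and the function name are assumptions made
for this example.

```c
#include <stdbool.h>

#define SKETCH_MAX_MAC_RETRY   120u	/* cap from patch 5 */
#define SKETCH_MAX_TOTAL_RETRY 600u	/* assumed overall ceiling, not from the patch */

enum sketch_retry_reason {
	SKETCH_RETRY_OPENDIR,	/* net/ directory not present yet */
	SKETCH_RETRY_IOCTL,	/* SIOCGIFHWADDR failed */
	SKETCH_RETRY_ENODEV,	/* hot-add returned -ENODEV */
	SKETCH_RETRY_NO_MAC,	/* directory exists, no matching MAC */
};

struct sketch_retry_ctx {
	unsigned int total_retry;
	unsigned int mac_retry;
};

/* Record one failed pass and report whether another retry should be scheduled. */
static bool
sketch_should_retry(struct sketch_retry_ctx *ctx, enum sketch_retry_reason why,
		    bool pci_dev_present)
{
	/* Existing behaviour: stop as soon as the PCI device is gone. */
	if (!pci_dev_present)
		return false;
	/* Overall ceiling shared by every retry path. */
	if (++ctx->total_retry > SKETCH_MAX_TOTAL_RETRY)
		return false;
	/* Tighter cap for the "no matching MAC" path only. */
	if (why == SKETCH_RETRY_NO_MAC &&
	    ++ctx->mac_retry > SKETCH_MAX_MAC_RETRY)
		return false;
	return true;
}
```

With a 1 s interval the assumed ceiling corresponds to roughly ten
minutes, which leaves ample room for the 100+ second MANA probe
described in patch 1 while still ending the loop eventually.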


* RE: [EXTERNAL] Re: [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears
  2026-05-05 22:03 ` [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Stephen Hemminger
@ 2026-05-06  2:06   ` Long Li
  0 siblings, 0 replies; 9+ messages in thread
From: Long Li @ 2026-05-06  2:06 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev@dpdk.org, Wei Hu, stable@dpdk.org

> 
> On Tue,  5 May 2026 13:44:50 -0700
> Long Li <longli@microsoft.com> wrote:
> 
> > After PCI rescan on Azure, the MANA kernel driver can take over 100
> > seconds to probe and create the /sys/bus/pci/devices/<dev>/net directory.
> > The previous fixed retry limit (NETVSC_MAX_HOTADD_RETRY=10, ~12
> > seconds) was insufficient, causing VF re-attach to fail with 'Failed
> > to parse PCI device' on systems with slow MANA driver initialization.
> >
> > Replace the fixed retry limit with an indefinite retry that only gives
> > up when the PCI device itself disappears from sysfs. This is safe because:
> >
> > - The retry uses rte_eal_alarm callbacks which are serialized on the EAL
> >   interrupt thread, preventing races with VF remove or device close paths.
> > - Device close (eth_hn_dev_uninit) explicitly cancels all pending hotplug
> >   alarms via rte_eal_alarm_cancel and frees the context.
> > - If the PCI device is removed while retrying, access() detects the
> >   missing sysfs path and stops immediately.
> >
> > A periodic NOTICE log every 30 retries (~30s) provides visibility into
> > long waits without flooding the log at DEBUG level.
> >
> > Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> > Cc: stable@dpdk.org
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> 
> Since this series touches lots of stuff, decided to do a more thorough than normal
> AI review

I have sent v2, with all the comments addressed.

Thanks,
Long


> 
> Review of patch series: net/netvsc VF hotplug improvements (7 patches)
> 
> Series targets stale interface names, slow MANA driver probe, multi-VF PCI
> devices, and adds service-reset event handling. Reviewed against upstream main
> as of clone date.
> 

Thread overview: 9+ messages
2026-05-05 20:44 [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Long Li
2026-05-05 20:44 ` [PATCH 2/7] net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug Long Li
2026-05-05 20:44 ` [PATCH 3/7] net/netvsc: retry full probe when IB device not ready during hotplug Long Li
2026-05-05 20:44 ` [PATCH 4/7] net/netvsc: add NOTICE-level debug logging for VF hotplug retry Long Li
2026-05-05 20:44 ` [PATCH 5/7] net/netvsc: retry when no matching MAC found in net directory Long Li
2026-05-05 20:44 ` [PATCH 6/7] net/netvsc: forward per-queue stats from VF device Long Li
2026-05-05 20:44 ` [PATCH 7/7] net/netvsc: handle VF recovery events for service reset Long Li
2026-05-05 22:03 ` [PATCH 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears Stephen Hemminger
2026-05-06  2:06   ` [EXTERNAL] " Long Li
