public inbox for netdev@vger.kernel.org
* [PATCH 1/5] idpf: skip getting/setting ring params if vport is NULL during HW reset
@ 2026-01-07  1:04 Li Li
  2026-01-07  1:05 ` [PATCH 2/5] idpf: skip changing MTU " Li Li
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Li Li @ 2026-01-07  1:04 UTC
  To: Tony Nguyen, Przemek Kitszel, David S. Miller, Jakub Kicinski,
	Eric Dumazet, intel-wired-lan
  Cc: netdev, linux-kernel, David Decotigny, Anjali Singhai,
	Sridhar Samudrala, Brian Vazquez, Li Li, emil.s.tantilov

When an idpf HW reset is triggered, it clears the vport but does
not clear the netdev held by vport:

    // In idpf_vport_dealloc() called by idpf_init_hard_reset(),
    // idpf_init_hard_reset() sets IDPF_HR_RESET_IN_PROG, so
    // idpf_decfg_netdev() doesn't get called.
    if (!test_bit(IDPF_HR_RESET_IN_PROG, adapter->flags))
        idpf_decfg_netdev(vport);
    // idpf_decfg_netdev() would clear netdev but it isn't called:
    unregister_netdev(vport->netdev);
    free_netdev(vport->netdev);
    vport->netdev = NULL;
    // Later in idpf_init_hard_reset(), the vport is cleared:
    kfree(adapter->vports);
    adapter->vports = NULL;

During an idpf HW reset, when "ethtool -g/-G" is called on the netdev,
the vport associated with the netdev is NULL, and so a kernel panic
would happen:

[  513.185327] BUG: kernel NULL pointer dereference, address: 0000000000000038
...
[  513.232756] RIP: 0010:idpf_get_ringparam+0x45/0x80

This can be reproduced reliably by injecting a TX timeout to cause
an idpf HW reset, and injecting a virtchnl error to cause the HW
reset to fail and retry, while calling "ethtool -g/-G" on the netdev
at the same time.

With this patch applied, we see the following error but no kernel
panics anymore:

[  476.323630] idpf 0000:05:00.0 eth1: failed to get ring params due to no vport in netdev

Signed-off-by: Li Li <boolli@google.com>
---
 drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
index d5711be0b8e69..6a4b630b786c2 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
@@ -639,6 +638,10 @@ static void idpf_get_ringparam(struct net_device *netdev,
 
 	idpf_vport_ctrl_lock(netdev);
 	vport = idpf_netdev_to_vport(netdev);
+	if (!vport) {
+		netdev_err(netdev, "failed to get ring params due to no vport in netdev\n");
+		goto unlock;
+	}
 
 	ring->rx_max_pending = IDPF_MAX_RXQ_DESC;
 	ring->tx_max_pending = IDPF_MAX_TXQ_DESC;
@@ -647,6 +651,7 @@ static void idpf_get_ringparam(struct net_device *netdev,
 
 	kring->tcp_data_split = idpf_vport_get_hsplit(vport);
 
+unlock:
 	idpf_vport_ctrl_unlock(netdev);
 }
 
@@ -673,6 +674,11 @@ static int idpf_set_ringparam(struct net_device *netdev,
 
 	idpf_vport_ctrl_lock(netdev);
 	vport = idpf_netdev_to_vport(netdev);
+	if (!vport) {
+		netdev_err(netdev, "ring params not changed due to no vport in netdev\n");
+		err = -EFAULT;
+		goto unlock_mutex;
+	}
 
 	idx = vport->idx;
 
-- 
2.52.0.351.gbe84eed79e-goog



* [PATCH 2/5] idpf: skip changing MTU if vport is NULL during HW reset
  2026-01-07  1:04 [PATCH 1/5] idpf: skip getting/setting ring params if vport is NULL during HW reset Li Li
@ 2026-01-07  1:05 ` Li Li
  2026-01-07  1:05 ` [PATCH 3/5] idpf: skip getting RX flow rules " Li Li
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Li Li @ 2026-01-07  1:05 UTC
  To: Tony Nguyen, Przemek Kitszel, David S. Miller, Jakub Kicinski,
	Eric Dumazet, intel-wired-lan
  Cc: netdev, linux-kernel, David Decotigny, Anjali Singhai,
	Sridhar Samudrala, Brian Vazquez, Li Li, emil.s.tantilov

When an idpf HW reset is triggered, it clears the vport but does
not clear the netdev held by vport:

    // In idpf_vport_dealloc() called by idpf_init_hard_reset(),
    // idpf_init_hard_reset() sets IDPF_HR_RESET_IN_PROG, so
    // idpf_decfg_netdev() doesn't get called.
    if (!test_bit(IDPF_HR_RESET_IN_PROG, adapter->flags))
        idpf_decfg_netdev(vport);
    // idpf_decfg_netdev() would clear netdev but it isn't called:
    unregister_netdev(vport->netdev);
    free_netdev(vport->netdev);
    vport->netdev = NULL;
    // Later in idpf_init_hard_reset(), the vport is cleared:
    kfree(adapter->vports);
    adapter->vports = NULL;

During an idpf HW reset, when userspace changes the MTU of the netdev,
the vport associated with the netdev is NULL, and so a kernel panic
would happen:

[ 2081.955742] BUG: kernel NULL pointer dereference, address: 0000000000000068
...
[ 2082.002739] RIP: 0010:idpf_initiate_soft_reset+0x19/0x190

This can be reproduced reliably by injecting a TX timeout to cause
an idpf HW reset, and injecting a virtchnl error to cause the HW
reset to fail and retry, while changing the MTU of the netdev in
userspace.

With this patch applied, we see the following error but no kernel
panics anymore:

[  304.291346] idpf 0000:05:00.0 eth1: mtu not changed due to no vport in netdev
RTNETLINK answers: Bad address

Signed-off-by: Li Li <boolli@google.com>
---
 drivers/net/ethernet/intel/idpf/idpf_lib.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
index 57b8b3fd9124c..53b31989722a7 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
@@ -2328,11 +2327,17 @@ static int idpf_change_mtu(struct net_device *netdev, int new_mtu)
 
 	idpf_vport_ctrl_lock(netdev);
 	vport = idpf_netdev_to_vport(netdev);
+	if (!vport) {
+		netdev_err(netdev, "mtu not changed due to no vport in netdev\n");
+		err = -EFAULT;
+		goto unlock;
+	}
 
 	WRITE_ONCE(netdev->mtu, new_mtu);
 
 	err = idpf_initiate_soft_reset(vport, IDPF_SR_MTU_CHANGE);
 
+unlock:
 	idpf_vport_ctrl_unlock(netdev);
 
 	return err;
-- 
2.52.0.351.gbe84eed79e-goog



* [PATCH 3/5] idpf: skip getting RX flow rules if vport is NULL during HW reset
  2026-01-07  1:04 [PATCH 1/5] idpf: skip getting/setting ring params if vport is NULL during HW reset Li Li
  2026-01-07  1:05 ` [PATCH 2/5] idpf: skip changing MTU " Li Li
@ 2026-01-07  1:05 ` Li Li
  2026-01-12  9:58   ` [Intel-wired-lan] " Loktionov, Aleksandr
  2026-01-07  1:05 ` [PATCH 4/5] idpf: skip setting channels " Li Li
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Li Li @ 2026-01-07  1:05 UTC
  To: Tony Nguyen, Przemek Kitszel, David S. Miller, Jakub Kicinski,
	Eric Dumazet, intel-wired-lan
  Cc: netdev, linux-kernel, David Decotigny, Anjali Singhai,
	Sridhar Samudrala, Brian Vazquez, Li Li, emil.s.tantilov

When an idpf HW reset is triggered, it clears the vport but does
not clear the netdev held by vport:

    // In idpf_vport_dealloc() called by idpf_init_hard_reset(),
    // idpf_init_hard_reset() sets IDPF_HR_RESET_IN_PROG, so
    // idpf_decfg_netdev() doesn't get called.
    if (!test_bit(IDPF_HR_RESET_IN_PROG, adapter->flags))
        idpf_decfg_netdev(vport);
    // idpf_decfg_netdev() would clear netdev but it isn't called:
    unregister_netdev(vport->netdev);
    free_netdev(vport->netdev);
    vport->netdev = NULL;
    // Later in idpf_init_hard_reset(), the vport is cleared:
    kfree(adapter->vports);
    adapter->vports = NULL;

During an idpf HW reset, when userspace gets RX flow classification
rules of the netdev, the vport associated with the netdev is NULL,
and so a kernel panic would happen:

[ 1466.308592] BUG: kernel NULL pointer dereference, address: 0000000000000032
...
[ 1466.356222] RIP: 0010:idpf_get_rxnfc+0x3b/0x70

This can be reproduced reliably by injecting a TX timeout to cause
an idpf HW reset, and injecting a virtchnl error to cause the HW
reset to fail and retry, while running "ethtool -n" in userspace.

With this patch applied, we see the following error but no kernel
panics anymore:

[  312.476576] idpf 0000:05:00.0 eth1: failed to get rules due to no vport in netdev
Cannot get RX rings: Bad address
rxclass: Cannot get RX class rule count: Bad address
RX classification rule retrieval failed

Signed-off-by: Li Li <boolli@google.com>
---
 drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
index 6a4b630b786c2..c71af85408a29 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
@@ -45,6 +44,11 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
 
 	idpf_vport_ctrl_lock(netdev);
 	vport = idpf_netdev_to_vport(netdev);
+	if (!vport) {
+		netdev_err(netdev, "failed to get rules due to no vport in netdev\n");
+		err = -EFAULT;
+		goto unlock;
+	}
 	vport_config = np->adapter->vport_config[np->vport_idx];
 	user_config = &vport_config->user_config;
 
@@ -85,6 +90,7 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
 		break;
 	}
 
+unlock:
 	idpf_vport_ctrl_unlock(netdev);
 
 	return err;
-- 
2.52.0.351.gbe84eed79e-goog



* [PATCH 4/5] idpf: skip setting channels if vport is NULL during HW reset
  2026-01-07  1:04 [PATCH 1/5] idpf: skip getting/setting ring params if vport is NULL during HW reset Li Li
  2026-01-07  1:05 ` [PATCH 2/5] idpf: skip changing MTU " Li Li
  2026-01-07  1:05 ` [PATCH 3/5] idpf: skip getting RX flow rules " Li Li
@ 2026-01-07  1:05 ` Li Li
  2026-01-07  1:05 ` [PATCH 5/5] idpf: skip stopping/opening vport if it " Li Li
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Li Li @ 2026-01-07  1:05 UTC
  To: Tony Nguyen, Przemek Kitszel, David S. Miller, Jakub Kicinski,
	Eric Dumazet, intel-wired-lan
  Cc: netdev, linux-kernel, David Decotigny, Anjali Singhai,
	Sridhar Samudrala, Brian Vazquez, Li Li, emil.s.tantilov

When an idpf HW reset is triggered, it clears the vport but does
not clear the netdev held by vport:

    // In idpf_vport_dealloc() called by idpf_init_hard_reset(),
    // idpf_init_hard_reset() sets IDPF_HR_RESET_IN_PROG, so
    // idpf_decfg_netdev() doesn't get called.
    if (!test_bit(IDPF_HR_RESET_IN_PROG, adapter->flags))
        idpf_decfg_netdev(vport);
    // idpf_decfg_netdev() would clear netdev but it isn't called:
    unregister_netdev(vport->netdev);
    free_netdev(vport->netdev);
    vport->netdev = NULL;
    // Later in idpf_init_hard_reset(), the vport is cleared:
    kfree(adapter->vports);
    adapter->vports = NULL;

During an idpf HW reset, when userspace changes the netdev channels,
the vport associated with the netdev is NULL, and so a kernel panic
would happen:

[ 2245.795117] BUG: kernel NULL pointer dereference, address: 0000000000000088
...
[ 2245.842720] RIP: 0010:idpf_set_channels+0x40/0x120

This can be reproduced reliably by injecting a TX timeout to cause
an idpf HW reset, and injecting a virtchnl error to cause the HW
reset to fail and retry, while running "ethtool -L" in userspace.

With this patch applied, we see the following error but no kernel
panics anymore:

[ 1176.743096] idpf 0000:05:00.0 eth1: channels not changed due to no vport in netdev
netlink error: Bad address

Signed-off-by: Li Li <boolli@google.com>
---
 drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
index c71af85408a29..1b03528041af4 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
@@ -580,6 +579,11 @@ static int idpf_set_channels(struct net_device *netdev,
 
 	idpf_vport_ctrl_lock(netdev);
 	vport = idpf_netdev_to_vport(netdev);
+	if (!vport) {
+		netdev_err(netdev, "channels not changed due to no vport in netdev\n");
+		err = -EFAULT;
+		goto unlock_mutex;
+	}
 
 	idx = vport->idx;
 	vport_config = vport->adapter->vport_config[idx];
-- 
2.52.0.351.gbe84eed79e-goog



* [PATCH 5/5] idpf: skip stopping/opening vport if it is NULL during HW reset
  2026-01-07  1:04 [PATCH 1/5] idpf: skip getting/setting ring params if vport is NULL during HW reset Li Li
                   ` (2 preceding siblings ...)
  2026-01-07  1:05 ` [PATCH 4/5] idpf: skip setting channels " Li Li
@ 2026-01-07  1:05 ` Li Li
  2026-01-09  6:06   ` [Intel-wired-lan] " Loktionov, Aleksandr
  2026-01-07  5:30 ` [Intel-wired-lan] [PATCH 1/5] idpf: skip getting/setting ring params if vport " Paul Menzel
  2026-01-07 17:41 ` Tantilov, Emil S
  5 siblings, 1 reply; 12+ messages in thread
From: Li Li @ 2026-01-07  1:05 UTC
  To: Tony Nguyen, Przemek Kitszel, David S. Miller, Jakub Kicinski,
	Eric Dumazet, intel-wired-lan
  Cc: netdev, linux-kernel, David Decotigny, Anjali Singhai,
	Sridhar Samudrala, Brian Vazquez, Li Li, emil.s.tantilov

When an idpf HW reset is triggered, it clears the vport but does
not clear the netdev held by vport:

    // In idpf_vport_dealloc() called by idpf_init_hard_reset(),
    // idpf_init_hard_reset() sets IDPF_HR_RESET_IN_PROG, so
    // idpf_decfg_netdev() doesn't get called.
    if (!test_bit(IDPF_HR_RESET_IN_PROG, adapter->flags))
        idpf_decfg_netdev(vport);
    // idpf_decfg_netdev() would clear netdev but it isn't called:
    unregister_netdev(vport->netdev);
    free_netdev(vport->netdev);
    vport->netdev = NULL;
    // Later in idpf_init_hard_reset(), the vport is cleared:
    kfree(adapter->vports);
    adapter->vports = NULL;

During an idpf HW reset, when userspace restarts the network service,
the vport associated with the netdev is NULL, and so a kernel panic would
happen:

[ 1791.669339] BUG: kernel NULL pointer dereference, address: 0000000000000070
...
[ 1791.717130] RIP: 0010:idpf_vport_stop+0x16/0x1c0

This can be reproduced reliably by injecting a TX timeout to cause
an idpf HW reset, and injecting a virtchnl error to cause the HW
reset to fail and retry, while running "service network restart" in
userspace.

With this patch applied, we see the following error but no kernel
panics anymore:

[  181.409483] idpf 0000:05:00.0 eth1: mtu not changed due to no vport in netdev
RTNETLINK answers: Bad address
...
[  181.913644] idpf 0000:05:00.0 eth1: not stopping vport because it is NULL
[  181.938675] idpf 0000:05:00.0 eth1: mtu not changed due to no vport in netdev
...
[  242.849499] idpf 0000:05:00.0 eth1: not opening vport because it is NULL
...
[  304.289364] idpf 0000:05:00.0 eth0: not opening vport because it is NULL

Signed-off-by: Li Li <boolli@google.com>
---
 drivers/net/ethernet/intel/idpf/idpf_lib.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
index 53b31989722a7..a9a556499262b 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
@@ -1021,6 +1021,8 @@ static void idpf_vport_stop(struct idpf_vport *vport, bool rtnl)
  */
 static int idpf_stop(struct net_device *netdev)
 {
+	if (!netdev)
+		return 0;
 	struct idpf_netdev_priv *np = netdev_priv(netdev);
 	struct idpf_vport *vport;
 
@@ -1029,9 +1031,14 @@ static int idpf_stop(struct net_device *netdev)
 
 	idpf_vport_ctrl_lock(netdev);
 	vport = idpf_netdev_to_vport(netdev);
+	if (!vport) {
+		netdev_err(netdev, "not stopping vport because it is NULL\n");
+		goto unlock;
+	}
 
 	idpf_vport_stop(vport, false);
 
+unlock:
 	idpf_vport_ctrl_unlock(netdev);
 
 	return 0;
@@ -2301,6 +2308,11 @@ static int idpf_open(struct net_device *netdev)
 
 	idpf_vport_ctrl_lock(netdev);
 	vport = idpf_netdev_to_vport(netdev);
+	if (!vport) {
+		netdev_err(netdev, "not opening vport because it is NULL\n");
+		err = -EFAULT;
+		goto unlock;
+	}
 
 	err = idpf_set_real_num_queues(vport);
 	if (err)
-- 
2.52.0.351.gbe84eed79e-goog



* Re: [Intel-wired-lan] [PATCH 1/5] idpf: skip getting/setting ring params if vport is NULL during HW reset
  2026-01-07  1:04 [PATCH 1/5] idpf: skip getting/setting ring params if vport is NULL during HW reset Li Li
                   ` (3 preceding siblings ...)
  2026-01-07  1:05 ` [PATCH 5/5] idpf: skip stopping/opening vport if it " Li Li
@ 2026-01-07  5:30 ` Paul Menzel
  2026-01-07 18:39   ` Li Li
  2026-01-07 17:41 ` Tantilov, Emil S
  5 siblings, 1 reply; 12+ messages in thread
From: Paul Menzel @ 2026-01-07  5:30 UTC
  To: Li Li
  Cc: Tony Nguyen, Przemek Kitszel, David S. Miller, Jakub Kicinski,
	Eric Dumazet, intel-wired-lan, netdev, linux-kernel,
	David Decotigny, Anjali Singhai, Sridhar Samudrala, Brian Vazquez,
	emil.s.tantilov

Dear Li,


Thank you for your patch.

On 07.01.26 at 02:04, Li Li via Intel-wired-lan wrote:
> When an idpf HW reset is triggered, it clears the vport but does
> not clear the netdev held by vport:
> 
>      // In idpf_vport_dealloc() called by idpf_init_hard_reset(),
>      // idpf_init_hard_reset() sets IDPF_HR_RESET_IN_PROG, so
>      // idpf_decfg_netdev() doesn't get called.

No need to format this as code comments. At least it confused me a little.

>      if (!test_bit(IDPF_HR_RESET_IN_PROG, adapter->flags))
>          idpf_decfg_netdev(vport);
>      // idpf_decfg_netdev() would clear netdev but it isn't called:
>      unregister_netdev(vport->netdev);
>      free_netdev(vport->netdev);
>      vport->netdev = NULL;
>      // Later in idpf_init_hard_reset(), the vport is cleared:
>      kfree(adapter->vports);
>      adapter->vports = NULL;
> 
> During an idpf HW reset, when "ethtool -g/-G" is called on the netdev,
> the vport associated with the netdev is NULL, and so a kernel panic
> would happen:
> 
> [  513.185327] BUG: kernel NULL pointer dereference, address: 0000000000000038
> ...
> [  513.232756] RIP: 0010:idpf_get_ringparam+0x45/0x80
> 
> This can be reproduced reliably by injecting a TX timeout to cause
> an idpf HW reset, and injecting a virtchnl error to cause the HW
> reset to fail and retry, while calling "ethtool -g/-G" on the netdev
> at the same time.

If you shared the commands used to do that, it would make reproducing
the issue easier.

> With this patch applied, we see the following error but no kernel
> panics anymore:
> 
> [  476.323630] idpf 0000:05:00.0 eth1: failed to get ring params due to no vport in netdev
> 
> Signed-off-by: Li Li <boolli@google.com>
> ---
>   drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> index d5711be0b8e69..6a4b630b786c2 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> @@ -639,6 +638,10 @@ static void idpf_get_ringparam(struct net_device *netdev,
>   
>   	idpf_vport_ctrl_lock(netdev);
>   	vport = idpf_netdev_to_vport(netdev);
> +	if (!vport) {
> +		netdev_err(netdev, "failed to get ring params due to no vport in netdev\n");

If vport == NULL is expected, why log it as an error? What should the
user do? Wait until the reset is done?

> +		goto unlock;
> +	}
>   
>   	ring->rx_max_pending = IDPF_MAX_RXQ_DESC;
>   	ring->tx_max_pending = IDPF_MAX_TXQ_DESC;
> @@ -647,6 +651,7 @@ static void idpf_get_ringparam(struct net_device *netdev,
>   
>   	kring->tcp_data_split = idpf_vport_get_hsplit(vport);
>   
> +unlock:
>   	idpf_vport_ctrl_unlock(netdev);
>   }
>   
> @@ -673,6 +674,11 @@ static int idpf_set_ringparam(struct net_device *netdev,
>   
>   	idpf_vport_ctrl_lock(netdev);
>   	vport = idpf_netdev_to_vport(netdev);
> +	if (!vport) {
> +		netdev_err(netdev, "ring params not changed due to no vport in netdev\n");
> +		err = -EFAULT;
> +		goto unlock_mutex;
> +	}
>   
>   	idx = vport->idx;
>   

Is there another, possibly more involved, solution that waits until
the hardware reset has finished?


Kind regards,

Paul


* Re: [Intel-wired-lan] [PATCH 1/5] idpf: skip getting/setting ring params if vport is NULL during HW reset
  2026-01-07  1:04 [PATCH 1/5] idpf: skip getting/setting ring params if vport is NULL during HW reset Li Li
                   ` (4 preceding siblings ...)
  2026-01-07  5:30 ` [Intel-wired-lan] [PATCH 1/5] idpf: skip getting/setting ring params if vport " Paul Menzel
@ 2026-01-07 17:41 ` Tantilov, Emil S
  2026-01-07 18:40   ` Li Li
  5 siblings, 1 reply; 12+ messages in thread
From: Tantilov, Emil S @ 2026-01-07 17:41 UTC
  To: Li Li, Tony Nguyen, Przemek Kitszel, David S. Miller,
	Jakub Kicinski, Eric Dumazet, intel-wired-lan
  Cc: netdev, linux-kernel, David Decotigny, Anjali Singhai,
	Sridhar Samudrala, Brian Vazquez



On 1/6/2026 5:04 PM, Li Li via Intel-wired-lan wrote:
> When an idpf HW reset is triggered, it clears the vport but does
> not clear the netdev held by vport:
> 
>      // In idpf_vport_dealloc() called by idpf_init_hard_reset(),
>      // idpf_init_hard_reset() sets IDPF_HR_RESET_IN_PROG, so
>      // idpf_decfg_netdev() doesn't get called.
>      if (!test_bit(IDPF_HR_RESET_IN_PROG, adapter->flags))
>          idpf_decfg_netdev(vport);
>      // idpf_decfg_netdev() would clear netdev but it isn't called:
>      unregister_netdev(vport->netdev);
>      free_netdev(vport->netdev);
>      vport->netdev = NULL;
>      // Later in idpf_init_hard_reset(), the vport is cleared:
>      kfree(adapter->vports);
>      adapter->vports = NULL;
> 
> During an idpf HW reset, when "ethtool -g/-G" is called on the netdev,
> the vport associated with the netdev is NULL, and so a kernel panic
> would happen:
> 
> [  513.185327] BUG: kernel NULL pointer dereference, address: 0000000000000038
> ...
> [  513.232756] RIP: 0010:idpf_get_ringparam+0x45/0x80
> 
> This can be reproduced reliably by injecting a TX timeout to cause
> an idpf HW reset, and injecting a virtchnl error to cause the HW
> reset to fail and retry, while calling "ethtool -g/-G" on the netdev
> at the same time.

I have posted a series that resolves these issues in the reset path by
reshuffling the flow a bit and adding netif_device_detach/attach to
make sure the netdevs are better protected in the middle of a reset:
https://lore.kernel.org/intel-wired-lan/20251121001218.4565-1-emil.s.tantilov@intel.com/

If you are still seeing issues with the above applied, let me know and I
can take a look.

> 
> With this patch applied, we see the following error but no kernel
> panics anymore:
> 
> [  476.323630] idpf 0000:05:00.0 eth1: failed to get ring params due to no vport in netdev
> 
> Signed-off-by: Li Li <boolli@google.com>
> ---
>   drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> index d5711be0b8e69..6a4b630b786c2 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> @@ -639,6 +638,10 @@ static void idpf_get_ringparam(struct net_device *netdev,
>   
>   	idpf_vport_ctrl_lock(netdev);
>   	vport = idpf_netdev_to_vport(netdev);
> +	if (!vport) {

We used to have these all over the place, but the code was changed to
rely on idpf_vport_ctrl_lock() for the protection of the vport state.
Still some issues remain with the error paths (hence the series above),
but in general we don't want to resort to vport NULL checks and rather
fix the reset flows to rely on cleaner logic and locks.
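
For illustration only, here is a minimal userspace sketch of the check-under-lock
shape that the proposed patches add. It is not driver code: model_netdev,
model_vport, model_ctrl_lock() and model_get_ring_size() are hypothetical
stand-ins, and the "lock" is just a depth counter so that the balance of
lock and unlock on both paths is easy to observe.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the idpf driver's real types. */
struct model_vport {
	int txq_desc_count;
};

struct model_netdev {
	int lock_depth;             /* stand-in for the vport control lock */
	struct model_vport *vport;  /* NULL while a reset has torn the vport down */
};

static void model_ctrl_lock(struct model_netdev *nd)
{
	nd->lock_depth++;
}

static void model_ctrl_unlock(struct model_netdev *nd)
{
	nd->lock_depth--;
}

/* Look the vport up under the lock and bail out through a single unlock label. */
static int model_get_ring_size(struct model_netdev *nd, int *out)
{
	int err = 0;

	model_ctrl_lock(nd);
	if (!nd->vport) {
		err = -EFAULT;      /* the errno the proposed patches return */
		goto unlock;
	}

	*out = nd->vport->txq_desc_count;

unlock:
	model_ctrl_unlock(nd);
	return err;
}
```

In this model every caller that can race with a reset needs its own copy of the
check, which is the duplication the paragraph above argues against.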

Thanks,
Emil

> +		netdev_err(netdev, "failed to get ring params due to no vport in netdev\n");
> +		goto unlock;
> +	}
>   
>   	ring->rx_max_pending = IDPF_MAX_RXQ_DESC;
>   	ring->tx_max_pending = IDPF_MAX_TXQ_DESC;
> @@ -647,6 +651,7 @@ static void idpf_get_ringparam(struct net_device *netdev,
>   
>   	kring->tcp_data_split = idpf_vport_get_hsplit(vport);
>   
> +unlock:
>   	idpf_vport_ctrl_unlock(netdev);
>   }
>   
> @@ -673,6 +674,11 @@ static int idpf_set_ringparam(struct net_device *netdev,
>   
>   	idpf_vport_ctrl_lock(netdev);
>   	vport = idpf_netdev_to_vport(netdev);
> +	if (!vport) {
> +		netdev_err(netdev, "ring params not changed due to no vport in netdev\n");
> +		err = -EFAULT;
> +		goto unlock_mutex;
> +	}
>   
>   	idx = vport->idx;
>   



* Re: [Intel-wired-lan] [PATCH 1/5] idpf: skip getting/setting ring params if vport is NULL during HW reset
  2026-01-07  5:30 ` [Intel-wired-lan] [PATCH 1/5] idpf: skip getting/setting ring params if vport " Paul Menzel
@ 2026-01-07 18:39   ` Li Li
  0 siblings, 0 replies; 12+ messages in thread
From: Li Li @ 2026-01-07 18:39 UTC
  To: Paul Menzel
  Cc: Tony Nguyen, Przemek Kitszel, David S. Miller, Jakub Kicinski,
	Eric Dumazet, intel-wired-lan, netdev, linux-kernel,
	David Decotigny, Anjali Singhai, Sridhar Samudrala, Brian Vazquez,
	emil.s.tantilov

On Tue, Jan 6, 2026 at 9:30 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear Li,
>
>
> Thank you for your patch.
>
> On 07.01.26 at 02:04, Li Li via Intel-wired-lan wrote:
> > When an idpf HW reset is triggered, it clears the vport but does
> > not clear the netdev held by vport:
> >
> >      // In idpf_vport_dealloc() called by idpf_init_hard_reset(),
> >      // idpf_init_hard_reset() sets IDPF_HR_RESET_IN_PROG, so
> >      // idpf_decfg_netdev() doesn't get called.
>
> No need to format this as code comments. At least it confused me a little.

Thanks for the pointer. Will drop the comment format in the future.

>
> >      if (!test_bit(IDPF_HR_RESET_IN_PROG, adapter->flags))
> >          idpf_decfg_netdev(vport);
> >      // idpf_decfg_netdev() would clear netdev but it isn't called:
> >      unregister_netdev(vport->netdev);
> >      free_netdev(vport->netdev);
> >      vport->netdev = NULL;
> >      // Later in idpf_init_hard_reset(), the vport is cleared:
> >      kfree(adapter->vports);
> >      adapter->vports = NULL;
> >
> > During an idpf HW reset, when "ethtool -g/-G" is called on the netdev,
> > the vport associated with the netdev is NULL, and so a kernel panic
> > would happen:
> >
> > [  513.185327] BUG: kernel NULL pointer dereference, address: 0000000000000038
> > ...
> > [  513.232756] RIP: 0010:idpf_get_ringparam+0x45/0x80
> >
> > This can be reproduced reliably by injecting a TX timeout to cause
> > an idpf HW reset, and injecting a virtchnl error to cause the HW
> > reset to fail and retry, while calling "ethtool -g/-G" on the netdev
> > at the same time.
>
> If you shared the commands used to do that, it would make reproducing
> the issue easier.

Here's what I did to introduce TX timeouts and virtchnl timeouts at run time:

--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -15,6 +15,9 @@ struct idpf_tx_stash {
 #define idpf_tx_buf_compl_tag(buf)     (*(u32 *)&(buf)->priv)
 LIBETH_SQE_CHECK_PRIV(u32);

+static bool SIMULATE_TX_TIMEOUT;
+module_param(SIMULATE_TX_TIMEOUT, bool, 0644);
+
 /**
  * idpf_chk_linearize - Check if skb exceeds max descriptors per packet
  * @skb: send buffer
@@ -79,6 +82,8 @@ void idpf_tx_timeout(struct net_device *netdev, unsigned int txqueue)

        adapter->tx_timeout_count++;

+       SIMULATE_TX_TIMEOUT = false;
+
        netdev_err(netdev, "Detected Tx timeout: Count %d, Queue %d\n",
                   adapter->tx_timeout_count, txqueue);
        if (!idpf_is_reset_in_prog(adapter)) {
@@ -2028,6 +2033,12 @@ static bool idpf_tx_clean_complq(struct idpf_compl_queue *complq, int budget,
                }
                tx_q = complq->txq_grp->txqs[rel_tx_qid];

+               if (unlikely(SIMULATE_TX_TIMEOUT && (tx_q->idx == 1))) {
+                       netdev_err(tx_q->netdev, "boolli test: triggering TX timeout for TX queue id %d\n", tx_q->idx);
+                       goto fetch_next_desc;
+               }
+
+
                /* Determine completion type */
                ctype = le16_get_bits(tx_desc->qid_comptype_gen,
                                      IDPF_TXD_COMPLQ_COMPL_TYPE_M);

--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -6,6 +6,9 @@
 #include "idpf.h"
 #include "idpf_virtchnl.h"

+static bool SIMULATE_VC_TIMEOUT;
+module_param(SIMULATE_VC_TIMEOUT, bool, 0644);
+
 #define IDPF_VC_XN_MIN_TIMEOUT_MSEC    2000
 #define IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC        (60 * 1000)
 #define IDPF_VC_XN_IDX_M               GENMASK(7, 0)
@@ -800,6 +803,10 @@ static int idpf_send_ver_msg(struct idpf_adapter *adapter)
        xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;

        reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
+       if (SIMULATE_VC_TIMEOUT) {
+               dev_err(&adapter->pdev->dev, "boolli test: simulating VC timeout by returning -ETIME in idpf_send_ver_msg");
+               reply_sz = -ETIME;
+       }
        if (reply_sz < 0)
                return reply_sz;
        if (reply_sz < sizeof(vvi))

Then after the kernel is booted, we can introduce the TX timeout that
lasts forever by doing the following:

echo 1 | tee /sys/module/idpf/parameters/SIMULATE_TX_TIMEOUT && echo 1 | tee /sys/module/idpf/parameters/SIMULATE_VC_TIMEOUT

All my experiments in this patch series were performed after the
kernel was put in such a state.

>
> > With this patch applied, we see the following error but no kernel
> > panics anymore:
> >
> > [  476.323630] idpf 0000:05:00.0 eth1: failed to get ring params due to no vport in netdev
> >
> > Signed-off-by: Li Li <boolli@google.com>
> > ---
> >   drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 12 ++++++++++++
> >   1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> > index d5711be0b8e69..6a4b630b786c2 100644
> > --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> > +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> > @@ -639,6 +638,10 @@ static void idpf_get_ringparam(struct net_device *netdev,
> >
> >       idpf_vport_ctrl_lock(netdev);
> >       vport = idpf_netdev_to_vport(netdev);
> > +     if (!vport) {
> > +             netdev_err(netdev, "failed to get ring params due to no vport in netdev\n");
>
> If vport == NULL is expected, why log it as an error? What should the
> user do? Wait until the reset is done?
>
> > +             goto unlock;
> > +     }
> >
> >       ring->rx_max_pending = IDPF_MAX_RXQ_DESC;
> >       ring->tx_max_pending = IDPF_MAX_TXQ_DESC;
> > @@ -647,6 +651,7 @@ static void idpf_get_ringparam(struct net_device *netdev,
> >
> >       kring->tcp_data_split = idpf_vport_get_hsplit(vport);
> >
> > +unlock:
> >       idpf_vport_ctrl_unlock(netdev);
> >   }
> >
> > @@ -673,6 +674,11 @@ static int idpf_set_ringparam(struct net_device *netdev,
> >
> >       idpf_vport_ctrl_lock(netdev);
> >       vport = idpf_netdev_to_vport(netdev);
> > +     if (!vport) {
> > +             netdev_err(netdev, "ring params not changed due to no vport in netdev\n");
> > +             err = -EFAULT;
> > +             goto unlock_mutex;
> > +     }
> >
> >       idx = vport->idx;
> >
>
> Is there another, possibly more involved, solution that waits until
> the hardware reset has finished?

Please see Emil's patch series at
https://lore.kernel.org/intel-wired-lan/20251121001218.4565-1-emil.s.tantilov@intel.com/,
in which https://lore.kernel.org/intel-wired-lan/20251121001218.4565-3-emil.s.tantilov@intel.com/
detaches the netdev at the start of a HW reset, which I also think is
a more elegant solution than mine.

I'm going to drop this patch series in favor of Emil's solution above.

>
>
> Kind regards,
>
> Paul


* Re: [Intel-wired-lan] [PATCH 1/5] idpf: skip getting/setting ring params if vport is NULL during HW reset
  2026-01-07 17:41 ` Tantilov, Emil S
@ 2026-01-07 18:40   ` Li Li
  0 siblings, 0 replies; 12+ messages in thread
From: Li Li @ 2026-01-07 18:40 UTC (permalink / raw)
  To: Tantilov, Emil S
  Cc: Tony Nguyen, Przemek Kitszel, David S. Miller, Jakub Kicinski,
	Eric Dumazet, intel-wired-lan, netdev, linux-kernel,
	David Decotigny, Anjali Singhai, Sridhar Samudrala, Brian Vazquez

Please reject this patch series, given that the underlying issue is
already fixed in an earlier patch series. Thanks.

On Wed, Jan 7, 2026 at 9:41 AM Tantilov, Emil S
<emil.s.tantilov@intel.com> wrote:
>
>
>
> On 1/6/2026 5:04 PM, Li Li via Intel-wired-lan wrote:
> > When an idpf HW reset is triggered, it clears the vport but does
> > not clear the netdev held by vport:
> >
> >      // In idpf_vport_dealloc() called by idpf_init_hard_reset(),
> >      // idpf_init_hard_reset() sets IDPF_HR_RESET_IN_PROG, so
> >      // idpf_decfg_netdev() doesn't get called.
> >      if (!test_bit(IDPF_HR_RESET_IN_PROG, adapter->flags))
> >          idpf_decfg_netdev(vport);
> >      // idpf_decfg_netdev() would clear netdev but it isn't called:
> >      unregister_netdev(vport->netdev);
> >      free_netdev(vport->netdev);
> >      vport->netdev = NULL;
> >      // Later in idpf_init_hard_reset(), the vport is cleared:
> >      kfree(adapter->vports);
> >      adapter->vports = NULL;
> >
> > During an idpf HW reset, when "ethtool -g/-G" is called on the netdev,
> > the vport associated with the netdev is NULL, and so a kernel panic
> > would happen:
> >
> > [  513.185327] BUG: kernel NULL pointer dereference, address: 0000000000000038
> > ...
> > [  513.232756] RIP: 0010:idpf_get_ringparam+0x45/0x80
> >
> > This can be reproduced reliably by injecting a TX timeout to cause
> > an idpf HW reset, and injecting a virtchnl error to cause the HW
> > reset to fail and retry, while calling "ethtool -g/-G" on the netdev
> > at the same time.
>
> I have posted series that resolves these issues in the reset path by
> reshuffling the flow a bit and adding netif_device_detach/attach to
> make sure the netdevs are better protected in the middle of a reset:
> https://lore.kernel.org/intel-wired-lan/20251121001218.4565-1-emil.s.tantilov@intel.com/
>
> If you are still seeing issues with the above applied, let me know and I
> can take a look.

Thanks Emil! Yes, I ran the experiment at a commit past your patch
series above, and the kernel panic no longer appears. Running ethtool
commands during HW resets now results in "netlink error: No such
device", which is expected because the netdev is detached at the start
of the HW reset.
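
A minimal userspace sketch of why detaching avoids the NULL dereference,
assuming the ethtool core checks device presence before calling into the
driver and returns -ENODEV ("No such device") when the netdev is detached.
All model_* names are hypothetical stand-ins, not idpf or kernel code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (illustration only). */
struct model_vport {
	int idx;
};

struct model_netdev {
	bool present;              /* models the "device present" link state */
	struct model_vport *vport; /* NULL while a reset has freed the vport */
};

/* Models netif_device_detach(): mark the device as not present. */
static void model_detach(struct model_netdev *dev)
{
	dev->present = false;
}

/* Models netif_device_attach(): mark the device as present again. */
static void model_attach(struct model_netdev *dev)
{
	dev->present = true;
}

/* Reset start: detach first, then tear down the vport. */
static void model_reset_begin(struct model_netdev *dev)
{
	model_detach(dev);
	dev->vport = NULL;
}

/* Reset end: restore the vport, then attach again. */
static void model_reset_end(struct model_netdev *dev, struct model_vport *vport)
{
	dev->vport = vport;
	model_attach(dev);
}

/*
 * Models an ethtool request: the presence check runs before the driver
 * callback, so the callback never sees a NULL vport during a reset.
 */
static int model_ethtool_op(struct model_netdev *dev, int *idx)
{
	if (!dev->present)
		return -ENODEV;

	*idx = dev->vport->idx; /* safe: vport is valid while present */
	return 0;
}
```

With this ordering the driver callbacks need no per-callback vport NULL
checks, because requests are rejected before they reach the driver.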

Please reject this patch series, thanks!

>
> >
> > With this patch applied, we see the following error but no kernel
> > panics anymore:
> >
> > [  476.323630] idpf 0000:05:00.0 eth1: failed to get ring params due to no vport in netdev
> >
> > Signed-off-by: Li Li <boolli@google.com>
> > ---
> >   drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 12 ++++++++++++
> >   1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> > index d5711be0b8e69..6a4b630b786c2 100644
> > --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> > +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> > @@ -639,6 +638,10 @@ static void idpf_get_ringparam(struct net_device *netdev,
> >
> >       idpf_vport_ctrl_lock(netdev);
> >       vport = idpf_netdev_to_vport(netdev);
> > +     if (!vport) {
>
> We used to have these all over the place, but the code was changed to
> rely on idpf_vport_ctrl_lock() for the protection of the vport state.
> Still some issues remain with the error paths (hence the series above),
> but in general we don't want to resort to vport NULL checks and rather
> fix the reset flows to rely on cleaner logic and locks.
>
> Thanks,
> Emil
>
> > +             netdev_err(netdev, "failed to get ring params due to no vport in netdev\n");
> > +             goto unlock;
> > +     }
> >
> >       ring->rx_max_pending = IDPF_MAX_RXQ_DESC;
> >       ring->tx_max_pending = IDPF_MAX_TXQ_DESC;
> > @@ -647,6 +651,7 @@ static void idpf_get_ringparam(struct net_device *netdev,
> >
> >       kring->tcp_data_split = idpf_vport_get_hsplit(vport);
> >
> > +unlock:
> >       idpf_vport_ctrl_unlock(netdev);
> >   }
> >
> > @@ -673,6 +674,11 @@ static int idpf_set_ringparam(struct net_device *netdev,
> >
> >       idpf_vport_ctrl_lock(netdev);
> >       vport = idpf_netdev_to_vport(netdev);
> > +     if (!vport) {
> > +             netdev_err(netdev, "ring params not changed due to no vport in netdev\n");
> > +             err = -EFAULT;
> > +             goto unlock_mutex;
> > +     }
> >
> >       idx = vport->idx;
> >
>


* RE: [Intel-wired-lan] [PATCH 5/5] idpf: skip stopping/opening vport if it is NULL during HW reset
  2026-01-07  1:05 ` [PATCH 5/5] idpf: skip stopping/opening vport if it " Li Li
@ 2026-01-09  6:06   ` Loktionov, Aleksandr
  2026-01-09  6:10     ` Loktionov, Aleksandr
  0 siblings, 1 reply; 12+ messages in thread
From: Loktionov, Aleksandr @ 2026-01-09  6:06 UTC (permalink / raw)
  To: Li Li, Nguyen, Anthony L, Kitszel, Przemyslaw, David S. Miller,
	Jakub Kicinski, Eric Dumazet, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	David Decotigny, Singhai, Anjali, Samudrala, Sridhar,
	Brian Vazquez, Tantilov, Emil S



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Li Li via Intel-wired-lan
> Sent: Wednesday, January 7, 2026 2:05 AM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>; David S. Miller
> <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; Eric
> Dumazet <edumazet@google.com>; intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; David
> Decotigny <decot@google.com>; Singhai, Anjali
> <anjali.singhai@intel.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; Brian Vazquez <brianvv@google.com>;
> Li Li <boolli@google.com>; Tantilov, Emil S
> <emil.s.tantilov@intel.com>
> Subject: [Intel-wired-lan] [PATCH 5/5] idpf: skip stopping/opening
> vport if it is NULL during HW reset
> 
> When an idpf HW reset is triggered, it clears the vport but does not
> clear the netdev held by vport:
> 
>     // In idpf_vport_dealloc() called by idpf_init_hard_reset(),
>     // idpf_init_hard_reset() sets IDPF_HR_RESET_IN_PROG, so
>     // idpf_decfg_netdev() doesn't get called.
>     if (!test_bit(IDPF_HR_RESET_IN_PROG, adapter->flags))
>         idpf_decfg_netdev(vport);
>     // idpf_decfg_netdev() would clear netdev but it isn't called:
>     unregister_netdev(vport->netdev);
>     free_netdev(vport->netdev);
>     vport->netdev = NULL;
>     // Later in idpf_init_hard_reset(), the vport is cleared:
>     kfree(adapter->vports);
>     adapter->vports = NULL;
> 
> During an idpf HW reset, when userspace restarts the network
> service, the vport associated with the netdev is NULL, and so a
> kernel panic would happen:
> 
> [ 1791.669339] BUG: kernel NULL pointer dereference, address: 0000000000000070
> ...
> [ 1791.717130] RIP: 0010:idpf_vport_stop+0x16/0x1c0
> 
> This can be reproduced reliably by injecting a TX timeout to cause
> an idpf HW reset, and injecting a virtchnl error to cause the HW
> reset to fail and retry, while running "service network restart" in
> userspace.
> 
> With this patch applied, we see the following error but no kernel
> panics anymore:
> 
> [  181.409483] idpf 0000:05:00.0 eth1: mtu not changed due to no vport innetdev
> RTNETLINK answers: Bad address
> ...
> [  181.913644] idpf 0000:05:00.0 eth1: not stopping vport because it is NULL
> [  181.938675] idpf 0000:05:00.0 eth1: mtu not changed due to no vport in netdev
> ...
> [  242.849499] idpf 0000:05:00.0 eth1: not opening vport because it is NULL
> ...
> [  304.289364] idpf 0000:05:00.0 eth0: not opening vport because it is NULL
> 
> Signed-off-by: Li Li <boolli@google.com>
> ---
>  drivers/net/ethernet/intel/idpf/idpf_lib.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
> index 53b31989722a7..a9a556499262b 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
> @@ -1021,6 +1021,8 @@ static void idpf_vport_stop(struct idpf_vport *vport, bool rtnl)
>   */
>  static int idpf_stop(struct net_device *netdev)
>  {
> +	if (!netdev)
> +		return 0;
>  	struct idpf_netdev_priv *np = netdev_priv(netdev);
>  	struct idpf_vport *vport;
> 
> @@ -1029,9 +1031,14 @@ static int idpf_stop(struct net_device *netdev)
> 
>  	idpf_vport_ctrl_lock(netdev);
>  	vport = idpf_netdev_to_vport(netdev);
> +	if (!vport) {
> +		netdev_err(netdev, "not stopping vport because it is NULL");
Please don't forget to add trailing '\n'.

> +		goto unlock;
> +	}
> 
>  	idpf_vport_stop(vport, false);
> 
> +unlock:
>  	idpf_vport_ctrl_unlock(netdev);
> 
>  	return 0;
> @@ -2301,6 +2308,11 @@ static int idpf_open(struct net_device *netdev)
> 
>  	idpf_vport_ctrl_lock(netdev);
>  	vport = idpf_netdev_to_vport(netdev);
> +	if (!vport) {
> +		netdev_err(netdev, "not opening vport because it is NULL");
Please don't forget to add trailing '\n', here too.

> +		err = -EFAULT;
> +		goto unlock;
> +	}
> 
>  	err = idpf_set_real_num_queues(vport);
>  	if (err)
> --
> 2.52.0.351.gbe84eed79e-goog

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>



* RE: [Intel-wired-lan] [PATCH 5/5] idpf: skip stopping/opening vport if it is NULL during HW reset
  2026-01-09  6:06   ` [Intel-wired-lan] " Loktionov, Aleksandr
@ 2026-01-09  6:10     ` Loktionov, Aleksandr
  0 siblings, 0 replies; 12+ messages in thread
From: Loktionov, Aleksandr @ 2026-01-09  6:10 UTC (permalink / raw)
  To: Loktionov, Aleksandr, Li Li, Nguyen, Anthony L,
	Kitszel, Przemyslaw, David S. Miller, Jakub Kicinski,
	Eric Dumazet, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	David Decotigny, Singhai, Anjali, Samudrala, Sridhar,
	Brian Vazquez, Tantilov, Emil S



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Loktionov, Aleksandr
> Sent: Friday, January 9, 2026 7:07 AM
> To: Li Li <boolli@google.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; David S. Miller <davem@davemloft.net>;
> Jakub Kicinski <kuba@kernel.org>; Eric Dumazet <edumazet@google.com>;
> intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; David
> Decotigny <decot@google.com>; Singhai, Anjali
> <anjali.singhai@intel.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; Brian Vazquez <brianvv@google.com>;
> Tantilov, Emil S <emil.s.tantilov@intel.com>
> Subject: Re: [Intel-wired-lan] [PATCH 5/5] idpf: skip stopping/opening
> vport if it is NULL during HW reset
> 
> 
> 
> > -----Original Message-----
> > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> > Of Li Li via Intel-wired-lan
> > Sent: Wednesday, January 7, 2026 2:05 AM
> > To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> > Przemyslaw <przemyslaw.kitszel@intel.com>; David S. Miller
> > <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; Eric Dumazet
> > <edumazet@google.com>; intel-wired-lan@lists.osuosl.org
> > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; David
> > Decotigny <decot@google.com>; Singhai, Anjali
> > <anjali.singhai@intel.com>; Samudrala, Sridhar
> > <sridhar.samudrala@intel.com>; Brian Vazquez <brianvv@google.com>;
> > Li Li <boolli@google.com>; Tantilov, Emil S <emil.s.tantilov@intel.com>
> > Subject: [Intel-wired-lan] [PATCH 5/5] idpf: skip stopping/opening
> > vport if it is NULL during HW reset
> >
> > When an idpf HW reset is triggered, it clears the vport but does not
> > clear the netdev held by vport:
> >
> >     // In idpf_vport_dealloc() called by idpf_init_hard_reset(),
> >     // idpf_init_hard_reset() sets IDPF_HR_RESET_IN_PROG, so
> >     // idpf_decfg_netdev() doesn't get called.
> >     if (!test_bit(IDPF_HR_RESET_IN_PROG, adapter->flags))
> >         idpf_decfg_netdev(vport);
> >     // idpf_decfg_netdev() would clear netdev but it isn't called:
> >     unregister_netdev(vport->netdev);
> >     free_netdev(vport->netdev);
> >     vport->netdev = NULL;
> >     // Later in idpf_init_hard_reset(), the vport is cleared:
> >     kfree(adapter->vports);
> >     adapter->vports = NULL;
> >
> > During an idpf HW reset, when userspace restarts the network service,
> > the vport associated with the netdev is NULL, and so a kernel panic
> > would happen:
> >
> > [ 1791.669339] BUG: kernel NULL pointer dereference, address: 0000000000000070
> > ...
> > [ 1791.717130] RIP: 0010:idpf_vport_stop+0x16/0x1c0
> >
> > This can be reproduced reliably by injecting a TX timeout to cause an
> > idpf HW reset, and injecting a virtchnl error to cause the HW reset to
> > fail and retry, while running "service network restart" in userspace.
> >
> > With this patch applied, we see the following error but no kernel
> > panics anymore:
> >
> > [  181.409483] idpf 0000:05:00.0 eth1: mtu not changed due to no vport innetdev
"innetdev" -> "in netdev"

> > RTNETLINK answers: Bad address
> > ...
> > [  181.913644] idpf 0000:05:00.0 eth1: not stopping vport because it is NULL
> > [  181.938675] idpf 0000:05:00.0 eth1: mtu not changed due to no vport in netdev
> > ...
> > [  242.849499] idpf 0000:05:00.0 eth1: not opening vport because it is NULL
> > ...
> > [  304.289364] idpf 0000:05:00.0 eth0: not opening vport because it is NULL
> >
> > Signed-off-by: Li Li <boolli@google.com>
> > ---
> >  drivers/net/ethernet/intel/idpf/idpf_lib.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
> > index 53b31989722a7..a9a556499262b 100644
> > --- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
> > +++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
> > @@ -1021,6 +1021,8 @@ static void idpf_vport_stop(struct idpf_vport *vport, bool rtnl)
> >   */
> >  static int idpf_stop(struct net_device *netdev)
> >  {
> > +	if (!netdev)
> > +		return 0;
> >  	struct idpf_netdev_priv *np = netdev_priv(netdev);
> >  	struct idpf_vport *vport;
> >
> > @@ -1029,9 +1031,14 @@ static int idpf_stop(struct net_device *netdev)
> >
> >  	idpf_vport_ctrl_lock(netdev);
> >  	vport = idpf_netdev_to_vport(netdev);
> > +	if (!vport) {
> > +		netdev_err(netdev, "not stopping vport because it is NULL");
> Please don't forget to add trailing '\n'.
> 
> > +		goto unlock;
> > +	}
> >
> >  	idpf_vport_stop(vport, false);
> >
> > +unlock:
> >  	idpf_vport_ctrl_unlock(netdev);
> >
> >  	return 0;
> > @@ -2301,6 +2308,11 @@ static int idpf_open(struct net_device *netdev)
> >
> >  	idpf_vport_ctrl_lock(netdev);
> >  	vport = idpf_netdev_to_vport(netdev);
> > +	if (!vport) {
> > +		netdev_err(netdev, "not opening vport because it is NULL");
> Please don't forget to add trailing '\n', here too.
> 
> > +		err = -EFAULT;
> > +		goto unlock;
> > +	}
> >
> >  	err = idpf_set_real_num_queues(vport);
> >  	if (err)
> > --
> > 2.52.0.351.gbe84eed79e-goog
> 
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>



* RE: [Intel-wired-lan] [PATCH 3/5] idpf: skip getting RX flow rules if vport is NULL during HW reset
  2026-01-07  1:05 ` [PATCH 3/5] idpf: skip getting RX flow rules " Li Li
@ 2026-01-12  9:58   ` Loktionov, Aleksandr
  0 siblings, 0 replies; 12+ messages in thread
From: Loktionov, Aleksandr @ 2026-01-12  9:58 UTC (permalink / raw)
  To: Li Li, Nguyen, Anthony L, Kitszel, Przemyslaw, David S. Miller,
	Jakub Kicinski, Eric Dumazet, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	David Decotigny, Singhai, Anjali, Samudrala, Sridhar,
	Brian Vazquez, Tantilov, Emil S



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Li Li via Intel-wired-lan
> Sent: Wednesday, January 7, 2026 2:05 AM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>; David S. Miller
> <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; Eric Dumazet
> <edumazet@google.com>; intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; David
> Decotigny <decot@google.com>; Singhai, Anjali
> <anjali.singhai@intel.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; Brian Vazquez <brianvv@google.com>; Li
> Li <boolli@google.com>; Tantilov, Emil S <emil.s.tantilov@intel.com>
> Subject: [Intel-wired-lan] [PATCH 3/5] idpf: skip getting RX flow
> rules if vport is NULL during HW reset
> 
> When an idpf HW reset is triggered, it clears the vport but does not
> clear the netdev held by vport:
> 
>     // In idpf_vport_dealloc() called by idpf_init_hard_reset(),
>     // idpf_init_hard_reset() sets IDPF_HR_RESET_IN_PROG, so
>     // idpf_decfg_netdev() doesn't get called.
>     if (!test_bit(IDPF_HR_RESET_IN_PROG, adapter->flags))
>         idpf_decfg_netdev(vport);
>     // idpf_decfg_netdev() would clear netdev but it isn't called:
>     unregister_netdev(vport->netdev);
>     free_netdev(vport->netdev);
>     vport->netdev = NULL;
>     // Later in idpf_init_hard_reset(), the vport is cleared:
>     kfree(adapter->vports);
>     adapter->vports = NULL;
> 
> During an idpf HW reset, when userspace gets RX flow classification
> rules of the netdev, the vport associated with the netdev is NULL, and
> so a kernel panic would happen:
> 
> [ 1466.308592] BUG: kernel NULL pointer dereference, address: 0000000000000032
> ...
> [ 1466.356222] RIP: 0010:idpf_get_rxnfc+0x3b/0x70
> 
> This can be reproduced reliably by injecting a TX timeout to cause an
> idpf HW reset, and injecting a virtchnl error to cause the HW reset to
> fail and retry, while running "ethtool -n" in userspace.
> 
> With this patch applied, we see the following error but no kernel
> panics anymore:
> 
> [  312.476576] idpf 0000:05:00.0 eth1: failed to get rules due to no vport in netdev
> Cannot get RX rings: Bad address
> rxclass: Cannot get RX class rule count: Bad address
> RX classification rule retrieval failed
> 
> Signed-off-by: Li Li <boolli@google.com>
> ---
>  drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> index 6a4b630b786c2..c71af85408a29 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> @@ -45,6 +44,11 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
> 
>  	idpf_vport_ctrl_lock(netdev);
>  	vport = idpf_netdev_to_vport(netdev);
> +	if (!vport) {
> +		netdev_err(netdev, "failed to get rules due to no vport in netdev\n");
> +		err = -EFAULT;
> +		goto unlock;
> +	}
>  	vport_config = np->adapter->vport_config[np->vport_idx];
>  	user_config = &vport_config->user_config;
> 
> @@ -85,6 +90,7 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
>  		break;
>  	}
> 
> +unlock:
>  	idpf_vport_ctrl_unlock(netdev);
> 
>  	return err;
> --
> 2.52.0.351.gbe84eed79e-goog

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>


Thread overview: 12+ messages
2026-01-07  1:04 [PATCH 1/5] idpf: skip getting/setting ring params if vport is NULL during HW reset Li Li
2026-01-07  1:05 ` [PATCH 2/5] idpf: skip changing MTU " Li Li
2026-01-07  1:05 ` [PATCH 3/5] idpf: skip getting RX flow rules " Li Li
2026-01-12  9:58   ` [Intel-wired-lan] " Loktionov, Aleksandr
2026-01-07  1:05 ` [PATCH 4/5] idpf: skip setting channels " Li Li
2026-01-07  1:05 ` [PATCH 5/5] idpf: skip stopping/opening vport if it " Li Li
2026-01-09  6:06   ` [Intel-wired-lan] " Loktionov, Aleksandr
2026-01-09  6:10     ` Loktionov, Aleksandr
2026-01-07  5:30 ` [Intel-wired-lan] [PATCH 1/5] idpf: skip getting/setting ring params if vport " Paul Menzel
2026-01-07 18:39   ` Li Li
2026-01-07 17:41 ` Tantilov, Emil S
2026-01-07 18:40   ` Li Li
