netdev.vger.kernel.org archive mirror
* [PATCH iwl-net v2] ice: Add netif_device_attach/detach into PF reset flow
@ 2024-08-12 12:50 Dawid Osuchowski
  2024-08-12 16:11 ` Larysa Zaremba
  2024-08-13 11:49 ` Maciej Fijalkowski
  0 siblings, 2 replies; 6+ messages in thread
From: Dawid Osuchowski @ 2024-08-12 12:50 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, Dawid Osuchowski, Jakub Kicinski, Igor Bagnucki

Ethtool callbacks can be executed while reset is in progress and try to
access deleted resources, e.g. getting coalesce settings can result in a
NULL pointer dereference seen below.

Reproduction steps:
Once the driver is fully initialized, trigger reset:
	# echo 1 > /sys/class/net/<interface>/device/reset
While the reset is in progress, try to get coalesce settings using ethtool:
	# ethtool -c <interface>

BUG: kernel NULL pointer dereference, address: 0000000000000020
PGD 0 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 11 PID: 19713 Comm: ethtool Tainted: G S                 6.10.0-rc7+ #7
RIP: 0010:ice_get_q_coalesce+0x2e/0xa0 [ice]
RSP: 0018:ffffbab1e9bcf6a8 EFLAGS: 00010206
RAX: 000000000000000c RBX: ffff94512305b028 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9451c3f2e588 RDI: ffff9451c3f2e588
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff9451c3f2e580 R11: 000000000000001f R12: ffff945121fa9000
R13: ffffbab1e9bcf760 R14: 0000000000000013 R15: ffffffff9e65dd40
FS:  00007faee5fbe740(0000) GS:ffff94546fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000020 CR3: 0000000106c2e005 CR4: 00000000001706f0
Call Trace:
<TASK>
ice_get_coalesce+0x17/0x30 [ice]
coalesce_prepare_data+0x61/0x80
ethnl_default_doit+0xde/0x340
genl_family_rcv_msg_doit+0xf2/0x150
genl_rcv_msg+0x1b3/0x2c0
netlink_rcv_skb+0x5b/0x110
genl_rcv+0x28/0x40
netlink_unicast+0x19c/0x290
netlink_sendmsg+0x222/0x490
__sys_sendto+0x1df/0x1f0
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x82/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7faee60d8e27

Calling netif_device_detach() before reset makes the net core not call
the driver when an ethtool command is issued. An attempt to execute an
ethtool command during reset then results in the following message:

    netlink error: No such device

instead of a NULL pointer dereference. Once the reset is done and
ice_rebuild() is executing, netif_device_attach() is called to allow
ethtool operations to occur again in a safe manner.

Fixes: fcea6f3da546 ("ice: Add stats and ethtool support")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
---
Changes since v1:
* Changed Fixes tag to point to another commit
* Minified the stacktrace

Suggestion from Kuba: https://lore.kernel.org/netdev/20240610194756.5be5be90@kernel.org/
Previous attempt: https://lore.kernel.org/netdev/20240722122839.51342-1-dawid.osuchowski@linux.intel.com/
---
 drivers/net/ethernet/intel/ice/ice_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index eaa73cc200f4..16b4920741ff 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -608,6 +608,8 @@ ice_prepare_for_reset(struct ice_pf *pf, enum ice_reset_req reset_type)
 			memset(&vsi->mqprio_qopt, 0, sizeof(vsi->mqprio_qopt));
 		}
 	}
+	if (vsi->netdev)
+		netif_device_detach(vsi->netdev);
 skip:
 
 	/* clear SW filtering DB */
@@ -7568,11 +7570,13 @@ static void ice_update_pf_netdev_link(struct ice_pf *pf)
 
 		ice_get_link_status(pf->vsi[i]->port_info, &link_up);
 		if (link_up) {
+			netif_device_attach(pf->vsi[i]->netdev);
 			netif_carrier_on(pf->vsi[i]->netdev);
 			netif_tx_wake_all_queues(pf->vsi[i]->netdev);
 		} else {
 			netif_carrier_off(pf->vsi[i]->netdev);
 			netif_tx_stop_all_queues(pf->vsi[i]->netdev);
+			netif_device_detach(pf->vsi[i]->netdev);
 		}
 	}
 }
-- 
2.44.0
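
Note: the gating behaviour described in the commit message above can be
illustrated with a small userspace model. This is only a sketch under
stated assumptions, not kernel or ice driver code: struct model_netdev
and every model_* function are hypothetical names invented for this
illustration. The "present" flag stands in for the state that
netif_device_detach() clears and netif_device_attach() sets, and the
"coalesce" pointer stands in for resources that are gone during reset.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these types exist in the kernel. */
struct model_netdev {
	bool present;        /* models the "device present" state */
	const int *coalesce; /* models per-queue data freed during reset */
};

/* Models netif_device_detach(): mark the device as absent. */
static void model_detach(struct model_netdev *dev)
{
	assert(dev != NULL);
	dev->present = false;
}

/* Models netif_device_attach(): mark the device as present again. */
static void model_attach(struct model_netdev *dev)
{
	assert(dev != NULL);
	dev->present = true;
}

/*
 * Models the core-side gate in front of a driver callback: when the
 * device is absent, fail with -ENODEV instead of calling the driver.
 */
static int model_get_coalesce(const struct model_netdev *dev, int *out)
{
	assert(dev != NULL && out != NULL);
	if (!dev->present)
		return -ENODEV;
	*out = *dev->coalesce; /* the "driver" part; pointer is valid here */
	return 0;
}

/* Models the start of a reset: detach first, then drop resources. */
static void model_prepare_for_reset(struct model_netdev *dev)
{
	model_detach(dev);
	dev->coalesce = NULL;
}

/* Models the rebuild: restore resources first, then attach. */
static void model_rebuild(struct model_netdev *dev, const int *coalesce)
{
	dev->coalesce = coalesce;
	model_attach(dev);
}
```

The ordering is the point of the sketch: the device is detached before
the resources go away and attached only after they are back, so the
getter never dereferences the NULL pointer in between.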



* Re: [PATCH iwl-net v2] ice: Add netif_device_attach/detach into PF reset flow
  2024-08-12 12:50 [PATCH iwl-net v2] ice: Add netif_device_attach/detach into PF reset flow Dawid Osuchowski
@ 2024-08-12 16:11 ` Larysa Zaremba
  2024-08-13 11:49 ` Maciej Fijalkowski
  1 sibling, 0 replies; 6+ messages in thread
From: Larysa Zaremba @ 2024-08-12 16:11 UTC (permalink / raw)
  To: Dawid Osuchowski; +Cc: intel-wired-lan, netdev, Jakub Kicinski, Igor Bagnucki

On Mon, Aug 12, 2024 at 02:50:09PM +0200, Dawid Osuchowski wrote:
> Fixes: fcea6f3da546 ("ice: Add stats and ethtool support")
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
> Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>

Your SoB should be the last tag. Other than that

Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>



* Re: [PATCH iwl-net v2] ice: Add netif_device_attach/detach into PF reset flow
  2024-08-12 12:50 [PATCH iwl-net v2] ice: Add netif_device_attach/detach into PF reset flow Dawid Osuchowski
  2024-08-12 16:11 ` Larysa Zaremba
@ 2024-08-13 11:49 ` Maciej Fijalkowski
  2024-08-13 15:31   ` Dawid Osuchowski
  1 sibling, 1 reply; 6+ messages in thread
From: Maciej Fijalkowski @ 2024-08-13 11:49 UTC (permalink / raw)
  To: Dawid Osuchowski; +Cc: intel-wired-lan, netdev, Jakub Kicinski, Igor Bagnucki

On Mon, Aug 12, 2024 at 02:50:09PM +0200, Dawid Osuchowski wrote:
> Fixes: fcea6f3da546 ("ice: Add stats and ethtool support")

What about other intel drivers tho?



* Re: [PATCH iwl-net v2] ice: Add netif_device_attach/detach into PF reset flow
  2024-08-13 11:49 ` Maciej Fijalkowski
@ 2024-08-13 15:31   ` Dawid Osuchowski
  2024-08-13 19:24     ` Maciej Fijalkowski
  0 siblings, 1 reply; 6+ messages in thread
From: Dawid Osuchowski @ 2024-08-13 15:31 UTC (permalink / raw)
  To: Maciej Fijalkowski; +Cc: intel-wired-lan, netdev, Jakub Kicinski, Igor Bagnucki

On 13.08.2024 13:49, Maciej Fijalkowski wrote:
> What about other intel drivers tho?

I have not performed a detailed analysis of other Intel Ethernet drivers
in this regard, but it is surely a topic worth investigating.

--Dawid


* Re: [PATCH iwl-net v2] ice: Add netif_device_attach/detach into PF reset flow
  2024-08-13 15:31   ` Dawid Osuchowski
@ 2024-08-13 19:24     ` Maciej Fijalkowski
  2024-08-14 10:57       ` Dawid Osuchowski
  0 siblings, 1 reply; 6+ messages in thread
From: Maciej Fijalkowski @ 2024-08-13 19:24 UTC (permalink / raw)
  To: Dawid Osuchowski; +Cc: intel-wired-lan, netdev, Jakub Kicinski, Igor Bagnucki

On Tue, Aug 13, 2024 at 05:31:37PM +0200, Dawid Osuchowski wrote:
> On 13.08.2024 13:49, Maciej Fijalkowski wrote:
> > What about other intel drivers tho?
> 
> I have not performed a detailed analysis of other Intel Ethernet drivers
> in this regard, but it is surely a topic worth investigating.

If you could take some action on this, that would be great. I always
hesitate to provide a review tag for a change that already has a few of
them, but given that I dedicated some time to looking into this:

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

> 
> --Dawid


* Re: [PATCH iwl-net v2] ice: Add netif_device_attach/detach into PF reset flow
  2024-08-13 19:24     ` Maciej Fijalkowski
@ 2024-08-14 10:57       ` Dawid Osuchowski
  0 siblings, 0 replies; 6+ messages in thread
From: Dawid Osuchowski @ 2024-08-14 10:57 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: intel-wired-lan, netdev, Jakub Kicinski, Igor Bagnucki,
	Kalesh Anakkur Purayil

On 13.08.2024 21:24, Maciej Fijalkowski wrote:
> On Tue, Aug 13, 2024 at 05:31:37PM +0200, Dawid Osuchowski wrote:
>> On 13.08.2024 13:49, Maciej Fijalkowski wrote:
>>> What about other intel drivers tho?
>>
>> I have not performed a detailed analysis of other Intel Ethernet drivers
>> in this regard, but it is surely a topic worth investigating.
> 
> If you could take some action on this, that would be great. I always
> hesitate to provide a review tag for a change that already has a few of
> them, but given that I dedicated some time to looking into this:
> 
I got a valid concern from Kalesh (CCd) on the v1 thread
(https://lore.kernel.org/netdev/CAH-L+nOFqs-K5YzfrfmpRHbhDGM-+1ahhWh4NXATX1FqZiPVLQ@mail.gmail.com/)
about attaching only if the link is up.

On 14.08.2024 05:19, Kalesh Anakkur Purayil wrote:
 > [Kalesh] Is there any reason to attach back the netdev only if link is
 > up? IMO, you should attach the device back irrespective of physical
 > link status. In ice_prepare_for_reset(), you are detaching the device
 > unconditionally.
 >
 > I may be missing something here.

I agree with his suggestion to call netif_device_attach() irrespective
of the link being up. Should I send a v3 with the change? I have already
tested that locally and it seems to fix the reported NULL pointer
dereference as well.

--Dawid

> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> 
>>
>> --Dawid



