* [Intel-wired-lan] [PATCH iwl-net] ice: Add netif_device_attach/detach into PF reset flow
From: Dawid Osuchowski @ 2024-08-12 10:22 UTC
To: intel-wired-lan; +Cc: netdev, Igor Bagnucki, Dawid Osuchowski, Jakub Kicinski
Ethtool callbacks can be executed while a reset is in progress and try
to access resources that have already been deleted; e.g. getting the
coalesce settings can result in the NULL pointer dereference seen below.

Once the driver is fully initialized, trigger a reset:
# echo 1 > /sys/class/net/<interface>/device/reset
While the reset is in progress, try to get the coalesce settings using
ethtool:
# ethtool -c <interface>

Calling netif_device_detach() before the reset makes the net core stop
calling into the driver when an ethtool command is issued. An attempt to
execute an ethtool command during the reset then results in the
following message:

netlink error: No such device

instead of a NULL pointer dereference. Once the reset is done and
ice_rebuild() is executing, netif_device_attach() is called to allow
ethtool operations to occur again in a safe manner.
[ +0.000105] BUG: kernel NULL pointer dereference, address: 0000000000000020
[ +0.000027] #PF: supervisor read access in kernel mode
[ +0.000011] #PF: error_code(0x0000) - not-present page
[ +0.000011] PGD 0 P4D 0
[ +0.000008] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[ +0.000012] CPU: 11 PID: 19713 Comm: ethtool Tainted: G S 6.10.0-rc7+ #7
[ +0.000015] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015
[ +0.000013] RIP: 0010:ice_get_q_coalesce+0x2e/0xa0 [ice]
[ +0.000090] Code: 00 55 53 48 89 fb 48 89 f7 48 83 ec 08 0f b7 8b 86 04 00 00 0f b7 83 82 04 00 00 39 d1 7e 30 48 8b 4b 18 48 63 ea 48 8b 0c e9 <48> 8b 71 20 48 81 c6 a0 01 00 00 39 c2 7c 32 e8 ee fe ff ff 85 c0
[ +0.000029] RSP: 0018:ffffbab1e9bcf6a8 EFLAGS: 00010206
[ +0.000012] RAX: 000000000000000c RBX: ffff94512305b028 RCX: 0000000000000000
[ +0.000012] RDX: 0000000000000000 RSI: ffff9451c3f2e588 RDI: ffff9451c3f2e588
[ +0.000012] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ +0.000013] R10: ffff9451c3f2e580 R11: 000000000000001f R12: ffff945121fa9000
[ +0.000012] R13: ffffbab1e9bcf760 R14: 0000000000000013 R15: ffffffff9e65dd40
[ +0.000012] FS: 00007faee5fbe740(0000) GS:ffff94546fd80000(0000) knlGS:0000000000000000
[ +0.000014] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000011] CR2: 0000000000000020 CR3: 0000000106c2e005 CR4: 00000000001706f0
[ +0.000012] Call Trace:
[ +0.000009] <TASK>
[ +0.000007] ? __die+0x23/0x70
[ +0.000012] ? page_fault_oops+0x173/0x510
[ +0.000011] ? ice_get_q_coalesce+0x2e/0xa0 [ice]
[ +0.000071] ? search_module_extables+0x19/0x60
[ +0.000013] ? search_bpf_extables+0x5f/0x80
[ +0.000012] ? exc_page_fault+0x7e/0x180
[ +0.000013] ? asm_exc_page_fault+0x26/0x30
[ +0.000014] ? ice_get_q_coalesce+0x2e/0xa0 [ice]
[ +0.000070] ice_get_coalesce+0x17/0x30 [ice]
[ +0.000070] coalesce_prepare_data+0x61/0x80
[ +0.000012] ethnl_default_doit+0xde/0x340
[ +0.000012] genl_family_rcv_msg_doit+0xf2/0x150
[ +0.000013] genl_rcv_msg+0x1b3/0x2c0
[ +0.000009] ? __pfx_ethnl_default_doit+0x10/0x10
[ +0.000011] ? __pfx_genl_rcv_msg+0x10/0x10
[ +0.000010] netlink_rcv_skb+0x5b/0x110
[ +0.000013] genl_rcv+0x28/0x40
[ +0.000007] netlink_unicast+0x19c/0x290
[ +0.000012] netlink_sendmsg+0x222/0x490
[ +0.000011] __sys_sendto+0x1df/0x1f0
[ +0.000013] __x64_sys_sendto+0x24/0x30
[ +0.000340] do_syscall_64+0x82/0x160
[ +0.000309] ? __mod_memcg_lruvec_state+0xa6/0x150
[ +0.000309] ? __lruvec_stat_mod_folio+0x68/0xa0
[ +0.000311] ? folio_add_file_rmap_ptes+0x86/0xb0
[ +0.000309] ? next_uptodate_folio+0x89/0x290
[ +0.000309] ? filemap_map_pages+0x521/0x5f0
[ +0.000302] ? do_fault+0x26e/0x470
[ +0.000293] ? __handle_mm_fault+0x7dc/0x1060
[ +0.000295] ? __count_memcg_events+0x58/0xf0
[ +0.000289] ? count_memcg_events.constprop.0+0x1a/0x30
[ +0.000292] ? handle_mm_fault+0xae/0x320
[ +0.000284] ? do_user_addr_fault+0x33a/0x6a0
[ +0.000280] ? exc_page_fault+0x7e/0x180
[ +0.000289] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000271] RIP: 0033:0x7faee60d8e27

Fixes: 67fe64d78c43 ("ice: Implement getting and setting ethtool coalesce")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
---
drivers/net/ethernet/intel/ice/ice_main.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index eaa73cc200f4..16b4920741ff 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -608,6 +608,8 @@ ice_prepare_for_reset(struct ice_pf *pf, enum ice_reset_req reset_type)
 			memset(&vsi->mqprio_qopt, 0, sizeof(vsi->mqprio_qopt));
 		}
 	}
+	if (vsi->netdev)
+		netif_device_detach(vsi->netdev);
 skip:
 
 	/* clear SW filtering DB */
@@ -7568,11 +7570,13 @@ static void ice_update_pf_netdev_link(struct ice_pf *pf)
 
 		ice_get_link_status(pf->vsi[i]->port_info, &link_up);
 		if (link_up) {
+			netif_device_attach(pf->vsi[i]->netdev);
 			netif_carrier_on(pf->vsi[i]->netdev);
 			netif_tx_wake_all_queues(pf->vsi[i]->netdev);
 		} else {
 			netif_carrier_off(pf->vsi[i]->netdev);
 			netif_tx_stop_all_queues(pf->vsi[i]->netdev);
+			netif_device_detach(pf->vsi[i]->netdev);
 		}
 	}
 }
--
2.44.0
* Re: [Intel-wired-lan] [PATCH iwl-net] ice: Add netif_device_attach/detach into PF reset flow
From: Dawid Osuchowski @ 2024-08-12 10:48 UTC
To: intel-wired-lan; +Cc: netdev, Igor Bagnucki, Jakub Kicinski
On 12.08.2024 12:22, Dawid Osuchowski wrote:
> Ethtool callbacks can be executed while reset is in progress and try to
> access deleted resources, e.g. getting coalesce settings can result in a
> NULL pointer dereference seen below.
Please disregard this submission. I have been made aware of additional
issues in internal review and will send a new version with the changes.
--Dawid
* Re: [Intel-wired-lan] [PATCH iwl-net] ice: Add netif_device_attach/detach into PF reset flow
From: Kalesh Anakkur Purayil @ 2024-08-14 3:19 UTC
To: Dawid Osuchowski; +Cc: netdev, intel-wired-lan, Igor Bagnucki, Jakub Kicinski
Hi Dawid,
One question inline.
On Mon, Aug 12, 2024 at 3:52 PM Dawid Osuchowski
<dawid.osuchowski@linux.intel.com> wrote:
> @@ -7568,11 +7570,13 @@ static void ice_update_pf_netdev_link(struct ice_pf *pf)
>
> ice_get_link_status(pf->vsi[i]->port_info, &link_up);
> if (link_up) {
> + netif_device_attach(pf->vsi[i]->netdev);
> netif_carrier_on(pf->vsi[i]->netdev);
> netif_tx_wake_all_queues(pf->vsi[i]->netdev);
> } else {
> netif_carrier_off(pf->vsi[i]->netdev);
> netif_tx_stop_all_queues(pf->vsi[i]->netdev);
> + netif_device_detach(pf->vsi[i]->netdev);
[Kalesh] Is there any reason to attach back the netdev only if link is
up? IMO, you should attach the device back irrespective of physical
link status. In ice_prepare_for_reset(), you are detaching the device
unconditionally.
I may be missing something here.
> }
> }
> }
> --
> 2.44.0
>
>
--
Regards,
Kalesh A P
* Re: [Intel-wired-lan] [PATCH iwl-net] ice: Add netif_device_attach/detach into PF reset flow
From: Dawid Osuchowski @ 2024-08-14 10:44 UTC
To: Kalesh Anakkur Purayil
Cc: netdev, intel-wired-lan, Igor Bagnucki, Jakub Kicinski
On 14.08.2024 05:19, Kalesh Anakkur Purayil wrote:
> On Mon, Aug 12, 2024 at 3:52 PM Dawid Osuchowski
> <dawid.osuchowski@linux.intel.com> wrote:
>> @@ -7568,11 +7570,13 @@ static void ice_update_pf_netdev_link(struct ice_pf *pf)
>>
>> ice_get_link_status(pf->vsi[i]->port_info, &link_up);
>> if (link_up) {
>> + netif_device_attach(pf->vsi[i]->netdev);
>> netif_carrier_on(pf->vsi[i]->netdev);
>> netif_tx_wake_all_queues(pf->vsi[i]->netdev);
>> } else {
>> netif_carrier_off(pf->vsi[i]->netdev);
>> netif_tx_stop_all_queues(pf->vsi[i]->netdev);
>> + netif_device_detach(pf->vsi[i]->netdev);
> [Kalesh] Is there any reason to attach back the netdev only if link is
> up? IMO, you should attach the device back irrespective of physical
> link status. In ice_prepare_for_reset(), you are detaching the device
> unconditionally.
>
> I may be missing something here.
Hey Kalesh,
I think you are right; it is a mistake on my end. I have already sent a
v2, but without this change. I just tested attaching irrespective of link
status: it also resolves the reported issue that the patch is supposed to
fix and doesn't introduce any regression that I am aware of. I will
forward your concern to the v2 thread and will post a v3 with the change.
--Dawid