netdev.vger.kernel.org archive mirror
* [PATCH iwl-net] idpf: fix adapter NULL pointer dereference on reboot
@ 2025-03-07  0:39 Emil Tantilov
  2025-03-07  5:58 ` Michal Swiatkowski
  2025-03-10  6:22 ` Simon Horman
  0 siblings, 2 replies; 5+ messages in thread
From: Emil Tantilov @ 2025-03-07  0:39 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, decot, willemb, anthony.l.nguyen, davem, edumazet, kuba,
	pabeni, madhu.chittim, Aleksandr.Loktionov, yuma, mschmidt

Driver calls idpf_remove() from idpf_shutdown(), which can end up
calling idpf_remove() again when disabling SRIOV.

echo 1 > /sys/class/net/<netif>/device/sriov_numvfs
reboot

BUG: kernel NULL pointer dereference, address: 0000000000000020
...
RIP: 0010:idpf_remove+0x22/0x1f0 [idpf]
...
? idpf_remove+0x22/0x1f0 [idpf]
? idpf_remove+0x1e4/0x1f0 [idpf]
pci_device_remove+0x3f/0xb0
device_release_driver_internal+0x19f/0x200
pci_stop_bus_device+0x6d/0x90
pci_stop_and_remove_bus_device+0x12/0x20
pci_iov_remove_virtfn+0xbe/0x120
sriov_disable+0x34/0xe0
idpf_sriov_configure+0x58/0x140 [idpf]
idpf_remove+0x1b9/0x1f0 [idpf]
idpf_shutdown+0x12/0x30 [idpf]
pci_device_shutdown+0x35/0x60
device_shutdown+0x156/0x200
...

Replace the direct idpf_remove() call in idpf_shutdown() with
idpf_vc_core_deinit() and idpf_deinit_dflt_mbx(), which perform
the bulk of the cleanup, such as stopping the init task, freeing IRQs,
destroying the vports and freeing the mailbox.

Reported-by: Yuying Ma <yuma@redhat.com>
Fixes: e850efed5e15 ("idpf: add module register and probe functionality")
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
---
 drivers/net/ethernet/intel/idpf/idpf_main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
index b6c515d14cbf..bec4a02c5373 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_main.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
@@ -87,7 +87,11 @@ static void idpf_remove(struct pci_dev *pdev)
  */
 static void idpf_shutdown(struct pci_dev *pdev)
 {
-	idpf_remove(pdev);
+	struct idpf_adapter *adapter = pci_get_drvdata(pdev);
+
+	cancel_delayed_work_sync(&adapter->vc_event_task);
+	idpf_vc_core_deinit(adapter);
+	idpf_deinit_dflt_mbx(adapter);
 
 	if (system_state == SYSTEM_POWER_OFF)
 		pci_set_power_state(pdev, PCI_D3hot);
-- 
2.17.2



* Re: [PATCH iwl-net] idpf: fix adapter NULL pointer dereference on reboot
  2025-03-07  0:39 [PATCH iwl-net] idpf: fix adapter NULL pointer dereference on reboot Emil Tantilov
@ 2025-03-07  5:58 ` Michal Swiatkowski
  2025-03-11  4:52   ` Tantilov, Emil S
  2025-03-10  6:22 ` Simon Horman
  1 sibling, 1 reply; 5+ messages in thread
From: Michal Swiatkowski @ 2025-03-07  5:58 UTC (permalink / raw)
  To: Emil Tantilov
  Cc: intel-wired-lan, netdev, decot, willemb, anthony.l.nguyen, davem,
	edumazet, kuba, pabeni, madhu.chittim, Aleksandr.Loktionov, yuma,
	mschmidt

On Thu, Mar 06, 2025 at 04:39:56PM -0800, Emil Tantilov wrote:
> Driver calls idpf_remove() from idpf_shutdown(), which can end up
> calling idpf_remove() again when disabling SRIOV.
> 

The same is done in other drivers (ice, iavf). Why is it a problem here?
I am asking because having one function to remove is pretty handy.
Maybe the problem can be fixed by some changes in idpf_remove() instead?

> echo 1 > /sys/class/net/<netif>/device/sriov_numvfs
> reboot
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000020
> ...
> RIP: 0010:idpf_remove+0x22/0x1f0 [idpf]
> ...
> ? idpf_remove+0x22/0x1f0 [idpf]
> ? idpf_remove+0x1e4/0x1f0 [idpf]
> pci_device_remove+0x3f/0xb0
> device_release_driver_internal+0x19f/0x200
> pci_stop_bus_device+0x6d/0x90
> pci_stop_and_remove_bus_device+0x12/0x20
> pci_iov_remove_virtfn+0xbe/0x120
> sriov_disable+0x34/0xe0
> idpf_sriov_configure+0x58/0x140 [idpf]
> idpf_remove+0x1b9/0x1f0 [idpf]
> idpf_shutdown+0x12/0x30 [idpf]
> pci_device_shutdown+0x35/0x60
> device_shutdown+0x156/0x200
> ...
> 
> Replace the direct idpf_remove() call in idpf_shutdown() with
> idpf_vc_core_deinit() and idpf_deinit_dflt_mbx(), which perform
> the bulk of the cleanup, such as stopping the init task, freeing IRQs,
> destroying the vports and freeing the mailbox.
> 
> Reported-by: Yuying Ma <yuma@redhat.com>
> Fixes: e850efed5e15 ("idpf: add module register and probe functionality")
> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> ---
>  drivers/net/ethernet/intel/idpf/idpf_main.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
> index b6c515d14cbf..bec4a02c5373 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_main.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
> @@ -87,7 +87,11 @@ static void idpf_remove(struct pci_dev *pdev)
>   */
>  static void idpf_shutdown(struct pci_dev *pdev)
>  {
> -	idpf_remove(pdev);
> +	struct idpf_adapter *adapter = pci_get_drvdata(pdev);
> +
> +	cancel_delayed_work_sync(&adapter->vc_event_task);
> +	idpf_vc_core_deinit(adapter);
> +	idpf_deinit_dflt_mbx(adapter);
>  
>  	if (system_state == SYSTEM_POWER_OFF)
>  		pci_set_power_state(pdev, PCI_D3hot);
> -- 
> 2.17.2


* Re: [PATCH iwl-net] idpf: fix adapter NULL pointer dereference on reboot
  2025-03-07  0:39 [PATCH iwl-net] idpf: fix adapter NULL pointer dereference on reboot Emil Tantilov
  2025-03-07  5:58 ` Michal Swiatkowski
@ 2025-03-10  6:22 ` Simon Horman
  2025-03-11  4:42   ` [Intel-wired-lan] " Tantilov, Emil S
  1 sibling, 1 reply; 5+ messages in thread
From: Simon Horman @ 2025-03-10  6:22 UTC (permalink / raw)
  To: Emil Tantilov
  Cc: intel-wired-lan, netdev, decot, willemb, anthony.l.nguyen, davem,
	edumazet, kuba, pabeni, madhu.chittim, Aleksandr.Loktionov, yuma,
	mschmidt

On Thu, Mar 06, 2025 at 04:39:56PM -0800, Emil Tantilov wrote:
> Driver calls idpf_remove() from idpf_shutdown(), which can end up
> calling idpf_remove() again when disabling SRIOV.
> 
> echo 1 > /sys/class/net/<netif>/device/sriov_numvfs
> reboot
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000020
> ...
> RIP: 0010:idpf_remove+0x22/0x1f0 [idpf]
> ...
> ? idpf_remove+0x22/0x1f0 [idpf]
> ? idpf_remove+0x1e4/0x1f0 [idpf]
> pci_device_remove+0x3f/0xb0
> device_release_driver_internal+0x19f/0x200
> pci_stop_bus_device+0x6d/0x90
> pci_stop_and_remove_bus_device+0x12/0x20
> pci_iov_remove_virtfn+0xbe/0x120
> sriov_disable+0x34/0xe0
> idpf_sriov_configure+0x58/0x140 [idpf]
> idpf_remove+0x1b9/0x1f0 [idpf]
> idpf_shutdown+0x12/0x30 [idpf]
> pci_device_shutdown+0x35/0x60
> device_shutdown+0x156/0x200
> ...
> 
> Replace the direct idpf_remove() call in idpf_shutdown() with
> idpf_vc_core_deinit() and idpf_deinit_dflt_mbx(), which perform
> the bulk of the cleanup, such as stopping the init task, freeing IRQs,
> destroying the vports and freeing the mailbox.

Hi Emil,

I think it would be worth adding some commentary on the rest of
the clean-up performed by idpf_remove() and why it is correct
to no longer do so directly from a call to idpf_remove() from
idpf_shutdown() (IOW, it isn't clear to me :).

...


* Re: [Intel-wired-lan] [PATCH iwl-net] idpf: fix adapter NULL pointer dereference on reboot
  2025-03-10  6:22 ` Simon Horman
@ 2025-03-11  4:42   ` Tantilov, Emil S
  0 siblings, 0 replies; 5+ messages in thread
From: Tantilov, Emil S @ 2025-03-11  4:42 UTC (permalink / raw)
  To: Simon Horman
  Cc: willemb, pabeni, netdev, yuma, Aleksandr.Loktionov, edumazet,
	madhu.chittim, anthony.l.nguyen, kuba, intel-wired-lan, decot,
	davem, Michal Swiatkowski

On 3/9/2025 11:22 PM, Simon Horman wrote:
> On Thu, Mar 06, 2025 at 04:39:56PM -0800, Emil Tantilov wrote:
>> Driver calls idpf_remove() from idpf_shutdown(), which can end up
>> calling idpf_remove() again when disabling SRIOV.
>>
>> echo 1 > /sys/class/net/<netif>/device/sriov_numvfs
>> reboot
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000020
>> ...
>> RIP: 0010:idpf_remove+0x22/0x1f0 [idpf]
>> ...
>> ? idpf_remove+0x22/0x1f0 [idpf]
>> ? idpf_remove+0x1e4/0x1f0 [idpf]
>> pci_device_remove+0x3f/0xb0
>> device_release_driver_internal+0x19f/0x200
>> pci_stop_bus_device+0x6d/0x90
>> pci_stop_and_remove_bus_device+0x12/0x20
>> pci_iov_remove_virtfn+0xbe/0x120
>> sriov_disable+0x34/0xe0
>> idpf_sriov_configure+0x58/0x140 [idpf]
>> idpf_remove+0x1b9/0x1f0 [idpf]
>> idpf_shutdown+0x12/0x30 [idpf]
>> pci_device_shutdown+0x35/0x60
>> device_shutdown+0x156/0x200
>> ...
>>
>> Replace the direct idpf_remove() call in idpf_shutdown() with
>> idpf_vc_core_deinit() and idpf_deinit_dflt_mbx(), which perform
>> the bulk of the cleanup, such as stopping the init task, freeing IRQs,
>> destroying the vports and freeing the mailbox.
> 
> Hi Emil,
> 
> I think it would be worth adding some commentary on the rest of
> the clean-up performed by idpf_remove() and why it is correct
The main reason behind the change is to avoid calling sriov_disable(),
which ends up calling idpf_remove() again via pci_device_remove().
idpf_remove() will crash in that situation as it attempts to access
the adapter pointer, which was already freed.

> to no longer do so directly from a call to idpf_remove() from
> idpf_shutdown() (IOW, it isn't clear to me :).
I assume you are asking which portion of idpf_remove() will not be
present in idpf_shutdown() as a result? Aside from not calling
sriov_disable(), there is a small cleanup of stale netdevs and the
destruction of WQs, which did not seem to be needed on shutdown.
Then again, I was not able to find documentation on what steps are
required for shutdown, so I mostly checked how other drivers handle it
(where there is no 1:1 overlap between shutdown and remove) and applied
similar steps to idpf. Ideally I do not wish to do more than is needed
for that flow.
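
As a rough illustration of the split described above, the two teardown
paths can be written down as step sets and compared. This is a
user-space sketch, not driver code: the step names are hypothetical
labels for the operations named in this thread, and the assumption that
idpf_remove() performs every step listed for shutdown is taken from the
commit message ("the bulk of the cleanup"), not verified against the
driver source.

```c
#include <assert.h>

/* Hypothetical labels for the teardown operations named in the thread. */
enum teardown_step {
	STEP_CANCEL_VC_EVENT = 1u << 0, /* cancel_delayed_work_sync(vc_event_task) */
	STEP_VC_CORE_DEINIT  = 1u << 1, /* idpf_vc_core_deinit() */
	STEP_DEINIT_MBX      = 1u << 2, /* idpf_deinit_dflt_mbx() */
	STEP_SRIOV_DISABLE   = 1u << 3, /* sriov_disable(), re-enters remove for VFs */
	STEP_STALE_NETDEVS   = 1u << 4, /* cleanup of stale netdevs */
	STEP_DESTROY_WQS     = 1u << 5, /* destruction of workqueues */
};

/* Steps assumed to be performed by the remove path. */
static unsigned int remove_steps(void)
{
	return STEP_CANCEL_VC_EVENT | STEP_VC_CORE_DEINIT | STEP_DEINIT_MBX |
	       STEP_SRIOV_DISABLE | STEP_STALE_NETDEVS | STEP_DESTROY_WQS;
}

/* Steps performed by the patched shutdown path (the three added calls). */
static unsigned int shutdown_steps(void)
{
	return STEP_CANCEL_VC_EVENT | STEP_VC_CORE_DEINIT | STEP_DEINIT_MBX;
}

/* Steps that remove performs but the patched shutdown skips. */
static unsigned int skipped_on_shutdown(void)
{
	/* In this model, shutdown must be a subset of remove. */
	assert((shutdown_steps() & ~remove_steps()) == 0);
	return remove_steps() & ~shutdown_steps();
}
```

In this model the skipped set is exactly the SR-IOV disable, the stale
netdev cleanup and the WQ destruction, which matches the list given in
the reply above.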

> 
> ...



* Re: [PATCH iwl-net] idpf: fix adapter NULL pointer dereference on reboot
  2025-03-07  5:58 ` Michal Swiatkowski
@ 2025-03-11  4:52   ` Tantilov, Emil S
  0 siblings, 0 replies; 5+ messages in thread
From: Tantilov, Emil S @ 2025-03-11  4:52 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: intel-wired-lan, netdev, decot, willemb, anthony.l.nguyen, davem,
	edumazet, kuba, pabeni, madhu.chittim, Aleksandr.Loktionov, yuma,
	mschmidt, Simon Horman



On 3/6/2025 9:58 PM, Michal Swiatkowski wrote:
> On Thu, Mar 06, 2025 at 04:39:56PM -0800, Emil Tantilov wrote:
>> Driver calls idpf_remove() from idpf_shutdown(), which can end up
>> calling idpf_remove() again when disabling SRIOV.
>>
> 
> The same is done in other drivers (ice, iavf). Why is it a problem here?
> I am asking because having one function to remove is pretty handy.
> Maybe the problem can be fixed by some changes in idpf_remove() instead?

It was indeed handy, until we ran into the crash. I did look into fixing
it in idpf_remove(), but I don't think I have a lot of options. I can
simply check and exit on adapter being NULL, but these types of checks
are usually frowned upon, so I looked into alternatives.
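
For reference, the rejected alternative (an early exit when the adapter
is NULL) would look roughly like the user-space sketch below. The
sketch_* types and names are hypothetical stand-ins; `drvdata` plays the
role of what pci_get_drvdata() returns, and the real idpf_remove() does
far more than free one allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct idpf_adapter and struct pci_dev. */
struct sketch_adapter {
	int unused;
};

struct sketch_pdev {
	struct sketch_adapter *drvdata; /* models pci_get_drvdata(pdev) */
};

/*
 * Guarded remove: returns 1 if teardown ran, 0 if the device had
 * already been torn down and the call was skipped.
 */
static int sketch_remove_guarded(struct sketch_pdev *pdev)
{
	struct sketch_adapter *adapter;

	assert(pdev != NULL);
	adapter = pdev->drvdata;

	/* The NULL check discussed above: bail out on a second call. */
	if (!adapter)
		return 0;

	free(adapter);
	pdev->drvdata = NULL;
	return 1;
}
```

The guard makes a second call harmless, but it hides the fact that
remove is being entered twice, which is presumably why this kind of
check is discouraged.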

The main difference between idpf and ice is that idpf will load on both
VF and PF devices. From what I can tell, the VFs created by ice are
supported by iavf (0x1889 device ID). With VFs created on idpf, we end
up calling into idpf_remove() twice: first on shutdown, and then again
when idpf_remove() calls into sriov_disable(), because the VF devices
have the same driver, hence the same remove routine.
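
The double call described above can be reproduced in a small user-space
model. Everything here is a sketch under assumptions: the model_* names
are hypothetical, the shutdown walk is assumed to reach the VFs before
the PF, and where the real idpf_remove() would dereference a NULL
adapter the model returns -1 instead of crashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_VFS 4

struct model_adapter {
	int quiesced;
};

struct model_pdev {
	struct model_adapter *drvdata;         /* models pci_get_drvdata() */
	struct model_pdev *vfs[MODEL_MAX_VFS]; /* VFs created by this PF */
	int num_vfs;
};

static void model_probe(struct model_pdev *pdev)
{
	assert(pdev != NULL);
	pdev->drvdata = calloc(1, sizeof(*pdev->drvdata));
	assert(pdev->drvdata != NULL);
}

/* Same routine for PF and VF, as both bind to the same driver. */
static int model_remove(struct model_pdev *pdev)
{
	struct model_adapter *adapter = pdev->drvdata;
	int err = 0;
	int i;

	if (!adapter)
		return -1; /* the real driver hits the NULL dereference here */

	/* Disabling SR-IOV unbinds every VF, which re-enters remove. */
	for (i = 0; i < pdev->num_vfs; i++)
		if (model_remove(pdev->vfs[i]))
			err = -1;
	pdev->num_vfs = 0;

	free(adapter);
	pdev->drvdata = NULL;
	return err;
}

/* Old behaviour: shutdown is a full remove. */
static int model_shutdown_old(struct model_pdev *pdev)
{
	return model_remove(pdev);
}

/* Patched behaviour: quiesce only, no SR-IOV disable, adapter kept. */
static int model_shutdown_new(struct model_pdev *pdev)
{
	if (!pdev->drvdata)
		return -1;
	pdev->drvdata->quiesced = 1;
	return 0;
}

/* Assumed shutdown order: VFs first, then the PF that created them. */
static int model_reboot(struct model_pdev *pf,
			int (*shutdown)(struct model_pdev *))
{
	int err = 0;
	int i;

	for (i = pf->num_vfs - 1; i >= 0; i--)
		if (shutdown(pf->vfs[i]))
			err = -1;
	if (shutdown(pf))
		err = -1;
	return err;
}
```

With one VF attached, model_reboot() with model_shutdown_old reports the
failure, because the VF is removed once by its own shutdown and again
when the PF disables SR-IOV. With model_shutdown_new each device is
visited exactly once.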

> 
>> echo 1 > /sys/class/net/<netif>/device/sriov_numvfs
>> reboot
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000020
>> ...
>> RIP: 0010:idpf_remove+0x22/0x1f0 [idpf]
>> ...
>> ? idpf_remove+0x22/0x1f0 [idpf]
>> ? idpf_remove+0x1e4/0x1f0 [idpf]
>> pci_device_remove+0x3f/0xb0
>> device_release_driver_internal+0x19f/0x200
>> pci_stop_bus_device+0x6d/0x90
>> pci_stop_and_remove_bus_device+0x12/0x20
>> pci_iov_remove_virtfn+0xbe/0x120
>> sriov_disable+0x34/0xe0
>> idpf_sriov_configure+0x58/0x140 [idpf]
>> idpf_remove+0x1b9/0x1f0 [idpf]
>> idpf_shutdown+0x12/0x30 [idpf]
>> pci_device_shutdown+0x35/0x60
>> device_shutdown+0x156/0x200
>> ...
>>
>> Replace the direct idpf_remove() call in idpf_shutdown() with
>> idpf_vc_core_deinit() and idpf_deinit_dflt_mbx(), which perform
>> the bulk of the cleanup, such as stopping the init task, freeing IRQs,
>> destroying the vports and freeing the mailbox.
>>
>> Reported-by: Yuying Ma <yuma@redhat.com>
>> Fixes: e850efed5e15 ("idpf: add module register and probe functionality")
>> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
>> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
>> ---
>>   drivers/net/ethernet/intel/idpf/idpf_main.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
>> index b6c515d14cbf..bec4a02c5373 100644
>> --- a/drivers/net/ethernet/intel/idpf/idpf_main.c
>> +++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
>> @@ -87,7 +87,11 @@ static void idpf_remove(struct pci_dev *pdev)
>>    */
>>   static void idpf_shutdown(struct pci_dev *pdev)
>>   {
>> -	idpf_remove(pdev);
>> +	struct idpf_adapter *adapter = pci_get_drvdata(pdev);
>> +
>> +	cancel_delayed_work_sync(&adapter->vc_event_task);
>> +	idpf_vc_core_deinit(adapter);
>> +	idpf_deinit_dflt_mbx(adapter);
>>   
>>   	if (system_state == SYSTEM_POWER_OFF)
>>   		pci_set_power_state(pdev, PCI_D3hot);
>> -- 
>> 2.17.2

