linux-rdma.vger.kernel.org archive mirror
* [PATCH net 1/6] ice: avoid bonding causing auxiliary plug/unplug under RTNL lock
       [not found] <20230131213703.1347761-1-anthony.l.nguyen@intel.com>
@ 2023-01-31 21:36 ` Tony Nguyen
  2023-02-01  9:49   ` Leon Romanovsky
  2023-01-31 21:36 ` [PATCH net 2/6] ice: Do not use WQ_MEM_RECLAIM flag for workqueue Tony Nguyen
  1 sibling, 1 reply; 7+ messages in thread
From: Tony Nguyen @ 2023-01-31 21:36 UTC
  To: davem, kuba, pabeni, edumazet
  Cc: Dave Ertman, netdev, anthony.l.nguyen, poros, ivecera,
	shiraz.saleem, mustafa.ismail, jgg, leonro, linux-rdma,
	Jaroslav Pulchart, Michal Swiatkowski, Gurucharan G

From: Dave Ertman <david.m.ertman@intel.com>

RDMA is not supported in ice on a PF that has been added to a bonded
interface. To enforce this, when an interface enters a bond, we unplug
the auxiliary device that supports RDMA functionality.  This unplug
currently happens in the context of handling the netdev bonding event.
This event is sent to the ice driver under RTNL context.  This is causing
a deadlock where the RDMA driver is waiting for the RTNL lock to complete
the removal.

Defer the unplugging/re-plugging of the auxiliary device to the service
task so that it is not performed under the RTNL lock context.

Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb-0487c95e395d@leemhuis.info/
Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave")
Fixes: 4eace75e0853 ("RDMA/irdma: Report the correct link speed")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h      | 14 +++++---------
 drivers/net/ethernet/intel/ice/ice_main.c | 17 +++++++----------
 2 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 713069f809ec..3cad5e6b2ad1 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -506,6 +506,7 @@ enum ice_pf_flags {
 	ICE_FLAG_VF_VLAN_PRUNING,
 	ICE_FLAG_LINK_LENIENT_MODE_ENA,
 	ICE_FLAG_PLUG_AUX_DEV,
+	ICE_FLAG_UNPLUG_AUX_DEV,
 	ICE_FLAG_MTU_CHANGED,
 	ICE_FLAG_GNSS,			/* GNSS successfully initialized */
 	ICE_PF_FLAGS_NBITS		/* must be last */
@@ -950,16 +951,11 @@ static inline void ice_set_rdma_cap(struct ice_pf *pf)
  */
 static inline void ice_clear_rdma_cap(struct ice_pf *pf)
 {
-	/* We can directly unplug aux device here only if the flag bit
-	 * ICE_FLAG_PLUG_AUX_DEV is not set because ice_unplug_aux_dev()
-	 * could race with ice_plug_aux_dev() called from
-	 * ice_service_task(). In this case we only clear that bit now and
-	 * aux device will be unplugged later once ice_plug_aux_device()
-	 * called from ice_service_task() finishes (see ice_service_task()).
+	/* defer unplug to service task to avoid RTNL lock and
+	 * clear PLUG bit so that pending plugs don't interfere
 	 */
-	if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
-		ice_unplug_aux_dev(pf);
-
+	clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags);
+	set_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags);
 	clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
 }
 #endif /* _ICE_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 5f86e4111fa9..055494dbcce0 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2290,18 +2290,15 @@ static void ice_service_task(struct work_struct *work)
 		}
 	}
 
-	if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) {
-		/* Plug aux device per request */
+	/* Plug aux device per request */
+	if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
 		ice_plug_aux_dev(pf);
 
-		/* Mark plugging as done but check whether unplug was
-		 * requested during ice_plug_aux_dev() call
-		 * (e.g. from ice_clear_rdma_cap()) and if so then
-		 * plug aux device.
-		 */
-		if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
-			ice_unplug_aux_dev(pf);
-	}
+	/* unplug aux dev per request, if an unplug request came in
+	 * while processing a plug request, this will handle it
+	 */
+	if (test_and_clear_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags))
+		ice_unplug_aux_dev(pf);
 
 	if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) {
 		struct iidc_event *event;
-- 
2.38.1
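
The deferral described in the commit message can be illustrated outside the kernel. The following is only a user-space sketch under stated assumptions: the `sketch_*` names and flag values are hypothetical, C11 atomics stand in for the kernel's `set_bit()`, `clear_bit()` and `test_and_clear_bit()`, and a boolean stands in for the auxiliary device. It is not the driver's code; it only shows that the event path records a request and the service task acts on it later.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request bits for this sketch; not the driver's enum values. */
#define SKETCH_PLUG_AUX_DEV   0x1u
#define SKETCH_UNPLUG_AUX_DEV 0x2u

struct sketch_pf {
	atomic_uint flags;  /* pending requests, written from any context */
	bool aux_plugged;   /* device state, touched only by the service task */
};

static void sketch_pf_init(struct sketch_pf *pf)
{
	atomic_init(&pf->flags, 0u);
	pf->aux_plugged = false;
}

/* User-space stand-in for test_and_clear_bit(): atomically clear the bit
 * and report whether it was set beforehand.
 */
static bool sketch_test_and_clear(atomic_uint *flags, unsigned int bit)
{
	return (atomic_fetch_and(flags, ~bit) & bit) != 0;
}

/* Event path (would run under the lock): record a plug request only. */
static void sketch_request_plug(struct sketch_pf *pf)
{
	atomic_fetch_or(&pf->flags, SKETCH_PLUG_AUX_DEV);
}

/* Event path: cancel any pending plug and record an unplug request. */
static void sketch_clear_rdma_cap(struct sketch_pf *pf)
{
	atomic_fetch_and(&pf->flags, ~SKETCH_PLUG_AUX_DEV);
	atomic_fetch_or(&pf->flags, SKETCH_UNPLUG_AUX_DEV);
}

/* Deferred path (runs later, without the lock): act on the requests in
 * the same order as the posted patch, plug first and unplug second.
 */
static void sketch_service_task(struct sketch_pf *pf)
{
	if (sketch_test_and_clear(&pf->flags, SKETCH_PLUG_AUX_DEV))
		pf->aux_plugged = true;
	if (sketch_test_and_clear(&pf->flags, SKETCH_UNPLUG_AUX_DEV))
		pf->aux_plugged = false;
}

/* Walk through one plug followed by one deferred unplug. */
void sketch_deferral_demo(void)
{
	struct sketch_pf pf;

	sketch_pf_init(&pf);
	sketch_request_plug(&pf);
	sketch_service_task(&pf);
	assert(pf.aux_plugged);

	sketch_clear_rdma_cap(&pf);
	assert(pf.aux_plugged);      /* nothing happens in the event path */
	sketch_service_task(&pf);
	assert(!pf.aux_plugged);     /* the unplug ran in the deferred path */
}
```

Calling `sketch_deferral_demo()` from a test harness exercises the sequence. The point of the design is that the event path never performs the removal itself, so it cannot block on work that needs the lock it already holds.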



* [PATCH net 2/6] ice: Do not use WQ_MEM_RECLAIM flag for workqueue
       [not found] <20230131213703.1347761-1-anthony.l.nguyen@intel.com>
  2023-01-31 21:36 ` [PATCH net 1/6] ice: avoid bonding causing auxiliary plug/unplug under RTNL lock Tony Nguyen
@ 2023-01-31 21:36 ` Tony Nguyen
  2023-02-01  9:51   ` Leon Romanovsky
  1 sibling, 1 reply; 7+ messages in thread
From: Tony Nguyen @ 2023-01-31 21:36 UTC
  To: davem, kuba, pabeni, edumazet
  Cc: Anirudh Venkataramanan, netdev, anthony.l.nguyen, shiraz.saleem,
	mustafa.ismail, jgg, leonro, linux-rdma, Marcin Szycik,
	Jakub Andrysiak

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

When both ice and the irdma driver are loaded, a warning in
check_flush_dependency is being triggered. This is due to ice driver
workqueue being allocated with the WQ_MEM_RECLAIM flag and the irdma one
is not.

According to kernel documentation, this flag should be set if the
workqueue will be involved in the kernel's memory reclamation flow.
Since it is not, there is no need for the ice driver's WQ to have this
flag set so remove it.

Example trace:

[  +0.000004] workqueue: WQ_MEM_RECLAIM ice:ice_service_task [ice] is flushing !WQ_MEM_RECLAIM infiniband:0x0
[  +0.000139] WARNING: CPU: 0 PID: 728 at kernel/workqueue.c:2632 check_flush_dependency+0x178/0x1a0
[  +0.000011] Modules linked in: bonding tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_cha
in_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill vfat fat intel_rapl_msr intel
_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct1
0dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_
core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support ipmi_ssif irdma mei_me ib_uverbs
ib_core intel_uncore joydev pcspkr i2c_i801 acpi_ipmi mei lpc_ich i2c_smbus intel_pch_thermal ioatdma ipmi_si acpi_power_meter
acpi_pad xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ahci ixgbe libahci ice i40e igb crc32c_intel mdio i2c_algo_bit liba
ta dca wmi dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse
[  +0.000161]  [last unloaded: bonding]
[  +0.000006] CPU: 0 PID: 728 Comm: kworker/0:2 Tainted: G S                 6.2.0-rc2_next-queue-13jan-00458-gc20aabd57164 #1
[  +0.000006] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
[  +0.000003] Workqueue: ice ice_service_task [ice]
[  +0.000127] RIP: 0010:check_flush_dependency+0x178/0x1a0
[  +0.000005] Code: 89 8e 02 01 e8 49 3d 40 00 49 8b 55 18 48 8d 8d d0 00 00 00 48 8d b3 d0 00 00 00 4d 89 e0 48 c7 c7 e0 3b 08
9f e8 bb d3 07 01 <0f> 0b e9 be fe ff ff 80 3d 24 89 8e 02 00 0f 85 6b ff ff ff e9 06
[  +0.000004] RSP: 0018:ffff88810a39f990 EFLAGS: 00010282
[  +0.000005] RAX: 0000000000000000 RBX: ffff888141bc2400 RCX: 0000000000000000
[  +0.000004] RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffa1213a80
[  +0.000003] RBP: ffff888194bf3400 R08: ffffed117b306112 R09: ffffed117b306112
[  +0.000003] R10: ffff888bd983088b R11: ffffed117b306111 R12: 0000000000000000
[  +0.000003] R13: ffff888111f84d00 R14: ffff88810a3943ac R15: ffff888194bf3400
[  +0.000004] FS:  0000000000000000(0000) GS:ffff888bd9800000(0000) knlGS:0000000000000000
[  +0.000003] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000003] CR2: 000056035b208b60 CR3: 000000017795e005 CR4: 00000000007706f0
[  +0.000003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  +0.000003] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  +0.000002] PKRU: 55555554
[  +0.000003] Call Trace:
[  +0.000002]  <TASK>
[  +0.000003]  __flush_workqueue+0x203/0x840
[  +0.000006]  ? mutex_unlock+0x84/0xd0
[  +0.000008]  ? __pfx_mutex_unlock+0x10/0x10
[  +0.000004]  ? __pfx___flush_workqueue+0x10/0x10
[  +0.000006]  ? mutex_lock+0xa3/0xf0
[  +0.000005]  ib_cache_cleanup_one+0x39/0x190 [ib_core]
[  +0.000174]  __ib_unregister_device+0x84/0xf0 [ib_core]
[  +0.000094]  ib_unregister_device+0x25/0x30 [ib_core]
[  +0.000093]  irdma_ib_unregister_device+0x97/0xc0 [irdma]
[  +0.000064]  ? __pfx_irdma_ib_unregister_device+0x10/0x10 [irdma]
[  +0.000059]  ? up_write+0x5c/0x90
[  +0.000005]  irdma_remove+0x36/0x90 [irdma]
[  +0.000062]  auxiliary_bus_remove+0x32/0x50
[  +0.000007]  device_release_driver_internal+0xfa/0x1c0
[  +0.000005]  bus_remove_device+0x18a/0x260
[  +0.000007]  device_del+0x2e5/0x650
[  +0.000005]  ? __pfx_device_del+0x10/0x10
[  +0.000003]  ? mutex_unlock+0x84/0xd0
[  +0.000004]  ? __pfx_mutex_unlock+0x10/0x10
[  +0.000004]  ? _raw_spin_unlock+0x18/0x40
[  +0.000005]  ice_unplug_aux_dev+0x52/0x70 [ice]
[  +0.000160]  ice_service_task+0x1309/0x14f0 [ice]
[  +0.000134]  ? __pfx___schedule+0x10/0x10
[  +0.000006]  process_one_work+0x3b1/0x6c0
[  +0.000008]  worker_thread+0x69/0x670
[  +0.000005]  ? __kthread_parkme+0xec/0x110
[  +0.000007]  ? __pfx_worker_thread+0x10/0x10
[  +0.000005]  kthread+0x17f/0x1b0
[  +0.000005]  ? __pfx_kthread+0x10/0x10
[  +0.000004]  ret_from_fork+0x29/0x50
[  +0.000009]  </TASK>

Fixes: 940b61af02f4 ("ice: Initialize PF and setup miscellaneous interrupt")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Jakub Andrysiak <jakub.andrysiak@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 055494dbcce0..8b81e661a0c9 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5538,7 +5538,7 @@ static int __init ice_module_init(void)
 	pr_info("%s\n", ice_driver_string);
 	pr_info("%s\n", ice_copyright);
 
-	ice_wq = alloc_workqueue("%s", WQ_MEM_RECLAIM, 0, KBUILD_MODNAME);
+	ice_wq = alloc_workqueue("%s", 0, 0, KBUILD_MODNAME);
 	if (!ice_wq) {
 		pr_err("Failed to create workqueue\n");
 		return -ENOMEM;
-- 
2.38.1
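
The warning in the trace comes down to one rule: work running on a WQ_MEM_RECLAIM workqueue should not flush a workqueue that lacks the flag. The following is a simplified user-space model of that rule, not the kernel's check_flush_dependency(); the `sketch_*` names and the flag value are hypothetical, and the real check considers more conditions than shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value for this sketch; the kernel defines its own. */
#define SKETCH_WQ_MEM_RECLAIM 0x1u

struct sketch_wq {
	const char *name;
	unsigned int flags;
};

/* Returns true when the flush would trip the dependency warning: the
 * flushing work runs on a reclaim-capable queue but the target queue
 * is not reclaim-capable.
 */
static bool sketch_flush_would_warn(const struct sketch_wq *flusher,
				    const struct sketch_wq *target)
{
	if (target->flags & SKETCH_WQ_MEM_RECLAIM)
		return false;
	return (flusher->flags & SKETCH_WQ_MEM_RECLAIM) != 0;
}

/* Compare the ice workqueue before and after the one-line change. */
void sketch_wq_demo(void)
{
	struct sketch_wq infiniband = { "infiniband", 0u };
	struct sketch_wq ice_before = { "ice", SKETCH_WQ_MEM_RECLAIM };
	struct sketch_wq ice_after  = { "ice", 0u };

	assert(sketch_flush_would_warn(&ice_before, &infiniband));
	assert(!sketch_flush_would_warn(&ice_after, &infiniband));
}
```

In this model, dropping the flag from the flushing queue is enough to satisfy the rule, which matches the approach taken by the patch.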



* Re: [PATCH net 1/6] ice: avoid bonding causing auxiliary plug/unplug under RTNL lock
  2023-01-31 21:36 ` [PATCH net 1/6] ice: avoid bonding causing auxiliary plug/unplug under RTNL lock Tony Nguyen
@ 2023-02-01  9:49   ` Leon Romanovsky
  2023-02-06 23:12     ` Tony Nguyen
  2023-02-14 22:24     ` Ertman, David M
  0 siblings, 2 replies; 7+ messages in thread
From: Leon Romanovsky @ 2023-02-01  9:49 UTC
  To: Tony Nguyen
  Cc: davem, kuba, pabeni, edumazet, Dave Ertman, netdev, poros,
	ivecera, shiraz.saleem, mustafa.ismail, jgg, linux-rdma,
	Jaroslav Pulchart, Michal Swiatkowski, Gurucharan G

On Tue, Jan 31, 2023 at 01:36:58PM -0800, Tony Nguyen wrote:
> From: Dave Ertman <david.m.ertman@intel.com>
> 
> RDMA is not supported in ice on a PF that has been added to a bonded
> interface. To enforce this, when an interface enters a bond, we unplug
> the auxiliary device that supports RDMA functionality.  This unplug
> currently happens in the context of handling the netdev bonding event.
> This event is sent to the ice driver under RTNL context.  This is causing
> a deadlock where the RDMA driver is waiting for the RTNL lock to complete
> the removal.
> 
> Defer the unplugging/re-plugging of the auxiliary device to the service
> task so that it is not performed under the RTNL lock context.
> 
> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb-0487c95e395d@leemhuis.info/
> Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave")
> Fixes: 4eace75e0853 ("RDMA/irdma: Report the correct link speed")
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice.h      | 14 +++++---------
>  drivers/net/ethernet/intel/ice/ice_main.c | 17 +++++++----------
>  2 files changed, 12 insertions(+), 19 deletions(-)

<...>

> index 5f86e4111fa9..055494dbcce0 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -2290,18 +2290,15 @@ static void ice_service_task(struct work_struct *work)
>  		}
>  	}
>  
> -	if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) {
> -		/* Plug aux device per request */
> +	/* Plug aux device per request */
> +	if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))

Very interesting pattern. You are not holding any locks while running
ice_service_task() and clear bits before you actually performed requested
operation.

How do you protect from races while testing bits in other places of ice
driver?

Thanks

>  		ice_plug_aux_dev(pf);
>  
> -		/* Mark plugging as done but check whether unplug was
> -		 * requested during ice_plug_aux_dev() call
> -		 * (e.g. from ice_clear_rdma_cap()) and if so then
> -		 * plug aux device.
> -		 */
> -		if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
> -			ice_unplug_aux_dev(pf);
> -	}
> +	/* unplug aux dev per request, if an unplug request came in
> +	 * while processing a plug request, this will handle it
> +	 */
> +	if (test_and_clear_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags))
> +		ice_unplug_aux_dev(pf);
>  
>  	if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) {
>  		struct iidc_event *event;
> -- 
> 2.38.1
> 


* Re: [PATCH net 2/6] ice: Do not use WQ_MEM_RECLAIM flag for workqueue
  2023-01-31 21:36 ` [PATCH net 2/6] ice: Do not use WQ_MEM_RECLAIM flag for workqueue Tony Nguyen
@ 2023-02-01  9:51   ` Leon Romanovsky
  0 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2023-02-01  9:51 UTC
  To: Tony Nguyen
  Cc: davem, kuba, pabeni, edumazet, Anirudh Venkataramanan, netdev,
	shiraz.saleem, mustafa.ismail, jgg, linux-rdma, Marcin Szycik,
	Jakub Andrysiak

On Tue, Jan 31, 2023 at 01:36:59PM -0800, Tony Nguyen wrote:
> From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
> 
> When both ice and the irdma driver are loaded, a warning in
> check_flush_dependency is being triggered. This is due to ice driver
> workqueue being allocated with the WQ_MEM_RECLAIM flag and the irdma one
> is not.
> 
> According to kernel documentation, this flag should be set if the
> workqueue will be involved in the kernel's memory reclamation flow.
> Since it is not, there is no need for the ice driver's WQ to have this
> flag set so remove it.
> 
> Example trace:
> 
> [  +0.000004] workqueue: WQ_MEM_RECLAIM ice:ice_service_task [ice] is flushing !WQ_MEM_RECLAIM infiniband:0x0
> [  +0.000139] WARNING: CPU: 0 PID: 728 at kernel/workqueue.c:2632 check_flush_dependency+0x178/0x1a0
> [  +0.000011] Modules linked in: bonding tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_cha
> in_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill vfat fat intel_rapl_msr intel
> _rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct1
> 0dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_
> core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support ipmi_ssif irdma mei_me ib_uverbs
> ib_core intel_uncore joydev pcspkr i2c_i801 acpi_ipmi mei lpc_ich i2c_smbus intel_pch_thermal ioatdma ipmi_si acpi_power_meter
> acpi_pad xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ahci ixgbe libahci ice i40e igb crc32c_intel mdio i2c_algo_bit liba
> ta dca wmi dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse
> [  +0.000161]  [last unloaded: bonding]
> [  +0.000006] CPU: 0 PID: 728 Comm: kworker/0:2 Tainted: G S                 6.2.0-rc2_next-queue-13jan-00458-gc20aabd57164 #1
> [  +0.000006] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
> [  +0.000003] Workqueue: ice ice_service_task [ice]
> [  +0.000127] RIP: 0010:check_flush_dependency+0x178/0x1a0
> [  +0.000005] Code: 89 8e 02 01 e8 49 3d 40 00 49 8b 55 18 48 8d 8d d0 00 00 00 48 8d b3 d0 00 00 00 4d 89 e0 48 c7 c7 e0 3b 08
> 9f e8 bb d3 07 01 <0f> 0b e9 be fe ff ff 80 3d 24 89 8e 02 00 0f 85 6b ff ff ff e9 06
> [  +0.000004] RSP: 0018:ffff88810a39f990 EFLAGS: 00010282
> [  +0.000005] RAX: 0000000000000000 RBX: ffff888141bc2400 RCX: 0000000000000000
> [  +0.000004] RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffa1213a80
> [  +0.000003] RBP: ffff888194bf3400 R08: ffffed117b306112 R09: ffffed117b306112
> [  +0.000003] R10: ffff888bd983088b R11: ffffed117b306111 R12: 0000000000000000
> [  +0.000003] R13: ffff888111f84d00 R14: ffff88810a3943ac R15: ffff888194bf3400
> [  +0.000004] FS:  0000000000000000(0000) GS:ffff888bd9800000(0000) knlGS:0000000000000000
> [  +0.000003] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  +0.000003] CR2: 000056035b208b60 CR3: 000000017795e005 CR4: 00000000007706f0
> [  +0.000003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  +0.000003] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  +0.000002] PKRU: 55555554
> [  +0.000003] Call Trace:
> [  +0.000002]  <TASK>
> [  +0.000003]  __flush_workqueue+0x203/0x840
> [  +0.000006]  ? mutex_unlock+0x84/0xd0
> [  +0.000008]  ? __pfx_mutex_unlock+0x10/0x10
> [  +0.000004]  ? __pfx___flush_workqueue+0x10/0x10
> [  +0.000006]  ? mutex_lock+0xa3/0xf0
> [  +0.000005]  ib_cache_cleanup_one+0x39/0x190 [ib_core]
> [  +0.000174]  __ib_unregister_device+0x84/0xf0 [ib_core]
> [  +0.000094]  ib_unregister_device+0x25/0x30 [ib_core]
> [  +0.000093]  irdma_ib_unregister_device+0x97/0xc0 [irdma]
> [  +0.000064]  ? __pfx_irdma_ib_unregister_device+0x10/0x10 [irdma]
> [  +0.000059]  ? up_write+0x5c/0x90
> [  +0.000005]  irdma_remove+0x36/0x90 [irdma]
> [  +0.000062]  auxiliary_bus_remove+0x32/0x50
> [  +0.000007]  device_release_driver_internal+0xfa/0x1c0
> [  +0.000005]  bus_remove_device+0x18a/0x260
> [  +0.000007]  device_del+0x2e5/0x650
> [  +0.000005]  ? __pfx_device_del+0x10/0x10
> [  +0.000003]  ? mutex_unlock+0x84/0xd0
> [  +0.000004]  ? __pfx_mutex_unlock+0x10/0x10
> [  +0.000004]  ? _raw_spin_unlock+0x18/0x40
> [  +0.000005]  ice_unplug_aux_dev+0x52/0x70 [ice]
> [  +0.000160]  ice_service_task+0x1309/0x14f0 [ice]
> [  +0.000134]  ? __pfx___schedule+0x10/0x10
> [  +0.000006]  process_one_work+0x3b1/0x6c0
> [  +0.000008]  worker_thread+0x69/0x670
> [  +0.000005]  ? __kthread_parkme+0xec/0x110
> [  +0.000007]  ? __pfx_worker_thread+0x10/0x10
> [  +0.000005]  kthread+0x17f/0x1b0
> [  +0.000005]  ? __pfx_kthread+0x10/0x10
> [  +0.000004]  ret_from_fork+0x29/0x50
> [  +0.000009]  </TASK>
> 
> Fixes: 940b61af02f4 ("ice: Initialize PF and setup miscellaneous interrupt")
> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
> Tested-by: Jakub Andrysiak <jakub.andrysiak@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Thanks,
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>


* Re: [PATCH net 1/6] ice: avoid bonding causing auxiliary plug/unplug under RTNL lock
  2023-02-01  9:49   ` Leon Romanovsky
@ 2023-02-06 23:12     ` Tony Nguyen
  2023-02-14 22:24     ` Ertman, David M
  1 sibling, 0 replies; 7+ messages in thread
From: Tony Nguyen @ 2023-02-06 23:12 UTC
  To: Leon Romanovsky
  Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	edumazet@google.com, Ertman, David M, netdev@vger.kernel.org,
	poros, ivecera, Saleem, Shiraz, Ismail, Mustafa, jgg@nvidia.com,
	linux-rdma@vger.kernel.org, Jaroslav Pulchart, Michal Swiatkowski,
	G, GurucharanX



On 2/1/2023 1:49 AM, Leon Romanovsky wrote:
> On Tue, Jan 31, 2023 at 01:36:58PM -0800, Tony Nguyen wrote:
>> From: Dave Ertman <david.m.ertman@intel.com>
>>
>> RDMA is not supported in ice on a PF that has been added to a bonded
>> interface. To enforce this, when an interface enters a bond, we unplug
>> the auxiliary device that supports RDMA functionality.  This unplug
>> currently happens in the context of handling the netdev bonding event.
>> This event is sent to the ice driver under RTNL context.  This is causing
>> a deadlock where the RDMA driver is waiting for the RTNL lock to complete
>> the removal.
>>
>> Defer the unplugging/re-plugging of the auxiliary device to the service
>> task so that it is not performed under the RTNL lock context.
>>
>> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
>> Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb-0487c95e395d@leemhuis.info/
>> Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave")
>> Fixes: 4eace75e0853 ("RDMA/irdma: Report the correct link speed")
>> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
>> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
>> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
>> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
>> ---
>>   drivers/net/ethernet/intel/ice/ice.h      | 14 +++++---------
>>   drivers/net/ethernet/intel/ice/ice_main.c | 17 +++++++----------
>>   2 files changed, 12 insertions(+), 19 deletions(-)
> 
> <...>
> 
>> index 5f86e4111fa9..055494dbcce0 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_main.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
>> @@ -2290,18 +2290,15 @@ static void ice_service_task(struct work_struct *work)
>>   		}
>>   	}
>>   
>> -	if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) {
>> -		/* Plug aux device per request */
>> +	/* Plug aux device per request */
>> +	if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
> 
> Very interesting pattern. You are not holding any locks while running
> ice_service_task() and clear bits before you actually performed requested
> operation.
> 
> How do you protect from races while testing bits in other places of ice
> driver?

I haven't heard from Dave so I'm going to drop this from the series so 
that the other patches can move on.

Thanks,
Tony


* RE: [PATCH net 1/6] ice: avoid bonding causing auxiliary plug/unplug under RTNL lock
  2023-02-01  9:49   ` Leon Romanovsky
  2023-02-06 23:12     ` Tony Nguyen
@ 2023-02-14 22:24     ` Ertman, David M
  2023-02-15 12:13       ` Leon Romanovsky
  1 sibling, 1 reply; 7+ messages in thread
From: Ertman, David M @ 2023-02-14 22:24 UTC
  To: Leon Romanovsky, Nguyen, Anthony L
  Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	edumazet@google.com, netdev@vger.kernel.org, poros, ivecera,
	Saleem, Shiraz, Ismail, Mustafa, jgg@nvidia.com,
	linux-rdma@vger.kernel.org, Jaroslav Pulchart, Michal Swiatkowski,
	G, GurucharanX

> -----Original Message-----
> From: Leon Romanovsky <leonro@nvidia.com>
> Sent: Wednesday, February 1, 2023 1:50 AM
> Subject: Re: [PATCH net 1/6] ice: avoid bonding causing auxiliary plug/unplug under RTNL lock
> 
> On Tue, Jan 31, 2023 at 01:36:58PM -0800, Tony Nguyen wrote:
> > From: Dave Ertman <david.m.ertman@intel.com>
> >
> > RDMA is not supported in ice on a PF that has been added to a bonded
> > interface. To enforce this, when an interface enters a bond, we unplug
> > the auxiliary device that supports RDMA functionality.  This unplug
> > currently happens in the context of handling the netdev bonding event.
> > This event is sent to the ice driver under RTNL context.  This is causing
> > a deadlock where the RDMA driver is waiting for the RTNL lock to complete
> > the removal.
> >
> > Defer the unplugging/re-plugging of the auxiliary device to the service
> > task so that it is not performed under the RTNL lock context.
> >
> > Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> > Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb-0487c95e395d@leemhuis.info/
> > Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave")
> > Fixes: 4eace75e0853 ("RDMA/irdma: Report the correct link speed")
> > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> > Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice.h      | 14 +++++---------
> >  drivers/net/ethernet/intel/ice/ice_main.c | 17 +++++++----------
> >  2 files changed, 12 insertions(+), 19 deletions(-)
> 
> <...>
> 
> > index 5f86e4111fa9..055494dbcce0 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_main.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> > @@ -2290,18 +2290,15 @@ static void ice_service_task(struct work_struct *work)
> >  		}
> >  	}
> >
> > -	if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) {
> > -		/* Plug aux device per request */
> > +	/* Plug aux device per request */
> > +	if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
> 
> Very interesting pattern. You are not holding any locks while running
> ice_service_task() and clear bits before you actually performed requested
> operation.
> 
> How do you protect from races while testing bits in other places of ice
> driver?

Leon,

Thanks for the review and sorry for the late reply, got sidetracked into another project.

Your review caused us to re-evaluate the plug/unplug flow, and since these bits are only set/cleared in
the bonding event flow, and the UNPLUG bit set clears the PLUG bit, we attain the desired outcome
in all cases if we swap the order that we evaluate the bits in the service task.

Any multi-event situation that happens between or during service task will be handled in the expected way.

DaveE

> 
> Thanks
> 
> >  		ice_plug_aux_dev(pf);
> >
> > -		/* Mark plugging as done but check whether unplug was
> > -		 * requested during ice_plug_aux_dev() call
> > -		 * (e.g. from ice_clear_rdma_cap()) and if so then
> > -		 * plug aux device.
> > -		 */
> > -		if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
> > -			ice_unplug_aux_dev(pf);
> > -	}
> > +	/* unplug aux dev per request, if an unplug request came in
> > +	 * while processing a plug request, this will handle it
> > +	 */
> > +	if (test_and_clear_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags))
> > +		ice_unplug_aux_dev(pf);
> >
> >  	if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) {
> >  		struct iidc_event *event;
> > --
> > 2.38.1
> >


* Re: [PATCH net 1/6] ice: avoid bonding causing auxiliary plug/unplug under RTNL lock
  2023-02-14 22:24     ` Ertman, David M
@ 2023-02-15 12:13       ` Leon Romanovsky
  0 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2023-02-15 12:13 UTC
  To: Ertman, David M
  Cc: Nguyen, Anthony L, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com, edumazet@google.com, netdev@vger.kernel.org,
	poros, ivecera, Saleem, Shiraz, Ismail, Mustafa, jgg@nvidia.com,
	linux-rdma@vger.kernel.org, Jaroslav Pulchart, Michal Swiatkowski,
	G, GurucharanX

On Tue, Feb 14, 2023 at 10:24:04PM +0000, Ertman, David M wrote:
> > -----Original Message-----
> > From: Leon Romanovsky <leonro@nvidia.com>
> > Sent: Wednesday, February 1, 2023 1:50 AM
> > Subject: Re: [PATCH net 1/6] ice: avoid bonding causing auxiliary plug/unplug under RTNL lock
> > 
> > On Tue, Jan 31, 2023 at 01:36:58PM -0800, Tony Nguyen wrote:
> > > From: Dave Ertman <david.m.ertman@intel.com>
> > >
> > > RDMA is not supported in ice on a PF that has been added to a bonded
> > > interface. To enforce this, when an interface enters a bond, we unplug
> > > the auxiliary device that supports RDMA functionality.  This unplug
> > > currently happens in the context of handling the netdev bonding event.
> > > This event is sent to the ice driver under RTNL context.  This is causing
> > > a deadlock where the RDMA driver is waiting for the RTNL lock to complete
> > > the removal.
> > >
> > > Defer the unplugging/re-plugging of the auxiliary device to the service
> > > task so that it is not performed under the RTNL lock context.
> > >
> > > Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> > > Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb-0487c95e395d@leemhuis.info/
> > > Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave")
> > > Fixes: 4eace75e0853 ("RDMA/irdma: Report the correct link speed")
> > > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> > > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> > > Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
> > > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> > > ---
> > >  drivers/net/ethernet/intel/ice/ice.h      | 14 +++++---------
> > >  drivers/net/ethernet/intel/ice/ice_main.c | 17 +++++++----------
> > >  2 files changed, 12 insertions(+), 19 deletions(-)
> > 
> > <...>
> > 
> > > index 5f86e4111fa9..055494dbcce0 100644
> > > --- a/drivers/net/ethernet/intel/ice/ice_main.c
> > > +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> > > @@ -2290,18 +2290,15 @@ static void ice_service_task(struct work_struct *work)
> > >  		}
> > >  	}
> > >
> > > -	if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) {
> > > -		/* Plug aux device per request */
> > > +	/* Plug aux device per request */
> > > +	if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
> > 
> > Very interesting pattern. You are not holding any locks while running
> > ice_service_task() and clear bits before you actually performed requested
> > operation.
> > 
> > How do you protect from races while testing bits in other places of ice
> > driver?
> 
> Leon,
> 
> Thanks for the review and sorry for the late reply, got sidetracked into another project.
> 
> Your review caused us to re-evaluate the plug/unplug flow, and since these bits are only set/cleared in
> the bonding event flow, and the UNPLUG bit set clears the PLUG bit, we attain the desired outcome
> in all cases if we swap the order that we evaluate the bits in the service task.

I'm afraid that it won't make the ice state machine more understandable. :)

Thanks

> 
> Any multi-event situation that happens between or during service task will be handled in the expected way.
> 
> DaveE
> 
> > 
> > Thanks
> > 
> > >  		ice_plug_aux_dev(pf);
> > >
> > > -		/* Mark plugging as done but check whether unplug was
> > > -		 * requested during ice_plug_aux_dev() call
> > > -		 * (e.g. from ice_clear_rdma_cap()) and if so then
> > > -		 * plug aux device.
> > > -		 */
> > > -		if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
> > > -			ice_unplug_aux_dev(pf);
> > > -	}
> > > +	/* unplug aux dev per request, if an unplug request came in
> > > +	 * while processing a plug request, this will handle it
> > > +	 */
> > > +	if (test_and_clear_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags))
> > > +		ice_unplug_aux_dev(pf);
> > >
> > >  	if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) {
> > >  		struct iidc_event *event;
> > > --
> > > 2.38.1
> > >


Thread overview: 7+ messages
     [not found] <20230131213703.1347761-1-anthony.l.nguyen@intel.com>
2023-01-31 21:36 ` [PATCH net 1/6] ice: avoid bonding causing auxiliary plug/unplug under RTNL lock Tony Nguyen
2023-02-01  9:49   ` Leon Romanovsky
2023-02-06 23:12     ` Tony Nguyen
2023-02-14 22:24     ` Ertman, David M
2023-02-15 12:13       ` Leon Romanovsky
2023-01-31 21:36 ` [PATCH net 2/6] ice: Do not use WQ_MEM_RECLAIM flag for workqueue Tony Nguyen
2023-02-01  9:51   ` Leon Romanovsky
