* [Intel-wired-lan] [PATCH net 0/2] ice: avoid sleeping/scheduling in atomic contexts
@ 2022-03-23 12:43 Alexander Lobakin
2022-03-23 12:43 ` [Intel-wired-lan] [PATCH net 1/2] ice: fix 'scheduling while atomic' on aux critical err interrupt Alexander Lobakin
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Alexander Lobakin @ 2022-03-23 12:43 UTC (permalink / raw)
To: intel-wired-lan
The `ice_misc_intr() + ice_send_event_to_aux()` infamous pair failed
once again.
Fix yet another (hopefully last one) 'scheduling while atomic' splat
and finally plug the hole to gracefully return prematurely when
invoked in wrong context instead of panicking.
Alexander Lobakin (2):
  ice: fix 'scheduling while atomic' on aux critical err interrupt
  ice: don't allow to run ice_send_event_to_aux() in atomic ctx

 drivers/net/ethernet/intel/ice/ice.h      |  2 ++
 drivers/net/ethernet/intel/ice/ice_idc.c  |  3 +++
 drivers/net/ethernet/intel/ice/ice_main.c | 25 ++++++++++++++---------
 3 files changed, 20 insertions(+), 10 deletions(-)
--
Urgent fix, would like to make it directly through -net.
--
2.35.1

* [Intel-wired-lan] [PATCH net 1/2] ice: fix 'scheduling while atomic' on aux critical err interrupt
@ 2022-03-23 12:43 ` Alexander Lobakin
From: Alexander Lobakin @ 2022-03-23 12:43 UTC (permalink / raw)
To: intel-wired-lan

There's a kernel BUG splat on processing aux critical error
interrupts in ice_misc_intr():

[ 2100.917085] BUG: scheduling while atomic: swapper/15/0/0x00010000
...
[ 2101.060770] Call Trace:
[ 2101.063229]  <IRQ>
[ 2101.065252]  dump_stack+0x41/0x60
[ 2101.068587]  __schedule_bug.cold.100+0x4c/0x58
[ 2101.073060]  __schedule+0x6a4/0x830
[ 2101.076570]  schedule+0x35/0xa0
[ 2101.079727]  schedule_preempt_disabled+0xa/0x10
[ 2101.084284]  __mutex_lock.isra.7+0x310/0x420
[ 2101.088580]  ? ice_misc_intr+0x201/0x2e0 [ice]
[ 2101.093078]  ice_send_event_to_aux+0x25/0x70 [ice]
[ 2101.097921]  ice_misc_intr+0x220/0x2e0 [ice]
[ 2101.102232]  __handle_irq_event_percpu+0x40/0x180
[ 2101.106965]  handle_irq_event_percpu+0x30/0x80
[ 2101.111434]  handle_irq_event+0x36/0x53
[ 2101.115292]  handle_edge_irq+0x82/0x190
[ 2101.119148]  handle_irq+0x1c/0x30
[ 2101.122480]  do_IRQ+0x49/0xd0
[ 2101.125465]  common_interrupt+0xf/0xf
[ 2101.129146]  </IRQ>
...

As Andrew correctly mentioned previously[0], the following call
ladder happens:

ice_misc_intr() <- hardirq
  ice_send_event_to_aux()
    device_lock()
      mutex_lock()
        might_sleep()
          might_resched() <- oops

Add a new PF state bit which indicates that an aux critical error
occurred and serve it in ice_service_task() in process context.
The new ice_pf::oicr_err_reg is read-write in both hardirq and
process contexts, but only 3 bits of non-critical data probably
aren't worth explicit synchronizing (and they're even in the same
byte [31:24]).

[0] https://lore.kernel.org/all/YeSRUVmrdmlUXHDn@lunn.ch

Fixes: 348048e724a0e ("ice: Implement iidc operations")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h      |  2 ++
 drivers/net/ethernet/intel/ice/ice_main.c | 25 ++++++++++++++---------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index bea1d1e39fa2..2ca887076dd4 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -290,6 +290,7 @@ enum ice_pf_state {
 	ICE_LINK_DEFAULT_OVERRIDE_PENDING,
 	ICE_PHY_INIT_COMPLETE,
 	ICE_FD_VF_FLUSH_CTX,		/* set at FD Rx IRQ or timeout */
+	ICE_AUX_ERR_PENDING,
 	ICE_STATE_NBITS		/* must be last */
 };

@@ -559,6 +560,7 @@ struct ice_pf {
 	wait_queue_head_t reset_wait_queue;

 	u32 hw_csum_rx_error;
+	u32 oicr_err_reg;
 	u16 oicr_idx;		/* Other interrupt cause MSIX vector index */
 	u16 num_avail_sw_msix;	/* remaining MSIX SW vectors left unclaimed */
 	u16 max_pf_txqs;	/* Total Tx queues PF wide */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index b7e8744b0c0a..296f9d5f7408 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2255,6 +2255,19 @@ static void ice_service_task(struct work_struct *work)
 		return;
 	}

+	if (test_and_clear_bit(ICE_AUX_ERR_PENDING, pf->state)) {
+		struct iidc_event *event;
+
+		event = kzalloc(sizeof(*event), GFP_KERNEL);
+		if (event) {
+			set_bit(IIDC_EVENT_CRIT_ERR, event->type);
+			/* report the entire OICR value to AUX driver */
+			swap(event->reg, pf->oicr_err_reg);
+			ice_send_event_to_aux(pf, event);
+			kfree(event);
+		}
+	}
+
 	if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) {
 		/* Plug aux device per request */
 		ice_plug_aux_dev(pf);
@@ -3041,17 +3054,9 @@ static irqreturn_t ice_misc_intr(int __always_unused irq, void *data)

 #define ICE_AUX_CRIT_ERR (PFINT_OICR_PE_CRITERR_M | PFINT_OICR_HMC_ERR_M | PFINT_OICR_PE_PUSH_M)
 	if (oicr & ICE_AUX_CRIT_ERR) {
-		struct iidc_event *event;
-
+		pf->oicr_err_reg |= oicr;
+		set_bit(ICE_AUX_ERR_PENDING, pf->state);
 		ena_mask &= ~ICE_AUX_CRIT_ERR;
-		event = kzalloc(sizeof(*event), GFP_ATOMIC);
-		if (event) {
-			set_bit(IIDC_EVENT_CRIT_ERR, event->type);
-			/* report the entire OICR value to AUX driver */
-			event->reg = oicr;
-			ice_send_event_to_aux(pf, event);
-			kfree(event);
-		}
 	}

 	/* Report any remaining unexpected interrupts */
--
2.35.1
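
Editorial note on patch 1/2: the deferral pattern it uses (the interrupt handler only records the cause bits and sets a pending flag, and the service task later takes both and does the sleeping work) can be sketched in userspace C. This is only an illustration under stated assumptions, not the driver code: `struct fake_pf`, `fake_misc_intr()`, `fake_service_task()` and `fake_send_event_to_aux()` are hypothetical names, and C11 atomics are swapped in for the kernel's `set_bit()`/`test_and_clear_bit()` and for the plain `|=` and `swap()` that the patch applies to `oicr_err_reg`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver state; not the real struct ice_pf. */
struct fake_pf {
	atomic_bool aux_err_pending;   /* models the ICE_AUX_ERR_PENDING bit */
	_Atomic uint32_t oicr_err_reg; /* models ice_pf::oicr_err_reg */
	uint32_t last_reported;        /* value the "aux driver" saw last */
	unsigned int reports;          /* number of events delivered */
};

/* Models the call that may sleep; only reached from the service task. */
static void fake_send_event_to_aux(struct fake_pf *pf, uint32_t reg)
{
	assert(pf != NULL);
	pf->last_reported = reg;
	pf->reports++;
}

/* "Hard IRQ" side: accumulate the cause bits and flag the work, no more. */
static void fake_misc_intr(struct fake_pf *pf, uint32_t oicr)
{
	atomic_fetch_or(&pf->oicr_err_reg, oicr);
	atomic_store(&pf->aux_err_pending, true);
}

/* "Process context" side: take-and-clear the flag, then report the bits. */
static bool fake_service_task(struct fake_pf *pf)
{
	if (!atomic_exchange(&pf->aux_err_pending, false))
		return false;

	/* Fetch the accumulated bits and reset them to zero in one step. */
	fake_send_event_to_aux(pf, atomic_exchange(&pf->oicr_err_reg, 0));
	return true;
}
```

In this sketch, two interrupts that arrive before a single service pass are coalesced into one report carrying the OR of both cause values, which mirrors the `|=` accumulation in the patch.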

* [Intel-wired-lan] [PATCH net 2/2] ice: don't allow to run ice_send_event_to_aux() in atomic ctx
@ 2022-03-23 12:43 ` Alexander Lobakin
From: Alexander Lobakin @ 2022-03-23 12:43 UTC (permalink / raw)
To: intel-wired-lan

ice_send_event_to_aux() eventually descends to mutex_lock()
(-> might_sched()), so it must not be called under non-task
context. However, at least two fixes have happened already for the
bug splats occurred due to this function being called from atomic
context.
To make the emergency landings softer, bail out early when executed
in non-task context emitting a warn splat only once. This way we
trade some events being potentially lost for system stability and
avoid any related hangs and crashes.

Fixes: 348048e724a0e ("ice: Implement iidc operations")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_idc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_idc.c b/drivers/net/ethernet/intel/ice/ice_idc.c
index fc3580167e7b..5559230eff8b 100644
--- a/drivers/net/ethernet/intel/ice/ice_idc.c
+++ b/drivers/net/ethernet/intel/ice/ice_idc.c
@@ -34,6 +34,9 @@ void ice_send_event_to_aux(struct ice_pf *pf, struct iidc_event *event)
 {
 	struct iidc_auxiliary_drv *iadrv;

+	if (WARN_ON_ONCE(!in_task()))
+		return;
+
 	if (!pf->adev)
 		return;

--
2.35.1
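
Editorial note on patch 2/2: the guard it adds (warn once, then return early instead of sleeping in the wrong context) can be modelled in userspace C roughly as below. Everything here is a hypothetical stand-in: `fake_in_task` replaces the kernel's `in_task()`, which inspects the preempt counter rather than a variable, and `fake_warn_on_once()` only imitates the documented behaviour of `WARN_ON_ONCE()` of returning its condition while printing at most once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the execution context (true = process context). */
static bool fake_in_task = true;
static unsigned int warnings;  /* how many times the warning fired */
static unsigned int delivered; /* how many events reached the consumer */

/* Warn the first time cond is true; always return cond. */
static bool fake_warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warnings++;
		fprintf(stderr, "WARNING: called from non-task context\n");
	}
	return cond;
}

/* Models the guarded entry point: events from atomic context are dropped. */
static void fake_send_event(int event)
{
	if (fake_warn_on_once(!fake_in_task))
		return; /* lose the event rather than sleep in atomic context */

	(void)event;
	delivered++;
}
```

The trade-off is the one the commit message states: events sent from the wrong context are silently lost after the first warning, in exchange for not hanging or crashing the system.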

* [Intel-wired-lan] [PATCH net 0/2] ice: avoid sleeping/scheduling in atomic contexts
@ 2022-03-23 17:40 ` Jakub Kicinski
From: Jakub Kicinski @ 2022-03-23 17:40 UTC (permalink / raw)
To: intel-wired-lan

On Wed, 23 Mar 2022 13:43:51 +0100 Alexander Lobakin wrote:
> --
> Urgent fix, would like to make it directly through -net.

You may want to use three hyphens, two hyphens mean footer.
Email clients gray those out, it's easy to miss :)

* [Intel-wired-lan] [PATCH net 0/2] ice: avoid sleeping/scheduling in atomic contexts
@ 2022-03-23 17:54 ` Alexander Lobakin
From: Alexander Lobakin @ 2022-03-23 17:54 UTC (permalink / raw)
To: intel-wired-lan

From: Jakub Kicinski <kuba@kernel.org>
Date: Wed, 23 Mar 2022 10:40:05 -0700

> On Wed, 23 Mar 2022 13:43:51 +0100 Alexander Lobakin wrote:
> > --
> > Urgent fix, would like to make it directly through -net.
>
> You may want to use three hyphens, two hyphens mean footer.
> Email clients gray those out, it's easy to miss :)

Good to know, thanks! :)

Al

* [Intel-wired-lan] [PATCH net 0/2] ice: avoid sleeping/scheduling in atomic contexts
@ 2022-03-23 17:50 ` patchwork-bot+netdevbpf
From: patchwork-bot+netdevbpf @ 2022-03-23 17:50 UTC (permalink / raw)
To: intel-wired-lan

Hello:

This series was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 23 Mar 2022 13:43:51 +0100 you wrote:
> The `ice_misc_intr() + ice_send_event_to_aux()` infamous pair failed
> once again.
> Fix yet another (hopefully last one) 'scheduling while atomic' splat
> and finally plug the hole to gracefully return prematurely when
> invoked in wrong context instead of panicking.
>
> Alexander Lobakin (2):
>   ice: fix 'scheduling while atomic' on aux critical err interrupt
>   ice: don't allow to run ice_send_event_to_aux() in atomic ctx
>
> [...]

Here is the summary with links:
  - [net,1/2] ice: fix 'scheduling while atomic' on aux critical err interrupt
    https://git.kernel.org/netdev/net/c/32d53c0aa3a7
  - [net,2/2] ice: don't allow to run ice_send_event_to_aux() in atomic ctx
    https://git.kernel.org/netdev/net/c/5a3156932da0

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html