netdev.vger.kernel.org archive mirror
* [PATCH 0/1] qed: Yet another scheduling while atomic fix
@ 2023-07-26 17:19 Konstantin Khorenko
  2023-07-26 17:19 ` [PATCH 1/1] qed: Fix scheduling in a tasklet while getting stats Konstantin Khorenko
From: Konstantin Khorenko @ 2023-07-26 17:19 UTC
  To: netdev
  Cc: Jakub Kicinski, Manish Chopra, Ariel Elior, David Miller,
	Sudarsana Kalluru, Paolo Abeni, Konstantin Khorenko

Running an old RHEL7-based kernel, we have hit several cases of the
following BUG_ON():

  BUG: scheduling while atomic: swapper/24/0/0x00000100

   [<ffffffffb41c6199>] schedule+0x29/0x70
   [<ffffffffb41c5512>] schedule_hrtimeout_range_clock+0xb2/0x150
   [<ffffffffb41c55c3>] schedule_hrtimeout_range+0x13/0x20
   [<ffffffffb41c3bcf>] usleep_range+0x4f/0x70
   [<ffffffffc08d3e58>] qed_ptt_acquire+0x38/0x100 [qed]
   [<ffffffffc08eac48>] _qed_get_vport_stats+0x458/0x580 [qed]
   [<ffffffffc08ead8c>] qed_get_vport_stats+0x1c/0xd0 [qed]
   [<ffffffffc08dffd3>] qed_get_protocol_stats+0x93/0x100 [qed]
                        qed_mcp_send_protocol_stats
            case MFW_DRV_MSG_GET_LAN_STATS:
            case MFW_DRV_MSG_GET_FCOE_STATS:
            case MFW_DRV_MSG_GET_ISCSI_STATS:
            case MFW_DRV_MSG_GET_RDMA_STATS:
   [<ffffffffc08e36d8>] qed_mcp_handle_events+0x2d8/0x890 [qed]
                        qed_int_assertion
                        qed_int_attentions
   [<ffffffffc08d9490>] qed_int_sp_dpc+0xa50/0xdc0 [qed]
   [<ffffffffb3aa7623>] tasklet_action+0x83/0x140
   [<ffffffffb41d9125>] __do_softirq+0x125/0x2bb
   [<ffffffffb41d560c>] call_softirq+0x1c/0x30
   [<ffffffffb3a30645>] do_softirq+0x65/0xa0
   [<ffffffffb3aa78d5>] irq_exit+0x105/0x110
   [<ffffffffb41d8996>] do_IRQ+0x56/0xf0

The situation is clear: a tasklet function called schedule(), but the
fix is not trivial.

Checking the mainline code, it seems the same call trace is still
possible on the latest kernel as well, so here is the fix.

There was a similar case recently for the QEDE driver (reading stats
through sysfs), which resulted in the commit:
  42510dffd0e2 ("qed/qede: Fix scheduling while atomic")

I tried to implement the same logic as a fix for my case, but failed:
for this particular QED driver case it is not clear to me which
statistics to collect in delayed work for each particular device, and
collecting ALL possible stats for all devices regardless of device type
seems incorrect.

Given that I do not have access to the hardware at all, the delayed-work
approach is nearly impossible for me.
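
For readers unfamiliar with the delayed-work pattern mentioned above, here
is a minimal userspace sketch of the general idea. It is not the code of
commit 42510dffd0e2. All names (demo_dev, demo_stats_work,
demo_get_stats_cached) are hypothetical, and a real driver would also need
locking between the worker and the readers.

```c
/* Hypothetical statistics block; the fields are illustrative only. */
struct demo_stats {
	unsigned long long rx_pkts;
	unsigned long long tx_pkts;
};

struct demo_dev {
	struct demo_stats hw;      /* stands in for counters read from hardware */
	struct demo_stats cached;  /* last snapshot taken by the periodic worker */
};

/*
 * Body of the periodic worker. In a kernel driver this would run from
 * delayed work in process context, where sleeping is allowed.
 */
void demo_stats_work(struct demo_dev *dev)
{
	dev->cached = dev->hw;
}

/*
 * Reader that is safe to call from atomic context: it only copies the
 * cached snapshot and never waits for the hardware.
 */
void demo_get_stats_cached(const struct demo_dev *dev, struct demo_stats *out)
{
	*out = dev->cached;
}
```

The cost of this pattern is that readers may see slightly stale values,
and the worker has to know which statistics to collect for each device.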

Thus I have taken the idea from patch v3, namely to have the caller
provide the context:
  https://www.spinics.net/lists/netdev/msg901089.html
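
A minimal userspace sketch of what "context provided by the caller" could
look like follows. It only illustrates the idea and is not the patch
itself. The names (demo_pool, demo_acquire_context, demo_acquire) and the
retry limit are hypothetical, and the waits are simulated with counters
where kernel code would call udelay() or usleep_range().

```c
#include <stdbool.h>

#define DEMO_MAX_RETRIES 5	/* hypothetical retry limit */

/* Hypothetical resource pool standing in for the driver's PTT pool. */
struct demo_pool {
	int free_entries;	/* entries currently available */
	int release_after;	/* simulate a holder releasing after N waits */
	int sleeps;		/* number of sleeping waits performed */
	int busy_waits;		/* number of busy waits performed */
};

/* Wait once, using the method that is legal in the caller's context. */
void demo_wait(struct demo_pool *pool, bool is_atomic)
{
	if (is_atomic)
		pool->busy_waits++;	/* kernel analogue: udelay() */
	else
		pool->sleeps++;		/* kernel analogue: usleep_range() */

	/* Simulation only: another holder frees an entry after N waits. */
	if (pool->release_after > 0 && --pool->release_after == 0)
		pool->free_entries++;
}

/* Returns 0 on success, -1 if no entry became free within the limit. */
int demo_acquire_context(struct demo_pool *pool, bool is_atomic)
{
	for (int i = 0; i < DEMO_MAX_RETRIES; i++) {
		if (pool->free_entries > 0) {
			pool->free_entries--;
			return 0;
		}
		demo_wait(pool, is_atomic);
	}
	return -1;
}

/* Existing callers in process context keep the sleeping behaviour. */
int demo_acquire(struct demo_pool *pool)
{
	return demo_acquire_context(pool, false);
}
```

In this sketch a caller running in a tasklet would pass is_atomic = true
down the call chain, so the acquire path never sleeps there, while
process-context callers are unchanged.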

At least this solution is technically clear, and hopefully I did not make
stupid mistakes here.

The patch is COMPILE TESTED ONLY.

I would appreciate it if somebody could test the patch. :)


Konstantin Khorenko (1):
  qed: Fix scheduling in a tasklet while getting stats

 drivers/net/ethernet/qlogic/qed/qed_dev_api.h |  2 ++
 drivers/net/ethernet/qlogic/qed/qed_fcoe.c    | 19 ++++++++++----
 drivers/net/ethernet/qlogic/qed/qed_fcoe.h    |  6 +++--
 drivers/net/ethernet/qlogic/qed/qed_hw.c      | 26 ++++++++++++++++---
 drivers/net/ethernet/qlogic/qed/qed_iscsi.c   | 19 ++++++++++----
 drivers/net/ethernet/qlogic/qed/qed_iscsi.h   |  6 +++--
 drivers/net/ethernet/qlogic/qed/qed_l2.c      | 19 ++++++++++----
 drivers/net/ethernet/qlogic/qed/qed_l2.h      |  3 +++
 drivers/net/ethernet/qlogic/qed/qed_main.c    |  6 ++---
 9 files changed, 80 insertions(+), 26 deletions(-)

-- 
2.31.1



Thread overview: 7+ messages
2023-07-26 17:19 [PATCH 0/1] qed: Yet another scheduling while atomic fix Konstantin Khorenko
2023-07-26 17:19 ` [PATCH 1/1] qed: Fix scheduling in a tasklet while getting stats Konstantin Khorenko
2023-07-27 11:59   ` Simon Horman
2023-07-27 15:26     ` [PATCH v2 0/1] qed: Yet another scheduling while atomic fix Konstantin Khorenko
2023-07-27 15:26       ` [PATCH v2 1/1] qed: Fix scheduling in a tasklet while getting stats Konstantin Khorenko
2023-07-29 11:07         ` Simon Horman
2023-07-29 16:20       ` [PATCH v2 0/1] qed: Yet another scheduling while atomic fix patchwork-bot+netdevbpf
