From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org,
David Christensen <drc@linux.vnet.ibm.com>,
Manish Chopra <manishc@marvell.com>,
Ariel Elior <aelior@marvell.com>,
Jakub Kicinski <kuba@kernel.org>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.9 32/66] bnx2x: fix napi API usage sequence
Date: Tue, 10 May 2022 15:07:22 +0200 [thread overview]
Message-ID: <20220510130730.708432233@linuxfoundation.org> (raw)
In-Reply-To: <20220510130729.762341544@linuxfoundation.org>
From: Manish Chopra <manishc@marvell.com>
[ Upstream commit af68656d66eda219b7f55ce8313a1da0312c79e1 ]
While handling PCI errors (AER flow) driver tries to
disable NAPI [napi_disable()] after NAPI is deleted
[__netif_napi_del()] which causes unexpected system
hang/crash.
System message log shows the following:
=======================================
[ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000 [ 3222.537511] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
[ 3222.537512] EEH: Notify device drivers to shutdown [ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)'
[ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error detected [ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports:
'need reset'
[ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error detected [ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports:
'need reset'
[ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[ 3222.537890] EEH: Collect temporary log [ 3222.583481] EEH: of node=0384:80:00.0 [ 3222.583519] EEH: PCI device/vendor: 168e14e4 [ 3222.583557] EEH: PCI cmd/status register: 00100140 [ 3222.583557] EEH: PCI-E capabilities and status follow:
[ 3222.583744] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.583892] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.583893] EEH: PCI-E 20: 00000000 [ 3222.583893] EEH: PCI-E AER capability register set follows:
[ 3222.584079] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.584230] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.584378] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.584416] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.584416] EEH: of node=0384:80:00.1 [ 3222.584454] EEH: PCI device/vendor: 168e14e4 [ 3222.584491] EEH: PCI cmd/status register: 00100140 [ 3222.584492] EEH: PCI-E capabilities and status follow:
[ 3222.584677] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.584825] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.584826] EEH: PCI-E 20: 00000000 [ 3222.584826] EEH: PCI-E AER capability register set follows:
[ 3222.585011] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.585160] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.585309] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.585347] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2 [ 3222.586873] EEH: Reset without hotplug activity [ 3224.762767] EEH: Beginning: 'slot_reset'
[ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->slot_reset()
[ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing...
[ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142) [ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset
--> driver unload
Uninterruptible tasks
=====================
crash> ps | grep UN
213 2 11 c000000004c89e00 UN 0.0 0 0 [eehd]
215 2 0 c000000004c80000 UN 0.0 0 0
[kworker/0:2]
2196 1 28 c000000004504f00 UN 0.1 15936 11136 wickedd
4287 1 9 c00000020d076800 UN 0.0 4032 3008 agetty
4289 1 20 c00000020d056680 UN 0.0 7232 3840 agetty
32423 2 26 c00000020038c580 UN 0.0 0 0
[kworker/26:3]
32871 4241 27 c0000002609ddd00 UN 0.1 18624 11648 sshd
32920 10130 16 c00000027284a100 UN 0.1 48512 12608 sendmail
33092 32987 0 c000000205218b00 UN 0.1 48512 12608 sendmail
33154 4567 16 c000000260e51780 UN 0.1 48832 12864 pickup
33209 4241 36 c000000270cb6500 UN 0.1 18624 11712 sshd
33473 33283 0 c000000205211480 UN 0.1 48512 12672 sendmail
33531 4241 37 c00000023c902780 UN 0.1 18624 11648 sshd
EEH handler hung while bnx2x sleeping and holding RTNL lock
===========================================================
crash> bt 213
PID: 213 TASK: c000000004c89e00 CPU: 11 COMMAND: "eehd"
#0 [c000000004d477e0] __schedule at c000000000c70808
#1 [c000000004d478b0] schedule at c000000000c70ee0
#2 [c000000004d478e0] schedule_timeout at c000000000c76dec
#3 [c000000004d479c0] msleep at c0000000002120cc
#4 [c000000004d479f0] napi_disable at c000000000a06448
^^^^^^^^^^^^^^^^
#5 [c000000004d47a30] bnx2x_netif_stop at c0080000018dba94 [bnx2x]
#6 [c000000004d47a60] bnx2x_io_slot_reset at c0080000018a551c [bnx2x]
#7 [c000000004d47b20] eeh_report_reset at c00000000004c9bc
#8 [c000000004d47b90] eeh_pe_report at c00000000004d1a8
#9 [c000000004d47c40] eeh_handle_normal_event at c00000000004da64
And the sleeping source code
============================
crash> dis -ls c000000000a06448
FILE: ../net/core/dev.c
LINE: 6702
6697 {
6698 might_sleep();
6699 set_bit(NAPI_STATE_DISABLE, &n->state);
6700
6701 while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
* 6702 msleep(1);
6703 while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
6704 msleep(1);
6705
6706 hrtimer_cancel(&n->timer);
6707
6708 clear_bit(NAPI_STATE_DISABLE, &n->state);
6709 }
EEH calls into bnx2x twice based on the system log above, first through
bnx2x_io_error_detected() and then bnx2x_io_slot_reset(), and executes
the following call chains:
bnx2x_io_error_detected()
+-> bnx2x_eeh_nic_unload()
+-> bnx2x_del_all_napi()
+-> __netif_napi_del()
bnx2x_io_slot_reset()
+-> bnx2x_netif_stop()
+-> bnx2x_napi_disable()
+->napi_disable()
Fix this by correcting the sequence of NAPI APIs usage,
that is delete the NAPI after disabling it.
Fixes: 7fa6f34081f1 ("bnx2x: AER revised")
Reported-by: David Christensen <drc@linux.vnet.ibm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220426153913.6966-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 8d17d464c067..398928642a97 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -14314,10 +14314,6 @@ static int bnx2x_eeh_nic_unload(struct bnx2x *bp)
/* Stop Tx */
bnx2x_tx_disable(bp);
- /* Delete all NAPI objects */
- bnx2x_del_all_napi(bp);
- if (CNIC_LOADED(bp))
- bnx2x_del_all_napi_cnic(bp);
netdev_reset_tc(bp->dev);
del_timer_sync(&bp->timer);
@@ -14422,6 +14418,11 @@ static pci_ers_result_t bnx2x_io_slot_reset(struct pci_dev *pdev)
bnx2x_drain_tx_queues(bp);
bnx2x_send_unload_req(bp, UNLOAD_RECOVERY);
bnx2x_netif_stop(bp, 1);
+ bnx2x_del_all_napi(bp);
+
+ if (CNIC_LOADED(bp))
+ bnx2x_del_all_napi_cnic(bp);
+
bnx2x_free_irq(bp);
/* Report UNLOAD_DONE to MCP */
--
2.35.1
next prev parent reply other threads:[~2022-05-10 13:19 UTC|newest]
Thread overview: 72+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-10 13:06 [PATCH 4.9 00/66] 4.9.313-rc1 review Greg Kroah-Hartman
2022-05-10 13:06 ` [PATCH 4.9 01/66] floppy: disable FDRAWCMD by default Greg Kroah-Hartman
2022-05-10 13:06 ` [PATCH 4.9 02/66] Revert "net: ethernet: stmmac: fix altr_tse_pcs function when using a fixed-link" Greg Kroah-Hartman
2022-05-10 13:06 ` [PATCH 4.9 03/66] lightnvm: disable the subsystem Greg Kroah-Hartman
2022-05-10 13:06 ` [PATCH 4.9 04/66] USB: quirks: add a Realtek card reader Greg Kroah-Hartman
2022-05-10 13:06 ` [PATCH 4.9 05/66] USB: quirks: add STRING quirk for VCOM device Greg Kroah-Hartman
2022-05-10 13:06 ` [PATCH 4.9 06/66] USB: serial: whiteheat: fix heap overflow in WHITEHEAT_GET_DTR_RTS Greg Kroah-Hartman
2022-05-10 13:06 ` [PATCH 4.9 07/66] USB: serial: cp210x: add PIDs for Kamstrup USB Meter Reader Greg Kroah-Hartman
2022-05-10 13:06 ` [PATCH 4.9 08/66] USB: serial: option: add support for Cinterion MV32-WA/MV32-WB Greg Kroah-Hartman
2022-05-10 13:06 ` [PATCH 4.9 09/66] USB: serial: option: add Telit 0x1057, 0x1058, 0x1075 compositions Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 10/66] xhci: stop polling roothubs after shutdown Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 11/66] iio: dac: ad5592r: Fix the missing return value Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 12/66] iio: dac: ad5446: Fix read_raw not returning set value Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 13/66] iio: magnetometer: ak8975: Fix the error handling in ak8975_power_on() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 14/66] usb: misc: fix improper handling of refcount in uss720_probe() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 15/66] usb: gadget: uvc: Fix crash when encoding data for usb request Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 16/66] usb: gadget: configfs: clear deactivation flag in configfs_composite_unbind() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 17/66] serial: 8250: Also set sticky MCR bits in console restoration Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 18/66] serial: 8250: Correct the clock for EndRun PTP/1588 PCIe device Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 19/66] hex2bin: make the function hex_to_bin constant-time Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 20/66] hex2bin: fix access beyond string end Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 21/66] ARM: dts: imx6qdl-apalis: Fix sgtl5000 detection issue Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 22/66] phy: samsung: Fix missing of_node_put() in exynos_sata_phy_probe Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 23/66] phy: samsung: exynos5250-sata: fix missing device put in probe error paths Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 24/66] ARM: OMAP2+: Fix refcount leak in omap_gic_of_init Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 25/66] ARM: dts: Fix mmc order for omap3-gta04 Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 26/66] mtd: rawnand: Fix return value check of wait_for_completion_timeout Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 27/66] pinctrl: pistachio: fix use of irq_of_parse_and_map() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 28/66] ip_gre: Make o_seqno start from 0 in native mode Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 29/66] tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 30/66] bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 31/66] clk: sunxi: sun9i-mmc: check return value after calling platform_get_resource() Greg Kroah-Hartman
2022-05-10 13:07 ` Greg Kroah-Hartman [this message]
2022-05-10 13:07 ` [PATCH 4.9 33/66] ASoC: wm8731: Disable the regulator when probing fails Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 34/66] drivers: net: hippi: Fix deadlock in rr_close() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 35/66] x86/cpu: Load microcode during restore_processor_state() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 36/66] tty: n_gsm: fix wrong signal octet encoding in convergence layer type 2 Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 37/66] tty: n_gsm: fix malformed counter for out of frame data Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 38/66] tty: n_gsm: fix insufficient txframe size Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 39/66] tty: n_gsm: fix missing explicit ldisc flush Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 40/66] tty: n_gsm: fix wrong command retry handling Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 41/66] tty: n_gsm: fix wrong command frame length field encoding Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 42/66] tty: n_gsm: fix incorrect UA handling Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 43/66] MIPS: Fix CP0 counter erratum detection for R4k CPUs Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 44/66] parisc: Merge model and model name into one line in /proc/cpuinfo Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 45/66] ALSA: fireworks: fix wrong return count shorter than expected by 4 bytes Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 46/66] Revert "SUNRPC: attempt AF_LOCAL connect on setup" Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 47/66] firewire: fix potential uaf in outbound_phy_packet_callback() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 48/66] firewire: remove check of list iterator against head past the loop body Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 49/66] firewire: core: extend card->lock in fw_core_handle_bus_reset Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 50/66] ASoC: wm8958: Fix change notifications for DSP controls Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 51/66] can: grcan: grcan_close(): fix deadlock Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 52/66] can: grcan: use ofdev->dev when allocating DMA memory Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 53/66] nfc: replace improper check device_is_registered() in netlink related functions Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 54/66] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 55/66] NFC: netlink: fix sleep in atomic bug when firmware download timeout Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 56/66] hwmon: (adt7470) Fix warning on module removal Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 57/66] ASoC: dmaengine: Restore NULL prepare_slave_config() callback Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 58/66] net: emaclite: Add error handling for of_address_to_resource() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 59/66] smsc911x: allow using IRQ0 Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 60/66] btrfs: always log symlinks in full mode Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 61/66] net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 62/66] kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 63/66] net: sched: prevent UAF on tc_ctl_tfilter when temporarily dropping rtnl_lock Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 64/66] net: ipv6: ensure we call ipv6_mc_down() at most once Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 65/66] dm: fix mempool NULL pointer race when completing IO Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 4.9 66/66] dm: interlock pending dm_io and dm_wait_for_bios_completion Greg Kroah-Hartman
2022-05-10 16:46 ` [PATCH 4.9 00/66] 4.9.313-rc1 review Florian Fainelli
2022-05-10 18:07 ` Pavel Machek
2022-05-10 22:43 ` Shuah Khan
2022-05-11 1:10 ` Guenter Roeck
2022-05-11 11:15 ` Naresh Kamboju
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220510130730.708432233@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=aelior@marvell.com \
--cc=drc@linux.vnet.ibm.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=manishc@marvell.com \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox