From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Su Yanjun <suyj.fnst@cn.fujitsu.com>,
"David S . Miller" <davem@davemloft.net>,
Sasha Levin <sashal@kernel.org>,
netdev@vger.kernel.org
Subject: [PATCH AUTOSEL 3.18 13/16] net: 8139cp: fix a BUG triggered by changing mtu with network traffic
Date: Wed, 12 Dec 2018 23:33:40 -0500 [thread overview]
Message-ID: <20181213043343.76896-13-sashal@kernel.org> (raw)
In-Reply-To: <20181213043343.76896-1-sashal@kernel.org>
From: Su Yanjun <suyj.fnst@cn.fujitsu.com>
[ Upstream commit a5d4a89245ead1f37ed135213653c5beebea4237 ]
When changing mtu many times with traffic, a bug is triggered:
[ 1035.684037] kernel BUG at lib/dynamic_queue_limits.c:26!
[ 1035.684042] invalid opcode: 0000 [#1] SMP
[ 1035.684049] Modules linked in: loop binfmt_misc 8139cp(OE) macsec
tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag tcp_lp
fuse uinput xt_CHECKSUM iptable_mangle ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun
bridge stp llc ebtable_filter ebtables ip6table_filter devlink
ip6_tables iptable_filter sunrpc snd_hda_codec_generic snd_hda_intel
snd_hda_codec snd_hda_core snd_hwdep ppdev snd_seq iosf_mbi crc32_pclmul
parport_pc snd_seq_device ghash_clmulni_intel parport snd_pcm
aesni_intel joydev lrw snd_timer virtio_balloon sg gf128mul glue_helper
ablk_helper cryptd snd soundcore i2c_piix4 pcspkr ip_tables xfs
libcrc32c sr_mod sd_mod cdrom crc_t10dif crct10dif_generic ata_generic
[ 1035.684102] pata_acpi virtio_console qxl drm_kms_helper syscopyarea
sysfillrect sysimgblt floppy fb_sys_fops crct10dif_pclmul
crct10dif_common ttm crc32c_intel serio_raw ata_piix drm libata 8139too
virtio_pci drm_panel_orientation_quirks virtio_ring virtio mii dm_mirror
dm_region_hash dm_log dm_mod [last unloaded: 8139cp]
[ 1035.684132] CPU: 9 PID: 25140 Comm: if-mtu-change Kdump: loaded
Tainted: G OE ------------ T 3.10.0-957.el7.x86_64 #1
[ 1035.684134] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1035.684136] task: ffff8f59b1f5a080 ti: ffff8f5a2e32c000 task.ti:
ffff8f5a2e32c000
[ 1035.684149] RIP: 0010:[<ffffffffba3a40d0>] [<ffffffffba3a40d0>]
dql_completed+0x180/0x190
[ 1035.684162] RSP: 0000:ffff8f5a75483e50 EFLAGS: 00010093
[ 1035.684162] RAX: 00000000000000c2 RBX: ffff8f5a6f91c000 RCX:
0000000000000000
[ 1035.684162] RDX: 0000000000000000 RSI: 0000000000000184 RDI:
ffff8f599fea3ec0
[ 1035.684162] RBP: ffff8f5a75483ea8 R08: 00000000000000c2 R09:
0000000000000000
[ 1035.684162] R10: 00000000000616ef R11: ffff8f5a75483b56 R12:
ffff8f599fea3e00
[ 1035.684162] R13: 0000000000000001 R14: 0000000000000000 R15:
0000000000000184
[ 1035.684162] FS: 00007fa8434de740(0000) GS:ffff8f5a75480000(0000)
knlGS:0000000000000000
[ 1035.684162] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1035.684162] CR2: 00000000004305d0 CR3: 000000024eb66000 CR4:
00000000001406e0
[ 1035.684162] Call Trace:
[ 1035.684162] <IRQ>
[ 1035.684162] [<ffffffffc08cbaf8>] ? cp_interrupt+0x478/0x580 [8139cp]
[ 1035.684162] [<ffffffffba14a294>]
__handle_irq_event_percpu+0x44/0x1c0
[ 1035.684162] [<ffffffffba14a442>] handle_irq_event_percpu+0x32/0x80
[ 1035.684162] [<ffffffffba14a4cc>] handle_irq_event+0x3c/0x60
[ 1035.684162] [<ffffffffba14db29>] handle_fasteoi_irq+0x59/0x110
[ 1035.684162] [<ffffffffba02e554>] handle_irq+0xe4/0x1a0
[ 1035.684162] [<ffffffffba7795dd>] do_IRQ+0x4d/0xf0
[ 1035.684162] [<ffffffffba76b362>] common_interrupt+0x162/0x162
[ 1035.684162] <EOI>
[ 1035.684162] [<ffffffffba0c2ae4>] ? __wake_up_bit+0x24/0x70
[ 1035.684162] [<ffffffffba1e46f5>] ? do_set_pte+0xd5/0x120
[ 1035.684162] [<ffffffffba1b64fb>] unlock_page+0x2b/0x30
[ 1035.684162] [<ffffffffba1e4879>] do_read_fault.isra.61+0x139/0x1b0
[ 1035.684162] [<ffffffffba1e9134>] handle_pte_fault+0x2f4/0xd10
[ 1035.684162] [<ffffffffba1ebc6d>] handle_mm_fault+0x39d/0x9b0
[ 1035.684162] [<ffffffffba76f5e3>] __do_page_fault+0x203/0x500
[ 1035.684162] [<ffffffffba76f9c6>] trace_do_page_fault+0x56/0x150
[ 1035.684162] [<ffffffffba76ef42>] do_async_page_fault+0x22/0xf0
[ 1035.684162] [<ffffffffba76b788>] async_page_fault+0x28/0x30
[ 1035.684162] Code: 54 c7 47 54 ff ff ff ff 44 0f 49 ce 48 8b 35 48 2f
9c 00 48 89 77 58 e9 fe fe ff ff 0f 1f 80 00 00 00 00 41 89 d1 e9 ef fe
ff ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 8d 42 ff 48
[ 1035.684162] RIP [<ffffffffba3a40d0>] dql_completed+0x180/0x190
[ 1035.684162] RSP <ffff8f5a75483e50>
It's not the same as in 7fe0ee09 patch described.
As 8139cp uses shared irq mode, other device irq will trigger
cp_interrupt to execute.
cp_change_mtu
-> cp_close
-> cp_open
In cp_close routine just before free_irq(), some interrupt may occur.
In my environment, cp_interrupt exectutes and IntrStatus is 0x4,
exactly TxOk. That will cause cp_tx to wake device queue.
As device queue is started, cp_start_xmit and cp_open will run at same
time which will cause kernel BUG.
For example:
[#] for tx descriptor
At start:
[#][#][#]
num_queued=3
After cp_init_hw->cp_start_hw->netdev_reset_queue:
[#][#][#]
num_queued=0
When 8139cp starts to work then cp_tx will check
num_queued mismatchs the complete_bytes.
The patch will check IntrMask before check IntrStatus in cp_interrupt.
When 8139cp interrupt is disabled, just return.
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/realtek/8139cp.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/realtek/8139cp.c b/drivers/net/ethernet/realtek/8139cp.c
index 75b1693ec8bf..1c0c1a28161c 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -576,6 +576,7 @@ static irqreturn_t cp_interrupt (int irq, void *dev_instance)
struct cp_private *cp;
int handled = 0;
u16 status;
+ u16 mask;
if (unlikely(dev == NULL))
return IRQ_NONE;
@@ -583,6 +584,10 @@ static irqreturn_t cp_interrupt (int irq, void *dev_instance)
spin_lock(&cp->lock);
+ mask = cpr16(IntrMask);
+ if (!mask)
+ goto out_unlock;
+
status = cpr16(IntrStatus);
if (!status || (status == 0xFFFF))
goto out_unlock;
--
2.19.1
next prev parent reply other threads:[~2018-12-13 4:34 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-13 4:33 [PATCH AUTOSEL 3.18 01/16] scsi: libiscsi: Fix NULL pointer dereference in iscsi_eh_session_reset Sasha Levin
2018-12-13 4:33 ` [PATCH AUTOSEL 3.18 02/16] scsi: vmw_pscsi: Rearrange code to avoid multiple calls to free_irq during unload Sasha Levin
2018-12-13 4:33 ` [PATCH AUTOSEL 3.18 03/16] x86/earlyprintk/efi: Fix infinite loop on some screen widths Sasha Levin
2018-12-13 4:33 ` Sasha Levin
2018-12-13 4:33 ` [PATCH AUTOSEL 3.18 04/16] bonding: fix 802.3ad state sent to partner when unbinding slave Sasha Levin
2018-12-13 4:33 ` [PATCH AUTOSEL 3.18 05/16] SUNRPC: Fix leak of krb5p encode pages Sasha Levin
2018-12-13 4:33 ` [PATCH AUTOSEL 3.18 06/16] SUNRPC: Fix a potential race in xprt_connect() Sasha Levin
2018-12-13 4:33 ` [PATCH AUTOSEL 3.18 07/16] sbus: char: add of_node_put() Sasha Levin
2018-12-13 4:33 ` Sasha Levin
2018-12-13 4:33 ` [PATCH AUTOSEL 3.18 08/16] drivers/sbus/char: " Sasha Levin
2018-12-13 4:33 ` Sasha Levin
2018-12-13 4:33 ` [PATCH AUTOSEL 3.18 09/16] drivers/tty: add missing of_node_put() Sasha Levin
2018-12-13 4:33 ` Sasha Levin
2018-12-13 4:33 ` [PATCH AUTOSEL 3.18 10/16] ide: pmac: add of_node_put() Sasha Levin
2018-12-13 4:33 ` Sasha Levin
2018-12-13 4:33 ` [PATCH AUTOSEL 3.18 11/16] Input: omap-keypad - fix keyboard debounce configuration Sasha Levin
2018-12-13 4:33 ` [PATCH AUTOSEL 3.18 12/16] libata: whitelist all SAMSUNG MZ7KM* solid-state disks Sasha Levin
2018-12-13 4:33 ` Sasha Levin [this message]
2018-12-13 4:33 ` [PATCH AUTOSEL 3.18 14/16] ARM: 8814/1: mm: improve/fix ARM v7_dma_inv_range() unaligned address handling Sasha Levin
2018-12-13 4:33 ` [PATCH AUTOSEL 3.18 15/16] cifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs) Sasha Levin
2018-12-13 4:33 ` [PATCH AUTOSEL 3.18 16/16] i2c: scmi: Fix probe error on devices with an empty SMB0001 ACPI device node Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181213043343.76896-13-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=davem@davemloft.net \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=suyj.fnst@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.