From: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
To: Dany Madden <drt@linux.ibm.com>
Cc: "Jakub Kicinski" <kuba@kernel.org>,
"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
netdev@vger.kernel.org, linyunsheng@huawei.com,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Antoine Tenart" <atenart@kernel.org>,
"Alexander Lobakin" <alobakin@pm.me>,
"Wei Wang" <weiwan@google.com>, "Taehee Yoo" <ap420073@gmail.com>,
"Björn Töpel" <bjorn@kernel.org>, "Arnd Bergmann" <arnd@arndb.de>,
"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
"Neil Horman" <nhorman@redhat.com>,
"Dust Li" <dust.li@linux.alibaba.com>
Subject: Re: [PATCH net v2] napi: fix race inside napi_enable
Date: Thu, 21 Oct 2021 20:16:04 -0700
Message-ID: <YXIs9GRNtNbl8MkZ@us.ibm.com>
In-Reply-To: <dc6902364a8f91c4292fe1c5e01b24be@imap.linux.ibm.com>
Dany Madden [drt@linux.ibm.com] wrote:
>
> We hit two napi related crashes while attempting mtu size change.
>
> 1st crash:
> [430425.020051] ------------[ cut here ]------------
> [430425.020053] kernel BUG at ../net/core/dev.c:6938!
> [430425.020057] Oops: Exception in kernel mode, sig: 5 [#1]
> [430425.020068] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [430425.020075] Modules linked in: binfmt_misc rpadlpar_io rpaphp xt_tcpudp
> iptable_filter ip_tables x_tables pseries_rng ibmvnic rng_core ibmveth
> vmx_crypto gf128mul fuse btrfs blake2b_generic xor zstd_compress
> lzo_compress raid6_pq dm_service_time crc32c_vpmsum dm_mirror dm_region_hash
> dm_log dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
> [430425.020123] CPU: 0 PID: 34337 Comm: kworker/0:3 Kdump: loaded Tainted: G
> W 5.15.0-rc2-suka-00486-gce916130f5f6 #3
> [430425.020133] Workqueue: events_long __ibmvnic_reset [ibmvnic]
> [430425.020145] NIP: c000000000cb03f4 LR: c0080000014a4ce8 CTR:
> c000000000cb03b0
> [430425.020151] REGS: c00000002e5d37e0 TRAP: 0700 Tainted: G W
> (5.15.0-rc2-suka-00486-gce916130f5f6)
> [430425.020159] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR:
> 28248428 XER: 20000001
> [430425.020176] CFAR: c0080000014ad9cc IRQMASK: 0
> GPR00: c0080000014a4ce8 c00000002e5d3a80 c000000001b12100
> c0000001274f3190
> GPR04: 00000000ffff36dc fffffffffffffff6 0000000000000019
> 0000000000000010
> GPR08: c00000002ec48900 0000000000000001 c0000001274f31a0
> c0080000014ad9b8
> GPR12: c000000000cb03b0 c000000001d00000 0000000080000000
> 00000000000003fe
> GPR16: 00000000000006e3 0000000000000000 0000000000000008
> c00000002ec48af8
> GPR20: c00000002ec48db0 0000000000000000 0000000000000004
> 0000000000000000
> GPR24: c00000002ec48000 0000000000000004 c00000002ec49070
> 0000000000000006
> GPR28: c00000002ec48900 c00000002ec48900 0000000000000002
> c00000002ec48000
> [430425.020248] NIP [c000000000cb03f4] napi_enable+0x44/0xc0
> [430425.020257] LR [c0080000014a4ce8] __ibmvnic_open+0xf0/0x440 [ibmvnic]
> [430425.020265] Call Trace:
> [430425.020269] [c00000002e5d3a80] [c00000002ec48900] 0xc00000002ec48900
> (unreliable)
> [430425.020277] [c00000002e5d3ab0] [c0080000014a4f40]
> __ibmvnic_open+0x348/0x440 [ibmvnic]
> [430425.020286] [c00000002e5d3b40] [c0080000014ace58]
> __ibmvnic_reset+0xb10/0xe40 [ibmvnic]
> [430425.020296] [c00000002e5d3c60] [c0000000001673a4]
> process_one_work+0x2d4/0x5d0
> [430425.020305] [c00000002e5d3d00] [c000000000167718]
> worker_thread+0x78/0x6c0
> [430425.020314] [c00000002e5d3da0] [c000000000173388] kthread+0x188/0x190
> [430425.020322] [c00000002e5d3e10] [c00000000000cee4]
> ret_from_kernel_thread+0x5c/0x64
> [430425.020331] Instruction dump:
> [430425.020335] 38a0fff6 39430010 e92d0c80 f9210028 39200000 60000000
> 60000000 e9030010
> [430425.020348] f9010020 e9210020 7d2948f8 792907e0 <0b090000> e9230038
> 7d072838 89290889
> [430425.020364] ---[ end trace 3abb5ec5589518ca ]---
> [430425.068100]
> [430425.068108] Sending IPI to other CPUs
> [430425.068206] IPI complete
> [430425.090333] kexec: Starting switchover sequence.
Jakub,

We hit this napi_enable() BUG_ON() crash three times this week. In two
of those cases it appears that

	napi->state = netdev_priv(netdev)

i.e. it contains the ibmvnic_adapter pointer in our case.
# Crash was on eth3
crash> net |grep eth3
c00000002e948000 eth3 10.1.194.173
crash> net_device |grep SIZE
SIZE: 2304
crash> px 2304
$1 = 0x900
crash> ibmvnic_adapter c00000002e948900 |grep napi
napi = 0xc00000003b7dc000,
num_active_rx_napi = 8,
napi_enabled = false,
crash> napi_struct 0xc00000003b7dc000 |grep state
state = 13835058056063650048,
state = 0 '\000',
crash> px 13835058056063650048
$2 = 0xc00000002e948900 #eth3 ibmvnic_adapter above

In the third case napi->state was 16, i.e. NAPI_STATE_SCHED was clear,
which is what trips the BUG_ON() in napi_enable().
Let us know if any other fields are of interest. Do we have any clues on
when this started?
Sukadev