From: Michael Ellerman <mpe@ellerman.id.au>
To: Jakub Kicinski <kuba@kernel.org>,
	Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: dumazet@google.com, netdev <netdev@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Dany Madden <drt@linux.ibm.com>,
	alexandr.lobakin@intel.com,
	brian King <brking@linux.vnet.ibm.com>,
	Sukadev Bhattiprolu <sukadev@linux.ibm.com>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [5.16.0-rc5][ppc][net] kernel oops when hotplug remove of vNIC interface
Date: Thu, 06 Jan 2022 15:19:15 +1100
Message-ID: <87lezt3398.fsf@mpe.ellerman.id.au>
In-Reply-To: <20220105102625.2738186e@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

Jakub Kicinski <kuba@kernel.org> writes:
> On Wed, 5 Jan 2022 13:56:53 +0530 Abdul Haleem wrote:
>> Greetings,
>> 
>> Mainline kernel 5.16.0-rc5 panics during DLPAR ADD of a vNIC device on my
>> PowerPC LPAR.
>> 
>> Run the DLPAR commands below in a loop from the Linux OS:
>> 
>> drmgr -r -c slot -s U9080.HEX.134C488-V1-C3 -w 5 -d 1
>> drmgr -a -c slot -s U9080.HEX.134C488-V1-C3 -w 5 -d 1
>> 
>> After the 7th iteration, the kernel panics with the messages below.
>> 
>> console messages:
>> [102056] ibmvnic 30000003 env3: Sending CRQ: 801e000864000000 0060000000000000
>> <intr> ibmvnic 30000003 env3: Handling CRQ: 809e000800000000 0000000000000000
>> [102056] ibmvnic 30000003 env3: Disabling tx_scrq[0] irq
>> [102056] ibmvnic 30000003 env3: Disabling tx_scrq[1] irq
>> [102056] ibmvnic 30000003 env3: Disabling rx_scrq[0] irq
>> [102056] ibmvnic 30000003 env3: Disabling rx_scrq[1] irq
>> [102056] ibmvnic 30000003 env3: Disabling rx_scrq[2] irq
>> [102056] ibmvnic 30000003 env3: Disabling rx_scrq[3] irq
>> [102056] ibmvnic 30000003 env3: Disabling rx_scrq[4] irq
>> [102056] ibmvnic 30000003 env3: Disabling rx_scrq[5] irq
>> [102056] ibmvnic 30000003 env3: Disabling rx_scrq[6] irq
>> [102056] ibmvnic 30000003 env3: Disabling rx_scrq[7] irq
>> [102056] ibmvnic 30000003 env3: Replenished 8 pools
>> Kernel attempted to read user page (10) - exploit attempt? (uid: 0)
>> BUG: Kernel NULL pointer dereference on read at 0x00000010
>> Faulting instruction address: 0xc000000000a3c840
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>> Modules linked in: bridge stp llc ib_core rpadlpar_io rpaphp nfnetlink 
>> tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag 
>> bonding rfkill ibmvnic sunrpc pseries_rng xts vmx_crypto gf128mul 
>> sch_fq_codel binfmt_misc ip_tables ext4 mbcache jbd2 dm_service_time 
>> sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth dm_multipath dm_mirror 
>> dm_region_hash dm_log dm_mod fuse
>> CPU: 9 PID: 102056 Comm: kworker/9:2 Kdump: loaded Not tainted 5.16.0-rc5-autotest-g6441998e2e37 #1
>> Workqueue: events_long __ibmvnic_reset [ibmvnic]
>> NIP:  c000000000a3c840 LR: c0080000029b5378 CTR: c000000000a3c820
>> REGS: c0000000548e37e0 TRAP: 0300   Not tainted (5.16.0-rc5-autotest-g6441998e2e37)
>> MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28248484  XER: 00000004
>> CFAR: c0080000029bdd24 DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0
>> GPR00: c0080000029b55d0 c0000000548e3a80 c0000000028f0200 0000000000000000
>> GPR04: c000000c7d1a7e00 fffffffffffffff6 0000000000000027 c000000c7d1a7e08
>> GPR08: 0000000000000023 0000000000000000 0000000000000010 c0080000029bdd10
>> GPR12: c000000000a3c820 c000000c7fca6680 0000000000000000 c000000133016bf8
>> GPR16: 00000000000003fe 0000000000001000 0000000000000002 0000000000000008
>> GPR20: c000000133016eb0 0000000000000000 0000000000000000 0000000000000003
>> GPR24: c000000133016000 c000000133017168 0000000020000000 c000000133016a00
>> GPR28: 0000000000000006 c000000133016a00 0000000000000001 c000000133016000
>> NIP [c000000000a3c840] napi_enable+0x20/0xc0
>> LR [c0080000029b5378] __ibmvnic_open+0xf0/0x430 [ibmvnic]
>> Call Trace:
>> [c0000000548e3a80] [0000000000000006] 0x6 (unreliable)
>> [c0000000548e3ab0] [c0080000029b55d0] __ibmvnic_open+0x348/0x430 [ibmvnic]
>> [c0000000548e3b40] [c0080000029bcc28] __ibmvnic_reset+0x500/0xdf0 [ibmvnic]
>> [c0000000548e3c60] [c000000000176228] process_one_work+0x288/0x570
>> [c0000000548e3d00] [c000000000176588] worker_thread+0x78/0x660
>> [c0000000548e3da0] [c0000000001822f0] kthread+0x1c0/0x1d0
>> [c0000000548e3e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
>> Instruction dump:
>> 7d2948f8 792307e0 4e800020 60000000 3c4c01eb 384239e0 f821ffd1 39430010
>> 38a0fff6 e92d1100 f9210028 39200000 <e9030010> f9010020 60420000 e9210020
>> ---[ end trace 5f8033b08fd27706 ]---
>> radix-mmu: Page sizes from device-tree:
>> 
>> The faulting instruction address resolves to:
>> 
>> [root@ltcden11-lp1 boot]# gdb -batch vmlinuz-5.16.0-rc5-autotest-g6441998e2e37 -ex 'list *(0xc000000000a3c840)'
>> 0xc000000000a3c840 is in napi_enable (net/core/dev.c:6966).
>> 6961    void napi_enable(struct napi_struct *n)
>> 6962    {
>> 6963        unsigned long val, new;
>> 6964
>> 6965        do {
>> 6966            val = READ_ONCE(n->state);
>
> If n is NULL here that's gotta be a driver problem.

Definitely looks like it. The disassembly is:

  not     r9,r9
  clrldi  r3,r9,63
  blr				# end of previous function
  nop
  addis   r2,r12,491		# function entry
  addi    r2,r2,14816
  stdu    r1,-48(r1)		# stack frame creation
  li      r5,-10
  ld      r9,4352(r13)
  std     r9,40(r1)
  li      r9,0
  ld      r8,16(r3)		# load from r3 (n) + 16


The register dump shows that r3 is NULL, and it comes directly from the
caller. So we've been called with n = NULL.
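
For anyone who wants to check that reading by hand, here is a minimal sketch in C that decodes the marked word from the instruction dump (<e9030010>) as a PowerPC DS-form load and computes the address it would touch. The names decode_ds_form, is_ld, effective_address, list_head_sketch and napi_sketch are hypothetical and exist only for this illustration. The two-member struct is an assumption that mirrors just the start of struct napi_struct (a list_head followed by state), which is consistent with the 16-byte offset seen in the disassembly; it is not the kernel's definition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a PowerPC DS-form instruction (the encoding used by ld). */
struct ds_form {
    unsigned opcode;  /* primary opcode, top 6 bits */
    unsigned rt;      /* target register number */
    unsigned ra;      /* base register number */
    int32_t  disp;    /* sign-extended byte displacement (DS << 2) */
    unsigned xo;      /* 2-bit extended opcode in the low bits */
};

/* Split a 32-bit instruction word into its DS-form fields. */
static struct ds_form decode_ds_form(uint32_t insn)
{
    struct ds_form f;
    int32_t d = (int32_t)(insn & 0xfffc);  /* low 2 bits are XO, not offset */

    if (d & 0x8000)                        /* sign-extend the 16-bit value */
        d -= 0x10000;

    f.opcode = insn >> 26;
    f.rt     = (insn >> 21) & 0x1f;
    f.ra     = (insn >> 16) & 0x1f;
    f.disp   = d;
    f.xo     = insn & 0x3;
    return f;
}

/* ld is primary opcode 58 with extended opcode 0. */
static bool is_ld(const struct ds_form *f)
{
    return f->opcode == 58 && f->xo == 0;
}

/* Address the load touches, given the value held in the base register. */
static uint64_t effective_address(const struct ds_form *f, uint64_t ra_value)
{
    return ra_value + (uint64_t)(int64_t)f->disp;
}

/* Hypothetical stand-ins: only the leading members matter for the offset. */
struct list_head_sketch {
    struct list_head_sketch *next, *prev;
};

struct napi_sketch {
    struct list_head_sketch poll_list;  /* two pointers, 16 bytes on 64-bit */
    unsigned long state;                /* the field READ_ONCE() loads */
};
```

Feeding 0xe9030010 to decode_ds_form() gives opcode 58, rt 8, ra 3 and a displacement of 16, i.e. ld r8,16(r3). With r3 = 0 from the register dump, effective_address() returns 0x10, which matches both the DAR value and the "read at 0x00000010" line in the oops. On a 64-bit build, offsetof(struct napi_sketch, state) is also 16, which is why a NULL n faults at that address.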

cheers
