From: Stewart Smith <stewart@linux.ibm.com>
To: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, haren@linux.vnet.ibm.com
Subject: crash after NX error
Date: Tue, 04 Jun 2019 13:23:42 +1000
Message-ID: <87pnnuav9d.fsf@linux.vnet.ibm.com>
On my two-socket POWER9 system (powernv) with 842 zswap set up, I
recently got a crash with the Ubuntu kernel (I haven't tried with
upstream, and this is the first time the system has died like this, so
I'm not sure how repeatable it is).
[ 2.891463] zswap: loaded using pool 842-nx/zbud
...
[15626.124646] nx_compress_powernv: ERROR: CSB still not valid after 5000000 us, giving up : 00 00 00 00 00000000
[16868.932913] Unable to handle kernel paging request for data at address 0x6655f67da816cdb8
[16868.933726] Faulting instruction address: 0xc000000000391600
cpu 0x68: Vector: 380 (Data Access Out of Range) at [c000001c9d98b9a0]
pc: c000000000391600: kmem_cache_alloc+0x2e0/0x340
lr: c0000000003915ec: kmem_cache_alloc+0x2cc/0x340
sp: c000001c9d98bc20
msr: 900000000280b033
dar: 6655f67da816cdb8
current = 0xc000001ad43cb400
paca = 0xc00000000fac7800 softe: 0 irq_happened: 0x01
pid = 8319, comm = make
Linux version 4.15.0-50-generic (buildd@bos02-ppc64el-006) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #54-Ubuntu SMP Mon May 6 18:55:18 UTC 2019 (Ubuntu 4.15.0-50.54-generic 4.15.18)
68:mon> t
[c000001c9d98bc20] c0000000003914d4 kmem_cache_alloc+0x1b4/0x340 (unreliable)
[c000001c9d98bc80] c0000000003b1e14 __khugepaged_enter+0x54/0x220
[c000001c9d98bcc0] c00000000010f0ec copy_process.isra.5.part.6+0xebc/0x1a10
[c000001c9d98bda0] c00000000010fe4c _do_fork+0xec/0x510
[c000001c9d98be30] c00000000000b584 ppc_clone+0x8/0xc
--- Exception: c00 (System Call) at 00007afe9daf87f4
SP (7fffca606880) is in userspace
So, it looks like there could be a problem in the error path, plausibly
fixed by this patch:
commit 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5
Author: Haren Myneni <haren@linux.vnet.ibm.com>
Date: Wed Jun 13 00:32:40 2018 -0700
crypto/nx: Initialize 842 high and normal RxFIFO control registers
NX increments readOffset by FIFO size in receive FIFO control register
when CRB is read. But the index in RxFIFO has to match with the
corresponding entry in FIFO maintained by VAS in kernel. Otherwise NX
may be processing incorrect CRBs and can cause CRB timeout.
VAS FIFO offset is 0 when the receive window is opened during
initialization. When the module is reloaded or in kexec boot, readOffset
in FIFO control register may not match with VAS entry. This patch adds
nx_coproc_init OPAL call to reset readOffset and queued entries in FIFO
control register for both high and normal FIFOs.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
[mpe: Fixup uninitialized variable warning]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
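For anyone not familiar with the failure mode the commit message describes, here is a toy model of it. This is not kernel code: the class, the method names and the 4-entry FIFO are all made up for illustration, and the real readOffset is a byte offset in a hardware register rather than a list index. It only shows why a stale NX-side read position after a window reopen leaves a freshly submitted CRB unprocessed, which would be consistent with the "CSB still not valid" timeout above.

```python
FIFO_ENTRIES = 4  # hypothetical size, chosen only to keep the example small


class ToyRxFifo:
    """Toy model: the kernel (VAS) side writes CRBs, the NX side reads them."""

    def __init__(self):
        self.slots = [None] * FIFO_ENTRIES
        self.vas_write = 0  # kernel-side index; 0 whenever the window is opened
        self.nx_read = 0    # stands in for readOffset in the NX control register

    def submit(self, crb):
        # Kernel side places a request in its current slot and advances.
        self.slots[self.vas_write] = crb
        self.vas_write = (self.vas_write + 1) % FIFO_ENTRIES

    def nx_process(self):
        # NX side reads whatever is at its own position and advances.
        crb = self.slots[self.nx_read]
        self.nx_read = (self.nx_read + 1) % FIFO_ENTRIES
        return crb

    def reopen_window(self, reset_nx):
        # Module reload or kexec: the kernel side starts again from 0.
        self.slots = [None] * FIFO_ENTRIES
        self.vas_write = 0
        if reset_nx:
            # What the commit message says the nx_coproc_init call is for.
            self.nx_read = 0


def run(reset_nx):
    fifo = ToyRxFifo()
    fifo.submit("crb-A")
    fifo.nx_process()             # NX read position is now 1
    fifo.reopen_window(reset_nx)
    fifo.submit("crb-B")          # kernel writes to slot 0 again
    return fifo.nx_process()      # what NX actually picks up


print("without reset:", run(reset_nx=False))
print("with reset:   ", run(reset_nx=True))
```

Without the reset, the NX side looks at slot 1 while the new request sits in slot 0, so the request is never picked up; with the reset, both sides agree on slot 0.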
$ git describe --contains 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5
v4.19-rc1~24^2~50
This was never backported to any stable release, so it probably needs to
be backported to v4.14 through v4.18. Notably, Ubuntu is on v4.15 and it doesn't
seem to have picked up the patch. I'm opening an Ubuntu bug for this.
Haren, is this something you can drive through the stable process
(assuming my above crash looks like this failure)?
--
Stewart Smith
OPAL Architect, IBM.