From: Marc Zyngier <maz@kernel.org>
To: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
Cc: "robin.murphy@arm.com" <robin.murphy@arm.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: SError Interrupt on CPU0, code 0xbf000000 makes kernel panic
Date: Thu, 24 Mar 2022 14:17:26 +0000
Message-ID: <87o81v5tah.wl-maz@kernel.org>
In-Reply-To: <c57d5088e189b0453e19f2537bb8dcbc5b4cccae.camel@infinera.com>

On Thu, 24 Mar 2022 14:01:53 +0000,
Joakim Tjernlund <Joakim.Tjernlund@infinera.com> wrote:
> 
> On Thu, 2022-03-24 at 13:16 +0000, Robin Murphy wrote:
> > On 2022-03-24 12:10, Joakim Tjernlund wrote:
> > > We have a custom SoC with an A53 CPU. When an app accesses a nonexistent address range, the kernel reports:
> > > # > devmem 0x20000000 w 0x1000 #this will open /dev/mem and write
> > >   
> > > [   37.570886] SError Interrupt on CPU0, code 0xbf000000 -- SError
> > > [   37.571974] CPU: 0 PID: 72 Comm: devmem Not tainted 5.15.26-g18447c6fff6f-dirty #26
> > > [   37.573150] Hardware name: infinera,xr (DT)
> > > [   37.573599] pstate: 60000010 (nZCv q A32 LE aif -DIT -SSBS)
> > > [   37.574705] pc : 000000000098775c
> > > [   37.575063] lr : 0000000000986918
> > > [   37.575392] sp : 00000000ffd140a8
> > > [   37.575725] x12: 0000000000a36c10
> > > [   37.576443] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000020
> > > [   37.577872] x8 : 00000000ffd141c0 x7 : 00000000ffd14104 x6 : 0000000000986c9c
> > > [   37.579278] x5 : 000000000000001f x4 : 0000000000000004 x3 : 0000000000a37020
> > > [   37.580635] x2 : 0000000000000003 x1 : 0000000000001000 x0 : 0000000000000000
> > > [   37.582164] Kernel panic - not syncing: Asynchronous SError Interrupt
> > > [   37.582685] Kernel Offset: disabled
> > > [   37.582932] CPU features: 0x00001001,20000842
> > > [   37.583509] Memory Limit: none
> > > [   37.630058] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
> > > 
> > > and the kernel panics. This is a surprise, as I expected the app to just be killed by a SIGBUS.
> > > Is this expected?
> > > I see that the kernel looks for the RAS extension, but we don't have that.
> > > 
> > > Can anything be done not to panic the kernel for such accesses?
> > 
> > No. The error comes back to the CPU in an unattributable manner, so all 
> > it knows is that *something*, at some point in the past, went 
> > catastrophically wrong. Saying "this is fine..." and carrying on 
> > regardless isn't really viable. IIRC the RAS extension places 
> > constraints on the delivery of async SError such that it's slightly more 
> > possible to do something with, but without that all bets are off.
> 
> And this is because we don't have RAS? If we did have RAS,
> would/could the kernel sort out the error so that the app gets a
> SIGBUS or similar?

With RAS, the error would be containable, and attributed to the
userspace task by the kernel on the next exception. Without RAS, panic
is the only option, as we have no idea what the damage is. The machine
is on fire, for all we know.
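
To make the syndrome in the quoted log concrete, here is a minimal
sketch (Python, purely illustrative) that splits the reported code
0xbf000000 into its ESR_ELx fields. The field positions follow the
architectural ESR_ELx layout; decode_esr() and serror_is_containable()
are hypothetical names, and the containability check is only a
simplified model of the rule described above, not the kernel's code.

```python
# Architectural ESR_ELx field positions.
ESR_EC_SHIFT = 26            # exception class, bits [31:26]
ESR_EC_MASK = 0x3F
ESR_IL = 1 << 25             # instruction length bit
ESR_ISS_MASK = (1 << 25) - 1 # instruction specific syndrome, bits [24:0]
ESR_IDS = 1 << 24            # SError ISS: implementation-defined syndrome
EC_SERROR = 0x2F             # exception class for an SError interrupt


def decode_esr(esr):
    """Split a raw ESR_ELx value into the fields used below."""
    return {
        "ec": (esr >> ESR_EC_SHIFT) & ESR_EC_MASK,
        "il": bool(esr & ESR_IL),
        "iss": esr & ESR_ISS_MASK,
        "ids": bool(esr & ESR_IDS),
    }


def serror_is_containable(esr, has_ras):
    """Simplified model (assumption): an SError can only be contained
    if the CPU has the RAS extension and the syndrome is architected
    rather than implementation defined."""
    fields = decode_esr(esr)
    if fields["ec"] != EC_SERROR:
        raise ValueError("not an SError syndrome")
    if fields["ids"]:
        return False
    return has_ras


if __name__ == "__main__":
    fields = decode_esr(0xBF000000)
    print("EC=%#x IL=%d IDS=%d ISS=%#x" % (
        fields["ec"], fields["il"], fields["ids"], fields["iss"]))
    print("containable:", serror_is_containable(0xBF000000, has_ras=False))
```

For the value in the log this yields exception class 0x2f (SError)
with the implementation-defined syndrome bit set, so the syndrome
itself carries no architected description of what went wrong.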

> 
> > 
> > > Can one build some sort of blacklist of address ranges which the MMU will block?
> > 
> > Sure, just configure the kernel with CONFIG_DEVMEM=n and it should never 
> > access anything invalid.
> > I'm not even entirely joking there - even for address ranges that the 
> > kernel *does* know about, you can still SError or deadlock by poking at 
> > something that's currently clock-gated or powered off, or lose coherency 
> > and cause corruption by accessing memory with the wrong attributes; at 
> > worst writing the wrong thing to the wrong place may even physically 
> > damage the hardware.
> > 
> I know /dev/mem is bad, and it was only an example, but such SW errors
> can happen elsewhere too; we got one from a badly configured UIO device
> as well.  HW errors we just have to live with, but I hoped we could
> handle some SW errors better.

I think you have the wrong end of the stick here. This *is* a HW
error, and the HW tells you in no uncertain terms that something is
really bad.

If the device is supposed to be assignable to userspace, it either
must be designed not to respond with an SError no matter what userspace
is throwing at it (because let's face it, userspace will eventually do
something really bad), or the whole system must be designed in a way
that such an error can be contained and attributed to the offending
party.

Just giving userspace any odd device and hoping that it will all be
fine is unfortunately wishful thinking.

	M.

-- 
Without deviation from the norm, progress is not possible.

Thread overview: 10+ messages
2022-03-24 12:10 SError Interrupt on CPU0, code 0xbf000000 makes kernel panic Joakim Tjernlund
2022-03-24 13:16 ` Robin Murphy
2022-03-24 14:01   ` Joakim Tjernlund
2022-03-24 14:17     ` Marc Zyngier [this message]
2022-03-24 14:50       ` Joakim Tjernlund
2022-03-24 15:05         ` Robin Murphy
2022-03-24 15:11           ` Joakim Tjernlund
2022-03-24 15:25             ` Marc Zyngier
2022-03-24 15:42               ` Joakim Tjernlund
2022-03-24 15:54                 ` Robin Murphy
