From: james.morse@arm.com (James Morse)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 07/11] signal/arm64: Document conflicts with SI_USER and SIGFPE, SIGTRAP, SIGBUS
Date: Mon, 15 Jan 2018 19:30:26 +0000
Message-ID: <5A5D0152.3090707@arm.com>
In-Reply-To: <20180115163028.GU22781@e103592.cambridge.arm.com>

Hi Dave,

Thanks for going through all these,

On 15/01/18 16:30, Dave Martin wrote:
> On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote:
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 9b7f89df49db..abe200587334 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)

>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"synchronous external abort"	},
> 
> This si_code seems to be a fallback for if ACPI is absent or doesn't
> know what to do with this error.
> 
> -> SIGBUS/BUS_OBJERR?
> 
> Can probably legitimately happen for userspace for suitable MMIO mappings.

It can happen for normal memory too: there are specific ESR values for
parity/checksum errors when reading/writing memory. I think this first one is
'other/unknown', and it's up to the CPU how to classify them.


> Perhaps it's more serious though in the presence of ACPI.  Do we expect
> that ACPI can diagnose all localisable errors?

It's not just ACPI: the CPU's v8.2 RAS Extensions use this
synchronous-external-abort as notification of a RAS error (the other details
are written to memory-mapped nodes). With the v8.2 RAS Extensions the ESR
tells us if the error was contained.
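
As a rough sketch of that last point (not kernel code): with the RAS
Extensions, a data abort whose fault status code is 'synchronous external
abort' carries a Synchronous Error Type field in ESR bits [12:11]. The macro
and function names below are made up for illustration, and the encodings are
as I read them from the Arm ARM, so check them against the spec before relying
on them:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; field layout as described in the Arm ARM for data aborts. */
#define SKETCH_ESR_FSC_MASK	0x3fu	/* fault status code, ESR bits [5:0] */
#define SKETCH_ESR_FSC_SEA	0x10u	/* synchronous external abort, not on a table walk */
#define SKETCH_ESR_SET_SHIFT	11	/* Synchronous Error Type, ESR bits [12:11] */
#define SKETCH_ESR_SET_MASK	(0x3u << SKETCH_ESR_SET_SHIFT)
#define SKETCH_ESR_SET_UER	(0x0u << SKETCH_ESR_SET_SHIFT)	/* recoverable state */
#define SKETCH_ESR_SET_UC	(0x2u << SKETCH_ESR_SET_SHIFT)	/* uncontainable */
#define SKETCH_ESR_SET_UEO	(0x3u << SKETCH_ESR_SET_SHIFT)	/* restartable state */

/*
 * Returns true only if the ESR positively says the error is contained.
 * Assumes the CPU implements the RAS Extensions: without them these bits
 * are RES0 and tell us nothing. Anything unrecognised counts as not contained.
 */
static bool sketch_sea_is_contained(uint32_t esr)
{
	if ((esr & SKETCH_ESR_FSC_MASK) != SKETCH_ESR_FSC_SEA)
		return false;	/* SET is only meaningful for this fault status code */

	switch (esr & SKETCH_ESR_SET_MASK) {
	case SKETCH_ESR_SET_UER:
	case SKETCH_ESR_SET_UEO:
		return true;
	default:		/* uncontainable, or a reserved encoding */
		return false;
	}
}
```

The default case deliberately errs towards 'not contained', which matches the
idea of only treating an error as survivable when we have been told it is.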

For ACPI we rely on firmware to set an appropriate severity in the CPER records
it generates. The APEI helpers will call panic() if they find a fatal error.

For systems with neither {firmware,kernel}-first RAS, BUS_OBJERR looks like a
good choice.


>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 0 (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 1 (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 2 (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 3 (translation table walk)"	},
> 
> Pagetable screwup or kernel/system/CPU bug -> SIGKILL, or panic().

(RAS mechanisms may claim this and send their own signals, if not:)

SIGKILL is probably a better choice here: while we do have an address, there is
nothing user-space can do about it.


>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"synchronous parity or ECC error" },	// Reserved when RAS is implemented
> 
> Possibly SIGBUS/BUS_MCEERR_AR (though I don't know exactly what
> userspace is supposed to do with this or whether this implies the
> existence or certain kernel features for managing the error that
> may not be present on arm64...)

I'd like to keep the MCEERR signals to errors that we know are contained, and
that the kernel has understood and handled.

(These features do exist for arm64, enabling CONFIG_MEMORY_FAILURE and a few
APEI options allows all this to work today with suitable firmware. My Seattle
claims to support it).


> Otherwise, SIGKILL.

Sounds good,


>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 0 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 1 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 2 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
>> +	{ do_sea,		SIGBUS,  BUS_FIXME,	"level 3 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
> 
> Process page tables corrupt: if the kernel couldn't fix this, the
> process can't reasonably fix it -> SIGKILL
> 
> Since this is a RAS-type error it could be triggered by a cosmic ray
> rather than requiring a kernel or system bug or other major failure, so
> we probably shouldn't panic the system if the error is localisable to a
> particular process.

Without the RAS-Extensions severity to tell us the error is contained I'm not
sure what we can expect. But given the page-tables are per-process, and we never
swap them to disk etc, it's probably a safe bet that it doesn't matter either way
for these.


Thanks,

James
