* [RFC PATCH v2 0/3] arm64: Fix invalid si_codes for fault signals @ 2018-01-30 18:50 Dave Martin [not found] ` <1517338243-9749-1-git-send-email-Dave.Martin-5wv7dgnIgG8@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Dave Martin @ 2018-01-30 18:50 UTC (permalink / raw) To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r Cc: linux-arch-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, Eric W. Biederman, Will Deacon, Catalin Marinas, Russell King Changes since RFC v1: * Renamed FPE_UNKNOWN to FPE_FLTUNK to be more consistent with the standardised error codes. _FLT because _some_ kind of floating- point exception has occurred, and UNK because it is architecturally unkown what exception(s) occurred. "Undiagnosed" may be a better word and I include this in the accompanying comment, but FPE_FLTUND is already taken for floating-point underflow. * Renumbered FPE_FLTUNK as 14, since this is the next unused number. My assumption that the x86-specific codes weren't upstream yet was wrong... * Split this definition out as a separate patch, since it is likely applicable to other architectures. These patches are based on: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git siginfo-next ea64d5acc8f0 ("signal: Unify and correct copy_siginfo_to_user32") ... with 526c3ddb6aa2 ("signal/arm64: Document conflicts with SI_USER and SIGFPE,SIGTRAP,SIGBUS") reverted. Original blurb: As reported by Eric Biederman, [1], [2], [3] etc., several architectures are inappropriately setting si_code to 0 when signaling faults to userspace, which will interpret this value as SI_USER leading to garbage being read out of siginfo fields that the kernel doesn't initialise. This seems not to be a huge problem in practice, since many affected faults should only happen if the kernel or hardware is buggy or broken. However, there are some cases that don't fall under this. This RFC series proposes fixes to eliminate these si_code == 0 cases from arm64. Some of the changes may be controversial. There are two main headlines here, which may be applicable to other architectures: * addition of a new code FPE_UNKNOWN for undiagnosable floating-point exception traps; * delivering SIGKILL instead of SIGSEGV/BUS/TRAP etc. for "impossible" exceptions that are not feasible to recover from and likely indicate a kernel or system bug or failure. In particular there is likely to be a fair amount of overlap between arm and arm64 here, but due to the longer history and evolution of the arm tree and 32-bit architecture I'm less confident in making correct judgements for that case. This series doesn't touch arm, for now. This series has been build-tested only. [1] [PATCH 00/11] siginfo fixes/cleanups esp SI_USER https://marc.info/?l=linux-kernel&m=151571871109016&w=2 [2] [PATCH 08/11] signal/arm: Document conflicts with SI_USER and SIGFPE http://lists.infradead.org/pipermail/linux-arm-kernel/2018-January/553574.html [3] [PATCH 07/11] signal/arm64: Document conflicts with SI_USER and SIGFPE, SIGTRAP, SIGBUS lists.infradead.org/pipermail/linux-arm-kernel/2018-January/553573.html --- Dave Martin (3): signal: Add FPE_FLTUNK si_code for undiagnosable fp exceptions arm64: fpsimd: Fix bad si_code for undiagnosed SIGFPE arm64: signal: Ensure si_code is valid for all fault signals arch/arm64/include/asm/esr.h | 9 +++ arch/arm64/kernel/fpsimd.c | 31 +++++----- arch/arm64/mm/fault.c | 114 ++++++++++++++++++------------------- include/uapi/asm-generic/siginfo.h | 3 +- 4 files changed, 85 insertions(+), 72 deletions(-) -- 2.1.4 ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <1517338243-9749-1-git-send-email-Dave.Martin-5wv7dgnIgG8@public.gmane.org>]
* [RFC PATCH v2 1/3] signal: Add FPE_FLTUNK si_code for undiagnosable fp exceptions [not found] ` <1517338243-9749-1-git-send-email-Dave.Martin-5wv7dgnIgG8@public.gmane.org> @ 2018-01-30 18:50 ` Dave Martin 2018-01-30 18:50 ` [RFC PATCH v2 2/3] arm64: fpsimd: Fix bad si_code for undiagnosed SIGFPE Dave Martin 2018-01-30 18:50 ` [RFC PATCH v2 3/3] arm64: signal: Ensure si_code is valid for all fault signals Dave Martin 2 siblings, 0 replies; 7+ messages in thread From: Dave Martin @ 2018-01-30 18:50 UTC (permalink / raw) To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r Cc: linux-arch-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, Eric W. Biederman, Will Deacon, Catalin Marinas, Russell King Some architectures cannot always report accurately what kind of floating-point exception triggered a floating-point exception trap. This can occur with fp exceptions occuring on lanes in a vector instruction on arm64 for example. Rather than have every architecture come up with its own way of descrbing such a condition, this patch adds a common FPE_FLTUNK do report that an fp exception caused a trap but we cannot be certain which kind of fp exception it was. Signed-off-by: Dave Martin <Dave.Martin-5wv7dgnIgG8@public.gmane.org> --- include/uapi/asm-generic/siginfo.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h index 254afc3..450b31f 100644 --- a/include/uapi/asm-generic/siginfo.h +++ b/include/uapi/asm-generic/siginfo.h @@ -229,7 +229,8 @@ typedef struct siginfo { # define __FPE_INVASC 12 /* invalid ASCII digit */ # define __FPE_INVDEC 13 /* invalid decimal digit */ #endif -#define NSIGFPE 13 +#define FPE_FLTUNK 14 /* undiagnosed floating-point exception */ +#define NSIGFPE 14 /* * SIGSEGV si_codes -- 2.1.4 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [RFC PATCH v2 2/3] arm64: fpsimd: Fix bad si_code for undiagnosed SIGFPE [not found] ` <1517338243-9749-1-git-send-email-Dave.Martin-5wv7dgnIgG8@public.gmane.org> 2018-01-30 18:50 ` [RFC PATCH v2 1/3] signal: Add FPE_FLTUNK si_code for undiagnosable fp exceptions Dave Martin @ 2018-01-30 18:50 ` Dave Martin 2018-01-30 18:50 ` [RFC PATCH v2 3/3] arm64: signal: Ensure si_code is valid for all fault signals Dave Martin 2 siblings, 0 replies; 7+ messages in thread From: Dave Martin @ 2018-01-30 18:50 UTC (permalink / raw) To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r Cc: linux-arch-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, Eric W. Biederman, Will Deacon, Catalin Marinas, Russell King Currently a SIGFPE delivered in response to a floating-point exception trap may have si_code set to 0 on arm64. As reported by Eric, this is a bad idea since this is the value of SI_USER -- yet this signal is definitely not the result of kill(2), tgkill(2) etc. and si_uid and si_pid make limited sense whereas we do want to yield a value for si_addr (which doesn't exist for SI_USER). It's not entirely clear whether the architecure permits a "spurious" fp exception trap where none of the exception flag bits in ESR_ELx is set. (IMHO the architectural intent is to forbid this.) However, it does permit those bits to contain garbage if the TFV bit in ESR_ELx is 0. That case isn't currently handled at all and may result in si_code == 0 or si_code containing a FPE_FLT* constant corresponding to an exception that did not in fact happen. There is nothing sensible we can return for si_code in such cases, but SI_USER is certainly not appropriate and will lead to violation of legitimate userspace assumptions. This patch allocates a new si_code value FPE_UNKNOWN that at least does not conflict with any existing SI_* or FPE_* code, and yields this in si_code for undiagnosable cases. This is probably the best simplicity/incorrectness tradeoff achieveable without relying on implementation-dependent features or adding a lot of code. In any case, there appears to be no perfect solution possible that would justify a lot of effort here. Yielding FPE_UNKNOWN when some well-defined fp exception caused the trap is a violation of POSIX, but this is forced by the architecture. We have no realistic prospect of yielding the correct code in such cases. At present I am not aware of any ARMv8 implementation that supports trapped floating-point exceptions in any case. The new code may be applicable to other architectures for similar reasons. No attempt is made to provide ESR_ELx to userspace in the signal frame, since architectural limitations mean that it is unlikely to provide much diagnostic value, doesn't benefit existing software and would create ABI with no proven purpose. The existing mechanism for passing it also has problems of its own which may result in the wrong value being passed to userspace due to interaction with mm faults. The implied rework does not appear justified. Reported-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> Signed-off-by: Dave Martin <Dave.Martin-5wv7dgnIgG8@public.gmane.org> --- arch/arm64/include/asm/esr.h | 9 +++++++++ arch/arm64/kernel/fpsimd.c | 27 +++++++++++++++------------ 2 files changed, 24 insertions(+), 12 deletions(-) diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h index 014d7d8..c585e91 100644 --- a/arch/arm64/include/asm/esr.h +++ b/arch/arm64/include/asm/esr.h @@ -220,6 +220,15 @@ (((e) & ESR_ELx_SYS64_ISS_OP2_MASK) >> \ ESR_ELx_SYS64_ISS_OP2_SHIFT)) +/* + * ISS field definitions for floating-point exception traps + * (FP_EXC_32/FP_EXC_64). + * + * (The FPEXC_* constants are used instead for common bits.) + */ + +#define ESR_ELx_FP_EXC_TFV (UL(1) << 23) + #ifndef __ASSEMBLY__ #include <asm/types.h> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index fae81f7..4fecda1 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -39,6 +39,7 @@ #include <linux/slab.h> #include <linux/sysctl.h> +#include <asm/esr.h> #include <asm/fpsimd.h> #include <asm/cputype.h> #include <asm/simd.h> @@ -867,18 +868,20 @@ asmlinkage void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs) asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs) { siginfo_t info; - unsigned int si_code = 0; - - if (esr & FPEXC_IOF) - si_code = FPE_FLTINV; - else if (esr & FPEXC_DZF) - si_code = FPE_FLTDIV; - else if (esr & FPEXC_OFF) - si_code = FPE_FLTOVF; - else if (esr & FPEXC_UFF) - si_code = FPE_FLTUND; - else if (esr & FPEXC_IXF) - si_code = FPE_FLTRES; + unsigned int si_code = FPE_FLTUNK; + + if (esr & ESR_ELx_FP_EXC_TFV) { + if (esr & FPEXC_IOF) + si_code = FPE_FLTINV; + else if (esr & FPEXC_DZF) + si_code = FPE_FLTDIV; + else if (esr & FPEXC_OFF) + si_code = FPE_FLTOVF; + else if (esr & FPEXC_UFF) + si_code = FPE_FLTUND; + else if (esr & FPEXC_IXF) + si_code = FPE_FLTRES; + } memset(&info, 0, sizeof(info)); info.si_signo = SIGFPE; -- 2.1.4 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [RFC PATCH v2 3/3] arm64: signal: Ensure si_code is valid for all fault signals [not found] ` <1517338243-9749-1-git-send-email-Dave.Martin-5wv7dgnIgG8@public.gmane.org> 2018-01-30 18:50 ` [RFC PATCH v2 1/3] signal: Add FPE_FLTUNK si_code for undiagnosable fp exceptions Dave Martin 2018-01-30 18:50 ` [RFC PATCH v2 2/3] arm64: fpsimd: Fix bad si_code for undiagnosed SIGFPE Dave Martin @ 2018-01-30 18:50 ` Dave Martin 2018-02-13 13:58 ` James Morse 2 siblings, 1 reply; 7+ messages in thread From: Dave Martin @ 2018-01-30 18:50 UTC (permalink / raw) To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r Cc: linux-arch-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, Eric W. Biederman, Will Deacon, Catalin Marinas, Russell King Currently, as reported by Eric, an invalid si_code value 0 is passed in many signals delivered to userspace in response to faults and other kernel errors. Typically 0 is passed when the fault is insufficiently diagnosable or when there does not appear to be any sensible alternative value to choose. This appears to violate POSIX, and is intuitively wrong for at least two reasons arising from the fact that 0 == SI_USER: 1) si_code is a union selector, and SI_USER (and si_code <= 0 in general) implies the existence of a different set of fields (siginfo._kill) from that which exists for a fault signal (siginfo._sigfault). However, the code raising the signal typically writes only the _sigfault fields, and the _kill fields make no sense in this case. Thus when userspace sees si_code == 0 (SI_USER) it may legitimately inspect fields in the inactive union member _kill and obtain garbage as a result. There appears to be software in the wild relying on this, albeit generally only for printing diagnostic messages. 2) Software that wants to be robust against spurious signals may discard signals where si_code == SI_USER (or <= 0), or may filter such signals based on the si_uid and si_pid fields of siginfo._sigkill. In the case of fault signals, this means that important (and usually fatal) error conditions may be silently ignored. In practice, many of the faults for which arm64 passes si_code == 0 are undiagnosable conditions such as exceptions with syndrome values in ESR_ELx to which the architecture does not yet assign any meaning, or conditions indicative of a bug or error in the kernel or system and thus that are unrecoverable and should never occur in normal operation. The approach taken in this patch is to translate all such undiagnosable or "impossible" synchronous fault conditions to SIGKILL, since these are at least probably localisable to a single process. Some of these conditions should really result in a kernel panic, but due to the lack of diagnostic information it is difficult to be certain: this patch does not add any calls to panic(), but this could change later if justified. Although si_code will not reach userspace in the case of SIGKILL, it is still desirable to pass a nonzero value so that the common siginfo handling code can detect incorrect use of si_code == 0 without false positives. In this case the si_code dependent siginfo fields will not be correctly initialised, but since they are not passed to userspace I deem this not to matter. A few faults can reasonably occur in realistic userspace scenarios, and _should_ raise a regular, handleable (but perhaps not ignorable/blockable) signal: for these, this patch attempts to choose a suitable standard si_code value for the raised signal in each case instead of 0. Reported-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> Signed-off-by: Dave Martin <Dave.Martin-5wv7dgnIgG8@public.gmane.org> --- arch/arm64/kernel/fpsimd.c | 4 +- arch/arm64/mm/fault.c | 114 ++++++++++++++++++++++----------------------- 2 files changed, 59 insertions(+), 59 deletions(-) diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 4fecda1..944f24b 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -286,8 +286,8 @@ static void task_fpsimd_save(void) * re-enter user with corrupt state. * There's no way to recover, so kill it: */ - force_signal_inject( - SIGKILL, 0, current_pt_regs(), 0); + force_signal_inject(SIGKILL, SI_KERNEL, + current_pt_regs(), 0); return; } diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 9b7f89d..4baa922 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) info.si_signo = SIGBUS; info.si_errno = 0; - info.si_code = 0; + info.si_code = BUS_OBJERR; if (esr & ESR_ELx_FnV) info.si_addr = NULL; else @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) } static const struct fault_info fault_info[] = { - { do_bad, SIGBUS, 0, "ttbr address size fault" }, - { do_bad, SIGBUS, 0, "level 1 address size fault" }, - { do_bad, SIGBUS, 0, "level 2 address size fault" }, - { do_bad, SIGBUS, 0, "level 3 address size fault" }, + { do_bad, SIGKILL, SI_KERNEL, "ttbr address size fault" }, + { do_bad, SIGKILL, SI_KERNEL, "level 1 address size fault" }, + { do_bad, SIGKILL, SI_KERNEL, "level 2 address size fault" }, + { do_bad, SIGKILL, SI_KERNEL, "level 3 address size fault" }, { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 0 translation fault" }, { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 1 translation fault" }, { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 2 translation fault" }, { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 3 translation fault" }, - { do_bad, SIGBUS, 0, "unknown 8" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 8" }, { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 1 access flag fault" }, { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 access flag fault" }, { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 access flag fault" }, - { do_bad, SIGBUS, 0, "unknown 12" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 12" }, { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 1 permission fault" }, { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 permission fault" }, { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 permission fault" }, - { do_sea, SIGBUS, 0, "synchronous external abort" }, - { do_bad, SIGBUS, 0, "unknown 17" }, - { do_bad, SIGBUS, 0, "unknown 18" }, - { do_bad, SIGBUS, 0, "unknown 19" }, - { do_sea, SIGBUS, 0, "level 0 (translation table walk)" }, - { do_sea, SIGBUS, 0, "level 1 (translation table walk)" }, - { do_sea, SIGBUS, 0, "level 2 (translation table walk)" }, - { do_sea, SIGBUS, 0, "level 3 (translation table walk)" }, - { do_sea, SIGBUS, 0, "synchronous parity or ECC error" }, // Reserved when RAS is implemented - { do_bad, SIGBUS, 0, "unknown 25" }, - { do_bad, SIGBUS, 0, "unknown 26" }, - { do_bad, SIGBUS, 0, "unknown 27" }, - { do_sea, SIGBUS, 0, "level 0 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented - { do_sea, SIGBUS, 0, "level 1 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented - { do_sea, SIGBUS, 0, "level 2 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented - { do_sea, SIGBUS, 0, "level 3 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented - { do_bad, SIGBUS, 0, "unknown 32" }, + { do_sea, SIGBUS, BUS_OBJERR, "synchronous external abort" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 17" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 18" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 19" }, + { do_sea, SIGKILL, SI_KERNEL, "level 0 (translation table walk)" }, + { do_sea, SIGKILL, SI_KERNEL, "level 1 (translation table walk)" }, + { do_sea, SIGKILL, SI_KERNEL, "level 2 (translation table walk)" }, + { do_sea, SIGKILL, SI_KERNEL, "level 3 (translation table walk)" }, + { do_sea, SIGBUS, BUS_OBJERR, "synchronous parity or ECC error" }, // Reserved when RAS is implemented + { do_bad, SIGKILL, SI_KERNEL, "unknown 25" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 26" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 27" }, + { do_sea, SIGKILL, SI_KERNEL, "level 0 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented + { do_sea, SIGKILL, SI_KERNEL, "level 1 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented + { do_sea, SIGKILL, SI_KERNEL, "level 2 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented + { do_sea, SIGKILL, SI_KERNEL, "level 3 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented + { do_bad, SIGKILL, SI_KERNEL, "unknown 32" }, { do_alignment_fault, SIGBUS, BUS_ADRALN, "alignment fault" }, - { do_bad, SIGBUS, 0, "unknown 34" }, - { do_bad, SIGBUS, 0, "unknown 35" }, - { do_bad, SIGBUS, 0, "unknown 36" }, - { do_bad, SIGBUS, 0, "unknown 37" }, - { do_bad, SIGBUS, 0, "unknown 38" }, - { do_bad, SIGBUS, 0, "unknown 39" }, - { do_bad, SIGBUS, 0, "unknown 40" }, - { do_bad, SIGBUS, 0, "unknown 41" }, - { do_bad, SIGBUS, 0, "unknown 42" }, - { do_bad, SIGBUS, 0, "unknown 43" }, - { do_bad, SIGBUS, 0, "unknown 44" }, - { do_bad, SIGBUS, 0, "unknown 45" }, - { do_bad, SIGBUS, 0, "unknown 46" }, - { do_bad, SIGBUS, 0, "unknown 47" }, - { do_bad, SIGBUS, 0, "TLB conflict abort" }, - { do_bad, SIGBUS, 0, "Unsupported atomic hardware update fault" }, - { do_bad, SIGBUS, 0, "unknown 50" }, - { do_bad, SIGBUS, 0, "unknown 51" }, - { do_bad, SIGBUS, 0, "implementation fault (lockdown abort)" }, - { do_bad, SIGBUS, 0, "implementation fault (unsupported exclusive)" }, - { do_bad, SIGBUS, 0, "unknown 54" }, - { do_bad, SIGBUS, 0, "unknown 55" }, - { do_bad, SIGBUS, 0, "unknown 56" }, - { do_bad, SIGBUS, 0, "unknown 57" }, - { do_bad, SIGBUS, 0, "unknown 58" }, - { do_bad, SIGBUS, 0, "unknown 59" }, - { do_bad, SIGBUS, 0, "unknown 60" }, - { do_bad, SIGBUS, 0, "section domain fault" }, - { do_bad, SIGBUS, 0, "page domain fault" }, - { do_bad, SIGBUS, 0, "unknown 63" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 34" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 35" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 36" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 37" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 38" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 39" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 40" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 41" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 42" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 43" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 44" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 45" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 46" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 47" }, + { do_bad, SIGKILL, SI_KERNEL, "TLB conflict abort" }, + { do_bad, SIGKILL, SI_KERNEL, "Unsupported atomic hardware update fault" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 50" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 51" }, + { do_bad, SIGKILL, SI_KERNEL, "implementation fault (lockdown abort)" }, + { do_bad, SIGBUS, BUS_OBJERR, "implementation fault (unsupported exclusive)" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 54" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 55" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 56" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 57" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 58" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 59" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 60" }, + { do_bad, SIGKILL, SI_KERNEL, "section domain fault" }, + { do_bad, SIGKILL, SI_KERNEL, "page domain fault" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 63" }, }; int handle_guest_sea(phys_addr_t addr, unsigned int esr) @@ -739,11 +739,11 @@ static struct fault_info __refdata debug_fault_info[] = { { do_bad, SIGTRAP, TRAP_HWBKPT, "hardware breakpoint" }, { do_bad, SIGTRAP, TRAP_HWBKPT, "hardware single-step" }, { do_bad, SIGTRAP, TRAP_HWBKPT, "hardware watchpoint" }, - { do_bad, SIGBUS, 0, "unknown 3" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 3" }, { do_bad, SIGTRAP, TRAP_BRKPT, "aarch32 BKPT" }, - { do_bad, SIGTRAP, 0, "aarch32 vector catch" }, + { do_bad, SIGKILL, SI_KERNEL, "aarch32 vector catch" }, { early_brk64, SIGTRAP, TRAP_BRKPT, "aarch64 BRK" }, - { do_bad, SIGBUS, 0, "unknown 7" }, + { do_bad, SIGKILL, SI_KERNEL, "unknown 7" }, }; void __init hook_debug_fault_code(int nr, -- 2.1.4 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFC PATCH v2 3/3] arm64: signal: Ensure si_code is valid for all fault signals 2018-01-30 18:50 ` [RFC PATCH v2 3/3] arm64: signal: Ensure si_code is valid for all fault signals Dave Martin @ 2018-02-13 13:58 ` James Morse 2018-02-13 15:22 ` Dave Martin 0 siblings, 1 reply; 7+ messages in thread From: James Morse @ 2018-02-13 13:58 UTC (permalink / raw) To: Dave Martin, linux-arm-kernel Cc: linux-arch, linux-api, Will Deacon, Russell King, Eric W. Biederman, Catalin Marinas Hi Dave, On 30/01/18 18:50, Dave Martin wrote: > Currently, as reported by Eric, an invalid si_code value 0 is > passed in many signals delivered to userspace in response to faults > and other kernel errors. Typically 0 is passed when the fault is > insufficiently diagnosable or when there does not appear to be any > sensible alternative value to choose. > > This appears to violate POSIX, and is intuitively wrong for at > least two reasons arising from the fact that 0 == SI_USER: > > 1) si_code is a union selector, and SI_USER (and si_code <= 0 in > general) implies the existence of a different set of fields > (siginfo._kill) from that which exists for a fault signal > (siginfo._sigfault). However, the code raising the signal > typically writes only the _sigfault fields, and the _kill > fields make no sense in this case. > > Thus when userspace sees si_code == 0 (SI_USER) it may > legitimately inspect fields in the inactive union member _kill > and obtain garbage as a result. > > There appears to be software in the wild relying on this, > albeit generally only for printing diagnostic messages. > > 2) Software that wants to be robust against spurious signals may > discard signals where si_code == SI_USER (or <= 0), or may > filter such signals based on the si_uid and si_pid fields of > siginfo._sigkill. In the case of fault signals, this means > that important (and usually fatal) error conditions may be > silently ignored. > > In practice, many of the faults for which arm64 passes si_code == 0 > are undiagnosable conditions such as exceptions with syndrome > values in ESR_ELx to which the architecture does not yet assign any > meaning, or conditions indicative of a bug or error in the kernel > or system and thus that are unrecoverable and should never occur in > normal operation. > > The approach taken in this patch is to translate all such > undiagnosable or "impossible" synchronous fault conditions to > SIGKILL, since these are at least probably localisable to a single > process. Some of these conditions should really result in a kernel > panic, but due to the lack of diagnostic information it is > difficult to be certain: this patch does not add any calls to > panic(), but this could change later if justified. > > Although si_code will not reach userspace in the case of SIGKILL, > it is still desirable to pass a nonzero value so that the common > siginfo handling code can detect incorrect use of si_code == 0 > without false positives. In this case the si_code dependent > siginfo fields will not be correctly initialised, but since they > are not passed to userspace I deem this not to matter. > > A few faults can reasonably occur in realistic userspace scenarios, > and _should_ raise a regular, handleable (but perhaps not > ignorable/blockable) signal: for these, this patch attempts to > choose a suitable standard si_code value for the raised signal in > each case instead of 0. > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c > index 9b7f89d..4baa922 100644 > --- a/arch/arm64/mm/fault.c > +++ b/arch/arm64/mm/fault.c > @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) [..] > + { do_sea, SIGKILL, SI_KERNEL, "level 0 (translation table walk)" }, > + { do_sea, SIGKILL, SI_KERNEL, "level 1 (translation table walk)" }, > + { do_sea, SIGKILL, SI_KERNEL, "level 2 (translation table walk)" }, > + { do_sea, SIGKILL, SI_KERNEL, "level 3 (translation table walk)" }, > + { do_sea, SIGBUS, BUS_OBJERR, "synchronous parity or ECC error" }, // Reserved when RAS is implemented I agree the translation-table related external-aborts should end up with SIGKILL: there is nothing user-space can do. You use the fault_info table to vary the signal and si_code that should be used, but do_mem_abort() only uses these if the fn returns an error. For do_sea(), regardless of the values in this table SIGBUS will be generated as it always returns 0. > @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) > > info.si_signo = SIGBUS; > info.si_errno = 0; > - info.si_code = 0; > + info.si_code = BUS_OBJERR; > if (esr & ESR_ELx_FnV) > info.si_addr = NULL; > else do_sea() has the right fault_info entry to hand, so I think these need to change to inf->sig and inf->code. (I assume its not valid to set si_addr for SIGKILL...) Thanks, James ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH v2 3/3] arm64: signal: Ensure si_code is valid for all fault signals 2018-02-13 13:58 ` James Morse @ 2018-02-13 15:22 ` Dave Martin [not found] ` <20180213152207.GP5862-M5GwZQ6tE7x5pKCnmE3YQBJ8xKzm50AiAL8bYrjMMd8@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Dave Martin @ 2018-02-13 15:22 UTC (permalink / raw) To: James Morse Cc: linux-arm-kernel, linux-arch, linux-api, Will Deacon, Russell King, Eric W. Biederman, Catalin Marinas On Tue, Feb 13, 2018 at 01:58:55PM +0000, James Morse wrote: > Hi Dave, > > On 30/01/18 18:50, Dave Martin wrote: [...] > > The approach taken in this patch is to translate all such > > undiagnosable or "impossible" synchronous fault conditions to > > SIGKILL, since these are at least probably localisable to a single > > process. Some of these conditions should really result in a kernel > > panic, but due to the lack of diagnostic information it is > > difficult to be certain: this patch does not add any calls to > > panic(), but this could change later if justified. > > > > Although si_code will not reach userspace in the case of SIGKILL, > > it is still desirable to pass a nonzero value so that the common > > siginfo handling code can detect incorrect use of si_code == 0 > > without false positives. In this case the si_code dependent > > siginfo fields will not be correctly initialised, but since they > > are not passed to userspace I deem this not to matter. > > > > A few faults can reasonably occur in realistic userspace scenarios, > > and _should_ raise a regular, handleable (but perhaps not > > ignorable/blockable) signal: for these, this patch attempts to > > choose a suitable standard si_code value for the raised signal in > > each case instead of 0. > > > > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c > > index 9b7f89d..4baa922 100644 > > --- a/arch/arm64/mm/fault.c > > +++ b/arch/arm64/mm/fault.c > > @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) > [..] > > + { do_sea, SIGKILL, SI_KERNEL, "level 0 (translation table walk)" }, > > + { do_sea, SIGKILL, SI_KERNEL, "level 1 (translation table walk)" }, > > + { do_sea, SIGKILL, SI_KERNEL, "level 2 (translation table walk)" }, > > + { do_sea, SIGKILL, SI_KERNEL, "level 3 (translation table walk)" }, > > + { do_sea, SIGBUS, BUS_OBJERR, "synchronous parity or ECC error" }, // Reserved when RAS is implemented > > I agree the translation-table related external-aborts should end up with > SIGKILL: there is nothing user-space can do. > > You use the fault_info table to vary the signal and si_code that should be used, > but do_mem_abort() only uses these if the fn returns an error. For do_sea(), > regardless of the values in this table SIGBUS will be generated as it always > returns 0. > > > > @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr, > struct pt_regs *regs) > > > > info.si_signo = SIGBUS; > > info.si_errno = 0; > > - info.si_code = 0; > > + info.si_code = BUS_OBJERR; > > if (esr & ESR_ELx_FnV) > > info.si_addr = NULL; > > else > > do_sea() has the right fault_info entry to hand, so I think these need to change > to inf->sig and inf->code. (I assume its not valid to set si_addr for SIGKILL...) Yes, I guess that makes sense. For SIGKILL, I'm assuming that it is harmless to populate si_addr: even though not strictly valid, the signal is never delivered to userspace. Even ptrace cannot see SIGKILL -- the trace just disappears and further ptrace calls fail with ESRCH. If is matters, I guess we could prepopulate si_uid = si_pid = 0 for this case. That's at least cleaner, so I might do that. For do_sea: I was thinking of the fault_info[] table entries as for the fallback case only, but (a) I also try to use them to affect what do_sea() does (which, as you observe, doesn't work right now), and (b) there's no reason why they shouldn't inform what fn does. So I think you're right. However, rather than duplicate code I wonder whether we can just rearrange do_mem_abort() so that the lines info.si_signo = inf->sig; info.si_errno = 0; info.si_code = inf->code; info.si_addr = (void __user *)addr; are moved ahead of the call to inf->fn(). This would have the effect of pre-populating info with sane defaults while still allowing inf->fn() to override them if appropriate. Thoughts? Cheers ---Dave ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20180213152207.GP5862-M5GwZQ6tE7x5pKCnmE3YQBJ8xKzm50AiAL8bYrjMMd8@public.gmane.org>]
* Re: [RFC PATCH v2 3/3] arm64: signal: Ensure si_code is valid for all fault signals [not found] ` <20180213152207.GP5862-M5GwZQ6tE7x5pKCnmE3YQBJ8xKzm50AiAL8bYrjMMd8@public.gmane.org> @ 2018-02-13 18:00 ` James Morse 0 siblings, 0 replies; 7+ messages in thread From: James Morse @ 2018-02-13 18:00 UTC (permalink / raw) To: Dave Martin Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-arch-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Russell King, Eric W. Biederman, Catalin Marinas Hi Dave, On 13/02/18 15:22, Dave Martin wrote: > On Tue, Feb 13, 2018 at 01:58:55PM +0000, James Morse wrote: >> On 30/01/18 18:50, Dave Martin wrote: >>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c >>> index 9b7f89d..4baa922 100644 >>> --- a/arch/arm64/mm/fault.c >>> +++ b/arch/arm64/mm/fault.c >>> @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) >> [..] >>> + { do_sea, SIGKILL, SI_KERNEL, "level 0 (translation table walk)" }, >>> + { do_sea, SIGKILL, SI_KERNEL, "level 1 (translation table walk)" }, >>> + { do_sea, SIGKILL, SI_KERNEL, "level 2 (translation table walk)" }, >>> + { do_sea, SIGKILL, SI_KERNEL, "level 3 (translation table walk)" }, >>> + { do_sea, SIGBUS, BUS_OBJERR, "synchronous parity or ECC error" }, // Reserved when RAS is implemented >> >> I agree the translation-table related external-aborts should end up with >> SIGKILL: there is nothing user-space can do. >> >> You use the fault_info table to vary the signal and si_code that should be used, >> but do_mem_abort() only uses these if the fn returns an error. For do_sea(), >> regardless of the values in this table SIGBUS will be generated as it always >> returns 0. >> >> >>> @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr, >> struct pt_regs *regs) >>> >>> info.si_signo = SIGBUS; >>> info.si_errno = 0; >>> - info.si_code = 0; >>> + info.si_code = BUS_OBJERR; >>> if (esr & ESR_ELx_FnV) >>> info.si_addr = NULL; >>> else >> >> do_sea() has the right fault_info entry to hand, so I think these need to change >> to inf->sig and inf->code. (I assume its not valid to set si_addr for SIGKILL...) > > Yes, I guess that makes sense. > > For SIGKILL, I'm assuming that it is harmless to populate si_addr: even > though not strictly valid, the signal is never delivered to userspace. > Even ptrace cannot see SIGKILL -- the trace just disappears and further > ptrace calls fail with ESRCH. Good point! > If is matters, I guess we could prepopulate si_uid = si_pid = 0 for > this case. That's at least cleaner, so I might do that. > > > For do_sea: > > I was thinking of the fault_info[] table entries as for the fallback > case only, but (a) I also try to use them to affect what do_sea() does > (which, as you observe, doesn't work right now), and (b) there's no > reason why they shouldn't inform what fn does. Sure, > However, rather than duplicate code I wonder whether we can just > rearrange do_mem_abort() so that the lines > > info.si_signo = inf->sig; > info.si_errno = 0; > info.si_code = inf->code; > info.si_addr = (void __user *)addr; > > are moved ahead of the call to inf->fn(). > > This would have the effect of pre-populating info with sane defaults > while still allowing inf->fn() to override them if appropriate. I like the idea. It's a bit strange that do_mem_abort() looks up the table entry to call the handler, which looks up the table entry to find out what it should do. (__do_user_fault() already does this). This would change all of 'fn's prototypes, to save the struct-siginfo duplication in do_sea() and __do_user_fault(). Should the 'leaf' helpers still send the signal, or update the siginfo and return back to do_mem_abort()? Getting things like do_alignment_fault() in a kernel stack trace is the only reason I can see... Thanks, James ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-02-13 18:00 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-01-30 18:50 [RFC PATCH v2 0/3] arm64: Fix invalid si_codes for fault signals Dave Martin [not found] ` <1517338243-9749-1-git-send-email-Dave.Martin-5wv7dgnIgG8@public.gmane.org> 2018-01-30 18:50 ` [RFC PATCH v2 1/3] signal: Add FPE_FLTUNK si_code for undiagnosable fp exceptions Dave Martin 2018-01-30 18:50 ` [RFC PATCH v2 2/3] arm64: fpsimd: Fix bad si_code for undiagnosed SIGFPE Dave Martin 2018-01-30 18:50 ` [RFC PATCH v2 3/3] arm64: signal: Ensure si_code is valid for all fault signals Dave Martin 2018-02-13 13:58 ` James Morse 2018-02-13 15:22 ` Dave Martin [not found] ` <20180213152207.GP5862-M5GwZQ6tE7x5pKCnmE3YQBJ8xKzm50AiAL8bYrjMMd8@public.gmane.org> 2018-02-13 18:00 ` James Morse
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).