From: Kees Cook <kees@kernel.org>
To: Andrew Zaborowski <andrew.zaborowski@intel.com>
Cc: linux-edac@vger.kernel.org, linux-mm@kvack.org,
Tony Luck <tony.luck@intel.com>,
Eric Biederman <ebiederm@xmission.com>,
Borislav Petkov <bp@alien8.de>,
x86@kernel.org
Subject: Re: [RESEND][PATCH 1/3] x86: Add task_struct flag to force SIGBUS on MCE
Date: Mon, 5 Aug 2024 21:36:40 -0700 [thread overview]
Message-ID: <202408052135.342F9455@keescook> (raw)
In-Reply-To: <20240723144752.1478226-1-andrew.zaborowski@intel.com>
On Tue, Jul 23, 2024 at 04:47:50PM +0200, Andrew Zaborowski wrote:
> Uncorrected memory errors for user pages are signaled to processes
> using SIGBUS or, if the error happens in a syscall, an error retval
> from the syscall. The SIGBUS is documented in
> Documentation/mm/hwpoison.rst#failure-recovery-modes
>
> But there are corner cases where we cannot or don't want to return a
> plain error from the syscall. Subsequent commits covers two such cases:
> execve and rseq. Current code, in both places, will kill the task with a
> SIGSEGV on error. While not explicitly stated, it can be argued that it
> should be a SIGBUS, for consistency and for the benefit of the userspace
> signal handlers. Even if the process cannot handle the signal, perhaps
> the parent process can. This was the case in the scenario that
> motivated this patch.
>
> In both cases, the architecture's exception handler (MCE handler on x86)
> will queue a call to memory_failure. This doesn't work because the
> syscall-specific code sees the -EFAULT and terminates the task before
> the queued work runs.
>
> To fix this: 1. let pending work run in the error cases in both places.
>
> And 2. on MCE, ensure memory_failure() is passed MF_ACTION_REQUIRED so
> that the SIGBUS is queued. Normally when the MCE is in a syscall,
> a fixup of return IP and a call to kill_me_never() are what we want.
> But in this case it's necessary to queue kill_me_maybe() which will set
> MF_ACTION_REQUIRED which is checked by memory_failure().
>
> To do this the syscall code will set current->kill_on_efault, a new
> task_struct flag. Check that flag in
> arch/x86/kernel/cpu/mce/core.c:do_machine_check()
>
> Note: the flag is not x86 specific even if only x86 handling is being
> added here. The definition could be guarded by #ifdef
> CONFIG_MEMORY_FAILURE, but it would then need set/clear utilities.
>
> Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
> ---
> Resending through an SMTP server that won't add the company footer.
>
> This is a v2 of
> https://lore.kernel.org/linux-mm/20240501015340.3014724-1-andrew.zaborowski@intel.com/
> In the v1 the existing flag current->in_execve was being reused instead
> of adding a new one. Kees Cook commented in
> https://lore.kernel.org/linux-mm/202405010915.465AF19@keescook/ that
> current->in_execve is going away. Lacking a better idea and seeing
> that execve() and rseq() would benefit from using a common mechanism, I
> decided to add this new flag.
>
> Perhaps with a better name current->kill_on_efault could replace
> brpm->point_of_no_return to offset the pain of having this extra flag.
> ---
> arch/x86/kernel/cpu/mce/core.c | 18 +++++++++++++++++-
Since this touches arch/x86/, can an x86 maintainer review this? I can
carry this via the execve tree...
-Kees
> include/linux/sched.h | 2 ++
> 2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index ad0623b65..13f2ace3d 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1611,7 +1611,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
> if (p)
> SetPageHWPoison(p);
> }
> - } else {
> + } else if (!current->kill_on_efault) {
> /*
> * Handle an MCE which has happened in kernel space but from
> * which the kernel can recover: ex_has_fault_handler() has
> @@ -1628,6 +1628,22 @@ noinstr void do_machine_check(struct pt_regs *regs)
>
> if (m.kflags & MCE_IN_KERNEL_COPYIN)
> queue_task_work(&m, msg, kill_me_never);
> + } else {
> + /*
> + * Even with recovery code extra handling is required when
> + * we're not returning to userspace after error (e.g. in
> + * execve() beyond the point of no return) to ensure that
> + * a SIGBUS is delivered.
> + */
> + if (m.kflags & MCE_IN_KERNEL_RECOV) {
> + if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
> + mce_panic("Failed kernel mode recovery", &m, msg);
> + }
> +
> + if (!mce_usable_address(&m))
> + queue_task_work(&m, msg, kill_me_now);
> + else
> + queue_task_work(&m, msg, kill_me_maybe);
> }
>
> out:
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 61591ac6e..0cde1ba11 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -975,6 +975,8 @@ struct task_struct {
> /* delay due to memory thrashing */
> unsigned in_thrashing:1;
> #endif
> + /* Kill task on user memory access error */
> + unsigned kill_on_efault:1;
>
> unsigned long atomic_flags; /* Flags requiring atomic access. */
>
> --
> 2.43.0
>
--
Kees Cook
next prev parent reply other threads:[~2024-08-06 4:36 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-23 14:47 [RESEND][PATCH 1/3] x86: Add task_struct flag to force SIGBUS on MCE Andrew Zaborowski
2024-07-23 14:47 ` [RESEND][PATCH 2/3] execve: Ensure SIGBUS delivered on memory failure Andrew Zaborowski
2024-07-23 14:47 ` [RESEND][PATCH 3/3] rseq: " Andrew Zaborowski
2024-08-06 4:37 ` Kees Cook
2024-08-06 7:51 ` Peter Zijlstra
2024-08-06 14:19 ` Mathieu Desnoyers
2024-08-06 4:36 ` Kees Cook [this message]
2024-08-06 8:35 ` [RESEND][PATCH 1/3] x86: Add task_struct flag to force SIGBUS on MCE Borislav Petkov
[not found] ` <SA1PR11MB69926BFE8EFDA7B3C3D84560E7B82@SA1PR11MB6992.namprd11.prod.outlook.com>
2024-08-08 0:01 ` Andrew Zaborowski
2024-08-08 14:56 ` Borislav Petkov
2024-08-09 1:22 ` Andrew Zaborowski
2024-08-09 8:34 ` Borislav Petkov
[not found] ` <SA1PR11MB69927AE28B46583DCB5C97DEE7BA2@SA1PR11MB6992.namprd11.prod.outlook.com>
2024-08-10 1:20 ` Andrew Zaborowski
2024-08-10 3:21 ` Borislav Petkov
2024-08-10 3:55 ` Andrew Zaborowski
2024-08-10 9:25 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=202408052135.342F9455@keescook \
--to=kees@kernel.org \
--cc=andrew.zaborowski@intel.com \
--cc=bp@alien8.de \
--cc=ebiederm@xmission.com \
--cc=linux-edac@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=tony.luck@intel.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.