From: Kees Cook <keescook@chromium.org>
To: Andrew Zaborowski <andrew.zaborowski@intel.com>
Cc: linux-edac@vger.kernel.org, linux-mm@kvack.org,
Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>,
Eric Biederman <ebiederm@xmission.com>
Subject: Re: [PATCH][RFC] exec: x86: Ensure SIGBUS delivered on MCE
Date: Wed, 1 May 2024 09:19:57 -0700 [thread overview]
Message-ID: <202405010915.465AF19@keescook> (raw)
In-Reply-To: <20240501015340.3014724-1-andrew.zaborowski@intel.com>
On Wed, May 01, 2024 at 03:53:40AM +0200, Andrew Zaborowski wrote:
> Uncorrected memory errors are signaled to processes using SIGBUS or an
> error retval from a syscall. But there's a corner cases in execve where
> a SIGSEGV will be delivered. Specifically this will happen if the binary
> loader triggers a memory error reading user pages. The architecture's
> handler (MCE handler on x86) may queue a call to memory_failure but that
> won't run until the execve() returns. The binary loader is called after
> bprm->point_of_no_return is set meaning that any error is handled by
> bprm_execve() with a SIGSEGV to the process.
Why is it needed to have a distinction between SIGBUS and SIGSEGV in
this case?
> To ensure it is terminated with a SIGBUS we 1. let pending work run in
> the bprm_execve error case.
>
> And 2. ensure memory_failure() is passed MF_ACTION_REQUIRED so that the
> SIGBUS is queued. Normally when the MCE is in a syscall, a fixup of
> return IP and a call to kill_me_never are enough. But in this case
> it's necessary to queue kill_me_maybe() which will set
> MF_ACTION_REQUIRED.
>
> Reuse current->in_execve to make the decision.
We're actually in the process of trying to remove[1] this flag, so I'd
like to avoid adding new users of it. It sounds like it's desirable here
because a choice is needed about kill_me_never() vs kill_me_maybe()?
-Kees
[1] https://lore.kernel.org/lkml/8fafb8e1-b6be-4d08-945f-b464e3a396c8@I-love.SAKURA.ne.jp/
>
> Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
> ---
> arch/x86/kernel/cpu/mce/core.c | 14 ++++++++++++++
> fs/exec.c | 12 +++++++++---
> include/linux/sched.h | 2 +-
> 3 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 84d41be6d06b..11effdff942c 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1593,6 +1593,20 @@ noinstr void do_machine_check(struct pt_regs *regs)
> else
> queue_task_work(&m, msg, kill_me_maybe);
>
> + } else if (current->in_execve) {
> + /*
> + * Cannot recover a task in execve() beyond point of no
> + * return but stop further user memory accesses.
> + */
> + if (m.kflags & MCE_IN_KERNEL_RECOV) {
> + if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
> + mce_panic("Failed kernel mode recovery", &m, msg);
> + }
> +
> + if (!mce_usable_address(&m))
> + queue_task_work(&m, msg, kill_me_now);
> + else
> + queue_task_work(&m, msg, kill_me_maybe);
> } else {
> /*
> * Handle an MCE which has happened in kernel space but from
> diff --git a/fs/exec.c b/fs/exec.c
> index cf1df7f16e55..1bea9c252a11 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -67,6 +67,7 @@
> #include <linux/time_namespace.h>
> #include <linux/user_events.h>
> #include <linux/rseq.h>
> +#include <linux/task_work.h>
>
> #include <linux/uaccess.h>
> #include <asm/mmu_context.h>
> @@ -1888,10 +1889,15 @@ static int bprm_execve(struct linux_binprm *bprm)
> * If past the point of no return ensure the code never
> * returns to the userspace process. Use an existing fatal
> * signal if present otherwise terminate the process with
> - * SIGSEGV.
> + * SIGSEGV. Run pending work before that in case it is
> + * terminating the process with a different signal.
> */
> - if (bprm->point_of_no_return && !fatal_signal_pending(current))
> - force_fatal_sig(SIGSEGV);
> + if (bprm->point_of_no_return) {
> + task_work_run();
> +
> + if (!fatal_signal_pending(current))
> + force_fatal_sig(SIGSEGV);
> + }
>
> sched_mm_cid_after_execve(current);
> current->fs->in_exec = 0;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 3c2abbc587b4..8970a191d8fe 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -922,7 +922,7 @@ struct task_struct {
> unsigned sched_rt_mutex:1;
> #endif
>
> - /* Bit to tell TOMOYO we're in execve(): */
> + /* Bit to tell TOMOYO and x86 MCE code we're in execve(): */
> unsigned in_execve:1;
> unsigned in_iowait:1;
> #ifndef TIF_RESTORE_SIGMASK
> --
> 2.39.3
>
--
Kees Cook
next prev parent reply other threads:[~2024-05-01 16:19 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-01 1:53 [PATCH][RFC] exec: x86: Ensure SIGBUS delivered on MCE Andrew Zaborowski
2024-05-01 16:19 ` Kees Cook [this message]
[not found] ` <SA1PR11MB69929DBECFE6D8503D214359E7192@SA1PR11MB6992.namprd11.prod.outlook.com>
2024-05-01 18:52 ` Andrew Zaborowski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=202405010915.465AF19@keescook \
--to=keescook@chromium.org \
--cc=andrew.zaborowski@intel.com \
--cc=bp@alien8.de \
--cc=ebiederm@xmission.com \
--cc=linux-edac@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.