From: Tao Zhou <tao.zhou@linux.dev>
To: Valentin Schneider <vschneid@redhat.com>
Cc: linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
linux-rt-users@vger.kernel.org,
Eric Biederman <ebiederm@xmission.com>,
Arnd Bergmann <arnd@arndb.de>, Petr Mladek <pmladek@suse.com>,
Thomas Gleixner <tglx@linutronix.de>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Juri Lelli <jlelli@redhat.com>,
"Luis Claudio R. Goncalves" <lgoncalv@redhat.com>,
Tao Zhou <tao.zhou@linux.dev>
Subject: Re: [PATCH] panic, kexec: Don't mutex_trylock() in __crash_kexec()
Date: Fri, 17 Jun 2022 18:42:07 +0800 [thread overview]
Message-ID: <Yqxaf6V+hvCSXQSM@geo.homenetwork> (raw)
In-Reply-To: <20220616123709.347053-1-vschneid@redhat.com>
Hi Valentin,
On Thu, Jun 16, 2022 at 01:37:09PM +0100, Valentin Schneider wrote:
> Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
> panic() doesn't work. The cause of that lies in the PREEMPT_RT definition
> of mutex_trylock():
>
> if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
> return 0;
>
> This prevents an NMI panic() from executing the main body of
> __crash_kexec() which does the actual kexec into the kdump kernel.
> The warning and return are explained by:
>
> 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context")
> [...]
> The reasons for this are:
>
> 1) There is a potential deadlock in the slowpath
>
> 2) Another cpu which blocks on the rtmutex will boost the task
> which allegedly locked the rtmutex, but that cannot work
> because the hard/softirq context borrows the task context.
>
> Use a pair of barrier-ordered variables to serialize loading vs executing a
> crash kernel.
>
> Tested by triggering NMI panics via:
>
> $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
> $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
> $ echo 1 > /proc/sys/kernel/panic
>
> $ ipmitool power diag
>
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
> Regarding the original explanation for the WARN & return:
>
> I don't get why 2) is a problem - if the lock is acquired by the trylock
> then the critical section will be run without interruption since it
> cannot sleep, the interrupted task may get boosted but that will not
> have any actual impact AFAICT.
> Regardless, even if this doesn't sleep, the ->wait_lock in the slowpath
> isn't NMI safe so this needs changing.
>
> I've thought about trying to defer the kexec out of an NMI (or IRQ)
> context, but that pretty much means deferring the panic() which I'm
> not sure is such a great idea.
> ---
> include/linux/kexec.h | 2 ++
> kernel/kexec.c | 18 ++++++++++++++----
> kernel/kexec_core.c | 41 +++++++++++++++++++++++++----------------
> kernel/kexec_file.c | 14 ++++++++++++++
> 4 files changed, 55 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index ce6536f1d269..89bbe150752e 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -369,6 +369,8 @@ extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);
>
> extern struct kimage *kexec_image;
> extern struct kimage *kexec_crash_image;
> +extern bool panic_wants_kexec;
> +extern bool kexec_loading;
> extern int kexec_load_disabled;
>
> #ifndef kexec_flush_icache_page
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index b5e40f069768..1253f4bb3079 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -94,14 +94,23 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
> /*
> * Because we write directly to the reserved memory region when loading
> * crash kernels we need a mutex here to prevent multiple crash kernels
> - * from attempting to load simultaneously, and to prevent a crash kernel
> - * from loading over the top of a in use crash kernel.
> - *
> - * KISS: always take the mutex.
> + * from attempting to load simultaneously.
> */
> if (!mutex_trylock(&kexec_mutex))
> return -EBUSY;
>
> + /*
> + * Prevent loading a new crash kernel while one is in use.
> + *
> + * Pairs with smp_mb() in __crash_kexec().
> + */
> + WRITE_ONCE(kexec_loading, true);
> + smp_mb();
> + if (READ_ONCE(panic_wants_kexec)) {
> + ret = -EBUSY;
> + goto out_unlock;
> + }
> +
> if (flags & KEXEC_ON_CRASH) {
> dest_image = &kexec_crash_image;
> if (kexec_crash_image)
> @@ -165,6 +174,7 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>
> kimage_free(image);
> out_unlock:
> + WRITE_ONCE(kexec_loading, false);
> mutex_unlock(&kexec_mutex);
> return ret;
> }
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 4d34c78334ce..932cc0d4daa3 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -933,6 +933,8 @@ int kimage_load_segment(struct kimage *image,
>
> struct kimage *kexec_image;
> struct kimage *kexec_crash_image;
> +bool panic_wants_kexec;
> +bool kexec_loading;
> int kexec_load_disabled;
> #ifdef CONFIG_SYSCTL
> static struct ctl_table kexec_core_sysctls[] = {
> @@ -964,24 +966,31 @@ late_initcall(kexec_core_sysctl_init);
> */
> void __noclone __crash_kexec(struct pt_regs *regs)
> {
> - /* Take the kexec_mutex here to prevent sys_kexec_load
> - * running on one cpu from replacing the crash kernel
> - * we are using after a panic on a different cpu.
> + /*
> + * This should be taking kexec_mutex before doing anything with the
> + * kexec_crash_image, but this code can be run in NMI context which
> + * means we can't even trylock.
> *
> - * If the crash kernel was not located in a fixed area
> - * of memory the xchg(&kexec_crash_image) would be
> - * sufficient. But since I reuse the memory...
> + * Pairs with smp_mb() in do_kexec_load() and sys_kexec_file_load()
> */
> - if (mutex_trylock(&kexec_mutex)) {
> - if (kexec_crash_image) {
> - struct pt_regs fixed_regs;
> -
> - crash_setup_regs(&fixed_regs, regs);
> - crash_save_vmcoreinfo();
> - machine_crash_shutdown(&fixed_regs);
> - machine_kexec(kexec_crash_image);
> - }
> - mutex_unlock(&kexec_mutex);
> + WRITE_ONCE(panic_wants_kexec, true);
> + smp_mb();
> + /*
> + * If we're panic'ing while someone else is messing with the crash
> + * kernel, this isn't going to end well.
> + */
> + if (READ_ONCE(kexec_loading)) {
> + WRITE_ONCE(panic_wants_kexec, false);
> + return;
> + }
So this is from NMI. The mutex guarantee that kexec_file_load() or
do_kexec_load() just one of them beat on cpu. NMI can happen on more
than one cpu. That means that here be cumulativity here IMHO.
kexec_file_load()/ NMI0 NMI1..
do_kexec_load()
set kexec_loading=true
smp_mb() set panic_wants_kexec=ture
smp_mb()
see kexec_loading=ture and
conditionally set
panic_wants_kexec=false;
set panic_wants_kexec=ture
smp_mb()
see panic_wants_kexec=ture
conditionally set
kexec_loading=false
see kexec_loading=false
do kexec nmi things.
You see conditionlly set kexec_loading or panic_wants_kexec there no barrier
there and if the cumulativity to have the effect there should be a acquire-release,
if I am not wrong.
__crash_kexec():
WRITE_ONCE(panic_wants_kexec, true);
smp_mb();
/*
* If we're panic'ing while someone else is messing with the crash
* kernel, this isn't going to end well.
*/
if (READ_ONCE(kexec_loading)) {
smp_store_release(panic_wants_kexec, false);
return;
}
kexec_file_load()/do_kexec_load():
WRITE_ONCE(kexec_loading, true);
smp_mb();
if (smp_load_acquire(panic_wants_kexec)) {
WRITE_ONCE(kexec_loading, false);
...
}
For those input, I'm sure I lost and feel hot..
I thought that change the patten to load-store and set initial
value but failed.
Thanks,
Tao
> + if (kexec_crash_image) {
> + struct pt_regs fixed_regs;
> +
> + crash_setup_regs(&fixed_regs, regs);
> + crash_save_vmcoreinfo();
> + machine_crash_shutdown(&fixed_regs);
> + machine_kexec(kexec_crash_image);
> }
> }
> STACK_FRAME_NON_STANDARD(__crash_kexec);
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 145321a5e798..4bb399e6623e 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -337,6 +337,18 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
> if (!mutex_trylock(&kexec_mutex))
> return -EBUSY;
>
> + /*
> + * Prevent loading a new crash kernel while one is in use.
> + *
> + * Pairs with smp_mb() in __crash_kexec().
> + */
> + WRITE_ONCE(kexec_loading, true);
> + smp_mb();
> + if (READ_ONCE(panic_wants_kexec)) {
> + ret = -EBUSY;
> + goto out_unlock;
> + }
> +
> dest_image = &kexec_image;
> if (flags & KEXEC_FILE_ON_CRASH) {
> dest_image = &kexec_crash_image;
> @@ -406,6 +418,8 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
> if ((flags & KEXEC_FILE_ON_CRASH) && kexec_crash_image)
> arch_kexec_protect_crashkres();
>
> +out_unlock:
> + WRITE_ONCE(kexec_loading, false);
> mutex_unlock(&kexec_mutex);
> kimage_free(image);
> return ret;
> --
> 2.27.0
>
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
next prev parent reply other threads:[~2022-06-17 10:41 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-16 12:37 [PATCH] panic, kexec: Don't mutex_trylock() in __crash_kexec() Valentin Schneider
2022-06-17 10:42 ` Tao Zhou [this message]
2022-06-17 11:52 ` Valentin Schneider
2022-06-17 13:52 ` Petr Mladek
2022-06-17 14:46 ` Valentin Schneider
2022-06-17 15:13 ` Sebastian Andrzej Siewior
2022-06-17 16:09 ` Valentin Schneider
2022-06-17 16:53 ` Sebastian Andrzej Siewior
2022-06-22 15:34 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yqxaf6V+hvCSXQSM@geo.homenetwork \
--to=tao.zhou@linux.dev \
--cc=arnd@arndb.de \
--cc=bigeasy@linutronix.de \
--cc=ebiederm@xmission.com \
--cc=jlelli@redhat.com \
--cc=kexec@lists.infradead.org \
--cc=lgoncalv@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rt-users@vger.kernel.org \
--cc=pmladek@suse.com \
--cc=tglx@linutronix.de \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox