From: Andrew Morton <akpm@linux-foundation.org>
To: Eric DeVolder <eric.devolder@oracle.com>
Cc: linux-kernel@vger.kernel.org, bhe@redhat.com, vgoyal@redhat.com,
dyoung@redhat.com, ebiederm@xmission.com,
kexec@lists.infradead.org, sourabhjain@linux.ibm.com,
konrad.wilk@oracle.com, boris.ostrovsky@oracle.com
Subject: Re: [PATCH] kexec: change locking mechanism to a mutex
Date: Thu, 21 Sep 2023 17:26:50 -0700
Message-ID: <20230921172650.aeacc5de4f45d13e5671d7b2@linux-foundation.org>
In-Reply-To: <20230921215938.2192-1-eric.devolder@oracle.com>
On Thu, 21 Sep 2023 17:59:38 -0400 Eric DeVolder <eric.devolder@oracle.com> wrote:
> Scaled up testing has revealed that the kexec_trylock()
> implementation leads to failures within the crash hotplug
> infrastructure due to the inability to acquire the lock,
> specifically the message:
>
> crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
>
> When hotplug events occur, the crash hotplug infrastructure first
> attempts to obtain the lock via kexec_trylock(). However, the
> implementation either acquires the lock or fails and returns; there
> is no waiting on the lock. Here is the comment/explanation from
> kernel/kexec_internal.h:kexec_trylock():
>
> * Whatever is used to serialize accesses to the kexec_crash_image needs to be
> * NMI safe, as __crash_kexec() can happen during nmi_panic(), so here we use a
> * "simple" atomic variable that is acquired with a cmpxchg().
>
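A minimal user-space sketch of the kind of lock this comment describes: a plain integer acquired with compare-and-exchange. It uses C11 atomics in place of the kernel's atomic_t API, and the names demo_lock, demo_trylock() and demo_unlock() are hypothetical. The point it shows is the one made above: a failed attempt returns immediately; nothing ever waits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = unlocked, 1 = locked; stands in for the kernel's __kexec_lock. */
static atomic_int demo_lock = 0;

/* Take the lock only if it is currently free; never sleeps or spins. */
static bool demo_trylock(void)
{
	int expected = 0;
	return atomic_compare_exchange_strong_explicit(&demo_lock, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Release the lock; the caller must currently hold it. */
static void demo_unlock(void)
{
	assert(atomic_load_explicit(&demo_lock, memory_order_relaxed) == 1);
	atomic_store_explicit(&demo_lock, 0, memory_order_release);
}
```

Because acquisition is a single atomic operation with no sleeping, it can be attempted from contexts that must not block, which is the NMI-safety property the quoted comment relies on. The same property is what makes it unsuitable for callers that should wait.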
> While in theory this can happen for either CPU or memory hotplug,
> this problem is most likely to occur with memory hotplug.
>
> When memory is hot plugged, it is divided into smaller memblocks
> (typically 128MiB each). As each memblock is processed, a kernel
> thread and a udev event thread are created. The udev thread tries
> to take the lock when it reads the sysfs node
> /sys/devices/system/memory/crash_hotplug, and the kernel worker
> thread tries to take the lock upon entering the crash hotplug
> infrastructure.
>
> These threads then compete for the kexec lock.
>
> For example, a 1GiB DIMM is converted into 8 memblocks, each
> spawning two threads for a total of 16 threads that create a small
> "swarm" all trying to acquire the lock. The larger the DIMM, the
> more memblocks and the larger the swarm.
>
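The arithmetic behind the swarm, written out as a small helper. It assumes, as the text does, two lock-contending threads (one kernel worker, one udev) per memblock; swarm_size() and the MiB/GiB macros are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)

/* Threads per memblock that compete for the lock: one kernel worker + one udev. */
#define THREADS_PER_MEMBLOCK 2

/* Number of threads contending for the lock when a DIMM of dimm_bytes is hot plugged. */
static uint64_t swarm_size(uint64_t dimm_bytes, uint64_t memblock_bytes)
{
	assert(memblock_bytes > 0);
	uint64_t memblocks = dimm_bytes / memblock_bytes;
	return memblocks * THREADS_PER_MEMBLOCK;
}
```

With 128MiB memblocks, swarm_size(1 * GiB, 128 * MiB) gives 16, matching the example above, and a 64GiB DIMM would give 1024, so the swarm grows linearly with DIMM size.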
> At the root of the problem is the atomic lock behind kexec_trylock();
> it works well for low lock traffic, i.e. loading/unloading a capture
> kernel, things that happen basically once. But with the introduction
> of crash hotplug, the traffic through the lock increases significantly,
> and more importantly in bursts occurring at roughly the same time. Thus
> there is a need to wait on the lock.
>
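To illustrate why bursts matter, the hypothetical user-space experiment below starts several threads at roughly the same time; each makes exactly one non-waiting attempt on a cmpxchg-style lock (reimplemented with C11 atomics, not the kernel code) and, on success, holds it for about a millisecond to simulate work. All names are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static atomic_int burst_lock = 0;      /* 0 = free, 1 = held */
static atomic_int burst_successes = 0;
static atomic_int burst_failures = 0;

static bool burst_trylock(void)
{
	int expected = 0;
	return atomic_compare_exchange_strong(&burst_lock, &expected, 1);
}

/* One "hotplug" thread: a single attempt, no waiting, no retry. */
static void *burst_worker(void *arg)
{
	(void)arg;
	if (burst_trylock()) {
		struct timespec work = { .tv_sec = 0, .tv_nsec = 1000000 }; /* ~1ms */
		nanosleep(&work, NULL);
		atomic_fetch_add(&burst_successes, 1);
		atomic_store(&burst_lock, 0);
	} else {
		atomic_fetch_add(&burst_failures, 1); /* the "trylock failed" case */
	}
	return NULL;
}

/* Start n threads (0 <= n <= 64) together, wait for them, return how many started. */
static int run_burst(int n)
{
	pthread_t threads[64];
	int created = 0;

	assert(n >= 0 && n <= 64);
	for (int i = 0; i < n; i++)
		if (pthread_create(&threads[created], NULL, burst_worker, NULL) == 0)
			created++;
	for (int i = 0; i < created; i++)
		pthread_join(threads[i], NULL);
	return created;
}
```

run_burst(16) corresponds to the 16-thread swarm of a 1GiB DIMM. Successes plus failures always equal the number of threads started, and at least one thread succeeds, but how many fail depends on scheduling and varies from run to run.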
> A possible workaround is to simply retry the lock, say up to N times.
> There is, of course, the problem of determining a value of N that works for
> all implementations, and for all the other call sites of kexec_trylock().
> Not ideal.
>
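The retry workaround would look roughly like the sketch below, again assuming a cmpxchg-style trylock reimplemented in user space; trylock_retry(), max_tries and delay_ns are hypothetical. Whatever values are picked, a large enough burst can still exhaust the attempts, which is the objection raised in the text.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static atomic_int retry_lock = 0; /* 0 = free, 1 = held */

static bool retry_trylock_once(void)
{
	int expected = 0;
	return atomic_compare_exchange_strong(&retry_lock, &expected, 1);
}

/* Try up to max_tries times, sleeping delay_ns between attempts. */
static bool trylock_retry(int max_tries, long delay_ns)
{
	assert(delay_ns >= 0 && delay_ns < 1000000000L); /* valid tv_nsec range */
	struct timespec delay = { .tv_sec = 0, .tv_nsec = delay_ns };

	for (int attempt = 0; attempt < max_tries; attempt++) {
		if (retry_trylock_once())
			return true;
		if (attempt + 1 < max_tries)
			nanosleep(&delay, NULL);
	}
	return false; /* gave up: the caller must cope without the lock */
}

static void retry_unlock(void)
{
	atomic_store(&retry_lock, 0);
}
```

The two tuning knobs are exactly the problem: they trade latency against the chance of giving up, and no single pair suits every machine and every call site.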
> The design decision to use the atomic lock is described in the comment
> from kexec_internal.h, cited above. However, examining the code of
> __crash_kexec():
>
> 	if (kexec_trylock()) {
> 		if (kexec_crash_image) {
> 			...
> 		}
> 		kexec_unlock();
> 	}
>
> reveals that the use of kexec_trylock() here is actually "best effort",
> a consequence of the atomic lock. Prior to crash hotplug, acquiring this
> lock would almost always succeed (another kexec syscall could hold the
> lock and prevent this, but that is about it).
>
> So at the point where the capture kernel would be invoked, if the lock
> is not obtained, then kdump doesn't occur.
>
> It is possible to instead use a mutex with proper waiting, and utilize
> mutex_trylock() as the "best effort" in __crash_kexec(). The use of a
> mutex then avoids all the lock acquisition problems that were revealed
> by the crash hotplug activity.
>
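The proposed split (ordinary callers wait for the lock, only the crash path makes a single non-blocking attempt) can be modelled in user space with a POSIX mutex, where pthread_mutex_lock() plays the role of mutex_lock() and pthread_mutex_trylock() the role of mutex_trylock(). This is an analogy, not the kernel code; hotplug_update(), crash_path() and elfcorehdr_updates are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t demo_kexec_mutex = PTHREAD_MUTEX_INITIALIZER;
static int elfcorehdr_updates = 0; /* stands in for state protected by the lock */

/* Hotplug-style caller: blocks until the lock is available instead of giving up. */
static void hotplug_update(void)
{
	pthread_mutex_lock(&demo_kexec_mutex);
	elfcorehdr_updates++;
	pthread_mutex_unlock(&demo_kexec_mutex);
}

/* Crash-style caller: must not sleep, so it makes a single attempt. */
static bool crash_path(void)
{
	if (pthread_mutex_trylock(&demo_kexec_mutex) != 0)
		return false; /* typically EBUSY: the lock is held elsewhere */
	/* ... the capture kernel would be started here ... */
	pthread_mutex_unlock(&demo_kexec_mutex);
	return true;
}
```

Note the inverted return convention: pthread_mutex_trylock() returns 0 on success, whereas the kernel's mutex_trylock() returns 1 on success, which is why the patch below writes if (mutex_trylock(&__kexec_lock)).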
> Convert the atomic lock to a mutex.
>
> ...
>
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -47,7 +47,7 @@
>  #include <crypto/hash.h>
>  #include "kexec_internal.h"
>
> -atomic_t __kexec_lock = ATOMIC_INIT(0);
> +DEFINE_MUTEX(__kexec_lock);
>
>  /* Flag to indicate we are going to kexec a new kernel */
>  bool kexec_in_progress = false;
> @@ -1057,7 +1057,7 @@ void __noclone __crash_kexec(struct pt_regs *regs)
>  	 * of memory the xchg(&kexec_crash_image) would be
>  	 * sufficient. But since I reuse the memory...
>  	 */
> -	if (kexec_trylock()) {
> +	if (mutex_trylock(&__kexec_lock)) {
>  		if (kexec_crash_image) {
>  			struct pt_regs fixed_regs;
What's happening here? If someone else held the lock we silently fail
to run the kexec? Shouldn't we at least alert the user to what just
happened?
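One way the question could be acted on, sketched in user space under the same POSIX-mutex analogy: when the single attempt fails, write a message instead of returning silently. crash_kexec_or_warn() and the message text are hypothetical and come neither from the patch nor from the kernel; a kernel version would use the kernel's own logging rather than stdio.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t crash_demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Best-effort crash path that reports, rather than hides, a lost race for the lock. */
static bool crash_kexec_or_warn(FILE *log)
{
	if (pthread_mutex_trylock(&crash_demo_lock) != 0) {
		fprintf(log, "crash: lock held elsewhere, capture kernel not started\n");
		return false;
	}
	/* ... the capture kernel would be started here ... */
	pthread_mutex_unlock(&crash_demo_lock);
	return true;
}
```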
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
Thread overview: 11+ messages
2023-09-21 21:59 [PATCH] kexec: change locking mechanism to a mutex Eric DeVolder
2023-09-22 0:22 ` Andrew Morton
2023-09-22 1:00 ` Eric DeVolder
2023-09-22 0:26 ` Andrew Morton [this message]
2023-09-22 1:02 ` Eric DeVolder
2023-09-22 3:36 ` Dave Young
2023-09-22 8:06 ` Baoquan He
2023-09-22 13:39 ` Eric DeVolder
2023-09-22 16:28 ` Valentin Schneider
2023-09-22 17:35 ` Eric DeVolder
2023-09-23 0:04 ` Baoquan He