From: Baoquan He <bhe@redhat.com>
To: Eric DeVolder <eric.devolder@oracle.com>
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
vschneid@redhat.com, dyoung@redhat.com,
kexec@lists.infradead.org, sourabhjain@linux.ibm.com,
konrad.wilk@oracle.com, boris.ostrovsky@oracle.com
Subject: Re: [PATCH] Crash: add lock to serialize crash hotplug handling
Date: Mon, 25 Sep 2023 10:34:42 +0800 [thread overview]
Message-ID: <ZRDxwqY669XsDsMk@MiWiFi-R3L-srv> (raw)
In-Reply-To: <8515c858-4be8-fbd6-7868-c8bfb5492f83@oracle.com>
On 09/23/23 at 07:10am, Eric DeVolder wrote:
>
>
> On 9/22/23 18:54, Baoquan He wrote:
> > Eric reported that handling corresponding crash hotplug event can be
> > failed easily when many momery hotplug event are notified in a short period.
> > They failed because failing to take __kexec_lock.
> >
> > =======
> > [ 78.714569] Fallback order for Node 0: 0
> > [ 78.714575] Built 1 zonelists, mobility grouping on. Total pages: 1817886
> > [ 78.717133] Policy zone: Normal
> > [ 78.724423] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> > [ 78.727207] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> > [ 80.056643] PEFILE: Unsigned PE binary
> > =======
> >
> > The memory hotplug events are notified very quickly and very many,
> > while the handling of crash hotplug is much slower relatively. So the
> > atomic variable __kexec_lock and kexec_trylock() can't guarantee the
> > serialization of crash hotplug handling.
> >
> > Here, add a new mutex lock __crash_hotplug_lock to serialize crash
> > hotplug handling specifically. This doesn't impact the usage of
> > __kexec_lock.
> >
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> > kernel/crash_core.c | 3 +++
> > kernel/kexec_core.c | 1 +
> > kernel/kexec_internal.h | 11 +++++++++++
> > 3 files changed, 15 insertions(+)
> >
> > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > index 03a7932cde0a..e8851724a530 100644
> > --- a/kernel/crash_core.c
> > +++ b/kernel/crash_core.c
> > @@ -783,9 +783,11 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
> > {
> > struct kimage *image;
> > + crash_hotplug_lock();
> > /* Obtain lock while changing crash information */
> > if (!kexec_trylock()) {
> > pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
> > + crash_hotplug_unlock();
> > return;
> > }
> > @@ -852,6 +854,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
> > out:
> > /* Release lock now that update complete */
> > kexec_unlock();
> > + crash_hotplug_unlock();
> > }
>
> The crash_check_update_elfcorehdr() also has kexec_trylock() and needs similar treatment.
>
> > static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index 9dc728982d79..b95a73f35d9a 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -48,6 +48,7 @@
> > #include "kexec_internal.h"
> > atomic_t __kexec_lock = ATOMIC_INIT(0);
> > +DEFINE_MUTEX(__crash_hotplug_lock);
> > /* Flag to indicate we are going to kexec a new kernel */
> > bool kexec_in_progress = false;
> > diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
> > index 74da1409cd14..1db31625ef20 100644
> > --- a/kernel/kexec_internal.h
> > +++ b/kernel/kexec_internal.h
> > @@ -28,6 +28,17 @@ static inline void kexec_unlock(void)
> > atomic_set_release(&__kexec_lock, 0);
> > }
> > +/*
> > + * Different than kexec/kdump loading/unloading/crash or kexec jumping/shrinking
> > + * which usually rarely happen, there will be many crash hotplug events notified
> > + * during one short period, e.g one memory board is hot added and memory regions
> > + * are online. So mutex lock __crash_hotplug_lock is used to serialize the crash
> > + * hotplug handling specificially.
> > + * */
> > +extern struct mutex __crash_hotplug_lock;
> > +#define crash_hotplug_lock() mutex_lock(&__crash_hotplug_lock)
> > +#define crash_hotplug_unlock() mutex_unlock(&__crash_hotplug_lock)
> > +
> > #ifdef CONFIG_KEXEC_FILE
> > #include <linux/purgatory.h>
> > void kimage_file_post_load_cleanup(struct kimage *image);
>
> The new content for kexec_internal.h and kexec_core.c could/should probably be
> moved into crash_core.c, within the CONFIG_CRASH_HOTPLUG?
That makes sense, I will spin v2 and post.
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
WARNING: multiple messages have this Message-ID (diff)
From: Baoquan He <bhe@redhat.com>
To: Eric DeVolder <eric.devolder@oracle.com>
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
vschneid@redhat.com, dyoung@redhat.com,
kexec@lists.infradead.org, sourabhjain@linux.ibm.com,
konrad.wilk@oracle.com, boris.ostrovsky@oracle.com
Subject: Re: [PATCH] Crash: add lock to serialize crash hotplug handling
Date: Mon, 25 Sep 2023 10:34:42 +0800 [thread overview]
Message-ID: <ZRDxwqY669XsDsMk@MiWiFi-R3L-srv> (raw)
In-Reply-To: <8515c858-4be8-fbd6-7868-c8bfb5492f83@oracle.com>
On 09/23/23 at 07:10am, Eric DeVolder wrote:
>
>
> On 9/22/23 18:54, Baoquan He wrote:
> > Eric reported that handling corresponding crash hotplug event can be
> > failed easily when many momery hotplug event are notified in a short period.
> > They failed because failing to take __kexec_lock.
> >
> > =======
> > [ 78.714569] Fallback order for Node 0: 0
> > [ 78.714575] Built 1 zonelists, mobility grouping on. Total pages: 1817886
> > [ 78.717133] Policy zone: Normal
> > [ 78.724423] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> > [ 78.727207] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> > [ 80.056643] PEFILE: Unsigned PE binary
> > =======
> >
> > The memory hotplug events are notified very quickly and very many,
> > while the handling of crash hotplug is much slower relatively. So the
> > atomic variable __kexec_lock and kexec_trylock() can't guarantee the
> > serialization of crash hotplug handling.
> >
> > Here, add a new mutex lock __crash_hotplug_lock to serialize crash
> > hotplug handling specifically. This doesn't impact the usage of
> > __kexec_lock.
> >
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> > kernel/crash_core.c | 3 +++
> > kernel/kexec_core.c | 1 +
> > kernel/kexec_internal.h | 11 +++++++++++
> > 3 files changed, 15 insertions(+)
> >
> > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > index 03a7932cde0a..e8851724a530 100644
> > --- a/kernel/crash_core.c
> > +++ b/kernel/crash_core.c
> > @@ -783,9 +783,11 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
> > {
> > struct kimage *image;
> > + crash_hotplug_lock();
> > /* Obtain lock while changing crash information */
> > if (!kexec_trylock()) {
> > pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
> > + crash_hotplug_unlock();
> > return;
> > }
> > @@ -852,6 +854,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
> > out:
> > /* Release lock now that update complete */
> > kexec_unlock();
> > + crash_hotplug_unlock();
> > }
>
> The crash_check_update_elfcorehdr() also has kexec_trylock() and needs similar treatment.
>
> > static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index 9dc728982d79..b95a73f35d9a 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -48,6 +48,7 @@
> > #include "kexec_internal.h"
> > atomic_t __kexec_lock = ATOMIC_INIT(0);
> > +DEFINE_MUTEX(__crash_hotplug_lock);
> > /* Flag to indicate we are going to kexec a new kernel */
> > bool kexec_in_progress = false;
> > diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
> > index 74da1409cd14..1db31625ef20 100644
> > --- a/kernel/kexec_internal.h
> > +++ b/kernel/kexec_internal.h
> > @@ -28,6 +28,17 @@ static inline void kexec_unlock(void)
> > atomic_set_release(&__kexec_lock, 0);
> > }
> > +/*
> > + * Different than kexec/kdump loading/unloading/crash or kexec jumping/shrinking
> > + * which usually rarely happen, there will be many crash hotplug events notified
> > + * during one short period, e.g one memory board is hot added and memory regions
> > + * are online. So mutex lock __crash_hotplug_lock is used to serialize the crash
> > + * hotplug handling specificially.
> > + * */
> > +extern struct mutex __crash_hotplug_lock;
> > +#define crash_hotplug_lock() mutex_lock(&__crash_hotplug_lock)
> > +#define crash_hotplug_unlock() mutex_unlock(&__crash_hotplug_lock)
> > +
> > #ifdef CONFIG_KEXEC_FILE
> > #include <linux/purgatory.h>
> > void kimage_file_post_load_cleanup(struct kimage *image);
>
> The new content for kexec_internal.h and kexec_core.c could/should probably be
> moved into crash_core.c, within the CONFIG_CRASH_HOTPLUG?
That makes sense, I will spin v2 and post.
next prev parent reply other threads:[~2023-09-25 2:34 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-22 23:54 [PATCH] Crash: add lock to serialize crash hotplug handling Baoquan He
2023-09-22 23:54 ` Baoquan He
2023-09-23 12:10 ` Eric DeVolder
2023-09-23 12:10 ` Eric DeVolder
2023-09-25 2:34 ` Baoquan He [this message]
2023-09-25 2:34 ` Baoquan He
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZRDxwqY669XsDsMk@MiWiFi-R3L-srv \
--to=bhe@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=boris.ostrovsky@oracle.com \
--cc=dyoung@redhat.com \
--cc=eric.devolder@oracle.com \
--cc=kexec@lists.infradead.org \
--cc=konrad.wilk@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=sourabhjain@linux.ibm.com \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.