From: Borislav Petkov <bp@amd64.org>
To: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
"Luck, Tony" <tony.luck@intel.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"x86@kernel.org" <x86@kernel.org>, Ingo Molnar <mingo@elte.hu>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt
Date: Fri, 26 Aug 2011 12:50:28 +0200 [thread overview]
Message-ID: <20110826105028.GB9510@aftab> (raw)
In-Reply-To: <4DFB1304.10604@jp.fujitsu.com>
On Fri, Jun 17, 2011 at 04:40:36AM -0400, Hidetoshi Seto wrote:
> Function del_timer_sync() has WARN_ON(in_irq()) in it because
> calling it from interrupt context can cause deadlock if it
> interrupts the target timer running.
>
> In MCE code, del_timer_sync() is used with on_each_cpu() in
> some parts for sysfs files:
> bank*, check_interval, cmci_disabled and ignore_ce.
>
> However use of on_each_cpu() results in calling the function
> passed as the argument in the interrupt context. It means you
> can see a flood of warnings from del_timer_sync() by a simple
> file access, for example:
>
> echo 300 > /sys/devices/system/machinecheck/machinecheck0/check_interval
>
> Fortunately these MCE specific files are rare-used and AFAIK
> only few MCE geeks experience this warning on write.
>
> To remove the warning (for my happy hacking), move timer deletion
> outside of the interrupt context ;-)
I'm hitting this warning too in my testing here so I'd like to expedite
this patch going in. I'll send the version below upstream next week if
there are no objections:
--
>From 0e4fe50cd4ed6f5990173f4587ad07dd4782fa30 Mon Sep 17 00:00:00 2001
From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Date: Fri, 17 Jun 2011 04:40:36 -0400
Subject: [PATCH] x86, mce: Do not call del_timer_sync() in IRQ context
del_timer_sync() can cause a deadlock when called in interrupt context.
It is used with on_each_cpu() in some parts for sysfs files like bank*,
check_interval, cmci_disabled and ignore_ce.
However, use of on_each_cpu() results in calling the function passed
as the argument in interrupt context. This causes a flood of nested
warnings from del_timer_sync() (it runs on each CPU) caused even by a
simple file access like:
$ echo 300 > /sys/devices/system/machinecheck/machinecheck0/check_interval
Fortunately, these MCE-specific files are rarely used and AFAIK only few
MCE geeks experience this warning.
To remove the warning, move timer deletion outside of the interrupt
context.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 20 ++++++++++++++------
1 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 08363b0..5b5ccee 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1140,6 +1140,15 @@ static void mce_start_timer(unsigned long data)
add_timer_on(t, smp_processor_id());
}
+/* Must not be called in IRQ context where del_timer_sync() can deadlock */
+static void mce_timer_delete_all(void)
+{
+ int cpu;
+
+ for_each_online_cpu(cpu)
+ del_timer_sync(&per_cpu(mce_timer, cpu));
+}
+
static void mce_do_trigger(struct work_struct *work)
{
call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
@@ -1750,7 +1759,6 @@ static struct syscore_ops mce_syscore_ops = {
static void mce_cpu_restart(void *data)
{
- del_timer_sync(&__get_cpu_var(mce_timer));
if (!mce_available(__this_cpu_ptr(&cpu_info)))
return;
__mcheck_cpu_init_generic();
@@ -1760,16 +1768,15 @@ static void mce_cpu_restart(void *data)
/* Reinit MCEs after user configuration changes */
static void mce_restart(void)
{
+ mce_timer_delete_all();
on_each_cpu(mce_cpu_restart, NULL, 1);
}
/* Toggle features for corrected errors */
-static void mce_disable_ce(void *all)
+static void mce_disable_cmci(void *data)
{
if (!mce_available(__this_cpu_ptr(&cpu_info)))
return;
- if (all)
- del_timer_sync(&__get_cpu_var(mce_timer));
cmci_clear();
}
@@ -1852,7 +1859,8 @@ static ssize_t set_ignore_ce(struct sys_device *s,
if (mce_ignore_ce ^ !!new) {
if (new) {
/* disable ce features */
- on_each_cpu(mce_disable_ce, (void *)1, 1);
+ mce_timer_delete_all();
+ on_each_cpu(mce_disable_cmci, NULL, 1);
mce_ignore_ce = 1;
} else {
/* enable ce features */
@@ -1875,7 +1883,7 @@ static ssize_t set_cmci_disabled(struct sys_device *s,
if (mce_cmci_disabled ^ !!new) {
if (new) {
/* disable cmci */
- on_each_cpu(mce_disable_ce, NULL, 1);
+ on_each_cpu(mce_disable_cmci, NULL, 1);
mce_cmci_disabled = 1;
} else {
/* enable cmci */
--
1.7.4.rc2
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
next prev parent reply other threads:[~2011-08-26 10:50 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-06-17 8:37 [PATCH 0/8] x86, mce: misc fix/cleanups, cont Hidetoshi Seto
2011-06-17 8:40 ` [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt Hidetoshi Seto
2011-06-17 13:56 ` Borislav Petkov
2011-06-20 4:46 ` Hidetoshi Seto
2011-06-20 7:36 ` Borislav Petkov
2011-08-26 10:50 ` Borislav Petkov [this message]
2011-06-17 8:42 ` [PATCH 2/8] x86, mce: remove redundant mce_available() checks Hidetoshi Seto
2011-06-17 14:39 ` Borislav Petkov
2011-06-20 4:47 ` Hidetoshi Seto
2011-06-20 7:44 ` Borislav Petkov
2011-06-17 8:43 ` [PATCH 3/8] x86, mce: introduce mce_timer_add() Hidetoshi Seto
2011-06-17 15:11 ` Borislav Petkov
2011-06-17 8:44 ` [PATCH 4/8] x86, mce: rename bootparam parser Hidetoshi Seto
2011-06-17 15:41 ` Borislav Petkov
2011-06-17 22:25 ` Luck, Tony
2011-06-18 8:38 ` Borislav Petkov
2011-06-20 4:48 ` Hidetoshi Seto
2011-06-17 8:45 ` [PATCH 5/8] x86, mce: introduce mce_sysdev_init() Hidetoshi Seto
2011-06-17 16:32 ` Borislav Petkov
2011-06-20 4:48 ` Hidetoshi Seto
2011-06-17 8:46 ` [PATCH 6/8] x86, mce: introduce mce_memory_failure_process() Hidetoshi Seto
2011-06-17 16:59 ` Borislav Petkov
2011-06-17 8:49 ` [PATCH 7/8] x86, mce: rework use of TIF_MCE_NOTIFY Hidetoshi Seto
2011-06-17 8:50 ` [PATCH 8/8] x86, mce, edac: call edac_mce_parse() once per a record Hidetoshi Seto
2011-06-17 17:10 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110826105028.GB9510@aftab \
--to=bp@amd64.org \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=seto.hidetoshi@jp.fujitsu.com \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.