From: "Chen, Gong" <gong.chen@linux.intel.com>
To: Borislav Petkov <bp@suse.de>
Cc: tony.luck@intel.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4 Rebase] x86, MCE: Avoid potential deadlock in MCE context
Date: Fri, 22 May 2015 17:12:47 -0400 [thread overview]
Message-ID: <20150522211247.GB4930@gchen.bj.intel.com> (raw)
In-Reply-To: <20150520092800.GB3645@pd.tnic>
[-- Attachment #1: Type: text/plain, Size: 6649 bytes --]
On Wed, May 20, 2015 at 11:28:00AM +0200, Borislav Petkov wrote:
> Date: Wed, 20 May 2015 11:28:00 +0200
> From: Borislav Petkov <bp@suse.de>
> To: "Chen, Gong" <gong.chen@linux.intel.com>, tony.luck@intel.com
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 4/4 Rebase] x86, MCE: Avoid potential deadlock in MCE
> context
> User-Agent: Mutt/1.5.23 (2014-03-12)
>
> On Wed, May 20, 2015 at 03:35:38PM -0400, Chen, Gong wrote:
> > Printing in MCE context is a no-no, currently, as printk is not
> > NMI-safe. If some of the notifiers on the MCE chain call *printk*, we
> > may deadlock. In order to avoid that, delay printk into process context
> > to fix it.
> >
> > Background info at: https://lkml.org/lkml/2014/6/27/26
> >
> > Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
> > Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> > Link: http://lkml.kernel.org/r/1406797523-28710-6-git-send-email-gong.chen@linux.intel.com
> > [ Boris: rewrite a bit. ]
> > Signed-off-by: Borislav Petkov <bp@suse.de>
> > ---
> > arch/x86/include/asm/mce.h | 1 +
> > arch/x86/kernel/cpu/mcheck/mce-apei.c | 2 +-
> > arch/x86/kernel/cpu/mcheck/mce.c | 8 ++++++--
> > arch/x86/kernel/cpu/mcheck/mce_intel.c | 1 -
> > arch/x86/kernel/cpu/mcheck/therm_throt.c | 1 +
> > arch/x86/kernel/cpu/mcheck/threshold.c | 1 +
> > 6 files changed, 10 insertions(+), 4 deletions(-)
>
> ....
>
> > diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > index 1af51b1586d7..2733f275237d 100644
> > --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > @@ -427,6 +427,7 @@ static inline void __smp_thermal_interrupt(void)
> > {
> > inc_irq_stat(irq_thermal_count);
> > smp_thermal_vector();
> > + mce_queue_irq_work();
>
> Hmm, at a second glance, this looks wrong. I think we should do that
> call in intel_thermal_interrupt().
>
> > asmlinkage __visible void smp_thermal_interrupt(struct pt_regs *regs)
> > diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c
> > index 7245980186ee..d695faa234eb 100644
> > --- a/arch/x86/kernel/cpu/mcheck/threshold.c
> > +++ b/arch/x86/kernel/cpu/mcheck/threshold.c
> > @@ -22,6 +22,7 @@ static inline void __smp_threshold_interrupt(void)
> > {
> > inc_irq_stat(irq_threshold_count);
> > mce_threshold_vector();
> > + mce_queue_irq_work();
>
> Same here.
>
> mce_queue_irq_work() call should be issued in both AMD and Intel
> threshold handlers but not in the generic one which is unlikely to queue
> any MCE...
>
> Right?
>
Since AMD doesn't queue any MCE, it should definitely only cover Intel.
Please check out following patch:
----8<----
From: "Chen, Gong" <gong.chen@linux.intel.com>
Date: Fri, 22 May 2015 17:02:50 -0400
Subject: [PATCH 4/4 rebase v2] x86, MCE: Avoid potential deadlock in MCE context
Printing in MCE context is a no-no, currently, as printk is not
NMI-safe. If some of the notifiers on the MCE chain call *printk*, we
may deadlock. In order to avoid that, delay printk into process context
to fix it.
Background info at: https://lkml.org/lkml/2014/6/27/26
v2 -> v1: move mce_queue_irq_work call into Intel specific logic
Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1406797523-28710-6-git-send-email-gong.chen@linux.intel.com
[ Boris: rewrite a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/include/asm/mce.h | 1 +
arch/x86/kernel/cpu/mcheck/mce-apei.c | 2 +-
arch/x86/kernel/cpu/mcheck/mce.c | 8 ++++++--
arch/x86/kernel/cpu/mcheck/mce_intel.c | 2 +-
arch/x86/kernel/cpu/mcheck/therm_throt.c | 2 ++
5 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index d16f983f46f5..781432dd8123 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -198,6 +198,7 @@ enum mcp_flags {
bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
int mce_notify_irq(void);
+void mce_queue_irq_work(void);
DECLARE_PER_CPU(struct mce, injectm);
diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
index a1aef9533154..380e3ac8fb62 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
@@ -57,7 +57,7 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
m.addr = mem_err->physical_addr;
mce_log(&m);
- mce_notify_irq();
+ mce_queue_irq_work();
}
EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b369b5fcda1d..a3be97961e22 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -156,7 +156,7 @@ void mce_log(struct mce *mce)
/* Emit the trace record: */
trace_mce_record(mce);
- atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
+ mce_genpool_add(mce);
mce->finished = 0;
wmb();
@@ -486,6 +486,11 @@ static void mce_irq_work_cb(struct irq_work *entry)
mce_schedule_work();
}
+void mce_queue_irq_work(void)
+{
+ irq_work_queue(&mce_irq_work);
+}
+
static void mce_report_event(struct pt_regs *regs)
{
if (regs->flags & (X86_VM_MASK|X86_EFLAGS_IF)) {
@@ -1105,7 +1110,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
/* assuming valid severity level != 0 */
m.severity = severity;
m.usable_addr = mce_usable_address(&m);
- mce_genpool_add(&m);
mce_log(&m);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index b4a41cf030ed..3392002e2911 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -216,7 +216,7 @@ static void intel_threshold_interrupt(void)
return;
machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned));
- mce_notify_irq();
+ mce_queue_irq_work();
}
/*
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 1af51b1586d7..809a1e8f7fbd 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -413,6 +413,8 @@ static void intel_thermal_interrupt(void)
POWER_LIMIT_EVENT,
PACKAGE_LEVEL);
}
+
+ mce_queue_irq_work();
}
static void unexpected_thermal_interrupt(void)
--
2.3.2
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
next prev parent reply other threads:[~2015-05-22 8:19 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-20 19:35 MCE ring buffer management (Rebase) Chen, Gong
2015-05-20 19:35 ` [PATCH 1/4 Rebase] x86, MCE: Provide a lock-less memory pool to save error record Chen, Gong
2015-05-20 10:36 ` Borislav Petkov
2015-05-22 21:06 ` Chen, Gong
2015-05-22 8:17 ` Borislav Petkov
2015-05-20 19:35 ` [PATCH 2/4 Rebase] x86, MCE: Don't use percpu for MCE workqueue/irq_work Chen, Gong
2015-05-20 19:35 ` [PATCH 3/4 Rebase] x86, MCE: Remove mce_ring for SRAO error Chen, Gong
2015-05-20 19:35 ` [PATCH 4/4 Rebase] x86, MCE: Avoid potential deadlock in MCE context Chen, Gong
2015-05-20 9:28 ` Borislav Petkov
2015-05-22 21:12 ` Chen, Gong [this message]
2015-05-22 9:09 ` Borislav Petkov
2015-06-08 13:41 ` Borislav Petkov
2015-06-08 20:03 ` Luck, Tony
2015-06-08 20:26 ` Borislav Petkov
2015-06-11 8:27 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150522211247.GB4930@gchen.bj.intel.com \
--to=gong.chen@linux.intel.com \
--cc=bp@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox