* [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
@ 2014-04-17 9:57 tip-bot for Ingo Molnar
2014-04-17 10:09 ` Peter Zijlstra
0 siblings, 1 reply; 16+ messages in thread
From: tip-bot for Ingo Molnar @ 2014-04-17 9:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, torvalds, atodorov, peterz, tony.luck,
bp, gong.chen, jwboyer, qmewlo, tglx
Commit-ID: ea431643d6c38728195e2c456801c3ef66bb9991
Gitweb: http://git.kernel.org/tip/ea431643d6c38728195e2c456801c3ef66bb9991
Author: Ingo Molnar <mingo@kernel.org>
AuthorDate: Thu, 17 Apr 2014 10:25:53 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 17 Apr 2014 10:28:42 +0200
x86/mce: Fix CMCI preemption bugs
The following commit:
27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")
Added two preemption bugs:
- machine_check_poll() does a get_cpu_var() without a matching
put_cpu_var(), which causes preemption imbalance and crashes upon
bootup.
- it does percpu ops without disabling preemption. Preemption is not
disabled due to the mistaken use of a raw spinlock.
To fix these bugs fix the imbalance and change
cmci_discover_lock to a regular spinlock.
Reported-by: Owen Kibel <qmewlo@gmail.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Chen, Gong <gong.chen@linux.intel.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Todorov <atodorov@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
--
arch/x86/kernel/cpu/mcheck/mce.c | 4 +---
arch/x86/kernel/cpu/mcheck/mce_intel.c | 18 +++++++++---------
2 files changed, 10 insertions(+), 12 deletions(-)
---
arch/x86/kernel/cpu/mcheck/mce.c | 4 +---
arch/x86/kernel/cpu/mcheck/mce_intel.c | 18 +++++++++---------
2 files changed, 10 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index eeee23f..68317c8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -598,7 +598,6 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
{
struct mce m;
int i;
- unsigned long *v;
this_cpu_inc(mce_poll_count);
@@ -618,8 +617,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
if (!(m.status & MCI_STATUS_VAL))
continue;
- v = &get_cpu_var(mce_polled_error);
- set_bit(0, v);
+ this_cpu_write(mce_polled_error, 1);
/*
* Uncorrected or signalled events are handled by the exception
* handler when it is enabled, so don't process those here.
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 3bdb95a..9a316b2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -42,7 +42,7 @@ static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);
* cmci_discover_lock protects against parallel discovery attempts
* which could race against each other.
*/
-static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
+static DEFINE_SPINLOCK(cmci_discover_lock);
#define CMCI_THRESHOLD 1
#define CMCI_POLL_INTERVAL (30 * HZ)
@@ -144,14 +144,14 @@ static void cmci_storm_disable_banks(void)
int bank;
u64 val;
- raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+ spin_lock_irqsave(&cmci_discover_lock, flags);
owned = __get_cpu_var(mce_banks_owned);
for_each_set_bit(bank, owned, MAX_NR_BANKS) {
rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
val &= ~MCI_CTL2_CMCI_EN;
wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
}
- raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
}
static bool cmci_storm_detect(void)
@@ -211,7 +211,7 @@ static void cmci_discover(int banks)
int i;
int bios_wrong_thresh = 0;
- raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+ spin_lock_irqsave(&cmci_discover_lock, flags);
for (i = 0; i < banks; i++) {
u64 val;
int bios_zero_thresh = 0;
@@ -266,7 +266,7 @@ static void cmci_discover(int banks)
WARN_ON(!test_bit(i, __get_cpu_var(mce_poll_banks)));
}
}
- raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
if (mca_cfg.bios_cmci_threshold && bios_wrong_thresh) {
pr_info_once(
"bios_cmci_threshold: Some banks do not have valid thresholds set\n");
@@ -316,10 +316,10 @@ void cmci_clear(void)
if (!cmci_supported(&banks))
return;
- raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+ spin_lock_irqsave(&cmci_discover_lock, flags);
for (i = 0; i < banks; i++)
__cmci_disable_bank(i);
- raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
}
static void cmci_rediscover_work_func(void *arg)
@@ -360,9 +360,9 @@ void cmci_disable_bank(int bank)
if (!cmci_supported(&banks))
return;
- raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+ spin_lock_irqsave(&cmci_discover_lock, flags);
__cmci_disable_bank(bank);
- raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
}
static void intel_init_cmci(void)
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
2014-04-17 9:57 [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs tip-bot for Ingo Molnar
@ 2014-04-17 10:09 ` Peter Zijlstra
2014-04-17 10:24 ` Borislav Petkov
0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2014-04-17 10:09 UTC (permalink / raw)
To: mingo, hpa, linux-kernel, torvalds, atodorov, tony.luck,
gong.chen, bp, jwboyer, qmewlo, tglx
Cc: linux-tip-commits
On Thu, Apr 17, 2014 at 02:57:54AM -0700, tip-bot for Ingo Molnar wrote:
> Commit-ID: ea431643d6c38728195e2c456801c3ef66bb9991
> Gitweb: http://git.kernel.org/tip/ea431643d6c38728195e2c456801c3ef66bb9991
> Author: Ingo Molnar <mingo@kernel.org>
> AuthorDate: Thu, 17 Apr 2014 10:25:53 +0200
> Committer: Ingo Molnar <mingo@kernel.org>
> CommitDate: Thu, 17 Apr 2014 10:28:42 +0200
>
> x86/mce: Fix CMCI preemption bugs
>
> The following commit:
>
> 27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")
>
> Added two preemption bugs:
>
> - machine_check_poll() does a get_cpu_var() without a matching
> put_cpu_var(), which causes preemption imbalance and crashes upon
> bootup.
>
> - it does percpu ops without disabling preemption. Preemption is not
> disabled due to the mistaken use of a raw spinlock.
it is arch_spinlock that doesn't disable preemption. raw_spinlock
disables preemption just fine.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
2014-04-17 10:09 ` Peter Zijlstra
@ 2014-04-17 10:24 ` Borislav Petkov
2014-04-17 14:03 ` Luck, Tony
0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2014-04-17 10:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, hpa, linux-kernel, torvalds, atodorov, tony.luck,
gong.chen, jwboyer, qmewlo, tglx, linux-tip-commits
On Thu, Apr 17, 2014 at 12:09:44PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 17, 2014 at 02:57:54AM -0700, tip-bot for Ingo Molnar wrote:
> > Commit-ID: ea431643d6c38728195e2c456801c3ef66bb9991
> > Gitweb: http://git.kernel.org/tip/ea431643d6c38728195e2c456801c3ef66bb9991
> > Author: Ingo Molnar <mingo@kernel.org>
> > AuthorDate: Thu, 17 Apr 2014 10:25:53 +0200
> > Committer: Ingo Molnar <mingo@kernel.org>
> > CommitDate: Thu, 17 Apr 2014 10:28:42 +0200
> >
> > x86/mce: Fix CMCI preemption bugs
> >
> > The following commit:
> >
> > 27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")
> >
> > Added two preemption bugs:
> >
> > - machine_check_poll() does a get_cpu_var() without a matching
> > put_cpu_var(), which causes preemption imbalance and crashes upon
> > bootup.
> >
> > - it does percpu ops without disabling preemption. Preemption is not
> > disabled due to the mistaken use of a raw spinlock.
>
> it is arch_spinlock that doesn't disable preemption. raw_spinlock
> disables preemption just fine.
Hohum, __raw_spin_lock_irqsave does preempt_disable(). And
machine_check_poll should be running in irq context so why would the
original issue happen?
> kernel: [ 7.341085] BUG: using __this_cpu_write() in preemptible [00000000] code: modprobe/546
Unfortunately, I have only one line in a mail CCed to me.
Color me confused.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
2014-04-17 10:24 ` Borislav Petkov
@ 2014-04-17 14:03 ` Luck, Tony
2014-04-17 15:26 ` Borislav Petkov
0 siblings, 1 reply; 16+ messages in thread
From: Luck, Tony @ 2014-04-17 14:03 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra
Cc: mingo@kernel.org, hpa@zytor.com, linux-kernel@vger.kernel.org,
torvalds@linux-foundation.org, atodorov@redhat.com,
gong.chen@linux.intel.com, jwboyer@fedoraproject.org,
qmewlo@gmail.com, tglx@linutronix.de,
linux-tip-commits@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 458 bytes --]
> Hohum, __raw_spin_lock_irqsave does preempt_disable(). And
> machine_check_poll should be running in irq context so why would the
> original issue happen?
>
>> kernel: [ 7.341085] BUG: using __this_cpu_write() in preemptible [00000000] code: modprobe/546
>
> Unfortunately, I have only one line in a mail CCed to me.
>
> Color me confused.
Is this just the missing put_cpu() that Chen Gong already sent a patch for?
See attached
-Tony
[-- Attachment #2: Type: message/rfc822, Size: 2761 bytes --]
From: "Chen, Gong" <gong.chen@linux.intel.com>
To: "Luck, Tony" <tony.luck@intel.com>, "bp@alien8.de" <bp@alien8.de>, "m.chehab@samsung.com" <m.chehab@samsung.com>
Cc: "rostedt@goodmis.org" <rostedt@goodmis.org>, "linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>, "arozansk@redhat.com" <arozansk@redhat.com>, "Chen, Gong" <gong.chen@linux.intel.com>
Subject: [PATCH 1/2 v2] x86, MCE: Fix a bug in CMCI handler
Date: Thu, 17 Apr 2014 06:28:35 +0000
Message-ID: <1397716119-6164-2-git-send-email-gong.chen@linux.intel.com>
This bug is introduced by me in commit 27f6c573e0. I forget
to execute put_cpu_var operation after get_cpu_var. Fix it
via this_cpu_write instead of get_cpu_var.
v2 -> v1: Separate cleanup from bug fix.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index eeee23f..68317c8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -598,7 +598,6 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
{
struct mce m;
int i;
- unsigned long *v;
this_cpu_inc(mce_poll_count);
@@ -618,8 +617,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
if (!(m.status & MCI_STATUS_VAL))
continue;
- v = &get_cpu_var(mce_polled_error);
- set_bit(0, v);
+ this_cpu_write(mce_polled_error, 1);
/*
* Uncorrected or signalled events are handled by the exception
* handler when it is enabled, so don't process those here.
--
1.9.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
2014-04-17 14:03 ` Luck, Tony
@ 2014-04-17 15:26 ` Borislav Petkov
2014-04-17 16:54 ` Josh Boyer
0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2014-04-17 15:26 UTC (permalink / raw)
To: Luck, Tony
Cc: Peter Zijlstra, mingo@kernel.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
atodorov@redhat.com, gong.chen@linux.intel.com,
jwboyer@fedoraproject.org, qmewlo@gmail.com, tglx@linutronix.de,
linux-tip-commits@vger.kernel.org
On Thu, Apr 17, 2014 at 02:03:34PM +0000, Luck, Tony wrote:
> > Hohum, __raw_spin_lock_irqsave does preempt_disable(). And
> > machine_check_poll should be running in irq context so why would the
> > original issue happen?
> >
> >> kernel: [ 7.341085] BUG: using __this_cpu_write() in preemptible [00000000] code: modprobe/546
> >
> > Unfortunately, I have only one line in a mail CCed to me.
> >
> > Color me confused.
>
> Is this just the missing put_cpu() that Chen Gong already sent a patch for?
I'm not sure. There's some bug report floating around which contains the
"BUG" line above but I can't seem to find/get it.
I'll boot latest Linus tree on my SNB machine to check whether it
triggers here. Ingo says CONFIG_DEBUG_PREEMPT=y is causing it but this
is all hearsay stuff from where I'm standing...
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
2014-04-17 15:26 ` Borislav Petkov
@ 2014-04-17 16:54 ` Josh Boyer
2014-04-17 19:23 ` Borislav Petkov
0 siblings, 1 reply; 16+ messages in thread
From: Josh Boyer @ 2014-04-17 16:54 UTC (permalink / raw)
To: Borislav Petkov
Cc: Luck, Tony, Peter Zijlstra, mingo@kernel.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
atodorov@redhat.com, gong.chen@linux.intel.com, qmewlo@gmail.com,
tglx@linutronix.de, linux-tip-commits@vger.kernel.org
On Thu, Apr 17, 2014 at 11:26 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Apr 17, 2014 at 02:03:34PM +0000, Luck, Tony wrote:
>> > Hohum, __raw_spin_lock_irqsave does preempt_disable(). And
>> > machine_check_poll should be running in irq context so why would the
>> > original issue happen?
>> >
>> >> kernel: [ 7.341085] BUG: using __this_cpu_write() in preemptible [00000000] code: modprobe/546
>> >
>> > Unfortunately, I have only one line in a mail CCed to me.
>> >
>> > Color me confused.
>>
>> Is this just the missing put_cpu() that Chen Gong already sent a patch for?
>
> I'm not sure. There's some bug report floating around which contains the
> "BUG" line above but I can't seem to find/get it.
>
> I'll boot latest Linus tree on my SNB machine to check whether it
> triggers here. Ingo says CONFIG_DEBUG_PREEMPT=y is causing it but this
> is all hearsay stuff from where I'm standing...
For some context.
A user (Owen) reported seeing the following backtrace with 3.15-rc1+:
kernel: [ 120.253539] Hardware name: Hewlett-Packard HP ENVY 15
Notebook PC/1962, BIOS F.24 08/27/2013
kernel: [ 120.253540] ffff88025f2146c0 ffffffff81c01e40
ffffffff8171dc1d ffffffff81d11aa0
kernel: [ 120.253543] ffffffff81c01e50 ffffffff81719a39
ffffffff81c01eb0 ffffffff8172151b
kernel: [ 120.253545] ffffffff81c18480 ffffffff81c01fd8
00000000000146c0 00000000000146c0
kernel: [ 120.253548] Call Trace:
kernel: [ 120.253555] [<ffffffff8171dc1d>] dump_stack+0x4d/0x6f
kernel: [ 120.253558] [<ffffffff81719a39>] __schedule_bug+0x4c/0x5a
kernel: [ 120.253560] [<ffffffff8172151b>] __schedule+0x6eb/0x7a0
kernel: [ 120.253563] [<ffffffff81721b91>] schedule_preempt_disabled+0x31/0x80
kernel: [ 120.253566] [<ffffffff810af8f3>] cpu_startup_entry+0x173/0x490
kernel: [ 120.253570] [<ffffffff8170e3e4>] rest_init+0x84/0x90
kernel: [ 120.253574] [<ffffffff81d34f83>] start_kernel+0x450/0x45b
kernel: [ 120.253576] [<ffffffff81d3493c>] ? repair_env_string+0x5c/0x5c
kernel: [ 120.253578] [<ffffffff81d34120>] ? early_idt_handlers+0x120/0x120
kernel: [ 120.253581] [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
kernel: [ 120.253583] [<ffffffff81d3472e>] x86_64_start_kernel+0x13e/0x14d
For whatever reason, his report didn't hit lkml, but it did hit Linus'
inbox. Linus took a look around, googled some, and came across a
report that Alexander filed 2 days ago against Fedora rawhide with a
similar backtrace:
https://bugzilla.redhat.com/show_bug.cgi?id=1087810
Linus CC'd me and Alexander on his reply to Owen, Ingo, Peter, etc.
Ingo was aware of Chen Gong's patch, but when Owen tested it it
produced the BUG line above. So Ingo came up with a slightly
different fix to hopefully resolve that as well. We haven't heard
from Owen whether Ingo's patch resolves everything yet.
I think (hope) that is the full backstory. Ingo or Peter could correct me.
josh
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
2014-04-17 16:54 ` Josh Boyer
@ 2014-04-17 19:23 ` Borislav Petkov
2014-04-17 19:25 ` Linus Torvalds
0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2014-04-17 19:23 UTC (permalink / raw)
To: Josh Boyer, Owen Kibel
Cc: Luck, Tony, Peter Zijlstra, mingo@kernel.org, hpa@zytor.com,
linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
atodorov@redhat.com, gong.chen@linux.intel.com, qmewlo@gmail.com,
tglx@linutronix.de, linux-tip-commits@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 3850 bytes --]
On Thu, Apr 17, 2014 at 12:54:21PM -0400, Josh Boyer wrote:
> A user (Owen) reported seeing the following backtrace with 3.15-rc1+:
>
> kernel: [ 120.253539] Hardware name: Hewlett-Packard HP ENVY 15
> Notebook PC/1962, BIOS F.24 08/27/2013
> kernel: [ 120.253540] ffff88025f2146c0 ffffffff81c01e40
> ffffffff8171dc1d ffffffff81d11aa0
> kernel: [ 120.253543] ffffffff81c01e50 ffffffff81719a39
> ffffffff81c01eb0 ffffffff8172151b
> kernel: [ 120.253545] ffffffff81c18480 ffffffff81c01fd8
> 00000000000146c0 00000000000146c0
> kernel: [ 120.253548] Call Trace:
> kernel: [ 120.253555] [<ffffffff8171dc1d>] dump_stack+0x4d/0x6f
> kernel: [ 120.253558] [<ffffffff81719a39>] __schedule_bug+0x4c/0x5a
> kernel: [ 120.253560] [<ffffffff8172151b>] __schedule+0x6eb/0x7a0
> kernel: [ 120.253563] [<ffffffff81721b91>] schedule_preempt_disabled+0x31/0x80
> kernel: [ 120.253566] [<ffffffff810af8f3>] cpu_startup_entry+0x173/0x490
> kernel: [ 120.253570] [<ffffffff8170e3e4>] rest_init+0x84/0x90
> kernel: [ 120.253574] [<ffffffff81d34f83>] start_kernel+0x450/0x45b
> kernel: [ 120.253576] [<ffffffff81d3493c>] ? repair_env_string+0x5c/0x5c
> kernel: [ 120.253578] [<ffffffff81d34120>] ? early_idt_handlers+0x120/0x120
> kernel: [ 120.253581] [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
> kernel: [ 120.253583] [<ffffffff81d3472e>] x86_64_start_kernel+0x13e/0x14d
>
> For whatever reason, his report didn't hit lkml, but it did hit Linus'
> inbox. Linus took a look around, googled some, and came across a
> report that Alexander filed 2 days ago against Fedora rawhide with a
> similar backtrace:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1087810
>
> Linus CC'd me and Alexander on his reply to Owen, Ingo, Peter, etc.
> Ingo was aware of Chen Gong's patch, but when Owen tested it it
> produced the BUG line above. So Ingo came up with a slightly
> different fix to hopefully resolve that as well. We haven't heard
> from Owen whether Ingo's patch resolves everything yet.
>
> I think (hope) that is the full backstory. Ingo or Peter could correct me.
Thanks Josh, that explains it. Owen bounced me his mail too, in the
meantime.
So, the only thing I'm seeing so far is that I cannot reproduce this
with Owen's config on my machines with Linus' tree from right now.
And the only sane idea I have currently is that without the get_cpu_var
fix, preemption count gets fumbled just right down the
identify_boot_cpu ->
mcheck_cpu_init ->
machine_check_poll ->
get_cpu_var ->
preempt_disable
path so that the initial schedule we do, catches it:
start_kernel -> rest_init -> schedule_preempt_disabled
Which would mean that the only fix for that should be Gong's
initial fix which Tony just sent (see attached), i.e. without the
s/raw_spin_lock/spin_lock/ changes from Ingo.
But you're saying here, Owen tested it already. Which kinda hints at with that
kernel: [ 7.341085] BUG: using __this_cpu_write() in preemptible [00000000] code: modprobe/546
line which was in one of the previous mails.
Anyway, I did merge Owen's config wrt PREEMPT into mine:
$ grep PREEMPT .config
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set
and I can't repro it. With, or without the attached patch :-\.
Which basically could mean that there's either some specific timing
Owen's or Alex' boxes are hitting which mine aren't or there's some
miscommunication in the whole thing.
@Owen: can you please test the attached patch if you haven't done so
already and report back?
Thanks.
/me is still confused.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 1265 bytes --]
This bug is introduced by me in commit 27f6c573e0. I forget
to execute put_cpu_var operation after get_cpu_var. Fix it
via this_cpu_write instead of get_cpu_var.
v2 -> v1: Separate cleanup from bug fix.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index eeee23f..68317c8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -598,7 +598,6 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
{
struct mce m;
int i;
- unsigned long *v;
this_cpu_inc(mce_poll_count);
@@ -618,8 +617,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
if (!(m.status & MCI_STATUS_VAL))
continue;
- v = &get_cpu_var(mce_polled_error);
- set_bit(0, v);
+ this_cpu_write(mce_polled_error, 1);
/*
* Uncorrected or signalled events are handled by the exception
* handler when it is enabled, so don't process those here.
--
1.9.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
2014-04-17 19:23 ` Borislav Petkov
@ 2014-04-17 19:25 ` Linus Torvalds
2014-04-17 19:42 ` Borislav Petkov
0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2014-04-17 19:25 UTC (permalink / raw)
To: Borislav Petkov
Cc: Josh Boyer, Owen Kibel, Luck, Tony, Peter Zijlstra,
mingo@kernel.org, hpa@zytor.com, linux-kernel@vger.kernel.org,
atodorov@redhat.com, gong.chen@linux.intel.com,
tglx@linutronix.de, linux-tip-commits@vger.kernel.org
On Thu, Apr 17, 2014 at 12:23 PM, Borislav Petkov <bp@alien8.de> wrote:
>
> But you're saying here, Owen tested it already.
No, Owen tested a simpler patch that just changes the "get_cpu_var()"
to "__get_cpu_var()" and avoids the preempt increment.
Linus
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
2014-04-17 19:25 ` Linus Torvalds
@ 2014-04-17 19:42 ` Borislav Petkov
2014-04-17 20:58 ` Borislav Petkov
0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2014-04-17 19:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: Josh Boyer, Owen Kibel, Luck, Tony, Peter Zijlstra,
mingo@kernel.org, hpa@zytor.com, linux-kernel@vger.kernel.org,
atodorov@redhat.com, gong.chen@linux.intel.com,
tglx@linutronix.de, linux-tip-commits@vger.kernel.org
On Thu, Apr 17, 2014 at 12:25:14PM -0700, Linus Torvalds wrote:
> No, Owen tested a simpler patch that just changes the "get_cpu_var()"
> to "__get_cpu_var()" and avoids the preempt increment.
Which basically would be the same as doing this_cpu_write() in the
proposed fix - both don't touch preemption. So it is something else.
More staring...
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
2014-04-17 19:42 ` Borislav Petkov
@ 2014-04-17 20:58 ` Borislav Petkov
[not found] ` <CAK7FoDt6tnknX7cioe7=2svVksyAyLUtoZX_oprzJNxoHZEpdw@mail.gmail.com>
0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2014-04-17 20:58 UTC (permalink / raw)
To: Linus Torvalds, Owen Kibel
Cc: Josh Boyer, Luck, Tony, Peter Zijlstra, mingo@kernel.org,
hpa@zytor.com, linux-kernel@vger.kernel.org, atodorov@redhat.com,
gong.chen@linux.intel.com, tglx@linutronix.de,
linux-tip-commits@vger.kernel.org
On Thu, Apr 17, 2014 at 09:42:41PM +0200, Borislav Petkov wrote:
> On Thu, Apr 17, 2014 at 12:25:14PM -0700, Linus Torvalds wrote:
> > No, Owen tested a simpler patch that just changes the "get_cpu_var()"
> > to "__get_cpu_var()" and avoids the preempt increment.
>
> Which basically would be the same as doing this_cpu_write() in the
> proposed fix - both don't touch preemption. So it is something else.
> More staring...
Ok, in one of the mails Ingo forwarded to me, it said it still failed with
> kernel: [ 7.341085] BUG: using __this_cpu_write() in preemptible [00000000] code: modprobe/546
but considering Owen tried with a simpler __get_cpu_var version, I
fail to see how the __this_cpu_write() BUG will happen. Btw, those
__this_cpu_write things have received preemption checks. I'm seeing
right now another thread happening on lkml:
http://lkml.kernel.org/r/8761m7lm3j.fsf@canonical.com
So, Owen, can you please clarify which patch you *did* text exactly and
whether it worked or not.
Also, did you test the patch below? If not, please give it a run too.
Thanks.
---
This bug is introduced by me in commit 27f6c573e0. I forget
to execute put_cpu_var operation after get_cpu_var. Fix it
via this_cpu_write instead of get_cpu_var.
v2 -> v1: Separate cleanup from bug fix.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index eeee23f..68317c8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -598,7 +598,6 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
{
struct mce m;
int i;
- unsigned long *v;
this_cpu_inc(mce_poll_count);
@@ -618,8 +617,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
if (!(m.status & MCI_STATUS_VAL))
continue;
- v = &get_cpu_var(mce_polled_error);
- set_bit(0, v);
+ this_cpu_write(mce_polled_error, 1);
/*
* Uncorrected or signalled events are handled by the exception
* handler when it is enabled, so don't process those here.
--
1.9.0
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
[not found] ` <CAK7FoDt6tnknX7cioe7=2svVksyAyLUtoZX_oprzJNxoHZEpdw@mail.gmail.com>
@ 2014-04-17 21:30 ` Borislav Petkov
2014-04-17 22:20 ` Borislav Petkov
0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2014-04-17 21:30 UTC (permalink / raw)
To: Owen Kibel
Cc: Linus Torvalds, Josh Boyer, Luck, Tony, Peter Zijlstra,
mingo@kernel.org, hpa@zytor.com, linux-kernel@vger.kernel.org,
atodorov@redhat.com, gong.chen@linux.intel.com,
tglx@linutronix.de, linux-tip-commits@vger.kernel.org
On Thu, Apr 17, 2014 at 02:21:41PM -0700, Owen Kibel wrote:
> The patch tested was from Chen, Gong
>
> https://lkml.org/lkml/2014/4/15/838
>
> It worked, apart from the warning on boot described previously - the
> possible boot message was extracted from /var/log/kern.log
>
> The above patch on mainline: 3.15-rc1 2014-04-13 [tar.xz]
>
> seems to have cured the problem
Ok, good. So this is straightened out.
@Ingo: just pick up Gong's fix without the raw_spin_lock* stuff.
> - the kernel appears fine after announcing something like
>
> kernel: [ 7.341085] BUG: using __this_cpu_write() in preemptible
> [00000000] code: modprobe/546
>
> in the boot process.
This is most likely unrelated and is caused by the preemption checks
added to __this_cpu_* in 188a81409ff7. If you'd like to debug this
further, please send a full dmesg:
dmesg > dmesg.log
Privately is fine too.
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
2014-04-17 21:30 ` Borislav Petkov
@ 2014-04-17 22:20 ` Borislav Petkov
2014-04-18 8:07 ` Ingo Molnar
0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2014-04-17 22:20 UTC (permalink / raw)
To: Owen Kibel
Cc: Linus Torvalds, Josh Boyer, Luck, Tony, Peter Zijlstra,
mingo@kernel.org, hpa@zytor.com, linux-kernel@vger.kernel.org,
atodorov@redhat.com, gong.chen@linux.intel.com,
tglx@linutronix.de, linux-tip-commits@vger.kernel.org
On Thu, Apr 17, 2014 at 11:30:12PM +0200, Borislav Petkov wrote:
> This is most likely unrelated and is caused by the preemption checks
> added to __this_cpu_* in 188a81409ff7. If you'd like to debug this
> further, please send a full dmesg:
>
> dmesg > dmesg.log
>
> Privately is fine too.
Ok, thanks for the dmesg. Replying to the thread with everybody:
The splat Owen is seeing is the same as this one at the beginning of
this thread here:
http://lkml.kernel.org/r/8761m7lm3j.fsf@canonical.com
which has a viable fix. Btw, those two splats happen on HP notebooks.
Ok, good, I think we're all solved. Phew :-)
Thanks to all for their help.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
2014-04-17 22:20 ` Borislav Petkov
@ 2014-04-18 8:07 ` Ingo Molnar
2014-04-18 9:22 ` Borislav Petkov
2014-04-25 8:47 ` Chen, Gong
0 siblings, 2 replies; 16+ messages in thread
From: Ingo Molnar @ 2014-04-18 8:07 UTC (permalink / raw)
To: Borislav Petkov
Cc: Owen Kibel, Linus Torvalds, Josh Boyer, Luck, Tony,
Peter Zijlstra, hpa@zytor.com, linux-kernel@vger.kernel.org,
atodorov@redhat.com, gong.chen@linux.intel.com,
tglx@linutronix.de, linux-tip-commits@vger.kernel.org
* Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Apr 17, 2014 at 11:30:12PM +0200, Borislav Petkov wrote:
> > This is most likely unrelated and is caused by the preemption checks
> > added to __this_cpu_* in 188a81409ff7. If you'd like to debug this
> > further, please send a full dmesg:
> >
> > dmesg > dmesg.log
> >
> > Privately is fine too.
>
> Ok, thanks for the dmesg. Replying to the thread with everybody:
>
> The splat Owen is seeing is the same as this one at the beginning of
> this thread here:
>
> http://lkml.kernel.org/r/8761m7lm3j.fsf@canonical.com
>
> which has a viable fix. Btw, those two splats happen on HP notebooks.
>
> Ok, good, I think we're all solved. Phew :-)
>
> Thanks to all for their help.
Okay, so AFAICS the fix in x86/urgent isn't wrong functionally, it's
just that the changelog incorrectly claims the raw-spinlock use is a
bug causing a problem here.
Still that raw spinlock is bogus and might be hiding other problems,
so we can keep the x86/urgent change (ea431643d6c3) as-is and I'll get
it to Linus later today ...
Thanks,
Ingo
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
2014-04-18 8:07 ` Ingo Molnar
@ 2014-04-18 9:22 ` Borislav Petkov
2014-04-25 8:47 ` Chen, Gong
1 sibling, 0 replies; 16+ messages in thread
From: Borislav Petkov @ 2014-04-18 9:22 UTC (permalink / raw)
To: Ingo Molnar
Cc: Owen Kibel, Linus Torvalds, Josh Boyer, Luck, Tony,
Peter Zijlstra, hpa@zytor.com, linux-kernel@vger.kernel.org,
atodorov@redhat.com, gong.chen@linux.intel.com,
tglx@linutronix.de, linux-tip-commits@vger.kernel.org
On Fri, Apr 18, 2014 at 10:07:16AM +0200, Ingo Molnar wrote:
> Okay, so AFAICS the fix in x86/urgent isn't wrong functionally, it's
> just that the changelog incorrectly claims the raw-spinlock use is a
> bug causing a problem here.
>
> Still that raw spinlock is bogus and might be hiding other problems,
> so we can keep the x86/urgent change (ea431643d6c3) as-is and I'll get
> it to Linus later today ...
Ok.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
2014-04-18 8:07 ` Ingo Molnar
2014-04-18 9:22 ` Borislav Petkov
@ 2014-04-25 8:47 ` Chen, Gong
2014-04-25 13:15 ` Ingo Molnar
1 sibling, 1 reply; 16+ messages in thread
From: Chen, Gong @ 2014-04-25 8:47 UTC (permalink / raw)
To: Ingo Molnar
Cc: Borislav Petkov, Owen Kibel, Linus Torvalds, Josh Boyer,
Luck, Tony, Peter Zijlstra, hpa@zytor.com,
linux-kernel@vger.kernel.org, atodorov@redhat.com,
tglx@linutronix.de, linux-tip-commits@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 591 bytes --]
On Fri, Apr 18, 2014 at 10:07:16AM +0200, Ingo Molnar wrote:
> Okay, so AFAICS the fix in x86/urgent isn't wrong functionally, it's
> just that the changelog incorrectly claims the raw-spinlock use is a
> bug causing a problem here.
>
> Still that raw spinlock is bogus and might be hiding other problems,
> so we can keep the x86/urgent change (ea431643d6c3) as-is and I'll get
> it to Linus later today ...
>
> Thanks,
>
> Ingo
Hi, Ingo
We ever had a patch(59d958d2c7) to make spinlock -> raw_spinlock.
Would you please explain it a little more why you revert it?
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs
2014-04-25 8:47 ` Chen, Gong
@ 2014-04-25 13:15 ` Ingo Molnar
0 siblings, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2014-04-25 13:15 UTC (permalink / raw)
To: Chen, Gong
Cc: Borislav Petkov, Owen Kibel, Linus Torvalds, Josh Boyer,
Luck, Tony, Peter Zijlstra, hpa@zytor.com,
linux-kernel@vger.kernel.org, atodorov@redhat.com,
tglx@linutronix.de, linux-tip-commits@vger.kernel.org
* Chen, Gong <gong.chen@linux.intel.com> wrote:
> On Fri, Apr 18, 2014 at 10:07:16AM +0200, Ingo Molnar wrote:
> > Okay, so AFAICS the fix in x86/urgent isn't wrong functionally, it's
> > just that the changelog incorrectly claims the raw-spinlock use is a
> > bug causing a problem here.
> >
> > Still that raw spinlock is bogus and might be hiding other problems,
> > so we can keep the x86/urgent change (ea431643d6c3) as-is and I'll get
> > it to Linus later today ...
> >
> > Thanks,
> >
> > Ingo
> Hi, Ingo
>
> We ever had a patch(59d958d2c7) to make spinlock -> raw_spinlock.
> Would you please explain it a little more why you revert it?
It was years ago and I forgot about that. Should be redone I guess.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2014-04-25 13:15 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-17 9:57 [tip:x86/urgent] x86/mce: Fix CMCI preemption bugs tip-bot for Ingo Molnar
2014-04-17 10:09 ` Peter Zijlstra
2014-04-17 10:24 ` Borislav Petkov
2014-04-17 14:03 ` Luck, Tony
2014-04-17 15:26 ` Borislav Petkov
2014-04-17 16:54 ` Josh Boyer
2014-04-17 19:23 ` Borislav Petkov
2014-04-17 19:25 ` Linus Torvalds
2014-04-17 19:42 ` Borislav Petkov
2014-04-17 20:58 ` Borislav Petkov
[not found] ` <CAK7FoDt6tnknX7cioe7=2svVksyAyLUtoZX_oprzJNxoHZEpdw@mail.gmail.com>
2014-04-17 21:30 ` Borislav Petkov
2014-04-17 22:20 ` Borislav Petkov
2014-04-18 8:07 ` Ingo Molnar
2014-04-18 9:22 ` Borislav Petkov
2014-04-25 8:47 ` Chen, Gong
2014-04-25 13:15 ` Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).