* Re: [PATCH] x86 MCE: shut up lockdep warning
2009-05-08 3:47 [PATCH] x86 MCE: shut up lockdep warning Shaohua Li
@ 2009-05-08 6:45 ` Andi Kleen
2009-05-08 9:10 ` Ingo Molnar
2009-05-08 8:17 ` Hidetoshi Seto
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2009-05-08 6:45 UTC (permalink / raw)
To: Shaohua Li; +Cc: lkml, Andi Kleen, Andrew Morton
On Fri, May 08, 2009 at 11:47:09AM +0800, Shaohua Li wrote:
> lockdep report below warning when I try to offline one cpu:
> [ 110.835487] =================================
> [ 110.835616] [ INFO: inconsistent lock state ]
> [ 110.835688] 2.6.30-rc4-00336-g8c9ed89 #52
> [ 110.835757] ---------------------------------
> [ 110.835828] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [ 110.835908] swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [ 110.835982] (cmci_discover_lock){?.+...}, at: [<ffffffff80236dc0>] cmci_clear+0x30/0x9b
>
> smp_call_function_single() will disable interrupt. moving mce reenable/disable
> to workqueue, so no irq is disabled.
Looks good. Thanks.
Acked-by: Andi Kleen <ak@linux.intel.com>
>
> Signed-off-by: Shaohua Li<shaohua.li@intel.com>
-Andi
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] x86 MCE: shut up lockdep warning
2009-05-08 6:45 ` Andi Kleen
@ 2009-05-08 9:10 ` Ingo Molnar
2009-05-08 9:37 ` Andi Kleen
0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2009-05-08 9:10 UTC (permalink / raw)
To: Andi Kleen
Cc: Shaohua Li, lkml, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
Peter Zijlstra
* Andi Kleen <andi@firstfloor.org> wrote:
> On Fri, May 08, 2009 at 11:47:09AM +0800, Shaohua Li wrote:
> > lockdep report below warning when I try to offline one cpu:
> > [ 110.835487] =================================
> > [ 110.835616] [ INFO: inconsistent lock state ]
> > [ 110.835688] 2.6.30-rc4-00336-g8c9ed89 #52
> > [ 110.835757] ---------------------------------
> > [ 110.835828] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> > [ 110.835908] swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> > [ 110.835982] (cmci_discover_lock){?.+...}, at: [<ffffffff80236dc0>] cmci_clear+0x30/0x9b
> >
> > smp_call_function_single() will disable interrupt. moving mce reenable/disable
> > to workqueue, so no irq is disabled.
>
> Looks good. Thanks.
>
> Acked-by: Andi Kleen <ak@linux.intel.com>
The report is useful, but the fix does not look good at all, and you
should never have acked it:
- it works around a lockdep warning
- you did not realize the real bug while the warning was plain
- plus the patch introduces a fragile (because complex)
work_on_cpu() call into the CPU hotplug path, which could have
caused followup regressions.
Please also Cc: the relevant upstream subsystem maintainers in such
cases (the x86 maintaiers in this case).
Thanks,
Ingo
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] x86 MCE: shut up lockdep warning
2009-05-08 9:10 ` Ingo Molnar
@ 2009-05-08 9:37 ` Andi Kleen
2009-05-08 12:14 ` Shaohua Li
0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2009-05-08 9:37 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andi Kleen, Shaohua Li, lkml, Andrew Morton, Thomas Gleixner,
H. Peter Anvin, Peter Zijlstra
> - it works around a lockdep warning
>
> - you did not realize the real bug while the warning was plain
Well I'm not sure you understood it either :) Actually I'm pretty
sure you did not. It would have been good if you had awaited proper
review comments before commiting your patch.
I don't think here's really a real bug because cpu hotunplug is single threaded
anyways (cpu add remove lock) and we don't do multiple discoveries in parallel.
The reason the lock is there is only during bringup with multiple CPUs
doing this in parallel on initial bootup. But we can't race against cpu
hotunplug there because there's not hotunplug before the system
is up with all configured CPUs.
So I think any way of shutting up lockdep is fine here and your
patch is overkill and disables interrupts unnecessarily.
> - plus the patch introduces a fragile (because complex)
> work_on_cpu() call into the CPU hotplug path, which could have
> caused followup regressions.
That's a reasonable point, but doesn't seem strong enough to
do full irq disabling. A better fix would be to find some other
way to shut off this lockdep warning for this case. Is there
such a way?
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86 MCE: shut up lockdep warning
2009-05-08 9:37 ` Andi Kleen
@ 2009-05-08 12:14 ` Shaohua Li
0 siblings, 0 replies; 11+ messages in thread
From: Shaohua Li @ 2009-05-08 12:14 UTC (permalink / raw)
To: Andi Kleen
Cc: Ingo Molnar, lkml, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
Peter Zijlstra
On Fri, May 08, 2009 at 05:37:57PM +0800, Andi Kleen wrote:
> > - it works around a lockdep warning
> >
> > - you did not realize the real bug while the warning was plain
>
> Well I'm not sure you understood it either :) Actually I'm pretty
> sure you did not. It would have been good if you had awaited proper
> review comments before commiting your patch.
>
> I don't think here's really a real bug because cpu hotunplug is single threaded
> anyways (cpu add remove lock) and we don't do multiple discoveries in parallel.
>
> The reason the lock is there is only during bringup with multiple CPUs
> doing this in parallel on initial bootup. But we can't race against cpu
> hotunplug there because there's not hotunplug before the system
> is up with all configured CPUs.
>
> So I think any way of shutting up lockdep is fine here and your
> patch is overkill and disables interrupts unnecessarily.
yes, this isn't a real bug, but just a false warning from lockdep. It's
my fault I didn't point this out first.
> > - plus the patch introduces a fragile (because complex)
> > work_on_cpu() call into the CPU hotplug path, which could have
> > caused followup regressions.
>
> That's a reasonable point, but doesn't seem strong enough to
> do full irq disabling. A better fix would be to find some other
> way to shut off this lockdep warning for this case. Is there
> such a way?
I tried to assign a lockdep class to the mce lock, which is the usual
way to avoid lockdep warning, but it appears not working here.
Thanks,
Shaohua
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86 MCE: shut up lockdep warning
2009-05-08 3:47 [PATCH] x86 MCE: shut up lockdep warning Shaohua Li
2009-05-08 6:45 ` Andi Kleen
@ 2009-05-08 8:17 ` Hidetoshi Seto
2009-05-08 8:28 ` [PATCH] x86 MCE: yet another " Hidetoshi Seto
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Hidetoshi Seto @ 2009-05-08 8:17 UTC (permalink / raw)
To: Shaohua Li; +Cc: lkml, Andi Kleen, Andrew Morton
Shaohua Li wrote:
> lockdep report below warning when I try to offline one cpu:
> [ 110.835487] =================================
> [ 110.835616] [ INFO: inconsistent lock state ]
> [ 110.835688] 2.6.30-rc4-00336-g8c9ed89 #52
> [ 110.835757] ---------------------------------
> [ 110.835828] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [ 110.835908] swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [ 110.835982] (cmci_discover_lock){?.+...}, at: [<ffffffff80236dc0>] cmci_clear+0x30/0x9b
>
> smp_call_function_single() will disable interrupt. moving mce reenable/disable
> to workqueue, so no irq is disabled.
I want a confirmation.
Will this scheduled work be executed properly on the cpu which is going
to offline?
> @@ -1106,14 +1108,14 @@ static int __cpuinit mce_cpu_callback(struct notifier_block *nfb,
> case CPU_DOWN_PREPARE:
> case CPU_DOWN_PREPARE_FROZEN:
> del_timer_sync(t);
> - smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
> + work_on_cpu(cpu, mce_disable_cpu, &action);
> break;
> case CPU_DOWN_FAILED:
> case CPU_DOWN_FAILED_FROZEN:
> t->expires = round_jiffies(jiffies +
> __get_cpu_var(next_interval));
> add_timer_on(t, cpu);
> - smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
> + work_on_cpu(cpu, mce_reenable_cpu, &action);
> break;
> case CPU_POST_DEAD:
> /* intentionally ignoring frozen here */
>
I believe there is strong reason to have "1" in the last argument of
smp_call_function_single().
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH] x86 MCE: yet another shut up lockdep warning
2009-05-08 3:47 [PATCH] x86 MCE: shut up lockdep warning Shaohua Li
2009-05-08 6:45 ` Andi Kleen
2009-05-08 8:17 ` Hidetoshi Seto
@ 2009-05-08 8:28 ` Hidetoshi Seto
2009-05-08 9:07 ` [tip:x86/urgent] x86: MCE: make cmci_discover_lock irq-safe tip-bot for Hidetoshi Seto
2009-05-08 8:34 ` [PATCH] x86 MCE: shut up lockdep warning Peter Zijlstra
2009-05-08 9:04 ` Ingo Molnar
4 siblings, 1 reply; 11+ messages in thread
From: Hidetoshi Seto @ 2009-05-08 8:28 UTC (permalink / raw)
To: Shaohua Li; +Cc: lkml, Andi Kleen, Andrew Morton
lockdep report below warning when Li try to offline one cpu:
[ 110.835487] =================================
[ 110.835616] [ INFO: inconsistent lock state ]
[ 110.835688] 2.6.30-rc4-00336-g8c9ed89 #52
[ 110.835757] ---------------------------------
[ 110.835828] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[ 110.835908] swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
[ 110.835982] (cmci_discover_lock){?.+...}, at: [<ffffffff80236dc0>] cmci_clear+0x30/0x9b
cmci_clear can be called via smp_call_function_single().
It is better to disable interrupt while holding cmci_discover_lock.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Reported-by: Shaohua Li<shaohua.li@intel.com>
---
arch/x86/kernel/cpu/mcheck/mce_intel_64.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel_64.c b/arch/x86/kernel/cpu/mcheck/mce_intel_64.c
index d6b72df..cef3ee3 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel_64.c
@@ -151,10 +151,11 @@ static void print_update(char *type, int *hdr, int num)
static void cmci_discover(int banks, int boot)
{
unsigned long *owned = (void *)&__get_cpu_var(mce_banks_owned);
+ unsigned long flags;
int hdr = 0;
int i;
- spin_lock(&cmci_discover_lock);
+ spin_lock_irqsave(&cmci_discover_lock, flags);
for (i = 0; i < banks; i++) {
u64 val;
@@ -184,7 +185,7 @@ static void cmci_discover(int banks, int boot)
WARN_ON(!test_bit(i, __get_cpu_var(mce_poll_banks)));
}
}
- spin_unlock(&cmci_discover_lock);
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
if (hdr)
printk(KERN_CONT "\n");
}
@@ -211,13 +212,14 @@ void cmci_recheck(void)
*/
void cmci_clear(void)
{
+ unsigned long flags;
int i;
int banks;
u64 val;
if (!cmci_supported(&banks))
return;
- spin_lock(&cmci_discover_lock);
+ spin_lock_irqsave(&cmci_discover_lock, flags);
for (i = 0; i < banks; i++) {
if (!test_bit(i, __get_cpu_var(mce_banks_owned)))
continue;
@@ -227,7 +229,7 @@ void cmci_clear(void)
wrmsrl(MSR_IA32_MC0_CTL2 + i, val);
__clear_bit(i, __get_cpu_var(mce_banks_owned));
}
- spin_unlock(&cmci_discover_lock);
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
}
/*
--
1.6.2.2
^ permalink raw reply related [flat|nested] 11+ messages in thread* [tip:x86/urgent] x86: MCE: make cmci_discover_lock irq-safe
2009-05-08 8:28 ` [PATCH] x86 MCE: yet another " Hidetoshi Seto
@ 2009-05-08 9:07 ` tip-bot for Hidetoshi Seto
0 siblings, 0 replies; 11+ messages in thread
From: tip-bot for Hidetoshi Seto @ 2009-05-08 9:07 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, andi, seto.hidetoshi, shaohua.li, akpm,
tglx, mingo
Commit-ID: e5299926d7459d9fa7c7f856983147817aedb10e
Gitweb: http://git.kernel.org/tip/e5299926d7459d9fa7c7f856983147817aedb10e
Author: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
AuthorDate: Fri, 8 May 2009 17:28:40 +0900
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 8 May 2009 11:03:26 +0200
x86: MCE: make cmci_discover_lock irq-safe
Lockdep reports the warning below when Li tries to offline one cpu:
[ 110.835487] =================================
[ 110.835616] [ INFO: inconsistent lock state ]
[ 110.835688] 2.6.30-rc4-00336-g8c9ed89 #52
[ 110.835757] ---------------------------------
[ 110.835828] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[ 110.835908] swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
[ 110.835982] (cmci_discover_lock){?.+...}, at: [<ffffffff80236dc0>] cmci_clear+0x30/0x9b
cmci_clear() can be called via smp_call_function_single().
It is better to disable interrupt while holding cmci_discover_lock,
to turn it into an irq-safe lock - we can deadlock otherwise.
[ Impact: fix possible deadlock in the MCE code ]
Reported-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4A03ED38.8000700@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Shaohua Li<shaohua.li@intel.com>
---
arch/x86/kernel/cpu/mcheck/mce_intel_64.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel_64.c b/arch/x86/kernel/cpu/mcheck/mce_intel_64.c
index d6b72df..cef3ee3 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel_64.c
@@ -151,10 +151,11 @@ static void print_update(char *type, int *hdr, int num)
static void cmci_discover(int banks, int boot)
{
unsigned long *owned = (void *)&__get_cpu_var(mce_banks_owned);
+ unsigned long flags;
int hdr = 0;
int i;
- spin_lock(&cmci_discover_lock);
+ spin_lock_irqsave(&cmci_discover_lock, flags);
for (i = 0; i < banks; i++) {
u64 val;
@@ -184,7 +185,7 @@ static void cmci_discover(int banks, int boot)
WARN_ON(!test_bit(i, __get_cpu_var(mce_poll_banks)));
}
}
- spin_unlock(&cmci_discover_lock);
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
if (hdr)
printk(KERN_CONT "\n");
}
@@ -211,13 +212,14 @@ void cmci_recheck(void)
*/
void cmci_clear(void)
{
+ unsigned long flags;
int i;
int banks;
u64 val;
if (!cmci_supported(&banks))
return;
- spin_lock(&cmci_discover_lock);
+ spin_lock_irqsave(&cmci_discover_lock, flags);
for (i = 0; i < banks; i++) {
if (!test_bit(i, __get_cpu_var(mce_banks_owned)))
continue;
@@ -227,7 +229,7 @@ void cmci_clear(void)
wrmsrl(MSR_IA32_MC0_CTL2 + i, val);
__clear_bit(i, __get_cpu_var(mce_banks_owned));
}
- spin_unlock(&cmci_discover_lock);
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
}
/*
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] x86 MCE: shut up lockdep warning
2009-05-08 3:47 [PATCH] x86 MCE: shut up lockdep warning Shaohua Li
` (2 preceding siblings ...)
2009-05-08 8:28 ` [PATCH] x86 MCE: yet another " Hidetoshi Seto
@ 2009-05-08 8:34 ` Peter Zijlstra
2009-05-08 19:11 ` Andi Kleen
2009-05-08 9:04 ` Ingo Molnar
4 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2009-05-08 8:34 UTC (permalink / raw)
To: Shaohua Li; +Cc: lkml, Andi Kleen, Andrew Morton
On Fri, 2009-05-08 at 11:47 +0800, Shaohua Li wrote:
> lockdep report below warning when I try to offline one cpu:
> [ 110.835487] =================================
> [ 110.835616] [ INFO: inconsistent lock state ]
> [ 110.835688] 2.6.30-rc4-00336-g8c9ed89 #52
> [ 110.835757] ---------------------------------
> [ 110.835828] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [ 110.835908] swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [ 110.835982] (cmci_discover_lock){?.+...}, at: [<ffffffff80236dc0>] cmci_clear+0x30/0x9b
>
> smp_call_function_single() will disable interrupt. moving mce reenable/disable
> to workqueue, so no irq is disabled.
So was this a genuine bug in your code? The patch suggests it was.
In that case the Subject is mis-representing the situation.
> Signed-off-by: Shaohua Li<shaohua.li@intel.com>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c
> index 6fb0b35..739c824 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_64.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_64.c
> @@ -1057,30 +1057,32 @@ static __cpuinit void mce_remove_device(unsigned int cpu)
> }
>
> /* Make sure there are no machine checks on offlined CPUs. */
> -static void mce_disable_cpu(void *h)
> +static long mce_disable_cpu(void *h)
> {
> int i;
> unsigned long action = *(unsigned long *)h;
>
> if (!mce_available(¤t_cpu_data))
> - return;
> + return 0;
> if (!(action & CPU_TASKS_FROZEN))
> cmci_clear();
> for (i = 0; i < banks; i++)
> wrmsrl(MSR_IA32_MC0_CTL + i*4, 0);
> + return 0;
> }
>
> -static void mce_reenable_cpu(void *h)
> +static long mce_reenable_cpu(void *h)
> {
> int i;
> unsigned long action = *(unsigned long *)h;
>
> if (!mce_available(¤t_cpu_data))
> - return;
> + return 0;
> if (!(action & CPU_TASKS_FROZEN))
> cmci_reenable();
> for (i = 0; i < banks; i++)
> wrmsrl(MSR_IA32_MC0_CTL + i*4, bank[i]);
> + return 0;
> }
>
> /* Get notified when a cpu comes on/off. Be hotplug friendly. */
> @@ -1106,14 +1108,14 @@ static int __cpuinit mce_cpu_callback(struct notifier_block *nfb,
> case CPU_DOWN_PREPARE:
> case CPU_DOWN_PREPARE_FROZEN:
> del_timer_sync(t);
> - smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
> + work_on_cpu(cpu, mce_disable_cpu, &action);
> break;
> case CPU_DOWN_FAILED:
> case CPU_DOWN_FAILED_FROZEN:
> t->expires = round_jiffies(jiffies +
> __get_cpu_var(next_interval));
> add_timer_on(t, cpu);
> - smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
> + work_on_cpu(cpu, mce_reenable_cpu, &action);
> break;
> case CPU_POST_DEAD:
> /* intentionally ignoring frozen here */
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] x86 MCE: shut up lockdep warning
2009-05-08 8:34 ` [PATCH] x86 MCE: shut up lockdep warning Peter Zijlstra
@ 2009-05-08 19:11 ` Andi Kleen
0 siblings, 0 replies; 11+ messages in thread
From: Andi Kleen @ 2009-05-08 19:11 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Shaohua Li, lkml, Andi Kleen, Andrew Morton
> So was this a genuine bug in your code? The patch suggests it was.
No I think it was false. See my earlier email.
The higher level cpu_add_remove_lock makes sure that there is never
an interrupt when the code is running process context.
So Ingo's patch shuts off lockdep, but it's overkill and not really
needed. It would be better if there was some way to express
the dependency on cpu_add_remove_lock, but I'm not sure
that can be expressed in lockdep.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86 MCE: shut up lockdep warning
2009-05-08 3:47 [PATCH] x86 MCE: shut up lockdep warning Shaohua Li
` (3 preceding siblings ...)
2009-05-08 8:34 ` [PATCH] x86 MCE: shut up lockdep warning Peter Zijlstra
@ 2009-05-08 9:04 ` Ingo Molnar
4 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2009-05-08 9:04 UTC (permalink / raw)
To: Shaohua Li
Cc: lkml, Andi Kleen, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
Peter Zijlstra
* Shaohua Li <shaohua.li@intel.com> wrote:
> lockdep report below warning when I try to offline one cpu:
> [ 110.835487] =================================
> [ 110.835616] [ INFO: inconsistent lock state ]
> [ 110.835688] 2.6.30-rc4-00336-g8c9ed89 #52
> [ 110.835757] ---------------------------------
> [ 110.835828] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [ 110.835908] swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [ 110.835982] (cmci_discover_lock){?.+...}, at: [<ffffffff80236dc0>] cmci_clear+0x30/0x9b
>
> smp_call_function_single() will disable interrupt. moving mce reenable/disable
> to workqueue, so no irq is disabled.
>
> Signed-off-by: Shaohua Li<shaohua.li@intel.com>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c
> index 6fb0b35..739c824 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_64.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_64.c
> @@ -1057,30 +1057,32 @@ static __cpuinit void mce_remove_device(unsigned int cpu)
> }
>
> /* Make sure there are no machine checks on offlined CPUs. */
> -static void mce_disable_cpu(void *h)
> +static long mce_disable_cpu(void *h)
> {
> int i;
> unsigned long action = *(unsigned long *)h;
>
> if (!mce_available(¤t_cpu_data))
> - return;
> + return 0;
> if (!(action & CPU_TASKS_FROZEN))
> cmci_clear();
> for (i = 0; i < banks; i++)
> wrmsrl(MSR_IA32_MC0_CTL + i*4, 0);
> + return 0;
> }
>
> -static void mce_reenable_cpu(void *h)
> +static long mce_reenable_cpu(void *h)
> {
> int i;
> unsigned long action = *(unsigned long *)h;
>
> if (!mce_available(¤t_cpu_data))
> - return;
> + return 0;
> if (!(action & CPU_TASKS_FROZEN))
> cmci_reenable();
> for (i = 0; i < banks; i++)
> wrmsrl(MSR_IA32_MC0_CTL + i*4, bank[i]);
> + return 0;
> }
>
> /* Get notified when a cpu comes on/off. Be hotplug friendly. */
> @@ -1106,14 +1108,14 @@ static int __cpuinit mce_cpu_callback(struct notifier_block *nfb,
> case CPU_DOWN_PREPARE:
> case CPU_DOWN_PREPARE_FROZEN:
> del_timer_sync(t);
> - smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
> + work_on_cpu(cpu, mce_disable_cpu, &action);
> break;
> case CPU_DOWN_FAILED:
> case CPU_DOWN_FAILED_FROZEN:
> t->expires = round_jiffies(jiffies +
> __get_cpu_var(next_interval));
> add_timer_on(t, cpu);
> - smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
> + work_on_cpu(cpu, mce_reenable_cpu, &action);
> break;
No, this needs a real fix - not a 'shut up lockdep' workaround.
One problem is that cmci_discover_lock is taken irq-unsafe, which is
obviously a bad idea ...
Could you please try the fix for that from Hidetoshi-san, attached
below (also available in latest -tip).
Thanks,
Ingo
------------------>
>From e5299926d7459d9fa7c7f856983147817aedb10e Mon Sep 17 00:00:00 2001
From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Date: Fri, 8 May 2009 17:28:40 +0900
Subject: [PATCH] x86: MCE: make cmci_discover_lock irq-safe
Lockdep reports the warning below when Li tries to offline one cpu:
[ 110.835487] =================================
[ 110.835616] [ INFO: inconsistent lock state ]
[ 110.835688] 2.6.30-rc4-00336-g8c9ed89 #52
[ 110.835757] ---------------------------------
[ 110.835828] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[ 110.835908] swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
[ 110.835982] (cmci_discover_lock){?.+...}, at: [<ffffffff80236dc0>] cmci_clear+0x30/0x9b
cmci_clear() can be called via smp_call_function_single().
It is better to disable interrupt while holding cmci_discover_lock,
to turn it into an irq-safe lock - we can deadlock otherwise.
[ Impact: fix possible deadlock in the MCE code ]
Reported-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4A03ED38.8000700@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Shaohua Li<shaohua.li@intel.com>
---
arch/x86/kernel/cpu/mcheck/mce_intel_64.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel_64.c b/arch/x86/kernel/cpu/mcheck/mce_intel_64.c
index d6b72df..cef3ee3 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel_64.c
@@ -151,10 +151,11 @@ static void print_update(char *type, int *hdr, int num)
static void cmci_discover(int banks, int boot)
{
unsigned long *owned = (void *)&__get_cpu_var(mce_banks_owned);
+ unsigned long flags;
int hdr = 0;
int i;
- spin_lock(&cmci_discover_lock);
+ spin_lock_irqsave(&cmci_discover_lock, flags);
for (i = 0; i < banks; i++) {
u64 val;
@@ -184,7 +185,7 @@ static void cmci_discover(int banks, int boot)
WARN_ON(!test_bit(i, __get_cpu_var(mce_poll_banks)));
}
}
- spin_unlock(&cmci_discover_lock);
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
if (hdr)
printk(KERN_CONT "\n");
}
@@ -211,13 +212,14 @@ void cmci_recheck(void)
*/
void cmci_clear(void)
{
+ unsigned long flags;
int i;
int banks;
u64 val;
if (!cmci_supported(&banks))
return;
- spin_lock(&cmci_discover_lock);
+ spin_lock_irqsave(&cmci_discover_lock, flags);
for (i = 0; i < banks; i++) {
if (!test_bit(i, __get_cpu_var(mce_banks_owned)))
continue;
@@ -227,7 +229,7 @@ void cmci_clear(void)
wrmsrl(MSR_IA32_MC0_CTL2 + i, val);
__clear_bit(i, __get_cpu_var(mce_banks_owned));
}
- spin_unlock(&cmci_discover_lock);
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
}
/*
^ permalink raw reply related [flat|nested] 11+ messages in thread