* Re: [PATCH] x86/mce: Don't unregister CPU hotplug notifier in error path
@ 2014-06-22 17:25 Boris Ostrovsky
2014-06-24 10:25 ` Borislav Petkov
From: Boris Ostrovsky @ 2014-06-22 17:25 UTC (permalink / raw)
To: bp; +Cc: tony.luck, mattieu.souchaud, linux-edac, linux-kernel
----- bp@alien8.de wrote:
> On Fri, Jun 20, 2014 at 10:04:37PM -0400, Boris Ostrovsky wrote:
> > I'll try it later but this doesn't look sufficient to me: we might not
> > reach this point if subsys_system_register() or zalloc_cpumask_var()
> > fail.
>
> If those fail, I'd say we have a much bigger problem than undeleted
> timers.
You can add
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
if you prefer to go with that version. I still think it's not 100% reliable (because of what I said above) but at least it fixes the current breakage.
>
> > We could register the notifier as the first thing in this routine
> > (probably after mce_available() succeeds).
>
> I guess...
Actually that won't work --- we need to register bus at sysfs first.
-boris
* Re: [PATCH] x86/mce: Don't unregister CPU hotplug notifier in error path
2014-06-22 17:25 [PATCH] x86/mce: Don't unregister CPU hotplug notifier in error path Boris Ostrovsky
@ 2014-06-24 10:25 ` Borislav Petkov
2014-06-24 10:34 ` [PATCH] x86, MCE: Robustify mcheck_init_device Borislav Petkov
From: Borislav Petkov @ 2014-06-24 10:25 UTC (permalink / raw)
To: Boris Ostrovsky; +Cc: tony.luck, mattieu.souchaud, linux-edac, linux-kernel
On Sun, Jun 22, 2014 at 10:25:50AM -0700, Boris Ostrovsky wrote:
> You can add
>
> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>
> if you prefer to go with that version. I still think it's not 100%
> reliable (because of what I said above) but at least it fixes the
> current breakage.
Thanks.
The thing is, I don't like to address hypothetical breakages if they can
never be triggered on any configuration.
But I'm going to keep an open mind, so if you can still trigger it *with*
this fix, we can revisit it then.
Thanks for testing!
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
* [PATCH] x86, MCE: Robustify mcheck_init_device
2014-06-24 10:25 ` Borislav Petkov
@ 2014-06-24 10:34 ` Borislav Petkov
From: Borislav Petkov @ 2014-06-24 10:34 UTC (permalink / raw)
To: Boris Ostrovsky; +Cc: tony.luck, mattieu.souchaud, linux-edac, linux-kernel
From: Borislav Petkov <bp@suse.de>
Subject: [PATCH] x86, MCE: Robustify mcheck_init_device
BorisO reports that misc_register() often fails on Xen. The current code
unregisters the CPU hotplug notifier in that case. If a CPU is then
offlined and onlined again, we end up with a second timer running
on that CPU, leading to soft lockups and system hangs.
So let's leave the hotcpu notifier always registered, even if
mce_device_create() failed for some cores, and never unregister it, so
that we can deal with the timer handling accordingly.
Reported-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1403274493-1371-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
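Not part of the patch: a minimal user-space sketch of the failure mode
described in the commit message. It is a toy model, not kernel code. All
toy_* names are hypothetical, and it assumes that CPU bring-up arms the
poll timer unconditionally while only the hotplug notifier callback
deletes it on offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NR_CPUS 4

/* Toy state, not kernel data structures. */
static int toy_timers_armed[TOY_NR_CPUS];	/* armed poll timers per CPU */
static bool toy_notifier_registered;

/* Hypothetical stand-in for the notifier callback: delete the timer. */
static void toy_notifier_offline(int cpu)
{
	toy_timers_armed[cpu] = 0;
}

static void toy_cpu_offline(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	/* Without a registered notifier nobody deletes the timer. */
	if (toy_notifier_registered)
		toy_notifier_offline(cpu);
}

static void toy_cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	/* Assumption: bring-up arms a timer regardless of the notifier. */
	toy_timers_armed[cpu]++;
}

/*
 * Run one offline/online cycle on CPU 1 and return how many timers are
 * armed on it afterwards. keep_notifier == false models the old error
 * path, which unregistered the notifier after a failed init.
 */
static int toy_hotplug_cycle(bool keep_notifier)
{
	const int cpu = 1;

	memset(toy_timers_armed, 0, sizeof(toy_timers_armed));
	toy_notifier_registered = true;		/* registered during init */
	toy_cpu_online(cpu);			/* boot-time bring-up */

	if (!keep_notifier)
		toy_notifier_registered = false;	/* old error path */

	toy_cpu_offline(cpu);
	toy_cpu_online(cpu);
	return toy_timers_armed[cpu];
}
```

In this model toy_hotplug_cycle(false) returns 2 (the stale timer plus
the newly armed one), while toy_hotplug_cycle(true) returns 1.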
arch/x86/kernel/cpu/mcheck/mce.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index bb92f38153b2..9a79c8dbd8e8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2451,6 +2451,12 @@ static __init int mcheck_init_device(void)
 	for_each_online_cpu(i) {
 		err = mce_device_create(i);
 		if (err) {
+			/*
+			 * Register notifier anyway (and do not unreg it) so
+			 * that we don't leave undeleted timers, see notifier
+			 * callback above.
+			 */
+			__register_hotcpu_notifier(&mce_cpu_notifier);
 			cpu_notifier_register_done();
 			goto err_device_create;
 		}
@@ -2471,10 +2477,6 @@ static __init int mcheck_init_device(void)
 err_register:
 	unregister_syscore_ops(&mce_syscore_ops);
 
-	cpu_notifier_register_begin();
-	__unregister_hotcpu_notifier(&mce_cpu_notifier);
-	cpu_notifier_register_done();
-
 err_device_create:
 	/*
 	 * We didn't keep track of which devices were created above, but
--
2.0.0
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--