public inbox for linux-kernel@vger.kernel.org
* Re: [PATCH] x86/mce: Don't unregister CPU hotplug notifier in error path
@ 2014-06-22 17:25 Boris Ostrovsky
  2014-06-24 10:25 ` Borislav Petkov
  0 siblings, 1 reply; 3+ messages in thread
From: Boris Ostrovsky @ 2014-06-22 17:25 UTC (permalink / raw)
  To: bp; +Cc: tony.luck, mattieu.souchaud, linux-edac, linux-kernel


----- bp@alien8.de wrote:

> On Fri, Jun 20, 2014 at 10:04:37PM -0400, Boris Ostrovsky wrote:
> > I'll try it later but this doesn't look sufficient to me: we might not
> > reach this point if subsys_system_register() or zalloc_cpumask_var()
> > fail.
> 
> If those fail, I'd say we have a much bigger problem than undeleted
> timers.


You can add 

Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

if you prefer to go with that version. I still think it's not 100% reliable (because of what I said above) but at least it fixes the current breakage.

> 
> > We could register the notifier as the first thing in this routine
> > (probably after mce_available() succeeds).
> 
> I guess...


Actually that won't work --- we need to register the bus with sysfs first.

-boris


* Re: [PATCH] x86/mce: Don't unregister CPU hotplug notifier in error path
  2014-06-22 17:25 [PATCH] x86/mce: Don't unregister CPU hotplug notifier in error path Boris Ostrovsky
@ 2014-06-24 10:25 ` Borislav Petkov
  2014-06-24 10:34   ` [PATCH] x86, MCE: Robustify mcheck_init_device Borislav Petkov
  0 siblings, 1 reply; 3+ messages in thread
From: Borislav Petkov @ 2014-06-24 10:25 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: tony.luck, mattieu.souchaud, linux-edac, linux-kernel

On Sun, Jun 22, 2014 at 10:25:50AM -0700, Boris Ostrovsky wrote:
> You can add
>
> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>
> if you prefer to go with that version. I still think it's not 100%
> reliable (because of what I said above) but at least it fixes the
> current breakage.

Thanks.

The thing is, I don't like to address hypothetical breakages if they can
never be triggered on any configuration.

But I'm keeping an open mind, so if you can still trigger it *with*
this fix, we can revisit it then.

Thanks for testing!

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* [PATCH] x86, MCE: Robustify mcheck_init_device
  2014-06-24 10:25 ` Borislav Petkov
@ 2014-06-24 10:34   ` Borislav Petkov
  0 siblings, 0 replies; 3+ messages in thread
From: Borislav Petkov @ 2014-06-24 10:34 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: tony.luck, mattieu.souchaud, linux-edac, linux-kernel

From: Borislav Petkov <bp@suse.de>
Subject: [PATCH] x86, MCE: Robustify mcheck_init_device

BorisO reports that misc_register() often fails on Xen. The current code
unregisters the CPU hotplug notifier in that case. If a CPU is then
offlined and onlined again, we end up with a second timer running on
that CPU, leading to soft lockups and system hangs.

So let's always leave the hotcpu notifier registered, even if
mce_device_create() failed for some cores, and never unregister it, so
that the timers are still handled correctly across CPU hotplug.

Reported-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1403274493-1371-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index bb92f38153b2..9a79c8dbd8e8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2451,6 +2451,12 @@ static __init int mcheck_init_device(void)
 	for_each_online_cpu(i) {
 		err = mce_device_create(i);
 		if (err) {
+			/*
+			 * Register notifier anyway (and do not unreg it) so
+			 * that we don't leave undeleted timers, see notifier
+			 * callback above.
+			 */
+			__register_hotcpu_notifier(&mce_cpu_notifier);
 			cpu_notifier_register_done();
 			goto err_device_create;
 		}
@@ -2471,10 +2477,6 @@ static __init int mcheck_init_device(void)
 err_register:
 	unregister_syscore_ops(&mce_syscore_ops);
 
-	cpu_notifier_register_begin();
-	__unregister_hotcpu_notifier(&mce_cpu_notifier);
-	cpu_notifier_register_done();
-
 err_device_create:
 	/*
 	 * We didn't keep track of which devices were created above, but
-- 
2.0.0
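
For readers following the thread, the failure mode described in the commit
message can be sketched in plain user-space C. This is only a toy model
and not kernel code: cpu_model, notify(), cpu_up(), cpu_down() and
timers_after_hotplug_cycle() are hypothetical names invented for
illustration, and the model assumes that CPU bring-up arms the polling
timer whether or not the notifier is registered, while only the notifier
callback deletes it on the way down.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum cpu_event { CPU_EV_DOWN_PREPARE, CPU_EV_ONLINE };

struct cpu_model {
	int armed_timers;	/* timers currently armed for this CPU */
};

typedef void (*notifier_fn)(struct cpu_model *cpu, enum cpu_event ev);

/* NULL means "no hotplug notifier registered". */
static notifier_fn registered_notifier;

/* Stand-in for the MCE hotplug callback: deletes the timer on the way down. */
static void mce_model_callback(struct cpu_model *cpu, enum cpu_event ev)
{
	if (ev == CPU_EV_DOWN_PREPARE && cpu->armed_timers > 0)
		cpu->armed_timers--;
}

static void notify(struct cpu_model *cpu, enum cpu_event ev)
{
	if (registered_notifier)
		registered_notifier(cpu, ev);
}

/* Assumption of this model: bring-up always arms one timer. */
static void cpu_up(struct cpu_model *cpu)
{
	cpu->armed_timers++;
	notify(cpu, CPU_EV_ONLINE);
}

static void cpu_down(struct cpu_model *cpu)
{
	notify(cpu, CPU_EV_DOWN_PREPARE);
}

/* Returns the number of timers armed after one offline/online cycle. */
int timers_after_hotplug_cycle(bool keep_notifier_on_error)
{
	struct cpu_model cpu = { .armed_timers = 0 };

	registered_notifier = mce_model_callback;
	cpu_up(&cpu);		/* boot: one timer armed */

	/* Init error path: the old code unregistered the notifier here. */
	if (!keep_notifier_on_error)
		registered_notifier = NULL;

	cpu_down(&cpu);		/* timer is deleted only if notifier is present */
	cpu_up(&cpu);		/* bring-up arms a timer again */
	return cpu.armed_timers;
}
```

In this model, timers_after_hotplug_cycle(false) leaves two timers armed
on the CPU, which corresponds to the "second timer" from the report, while
timers_after_hotplug_cycle(true) leaves exactly one.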

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

