From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Borislav Petkov <bp@alien8.de>
Cc: tony.luck@intel.com, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org, mattieu.souchaud@free.fr
Subject: Re: [PATCH] x86/mce: Don't unregister CPU hotplug notifier in error path
Date: Fri, 20 Jun 2014 11:41:27 -0400
Message-ID: <53A45627.6090306@oracle.com>
In-Reply-To: <20140620152312.GB11391@pd.tnic>
On 06/20/2014 11:23 AM, Borislav Petkov wrote:
> On Fri, Jun 20, 2014 at 10:28:13AM -0400, Boris Ostrovsky wrote:
>> Commit 9c15a24b038f4d8da93a2bc2554731f8953a7c17 (x86/mce: Improve
>> mcheck_init_device() error handling) unregisters (or never registers)
>> MCE's hotplug notifier if an error is encountered.
> Well, mcheck_init_device() did encounter errors before that commit too,
> can you please go into detail on how exactly you're triggering this?
> Which error are you talking about exactly?
You can simulate this on baremetal by making, for example,
misc_register() fail (just add 'err = -EIO' after the call). Or you can
return an error right at the entry to mcheck_init_device() (I haven't
tested that though).
Then, once booted, run the following a couple of times:
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online
Then leave the system alone for about 10 minutes. I don't think any
further activity is necessary.
By then the system is dead. If you are lucky you may see soft lockup or
RCU stall messages, but often there is nothing.
> Lemme guess: some xen special handling which baremetal doesn't need.
Only in the sense that misc_register() often fails on Xen. But any such
failure on baremetal results in the same behavior.
>
>> Since unplugging a CPU would normally result in the notifier deleting
>> the MCE timer, we are now left with the timer running if a CPU is
>> removed on a system where mcheck_init_device() had failed.
>>
>> If we later hotplug this CPU back, we add this timer again in
>> mcheck_cpu_init(). Eventually the two timers start interfering with
>> each other, causing soft lockups or system hangs.
>>
>> We should always leave the notifier registered and, in fact, set it up
>> early during boot.
> We do leave it always on - we only unregister it if we've encountered an
> error.
Right. And I think we shouldn't, because doing so leaves undeleted timers behind.
-boris
Thread overview: 17+ messages
2014-06-20 14:28 [PATCH] x86/mce: Don't unregister CPU hotplug notifier in error path Boris Ostrovsky
2014-06-20 15:23 ` Borislav Petkov
2014-06-20 15:41 ` Boris Ostrovsky [this message]
2014-06-20 15:58 ` Borislav Petkov
2014-06-20 16:16 ` Boris Ostrovsky
2014-06-20 17:52 ` Borislav Petkov
2014-06-20 19:39 ` Boris Ostrovsky
2014-06-20 20:03 ` Borislav Petkov
2014-06-20 20:16 ` Boris Ostrovsky
2014-06-20 20:29 ` Borislav Petkov
2014-06-20 20:43 ` Boris Ostrovsky
2014-06-20 21:11 ` Borislav Petkov
2014-06-21 2:04 ` Boris Ostrovsky
2014-06-21 10:08 ` Borislav Petkov
2014-07-24 23:36 ` [tip:x86/urgent] x86, MCE: Robustify mcheck_init_device tip-bot for Borislav Petkov
-- strict thread matches above, loose matches on Subject: below --
2014-06-22 17:25 [PATCH] x86/mce: Don't unregister CPU hotplug notifier in error path Boris Ostrovsky
2014-06-24 10:25 ` Borislav Petkov