From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Borislav Petkov <bp@alien8.de>
Cc: tony.luck@intel.com, linux-kernel@vger.kernel.org,
	linux-edac@vger.kernel.org, mattieu.souchaud@free.fr
Subject: Re: [PATCH] x86/mce: Don't unregister CPU hotplug notifier in error path
Date: Fri, 20 Jun 2014 11:41:27 -0400
Message-ID: <53A45627.6090306@oracle.com>
In-Reply-To: <20140620152312.GB11391@pd.tnic>

On 06/20/2014 11:23 AM, Borislav Petkov wrote:
> On Fri, Jun 20, 2014 at 10:28:13AM -0400, Boris Ostrovsky wrote:
>> Commit 9c15a24b038f4d8da93a2bc2554731f8953a7c17 (x86/mce: Improve
>> mcheck_init_device() error handling) unregisters (or never registers)
>> MCE's hotplug notifier if an error is encountered.
> Well, mcheck_init_device() did encounter errors before that commit too,
> can you please go into detail on how exactly you're triggering this?
> Which error are you talking about exactly?

You can simulate this on baremetal by having, for example,
misc_register() fail (just add 'err = -EIO' after the call). Or you can
return an error right upon entry to mcheck_init_device() (I haven't
tested that, though).

Then, after you have booted, do a couple of rounds of
     echo 0 > /sys/devices/system/cpu/cpu1/online
     echo 1 > /sys/devices/system/cpu/cpu1/online

Then sit still for about 10 minutes. I don't think any activity is 
necessary.

You are dead now. If you are lucky you may see messages about soft
lockups or RCU stalls, but often there is nothing at all.

> Lemme guess: some xen special handling which baremetal doesn't need.

Only in the sense that on Xen misc_register() often fails. But any 
failure on baremetal will result in the same behavior.

>
>> Since unplugging a CPU would normally result in the notifier deleting
>> MCE timer we are now left with the timer running if a CPU is removed on
>> a system where mcheck_init_device() had failed.
>>
>> If we later hotplug this CPU back we add this timer again (in
>> mcheck_cpu_init()). Eventually the two timers start interfering with each
>> other, causing soft lockups or system hangs.
>>
>> We should leave the notifier always on and, in fact, set it up early
>> during the boot.
> We do leave it always on - we only unregister it if we've encountered an
> error.

Right. And I think we shouldn't because we leave undeleted timers.

-boris

