From: Borislav Petkov <bp@alien8.de>
To: "Raj, Ashok" <ashok.raj@intel.com>
Cc: linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
	Tony Luck <tony.luck@intel.com>
Subject: Re: [Patch V0] x86, mce: Ensure offline CPU's don't participate in mce rendezvous process.
Date: Fri, 4 Dec 2015 17:51:12 +0100
Message-ID: <20151204165112.GI21177@pd.tnic>
In-Reply-To: <20151204171419.GA4870@otc-brkl-03.jf.intel.com>

On Fri, Dec 04, 2015 at 12:14:20PM -0500, Raj, Ashok wrote:
> Yes, thats possible to not do ist_enter() and the exception count. 
> 
> I tried to keep most of the part as is and leveraging code already
> doing the reading of MCG_STATUS. Architecturally we need to also check RIPV
> and if clear we should initiate shutdown.

So add that check too.

> When we add the logging from offline cpus as next step it would be safe to 
> use interrupt stack, and the offline

Frankly, I'm not sure at all and very very wary of adding *any* code
which runs on an offlined CPU. Because *no one* does that and it hasn't
been tested at all. So who knows what happens.

What we should be doing is execute the *minimal* amount of code possible
and get out. No counting, no per-cpu variables. No nothing.
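
To illustrate what "minimal" could look like, here is a rough sketch of
just the RIPV decision discussed above, written as a pure function with
no counters and no per-cpu state. It is not the patch under discussion:
the names offline_mce_action and offline_mce_decide() are hypothetical,
and the caller is assumed to have read MCG_STATUS already. Only the bit
positions are taken from the architectural definition of IA32_MCG_STATUS.

```c
#include <stdint.h>

/* IA32_MCG_STATUS flag bits (architectural definitions). */
#define MCG_STATUS_RIPV (1ULL << 0)	/* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1)	/* error IP valid */
#define MCG_STATUS_MCIP (1ULL << 2)	/* machine check in progress */

/* Hypothetical outcome type, for illustration only. */
enum offline_mce_action {
	OFFLINE_MCE_RETURN,	/* safe to return from the handler */
	OFFLINE_MCE_SHUTDOWN,	/* RIPV clear: cannot resume, shut down */
};

/*
 * Hypothetical helper: decide what an offlined CPU should do, based only
 * on an MCG_STATUS value the caller has already read. It touches no
 * global or per-cpu data.
 */
static inline enum offline_mce_action offline_mce_decide(uint64_t mcg_status)
{
	/* Without a valid restart IP the interrupted context is lost. */
	if (!(mcg_status & MCG_STATUS_RIPV))
		return OFFLINE_MCE_SHUTDOWN;

	return OFFLINE_MCE_RETURN;
}
```

In a real handler the caller would presumably still have to clear
MCG_STATUS before returning. That part is left out here because it
depends on the surrounding kernel code.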

> I liked the observability part keeping the exception count. if and
> when we online the cpu again, it might look as it noticed nothing. Now
> we can check /proc/interrupts and see the offline cpu also observed
> the MCE.

And? Tell us what? That SMM fondled the hardware under our feet. TBH,
I'd tend to be much more drastic here and even taint the kernel. I mean,
seriously, what kind of MCEs which happen as a result of OS execution
are you expecting to get reported on an offlined CPU?

I can't think of any.

Because we have been considering offlining a core as one possible RAS
action. So what happens is a user or a RAS agent offlines a core and
yet, that offlined core still reports MCEs. Something's terribly wrong
with that picture, IMO.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
