From: Andi Kleen <ak@linux.intel.com>
To: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH -tip 1/3] x86, mce: Add mce_threshold option for intel cmci
Date: Tue, 31 Mar 2009 10:15:07 +0200
Message-ID: <49D1D10B.5010308@linux.intel.com>
In-Reply-To: <49D1C4BC.1020806@jp.fujitsu.com>

Hidetoshi Seto wrote:
> Andi Kleen wrote:
>>>> BTW another thing you need to be aware of is that not all CMCI banks necessarily support
>>>> thresholds > 1. The SDM has a special algorithm to discover the counter width.
>>>> This means the scheme wouldn't work for some banks.
>>> My current implementation already follows the SDM.
>> Yes didn't want to doubt that, just saying that it's not very useful
>> to play with the thresholds on those "only one" banks.
> 
> I know such "only one" banks are possible according to the specification,
> but I'd like to know how many such banks there are in the real world.

I was told they are possible.
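
For readers following along, the counter-width discovery mentioned above can be sketched roughly as follows. This is a minimal userspace simulation, not the kernel's code: it assumes the IA32_MCi_CTL2 layout from the Intel SDM (threshold in bits 14:0, CMCI enable in bit 30), and `struct sim_bank`, `sim_rdmsr()`, `sim_wrmsr()` and `discover_max_threshold()` are hypothetical names that stand in for real MSR access.

```c
#include <stdint.h>

/* IA32_MCi_CTL2 fields as described in the Intel SDM. */
#define CTL2_CMCI_EN        (1ULL << 30)
#define CTL2_THRESHOLD_MASK 0x7fffULL

/* Hypothetical simulated bank: only `width` threshold bits are implemented. */
struct sim_bank {
	uint64_t ctl2;      /* current register contents */
	int has_cmci;       /* nonzero if the bank supports CMCI at all */
	unsigned int width; /* implemented threshold bits, 1..15 */
};

static uint64_t sim_rdmsr(const struct sim_bank *b)
{
	return b->ctl2;
}

/* Unimplemented bits are dropped on write, so they read back as zero. */
static void sim_wrmsr(struct sim_bank *b, uint64_t val)
{
	uint64_t impl = ((1ULL << b->width) - 1) & CTL2_THRESHOLD_MASK;
	uint64_t en = b->has_cmci ? (val & CTL2_CMCI_EN) : 0;

	b->ctl2 = en | (val & impl);
}

/*
 * Returns the largest threshold the bank accepts, or 0 if the bank
 * has no CMCI support. The original register value is restored.
 */
static uint64_t discover_max_threshold(struct sim_bank *b)
{
	uint64_t old = sim_rdmsr(b);
	uint64_t max = 0;

	/* Step 1: try to set CMCI_EN; if it does not stick, no CMCI. */
	sim_wrmsr(b, old | CTL2_CMCI_EN);
	if (sim_rdmsr(b) & CTL2_CMCI_EN) {
		/* Step 2: write all ones to the threshold field, read back. */
		sim_wrmsr(b, sim_rdmsr(b) | CTL2_THRESHOLD_MASK);
		max = sim_rdmsr(b) & CTL2_THRESHOLD_MASK;
	}

	sim_wrmsr(b, old);
	return max;
}
```

In this model an "only one" bank is simply one with `width == 1`: the read-back value is 1, so any requested threshold above 1 cannot take effect on that bank.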

> # It is certainly great that Intel introduced the threshold capability.
> # But is there any reason why they don't implement it on all banks,
> # and why, even where it is implemented, some banks cannot have > 1?
> ## ... Don't mind, this is not complaints to you, Andi.

I don't know why it was done this way.

>>>  - Disabling polling (but use CMCI) is pointless.
>>>     (only use on trouble that only break polling?)
>> You can already do that by setting check_interval == 0
> 
> Right.  Please document it.

Patch done.

>>>  - Disabling stuff for CE (both of polling and CMCI) will be help for some
>>>    particular cases.
>> Actually I have my doubts of that (if you think of the SMI logging
>> which should be able to get them first anyways without kernel options),
>> but a boot option for this at least wouldn't be particularly
>> bloated. I suspect the use case would be to mainly shut off
>> the printk.
> 
> Unfortunately SMI is not the case.

Hmm, how does your BIOS log on its own then if it doesn't use SMI for this?

>> Also it's still open if you want to do the logging of left over
>> errors from boot too or not included with this.
> 
> I don't care about the leftover records at this time.

Does that mean you want to log them or not? There's already an option to
disable it, but I suspect that for user friendliness you would want
to combine them into one.

Note that this is the only way to log fatal panics to disk on normal
systems.

>>> IIRC, the complain was from user of IPF, because it was "noise" for him.
>>> Or just there was "it would be acceptable if the rate were 1/5" or so.
>>> Real solution will be killing CE related stuff in kernel at all, anyway.
>> Or in the BIOS. We can do it in the kernel, but I suspect for you
>> it would be user friendlier if the BIOS just never made them
>> visible.
> 
> However I heard that hiding such thing by BIOS might be a problem in
> case that making it visible is required for hardware certificates,
> e.g. Windows's certificates.

Windows uses a different mechanism anyways I believe.

> 
>>> In short, it changes behavior on uncorrected errors, from "panic" to "hang up."
>> Playing devils advocate here, but if your BIOS is really that intelligent
>> isn't that what you want?  As far as I understand your patches seem
>> to be all about moving things from the OS to the BIOS and that
>> would be the ultimate way to move UC errors to the BIOS too.
> 
> Traditionally (actually I'm not sure how much long ago it means) corrected
> errors were just ignored or only handled by BIOS, while uncorrected errors
> were forwarded to OS.  For another example, there are some particular cases
> that a vendor specific hardware monitoring application is bundled with the
> hardware, expecting that it can gather error information in the hardware,

That means it accesses MSRs directly?

> 
> Of course I don't doubt that such a scheme is not applicable these days,
> however there are still some doing so in the old style.  We should stop
> them but have not done so yet.  Would it help you if I called setting
> ignore_ce a traditional-compatible mode?

I don't think it's traditional on most standard x86 systems at least.

> 
> Personally, I can understand a policy that a platform (server hardware)
> should be stand alone not depending on the OS running on it.
> Like PAL/SAL on IPF, intelligent firmwares will be able to take a part of
> error recovery.
> 
> But here I'm not requesting such fancy thing for x86.
> 
> In conclusion, mce=ignore_ce and mce=no_cmci will be a better interface.
> Compared with the current version, it lacks threshold >1 support, but that
> does not matter because threshold >1 would not work properly and would not help.

There's still the open issue with the leftover events at boot.

-Andi

Thread overview: 20+ messages
2009-03-26  8:39 [PATCH -tip 1/3] x86, mce: Add mce_threshold option for intel cmci Hidetoshi Seto
2009-03-26  9:10 ` Andi Kleen
2009-03-27  9:44   ` Hidetoshi Seto
2009-03-27 10:31     ` Andi Kleen
2009-03-30  9:06       ` Hidetoshi Seto
2009-03-30 10:05         ` Andi Kleen
2009-03-31  7:22           ` Hidetoshi Seto
2009-03-31  8:15             ` Andi Kleen [this message]
2009-03-28 12:00     ` Ingo Molnar
2009-03-28 12:08 ` Ingo Molnar
2009-03-30  9:42   ` Andi Kleen
2009-03-31  2:45     ` Hidetoshi Seto
2009-03-31  8:08       ` Andi Kleen
2009-03-31  2:45   ` Hidetoshi Seto
2009-04-01 15:07     ` Ingo Molnar
2009-04-02  4:43       ` Hidetoshi Seto
2009-04-02  4:54         ` [PATCH -tip 1/3] x86, mce: Revert "add mce_threshold option for intel cmci" Hidetoshi Seto
2009-04-02  4:55         ` [PATCH -tip 2/3] x86, mce: Revert "add mce=nopoll option to disable timer polling" Hidetoshi Seto
2009-04-02  4:58         ` [PATCH -tip 3/3] x86, mce: Add new option mce=no_cmci and mce=ignore_ce Hidetoshi Seto
2009-03-28 21:28 ` [tip:x86/mce2] x86, mce: Add mce_threshold option for intel cmci Hidetoshi Seto
