From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Andi Kleen <ak@linux.intel.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH -tip 1/3] x86, mce: Add mce_threshold option for intel cmci
Date: Tue, 31 Mar 2009 16:22:36 +0900
Message-ID: <49D1C4BC.1020806@jp.fujitsu.com>
In-Reply-To: <49D0996D.1050106@linux.intel.com>
Andi Kleen wrote:
>>> BTW another thing you need to be aware of is that not all CMCI banks necessarily support
>>> thresholds > 1. The SDM has a special algorithm to discover the counter width.
>>> This means the scheme wouldn't work for some banks.
>> My current implementation already follows the SDM.
>
> Yes didn't want to doubt that, just saying that it's not very useful
> to play with the thresholds on those "only one" banks.
I know such "only one" banks are possible according to the specification,
but I'd like to know how many such banks there are in the real world.
# It is certainly great that Intel introduced the threshold capability.
# But is there any reason why they don't implement it in all banks,
# and why, even where it is implemented, some banks cannot have > 1?
## ... Never mind, this is not a complaint directed at you, Andi.
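For readers following the thread, the counter-width discovery mentioned above can be sketched roughly as below. This is a userspace simulation, not kernel code: struct fake_bank and the helper names are hypothetical, and the bit layout (bit 30 as the CMCI enable bit, bits 14:0 as the threshold field of IA32_MCi_CTL2) is an assumption taken from the SDM description. The idea is to write all ones into the threshold field and read back what the hardware kept.

```c
#include <stdint.h>

#define CMCI_EN             (1ULL << 30)  /* assumed: IA32_MCi_CTL2 bit 30 */
#define CMCI_THRESHOLD_MASK 0x7fffULL     /* assumed: IA32_MCi_CTL2 bits 14:0 */

/* Hypothetical stand-in for one bank's IA32_MCi_CTL2 register. */
struct fake_bank {
	uint64_t ctl2;      /* current register contents */
	uint64_t writable;  /* bits this bank actually implements */
};

static uint64_t bank_read(const struct fake_bank *b)
{
	return b->ctl2;
}

/* Bits the bank does not implement are silently dropped on write. */
static void bank_write(struct fake_bank *b, uint64_t val)
{
	b->ctl2 = val & b->writable;
}

/*
 * Returns the largest threshold the bank accepts, or 0 if the bank
 * does not support CMCI at all. The original register value is
 * restored before returning.
 */
static unsigned int cmci_max_threshold(struct fake_bank *b)
{
	uint64_t saved = bank_read(b);
	uint64_t probe;
	unsigned int max = 0;

	/* Ask for CMCI with every threshold bit set, then read back. */
	bank_write(b, saved | CMCI_EN | CMCI_THRESHOLD_MASK);
	probe = bank_read(b);

	/* The enable bit sticks only on banks that support CMCI. */
	if (probe & CMCI_EN)
		max = (unsigned int)(probe & CMCI_THRESHOLD_MASK);

	bank_write(b, saved);
	return max;
}
```

In this model an "only one" bank is simply one whose writable mask keeps a single threshold bit, so the probe returns 1 and any larger threshold request cannot take effect on it.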
>> Summarize:
>> - Disabling CMCI (=use polling instead) is nice to have.
>
> with a boot parameter.
Nice to have a consensus.
>> - Disabling polling (but use CMCI) is pointless.
>> (only use on trouble that only break polling?)
>
> You can already do that by setting check_interval == 0
Right. Please document it.
>> - Disabling stuff for CE (both of polling and CMCI) will be help for some
>> particular cases.
>
> Actually I have my doubts of that (if you think of the SMI logging
> which should be able to get them first anyways without kernel options),
> but a boot option for this at least wouldn't be particularly
> bloated. I suspect the use case would be to mainly shut off
> the printk.
Unfortunately, SMI logging does not apply in our case.
>> - Increasing threshold is not so good idea?
>
> Yes.
OK, now I agree with it.
>> Personally, instead of "mce=nopoll" and "mce_threshold=[0|N]", an alternative
>> combination, one like "mce=no_corrected" or "mce=ignore_ce" for disable both
>> and another like "mce=no_cmci" for disabling CMCI, would be also OK.
>> Which do you prefer?
>
> mce=ignore_ce and mce=no_cmci
Thank you, that is the response I expected.
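As a rough illustration of the agreed interface, the two options could be modeled as below. This is only a sketch under stated assumptions, not the patch itself: struct mce_opts, parse_mce_option(), use_cmci() and use_polling() are hypothetical names, and the semantics are the ones summarized in this thread (no_cmci falls back to polling, ignore_ce turns off both polling and CMCI for corrected errors).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for the two proposed settings. */
struct mce_opts {
	bool cmci_disabled;  /* mce=no_cmci: do not use CMCI, poll instead */
	bool ignore_ce;      /* mce=ignore_ce: no polling and no CMCI */
};

/*
 * Parse the value following "mce=" on the command line.
 * Returns 0 if the value was recognized, -1 otherwise.
 */
static int parse_mce_option(const char *arg, struct mce_opts *opts)
{
	if (strcmp(arg, "no_cmci") == 0) {
		opts->cmci_disabled = true;
		return 0;
	}
	if (strcmp(arg, "ignore_ce") == 0) {
		opts->ignore_ce = true;
		return 0;
	}
	return -1;
}

/* CMCI is used only when neither option turned it off. */
static bool use_cmci(const struct mce_opts *opts)
{
	return !opts->cmci_disabled && !opts->ignore_ce;
}

/* Polling stays on unless corrected errors are ignored entirely. */
static bool use_polling(const struct mce_opts *opts)
{
	return !opts->ignore_ce;
}
```

With this split, "mce=no_cmci" leaves the polling timer as the only source of corrected-error reports, while "mce=ignore_ce" leaves none.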
> Also it's still open if you want to do the logging of left over
> errors from boot too or not included with this.
I don't care about the leftover records at this time.
>> IIRC, the complain was from user of IPF, because it was "noise" for him.
>> Or just there was "it would be acceptable if the rate were 1/5" or so.
>> Real solution will be killing CE related stuff in kernel at all, anyway.
>
> Or in the BIOS. We can do it in the kernel, but I suspect for you
> it would be user friendlier if the BIOS just never made them
> visible.
However, I have heard that having the BIOS hide such things might be a
problem where making them visible is required for hardware certification,
e.g. Windows certification.
>> In short, it changes behavior on uncorrected errors, from "panic" to "hang up."
>
> Playing devils advocate here, but if your BIOS is really that intelligent
> isn't that what you want? As far as I understand your patches seem
> to be all about moving things from the OS to the BIOS and that
> would be the ultimate way to move UC errors to the BIOS too.
Traditionally (I'm not sure exactly how long ago that means), corrected
errors were just ignored or handled only by the BIOS, while uncorrected errors
were forwarded to the OS. As another example, there are some particular cases
where a vendor-specific hardware monitoring application is bundled with the
hardware, expecting that it can gather error information from the hardware,
and assuming that the OS and other applications never handle corrected errors.
Of course I don't doubt that such a scheme is no longer applicable these days;
however, some are still doing things in the old style. We should stop
them but have not done so yet. Would it help if I called the ignore_ce setting
a traditional-compatible mode?
Personally, I can understand a policy that a platform (server hardware)
should stand alone, not depending on the OS running on it.
Like PAL/SAL on IPF, intelligent firmware can take part in
error recovery.
But here I'm not requesting such a fancy thing for x86.
In conclusion, mce=ignore_ce and mce=no_cmci will be a better interface.
Compared with the current version, it lacks threshold > 1 support, but that
does not matter because threshold > 1 would work improperly and help nothing.
I'll post a new one. Please wait a moment...
Thanks,
H.Seto
Thread overview: 20+ messages
2009-03-26 8:39 [PATCH -tip 1/3] x86, mce: Add mce_threshold option for intel cmci Hidetoshi Seto
2009-03-26 9:10 ` Andi Kleen
2009-03-27 9:44 ` Hidetoshi Seto
2009-03-27 10:31 ` Andi Kleen
2009-03-30 9:06 ` Hidetoshi Seto
2009-03-30 10:05 ` Andi Kleen
2009-03-31 7:22 ` Hidetoshi Seto [this message]
2009-03-31 8:15 ` Andi Kleen
2009-03-28 12:00 ` Ingo Molnar
2009-03-28 12:08 ` Ingo Molnar
2009-03-30 9:42 ` Andi Kleen
2009-03-31 2:45 ` Hidetoshi Seto
2009-03-31 8:08 ` Andi Kleen
2009-03-31 2:45 ` Hidetoshi Seto
2009-04-01 15:07 ` Ingo Molnar
2009-04-02 4:43 ` Hidetoshi Seto
2009-04-02 4:54 ` [PATCH -tip 1/3] x86, mce: Revert "add mce_threshold option for intel cmci" Hidetoshi Seto
2009-04-02 4:55 ` [PATCH -tip 2/3] x86, mce: Revert "add mce=nopoll option to disable timer polling" Hidetoshi Seto
2009-04-02 4:58 ` [PATCH -tip 3/3] x86, mce: Add new option mce=no_cmci and mce=ignore_ce Hidetoshi Seto
2009-03-28 21:28 ` [tip:x86/mce2] x86, mce: Add mce_threshold option for intel cmci Hidetoshi Seto