From: Andi Kleen <andi@firstfloor.org>
To: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>,
Andi Kleen <andi@firstfloor.org>, Ingo Molnar <mingo@elte.hu>,
mingo@redhat.com, hpa@zytor.com, linux-kernel@vger.kernel.org,
tglx@linutronix.de, Andreas Herrmann <andreas.herrmann3@amd.com>,
linux-tip-commits@vger.kernel.org,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Frédéric Weisbecker <fweisbec@gmail.com>,
Mauro Carvalho Chehab <mchehab@infradead.org>,
Aristeu Rozanski <aris@redhat.com>,
Doug Thompson <norsk5@yahoo.com>,
Huang Ying <ying.huang@intel.com>,
Arjan van de Ven <arjan@infradead.org>
Subject: Re: [tip:x86/mce] x86, mce: Rename cpu_specific_poll to mce_cpu_specific_poll
Date: Tue, 26 Jan 2010 17:09:13 +0100
Message-ID: <20100126160913.GD6567@basil.fritz.box>
In-Reply-To: <4B5EB092.80901@jp.fujitsu.com>
On Tue, Jan 26, 2010 at 06:06:26PM +0900, Hidetoshi Seto wrote:
> How about having a system file which can be maintained with kernel,
> e.g. like /proc/hwinfo, /sys/devices/platform/hwinfo, or directory
> with some files like /somewhere/hwinfo/{dmi,acpi,pci,...} etc.?
Why not do that in user space?
In fact it's often already done.
Just because we're kernel programmers doesn't mean that everything
needs to be solved inside the kernel.
> >> And since it's kernel
> >> based it cannot do most of the interesting reactions. And it doesn't
> >> have a usable interface to add user events.
> >>
> >> And yes having all that crap in syslog is completely useless, unless
> >> you're debugging code.
> >
> > So basically, IMHO we need:
> >
> > 1. Resilient error reporting that reliably pushes decoded error info to
> > userspace and/or network. That one might be tricky to do but we'll get
> > there.
>
> I think it would be better to think "error" is a subset of "event",
> which could be reported if interested but otherwise be filtered.
> Use of TRACE_EVENT() for mce event aim such approach at least.
The whole trace event infrastructure right now is not really
aimed/useful for "always on low overhead background monitoring" like
standard error handling requires.
In principle it could probably be fixed (although I'm a bit
sceptical on the "low overhead" part), but I suspect the result
would be neither optimized for error handling nor optimized
for performance monitoring anymore. They simply have
very different requirements.
When you do full event tracing anyway it makes some sense to get events
for errors too, but that's quite a different use case.
For the "standard" error handling I think we're better off with
something optimized for the job.
> > 2. Error severity grading and acting upon each type accordingly. This
> > might need to be vendor-specific.
>
> I think you mean severity grading in kernel.
> Even if hardware reported an error and graded it as corrected, kernel
> can escalate the severity, likely based on some threshold.
I don't think the kernel should do that. It's very much a policy
decision, and these are best kept as close to the administrator
as possible (= user space).
That said, in some cases it might make sense to have simple thresholds
in the kernel, but I suspect those cases are few. Mostly it would
be when the hardware itself already keeps these counters.
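As a rough illustration of keeping that policy in user space, a daemon
could apply an administrator-chosen threshold to the corrected-error
events it receives. The sketch below is only an assumption about how
that might look: the names ce_policy, ce_state and ce_account are
hypothetical and do not come from mcelog or the kernel, and the rule is
a simple count per time window.

```c
#include <time.h>

/* Hypothetical policy, set by the administrator (e.g. from a config file):
 * escalate when more than `limit` corrected errors arrive within
 * `window` seconds. */
struct ce_policy {
    unsigned limit;
    time_t window;
};

/* Hypothetical per-component state kept by the user-space daemon. */
struct ce_state {
    unsigned count;       /* corrected errors seen in the current window */
    time_t window_start;  /* when the current window began */
};

/* Account one corrected error observed at time `now`.
 * Returns 1 if the threshold is exceeded, 0 otherwise. */
static int ce_account(struct ce_state *st, const struct ce_policy *pol,
                      time_t now)
{
    /* Start a fresh window on the first event or once the old one expired. */
    if (st->count == 0 || now - st->window_start >= pol->window) {
        st->window_start = now;
        st->count = 0;
    }
    st->count++;
    return st->count > pol->limit;
}
```

With limit = 2 and window = 60, calling ce_account() once per reported
corrected error would flag the third error seen within a minute. Both
values would come from configuration, so changing the policy needs no
kernel change.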
>
> > 3. Proper error format suiting all types of errors.
>
> As mentioned in Andi's PDF, CPER format is one of good candidate
> available today, I think.
Yes for hardware errors. It's definitely not perfect and somewhat
overdesigned, but I'm not sure we could come up with a much better one.
At least a subset of it with some extensions might do. Also in some
cases the error is already in this format.
The advantage of it is that it's at least well understood and documented.
> > 4. Vendor-specific hooks where it is needed for in-kernel handling of
> > certain errors (L3 cache index disable, for example).
>
> Some difficulty would be there to add such hook in the UE handling path,
> but anyway we can have it for the CE path. Just need to be organized.
>
> > 5. Error thresholding, representation, etc all done in userspace (maybe
> > even on a different machine).
>
> (...BTW, how about putting mcelog tree under the /tools, Andi?)
I don't see the advantage. Linux has always been a collection
of packages, not a single unified big tree. Also, my current
impression is that the in-tree user space builds don't work
very well.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.