From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Borislav Petkov <petkovbb@googlemail.com>,
	Andi Kleen <andi@firstfloor.org>, Ingo Molnar <mingo@elte.hu>,
	mingo@redhat.com, hpa@zytor.com, linux-kernel@vger.kernel.org,
	tglx@linutronix.de, Andreas Herrmann <andreas.herrmann3@amd.com>,
	linux-tip-commits@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Frédéric Weisbecker <fweisbec@gmail.com>,
	Mauro Carvalho Chehab <mchehab@infradead.org>,
	Aristeu Rozanski <aris@redhat.com>,
	Doug Thompson <norsk5@yahoo.com>,
	Huang Ying <ying.huang@intel.com>,
	Arjan van de Ven <arjan@infradead.org>
Subject: Re: [tip:x86/mce] x86, mce: Rename cpu_specific_poll to mce_cpu_specific_poll
Date: Tue, 26 Jan 2010 18:06:26 +0900
Message-ID: <4B5EB092.80901@jp.fujitsu.com>
In-Reply-To: <20100126063343.GA18865@liondog.tnic>

(2010/01/26 15:33), Borislav Petkov wrote:
> In the end, even if the info were correct, it is still not nearly enough
> for all the information you might need from a system. So you end up
> pulling a dozen different tools just to get the info you need. So
> yes, I really do think we need a tool to get the job done right and
> on any system. And this tool should be distributed with the kernel
> sources like perf is, so that you don't have to jump through hoops to
> pull the stuff (esp. if you have to build everything every time like
> Andreas does :)).

How about having a system file that can be maintained along with the kernel,
e.g. /proc/hwinfo, /sys/devices/platform/hwinfo, or a directory
with files such as /somewhere/hwinfo/{dmi,acpi,pci,...}?

>> And since it's kernel
>> based it cannot do most of the interesting reactions. And it doesn't
>> have a usable interface to add user events.
>>
>> And yes having all that crap in syslog is completely useless, unless
>> you're debugging code.
> 
> So basically, IMHO we need:
> 
> 1. Resilient error reporting that reliably pushes decoded error info to
> userspace and/or network. That one might be tricky to do but we'll get
> there.

I think it would be better to treat "error" as a subset of "event":
an event is reported to whoever is interested in it and filtered otherwise.
The use of TRACE_EVENT() for mce events at least aims at such an approach.

> 2. Error severity grading and acting upon each type accordingly. This
> might need to be vendor-specific.

I think you mean severity grading in the kernel.
Even if the hardware reports an error and grades it as corrected, the kernel
can escalate the severity, most likely based on some threshold.
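
To illustrate the escalation idea, here is a minimal sketch in plain C. All
names (err_severity, ce_window, grade_error) and the threshold and window
values are assumptions made up for this example; none of this is existing
kernel code or the actual mce severity logic.

```c
#include <stdint.h>

/* Hypothetical severity levels, for illustration only. */
enum err_severity {
    SEV_CORRECTED,   /* hardware corrected the error */
    SEV_ACTION,      /* escalated by software: component needs attention */
    SEV_UNCORRECTED, /* hardware could not correct the error */
};

/* Per-component bookkeeping for corrected errors (hypothetical). */
struct ce_window {
    uint64_t start; /* time of the first error in the current window */
    uint32_t count; /* corrected errors seen in the current window */
};

#define CE_WINDOW_SECS (24 * 60 * 60) /* assumed window: one day */
#define CE_THRESHOLD   10             /* assumed limit per window */

/*
 * Return the severity to act on. Anything the hardware did not grade as
 * corrected is passed through unchanged. Corrected errors are counted per
 * window and escalated once the count reaches the threshold.
 */
static enum err_severity grade_error(struct ce_window *w,
                                     enum err_severity hw_sev, uint64_t now)
{
    if (hw_sev != SEV_CORRECTED)
        return hw_sev;

    /* Start a new window on the first error or when the old one expired. */
    if (w->count == 0 || now - w->start >= CE_WINDOW_SECS) {
        w->start = now;
        w->count = 0;
    }
    w->count++;

    return w->count >= CE_THRESHOLD ? SEV_ACTION : SEV_CORRECTED;
}
```

A caller would keep one ce_window per component (DIMM, cache, etc.) and pass
the hardware-reported grade through grade_error() before deciding what to do.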

> 3. Proper error format suiting all types of errors.

As mentioned in Andi's PDF, I think the CPER format is one of the good
candidates available today.
However, we could invent a more suitable one if needed.

> 4. Vendor-specific hooks where it is needed for in-kernel handling of
> certain errors (L3 cache index disable, for example).

It would be somewhat difficult to add such a hook in the UE handling path,
but we can have it for the CE path anyway.  It just needs to be organized.

> 5. Error thresholding, representation, etc all done in userspace (maybe
> even on a different machine).

(...BTW, how about putting the mcelog tree under /tools, Andi?)

> 6. Last but not least, and maybe this is wishful thinking, a good tool
> to dump hwinfo from the kernel. We do a great job of detecting that info
> already - we should do something with it, at least report it...

Of course, I also want a tool that gives a summary (not a full dump) of
the current hardware status, e.g.:
  $ cat ./hwinfo/faulty
  WARN: DIMM @ slot X on node Y: 208 errors corrected in last 3 days
  INFO: PCI 0000:NN:01.1: 1 error recovered 37 hours ago
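
A minimal userspace sketch of how one such summary line could be produced.
The struct dimm_status, format_dimm_line() and the warn_threshold parameter
are hypothetical names for this example only; no such interface exists today.

```c
#include <stdio.h>

/* Hypothetical per-DIMM counters that a summary tool would collect. */
struct dimm_status {
    int node;           /* NUMA node the DIMM belongs to */
    int slot;           /* slot number on that node */
    unsigned corrected; /* corrected errors in the reporting period */
    unsigned days;      /* length of the reporting period in days */
};

/*
 * Format one line in the style of the example above. The line is marked
 * WARN when the corrected-error count reaches warn_threshold and INFO
 * otherwise. Returns what snprintf() returns.
 */
static int format_dimm_line(char *buf, size_t len,
                            const struct dimm_status *d,
                            unsigned warn_threshold)
{
    const char *level = d->corrected >= warn_threshold ? "WARN" : "INFO";

    return snprintf(buf, len,
                    "%s: DIMM @ slot %d on node %d: "
                    "%u errors corrected in last %u days",
                    level, d->slot, d->node, d->corrected, d->days);
}
```

The threshold is passed in by the caller so that the policy stays in
userspace, in line with item 5 above.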

> Let's see what the others think.
> 
> Thanks.

Thanks,
H.Seto

