public inbox for linux-kernel@vger.kernel.org
From: Ingo Molnar <mingo@elte.hu>
To: Huang Ying <ying.huang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>, Len Brown <lenb@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Andi Kleen <andi@firstfloor.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	Borislav Petkov <petkovbb@googlemail.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Don Zickus <dzickus@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mauro Carvalho Chehab <mchehab@redhat.com>,
	"Luck, Tony" <tony.luck@intel.com>
Subject: Re: [NAK] Re: [PATCH -v2 9/9] ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support
Date: Tue, 26 Oct 2010 12:15:36 +0200
Message-ID: <20101026101536.GC16552@elte.hu>
In-Reply-To: <1288083163.2862.592.camel@yhuang-dev>


* Huang Ying <ying.huang@intel.com> wrote:

> Hi, Thomas,
> 
> On Tue, 2010-10-26 at 12:53 +0800, Thomas Gleixner wrote:
> > Len,
> > 
> > On Mon, 25 Oct 2010, Len Brown wrote:
> > 
> > > >  NAKed-by: Ingo Molnar <mingo@elte.hu>
> > > 
> > > Everybody knows that Linux has a lot to learn about RAS.
> > > 
> > > I think to catch up, we need to play to Linux's strengths
> > > of continuous improvement.  If we halt patches in this area
> > > then we could wait forever for the "perfect design".
> > 
> > it's not about perfect design. It's about creating new user space
> > ABIs. The patches introduce another error reporting user space ABI
> > with an ad hoc "fits the needs" design.
> > 
> > This is my major point of objection. 
> > 
> > I agree that Linux needs improvement on the RAS side, but does this
> > lack of features justify a new user space ABI which is totally
> > disconnected to existing RAS facilities ?
> > 
> > No, it does not. It's not our problem that Intel wasted time on
> > creating another character device driver to report errors to user
> > space. The time spent to do so would have been sufficient to do a
> > proper integration into the existing infrastructure.
> > 
> > I would not care at all if these patches would just introduce some
> > weird in kernel interfaces as we can clean that up at will. But
> > introducing a new user space ABI is setting the disconnect of RAS
> > related facilities into stone.
> > 
> > From Kconfig:
> > 
> >   EDAC is designed to report errors in the core system.
> >   These are low-level errors that are reported in the CPU or
> >   supporting chipset or other subsystems:
> >   memory errors, cache errors, PCI errors, thermal throttling, etc..
> >   If unsure, select 'Y'.
> > 
> > So please explain why your error reporting is so different from the
> > above that it justifies a separate facility. And you better come up
> > with a real good explanation other than we looked at EDAC and it did
> > not fit our needs.
> 
> As far as I know, EDAC guys plan to use some other "perfect interface" in the 
> future. So I think the current state is really waiting for the "perfect design".

Not sure what you mean by this, but Boris has posted links to his latest patch-set 
in this thread, see:

  http://kerneltrap.org/mailarchive/linux-kernel/2010/8/6/4603847

The Git coordinates are:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git, branch tip/perf/parse-events

The 'persistent events' facility he has prototyped there appears to be a good 
potential match for the ERST store.

It would be very useful to have another feature there: to mark persistent events as 
'dump into syslog on bootup', so that for example the contents of the ERST log could 
be dumped right on bootup. [but ERST would not be the only persistent event that 
could be marked like that.]

Note that we don't need/want other ABI accesses to the ERST log (i.e. we don't want 
/dev/erst-dbg), because we want the benefits of the generalization: tooling (RAS and 
other tooling) should learn how to deal with persistent events - not learn how to 
deal with ERST logs ... or with warm bootup RAM-embedded logs ... or to deal with 
kcrash embedded kernel logs ... etc.

There are many obvious advantages to implementing it like that: there's no need to 
special-case ERST-to-printk or ERST-to-whatever-other-facility cross links - it 
would be part of a generic/uniform event logging facility to begin with. ERST would 
only implement its own narrow, hardware-specific event accessor methods - nothing 
else. Basically a small 'event driver'. This would be the smallest, 
easiest-to-maintain approach - with no facility duplication and no fragmentation.
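
To make the 'event driver' split concrete, here is a minimal user-space sketch in C. 
It is an illustration only: it is not code from this thread, from the APEI patches, 
or from any kernel tree. The names pevent_ops, pevent_register(), 
pevent_dump_on_boot(), PEVENT_DUMP_ON_BOOT and the fake_erst_* backend are all 
hypothetical. The idea it tries to show is that the generic core owns registration 
and the 'dump on bootup' policy, while a backend such as ERST only supplies a narrow 
record accessor.

```c
/* Illustrative user-space sketch; none of these names exist in the kernel. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PEVENT_DUMP_ON_BOOT 0x1u   /* hypothetical "dump on bootup" marker */
#define PEVENT_MAX_SOURCES  8

/* The narrow, hardware-specific accessors a backend would supply. */
struct pevent_ops {
    const char *name;
    unsigned int flags;
    /* Copy record 'idx' into buf; return its length, or -1 if no such record. */
    int (*read)(size_t idx, char *buf, size_t len);
};

static const struct pevent_ops *sources[PEVENT_MAX_SOURCES];
static size_t nr_sources;

/* Generic core: register one persistent event source. */
int pevent_register(const struct pevent_ops *ops)
{
    assert(ops != NULL && ops->read != NULL);
    if (nr_sources == PEVENT_MAX_SOURCES)
        return -1;
    sources[nr_sources++] = ops;
    return 0;
}

/* Generic core: write every record of every flagged source to 'out'.
 * Returns the number of records written. */
int pevent_dump_on_boot(FILE *out)
{
    char buf[128];
    int dumped = 0;

    for (size_t s = 0; s < nr_sources; s++) {
        if (!(sources[s]->flags & PEVENT_DUMP_ON_BOOT))
            continue;
        for (size_t i = 0; sources[s]->read(i, buf, sizeof(buf)) >= 0; i++) {
            fprintf(out, "%s: %s\n", sources[s]->name, buf);
            dumped++;
        }
    }
    return dumped;
}

/* Stand-in backend serving canned records instead of reading real ERST. */
static const char *fake_erst_records[] = {
    "corrected memory error",
    "PCIe AER event",
};

static int fake_erst_read(size_t idx, char *buf, size_t len)
{
    size_t n = sizeof(fake_erst_records) / sizeof(fake_erst_records[0]);

    if (idx >= n)
        return -1;
    snprintf(buf, len, "%s", fake_erst_records[idx]);
    return (int)strlen(buf);
}

const struct pevent_ops fake_erst_ops = {
    .name  = "erst",
    .flags = PEVENT_DUMP_ON_BOOT,
    .read  = fake_erst_read,
};
```

In this sketch a caller would run pevent_register(&fake_erst_ops) once and then 
pevent_dump_on_boot(stdout); a second backend (say, a RAM-embedded log) would add 
only its own read accessor and reuse the same core and the same tooling.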

It's certainly more work as well _for the first such example_ - but from that point 
on any new hardware facility can be added with ease, and those too will fit into 
existing tooling in a very natural way.

So please help out with the persistent events work. If you need any pointers we'd be 
glad to help.

Thanks,

	Ingo
