From: Ingo Molnar <mingo@elte.hu>
To: Andi Kleen <andi@firstfloor.org>
Cc: Huang Ying <ying.huang@intel.com>, Len Brown <lenb@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
Borislav Petkov <petkovbb@googlemail.com>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>, Don Zickus <dzickus@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Mauro Carvalho Chehab <mchehab@redhat.com>,
Arjan van de Ven <arjan@infradead.org>
Subject: Re: [NAK] Re: [PATCH -v2 9/9] ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support
Date: Mon, 25 Oct 2010 15:47:40 +0200
Message-ID: <20101025134740.GA8888@elte.hu>
In-Reply-To: <20101025131127.GC17622@basil.fritz.box>

* Andi Kleen <andi@firstfloor.org> wrote:
> On Mon, Oct 25, 2010 at 02:55:31PM +0200, Ingo Molnar wrote:
> >
> > * Andi Kleen <andi@firstfloor.org> wrote:
> >
> > > On Mon, Oct 25, 2010 at 01:15:30PM +0200, Ingo Molnar wrote:
> > >
> > > > > > > einj.c: it's about the 3rd separate 'error injection' concept that got
> > > > > > > introduced ...
> > > > > >
> > > > > > EINJ is a true platform feature, not just software feature. We need to support
> > > > > > it to debug various hardware error features.
> > > > >
> > > > > Also having multiple error injecting interfaces is a good thing.
> > > >
> > > > It's never a good thing to have separate, vendor dependent interfaces for what
> > > > to the user is basically the same conceptual thing!
> > >
> > > Perhaps a simple example (simplified, in practice there are more complications)
> > > makes it more clear:
> > >
> > > The memory error handler takes different actions depending on the state of
> > > the page on which the error occurs.
> >
> > What you appear to be arguing for is the ability to inject different types of
> > events.
>
> Different events in different contexts with different drivers with different
> parameters [...]

Correct.

> [...] using different tools.

That's possible, but I'd expect tools/ras/ to be populated with uniformly working
tools. There's little sense in fragmenting the hw-testing field...

> Commonality: about 0% except there's "error" somewhere in the description.

Wrong. Their main purpose is common: they are events attached to existing hardware
topologies, which events can be configured, which events can be received and which
can be injected with attributes for rare-event simulation purposes.

The tool people have told us loud and clear that they want to _receive_ events
in a unified and structured way - not via lots of separate ABIs from facilities that
have mismatching capabilities.

We want to be able to inject _other_ events as well, not just hw-error ones -
especially rare ones.

I.e. there's clear, demonstrated, patches-pending demand for uniformity and there's
similar demand for a broader concept.

You are now making the point that somehow the receipt and sending/injecting of 'hw
errors on Intel hardware' should be a separate, fragmented, disoriented, messy piece
of interface design, closely matching some ACPI spec detail, which should be
disassociated from the preferred mechanism of error reporting?

Your argument makes absolutely no sense to me.

The kernel is an abstraction machine: common hw aspects should be generalized to the
extent it makes sense, with reasonable extensions for anything we don't want to (or
cannot) generalize.

There's _tons_ of interesting structure here to be taken advantage of: just look at
what Boris is trying to achieve with his EDAC tooling patches. See what Lin Ming is
trying to do by moving event descriptors to /sys, so that events can come with
elements of our hw and sw topology in a natural way.

There is absolutely no justification whatsoever for the new /dev/erst-dbg ABI ...

Furthermore, you have ignored my other argument for the second time now: why does
this code not do the event _reporting_ via the facilities we use and prefer? As far
as users are concerned, the ability to receive hardware error events in a unified
way is an even more important aspect than the matter of event injection.

Once you do that I think you will see how naturally error injection fits into the
picture as well. It is an aspect of pretty much any event (not just hw-error events)
that we want to be able to 'inject/simulate' them, to test tools.

Your refusal to even consider this possibility and to look at the EDAC/RAS patches
that deal with this is puzzling to me.

Thanks,

	Ingo