From: Ingo Molnar <mingo@elte.hu>
To: Andi Kleen <andi@firstfloor.org>
Cc: Huang Ying <ying.huang@intel.com>, Len Brown <lenb@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
Borislav Petkov <petkovbb@googlemail.com>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>, Don Zickus <dzickus@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Mauro Carvalho Chehab <mchehab@redhat.com>,
Arjan van de Ven <arjan@infradead.org>
Subject: Re: [NAK] Re: [PATCH -v2 9/9] ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support
Date: Mon, 25 Oct 2010 13:15:30 +0200
Message-ID: <20101025111530.GA27659@elte.hu>
In-Reply-To: <20101025091913.GA17622@basil.fritz.box>
* Andi Kleen <andi@firstfloor.org> wrote:
> > > Sigh, please integrate all this into EDAC (drivers/edac/) properly, instead of
> > > turning it into YET ANOTHER hardware vendor special hw-errors thing. We can do
> > > better than this. EDAC is almost there: it has support for Nehalem, AMD, a
> > > couple of older chips.
> >
> > I think APEI (ACPI Platform Error Interface) is another driver. Why integrate
> > two drivers?
>
> Yes they're solving quite different problems from EDAC with different interfaces
> and for different devices in the ACPI space.
That's my whole point, _why_ do they have different interfaces?
EDAC is the upstream mechanism to organize hardware error reporting and to get
hardware errors to user-space. It is already successful in handling a wide range of
hardware in a similar fashion.
Furthermore, there is work ongoing to do the reporting via perf event channels, some
of that work is upstream already. Boris is working on persistent events, on RAS
tooling (tools/ras/) and on event injection. Here's a past submission of his work:
http://lwn.net/Articles/394522/
You are now doing a completely separate thing here, detaching a big CPU vendor from
the main body of Linux code that deals with this stuff.
IMHO that's not helpful _at all_.
> > > einj.c: it's about the 3rd separate 'error injection' concept that got
> > > introduced ...
> >
> > EINJ is a true platform feature, not just software feature. We need to support
> > it to debug various hardware error features.
>
> Also having multiple error injecting interfaces is a good thing.
It's never a good thing to have separate, vendor-dependent interfaces for what to
the user is basically the same conceptual thing!
> Error injection is hard and one size definitely doesn't fit all. You need quite
> different ones depending on what you want to test, in which context etc.
And that kind of variance is in your opinion a good reason to introduce separate
user ABIs for it?
( And I don't care that there might be no 'end user' for hardware error injection per
se right now. There is certainly an 'end user' for hardware error events and even
_there_ you are introducing and pushing for separate, incompatible interfaces. )
We have really good historic data here: we got the _biggest_ practical advantage
from event enumeration (/debug/tracing/events/) when we extended it in a generic,
unified way to the rich topology that the hardware and the kernel gives us.
That way we got new, useful tools like powertop, timechart or pytimechart or the
edac tool, which can concentrate on a single, well-defined event topology and event
ABI.
Why do these tools like this kind of unified event enumeration and reporting
facility, which you are fighting against so hard? Because of the big technological
advantage of having to deal with one enumeration and reporting facility alone. They
can get power events, scheduling events, timer events, kmalloc events all from the
same source - even though these subsystems have barely anything in common! Tools can
then combine these seemingly unrelated events into something new and useful.
It's a very extensible model, and with every new event type added, the tool space
gets richer _together_.
Error event injection to simulate/trigger various error conditions in those events
is a natural extension to the whole events framework - not something that should be
done in a randomly different way.
What you are doing here is to fragment the whole landscape into small, incompatible,
vendor specific bits. Some of it is in /dev, some of it is in debugfs, some things
report via signals, etc. etc.
It's inconsistent, messy and doesn't integrate well with the events framework we are
building.
That was the main basis of my prior NAK, and you have said _nothing_ in the past
that invalidates the fundamental points of that NAK.
Instead you started, by stealth and by duplicity, looking for ways to get around
that conceptual NAK.
> For hwpoison we currently have three different injectors at least and I expect
> that to even grow more in the future as different features get added.
That's insane!
Ingo