From: Andi Kleen <ak@linux.intel.com>
To: Tim Hockin <thockin@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
ying.huang@intel.com, Aaron Durbin <adurbin@gmail.com>,
priyankag@google.com
Subject: Re: x86/mce merge, integration hickup + crash, design thoughts
Date: Thu, 15 Jan 2009 23:56:43 +0100
Message-ID: <496FBF2B.4010809@linux.intel.com>
In-Reply-To: <b3ece790901141132u28ba2482h2e7af7bd51224f2a@mail.gmail.com>
Tim Hockin wrote:
> Yeah, no offense, but that's horrible :)
I'm not sure it's worse than the XML-like format proposals that seem to
get thrown around. That is, I am the only one who has mentioned the
X word so far, but the structured ASCII records that have been hinted
at would be exactly like that.
>
> Ideally, I'd rather see a more generic conduit for all sorts of
> events. Polled and exception MCEs. Thermal interrupts. MCE
> threshold interrupts.
Actually, I now think MCE threshold interrupts should never have been
separate events. That was a design mistake in the AMD implementation
(together with all the sysfs complications).
An MCE threshold interrupt is just a slightly different internal
notification mechanism and it should only trigger the events it reads
from the MCE banks. Nothing more.
My upcoming CMCI code works exactly this way.
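As a rough userspace sketch of that idea (not the actual kernel or CMCI
code; bank_status, scan_banks() and mce_notify() are hypothetical names,
and the bank registers are simulated with a plain array), both the poll
path and the threshold interrupt path can funnel into one bank scan, so
the interrupt never produces an event type of its own:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BANKS 4                  /* hypothetical bank count for this sketch */
#define STATUS_VALID (1ULL << 63)    /* VAL bit of an MCi_STATUS register */

/* Simulated per-bank status registers (stand-in for the real MSRs). */
static uint64_t bank_status[NUM_BANKS];

/* How the scan was triggered; it does not change what gets reported. */
enum mce_source { MCE_SRC_POLL, MCE_SRC_THRESHOLD_IRQ };

struct mce_event {
    int bank;
    uint64_t status;
};

/*
 * The one place that turns bank contents into events. Each valid bank
 * yields one event and is then cleared so it is reported only once.
 */
static size_t scan_banks(struct mce_event *out, size_t max)
{
    size_t n = 0;

    for (int i = 0; i < NUM_BANKS && n < max; i++) {
        if (!(bank_status[i] & STATUS_VALID))
            continue;
        out[n].bank = i;
        out[n].status = bank_status[i];
        n++;
        bank_status[i] = 0;
    }
    return n;
}

/*
 * Timer poll and threshold interrupt share this entry point. The source
 * is deliberately ignored: the interrupt is only a wake-up, and the
 * events are whatever the banks contain.
 */
static size_t mce_notify(enum mce_source src, struct mce_event *out, size_t max)
{
    (void)src;
    return scan_banks(out, max);
}
```

With this shape a consumer only ever sees bank events, regardless of
whether a timer or a threshold interrupt caused the scan.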
> PCI-express errors.
Yes we need some mechanism for those. Fortunately that's easier
because it doesn't need to handle NMIs.
> SATA
> disk timeouts.
Now that's a different issue: generalized driver error reporting for
everyone. There was a lot of discussion some years ago about an IBM
proposal to do structured error reporting in general. But that was
quite unpopular and no one really liked it.
What came out of it was the dev_printk() stuff, which allows
error messages to be matched to devices. So you already have some
baby steps in this direction.
I suspect doing this fully generalized would be quite difficult
because there would be so many people you have to convince.
> Now I know there are different conduits for some events - netlink
> tells me about netif link up/down events I think. I would settle for
> a small number of interfaces. What I don't want is what we have today
> - EVERYTHING has a different interface. Some are poll()-able. Some
> have to be actively polled. Some have to have a daemon listening or
> else messages are dropped.
Well, the kernel will always have limited buffers, so the "someone
needs to listen" problem will always be there.
There are not __that__ many interfaces, I think.
Also whatever code handles this has to have special code for
all of these anyways, so having a variety of interfaces for them
doesn't seem like the end of the world to me.
>
> Put it this way: Given a thousand machines, I want to gather,
> collate, and correlate all these events. I want to be able to produce
> a "life story" of sorts for a machine and for a data center. Once I
> can do that, I can start to make predictive diagnoses more accurately,
> and I can know how much these things actually COST us.
Sure sounds nice. But frankly I don't see it happening. It would
be just too radical a change of too much code.
-Andi