From: Andi Kleen <ak@linux.intel.com>
To: Tim Hockin <thockin@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	ying.huang@intel.com, Aaron Durbin <adurbin@gmail.com>,
	priyankag@google.com
Subject: Re: x86/mce merge, integration hickup + crash, design thoughts
Date: Thu, 15 Jan 2009 23:56:43 +0100
Message-ID: <496FBF2B.4010809@linux.intel.com>
In-Reply-To: <b3ece790901141132u28ba2482h2e7af7bd51224f2a@mail.gmail.com>

Tim Hockin wrote:
> Yeah, no offense, but that's horrible :)

I'm not sure it's worse than the XML-like format proposals that seem to
get thrown around. Admittedly, I am the only one who has mentioned the
X word so far, but the structured ASCII records that have been hinted
at would be exactly like that.

> 
> Ideally, I'd rather see a more generic conduit for all sorts of
> events.  Polled and exception MCEs.  Thermal interrupts.  MCE
> threshold interrupts. 

Actually, I now think MCE threshold interrupts should never have been
separate events. That was a design mistake in the AMD implementation
(together with all the sysfs complications).

An MCE threshold interrupt is just a slightly different internal
notification mechanism and it should only trigger the events it reads
from the MCE banks. Nothing more.
My upcoming CMCI code works exactly this way.
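
A minimal user-space sketch of that idea, not the actual CMCI code: the
poll timer and the threshold interrupt both call one bank reader, and
neither produces an event of its own. All names here (sim_bank,
scan_banks, poll_timer_tick, threshold_interrupt) are hypothetical, and
the banks are simulated in memory rather than read from MSRs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BANKS    4
#define LOG_CAPACITY 8

/* Hypothetical stand-in for one machine check bank. */
struct sim_bank {
	bool valid;       /* an error is latched in this bank */
	uint64_t status;  /* opaque status value */
};

struct sim_event {
	int bank;
	uint64_t status;
};

struct sim_log {
	struct sim_event events[LOG_CAPACITY];
	size_t count;
};

static struct sim_bank sim_banks[NUM_BANKS];

/*
 * The single reader: log every valid bank, then clear it.
 * If the log is full, stop and leave the remaining banks latched.
 * Returns the number of events logged by this call.
 */
static size_t scan_banks(struct sim_log *log)
{
	size_t logged = 0;

	assert(log != NULL);
	for (int i = 0; i < NUM_BANKS; i++) {
		if (!sim_banks[i].valid)
			continue;
		if (log->count == LOG_CAPACITY)
			break;
		log->events[log->count].bank = i;
		log->events[log->count].status = sim_banks[i].status;
		log->count++;
		logged++;
		sim_banks[i].valid = false;
		sim_banks[i].status = 0;
	}
	return logged;
}

/* Both notification paths only decide WHEN to scan, not WHAT to report. */
static size_t poll_timer_tick(struct sim_log *log)
{
	return scan_banks(log);
}

static size_t threshold_interrupt(struct sim_log *log)
{
	return scan_banks(log);
}
```

With this shape, a consumer of the log cannot tell (and does not need to
know) whether an entry was found by polling or by the interrupt.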

> PCI-express errors. 

Yes, we need some mechanism for those. Fortunately that's easier
because it doesn't need to handle NMIs.

> SATA
> disk timeouts.

Now that's a different issue. Generalized driver error reporting for everyone.

There was a lot of discussion some years ago around an IBM proposal to
do structured error reporting in general. But that was quite unpopular
and no one really liked it.

What came out of it was the dev_printk() stuff that allows error
messages to be matched to devices. So you already have some
baby steps in this direction.

I suspect doing this fully generalized would be quite difficult
because there would be so many people you have to convince.



> Now I know there are different conduits for some events - netlink
> tells me about netif link up/down events I think.  I would settle for
> a small number of interfaces.  What I don't want is what we have today
> - EVERYTHING has a different interface.  Some are poll()-able.  Some
> have to be actively polled.  Some have to have a daemon listening or
> else messages are dropped.  

Well, the kernel will always have limited buffers, so the "someone
needs to listen" problem will always be there.
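
To make the limited-buffer point concrete, here is a small user-space
sketch, assuming a fixed-capacity event queue (the names evq, evq_push
and evq_pop are hypothetical, not an existing kernel interface). Once
the queue is full and nobody has drained it, new events can only be
counted as dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EVQ_CAPACITY 4

struct evq {
	int slots[EVQ_CAPACITY];
	size_t head;            /* index of the oldest queued event */
	size_t len;             /* number of queued events */
	unsigned long dropped;  /* events lost because the queue was full */
};

/* Producer side: returns false and counts a drop if nobody made room. */
static bool evq_push(struct evq *q, int event)
{
	assert(q != NULL);
	if (q->len == EVQ_CAPACITY) {
		q->dropped++;
		return false;
	}
	q->slots[(q->head + q->len) % EVQ_CAPACITY] = event;
	q->len++;
	return true;
}

/* Listener side: returns false if there is nothing to read. */
static bool evq_pop(struct evq *q, int *event)
{
	assert(q != NULL && event != NULL);
	if (q->len == 0)
		return false;
	*event = q->slots[q->head];
	q->head = (q->head + 1) % EVQ_CAPACITY;
	q->len--;
	return true;
}
```

A larger EVQ_CAPACITY only delays the first drop; it does not remove the
need for a listener.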

There are not __that__ many interfaces, I think.

Also whatever code handles this has to have special code for
all of these anyways, so having a variety of interfaces for them
doesn't seem like the end of the world to me.

> 
> Put it this way:  Given a thousand machines, I want to gather,
> collate, and correlate all these events.  I want to be able to produce
> a "life story" of sorts for a machine and for a data center.  Once I
> can do that, I can start to make predictive diagnoses more accurately,
> and I can know how much these things actually COST us.

Sure sounds nice. But frankly I don't see it happening. It would
be just too radical a change of too much code.

-Andi

