linux-edac.vger.kernel.org archive mirror
From: Ingo Molnar <mingo@elte.hu>
To: Tony Luck <tony.luck@intel.com>
Cc: "Joe Perches" <joe@perches.com>,
	"Mauro Carvalho Chehab" <mchehab@redhat.com>,
	"Hidetoshi Seto" <seto.hidetoshi@jp.fujitsu.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"bluesmoke-devel@lists.sourceforge.net"
	<bluesmoke-devel@lists.sourceforge.net>,
	"Linux Edac Mailing List" <linux-edac@vger.kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Ben Woodard" <woodard@redhat.com>,
	"Matt Domsch" <Matt_Domsch@dell.com>,
	"Doug Thompson" <dougthompson@xmission.com>,
	"Borislav Petkov" <bp@amd64.org>,
	"Young, Brent" <brent.young@intel.com>,
	"Peter Zijlstra" <a.p.zijlstra@chello.nl>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Arnaldo Carvalho de Melo" <acme@redhat.com>
Subject: Re: Hardware Error Kernel Mini-Summit
Date: Wed, 19 May 2010 00:00:02 +0200
Message-ID: <20100518220002.GA23739@elte.hu>
In-Reply-To: <AANLkTimPRvuoW5-OcPlPx5cvnCTJa7xhAQDSLrYziB4j@mail.gmail.com>


* Tony Luck <tony.luck@intel.com> wrote:

> > This gives us a broad platform to add various RAS 
> > events as well, beyond raw hardware events: we could 
> > for example add events for various system anomalies such 
> > as lockup messages, kernel warnings/oopses, IOMMU 
> > exceptions - maybe even pure software concepts such as 
> > fatal segmentation fault events, etc. etc.
> 
> This looks like sticky ground.  I can see the event 
> mechanism passing data to a user daemon working well for 
> all kinds of corrected and minor errors. But when you 
> start talking about lockups and fatal errors things get 
> a lot trickier. Often the main concern at this point is 
> error containment. Making sure that the flaky data 
> doesn't become visible (saved to storage, transmitted to 
> the network, etc.). [...]

I was pointing beyond the narrow hardware (memory) error 
point of view, towards a more generic 'system health' 
thinking.

In the broader view it may make sense, for example, to 
define policy over an excessive number of segfaults on a 
server system (where excessive segfaults are an anomaly), 
or a suspiciously large number of soft IO errors, etc.

But yes, of course, when it comes to hard memory errors, 
those take precedence, and handling them (and 
saving/propagating information about them while we still 
can) is a priority.

> [...] Getting from a machine check handler through some 
> context switches (and page faults etc.) to a user level 
> daemon before the error gets recorded looks to be really 
> hard.

As Boris mentioned too, critical policy action can and 
will be done straight in the kernel.

	Ingo
