public inbox for linux-kernel@vger.kernel.org
From: Borislav Petkov <bp@amd64.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: "Luck, Tony" <tony.luck@intel.com>,
	Borislav Petkov <bp@amd64.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Arnaldo Carvalho de Melo <acme@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Mauro Carvalho Chehab <mchehab@redhat.com>,
	EDAC devel <linux-edac@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 4/4] x86, mce: Have MCE persistent event off by default for now
Date: Thu, 5 May 2011 09:17:47 +0200
Message-ID: <20110505071747.GA3185@aftab>
In-Reply-To: <20110505063951.GB28015@elte.hu>

On Thu, May 05, 2011 at 02:39:51AM -0400, Ingo Molnar wrote:
> printk events are a compatibility wrapper to allow RAS functionality to have 
> easy and unified access to all system events that matter. The structure of 
> printk events is obviously the log level plus a free-form ASCII string, 
> something like:
> 
>  1- the printk timestamp

Yeah, here we want the timestamp of when the event happened.

>  2- the log level of the kernel when the message was generated
>  3- the log level of the message
>  4- the printk message itself, as a free-form string
> 
> > [...] a big issue when you have some heavy duty infrastructure trying to 
> > parse and consume these messages.  We should really consider such stuff a 
> > user visible ABI, and thus not subject to random breakage - which is a 
> > radical departure from our current attitude to printk().
> 
> Indeed, turning printk into an ABI clearly won't fly upstream although i'd 
> expect upstream to *care more* about good printk messages if the RAS daemon 
> starts making good use of it. Any printk message that turns out to be useful 
> can be turned into an ABI by defining a proper structured event out of it, via 
> TRACE_EVENT() et al.

Actually let's have the RAS printk's as TRACE_EVENT's from the start
- it's not like we're going to convert every printk call into a RAS
printk event. We only want relevant ones from traps.c, maybe some power
management events, fs, maybe some critical security stuff, etc.

> This does not mean that it's not *useful* to allow the streaming of all print 
> events to the RAS daemon. They are available, they get generated and they 
> clearly look useful to me, and it will be useful when a sysadmin looks at the 
> RAS log to figure out an incident.
> 
> Consider an example of two logs, one with just pure RAS events, the other with 
> printk lines (and user-space events, see my patch a couple of months ago that 
> allows event injection for critical user-space events as well) embedded:
> 
> The MCE-only log:
> 
>  Subsystem  |  Time           | event
>  ------------------------------------------------------------------
>  [MCE]         May 5 05:23:56   correctable MCE event on memory bank X
>  [MCE]         May 5 06:19:59   correctable MCE event on memory bank X
> 
> Versus a broader, unified log (all events come via the perf event mmap 
> ring-buffer, ordered properly and delivered quickly and transparently):

Yes, especially since we can get it out to userspace even faster than
printk :).

>  Subsystem  |  Time           | event
>  ------------------------------------------------------------------
>  [MCE]         May 5 05:23:56   correctable MCE event on memory bank X
>  [printk]      May 5 06:19:53   thermal trip triggered
>  [MCE]         May 5 06:19:59   correctable MCE event on memory bank X
>  [fault]       May 5 06:20:00   delivered SIGSEGV to task 'httpd' 
>  [httpd]       May 5 06:20:00   unexpected restart
>  [printk]      May 5 06:20:01   EXT4-fs (9345): group descriptors corrupted!

   ^^ I wouldn't even call it "printk" since this is obviously [fs].
   The printk event should have a field that says which subsys the
   printk comes from and thus make it so integrated that it is even
   invisible :).

And
>  [printk]      May 5 06:19:53   thermal trip triggered

could be

>  [pm]      May 5 06:19:53   thermal trip triggered

for power management.
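
To make the subsystem-field idea concrete, here is a minimal userspace
sketch in Python (not kernel code). The record carries the four printk
fields listed above plus the proposed subsystem field, and one helper
renders it in the "Subsystem | Time | event" layout of the example logs.
The names PrintkEvent, format_row and all field names are hypothetical,
chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PrintkEvent:
    """Hypothetical record: the four printk fields plus a subsystem tag."""
    timestamp: datetime      # when the event happened
    console_loglevel: int    # kernel log level when the message was generated
    msg_loglevel: int        # log level of the message itself
    message: str             # free-form string
    subsys: str = "printk"   # proposed field; generic tag when unknown


def format_row(ev: PrintkEvent) -> str:
    """Render one event in the 'Subsystem | Time | event' layout."""
    tag = f"[{ev.subsys}]"
    # Month name and unpadded day, then the time of day.
    when = f"{ev.timestamp:%b} {ev.timestamp.day} {ev.timestamp:%H:%M:%S}"
    return f" {tag:<14}{when}   {ev.message}"


if __name__ == "__main__":
    ev = PrintkEvent(datetime(2011, 5, 5, 6, 19, 53), 7, 4,
                     "thermal trip triggered", subsys="pm")
    print(format_row(ev))
```

With a field like this the consumer never needs to know that an event
started life as a printk; it only sees the subsystem tag.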

> As a sysadmin i might misinterpret the first one as a low and still acceptable 
> rate of correctable MCE errors: roughly one event per hour.
> 
> I'd take the second log *much* more seriously and would prioritize this 
> incident as it likely indicates bad (overheating?) hardware and user-visible 
> crashes and possible uncorrected data corruption.
> 
> Note that we made use of printk events, fault events and user-space injected 
> events as well, in addition to the primary MCE events.
> 
> And yes, some of the printk events, if they are relied on frequently and 
> programmatically, will be turned into proper events - and this process is 
> helped by printk events.
> 
> As i understood it, being useful in such a way is one of the main goals of the 
> new RAS daemon.

Yep, this is absolutely what we want to do - we want to have RAS
events not only coupled with hardware events but actually collect all
_relevant_ events into a common RAS log that is very lightweight and
doesn't rely on printk. It can, _however_, be turned into printk's (and
it should) if no daemon is running. And yes, we can have special tags
like "[[..]]" or whatever to be able to grep it out even from dmesg.
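
A rough sketch of that fallback in Python, purely illustrative: when no
daemon is consuming events, each event is emitted as a plain log line
carrying a "[[subsys]]" tag, and the RAS-relevant lines can later be
pulled back out of dmesg-style text with a regular expression. Only the
"[[..]]" tag shape comes from the suggestion above; the function names
and the sample lines are made up.

```python
import re
from typing import List, Tuple

# Matches a "[[subsys]] message" tag anywhere in a line, e.g. "[[mce]] ...".
RAS_TAG = re.compile(r"\[\[(?P<subsys>[^\[\]]+)\]\]\s*(?P<msg>.*)")


def to_printk_line(subsys: str, message: str) -> str:
    """Fallback path: render an event as a tagged plain-text log line."""
    return f"[[{subsys}]] {message}"


def grep_ras(dmesg_text: str) -> List[Tuple[str, str]]:
    """Return (subsys, message) for every tagged line in dmesg-style text."""
    hits = []
    for line in dmesg_text.splitlines():
        m = RAS_TAG.search(line)
        if m:
            hits.append((m.group("subsys"), m.group("msg")))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "[    1.0] usb 1-1: new device",  # untagged, ignored
        "[    2.0] " + to_printk_line("mce", "correctable MCE event on memory bank X"),
        "[    3.0] " + to_printk_line("pm", "thermal trip triggered"),
    ])
    for subsys, msg in grep_ras(sample):
        print(subsys, msg)
```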

Cool,
thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
