From: Borislav Petkov <bp@amd64.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: "Luck, Tony" <tony.luck@intel.com>,
	Borislav Petkov <bp@amd64.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Arnaldo Carvalho de Melo <acme@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Mauro Carvalho Chehab <mchehab@redhat.com>,
	EDAC devel <linux-edac@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 4/4] x86, mce: Have MCE persistent event off by default for now
Date: Thu, 5 May 2011 09:17:47 +0200
Message-ID: <20110505071747.GA3185@aftab>
In-Reply-To: <20110505063951.GB28015@elte.hu>

On Thu, May 05, 2011 at 02:39:51AM -0400, Ingo Molnar wrote:
> printk events are a compatibility wrapper to allow RAS functionality to have 
> easy and unified access to all system events that matter. The structure of 
> printk events is obviously the log level plus a free-form ASCII string, 
> something like:
> 
>  1- the printk timestamp

Yeah, here we want the timestamp of when the event actually happened.

>  2- the log level of the kernel when the message was generated
>  3- the log level of the message
>  4- the printk message itself, as a free-form string
> 
> > [...] a big issue when you have some heavy duty infrastructure trying to 
> > parse and consume these messages.  We should really consider such stuff a 
> > user visible ABI, and thus not subject to random breakage - which is a 
> > radical departure from our current attitude to printk().
> 
> Indeed, turning printk into an ABI clearly won't fly upstream although i'd 
> expect upstream to *care more* about good printk messages if the RAS daemon 
> starts making good use of it. Any printk message that turns out to be useful 
> can be turned into an ABI by defining a proper structured event out of it, via 
> TRACE_EVENT() et al.

Actually let's have the RAS printk's as TRACE_EVENT's from the start
- it's not like we're going to convert every printk call into a RAS
printk event. We only want relevant ones from traps.c, maybe some power
management events, fs, maybe some critical security stuff, etc.

> This does not mean that it's not *useful* to allow the streaming of all print 
> events to the RAS daemon. They are available, they get generated and they 
> clearly look useful to me, and it will be useful when a sysadmin looks at the 
> RAS log to figure out an incident.
> 
> Consider an example of two logs, one with just pure RAS events, the other with 
> printk lines (and user-space events, see my patch a couple of months ago that 
> allows event injection for critical user-space events as well) embedded:
> 
> The MCE-only log:
> 
>  Subsystem  |  Time           | event
>  ------------------------------------------------------------------
>  [MCE]         May 5 05:23:56   correctable MCE event on memory bank X
>  [MCE]         May 5 06:19:59   correctable MCE event on memory bank X
> 
> Versus a broader, unified log (all events come via the perf event mmap 
> ring-buffer, ordered properly and delivered quickly and transparently):

Yes, especially since we can get it out to userspace even faster than
printk :).

>  Subsystem  |  Time           | event
>  ------------------------------------------------------------------
>  [MCE]         May 5 05:23:56   correctable MCE event on memory bank X
>  [printk]      May 5 06:19:53   thermal trip triggered
>  [MCE]         May 5 06:19:59   correctable MCE event on memory bank X
>  [fault]       May 5 06:20:00   delivered SIGSEGV to task 'httpd' 
>  [httpd]       May 5 06:20:00   unexpected restart
>  [printk]      May 5 06:20:01   EXT4-fs (9345): group descriptors corrupted!

   ^^ I wouldn't even call it "printk" since this is obviously [fs].
   The printk event should have a field that says which subsys the
   printk comes from, and thus make it so integrated that it is even
   invisible :).

And
>  [printk]      May 5 06:19:53   thermal trip triggered

could be

>  [pm]      May 5 06:19:53   thermal trip triggered

for power management.
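
To make the subsys field concrete, here is a rough sketch. It is
user-space Python purely for illustration, not kernel code and not a
proposed TRACE_EVENT layout. The record carries the four pieces listed
above plus the subsystem; the names RasPrintkEvent, format_event and
unified_log and the log level values are all made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RasPrintkEvent:
    """Hypothetical record: the four printk pieces plus a subsystem field."""
    when: datetime          # time the event happened
    kernel_loglevel: int    # log level of the kernel when the message was generated
    msg_loglevel: int       # log level of the message itself
    subsys: str             # originating subsystem, e.g. "pm" or "fs"
    message: str            # free-form printk text


def format_event(ev: RasPrintkEvent) -> str:
    """Render one log line tagged with the subsystem rather than "printk"."""
    tag = f"[{ev.subsys}]"
    stamp = f"{ev.when.strftime('%b')} {ev.when.day} {ev.when:%H:%M:%S}"
    return f"{tag:<14}{stamp}   {ev.message}"


def unified_log(events):
    """Return formatted lines ordered by the time each event happened."""
    return [format_event(ev) for ev in sorted(events, key=lambda ev: ev.when)]


if __name__ == "__main__":
    # Log level values below are illustrative assumptions only.
    events = [
        RasPrintkEvent(datetime(2011, 5, 5, 6, 20, 1), 7, 2, "fs",
                       "EXT4-fs (9345): group descriptors corrupted!"),
        RasPrintkEvent(datetime(2011, 5, 5, 6, 19, 53), 7, 4, "pm",
                       "thermal trip triggered"),
    ]
    for line in unified_log(events):
        print(line)
```

With the subsystem carried in the record, the consumer never needs to
know that a line started life as a printk.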

> As a sysadmin i might misinterpret the first one as a low and still acceptable 
> rate of correctable MCE errors: roughly one event per hour.
> 
> I'd take the second log *much* more seriously and would prioritize this 
> incident as it likely indicates bad (overheating?) hardware and user-visible 
> crashes and possible uncorrected data corruption.
> 
> Note that we made use of printk events, fault events and user-space injected 
> events as well, in addition to the primary MCE events.
> 
> And yes, some of the printk events, if they are relied on frequently and 
> programmatically, will be turned into proper events - and this process is 
> helped by printk events.
> 
> As i understood it, being useful in such a way is one of the main goals of the 
> new RAS daemon.

Yep, this is absolutely what we want to do - we want to have RAS
events not only coupled with hardware events but actually collect all
_relevant_ events into a common RAS log that is very lightweight and
doesn't rely on printk. It can, _however_, be turned into printk's (and
it should) if no daemon is running. And yes, we can have special tags
like "[[..]]" or whatever to be able to grep it out even from dmesg.
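
As a sketch of that fallback, again user-space Python for illustration
only: events go to the daemon when one is attached, otherwise they land
in the normal log with a greppable tag. The "[[RAS:subsys]]" tag format
and the names emit and grep_ras are assumptions for this example; the
actual tag is still "or whatever".

```python
import re

# Hypothetical tag format; matches lines such as "[[RAS:pm]] thermal trip triggered".
TAG_RE = re.compile(r"\[\[RAS:(\w+)\]\]\s*(.*)")


def emit(subsys, message, daemon_queue, dmesg):
    """Deliver to the daemon if one is attached, else fall back to a tagged log line.

    daemon_queue is a list standing in for the daemon's buffer, or None when
    no daemon is running; dmesg is a list standing in for the kernel log.
    """
    if daemon_queue is not None:
        daemon_queue.append((subsys, message))
        return "daemon"
    dmesg.append(f"[[RAS:{subsys}]] {message}")
    return "printk"


def grep_ras(lines):
    """Pull (subsys, message) pairs back out of a mixed log."""
    found = []
    for line in lines:
        match = TAG_RE.search(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


if __name__ == "__main__":
    dmesg = ["usb 1-1: new high-speed USB device"]
    emit("pm", "thermal trip triggered", None, dmesg)   # no daemon running
    print(grep_ras(dmesg))
```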

Cool,
thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
