public inbox for linux-kernel@vger.kernel.org
From: Ingo Molnar <mingo@kernel.org>
To: Borislav Petkov <bp@amd64.org>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>,
	Linux Edac Mailing List <linux-edac@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Aristeu Rozanski <arozansk@redhat.com>,
	Doug Thompson <norsk5@yahoo.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH v24b] RAS: Add a tracepoint for reporting memory controller events
Date: Fri, 18 May 2012 09:12:44 +0200
Message-ID: <20120518071244.GE429@gmail.com>
In-Reply-To: <20120517214859.GA16777@aftab.osrc.amd.com>


* Borislav Petkov <bp@amd64.org> wrote:

> On Thu, May 17, 2012 at 05:41:17PM -0300, Mauro Carvalho Chehab wrote:
> > Add a new tracepoint-based hardware events report method for
> > reporting Memory Controller events.
> > 
> > Part of the description below is shamelessly copied from Tony
> > Luck's notes about the Hardware Error BoF during LPC 2010 [1].
> > Tony, thanks for your notes and discussions to generate the
> > h/w error reporting requirements.
> > 
> > [1] http://lwn.net/Articles/416669/
> > 
> >     We have several subsystems & methods for reporting hardware errors:
> > 
> >     1) EDAC ("Error Detection and Correction").  In its original form
> >     this consisted of a platform specific driver that read topology
> >     information and error counts from chipset registers and reported
> >     the results via a sysfs interface.
> > 
> >     2) mcelog - x86 specific decoding of machine check bank registers
> >     reporting in binary form via /dev/mcelog. Recent additions make use
> >     of the APEI extensions that were documented in version 4.0a of the
> >     ACPI specification to acquire more information about errors without
> >     having to rely on reading chipset registers directly. A user-level
> >     program decodes it into a somewhat human-readable format.
> > 
> >     3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
> >     decodes errors reported via machine check bank registers in AMD
> >     processors to the console log using printk();
> > 
> >     Each of these mechanisms has a band of followers ... and none
> >     of them appear to meet all the needs of all users.
> > 
> > As part of a RAS subsystem, let's encapsulate the memory error hardware
> > events into a trace facility.
> > 
> > The tracepoint printk will be displayed like:
> > 
> > mc_event: (Corrected|Uncorrected|Fatal) error:[error msg] on memory stick "[label]" ([location] [edac_mc detail] [driver_detail])
> > 
> > Where:
> > 	[error msg] is the driver-specific error message
> > 		    (e. g. "memory read", "bus error", ...);
> > 	[location] is the location in terms of memory controller and
> > 		   branch/channel/slot, channel/slot or csrow/channel;
> > 	[label] is the memory stick label;
> > 	[edac_mc detail] describes the address location of the error
> > 			 and the syndrome;
> > 	[driver detail] is driver-specific error message details,
> > 			when needed/provided (e. g. "area:DMA", ...)
> > 
> > For example:
> > 
> > mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)
> > 
> > Of course, any userspace tools meant to handle errors should not parse
> > the above data. They should, instead, use the binary fields provided by
> > the tracepoint, mapping them directly into their MIBs.
> 
> Nacked-by: Borislav Petkov <borislav.petkov@amd.com>

Just wondering why this got nacked, and what the 
suggestions/plans are to improve the situation. I assume Mauro 
is working on these things to solve problems or to add 
features. Mauro, could you please give a higher-level list of 
those problems or features? There must be more to it than just a 
new tracepoint! :-)

Thanks,

	Ingo
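
For readers following the thread, the human-readable layout quoted
above can be illustrated with a minimal sketch. It only shows how the
bracketed pieces of the documented line fit together. The class and
field names (McEvent, error_msg, edac_mc_detail, ...) are hypothetical
and are not the tracepoint's binary fields; as the quoted description
says, real tools should read the binary fields rather than build or
parse this string.

```python
from dataclasses import dataclass


# Hypothetical userspace-side model of one event. The field names are
# illustrative only and do NOT mirror the tracepoint's binary layout.
@dataclass
class McEvent:
    severity: str             # "Corrected", "Uncorrected" or "Fatal"
    error_msg: str            # driver-specific message, e.g. "memory read"
    label: str                # memory stick label, e.g. "DIMM_1A"
    location: str             # e.g. "mc:0 channel:0 slot:0"
    edac_mc_detail: str       # address location and syndrome
    driver_detail: str = ""   # optional, e.g. "area:DMA"


def format_mc_event(ev: McEvent) -> str:
    """Assemble the documented display line from its parts."""
    # Skip empty parts so a missing driver detail leaves no trailing space.
    parts = (ev.location, ev.edac_mc_detail, ev.driver_detail)
    details = " ".join(p for p in parts if p)
    return (f'mc_event: {ev.severity} error:{ev.error_msg} '
            f'on memory stick "{ev.label}" ({details})')


# Values taken from the example line in the quoted patch description.
example = McEvent(
    severity="Corrected",
    error_msg="memory read",
    label="DIMM_1A",
    location="mc:0 channel:0 slot:0",
    edac_mc_detail="page:0x586b6e offset:0xa66 grain:32 syndrome:0x0",
    driver_detail="area:DMA",
)
print(format_mc_event(example))
```

The optional driver detail is dropped cleanly when it is empty, which
matches the "when needed/provided" wording in the description.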


Thread overview: 23+ messages
2012-05-17 20:41 [PATCH v24b] RAS: Add a tracepoint for reporting memory controller events Mauro Carvalho Chehab
2012-05-17 21:48 ` Borislav Petkov
2012-05-18  7:12   ` Ingo Molnar [this message]
2012-05-18  9:56     ` Borislav Petkov
2012-05-18 10:59       ` Mauro Carvalho Chehab
2012-05-18 12:43         ` Borislav Petkov
2012-05-18 13:23           ` Mauro Carvalho Chehab
2012-05-18 14:05             ` Borislav Petkov
2012-05-18 14:31               ` Mauro Carvalho Chehab
2012-05-18 16:40                 ` Borislav Petkov
2012-05-18 17:27                   ` Mauro Carvalho Chehab
2012-05-18 18:52                     ` Borislav Petkov
2012-05-18 19:10                       ` Luck, Tony
2012-05-18 21:12                         ` Borislav Petkov
2012-05-19  9:26                           ` Borislav Petkov
2012-05-21 15:29                             ` Mauro Carvalho Chehab
2012-05-21 16:00                               ` Borislav Petkov
2012-05-21 16:40                                 ` Mauro Carvalho Chehab
2012-05-21 20:40                                   ` Borislav Petkov
2012-05-22  3:04                                     ` Mauro Carvalho Chehab
2012-05-22  9:28                                       ` Borislav Petkov
2012-05-22 10:18                                         ` Mauro Carvalho Chehab
2012-05-22 13:05                                           ` Borislav Petkov
