linux-kernel.vger.kernel.org archive mirror
From: Mauro Carvalho Chehab <mchehab@redhat.com>
To: Borislav Petkov <bp@amd64.org>
Cc: Linux Edac Mailing List <linux-edac@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 01/31] events/hw_event: Create a Hardware Events Report Mecanism (HERM)
Date: Fri, 10 Feb 2012 12:17:51 -0200
Message-ID: <4F35270F.1020402@redhat.com>
In-Reply-To: <20120210134115.GC16783@aftab>

On 10-02-2012 11:41, Borislav Petkov wrote:
> On Thu, Feb 09, 2012 at 10:01:00PM -0200, Mauro Carvalho Chehab wrote:
>> In order to provide a proper hardware event subsystem, let's
>> encapsulate hardware events into a common trace facility, and
>> make both the edac and mce drivers use it. After that, common
>> facilities can be moved into a new core for the hardware event
>> reporting subsystem. This patch is the first of a series, and only
>> touches mce.
> 
> I think it would work too if you had only one event:
> 
> * trace_hw_error(...)
> 
> which would have as an argument a string describing it, like
> "Uncorrected Memory Read Error", "Memory Read Error (out of range)" "TLB
> Multimatch Error" etc., followed by the rest of the error info.
> 
> Currently, you're introducing at least 5 trace_* calls _only_ for memory
> errors. What about the remaining couples of tens of errors which haven't
> been addressed yet?
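A rough userspace model of the single-event idea, for illustration only: `struct hw_error` and `trace_hw_error()` are hypothetical names, and a real kernel tracepoint would be declared with `TRACE_EVENT()` and written to the trace ring buffer rather than formatted into a caller's string. What it shows is that the description string, not the event name, distinguishes one error type from another.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace model of one generic hardware-error event.
 * Not the kernel API: a real tracepoint would use TRACE_EVENT().
 */
struct hw_error {
	const char *desc;		/* e.g. "TLB Multimatch Error" */
	unsigned long long addr;	/* faulting address, if any */
};

/* Format one event record; the description carries the error type. */
static int trace_hw_error(char *buf, size_t len, const struct hw_error *e)
{
	const char *desc;

	assert(buf != NULL && e != NULL);
	/* Fall back to a generic text if no description was supplied. */
	desc = (e->desc != NULL && strlen(e->desc) > 0) ? e->desc
							 : "Unknown Error";
	return snprintf(buf, len, "hw_error: %s addr=0x%llx", desc, e->addr);
}
```

Under this model, covering another error type means passing another description string rather than declaring another tracepoint.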

Good point.

The way I see it:

- a non-memory-related, unparsed MCE event would generate an "mce_record" trace
	(we need an additional patch to disable it when the error is parsed;
	 I'll address that after finishing the tests on a few other platforms);

As more MCE parsers are added to the core, the situations where such an event
is generated will become rarer, and should eventually disappear in the long term.

- a non-x86 event (or an x86 event for a memory controller that is not covered
by MCE events) will use "mc_error";

- an x86 event generated via MCE will use "mc_error_mce".

There are two special events, defined for when there's a memory error _and_ a
driver bug:

	"mc_out_of_range_mce" and "mc_out_of_range".

While their names and one of their parameters are memory-controller specific,
it should be easy to make them generic enough to be used by other types of errors.

The previous EDAC logic was to emit an out-of-range printk and return. With
the changes I made, EDAC can still provide the parsed information, just discarding
the badly parsed value. That's the approach I took, as the other information there
may be useful. With that approach, the MCE information will be shown by the
"mc_error_mce" trace, so we can remove "mc_out_of_range_mce" without losing any
information.

In any case, we can't merge the *_mce events with their non-MCE variants, as the
mce.h header is arch-specific and doesn't exist on the PPC and Tilera architectures.

So, the only event that we can actually remove is "mc_out_of_range_mce", if we let
the core generate two events for badly parsed errors. What do you think?
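One possible reading of the two-event proposal, sketched under assumptions: `emit_events()` and its enum are hypothetical, `has_mce` stands in for whether the architecture provides MCE data (mce.h exists only on x86), and the sketch assumes the generic out-of-range event is emitted in addition to the normal error event on every architecture. With that split, no separate "mc_out_of_range_mce" event is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event identifiers; names follow the proposal. */
enum hw_event { EV_MC_ERROR, EV_MC_ERROR_MCE, EV_MC_OUT_OF_RANGE };

/*
 * Store the events for one error in out[] and return how many were
 * produced. has_mce: arch provides MCE data (x86 only).
 * out_of_range: the driver decoded an out-of-range location (driver bug).
 */
static size_t emit_events(int has_mce, int out_of_range, enum hw_event out[2])
{
	size_t n = 0;

	assert(out != NULL);
	if (out_of_range)
		out[n++] = EV_MC_OUT_OF_RANGE;	/* generic, arch-independent */
	/* The normal event still carries the remaining (valid) information. */
	out[n++] = has_mce ? EV_MC_ERROR_MCE : EV_MC_ERROR;
	return n;
}
```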

Regards,
Mauro



> 
> Thanks.
> 


Thread overview: 47+ messages
2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 01/31] events/hw_event: Create a " Mauro Carvalho Chehab
2012-02-10 13:41   ` Borislav Petkov
2012-02-10 14:17     ` Mauro Carvalho Chehab [this message]
2012-02-12 12:48       ` Borislav Petkov
2012-02-12 17:21         ` Mauro Carvalho Chehab
2012-02-12 18:44           ` Borislav Petkov
2012-02-12 19:38             ` Mauro Carvalho Chehab
2012-02-13  9:21               ` Borislav Petkov
2012-02-13 10:23                 ` Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 02/31] events/hw_event: use __string() trace macros for events Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 03/31] hw_event: Consolidate uncorrected/corrected error msgs into one Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 04/31] drivers/edac: rename channel_info to csrow_channel_info Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 05/31] edac: Create a dimm struct and move the labels into it Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 06/31] edac: Add per dimm's sysfs nodes Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 07/31] edac: Prepare to push down to drivers the filling of the dimm_info Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 08/31] edac: Better describe the memory concepts The memory terms changed along the time, since when EDAC were originally written: new concepts were introduced, and some things have different meanings, depending on the memory architecture. Better define those terms, and better describe each supported memory type Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 09/31] i5400_edac: Convert it to report memory with the new location Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 10/31] i7300_edac: " Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 11/31] edac: move dimm properties to struct dimm_info Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 12/31] edac: Don't initialize csrow's first_page & friends when not needed Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 13/31] edac: move nr_pages to dimm struct Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 14/31] edac: Add per-dimm sysfs show nodes Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 15/31] edac: DIMM location cleanup Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 16/31] edac/ppc4xx_edac: Fix compilation Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 17/31] edac-mc: Allow reporting errors on a non-csrow oriented way Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 18/31] edac.h: Use kernel-doc-nano-HOWTO.txt notation for enums Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 19/31] edac: rework memory layer hierarchy description Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 20/31] edac: Export MC hierarchy counters for CE and UE Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 21/31] hw_event: Add x86 MCE events on it Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 22/31] amd64_edac: convert it to use the MCE log tracepoint where applicable Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 23/31] edac: Simplify logs for i7core and sb edac drivers Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 24/31] edac_mc: Some clenups at the log message Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 25/31] edac: Add a sysfs node to test the EDAC error report facility Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 26/31] edac_mc: Fix the enable label filter logic Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 27/31] edac: Initialize the dimm label with the known information Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 28/31] edac: don't OOPS if the csrow is not visible Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 29/31] edac: Fix sysfs csrow?/*ce*count counters Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 30/31] edac: Fix new error counts Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 31/31] edac: Fix per layer error count counters Mauro Carvalho Chehab
2012-02-10 13:26 ` [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Borislav Petkov
2012-02-10 16:39   ` Mauro Carvalho Chehab
2012-02-12 12:08     ` Borislav Petkov
2012-02-12 17:10       ` Mauro Carvalho Chehab
2012-02-13 21:29         ` Luck, Tony
2012-02-10 16:48 ` [PATCH v3 32/31] edac: restore mce.h file Mauro Carvalho Chehab
2012-02-13  9:23 ` [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
