From: Mauro Carvalho Chehab <mchehab@redhat.com>
To: Borislav Petkov <bp@amd64.org>
Cc: Tony Luck <tony.luck@intel.com>, Ingo Molnar <mingo@elte.hu>,
EDAC devel <linux-edac@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Borislav Petkov <borislav.petkov@amd.com>
Subject: Re: [PATCH 4/4] EDAC: Convert AMD EDAC pieces to use RAS printk buffer
Date: Fri, 02 Mar 2012 11:52:59 -0300 [thread overview]
Message-ID: <4F50DECB.8030200@redhat.com> (raw)
In-Reply-To: <1330698314-9863-5-git-send-email-bp@amd64.org>
Em 02-03-2012 11:25, Borislav Petkov escreveu:
> From: Borislav Petkov <borislav.petkov@amd.com>
>
> This is an initial version of the patch which converts MCE decoding
> facilities to use the RAS printk buffer. When there's no userspace agent
> running (i.e., /sys/devices/system/ras/agent == 0), we fall back to the
> default printk's into dmesg which is what we've been doing so far.
>
> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
> ---
> drivers/edac/amd64_edac.c | 8 ++-
> drivers/edac/edac_mc.c | 42 ++++++---
> drivers/edac/mce_amd.c | 217 ++++++++++++++++++++++++---------------------
> 3 files changed, 149 insertions(+), 118 deletions(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 08413377a43b..29e153c57e33 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -1,6 +1,7 @@
> -#include "amd64_edac.h"
> #include <asm/amd_nb.h>
> +#include <asm/ras.h>
>
> +#include "amd64_edac.h"
> static struct edac_pci_ctl_info *amd64_ctl_pci;
>
> static int report_gart_errors;
> @@ -1901,7 +1902,10 @@ static void amd64_handle_ce(struct mem_ctl_info *mci, struct mce *m)
> sys_addr = get_error_address(m);
> syndrome = extract_syndrome(m->status);
>
> - amd64_mc_err(mci, "CE ERROR_ADDRESS= 0x%llx\n", sys_addr);
> + if (ras_agent)
> + ras_printk(PR_EMERG, "ERR_ADDR: 0x%llx", sys_addr);
> + else
> + amd64_mc_err(mci, "CE ERROR_ADDRESS= 0x%llx\n", sys_addr);
>
> pvt->ops->map_sysaddr_to_csrow(mci, sys_addr, syndrome);
> }
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index ca6c04d350ee..3b3db477b5d0 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -30,8 +30,10 @@
> #include <asm/uaccess.h>
> #include <asm/page.h>
> #include <asm/edac.h>
> +#include <asm/ras.h>
> #include "edac_core.h"
> #include "edac_module.h"
> +#include "mce_amd.h"
>
> /* lock to memory controller's control array */
> static DEFINE_MUTEX(mem_ctls_mutex);
> @@ -701,14 +703,22 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
> return;
> }
>
> - if (edac_mc_get_log_ce())
> - /* FIXME - put in DIMM location */
> - edac_mc_printk(mci, KERN_WARNING,
> - "CE page 0x%lx, offset 0x%lx, grain %d, syndrome "
> - "0x%lx, row %d, channel %d, label \"%s\": %s\n",
> - page_frame_number, offset_in_page,
> - mci->csrows[row].grain, syndrome, row, channel,
> - mci->csrows[row].channels[channel].label, msg);
> + if (edac_mc_get_log_ce()) {
> + if (ras_agent)
> + ras_printk(PR_CONT, ", row: %d, channel: %d\n",
> + row, channel);
> + else
> + /* FIXME - put in DIMM location */
> + edac_mc_printk(mci, KERN_WARNING,
> + "CE page 0x%lx, offset 0x%lx, grain %d,"
> + " syndrome 0x%lx, row %d, channel %d,"
> + " label \"%s\": %s\n",
> + page_frame_number, offset_in_page,
> + mci->csrows[row].grain, syndrome,
> + row, channel,
> + mci->csrows[row].channels[channel].label,
> + msg);
> + }
As I've commented already, This piece of the code is not ok, due
to several reasons:
- the "ras_agent" helper functions is used only for amd64_edac. There's
no reason for use it elsewhere;
- the code here is adding the location only for CE errors. The location
of the error is also pertinent for the other types of errors;
- on my patches, this function disappears, being replaced by a single
function, that can be used to report all types of memory errors;
- non MCA drivers should also generate tracepoints;
- it is a way easier to add just one call to the tracepoint function here, than
to spread on all drivers.
As it is shown at:
http://git.kernel.org/?p=linux/kernel/git/mchehab/linux-edac.git;a=commitdiff_plain;h=4eb2a29419c1fefd76c8dbcd308b84a4b52faf4d
Once we made an agreement with regards to the tracepoint function,
a change similar to what it was proposed there, e. g.:
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 37d2c97..2dca0e3 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -899,7 +899,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
const int layer2,
const char *msg,
const char *other_detail,
- const void *mcelog)
+ const void *arch_log)
{
unsigned long remapped_page;
/* FIXME: too much for stack: move it to some pre-alocated area */
@@ -1033,8 +1047,17 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
"page 0x%lx offset 0x%lx grain %d\n",
page_frame_number, offset_in_page, grain);
+#ifdef CONFIG_X86
+ if (arch_log)
+ trace_mc_error_mce(type, mci->mc_idx, msg, label, location,
+ detail, other_detail, arch_log);
+ else
+ trace_mc_error(type, mci->mc_idx, msg, label, location,
+ detail, other_detail);
+#else
trace_mc_error(type, mci->mc_idx, msg, label, location,
detail, other_detail);
+#endif
if (type == HW_EVENT_ERR_CORRECTED) {
if (edac_mc_get_log_ce())
is enough to generate traces for all existing drivers.
Regards,
Mauro
next prev parent reply other threads:[~2012-03-02 14:53 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-02 14:25 [RFC -v2 PATCH 0/3] RAS: Use MCE tracepoint for decoded MCEs Borislav Petkov
2012-03-02 14:25 ` [PATCH 1/4] mce: Slim up struct mce Borislav Petkov
2012-03-02 17:47 ` Luck, Tony
2012-03-03 7:37 ` Ingo Molnar
2012-03-05 9:17 ` Borislav Petkov
2012-03-02 14:25 ` [PATCH 2/4] mce: Add a msg string to the MCE tracepoint Borislav Petkov
2012-03-02 14:25 ` [PATCH 3/4] x86, RAS: Add a decoded msg buffer Borislav Petkov
2012-03-02 14:25 ` [PATCH 4/4] EDAC: Convert AMD EDAC pieces to use RAS printk buffer Borislav Petkov
2012-03-02 14:52 ` Mauro Carvalho Chehab [this message]
2012-03-05 11:04 ` Borislav Petkov
2012-03-05 11:43 ` Mauro Carvalho Chehab
2012-03-05 12:44 ` Borislav Petkov
2012-03-05 13:35 ` Mauro Carvalho Chehab
2012-03-05 14:13 ` Borislav Petkov
2012-03-05 14:58 ` Mauro Carvalho Chehab
2012-03-05 22:00 ` [PATCHv5] EDAC core changes in order to properly report errors from all types of memory controllers Mauro Carvalho Chehab
2012-03-05 23:23 ` Borislav Petkov
2012-03-06 11:31 ` Mauro Carvalho Chehab
2012-03-06 12:16 ` Borislav Petkov
2012-03-07 0:20 ` [PATCHv7] " Mauro Carvalho Chehab
2012-03-07 8:42 ` Borislav Petkov
2012-03-07 11:36 ` Mauro Carvalho Chehab
2012-03-07 12:06 ` Borislav Petkov
2012-03-07 12:13 ` Mauro Carvalho Chehab
2012-03-02 14:41 ` [RFC -v2 PATCH 0/3] RAS: Use MCE tracepoint for decoded MCEs Mauro Carvalho Chehab
2012-03-02 14:48 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F50DECB.8030200@redhat.com \
--to=mchehab@redhat.com \
--cc=borislav.petkov@amd.com \
--cc=bp@amd64.org \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).