public inbox for linux-edac@vger.kernel.org
From: Borislav Petkov <bp@alien8.de>
To: Tony Luck <tony.luck@intel.com>, Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Subject: EDAC instances probing
Date: Fri, 11 Dec 2020 19:19:15 +0100
Message-ID: <20201211181915.GD25974@zn.tnic>

Hi guys,

so we converted a couple of EDAC drivers to per-CPU-family autoprobing
instead of probing by PCI device ID, which needed new device IDs added
constantly.

However easy the new probing is, it spams dmesg when there are no ECC
DIMMs or ECC is disabled: the driver tries loading on each CPU, and every
attempt logs the same messages. Here's the output from a 128-CPU box:

$ grep EDAC dmesg.log | sed 's/\[.*\] //' | sort | uniq -c
    128 EDAC amd64: F17h detected (node 0).
    128 EDAC amd64: Node 0: DRAM ECC disabled.
      1 EDAC MC: Ver: 3.0.0

that's 2 lines per CPU.

Btw, people have complained about the spamming.

So I tried something clumsy (see below), which reduces this to what it
should say:

$ dmesg | grep EDAC
[    2.693470] EDAC MC: Ver: 3.0.0
[    8.284461] EDAC amd64: F17h detected (node 0).
[    8.287953] EDAC amd64: Node 0: DRAM ECC disabled.
[    8.381430] EDAC amd64: F17h detected (node 1).
[    8.384684] EDAC amd64: Node 1: DRAM ECC disabled.
[    8.461902] EDAC amd64: F17h detected (node 2).
[    8.461993] EDAC amd64: Node 2: DRAM ECC disabled.
[    8.536907] EDAC amd64: F17h detected (node 3).
[    8.538923] EDAC amd64: Node 3: DRAM ECC disabled.
[    8.643213] EDAC amd64: F17h detected (node 4).
[    8.645474] EDAC amd64: Node 4: DRAM ECC disabled.
[    8.713411] EDAC amd64: F17h detected (node 5).
[    8.714818] EDAC amd64: Node 5: DRAM ECC disabled.
[    8.807825] EDAC amd64: F17h detected (node 6).
[    8.809882] EDAC amd64: Node 6: DRAM ECC disabled.
[    8.908043] EDAC amd64: F17h detected (node 7).
[    8.910883] EDAC amd64: Node 7: DRAM ECC disabled.

That is, once per driver instance, however each driver defines an
instance: logical node, physical node, whatever.

So it looks like this. Do you guys think it's too ugly to live?

---
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index f7087ddddb90..de37d0d9a27b 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3581,6 +3581,7 @@ static int probe_one_instance(unsigned int nid)
 
 	dump_misc_regs(pvt);
 
+	set_bit(nid, edac_get_probed_instances());
 	return ret;
 
 err_enable:
@@ -3591,6 +3592,7 @@ static int probe_one_instance(unsigned int nid)
 	kfree(s);
 	ecc_stngs[nid] = NULL;
 
+	set_bit(nid, edac_get_probed_instances());
 err_out:
 	return ret;
 }
@@ -3674,6 +3676,10 @@ static int __init amd64_edac_init(void)
 		goto err_free;
 
 	for (i = 0; i < amd_nb_num(); i++) {
+
+		if (test_bit(i, edac_get_probed_instances()))
+			continue;
+
 		err = probe_one_instance(i);
 		if (err) {
 			/* unwind properly */
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index f6d462d0be2d..f97186237ccc 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -53,6 +53,15 @@ static LIST_HEAD(mc_devices);
  */
 static const char *edac_mc_owner;
 
+/* bitmap of already probed driver instances, 64 should be big enough. :-P */
+static DECLARE_BITMAP(probed_instances, 64);
+
+unsigned long *edac_get_probed_instances(void)
+{
+	return probed_instances;
+}
+EXPORT_SYMBOL_GPL(edac_get_probed_instances);
+
 static struct mem_ctl_info *error_desc_to_mci(struct edac_raw_error_desc *e)
 {
 	return container_of(e, struct mem_ctl_info, error_desc);
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 881b00eadf7a..7c0d4ac7c35a 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -255,4 +255,6 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
  */
 extern char *edac_op_state_to_string(int op_state);
 
+unsigned long *edac_get_probed_instances(void);
+
 #endif				/* _EDAC_MC_H_ */

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Thread overview: 5+ messages
2020-12-11 18:19 Borislav Petkov [this message]
2020-12-11 20:35 ` EDAC instances probing Yazen Ghannam
2020-12-11 20:58   ` Borislav Petkov
2021-01-13 20:33     ` Borislav Petkov
2021-01-23  4:45       ` Yazen Ghannam
