public inbox for linux-edac@vger.kernel.org
From: Yazen Ghannam <yazen.ghannam@amd.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>, linux-edac <linux-edac@vger.kernel.org>
Subject: Re: EDAC instances probing
Date: Fri, 11 Dec 2020 14:35:20 -0600
Message-ID: <20201211203520.GA2128@yaz-nikka.amd.com>
In-Reply-To: <20201211181915.GD25974@zn.tnic>

On Fri, Dec 11, 2020 at 07:19:15PM +0100, Borislav Petkov wrote:
> Hi guys,
> 
> so we converted a couple of EDAC drivers to per-CPU-family autoprobing
> instead of the PCI device IDs one which needed constant adding of new
> device IDs.
> 
> However easy the new probing is, it spams dmesg on each CPU as it tries
> loading on each CPU, when there's no ECC DIMMs or ECC is disabled.
> Here's the output from a 128 CPU box:
> 
> $ grep EDAC dmesg.log | sed 's/\[.*\] //' | sort | uniq -c
>     128 EDAC amd64: F17h detected (node 0).
>     128 EDAC amd64: Node 0: DRAM ECC disabled.
>       1 EDAC MC: Ver: 3.0.0
> 
> that's 2 lines per CPU.
> 
> Btw, people have complained about the spamming.
> 
> So I tried something clumsy, see below, which fixes this into what it
> should say:
> 
> $ dmesg | grep EDAC
> [    2.693470] EDAC MC: Ver: 3.0.0
> [    8.284461] EDAC amd64: F17h detected (node 0).
> [    8.287953] EDAC amd64: Node 0: DRAM ECC disabled.
> [    8.381430] EDAC amd64: F17h detected (node 1).
> [    8.384684] EDAC amd64: Node 1: DRAM ECC disabled.
> [    8.461902] EDAC amd64: F17h detected (node 2).
> [    8.461993] EDAC amd64: Node 2: DRAM ECC disabled.
> [    8.536907] EDAC amd64: F17h detected (node 3).
> [    8.538923] EDAC amd64: Node 3: DRAM ECC disabled.
> [    8.643213] EDAC amd64: F17h detected (node 4).
> [    8.645474] EDAC amd64: Node 4: DRAM ECC disabled.
> [    8.713411] EDAC amd64: F17h detected (node 5).
> [    8.714818] EDAC amd64: Node 5: DRAM ECC disabled.
> [    8.807825] EDAC amd64: F17h detected (node 6).
> [    8.809882] EDAC amd64: Node 6: DRAM ECC disabled.
> [    8.908043] EDAC amd64: F17h detected (node 7).
> [    8.910883] EDAC amd64: Node 7: DRAM ECC disabled.
> 
> Once per driver instance, however each driver accounts an instance -
> logical node, physical node, whatever.
> 
> So it looks like this, do you guys think this is too ugly to live?
>

I think it's okay. But it could even be a single boolean rather than a
bitmap. At least for amd64_edac_mod, the driver will probe all the
Northbridge/Data Fabric instances even if some fail. I don't know if the
same applies to other EDAC modules. Does this issue affect other
modules?
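
To make the single-boolean idea concrete, here is a minimal userspace
sketch. It is not the driver code: fake_edac_init() and
first_attempt_done are hypothetical names, and it assumes every load
attempt walks all nodes, so one flag set after the first attempt is
enough to silence the later ones.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag: set once the first load attempt has finished. */
static bool first_attempt_done;

/*
 * Stand-in for the module init path that runs once per CPU.
 * Walks every node and returns the number of lines it printed.
 */
static int fake_edac_init(FILE *out, unsigned int nr_nodes)
{
	int lines = 0;

	for (unsigned int node = 0; node < nr_nodes; node++) {
		/* Probing itself would still happen here on every attempt. */
		if (first_attempt_done)
			continue;

		fprintf(out, "EDAC amd64: F17h detected (node %u).\n", node);
		fprintf(out, "EDAC amd64: Node %u: DRAM ECC disabled.\n", node);
		lines += 2;
	}

	/* All nodes were visited, so later attempts can stay quiet. */
	first_attempt_done = true;
	return lines;
}
```

Calling fake_edac_init(stdout, 8) once per CPU prints the two lines for
each of the 8 nodes on the first call only. A per-node bitmap would only
be needed if a load attempt could stop before visiting every node.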

Also, would it make sense to go back to PCI device probing? We've still
needed to add PCI IDs for almost every model group. Probing by PCI
device should help us avoid this issue and also avoid the messages
printed when PCI IDs aren't found for a supported family. For example,
we had that problem when Family 17h Models 30h-3Fh came out. The module
would load because it recognized Family 17h, but probing would then fail
because the new PCI IDs for Models 30h-3Fh were not recognized.
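
A rough sketch of that failure mode, to show why the family match alone
is not enough. Everything here is illustrative: the struct, the function
names, the model ranges in the table and the device ID values are
placeholders, not the driver's real tables.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_group {
	uint8_t  family;
	uint8_t  model_min, model_max;
	uint16_t pci_dev_id;	/* placeholder value, not a real device ID */
};

/* Hypothetical table that predates Models 30h-3Fh. */
static const struct model_group known_groups[] = {
	{ 0x17, 0x00, 0x0f, 0x0001 },
	{ 0x17, 0x10, 0x2f, 0x0002 },
};
#define NR_GROUPS (sizeof(known_groups) / sizeof(known_groups[0]))

/* Family-only match: enough to get the module loaded. */
static bool family_matches(uint8_t family)
{
	for (size_t i = 0; i < NR_GROUPS; i++)
		if (known_groups[i].family == family)
			return true;
	return false;
}

/* Model-group lookup: returns NULL when no PCI ID is known for the model. */
static const struct model_group *find_group(uint8_t family, uint8_t model)
{
	for (size_t i = 0; i < NR_GROUPS; i++) {
		const struct model_group *g = &known_groups[i];

		if (g->family == family &&
		    model >= g->model_min && model <= g->model_max)
			return g;
	}
	return NULL;
}
```

With this table, family_matches(0x17) is true, so the module loads, but
find_group(0x17, 0x31) returns NULL and the probe fails. Matching on the
PCI device directly would mean the module never loads in that case.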

Thanks,
Yazen

Thread overview: 5+ messages
2020-12-11 18:19 EDAC instances probing Borislav Petkov
2020-12-11 20:35 ` Yazen Ghannam [this message]
2020-12-11 20:58   ` Borislav Petkov
2021-01-13 20:33     ` Borislav Petkov
2021-01-23  4:45       ` Yazen Ghannam
