public inbox for linux-edac@vger.kernel.org
From: "M K, Muralidhara" <muralimk@amd.com>
To: Borislav Petkov <bp@alien8.de>, Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	mchehab@kernel.org, Muralidhara M K <muralidhara.mk@amd.com>,
	Avadhut Naik <Avadhut.Naik@amd.com>
Subject: Re: [PATCH v2 1/4] EDAC/mce_amd: Remove SMCA Extended Error code descriptions
Date: Fri, 27 Oct 2023 10:35:33 +0530	[thread overview]
Message-ID: <ba6eea97-116a-4678-7800-d24692c65cd6@amd.com> (raw)
In-Reply-To: <20231026134016.GDZTpsQDYU4Ll6sAA3@fat_crate.local>



On 10/26/2023 7:10 PM, Borislav Petkov wrote:
> On Thu, Oct 26, 2023 at 09:05:51AM -0400, Yazen Ghannam wrote:
>> Post-processing is one of the features that Avadhut implemented.
>>
>> https://github.com/mchehab/rasdaemon/commit/932118b04a04104dfac6b8536419803f236e6118
> 

Hi Yazen, thanks for pointing to this commit. Yes, I do remember.


> Yes, now try to decode the error with rasdaemon this way, by supplying
> the fields.
> 
> Then explain step-by-step what you've done in the commit message and in
> a documentation file in Documentation/ras/ so that people can find it
> and can actually do the decoding themselves.
> 
> It needs to be absolutely easy to decode those errors. Not tell people:
> "go look for the error description in the PPR".
> 
Yes, rasdaemon has an offline decoding option.

For example:
$ rasdaemon -p --status 0xdc2040000000011b --ipid 0x0000609600092f00 --smca
2023-10-26 23:51:34 -0500, Unified Memory Controller (bank=0), mca: DRAM 
ECC error. Ext Err Code: 0 Memory Error 'mem-tx: generic read, tx: 
generic, level: L3/generic', mci: Error_overflow CECC, Locn: 
memory_channel=0,csrow=0, Error Msg: Corrected error, no action required.

Note the error string in the output: "mca: DRAM ECC error. Ext Err Code: 0"
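
As a rough cross-check of the output above, the flag and error-code fields can be extracted from the raw --status and --ipid values by hand. The following is a minimal Python sketch, not rasdaemon's implementation. The bit positions (Val/Overflow/UC/En/MiscV/AddrV/PCC in bits 63-57, CECC/UECC/Deferred/Poison in bits 46-43, extended error code in bits 21:16, HWID in IPID bits 43:32, McaType in IPID bits 63:48) are assumptions taken from the generic x86 MCA layout and the SMCA definitions used by the Linux kernel, and should be verified against the PPR for the part in question. The function names are hypothetical.

```python
# Assumed bit layout of MCA_STATUS on SMCA systems (verify against the PPR).
STATUS_FLAGS = {
    63: "Val",
    62: "Overflow",
    61: "UC",
    60: "En",
    59: "MiscV",
    58: "AddrV",
    57: "PCC",
    46: "CECC",
    45: "UECC",
    44: "Deferred",
    43: "Poison",
}


def decode_status(status):
    """Split a raw MCA_STATUS value into flag names and error codes."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        # Extended error code: assumed to be bits 21:16 on SMCA systems.
        "ext_err_code": (status >> 16) & 0x3F,
        # Architectural MCA error code: bits 15:0.
        "err_code": status & 0xFFFF,
    }


def decode_ipid(ipid):
    """Split a raw MCA_IPID value into McaType, HWID and instance id."""
    return {
        "mca_type": (ipid >> 48) & 0xFFFF,   # assumed bits 63:48
        "hwid": (ipid >> 32) & 0xFFF,        # assumed bits 43:32
        "instance_id": ipid & 0xFFFFFFFF,    # assumed bits 31:0
    }


if __name__ == "__main__":
    print(decode_status(0xDC2040000000011B))
    print(decode_ipid(0x0000609600092F00))
```

For the values used above, this reports the Overflow and CECC flags and an extended error code of 0, which lines up with "Error_overflow CECC" and "Ext Err Code: 0" in the rasdaemon output. The HWID comes out as 0x96 with McaType 0, the pair the kernel's SMCA tables associate with the Unified Memory Controller.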


We can also pass a particular family/model to decode against, e.g. for MI300A:

$ rasdaemon -p --status 0xdc2040000000011b --ipid 0x0000609600092f00 \
    --smca --family 0x19 --model 0x90 --bank 19
2023-10-26 23:52:09 -0500, Unified Memory Controller (bank=19), mca: 
DRAM On Die ECC error. Ext Err Code: 0 Memory Error 'mem-tx: generic 
read, tx: generic, level: L3/generic', mci: Error_overflow CECC, Locn: 
memory_die_id=1, Error Msg: Corrected error, no action required.

Here the error string is "mca: DRAM On Die ECC error. Ext Err Code: 0"

Thanks for the input. I will add the steps to the commit message and to 
Documentation as well.


> Thx.
> 
> --
> Regards/Gruss,
>      Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
> 
