From: "M K, Muralidhara" <muralimk@amd.com>
To: Borislav Petkov <bp@alien8.de>, Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
mchehab@kernel.org, Muralidhara M K <muralidhara.mk@amd.com>,
Avadhut Naik <Avadhut.Naik@amd.com>
Subject: Re: [PATCH v2 1/4] EDAC/mce_amd: Remove SMCA Extended Error code descriptions
Date: Fri, 27 Oct 2023 10:35:33 +0530
Message-ID: <ba6eea97-116a-4678-7800-d24692c65cd6@amd.com>
In-Reply-To: <20231026134016.GDZTpsQDYU4Ll6sAA3@fat_crate.local>
On 10/26/2023 7:10 PM, Borislav Petkov wrote:
> On Thu, Oct 26, 2023 at 09:05:51AM -0400, Yazen Ghannam wrote:
>> Post-processing is one of the features that Avadhut implemented.
>>
>> https://github.com/mchehab/rasdaemon/commit/932118b04a04104dfac6b8536419803f236e6118
>
Hi Yazen, thanks for pointing to this commit. Yes, I do remember it.
> Yes, now try to decode the error with rasdaemon this way, by supplying
> the fields.
>
> Then explain step-by-step what you've done in the commit message and in
> a documentation file in Documentation/ras/ so that people can find it
> and can actually do the decoding themselves.
>
> It needs to be absolutely easy to decode those errors. Not tell people:
> "go look for the error description in the PPR".
>
Yes, we have an offline decoding option in rasdaemon.
For example:
$ rasdaemon -p --status 0xdc2040000000011b --ipid 0x0000609600092f00 --smca
2023-10-26 23:51:34 -0500, Unified Memory Controller (bank=0), mca: DRAM
ECC error. Ext Err Code: 0 Memory Error 'mem-tx: generic read, tx:
generic, level: L3/generic', mci: Error_overflow CECC, Locn:
memory_channel=0,csrow=0, Error Msg: Corrected error, no action required.
The decoded error string here is "mca: DRAM ECC error. Ext Err Code: 0".
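To illustrate what the --status value above carries, here is a rough sketch that splits it into flag bits and error codes. The bit positions are an assumption based on the Linux kernel's MCI_STATUS_* definitions and the SMCA extended error code field (bits 21:16); decode_status() and STATUS_FLAGS are hypothetical names, and this is not how rasdaemon itself is implemented.

```python
# Sketch only: bit positions are assumed from the kernel's MCI_STATUS_*
# definitions for AMD SMCA systems, not taken from rasdaemon's source.
STATUS_FLAGS = {
    63: "Val",
    62: "Overflow",
    61: "UC",
    60: "En",
    59: "MiscV",
    58: "AddrV",
    57: "PCC",
    55: "TCC",
    53: "SyndV",
    46: "CECC",
    45: "UECC",
    44: "Deferred",
    43: "Poison",
}


def decode_status(status: int) -> dict:
    """Split a 64-bit MCA_STATUS value into flags and error codes."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        # SMCA extended error code: bits 21:16
        "ext_err_code": (status >> 16) & 0x3F,
        # Architectural error code: bits 15:0
        "err_code": status & 0xFFFF,
    }


info = decode_status(0xdc2040000000011b)
print(info["flags"])
print("Ext Err Code:", info["ext_err_code"])
print("Error code: 0x%04x" % info["err_code"])
```

Under these assumptions the Overflow and CECC bits are set and the extended error code is 0, which lines up with "Error_overflow CECC" and "Ext Err Code: 0" in the rasdaemon output.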
We can also pass a particular family/model to decode, e.g. for MI300A:
$ rasdaemon -p --status 0xdc2040000000011b --ipid 0x0000609600092f00 --smca --family 0x19 --model 0x90 --bank 19
2023-10-26 23:52:09 -0500, Unified Memory Controller (bank=19), mca:
DRAM On Die ECC error. Ext Err Code: 0 Memory Error 'mem-tx: generic
read, tx: generic, level: L3/generic', mci: Error_overflow CECC, Locn:
memory_die_id=1, Error Msg: Corrected error, no action required.
Here the decoded error string is "mca: DRAM On Die ECC error. Ext Err Code: 0".
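The --ipid value is what selects the bank type in both runs. As a hedged sketch, assuming the MCA_IPID layout used by the kernel's SMCA code (HardwareID in bits 43:32, McaType in bits 63:48, InstanceId in bits 31:0) and the UMC channel derivation from amd64_edac (InstanceId >> 20), the fields can be pulled apart as below. decode_ipid() is a hypothetical helper, not a rasdaemon function.

```python
# Sketch only: field layout assumed from the kernel's MCI_IPID_HWID and
# MCI_IPID_MCATYPE masks; decode_ipid() is a hypothetical helper.
def decode_ipid(ipid: int) -> dict:
    """Split a 64-bit MCA_IPID value into its identification fields."""
    high = ipid >> 32
    return {
        "hwid": high & 0xFFF,               # bits 43:32
        "mca_type": (high >> 16) & 0xFFFF,  # bits 63:48
        "instance_id": ipid & 0xFFFFFFFF,   # bits 31:0
    }


fields = decode_ipid(0x0000609600092f00)
# Assumed UMC channel derivation (as in the kernel's find_umc_channel()).
umc_channel = fields["instance_id"] >> 20

print("HWID: 0x%x, McaType: 0x%x" % (fields["hwid"], fields["mca_type"]))
print("InstanceId: 0x%08x, channel: %d" % (fields["instance_id"], umc_channel))
```

With these assumptions the value decodes to HWID 0x96 with McaType 0, which the kernel's SMCA table maps to the Unified Memory Controller, and to channel 0, consistent with "memory_channel=0" in the first run.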
Thanks for the input. I will add these steps to the commit message and to
the documentation as well.
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>