Re: [PATCH 1/4] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
To: Borislav Petkov <bp@alien8.de>
Cc: <tony.luck@intel.com>, <hpa@zytor.com>, <mingo@redhat.com>,
	<tglx@linutronix.de>, <dougthompson@xmission.com>,
	<mchehab@osg.samsung.com>, <x86@kernel.org>,
	<linux-edac@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<ashok.raj@intel.com>, <gong.chen@linux.intel.com>,
	<len.brown@intel.com>, <peterz@infradead.org>,
	<ak@linux.intel.com>, <alexander.shishkin@linux.intel.com>
Subject: Re: [PATCH 1/4] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors
Date: Tue, 23 Feb 2016 16:50:37 -0600	[thread overview]
Message-ID: <56CCE23D.6080802@amd.com> (raw)
In-Reply-To: <20160223123744.GC3673@pd.tnic>



On 2/23/16 6:37 AM, Borislav Petkov wrote:
> On Tue, Feb 16, 2016 at 03:45:08PM -0600, Aravind Gopalakrishnan wrote:
>>   /* AMD-specific bits */
>>   #define MCI_STATUS_DEFERRED	(1ULL<<44)  /* declare an uncorrected error */
>>   #define MCI_STATUS_POISON	(1ULL<<43)  /* access poisonous data */
>> +#define MCI_STATUS_TCC		(1ULL<<55)  /* Task context corrupt */
> \n
>

Ack.

>> +/*
>> + * McaX field if set indicates a given bank supports MCA extensions:
>> + *  - Deferred error interrupt type is specifiable by bank
>> + *  - BlkPtr field indicates prescence of extended MISC registers.
> 				^^^^^^^^^
>
> Btw, that's MCi_MISC[BlkPtr] ?

MCi_MISC0[BlkPtr] specifically. Will update the comments about this.

> Also, please integrate a spell checker into your patch creation
> workflow.

Sorry about that. Looks like this pair is not defined in spelling.txt. 
So, might be worth adding there as well?

>> + *    But should not be used to determine MSR numbers
>> + *  - TCC bit is present in MCx_STATUS
> All sentences end with a "."

Will fix.

>
>> +/*
>> + * Enumerating new IP types and HWID values
> Please stop using gerund, i.e., -ing, in comments and commit messages.
>
> "Enumerate new IP ..." is just fine.

Ack.

>
>> + * in ScalableMCA enabled AMD processors
>> + */
>> +#ifdef CONFIG_X86_MCE_AMD
>> +enum ip_types {
> AMD-specific so "amd_ip_types"

Ok, will fix.

>
>> +	F17H_CORE = 0,	/* Core errors */
>> +	DF,		/* Data Fabric */
>> +	UMC,		/* Unified Memory Controller */
>> +	FUSE,		/* FUSE subsystem */
> What's FUSE subsystem?

It's the block for programming FUSE registers.

>
> In any case, this could use a different name in order not to confuse
> with Linux's filesystem in userspace.

Ok, will ask internally as well as to what name suits here.

>> +
>> +struct hwid {
> amd_hwid and so on. All below should have the "amd_" prefix so that it
> is obvious.

Will fix.

>
>> +	const char *ipname;
>> +	unsigned int hwid_value;
>> +};
>> +
>> +extern struct hwid hwid_mappings[N_IP_TYPES];
>> +
>> +enum core_mcatypes {
>> +	LS = 0,		/* Load Store */
>> +	IF,		/* Instruction Fetch */
>> +	L2_CACHE,	/* L2 cache */
>> +	DE,		/* Decoder unit */
>> +	RES,		/* Reserved */
>> +	EX,		/* Execution unit */
>> +	FP,		/* Floating Point */
>> +	L3_CACHE	/* L3 cache */
>> +};
>> +
>> +enum df_mcatypes {
>> +	CS = 0,		/* Coherent Slave */
>> +	PIE		/* Power management, Interrupts, etc */
>> +};
>> +#endif
> So all those are defined here but we have a header for exactly that
> drivers/edac/mce_amd.h. And then you define and export hwid_mappings in
> arch/x86/kernel/cpu/mcheck/mce_amd.c to not use it there.
>
> Why isn't all this in drivers/edac/mce_amd.[ch] ?
>
> And if it is there, then you obviously don't need the "amd_" prefix.

I have a patch that uses these enums here. But I didn't send it out 
along with this patchset as I was testing the patch.
So yes, I need it here and in the EDAC driver.

>
>> +
>>   #endif /* _ASM_X86_MCE_H */
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index 5523465..93bccbc 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -266,7 +266,9 @@
>>   
>>   /* 'SMCA': AMD64 Scalable MCA */
>>   #define MSR_AMD64_SMCA_MC0_CONFIG	0xc0002004
>> +#define MSR_AMD64_SMCA_MC0_IPID		0xc0002005
>>   #define MSR_AMD64_SMCA_MCx_CONFIG(x)	(MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
>> +#define MSR_AMD64_SMCA_MCx_IPID(x)	(MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
> Are those MSRs used in multiple files? If not, -> mce.h.

Yes, I'll need them in arch/x86/.../mce_amd.c as well.
A later patch will be using it there.

>>   
>>   
>> +/* Defining HWID to IP type mappings for Scalable MCA */
> " Define ..."

Ack

>
>> +	case L3_CACHE:
>> +		if (xec > (ARRAY_SIZE(f17h_l3_mce_desc) - 1))
>> +			goto wrong_f17hcore_error;
>> +
>> +		pr_cont("%s.\n", f17h_l3_mce_desc[xec]);
>> +		break;
>> +
>> +	default:
>> +		goto wrong_f17hcore_error;
> That's a lot of repeated code. You can assign the desc array to a temp
> variable depending on mca_type and do the if and pr_cont only once using
> that temp variable.

Ok, will simplify.

>
>> +
>> +	case PIE:
>> +		if (xec > (ARRAY_SIZE(f17h_pie_mce_desc) - 1))
>> +			goto wrong_df_error;
>> +
>> +		pr_cont("%s.\n", f17h_pie_mce_desc[xec]);
>> +		break;
> Ditto.
>

Will fix.

>> +
>> +/* Decode errors according to Scalable MCA specification */
>> +static void decode_smca_errors(struct mce *m)
>> +{
>> +	u32 low, high;
>> +	u32 addr = MSR_AMD64_SMCA_MCx_IPID(m->bank);
>> +	unsigned int hwid, mca_type, i;
>> +	u8 xec = XEC(m->status, xec_mask);
>> +
>> +	if (rdmsr_safe(addr, &low, &high)) {
>> +		pr_emerg("Unable to decode errors from banks\n");
> That statement is not very useful in its current form.

How about "Unable to gather IP block that threw the error. Therefore 
cannot decode errors further.\n"

Or should I just remove the pr_emerg()?

>
>> +	for (i = 0; i < ARRAY_SIZE(hwid_mappings); i++)
> So you use those hwid_mappings here. Why aren't they defined here too?

Same reason as the enums-
In a subsequent patch, I'll need it in arch/x86/../mce_amd.c
(I should have probably indicated this in the cover letter so you were 
aware of it. Sorry about that)

>
>> +	case SMU:
>> +		pr_emerg(HW_ERR "SMU Error: ");
>> +		if (xec > (ARRAY_SIZE(f17h_smu_mce_desc) - 1)) {
>> +			pr_cont("Unrecognized error code from SMU MCA bank\n");
>> +			return;
>> +		}
>> +		pr_cont("%s.\n", f17h_smu_mce_desc[xec]);
>> +		break;
> Also a lot of repeated code which could be simplified.

Hmm, will try to simplify it in V2.

>
>>   
>> +	if (mce_flags.smca) {
> ERROR: "mce_flags" [drivers/edac/edac_mce_amd.ko] undefined!
> make[1]: *** [__modpost] Error 1
> make: *** [modules] Error 2
> make: *** Waiting for unfinished jobs....
>
> Just read CPUID again here instead of exporting mce_flags.

Right. Will fix this.

>
>> +		u32 smca_low, smca_high;
>> +		u32 smca_addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
> s/smca_//

Will fix.

>
> Also, all those MSRs should be defined in drivers/edac/mce_amd.h if not
> used outside.

It will be used in arch/x86/../mce_amd.c

>
>> +
>> +		if (!rdmsr_safe(smca_addr, &smca_low, &smca_high) &&
>> +		    (smca_low & MCI_CONFIG_MCAX))
>> +			pr_cont("|%s",
> No need for that break here.

Ok, Will fix.

>
>>   
>> +	case 0x17:
>> +		xec_mask = 0x3f;
>> +		if (!mce_flags.smca) {
>> +			printk(KERN_WARNING "Decoding supported only on Scalable MCA enabled processors\n");
> What is that supposed to do? I thought all new ones will have SMCA...
>
>
If for some reason the CPUID bit is not set, then we should not assume 
the processor supports the features right?

Thanks,
-Aravind.

next prev parent reply	other threads:[~2016-02-23 23:22 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-16 21:45 [PATCH 0/4] Updates to EDAC and AMD MCE driver Aravind Gopalakrishnan
2016-02-16 21:45 ` [PATCH 1/4] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors Aravind Gopalakrishnan
2016-02-23 12:37   ` Borislav Petkov
2016-02-23 22:50     ` Aravind Gopalakrishnan [this message]
2016-02-24 11:28       ` Borislav Petkov
2016-02-24 17:57         ` Aravind Gopalakrishnan
2016-02-16 21:45 ` [PATCH 2/4] x86/mce/AMD: Fix logic to obtain block address Aravind Gopalakrishnan
2016-02-18 15:38   ` Aravind Gopalakrishnan
2016-02-23 12:39   ` Borislav Petkov
2016-02-23 22:56     ` Aravind Gopalakrishnan
2016-02-24 11:33       ` Borislav Petkov
2016-02-24 18:02         ` Aravind Gopalakrishnan
2016-02-24 20:15           ` Boris Petkov
2016-02-16 21:45 ` [PATCH 3/4] x86/mce: Clarify comments regarding deferred error Aravind Gopalakrishnan
2016-02-23 12:11   ` Borislav Petkov
2016-02-23 23:02     ` Aravind Gopalakrishnan
2016-02-24 11:37       ` Borislav Petkov
2016-02-24 18:06         ` Aravind Gopalakrishnan
2016-02-24 20:13           ` Boris Petkov
2016-02-16 21:45 ` [PATCH 4/4] x86/mce/AMD: Add comments for easier understanding Aravind Gopalakrishnan
2016-02-23 12:35   ` Borislav Petkov
2016-02-24 18:26     ` Aravind Gopalakrishnan
2016-02-26 17:44       ` Borislav Petkov
2016-02-26 19:08         ` Aravind Gopalakrishnan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56CCE23D.6080802@amd.com \
    --to=aravind.gopalakrishnan@amd.com \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=ashok.raj@intel.com \
    --cc=bp@alien8.de \
    --cc=dougthompson@xmission.com \
    --cc=gong.chen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=len.brown@intel.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab@osg.samsung.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).