All of lore.kernel.org
 help / color / mirror / Atom feed
From: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
To: Borislav Petkov <bp@alien8.de>
Cc: <tony.luck@intel.com>, <hpa@zytor.com>, <mingo@redhat.com>,
	<tglx@linutronix.de>, <dougthompson@xmission.com>,
	<mchehab@osg.samsung.com>, <x86@kernel.org>,
	<linux-edac@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<ashok.raj@intel.com>, <gong.chen@linux.intel.com>,
	<len.brown@intel.com>, <peterz@infradead.org>,
	<ak@linux.intel.com>, <alexander.shishkin@linux.intel.com>
Subject: Re: [PATCH 1/4] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors
Date: Tue, 23 Feb 2016 16:50:37 -0600	[thread overview]
Message-ID: <56CCE23D.6080802@amd.com> (raw)
In-Reply-To: <20160223123744.GC3673@pd.tnic>



On 2/23/16 6:37 AM, Borislav Petkov wrote:
> On Tue, Feb 16, 2016 at 03:45:08PM -0600, Aravind Gopalakrishnan wrote:
>>   /* AMD-specific bits */
>>   #define MCI_STATUS_DEFERRED	(1ULL<<44)  /* declare an uncorrected error */
>>   #define MCI_STATUS_POISON	(1ULL<<43)  /* access poisonous data */
>> +#define MCI_STATUS_TCC		(1ULL<<55)  /* Task context corrupt */
> \n
>

Ack.

>> +/*
>> + * McaX field if set indicates a given bank supports MCA extensions:
>> + *  - Deferred error interrupt type is specifiable by bank
>> + *  - BlkPtr field indicates prescence of extended MISC registers.
> 				^^^^^^^^^
>
> Btw, that's MCi_MISC[BlkPtr] ?

MCi_MISC0[BlkPtr] specifically. Will update the comments about this.

> Also, please integrate a spell checker into your patch creation
> workflow.

Sorry about that. Looks like this pair is not defined in spelling.txt. 
So, might be worth adding there as well?

>> + *    But should not be used to determine MSR numbers
>> + *  - TCC bit is present in MCx_STATUS
> All sentences end with a "."

Will fix.

>
>> +/*
>> + * Enumerating new IP types and HWID values
> Please stop using gerund, i.e., -ing, in comments and commit messages.
>
> "Enumerate new IP ..." is just fine.

Ack.

>
>> + * in ScalableMCA enabled AMD processors
>> + */
>> +#ifdef CONFIG_X86_MCE_AMD
>> +enum ip_types {
> AMD-specific so "amd_ip_types"

Ok, will fix.

>
>> +	F17H_CORE = 0,	/* Core errors */
>> +	DF,		/* Data Fabric */
>> +	UMC,		/* Unified Memory Controller */
>> +	FUSE,		/* FUSE subsystem */
> What's FUSE subsystem?

It's the block for programming FUSE registers.

>
> In any case, this could use a different name in order not to confuse
> with Linux's filesystem in userspace.

Ok, will ask internally as well as to what name suits here.

>> +
>> +struct hwid {
> amd_hwid and so on. All below should have the "amd_" prefix so that it
> is obvious.

Will fix.

>
>> +	const char *ipname;
>> +	unsigned int hwid_value;
>> +};
>> +
>> +extern struct hwid hwid_mappings[N_IP_TYPES];
>> +
>> +enum core_mcatypes {
>> +	LS = 0,		/* Load Store */
>> +	IF,		/* Instruction Fetch */
>> +	L2_CACHE,	/* L2 cache */
>> +	DE,		/* Decoder unit */
>> +	RES,		/* Reserved */
>> +	EX,		/* Execution unit */
>> +	FP,		/* Floating Point */
>> +	L3_CACHE	/* L3 cache */
>> +};
>> +
>> +enum df_mcatypes {
>> +	CS = 0,		/* Coherent Slave */
>> +	PIE		/* Power management, Interrupts, etc */
>> +};
>> +#endif
> So all those are defined here but we have a header for exactly that
> drivers/edac/mce_amd.h. And then you define and export hwid_mappings in
> arch/x86/kernel/cpu/mcheck/mce_amd.c to not use it there.
>
> Why isn't all this in drivers/edac/mce_amd.[ch] ?
>
> And if it is there, then you obviously don't need the "amd_" prefix.

I have a patch that uses these enums here. But I didn't send it out 
along with this patchset as I was testing the patch.
So yes, I need it here and in the EDAC driver.

>
>> +
>>   #endif /* _ASM_X86_MCE_H */
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index 5523465..93bccbc 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -266,7 +266,9 @@
>>   
>>   /* 'SMCA': AMD64 Scalable MCA */
>>   #define MSR_AMD64_SMCA_MC0_CONFIG	0xc0002004
>> +#define MSR_AMD64_SMCA_MC0_IPID		0xc0002005
>>   #define MSR_AMD64_SMCA_MCx_CONFIG(x)	(MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
>> +#define MSR_AMD64_SMCA_MCx_IPID(x)	(MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
> Are those MSRs used in multiple files? If not, -> mce.h.

Yes, I'll need them in arch/x86/.../mce_amd.c as well.
A later patch will be using it there.

>>   
>>   
>> +/* Defining HWID to IP type mappings for Scalable MCA */
> " Define ..."

Ack

>
>> +	case L3_CACHE:
>> +		if (xec > (ARRAY_SIZE(f17h_l3_mce_desc) - 1))
>> +			goto wrong_f17hcore_error;
>> +
>> +		pr_cont("%s.\n", f17h_l3_mce_desc[xec]);
>> +		break;
>> +
>> +	default:
>> +		goto wrong_f17hcore_error;
> That's a lot of repeated code. You can assign the desc array to a temp
> variable depending on mca_type and do the if and pr_cont only once using
> that temp variable.

Ok, will simplify.

>
>> +
>> +	case PIE:
>> +		if (xec > (ARRAY_SIZE(f17h_pie_mce_desc) - 1))
>> +			goto wrong_df_error;
>> +
>> +		pr_cont("%s.\n", f17h_pie_mce_desc[xec]);
>> +		break;
> Ditto.
>

Will fix.

>> +
>> +/* Decode errors according to Scalable MCA specification */
>> +static void decode_smca_errors(struct mce *m)
>> +{
>> +	u32 low, high;
>> +	u32 addr = MSR_AMD64_SMCA_MCx_IPID(m->bank);
>> +	unsigned int hwid, mca_type, i;
>> +	u8 xec = XEC(m->status, xec_mask);
>> +
>> +	if (rdmsr_safe(addr, &low, &high)) {
>> +		pr_emerg("Unable to decode errors from banks\n");
> That statement is not very useful in its current form.

How about "Unable to gather IP block that threw the error. Therefore 
cannot decode errors further.\n"

Or should I just remove the pr_emerg()?

>
>> +	for (i = 0; i < ARRAY_SIZE(hwid_mappings); i++)
> So you use those hwid_mappings here. Why aren't they defined here too?

Same reason as the enums-
In a subsequent patch, I'll need it in arch/x86/../mce_amd.c
(I should have probably indicated this in the cover letter so you were 
aware of it. Sorry about that)

>
>> +	case SMU:
>> +		pr_emerg(HW_ERR "SMU Error: ");
>> +		if (xec > (ARRAY_SIZE(f17h_smu_mce_desc) - 1)) {
>> +			pr_cont("Unrecognized error code from SMU MCA bank\n");
>> +			return;
>> +		}
>> +		pr_cont("%s.\n", f17h_smu_mce_desc[xec]);
>> +		break;
> Also a lot of repeated code which could be simplified.

Hmm, will try to simplify it in V2.

>
>>   
>> +	if (mce_flags.smca) {
> ERROR: "mce_flags" [drivers/edac/edac_mce_amd.ko] undefined!
> make[1]: *** [__modpost] Error 1
> make: *** [modules] Error 2
> make: *** Waiting for unfinished jobs....
>
> Just read CPUID again here instead of exporting mce_flags.

Right. Will fix this.

>
>> +		u32 smca_low, smca_high;
>> +		u32 smca_addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
> s/smca_//

Will fix.

>
> Also, all those MSRs should be defined in drivers/edac/mce_amd.h if not
> used outside.

It will be used in arch/x86/../mce_amd.c

>
>> +
>> +		if (!rdmsr_safe(smca_addr, &smca_low, &smca_high) &&
>> +		    (smca_low & MCI_CONFIG_MCAX))
>> +			pr_cont("|%s",
> No need for that break here.

Ok, Will fix.

>
>>   
>> +	case 0x17:
>> +		xec_mask = 0x3f;
>> +		if (!mce_flags.smca) {
>> +			printk(KERN_WARNING "Decoding supported only on Scalable MCA enabled processors\n");
> What is that supposed to do? I thought all new ones will have SMCA...
>
>
If for some reason the CPUID bit is not set, then we should not assume 
the processor supports the features right?

Thanks,
-Aravind.

  reply	other threads:[~2016-02-23 23:22 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-16 21:45 [PATCH 0/4] Updates to EDAC and AMD MCE driver Aravind Gopalakrishnan
2016-02-16 21:45 ` [PATCH 1/4] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors Aravind Gopalakrishnan
2016-02-23 12:37   ` Borislav Petkov
2016-02-23 22:50     ` Aravind Gopalakrishnan [this message]
2016-02-24 11:28       ` Borislav Petkov
2016-02-24 17:57         ` Aravind Gopalakrishnan
2016-02-16 21:45 ` [PATCH 2/4] x86/mce/AMD: Fix logic to obtain block address Aravind Gopalakrishnan
2016-02-18 15:38   ` Aravind Gopalakrishnan
2016-02-23 12:39   ` Borislav Petkov
2016-02-23 22:56     ` Aravind Gopalakrishnan
2016-02-24 11:33       ` Borislav Petkov
2016-02-24 18:02         ` Aravind Gopalakrishnan
2016-02-24 20:15           ` Boris Petkov
2016-02-16 21:45 ` [PATCH 3/4] x86/mce: Clarify comments regarding deferred error Aravind Gopalakrishnan
2016-02-23 12:11   ` Borislav Petkov
2016-02-23 23:02     ` Aravind Gopalakrishnan
2016-02-24 11:37       ` Borislav Petkov
2016-02-24 18:06         ` Aravind Gopalakrishnan
2016-02-24 20:13           ` Boris Petkov
2016-02-16 21:45 ` [PATCH 4/4] x86/mce/AMD: Add comments for easier understanding Aravind Gopalakrishnan
2016-02-23 12:35   ` Borislav Petkov
2016-02-24 18:26     ` Aravind Gopalakrishnan
2016-02-26 17:44       ` Borislav Petkov
2016-02-26 19:08         ` Aravind Gopalakrishnan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56CCE23D.6080802@amd.com \
    --to=aravind.gopalakrishnan@amd.com \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=ashok.raj@intel.com \
    --cc=bp@alien8.de \
    --cc=dougthompson@xmission.com \
    --cc=gong.chen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=len.brown@intel.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab@osg.samsung.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.