From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752027AbcBWXWi (ORCPT ); Tue, 23 Feb 2016 18:22:38 -0500 Received: from mail-bn1bon0083.outbound.protection.outlook.com ([157.56.111.83]:46880 "EHLO na01-bn1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750920AbcBWXWg (ORCPT ); Tue, 23 Feb 2016 18:22:36 -0500 Authentication-Results: linux.intel.com; dkim=none (message not signed) header.d=none;linux.intel.com; dmarc=none action=none header.from=amd.com; From: Aravind Gopalakrishnan Subject: Re: [PATCH 1/4] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors To: Borislav Petkov References: <1455659111-32074-1-git-send-email-Aravind.Gopalakrishnan@amd.com> <1455659111-32074-2-git-send-email-Aravind.Gopalakrishnan@amd.com> <20160223123744.GC3673@pd.tnic> CC: , , , , , , , , , , , , , , Message-ID: <56CCE23D.6080802@amd.com> Date: Tue, 23 Feb 2016 16:50:37 -0600 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <20160223123744.GC3673@pd.tnic> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [165.204.77.1] X-ClientProxiedBy: SN1PR17CA0017.namprd17.prod.outlook.com (25.169.33.155) To SN1PR12MB0431.namprd12.prod.outlook.com (25.162.105.12) X-MS-Office365-Filtering-Correlation-Id: 36f5ed16-53d8-4db6-2c64-08d33ca3c252 X-Microsoft-Exchange-Diagnostics: 1;SN1PR12MB0431;2:RtSc1W2WheEWWTo0Y3HNlD5caxwDqtstRv8NgcHDIJMS/0ifhHd3F0SMrnt+zCxoA3G+B3G3DLtWAogW1B5Bnz8wmtL6kapQrmMvAbRMfSpV0GUU0avwlVRlh9YdZ6vGPqzJ6vofxc1qTUSVLjieAXQYdFq+cRvMdXjvL9n1d1ojp1YU1C7kPSssAznG8VPN;3:zXvnwW80sfrhoYV7uRBBFsLR7C9h7qhDkBpQIsJJewZR+/am6qoHLZzJg2ejwNxFaf+xCACtwmC9lmmkpXQS9bEGnVpOLApY0tH64SXYa/e6+62YuGMYzMTAfi+YEE4l;25:JnIEvXvlvzPsVXJIy4UT/jsBi2GAz/ix57Z/SLL3drKh/itG5JHI2+iJUqq/Ch9mK9zI+1/qcM0aAxcesOlL+GRv/53f0r/C7TMtdq79PuVgbCppydb2EbIMXYN2s6M4DxEzV8FIKy5UwEozlmBzQNJSuowlFzF7NheJFB66IDrIGYVQTPbI1YVC78L53kWLCI1T0rkF0U28chmAevxji2fqzokGblgaiAvNDFsalHDY7zl/sNVmq+rIbg97QdyDtf4RxCg8bF99ql5VKWc9Yi53+F2oQeKsYK/48886RYUid5nZMf8hn7DFUxZfekXSNAu3zH6uPWztM3ENAgqz6FxkbQjjsBIPDPHryojvSks= X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:SN1PR12MB0431; X-Microsoft-Exchange-Diagnostics: 1;SN1PR12MB0431;20:Yjtgo48ANafMBQYsz1MLZVYzD+odfqX+xR3MSmgML7culkdPTQuwoFouKeD249nQV0RUHVWW5EY0MgJIu9EDnjoTIPkiW6zfesCqtcWO8QIizlrQ34xTRd0I53vt0TIBqP34wCy5spYbPQpVXKXY5KMkjuTkuZPResfx9Pd3mftpB1zPB/GMAMO1Olu7qNrAz8SoUlA5foc3NfZaEpdok7oSSK4w9tfNO+c93t0URgmlovDwirYHxcjPmULPZ0iZY6FwCxw7ngaFRV1CoW1nm4+p5j5B2g2cICFFnrMOCHCGWQ0IwlBz3vJdwPUUTN77nOzfiZeu/HrthntB8CRiN3jkqiCV/0dfyibRRXLcb2P7i9j4cc1weCr1jZJRz6Gby+qeTOICGalgJb6Ir77+vOQHaQSgCfB6MVL63bIr8cjhUrdbDkvoRhSQpGKyJTnM9lOEraVSPqHzCQ4BuVaW+IHmFu5bfHw+F9ym4eGiKdIxFojQ3tETHrC7y5AfqVRI;4:JGgAzI6dEYAUws8d4yy4rVqxvVUGB70Oqt921bHMMEYLzi+OG820zEFzvDBctQzrtZQbHfxIRfs74txqztMSSClZWtT05opOIgt4RNLoOjqXtKXBaalfzfaQhyLM2bCKDxMsI6xchWggroelEtH2eY7W+yFMsNKLPULCkroaMitF8iz3g6y6zHDKS3WS+XpPyEA55uyYjGpdSmn5WUgk6oJCIeggD9yusdtWxaiuBEF/ItznoAyDzXhZfO0TvLVKTH9akICsjenO7aREZiOkdSPNMvMEAbJmTDanSg+BFqEGT8h5u68OxeaO/kQpmIq5g2I+hVQZUbfLzk/PggI19dogcTochCBUcBcv1bsVD3eohbF1CdLeeiYPj/IgkhBC X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:; X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(601004)(2401047)(8121501046)(5005006)(3002001)(10201501046);SRVR:SN1PR12MB0431;BCL:0;PCL:0;RULEID:;SRVR:SN1PR12MB0431; X-Forefront-PRVS: 08617F610C X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10009020)(4630300001)(6009001)(377454003)(24454002)(164054003)(479174004)(92566002)(6116002)(83506001)(50466002)(2950100001)(77096005)(19580395003)(80316001)(40100003)(64126003)(33656002)(4001350100001)(122386002)(65816999)(189998001)(76176999)(87266999)(54356999)(87976001)(1096002)(42186005)(50986999)(23676002)(3846002)(53416004)(4326007)(47776003)(5004730100002)(2906002)(5008740100001)(586003)(66066001)(36756003)(230700001)(65956001)(86362001);DIR:OUT;SFP:1101;SCL:1;SRVR:SN1PR12MB0431;H:wsp100563wss.amd.com;FPR:;SPF:None;MLV:sfv;LANG:en; X-Microsoft-Exchange-Diagnostics: =?utf-8?B?MTtTTjFQUjEyTUIwNDMxOzIzOnlvZWpNRUhjVDNMamtTaWMrdnVYQUxMVjBG?= =?utf-8?B?YTl5RHg5MUxQbEl1QXAyK1ZTeHVZVUJOZVloVEIrdUlJR2FYR3hCK0pOMnUw?= =?utf-8?B?cndLL0NjSHl2dkpVbnVQN1N1MUQ1a3ZxYXNHSnV5TkdDTVlBaTRNRGNGVzBO?= =?utf-8?B?NElpYllBU3FGMEtsaEUxYUw4T0tZV0ExYlJCNGNNZTBMNUoyVjlRS21wNHZZ?= =?utf-8?B?eVJzNy9tdUxtSDVtWG9IMklFVkpmckRRTlU5ODByK3lCb3VzVW9MMTNiVi9u?= =?utf-8?B?Z3o2Q3NDa2RVZ1o1Tjh4S1JlMVdGN0lqZUprdGFjSWlCYVBrSlNHS2FYTytY?= =?utf-8?B?cnNZb1dXSEhyTVcyVGM0QURXcG1zdWE2NG8xUjBZMFpPNVN2QllnTXU0VFV4?= =?utf-8?B?WUtPY1JpR1JsM3BsMHlmZWE0djR1R25yZzNTSTN3UC9jS1E0RCtPMFZXZG1m?= =?utf-8?B?WWlLc1hZN0JDRXFUUjMyUFhoZHlwcm54UnhzOGF1bjlpNktQLzMzUER0aGRv?= =?utf-8?B?SjB6NkQ0d1VuSWM4eGt0VHJxdXZMclpZN2lxVkozUHJoYnFhc29UT1ZpYmVQ?= =?utf-8?B?UFJzeU56dXYrb1pScUV0ZHloaU1RSi94clZZV09IUEtjZno0MHRoV2RuRm84?= =?utf-8?B?ZjdIN1hid2RqbjJqdU5GTnZWNjBCSnVDS1ZSZ0IvSitVSmgvVm1UTFlOVmhF?= =?utf-8?B?QktSRkhQQWI4YkdLYmNNL3ovMGZzdUtzTFJjcUZxUCtIZDNKcDBnYnU4SWNi?= =?utf-8?B?Y2RrTWNFWU10aDFOQklxdUdpM0tsVDdPUnZ4MFlPNWJTb05oZVdSNHowVktu?= =?utf-8?B?Z012cWVPOXFuNitGTHdpSEp3bW1pcDBOeGVTektuSTBnTWRKaENLZ1JkNllP?= =?utf-8?B?QjBhTVdOaVhuMEZLZjZQSHVMMDdpNTlxRHhIbEhtalVCOEF3alVaSW1iUXUx?= =?utf-8?B?amhZNlkxWWdCdHJML0Y3VTlGS01XTGNqVU5GUVl4cW5zWUtZQk1ibDdNcndm?= =?utf-8?B?NkRlNjk1YkE4VmRRZlBSZmdPU0lJWE9LdGg3dVJBQ3VZUHpyMjRCc3paR2pG?= =?utf-8?B?TEY1eWhZRDZ4Q1VqVFl5dUFURTZ6WkVYb3luT0FKT2l3MS9tRUVObmFPOUJR?= =?utf-8?B?cjc1ZmU4a0ZMYWlWdkFBSlJxM2w1bFBucDhVbzJWZjdTSnY2eGdIa2JGemhU?= =?utf-8?B?aml4elpBWEJZQ1BCKy9GYjlWRlQ5SEJDdHRnZDVpZjVXNC8yRDV5Znd1UC9z?= =?utf-8?B?NWlHVEhZMU9zK0FQYXhWL3paNDNwL3Q1bnZ4eENwazJwVE51Rk8wT2dNTmJq?= =?utf-8?B?NW5EbWttNytGejBwWFI3WXJJd3FEaHNiLzBidUpWN0swUXJWcUhsSU53MHBq?= =?utf-8?B?c2dqYUg3VTFPSE12VHlzUDJ3YVcyYnNJeDArOFFnTjlqUE9NNFF3a3UzVWFr?= =?utf-8?B?eXNyL0JSbzQ4UkNYMHdKK2VRSFJmUWh0emVqSFo3alRjNEloMml5NzJVcG16?= =?utf-8?B?QzB2dz09?= X-Microsoft-Exchange-Diagnostics: 1;SN1PR12MB0431;5:XFnLsrMQdpY2oUPAZL0KzOK/uh+n6EpVmYZewDuIs2MoVRTX63PM+jzkuWXM55TI7wUXVD0sxM48K6rVIxYO3X0VH9+6dymtjirrHZBTOEUcfOIeKyxQ+bYNUdVNs5LDs4n5YBIQm1YKuDZLoCXTJg==;24:ya/DsrzCSXhREdOHlF6S8xon2npYkDnvMxKnIAa++qy9OIPMBPh9xyd1Cqz2rOArIeC0u0/b7y+O4gP7w/y8JLStYACGBcqO47gMw8p+RnA=;20:z//vR6aSOabOqczpbarqi9Z2kgAyRhOBPH1uyoiEqfqTEPfLLiK3J+dSwKXZQKPuDFruzuW4ZaY7gPaJ2N8uzW32CMwjusx9zA8BY7SnFjNnpwWPpyhuy3p44RUX8uks46g9WH+SH5ETkh+NWV+yDZmnCdHWLMitJwzgcu7ZhOzZ0587Ah/tnylatn+7JoiBuXelursA+mfqhTVIMAs24GbDaRAe/E5Ts6Aqji4gmmLeQM54UW7lOyl7FR0TdMMe SpamDiagnosticOutput: 1:23 SpamDiagnosticMetadata: NSPM X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 23 Feb 2016 22:50:41.6091 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-Transport-CrossTenantHeadersStamped: SN1PR12MB0431 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2/23/16 6:37 AM, Borislav Petkov wrote: > On Tue, Feb 16, 2016 at 03:45:08PM -0600, Aravind Gopalakrishnan wrote: >> /* AMD-specific bits */ >> #define MCI_STATUS_DEFERRED (1ULL<<44) /* declare an uncorrected error */ >> #define MCI_STATUS_POISON (1ULL<<43) /* access poisonous data */ >> +#define MCI_STATUS_TCC (1ULL<<55) /* Task context corrupt */ > \n > Ack. >> +/* >> + * McaX field if set indicates a given bank supports MCA extensions: >> + * - Deferred error interrupt type is specifiable by bank >> + * - BlkPtr field indicates prescence of extended MISC registers. > ^^^^^^^^^ > > Btw, that's MCi_MISC[BlkPtr] ? MCi_MISC0[BlkPtr] specifically. Will update the comments about this. > Also, please integrate a spell checker into your patch creation > workflow. Sorry about that. Looks like this pair is not defined in spelling.txt. So, might be worth adding there as well? >> + * But should not be used to determine MSR numbers >> + * - TCC bit is present in MCx_STATUS > All sentences end with a "." Will fix. > >> +/* >> + * Enumerating new IP types and HWID values > Please stop using gerund, i.e., -ing, in comments and commit messages. > > "Enumerate new IP ..." is just fine. Ack. > >> + * in ScalableMCA enabled AMD processors >> + */ >> +#ifdef CONFIG_X86_MCE_AMD >> +enum ip_types { > AMD-specific so "amd_ip_types" Ok, will fix. > >> + F17H_CORE = 0, /* Core errors */ >> + DF, /* Data Fabric */ >> + UMC, /* Unified Memory Controller */ >> + FUSE, /* FUSE subsystem */ > What's FUSE subsystem? It's the block for programming FUSE registers. > > In any case, this could use a different name in order not to confuse > with Linux's filesystem in userspace. Ok, will ask internally as well as to what name suits here. >> + >> +struct hwid { > amd_hwid and so on. All below should have the "amd_" prefix so that it > is obvious. Will fix. > >> + const char *ipname; >> + unsigned int hwid_value; >> +}; >> + >> +extern struct hwid hwid_mappings[N_IP_TYPES]; >> + >> +enum core_mcatypes { >> + LS = 0, /* Load Store */ >> + IF, /* Instruction Fetch */ >> + L2_CACHE, /* L2 cache */ >> + DE, /* Decoder unit */ >> + RES, /* Reserved */ >> + EX, /* Execution unit */ >> + FP, /* Floating Point */ >> + L3_CACHE /* L3 cache */ >> +}; >> + >> +enum df_mcatypes { >> + CS = 0, /* Coherent Slave */ >> + PIE /* Power management, Interrupts, etc */ >> +}; >> +#endif > So all those are defined here but we have a header for exactly that > drivers/edac/mce_amd.h. And then you define and export hwid_mappings in > arch/x86/kernel/cpu/mcheck/mce_amd.c to not use it there. > > Why isn't all this in drivers/edac/mce_amd.[ch] ? > > And if it is there, then you obviously don't need the "amd_" prefix. I have a patch that uses these enums here. But I didn't send it out along with this patchset as I was testing the patch. So yes, I need it here and in the EDAC driver. > >> + >> #endif /* _ASM_X86_MCE_H */ >> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h >> index 5523465..93bccbc 100644 >> --- a/arch/x86/include/asm/msr-index.h >> +++ b/arch/x86/include/asm/msr-index.h >> @@ -266,7 +266,9 @@ >> >> /* 'SMCA': AMD64 Scalable MCA */ >> #define MSR_AMD64_SMCA_MC0_CONFIG 0xc0002004 >> +#define MSR_AMD64_SMCA_MC0_IPID 0xc0002005 >> #define MSR_AMD64_SMCA_MCx_CONFIG(x) (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x)) >> +#define MSR_AMD64_SMCA_MCx_IPID(x) (MSR_AMD64_SMCA_MC0_IPID + 0x10*(x)) > Are those MSRs used in multiple files? If not, -> mce.h. Yes, I'll need them in arch/x86/.../mce_amd.c as well. A later patch will be using it there. >> >> >> +/* Defining HWID to IP type mappings for Scalable MCA */ > " Define ..." Ack > >> + case L3_CACHE: >> + if (xec > (ARRAY_SIZE(f17h_l3_mce_desc) - 1)) >> + goto wrong_f17hcore_error; >> + >> + pr_cont("%s.\n", f17h_l3_mce_desc[xec]); >> + break; >> + >> + default: >> + goto wrong_f17hcore_error; > That's a lot of repeated code. You can assign the desc array to a temp > variable depending on mca_type and do the if and pr_cont only once using > that temp variable. Ok, will simplify. > >> + >> + case PIE: >> + if (xec > (ARRAY_SIZE(f17h_pie_mce_desc) - 1)) >> + goto wrong_df_error; >> + >> + pr_cont("%s.\n", f17h_pie_mce_desc[xec]); >> + break; > Ditto. > Will fix. >> + >> +/* Decode errors according to Scalable MCA specification */ >> +static void decode_smca_errors(struct mce *m) >> +{ >> + u32 low, high; >> + u32 addr = MSR_AMD64_SMCA_MCx_IPID(m->bank); >> + unsigned int hwid, mca_type, i; >> + u8 xec = XEC(m->status, xec_mask); >> + >> + if (rdmsr_safe(addr, &low, &high)) { >> + pr_emerg("Unable to decode errors from banks\n"); > That statement is not very useful in its current form. How about "Unable to gather IP block that threw the error. Therefore cannot decode errors further.\n" Or should I just remove the pr_emerg()? > >> + for (i = 0; i < ARRAY_SIZE(hwid_mappings); i++) > So you use those hwid_mappings here. Why aren't they defined here too? Same reason as the enums- In a subsequent patch, I'll need it in arch/x86/../mce_amd.c (I should have probably indicated this in the cover letter so you were aware of it. Sorry about that) > >> + case SMU: >> + pr_emerg(HW_ERR "SMU Error: "); >> + if (xec > (ARRAY_SIZE(f17h_smu_mce_desc) - 1)) { >> + pr_cont("Unrecognized error code from SMU MCA bank\n"); >> + return; >> + } >> + pr_cont("%s.\n", f17h_smu_mce_desc[xec]); >> + break; > Also a lot of repeated code which could be simplified. Hmm, will try to simplify it in V2. > >> >> + if (mce_flags.smca) { > ERROR: "mce_flags" [drivers/edac/edac_mce_amd.ko] undefined! > make[1]: *** [__modpost] Error 1 > make: *** [modules] Error 2 > make: *** Waiting for unfinished jobs.... > > Just read CPUID again here instead of exporting mce_flags. Right. Will fix this. > >> + u32 smca_low, smca_high; >> + u32 smca_addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank); > s/smca_// Will fix. > > Also, all those MSRs should be defined in drivers/edac/mce_amd.h if not > used outside. It will be used in arch/x86/../mce_amd.c > >> + >> + if (!rdmsr_safe(smca_addr, &smca_low, &smca_high) && >> + (smca_low & MCI_CONFIG_MCAX)) >> + pr_cont("|%s", > No need for that break here. Ok, Will fix. > >> >> + case 0x17: >> + xec_mask = 0x3f; >> + if (!mce_flags.smca) { >> + printk(KERN_WARNING "Decoding supported only on Scalable MCA enabled processors\n"); > What is that supposed to do? I thought all new ones will have SMCA... > > If for some reason the CPUID bit is not set, then we should not assume the processor supports the features right? Thanks, -Aravind.