From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mauro Carvalho Chehab Subject: Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac Date: Tue, 18 Jul 2017 18:15:45 -0300 Message-ID: <20170718181545.32bd9181@vento.lan> References: <20170717215912.26070-1-toshi.kani@hpe.com> <20170717215912.26070-4-toshi.kani@hpe.com> <20170718060007.GB8736@nazgul.tnic> <1500407379.2042.21.camel@hpe.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT Return-path: Received: from ec2-52-27-115-49.us-west-2.compute.amazonaws.com ([52.27.115.49]:58740 "EHLO osg.samsung.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751500AbdGRVPz (ORCPT ); Tue, 18 Jul 2017 17:15:55 -0400 In-Reply-To: <1500407379.2042.21.camel@hpe.com> Sender: linux-acpi-owner@vger.kernel.org List-Id: linux-acpi@vger.kernel.org To: "Kani, Toshimitsu" Cc: "tony.luck@intel.com" , "bp@alien8.de" , "linux-kernel@vger.kernel.org" , "tglx@linutronix.de" , "mchehab@kernel.org" , "rjw@rjwysocki.net" , "srinivas.pandruvada@linux.intel.com" , "lenb@kernel.org" , "linux-acpi@vger.kernel.org" , "linux-edac@vger.kernel.org" Em Tue, 18 Jul 2017 19:58:54 +0000 "Kani, Toshimitsu" escreveu: > On Tue, 2017-07-18 at 08:00 +0200, Borislav Petkov wrote: > > On Mon, Jul 17, 2017 at 03:59:12PM -0600, Toshi Kani wrote: > > > The ghes_edac driver was introduced in 2013 [1], but it has not > > > been enabled by any distro yet.  This driver obtains error info > > > from firmware interfaces, which are not properly implemented on > > > many platforms, as the driver always emits the messages below: > > > > > >  This EDAC driver relies on BIOS to enumerate memory and get error > > > reports.  Unfortunately, not all BIOSes reflect the memory layout > > > correctly  So, the end result of using this driver varies from > > > vendor to vendor  If you find incorrect reports, please contact > > > your hardware vendor  to correct its BIOS. > > > > > > To get out from this situation, add a platform type check to > > > selectively enable the driver on the platforms that are known to > > > have proper firmware implementation.  Platform vendors can add > > > their platforms to the list when they support ghes_edac. > > > > So maintaining whitelists for things has always been a PITA and we > > should try to avoid it, if possible. (We can always do it if nothing > > saner comes along.) > > Agreed. > > > Now, below is a dirty patch converting ghes_edac to a normal module. > > On systems where we have GHES, the firmware generally disables the > > detection of the presence of ECC hardware, thus preventing the > > platform EDAC driver from loading. > > I have HPE Haswell and Skylake test systems with GHES, but they do not > hide IMCs from the OS. So, the sb_edac and skx_edac drivers get > attached on these systems when ghes_edac is disabled. > > > Let me clarify: I have an AMD HP box which, when GHES is enabled in > > the BIOS, says that ECC is disabled in the memory controller and the > > amd64_edac driver doesn't load for that memory controller. > > Hmm... what's the platform name of this box? I can look into this case > if you need. > > > And I think we should try this first: have the firmware disable > > detection methods so that the platform drivers don't load. > > I do not think we can rely on this method. > > > Then, ghes_edac can be a simple module and no other driver would > > attempt loading. > > I like the use of notifier chain, which is much cleaner. > > > The question is: does the platform do this disabling now? > > Unfortunately, that is not the case today. The IMCs cannot be hidden > with the Device Hide registers for Skylake at least. We had a similar discussion several years ago when I wrote this driver. On that time, I talked with Red Hat, HP, Dell, Intel people and with some customers with large clusters. The way it is, ghes_edac is a poor man's driver. What it hopefully provide is a detection that an error happened, without really telling the user what component should be replaced. Ok, on machines with their own error reporting mechanism (like HP servers), a sys admin can look on some proprietary software (or bios), in order to identify what happened. Yet, BIOS doesn't provide any glue about what's the memory architecture, as it maps memory as if it was a single DIMM memory: (from ghes_edac_register) layers[0].type = EDAC_MC_LAYER_ALL_MEM; layers[0].size = num_dimm; layers[0].is_virt_csrow = true; So, even on systems where the BIOS actually knows how the memory cards are wired, it will mask the memory controller data. Now, the EDAC driver can also be used to identify what channels are used. That helps the sys admin to know if the memories are connected in a way that it will be using multiple channels, or not, helping to setup the machine to obtain the maximum possible performance. So, for example, on my Intel-based HP server, I can check such info with: $ ras-mc-ctl --mainboard ras-mc-ctl: mainboard: HP model ProLiant ML350 Gen9 $ ras-mc-ctl --layout +-----------------------------------------------------------------------+ | mc0 | mc1 | | channel0 | channel1 | channel2 | channel0 | channel1 | channel2 | -------+-----------------------------------------------------------------------+ slot2: | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | slot1: | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | slot0: | 16384 MB | 0 MB | 16384 MB | 16384 MB | 0 MB | 16384 MB | -------+---------------------------------------------------------------------------+ So, I know that both CPUs will be connected to my memories, and, on both, it is using 2 channels. If I was using the ghes driver, that information would be hidden. So, due to all problems with ghes, it is enabled only if there are no better solution, e. g. on systems where there's no way to talk directly to the hardware (like on E7 Xeon machines, where the memory controller is actually on a separate chip that are controlled only by the BIOS). Thanks, Mauro From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Subject: [3/3] ghes_edac: add platform check to enable ghes_edac From: Mauro Carvalho Chehab Message-Id: <20170718181545.32bd9181@vento.lan> Date: Tue, 18 Jul 2017 18:15:45 -0300 To: "Kani, Toshimitsu" Cc: "tony.luck@intel.com" , "bp@alien8.de" , "linux-kernel@vger.kernel.org" , "tglx@linutronix.de" , "mchehab@kernel.org" , "rjw@rjwysocki.net" , "srinivas.pandruvada@linux.intel.com" , "lenb@kernel.org" , "linux-acpi@vger.kernel.org" , "linux-edac@vger.kernel.org" List-ID: RW0gVHVlLCAxOCBKdWwgMjAxNyAxOTo1ODo1NCArMDAwMAoiS2FuaSwgVG9zaGltaXRzdSIgPHRv c2hpLmthbmlAaHBlLmNvbT4gZXNjcmV2ZXU6Cgo+IE9uIFR1ZSwgMjAxNy0wNy0xOCBhdCAwODow MCArMDIwMCwgQm9yaXNsYXYgUGV0a292IHdyb3RlOgo+ID4gT24gTW9uLCBKdWwgMTcsIDIwMTcg YXQgMDM6NTk6MTJQTSAtMDYwMCwgVG9zaGkgS2FuaSB3cm90ZTogIAo+ID4gPiBUaGUgZ2hlc19l ZGFjIGRyaXZlciB3YXMgaW50cm9kdWNlZCBpbiAyMDEzIFsxXSwgYnV0IGl0IGhhcyBub3QKPiA+ ID4gYmVlbiBlbmFibGVkIGJ5IGFueSBkaXN0cm8geWV0LsKgwqBUaGlzIGRyaXZlciBvYnRhaW5z IGVycm9yIGluZm8KPiA+ID4gZnJvbSBmaXJtd2FyZSBpbnRlcmZhY2VzLCB3aGljaCBhcmUgbm90 IHByb3Blcmx5IGltcGxlbWVudGVkIG9uCj4gPiA+IG1hbnkgcGxhdGZvcm1zLCBhcyB0aGUgZHJp dmVyIGFsd2F5cyBlbWl0cyB0aGUgbWVzc2FnZXMgYmVsb3c6Cj4gPiA+IAo+ID4gPiDCoFRoaXMg RURBQyBkcml2ZXIgcmVsaWVzIG9uIEJJT1MgdG8gZW51bWVyYXRlIG1lbW9yeSBhbmQgZ2V0IGVy cm9yCj4gPiA+IHJlcG9ydHMuIMKgVW5mb3J0dW5hdGVseSwgbm90IGFsbCBCSU9TZXMgcmVmbGVj dCB0aGUgbWVtb3J5IGxheW91dAo+ID4gPiBjb3JyZWN0bHkgwqBTbywgdGhlIGVuZCByZXN1bHQg b2YgdXNpbmcgdGhpcyBkcml2ZXIgdmFyaWVzIGZyb20KPiA+ID4gdmVuZG9yIHRvIHZlbmRvciDC oElmIHlvdSBmaW5kIGluY29ycmVjdCByZXBvcnRzLCBwbGVhc2UgY29udGFjdAo+ID4gPiB5b3Vy IGhhcmR3YXJlIHZlbmRvciDCoHRvIGNvcnJlY3QgaXRzIEJJT1MuCj4gPiA+IAo+ID4gPiBUbyBn ZXQgb3V0IGZyb20gdGhpcyBzaXR1YXRpb24sIGFkZCBhIHBsYXRmb3JtIHR5cGUgY2hlY2sgdG8K PiA+ID4gc2VsZWN0aXZlbHkgZW5hYmxlIHRoZSBkcml2ZXIgb24gdGhlIHBsYXRmb3JtcyB0aGF0 IGFyZSBrbm93biB0bwo+ID4gPiBoYXZlIHByb3BlciBmaXJtd2FyZSBpbXBsZW1lbnRhdGlvbi7C oMKgUGxhdGZvcm0gdmVuZG9ycyBjYW4gYWRkCj4gPiA+IHRoZWlyIHBsYXRmb3JtcyB0byB0aGUg bGlzdCB3aGVuIHRoZXkgc3VwcG9ydCBnaGVzX2VkYWMuICAKPiA+IAo+ID4gU28gbWFpbnRhaW5p bmcgd2hpdGVsaXN0cyBmb3IgdGhpbmdzIGhhcyBhbHdheXMgYmVlbiBhIFBJVEEgYW5kIHdlCj4g PiBzaG91bGQgdHJ5IHRvIGF2b2lkIGl0LCBpZiBwb3NzaWJsZS4gKFdlIGNhbiBhbHdheXMgZG8g aXQgaWYgbm90aGluZwo+ID4gc2FuZXIgY29tZXMgYWxvbmcuKSAgCj4gCj4gQWdyZWVkLgo+IAo+ ID4gTm93LCBiZWxvdyBpcyBhIGRpcnR5IHBhdGNoIGNvbnZlcnRpbmcgZ2hlc19lZGFjIHRvIGEg bm9ybWFsIG1vZHVsZS4KPiA+IE9uIHN5c3RlbXMgd2hlcmUgd2UgaGF2ZSBHSEVTLCB0aGUgZmly bXdhcmUgZ2VuZXJhbGx5IGRpc2FibGVzIHRoZQo+ID4gZGV0ZWN0aW9uIG9mIHRoZSBwcmVzZW5j ZSBvZiBFQ0MgaGFyZHdhcmUsIHRodXMgcHJldmVudGluZyB0aGUKPiA+IHBsYXRmb3JtIEVEQUMg ZHJpdmVyIGZyb20gbG9hZGluZy4gIAo+IAo+IEkgaGF2ZSBIUEUgSGFzd2VsbCBhbmQgU2t5bGFr ZSB0ZXN0IHN5c3RlbXMgd2l0aCBHSEVTLCBidXQgdGhleSBkbyBub3QKPiBoaWRlIElNQ3MgZnJv bSB0aGUgT1MuICBTbywgdGhlIHNiX2VkYWMgYW5kIHNreF9lZGFjIGRyaXZlcnMgZ2V0Cj4gYXR0 YWNoZWQgb24gdGhlc2Ugc3lzdGVtcyB3aGVuIGdoZXNfZWRhYyBpcyBkaXNhYmxlZC4KPiAKPiA+ IExldCBtZSBjbGFyaWZ5OiBJIGhhdmUgYW4gQU1EIEhQIGJveCB3aGljaCwgd2hlbiBHSEVTIGlz IGVuYWJsZWQgaW4KPiA+IHRoZSBCSU9TLCBzYXlzIHRoYXQgRUNDIGlzIGRpc2FibGVkIGluIHRo ZSBtZW1vcnkgY29udHJvbGxlciBhbmQgdGhlCj4gPiBhbWQ2NF9lZGFjIGRyaXZlciBkb2Vzbid0 IGxvYWQgZm9yIHRoYXQgbWVtb3J5IGNvbnRyb2xsZXIuICAKPiAKPiBIbW0uLi4gd2hhdCdzIHRo ZSBwbGF0Zm9ybSBuYW1lIG9mIHRoaXMgYm94PyAgSSBjYW4gbG9vayBpbnRvIHRoaXMgY2FzZQo+ IGlmIHlvdSBuZWVkLgo+IAo+ID4gQW5kIEkgdGhpbmsgd2Ugc2hvdWxkIHRyeSB0aGlzIGZpcnN0 OiBoYXZlIHRoZSBmaXJtd2FyZSBkaXNhYmxlCj4gPiBkZXRlY3Rpb24gbWV0aG9kcyBzbyB0aGF0 IHRoZSBwbGF0Zm9ybSBkcml2ZXJzIGRvbid0IGxvYWQuICAKPiAKPiBJIGRvIG5vdCB0aGluayB3 ZSBjYW4gcmVseSBvbiB0aGlzIG1ldGhvZC4KPiAKPiA+IFRoZW4sIGdoZXNfZWRhYyBjYW4gYmUg YSBzaW1wbGUgbW9kdWxlIGFuZCBubyBvdGhlciBkcml2ZXIgd291bGQKPiA+IGF0dGVtcHQgbG9h ZGluZy4gIAo+IAo+IEkgbGlrZSB0aGUgdXNlIG9mIG5vdGlmaWVyIGNoYWluLCB3aGljaCBpcyBt dWNoIGNsZWFuZXIuCj4gCj4gPiBUaGUgcXVlc3Rpb24gaXM6IGRvZXMgdGhlIHBsYXRmb3JtIGRv IHRoaXMgZGlzYWJsaW5nIG5vdz8gIAo+IAo+IFVuZm9ydHVuYXRlbHksIHRoYXQgaXMgbm90IHRo ZSBjYXNlIHRvZGF5LiAgVGhlIElNQ3MgY2Fubm90IGJlIGhpZGRlbgo+IHdpdGggdGhlIERldmlj ZSBIaWRlIHJlZ2lzdGVycyBmb3IgU2t5bGFrZSBhdCBsZWFzdC4KCldlIGhhZCBhIHNpbWlsYXIg ZGlzY3Vzc2lvbiBzZXZlcmFsIHllYXJzIGFnbyB3aGVuIEkgd3JvdGUgdGhpcyBkcml2ZXIuCk9u IHRoYXQgdGltZSwgSSB0YWxrZWQgd2l0aCBSZWQgSGF0LCBIUCwgRGVsbCwgSW50ZWwgcGVvcGxl IGFuZCB3aXRoCnNvbWUgY3VzdG9tZXJzIHdpdGggbGFyZ2UgY2x1c3RlcnMuCgpUaGUgd2F5IGl0 IGlzLCBnaGVzX2VkYWMgaXMgYSBwb29yIG1hbidzIGRyaXZlci4gV2hhdCBpdCBob3BlZnVsbHkK cHJvdmlkZSBpcyBhIGRldGVjdGlvbiB0aGF0IGFuIGVycm9yIGhhcHBlbmVkLCB3aXRob3V0IHJl YWxseSB0ZWxsaW5nCnRoZSB1c2VyIHdoYXQgY29tcG9uZW50IHNob3VsZCBiZSByZXBsYWNlZC4K Ck9rLCBvbiBtYWNoaW5lcyB3aXRoIHRoZWlyIG93biBlcnJvciByZXBvcnRpbmcgbWVjaGFuaXNt IChsaWtlCkhQIHNlcnZlcnMpLCBhIHN5cyBhZG1pbiBjYW4gbG9vayBvbiBzb21lIHByb3ByaWV0 YXJ5IHNvZnR3YXJlCihvciBiaW9zKSwgaW4gb3JkZXIgdG8gaWRlbnRpZnkgd2hhdCBoYXBwZW5l ZC4KCllldCwgQklPUyBkb2Vzbid0IHByb3ZpZGUgYW55IGdsdWUgYWJvdXQgd2hhdCdzIHRoZSBt ZW1vcnkgYXJjaGl0ZWN0dXJlLAphcyBpdCBtYXBzIG1lbW9yeSBhcyBpZiBpdCB3YXMgYSBzaW5n bGUgRElNTSBtZW1vcnk6CgooZnJvbSBnaGVzX2VkYWNfcmVnaXN0ZXIpCgoJbGF5ZXJzWzBdLnR5 cGUgPSBFREFDX01DX0xBWUVSX0FMTF9NRU07CglsYXllcnNbMF0uc2l6ZSA9IG51bV9kaW1tOwoJ bGF5ZXJzWzBdLmlzX3ZpcnRfY3Nyb3cgPSB0cnVlOwoKU28sIGV2ZW4gb24gc3lzdGVtcyB3aGVy ZSB0aGUgQklPUyBhY3R1YWxseSBrbm93cyBob3cgdGhlIG1lbW9yeQpjYXJkcyBhcmUgd2lyZWQs IGl0IHdpbGwgbWFzayB0aGUgbWVtb3J5IGNvbnRyb2xsZXIgZGF0YS4KCk5vdywgdGhlIEVEQUMg ZHJpdmVyIGNhbiBhbHNvIGJlIHVzZWQgdG8gaWRlbnRpZnkgd2hhdApjaGFubmVscyBhcmUgdXNl ZC4gVGhhdCBoZWxwcyB0aGUgc3lzIGFkbWluIHRvIGtub3cgaWYgdGhlCm1lbW9yaWVzIGFyZSBj b25uZWN0ZWQgaW4gYSB3YXkgdGhhdCBpdCB3aWxsIGJlIHVzaW5nIG11bHRpcGxlCmNoYW5uZWxz LCBvciBub3QsIGhlbHBpbmcgdG8gc2V0dXAgdGhlIG1hY2hpbmUgdG8gb2J0YWluCnRoZSBtYXhp bXVtIHBvc3NpYmxlIHBlcmZvcm1hbmNlLgoKU28sIGZvciBleGFtcGxlLCBvbiBteSBJbnRlbC1i YXNlZCBIUCBzZXJ2ZXIsIEkgY2FuIGNoZWNrCnN1Y2ggaW5mbyB3aXRoOgoKJCByYXMtbWMtY3Rs IC0tbWFpbmJvYXJkCnJhcy1tYy1jdGw6IG1haW5ib2FyZDogSFAgbW9kZWwgUHJvTGlhbnQgTUwz NTAgR2VuOQokIHJhcy1tYy1jdGwgLS1sYXlvdXQKICAgICAgICstLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLSsKICAg ICAgIHwgICAgICAgICAgICAgICAgbWMwICAgICAgICAgICAgICAgIHwgICAgICAgICAgICAgICAg bWMxICAgICAgICAgICAgICAgIHwKICAgICAgIHwgY2hhbm5lbDAgIHwgY2hhbm5lbDEgIHwgY2hh bm5lbDIgIHwgY2hhbm5lbDAgIHwgY2hhbm5lbDEgIHwgY2hhbm5lbDIgIHwKLS0tLS0tLSstLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLSsKc2xvdDI6IHwgICAgIDAgTUIgIHwgICAgIDAgTUIgIHwgICAgIDAgTUIgIHwg ICAgIDAgTUIgIHwgICAgIDAgTUIgIHwgICAgIDAgTUIgIHwKc2xvdDE6IHwgICAgIDAgTUIgIHwg ICAgIDAgTUIgIHwgICAgIDAgTUIgIHwgICAgIDAgTUIgIHwgICAgIDAgTUIgIHwgICAgIDAgTUIg IHwKc2xvdDA6IHwgIDE2Mzg0IE1CICB8ICAgICAwIE1CICB8ICAxNjM4NCBNQiAgfCAgMTYzODQg TUIgIHwgICAgIDAgTUIgIHwgIDE2Mzg0IE1CICB8Ci0tLS0tLS0rLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t KwoKU28sIEkga25vdyB0aGF0IGJvdGggQ1BVcyB3aWxsIGJlIGNvbm5lY3RlZCB0byBteSBtZW1v cmllcywgYW5kLApvbiBib3RoLCBpdCBpcyB1c2luZyAyIGNoYW5uZWxzLgoKSWYgSSB3YXMgdXNp bmcgdGhlIGdoZXMgZHJpdmVyLCB0aGF0IGluZm9ybWF0aW9uIHdvdWxkIGJlIGhpZGRlbi4KClNv LCBkdWUgdG8gYWxsIHByb2JsZW1zIHdpdGggZ2hlcywgaXQgaXMgZW5hYmxlZCBvbmx5IGlmIHRo ZXJlIGFyZSBubwpiZXR0ZXIgc29sdXRpb24sIGUuIGcuIG9uIHN5c3RlbXMgd2hlcmUgdGhlcmUn cyBubyB3YXkgdG8gdGFsayBkaXJlY3RseQp0byB0aGUgaGFyZHdhcmUgKGxpa2Ugb24gRTcgWGVv biBtYWNoaW5lcywgd2hlcmUgdGhlIG1lbW9yeSBjb250cm9sbGVyCmlzIGFjdHVhbGx5IG9uIGEg c2VwYXJhdGUgY2hpcCB0aGF0IGFyZSBjb250cm9sbGVkIG9ubHkgYnkgdGhlIEJJT1MpLgoKVGhh bmtzLApNYXVybwotLS0KVG8gdW5zdWJzY3JpYmUgZnJvbSB0aGlzIGxpc3Q6IHNlbmQgdGhlIGxp bmUgInVuc3Vic2NyaWJlIGxpbnV4LWVkYWMiIGluCnRoZSBib2R5IG9mIGEgbWVzc2FnZSB0byBt YWpvcmRvbW9Admdlci5rZXJuZWwub3JnCk1vcmUgbWFqb3Jkb21vIGluZm8gYXQgIGh0dHA6Ly92 Z2VyLmtlcm5lbC5vcmcvbWFqb3Jkb21vLWluZm8uaHRtbAo=