* [PATCH] amd64_edac: Build module on x86-32
@ 2014-11-02 10:22 Tomasz Pala
2014-11-02 10:33 ` Borislav Petkov
0 siblings, 1 reply; 10+ messages in thread
From: Tomasz Pala @ 2014-11-02 10:22 UTC (permalink / raw)
To: Borislav Petkov, linux-edac; +Cc: linux-kernel, trivial
While both K8 and F10h are AMD64 CPUs, EDAC doesn't require them to run
in long mode; AMD_NB dependency is enough.
Fixes: 3d373290450b ("amd64_edac: do not enable module by default").
Signed-off-by: Tomasz Pala <gotar@polanet.pl>
---
drivers/edac/Kconfig | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 7072c28..af9c9fa 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -106,7 +106,7 @@ config EDAC_GHES
config EDAC_AMD64
tristate "AMD64 (Opteron, Athlon64) K8, F10h"
- depends on EDAC_MM_EDAC && AMD_NB && X86_64 && EDAC_DECODE_MCE
+ depends on EDAC_MM_EDAC && AMD_NB && EDAC_DECODE_MCE
help
Support for error detection and correction of DRAM ECC errors on
the AMD64 families of memory controllers (K8 and F10h)
--
1.6.0.4
--
Tomasz Pala <gotar@pld-linux.org>
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH] amd64_edac: Build module on x86-32 2014-11-02 10:22 [PATCH] amd64_edac: Build module on x86-32 Tomasz Pala @ 2014-11-02 10:33 ` Borislav Petkov 2014-11-02 10:45 ` [PATCH] amd64_edac: Document why it is 64-bit only Borislav Petkov 2014-11-02 12:11 ` [PATCH] amd64_edac: Build module on x86-32 Tomasz Pala 0 siblings, 2 replies; 10+ messages in thread From: Borislav Petkov @ 2014-11-02 10:33 UTC (permalink / raw) To: Tomasz Pala; +Cc: linux-edac, linux-kernel, trivial On Sun, Nov 02, 2014 at 11:22:12AM +0100, Tomasz Pala wrote: > While both K8 and F10h are AMD64 CPUs, EDAC doesn't require them to run > in long mode; AMD_NB dependency is enough. Not enabling it on 32-bit was a conscious decision for the simple reason that with the current DIMM sizes, you can have 1 or 2 DIMMs tops which you can use on 32-bit and having a fat driver mapping memory errors to DIMMs in that case does seem like a waste of time, energy, resources... you name it. I guess I'll add a note about this in the Kconfig text because I keep getting patches about this once every couple of months :-) Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] amd64_edac: Document why it is 64-bit only 2014-11-02 10:33 ` Borislav Petkov @ 2014-11-02 10:45 ` Borislav Petkov 2014-11-02 12:11 ` [PATCH] amd64_edac: Build module on x86-32 Tomasz Pala 1 sibling, 0 replies; 10+ messages in thread From: Borislav Petkov @ 2014-11-02 10:45 UTC (permalink / raw) To: Tomasz Pala; +Cc: linux-edac, linux-kernel, trivial, Aravind Gopalakrishnan On Sun, Nov 02, 2014 at 11:33:00AM +0100, Borislav Petkov wrote: > I guess I'll add a note about this in the Kconfig text because I keep > getting patches about this once every couple of months :-) --- From: Borislav Petkov <bp@suse.de> Subject: [PATCH] amd64_edac: Document why it is 64-bit only I keep getting requests for this so let's hold it down why it doesn't make sense. Also, drop incorrect statement about K8 and F10h - this driver supports all relevant AMD CPUs. Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> --- drivers/edac/Kconfig | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig index 7072c2892d63..4be205741221 100644 --- a/drivers/edac/Kconfig +++ b/drivers/edac/Kconfig @@ -105,11 +105,16 @@ config EDAC_GHES In doubt, say 'Y'. config EDAC_AMD64 - tristate "AMD64 (Opteron, Athlon64) K8, F10h" + tristate "AMD64 (Opteron, Athlon64)" depends on EDAC_MM_EDAC && AMD_NB && X86_64 && EDAC_DECODE_MCE help Support for error detection and correction of DRAM ECC errors on - the AMD64 families of memory controllers (K8 and F10h) + the AMD64 families of memory controllers, everything >= K8. + + Note: this driver is 64-bit only on purpose because having a big + driver report errors for 1 or 2 DIMMs max on 32-bit, considering + current DIMM sizes, does not make a lot of sense to be worth the + effort and energy. config EDAC_AMD64_ERROR_INJECTION bool "Sysfs HW Error injection facilities" -- 2.0.0 -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] amd64_edac: Build module on x86-32 2014-11-02 10:33 ` Borislav Petkov 2014-11-02 10:45 ` [PATCH] amd64_edac: Document why it is 64-bit only Borislav Petkov @ 2014-11-02 12:11 ` Tomasz Pala 2014-11-02 12:35 ` Borislav Petkov 1 sibling, 1 reply; 10+ messages in thread From: Tomasz Pala @ 2014-11-02 12:11 UTC (permalink / raw) To: Borislav Petkov; +Cc: linux-edac, linux-kernel On Sun, Nov 02, 2014 at 11:33:00 +0100, Borislav Petkov wrote: > Not enabling it on 32-bit was a conscious decision for the simple reason > that with the current DIMM sizes, you can have 1 or 2 DIMMs tops which > you can use on 32-bit and having a fat driver mapping memory errors to > DIMMs in that case does seem like a waste of time, energy, resources... > you name it. In my case it's not about mapping but Detection. === begin story === Recently my PostgreSQL db failed with: invalid page header in block 240 of relation base/49095/161613 which was fortunately 'fixed' by: echo 1 > /proc/sys/vm/drop_caches It turned out that there were on-disk differences between RAID1 (md) components, not only shown by next run of mdadm-checkarray, but also visible in actual filesystem after splitting RAID1 into separate volumes. There were no problems registered in S.M.A.R.T. logs, but _somehow_ my data got corrupted and I got not a single diagnostic tool available. There were no power outages or any other abrupt events, it just happened, without any reason. I've found some page cache corruption reports on the net, but none of those matched my conditions. Currently I'm using checksums at application level (available since PostgreSQL 9.3) and FS level (BTRFS) and EDAC for 4x1 GB ECC UDIMM (I did replace 2x2 GB non-ECC with these). If I could I'd use block-level checksumming or setup RAID1 to scrub-on-read mode, as this system has very low usage volume and I don't care about performance at all. Unfortunately SATA T13 didn't made it to the market, and SCSI drives with DIF/DIX are overkill for this system. === end story === There is absolutely no reason for you to forbid me using EDAC. And your reasoning is flawn because: 1. I got 4 pieces of 1 GB ECC UDIMM, not 1 or 2 as you stated; isn't it supported config? or maybe you would like to replace my modules with 4 GB one free of charge (shipping included)? 2. It is my time, energy and resources, it's not up to you to decide how I'm going to waste them. What next, removing support for power-hungry CPUs? Anyway Poland recently got free CO2 emmisions in EU commision;) 3. If _that_ was the reason, why didn't you made it straightforward by depending on HIGHMEM64G? Because such ridiculous condition would be soon removed? 4. You could apply the same logic to all the other EDAC modules - next to the system mentioned above I got some real server boards, some of them running 32-bit kernel with ECC FBDIMMs and first from the top: config EDAC_I5000 depends on EDAC_MM_EDAC && X86 && PCI Actually there are only 2 X86_64 dependencies in drivers/edac/Kconfig: EDAC_SBRIDGE and EDAC_AMD64 - would you 'fix' every X86 as pointless? 5. If you want to prevent this module from loading when only 1 or 2 DIMMs are installed, just wire this into the module; I got 4 modules. Anyway, even with just 1 module installed, I'd like to know error rates to be aware of memory module/controller quality and replace it when failing too often. > I guess I'll add a note about this in the Kconfig text because I keep > getting patches about this once every couple of months :-) ...so, didn't you think that maybe someone needs this?! Once again: the circuits are working, there is no technical reason not to use them. It's up to the owner to decide whether it makes sense. regards, -- Tomasz Pala <gotar@pld-linux.org> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] amd64_edac: Build module on x86-32 2014-11-02 12:11 ` [PATCH] amd64_edac: Build module on x86-32 Tomasz Pala @ 2014-11-02 12:35 ` Borislav Petkov 2014-11-02 14:08 ` Tomasz Pala 0 siblings, 1 reply; 10+ messages in thread From: Borislav Petkov @ 2014-11-02 12:35 UTC (permalink / raw) To: Tomasz Pala; +Cc: linux-edac, linux-kernel On Sun, Nov 02, 2014 at 01:11:39PM +0100, Tomasz Pala wrote: > On Sun, Nov 02, 2014 at 11:33:00 +0100, Borislav Petkov wrote: > > > Not enabling it on 32-bit was a conscious decision for the simple reason > > that with the current DIMM sizes, you can have 1 or 2 DIMMs tops which > > you can use on 32-bit and having a fat driver mapping memory errors to > > DIMMs in that case does seem like a waste of time, energy, resources... > > you name it. > > In my case it's not about mapping but Detection. Detection of what? Which DIMMs or simply error reporting? Because you can get reported errors with simply enabling CONFIG_EDAC_DECODE_MCE - you don't really need amd64_edac for that. Or do you want for amd64_edac to try to pinpoint which DIMMs are causing the errors too? > === begin story === > > Recently my PostgreSQL db failed with: > > invalid page header in block 240 of relation base/49095/161613 > > which was fortunately 'fixed' by: > > echo 1 > /proc/sys/vm/drop_caches > > It turned out that there were on-disk differences between RAID1 (md) > components, not only shown by next run of mdadm-checkarray, but also > visible in actual filesystem after splitting RAID1 into separate > volumes. There were no problems registered in S.M.A.R.T. logs, but > _somehow_ my data got corrupted and I got not a single diagnostic tool > available. There were no power outages or any other abrupt events, it > just happened, without any reason. I've found some page cache corruption > reports on the net, but none of those matched my conditions. > > Currently I'm using checksums at application level (available since > PostgreSQL 9.3) and FS level (BTRFS) and EDAC for 4x1 GB ECC UDIMM > (I did replace 2x2 GB non-ECC with these). > > If I could I'd use block-level checksumming or setup RAID1 to > scrub-on-read mode, as this system has very low usage volume and I don't > care about performance at all. Unfortunately SATA T13 didn't made it to > the market, and SCSI drives with DIF/DIX are overkill for this system. So were you able to confirm that those errors went away after replacing the DIMMs? > There is absolutely no reason for you to forbid me using EDAC. > > And your reasoning is flawn because: ... First of all, you need to relax yourself. Just calm down a bit, maybe take a walk first. Take a deep breath, whatever helps. No one is forbidding you anything - we're simply talking here. And since you haven't heard my point yet, acting offended for no apparent reason is simply waste of energy on your part. Now, to the technical side: I'm not talking about your time, energy and resources but about mine! I don't have 32-bit configurations to test 32-bit amd64_edac and am not willing to go buy any. So let me flip your question: are you going to test amd64_edac on 32-bit and fix issues when people report them? If so, I'll gladly enable it there and bounce all such bugs to you for fixing after people start using it. Oh, and also, all fixes for 32-bit should *not* break 64-bit amd64_edac so you'll have to test that too. And this is the main reason why it isn't enabled on 32-bit: lack of resources and desire to maintain. Very simple. And finally, if you're only interested in error rates, CONFIG_EDAC_DECODE_MCE is enough - you get each error reported but without the amd64_edac output. > Once again: the circuits are working, there is no technical reason not > to use them. It's up to the owner to decide whether it makes sense. Not only to the owner, as I've stated above. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] amd64_edac: Build module on x86-32 2014-11-02 12:35 ` Borislav Petkov @ 2014-11-02 14:08 ` Tomasz Pala 2014-11-03 10:55 ` Borislav Petkov 0 siblings, 1 reply; 10+ messages in thread From: Tomasz Pala @ 2014-11-02 14:08 UTC (permalink / raw) To: Borislav Petkov; +Cc: linux-edac, linux-kernel On Sun, Nov 02, 2014 at 13:35:38 +0100, Borislav Petkov wrote: > Or do you want for amd64_edac to try to pinpoint which DIMMs are causing > the errors too? Yes - when error happens, it would be desirable to locate failing module. > So were you able to confirm that those errors went away after replacing > the DIMMs? Can't say - such error (noticed) happened to me only once, how many silent bit rots I've missed is hard to say, as I haven't got data checksums before. The previous modules were well tested in this motherboard, so I can't blame them nor any other component - it's a 'cosmic ray' situation. OK, with EDAC_DECODE_MCE I would know if I should blame RAM or not. But if UCE rate is 1/year I can't randomly remove modules and wait if the problem is gone. Any single UCE should result in action that narrows down the possibile causes. Other than 'replace entire RAM' obviously. > First of all, you need to relax yourself. Just calm down a bit, maybe > take a walk first. Take a deep breath, whatever helps. OK, done. Sorry for being rude. > I'm not talking about your time, energy and resources but about mine! I > don't have 32-bit configurations to test 32-bit amd64_edac and am not > willing to go buy any. So let me flip your question: are you going to > test amd64_edac on 32-bit and fix issues when people report them? 1. Yes, I'm going to test, but no, I'm not capable of fixing it, sorry. 1a. There were other reporters you said, maybe some of them are capable. 2. Were there any op-mode specific issues in this code till now? Does this differ from not having e.g. F10h hardware? If that happens, I might grant remote access to such machine, but that's unfortunatelly all. 3. Didn't know that lack of resources to support discrepancies that might occur (but not occuring right now) is valid reason for disabling module entirely. After all, there are many parts that are not maintained actively at all and nobody removes them preemptively. Back then it could be (X86_64 || EXPERIMENTAL), couldn't now it be just a note in the description? 4. To be honest I think that more people are abandoning x86-32 than enabling ECC on them, so I wouldn't worry about people starting to use this and report 32-bit related errors. If you got reports on this once per a few months that's the order of magnitude we are talking about. So, if 32-bit related error are real threat, not just an excuse, ENOTIME for handling them is fair enough - people determined to have this running like me will find their way. But please don't say it's not _worth_ it, Kconfig descriptions are not a place to make such judgements (as it's YOUR time vs MY data). I'd go for something more objective, like "this driver might be run on 32-bit kernel, however no complains would be accepted due to lack of resources to handle 32-bit specific bugs"). Oh, and one more thing about the proposed description - I've noticed before: [PATCH 01/16] amd64_edac: Remove F11h support Fri, 26 Nov 2010 20:04:08 +0100 F11h doesn't support DRAM ECC so whack it away. and I see F10h, F15h and F16h families only mentioned in amd64_edac.c. regards, -- Tomasz Pala <gotar@pld-linux.org> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] amd64_edac: Build module on x86-32 2014-11-02 14:08 ` Tomasz Pala @ 2014-11-03 10:55 ` Borislav Petkov 2014-11-05 12:03 ` Tomasz Pala 0 siblings, 1 reply; 10+ messages in thread From: Borislav Petkov @ 2014-11-03 10:55 UTC (permalink / raw) To: Tomasz Pala; +Cc: linux-edac, linux-kernel On Sun, Nov 02, 2014 at 03:08:39PM +0100, Tomasz Pala wrote: > Can't say - such error (noticed) happened to me only once, how many silent bit > rots I've missed is hard to say, as I haven't got data checksums before. > The previous modules were well tested in this motherboard, so I can't > blame them nor any other component - it's a 'cosmic ray' situation. So we still don't know. I wouldn't throw away the old DIMMs if it is a single failure only. > OK, with EDAC_DECODE_MCE I would know if I should blame RAM or not. But > if UCE rate is 1/year I can't randomly remove modules and wait if the > problem is gone. Any single UCE should result in action that narrows > down the possibile causes. Other than 'replace entire RAM' obviously. I think with UCE you mean uncorrectable error? If so, those normally announce themselves by a bunch of correctable errors prior. Which roughly means, a DIMM usually announces that it is going to get bad soon. So yes, enabling EDAC_DECODE_MCE should be a good first step as it will tell you when errors occur. Btw, I forgot to ask, why are you even running 32-bit? Do you have some old K8 CPU which is not 64-bit capable? As a matter of fact, can you apply your patch, enable CONFIG_EDAC_DEBUG and catch dmesg and send it to me, privately is fine too. > 1. Yes, I'm going to test, but no, I'm not capable of fixing it, sorry. > 1a. There were other reporters you said, maybe some of them are capable. Well, I haven't seen anything besides this one patch, similar to yours. And they've claimed that it works. But the devil is in the detail as always so what I'm afraid of is we enable this and then months later, once it trickles down to more users, bug reports start coming in. Bug reports which no one would have time to address and maybe fix. I guess we can add this hunk to your patch, albeit a bit controversial: diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 1a1d7c43a20f..17638d7cf5c2 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -3035,6 +3035,11 @@ static int __init amd64_edac_init(void) goto err_no_instances; setup_pci_device(); + +#ifdef CONFIG_X86_32 + amd64_err("%s on 32-bit is unsupported. USE AT YOUR OWN RISK!\n", EDAC_MOD_STR); +#endif + return 0; err_no_instances: > 2. Were there any op-mode specific issues in this code till now? Does > this differ from not having e.g. F10h hardware? If that happens, I > might grant remote access to such machine, but that's unfortunatelly all. > 3. Didn't know that lack of resources to support discrepancies that might > occur (but not occuring right now) is valid reason for disabling module > entirely. After all, there are many parts that are not maintained > actively at all and nobody removes them preemptively. Back then it > could be (X86_64 || EXPERIMENTAL), couldn't now it be just a note in > the description? EXPERIMENTAL is gone from the kernel. And as I explained already, I just don't have the bandwidth and am not really persuaded supporting 32-bit is worth the effort to do it. It is a conscious decision. That's why I say, if people want it, they can send me fixes but cannot expect me to fix stuff. > 4. To be honest I think that more people are abandoning x86-32 than > enabling ECC on them, so I wouldn't worry about people starting to use this > and report 32-bit related errors. If you got reports on this once per a few > months that's the order of magnitude we are talking about. Probably. And yes, people should abandon 32-bit if they haven't done so! :-D > So, if 32-bit related error are real threat, not just an excuse, ENOTIME > for handling them is fair enough - people determined to have this > running like me will find their way. But please don't say it's not > _worth_ it, Kconfig descriptions are not a place to make such judgements > (as it's YOUR time vs MY data). I'd go for something more objective, > like "this driver might be run on 32-bit kernel, however no complains > would be accepted due to lack of resources to handle 32-bit specific > bugs"). Fair enough. How about the warning above? It will issue upon successful loading on 32-bit. But I'd still like to know what is the reason you're not moving to 64-bit. > Oh, and one more thing about the proposed description - I've noticed before: > > [PATCH 01/16] amd64_edac: Remove F11h support > Fri, 26 Nov 2010 20:04:08 +0100 > > F11h doesn't support DRAM ECC so whack it away. > > and I see F10h, F15h and F16h families only mentioned in amd64_edac.c. I don't understand what you mean here... The driver supports everything from K8 on which can do ECC. Family 11h doesn't support ECC so no need for an EDAC driver. I hope this answers your question. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] amd64_edac: Build module on x86-32 2014-11-03 10:55 ` Borislav Petkov @ 2014-11-05 12:03 ` Tomasz Pala 2014-11-05 14:56 ` Borislav Petkov 0 siblings, 1 reply; 10+ messages in thread From: Tomasz Pala @ 2014-11-05 12:03 UTC (permalink / raw) To: Borislav Petkov; +Cc: linux-edac, linux-kernel On Mon, Nov 03, 2014 at 11:55:08 +0100, Borislav Petkov wrote: >> The previous modules were well tested in this motherboard, so I can't >> blame them nor any other component - it's a 'cosmic ray' situation. > > So we still don't know. I wouldn't throw away the old DIMMs if it is a > single failure only. They found their place in some workstations, for less critical usage. > Btw, I forgot to ask, why are you even running 32-bit? Do you have some > old K8 CPU which is not 64-bit capable? This system was backed-up by some Intel one without 64-bit support and it needed to be fully binary-compatible (including databases storage). Over the time, as older hardware is disposed, it might eventually be upgraded to 64-bit kernel running 32-bit userland in compat mode (full transition is not going to happen soon as costs of such operation, i.e. dumping and restoring all the data, application tests etc. greatly overweight any benefits), but even the kernel change is not trivial due to many quirks that happened every time before (and it was really hard to find some stable configuration). Thus, until there is some bigger maintaince undergoing or the hardware reaches it's lifetime, noone is going to "pay" (allocate time, people at night shifts etc.) for such change. > As a matter of fact, can you apply your patch, enable CONFIG_EDAC_DEBUG > and catch dmesg and send it to me, privately is fine too. There's not much of if related (system is running 3.14.4): MCE: In-kernel MCE decoding enabled. EDAC MC: Ver: 3.0.0 AMD64 EDAC driver v3.4.0 EDAC amd64: DRAM ECC enabled. EDAC amd64: K8 revF or later detected (node 0). EDAC MC: DCT0 chip selects: EDAC amd64: MC: 0: 2048MB 1: 0MB EDAC amd64: MC: 2: 2048MB 3: 0MB EDAC amd64: MC: 4: 0MB 5: 0MB EDAC amd64: MC: 6: 0MB 7: 0MB EDAC amd64: CS0: Unbuffered DDR2 RAM EDAC amd64: CS2: Unbuffered DDR2 RAM EDAC MC0: Giving out device to module amd64_edac controller K8: DEV 0000:00:18.2 (INTERRUPT) EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.2 (POLLED) (there are 4 modules 1 GB each, I haven't tested if above changes with ganged/unganged mode.) > Fair enough. How about the warning above? It will issue upon successful > loading on 32-bit. That is decent solution IMHO. There is a warning visible in logs (not only in sources or during configuration), so everyone interested would be informed in the first place they should start reading after any possible error happens. > But I'd still like to know what is the reason you're not moving to 64-bit. Mostly because "If it ain't broke, don't fix it" rule. These are systems with a few hundreds days uptime (e.g. 3 weeks ago some malfunction caused 1,5 half year uptime machine to reboot, 500-900 days are not so uncommon, I remember my pain rebooting machines over 1200 days online). Restarting them usually causes some minor troubles (not saved changes), changing software leads to compat troubles (that might be tested before going to production), but changing kernel makes uncertainty about entire platform, so it is avoided until necessary (and these running are polished as much as possible, with backported bugfixes etc.) So replacing rock-solid kernel with some other is a no-go, even preserving the current sources (there might always be some 64-bit related bugs). > The driver supports everything from K8 on which can do ECC. Family 11h > doesn't support ECC so no need for an EDAC driver. I hope this answers I mean your '[PATCH] amd64_edac: Document why it is 64-bit only': - the AMD64 families of memory controllers (K8 and F10h) + the AMD64 families of memory controllers, everything >= K8. "everything >= K8" mislead me. best regards, -- Tomasz Pala <gotar@pld-linux.org> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] amd64_edac: Build module on x86-32 2014-11-05 12:03 ` Tomasz Pala @ 2014-11-05 14:56 ` Borislav Petkov 2014-11-17 11:15 ` Tomasz Pala 0 siblings, 1 reply; 10+ messages in thread From: Borislav Petkov @ 2014-11-05 14:56 UTC (permalink / raw) To: Tomasz Pala; +Cc: linux-edac, linux-kernel On Wed, Nov 05, 2014 at 01:03:24PM +0100, Tomasz Pala wrote: > This system was backed-up by some Intel one without 64-bit support and > it needed to be fully binary-compatible (including databases storage). > > Over the time, as older hardware is disposed, it might eventually be > upgraded to 64-bit kernel running 32-bit userland in compat mode (full > transition is not going to happen soon as costs of such operation, i.e. > dumping and restoring all the data, application tests etc. greatly > overweight any benefits), but even the kernel change is not trivial due > to many quirks that happened every time before (and it was really hard > to find some stable configuration). Thus, until there is some bigger > maintaince undergoing or the hardware reaches it's lifetime, noone is > going to "pay" (allocate time, people at night shifts etc.) for such > change. Ok, understood. > There's not much of if related (system is running 3.14.4): CONFIG_EDAC_DEBUG gives additional debugging output and this is without it but it doesn't matter - I see you have a K8 box. > That is decent solution IMHO. There is a warning visible in logs (not > only in sources or during configuration), so everyone interested would > be informed in the first place they should start reading after any > possible error happens. I hope :) > Mostly because "If it ain't broke, don't fix it" rule. These are systems > with a few hundreds days uptime (e.g. 3 weeks ago some malfunction > caused 1,5 half year uptime machine to reboot, 500-900 days are not so > uncommon, I remember my pain rebooting machines over 1200 days online). > Restarting them usually causes some minor troubles (not saved changes), > changing software leads to compat troubles (that might be tested before > going to production), but changing kernel makes uncertainty about entire > platform, so it is avoided until necessary (and these running are > polished as much as possible, with backported bugfixes etc.) So > replacing rock-solid kernel with some other is a no-go, even preserving > the current sources (there might always be some 64-bit related bugs). I see. Just FYI though, one serious advantage of 64-bit is that gets orders of magnitude more testing than 32-bit so if you still are contemplating a 64-bit switch someday, remember that fact. :) > > The driver supports everything from K8 on which can do ECC. Family 11h > > doesn't support ECC so no need for an EDAC driver. I hope this answers > > I mean your '[PATCH] amd64_edac: Document why it is 64-bit only': > > - the AMD64 families of memory controllers (K8 and F10h) > + the AMD64 families of memory controllers, everything >= K8. > > "everything >= K8" mislead me. Ok, it is supposed to say, on everything K8 and later. K8 is what you have. What would make it more understandable? So here's an updated version of your patch: --- From: Tomasz Pala <gotar@polanet.pl> Subject: [PATCH] amd64_edac: Build module on x86-32 By popular demand, enable amd64_edac on 32-bit too. Boris: - update Kconfig text. - add a warning on load which states that 32-bit configurations are unsupported. Signed-off-by: Tomasz Pala <gotar@polanet.pl> Link: http://lkml.kernel.org/r/20141102102212.GA7034@polanet.pl Signed-off-by: Borislav Petkov <bp@suse.de> --- drivers/edac/Kconfig | 6 +++--- drivers/edac/amd64_edac.c | 5 +++++ 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig index 7072c2892d63..4316c9e955b3 100644 --- a/drivers/edac/Kconfig +++ b/drivers/edac/Kconfig @@ -105,11 +105,11 @@ config EDAC_GHES In doubt, say 'Y'. config EDAC_AMD64 - tristate "AMD64 (Opteron, Athlon64) K8, F10h" - depends on EDAC_MM_EDAC && AMD_NB && X86_64 && EDAC_DECODE_MCE + tristate "AMD64 (Opteron, Athlon64)" + depends on EDAC_MM_EDAC && AMD_NB && EDAC_DECODE_MCE help Support for error detection and correction of DRAM ECC errors on - the AMD64 families of memory controllers (K8 and F10h) + the AMD64 families (>= K8) of memory controllers. config EDAC_AMD64_ERROR_INJECTION bool "Sysfs HW Error injection facilities" diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 1a1d7c43a20f..17638d7cf5c2 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -3035,6 +3035,11 @@ static int __init amd64_edac_init(void) goto err_no_instances; setup_pci_device(); + +#ifdef CONFIG_X86_32 + amd64_err("%s on 32-bit is unsupported. USE AT YOUR OWN RISK!\n", EDAC_MOD_STR); +#endif + return 0; err_no_instances: -- 2.0.0 Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] amd64_edac: Build module on x86-32 2014-11-05 14:56 ` Borislav Petkov @ 2014-11-17 11:15 ` Tomasz Pala 0 siblings, 0 replies; 10+ messages in thread From: Tomasz Pala @ 2014-11-17 11:15 UTC (permalink / raw) To: Borislav Petkov; +Cc: linux-edac, linux-kernel On Wed, Nov 05, 2014 at 15:56:03 +0100, Borislav Petkov wrote: >> There's not much of if related (system is running 3.14.4): > > CONFIG_EDAC_DEBUG gives additional debugging output and this is without > it but it doesn't matter - I see you have a K8 box. Oh, I've misread the 'DEBUG' part. Unfortunately I've got only K8s to test. > Just FYI though, one serious advantage of 64-bit is that gets orders of > magnitude more testing than 32-bit so if you still are contemplating a > 64-bit switch someday, remember that fact. :) Yes, I'm aware of this and won't put 32-bit on new pieces of hardware. Well, it has always been a pain to support Old, Running, Important Systems;) >> - the AMD64 families of memory controllers (K8 and F10h) >> + the AMD64 families of memory controllers, everything >= K8. >> >> "everything >= K8" mislead me. > > Ok, it is supposed to say, on everything K8 and later. K8 is what you > have. What would make it more understandable? "K8, K10 and everything >= 15h"? To emphase there is a gap? Dunno... > So here's an updated version of your patch: That looks fine, thank you for your assistance and patience! regards, > --- > From: Tomasz Pala <gotar@polanet.pl> > Subject: [PATCH] amd64_edac: Build module on x86-32 > > By popular demand, enable amd64_edac on 32-bit too. > > Boris: > - update Kconfig text. > - add a warning on load which states that 32-bit configurations are unsupported. > > Signed-off-by: Tomasz Pala <gotar@polanet.pl> > Link: http://lkml.kernel.org/r/20141102102212.GA7034@polanet.pl > Signed-off-by: Borislav Petkov <bp@suse.de> > --- > drivers/edac/Kconfig | 6 +++--- > drivers/edac/amd64_edac.c | 5 +++++ > 2 files changed, 8 insertions(+), 3 deletions(-) > > diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig > index 7072c2892d63..4316c9e955b3 100644 > --- a/drivers/edac/Kconfig > +++ b/drivers/edac/Kconfig > @@ -105,11 +105,11 @@ config EDAC_GHES > In doubt, say 'Y'. > > config EDAC_AMD64 > - tristate "AMD64 (Opteron, Athlon64) K8, F10h" > - depends on EDAC_MM_EDAC && AMD_NB && X86_64 && EDAC_DECODE_MCE > + tristate "AMD64 (Opteron, Athlon64)" > + depends on EDAC_MM_EDAC && AMD_NB && EDAC_DECODE_MCE > help > Support for error detection and correction of DRAM ECC errors on > - the AMD64 families of memory controllers (K8 and F10h) > + the AMD64 families (>= K8) of memory controllers. > > config EDAC_AMD64_ERROR_INJECTION > bool "Sysfs HW Error injection facilities" > diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c > index 1a1d7c43a20f..17638d7cf5c2 100644 > --- a/drivers/edac/amd64_edac.c > +++ b/drivers/edac/amd64_edac.c > @@ -3035,6 +3035,11 @@ static int __init amd64_edac_init(void) > goto err_no_instances; > > setup_pci_device(); > + > +#ifdef CONFIG_X86_32 > + amd64_err("%s on 32-bit is unsupported. USE AT YOUR OWN RISK!\n", EDAC_MOD_STR); > +#endif > + > return 0; > > err_no_instances: -- Tomasz Pala <gotar@pld-linux.org> ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-11-17 11:15 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-11-02 10:22 [PATCH] amd64_edac: Build module on x86-32 Tomasz Pala 2014-11-02 10:33 ` Borislav Petkov 2014-11-02 10:45 ` [PATCH] amd64_edac: Document why it is 64-bit only Borislav Petkov 2014-11-02 12:11 ` [PATCH] amd64_edac: Build module on x86-32 Tomasz Pala 2014-11-02 12:35 ` Borislav Petkov 2014-11-02 14:08 ` Tomasz Pala 2014-11-03 10:55 ` Borislav Petkov 2014-11-05 12:03 ` Tomasz Pala 2014-11-05 14:56 ` Borislav Petkov 2014-11-17 11:15 ` Tomasz Pala
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox