* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Andi Kleen @ 2006-07-04 9:23 UTC
To: Alan Cox; +Cc: Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

[moving to l-k - the discussion is how EDAC tries to duplicate
already existing interface including a whole new duplicate polling
machine check handler]

On Mon, Jul 03, 2006 at 10:28:34PM +0100, Alan Cox wrote:
> On Mon, 2006-07-03 at 20:48 +0200, Andi Kleen wrote:
> > The only way to get the slot name <-> address mapping is
> > to ask the BIOS.
> >
> > I bet you hardcoded it for your systems right?
>
> Why don't you read the code ? Wouldn't be hard to check now would it.

I'm pretty sure I'm right from the code, but I was asking for
confirmation.

Ok hardcoding was perhaps the wrong word, but what they output isn't
useful to identify the broken DIMM if you don't have very detailed
documentation of the motherboards which 99+% of all users don't.

>
> > Can you describe in more detail why you think that's not the case?
>
> I did that, you said "buzzwords" insulted me and deleted the argument
> then started this second discussion as if it never occurred. Not
> productive.

It was referring to Doug's assertion that the memory address is not
enough to identify the DIMM. I bet it was only because they didn't use
the SMBIOS information, but again I was asking for confirmation.

Regarding your buzzwords: I don't think mcelog is in any way
less "manageable" or "consistent" than EDAC.

> > Hmm, i haven't checked, but my understanding was that the newer
> > Intel chipsets all forwarded the memory errors as machine
> > check anyways.
>
> Quite a few still in use do not. We also have no idea where the future

New ones? Would surprise me.
Yes the machine check architecture doesn't try to handle all old systems,
but then in practice error reporting on old x86 systems doesn't tend
to work particularly well either.

>
> > I also don't think it's very fortunate to put all the complicated
> > decoding code into kernel space. It works just fine in user space.
> > Can you explain what value add it gives over machine checks in
> > modern systems?
>
> See my original email, it provides consistency and means that we have
> the same interface for different setups. That is very important just
> like not having "reiser4_open()" "ext3_open" and the like is.

mce code also uses a consistent interface - it's even the same
code in kernel space for all systems.

> It's also zero cost to people who don't choose to use the EDAC interface.
> The alternative is that every single monitoring and hardware management
> tool for Linux has to have its own set of glue interfaces for all the
> different processor and chip specific details.

At least for machine checks the mce interface is a single interface.

We don't have a generic interface for logging some of the other errors
(like PCI-E errors), but I don't see EDAC solving that. In some ways
it's understandable because there is no generic PCI-E error handling
code at all yet.

>
> > > Sorry about that. I saved your email, but at that time got overwhelmed
> > > in other matters and just recently got back into EDAC. I apologize for
> > > not responding sooner.
> >
> > Well you wasted a lot of time then redoing what's already done.
>
> The ecc code predates the MCE bits by years. The re-doing occurred
> rather earlier. Rather more useful would be to get the common interface

Earlier than the x86-64 machine check code?

> provided by things like EDAC provided by the fairly CPU specific mce
> code for the newer chips with a clean interface between the two and the
> minimum of duplicated code.
You could convert the EDAC drivers to log pseudo events with mce_log()
like Intel thermal, AMD ecc threshold do. All the heavy decoding
should be in user space in mcelog.

Giving a consistent sysfs interface is a bit harder, but I suppose one
could change the code to provide pseudo banks for enable/disable too.
However that would be system specific again, so a default "all on/all off"
policy might be quite ok.

-Andi

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Alan Cox @ 2006-07-04 10:09 UTC
To: Andi Kleen; +Cc: Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

On Tue, 2006-07-04 at 11:23 +0200, Andi Kleen wrote:
> Regarding your buzzwords: I don't think mcelog is in any way
> less "manageable" or "consistent" than EDAC.

It's chip specific rather than generalised so you need awareness of it.

> > > Hmm, i haven't checked, but my understanding was that the newer
> > > Intel chipsets all forwarded the memory errors as machine
> > > check anyways.
> >
> > Quite a few still in use do not. We also have no idea where the future
>
> New ones? Would surprise me.

All the world is not x86.

> Yes the machine check architecture doesn't try to handle all old systems,
> but then in practice error reporting on old x86 systems doesn't tend
> to work particularly well either.

It's pretty solid on the AMD 32bit chipsets and some of the older Intel
ones.

> mce code also uses a consistent interface - it's even the same
> code in kernel space for all systems.

For the subset of cases it supports.

> We don't have a generic interface for logging some of the other errors
> (like PCI-E errors), but I don't see EDAC solving that. In some ways
> it's understandable because there is no generic PCI-E error handling
> code at all yet.

EDAC solves that for the PCI bus side. It's only solving the logging
side not the "ok it exploded, now what" question - although there are
some unrelated IBM patches in that area.

> > The ecc code predates the MCE bits by years. The re-doing occurred
> > rather earlier. Rather more useful would be to get the common interface
>
> Earlier than the x86-64 machine check code?

Linux 1.2 I believe, certainly by 2.0

> Giving a consistent sysfs interface is a bit harder, but I suppose one
> could change the code to provide pseudo banks for enable/disable too.
> However that would be system specific again, so a default "all on/all off"
> policy might be quite ok.

I think we need the basic consistent sysfs case. Whether that is
provided by the mcelog code in the AMD64 case, or by an exported hook
from the MCE interfaces for AMD64 or duplicating the code in EDAC
isn't so important (avoiding duplication aside of course).

Alan

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Andi Kleen @ 2006-07-04 11:34 UTC
To: Alan Cox; +Cc: Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

On Tue, Jul 04, 2006 at 11:09:47AM +0100, Alan Cox wrote:
> On Tue, 2006-07-04 at 11:23 +0200, Andi Kleen wrote:
> > Regarding your buzzwords: I don't think mcelog is in any way
> > less "manageable" or "consistent" than EDAC.
>
> It's chip specific rather than generalised so you need awareness of it.

You mean the final output? I guess it would be possible to add
a generic output format for memory errors in mcelog, but it's not
clear you can always get the same information from different chipsets.

> > > > Hmm, i haven't checked, but my understanding was that the newer
> > > > Intel chipsets all forwarded the memory errors as machine
> > > > check anyways.
> > >
> > > Quite a few still in use do not. We also have no idea where the future
> >
> > New ones? Would surprise me.
>
> All the world is not x86.

The rest of the world either doesn't do significant error handling
(embedded, lowend) or has its own machine check systems with error
handling similar to mcelog (POWER, IA64).

Ok Sparc, pa-risc, old SGI mips are left out currently but I'm sure the
maintainers will attack that eventually if there is need.

> > We don't have a generic interface for logging some of the other errors
> > (like PCI-E errors), but I don't see EDAC solving that. In some ways
> > it's understandable because there is no generic PCI-E error handling
> > code at all yet.
>
> EDAC solves that for the PCI bus side. It's only solving the logging
> side not the "ok it exploded, now what" question - although there are
> some unrelated IBM patches in that area.

Yes some of that might be useful still for legacy systems.
In the future it should be more standardized with the standard x86
machine check architecture and standardized PCI Express advanced error
handling. So generic drivers should do the heavy lifting.

I'm not disputing it is still useful for some old systems, it just
doesn't seem to be the right path forward for new ones.

Is there work going on to hook up the old EDAC drivers for PCI errors
to the new error handling?

> > > The ecc code predates the MCE bits by years. The re-doing occurred
> > > rather earlier. Rather more useful would be to get the common interface
> >
> > Earlier than the x86-64 machine check code?
>
> Linux 1.2 I believe, certainly by 2.0

Doubtful you wrote a K8 error handler at this time frame ;-)

>
> > Giving a consistent sysfs interface is a bit harder, but I suppose one
> > could change the code to provide pseudo banks for enable/disable too.
> > However that would be system specific again, so a default "all on/all off"
> > policy might be quite ok.
>
> I think we need the basic consistent sysfs case. Whether that is

What should i do?

-Andi

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Alan Cox @ 2006-07-05 22:08 UTC
To: Andi Kleen; +Cc: Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

On Tue, 2006-07-04 at 13:34 +0200, Andi Kleen wrote:
> > > Giving a consistent sysfs interface is a bit harder, but I suppose one
> > > could change the code to provide pseudo banks for enable/disable too.
> > > However that would be system specific again, so a default "all on/all off"
> > > policy might be quite ok.
> >
> > I think we need the basic consistent sysfs case. Whether that is
>
> What should i do?

Well personally I would favour the MCE logging stuff staying in because
it's clearly small, compact and enough for many users, and the EDAC stuff
hooking that feed somehow so that people who want the detail and the
common behaviour across platforms can load the extra module.

As to filtering and control of the banks - that can always be done by
filtering what is handed down from the MCE code if I understand it right
so can be left in the EDAC side.

But that's just my opinion. It is based on what I'm seeing in terms of
feedback from people using EDAC a lot (eg in clusters).

Alan

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Andi Kleen @ 2006-07-05 22:04 UTC
To: Alan Cox; +Cc: Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

On Wed, Jul 05, 2006 at 11:08:21PM +0100, Alan Cox wrote:
> On Tue, 2006-07-04 at 13:34 +0200, Andi Kleen wrote:
> > > > Giving a consistent sysfs interface is a bit harder, but I suppose one
> > > > could change the code to provide pseudo banks for enable/disable too.
> > > > However that would be system specific again, so a default "all on/all off"
> > > > policy might be quite ok.
> > >
> > > I think we need the basic consistent sysfs case. Whether that is
> >
> > What should i do?

s/i/it/ of course.

Basically what I asked for is what you think that sysfs interface
should do. You want a single error / no error knob? The problem
is that anything more detailed requires knowledge of the specific
hardware.

The single knob on standard MCE would be

	for i in /sys/devices/system/machinecheck/*/bank* ; do
		echo 0 > "$i"
	done

(or 0xfffffffffffffffff to turn everything on)

What else?

What we identified as missing is a unified way for all hardware
to report how many errors and on which DIMMs. I think I can easily
add that to mcelog (it would already report it, but in a CPU specific
format)

>
> Well personally I would favour the MCE logging stuff staying in because
> its clearly small, compact and enough for many users, and the EDAC stuff
> hooking that feed somehow so that people who want the detail and the

As far as I can figure out there is no more detail offered by it at
least for K8. All the information that is given by the Northbridge
is in the MCE and the rest for the DIMM topology is in SMBIOS
(or could be read from user space if really needed)

I went through a similar development myself BTW.
When I wrote the first Opteron machine check handler for 2.4 I also
coded access to the PCI device and read the registers there. But later
I realized that it's useless because the CPU shadows all these registers
into the regular machine check MSRs. So you can just get it with a
portable handler from there. When I redid the handler I threw it all
out. Now you seem to want to add it in again ...

Regarding non K8 x86-64 it would need more research, but I hope they
also dump everything into the MSRs.

>
> As to filtering and control of the banks - that can always be done by
> filtering what is handed down from the MCE code if I understand it right
> so can be left in the EDAC side.

I think that should be done in user space.

-Andi
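
The single on/off knob described in the message above can be tried out as a small shell function. This is only a sketch: the function name `set_all_banks` and its optional directory argument are made up here so the loop can be exercised against a scratch directory. The default sysfs path is the one quoted in the message, and writing to it would need root.

```shell
#!/bin/sh
# set_all_banks VALUE [ROOT]
# Write VALUE to every machine check bank control file below ROOT.
# ROOT defaults to the sysfs path quoted in the mail; the override
# argument is an assumption added for dry runs.
set_all_banks() {
    value=$1
    root=${2:-/sys/devices/system/machinecheck}
    for f in "$root"/*/bank*; do
        # If the glob matched nothing it stays literal, so skip it.
        [ -e "$f" ] || continue
        echo "$value" > "$f"
    done
}
```

Calling `set_all_banks 0` would then correspond to the "all off" policy, and the same call with an all-ones mask to "all on".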

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Eric W. Biederman @ 2006-07-06 6:12 UTC
To: Andi Kleen
Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

I think if this conversation is going to make headway we
need to step back a minute, and ask what makes sense to do
and where and not get caught up in the details of an implementation.

The goal of the EDAC code is to report errors the hardware has seen to
upper layers of software so someone can do something with them. Also
it is hoped we can get a moderately standard interface so that
automated tools can recognize and do something when a problem is
reported.

Is this a reasonable set of goals?

Memory errors are by far the most common kind of hardware error and
worth discussing.

For an uncorrectable memory error there are 3 interesting
pieces of information.

- Which cpu address did the error happen at.
  So we can kill the processes using that memory.
  Although simply killing the entire machine appears acceptable.

- What is the chipset's idea of which DIMM the memory error occurred on.
  For bus based memory architectures like the opteron this
  is a chip select of the DIMM rank.
  For serial memory architectures this is some kind of bus address,
  but still useful for describing individual chips.

- What is the silk screen label on the motherboard that corresponds
  to the chip selects with problems.

If you look at the memory controller, and the associated error
reporting registers (which are sometimes available in the machine check),
there has always been enough information to determine the hardware
address the memory controller knows the DIMM by.

Getting the address of the error is usually possible but not always
and not always very reliably.
Mapping between the hardware address that the memory controller
knows DIMMS by and the actual DIMMS themselves is actually
pretty easy even if you don't have any motherboard information.
It is just a matter of plugging in DIMMS in different positions
and seeing which DIMMS the hardware currently sees. It's
maybe half a day's work on an unknown motherboard.

...

Assuming we can agree that this is sane information we want.
The remaining question is how do we capture it.

The mapping to the hardware address that the memory controller
knows the DIMM by requires the reading of hardware registers,
some that are not easily accessible to user space so a kernel driver
tends to make sense, just to get the information.

Possibly we could just export that information and let the
user space figure it out from there. But memory is a key system
component and hardware designers are very creative so coming
up with a consistent model would be very hard. So far we
have had to improve our helper functions every couple of chipsets
because the old models broke. Writing a driver split halfway
between the kernel and user space sounds silly.

....

The other pieces to me seem much more fluid. Especially since EDAC
does not yet export much if anything to user space except through
printk's in any stable kernel.

....

As for the suggestion of using DMI as best as I can determine it
suffers rather badly from the never ending creativity of the chipset
developers and does not have a model that can describe what needs
to happen for the current generation of chipset much less the bleeding
edge ones. Which is besides the fact that the only thing that you can
usually trust in DMI tables is the motherboard manufacturer.

I do think getting the motherboard id out of DMI provides a great key
to build a memory controller hardware address to DIMM label lookup
table.
With EDAC we have been computing that information in user
space and caching it kernel side so we could generate immediately
useful print statements. Which is handy but probably not necessary.

Eric

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Andi Kleen @ 2006-07-06 13:01 UTC
To: Eric W. Biederman
Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

On Thu, Jul 06, 2006 at 12:12:14AM -0600, Eric W. Biederman wrote:
>
> I think if this conversation is going to make headway we
> need to step back a minute, and ask what makes sense to do
> and where and not get caught up in the details of an implementation.

From my POV it's rather that we already have implementations (most "big"
architectures already have their own advanced hardware error reporting
systems). The EDAC folks want to add another. It's not clear what
advantages it gives. As far as I can see it is in many ways a step
back, at least compared to the x86-64 code: uses printk, adds tons of
code in kernel that is better in user space, doesn't use SMBIOS,
requires more CPU specific code instead of using portable MCE
interfaces, is very complicated, ...

Obviously I'm biased on this, but I went through many of these mistakes
already myself when going from the 2.4 to 2.6 MCE handlers.

I can see it still being used for old chipsets or chipsets that
don't support machine checks for memory errors, but these should
be mostly legacy.

> - Which cpu address did the error happen at.
>   So we can kill the processes using that memory.
>   Although simply killing the entire machine appears acceptable.
>
> - What is the chipset's idea of which DIMM the memory error occurred on.
>   For bus based memory architectures like the opteron this
>   is a chip select of the DIMM rank.
>   For serial memory architectures this is some kind of bus address,
>   but still useful for describing individual chips.
>
> - What is the silk screen label on the motherboard that corresponds
>   to the chip selects with problems.
>
> If you look at the memory controller, and the associated error
> reporting registers (which are sometimes available in the machine check),
> there has always been enough information to determine the hardware
> address the memory controller knows the DIMM by.
>
> Getting the address of the error is usually possible but not always
> and not always very reliably.
>
> Mapping between the hardware address that the memory controller
> knows DIMMS by and the actual DIMMS themselves is actually
> pretty easy even if you don't have any motherboard information.

That's all supposed to be done by the standard machine check handlers.

I think EDAC just started because some older chipsets don't integrate
error reporting into the standard machine checks. But in newer systems
which are the way forward you get it from standard MCEs, no need
for special drivers anymore.

> It is just a matter of plugging in DIMMS in different positions
> and seeing which DIMMS the hardware currently sees. It's
> maybe half a day's work on an unknown motherboard.

Sorry that's totally unrealistic for anybody outside a hardware vendor
or perhaps a big supercomputing lab. Normal users don't want to sit
down "half a day" with their new systems to figure out to what
the DIMMs map. It either has to "just work" or they won't be able
to use it.

We need to figure out some way to do it automatically. While SMBIOS
is not perfect, it is far better than any manual proposals.

That said I know SMBIOS can be wrong, so allowing it to be overridden
makes sense. But requiring the users to do this by default is a
complete non starter IMHO.

>
> knows the DIMM by requires the reading of hardware registers,
> some that are not easily accessible to user space so a kernel driver
> tends to make sense, just to get the information.
>
> Possibly we could just export that information and let the
> user space figure it out from there. But memory is a key system

You can do it completely in user space. See mcelog as proof.
And figuring out the channel in a lot of code etc. seems overkill to me - or
at least I haven't gotten an explanation why it's better than just
using the reported address.

> component and hardware designers are very creative so coming
> up with a consistent model would be very hard. So far we

Yes the error reporting is still machine specific so far. Doing it
generically would be good.

> have had to improve our helper functions every couple of chipsets
> The other pieces to me seem much more fluid. Especially since EDAC
> does not yet export much if anything to user space except through
> printk's in any stable kernel.

Yes that's another issue. printks are not very good for this. That is
why I went over to a specialized logging device.

> As for the suggestion of using DMI as best as I can determine it
> suffers rather badly from the never ending creativity of the chipset
> developers and does not have a model that can describe what needs
> to happen for the current generation of chipset much less the bleeding
> edge ones. Which is besides the fact that the only thing that you can
> usually trust in DMI tables is the motherboard manufacturer.

I think you paint it worse than it is. Also there are no realistic
alternatives that I can see. Requiring all users to do it by hand
is certainly not one.

> I do think getting the motherboard id out of DMI provides a great key
> to build a memory controller hardware address to DIMM label lookup
> table. With EDAC we have been computing that information in user
> space and caching it kernel side so we could generate immediately
> useful print statements. Which is handy but probably not necessary.

Ok that is a proposal, but still won't cover most motherboards that
are out there.

Having an override table for motherboards where DMI is known
to be wrong certainly makes sense to me. But the default has to be DMI
I think. If there was such a table somewhere I would be happy to
support it with mcelog.
But who would be willing to maintain such a table? It would be a lot
of work.

I'm still optimistic though - if Linux starts to use this information
more aggressively then there will be much pressure from customers
at least on server level kit vendors who still get this wrong.

This won't help all the cheap desktop/laptop boards, but these tend to
usually not have more than two DIMMs, so it's not that big an issue.

-Andi
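
The lookup policy discussed in this exchange (an override table keyed by motherboard id, with the DMI label as the default) could be sketched as follows. Everything concrete here is hypothetical: the function name `dimm_label`, the whitespace-separated table format (board id, chip select, label), and any sample ids are assumptions for illustration, not an existing mcelog or EDAC interface.

```shell
#!/bin/sh
# dimm_label TABLE BOARD CHIPSELECT DMI_LABEL
# Print the override label for (BOARD, CHIPSELECT) if TABLE has one,
# otherwise fall back to the label that DMI reported.
# Assumed table format, one entry per line:
#   <board-id> <chip-select> <silk-screen-label>
dimm_label() {
    awk -v board="$2" -v cs="$3" -v dmi="$4" '
        $1 == board && $2 == cs { label = $3 }
        END {
            if (label == "") label = dmi   # no override: DMI is the default
            print label
        }
    ' "$1"
}
```

Keeping DMI as the fallback means a board with no table entry behaves exactly as it would without the table, so the table only has to list boards whose DMI data is known to be wrong.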

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Eric W. Biederman @ 2006-07-06 15:31 UTC
To: Andi Kleen
Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

Andi Kleen <ak@muc.de> writes:

> On Thu, Jul 06, 2006 at 12:12:14AM -0600, Eric W. Biederman wrote:
>>
>> knows the DIMM by requires the reading of hardware registers,
>> some that are not easily accessible to user space so a kernel driver
>> tends to make sense, just to get the information.
>>
>> Possibly we could just export that information and let the
>> user space figure it out from there. But memory is a key system
>
> You can do it completely in user space. See mcelog as proof.
>
> And figuring out the channel in a lot of code etc. seems overkill to me - or
> at least i haven't gotten an explanation why it's better than just
> using the reported address.

So breaking this down simply.

With EDAC on my next boot I get positive confirmation that I either
pulled the DIMM that the error happened on, or I pulled a different
DIMM.

Mapping the hardware addresses to the motherboard silk screen label
before hand is unnecessary and just ensures that you pull out the DIMM
you are trying for the first time. Making it an optimization for
people who do that a lot.

To the best of my knowledge mcelog even with the --dmi option cannot
give me that.

Knowing that we actually pulled out the DIMM that the errors were
reported against is what we get by going beyond the address in the
machine check.

Does that cross the communication divide?

Eric

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Andi Kleen @ 2006-07-06 16:51 UTC
To: Eric W. Biederman
Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

> With EDAC on my next boot I get positive confirmation that I either
> pulled the DIMM that the error happened on, or I pulled a different
> DIMM.

How? You simulate a new error and let EDAC resolve it?

>
> Mapping the hardware addresses to the motherboard silk screen label
> before hand is unnecessary and just ensures that you pull out the DIMM
> you are trying for the first time. Making it an optimization for
> people who do that a lot.

Sorry I didn't parse that.

> To the best of my knowledge mcelog even with the --dmi option cannot
> give me that.

You mean identify if a given DIMM is still plugged in? You can get that
information from dmidecode

-Andi
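
Checking from dmidecode output whether a given DIMM is still plugged in might look like the sketch below. It assumes the `Size:` and `Locator:` field names that dmidecode prints for memory device (type 17) records, with `No Module Installed` as the size of an empty slot. The function name `populated_dimms` is made up here, and the labels it prints are only as trustworthy as the BIOS tables behind them.

```shell
#!/bin/sh
# populated_dimms
# Read dmidecode memory-device text on stdin and print the Locator
# of every slot that reports an installed module.
populated_dimms() {
    awk -F': ' '
        /^Memory Device/  { size = "" }      # new record: forget the last size
        /^[ \t]*Size:/    { size = $2 }
        /^[ \t]*Locator:/ {
            if (size != "" && size != "No Module Installed") print $2
        }
    '
}
```

A typical invocation would be `dmidecode -t 17 | populated_dimms`, run as root.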

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Eric W. Biederman @ 2006-07-06 17:46 UTC
To: Andi Kleen
Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

Andi Kleen <ak@muc.de> writes:

>> With EDAC on my next boot I get positive confirmation that I either
>> pulled the DIMM that the error happened on, or I pulled a different
>> DIMM.
>
> How? You simulate a new error and let EDAC resolve it?

No. There is a status report that tells you which pieces of hardware
your memory controller sees. It is just a simple list.

>> To the best of my knowledge mcelog even with the --dmi option cannot
>> give me that.
>
> You mean identify if a given DIMM is still plugged in? You can get that
> information from dmidecode

If you can reliably decode an error to a DIMM that DMI reports, then
yes even if DMI gets the label wrong you can reboot and see if the
label you were aiming for is now missing. The principle is the same.

The difference is that you can't reliably use DMI to decode to a DIMM.

If you look at memory controller registers you can reliably do the
same thing without relying on DMI. It works every time.

Isn't something that just works, and is not at the mercy of the BIOS
developers with too little time worth doing?

Eric
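
The reboot check described above amounts to comparing two inventories. As a rough sketch, assume the status report has been saved as a plain text file with one entry per line, once before and once after pulling the DIMM. The file format, the entry names, and the function name `missing_after` are assumptions for illustration, not EDAC's actual output.

```shell
#!/bin/sh
# missing_after BEFORE AFTER
# Print every entry listed in BEFORE that no longer appears in AFTER.
missing_after() {
    # -F fixed strings, -x whole-line match, -v invert the match,
    # -f read the patterns (the surviving entries) from the AFTER file
    grep -Fxv -f "$2" "$1"
}
```

If the entry printed is the one the error was reported against, the right DIMM was pulled. If a different entry is printed, the wrong one was.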

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Andi Kleen @ 2006-07-06 18:08 UTC
To: Eric W. Biederman
Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

On Thu, Jul 06, 2006 at 11:46:00AM -0600, Eric W. Biederman wrote:
> Andi Kleen <ak@muc.de> writes:
>
> >> With EDAC on my next boot I get positive confirmation that I either
> >> pulled the DIMM that the error happened on, or I pulled a different
> >> DIMM.
> >
> > How? You simulate a new error and let EDAC resolve it?
>
> No. There is a status report that tells you which pieces of hardware
> your memory controller sees. It is just a simple list.

Ok but that could be also done easily in user space that reads
PCI config space. No need for a complicated kernel driver at all.

> Isn't something that just works, and is not at the mercy of the BIOS
> developers with too little time worth doing?

I just don't see how it's very useful if you don't know
which DIMM to replace in the first place. And to know that
in your scheme you need your magical database with all motherboards
ever shipped, which I don't consider realistic.

-Andi

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Alan Cox @ 2006-07-06 18:34 UTC
To: Andi Kleen
Cc: Eric W. Biederman, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

On Thu, 2006-07-06 at 20:08 +0200, Andi Kleen wrote:
> > No. There is a status report that tells you which pieces of hardware
> > your memory controller sees. It is just a simple list.
>
> Ok but that could be also done easily in user space that reads
> PCI config space. No need for a complicated kernel driver at all.

The same is true of writing a file system and disk driver so I'm a bit
confused why you think poking around in PCI space from user space is an
argument or given how often such stuff breaks and how messy it gets (eg
X) that we want to encourage it

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Andi Kleen @ 2006-07-06 18:27 UTC
To: Alan Cox
Cc: Eric W. Biederman, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

On Thu, Jul 06, 2006 at 07:34:58PM +0100, Alan Cox wrote:
> On Thu, 2006-07-06 at 20:08 +0200, Andi Kleen wrote:
> > > No. There is a status report that tells you which pieces of hardware
> > > your memory controller sees. It is just a simple list.
> >
> > Ok but that could be also done easily in user space that reads
> > PCI config space. No need for a complicated kernel driver at all.
>
> The same is true of writing a file system and disk driver so I'm a bit
> confused why you think poking around in PCI space from user space is an
> argument or given how often such stuff breaks and how messy it gets (eg
> X) that we want to encourage it

It depends on what you do. First a large part of X's messiness
comes from it not using the proper interfaces.
Or it trying to do complicated things like messing with bridges.

Then anything with MMIO or interrupts or anything dynamic
definitely belongs into kernel space agreed.

But at least on K8 DIMM inventory is purely reading PCI config space on
something that doesn't change and doesn't need any locking.
It also doesn't need to do anything complicated, but just look
for the right PCI ID.

I don't see an issue with such simple static things in user space.
I could probably write it as a shell script that parses lspci output
(not saying that that would be the right way, but it's certainly doable)

-Andi
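
As a rough illustration of the "look for the right PCI ID" step, the filter below picks matching devices out of `lspci -n` output. It assumes the numeric output format `slot class: vendor:device`, and it assumes that 1022:1102 identifies the K8 DRAM controller function. Treat that ID as an assumption to verify against your own hardware. The function name `find_pci_id` is made up, and decoding the DIMM registers themselves is not shown.

```shell
#!/bin/sh
# find_pci_id [VENDOR:DEVICE]
# Read `lspci -n` output on stdin and print the slot of each device
# whose vendor:device ID equals the argument.
# The default ID is an assumed value for the K8 DRAM controller.
find_pci_id() {
    awk -v id="${1:-1022:1102}" '$3 == id { print $1 }'
}
```

On a multi-socket system `lspci -n | find_pci_id` would be expected to print one slot per memory controller.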
* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Eric W. Biederman @ 2006-07-06 19:09 UTC
To: Andi Kleen
Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

Andi Kleen <ak@muc.de> writes:

> It depends on what you do. First a large part of X's messiness
> comes from it not using the proper interfaces.
> Or it trying to do complicated things like messing with bridges.

Yep we sometimes have to mess with complicated things.

> Then anything with MMIO or interrupts or anything dynamic
> definitely belongs into kernel space agreed.

Yep we sometimes have to mess with MMIO.

> But at least on K8 DIMM inventory is purely reading PCI config space on
> something that doesn't change and doesn't need any locking.
> It also doesn't need to do anything complicated, but just look
> for the right PCI ID.

Mostly. Except for the part where you have to figure out the stepping
of the processor connected to the memory controller to properly decode
the registers. AMD should have used the revision field in pci config
space but...

> I don't see an issue with such simple static things in user space.

I agree it should be that simple. But if all of your drivers are not
that simple it doesn't make sense to put half of them in user space
and half of them in the kernel, unless there is a good reason for
them not to be in the kernel.

Eric

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Andi Kleen @ 2006-07-06 19:18 UTC
To: Eric W. Biederman
Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

On Thu, Jul 06, 2006 at 01:09:35PM -0600, Eric W. Biederman wrote:
> > Then anything with MMIO or interrupts or anything dynamic
> > definitely belongs into kernel space agreed.
>
> Yep we sometimes have to mess with MMIO.

Not on K8 at least, no?

Maybe we should discuss each chipset separately :)

> > But at least on K8 DIMM inventory is purely reading PCI config space on
> > something that doesn't change and doesn't need any locking.
> > It also doesn't need to do anything complicated, but just look
> > for the right PCI ID.
>
> Mostly. Except for the part where you have to figure out the stepping
> of the processor connected to the memory controller to properly decode
> the registers. AMD should have used the revision field in pci config
> space but...

That's in /proc/cpuinfo

-Andi

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Eric W. Biederman @ 2006-07-06 19:43 UTC
To: Andi Kleen
Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

Andi Kleen <ak@muc.de> writes:

> On Thu, Jul 06, 2006 at 01:09:35PM -0600, Eric W. Biederman wrote:
>> > Then anything with MMIO or interrupts or anything dynamic
>> > definitely belongs into kernel space agreed.
>>
>> Yep we sometimes have to mess with MMIO.
>
> Not on K8 at least, no?
>
> Maybe we should discuss each chipset separately :)

:)

>> > But at least on K8 DIMM inventory is purely reading PCI config space on
>> > something that doesn't change and doesn't need any locking.
>> > It also doesn't need to do anything complicated, but just look
>> > for the right PCI ID.
>>
>> Mostly. Except for the part where you have to figure out the stepping
>> of the processor connected to the memory controller to properly decode
>> the registers. AMD should have used the revision field in pci config
>> space but...
>
> That's in /proc/cpuinfo

Some of it. Taking a quick glance I can't seem to see a nodeid field.
Not that it is especially likely you would have a system with mixed
revision cpus (it is a pain in the BIOS) but since it is possible it
at least makes sense to try.

Eric

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Eric W. Biederman @ 2006-07-06 18:43 UTC
To: Andi Kleen
Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

Andi Kleen <ak@muc.de> writes:

> On Thu, Jul 06, 2006 at 11:46:00AM -0600, Eric W. Biederman wrote:
>> Andi Kleen <ak@muc.de> writes:
>>
>> >> With EDAC on my next boot I get positive confirmation that I either
>> >> pulled the DIMM that the error happened on, or I pulled a different
>> >> DIMM.
>> >
>> > How? You simulate a new error and let EDAC resolve it?
>>
>> No. There is a status report that tells you which pieces of hardware
>> your memory controller sees. It is just a simple list.
>
> Ok but that could be also done easily in user space that reads
> PCI config space. No need for a complicated kernel driver at all.

User/kernel, the task the driver has to do is the same, so the
complexity doesn't really change. On some chipsets the registers are
memory mapped, on others the memory controller is hidden by default.
All of which are hard to deal with from user space.

>> Isn't something that just works, and is not at the mercy of the BIOS
>> developers with too little time worth doing?
>
> I just don't see how it's very useful if you don't know which DIMM
> to replace in the first place.

You do know which DIMM, you just don't know what the label on the
motherboard is. That is a lot different from knowing that some DIMM
is bad.

> And to know that in your scheme you need
> your magical database with all motherboards ever shipped, which
> I don't consider realistic.

No you do not need a magic database. Having the mapping will save
you a few minutes as you debug your hardware, but it is not critical.

If you have a map of rank to motherboard label the work flow is:
- Look for the label on the motherboard and pull that DIMM.

If you do not have a map of rank to motherboard label the work flow is:
- Make an educated guess
- Boot up and see if you have pulled your target DIMM. If so stop.
- If not turn off the computer
- Replace the DIMM
- Start again with a different DIMM.

A simple linear search will not take long, and because you don't have
to reproduce the error on the problem DIMM it will go much faster
than if all you knew was that a DIMM was bad.

If you replace a lot of bad memory on a particular motherboard you
will either build the map in your head or you will store it in a file.
Once you have the map in a file hopefully it will get sent to the EDAC
maintainer and someone else can be saved the trouble.

Historically memory has had a pretty bad infant mortality so there
are people who replace a lot of memory on particular motherboards.

Eric

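[Editorial sketch, not part of the original message.] The no-map work
flow above amounts to a linear search over physical slots, using the
boot-time status report to see which rank disappeared. The following is
a small simulation of that idea, not EDAC code. The names
search_by_pulling and hidden_slot_to_rank are hypothetical, and it
assumes each DIMM shows up as exactly one rank in the report.

```python
def search_by_pulling(target_rank, hidden_slot_to_rank, slot_order):
    """Simulate pulling one DIMM per boot until the target rank is gone.

    hidden_slot_to_rank stands in for the real wiring, which the person
    doing the search does not know in advance. Returns a tuple of
    (slot that held the target or None, number of boots, learned map).
    """
    all_ranks = set(hidden_slot_to_rank.values())
    learned = {}
    boots = 0
    for slot in slot_order:
        boots += 1
        # What the status report would list after pulling this slot.
        report = all_ranks - {hidden_slot_to_rank[slot]}
        missing = (all_ranks - report).pop()
        learned[slot] = missing  # one more entry for the slot-to-rank map
        if missing == target_rank:
            return slot, boots, learned
        # Wrong DIMM: power off, put it back, try the next slot.
    return None, boots, learned


if __name__ == "__main__":
    wiring = {"A": 2, "B": 0, "C": 1, "D": 3}  # made-up example board
    print(search_by_pulling(1, wiring, ["A", "B", "C", "D"]))
```

The learned map returned at the end is the kind of per-motherboard file
the message suggests could be kept and shared once it has been built.
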
* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Doug Thompson @ 2006-07-05 17:39 UTC
To: Andi Kleen, Alan Cox
Cc: Doug Thompson, akpm, mm-commits, norsk5, linux-kernel, eric biederman

--- Andi Kleen <ak@muc.de> wrote:

> [moving to l-k - the discussion is how EDAC tries
> to duplicate already existing interface including
> a whole new duplicate polling machine check handler]
>
> On Mon, Jul 03, 2006 at 10:28:34PM +0100, Alan Cox wrote:
> > On Mon, 2006-07-03 at 20:48 +0200, Andi Kleen wrote:
> > > The only way to get the slot name <-> address mapping is
> > > to ask the BIOS.
> > >
> > > I bet you hardcoded it for your systems right?
> >
> > Why don't you read the code ? Wouldn't be hard to check now would it.
>
> I'm pretty sure I'm right from the code, but I was asking for
> confirmation.

The code provides for setting the DIMM labels from userspace (the
mechanism) which can utilize DMI tables or explicit motherboard label
implementation. See below on trustworthiness of DMI tables.

> Ok hardcoding was perhaps the wrong word, but what they output
> isn't useful to identify the broken DIMM if you don't have very
> detailed documentation of the motherboards which 99+% of all users
> don't.

Without correct DMI tables, this is true - it does require motherboard
information. But if there are DMI tables, then the labels can be
generated from DMI tables. We have noticed too that the 'labels' found
in the DMI tables do NOT match the silkscreens located on the
motherboards proper.

> > > Can you describe in more detail why you think that's not the case?
> >
> > I did that, you said "buzzwords" insulted me and deleted the argument
> > then started this second discussion as if it never occurred. Not
> > productive.
>
> It was referring to Doug's assertion that the memory address
> is not enough to identify the DIMM.
>
> I bet it was only because they didn't use the SMBIOS information,
> but again I was asking for confirmation.

EDAC does NOT use the SMBIOS information. It provides a userland
interface for setting/getting DIMM labels. 75% plus of our systems ship
with LinuxBIOS. The rest with factory BIOS for various reasons.
LinuxBIOS has NO DMI/SMBIOS information available for searching.

Our LinuxBIOS engineers have found that the majority of the DMI/SMBIOS
tables are incorrect and provide a false sense of security in terms of
getting the right information that is needed in finding failing devices
(DIMMs).

In Bluesmoke, which is the nursery of EDAC, it utilizes a mechanism to
set/get a CSROW/CHANNEL DIMM with a label from userspace. In the generic
case, the userland script "can" parse the DMI tables if they are there
and the user wishes to perform such automatic labeling. An override
allows the explicit specification of vendor/model of motherboard. It
does use a file based database for lookup - but that is simply an
implementation.

Using the same Bluesmoke 'pattern' of setter/getter DIMM labeler, EDAC
is providing the mechanism of labeling the DIMMs. Where that label
information comes from is pushed to userland: the implementations can be
expanded to use DMI tables if desired (dmidecode based script), or an
explicit labeling script.

Currently, the EDAC scripts are not written, as they are awaiting the
final sysfs interface. But the pattern is the same as the working
bluesmoke scripts. And they can be added to with more policy.

Our users demand 100% correct DIMM labeling for error fault isolation,
with minimal manual operation - that is the requirement we are trying
to satisfy. These items are what led to the Bluesmoke/EDAC labeling
solution pattern.

doug thompson

> Regarding your buzzwords: I don't think mcelog is in any way
> less "manageable" or "consistent" than EDAC.
>
> > > Hmm, i haven't checked, but my understanding was that the newer
> > > Intel chipsets all forwarded the memory errors as machine
> > > check anyways.
> >
> > Quite a few still in use do not. We also have no idea where the future
>
> New ones? Would surprise me.
>
> Yes the machine check architecture doesn't try to handle all old
> systems, but then in practice error reporting on old x86 systems
> doesn't tend to work particularly well either.
>
> > > I also don't think it's very fortunate to put all the complicated
> > > decoding code into kernel space. It works just fine in user space.
> > > Can you explain what value add it gives over machine checks in
> > > modern systems?
> >
> > See my original email, it provides consistency and means that we have
> > the same interface for different setups. That is very important just
> > like not having "reiser4_open()" "ext3_open" and the like is.
>
> mce code also uses a consistent interface - it's even the same
> code in kernel space for all systems.
>
> > It's also zero cost to people who don't choose to use the EDAC interface.
> > The alternative is that every single monitoring and hardware management
> > tool for Linux has to have its own set of glue interfaces for all the
> > different processor and chip specific details.
>
> At least for machine checks the mce interface is a single interface.
>
> We don't have a generic interface for logging some of the other errors
> (like PCI-E errors), but I don't see EDAC solving that. In some ways
> it's understandable because there is no generic PCI-E error handling
> code at all yet.
>
> > > > Sorry about that. I saved your email, but at that time got overwhelmed
> > > > in other matters and just recently got back into EDAC. I apologize for
> > > > not responding sooner.
> > >
> > > Well you wasted a lot of time then redoing what's already done.
> >
> > The ecc code predates the MCE bits by years. The re-doing occurred
> > rather earlier. Rather more useful would be to get the common interface
>
> Earlier than the x86-64 machine check code?
>
> > provided by things like EDAC provided by the fairly CPU specific mce
> > code for the newer chips with a clean interface between the two and the
> > minimum of duplicated code.
>
> You could convert the EDAC drivers to log pseudo events with
> mce_log() like Intel thermal, AMD ecc threshold do. All the heavy
> decoding should be in user space in mcelog.
>
> Giving a consistent sysfs interface is a bit harder, but I suppose one
> could change the code to provide pseudo banks for enable/disable too.
> However that would be system specific again, so a default "all on/all off"
> policy might be quite ok.
>
> -Andi

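[Editorial sketch, not part of the original message.] The setter/getter
labeling pattern described in the message above, with an explicit
vendor/model entry taking priority over DMI-derived labels, might look
roughly like this in a userland script. Everything here is hypothetical:
DimmLabels, apply_labels, LABEL_DB and the board names are invented for
illustration, and nothing below is the real Bluesmoke or EDAC interface.

```python
# Hypothetical file-style database:
# (vendor, model) -> {(csrow, channel): silkscreen label}
LABEL_DB = {
    ("ExampleVendor", "ExampleBoard"): {(0, 0): "DIMM_A1", (0, 1): "DIMM_B1"},
}


class DimmLabels:
    """In-memory stand-in for a per-(csrow, channel) label setter/getter."""

    def __init__(self):
        self._labels = {}

    def set_label(self, csrow, channel, label):
        self._labels[(csrow, channel)] = label

    def get_label(self, csrow, channel):
        # An empty string means no label has been set for this position.
        return self._labels.get((csrow, channel), "")


def apply_labels(store, vendor, model, dmi_labels=None, db=LABEL_DB):
    """Label DIMMs, preferring an explicit board entry over DMI data.

    Returns the number of labels that were set.
    """
    table = db.get((vendor, model)) or dmi_labels or {}
    for (csrow, channel), label in table.items():
        store.set_label(csrow, channel, label)
    return len(table)


if __name__ == "__main__":
    store = DimmLabels()
    apply_labels(store, "ExampleVendor", "ExampleBoard")
    print(store.get_label(0, 1))
```

Keeping the source of the labels outside the store mirrors the split the
message describes: the mechanism only sets and gets labels, while the
policy of where labels come from lives in the script.
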
* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Andi Kleen @ 2006-07-05 19:39 UTC
To: Doug Thompson
Cc: Alan Cox, akpm, mm-commits, norsk5, linux-kernel, eric biederman

Ok since you didn't cover it I assume you agree that just using the
address to get the DIMM is sufficient. Thanks.

> Our LinuxBIOS engineers have found that the majority of the DMI/SMBIOS
> tables are incorrect and provide a false sense of security in terms of
> getting the right information that is needed in finding failing devices
> (DIMMs).

Hmm, I found a few outliers[1] but most systems I checked were reasonable
or had only small problems. I could however not always verify the
mappings by pulling out DIMMs.

Anyways why does LinuxBIOS not just supply a DMI table? Would seem to me
like a vastly more elegant solution than requiring something in user
space to identify the system in other ways.

I don't even want to guess how you identify systems without a DMI table ...

[1] A few not to be named but well known vendors seem to be too lazy
to set the tables up properly and always mapped all addresses to all
DIMMs. Since it's a serious RAS disadvantage for their systems I suppose
angry customers will sooner or later fix that issue though.

> Our users demand 100% correct DIMM labeling for error fault isolation,
> with minimal manual operation - that is the requirement we are trying
> to satisfy. These items are what led to the Bluesmoke/EDAC labeling
> solution pattern.

Ok I can see that. But it makes it a very narrow solution because other
people don't know as much about their hardware as you do.

For mainline Linux we should try to focus support on standard mainstream
PC hard&firm&software, not custom systems like you seem to attempt to.

If you find wrong SM tables to be a serious problem I guess it would be
possible to add a way to overwrite them in mcelog.

Anyways you haven't described anything so far that the existing machine
check handler/mcelog cannot do (mcelog with some small tweaks)

-Andi

Thread overview: 19+ messages
[not found] <20060701150430.GA38488@muc.de>
[not found] ` <20060703172633.50366.qmail@web50109.mail.yahoo.com>
[not found] ` <20060703184836.GA46236@muc.de>
[not found] ` <1151962114.16528.18.camel@localhost.localdomain>
2006-07-04 9:23 ` + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree Andi Kleen
2006-07-04 10:09 ` Alan Cox
2006-07-04 11:34 ` Andi Kleen
2006-07-05 22:08 ` Alan Cox
2006-07-05 22:04 ` Andi Kleen
2006-07-06 6:12 ` Eric W. Biederman
2006-07-06 13:01 ` Andi Kleen
2006-07-06 15:31 ` Eric W. Biederman
2006-07-06 16:51 ` Andi Kleen
2006-07-06 17:46 ` Eric W. Biederman
2006-07-06 18:08 ` Andi Kleen
2006-07-06 18:34 ` Alan Cox
2006-07-06 18:27 ` Andi Kleen
2006-07-06 19:09 ` Eric W. Biederman
2006-07-06 19:18 ` Andi Kleen
2006-07-06 19:43 ` Eric W. Biederman
2006-07-06 18:43 ` Eric W. Biederman
2006-07-05 17:39 ` Doug Thompson
2006-07-05 19:39 ` Andi Kleen