Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree

public inbox for linux-kernel@vger.kernel.org

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
       [not found]     ` <1151962114.16528.18.camel@localhost.localdomain>
@ 2006-07-04  9:23       ` Andi Kleen
  2006-07-04 10:09         ` Alan Cox
  2006-07-05 17:39         ` Doug Thompson
  0 siblings, 2 replies; 19+ messages in thread
From: Andi Kleen @ 2006-07-04  9:23 UTC (permalink / raw)
  To: Alan Cox; +Cc: Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

[moving to l-k - the discussion is how EDAC tries
to duplicate already existing interface including
a whole new duplicate polling machine check handler]

On Mon, Jul 03, 2006 at 10:28:34PM +0100, Alan Cox wrote:
> Ar Llu, 2006-07-03 am 20:48 +0200, ysgrifennodd Andi Kleen:
> > The only way to get the slot name <-> address mapping is 
> > to ask the BIOS.
> > 
> > I bet you hardcoded it for your systems right?
> 
> Why don't you read the code ? Wouldn't be hard to check now would it.

I'm pretty sure I'm right from the code, but I was asking for confirmation.

Ok hardcoding was perhaps the wrong word, but what they output
isn't useful to identify the broken DIMM if you don't have very detailed 
documentation of the motherboards which 99+% of all users don't.

> 
> > Can you describe in more detail why you think that's not the case?
> 
> I did that, you said "buzzwords" insulted me and deleted the argument
> then started this second discussion as if it never occurred. Not
> productive.

It was referring to Doug's assertion that the memory address
is not enough to identify the DIMM.

I bet it was only because they didn't use the SMBIOS information,
but again I was asking for confirmation.

Regarding your buzzwords: I don't think mcelog is in any way
less "manageable" or "consistent" than EDAC.

> > Hmm, i haven't checked, but my understanding was that the newer
> > Intel chipsets all forwarded the memory errors as machine 
> > check anyways.
> 
> Quite a few still in use do not. We also have no idea where the future

New ones?  Would surprise me.

Yes the machine check architecture doesn't try to handle all old systems,
but then in practice error reporting on old x86 systems doesn't tend
to work particularly well either.

> 
> > I also don't think it's very fortunate to put all the complicated
> > decoding code into kernel space. It works just fine in user space.
> > Can you explain what value add it gives over machine checks in
> > modern systems?
> 
> See my original email, it provides consistency and means that we have
> the same interface for different setups. That is very important just
> like not having "reiser4_open()" "ext3_open" and the like is.

mce code also uses a consistent interface - it's even the same
code in kernel space for all systems.

> It's also zero cost to people who don't chose to use the EDAC interface.
> The alternative is that every single monitoring and hardware management
> tool for Linux has to have its own set of glue interfaces for all the
> different processor and chip specific details.

At least for machine checks the mce interface is a single interface.

We don't have a generic interface for logging some of the other errors
(like PCI-E errors), but I don't see EDAC solving that. In some ways
it's understandable because there is no generic PCI-E error handling
code at all yet.

> 
> > > Sorry about that. I saved your email, but at that time got overwhelmed
> > > in other matters and just recently got back into EDAC. I apologize for
> > > not responding sooner.
> > 
> > Well you wasted a lot of time then redoing what's already done.
> 
> The ecc code predates the MCE bits by years. The re-doing occurred
> rather earlier. Rather more useful would be to get the common interface

Earlier than the x86-64 machine check code?

> provided by things like EDAC provided by the fairly CPU specific mce
> code for the newer chips with a clean interface between the two and the
> minimum of duplicated code.

You could convert the EDAC drivers to log pseudo events with mce_log() like Intel
thermal, AMD ecc threshold do. All the heavy decoding should be in user space
in mcelog.

Giving a consistent sysfs interface is a bit harder, but I suppose one 
could change the code to provide pseudo banks for enable/disable too.
However that would be system specific again, so a default "all on/all off" 
policy might be quite ok.

-Andi

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-04  9:23       ` + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree Andi Kleen
@ 2006-07-04 10:09         ` Alan Cox
  2006-07-04 11:34           ` Andi Kleen
  2006-07-05 17:39         ` Doug Thompson
  1 sibling, 1 reply; 19+ messages in thread
From: Alan Cox @ 2006-07-04 10:09 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

Ar Maw, 2006-07-04 am 11:23 +0200, ysgrifennodd Andi Kleen:
> Regarding your buzzwords: I don't think mcelog is in any way
> less "manageable" or "consistent" than EDAC.

It's chip specific rather than generalised, so you need awareness of it.

> > > Hmm, i haven't checked, but my understanding was that the newer
> > > Intel chipsets all forwarded the memory errors as machine 
> > > check anyways.
> > 
> > Quite a few still in use do not. We also have no idea where the future
> 
> New ones?  Would surprise me.

All the world is not x86. 

> Yes the machine check architecture doesn't try to handle all old systems,
> but then in practice error reporting on old x86 systems doesn't tend
> to work particularly well either.

It's pretty solid on the AMD 32bit chipsets and some of the older Intel
ones. 

> mce code also uses a consistent interface - it's even the same
> code in kernel space for all systems.

For the subset of cases it supports.

> We don't have a generic interface for logging some of the other errors
> (like PCI-E errors), but I don't see EDAC solving that. In some ways
> it's understandable because there is no generic PCI-E error handling
> code at all yet.

EDAC solves that for the PCI bus side. It's only solving the logging
side not the "ok it exploded, now what" question - although there are
some unrelated IBM patches in that area.

> > The ecc code predates the MCE bits by years. The re-doing occurred
> > rather earlier. Rather more useful would be to get the common interface
> 
> Earlier than the x86-64 machine check code?

Linux 1.2 I believe, certainly by 2.0

> Giving a consistent sysfs interface is a bit harder, but I suppose one 
> could change the code to provide pseudo banks for enable/disable too.
> However that would be system specific again, so a default "all on/all off" 
> policy might be quite ok.

I think we need the basic consistent sysfs case. Whether that is
provided by the mcelog code in the AMD64 case, or by an exported hook
from the MCE interfaces for AMD64 or duplicating the code in EDAC isn't
so important (avoiding duplication aside of course).


Alan


* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-04 10:09         ` Alan Cox
@ 2006-07-04 11:34           ` Andi Kleen
  2006-07-05 22:08             ` Alan Cox
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2006-07-04 11:34 UTC (permalink / raw)
  To: Alan Cox; +Cc: Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

On Tue, Jul 04, 2006 at 11:09:47AM +0100, Alan Cox wrote:
> Ar Maw, 2006-07-04 am 11:23 +0200, ysgrifennodd Andi Kleen:
> > Regarding your buzzwords: I don't think mcelog is in any way
> > less "manageable" or "consistent" than EDAC.
> 
> Its chip specific rather than generalised so you need awareness of it.

You mean the final output?

I guess it would be possible to add a generic output format
for memory errors in mcelog, but it's not clear you can always get
the same information from different chipsets.


> > > > Hmm, i haven't checked, but my understanding was that the newer
> > > > Intel chipsets all forwarded the memory errors as machine 
> > > > check anyways.
> > > 
> > > Quite a few still in use do not. We also have no idea where the future
> > 
> > New ones?  Would surprise me.
> 
> All the world is not x86. 

The rest of the world either doesn't do significant error handling
(embedded, lowend) or has its own machine check error handling
systems similar to mcelog (POWER, IA64).

Ok Sparc, pa-risc, old SGI mips are left out currently but I'm sure the 
maintainers will attack that eventually if there is need. 

> > We don't have a generic interface for logging some of the other errors
> > (like PCI-E errors), but I don't see EDAC solving that. In some ways
> > it's understandable because there is no generic PCI-E error handling
> > code at all yet.
> 
> EDAC solves that for the PCI bus side. It's only solving the logging
> side not the "ok it exploded, now what" question - although there are
> some unrelated IBM patches in that area.

Yes some of that might be useful still for legacy systems.

In the future it should be more standardized with the standard x86
machine check architecture and standardized PCI Express advanced
error handling. So generic drivers should do the heavy lifting.

I'm not disputing it is still useful for some old systems, it just
doesn't seem to be the right path forward for new ones.

Is there work going on to hook up the old EDAC drivers for PCI errors to 
the new error handling? 


> > > The ecc code predates the MCE bits by years. The re-doing occurred
> > > rather earlier. Rather more useful would be to get the common interface
> > 
> > Earlier than the x86-64 machine check code?
> 
> Linux 1.2 I believe, certainly by 2.0

Doubtful you wrote a K8 error handler in that time frame ;-)

> 
> > Giving a consistent sysfs interface is a bit harder, but I suppose one 
> > could change the code to provide pseudo banks for enable/disable too.
> > However that would be system specific again, so a default "all on/all off" 
> > policy might be quite ok.
> 
> I think we need the basic consistent sysfs case. Whether that is

What should i do?

-Andi

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-04  9:23       ` + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree Andi Kleen
  2006-07-04 10:09         ` Alan Cox
@ 2006-07-05 17:39         ` Doug Thompson
  2006-07-05 19:39           ` Andi Kleen
  1 sibling, 1 reply; 19+ messages in thread
From: Doug Thompson @ 2006-07-05 17:39 UTC (permalink / raw)
  To: Andi Kleen, Alan Cox
  Cc: Doug Thompson, akpm, mm-commits, norsk5, linux-kernel,
	eric biederman

--- Andi Kleen <ak@muc.de> wrote:

> [moving to l-k - the discussion is how EDAC tries
> to duplicate already existing interface including
> a whole new duplicate polling machine check handler]
> 
> On Mon, Jul 03, 2006 at 10:28:34PM +0100, Alan Cox wrote:
> > Ar Llu, 2006-07-03 am 20:48 +0200, ysgrifennodd Andi Kleen:
> > > The only way to get the slot name <-> address mapping is 
> > > to ask the BIOS.
> > > 
> > > I bet you hardcoded it for your systems right?
> > 
> > Why don't you read the code ? Wouldn't be hard to check now would
> it.
> 
> I'm pretty sure I'm right from the code, but I was asking for
> confirmation.

The code provides for setting the DIMM labels from userspace (the
mechanism) which can utilize DMI tables or explicit motherboard label
implementation. See below on trustworthiness of DMI tables.

> 
> Ok hardcoding was perhaps the wrong word, but what they output
> isn't useful to identify the broken DIMM if you don't have very
> detailed 
> documentation of the motherboards which 99+% of all users don't.

Without correct DMI tables, this is true - it does require motherboard
information. But if there are DMI tables, then the labels can be
generated from DMI tables.  We have noticed too that the 'labels' found
in the DMI tables do NOT match the silkscreens located on the
motherboards proper. 

> 
> > 
> > > Can you describe in more detail why you think that's not the
> case?
> > 
> > I did that, you said "buzzwords" insulted me and deleted the
> argument
> > then started this second discussion as if it never occurred. Not
> > productive.
> 
> It was refering to Doug's assertation that the memory address
> is not enough to identify the DIMM.
> 
> I bet it was only because they didn't use the SMBIOS information,
> but again I was asking for confirmation.

EDAC does NOT use the SMBIOS information. It provides userland
interface for setting/getting DIMM labels.

75% plus of our systems ship with LinuxBIOS. The rest with factory BIOS
for various reasons. LinuxBIOS has NO DMI/SMBIOS information available
for searching.

Our LinuxBIOS engineers have found that the majority of the DMI/SMBIOS
tables are incorrect and provide a false sense of security in terms of
getting the right information that is needed in finding failing devices
(DIMMs).

In Bluesmoke, which is the nursery of EDAC, it utilizes a mechanism to
set/get a CSROW/CHANNEL DIMM with a label from userspace. In the
generic case, the userland script "can" parse the DMI tables if they
are there and the user wishes to perform such automatic labeling. An
override allows the explicit specification of vendor/model of
motherboard. It does use a file based database for lookup - but that is
simply an implementation.

Using the same Bluesmoke 'pattern' of setter/getter DIMM labeler, EDAC
is providing the mechanism of labeling the DIMMs. Where that label
information comes from is pushed to userland: the implementations can
be expanded to use DMI tables if desired (dmidecode based script), or
an explicit labeling script. Currently, the EDAC scripts are not
written, as they are awaiting final sysfs interface. But the pattern is
the same as the working bluesmoke scripts. And they can be added to
with more policy.
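
To make the pattern concrete, here is a minimal sketch of the userland lookup half only. Everything in it is an assumption made for illustration: the `lookup_label` name, the one-line-per-slot database format (vendor, model, csrow, channel and label separated by `|`), and the sample entries in the usage note. It is not the Bluesmoke script, and it deliberately does not touch sysfs since that interface is not final.

```shell
#!/bin/sh
# Hypothetical database format, one slot per line:
#   vendor|model|csrow|channel|label
# Prints the label for the given slot; returns non-zero if there is no entry.
lookup_label() {
    db=$1
    vendor=$2
    model=$3
    csrow=$4
    channel=$5
    awk -F'|' -v v="$vendor" -v m="$model" -v r="$csrow" -v c="$channel" '
        # First matching line wins.
        $1 == v && $2 == m && $3 == r && $4 == c { print $5; found = 1; exit }
        END { exit found ? 0 : 1 }
    ' "$db"
}
```

With a database line such as `ExampleVendor|BoardX|0|1|DIMM_B1`, calling `lookup_label labels.db ExampleVendor BoardX 0 1` would print `DIMM_B1`. A separate script could then push that string through whatever setter the final interface provides.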

Our users demand 100% correct DIMM labeling for error fault isolation,
with minimal manual operation - that is the requirement we are trying
to satisfy. These items are what led to the Bluesmoke/EDAC labeling
solution pattern.

doug thompson

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-05 17:39         ` Doug Thompson
@ 2006-07-05 19:39           ` Andi Kleen
  0 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2006-07-05 19:39 UTC (permalink / raw)
  To: Doug Thompson
  Cc: Alan Cox, akpm, mm-commits, norsk5, linux-kernel, eric biederman

Ok since you didn't cover it I assume you agree that just using
the address to get the DIMM is sufficient. Thanks.

> Our LinuxBIOS engineers have found that the majority of the DMI/SMBIOS
> tables are incorrect and provide a false sense of security in terms of
> getting the right information that is needed in finding failing devices
> (DIMMs).

Hmm, I found a few outliers[1] but most systems I checked were 
reasonable or had only small problems. I could however not 
always verify the mappings by pulling out DIMMs. 

Anyways why does LinuxBIOS not just supply a DMI table? Would 
seem to me like a vastly more elegant solution than requiring
something in user space to identify the system in other ways.

I don't even want to guess how you identify systems without
a DMI table ...

[1] A few not to be named but well known vendors seem to be too lazy
to set the tables up properly and always mapped all addresses to all DIMMs. 
Since it's a serious RAS disadvantage for their systems I suppose
angry customers will sooner or later fix that issue though.

> Our users demand 100% correct DIMM labeling for error fault isolation,
> with minimal manual operation - that is the requirement we are trying
> to satisfy. These items are what lead to the Bluesmoke/EDAC labeling
> solution pattern.

Ok I can see that. But it makes it a very narrow solution because
other people don't know as much about their hardware as you do.

For mainline Linux we should try to focus support on standard mainstream PC 
hardware, firmware and software, not custom systems as you seem to be doing.

If you find wrong SM tables to be a serious problem I guess
it would be possible to add a way to overwrite them in mcelog.

Anyways you haven't described anything so far that the existing
machine check handler/mcelog cannot do (mcelog with some small tweaks) 

-Andi

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-05 22:08             ` Alan Cox
@ 2006-07-05 22:04               ` Andi Kleen
  2006-07-06  6:12                 ` Eric W. Biederman
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2006-07-05 22:04 UTC (permalink / raw)
  To: Alan Cox; +Cc: Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

On Wed, Jul 05, 2006 at 11:08:21PM +0100, Alan Cox wrote:
> Ar Maw, 2006-07-04 am 13:34 +0200, ysgrifennodd Andi Kleen:
> > > > Giving a consistent sysfs interface is a bit harder, but I suppose one 
> > > > could change the code to provide pseudo banks for enable/disable too.
> > > > However that would be system specific again, so a default "all on/all off" 
> > > > policy might be quite ok.
> > > 
> > > I think we need the basic consistent sysfs case. Whether that is
> > 
> > What should i do?

s/i/it/ of course.

Basically what I asked for is what you think that sysfs interface
should do.

You want a single error / no error knob? 

The problem is that anything more detailed requires knowledge of the
specific hardware.

The single knob on standard MCE would be 

for i in /sys/devices/system/machinecheck/*/bank*; do
	echo 0 > "$i"
done

(or 0xfffffffffffffffff to turn everything on) 
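
As a sketch only, the same all-on/all-off knob can be wrapped in a function. The name `set_all_banks` and the optional root argument are made up here for illustration, so that the loop can be tried against a scratch directory; the default path is the one quoted above.

```shell
#!/bin/sh
# Hypothetical helper: write one value to every machine check bank file.
#   $1  value to write (0 disables reporting for the bank)
#   $2  root directory; defaults to the sysfs path quoted in this mail
set_all_banks() {
    value=$1
    root=${2:-/sys/devices/system/machinecheck}
    for f in "$root"/*/bank*; do
        # Skip the unexpanded glob when nothing matches.
        [ -e "$f" ] || continue
        echo "$value" > "$f" || return 1
    done
}
```

Run as root, `set_all_banks 0` would then be the "all off" policy in one call.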

What else?

What we identified as missing is a unified way for all hardware
to report how many errors and on which DIMMs. I think I can easily
add that to mcelog (it would already report it, but in a CPU
specific format) 
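
A rough sketch of what such a unified per-DIMM summary could look like, assuming (hypothetically) that decoded events arrive one per line as `<dimm-label> <corrected|uncorrected>`. That input format and the `count_errors_per_dimm` name are invented for this example; mcelog's real output differs.

```shell
#!/bin/sh
# Reads "<dimm-label> <kind>" lines on stdin (hypothetical format) and
# prints "<dimm-label> <corrected-count> <uncorrected-count>" per DIMM,
# sorted by label. Lines with any other kind are ignored.
count_errors_per_dimm() {
    awk '
        $2 == "corrected"   { ce[$1]++; seen[$1] = 1 }
        $2 == "uncorrected" { ue[$1]++; seen[$1] = 1 }
        END {
            for (d in seen)
                printf "%s %d %d\n", d, ce[d], ue[d]
        }
    ' | sort
}
```

The point of the sketch is only that the summary itself is hardware independent once something upstream has turned a CPU-specific record into a DIMM label.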

> 
> Well personally I would favour the MCE logging stuff staying in because
> its clearly small, compact and enough for many users, and the EDAC stuff
> hooking that feed somehow so that people who want the detail and the

As far as I can figure out there is no more detail offered by it at least
for K8.  All the information that is given by the Northbridge is in the MCE
and the rest for the DIMM topology is in SMBIOS (or could be read from user 
space if really needed) 

I went through a similar development myself BTW. When I wrote
the first Opteron machine check handler for 2.4 I also coded
access to the PCI device and read the registers there.
But later I realized that it's useless because the CPU shadows
all these registers into the regular machine check MSRs. So you
can just get it with a portable handler from there. When I redid
the handler I threw it all out.

Now you seem to want to add it in again ... 

Regarding non K8 x86-64 it would need more research, but I hope
they also dump everything into the MSRs.

> 
> As to filtering and control of the banks - that can always be done by
> filtering what is handed down from the MCE code if I understand it right
> so can be left in the EDAC side.

I think that should be done in user space.

-Andi

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-04 11:34           ` Andi Kleen
@ 2006-07-05 22:08             ` Alan Cox
  2006-07-05 22:04               ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Alan Cox @ 2006-07-05 22:08 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

Ar Maw, 2006-07-04 am 13:34 +0200, ysgrifennodd Andi Kleen:
> > > Giving a consistent sysfs interface is a bit harder, but I suppose one 
> > > could change the code to provide pseudo banks for enable/disable too.
> > > However that would be system specific again, so a default "all on/all off" 
> > > policy might be quite ok.
> > 
> > I think we need the basic consistent sysfs case. Whether that is
> 
> What should i do?

Well personally I would favour the MCE logging stuff staying in because
it's clearly small, compact and enough for many users, and the EDAC stuff
hooking that feed somehow so that people who want the detail and the
common behaviour across platforms can load the extra module.

As to filtering and control of the banks - that can always be done by
filtering what is handed down from the MCE code if I understand it right
so can be left in the EDAC side.

But that's just my opinion. It is based on what I'm seeing in terms of
feedback from people using EDAC a lot (eg in clusters). 

Alan

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-05 22:04               ` Andi Kleen
@ 2006-07-06  6:12                 ` Eric W. Biederman
  2006-07-06 13:01                   ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Eric W. Biederman @ 2006-07-06  6:12 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

I think if this conversation is going to make headway we
need to step back a minute and ask what makes sense to do
and where, and not get caught up in the details of an implementation.

The goal of the EDAC code is to report errors the hardware has
seen to upper layers of software so someone can do something with
them.    Also it is hoped we can get a moderately standard interface
so that automated tools can recognize and do something when a
problem is reported.  Is this a reasonable set of goals?

Memory errors are by far the most common kind of hardware error
and worth discussing.    For an uncorrectable memory error there
are 3 interesting pieces of information.

- Which cpu address did the error happen at.
  So we can kill the processes using that memory.
  Although simply killing the entire machine appears acceptable.

- What is the chipset's idea of which DIMM the memory error occurred on.
  For bus based memory architectures like the opteron this
  is a chip select of the DIMM rank.
  For serial memory architectures this is some kind of bus address,
  but still useful for describing individual chips.

- What is the silk screen label on the motherboard that corresponds
  to the chip selects with problems.

If you look at the memory controller and the associated error
reporting registers (which are sometimes available in the machine check),
there has always been enough information to determine the hardware
address the memory controller knows the DIMM by.

Getting the address of the error is usually possible but not always
and not always very reliably.

Mapping between the hardware address that the memory controller
knows DIMMs by and the actual DIMMs themselves is actually
pretty easy even if you don't have any motherboard information.
It is just a matter of plugging in DIMMs in different positions
and seeing which DIMMs the hardware currently sees.  It's
maybe half a day's work on an unknown motherboard.

...

Assuming we can agree that this is sane information we want.  The
remaining question is how do we capture it.

Mapping to the hardware address that the memory controller
knows the DIMM by requires reading hardware registers,
some of which are not easily accessible to user space, so a kernel driver
tends to make sense, just to get the information.

Possibly we could just  export that information and let the
user space figure it out from there.   But memory is a key system
component and hardware designers are very creative so coming
up with a consistent model would be very hard.  So far we
have had to improve our helper functions every couple of chipsets
because the old models broke.  Writing a driver split halfway between
the kernel and user space sounds silly.

....

The other pieces to me seem much more fluid.  Especially since EDAC
does not yet export much if anything to user space except through
printk's in any stable kernel.

....

As for the suggestion of using DMI as best as I can determine it
suffers rather badly from the never ending creativity of the chipset
developers and does not have a model that can describe what needs
to happen for the current generation of chipset much less the bleeding
edge ones.  Which is besides the fact that the only thing that you can
usually trust in DMI tables is the motherboard manufacturer.

I do think getting the motherboard id out of DMI provides a great key
to build a memory controller hardware address to DIMM label lookup
table.   With EDAC we have been computing that information in user
space and caching it kernel side so we could generate immediately
useful print statements.  Which is handy but probably not necessary.
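
As an illustration of using the motherboard id as the key, the sketch below normalizes a manufacturer string and a product string into one lookup key. The `board_key` and `normalize` names and the normalization rules are assumptions made for this example. On a machine with usable DMI data the two strings could come from `dmidecode -s baseboard-manufacturer` and `dmidecode -s baseboard-product-name`; on systems without DMI tables they would have to come from somewhere else.

```shell
#!/bin/sh
# Hypothetical helper: lowercase a free-form string, turn every run of
# characters outside a-z0-9 into a single "-", and trim "-" at both ends.
normalize() {
    printf '%s' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | tr -cs 'a-z0-9' '-' \
        | sed 's/^-//; s/-$//'
}

# Build a "vendor:product" key suitable for indexing a label table.
#   $1  manufacturer string
#   $2  product string
board_key() {
    printf '%s:%s\n' "$(normalize "$1")" "$(normalize "$2")"
}
```

Normalizing first is a design choice rather than a requirement: vendor strings tend to vary in case and punctuation between BIOS revisions, so an exact-match key would miss entries that a normalized key finds.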

Eric

* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-06  6:12                 ` Eric W. Biederman
@ 2006-07-06 13:01                   ` Andi Kleen
  2006-07-06 15:31                     ` Eric W. Biederman
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2006-07-06 13:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

On Thu, Jul 06, 2006 at 12:12:14AM -0600, Eric W. Biederman wrote:
> 
> I think if this conversation is going to make headway we
> need to step back a minute, and ask what makes sense to do
> an where and not get caught up the details of an implementation.

From my POV it's rather this:

we already have implementations (most "big" architectures already
have their own advanced hardware error reporting systems).

The EDAC folks want to add another.

It's not clear what advantages it gives.

As far as I can see it is in many ways a step back, at least
compared to the x86-64 code:

uses printk, adds tons of code in kernel that is better
in user space, doesn't use SMBIOS, requires more CPU specific
code instead of using portable MCE interfaces, is very complicated, ...

Obviously I'm biased on this, but I went through many of these
mistakes already myself when going from the 2.4 to 2.6 MCE handlers.

I can see it still being used for old chipsets or chipsets
that don't support machine checks for memory errors, but these
should be mostly legacy.

> - Which cpu address did the error happen at.
>   So we can kill the processes using that memory.
>   Although simply killing the entire machine appears acceptable.
> 
> - What is the chipsets idea of which DIMM the memory error occurred on.
>   For bus based memory architectures like the opteron this
>   is a chip select of the DIMM rank.
>   For serial memory architectures this is some kind of bus address,
>   but still useful for describing individual chips.
> 
> - What is the silk screen label on the motherboard that corresponds
>   to the chip selects with problems.
> 
> If you look at the memory controller, and the associated error
> reporting registers (which are sometimes available in the machine check).
> There has always been enough information to determine the hardware
> address the memory controller knows the DIMM by.
> 
> Getting the address of the error is usually possible but not always
> and not always very reliably.
> 
> Mapping between the hardware address that the memory controller
> knows DIMMS by and the actual DIMMS themselves is actually
> pretty easy even if you don't have any motherboard information.

That's all supposed to be done by the standard machine check handlers.

I think EDAC just started because some older chipsets don't integrate
error reporting into the standard machine checks. 

But in newer systems which are the way forward you get it from
standard MCEs, no need for special drivers anymore.

> It is just a matter of plugging in DIMMS in different positions
> and seeing which DIMMS that the hardware currently sees.  It's
> maybe half a days work on an unknown motherboard.

Sorry that's totally unrealistic for anybody outside a hardware
vendor or perhaps a big supercomputing lab. Normal users don't
want to sit down "half a day" with their new systems to figure
out to what the DIMMs map.

It either has to "just work" or they won't be able to use it.

We need to figure out some way to do it automatically. While 
SMBIOS is not perfect, it is far better than any manual 
proposals

That said I know SMBIOS can be wrong, so allowing to overwrite
it makes sense. But requiring the users to do this by default
is a complete non starter IMHO.

> 
> knows the DIMM by requires the reading of hardware registers,
> some that are not easily accessible to user space so a kernel driver
> tends to make sense, just to get the information.  
> 
> Possibly we could just  export that information and let the
> user space figure it out from there.   But memory is a key system

You can do it completely in user space. See mcelog as proof.

And figuring out the channel in a lot of code etc. seems overkill to me - or 
at least I haven't gotten an explanation of why it's better than just
using the reported address.

> component and hardware designers are very creative so coming
> up with a consistent model would be very hard.  So far we

Yes the error reporting is still machine specific so far.
Doing it generically would be good.

> have had to improve our helper functions every couple of chipsets
> The other pieces to me seem much more fluid.  Especially since EDAC
> does not yet export much if anything to user space except through
> printk's in any stable kernel.

Yes that's another issue. printks are not very good for this.
That is why I went over to a specialized logging device.

> As for the suggestion of using DMI as best as I can determine it
> suffers rather badly from the never ending creativity of the chipset
> developers and does not have a model that can describe what needs
> to happen for the current generation of chipset much less the bleeding
> edge ones.  Which is besides the fact that the only thing that you can
> usually trust in DMI tables is the motherboard manufacturer.

I think you paint it worse than it is. Also there are no
realistic alternatives that I can see. Requiring all users
to do it by hand is certainly not one.

> I do think getting the motherboard id out of DMI provides a great key
> to build a memory controller hardware address to DIMM label lookup
> table.   With EDAC we have been computing that information in user
> space and caching it kernel side so we could generate immediately
> useful print statements.  Which is handy but probably not necessary.

Ok that is a proposal, but still won't cover most motherboards
that are out there.

Having an override table for motherboards where DMI is known to 
be wrong certainly makes sense to me.  But the default has to
be DMI I think.
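
To make the override idea concrete, here is a minimal sketch in Python
of a lookup that uses the DMI label by default and consults an override
table only for boards where DMI is known to be wrong. The vendor and
board names, the table contents and the function name are all made up
for illustration; this is not how mcelog or EDAC store anything.

```python
# Hypothetical override table:
#   (board vendor, board name) -> {DMI label: corrected label}
# The entry below is an invented example, not a real erratum.
DIMM_LABEL_OVERRIDES = {
    ("ExampleVendor", "EX-1234"): {"DIMM1": "DIMM_A0", "DIMM2": "DIMM_B0"},
}


def dimm_label(vendor, board, dmi_label, overrides=DIMM_LABEL_OVERRIDES):
    """Return the label to show the user for a DIMM.

    DMI is the default. The override table only wins for boards that
    have an entry, and only for the labels listed in that entry.
    """
    board_fixes = overrides.get((vendor, board), {})
    return board_fixes.get(dmi_label, dmi_label)
```

With this shape a board without an entry costs nothing to support, and
a wrong BIOS can be corrected one label at a time.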

If there was such a table somewhere I would be happy to support
it with mcelog.

But who would be willing to maintain such a table? It would
be a lot of work.

I'm still optimistic though - if Linux starts to use this
information more aggressively then there will be a lot of pressure
from customers, at least on server-level kit vendors who still get 
this wrong.

This won't help all the cheap desktop/laptop boards, but these
usually don't have more than two DIMMs, so it's not that big
an issue.

-Andi


* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-06 13:01                   ` Andi Kleen
@ 2006-07-06 15:31                     ` Eric W. Biederman
  2006-07-06 16:51                       ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Eric W. Biederman @ 2006-07-06 15:31 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

Andi Kleen <ak@muc.de> writes:

> On Thu, Jul 06, 2006 at 12:12:14AM -0600, Eric W. Biederman wrote:
>> 
>> knows the DIMM by requires the reading of hardware registers,
>> some that are not easily accessible to user space so a kernel driver
>> tends to make sense, just to get the information.  
>> 
>> Possibly we could just  export that information and let the
>> user space figure it out from there.   But memory is a key system
>
> You can do it completely in user space. See mcelog as proof.
>
> And figuring out the channel in a lot of code etc. seems overkill to me - or 
> at least i haven't gotten an explanation why it's better than just
> using the reported address.

So breaking this down simply.

With EDAC on my next boot I get positive confirmation that I either
pulled the DIMM that the error happened on, or I pulled a different
DIMM.

Mapping the hardware addresses to the motherboard silk screen label
beforehand is unnecessary; it just ensures that you pull out the DIMM
you are aiming for on the first try.  That makes it an optimization for
people who do that a lot.

To the best of my knowledge mcelog even with the --dmi option cannot
give me that.

Knowing that we actually pulled out the DIMM that the errors were
reported against is what we get by going beyond the address
in the machine check.

Does that cross the communication divide?

Eric


* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-06 15:31                     ` Eric W. Biederman
@ 2006-07-06 16:51                       ` Andi Kleen
  2006-07-06 17:46                         ` Eric W. Biederman
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2006-07-06 16:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

> With EDAC on my next boot I get positive confirmation that I either
> pulled the DIMM that the error happened on, or I pulled a different
> DIMM.

How? You simulate a new error and let EDAC resolve it?

> 
> Mapping the hardware addresses to the motherboard silk screen label
> before hand is unnecessary and just ensures that you pull out the DIMM
> you are trying for the first time.  Making it an optimization for
> people who do that a lot.

Sorry I didn't parse that. 

> To the best of my knowledge mcelog even with the --dmi option cannot
> give me that.

You mean identify if a given DIMM is still plugged in? You can get that 
information from dmidecode.

-Andi


* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-06 16:51                       ` Andi Kleen
@ 2006-07-06 17:46                         ` Eric W. Biederman
  2006-07-06 18:08                           ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Eric W. Biederman @ 2006-07-06 17:46 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

Andi Kleen <ak@muc.de> writes:

>> With EDAC on my next boot I get positive confirmation that I either
>> pulled the DIMM that the error happened on, or I pulled a different
>> DIMM.
>
> How? You simulate a new error and let EDAC resolve it?

No. There is a status report that tells you which pieces of hardware
your memory controller sees.  It is just a simple list.

>> To the best of my knowledge mcelog even with the --dmi option cannot
>> give me that.
>
> You mean identify if a given DIMM is still plugged in? You can get that 
> information from dmidecode

If you can reliably decode an error to a DIMM that DMI reports, then
yes even if DMI gets the label wrong you can reboot and see if the label
you were aiming for is now missing.  The principle is the same.

The difference is that you can't reliably use DMI to decode to a DIMM.

If you look at memory controller registers you can reliably do the
same thing without relying on DMI.  It works every time.

Isn't something that just works, and is not at the mercy of BIOS
developers with too little time, worth doing?

Eric


* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-06 17:46                         ` Eric W. Biederman
@ 2006-07-06 18:08                           ` Andi Kleen
  2006-07-06 18:34                             ` Alan Cox
  2006-07-06 18:43                             ` Eric W. Biederman
  0 siblings, 2 replies; 19+ messages in thread
From: Andi Kleen @ 2006-07-06 18:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

On Thu, Jul 06, 2006 at 11:46:00AM -0600, Eric W. Biederman wrote:
> Andi Kleen <ak@muc.de> writes:
> 
> >> With EDAC on my next boot I get positive confirmation that I either
> >> pulled the DIMM that the error happened on, or I pulled a different
> >> DIMM.
> >
> > How? You simulate a new error and let EDAC resolve it?
> 
> No. There is a status report that tells you which pieces of hardware
> your memory controller sees.  It is just a simple list.

OK, but that could also be done easily by a user space program that reads
PCI config space. No need for a complicated kernel driver at all.

> Isn't something that just works, and is not at the mercy of the BIOS
> developers with too little time worth doing?

I just don't see how it's very useful if you don't know which DIMM
to replace in the first place. And to know that in your scheme you need
your magical database with all motherboards ever shipped, which
I don't consider realistic.

-Andi



* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-06 18:34                             ` Alan Cox
@ 2006-07-06 18:27                               ` Andi Kleen
  2006-07-06 19:09                                 ` Eric W. Biederman
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2006-07-06 18:27 UTC (permalink / raw)
  To: Alan Cox
  Cc: Eric W. Biederman, Doug Thompson, akpm, mm-commits, norsk5,
	linux-kernel

On Thu, Jul 06, 2006 at 07:34:58PM +0100, Alan Cox wrote:
> Ar Iau, 2006-07-06 am 20:08 +0200, ysgrifennodd Andi Kleen:
> > > No. There is a status report that tells you which pieces of hardware
> > > your memory controller sees.  It is just a simple list.
> > 
> > Ok but that could be also done easily in user space that reads
> > PCI config space. No need for a complicated kernel driver at all.
> 
> The same is true of writing a file system and disk driver so I'm a bit
> confused why you think poking around in PCI space from user space is an
> argument or given how often such stuff breaks and how messy it gets (eg
> X) that we want to encourage it

It depends on what you do. First, a large part of X's messiness
comes from it not using the proper interfaces,
or from it trying to do complicated things like messing with bridges. 

Then anything with MMIO or interrupts or anything dynamic 
definitely belongs in kernel space, agreed.

But at least on K8, DIMM inventory is purely a matter of reading PCI
config space on something that doesn't change and doesn't need any locking. 
It also doesn't need to do anything complicated, just look
for the right PCI ID.

I don't see an issue with such simple static things in user space.

I could probably write it as a shell script that parses lspci output
(not saying that that would be the right way, but it's certainly
doable).
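
As a rough illustration of how little such a tool needs, here is a
sketch in Python rather than shell. It rests on assumptions that should
be checked against the AMD BKDG before being trusted: that the K8 DRAM
controller is PCI function 2 of device 0x18 + node on bus 0, that the
eight DRAM CS Base registers sit at offsets 0x40 to 0x5c, and that bit 0
of each marks the chip select as enabled. The function names are
invented for the example.

```python
import struct

# Assumed K8 layout (verify against the BKDG): eight 32-bit DRAM CS Base
# registers starting at offset 0x40 of function 2, bit 0 = chip select enabled.
DCSB_OFFSET = 0x40
DCSB_COUNT = 8
DCSB_ENABLE = 0x1


def enabled_chip_selects(config_space):
    """Return the indexes of the chip selects marked as enabled.

    config_space is the raw PCI config space of the DRAM controller
    function (at least 0x60 bytes), in little-endian byte order.
    """
    regs = struct.unpack_from("<%dI" % DCSB_COUNT, config_space, DCSB_OFFSET)
    return [i for i, reg in enumerate(regs) if reg & DCSB_ENABLE]


def read_node_config(node=0):
    """Read the config space for one node through sysfs.

    Only meaningful on a K8 system, and reading past the standard
    header usually requires root.
    """
    path = "/sys/bus/pci/devices/0000:00:%02x.2/config" % (0x18 + node)
    with open(path, "rb") as f:
        return f.read(256)
```

The decoding is kept separate from the sysfs read so it can be tried on
a saved config space dump from another machine.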

-Andi 



* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-06 18:08                           ` Andi Kleen
@ 2006-07-06 18:34                             ` Alan Cox
  2006-07-06 18:27                               ` Andi Kleen
  2006-07-06 18:43                             ` Eric W. Biederman
  1 sibling, 1 reply; 19+ messages in thread
From: Alan Cox @ 2006-07-06 18:34 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Eric W. Biederman, Doug Thompson, akpm, mm-commits, norsk5,
	linux-kernel

Ar Iau, 2006-07-06 am 20:08 +0200, ysgrifennodd Andi Kleen:
> > No. There is a status report that tells you which pieces of hardware
> > your memory controller sees.  It is just a simple list.
> 
> Ok but that could be also done easily in user space that reads
> PCI config space. No need for a complicated kernel driver at all.

The same is true of writing a file system and disk driver, so I'm a bit
confused why you think poking around in PCI space from user space is an
argument, or, given how often such stuff breaks and how messy it gets (e.g.
X), that we want to encourage it.




* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-06 18:08                           ` Andi Kleen
  2006-07-06 18:34                             ` Alan Cox
@ 2006-07-06 18:43                             ` Eric W. Biederman
  1 sibling, 0 replies; 19+ messages in thread
From: Eric W. Biederman @ 2006-07-06 18:43 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

Andi Kleen <ak@muc.de> writes:

> On Thu, Jul 06, 2006 at 11:46:00AM -0600, Eric W. Biederman wrote:
>> Andi Kleen <ak@muc.de> writes:
>> 
>> >> With EDAC on my next boot I get positive confirmation that I either
>> >> pulled the DIMM that the error happened on, or I pulled a different
>> >> DIMM.
>> >
>> > How? You simulate a new error and let EDAC resolve it?
>> 
>> No. There is a status report that tells you which pieces of hardware
>> your memory controller sees.  It is just a simple list.
>
> Ok but that could be also done easily in user space that reads
> PCI config space. No need for a complicated kernel driver at all.

User or kernel, the task the driver has to do is the same, so
the complexity doesn't really change.

On some chipsets the registers are memory mapped; on others the
memory controller is hidden by default.

Both cases are hard to deal with from user space.

>> Isn't something that just works, and is not at the mercy of the BIOS
>> developers with too little time worth doing?
>
> I just don't see how it's very useful if you don't know which DIMM
> to replace in the first place. 

You do know which DIMM; you just don't know what the label on the
motherboard is.  That is a lot different from knowing only that some DIMM
is bad.

> And to know that in your scheme you need
> your magical database with all motherboards ever shipped, which
> I don't consider realistic.

No, you do not need a magic database.  Having the mapping will save
you a few minutes as you debug your hardware, but it is not critical.

If you have a map of rank to motherboard label the work flow is:
- Look for the label on the motherboard and pull that DIMM.

If you do not have a map of rank to motherboard label the work flow is:
- Make an educated guess
- Boot up and see if you have pulled your target DIMM.
  If so stop.
- If not turn off the computer
- Replace the DIMM
- Start again with a different DIMM.

A simple linear search will not take long, and because you don't
have to reproduce the error on the problem DIMM it will go much
faster than if all you knew was that a DIMM was bad.
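
The search without a map can be sketched in a few lines of Python. This
is only a model of the work flow above, not EDAC code: the function name
is invented, and ranks_seen_without() stands in for one power-off, pull,
boot cycle followed by reading the controller's status report.

```python
def find_slot_for_rank(bad_rank, slots, ranks_seen_without):
    """Linear search for the slot holding the DIMM behind bad_rank.

    slots is the list of candidate slots, in the order you want to try
    them. ranks_seen_without(slot) returns the set of ranks the memory
    controller reports with the DIMM in that slot removed and every
    other DIMM back in place.

    Returns (slot, boots used), or (None, boots used) if pulling no
    slot made the bad rank disappear.
    """
    boots = 0
    for slot in slots:
        boots += 1
        if bad_rank not in ranks_seen_without(slot):
            return slot, boots
    return None, boots
```

The worst case is one boot per slot, and no boot has to wait for the
error to show up again.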

If you replace a lot of bad memory on a particular motherboard
you will either build the map in your head or you will store it
in a file.  Once you have the map in a file hopefully it will get
sent to the EDAC maintainer and someone else can be saved the
trouble.

Historically memory has had a pretty bad infant mortality so there
are people who replace a lot of memory on particular motherboards.

Eric


* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-06 18:27                               ` Andi Kleen
@ 2006-07-06 19:09                                 ` Eric W. Biederman
  2006-07-06 19:18                                   ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Eric W. Biederman @ 2006-07-06 19:09 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

Andi Kleen <ak@muc.de> writes:

>
> It depends on what you do. First a large part of X's messiness
> comes from it not using the proper interfaces.
> Or it trying to do complicated things like messing with bridges. 

Yep we sometimes have to mess with complicated things.

> Then anything with MMIO or interrupts or anything dynamic 
> definitely belongs into kernel space agreed.

Yep we sometimes have to mess with MMIO.

> But at least on K8 DIMM inventory is purely reading PCI config space on
> something that doesn't change and doesn't need any locking. 
> It also doesn't need to do anything complicated, but just look
> for the right PCI ID.

Mostly.  Except for the part where you have to figure out the stepping
of the processor connected to the memory controller to properly decode
the registers.  AMD should have used the revision field in PCI config
space but...

> I don't see an issue with such simple static things in user space.

I agree it should be that simple. 

But if all of your drivers are not that simple it doesn't make sense
to put half of them in user space and half of them in the kernel,
unless there is a good reason for them not to be in the kernel.

Eric


* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-06 19:09                                 ` Eric W. Biederman
@ 2006-07-06 19:18                                   ` Andi Kleen
  2006-07-06 19:43                                     ` Eric W. Biederman
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2006-07-06 19:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

On Thu, Jul 06, 2006 at 01:09:35PM -0600, Eric W. Biederman wrote:
> > Then anything with MMIO or interrupts or anything dynamic 
> > definitely belongs into kernel space agreed.
> 
> Yep we sometimes have to mess with MMIO.

Not on K8 at least, no? 

Maybe we should discuss each chipset separately :)

> 
> > But at least on K8 DIMM inventory is purely reading PCI config space on
> > something that doesn't change and doesn't need any locking. 
> > It also doesn't need to do anything complicated, but just look
> > for the right PCI ID.
> 
> Mostly.  Except for the part where you have to figure out the stepping
> of the processor connected to the memory controller to properly decode
> the registers.  AMD should have used the revision field in pci config
> space but...

That's in /proc/cpuinfo.
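
A minimal sketch in Python of pulling it out, assuming the x86 layout
of /proc/cpuinfo (blank-line separated blocks with "processor",
"cpu family", "model" and "stepping" fields). The function names are
made up for the example, and it does not attempt to tie a processor to
a memory controller node.

```python
def cpu_revisions(cpuinfo_text):
    """Parse /proc/cpuinfo text into {processor: (family, model, stepping)}."""
    revisions = {}
    current = {}
    # The extra "" makes sure the last block is flushed even if the
    # text does not end with a blank line.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one processor's block.
            if "processor" in current:
                revisions[int(current["processor"])] = (
                    int(current["cpu family"]),
                    int(current["model"]),
                    int(current["stepping"]),
                )
            current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    return revisions


def mixed_revisions(revisions):
    """True if the processors do not all share one (family, model, stepping)."""
    return len(set(revisions.values())) > 1
```

On a live system the input would be the contents of /proc/cpuinfo read
as text.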

-Andi


* Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
  2006-07-06 19:18                                   ` Andi Kleen
@ 2006-07-06 19:43                                     ` Eric W. Biederman
  0 siblings, 0 replies; 19+ messages in thread
From: Eric W. Biederman @ 2006-07-06 19:43 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Doug Thompson, akpm, mm-commits, norsk5, linux-kernel

Andi Kleen <ak@muc.de> writes:

> On Thu, Jul 06, 2006 at 01:09:35PM -0600, Eric W. Biederman wrote:
>> > Then anything with MMIO or interrupts or anything dynamic 
>> > definitely belongs into kernel space agreed.
>> 
>> Yep we sometimes have to mess with MMIO.
>
> Not on K8 at least, no? 
>
> Maybe we should discuss each chipset separatedly :)
 :)

>> > But at least on K8 DIMM inventory is purely reading PCI config space on
>> > something that doesn't change and doesn't need any locking. 
>> > It also doesn't need to do anything complicated, but just look
>> > for the right PCI ID.
>> 
>> Mostly.  Except for the part where you have to figure out the stepping
>> of the processor connected to the memory controller to properly decode
>> the registers.  AMD should have used the revision field in pci config
>> space but...
>
> That's in /proc/cpuinfo

Some of it.  Taking a quick glance I can't seem to see a nodeid field.
Not that it is especially likely you would have a system with mixed revision
CPUs (it is a pain in the BIOS) but since it is possible it at least makes
sense to try.

Eric



end of thread, other threads:[~2006-07-06 19:44 UTC | newest]

Thread overview: 19+ messages
     [not found] <20060701150430.GA38488@muc.de>
     [not found] ` <20060703172633.50366.qmail@web50109.mail.yahoo.com>
     [not found]   ` <20060703184836.GA46236@muc.de>
     [not found]     ` <1151962114.16528.18.camel@localhost.localdomain>
2006-07-04  9:23       ` + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree Andi Kleen
2006-07-04 10:09         ` Alan Cox
2006-07-04 11:34           ` Andi Kleen
2006-07-05 22:08             ` Alan Cox
2006-07-05 22:04               ` Andi Kleen
2006-07-06  6:12                 ` Eric W. Biederman
2006-07-06 13:01                   ` Andi Kleen
2006-07-06 15:31                     ` Eric W. Biederman
2006-07-06 16:51                       ` Andi Kleen
2006-07-06 17:46                         ` Eric W. Biederman
2006-07-06 18:08                           ` Andi Kleen
2006-07-06 18:34                             ` Alan Cox
2006-07-06 18:27                               ` Andi Kleen
2006-07-06 19:09                                 ` Eric W. Biederman
2006-07-06 19:18                                   ` Andi Kleen
2006-07-06 19:43                                     ` Eric W. Biederman
2006-07-06 18:43                             ` Eric W. Biederman
2006-07-05 17:39         ` Doug Thompson
2006-07-05 19:39           ` Andi Kleen
