public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] EDAC: core EDAC support code
@ 2006-03-10  0:44 Doug Thompson
  0 siblings, 0 replies; 56+ messages in thread
From: Doug Thompson @ 2006-03-10  0:44 UTC (permalink / raw)
  To: greg
  Cc: arjan, gregkh, bluesmoke-devel, dsp, Doug Thompson, torvalds,
	alan, linux-kernel, rdunlap

On Fri, 2006-03-10 at 00:02 +0000, Greg KH  wrote:
> On Thu, Mar 09, 2006 at 03:51:25PM -0800, Dave Peterson wrote:
> > On Tuesday 07 March 2006 11:03, Arjan van de Ven wrote:
> > > afaics it is a list of pci devices. these should just be symlinks to the
> > > sysfs resource of these pci devices instead, not a flat table file.
> > 
> > Ok, I'm looking at the EDAC sysfs interface.  I see the following
> > issues concerning the "one value per file" rule:
> > 
> >     1.  /sys/devices/system/edac/mc/mc0/module_name contains two
> >         values, a module name and a version:
> > 
> >             # cat /sys/devices/system/edac/mc/mc0/module_name
> >             k8_edac  Ver: 2.0.1.devel Mar  8 2006
> 
> Woah.  That's what /sys/modules/ is for right?  Don't add new stuff
> please.
>
> >     2.  /sys/devices/system/edac/mc/mc0/supported_mem_type contains
> >         the following on the machine I am looking at:
> > 
> >             # cat /sys/devices/system/edac/mc/mc0/supported_mem_type
> >             Unbuffered-DDR Registered-DDR
> >             #
> > 
> >         Here we have a whitespace-delimited list of values.  Likewise,
> >         the following files contain whitespace-delimited lists:
> > 
> >             /sys/devices/system/edac/mc/mc0/edac_capability
> >             /sys/devices/system/edac/mc/mc0/edac_current_capability
> 
> What exactly do they look like?

Unbuffered-DDR Registered-DDR

These come from the memory controller on what types of memory are
possible (or capable), NOT actuals. Its a domain of multiple types.

> 
> >     3.  The following files contain comma-delimited lists of
> >         (vendor ID, device ID) tuples:
> > 
> >             /sys/devices/system/edac/pci/pci_parity_blacklist
> >             /sys/devices/system/edac/pci/pci_parity_whitelist
> 
> What exactly do they look like?

Example:

1022:7450,1434:16a6

is a list of 2 devices that should be skipped. Retrieved via a lspci.

PCI vendor-ID:device-ID series list. One or more devices are possible

The number of devices depends on how many devices should be skipped on a
given server. 

Somehow the system admin needs to tell edac/pci to skip one or more
devices in the PCI device list. As the iterator proceeds through the PCI
device list, it compares each to a blacklist (or whitelist if on - only
one or the other can be one) entry. If found it skipsits. I toyed with
the idea of each PCI device having a control file and the admin setting
a scan/no-scan state, but quickly dropped that.

If a device is blacklisted, it is non-conforming to the PCI Parity
standard of operation and its status cannot be trusted. PCI-X Infiniband
card is such a card, though they have promised a firmware update.

> 
> >         I assume this is what Arjan is referring to.
> >         Documentation/drivers/edac/edac.txt gives the following
> >         description of how the whitelist functions:
> > 
> >             This control file allows for an explicit list of PCI
> >             devices to be scanned for parity errors. Only devices
> >             found on this list will be examined.  The list is a line
> >             of hexadecimel VENDOR and DEVICE ID tuples:
> > 
> >             1022:7450,1434:16a6
> > 
> >             One or more can be inserted, seperated by a comma.
> >             To write the above list doing the following as one
> >             command line:
> > 
> >             echo "1022:7450,1434:16a6"
> >                     > /sys/devices/system/edac/pci/pci_parity_whitelist
> > 
> >             To display what the whitelist is, simply 'cat' the same
> >             file.
> > 
> > Looking at the current EDAC implementation, these are all of the
> > "one value per file" issues I see.  If anyone sees any others I
> > missed, please let me know.  Here are my thoughts on each:
> > 
> >     Issue #1
> >     --------
> >     Fixing this is easy.  /sys/devices/system/edac/mc/mc0/module_name
> >     can be replaced by two separate files, one providing the name and
> >     the other providing the version:
> > 
> >         /sys/devices/system/edac/mc/mc0/module_name
> >         /sys/devices/system/edac/mc/mc0/module_version
> 
> No, these should just be deleted.  Use the proper MODULE_* macros for
> these if you really want to display them to users.

When these macros are used, they then show up in the /sys/module/xxxx
directory, is that correct?

> 
> >     Issue #2
> >     --------
> >     To fix this, /sys/devices/system/edac/mc/mc0/supported_mem_type
> >     can be made into a directory containing a file representing each
> >     supported memory type.  Thus we might have the following:
> > 
> >         /sys/devices/system/edac/mc/mc0/supported_mem_type
> >         /sys/devices/system/edac/mc/mc0/supported_mem_type/Unbuffered-DDR
> >         /sys/devices/system/edac/mc/mc0/supported_mem_type/Registered-DDR
> > 
> >     In the above example, the files Unbuffered-DDR and Registered-DDR
> >     would each be empty in content.  The presence of each file would
> >     indicate that the memory type it represents is supported.

This attribute reports POSSIBLE memory types, not ACTUAL memory type.

> 
> I don't think the original file is really a big problem.

Are you saying to leave it as is?

the supported_mem_type reports what the memory controller is CAPABLE of
supporting. Then in '..../mc0/csrowN/mem_type' reports what is ACTUALLY
being used.

> 
> >     Issue #3
> >     --------
> >     I am unclear about what to do here.  If the list contents were
> >     read-only, it would be relatively easy to make
> >     /sys/devices/system/edac/pci/pci_parity_whitelist into a directory
> >     containing symlinks, one for each device.  However, the user is
> >     supposed to be able to modify the list contents.  This would imply
> >     that the user creates and destroys symlinks.  Does sysfs currently
> >     support this sort of behavior?  If not, what is the preferred
> >     means for implementing a user-modifiable set of values?

This input and presentation of a list was troublesome in matching the
one attribute policy.

> 
> No it doesn't.  How big can this list get?

Depends on how many PCI devices there are AND which ones have been
identified as non-conforming to the PCI standard for PCI parity status.

At most, the Number of devices minus 1, but then using a 'whitelist'
with just that one lone device would be better. IF all devices are
listed in a blacklist, then just turn off parity checking instead.

Getting the blacklist/whitelist into the edac driver took some pondering
and various designs on my part. I might have missed a more obvious
solution.

> 
> thanks,
> 
> greg k-h

thanks

doug t


^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
@ 2006-03-10 16:56 Doug Thompson
  2006-03-10 17:11 ` Arjan van de Ven
  0 siblings, 1 reply; 56+ messages in thread
From: Doug Thompson @ 2006-03-10 16:56 UTC (permalink / raw)
  To: arjan; +Cc: tim, greg, bluesmoke-devel, dsp, Doug Thompson, linux-kernel

On Fri, 2006-03-10 at 11:40 +0000, Arjan van de Ven  wrote:
> On Fri, 2006-03-10 at 11:06 +0000, Tim Small wrote:
> > Arjan van de Ven wrote:
> > 
> > > It depends on how many PCI devices in your machine you wish to
> > >
> > >>blacklist or whitelist.  The motivation for this feature is that
> > >>certain known badly-designed devices report an endless stream of
> > >>spurious PCI bus parity errors.  We want to skip such devices when
> > >>checking for PCI bus parity errors.
> > >>    
> > >>
> > >
> > >ok so this is actually a per pci device property!
> > >I would suggest moving this property to the pci device itself, not doing
> > >it inside an edac directory.
> > >  
> > >
> > Yes, this seems more sensible to me.  For one thing, I suspect that just 
> > keying on vendor:device is probably too blunt for this and that 
> > blacklisting a particular PCI device revision is a likely requirement, 
> > as well as subsystem vendor/subsystem device.
> 
> and maybe even something as funky as firmware version.
> So it for sure is a per device (not per ID) property, and something that
> needs a global quirk table kind of thing with the option to do per
> driver overrides

Very definitely, this non-conforming misfeature of PCI compliance is a
per PCI device attribute. At the very least it is tied to VENDOR:DEVICE
tuple, and probably a subsystem vendor/device tuple as well. As to
firmware, that is also likely. Mellanox promised a new firmware update
to their board that supposely fixes this issue. Yet, I find no firmware
value in the PCI spec, just the Revision ID, which could be used as
firmware identifier. THis is up to the vendor.

So in order to be sure I understand, if this PARITY Non-Conformance
attribute was "moved" to the per device directory of sysfs
(/sys/devices/pci0000:00/0000:00:06.0 for an example), then we would
need a userland attribute file created here and then stored in the
'pci_dev' structure or the mentioned quirk structure. This field then
could be set by userland script(s), then EDAC-PCI could example that
data in its iteration of pci devices.  Is that correct?

I will admit I have heard of the "quirk" tables in the kernel, but don't
fully understand them.  From what I read here, a PCI device quirk table
would be a parallel structure to the 'struct pci_dev' for a given PCI
device. So every pci_dev structure created, then a quirk table structure
would be created, and in that quirk entry is a PARITY data item. That
data item is exposed into sysfs in the /sys/devices/pci* as the example
above.

An new getter functions would be needed so the EDAC PCI iterator could
'get' the current value of the attribute.

If the above is correct, then who would we need to contact for said
modification or approval for such? Is that you Greg KH, since you are
listed as the PCI SUBSYSTEM maintainer?

thanks

doug t




^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
@ 2006-03-10 17:28 Doug Thompson
  0 siblings, 0 replies; 56+ messages in thread
From: Doug Thompson @ 2006-03-10 17:28 UTC (permalink / raw)
  To: arjan; +Cc: tim, greg, bluesmoke-devel, dsp, Doug Thompson, linux-kernel

On Fri, 2006-03-10 at 17:11 +0000, Arjan van de Ven  wrote:

> 
> > So in order to be sure I understand, if this PARITY Non-Conformance
> > attribute was "moved" to the per device directory of sysfs
> > (/sys/devices/pci0000:00/0000:00:06.0 for an example), then we would
> > need a userland attribute file created here and then stored in the
> > 'pci_dev' structure 
> 
> yes. Well to some degree I'm not even sure it needs to be exposed to
> userland like this. At least normally the kernel should know this
> internally and automatically. (after all the kernel has the job to
> abstract the hardware for the rest of the system; dealing with broken
> hardware is part of that)

The problem we have run into is latency of the OS keeping up to date
with the characteristics of a device. When I added the scan PCI parity
to bluesmoke/EDAC I made the ago old assumption that hardware vendors
followed the specs, at least in the majority. (Gee, they didn't 25 years
ago so why should they today??). Well it has been trial and error as we
placed bluesmoke on systems with the various cards we use and more than
I expected fail this conformance.

We needed the ability to override in real (people) time on a cluster,
the scanning of non-conforming boards. We cannot wait months for the OS
to catch up with new board. That is why I placed (breaking the model -
sorry about that) the blacklist "close" to the EDAC module.

For these reasons, we needed an access point for userland to override
the attribute of scanning a PCI device for parity. Device drivers won't
necessarily do this themselves. Something outside the kernel might have
to do it with data found after a device has been added to the system.

> 
> 
> > or the mentioned quirk structure. This field then
> > could be set by userland script(s), then EDAC-PCI could example that
> > data in its iteration of pci devices.  Is that correct?
> 
> that sounds way way way too complex. If this is "just" a field in the
> pci device... why would userland need to get involved? Your kernel side
> should be able to see that directly just fine.

The need for userland to actually DO the setting of the attribute
indicating that this device is non-conforming. That would be why

> 
> 
> 
> > If the above is correct, then who would we need to contact for said
> > modification or approval for such? Is that you Greg KH, since you are
> > listed as the PCI SUBSYSTEM maintainer?
> 
> Greg needs to OK the addition to the pci struct, but I don't forsee a
> problem personally since this is a more or less obvious and logical
> thing to add, and useful for more than one architecture

great!  thanks

doug t



^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
@ 2006-03-10 20:37 Doug Thompson
  0 siblings, 0 replies; 56+ messages in thread
From: Doug Thompson @ 2006-03-10 20:37 UTC (permalink / raw)
  To: arjan
  Cc: greg, gregkh, bluesmoke-devel, dsp, Doug Thompson, torvalds, alan,
	linux-kernel, rdunlap

On Fri, 2006-03-10 at 19:33 +0000, Arjan van de Ven  wrote:
> On Fri, 2006-03-10 at 11:07 -0800, Dave Peterson wrote:
> > On Friday 10 March 2006 09:58, Arjan van de Ven wrote:
> > > > I'd be curious to hear people's opinions on the following idea:
> > > > move the PCI bus parity error checking functionality from EDAC
> > > > to the PCI subsystem.
> > >
> > > I can see the point on at least moving all the infrastructure there.
> > > The actual call to run it... maybe. that's more debatable I suppose.
> > 
> > Regarding the actual call to run it, I guess it depends on which of
> > the following you prefer:
> > 
> >     Scenario A
> >     ----------
> >     A more decentralized layout.  Here, the controls that govern the
> >     error handling behavior for a given category of hardware (a
> >     category might be "PCI devices" or "devices that use bus
> >     technology XYZ") are grouped together with other stuff for that
> >     category.
> 

I guess it might be useful to express what the original purpose of EDAC
we (Thayne, Eric, etc) were working toward. Arjan you express it
correctly.

We aimed to develop a central place to report all (or almost all)
hardware errors: starting with ECC errors, then added PCI parity errors.
And to do this in a consistent manner so that harvesting and reaping
that error info can become easier and more useful. (EDAC has a common
printk format as well).

Currently the ownership of that feature is the edac_mc module. It is the
one which creates the /sys/devices/system/edac directory and in turn the
'mc' and 'pci' directories.

When we move the PCI Parity checking code to the PCI subsystem it would
be good to maintain that directory layout. 

One way to do that is to make the edac_mc CORE a part of the kernel and
no longer a module. (This would also help on the kobject reference
counter issue).  Then the PCI Parity subsystem code would register
itself after the EDAC CORE code.

doug t


> this would basically make edac a place to report "help something went to
> the gutter". Sure. I see that useful. 
> 
> In fact there are 3 layers then
> 
> 1) low level "do check" function
> 
> 2) per bus code that calls the do check functions and whatever is needed
> for bus checks
> 
> 3) "EDAC" central command, which basically gathers all failure reports
> and does something with them (push them to userspace or implement the
> userspace chosen policy (panic/reboot/etc))
> 
> 
> having 1) separate from 2) is useful, it means that drivers can do
> synchronous checks in addition to the checks in 2) which will tend to be
> asynchronous....
> 
> 
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
@ 2006-03-11 17:04 Doug Thompson
  2006-03-13 19:35 ` Dave Peterson
  0 siblings, 1 reply; 56+ messages in thread
From: Doug Thompson @ 2006-03-11 17:04 UTC (permalink / raw)
  To: dsp
  Cc: arjan, greg, gregkh, bluesmoke-devel, Doug Thompson, torvalds,
	alan, linux-kernel, rdunlap

On Sat, 2006-03-11 at 01:57 +0000, Dave Peterson  wrote:
> On Friday 10 March 2006 13:23, Arjan van de Ven wrote:
> > hmm ok so I want a function that takes a device as parameter, and checks
> > the state of that device for errors. Internally that probably has to go
> > via a function pointer somewhere to a device specific check method.
> >
> > Or maybe a per test-type (pci parity / ECC / etc) check
> >
> > int pci_check_parity_errors(struct pci_dev *dev, int flags);
> >
> > something like that, or pci_check_and_clear_parity_errors()
> > (although that gets too long :)
> >
> > drivers can call that, say, after firmware init or something to validate
> > their device is sanely connected. Maybe pci_enable_device() could call
> > it too.
> >
> > This also needs a pci_suspend_parity_check() ... _resume_ ... so that
> > the driver can temporarily disable any checks, for example during device
> > reset/init. And then just before resume, it manually clears a check.
> 
> ok, perhaps things might look something like this?
> 
>     - A check_error() function checks a device for errors.  As you
>       mention above, this may be a device-specific check method or a
>       function like pci_check_parity_errors(dev, flags).  In either
>       case, if check_error() finds an error, it clears the error from
>       the device's registers and returns the error.  If check_error()
>       finds multiple errors, here are couple of options:
> 
>           - Return a list of all errors detected.
>           - Return a single error along with a boolean value that says
>             whether more errors are present.  If so, check_error() may
>             be called repeatedly until no errors remain.
> 
>     - A handle_error() function handles errors returned by
>       check_error().  Here are a few options: Each device may have a
>       handle_error() method that takes an error as a parameter.  Or,
>       each type of error may have its own handle_me() method.  A third
>       option is something like pci_handle_parity_error(dev, error).
>       In any case, depending on the error type, the handler may choose
>       to feed the error to EDAC.
> 
>     - Each hardware subsystem or device driver may determine its own
>       error checking/handling strategy.  Some may want to poll for
>       errors.  Others may want error detection to be asynchronous
>       (driven by interrupts or exceptions).  Or a subsystem may poll
>       for some kinds of errors and detect others asynchronously.  As
>       you suggest, errors may also be checked for in other situations,
>       such as after firmware init.
> 
>       For polling, the poll function calls check_error(), and then
>       calls handle_error() if an error is found.  For interrupt-driven
>       error checking, the interrupt handler calls check_error().  If
>       an error is found, the interrupt handler may either call
>       handle_error() directly or defer invocation of handle_error()
>       outside interrupt context.  If the interrupt is an NMI, deferred
>       invocation of handle_error() allows it to execute without
>       introducing deadlocks or race conditions.
> 
>     - For some types of hardware, at boot time the device's registers
>       contain spurious error info, which should be discarded.  This
>       may be done by calling check_error() and discarding the results.
> 
> > > > 2) per bus code that calls the do check functions and whatever is
> > > > needed for bus checks
> > > >
> > > > 3) "EDAC" central command, which basically gathers all failure reports
> > > > and does something with them (push them to userspace or implement the
> > > > userspace chosen policy (panic/reboot/etc))
> > >
> > > Are you suggesting something like the following?
> > >
> > >     - The controls that determine how the error checking is done are
> > >       located within the various hardware subsystems.  For instance,
> > >       with PCI parity checking, this would include stuff like setting
> > >       the polling frequency and determining which devices to check.
> >
> > yes. I would NOT make it overly tunable btw. For many things a sane
> > default can be chosen, and the effect of picking a different tuning
> > isn't all that big. Maybe think of it this way: a tuneable to a large
> > degree is an excuse for not determining the right value / heuristic in
> > the first place.
> 
> Sounds good.
> 
> > >     - When an error is actually detected, the subsystem that detected
> > >       the error (for instance, PCI) would feed the error information
> > >       to EDAC.  Then EDAC would determine how to respond to the error
> > >       (for instance, push it to userspace or implement the
> > >       userspace-chosen policy (panic/reboot/etc))
> >
> > yup.
> 
> Cool!  I think this also coincides with what Doug is saying.  Doug, how
> does this sound?

It sounds good. One issue is how this works with the IBM PCI Parity
handling submission? I don't remember if it has been included yet or
not. I haven't fully studied their model, but it allowed for device
drivers to register notification functions. The PCI subsystem would then
notify the driver of such errors so the driver could do what ever it
needed to do in the bad-thing-happened event.

What we are trying to accomplish is a 'detection, harvesting and
logging' mechanism of these events so that these events can be available
for userland harvesting and possibly can do an alarm of some sort.






^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2006-03-13 19:36 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <200601190414.k0J4EZCV021775@hera.kernel.org>
2006-03-05 10:18 ` [PATCH] EDAC: core EDAC support code Arjan van de Ven
2006-03-05 10:30   ` Arjan van de Ven
2006-03-06 18:14     ` Dave Peterson
2006-03-06 18:22       ` Randy.Dunlap
2006-03-07 17:03         ` Dave Peterson
2006-03-07 17:20           ` Greg KH
2006-03-08  0:29             ` Dave Peterson
2006-03-07 19:03           ` Arjan van de Ven
2006-03-07 19:05             ` Greg KH
2006-03-09 23:51             ` Dave Peterson
2006-03-10  0:02               ` Greg KH
2006-03-10  1:46                 ` Dave Peterson
2006-03-10  7:36                   ` Arjan van de Ven
2006-03-10 11:06                     ` Tim Small
2006-03-10 11:40                       ` Arjan van de Ven
2006-03-10 17:46                     ` Dave Peterson
2006-03-10 17:58                       ` Arjan van de Ven
2006-03-10 19:07                         ` Dave Peterson
2006-03-10 19:33                           ` Arjan van de Ven
2006-03-10 21:13                             ` Dave Peterson
2006-03-10 21:23                               ` Arjan van de Ven
2006-03-11  1:57                                 ` Dave Peterson
2006-03-11  7:18                                   ` Arjan van de Ven
2006-03-13 19:31                                     ` Dave Peterson
2006-03-06 19:52       ` Greg KH
2006-03-05 15:55   ` Greg KH
2006-03-05 16:24     ` Arjan van de Ven
2006-03-06 18:52     ` Dave Peterson
2006-03-06 19:53       ` Greg KH
2006-03-06 21:01         ` Dave Peterson
2006-03-06 21:07           ` Arjan van de Ven
2006-03-09  3:19             ` Dave Peterson
2006-03-09  3:44               ` Al Viro
2006-03-09  5:51               ` Greg KH
2006-03-06 21:32           ` Al Viro
2006-03-06 21:53             ` Greg KH
2006-03-06 22:24               ` Al Viro
2006-03-06 22:55                 ` Greg KH
2006-03-07 10:41                 ` Andrew Morton
2006-03-07 11:08                   ` Al Viro
2006-03-08  2:46                   ` Rusty Russell
2006-03-07  1:45               ` Dmitry Torokhov
2006-03-07  1:57                 ` Greg KH
2006-03-07  2:10                   ` Dmitry Torokhov
2006-03-07 16:47             ` Dave Peterson
2006-03-07 17:04               ` Greg KH
2006-03-07 17:06                 ` Dave Peterson
2006-03-08  1:03                 ` Dmitry Torokhov
2006-03-08  1:33                   ` Greg KH
2006-03-10  0:44 Doug Thompson
  -- strict thread matches above, loose matches on Subject: below --
2006-03-10 16:56 Doug Thompson
2006-03-10 17:11 ` Arjan van de Ven
2006-03-10 17:28 Doug Thompson
2006-03-10 20:37 Doug Thompson
2006-03-11 17:04 Doug Thompson
2006-03-13 19:35 ` Dave Peterson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox