* Re: [PATCH] EDAC: core EDAC support code
[not found] <200601190414.k0J4EZCV021775@hera.kernel.org>
@ 2006-03-05 10:18 ` Arjan van de Ven
2006-03-05 10:30 ` Arjan van de Ven
2006-03-05 15:55 ` Greg KH
0 siblings, 2 replies; 56+ messages in thread
From: Arjan van de Ven @ 2006-03-05 10:18 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: gregkh
> +/* Main MC kobject release() function */
> +static void edac_memctrl_master_release(struct kobject *kobj)
> +{
> + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> +}
> +
ehhh how on earth can this be right?
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-05 10:18 ` [PATCH] EDAC: core EDAC support code Arjan van de Ven
@ 2006-03-05 10:30 ` Arjan van de Ven
2006-03-06 18:14 ` Dave Peterson
2006-03-05 15:55 ` Greg KH
1 sibling, 1 reply; 56+ messages in thread
From: Arjan van de Ven @ 2006-03-05 10:30 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: torvalds, alan, gregkh
On Sun, 2006-03-05 at 11:18 +0100, Arjan van de Ven wrote:
> > +/* Main MC kobject release() function */
> > +static void edac_memctrl_master_release(struct kobject *kobj)
> > +{
> > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > +}
> > +
>
>
>
> ehhh how on earth can this be right?
oh and this stuff also violates the "one value per file" rule; can we
fix that urgently before it becomes part of the ABI in 2.6.16??
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-05 10:18 ` [PATCH] EDAC: core EDAC support code Arjan van de Ven
2006-03-05 10:30 ` Arjan van de Ven
@ 2006-03-05 15:55 ` Greg KH
2006-03-05 16:24 ` Arjan van de Ven
2006-03-06 18:52 ` Dave Peterson
1 sibling, 2 replies; 56+ messages in thread
From: Greg KH @ 2006-03-05 15:55 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Linux Kernel Mailing List
On Sun, Mar 05, 2006 at 11:18:04AM +0100, Arjan van de Ven wrote:
>
> > +/* Main MC kobject release() function */
> > +static void edac_memctrl_master_release(struct kobject *kobj)
> > +{
> > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > +}
> > +
>
> ehhh how on earth can this be right?
Ugh. Good catch, it isn't right. Gotta love it when people try to
ignore the helpful messages the kernel gives you when you use an API
wrong :(
thanks,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-05 15:55 ` Greg KH
@ 2006-03-05 16:24 ` Arjan van de Ven
2006-03-06 18:52 ` Dave Peterson
1 sibling, 0 replies; 56+ messages in thread
From: Arjan van de Ven @ 2006-03-05 16:24 UTC (permalink / raw)
To: Greg KH; +Cc: Linux Kernel Mailing List
On Sun, 2006-03-05 at 07:55 -0800, Greg KH wrote:
> On Sun, Mar 05, 2006 at 11:18:04AM +0100, Arjan van de Ven wrote:
> >
> > > +/* Main MC kobject release() function */
> > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > +{
> > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > +}
> > > +
> >
> > ehhh how on earth can this be right?
>
> Ugh. Good catch, it isn't right. Gotta love it when people try to
> ignore the helpful messages the kernel gives you when you use an API
> wrong :(
s/ignore/circumvent/
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-05 10:30 ` Arjan van de Ven
@ 2006-03-06 18:14 ` Dave Peterson
2006-03-06 18:22 ` Randy.Dunlap
2006-03-06 19:52 ` Greg KH
0 siblings, 2 replies; 56+ messages in thread
From: Dave Peterson @ 2006-03-06 18:14 UTC (permalink / raw)
To: Arjan van de Ven, Linux Kernel Mailing List; +Cc: torvalds, alan, gregkh
On Sunday 05 March 2006 02:30, Arjan van de Ven wrote:
> On Sun, 2006-03-05 at 11:18 +0100, Arjan van de Ven wrote:
> > > +/* Main MC kobject release() function */
> > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > +{
> > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > +}
> > > +
> >
> > ehhh how on earth can this be right?
>
> oh and this stuff also violates the "one value per file" rule; can we
> fix that urgently before it becomes part of the ABI in 2.6.16??
Ok, I'll admit to being a bit clueless about this. I'm not familiar
with the "one value per file" rule; can someone please explain?
On Sunday 05 March 2006 07:55, Greg KH wrote:
> Ugh. Good catch, it isn't right. Gotta love it when people try to
> ignore the helpful messages the kernel gives you when you use an API
> wrong :(
Hmm... I don't recall seeing any messages from the kernel. What
are you seeing?
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-06 18:14 ` Dave Peterson
@ 2006-03-06 18:22 ` Randy.Dunlap
2006-03-07 17:03 ` Dave Peterson
2006-03-06 19:52 ` Greg KH
1 sibling, 1 reply; 56+ messages in thread
From: Randy.Dunlap @ 2006-03-06 18:22 UTC (permalink / raw)
To: Dave Peterson; +Cc: arjan, linux-kernel, torvalds, alan, gregkh
On Mon, 6 Mar 2006 10:14:22 -0800 Dave Peterson wrote:
> On Sunday 05 March 2006 02:30, Arjan van de Ven wrote:
> > On Sun, 2006-03-05 at 11:18 +0100, Arjan van de Ven wrote:
> > > > +/* Main MC kobject release() function */
> > > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > > +{
> > > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > > +}
> > > > +
> > >
> > > ehhh how on earth can this be right?
> >
> > oh and this stuff also violates the "one value per file" rule; can we
> > fix that urgently before it becomes part of the ABI in 2.6.16??
>
> Ok, I'll admit to being a bit clueless about this. I'm not familiar
> with the "one value per file" rule; can someone please explain?
it's in Documentation/filesystems/sysfs.txt
Strongly preferred.
> On Sunday 05 March 2006 07:55, Greg KH wrote:
> > Ugh. Good catch, it isn't right. Gotta love it when people try to
> > ignore the helpful messages the kernel gives you when you use an API
> > wrong :(
>
> Hmm... I don't recall seeing any messages from the kernel. What
> are you seeing?
---
~Randy
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-05 15:55 ` Greg KH
2006-03-05 16:24 ` Arjan van de Ven
@ 2006-03-06 18:52 ` Dave Peterson
2006-03-06 19:53 ` Greg KH
1 sibling, 1 reply; 56+ messages in thread
From: Dave Peterson @ 2006-03-06 18:52 UTC (permalink / raw)
To: Greg KH, Arjan van de Ven; +Cc: Linux Kernel Mailing List
On Sunday 05 March 2006 07:55, Greg KH wrote:
> On Sun, Mar 05, 2006 at 11:18:04AM +0100, Arjan van de Ven wrote:
> > > +/* Main MC kobject release() function */
> > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > +{
> > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > +}
> > > +
> >
> > ehhh how on earth can this be right?
>
> Ugh. Good catch, it isn't right. Gotta love it when people try to
> ignore the helpful messages the kernel gives you when you use an API
> wrong :(
Is the concern here that EDAC is not waiting for the reference count
on the kobject to reach 0, therefore creating the possibility of the
module unloading while the kobject (declared statically within the
module) is still in use?
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-06 18:14 ` Dave Peterson
2006-03-06 18:22 ` Randy.Dunlap
@ 2006-03-06 19:52 ` Greg KH
1 sibling, 0 replies; 56+ messages in thread
From: Greg KH @ 2006-03-06 19:52 UTC (permalink / raw)
To: Dave Peterson
Cc: Arjan van de Ven, Linux Kernel Mailing List, torvalds, alan,
gregkh
On Mon, Mar 06, 2006 at 10:14:22AM -0800, Dave Peterson wrote:
> On Sunday 05 March 2006 02:30, Arjan van de Ven wrote:
> > On Sun, 2006-03-05 at 11:18 +0100, Arjan van de Ven wrote:
> > > > +/* Main MC kobject release() function */
> > > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > > +{
> > > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > > +}
> > > > +
> > >
> > > ehhh how on earth can this be right?
> >
> > oh and this stuff also violates the "one value per file" rule; can we
> > fix that urgently before it becomes part of the ABI in 2.6.16??
>
> Ok, I'll admit to being a bit clueless about this. I'm not familiar
> with the "one value per file" rule; can someone please explain?
sysfs consists of files that only have one value per file. Please do
not do otherwise, as it will become a maintance nightmare over time.
See the documentation file that Randy pointed you at for details.
> On Sunday 05 March 2006 07:55, Greg KH wrote:
> > Ugh. Good catch, it isn't right. Gotta love it when people try to
> > ignore the helpful messages the kernel gives you when you use an API
> > wrong :(
>
> Hmm... I don't recall seeing any messages from the kernel. What
> are you seeing?
If you do not have a release function, you will see the messages. Just
putting a printk() in a release function is not acceptable, you really
need to free the memory from it. If not, then your code is really
wrong...
thanks,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-06 18:52 ` Dave Peterson
@ 2006-03-06 19:53 ` Greg KH
2006-03-06 21:01 ` Dave Peterson
0 siblings, 1 reply; 56+ messages in thread
From: Greg KH @ 2006-03-06 19:53 UTC (permalink / raw)
To: Dave Peterson; +Cc: Arjan van de Ven, Linux Kernel Mailing List
On Mon, Mar 06, 2006 at 10:52:57AM -0800, Dave Peterson wrote:
> On Sunday 05 March 2006 07:55, Greg KH wrote:
> > On Sun, Mar 05, 2006 at 11:18:04AM +0100, Arjan van de Ven wrote:
> > > > +/* Main MC kobject release() function */
> > > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > > +{
> > > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > > +}
> > > > +
> > >
> > > ehhh how on earth can this be right?
> >
> > Ugh. Good catch, it isn't right. Gotta love it when people try to
> > ignore the helpful messages the kernel gives you when you use an API
> > wrong :(
>
> Is the concern here that EDAC is not waiting for the reference count
> on the kobject to reach 0, therefore creating the possibility of the
> module unloading while the kobject (declared statically within the
> module) is still in use?
Eeek, don't statically create a kobject :(
Anyway, yes, that is a problem, if it is static, then you need to know
it is safe to unload. Even if it is dynamic that is also true...
thanks,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-06 19:53 ` Greg KH
@ 2006-03-06 21:01 ` Dave Peterson
2006-03-06 21:07 ` Arjan van de Ven
2006-03-06 21:32 ` Al Viro
0 siblings, 2 replies; 56+ messages in thread
From: Dave Peterson @ 2006-03-06 21:01 UTC (permalink / raw)
To: Greg KH; +Cc: Arjan van de Ven, Linux Kernel Mailing List
On Monday 06 March 2006 11:53, Greg KH wrote:
> > Is the concern here that EDAC is not waiting for the reference count
> > on the kobject to reach 0, therefore creating the possibility of the
> > module unloading while the kobject (declared statically within the
> > module) is still in use?
>
> Eeek, don't statically create a kobject :(
>
> Anyway, yes, that is a problem, if it is static, then you need to know
> it is safe to unload. Even if it is dynamic that is also true...
Ok, now I understand. At first I thought it was something specific
to the way the debugf1() call was implemented that people were
commenting on.
Regarding the above problem with the kobject reference count, this
was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
complete() in edac_memctrl_master_release() and then have the module
cleanup code wait for the completion. I think there were a few other
instances of this type of problem that I also fixed in the
above-mentioned patch.
Is it more desirable to dynamically allocate kobjects than to declare
them statically? If so, I'd be curious to know why dynamic
allocation is preferred over static allocation. If desired, I can
make a patch that fixes EDAC so that its kobjects are dynamically
allocated.
Dave
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-06 21:01 ` Dave Peterson
@ 2006-03-06 21:07 ` Arjan van de Ven
2006-03-09 3:19 ` Dave Peterson
2006-03-06 21:32 ` Al Viro
1 sibling, 1 reply; 56+ messages in thread
From: Arjan van de Ven @ 2006-03-06 21:07 UTC (permalink / raw)
To: Dave Peterson; +Cc: Greg KH, Linux Kernel Mailing List
> Is it more desirable to dynamically allocate kobjects than to declare
> them statically?
Yes
> If so, I'd be curious to know why dynamic
> allocation is preferred over static allocation.
because the lifetime of the kobject is independent of the lifetime of
the memory of your static allocation.
Separate lifetimes -> separate memory is a very good design principle.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-06 21:01 ` Dave Peterson
2006-03-06 21:07 ` Arjan van de Ven
@ 2006-03-06 21:32 ` Al Viro
2006-03-06 21:53 ` Greg KH
2006-03-07 16:47 ` Dave Peterson
1 sibling, 2 replies; 56+ messages in thread
From: Al Viro @ 2006-03-06 21:32 UTC (permalink / raw)
To: Dave Peterson; +Cc: Greg KH, Arjan van de Ven, Linux Kernel Mailing List
On Mon, Mar 06, 2006 at 01:01:37PM -0800, Dave Peterson wrote:
> Regarding the above problem with the kobject reference count, this
> was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
> in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
> complete() in edac_memctrl_master_release() and then have the module
> cleanup code wait for the completion. I think there were a few other
> instances of this type of problem that I also fixed in the
> above-mentioned patch.
This is not a fix, this is a goddamn deadlock.
rmmod your_turd </sys/spew/from/your_turd
and there you go. rmmod can _NOT_ wait for sysfs references to go away.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-06 21:32 ` Al Viro
@ 2006-03-06 21:53 ` Greg KH
2006-03-06 22:24 ` Al Viro
2006-03-07 1:45 ` Dmitry Torokhov
2006-03-07 16:47 ` Dave Peterson
1 sibling, 2 replies; 56+ messages in thread
From: Greg KH @ 2006-03-06 21:53 UTC (permalink / raw)
To: Al Viro; +Cc: Dave Peterson, Arjan van de Ven, Linux Kernel Mailing List
On Mon, Mar 06, 2006 at 09:32:03PM +0000, Al Viro wrote:
> On Mon, Mar 06, 2006 at 01:01:37PM -0800, Dave Peterson wrote:
> > Regarding the above problem with the kobject reference count, this
> > was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
> > in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
> > complete() in edac_memctrl_master_release() and then have the module
> > cleanup code wait for the completion. I think there were a few other
> > instances of this type of problem that I also fixed in the
> > above-mentioned patch.
>
> This is not a fix, this is a goddamn deadlock.
> rmmod your_turd </sys/spew/from/your_turd
> and there you go. rmmod can _NOT_ wait for sysfs references to go away.
To be fair, the only part of the kernel that supports the above process,
is the network stack. And they implemented a special kind of lock to
handle just this kind of thing.
That is not something that I want the rest of the kernel to have to use.
If your code blocks when doing the above thing, that's fine with me.
Note, you better have the module owner reference right for the above to
not oops the kernel, deadlock is fine. There is no rule that we _have_
to allow rmmod to always succeed.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-06 21:53 ` Greg KH
@ 2006-03-06 22:24 ` Al Viro
2006-03-06 22:55 ` Greg KH
2006-03-07 10:41 ` Andrew Morton
2006-03-07 1:45 ` Dmitry Torokhov
1 sibling, 2 replies; 56+ messages in thread
From: Al Viro @ 2006-03-06 22:24 UTC (permalink / raw)
To: Greg KH; +Cc: Dave Peterson, Arjan van de Ven, Linux Kernel Mailing List
On Mon, Mar 06, 2006 at 01:53:44PM -0800, Greg KH wrote:
> > rmmod your_turd </sys/spew/from/your_turd
> > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
>
> To be fair, the only part of the kernel that supports the above process,
> is the network stack. And they implemented a special kind of lock to
> handle just this kind of thing.
>
> That is not something that I want the rest of the kernel to have to use.
> If your code blocks when doing the above thing, that's fine with me.
One word: fail. With -EBUSY.
> Note, you better have the module owner reference right for the above to
> not oops the kernel, deadlock is fine.
Never is.
> There is no rule that we _have_
> to allow rmmod to always succeed.
Quite so, which means we can have it fail saying that module removal has
failed. Deadlock is not the same thing.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-06 22:24 ` Al Viro
@ 2006-03-06 22:55 ` Greg KH
2006-03-07 10:41 ` Andrew Morton
1 sibling, 0 replies; 56+ messages in thread
From: Greg KH @ 2006-03-06 22:55 UTC (permalink / raw)
To: Al Viro; +Cc: Dave Peterson, Arjan van de Ven, Linux Kernel Mailing List
On Mon, Mar 06, 2006 at 10:24:00PM +0000, Al Viro wrote:
> On Mon, Mar 06, 2006 at 01:53:44PM -0800, Greg KH wrote:
> > > rmmod your_turd </sys/spew/from/your_turd
> > > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
> >
> > To be fair, the only part of the kernel that supports the above process,
> > is the network stack. And they implemented a special kind of lock to
> > handle just this kind of thing.
> >
> > That is not something that I want the rest of the kernel to have to use.
> > If your code blocks when doing the above thing, that's fine with me.
>
> One word: fail. With -EBUSY.
>
> > Note, you better have the module owner reference right for the above to
> > not oops the kernel, deadlock is fine.
>
> Never is.
My apologies, you are right, for some reason I thought rmmod would just
wait for the reference count to go away. I just tested this on a lot of
different things in sysfs and it works properly and rmmod will return an
error saying the module is in use at this time.
> > There is no rule that we _have_
> > to allow rmmod to always succeed.
>
> Quite so, which means we can have it fail saying that module removal has
> failed. Deadlock is not the same thing.
Agreed.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-06 21:53 ` Greg KH
2006-03-06 22:24 ` Al Viro
@ 2006-03-07 1:45 ` Dmitry Torokhov
2006-03-07 1:57 ` Greg KH
1 sibling, 1 reply; 56+ messages in thread
From: Dmitry Torokhov @ 2006-03-07 1:45 UTC (permalink / raw)
To: Greg KH; +Cc: Al Viro, Dave Peterson, Arjan van de Ven,
Linux Kernel Mailing List
On Monday 06 March 2006 16:53, Greg KH wrote:
> On Mon, Mar 06, 2006 at 09:32:03PM +0000, Al Viro wrote:
> > On Mon, Mar 06, 2006 at 01:01:37PM -0800, Dave Peterson wrote:
> > > Regarding the above problem with the kobject reference count, this
> > > was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
> > > in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
> > > complete() in edac_memctrl_master_release() and then have the module
> > > cleanup code wait for the completion. I think there were a few other
> > > instances of this type of problem that I also fixed in the
> > > above-mentioned patch.
> >
> > This is not a fix, this is a goddamn deadlock.
> > rmmod your_turd </sys/spew/from/your_turd
> > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
>
> To be fair, the only part of the kernel that supports the above process,
> is the network stack. And they implemented a special kind of lock to
> handle just this kind of thing.
>
Not so:
[root@core ~]# rmmod psmouse < /sys/bus/serio/devices/serio0/rate
ERROR: Module psmouse is in use
[root@core ~]# rmmod psmouse
[root@core ~]# modprobe psmouse
[root@core ~]#
It would be nice if more subsystem could handle this, preferably without
"Waiting for blah to become free" messages (as in W1).
--
Dmitry
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-07 1:45 ` Dmitry Torokhov
@ 2006-03-07 1:57 ` Greg KH
2006-03-07 2:10 ` Dmitry Torokhov
0 siblings, 1 reply; 56+ messages in thread
From: Greg KH @ 2006-03-07 1:57 UTC (permalink / raw)
To: Dmitry Torokhov
Cc: Al Viro, Dave Peterson, Arjan van de Ven,
Linux Kernel Mailing List
On Mon, Mar 06, 2006 at 08:45:59PM -0500, Dmitry Torokhov wrote:
> On Monday 06 March 2006 16:53, Greg KH wrote:
> > On Mon, Mar 06, 2006 at 09:32:03PM +0000, Al Viro wrote:
> > > On Mon, Mar 06, 2006 at 01:01:37PM -0800, Dave Peterson wrote:
> > > > Regarding the above problem with the kobject reference count, this
> > > > was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
> > > > in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
> > > > complete() in edac_memctrl_master_release() and then have the module
> > > > cleanup code wait for the completion. I think there were a few other
> > > > instances of this type of problem that I also fixed in the
> > > > above-mentioned patch.
> > >
> > > This is not a fix, this is a goddamn deadlock.
> > > rmmod your_turd </sys/spew/from/your_turd
> > > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
> >
> > To be fair, the only part of the kernel that supports the above process,
> > is the network stack. And they implemented a special kind of lock to
> > handle just this kind of thing.
> >
>
> Not so:
>
> [root@core ~]# rmmod psmouse < /sys/bus/serio/devices/serio0/rate
> ERROR: Module psmouse is in use
> [root@core ~]# rmmod psmouse
> [root@core ~]# modprobe psmouse
> [root@core ~]#
>
> It would be nice if more subsystem could handle this, preferably without
> "Waiting for blah to become free" messages (as in W1).
See my previous apology about this :)
Anyway, the network stack does have a special lock for unloading modules
while they are still "in use", but as long as Al was referring to your
above sequence, I have no disagreement.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-07 1:57 ` Greg KH
@ 2006-03-07 2:10 ` Dmitry Torokhov
0 siblings, 0 replies; 56+ messages in thread
From: Dmitry Torokhov @ 2006-03-07 2:10 UTC (permalink / raw)
To: Greg KH; +Cc: Al Viro, Dave Peterson, Arjan van de Ven,
Linux Kernel Mailing List
On Monday 06 March 2006 20:57, Greg KH wrote:
> On Mon, Mar 06, 2006 at 08:45:59PM -0500, Dmitry Torokhov wrote:
> > On Monday 06 March 2006 16:53, Greg KH wrote:
> > > On Mon, Mar 06, 2006 at 09:32:03PM +0000, Al Viro wrote:
> > > > On Mon, Mar 06, 2006 at 01:01:37PM -0800, Dave Peterson wrote:
> > > > > Regarding the above problem with the kobject reference count, this
> > > > > was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
> > > > > in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
> > > > > complete() in edac_memctrl_master_release() and then have the module
> > > > > cleanup code wait for the completion. I think there were a few other
> > > > > instances of this type of problem that I also fixed in the
> > > > > above-mentioned patch.
> > > >
> > > > This is not a fix, this is a goddamn deadlock.
> > > > rmmod your_turd </sys/spew/from/your_turd
> > > > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
> > >
> > > To be fair, the only part of the kernel that supports the above process,
> > > is the network stack. And they implemented a special kind of lock to
> > > handle just this kind of thing.
> > >
> >
> > Not so:
> >
> > [root@core ~]# rmmod psmouse < /sys/bus/serio/devices/serio0/rate
> > ERROR: Module psmouse is in use
> > [root@core ~]# rmmod psmouse
> > [root@core ~]# modprobe psmouse
> > [root@core ~]#
> >
> > It would be nice if more subsystem could handle this, preferably without
> > "Waiting for blah to become free" messages (as in W1).
>
> See my previous apology about this :)
>
> Anyway, the network stack does have a special lock for unloading modules
> while they are still "in use", but as long as Al was referring to your
> above sequence, I have no disagreement.
>
I am sorry, I butt in as I read LKML ;) I noticed Al's and your replies
only after I wrote mine...
--
Dmitry
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-06 22:24 ` Al Viro
2006-03-06 22:55 ` Greg KH
@ 2006-03-07 10:41 ` Andrew Morton
2006-03-07 11:08 ` Al Viro
2006-03-08 2:46 ` Rusty Russell
1 sibling, 2 replies; 56+ messages in thread
From: Andrew Morton @ 2006-03-07 10:41 UTC (permalink / raw)
To: Al Viro; +Cc: greg, dsp, arjan, linux-kernel, Rusty Russell
Al Viro <viro@ftp.linux.org.uk> wrote:
>
> On Mon, Mar 06, 2006 at 01:53:44PM -0800, Greg KH wrote:
> > > rmmod your_turd </sys/spew/from/your_turd
> > > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
> >
> > To be fair, the only part of the kernel that supports the above process,
> > is the network stack. And they implemented a special kind of lock to
> > handle just this kind of thing.
> >
> > That is not something that I want the rest of the kernel to have to use.
> > If your code blocks when doing the above thing, that's fine with me.
>
> One word: fail. With -EBUSY.
It seems quite simple to make wait_for_zero_refcount() interruptible?
Something like...
--- devel/kernel/module.c~modules-make-wait_for_zero_refcount-interruptible 2006-03-07 02:36:46.000000000 -0800
+++ devel-akpm/kernel/module.c 2006-03-07 02:39:18.000000000 -0800
@@ -578,8 +578,8 @@ static void wait_for_zero_refcount(struc
mutex_unlock(&module_mutex);
for (;;) {
DEBUGP("Looking at refcount...\n");
- set_current_state(TASK_UNINTERRUPTIBLE);
- if (module_refcount(mod) == 0)
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (module_refcount(mod) == 0 || signal_pending(current))
break;
schedule();
}
@@ -645,8 +645,18 @@ sys_delete_module(const char __user *nam
goto out;
/* Never wait if forced. */
- if (!forced && module_refcount(mod) != 0)
+ if (!forced && module_refcount(mod) != 0) {
wait_for_zero_refcount(mod);
+ if (module_refcount(mod)) {
+ /*
+ * Signalled: back out
+ */
+ mod->state = MODULE_STATE_LIVE;
+ mod->waiter = NULL;
+ ret = -EINTR;
+ goto out;
+ }
+ }
/* Final destruction now noone is using it. */
if (mod->exit != NULL) {
_
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-07 10:41 ` Andrew Morton
@ 2006-03-07 11:08 ` Al Viro
2006-03-08 2:46 ` Rusty Russell
1 sibling, 0 replies; 56+ messages in thread
From: Al Viro @ 2006-03-07 11:08 UTC (permalink / raw)
To: Andrew Morton; +Cc: greg, dsp, arjan, linux-kernel, Rusty Russell
On Tue, Mar 07, 2006 at 02:41:13AM -0800, Andrew Morton wrote:
> Al Viro <viro@ftp.linux.org.uk> wrote:
> >
> > On Mon, Mar 06, 2006 at 01:53:44PM -0800, Greg KH wrote:
> > > > rmmod your_turd </sys/spew/from/your_turd
> > > > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
> > >
> > > To be fair, the only part of the kernel that supports the above process,
> > > is the network stack. And they implemented a special kind of lock to
> > > handle just this kind of thing.
> > >
> > > That is not something that I want the rest of the kernel to have to use.
> > > If your code blocks when doing the above thing, that's fine with me.
> >
> > One word: fail. With -EBUSY.
>
> It seems quite simple to make wait_for_zero_refcount() interruptible?
> Something like...
That's something we need to do anyway, but here it's not a matter of removal
blocking on module refcount:
| Regarding the above problem with the kobject reference count, this
| was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
| in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
| complete() in edac_memctrl_master_release() and then have the module
| cleanup code wait for the completion. I think there were a few other
| instances of this type of problem that I also fixed in the
| above-mentioned patch.
and that is clearly broken. Moreover, unlike having rmmod -w interruptible
(which is obviously a very good idea), here we would be in the middle of
cleanup sequence and it would be too late to back off if interrupted.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-06 21:32 ` Al Viro
2006-03-06 21:53 ` Greg KH
@ 2006-03-07 16:47 ` Dave Peterson
2006-03-07 17:04 ` Greg KH
1 sibling, 1 reply; 56+ messages in thread
From: Dave Peterson @ 2006-03-07 16:47 UTC (permalink / raw)
To: Al Viro; +Cc: Greg KH, Arjan van de Ven, dthompson, Linux Kernel Mailing List
On Monday 06 March 2006 13:32, Al Viro wrote:
> On Mon, Mar 06, 2006 at 01:01:37PM -0800, Dave Peterson wrote:
> > Regarding the above problem with the kobject reference count, this
> > was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
> > in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
> > complete() in edac_memctrl_master_release() and then have the module
> > cleanup code wait for the completion. I think there were a few other
> > instances of this type of problem that I also fixed in the
> > above-mentioned patch.
>
> This is not a fix, this is a goddamn deadlock.
> rmmod your_turd </sys/spew/from/your_turd
> and there you go. rmmod can _NOT_ wait for sysfs references to go away.
Ok, how does this sound:
- Modify EDAC so it uses kmalloc() to create the kobject.
- Eliminate edac_memctrl_master_release(). Instead, use kfree() as
the release method for the kobject. Here, it's important to use a
function -outside- of EDAC as the release method since the core
EDAC module may have been unloaded by the time the release method
is called.
- Make similar modifications to the other places in EDAC where
kobjects are used.
At least this will keep the module unload operation from blocking
in the module cleanup function due to a nonzero kobject reference
count. I'm going to be away from my keyboard for most of the rest of
today. However, if there is general agreement that this is a
reasonable way to proceed, I'll make a patch that implements this
tomorrow.
Dave
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-06 18:22 ` Randy.Dunlap
@ 2006-03-07 17:03 ` Dave Peterson
2006-03-07 17:20 ` Greg KH
2006-03-07 19:03 ` Arjan van de Ven
0 siblings, 2 replies; 56+ messages in thread
From: Dave Peterson @ 2006-03-07 17:03 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: arjan, linux-kernel, torvalds, alan, gregkh
On Monday 06 March 2006 10:22, Randy.Dunlap wrote:
> On Mon, 6 Mar 2006 10:14:22 -0800 Dave Peterson wrote:
> > On Sunday 05 March 2006 02:30, Arjan van de Ven wrote:
> > > On Sun, 2006-03-05 at 11:18 +0100, Arjan van de Ven wrote:
> > > > > +/* Main MC kobject release() function */
> > > > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > > > +{
> > > > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > > > +}
> > > > > +
> > > >
> > > > ehhh how on earth can this be right?
> > >
> > > oh and this stuff also violates the "one value per file" rule; can we
> > > fix that urgently before it becomes part of the ABI in 2.6.16??
> >
> > Ok, I'll admit to being a bit clueless about this. I'm not familiar
> > with the "one value per file" rule; can someone please explain?
>
> it's in Documentation/filesystems/sysfs.txt
> Strongly preferred.
Ok, I assume the comment refers to the following:
Attributes should be ASCII text files, preferably with only one value
per file. It is noted that it may not be efficient to contain only
value per file, so it is socially acceptable to express an array of
values of the same type.
I was initially a bit confused because I thought the comment
specifically pertained to the piece of code shown above. I need to
take a closer look at the EDAC sysfs code - I'm not as familiar with
some of its details as I should be. Thanks for pointing out the
issue.
Dave
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-07 16:47 ` Dave Peterson
@ 2006-03-07 17:04 ` Greg KH
2006-03-07 17:06 ` Dave Peterson
2006-03-08 1:03 ` Dmitry Torokhov
0 siblings, 2 replies; 56+ messages in thread
From: Greg KH @ 2006-03-07 17:04 UTC (permalink / raw)
To: Dave Peterson
Cc: Al Viro, Arjan van de Ven, dthompson, Linux Kernel Mailing List
On Tue, Mar 07, 2006 at 08:47:44AM -0800, Dave Peterson wrote:
> On Monday 06 March 2006 13:32, Al Viro wrote:
> > On Mon, Mar 06, 2006 at 01:01:37PM -0800, Dave Peterson wrote:
> > > Regarding the above problem with the kobject reference count, this
> > > was recently fixed in the -mm tree (see edac-kobject-sysfs-fixes.patch
> > > in 2.6.16-rc5-mm2). The fix I implemented was to add a call to
> > > complete() in edac_memctrl_master_release() and then have the module
> > > cleanup code wait for the completion. I think there were a few other
> > > instances of this type of problem that I also fixed in the
> > > above-mentioned patch.
> >
> > This is not a fix, this is a goddamn deadlock.
> > rmmod your_turd </sys/spew/from/your_turd
> > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
>
> Ok, how does this sound:
>
> - Modify EDAC so it uses kmalloc() to create the kobject.
> - Eliminate edac_memctrl_master_release(). Instead, use kfree() as
> the release method for the kobject. Here, it's important to use a
> function -outside- of EDAC as the release method since the core
> EDAC module may have been unloaded by the time the release method
> is called.
No, if this happens then you are using the kobject incorrectly. How
could it be held if your module is unloaded? Don't you have the module
reference counting logic correct?
> - Make similar modifications to the other places in EDAC where
> kobjects are used.
Yes.
> At least this will keep the module unload operation from blocking
> in the module cleanup function due to a nonzero kobject reference
> count. I'm going to be away from my keyboard for most of the rest of
> today. However, if there is general agreement that this is a
> reasonable way to proceed, I'll make a patch that implements this
> tomorrow.
It's a good start :)
thanks,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-07 17:04 ` Greg KH
@ 2006-03-07 17:06 ` Dave Peterson
2006-03-08 1:03 ` Dmitry Torokhov
1 sibling, 0 replies; 56+ messages in thread
From: Dave Peterson @ 2006-03-07 17:06 UTC (permalink / raw)
To: Greg KH; +Cc: Al Viro, Arjan van de Ven, dthompson, Linux Kernel Mailing List
On Tuesday 07 March 2006 09:04, Greg KH wrote:
> > - Modify EDAC so it uses kmalloc() to create the kobject.
> > - Eliminate edac_memctrl_master_release(). Instead, use kfree() as
> > the release method for the kobject. Here, it's important to use a
> > function -outside- of EDAC as the release method since the core
> > EDAC module may have been unloaded by the time the release method
> > is called.
>
> No, if this happens then you are using the kobject incorrectly. How
> could it be held if your module is unloaded? Don't you have the module
> reference counting logic correct?
Oops... my mistake.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-07 17:03 ` Dave Peterson
@ 2006-03-07 17:20 ` Greg KH
2006-03-08 0:29 ` Dave Peterson
2006-03-07 19:03 ` Arjan van de Ven
1 sibling, 1 reply; 56+ messages in thread
From: Greg KH @ 2006-03-07 17:20 UTC (permalink / raw)
To: Dave Peterson; +Cc: Randy.Dunlap, arjan, linux-kernel, torvalds, alan, gregkh
On Tue, Mar 07, 2006 at 09:03:19AM -0800, Dave Peterson wrote:
> On Monday 06 March 2006 10:22, Randy.Dunlap wrote:
> > On Mon, 6 Mar 2006 10:14:22 -0800 Dave Peterson wrote:
> > > On Sunday 05 March 2006 02:30, Arjan van de Ven wrote:
> > > > On Sun, 2006-03-05 at 11:18 +0100, Arjan van de Ven wrote:
> > > > > > +/* Main MC kobject release() function */
> > > > > > +static void edac_memctrl_master_release(struct kobject *kobj)
> > > > > > +{
> > > > > > + debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
> > > > > > +}
> > > > > > +
> > > > >
> > > > > ehhh how on earth can this be right?
> > > >
> > > > oh and this stuff also violates the "one value per file" rule; can we
> > > > fix that urgently before it becomes part of the ABI in 2.6.16??
> > >
> > > Ok, I'll admit to being a bit clueless about this. I'm not familiar
> > > with the "one value per file" rule; can someone please explain?
> >
> > it's in Documentation/filesystems/sysfs.txt
> > Strongly preferred.
>
> Ok, I assume the comment refers to the following:
>
> Attributes should be ASCII text files, preferably with only one value
> per file. It is noted that it may not be efficient to contain only
> value per file, so it is socially acceptable to express an array of
> values of the same type.
>
> I was initially a bit confused because I thought the comment
> specifically pertained to the piece of code shown above. I need to
> take a closer look at the EDAC sysfs code - I'm not as familiar with
> some of its details as I should be. Thanks for pointing out the
> issue.
How else should we word the above text so that people realize that it
pertains to them? You aren't the first person to ignore it, so there is
a real problem here :(
thanks,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-07 17:03 ` Dave Peterson
2006-03-07 17:20 ` Greg KH
@ 2006-03-07 19:03 ` Arjan van de Ven
2006-03-07 19:05 ` Greg KH
2006-03-09 23:51 ` Dave Peterson
1 sibling, 2 replies; 56+ messages in thread
From: Arjan van de Ven @ 2006-03-07 19:03 UTC (permalink / raw)
To: Dave Peterson; +Cc: Randy.Dunlap, linux-kernel, torvalds, alan, gregkh
> I was initially a bit confused because I thought the comment
> specifically pertained to the piece of code shown above. I need to
> take a closer look at the EDAC sysfs code - I'm not as familiar with
> some of its details as I should be. Thanks for pointing out the
> issue.
afaics it is a list of pci devices. these should just be symlinks to the
sysfs resource of these pci devices instead, not a flat table file.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-07 19:03 ` Arjan van de Ven
@ 2006-03-07 19:05 ` Greg KH
2006-03-09 23:51 ` Dave Peterson
1 sibling, 0 replies; 56+ messages in thread
From: Greg KH @ 2006-03-07 19:05 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Dave Peterson, Randy.Dunlap, linux-kernel, torvalds, alan
On Tue, Mar 07, 2006 at 08:03:38PM +0100, Arjan van de Ven wrote:
>
> > I was initially a bit confused because I thought the comment
> > specifically pertained to the piece of code shown above. I need to
> > take a closer look at the EDAC sysfs code - I'm not as familiar with
> > some of its details as I should be. Thanks for pointing out the
> > issue.
>
> afaics it is a list of pci devices. these should just be symlinks to the
> sysfs resource of these pci devices instead, not a flat table file.
Ugh, all this talk is making me wonder what in the world this code is
doing. Time to go look at it...
And people complain that all interfaces in userspace should be instantly
marked "stable" because we all know exactly what we are doing :)
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-07 17:20 ` Greg KH
@ 2006-03-08 0:29 ` Dave Peterson
0 siblings, 0 replies; 56+ messages in thread
From: Dave Peterson @ 2006-03-08 0:29 UTC (permalink / raw)
To: Greg KH; +Cc: Randy.Dunlap, arjan, linux-kernel, torvalds, alan, gregkh
On Tuesday 07 March 2006 09:20, Greg KH wrote:
> > Ok, I assume the comment refers to the following:
> >
> > Attributes should be ASCII text files, preferably with only one value
> > per file. It is noted that it may not be efficient to contain only
> > value per file, so it is socially acceptable to express an array of
> > values of the same type.
> >
> > I was initially a bit confused because I thought the comment
> > specifically pertained to the piece of code shown above. I need to
> > take a closer look at the EDAC sysfs code - I'm not as familiar with
> > some of its details as I should be. Thanks for pointing out the
> > issue.
>
> How else should we word the above text so that people realize that it
> pertains to them? You aren't the first person to ignore it, so there is
> a real problem here :(
I think it's written clearly as it is. I didn't write the
sysfs-specific parts of the EDAC code. However I'll take the blame
for not having reviewed it more carefully before it went upstream.
I guess my attention has been divided among too many things lately.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-07 17:04 ` Greg KH
2006-03-07 17:06 ` Dave Peterson
@ 2006-03-08 1:03 ` Dmitry Torokhov
2006-03-08 1:33 ` Greg KH
1 sibling, 1 reply; 56+ messages in thread
From: Dmitry Torokhov @ 2006-03-08 1:03 UTC (permalink / raw)
To: Greg KH
Cc: Dave Peterson, Al Viro, Arjan van de Ven, dthompson,
Linux Kernel Mailing List
On Tuesday 07 March 2006 12:04, Greg KH wrote:
> On Tue, Mar 07, 2006 at 08:47:44AM -0800, Dave Peterson wrote:
> > Ok, how does this sound:
> >
> > - Modify EDAC so it uses kmalloc() to create the kobject.
> > - Eliminate edac_memctrl_master_release(). Instead, use kfree() as
> > the release method for the kobject. Here, it's important to use a
> > function -outside- of EDAC as the release method since the core
> > EDAC module may have been unloaded by the time the release method
> > is called.
>
> No, if this happens then you are using the kobject incorrectly. How
> could it be held if your module is unloaded? Don't you have the module
> reference counting logic correct?
>
It is pretty hard to implement kobject handling correctly. Consider the
following:
rmmod device_driver < /sys/devices/pci0000:00/...../power/state
for a driver that creates/destroys device objects.
Opening 'state' attribute will pin device structure into memory but will
not increase _your_ module's refcount. It is nice if you have a subsystem
core split from drivers code - then you can keep core module reference
until device objects are gone and allow individual drivers be unloaded
freely. But for single-module system it is pretty hard, that's why
platform devices are popular.
--
Dmitry
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-08 1:03 ` Dmitry Torokhov
@ 2006-03-08 1:33 ` Greg KH
0 siblings, 0 replies; 56+ messages in thread
From: Greg KH @ 2006-03-08 1:33 UTC (permalink / raw)
To: Dmitry Torokhov
Cc: Dave Peterson, Al Viro, Arjan van de Ven, dthompson,
Linux Kernel Mailing List
On Tue, Mar 07, 2006 at 08:03:41PM -0500, Dmitry Torokhov wrote:
> On Tuesday 07 March 2006 12:04, Greg KH wrote:
> > On Tue, Mar 07, 2006 at 08:47:44AM -0800, Dave Peterson wrote:
> > > Ok, how does this sound:
> > >
> > > - Modify EDAC so it uses kmalloc() to create the kobject.
> > > - Eliminate edac_memctrl_master_release(). Instead, use kfree() as
> > > the release method for the kobject. Here, it's important to use a
> > > function -outside- of EDAC as the release method since the core
> > > EDAC module may have been unloaded by the time the release method
> > > is called.
> >
> > No, if this happens then you are using the kobject incorrectly. How
> > could it be held if your module is unloaded? Don't you have the module
> > reference counting logic correct?
> >
>
> It is pretty hard to implement kobject handling correctly. Consider the
> following:
>
> rmmod device_driver < /sys/devices/pci0000:00/...../power/state
>
> for a driver that creates/destroys device objects.
I agree, that's one reason I really hate the "default" attributes :(
To do this "right" we need to make the attributes dynamically created
and the owner set to the proper module. I did that for the module core
code and it's on my todo list for the driver core too.
> Opening 'state' attribute will pin device structure into memory but will
> not increase _your_ module's refcount. It is nice if you have a subsystem
> core split from drivers code - then you can keep core module reference
> until device objects are gone and allow individual drivers be unloaded
> freely. But for single-module system it is pretty hard, that's why
> platform devices are popular.
They are popular for when you don't have a "bus", and rightfully so.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-07 10:41 ` Andrew Morton
2006-03-07 11:08 ` Al Viro
@ 2006-03-08 2:46 ` Rusty Russell
1 sibling, 0 replies; 56+ messages in thread
From: Rusty Russell @ 2006-03-08 2:46 UTC (permalink / raw)
To: Andrew Morton; +Cc: Al Viro, greg, dsp, arjan, linux-kernel
On Tue, 2006-03-07 at 02:41 -0800, Andrew Morton wrote:
> Al Viro <viro@ftp.linux.org.uk> wrote:
> >
> > On Mon, Mar 06, 2006 at 01:53:44PM -0800, Greg KH wrote:
> > > > rmmod your_turd </sys/spew/from/your_turd
> > > > and there you go. rmmod can _NOT_ wait for sysfs references to go away.
> > >
> > > To be fair, the only part of the kernel that supports the above process,
> > > is the network stack. And they implemented a special kind of lock to
> > > handle just this kind of thing.
> > >
> > > That is not something that I want the rest of the kernel to have to use.
> > > If your code blocks when doing the above thing, that's fine with me.
> >
> > One word: fail. With -EBUSY.
>
> It seems quite simple to make wait_for_zero_refcount() interruptible?
> Something like...
Al is correct.
It would have been so simple to implement rmmod as blocking, but it
seems that not what people want. They want modprobe -r to fail if the
module is busy, without ever causing spurious failures.
Hope that clarifies?
Rusty.
--
ccontrol: http://ozlabs.org/~rusty/ccontrol
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-06 21:07 ` Arjan van de Ven
@ 2006-03-09 3:19 ` Dave Peterson
2006-03-09 3:44 ` Al Viro
2006-03-09 5:51 ` Greg KH
0 siblings, 2 replies; 56+ messages in thread
From: Dave Peterson @ 2006-03-09 3:19 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Greg KH, Al Viro, Linux Kernel Mailing List
On Monday 06 March 2006 13:07, Arjan van de Ven wrote:
> > Is it more desirable to dynamically allocate kobjects than to declare
> > them statically?
>
> Yes
>
> > If so, I'd be curious to know why dynamic
> > allocation is preferred over static allocation.
>
> because the lifetime of the kobject is independent of the lifetime of
> the memory of your static allocation.
> Separate lifetimes -> separate memory is a very good design principle.
I'm not familiar with the internals of the module unloading code.
However, my understanding of the discussion so far is that the kernel
will refuse to unload a module while any of its kobjects still have
nonzero reference counts (either by waiting for the reference counts
to hit 0 or returning -EBUSY).
If this is the case, then I don't see any harm in declaring kobjects
statically. Declaring a kobject statically is simpler than
dynamically allocating and freeing it. Am I still missing something?
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-09 3:19 ` Dave Peterson
@ 2006-03-09 3:44 ` Al Viro
2006-03-09 5:51 ` Greg KH
1 sibling, 0 replies; 56+ messages in thread
From: Al Viro @ 2006-03-09 3:44 UTC (permalink / raw)
To: Dave Peterson; +Cc: Arjan van de Ven, Greg KH, Linux Kernel Mailing List
On Wed, Mar 08, 2006 at 07:19:59PM -0800, Dave Peterson wrote:
> I'm not familiar with the internals of the module unloading code.
> However, my understanding of the discussion so far is that the kernel
> will refuse to unload a module while any of its kobjects still have
> nonzero reference counts (either by waiting for the reference counts
> to hit 0 or returning -EBUSY).
>
> If this is the case,
... the world you are living in is drastically different from the one
where the rest of us lives.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-09 3:19 ` Dave Peterson
2006-03-09 3:44 ` Al Viro
@ 2006-03-09 5:51 ` Greg KH
1 sibling, 0 replies; 56+ messages in thread
From: Greg KH @ 2006-03-09 5:51 UTC (permalink / raw)
To: Dave Peterson; +Cc: Arjan van de Ven, Al Viro, Linux Kernel Mailing List
On Wed, Mar 08, 2006 at 07:19:59PM -0800, Dave Peterson wrote:
> On Monday 06 March 2006 13:07, Arjan van de Ven wrote:
> > > Is it more desirable to dynamically allocate kobjects than to declare
> > > them statically?
> >
> > Yes
> >
> > > If so, I'd be curious to know why dynamic
> > > allocation is preferred over static allocation.
> >
> > because the lifetime of the kobject is independent of the lifetime of
> > the memory of your static allocation.
> > Separate lifetimes -> separate memory is a very good design principle.
>
> I'm not familiar with the internals of the module unloading code.
> However, my understanding of the discussion so far is that the kernel
> will refuse to unload a module while any of its kobjects still have
> nonzero reference counts (either by waiting for the reference counts
> to hit 0 or returning -EBUSY).
There is no tie between kobjects and modules. Only between attributes
and modules _if_ you have properly set them up.
That is the main problem, kobjects are data and modules are code, you
need to be careful to handle their reference counting properly.
good luck,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-07 19:03 ` Arjan van de Ven
2006-03-07 19:05 ` Greg KH
@ 2006-03-09 23:51 ` Dave Peterson
2006-03-10 0:02 ` Greg KH
1 sibling, 1 reply; 56+ messages in thread
From: Dave Peterson @ 2006-03-09 23:51 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Randy.Dunlap, linux-kernel, torvalds, alan, gregkh, Doug Thompson,
bluesmoke-devel
On Tuesday 07 March 2006 11:03, Arjan van de Ven wrote:
> afaics it is a list of pci devices. these should just be symlinks to the
> sysfs resource of these pci devices instead, not a flat table file.
Ok, I'm looking at the EDAC sysfs interface. I see the following
issues concerning the "one value per file" rule:
1. /sys/devices/system/edac/mc/mc0/module_name contains two
values, a module name and a version:
# cat /sys/devices/system/edac/mc/mc0/module_name
k8_edac Ver: 2.0.1.devel Mar 8 2006
#
2. /sys/devices/system/edac/mc/mc0/supported_mem_type contains
the following on the machine I am looking at:
# cat /sys/devices/system/edac/mc/mc0/supported_mem_type
Unbuffered-DDR Registered-DDR
#
Here we have a whitespace-delimited list of values. Likewise,
the following files contain whitespace-delimited lists:
/sys/devices/system/edac/mc/mc0/edac_capability
/sys/devices/system/edac/mc/mc0/edac_current_capability
3. The following files contain comma-delimited lists of
(vendor ID, device ID) tuples:
/sys/devices/system/edac/pci/pci_parity_blacklist
/sys/devices/system/edac/pci/pci_parity_whitelist
I assume this is what Arjan is referring to.
Documentation/drivers/edac/edac.txt gives the following
description of how the whitelist functions:
This control file allows for an explicit list of PCI
devices to be scanned for parity errors. Only devices
found on this list will be examined. The list is a line
of hexadecimel VENDOR and DEVICE ID tuples:
1022:7450,1434:16a6
One or more can be inserted, seperated by a comma.
To write the above list doing the following as one
command line:
echo "1022:7450,1434:16a6"
> /sys/devices/system/edac/pci/pci_parity_whitelist
To display what the whitelist is, simply 'cat' the same
file.
Looking at the current EDAC implementation, these are all of the
"one value per file" issues I see. If anyone sees any others I
missed, please let me know. Here are my thoughts on each:
Issue #1
--------
Fixing this is easy. /sys/devices/system/edac/mc/mc0/module_name
can be replaced by two separate files, one providing the name and
the other providing the version:
/sys/devices/system/edac/mc/mc0/module_name
/sys/devices/system/edac/mc/mc0/module_version
Issue #2
--------
To fix this, /sys/devices/system/edac/mc/mc0/supported_mem_type
can be made into a directory containing a file representing each
supported memory type. Thus we might have the following:
/sys/devices/system/edac/mc/mc0/supported_mem_type
/sys/devices/system/edac/mc/mc0/supported_mem_type/Unbuffered-DDR
/sys/devices/system/edac/mc/mc0/supported_mem_type/Registered-DDR
In the above example, the files Unbuffered-DDR and Registered-DDR
would each be empty in content. The presence of each file would
indicate that the memory type it represents is supported.
Issue #3
--------
I am unclear about what to do here. If the list contents were
read-only, it would be relatively easy to make
/sys/devices/system/edac/pci/pci_parity_whitelist into a directory
containing symlinks, one for each device. However, the user is
supposed to be able to modify the list contents. This would imply
that the user creates and destroys symlinks. Does sysfs currently
support this sort of behavior? If not, what is the preferred
means for implementing a user-modifiable set of values?
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-09 23:51 ` Dave Peterson
@ 2006-03-10 0:02 ` Greg KH
2006-03-10 1:46 ` Dave Peterson
0 siblings, 1 reply; 56+ messages in thread
From: Greg KH @ 2006-03-10 0:02 UTC (permalink / raw)
To: Dave Peterson
Cc: Arjan van de Ven, Randy.Dunlap, linux-kernel, torvalds, alan,
gregkh, Doug Thompson, bluesmoke-devel
On Thu, Mar 09, 2006 at 03:51:25PM -0800, Dave Peterson wrote:
> On Tuesday 07 March 2006 11:03, Arjan van de Ven wrote:
> > afaics it is a list of pci devices. these should just be symlinks to the
> > sysfs resource of these pci devices instead, not a flat table file.
>
> Ok, I'm looking at the EDAC sysfs interface. I see the following
> issues concerning the "one value per file" rule:
>
> 1. /sys/devices/system/edac/mc/mc0/module_name contains two
> values, a module name and a version:
>
> # cat /sys/devices/system/edac/mc/mc0/module_name
> k8_edac Ver: 2.0.1.devel Mar 8 2006
Woah. That's what /sys/modules/ is for right? Don't add new stuff
please.
> 2. /sys/devices/system/edac/mc/mc0/supported_mem_type contains
> the following on the machine I am looking at:
>
> # cat /sys/devices/system/edac/mc/mc0/supported_mem_type
> Unbuffered-DDR Registered-DDR
> #
>
> Here we have a whitespace-delimited list of values. Likewise,
> the following files contain whitespace-delimited lists:
>
> /sys/devices/system/edac/mc/mc0/edac_capability
> /sys/devices/system/edac/mc/mc0/edac_current_capability
What exactly do they look like?
> 3. The following files contain comma-delimited lists of
> (vendor ID, device ID) tuples:
>
> /sys/devices/system/edac/pci/pci_parity_blacklist
> /sys/devices/system/edac/pci/pci_parity_whitelist
What exactly do they look like?
> I assume this is what Arjan is referring to.
> Documentation/drivers/edac/edac.txt gives the following
> description of how the whitelist functions:
>
> This control file allows for an explicit list of PCI
> devices to be scanned for parity errors. Only devices
> found on this list will be examined. The list is a line
> of hexadecimel VENDOR and DEVICE ID tuples:
>
> 1022:7450,1434:16a6
>
> One or more can be inserted, seperated by a comma.
> To write the above list doing the following as one
> command line:
>
> echo "1022:7450,1434:16a6"
> > /sys/devices/system/edac/pci/pci_parity_whitelist
>
> To display what the whitelist is, simply 'cat' the same
> file.
>
> Looking at the current EDAC implementation, these are all of the
> "one value per file" issues I see. If anyone sees any others I
> missed, please let me know. Here are my thoughts on each:
>
> Issue #1
> --------
> Fixing this is easy. /sys/devices/system/edac/mc/mc0/module_name
> can be replaced by two separate files, one providing the name and
> the other providing the version:
>
> /sys/devices/system/edac/mc/mc0/module_name
> /sys/devices/system/edac/mc/mc0/module_version
No, these should just be deleted. Use the proper MODULE_* macros for
these if you really want to display them to users.
> Issue #2
> --------
> To fix this, /sys/devices/system/edac/mc/mc0/supported_mem_type
> can be made into a directory containing a file representing each
> supported memory type. Thus we might have the following:
>
> /sys/devices/system/edac/mc/mc0/supported_mem_type
> /sys/devices/system/edac/mc/mc0/supported_mem_type/Unbuffered-DDR
> /sys/devices/system/edac/mc/mc0/supported_mem_type/Registered-DDR
>
> In the above example, the files Unbuffered-DDR and Registered-DDR
> would each be empty in content. The presence of each file would
> indicate that the memory type it represents is supported.
I don't think the original file is really a big problem.
> Issue #3
> --------
> I am unclear about what to do here. If the list contents were
> read-only, it would be relatively easy to make
> /sys/devices/system/edac/pci/pci_parity_whitelist into a directory
> containing symlinks, one for each device. However, the user is
> supposed to be able to modify the list contents. This would imply
> that the user creates and destroys symlinks. Does sysfs currently
> support this sort of behavior? If not, what is the preferred
> means for implementing a user-modifiable set of values?
No it doesn't. How big can this list get?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
@ 2006-03-10 0:44 Doug Thompson
0 siblings, 0 replies; 56+ messages in thread
From: Doug Thompson @ 2006-03-10 0:44 UTC (permalink / raw)
To: greg
Cc: arjan, gregkh, bluesmoke-devel, dsp, Doug Thompson, torvalds,
alan, linux-kernel, rdunlap
On Fri, 2006-03-10 at 00:02 +0000, Greg KH wrote:
> On Thu, Mar 09, 2006 at 03:51:25PM -0800, Dave Peterson wrote:
> > On Tuesday 07 March 2006 11:03, Arjan van de Ven wrote:
> > > afaics it is a list of pci devices. these should just be symlinks to the
> > > sysfs resource of these pci devices instead, not a flat table file.
> >
> > Ok, I'm looking at the EDAC sysfs interface. I see the following
> > issues concerning the "one value per file" rule:
> >
> > 1. /sys/devices/system/edac/mc/mc0/module_name contains two
> > values, a module name and a version:
> >
> > # cat /sys/devices/system/edac/mc/mc0/module_name
> > k8_edac Ver: 2.0.1.devel Mar 8 2006
>
> Woah. That's what /sys/modules/ is for right? Don't add new stuff
> please.
>
> > 2. /sys/devices/system/edac/mc/mc0/supported_mem_type contains
> > the following on the machine I am looking at:
> >
> > # cat /sys/devices/system/edac/mc/mc0/supported_mem_type
> > Unbuffered-DDR Registered-DDR
> > #
> >
> > Here we have a whitespace-delimited list of values. Likewise,
> > the following files contain whitespace-delimited lists:
> >
> > /sys/devices/system/edac/mc/mc0/edac_capability
> > /sys/devices/system/edac/mc/mc0/edac_current_capability
>
> What exactly do they look like?
Unbuffered-DDR Registered-DDR
These come from the memory controller on what types of memory are
possible (or capable), NOT actuals. Its a domain of multiple types.
>
> > 3. The following files contain comma-delimited lists of
> > (vendor ID, device ID) tuples:
> >
> > /sys/devices/system/edac/pci/pci_parity_blacklist
> > /sys/devices/system/edac/pci/pci_parity_whitelist
>
> What exactly do they look like?
Example:
1022:7450,1434:16a6
is a list of 2 devices that should be skipped. Retrieved via a lspci.
PCI vendor-ID:device-ID series list. One or more devices are possible
The number of devices depends on how many devices should be skipped on a
given server.
Somehow the system admin needs to tell edac/pci to skip one or more
devices in the PCI device list. As the iterator proceeds through the PCI
device list, it compares each to a blacklist (or whitelist if on - only
one or the other can be one) entry. If found it skipsits. I toyed with
the idea of each PCI device having a control file and the admin setting
a scan/no-scan state, but quickly dropped that.
If a device is blacklisted, it is non-conforming to the PCI Parity
standard of operation and its status cannot be trusted. PCI-X Infiniband
card is such a card, though they have promised a firmware update.
>
> > I assume this is what Arjan is referring to.
> > Documentation/drivers/edac/edac.txt gives the following
> > description of how the whitelist functions:
> >
> > This control file allows for an explicit list of PCI
> > devices to be scanned for parity errors. Only devices
> > found on this list will be examined. The list is a line
> > of hexadecimel VENDOR and DEVICE ID tuples:
> >
> > 1022:7450,1434:16a6
> >
> > One or more can be inserted, seperated by a comma.
> > To write the above list doing the following as one
> > command line:
> >
> > echo "1022:7450,1434:16a6"
> > > /sys/devices/system/edac/pci/pci_parity_whitelist
> >
> > To display what the whitelist is, simply 'cat' the same
> > file.
> >
> > Looking at the current EDAC implementation, these are all of the
> > "one value per file" issues I see. If anyone sees any others I
> > missed, please let me know. Here are my thoughts on each:
> >
> > Issue #1
> > --------
> > Fixing this is easy. /sys/devices/system/edac/mc/mc0/module_name
> > can be replaced by two separate files, one providing the name and
> > the other providing the version:
> >
> > /sys/devices/system/edac/mc/mc0/module_name
> > /sys/devices/system/edac/mc/mc0/module_version
>
> No, these should just be deleted. Use the proper MODULE_* macros for
> these if you really want to display them to users.
When these macros are used, they then show up in the /sys/module/xxxx
directory, is that correct?
>
> > Issue #2
> > --------
> > To fix this, /sys/devices/system/edac/mc/mc0/supported_mem_type
> > can be made into a directory containing a file representing each
> > supported memory type. Thus we might have the following:
> >
> > /sys/devices/system/edac/mc/mc0/supported_mem_type
> > /sys/devices/system/edac/mc/mc0/supported_mem_type/Unbuffered-DDR
> > /sys/devices/system/edac/mc/mc0/supported_mem_type/Registered-DDR
> >
> > In the above example, the files Unbuffered-DDR and Registered-DDR
> > would each be empty in content. The presence of each file would
> > indicate that the memory type it represents is supported.
This attribute reports POSSIBLE memory types, not ACTUAL memory type.
>
> I don't think the original file is really a big problem.
Are you saying to leave it as is?
the supported_mem_type reports what the memory controller is CAPABLE of
supporting. Then in '..../mc0/csrowN/mem_type' reports what is ACTUALLY
being used.
>
> > Issue #3
> > --------
> > I am unclear about what to do here. If the list contents were
> > read-only, it would be relatively easy to make
> > /sys/devices/system/edac/pci/pci_parity_whitelist into a directory
> > containing symlinks, one for each device. However, the user is
> > supposed to be able to modify the list contents. This would imply
> > that the user creates and destroys symlinks. Does sysfs currently
> > support this sort of behavior? If not, what is the preferred
> > means for implementing a user-modifiable set of values?
This input and presentation of a list was troublesome in matching the
one attribute policy.
>
> No it doesn't. How big can this list get?
Depends on how many PCI devices there are AND which ones have been
identified as non-conforming to the PCI standard for PCI parity status.
At most, the Number of devices minus 1, but then using a 'whitelist'
with just that one lone device would be better. IF all devices are
listed in a blacklist, then just turn off parity checking instead.
Getting the blacklist/whitelist into the edac driver took some pondering
and various designs on my part. I might have missed a more obvious
solution.
>
> thanks,
>
> greg k-h
thanks
doug t
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-10 0:02 ` Greg KH
@ 2006-03-10 1:46 ` Dave Peterson
2006-03-10 7:36 ` Arjan van de Ven
0 siblings, 1 reply; 56+ messages in thread
From: Dave Peterson @ 2006-03-10 1:46 UTC (permalink / raw)
To: Greg KH
Cc: Arjan van de Ven, Randy.Dunlap, linux-kernel, torvalds, alan,
gregkh, Doug Thompson, bluesmoke-devel
On Thursday 09 March 2006 16:02, Greg KH wrote:
> > 1. /sys/devices/system/edac/mc/mc0/module_name contains two
> > values, a module name and a version:
> >
> > # cat /sys/devices/system/edac/mc/mc0/module_name
> > k8_edac Ver: 2.0.1.devel Mar 8 2006
>
> Woah. That's what /sys/modules/ is for right? Don't add new stuff
> please.
My apologies for my lack of familiarity with how the contents of /sys
are supposed to be laid out. I'm just looking at what EDAC currently
does, trying to fix what's broken, and in the process learning how
sysfs is supposed to work.
> > Here we have a whitespace-delimited list of values. Likewise,
> > the following files contain whitespace-delimited lists:
> >
> > /sys/devices/system/edac/mc/mc0/edac_capability
> > /sys/devices/system/edac/mc/mc0/edac_current_capability
>
> What exactly do they look like?
On the machine I'm currently looking at, here is what I see:
# cd /sys/devices/system/edac/mc/mc0
# cat edac_capability
None SECDED S4ECD4ED
# cat edac_current_capability
None
#
Although edac_current_capability is shown above as having only one
value, it could in principle contain a few values (some subset of
the values contained in edac_capability).
> > 3. The following files contain comma-delimited lists of
> > (vendor ID, device ID) tuples:
> >
> > /sys/devices/system/edac/pci/pci_parity_blacklist
> > /sys/devices/system/edac/pci/pci_parity_whitelist
>
> What exactly do they look like?
Initially, both files are empty. If a user writes a list of values
to a file, the file will contain the values that the user wrote. For
instance, below we create a list containing two
(vendor ID, device ID) tuples:
# cd /sys/devices/system/edac/pci
# cat pci_parity_blacklist
# echo "1022:7450,1434:16a6" > pci_parity_blacklist
# cat pci_parity_blacklist
1022:7450,1434:16a6
#
> > Issue #1
> > --------
> > Fixing this is easy. /sys/devices/system/edac/mc/mc0/module_name
> > can be replaced by two separate files, one providing the name and
> > the other providing the version:
> >
> > /sys/devices/system/edac/mc/mc0/module_name
> > /sys/devices/system/edac/mc/mc0/module_version
>
> No, these should just be deleted. Use the proper MODULE_* macros for
> these if you really want to display them to users.
Ok, I'll make sure they get deleted.
> > Issue #2
> > --------
> > To fix this, /sys/devices/system/edac/mc/mc0/supported_mem_type
> > can be made into a directory containing a file representing each
> > supported memory type. Thus we might have the following:
> >
> > /sys/devices/system/edac/mc/mc0/supported_mem_type
> > /sys/devices/system/edac/mc/mc0/supported_mem_type/Unbuffered-DDR
> > /sys/devices/system/edac/mc/mc0/supported_mem_type/Registered-DDR
> >
> > In the above example, the files Unbuffered-DDR and Registered-DDR
> > would each be empty in content. The presence of each file would
> > indicate that the memory type it represents is supported.
>
> I don't think the original file is really a big problem.
Ok, so I'll leave this as-is?
> > Issue #3
> > --------
> > I am unclear about what to do here. If the list contents were
> > read-only, it would be relatively easy to make
> > /sys/devices/system/edac/pci/pci_parity_whitelist into a directory
> > containing symlinks, one for each device. However, the user is
> > supposed to be able to modify the list contents. This would imply
> > that the user creates and destroys symlinks. Does sysfs currently
> > support this sort of behavior? If not, what is the preferred
> > means for implementing a user-modifiable set of values?
>
> No it doesn't. How big can this list get?
It depends on how many PCI devices in your machine you wish to
blacklist or whitelist. The motivation for this feature is that
certain known badly-designed devices report an endless stream of
spurious PCI bus parity errors. We want to skip such devices when
checking for PCI bus parity errors.
Eventually we are thinking of maintaining a master list of known
bad hardware. The list will grow over time as users encounter
and report new instances of flaky devices. However, the user would
not have to feed the entire list to EDAC. I can imagine someone
writing a script that behaves as follows:
1. Determine the entire set of PCI devices present in the user's
machine.
2. Read the set of known bad hardware from the master list and
compute the intersection with the set of devices actually
present in the user's machine.
3. Feed the result of step 2 above to EDAC.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-10 1:46 ` Dave Peterson
@ 2006-03-10 7:36 ` Arjan van de Ven
2006-03-10 11:06 ` Tim Small
2006-03-10 17:46 ` Dave Peterson
0 siblings, 2 replies; 56+ messages in thread
From: Arjan van de Ven @ 2006-03-10 7:36 UTC (permalink / raw)
To: Dave Peterson
Cc: Greg KH, Randy.Dunlap, linux-kernel, torvalds, alan, gregkh,
Doug Thompson, bluesmoke-devel
On Thu, 2006-03-09 at 17:46 -0800, Dave Peterson wrote:
> > > Issue #3
> > > --------
> > > I am unclear about what to do here. If the list contents were
> > > read-only, it would be relatively easy to make
> > > /sys/devices/system/edac/pci/pci_parity_whitelist into a directory
> > > containing symlinks, one for each device. However, the user is
> > > supposed to be able to modify the list contents. This would imply
> > > that the user creates and destroys symlinks. Does sysfs currently
> > > support this sort of behavior? If not, what is the preferred
> > > means for implementing a user-modifiable set of values?
> >
> > No it doesn't. How big can this list get?
>
> It depends on how many PCI devices in your machine you wish to
> blacklist or whitelist. The motivation for this feature is that
> certain known badly-designed devices report an endless stream of
> spurious PCI bus parity errors. We want to skip such devices when
> checking for PCI bus parity errors.
ok so this is actually a per pci device property!
I would suggest moving this property to the pci device itself, not doing
it inside an edac directory.
by doing that you get the advantage that 1) it's a more logical place,
and 2) users can do it even for 1 of 2 cards, if they have 2 cards of
the same make and 3) you can use the generic kernel quirk interface for
this and 4) drivers can in principle control this for their hardware in
complex cases
I understand that on a PC, EDAC is the only user. but ppc has a similar
mechanism I suspect, and they more than likely would be able to share
this property....
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-10 7:36 ` Arjan van de Ven
@ 2006-03-10 11:06 ` Tim Small
2006-03-10 11:40 ` Arjan van de Ven
2006-03-10 17:46 ` Dave Peterson
1 sibling, 1 reply; 56+ messages in thread
From: Tim Small @ 2006-03-10 11:06 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Dave Peterson, Greg KH, linux-kernel, Doug Thompson,
bluesmoke-devel
Arjan van de Ven wrote:
> It depends on how many PCI devices in your machine you wish to
>
>>blacklist or whitelist. The motivation for this feature is that
>>certain known badly-designed devices report an endless stream of
>>spurious PCI bus parity errors. We want to skip such devices when
>>checking for PCI bus parity errors.
>>
>>
>
>ok so this is actually a per pci device property!
>I would suggest moving this property to the pci device itself, not doing
>it inside an edac directory.
>
>
Yes, this seems more sensible to me. For one thing, I suspect that just
keying on vendor:device is probably too blunt for this and that
blacklisting a particular PCI device revision is a likely requirement,
as well as subsystem vendor/subsystem device.
Tim.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-10 11:06 ` Tim Small
@ 2006-03-10 11:40 ` Arjan van de Ven
0 siblings, 0 replies; 56+ messages in thread
From: Arjan van de Ven @ 2006-03-10 11:40 UTC (permalink / raw)
To: Tim Small
Cc: Dave Peterson, Greg KH, linux-kernel, Doug Thompson,
bluesmoke-devel
On Fri, 2006-03-10 at 11:06 +0000, Tim Small wrote:
> Arjan van de Ven wrote:
>
> > It depends on how many PCI devices in your machine you wish to
> >
> >>blacklist or whitelist. The motivation for this feature is that
> >>certain known badly-designed devices report an endless stream of
> >>spurious PCI bus parity errors. We want to skip such devices when
> >>checking for PCI bus parity errors.
> >>
> >>
> >
> >ok so this is actually a per pci device property!
> >I would suggest moving this property to the pci device itself, not doing
> >it inside an edac directory.
> >
> >
> Yes, this seems more sensible to me. For one thing, I suspect that just
> keying on vendor:device is probably too blunt for this and that
> blacklisting a particular PCI device revision is a likely requirement,
> as well as subsystem vendor/subsystem device.
and maybe even something as funky as firmware version.
So it for sure is a per device (not per ID) property, and something that
needs a global quirk table kind of thing with the option to do per
driver overrides
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
@ 2006-03-10 16:56 Doug Thompson
2006-03-10 17:11 ` Arjan van de Ven
0 siblings, 1 reply; 56+ messages in thread
From: Doug Thompson @ 2006-03-10 16:56 UTC (permalink / raw)
To: arjan; +Cc: tim, greg, bluesmoke-devel, dsp, Doug Thompson, linux-kernel
On Fri, 2006-03-10 at 11:40 +0000, Arjan van de Ven wrote:
> On Fri, 2006-03-10 at 11:06 +0000, Tim Small wrote:
> > Arjan van de Ven wrote:
> >
> > > It depends on how many PCI devices in your machine you wish to
> > >
> > >>blacklist or whitelist. The motivation for this feature is that
> > >>certain known badly-designed devices report an endless stream of
> > >>spurious PCI bus parity errors. We want to skip such devices when
> > >>checking for PCI bus parity errors.
> > >>
> > >>
> > >
> > >ok so this is actually a per pci device property!
> > >I would suggest moving this property to the pci device itself, not doing
> > >it inside an edac directory.
> > >
> > >
> > Yes, this seems more sensible to me. For one thing, I suspect that just
> > keying on vendor:device is probably too blunt for this and that
> > blacklisting a particular PCI device revision is a likely requirement,
> > as well as subsystem vendor/subsystem device.
>
> and maybe even something as funky as firmware version.
> So it for sure is a per device (not per ID) property, and something that
> needs a global quirk table kind of thing with the option to do per
> driver overrides
Very definitely, this non-conforming misfeature of PCI compliance is a
per PCI device attribute. At the very least it is tied to VENDOR:DEVICE
tuple, and probably a subsystem vendor/device tuple as well. As to
firmware, that is also likely. Mellanox promised a new firmware update
to their board that supposely fixes this issue. Yet, I find no firmware
value in the PCI spec, just the Revision ID, which could be used as
firmware identifier. THis is up to the vendor.
So in order to be sure I understand, if this PARITY Non-Conformance
attribute was "moved" to the per device directory of sysfs
(/sys/devices/pci0000:00/0000:00:06.0 for an example), then we would
need a userland attribute file created here and then stored in the
'pci_dev' structure or the mentioned quirk structure. This field then
could be set by userland script(s), then EDAC-PCI could example that
data in its iteration of pci devices. Is that correct?
I will admit I have heard of the "quirk" tables in the kernel, but don't
fully understand them. From what I read here, a PCI device quirk table
would be a parallel structure to the 'struct pci_dev' for a given PCI
device. So every pci_dev structure created, then a quirk table structure
would be created, and in that quirk entry is a PARITY data item. That
data item is exposed into sysfs in the /sys/devices/pci* as the example
above.
An new getter functions would be needed so the EDAC PCI iterator could
'get' the current value of the attribute.
If the above is correct, then who would we need to contact for said
modification or approval for such? Is that you Greg KH, since you are
listed as the PCI SUBSYSTEM maintainer?
thanks
doug t
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-10 16:56 Doug Thompson
@ 2006-03-10 17:11 ` Arjan van de Ven
0 siblings, 0 replies; 56+ messages in thread
From: Arjan van de Ven @ 2006-03-10 17:11 UTC (permalink / raw)
To: Doug Thompson; +Cc: tim, greg, bluesmoke-devel, dsp, linux-kernel
> > and maybe even something as funky as firmware version.
> > So it for sure is a per device (not per ID) property, and something that
> > needs a global quirk table kind of thing with the option to do per
> > driver overrides
>
> Very definitely, this non-conforming misfeature of PCI compliance is a
> per PCI device attribute. At the very least it is tied to VENDOR:DEVICE
> tuple, and probably a subsystem vendor/device tuple as well. As to
> firmware, that is also likely. Mellanox promised a new firmware update
> to their board that supposely fixes this issue. Yet, I find no firmware
> value in the PCI spec, just the Revision ID, which could be used as
> firmware identifier. THis is up to the vendor.
exactly. So this is why a device driver needs to be able to override.
Eg for such device turn it off with a global quirk, and then let the
driver say "eh it's ok for THIS case"
> So in order to be sure I understand, if this PARITY Non-Conformance
> attribute was "moved" to the per device directory of sysfs
> (/sys/devices/pci0000:00/0000:00:06.0 for an example), then we would
> need a userland attribute file created here and then stored in the
> 'pci_dev' structure
yes. Well to some degree I'm not even sure it needs to be exposed to
userland like this. At least normally the kernel should know this
internally and automatically. (after all the kernel has the job to
abstract the hardware for the rest of the system; dealing with broken
hardware is part of that)
> or the mentioned quirk structure. This field then
> could be set by userland script(s), then EDAC-PCI could example that
> data in its iteration of pci devices. Is that correct?
that sounds way way way too complex. If this is "just" a field in the
pci device... why would userland need to get involved? Your kernel side
should be able to see that directly just fine.
> If the above is correct, then who would we need to contact for said
> modification or approval for such? Is that you Greg KH, since you are
> listed as the PCI SUBSYSTEM maintainer?
Greg needs to OK the addition to the pci struct, but I don't forsee a
problem personally since this is a more or less obvious and logical
thing to add, and useful for more than one architecture
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
@ 2006-03-10 17:28 Doug Thompson
0 siblings, 0 replies; 56+ messages in thread
From: Doug Thompson @ 2006-03-10 17:28 UTC (permalink / raw)
To: arjan; +Cc: tim, greg, bluesmoke-devel, dsp, Doug Thompson, linux-kernel
On Fri, 2006-03-10 at 17:11 +0000, Arjan van de Ven wrote:
>
> > So in order to be sure I understand, if this PARITY Non-Conformance
> > attribute was "moved" to the per device directory of sysfs
> > (/sys/devices/pci0000:00/0000:00:06.0 for an example), then we would
> > need a userland attribute file created here and then stored in the
> > 'pci_dev' structure
>
> yes. Well to some degree I'm not even sure it needs to be exposed to
> userland like this. At least normally the kernel should know this
> internally and automatically. (after all the kernel has the job to
> abstract the hardware for the rest of the system; dealing with broken
> hardware is part of that)
The problem we have run into is latency of the OS keeping up to date
with the characteristics of a device. When I added the scan PCI parity
to bluesmoke/EDAC I made the ago old assumption that hardware vendors
followed the specs, at least in the majority. (Gee, they didn't 25 years
ago so why should they today??). Well it has been trial and error as we
placed bluesmoke on systems with the various cards we use and more than
I expected fail this conformance.
We needed the ability to override in real (people) time on a cluster,
the scanning of non-conforming boards. We cannot wait months for the OS
to catch up with new board. That is why I placed (breaking the model -
sorry about that) the blacklist "close" to the EDAC module.
For these reasons, we needed an access point for userland to override
the attribute of scanning a PCI device for parity. Device drivers won't
necessarily do this themselves. Something outside the kernel might have
to do it with data found after a device has been added to the system.
>
>
> > or the mentioned quirk structure. This field then
> > could be set by userland script(s), then EDAC-PCI could example that
> > data in its iteration of pci devices. Is that correct?
>
> that sounds way way way too complex. If this is "just" a field in the
> pci device... why would userland need to get involved? Your kernel side
> should be able to see that directly just fine.
The need for userland to actually DO the setting of the attribute
indicating that this device is non-conforming. That would be why
>
>
>
> > If the above is correct, then who would we need to contact for said
> > modification or approval for such? Is that you Greg KH, since you are
> > listed as the PCI SUBSYSTEM maintainer?
>
> Greg needs to OK the addition to the pci struct, but I don't forsee a
> problem personally since this is a more or less obvious and logical
> thing to add, and useful for more than one architecture
great! thanks
doug t
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-10 7:36 ` Arjan van de Ven
2006-03-10 11:06 ` Tim Small
@ 2006-03-10 17:46 ` Dave Peterson
2006-03-10 17:58 ` Arjan van de Ven
1 sibling, 1 reply; 56+ messages in thread
From: Dave Peterson @ 2006-03-10 17:46 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Greg KH, Randy.Dunlap, linux-kernel, torvalds, alan, gregkh,
Doug Thompson, bluesmoke-devel
On Thursday 09 March 2006 23:36, Arjan van de Ven wrote:
> ok so this is actually a per pci device property!
> I would suggest moving this property to the pci device itself, not doing
> it inside an edac directory.
>
> by doing that you get the advantage that 1) it's a more logical place,
> and 2) users can do it even for 1 of 2 cards, if they have 2 cards of
> the same make and 3) you can use the generic kernel quirk interface for
> this and 4) drivers can in principle control this for their hardware in
> complex cases
>
> I understand that on a PC, EDAC is the only user. but ppc has a similar
> mechanism I suspect, and they more than likely would be able to share
> this property....
I'd be curious to hear people's opinions on the following idea:
move the PCI bus parity error checking functionality from EDAC
to the PCI subsystem.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-10 17:46 ` Dave Peterson
@ 2006-03-10 17:58 ` Arjan van de Ven
2006-03-10 19:07 ` Dave Peterson
0 siblings, 1 reply; 56+ messages in thread
From: Arjan van de Ven @ 2006-03-10 17:58 UTC (permalink / raw)
To: Dave Peterson
Cc: Greg KH, Randy.Dunlap, linux-kernel, torvalds, alan, gregkh,
Doug Thompson, bluesmoke-devel
On Fri, 2006-03-10 at 09:46 -0800, Dave Peterson wrote:
> On Thursday 09 March 2006 23:36, Arjan van de Ven wrote:
> > ok so this is actually a per pci device property!
> > I would suggest moving this property to the pci device itself, not doing
> > it inside an edac directory.
> >
> > by doing that you get the advantage that 1) it's a more logical place,
> > and 2) users can do it even for 1 of 2 cards, if they have 2 cards of
> > the same make and 3) you can use the generic kernel quirk interface for
> > this and 4) drivers can in principle control this for their hardware in
> > complex cases
> >
> > I understand that on a PC, EDAC is the only user. but ppc has a similar
> > mechanism I suspect, and they more than likely would be able to share
> > this property....
>
> I'd be curious to hear people's opinions on the following idea:
> move the PCI bus parity error checking functionality from EDAC
> to the PCI subsystem.
I can see the point on at least moving all the infrastructure there.
The actual call to run it... maybe. that's more debatable I suppose.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-10 17:58 ` Arjan van de Ven
@ 2006-03-10 19:07 ` Dave Peterson
2006-03-10 19:33 ` Arjan van de Ven
0 siblings, 1 reply; 56+ messages in thread
From: Dave Peterson @ 2006-03-10 19:07 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Greg KH, Randy.Dunlap, linux-kernel, torvalds, alan, gregkh,
Doug Thompson, bluesmoke-devel
On Friday 10 March 2006 09:58, Arjan van de Ven wrote:
> > I'd be curious to hear people's opinions on the following idea:
> > move the PCI bus parity error checking functionality from EDAC
> > to the PCI subsystem.
>
> I can see the point on at least moving all the infrastructure there.
> The actual call to run it... maybe. that's more debatable I suppose.
Regarding the actual call to run it, I guess it depends on which of
the following you prefer:
Scenario A
----------
A more decentralized layout. Here, the controls that govern the
error handling behavior for a given category of hardware (a
category might be "PCI devices" or "devices that use bus
technology XYZ") are grouped together with other stuff for that
category.
Scenario B
----------
A more centralized layout. Here, the controls that govern a wide
variety of error handling behaviors are all grouped together (for
instance, within EDAC).
I tend to prefer scenario A. The conceptual model I currently have in
my mind for EDAC is as follows:
- Each low-level EDAC module (amd76x, e7xxx, etc.) implements
hardware error handling functionality that is specific to just
that platform.
- The purpose of the core EDAC module is to serve as a home for
abstractions that are common to the various low-level EDAC
modules. This creates uniformity of mechanism, and also uniform
behavior from the user's point of view. It also encourages
avoidance of replicated code. However, the core EDAC module
(and by extension, the low-level EDAC modules) should not
contain stuff that is generic enough to fit elsewhere (for
instance, in the PCI subsystem). In other words, EDAC serves as
a catch-all for error handling functionality that is too
platform-specific to fit anywhere else.
However, these are just my personal thoughts. I'd be curious to hear
other ideas people may have.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-10 19:07 ` Dave Peterson
@ 2006-03-10 19:33 ` Arjan van de Ven
2006-03-10 21:13 ` Dave Peterson
0 siblings, 1 reply; 56+ messages in thread
From: Arjan van de Ven @ 2006-03-10 19:33 UTC (permalink / raw)
To: Dave Peterson
Cc: Greg KH, Randy.Dunlap, linux-kernel, torvalds, alan, gregkh,
Doug Thompson, bluesmoke-devel
On Fri, 2006-03-10 at 11:07 -0800, Dave Peterson wrote:
> On Friday 10 March 2006 09:58, Arjan van de Ven wrote:
> > > I'd be curious to hear people's opinions on the following idea:
> > > move the PCI bus parity error checking functionality from EDAC
> > > to the PCI subsystem.
> >
> > I can see the point on at least moving all the infrastructure there.
> > The actual call to run it... maybe. that's more debatable I suppose.
>
> Regarding the actual call to run it, I guess it depends on which of
> the following you prefer:
>
> Scenario A
> ----------
> A more decentralized layout. Here, the controls that govern the
> error handling behavior for a given category of hardware (a
> category might be "PCI devices" or "devices that use bus
> technology XYZ") are grouped together with other stuff for that
> category.
this would basically make edac a place to report "help something went to
the gutter". Sure. I see that useful.
In fact there are 3 layers then
1) low level "do check" function
2) per bus code that calls the do check functions and whatever is needed
for bus checks
3) "EDAC" central command, which basically gathers all failure reports
and does something with them (push them to userspace or implement the
userspace chosen policy (panic/reboot/etc))
having 1) separate from 2) is useful, it means that drivers can do
synchronous checks in addition to the checks in 2) which will tend to be
asynchronous....
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
@ 2006-03-10 20:37 Doug Thompson
0 siblings, 0 replies; 56+ messages in thread
From: Doug Thompson @ 2006-03-10 20:37 UTC (permalink / raw)
To: arjan
Cc: greg, gregkh, bluesmoke-devel, dsp, Doug Thompson, torvalds, alan,
linux-kernel, rdunlap
On Fri, 2006-03-10 at 19:33 +0000, Arjan van de Ven wrote:
> On Fri, 2006-03-10 at 11:07 -0800, Dave Peterson wrote:
> > On Friday 10 March 2006 09:58, Arjan van de Ven wrote:
> > > > I'd be curious to hear people's opinions on the following idea:
> > > > move the PCI bus parity error checking functionality from EDAC
> > > > to the PCI subsystem.
> > >
> > > I can see the point on at least moving all the infrastructure there.
> > > The actual call to run it... maybe. that's more debatable I suppose.
> >
> > Regarding the actual call to run it, I guess it depends on which of
> > the following you prefer:
> >
> > Scenario A
> > ----------
> > A more decentralized layout. Here, the controls that govern the
> > error handling behavior for a given category of hardware (a
> > category might be "PCI devices" or "devices that use bus
> > technology XYZ") are grouped together with other stuff for that
> > category.
>
I guess it might be useful to express what the original purpose of EDAC
we (Thayne, Eric, etc) were working toward. Arjan you express it
correctly.
We aimed to develop a central place to report all (or almost all)
hardware errors: starting with ECC errors, then added PCI parity errors.
And to do this in a consistent manner so that harvesting and reaping
that error info can become easier and more useful. (EDAC has a common
printk format as well).
Currently the ownership of that feature is the edac_mc module. It is the
one which creates the /sys/devices/system/edac directory and in turn the
'mc' and 'pci' directories.
When we move the PCI Parity checking code to the PCI subsystem it would
be good to maintain that directory layout.
One way to do that is to make the edac_mc CORE a part of the kernel and
no longer a module. (This would also help on the kobject reference
counter issue). Then the PCI Parity subsystem code would register
itself after the EDAC CORE code.
doug t
> this would basically make edac a place to report "help something went to
> the gutter". Sure. I see that useful.
>
> In fact there are 3 layers then
>
> 1) low level "do check" function
>
> 2) per bus code that calls the do check functions and whatever is needed
> for bus checks
>
> 3) "EDAC" central command, which basically gathers all failure reports
> and does something with them (push them to userspace or implement the
> userspace chosen policy (panic/reboot/etc))
>
>
> having 1) separate from 2) is useful, it means that drivers can do
> synchronous checks in addition to the checks in 2) which will tend to be
> asynchronous....
>
>
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-10 19:33 ` Arjan van de Ven
@ 2006-03-10 21:13 ` Dave Peterson
2006-03-10 21:23 ` Arjan van de Ven
0 siblings, 1 reply; 56+ messages in thread
From: Dave Peterson @ 2006-03-10 21:13 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Greg KH, Randy.Dunlap, linux-kernel, torvalds, alan, gregkh,
Doug Thompson, bluesmoke-devel
On Friday 10 March 2006 11:33, Arjan van de Ven wrote:
> > Regarding the actual call to run it, I guess it depends on which of
> > the following you prefer:
> >
> > Scenario A
> > ----------
> > A more decentralized layout. Here, the controls that govern the
> > error handling behavior for a given category of hardware (a
> > category might be "PCI devices" or "devices that use bus
> > technology XYZ") are grouped together with other stuff for that
> > category.
>
> this would basically make edac a place to report "help something went to
> the gutter". Sure. I see that useful.
>
> In fact there are 3 layers then
>
> 1) low level "do check" function
I'm a bit unclear here. Are you thinking of a single "do check"
function that bus-specific code and chipset-specific code can hook
into, or individual "do check" functions for various bus-specific and
chipset-specific modules?
> 2) per bus code that calls the do check functions and whatever is needed
> for bus checks
>
> 3) "EDAC" central command, which basically gathers all failure reports
> and does something with them (push them to userspace or implement the
> userspace chosen policy (panic/reboot/etc))
Are you suggesting something like the following?
- The controls that determine how the error checking is done are
located within the various hardware subsystems. For instance,
with PCI parity checking, this would include stuff like setting
the polling frequency and determining which devices to check.
- When an error is actually detected, the subsystem that detected
the error (for instance, PCI) would feed the error information
to EDAC. Then EDAC would determine how to respond to the error
(for instance, push it to userspace or implement the
userspace-chosen policy (panic/reboot/etc))
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-10 21:13 ` Dave Peterson
@ 2006-03-10 21:23 ` Arjan van de Ven
2006-03-11 1:57 ` Dave Peterson
0 siblings, 1 reply; 56+ messages in thread
From: Arjan van de Ven @ 2006-03-10 21:23 UTC (permalink / raw)
To: Dave Peterson
Cc: Greg KH, Randy.Dunlap, linux-kernel, torvalds, alan, gregkh,
Doug Thompson, bluesmoke-devel
On Fri, 2006-03-10 at 13:13 -0800, Dave Peterson wrote:
> On Friday 10 March 2006 11:33, Arjan van de Ven wrote:
> > > Regarding the actual call to run it, I guess it depends on which of
> > > the following you prefer:
> > >
> > > Scenario A
> > > ----------
> > > A more decentralized layout. Here, the controls that govern the
> > > error handling behavior for a given category of hardware (a
> > > category might be "PCI devices" or "devices that use bus
> > > technology XYZ") are grouped together with other stuff for that
> > > category.
> >
> > this would basically make edac a place to report "help something went to
> > the gutter". Sure. I see that useful.
> >
> > In fact there are 3 layers then
> >
> > 1) low level "do check" function
>
> I'm a bit unclear here. Are you thinking of a single "do check"
> function that bus-specific code and chipset-specific code can hook
> into, or individual "do check" functions for various bus-specific and
> chipset-specific modules?
hmm ok so I want a function that takes a device as parameter, and checks
the state of that device for errors. Internally that probably has to go
via a function pointer somewhere to a device specific check method.
Or maybe a per test-type (pci parity / ECC / etc) check
int pci_check_parity_errors(struct pci_dev *dev, int flags);
something like that, or pci_check_and_clear_parity_errors()
(although that gets too long :)
drivers can call that, say, after firmware init or something to validate
their device is sanely connected. Maybe pci_enable_device() could call
it too.
This also needs a pci_suspend_parity_check() ... _resume_ ... so that
the driver can temporarily disable any checks, for example during device
reset/init. And then just before resume, it manually clears a check.
>
> > 2) per bus code that calls the do check functions and whatever is needed
> > for bus checks
> >
> > 3) "EDAC" central command, which basically gathers all failure reports
> > and does something with them (push them to userspace or implement the
> > userspace chosen policy (panic/reboot/etc))
>
> Are you suggesting something like the following?
>
> - The controls that determine how the error checking is done are
> located within the various hardware subsystems. For instance,
> with PCI parity checking, this would include stuff like setting
> the polling frequency and determining which devices to check.
yes. I would NOT make it overly tunable btw. For many things a sane
default can be chosen, and the effect of picking a different tuning
isn't all that big. Maybe think of it this way: a tuneable to a large
degree is an excuse for not determining the right value / heuristic in
the first place.
> - When an error is actually detected, the subsystem that detected
> the error (for instance, PCI) would feed the error information
> to EDAC. Then EDAC would determine how to respond to the error
> (for instance, push it to userspace or implement the
> userspace-chosen policy (panic/reboot/etc))
yup.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-10 21:23 ` Arjan van de Ven
@ 2006-03-11 1:57 ` Dave Peterson
2006-03-11 7:18 ` Arjan van de Ven
0 siblings, 1 reply; 56+ messages in thread
From: Dave Peterson @ 2006-03-11 1:57 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Greg KH, Randy.Dunlap, linux-kernel, torvalds, alan, gregkh,
Doug Thompson, bluesmoke-devel
On Friday 10 March 2006 13:23, Arjan van de Ven wrote:
> hmm ok so I want a function that takes a device as parameter, and checks
> the state of that device for errors. Internally that probably has to go
> via a function pointer somewhere to a device specific check method.
>
> Or maybe a per test-type (pci parity / ECC / etc) check
>
> int pci_check_parity_errors(struct pci_dev *dev, int flags);
>
> something like that, or pci_check_and_clear_parity_errors()
> (although that gets too long :)
>
> drivers can call that, say, after firmware init or something to validate
> their device is sanely connected. Maybe pci_enable_device() could call
> it too.
>
> This also needs a pci_suspend_parity_check() ... _resume_ ... so that
> the driver can temporarily disable any checks, for example during device
> reset/init. And then just before resume, it manually clears a check.
ok, perhaps things might look something like this?
- A check_error() function checks a device for errors. As you
mention above, this may be a device-specific check method or a
function like pci_check_parity_errors(dev, flags). In either
case, if check_error() finds an error, it clears the error from
the device's registers and returns the error. If check_error()
finds multiple errors, here are couple of options:
- Return a list of all errors detected.
- Return a single error along with a boolean value that says
whether more errors are present. If so, check_error() may
be called repeatedly until no errors remain.
- A handle_error() function handles errors returned by
check_error(). Here are a few options: Each device may have a
handle_error() method that takes an error as a parameter. Or,
each type of error may have its own handle_me() method. A third
option is something like pci_handle_parity_error(dev, error).
In any case, depending on the error type, the handler may choose
to feed the error to EDAC.
- Each hardware subsystem or device driver may determine its own
error checking/handling strategy. Some may want to poll for
errors. Others may want error detection to be asynchronous
(driven by interrupts or exceptions). Or a subsystem may poll
for some kinds of errors and detect others asynchronously. As
you suggest, errors may also be checked for in other situations,
such as after firmware init.
For polling, the poll function calls check_error(), and then
calls handle_error() if an error is found. For interrupt-driven
error checking, the interrupt handler calls check_error(). If
an error is found, the interrupt handler may either call
handle_error() directly or defer invocation of handle_error()
outside interrupt context. If the interrupt is an NMI, deferred
invocation of handle_error() allows it to execute without
introducing deadlocks or race conditions.
- For some types of hardware, at boot time the device's registers
contain spurious error info, which should be discarded. This
may be done by calling check_error() and discarding the results.
> > > 2) per bus code that calls the do check functions and whatever is
> > > needed for bus checks
> > >
> > > 3) "EDAC" central command, which basically gathers all failure reports
> > > and does something with them (push them to userspace or implement the
> > > userspace chosen policy (panic/reboot/etc))
> >
> > Are you suggesting something like the following?
> >
> > - The controls that determine how the error checking is done are
> > located within the various hardware subsystems. For instance,
> > with PCI parity checking, this would include stuff like setting
> > the polling frequency and determining which devices to check.
>
> yes. I would NOT make it overly tunable btw. For many things a sane
> default can be chosen, and the effect of picking a different tuning
> isn't all that big. Maybe think of it this way: a tuneable to a large
> degree is an excuse for not determining the right value / heuristic in
> the first place.
Sounds good.
> > - When an error is actually detected, the subsystem that detected
> > the error (for instance, PCI) would feed the error information
> > to EDAC. Then EDAC would determine how to respond to the error
> > (for instance, push it to userspace or implement the
> > userspace-chosen policy (panic/reboot/etc))
>
> yup.
Cool! I think this also coincides with what Doug is saying. Doug, how
does this sound?
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-11 1:57 ` Dave Peterson
@ 2006-03-11 7:18 ` Arjan van de Ven
2006-03-13 19:31 ` Dave Peterson
0 siblings, 1 reply; 56+ messages in thread
From: Arjan van de Ven @ 2006-03-11 7:18 UTC (permalink / raw)
To: Dave Peterson
Cc: Greg KH, Randy.Dunlap, linux-kernel, torvalds, alan, gregkh,
Doug Thompson, bluesmoke-devel
On Fri, 2006-03-10 at 17:57 -0800, Dave Peterson wrote:
> On Friday 10 March 2006 13:23, Arjan van de Ven wrote:
> > hmm ok so I want a function that takes a device as parameter, and checks
> > the state of that device for errors. Internally that probably has to go
> > via a function pointer somewhere to a device specific check method.
> >
> > Or maybe a per test-type (pci parity / ECC / etc) check
> >
> > int pci_check_parity_errors(struct pci_dev *dev, int flags);
> >
> > something like that, or pci_check_and_clear_parity_errors()
> > (although that gets too long :)
> >
> > drivers can call that, say, after firmware init or something to validate
> > their device is sanely connected. Maybe pci_enable_device() could call
> > it too.
> >
> > This also needs a pci_suspend_parity_check() ... _resume_ ... so that
> > the driver can temporarily disable any checks, for example during device
> > reset/init. And then just before resume, it manually clears a check.
>
> ok, perhaps things might look something like this?
<snip>
sounds like overdesign to me ;)
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
@ 2006-03-11 17:04 Doug Thompson
2006-03-13 19:35 ` Dave Peterson
0 siblings, 1 reply; 56+ messages in thread
From: Doug Thompson @ 2006-03-11 17:04 UTC (permalink / raw)
To: dsp
Cc: arjan, greg, gregkh, bluesmoke-devel, Doug Thompson, torvalds,
alan, linux-kernel, rdunlap
On Sat, 2006-03-11 at 01:57 +0000, Dave Peterson wrote:
> On Friday 10 March 2006 13:23, Arjan van de Ven wrote:
> > hmm ok so I want a function that takes a device as parameter, and checks
> > the state of that device for errors. Internally that probably has to go
> > via a function pointer somewhere to a device specific check method.
> >
> > Or maybe a per test-type (pci parity / ECC / etc) check
> >
> > int pci_check_parity_errors(struct pci_dev *dev, int flags);
> >
> > something like that, or pci_check_and_clear_parity_errors()
> > (although that gets too long :)
> >
> > drivers can call that, say, after firmware init or something to validate
> > their device is sanely connected. Maybe pci_enable_device() could call
> > it too.
> >
> > This also needs a pci_suspend_parity_check() ... _resume_ ... so that
> > the driver can temporarily disable any checks, for example during device
> > reset/init. And then just before resume, it manually clears a check.
>
> ok, perhaps things might look something like this?
>
> - A check_error() function checks a device for errors. As you
> mention above, this may be a device-specific check method or a
> function like pci_check_parity_errors(dev, flags). In either
> case, if check_error() finds an error, it clears the error from
> the device's registers and returns the error. If check_error()
> finds multiple errors, here are couple of options:
>
> - Return a list of all errors detected.
> - Return a single error along with a boolean value that says
> whether more errors are present. If so, check_error() may
> be called repeatedly until no errors remain.
>
> - A handle_error() function handles errors returned by
> check_error(). Here are a few options: Each device may have a
> handle_error() method that takes an error as a parameter. Or,
> each type of error may have its own handle_me() method. A third
> option is something like pci_handle_parity_error(dev, error).
> In any case, depending on the error type, the handler may choose
> to feed the error to EDAC.
>
> - Each hardware subsystem or device driver may determine its own
> error checking/handling strategy. Some may want to poll for
> errors. Others may want error detection to be asynchronous
> (driven by interrupts or exceptions). Or a subsystem may poll
> for some kinds of errors and detect others asynchronously. As
> you suggest, errors may also be checked for in other situations,
> such as after firmware init.
>
> For polling, the poll function calls check_error(), and then
> calls handle_error() if an error is found. For interrupt-driven
> error checking, the interrupt handler calls check_error(). If
> an error is found, the interrupt handler may either call
> handle_error() directly or defer invocation of handle_error()
> outside interrupt context. If the interrupt is an NMI, deferred
> invocation of handle_error() allows it to execute without
> introducing deadlocks or race conditions.
>
> - For some types of hardware, at boot time the device's registers
> contain spurious error info, which should be discarded. This
> may be done by calling check_error() and discarding the results.
>
> > > > 2) per bus code that calls the do check functions and whatever is
> > > > needed for bus checks
> > > >
> > > > 3) "EDAC" central command, which basically gathers all failure reports
> > > > and does something with them (push them to userspace or implement the
> > > > userspace chosen policy (panic/reboot/etc))
> > >
> > > Are you suggesting something like the following?
> > >
> > > - The controls that determine how the error checking is done are
> > > located within the various hardware subsystems. For instance,
> > > with PCI parity checking, this would include stuff like setting
> > > the polling frequency and determining which devices to check.
> >
> > yes. I would NOT make it overly tunable btw. For many things a sane
> > default can be chosen, and the effect of picking a different tuning
> > isn't all that big. Maybe think of it this way: a tuneable to a large
> > degree is an excuse for not determining the right value / heuristic in
> > the first place.
>
> Sounds good.
>
> > > - When an error is actually detected, the subsystem that detected
> > > the error (for instance, PCI) would feed the error information
> > > to EDAC. Then EDAC would determine how to respond to the error
> > > (for instance, push it to userspace or implement the
> > > userspace-chosen policy (panic/reboot/etc))
> >
> > yup.
>
> Cool! I think this also coincides with what Doug is saying. Doug, how
> does this sound?
It sounds good. One issue is how this works with the IBM PCI Parity
handling submission? I don't remember if it has been included yet or
not. I haven't fully studied their model, but it allowed for device
drivers to register notification functions. The PCI subsystem would then
notify the driver of such errors so the driver could do what ever it
needed to do in the bad-thing-happened event.
What we are trying to accomplish is a 'detection, harvesting and
logging' mechanism of these events so that these events can be available
for userland harvesting and possibly can do an alarm of some sort.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-11 7:18 ` Arjan van de Ven
@ 2006-03-13 19:31 ` Dave Peterson
0 siblings, 0 replies; 56+ messages in thread
From: Dave Peterson @ 2006-03-13 19:31 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Greg KH, Randy.Dunlap, linux-kernel, torvalds, alan, gregkh,
Doug Thompson, bluesmoke-devel
On Friday 10 March 2006 23:18, Arjan van de Ven wrote:
> > ok, perhaps things might look something like this?
>
> <snip>
>
> sounds like overdesign to me ;)
Hmm... ok :) I don't want to suggest that Linux's hardware error
handling mechanism should be "all of the above". Rather my intent is
this:
- To summarize some previously mentioned options for error
handling APIs, and mention a couple more ideas. More discussion
is needed to decide exactly what we want.
- To suggest that drivers and hardware subsystems may want some
flexibility in how they detect and process errors. (However,
there is also such a thing as too much flexibility; EDAC should
create a certain amount of uniformity)
- To state a couple of motivations for keeping error checking
(i.e. reading error info from registers and clearing the
registers) logically separate from error handling (i.e. doing
something with the info obtained from the registers)
Another thing to think about: If EDAC is going to collect error info
from subsystems and drivers, it needs some sort of API that supports
this. Although I don't think this requires anything fancy, it still
needs some discussion.
I'd be curious to read more discussion/ideas about error handling APIs
(and see some patches that illustrate the ideas).
Dave
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] EDAC: core EDAC support code
2006-03-11 17:04 Doug Thompson
@ 2006-03-13 19:35 ` Dave Peterson
0 siblings, 0 replies; 56+ messages in thread
From: Dave Peterson @ 2006-03-13 19:35 UTC (permalink / raw)
To: Doug Thompson
Cc: arjan, greg, gregkh, bluesmoke-devel, Doug Thompson, torvalds,
alan, linux-kernel, rdunlap
On Saturday 11 March 2006 09:04, Doug Thompson wrote:
> > > > - When an error is actually detected, the subsystem that detected
> > > > the error (for instance, PCI) would feed the error information
> > > > to EDAC. Then EDAC would determine how to respond to the error
> > > > (for instance, push it to userspace or implement the
> > > > userspace-chosen policy (panic/reboot/etc))
> > >
> > > yup.
> >
> > Cool! I think this also coincides with what Doug is saying. Doug, how
> > does this sound?
>
> It sounds good. One issue is how this works with the IBM PCI Parity
> handling submission? I don't remember if it has been included yet or
> not. I haven't fully studied their model, but it allowed for device
> drivers to register notification functions. The PCI subsystem would then
> notify the driver of such errors so the driver could do what ever it
> needed to do in the bad-thing-happened event.
Hmm... interesting. Can you provide any links to info on this?
^ permalink raw reply [flat|nested] 56+ messages in thread
end of thread, other threads:[~2006-03-13 19:36 UTC | newest]
Thread overview: 56+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200601190414.k0J4EZCV021775@hera.kernel.org>
2006-03-05 10:18 ` [PATCH] EDAC: core EDAC support code Arjan van de Ven
2006-03-05 10:30 ` Arjan van de Ven
2006-03-06 18:14 ` Dave Peterson
2006-03-06 18:22 ` Randy.Dunlap
2006-03-07 17:03 ` Dave Peterson
2006-03-07 17:20 ` Greg KH
2006-03-08 0:29 ` Dave Peterson
2006-03-07 19:03 ` Arjan van de Ven
2006-03-07 19:05 ` Greg KH
2006-03-09 23:51 ` Dave Peterson
2006-03-10 0:02 ` Greg KH
2006-03-10 1:46 ` Dave Peterson
2006-03-10 7:36 ` Arjan van de Ven
2006-03-10 11:06 ` Tim Small
2006-03-10 11:40 ` Arjan van de Ven
2006-03-10 17:46 ` Dave Peterson
2006-03-10 17:58 ` Arjan van de Ven
2006-03-10 19:07 ` Dave Peterson
2006-03-10 19:33 ` Arjan van de Ven
2006-03-10 21:13 ` Dave Peterson
2006-03-10 21:23 ` Arjan van de Ven
2006-03-11 1:57 ` Dave Peterson
2006-03-11 7:18 ` Arjan van de Ven
2006-03-13 19:31 ` Dave Peterson
2006-03-06 19:52 ` Greg KH
2006-03-05 15:55 ` Greg KH
2006-03-05 16:24 ` Arjan van de Ven
2006-03-06 18:52 ` Dave Peterson
2006-03-06 19:53 ` Greg KH
2006-03-06 21:01 ` Dave Peterson
2006-03-06 21:07 ` Arjan van de Ven
2006-03-09 3:19 ` Dave Peterson
2006-03-09 3:44 ` Al Viro
2006-03-09 5:51 ` Greg KH
2006-03-06 21:32 ` Al Viro
2006-03-06 21:53 ` Greg KH
2006-03-06 22:24 ` Al Viro
2006-03-06 22:55 ` Greg KH
2006-03-07 10:41 ` Andrew Morton
2006-03-07 11:08 ` Al Viro
2006-03-08 2:46 ` Rusty Russell
2006-03-07 1:45 ` Dmitry Torokhov
2006-03-07 1:57 ` Greg KH
2006-03-07 2:10 ` Dmitry Torokhov
2006-03-07 16:47 ` Dave Peterson
2006-03-07 17:04 ` Greg KH
2006-03-07 17:06 ` Dave Peterson
2006-03-08 1:03 ` Dmitry Torokhov
2006-03-08 1:33 ` Greg KH
2006-03-10 0:44 Doug Thompson
-- strict thread matches above, loose matches on Subject: below --
2006-03-10 16:56 Doug Thompson
2006-03-10 17:11 ` Arjan van de Ven
2006-03-10 17:28 Doug Thompson
2006-03-10 20:37 Doug Thompson
2006-03-11 17:04 Doug Thompson
2006-03-13 19:35 ` Dave Peterson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox