* [lm-sensors] 18-rc1-mm1 and unchecked return-codes
2006-07-11 18:33 [lm-sensors] 18-rc1-mm1 and unchecked return-codes Jim Cromie
@ 2006-07-12 13:02 ` Mark M. Hoffman
2006-07-27 22:07 ` Jim Cromie
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Mark M. Hoffman @ 2006-07-12 13:02 UTC (permalink / raw)
To: lm-sensors
Hi Jim:
* Jim Cromie <jim.cromie at gmail.com> [2006-07-11 12:33:33 -0600]:
>
> from the 18-rc1-mm1 announcement:
>
> - We're getting a relatively large number of crash reports coming out of the
> core sysfs/kobject/driver/bus code, and they're all really hard to diagnose.
>
> I am suspecting that what's happening is that some registration functions
> are failing and the caller is ignoring that failure. The code proceeds and
> crashes much later, in obscure ways.
>
> All these functions return error codes, and we're not checking them. We
> should. So there's a patch which marks all these things as __must_check,
> which causes around 1,500 new warnings.
>
> These are all bugs and they all need to be fixed.
>
>
>
> Many of these 1500 are from lm-sensors modules;
> for example - a single function - device_create_file(),
> is called 1027 times, and 'every' one is a void context return.
>
> grep device_create_file drivers/hwmon/*.c
>
> So, what kinds of errors are possible here ?
> and what should we do if they occur ?
> does it ever happen that only 1 create fails ?
> do/would we care ? (not currently)
>
> These Qs are somewhat rhetorical, theyre at least a heads-up.
I can't resist answering rhetorical questions. ;)
At least in the context of what Andrew wrote, the calls to device_create_file()
in the hwmon drivers are not the problem. AFAICT, if one or more sysfs files
fail to be created, the worst that might happen is that *maybe* sensors(1) could
crash. Offhand, I can't see how missing sysfs files could lead to a kernel crash.
At any rate, yes, they're still bugs.
OTOH, there is a real need to review some of the error checking in the I2C
core - where ignoring status can and does cause kernel panics. See some
recent patches by both Jean and me for examples... and there are more.
> FWIW, device_remove_file() is only called 9 times.
> This doesnt seem to be a problem though, or we'd have heard it by now.
> Specifically - hwmon/pc87360 appears to clean up after itself,
> (or rather sysfs core code does), without any calls to device_remove_file()
>
> soekris:/sys/bus/i2c/devices# ls
> 9191-6620@
> soekris:/sys/bus/i2c/devices# rmmod pc87360
> soekris:/sys/bus/i2c/devices# ls
> soekris:/sys/bus/i2c/devices#
>
>
> Ive gone ahead and worked up a patch against pc87360 to
> count-errors-and-warn, which should suffice, at least for short term.
(Hopefully) I'll take a look at this later today.
Regards,
--
Mark M. Hoffman
mhoffman at lightlink.com
^ permalink raw reply [flat|nested] 7+ messages in thread* [lm-sensors] 18-rc1-mm1 and unchecked return-codes
2006-07-11 18:33 [lm-sensors] 18-rc1-mm1 and unchecked return-codes Jim Cromie
2006-07-12 13:02 ` Mark M. Hoffman
@ 2006-07-27 22:07 ` Jim Cromie
2006-07-29 2:35 ` Mark M. Hoffman
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Jim Cromie @ 2006-07-27 22:07 UTC (permalink / raw)
To: lm-sensors
Mark M. Hoffman wrote:
> Hi Jim:
>
Hi Mark, thanks for taking an interest.
> * Jim Cromie <jim.cromie at gmail.com> [2006-07-11 12:33:33 -0600]:
>
>> from the 18-rc1-mm1 announcement:
>>
>> - We're getting a relatively large number of crash reports coming out of the
>> core sysfs/kobject/driver/bus code, and they're all really hard to diagnose.
>>
>> I am suspecting that what's happening is that some registration functions
>> are failing and the caller is ignoring that failure. The code proceeds and
>> crashes much later, in obscure ways.
>>
>> All these functions return error codes, and we're not checking them. We
>> should. So there's a patch which marks all these things as __must_check,
>> which causes around 1,500 new warnings.
>>
>> These are all bugs and they all need to be fixed.
>>
>>
>>
>> Many of these 1500 are from lm-sensors modules;
>> for example - a single function - device_create_file(),
>> is called 1027 times, and 'every' one is a void context return.
>>
>> grep device_create_file drivers/hwmon/*.c
>>
>> So, what kinds of errors are possible here ?
>> and what should we do if they occur ?
>> does it ever happen that only 1 create fails ?
>> do/would we care ? (not currently)
>>
>> These Qs are somewhat rhetorical, theyre at least a heads-up.
>>
>
> I can't resist answering rhetorical questions. ;)
>
> At least in the context of what Andrew wrote, the calls to device_create_file()
> in the hwmon drivers are not the problem. AFAICT, if one or more sysfs files
> fail to be created, the worst that might happen is that *maybe* sensors(1) could
> crash. Offhand, I can't see how missing sysfs files could lead to a kernel crash.
>
> At any rate, yes, they're still bugs.
>
> OTOH, there is a real need to review some of the error checking in the I2C
> core - where ignoring status can and does cause kernel panics. See some
> recent patches by both Jean and me for examples... and there are more.
>
>
>> FWIW, device_remove_file() is only called 9 times.
>>
8 from hwmon/ams.c
1 from hwmon/lm70.c
This compares rather sparsely against:
[jimc at harpo linux-2.6.18-rc2-mm1-sk]$ grep device_create_file
drivers/hwmon/*.c |wc
1027 3046 83340
Is it truly this optional ?
>> This doesnt seem to be a problem though, or we'd have heard it by now.
>> Specifically - hwmon/pc87360 appears to clean up after itself,
>> (or rather sysfs core code does), without any calls to device_remove_file()
>>
>> soekris:/sys/bus/i2c/devices# ls
>> 9191-6620@
>> soekris:/sys/bus/i2c/devices# rmmod pc87360
>> soekris:/sys/bus/i2c/devices# ls
>> soekris:/sys/bus/i2c/devices#
>>
>>
>> Ive gone ahead and worked up a patch against pc87360 to
>> count-errors-and-warn, which should suffice, at least for short term.
>>
>
> (Hopefully) I'll take a look at this later today.
>
>
dashed-hopes ;-)
If you'd rather punt, I'll send it on to lkml, but I suppose Id like to
keep it in-channel.
> Regards,
>
>
thanks
-jimc
^ permalink raw reply [flat|nested] 7+ messages in thread* [lm-sensors] 18-rc1-mm1 and unchecked return-codes
2006-07-11 18:33 [lm-sensors] 18-rc1-mm1 and unchecked return-codes Jim Cromie
2006-07-12 13:02 ` Mark M. Hoffman
2006-07-27 22:07 ` Jim Cromie
@ 2006-07-29 2:35 ` Mark M. Hoffman
2006-07-29 15:09 ` Jim Cromie
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Mark M. Hoffman @ 2006-07-29 2:35 UTC (permalink / raw)
To: lm-sensors
Hi Jim:
* Jim Cromie <jim.cromie at gmail.com> [2006-07-27 16:07:05 -0600]:
> >> FWIW, device_remove_file() is only called 9 times.
> >>
> 8 from hwmon/ams.c
> 1 from hwmon/lm70.c
>
> This compares rather sparsely against:
> [jimc at harpo linux-2.6.18-rc2-mm1-sk]$ grep device_create_file
> drivers/hwmon/*.c |wc
> 1027 3046 83340
>
> Is it truly this optional ?
At present, yes. When the device goes away, the sysfs files go away too.
This will become a problem though when we (eventually) move to persistent
(i2c) devices, so we might as well fix it up now.
> >> Ive gone ahead and worked up a patch against pc87360 to
> >> count-errors-and-warn, which should suffice, at least for short term.
> >>
> >
> > (Hopefully) I'll take a look at this later today.
> >
> >
> dashed-hopes ;-)
> If you'd rather punt, I'll send it on to lkml, but I suppose Id like to
> keep it in-channel.
Someday I will learn to keep my fantasies about free time to myself. But
you didn't actually send your patch to the list right (?) At least I can't
find it now that I'm looking again.
Anyway... I guess we should stuff all those attributes in a table and run
over them with a for loop. Then if we catch an error, we can run back
through the table in reverse. I mean, what else can we do right? I'll
patch the following to start, all of which I can either test easily or
for which I'm responsible anyway...
asb100, lm75, lm78, smsc47b397, w83627hf
Any volunteers for the other 38?
Regards,
--
Mark M. Hoffman
mhoffman at lightlink.com
^ permalink raw reply [flat|nested] 7+ messages in thread* [lm-sensors] 18-rc1-mm1 and unchecked return-codes
2006-07-11 18:33 [lm-sensors] 18-rc1-mm1 and unchecked return-codes Jim Cromie
` (2 preceding siblings ...)
2006-07-29 2:35 ` Mark M. Hoffman
@ 2006-07-29 15:09 ` Jim Cromie
2006-07-29 15:26 ` Mark M. Hoffman
2006-08-02 17:08 ` Jean Delvare
5 siblings, 0 replies; 7+ messages in thread
From: Jim Cromie @ 2006-07-29 15:09 UTC (permalink / raw)
To: lm-sensors
Mark M. Hoffman wrote:
> Hi Jim:
>
> * Jim Cromie <jim.cromie at gmail.com> [2006-07-27 16:07:05 -0600]:
>
>>>> FWIW, device_remove_file() is only called 9 times.
>>>>
>>>>
>> 8 from hwmon/ams.c
>> 1 from hwmon/lm70.c
>>
>> This compares rather sparsely against:
>> [jimc at harpo linux-2.6.18-rc2-mm1-sk]$ grep device_create_file
>> drivers/hwmon/*.c |wc
>> 1027 3046 83340
>>
>> Is it truly this optional ?
>>
>
> At present, yes. When the device goes away, the sysfs files go away too.
> This will become a problem though when we (eventually) move to persistent
> (i2c) devices, so we might as well fix it up now.
>
>
>>>> Ive gone ahead and worked up a patch against pc87360 to
>>>> count-errors-and-warn, which should suffice, at least for short term.
>>>>
>>>>
>>> (Hopefully) I'll take a look at this later today.
>>>
>>>
>>>
>> dashed-hopes ;-)
>> If you'd rather punt, I'll send it on to lkml, but I suppose Id like to
>> keep it in-channel.
>>
>
> Someday I will learn to keep my fantasies about free time to myself.
;-) - I now swallow the words "Two Weeks!", and try to answer with a
question instead :-)
> But
> you didn't actually send your patch to the list right (?) At least I can't
> find it now that I'm looking again.
>
>
For the patch itself, I used a subject-line I thought more suitable for
submission.
However, it lost connection to this subject - doh.
http://lists.lm-sensors.org/pipermail/lm-sensors/2006-July/016909.html
> Anyway... I guess we should stuff all those attributes in a table and run
> over them with a for loop. Then if we catch an error, we can run back
> through the table in reverse. I mean, what else can we do right?
Im not sure what you mean -
- we dont *need* device-remove-file, since they 'go away' by themselves
(at least for normal rmmods)
and work properly when re-modprobed
- changing strategy from completely-unchecked to
undo-everything-and-bailout
is a rather long step, and makes driver possibly unusable in some
corner cases
(none of which we know and can repeat)
- my approach of just counting and warning is 'more legacy'
(but warnings may be ignored, whereas a 'broken-driver' will elicit
more feedback)
IOW - theres a balance to strike here, and Im not sure where (too-clever
see-saw analogy elided)
Anyway, your forthcoming patches will clarify your sense of the right
balance..
> I'll
> patch the following to start, all of which I can either test easily or
> for which I'm responsible anyway...
>
> asb100, lm75, lm78, smsc47b397, w83627hf
>
> Any volunteers for the other 38?
>
> Regards,
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread* [lm-sensors] 18-rc1-mm1 and unchecked return-codes
2006-07-11 18:33 [lm-sensors] 18-rc1-mm1 and unchecked return-codes Jim Cromie
` (3 preceding siblings ...)
2006-07-29 15:09 ` Jim Cromie
@ 2006-07-29 15:26 ` Mark M. Hoffman
2006-08-02 17:08 ` Jean Delvare
5 siblings, 0 replies; 7+ messages in thread
From: Mark M. Hoffman @ 2006-07-29 15:26 UTC (permalink / raw)
To: lm-sensors
Hi Jim:
> Mark M. Hoffman wrote:
> > But
> >you didn't actually send your patch to the list right (?) At least I can't
> >find it now that I'm looking again.
* Jim Cromie <jim.cromie at gmail.com> [2006-07-29 09:09:30 -0600]:
> For the patch itself, I used a subject-line I thought more suitable for
> submission.
> However, it lost connection to this subject - doh.
> http://lists.lm-sensors.org/pipermail/lm-sensors/2006-July/016909.html
OK thanks.
> >Anyway... I guess we should stuff all those attributes in a table and run
> >over them with a for loop. Then if we catch an error, we can run back
> >through the table in reverse. I mean, what else can we do right?
> Im not sure what you mean -
> - we dont *need* device-remove-file, since they 'go away' by themselves
> (at least for normal rmmods)
> and work properly when re-modprobed
Sorry, I'll be more specific. Right now, when you rmmod a hwmon driver, the
struct device to which it was bound gets destroyed. When that happens, all
sysfs files created with device_create_file() will get auto-freed also.
But this is contrary to the way most subsystems work. E.g. in PCI, the
struct device is created by the bus and is persistent (so long as it isn't
physically removed from the system). In this scenario, when the driver
is rmmod'ed, it is merely unbound from the struct device... and those sysfs
files have to be removed manually. I plan to move the I2C subsystem to
this model "sometime soon".
Ergo, while we're at it... we should add the device_remove_file() calls
now in order to save the time later. We'll be touching all the same code
anyway.
> - changing strategy from completely-unchecked to
> undo-everything-and-bailout
> is a rather long step, and makes driver possibly unusable in some
> corner cases
> (none of which we know and can repeat)
True enough, but I imagine that any other solution will be frowned upon
as "wallpapering over bugs". Jean: any opinion on this?
> - my approach of just counting and warning is 'more legacy'
> (but warnings may be ignored, whereas a 'broken-driver' will elicit
> more feedback)
>
> IOW - theres a balance to strike here, and Im not sure where (too-clever
> see-saw analogy elided)
> Anyway, your forthcoming patches will clarify your sense of the right
> balance..
>
> > I'll
> >patch the following to start, all of which I can either test easily or
> >for which I'm responsible anyway...
> >
> > asb100, lm75, lm78, smsc47b397, w83627hf
> >
> >Any volunteers for the other 38?
> >
> >Regards,
> >
> >
Regards,
--
Mark M. Hoffman
mhoffman at lightlink.com
^ permalink raw reply [flat|nested] 7+ messages in thread* [lm-sensors] 18-rc1-mm1 and unchecked return-codes
2006-07-11 18:33 [lm-sensors] 18-rc1-mm1 and unchecked return-codes Jim Cromie
` (4 preceding siblings ...)
2006-07-29 15:26 ` Mark M. Hoffman
@ 2006-08-02 17:08 ` Jean Delvare
5 siblings, 0 replies; 7+ messages in thread
From: Jean Delvare @ 2006-08-02 17:08 UTC (permalink / raw)
To: lm-sensors
Hi Jim, Mark,
Sorry for the late answer, I have a hard time catching up with e-mail
since I returned from vacation.
> > - changing strategy from completely-unchecked to
> > undo-everything-and-bailout
> > is a rather long step, and makes driver possibly unusable in some
> > corner cases
> > (none of which we know and can repeat)
>
> True enough, but I imagine that any other solution will be frowned upon
> as "wallpapering over bugs". Jean: any opinion on this?
Yup, I don't much like Jim's approach, because it's meant to be
temporary. Mark's approach seems better, especially since it is less
difficult than I first thought. Now let's see how it scales to more
complex drivers. Jim, would you like to try implementing something
similar for the pc87360 driver (or any other complex driver of your
choice, as long as you can test it) and see how it goes?
> > - my approach of just counting and warning is 'more legacy'
> > (but warnings may be ignored, whereas a 'broken-driver' will elicit
> > more feedback)
My fear with this approach is that, once the warnings are hidden,
nobody ever takes care of converting the drivers to the "proper
way" (whatever this ends up being.) The change will affect all hardware
monitoring drivers, and will be pretty large, so let's get it right
directly.
Give me a few more days to think about it all again, anyway... It's
large enough that we should be wise in choosing a strategy.
> > > asb100, lm75, lm78, smsc47b397, w83627hf
> > >
> > > Any volunteers for the other 38?
I will take care of the drivers I maintain (adm1025, lm83, lm90,
f71805f) and the ones I was involved in (lm63, w83l785ts).
Thanks,
--
Jean Delvare
^ permalink raw reply [flat|nested] 7+ messages in thread