Hi, On 01/21/2010 10:47 AM, Sven Anders wrote: > Hello! > > Thanks for you last comments on the code. Now after the holidays, I found > time to write you an answer and sent you an updated patch as well. You're welcome, thanks for being persistent, so this can get into the kernel eventually. >> Now the following can happen: >> >> 1) Driver attaches to w83793 i2c device >> 2) app opens /dev/watchdog >> 3) Driver detaches from w83793 i2c device >> (for example because the i2c master module >> gets unloaded) > > Is it really possible to unload the i2c master module without > unloading any dependend module first? I thought this isn't possible... > Yes, their is no dependency relation between the modules, only a driver binding, when the i2c master goes away, the i2c driver will simply have its detach function called. > I assume the already running watchdog app can still access the opened > file even if the /dev/watchdog node does not exist anymore. So this > should be a problem. > Correct, this is exactly why the ref counting is there, and why the watchdog functions called through the device check if client has not disappeared. > There is one small problem left: > > If the watchdog_open() functions failes with EBUSY, we must not > increase the counter. > Oh, good catch that bug is present in the fschmd.c code too. Note that the way you've fixed this with an unlock in the error path is sort of frowned up on. It is correct, but we usually try to keep locks and unlocks in balanced pairs, so that it is easy to check for missing unlocks. See the patch I've done to fix this same issue for fschmd (attached). >>> On the other hand, I think you are missing the 'reboot notifier' in >>> your fschmd driver code. >> >> The fsc chips are all part of Fujitsu Siemens Computers, iow the manufacturer >> of the motherboard == the manufacturer of the watchdog. As such the watchdog >> always gets initialized by the BIOS, so there is no reason to disable >> it before shutdown / reboot. > > But you will need it, to prevent the watchdog to reboot the machine, if the > shutdown sequence needs more time to shutdown than the watchdog is set to. > > Consider the following szenario: > > 1) Watchdog timeout is set to 1 Minute. > 2) Watchdog helper tool resets the watchdog timer every 30 seconds. > 3) User executes "shutdown" binary. Shutdown is pure userspaces, and does not do much more then signal init to switch runlevel. > 4) Watchdog timer is currently at 31 seconds and the watchdog helper > tool is stopped first. > 5) Watchdog timer has now about 30 seconds left. > 6) Heavy loaded database needs much time to write it's caches or perform any > other tasks during shutdown. For instance it will need about 2 minutes. That would be an initscripts bug then, the watchdog tool should be stopped as one of the last tasks in the runlevel. > 7) After 30 seconds the hardware watchdog kicks in and reboots the machine > leaving the database in an inconsistent state. > This will happen even with a reboot/halt notifier. That won't get called until the kernel actually is going to do the reboot (as in make the BIOS call to do a soft reset). > Please notify me, if I need to make some more changes or if you sent the patch > upstream. > Well, I don't have any path for sending patches directly upstream, Jean Delvare usually does that for hwmon tree patches. I can ack this though, telling Jean it has been reviewed by me and I don't see any more issues. But atm I still do see a few issues: watchdog_get_timeout(): Should return data->watchdog_timeout, not the register value, as that register actually counts down the minutes till it will reboot. IOW it does not contain the counter reload value (which is what we want to return), but it contains the actual counter. + case WDIOC_SETTIMEOUT: + if (get_user(val, (int __user *)arg)) { + ret = -EFAULT; + break; + } + data->watchdog_timeout = val; + ret = watchdog_set_timeout(data, val); + if (ret > 0) + ret = put_user(ret, (int __user *)arg); + break; Note that here you store val, which is in seconds! into data->watchdog_timeout, and then on the next trigger you will write that to W83793_REG_WDT_TIMEOUT, resulting in a timeout of as many minutes as the user asked seconds. I think it would best to remove the setting of data->watchdog_timeout here (esp as no rounding and bounds checking has been done yet), and set it to stimeout in watchdog_set_timeout() after all error checking has been done there. And I think Jean might fall over the balanced lock unlock thingie, Jean ? Regards, Hans