* PCI driver module unload race?
@ 2003-03-08 10:47 Russell King
2003-03-08 19:12 ` Greg KH
0 siblings, 1 reply; 23+ messages in thread
From: Russell King @ 2003-03-08 10:47 UTC (permalink / raw)
To: Linux Kernel List
Cc: Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Greg KH,
Rusty Russell
Hi,
What prevents the following scenario from happening? It's purely
theoretical - I haven't seen this occuring.
- Load PCI driver.
- PCI driver registers using pci_module_init(), and adds itself to sysfs.
- Hot-plugin a PCI device which uses this driver. sysfs matches the PCI
driver, and calls the PCI drivers probe function.
- The probe function calls kmalloc or some other function which sleeps
(or gets preempted, if CONFIG_PREEMPT is enabled.)
- We switch to another thread, which happens to be rmmod for this PCI
driver. We remove the driver since it has a use count of zero.
- We switch back to the PCI driver. Oops.
I've probably missed something, but I don't think so. I suspect we need
struct device_driver to include a struct module pointer which sysfs can
take before calling any driver functions.
Comments?
--
Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: PCI driver module unload race? 2003-03-08 10:47 PCI driver module unload race? Russell King @ 2003-03-08 19:12 ` Greg KH 2003-03-08 19:47 ` Petr Vandrovec 2003-03-08 20:09 ` Russell King 0 siblings, 2 replies; 23+ messages in thread From: Greg KH @ 2003-03-08 19:12 UTC (permalink / raw) To: Linux Kernel List, Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Rusty Russell On Sat, Mar 08, 2003 at 10:47:49AM +0000, Russell King wrote: > Hi, > > What prevents the following scenario from happening? It's purely > theoretical - I haven't seen this occuring. > > - Load PCI driver. > > - PCI driver registers using pci_module_init(), and adds itself to sysfs. > > - Hot-plugin a PCI device which uses this driver. sysfs matches the PCI > driver, and calls the PCI drivers probe function. Ugh, yes you are correct, I can't believe I missed this before. How does this patch look? thanks, greg k-h ===== drivers/pci/pci-driver.c 1.17 vs edited ===== --- 1.17/drivers/pci/pci-driver.c Sat Nov 23 13:23:03 2002 +++ edited/drivers/pci/pci-driver.c Sat Mar 8 11:21:06 2003 @@ -48,6 +48,12 @@ if (!pci_dev->driver && drv->probe) { const struct pci_device_id *id; + if (!try_module_get(drv->owner)) { + dev_err(dev, "Can't get a module reference for %s\n", + drv->name); + error = -ENODEV; + goto exit; + } id = pci_match_device(drv->id_table, pci_dev); if (id) error = drv->probe(pci_dev, id); @@ -55,7 +61,9 @@ pci_dev->driver = drv; error = 0; } + module_put(drv->owner); } +exit: return error; } @@ -66,7 +74,10 @@ if (drv) { if (drv->remove) - drv->remove(pci_dev); + if (try_module_get(drv->owner)) { + drv->remove(pci_dev); + module_put(drv->owner); + } pci_dev->driver = NULL; } return 0; ===== include/linux/pci.h 1.36 vs edited ===== --- 1.36/include/linux/pci.h Tue Mar 4 21:09:58 2003 +++ edited/include/linux/pci.h Sat Mar 8 11:20:15 2003 @@ -493,6 +493,7 @@ struct pci_driver { struct list_head node; + struct module *owner; char *name; const struct pci_device_id *id_table; /* NULL if wants all devices */ int (*probe) (struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-08 19:12 ` Greg KH @ 2003-03-08 19:47 ` Petr Vandrovec 2003-03-08 19:51 ` Greg KH 2003-03-08 20:03 ` Russell King 2003-03-08 20:09 ` Russell King 1 sibling, 2 replies; 23+ messages in thread From: Petr Vandrovec @ 2003-03-08 19:47 UTC (permalink / raw) To: Greg KH Cc: Linux Kernel List, Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Rusty Russell On Sat, Mar 08, 2003 at 11:12:37AM -0800, Greg KH wrote: > On Sat, Mar 08, 2003 at 10:47:49AM +0000, Russell King wrote: > > Hi, > > > > What prevents the following scenario from happening? It's purely > > theoretical - I haven't seen this occuring. > > > > - Load PCI driver. > > > > - PCI driver registers using pci_module_init(), and adds itself to sysfs. > > > > - Hot-plugin a PCI device which uses this driver. sysfs matches the PCI > > driver, and calls the PCI drivers probe function. > > Ugh, yes you are correct, I can't believe I missed this before. > > How does this patch look? Bad... What are you trying to solve? After driver calls pci_unregister_driver, it is sure that there are no other users of this pci driver. So I do not understand why you are adding these additional protections (and one in remove looks suspicious to me: if you unplug device while module in unloading, you do not call remove?! who will call it then? you'll not get oopses in pci_unregister_driver? leaked memory?). Driver authors simple have to unregister from subsystems in correct order, no try_module_get() put anywhere is not going to solve this problem. There is simple no relation between modules and drivers - and anyone who is trying to force this is misguided. Driver starts its life inside call to *_register_*, and ceases in *_unregister_*, not somewhere after call to init_module, and before cleanup_module. And if module registers resources in wrong order, module has to cope with its stupidity, not kernel (and I have yet to see two kernel interfaces which cannot be properly ordered for both init and shutdown). Petr Vandrovec vandrove@vc.cvut.cz > thanks, > > greg k-h > > > ===== drivers/pci/pci-driver.c 1.17 vs edited ===== > --- 1.17/drivers/pci/pci-driver.c Sat Nov 23 13:23:03 2002 > +++ edited/drivers/pci/pci-driver.c Sat Mar 8 11:21:06 2003 > @@ -48,6 +48,12 @@ > if (!pci_dev->driver && drv->probe) { > const struct pci_device_id *id; > > + if (!try_module_get(drv->owner)) { > + dev_err(dev, "Can't get a module reference for %s\n", > + drv->name); > + error = -ENODEV; > + goto exit; > + } > id = pci_match_device(drv->id_table, pci_dev); > if (id) > error = drv->probe(pci_dev, id); > @@ -55,7 +61,9 @@ > pci_dev->driver = drv; > error = 0; > } > + module_put(drv->owner); > } > +exit: > return error; > } > > @@ -66,7 +74,10 @@ > > if (drv) { > if (drv->remove) > - drv->remove(pci_dev); > + if (try_module_get(drv->owner)) { > + drv->remove(pci_dev); > + module_put(drv->owner); > + } > pci_dev->driver = NULL; > } > return 0; > ===== include/linux/pci.h 1.36 vs edited ===== > --- 1.36/include/linux/pci.h Tue Mar 4 21:09:58 2003 > +++ edited/include/linux/pci.h Sat Mar 8 11:20:15 2003 > @@ -493,6 +493,7 @@ > > struct pci_driver { > struct list_head node; > + struct module *owner; > char *name; > const struct pci_device_id *id_table; /* NULL if wants all devices */ > int (*probe) (struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */ > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-08 19:47 ` Petr Vandrovec @ 2003-03-08 19:51 ` Greg KH 2003-03-09 2:33 ` Petr Vandrovec 2003-03-08 20:03 ` Russell King 1 sibling, 1 reply; 23+ messages in thread From: Greg KH @ 2003-03-08 19:51 UTC (permalink / raw) To: Petr Vandrovec Cc: Linux Kernel List, Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Rusty Russell On Sat, Mar 08, 2003 at 08:47:14PM +0100, Petr Vandrovec wrote: > On Sat, Mar 08, 2003 at 11:12:37AM -0800, Greg KH wrote: > > On Sat, Mar 08, 2003 at 10:47:49AM +0000, Russell King wrote: > > > Hi, > > > > > > What prevents the following scenario from happening? It's purely > > > theoretical - I haven't seen this occuring. > > > > > > - Load PCI driver. > > > > > > - PCI driver registers using pci_module_init(), and adds itself to sysfs. > > > > > > - Hot-plugin a PCI device which uses this driver. sysfs matches the PCI > > > driver, and calls the PCI drivers probe function. > > > > Ugh, yes you are correct, I can't believe I missed this before. > > > > How does this patch look? > > Bad... > > What are you trying to solve? The case where while probe() is called, the module is unloaded. Same thing for remove(). That's all. > After driver calls pci_unregister_driver, > it is sure that there are no other users of this pci driver. Sure, but that's not the case of what we are protecting here. We want pci_unregister_driver() (which is usually called from the module_exit() function), to not be called if we are in the middle of calling either probe() or release(). Do you have a way of protecting the race that is described by Russell here that differs from my patch? thanks, greg k-h ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-08 19:51 ` Greg KH @ 2003-03-09 2:33 ` Petr Vandrovec 0 siblings, 0 replies; 23+ messages in thread From: Petr Vandrovec @ 2003-03-09 2:33 UTC (permalink / raw) To: Greg KH Cc: Linux Kernel List, Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Rusty Russell [Scroll to the end...] > The case where while probe() is called, the module is unloaded. > Same thing for remove(). > > That's all. > > > After driver calls pci_unregister_driver, > > it is sure that there are no other users of this pci driver. > > Sure, but that's not the case of what we are protecting here. We want > pci_unregister_driver() (which is usually called from the module_exit() Usually! pci_unregister_driver has nothing to do with module_exit(), unless there is some rule which says that pci_unregister_driver may be called from module_exit() only. > function), to not be called if we are in the middle of calling either > probe() or release(). > > Do you have a way of protecting the race that is described by Russell > here that differs from my patch? There must be normal subsystem locking which prevents you from such race. Russel King wrote: > Load PCI driver. > PCI driver registers using pci_module_init(), and adds itself to sysfs. > Hot-plugin a PCI device which uses this driver. sysfs matches the PCI > driver, and calls the PCI drivers probe function. > The probe function calls kmalloc or some other function which sleeps > (or gets preempted, if CONFIG_PREEMPT is enabled.) > We switch to another thread, which happens to be rmmod for this PCI > driver. We remove the driver since it has a use count of zero. > We switch back to the PCI driver. Oops. Both calls to ->probe (& ->remove) must happen under same lock which is used by pci_(un)register_driver. Driver expects that there is no device registered after return from pci_unregister_driver, and your if (try_module_get())... violates this - you simply skip call to ->remove if module is in unloading state - although I see no reason why you could not enter driver's ->remove function long before pci_unregister_driver is called. Also pci_unregister_driver() expects ->remove to be called for all registered devices. If you'll guard this call by try_module_get(), no device will get unregistered from driver, causing memory leaks or even crashes. Well, and after reading sysfs code: there is no problem (I would be really surprised otherwise), as both hotplug event and driver registration/removal are guarded by bus->subsys.rwsem semaphore, so pci_unregister_driver() succeeds after previous ->remove (or ->probe or ...) finishes. Just make sure that people do not call device_release_driver unless they really know what they are doing. Proper interface is bus_remove_device or bus_remove_driver... Best regards, Petr Vandrovec vandrove@vc.cvut.cz ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-08 19:47 ` Petr Vandrovec 2003-03-08 19:51 ` Greg KH @ 2003-03-08 20:03 ` Russell King 1 sibling, 0 replies; 23+ messages in thread From: Russell King @ 2003-03-08 20:03 UTC (permalink / raw) To: Petr Vandrovec Cc: Greg KH, Linux Kernel List, Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Rusty Russell On Sat, Mar 08, 2003 at 08:47:14PM +0100, Petr Vandrovec wrote: > What are you trying to solve? Let me re-send my original mail to lkml - it seems to have gobbled up all this mornings email, completely loosing *everything*. -- Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux http://www.arm.linux.org.uk/personal/aboutme.html ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-08 19:12 ` Greg KH 2003-03-08 19:47 ` Petr Vandrovec @ 2003-03-08 20:09 ` Russell King 2003-03-08 20:21 ` Greg KH 1 sibling, 1 reply; 23+ messages in thread From: Russell King @ 2003-03-08 20:09 UTC (permalink / raw) To: Greg KH Cc: Linux Kernel List, Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Rusty Russell On Sat, Mar 08, 2003 at 11:12:37AM -0800, Greg KH wrote: > On Sat, Mar 08, 2003 at 10:47:49AM +0000, Russell King wrote: > > Hi, > > > > What prevents the following scenario from happening? It's purely > > theoretical - I haven't seen this occuring. > > > > - Load PCI driver. > > > > - PCI driver registers using pci_module_init(), and adds itself to sysfs. > > > > - Hot-plugin a PCI device which uses this driver. sysfs matches the PCI > > driver, and calls the PCI drivers probe function. > > Ugh, yes you are correct, I can't believe I missed this before. > > How does this patch look? Hrm, I'm wondering whether this should be part of the device model infrastructure. After all, surely every subsystems device driver which could be a module would need this to prevent unload races? -- Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux http://www.arm.linux.org.uk/personal/aboutme.html ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-08 20:09 ` Russell King @ 2003-03-08 20:21 ` Greg KH 2003-03-10 21:44 ` Greg KH 0 siblings, 1 reply; 23+ messages in thread From: Greg KH @ 2003-03-08 20:21 UTC (permalink / raw) To: Linux Kernel List, Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Rusty Russell On Sat, Mar 08, 2003 at 08:09:43PM +0000, Russell King wrote: > On Sat, Mar 08, 2003 at 11:12:37AM -0800, Greg KH wrote: > > On Sat, Mar 08, 2003 at 10:47:49AM +0000, Russell King wrote: > > > Hi, > > > > > > What prevents the following scenario from happening? It's purely > > > theoretical - I haven't seen this occuring. > > > > > > - Load PCI driver. > > > > > > - PCI driver registers using pci_module_init(), and adds itself to sysfs. > > > > > > - Hot-plugin a PCI device which uses this driver. sysfs matches the PCI > > > driver, and calls the PCI drivers probe function. > > > > Ugh, yes you are correct, I can't believe I missed this before. > > > > How does this patch look? > > Hrm, I'm wondering whether this should be part of the device model > infrastructure. After all, surely every subsystems device driver > which could be a module would need this to prevent unload races? Very good point, I can see myself duplicating this logic for every subsystem :) I'll look into moving this into the driver core later today. thanks, greg k-h ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-08 20:21 ` Greg KH @ 2003-03-10 21:44 ` Greg KH 2003-03-10 23:48 ` Oliver Neukum 0 siblings, 1 reply; 23+ messages in thread From: Greg KH @ 2003-03-10 21:44 UTC (permalink / raw) To: Linux Kernel List, Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Rusty Russell On Sat, Mar 08, 2003 at 12:21:01PM -0800, Greg KH wrote: > On Sat, Mar 08, 2003 at 08:09:43PM +0000, Russell King wrote: > > On Sat, Mar 08, 2003 at 11:12:37AM -0800, Greg KH wrote: > > > On Sat, Mar 08, 2003 at 10:47:49AM +0000, Russell King wrote: > > > > Hi, > > > > > > > > What prevents the following scenario from happening? It's purely > > > > theoretical - I haven't seen this occuring. > > > > > > > > - Load PCI driver. > > > > > > > > - PCI driver registers using pci_module_init(), and adds itself to sysfs. > > > > > > > > - Hot-plugin a PCI device which uses this driver. sysfs matches the PCI > > > > driver, and calls the PCI drivers probe function. > > > > > > Ugh, yes you are correct, I can't believe I missed this before. > > > > > > How does this patch look? > > > > Hrm, I'm wondering whether this should be part of the device model > > infrastructure. After all, surely every subsystems device driver > > which could be a module would need this to prevent unload races? > > Very good point, I can see myself duplicating this logic for every > subsystem :) > > I'll look into moving this into the driver core later today. Ok, how about this patch? It boots for me :) If no one has any complaints, I'll work on moving the USB core to using this pointer instead of its own in struct usb_driver. thanks, greg k-h # Driver core: add module owner to struct device_driver diff -Nru a/drivers/base/bus.c b/drivers/base/bus.c --- a/drivers/base/bus.c Mon Mar 10 13:52:15 2003 +++ b/drivers/base/bus.c Mon Mar 10 13:52:15 2003 @@ -263,14 +263,25 @@ if (dev->bus->match(dev,drv)) { dev->driver = drv; if (drv->probe) { - if ((error = drv->probe(dev))) { - dev->driver = NULL; - return error; + if (!try_module_get(drv->owner)) { + dev_err(dev, + "Can't get a module reference for %s\n", + drv->name); + goto exit; } + + if ((error = drv->probe(dev))) + dev->driver = NULL; + + module_put(drv->owner); + + if (error) + goto exit; } device_bind_driver(dev); error = 0; } +exit: return error; } @@ -350,8 +361,16 @@ sysfs_remove_link(&drv->kobj,dev->kobj.name); list_del_init(&dev->driver_list); devclass_remove_device(dev); - if (drv->remove) - drv->remove(dev); + if (drv->remove) { + if (!try_module_get(drv->owner)) { + dev_err(dev, + "Can't get a module reference for %s\n", + drv->name); + } else { + drv->remove(dev); + module_put(drv->owner); + } + } dev->driver = NULL; } } diff -Nru a/drivers/base/power.c b/drivers/base/power.c --- a/drivers/base/power.c Mon Mar 10 13:52:15 2003 +++ b/drivers/base/power.c Mon Mar 10 13:52:15 2003 @@ -41,10 +41,17 @@ list_for_each(node,&devices_subsys.kset.list) { struct device * dev = to_dev(node); if (dev->driver && dev->driver->suspend) { - pr_debug("suspending device %s\n",dev->name); - error = dev->driver->suspend(dev,state,level); - if (error) - printk(KERN_ERR "%s: suspend returned %d\n",dev->name,error); + if (!try_module_get(dev->driver->owner)) { + dev_err(dev, + "Can't get a module reference for %s\n", + dev->driver->name); + } else { + pr_debug("suspending device %s\n",dev->name); + error = dev->driver->suspend(dev,state,level); + if (error) + printk(KERN_ERR "%s: suspend returned %d\n",dev->name,error); + module_put(dev->driver->owner); + } } } up_write(&devices_subsys.rwsem); @@ -67,8 +74,15 @@ list_for_each_prev(node,&devices_subsys.kset.list) { struct device * dev = to_dev(node); if (dev->driver && dev->driver->resume) { - pr_debug("resuming device %s\n",dev->name); - dev->driver->resume(dev,level); + if (!try_module_get(dev->driver->owner)) { + dev_err(dev, + "Can't get a module reference for %s\n", + dev->driver->name); + } else { + pr_debug("resuming device %s\n",dev->name); + dev->driver->resume(dev,level); + module_put(dev->driver->owner); + } } } up_write(&devices_subsys.rwsem); @@ -89,8 +103,15 @@ list_for_each(entry,&devices_subsys.kset.list) { struct device * dev = to_dev(entry); if (dev->driver && dev->driver->shutdown) { - pr_debug("shutting down %s\n",dev->name); - dev->driver->shutdown(dev); + if (!try_module_get(dev->driver->owner)) { + dev_err(dev, + "Can't get a module reference for %s\n", + dev->driver->name); + } else { + pr_debug("shutting down %s\n",dev->name); + dev->driver->shutdown(dev); + module_put(dev->driver->owner); + } } } up_write(&devices_subsys.rwsem); diff -Nru a/include/linux/device.h b/include/linux/device.h --- a/include/linux/device.h Mon Mar 10 13:52:15 2003 +++ b/include/linux/device.h Mon Mar 10 13:52:15 2003 @@ -112,6 +112,7 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *); struct device_driver { + struct module * owner; char * name; struct bus_type * bus; struct device_class * devclass; ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-10 21:44 ` Greg KH @ 2003-03-10 23:48 ` Oliver Neukum 2003-03-10 23:51 ` Greg KH 0 siblings, 1 reply; 23+ messages in thread From: Oliver Neukum @ 2003-03-10 23:48 UTC (permalink / raw) To: Greg KH, Linux Kernel List, Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Rusty Russell > diff -Nru a/drivers/base/bus.c b/drivers/base/bus.c > --- a/drivers/base/bus.c Mon Mar 10 13:52:15 2003 > +++ b/drivers/base/bus.c Mon Mar 10 13:52:15 2003 > @@ -263,14 +263,25 @@ > if (dev->bus->match(dev,drv)) { > dev->driver = drv; > if (drv->probe) { > - if ((error = drv->probe(dev))) { It seems that the semaphore in bus_add_device() makes this unnecessary. [..] > diff -Nru a/drivers/base/power.c b/drivers/base/power.c > --- a/drivers/base/power.c Mon Mar 10 13:52:15 2003 > +++ b/drivers/base/power.c Mon Mar 10 13:52:15 2003 > @@ -41,10 +41,17 @@ > list_for_each(node,&devices_subsys.kset.list) { > struct device * dev = to_dev(node); > if (dev->driver && dev->driver->suspend) { > - pr_debug("suspending device %s\n",dev->name); > - error = dev->driver->suspend(dev,state,level); This change is no good. Either the semaphore protects you or it doesn't. If it doesn't you can't even trust the dev pointer, because suspend() can sleep. Regards Oliver ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-10 23:48 ` Oliver Neukum @ 2003-03-10 23:51 ` Greg KH 2003-03-11 1:04 ` Roman Zippel 0 siblings, 1 reply; 23+ messages in thread From: Greg KH @ 2003-03-10 23:51 UTC (permalink / raw) To: Oliver Neukum Cc: Linux Kernel List, Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Rusty Russell On Tue, Mar 11, 2003 at 12:48:43AM +0100, Oliver Neukum wrote: > > > diff -Nru a/drivers/base/bus.c b/drivers/base/bus.c > > --- a/drivers/base/bus.c Mon Mar 10 13:52:15 2003 > > +++ b/drivers/base/bus.c Mon Mar 10 13:52:15 2003 > > @@ -263,14 +263,25 @@ > > if (dev->bus->match(dev,drv)) { > > dev->driver = drv; > > if (drv->probe) { > > - if ((error = drv->probe(dev))) { > > It seems that the semaphore in bus_add_device() makes this unnecessary. Hm, yes. I think you are correct. So this patch is not needed, and the struct module * can be ripped out of struct usb_driver too :) Thanks, greg k-h ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-10 23:51 ` Greg KH @ 2003-03-11 1:04 ` Roman Zippel 2003-03-11 1:15 ` Greg KH 0 siblings, 1 reply; 23+ messages in thread From: Roman Zippel @ 2003-03-11 1:04 UTC (permalink / raw) To: Greg KH Cc: Oliver Neukum, Linux Kernel List, Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Rusty Russell Hi, On Mon, 10 Mar 2003, Greg KH wrote: > > It seems that the semaphore in bus_add_device() makes this unnecessary. > > Hm, yes. I think you are correct. > > So this patch is not needed, and the struct module * can be ripped out > of struct usb_driver too :) I think it's not easy. I haven't studied the code completely yet, but e.g. when you attach a device to a driver you also have to get a reference to the driver. I think there are more interesting races, e.g. when you create a sysfs symlink, that symlink might also have references to a module. bye, Roman ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-11 1:04 ` Roman Zippel @ 2003-03-11 1:15 ` Greg KH 2003-03-11 9:00 ` Oliver Neukum 2003-03-11 11:05 ` Roman Zippel 0 siblings, 2 replies; 23+ messages in thread From: Greg KH @ 2003-03-11 1:15 UTC (permalink / raw) To: Roman Zippel Cc: Oliver Neukum, Linux Kernel List, Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Rusty Russell On Tue, Mar 11, 2003 at 02:04:20AM +0100, Roman Zippel wrote: > On Mon, 10 Mar 2003, Greg KH wrote: > > > > It seems that the semaphore in bus_add_device() makes this unnecessary. > > > > Hm, yes. I think you are correct. > > > > So this patch is not needed, and the struct module * can be ripped out > > of struct usb_driver too :) > > I think it's not easy. I haven't studied the code completely yet, but e.g. > when you attach a device to a driver you also have to get a reference to > the driver. You get a link to the driver, but you can't increment the module count of the driver at that time, as we have to be able to remove a module somehow :) > I think there are more interesting races, e.g. when you create a sysfs > symlink, that symlink might also have references to a module. Yeah, I still think there are some nasty issues with regards to being in a sysfs directory, with a open file handle, and the module is removed. But I haven't checked stuff like that in a while. CONFIG_MODULE_UNLOAD, just say no. thanks, greg k-h ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-11 1:15 ` Greg KH @ 2003-03-11 9:00 ` Oliver Neukum 2003-03-11 15:06 ` Patrick Mochel 2003-03-11 11:05 ` Roman Zippel 1 sibling, 1 reply; 23+ messages in thread From: Oliver Neukum @ 2003-03-11 9:00 UTC (permalink / raw) To: Greg KH, Roman Zippel Cc: Linux Kernel List, Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Rusty Russell Am Dienstag, 11. März 2003 02:15 schrieb Greg KH: > On Tue, Mar 11, 2003 at 02:04:20AM +0100, Roman Zippel wrote: > > On Mon, 10 Mar 2003, Greg KH wrote: > > > > It seems that the semaphore in bus_add_device() makes this > > > > unnecessary. > > > > > > Hm, yes. I think you are correct. > > > > > > So this patch is not needed, and the struct module * can be ripped out > > > of struct usb_driver too :) > > > > I think it's not easy. I haven't studied the code completely yet, but > > e.g. when you attach a device to a driver you also have to get a > > reference to the driver. > > You get a link to the driver, but you can't increment the module count > of the driver at that time, as we have to be able to remove a module > somehow :) That is simple. Export a generic way to disconnect a driver from a device. > > I think there are more interesting races, e.g. when you create a sysfs > > symlink, that symlink might also have references to a module. > > Yeah, I still think there are some nasty issues with regards to being in > a sysfs directory, with a open file handle, and the module is removed. > But I haven't checked stuff like that in a while. > > CONFIG_MODULE_UNLOAD, just say no. That is taking the easy way out. Regards Oliver ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-11 9:00 ` Oliver Neukum @ 2003-03-11 15:06 ` Patrick Mochel 2003-03-11 16:07 ` Oliver Neukum 0 siblings, 1 reply; 23+ messages in thread From: Patrick Mochel @ 2003-03-11 15:06 UTC (permalink / raw) To: Oliver Neukum Cc: Greg KH, Roman Zippel, Linux Kernel List, Ivan Kokshaysky, Jeff Garzik, Rusty Russell > > > I think it's not easy. I haven't studied the code completely yet, but > > > e.g. when you attach a device to a driver you also have to get a > > > reference to the driver. > > > > You get a link to the driver, but you can't increment the module count > > of the driver at that time, as we have to be able to remove a module > > somehow :) > > That is simple. Export a generic way to disconnect a driver from a device. It's not that easy - Linux has always supported the operation of being able to remove a module while it is attached to devices. The reference count only goes up if a device is opened. This means the module refcount must remain at 0, even after it's bound to devices. Changing this would require a change in visible behavior, and require an extra step by a user to disconnect the driver before they unload the module. -pat ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-11 15:06 ` Patrick Mochel @ 2003-03-11 16:07 ` Oliver Neukum 2003-03-16 13:13 ` Rusty Russell 0 siblings, 1 reply; 23+ messages in thread From: Oliver Neukum @ 2003-03-11 16:07 UTC (permalink / raw) To: Patrick Mochel Cc: Greg KH, Roman Zippel, Linux Kernel List, Ivan Kokshaysky, Jeff Garzik, Rusty Russell > This means the module refcount must remain at 0, even after it's bound to > devices. Changing this would require a change in visible behavior, and > require an extra step by a user to disconnect the driver before they > unload the module. Yes, that would mean changing behaviour. On the other hand, we require new module utilities for 2.6 anyway, so why not? Regards Oliver ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-11 16:07 ` Oliver Neukum @ 2003-03-16 13:13 ` Rusty Russell 0 siblings, 0 replies; 23+ messages in thread From: Rusty Russell @ 2003-03-16 13:13 UTC (permalink / raw) To: Oliver Neukum Cc: Greg KH, Roman Zippel, Linux Kernel List, Ivan Kokshaysky, Jeff Garzik, Rusty Russell In message <200303111707.09119.oliver@neukum.name> you write: > > > This means the module refcount must remain at 0, even after it's bound to > > devices. Changing this would require a change in visible behavior, and > > require an extra step by a user to disconnect the driver before they > > unload the module. > > Yes, that would mean changing behaviour. On the other hand, we require > new module utilities for 2.6 anyway, so why not? Because I think it's silly, from a user point of view. Why are you removing the module, after all? Anyway, a pre-removal notifier in sys_module_delete would have the same effect (with the same issues if the refcount isn't 0 after all, and you then want to fail the rmmod). Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-11 1:15 ` Greg KH 2003-03-11 9:00 ` Oliver Neukum @ 2003-03-11 11:05 ` Roman Zippel 2003-03-11 15:27 ` Patrick Mochel 1 sibling, 1 reply; 23+ messages in thread From: Roman Zippel @ 2003-03-11 11:05 UTC (permalink / raw) To: Greg KH Cc: Oliver Neukum, Linux Kernel List, Patrick Mochel, Ivan Kokshaysky, Jeff Garzik, Rusty Russell Hi, On Mon, 10 Mar 2003, Greg KH wrote: > > I think it's not easy. I haven't studied the code completely yet, but e.g. > > when you attach a device to a driver you also have to get a reference to > > the driver. > > You get a link to the driver, but you can't increment the module count > of the driver at that time, as we have to be able to remove a module > somehow :) Somehow you have to protect dev->driver, you cannot resolve the driver pointer without holding the bus lock or holding a reference. If you don't get a reference, any access via this pointer must be done under the bus lock. > Yeah, I still think there are some nasty issues with regards to being in > a sysfs directory, with a open file handle, and the module is removed. > But I haven't checked stuff like that in a while. > > CONFIG_MODULE_UNLOAD, just say no. That's certainly an option, but I'm afraid not too many people will do this. bye, Roman ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-11 11:05 ` Roman Zippel @ 2003-03-11 15:27 ` Patrick Mochel 2003-03-11 20:09 ` Roman Zippel 2003-03-16 13:05 ` Rusty Russell 0 siblings, 2 replies; 23+ messages in thread From: Patrick Mochel @ 2003-03-11 15:27 UTC (permalink / raw) To: Roman Zippel Cc: Greg KH, Oliver Neukum, Linux Kernel List, Ivan Kokshaysky, Jeff Garzik, Rusty Russell > Somehow you have to protect dev->driver, you cannot resolve the driver > pointer without holding the bus lock or holding a reference. If you don't > get a reference, any access via this pointer must be done under the bus > lock. struct device_driver has a separate reference count than the module reference count. In the driver core, this reference count is used. It is acquired by taking the bus's lock. The module refcount isn't used, because it resides in the driver, so we have to guarantee we can dereference the pointer before we deref the pointer to increment the refcount. It's a bloody mess, and we work around it. The driver has an unload_sem that is locked until the driver's refcount goes to 0. When it does, it's unlocked. A driver_unregister() call will try and take this semaphore while unregistering, meaning it will block until outstanding references go away. I don't particularly love it, but it's simple enough and it works. Ideally, we'd have one reference count and this wouldn't be an issue. However, with the evolution of the driver core and the module core in 2.5, these details haven't had a chance to be worked. I hope in 2.7, we can achieve more unification between the two. > > Yeah, I still think there are some nasty issues with regards to being in > > a sysfs directory, with a open file handle, and the module is removed. > > But I haven't checked stuff like that in a while. > > > > CONFIG_MODULE_UNLOAD, just say no. > > That's certainly an option, but I'm afraid not too many people will do > this. Greg, and Rusty, are right. Dealing with this is a PITA, and I think will always be. I'm willing to take the Nancy Reagan platform, too. -pat ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-11 15:27 ` Patrick Mochel @ 2003-03-11 20:09 ` Roman Zippel 2003-03-11 19:15 ` Patrick Mochel 2003-03-16 13:05 ` Rusty Russell 1 sibling, 1 reply; 23+ messages in thread From: Roman Zippel @ 2003-03-11 20:09 UTC (permalink / raw) To: Patrick Mochel Cc: Greg KH, Oliver Neukum, Linux Kernel List, Ivan Kokshaysky, Jeff Garzik, Rusty Russell Hi, On Tue, 11 Mar 2003, Patrick Mochel wrote: > > > CONFIG_MODULE_UNLOAD, just say no. > > > > That's certainly an option, but I'm afraid not too many people will do > > this. > > Greg, and Rusty, are right. Dealing with this is a PITA, and I think will > always be. I'm willing to take the Nancy Reagan platform, too. Right with what? I think the problem is complex enough, that it's beyond a simple right/wrong scheme. What is the "Nancy Reagan platform"? bye, Roman ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-11 20:09 ` Roman Zippel @ 2003-03-11 19:15 ` Patrick Mochel 2003-03-12 2:28 ` Roman Zippel 0 siblings, 1 reply; 23+ messages in thread From: Patrick Mochel @ 2003-03-11 19:15 UTC (permalink / raw) To: Roman Zippel Cc: Greg KH, Oliver Neukum, Linux Kernel List, Ivan Kokshaysky, Jeff Garzik, Rusty Russell On Tue, 11 Mar 2003, Roman Zippel wrote: > Hi, > > On Tue, 11 Mar 2003, Patrick Mochel wrote: > > > > > CONFIG_MODULE_UNLOAD, just say no. > > > > > > That's certainly an option, but I'm afraid not too many people will do > > > this. > > > > Greg, and Rusty, are right. Dealing with this is a PITA, and I think will > > always be. I'm willing to take the Nancy Reagan platform, too. > > Right with what? With the idea that unloading modules is a bad idea. > What is the "Nancy Reagan platform"? "Just say no". It was a big anti-drug campaign in the US targeted at schoolchildren, spearheaded in the mid-80's by Nancy Reagan. -pat ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-11 19:15 ` Patrick Mochel @ 2003-03-12 2:28 ` Roman Zippel 0 siblings, 0 replies; 23+ messages in thread From: Roman Zippel @ 2003-03-12 2:28 UTC (permalink / raw) To: Patrick Mochel Cc: Greg KH, Oliver Neukum, Linux Kernel List, Ivan Kokshaysky, Jeff Garzik, Rusty Russell Hi, On Tue, 11 Mar 2003, Patrick Mochel wrote: > > > Greg, and Rusty, are right. Dealing with this is a PITA, and I think will > > > always be. I'm willing to take the Nancy Reagan platform, too. > > > > Right with what? > > With the idea that unloading modules is a bad idea. With the current module refcount model I can only agree. OTOH I need a sufficiently complex example, which gets the module locking right (file systems are just too simple), then I can actually produce a patch, which shows the advantages of a different model and the driver model/sysfs looks like an interesting victim. :) > > What is the "Nancy Reagan platform"? > > "Just say no". It was a big anti-drug campaign in the US targeted at > schoolchildren, spearheaded in the mid-80's by Nancy Reagan. Well, this always look simple, but it hardly ever solves the real problem. bye, Roman ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PCI driver module unload race? 2003-03-11 15:27 ` Patrick Mochel 2003-03-11 20:09 ` Roman Zippel @ 2003-03-16 13:05 ` Rusty Russell 1 sibling, 0 replies; 23+ messages in thread From: Rusty Russell @ 2003-03-16 13:05 UTC (permalink / raw) To: Patrick Mochel Cc: Roman Zippel, Werner Almesberger, Greg KH, Oliver Neukum, Linux Kernel List, Ivan Kokshaysky, Jeff Garzik In message <Pine.LNX.4.33.0303110916540.1003-100000@localhost.localdomain> you write: > The driver has an unload_sem that is locked until the driver's refcount > goes to 0. When it does, it's unlocked. A driver_unregister() call will > try and take this semaphore while unregistering, meaning it will block > until outstanding references go away. That sounds like an odd way of doing it. More normal would be either a rw lock of some kind (ie. always hold read when calling through functions), or if recursion is a worry, a lock, an atomic reference count, and a "who is waiting for it to be unloaded" task ptr. As one example, see net/core/netfilter.c: nf_unregister_hook uses a br_lock, nf_unregister_sockopt uses the refcount approach. Any way it's done, the effect is the same: deregisteration sleeps until the object is unused. > I don't particularly love it, but it's simple enough and it works. > Ideally, we'd have one reference count and this wouldn't be an issue. > However, with the evolution of the driver core and the module core in 2.5, > these details haven't had a chance to be worked. I hope in 2.7, we can > achieve more unification between the two. Unfortunately, I think that will require changing every module *and* every registration function, and noone has produced a reasonable model which has the "unload only if noone is using the module" semantics (which requires atomic "deactivation" of interfaces, like try_module_get). Currently, if you go for a full dynamic interface (ie. can be unregistered and structures destroyed at any time, not just on module unload), you don't have a race, but you don't bump module refcounts which users expect (ie. module appears "unused" and hangs on rmmod), which can be a problem depending on the interface (mainly whether the rmmod hang would be infinite). If you just use a module pointer and try_module_get, the interface can't be safely used in *general* (Werner's "unregister_xxx then kfree(xxx)" problem), but as long as the lifetime of the objects are tied to module lifetime (as many are), it works. Yes, this means some code has to do both. But it's simple, doesn't significantly effect code using the interface, and is easy to rip out when Something Better comes along. > Greg, and Rusty, are right. Dealing with this is a PITA, and I think will > always be. I'm willing to take the Nancy Reagan platform, too. And another reason that I like the "easy to remove" nature of try_module_get, for all its flaws. Cheers, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2003-03-17 17:34 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2003-03-08 10:47 PCI driver module unload race? Russell King 2003-03-08 19:12 ` Greg KH 2003-03-08 19:47 ` Petr Vandrovec 2003-03-08 19:51 ` Greg KH 2003-03-09 2:33 ` Petr Vandrovec 2003-03-08 20:03 ` Russell King 2003-03-08 20:09 ` Russell King 2003-03-08 20:21 ` Greg KH 2003-03-10 21:44 ` Greg KH 2003-03-10 23:48 ` Oliver Neukum 2003-03-10 23:51 ` Greg KH 2003-03-11 1:04 ` Roman Zippel 2003-03-11 1:15 ` Greg KH 2003-03-11 9:00 ` Oliver Neukum 2003-03-11 15:06 ` Patrick Mochel 2003-03-11 16:07 ` Oliver Neukum 2003-03-16 13:13 ` Rusty Russell 2003-03-11 11:05 ` Roman Zippel 2003-03-11 15:27 ` Patrick Mochel 2003-03-11 20:09 ` Roman Zippel 2003-03-11 19:15 ` Patrick Mochel 2003-03-12 2:28 ` Roman Zippel 2003-03-16 13:05 ` Rusty Russell
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox