* bug 2400
@ 2004-04-01 21:15 Andrew Morton
2004-04-01 21:52 ` Matt Gulick
` (2 more replies)
0 siblings, 3 replies; 61+ messages in thread
From: Andrew Morton @ 2004-04-01 21:15 UTC (permalink / raw)
To: linux-usb-devel, linux-scsi
Apparently there is some controversy over whether this:
http://bugme.osdl.org/show_bug.cgi?id=2400 is a usb-storage problem or a
SCSI problem.
Can someone please shed some light, propose a way to get it fixed?
Thanks.
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: bug 2400 2004-04-01 21:15 bug 2400 Andrew Morton @ 2004-04-01 21:52 ` Matt Gulick 2004-04-01 22:08 ` Andrew Morton 2004-04-01 22:40 ` James Bottomley 2004-04-01 23:07 ` Matthew Dharm 2004-04-01 23:32 ` James Bottomley 2 siblings, 2 replies; 61+ messages in thread From: Matt Gulick @ 2004-04-01 21:52 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-usb-devel, linux-scsi On Thu, 2004-04-01 at 15:15, Andrew Morton wrote: > Apparently there is some controversy over whether this: > http://bugme.osdl.org/show_bug.cgi?id=2400 is a usb-storage problem or > a > SCSI problem. > > Can someone please shed some light, propose a way to get it fixed? > > Thanks. > - > To unsubscribe from this list: send the line "unsubscribe linux-scsi" > in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Andrew, This could be in the SCSI stack. USB is just a transport mechanism or physical layer. The commands are actually handled by SCSI. The commands are packaged into SBP-2 (SCSI Bus Protocol) data grams for delivery. Where you will get into trouble is that SCSI is not considered to be a 'Dynamic' bus. Devices do not normally come and go, while USB and 1394 devices are 'Hot Plugable'. Think about what happens on MacOS X or Windose, the device must first be stopped and ejected (virtual ejection) prior to unplugging. This allows the driver to tear down the device control structures and to release the associated memory. When the device was disconnected, these structures may have been torn down and released without the SCSI drivers knowledge, hence the zero pointer issue. Remember, SCSI is not hot plugable. The SCSI driver must receive a START/STOP cdb with the bits set for STOP. An EJECT call would not hurt if supported by the device in question. Just like I am constantly telling my kids, "Just 'cause you can, does not mean you should" Matt ---------------------------------------- Matt Gulick Sr. Staff Engineer Adaptec, Inc. gulickconsulting@direcway.com matt_gulick@adaptec.com (715) 426-0884 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-01 21:52 ` Matt Gulick @ 2004-04-01 22:08 ` Andrew Morton 2004-04-01 22:48 ` Matt Gulick 2004-04-01 22:40 ` James Bottomley 1 sibling, 1 reply; 61+ messages in thread From: Andrew Morton @ 2004-04-01 22:08 UTC (permalink / raw) To: gulickconsulting; +Cc: linux-usb-devel, linux-scsi Matt Gulick <gulickconsulting@direcway.com> wrote: > > Remember, SCSI is not hot plugable. oopsing the kernel is not a particularly friendly way of telling this to our users. If we have to bandaid over this, say, by leaking some memory and emitting some rude printk's then OK. Allowing the machine to kill itself, potentially losing all the user's unsaved work is distinctly windows 95ish, and is not OK, agree? ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-01 22:08 ` Andrew Morton @ 2004-04-01 22:48 ` Matt Gulick 0 siblings, 0 replies; 61+ messages in thread From: Matt Gulick @ 2004-04-01 22:48 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-usb-devel, linux-scsi On Thu, 2004-04-01 at 16:08, Andrew Morton wrote: > Matt Gulick <gulickconsulting@direcway.com> wrote: > > > > Remember, SCSI is not hot plugable. > > oopsing the kernel is not a particularly friendly way of telling this > to > our users. > > If we have to bandaid over this, say, by leaking some memory and > emitting > some rude printk's then OK. Allowing the machine to kill itself, > potentially losing all the user's unsaved work is distinctly windows > 95ish, > and is not OK, agree? > - > To unsubscribe from this list: send the line "unsubscribe linux-scsi" > in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Very true. Oopsing the kernel is never good.(the BSOD is my favorite screen though, when a PC crashes and my iBook chugs along) The thing to do is to run through a clean-up proc when this happens. This is also why I use multiple ptrs in my code. The structures would then be torn down and released. An alert might also be generated to inform the user that the device was not properly removed which could result in data loss. Anything is better than an Oops. I would volunteer to take this on but I am between jobs at the moment and would not want to start something that I did not have time to properly complete. I will look at it and see if I have any suggestions for whoever takes this on. Matt ---------------------------------------- Matt Gulick Sr. Staff Engineer Adaptec, Inc. gulickconsulting@direcway.com matt_gulick@adaptec.com (715) 426-0884 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-01 21:52 ` Matt Gulick 2004-04-01 22:08 ` Andrew Morton @ 2004-04-01 22:40 ` James Bottomley 2004-04-01 22:53 ` Matt Gulick 1 sibling, 1 reply; 61+ messages in thread From: James Bottomley @ 2004-04-01 22:40 UTC (permalink / raw) To: gulickconsulting; +Cc: Andrew Morton, linux-usb-devel, SCSI Mailing List On Thu, 2004-04-01 at 16:52, Matt Gulick wrote: > Where you will get into trouble is that SCSI is not considered to be a > 'Dynamic' bus. Devices do not normally come and go, while USB and 1394 > devices are 'Hot Plugable'. It isn't? considering all the refcounting work that's gone into the SCSI stack to make it support hotplug transports, I don't think such a blanket statement is supportable. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-01 22:40 ` James Bottomley @ 2004-04-01 22:53 ` Matt Gulick 0 siblings, 0 replies; 61+ messages in thread From: Matt Gulick @ 2004-04-01 22:53 UTC (permalink / raw) To: James Bottomley; +Cc: Andrew Morton, linux-usb-devel, SCSI Mailing List On Thu, 2004-04-01 at 16:40, James Bottomley wrote: > On Thu, 2004-04-01 at 16:52, Matt Gulick wrote: > > Where you will get into trouble is that SCSI is not considered to be > a > > 'Dynamic' bus. Devices do not normally come and go, while USB and > 1394 > > devices are 'Hot Plugable'. > > It isn't? considering all the refcounting work that's gone into the > SCSI > stack to make it support hotplug transports, I don't think such a > blanket statement is supportable. > > James James, You may be right. I have not fully dissected the Linux SCSI stack. I am very intimately familiar with the Darwin SCSI stack as well as the Windose stacks (I had to fully architect USB 2.0 stack with support for SBP-2 under SE, ME and 2k, [I feel so dirty]). The Linux stack may be more forward looking. Matt ---------------------------------------- Matt Gulick Sr. Staff Engineer Adaptec, Inc. gulickconsulting@direcway.com matt_gulick@adaptec.com (715) 426-0884 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-01 21:15 bug 2400 Andrew Morton 2004-04-01 21:52 ` Matt Gulick @ 2004-04-01 23:07 ` Matthew Dharm 2004-04-01 23:32 ` James Bottomley 2 siblings, 0 replies; 61+ messages in thread From: Matthew Dharm @ 2004-04-01 23:07 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-usb-devel, linux-scsi [-- Attachment #1: Type: text/plain, Size: 729 bytes --] On Thu, Apr 01, 2004 at 01:15:02PM -0800, Andrew Morton wrote: > > Apparently there is some controversy over whether this: > http://bugme.osdl.org/show_bug.cgi?id=2400 is a usb-storage problem or a > SCSI problem. > > Can someone please shed some light, propose a way to get it fixed? There's controversy over this? It looks pretty clearly to be a SCSI or CD-ROM bug, not a usb-storage bug. The OOPS looks like it's all about object handling, and usb-storage does nothing directly with objects. Matt -- Matthew Dharm Home: mdharm-usb@one-eyed-alien.net Maintainer, Linux USB Mass Storage Driver Why am I talking to a toilet brush? -- CEO User Friendly, 4/30/1998 [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-01 21:15 bug 2400 Andrew Morton 2004-04-01 21:52 ` Matt Gulick 2004-04-01 23:07 ` Matthew Dharm @ 2004-04-01 23:32 ` James Bottomley 2004-04-02 0:29 ` Steven Dake ` (2 more replies) 2 siblings, 3 replies; 61+ messages in thread From: James Bottomley @ 2004-04-01 23:32 UTC (permalink / raw) To: Andrew Morton, greg, Jens Axboe; +Cc: linux-usb-devel, SCSI Mailing List On Thu, 2004-04-01 at 16:15, Andrew Morton wrote: > Apparently there is some controversy over whether this: > http://bugme.osdl.org/show_bug.cgi?id=2400 is a usb-storage problem or a > SCSI problem. Well, I know why this happens, but I'm not entirely clear how to fix it. The problem comes because the cdrom open and close take and release references to the SCSI generic device (as they're supposed to). However, Upper level Drivers like sr are implemented as generic device drivers in the driverfs model. When a USB unplug comes along, it calls scsi_remove_device, which eventually calls device_del(). The problem is that device_del triggers the ->remove methods of all the attached drivers and the sr_remove method calls cdrom_unregister which throws away the cdrom device state, even though the actual device has active references. Some time later, the device is closed but there's now bogus state because the sr_remove method has kfreed the struct scsi_cd which contains the struct cdrom_device_info. Now, the questions are, whose issue is this and how do we fix it? I can see that a driver needs early notification of unplugs so it can deny all access to a gone device. On the other hand, for a user land open where we still have to hold resources in the driver, we'd like the driver to have a notify when the device reference count drops to zero so we can clean up. This problem, by the way, exists in a lesser form for sd: the sd remove method will free the device for reattachment even though it might have active references. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-01 23:32 ` James Bottomley @ 2004-04-02 0:29 ` Steven Dake 2004-04-02 8:43 ` Mike Anderson 2004-04-02 23:36 ` James Bottomley 2 siblings, 0 replies; 61+ messages in thread From: Steven Dake @ 2004-04-02 0:29 UTC (permalink / raw) To: James Bottomley Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Thu, 2004-04-01 at 16:32, James Bottomley wrote: > On Thu, 2004-04-01 at 16:15, Andrew Morton wrote: > > Apparently there is some controversy over whether this: > > http://bugme.osdl.org/show_bug.cgi?id=2400 is a usb-storage problem or a > > SCSI problem. > > Well, I know why this happens, but I'm not entirely clear how to fix it. > > The problem comes because the cdrom open and close take and release > references to the SCSI generic device (as they're supposed to). > > However, Upper level Drivers like sr are implemented as generic device > drivers in the driverfs model. > > When a USB unplug comes along, it calls scsi_remove_device, which > eventually calls device_del(). The problem is that device_del triggers > the ->remove methods of all the attached drivers and the sr_remove > method calls cdrom_unregister which throws away the cdrom device state, > even though the actual device has active references. > > Some time later, the device is closed but there's now bogus state > because the sr_remove method has kfreed the struct scsi_cd which > contains the struct cdrom_device_info. > > Now, the questions are, whose issue is this and how do we fix it? I can > see that a driver needs early notification of unplugs so it can deny all > access to a gone device. On the other hand, for a user land open where > we still have to hold resources in the driver, we'd like the driver to > have a notify when the device reference count drops to zero so we can > clean up. > > This problem, by the way, exists in a lesser form for sd: the sd remove > method will free the device for reattachment even though it might have > active references. > One option to fix all this is to add a mechanism to blow away active references to any block devices. These references can be files, filesystems, raid devices, lvm devices, etc. This way, the remove can be called because no active references are left. The user gets a slew of errors in the user application that has the open file on the filesystem or block device, but atleast these can be managed (vs an oopsed kernel). Of course, adding forced unmount (required to blow away the references) may not be desireable for 2.6 since it is invasive.. Perhaps something to be considered for 2.7? Thanks -steve > James > > > - > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-01 23:32 ` James Bottomley 2004-04-02 0:29 ` Steven Dake @ 2004-04-02 8:43 ` Mike Anderson 2004-04-02 15:57 ` James Bottomley 2004-04-02 23:36 ` James Bottomley 2 siblings, 1 reply; 61+ messages in thread From: Mike Anderson @ 2004-04-02 8:43 UTC (permalink / raw) To: James Bottomley Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List, stern James Bottomley [James.Bottomley@steeleye.com] wrote: > > Well, I know why this happens, but I'm not entirely clear how to fix it. > > The problem comes because the cdrom open and close take and release > references to the SCSI generic device (as they're supposed to). > > However, Upper level Drivers like sr are implemented as generic device > drivers in the driverfs model. > > When a USB unplug comes along, it calls scsi_remove_device, which > eventually calls device_del(). The problem is that device_del triggers > the ->remove methods of all the attached drivers and the sr_remove > method calls cdrom_unregister which throws away the cdrom device state, > even though the actual device has active references. Yes, we reordered some of this in sd. As your comment down below indicates reordering will reduce the window but not eliminate it. > > Some time later, the device is closed but there's now bogus state > because the sr_remove method has kfreed the struct scsi_cd which > contains the struct cdrom_device_info. > > Now, the questions are, whose issue is this and how do we fix it? I can > see that a driver needs early notification of unplugs so it can deny all > access to a gone device. On the other hand, for a user land open where > we still have to hold resources in the driver, we'd like the driver to > have a notify when the device reference count drops to zero so we can > clean up. > > This problem, by the way, exists in a lesser form for sd: the sd remove > method will free the device for reattachment even though it might have > active references. > I have looked at the sd issue off and on due to the previous open race reported by Alan Stern. While the window can be narrowed inside SCSI you need help for the calling subsystem to know when it is OK to cleanup and your routine will not be called anymore. A similar problem also showed up in the tear down the host directory entry in /proc/scsi but was only fixed up so far due to its depreciated status. http://marc.theaimsgroup.com/?t=105545175900001&r=1&w=2 I believe as indicated above that all cross subsystem registrations need a release / put callback. This would allow the release chain to be called from block -> ULD -> scsi core -> LLDD. Recently I have not been spending the proper time looking at this, but last look it appeared that we needed to add a release / put method call to the gendisk disk_release routine. The release function or object to do the put on would need to be set prior to the call to add_disk. -andmike -- Michael Anderson andmike@us.ibm.com ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-02 8:43 ` Mike Anderson @ 2004-04-02 15:57 ` James Bottomley 2004-04-02 16:45 ` Mike Anderson 0 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2004-04-02 15:57 UTC (permalink / raw) To: Mike Anderson Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List, stern On Fri, 2004-04-02 at 03:43, Mike Anderson wrote: > I have looked at the sd issue off and on due to the previous open race > reported by Alan Stern. While the window can be narrowed inside SCSI you > need help for the calling subsystem to know when it is OK to cleanup and > your routine will not be called anymore. A similar problem also showed > up in the tear down the host directory entry in /proc/scsi but was only > fixed up so far due to its depreciated status. > > http://marc.theaimsgroup.com/?t=105545175900001&r=1&w=2 > > I believe as indicated above that all cross subsystem registrations need > a release / put callback. This would allow the release chain to be > called from block -> ULD -> scsi core -> LLDD. > > Recently I have not been spending the proper time looking at this, but > last look it appeared that we needed to add a release / put method call > to the gendisk disk_release routine. The release function or object to do > the put on would need to be set prior to the call to add_disk. Actually, I can't believe this problem to be local entirely to SCSI. So, a simpler mechanism (and more globally useful) might be to have a two phase driver release in sysfs: - the current ->remove would stay where it is (as a notify on device_del). On receving this the driver begins clean up enough to drop any internal references to the device it is holding. - then introduce a ->release which is called as part of dropping the last device reference where the driver cleans up any resources the driver was keeping to service the removed but not released device. Then, we'd obviously not call unregister_cdrom and kfree the scsi_cd structure until ->release time. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-02 15:57 ` James Bottomley @ 2004-04-02 16:45 ` Mike Anderson 2004-04-02 17:05 ` James Bottomley 0 siblings, 1 reply; 61+ messages in thread From: Mike Anderson @ 2004-04-02 16:45 UTC (permalink / raw) To: James Bottomley Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List, stern James Bottomley [James.Bottomley@steeleye.com] wrote: > > Recently I have not been spending the proper time looking at this, but > > last look it appeared that we needed to add a release / put method call > > to the gendisk disk_release routine. The release function or object to do > > the put on would need to be set prior to the call to add_disk. > > Actually, I can't believe this problem to be local entirely to SCSI. > So, a simpler mechanism (and more globally useful) might be to have a > two phase driver release in sysfs: I would agree this should be more general than SCSI, but if kobjects are being passed during the registration / setup in other subsystem interfaces than this could already be handled. > > - the current ->remove would stay where it is (as a notify on > device_del). On receving this the driver begins clean up enough to drop > any internal references to the device it is holding. > - then introduce a ->release which is called as part of dropping the > last device reference where the driver cleans up any resources the > driver was keeping to service the removed but not released device. > > Then, we'd obviously not call unregister_cdrom and kfree the scsi_cd > structure until ->release time. Maybe some clarification here as I am unsure if we both think there needs to be a notification (a put call) from outside SCSI. We have release functions available on most objects in SCSI now. The issue is that when we register (add_disk, dev_set_drvdata, etc.) or pass a handle to another subsystem we need a reference count agreement to know when the other subsystem is done with the the object. Something like the put_device(parent) used in scsi_host_dev_release. -andmike -- Michael Anderson andmike@us.ibm.com ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-02 16:45 ` Mike Anderson @ 2004-04-02 17:05 ` James Bottomley 2004-04-02 17:44 ` Mike Anderson 0 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2004-04-02 17:05 UTC (permalink / raw) To: Mike Anderson Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List, stern On Fri, 2004-04-02 at 11:45, Mike Anderson wrote: > Maybe some clarification here as I am unsure if we both think there > needs to be a notification (a put call) from outside SCSI. We have > release functions available on most objects in SCSI now. The issue is > that when we register (add_disk, dev_set_drvdata, etc.) or pass a handle > to another subsystem we need a reference count agreement to know when > the other subsystem is done with the the object. Something like the > put_device(parent) used in scsi_host_dev_release. Actually, no, that's not the issue here, if I understand you. The reference counting model on the sdev->sdev_gendev seems to be working correctly because sr.c takes a reference to the sdev_gendev on open and drops it on close. The problem is that ULDs are implemented as struct device_drivers and as such, their ->remove method gets called *not* on last put of sdev_gendev but on device_del (when there are still active references). sr.c frees the cdinfo structure on ->remove, but still has its own reference to sdev_gendev (because the device is still open). On final close, the generic cdrom code tries to use cdinfo to close the device and references a kfree'd structure. Really what sr.c wants to be doing is freeing the cdinfo structure on last put, not on device_del. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-02 17:05 ` James Bottomley @ 2004-04-02 17:44 ` Mike Anderson 2004-04-02 18:13 ` James Bottomley 0 siblings, 1 reply; 61+ messages in thread From: Mike Anderson @ 2004-04-02 17:44 UTC (permalink / raw) To: James Bottomley Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List, stern James Bottomley [James.Bottomley@SteelEye.com] wrote: > On Fri, 2004-04-02 at 11:45, Mike Anderson wrote: > > Maybe some clarification here as I am unsure if we both think there > > needs to be a notification (a put call) from outside SCSI. We have > > release functions available on most objects in SCSI now. The issue is > > that when we register (add_disk, dev_set_drvdata, etc.) or pass a handle > > to another subsystem we need a reference count agreement to know when > > the other subsystem is done with the the object. Something like the > > put_device(parent) used in scsi_host_dev_release. > > Actually, no, that's not the issue here, if I understand you. The > reference counting model on the sdev->sdev_gendev seems to be working > correctly because sr.c takes a reference to the sdev_gendev on open and > drops it on close. > > The problem is that ULDs are implemented as struct device_drivers and as > such, their ->remove method gets called *not* on last put of sdev_gendev > but on device_del (when there are still active references). The remove can do as much or as little as the implementor wishes, but I believe there is still a under lying issue here (see comment below). > > sr.c frees the cdinfo structure on ->remove, but still has its own > reference to sdev_gendev (because the device is still open). On final > close, the generic cdrom code tries to use cdinfo to close the device > and references a kfree'd structure. Really what sr.c wants to be doing > is freeing the cdinfo structure on last put, not on device_del. > Where does the last put come from? How do you close the open race or know when the final put_disk has been called? SCSI cannot do this alone as we have created and registered an object in another subsystem (alloc_disk and add_disk) and we have no indication when that objects ref count has reached zero. Or As I previous stated in the thread below I have the gendisk / block layer locking mis-understood and there is something that SCSI can do. For reference you can look at this thread sent by Alan about a sd race. http://marc.theaimsgroup.com/?t=107591185800003&r=1&w=2 While freeing in sr could be rearranged more like what sd does there is still the issue of a cross subsystem put to know that the ULDs open function will not be called again. -andmike -- Michael Anderson andmike@us.ibm.com ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-02 17:44 ` Mike Anderson @ 2004-04-02 18:13 ` James Bottomley 2004-04-02 23:40 ` Mike Anderson 0 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2004-04-02 18:13 UTC (permalink / raw) To: Mike Anderson Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List, stern On Fri, 2004-04-02 at 12:44, Mike Anderson wrote: > Where does the last put come from? How do you close the open race or > know when the final put_disk has been called? SCSI cannot do this alone > as we have created and registered an object in another subsystem > (alloc_disk and add_disk) and we have no indication when that objects > ref count has reached zero. > > Or > > As I previous stated in the thread below I have the gendisk / > block layer locking mis-understood and there is something that SCSI can > do. well, sr has elected to merge these, so it takes a reference to sdev_gendev on first open and releases it on last close of the block device. This is what ties the SCSI model into the final put_disk(). We founder on calling driver ->remove before the final put of sdev_gendev. Anything with objects in more than one refcounted subsystem is responsible for tying the refcounts together uniformly. There should be no open race as long as we error out correctly if a reference to the underlying sdev_gendev cannot be obtained (because the object is being destroyed). sr seems to do this correctly. The indication when the non-scsi object's refcount reaches zero is given to us because at that point the sr code does a put of the sdev_gendev (and if this is the last put, that should trigger cleanup). James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-02 18:13 ` James Bottomley @ 2004-04-02 23:40 ` Mike Anderson 2004-04-03 0:25 ` James Bottomley 0 siblings, 1 reply; 61+ messages in thread From: Mike Anderson @ 2004-04-02 23:40 UTC (permalink / raw) To: James Bottomley Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List, stern James Bottomley [James.Bottomley@SteelEye.com] wrote: > On Fri, 2004-04-02 at 12:44, Mike Anderson wrote: > > Where does the last put come from? How do you close the open race or > > know when the final put_disk has been called? SCSI cannot do this alone > > as we have created and registered an object in another subsystem > > (alloc_disk and add_disk) and we have no indication when that objects > > ref count has reached zero. > > > > Or > > > > As I previous stated in the thread below I have the gendisk / > > block layer locking mis-understood and there is something that SCSI can > > do. > > well, sr has elected to merge these, so it takes a reference to > sdev_gendev on first open and releases it on last close of the block > device. This is what ties the SCSI model into the final put_disk(). I think there is a little more to tie this together. see below. > > We founder on calling driver ->remove before the final put of > sdev_gendev. I do not believe we founder by calling, but what we do in the remove function. The way sd is split out seems to match what you think remove should do, but I do not have externally callable a release as it is handled inside sd. > > Anything with objects in more than one refcounted subsystem is > responsible for tying the refcounts together uniformly. > > There should be no open race as long as we error out correctly if a > reference to the underlying sdev_gendev cannot be obtained (because the > object is being destroyed). sr seems to do this correctly. The > indication when the non-scsi object's refcount reaches zero is given to > us because at that point the sr code does a put of the sdev_gendev (and > if this is the last put, that should trigger cleanup). Greg stopped by and after talking this over I think I see why sd is racing in its current form. The race happens when sd_remove and do_open race. Even though I do not like adding a lock_kernel it would appear adding on to sd_remove would serialize sd_remove and do_open. This would ensure either do_open's get_gendisk returns a gendisk struct and sd ref's are incremented or we will start cleaning up and sd_open will not be called. I would believe similar alignment in sr.c to what sd is doing plus agreement on the lock_kernel should fix both drivers. I think the "error out correctly" on trying to get a ref on sdev_gendev may need some higher serialization as I think there is a race on a release function starting and the reference count trying to be taken to 1 (i.e. you need something subsystem wide as you cannot look at the item you maybe deleting. -andmike -- Michael Anderson andmike@us.ibm.com ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-02 23:40 ` Mike Anderson @ 2004-04-03 0:25 ` James Bottomley 2004-04-04 1:40 ` Alan Stern 0 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2004-04-03 0:25 UTC (permalink / raw) To: Mike Anderson Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List, stern On Fri, 2004-04-02 at 18:40, Mike Anderson wrote: > Greg stopped by and after talking this over I think I see why sd is > racing in its current form. The race happens when sd_remove and do_open > race. Even though I do not like adding a lock_kernel it would appear > adding on to sd_remove would serialize sd_remove and do_open. This would > ensure either do_open's get_gendisk returns a gendisk struct and sd > ref's are incremented or we will start cleaning up and sd_open will not > be called. > > I would believe similar alignment in sr.c to what sd is doing plus > agreement on the lock_kernel should fix both drivers. > > I think the "error out correctly" on trying to get a ref on sdev_gendev > may need some higher serialization as I think there is a race on a release > function starting and the reference count trying to be taken to 1 (i.e. > you need something subsystem wide as you cannot look at the item you > maybe deleting. I'm not convinced we need any other serialisation. As long as we get the reference we may use the device (of course if the device is being removed and sd_remove is being called, then I/O to it will fail, but I think that's fine). Leaving the user with an open device that won't accept I/O is also fine (as long as it returns the correct error codes). Could you outline what the consequences of this race are? James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-03 0:25 ` James Bottomley @ 2004-04-04 1:40 ` Alan Stern 2004-04-04 15:23 ` James Bottomley 0 siblings, 1 reply; 61+ messages in thread From: Alan Stern @ 2004-04-04 1:40 UTC (permalink / raw) To: James Bottomley Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On 2 Apr 2004, James Bottomley wrote: > On Fri, 2004-04-02 at 18:40, Mike Anderson wrote: > > Greg stopped by and after talking this over I think I see why sd is > > racing in its current form. The race happens when sd_remove and do_open > > race. Even though I do not like adding a lock_kernel it would appear > > adding on to sd_remove would serialize sd_remove and do_open. This would > > ensure either do_open's get_gendisk returns a gendisk struct and sd > > ref's are incremented or we will start cleaning up and sd_open will not > > be called. > > > > I would believe similar alignment in sr.c to what sd is doing plus > > agreement on the lock_kernel should fix both drivers. > > > > I think the "error out correctly" on trying to get a ref on sdev_gendev > > may need some higher serialization as I think there is a race on a release > > function starting and the reference count trying to be taken to 1 (i.e. > > you need something subsystem wide as you cannot look at the item you > > maybe deleting. > > I'm not convinced we need any other serialisation. As long as we get > the reference we may use the device (of course if the device is being > removed and sd_remove is being called, then I/O to it will fail, but I > think that's fine). Leaving the user with an open device that won't > accept I/O is also fine (as long as it returns the correct error codes). > > Could you outline what the consequences of this race are? Without having looked recently in any detail at the specific code in sd.c or sr.c, I want to comment in general terms on the nature of the open/disconnect race. It's a generic problem that affects every driver whose devices can be opened through the filesystem. It _cannot_ be solved by adding any sort of lock to the device structure. This is because the device structure is not available to the open() routine until it has followed some pointer from the major/minor device table entry, and the disconnect() routine will erase the pointer. That (or someplace equivalent) is where the race occurs. The problem _can_ be solved by introducing a lock higher up, such as at the driver level or at the bus level. (A kernel lock would work too but it would be extravagantly excessive.) For example, the bus subsystem rwsem in the driver model prevents analogous problems there. But you don't want to get a read lock on a bus-wide semaphore every time your open() procedure runs! A driver-wide lock makes a good solution. Another possible solution would be to have disconnect() perform an RCU update to the device pointer. I haven't seen any code that does this, but I think it ought to work. Alan Stern ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-04 1:40 ` Alan Stern @ 2004-04-04 15:23 ` James Bottomley 2004-04-04 16:46 ` Alan Stern 2004-04-04 18:16 ` David Brownell 0 siblings, 2 replies; 61+ messages in thread From: James Bottomley @ 2004-04-04 15:23 UTC (permalink / raw) To: Alan Stern Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Sat, 2004-04-03 at 20:40, Alan Stern wrote: > Without having looked recently in any detail at the specific code in sd.c > or sr.c, I want to comment in general terms on the nature of the > open/disconnect race. > > It's a generic problem that affects every driver whose devices can be > opened through the filesystem. It _cannot_ be solved by adding any sort > of lock to the device structure. This is because the device structure is > not available to the open() routine until it has followed some pointer > from the major/minor device table entry, and the disconnect() routine will > erase the pointer. That (or someplace equivalent) is where the race > occurs. OK, your "problem" definition is that "there's a race", which I agree with, I just don't agree that it's a problem. Disconnections are fundamentally asynchronous events (a device may be disconnected by the user at any stage regardless of what any kernel internal state model is doing). Trying to impose synchronisation on asynchronous events is asking for trouble. In the open race scenario, either the open is refused or the user gets a fd that cannot do anything (because the device isn't there) and simply returns errors to all operations. Both cases are correct, so who wins the race is irrelevant. Let me illustrate: the user may disconnect the device then open it. If they open it before even the USB subsystem gets notified of the disconnection then all the elaborate synchronisation in the world isn't going to be able to prevent that (the device was gone when they opened it, just nothing in the kernel knew that). Since we cannot solve that race, there's no reason to try to solve the "some parts of the kernel know but others don't" part of the race. James > The problem _can_ be solved by introducing a lock higher up, such as at > the driver level or at the bus level. (A kernel lock would work too but > it would be extravagantly excessive.) For example, the bus subsystem > rwsem in the driver model prevents analogous problems there. But you > don't want to get a read lock on a bus-wide semaphore every time your > open() procedure runs! A driver-wide lock makes a good solution. > > Another possible solution would be to have disconnect() perform an RCU > update to the device pointer. I haven't seen any code that does this, but > I think it ought to work. > > Alan Stern > > - > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-04 15:23 ` James Bottomley @ 2004-04-04 16:46 ` Alan Stern 2004-04-04 17:04 ` James Bottomley 2004-04-04 18:16 ` David Brownell 1 sibling, 1 reply; 61+ messages in thread From: Alan Stern @ 2004-04-04 16:46 UTC (permalink / raw) To: James Bottomley Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On 4 Apr 2004, James Bottomley wrote: > OK, your "problem" definition is that "there's a race", which I agree > with, I just don't agree that it's a problem. > > Disconnections are fundamentally asynchronous events (a device may be > disconnected by the user at any stage regardless of what any kernel > internal state model is doing). Trying to impose synchronisation on > asynchronous events is asking for trouble. > > In the open race scenario, either the open is refused or the user gets a > fd that cannot do anything (because the device isn't there) and simply > returns errors to all operations. Both cases are correct, so who wins > the race is irrelevant. Ah, you have left out the third, bad alternative: open succeeds, user gets an fd that points to a deallocated device. More details below... > Let me illustrate: the user may disconnect the device then open it. If > they open it before even the USB subsystem gets notified of the > disconnection then all the elaborate synchronisation in the world isn't > going to be able to prevent that (the device was gone when they opened > it, just nothing in the kernel knew that). Since we cannot solve that > race, there's no reason to try to solve the "some parts of the kernel > know but others don't" part of the race. I agree with everything except your conclusion. :-) Just because some outcomes of the race lead to a benign result no matter what, that doesn't mean all outcomes will or that we can ignore the race. Let's consider a simple example, one that doesn't have all the complexities of the sr driver with its multiple driver layers. The usb-skeleton program in drivers/usb makes a good illustration; I added a semaphore to it some time ago to protect against exactly this sort of race. Without that semaphore, here's what can happen: Open process: Disconnect process: Get minor number from inode Lookup USB interface using minor number Get device pointer from the interface's private data and check it's not NULL Get device pointer from the interface's private data Set the private data to NULL Lock the device sem Unregister the minor number Terminate ongoing I/O operations Clear the device->present flag Unlock the device Since the open count is 0, deallocate the device structure Lock the device sem --> oops The idea is that at some stage the open process has got far enough along to believe the device exists, but not far enough to hold a reference to it. (That's inevitable, since you can't try to acquire a reference until you're sure the device does exist.) If the disconnect process deallocates the device at that time, there will be trouble. Alan Stern ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-04 16:46 ` Alan Stern @ 2004-04-04 17:04 ` James Bottomley 2004-04-05 3:17 ` Alan Stern 2004-04-05 13:30 ` [linux-usb-devel] " Oliver Neukum 0 siblings, 2 replies; 61+ messages in thread From: James Bottomley @ 2004-04-04 17:04 UTC (permalink / raw) To: Alan Stern Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Sun, 2004-04-04 at 12:46, Alan Stern wrote: > Ah, you have left out the third, bad alternative: open succeeds, user gets > an fd that points to a deallocated device. More details below... No, that would be a bug. I'm looking for evidence that such a bug exists in sd. > Let's consider a simple example, one that doesn't have all the > complexities of the sr driver with its multiple driver layers. The > usb-skeleton program in drivers/usb makes a good illustration; I added a > semaphore to it some time ago to protect against exactly this sort of > race. Without that semaphore, here's what can happen: > > Open process: Disconnect process: > > Get minor number from inode > Lookup USB interface using > minor number > Get device pointer from the > interface's private data > and check it's not NULL This is what's wrong. Here you should get a reference to the device pointer (that's what scsi_device_get() actually does for us). If the ref getting routine comes back with an error, you may not proceed. If it comes back with a device, you own a reference to it and may go on (even if the device is now gone, the structure will behave correctly). > Get device pointer from the > interface's private data > Set the private data to NULL > Lock the device sem > Unregister the minor number > Terminate ongoing I/O operations > Clear the device->present flag > Unlock the device > Since the open count is 0, > deallocate the device structure You're not obeying the object lifetime rules here. The device may be gone but the device structure must stay around (obviously in a special state) until the last reference is dropped. Only after that may you start nulling things out and killing it. > Lock the device sem --> oops This is because you're not using refcounted objects correctly. If anyone can find a similar bug in the SCSI ULD's, I'll fix it, that's why I asked for an example in sd (or sr with the current fixes) way back in this thread. James ------------------------------------------------------- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-04 17:04 ` James Bottomley @ 2004-04-05 3:17 ` Alan Stern 2004-04-05 14:59 ` Mike Anderson ` (2 more replies) 2004-04-05 13:30 ` [linux-usb-devel] " Oliver Neukum 1 sibling, 3 replies; 61+ messages in thread From: Alan Stern @ 2004-04-05 3:17 UTC (permalink / raw) To: James Bottomley Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On 4 Apr 2004, James Bottomley wrote: > On Sun, 2004-04-04 at 12:46, Alan Stern wrote: > > Ah, you have left out the third, bad alternative: open succeeds, user gets > > an fd that points to a deallocated device. More details below... > > No, that would be a bug. Of course it would! That's exactly what this thread is about: bugs caused by improper handling of open/disconnect races. > I'm looking for evidence that such a bug exists > in sd. Put up or shut up, eh? Okay... > This is what's wrong. Here you should get a reference to the device > pointer (that's what scsi_device_get() actually does for us). That's too vague; I can't tell what you're referring to. In sd.c there's the struct scsi_device, the struct scsi_disk, the struct gendisk, and their embedded struct devices and struct kobjects. Maybe you mean scsi_disk_get(), the one place in sd.c that calls scsi_device_get(). I would say that it acquires a reference to the device, not to the device pointer (whatever that might mean). But let's not dwell on this point. > If the > ref getting routine comes back with an error, you may not proceed. If > it comes back with a device, you own a reference to it and may go on > (even if the device is now gone, the structure will behave correctly). What if the ref getting routine causes an oops because the device has been deallocated? I know, you'll say that should never happen... See below. > You're not obeying the object lifetime rules here. The device may be > gone but the device structure must stay around (obviously in a special > state) until the last reference is dropped. Only after that may you > start nulling things out and killing it. In the example I gave, the device structure _did_ stay around until the last reference was dropped. The problem was that _another_ reference was in the middle of being acquired at the time. > If anyone can find a similar bug in the SCSI ULD's, I'll fix it, that's > why I asked for an example in sd (or sr with the current fixes) way back > in this thread. All right, let's look at sd.c. I'll show you that _it_ doesn't obey the object lifetime rules. In sd_open we see this code (lightly edited): static int sd_open(struct inode *inode, struct file *filp) { struct gendisk *disk = inode->i_bdev->bd_disk; struct scsi_disk *sdkp = scsi_disk(disk); struct scsi_device *sdev; int retval; retval = scsi_disk_get(sdkp); if (retval) return retval; sdev = sdkp->device; As it turns out, the block layer guarantees that when sd_open runs the bd_disk pointer will be valid. It does this by following the pattern I mentioned in an earlier message -- drivers/base/map.c uses a subsystem-wide semaphore, domain_sem, to properly synchronize lookups and deletes. Next, the scsi_disk inline function returns: container_of(disk->private_data, struct scsi_disk, driver); How do you know that the scsi_disk pointed to by disk->private_data still exists? So far as I can see, the gendisk doesn't take any references to it. Correct me if I'm wrong, but there doesn't seem to be anything preventing a disconnect event from arriving after the open() call has got a valid reference to the gendisk, and succeeding in deallocating the scsi_disk before this code executes. There's only one reference between the scsi_disk and the gendisk, and it goes the wrong way: the scsi_disk owns a reference to the gendisk. But let's suppose that works okay, so sdkp is a valid pointer. Then the code calls scsi_disk_get(), which in turn calls scsi_device_get() for sdkp->device. How do you know that this doesn't point to deallocated storage? The only reference to the scsi_device is taken (in a rather convoluted way) by the gendisk, and it is dropped during del_gendisk() -- not when the gendisk is released. Hence it is entirely possible for a disconnect event to have freed the scsi_device when this code executes. There's two potential oopses for you. I don't have a full grasp of the web of interlocking references (and interlocking code) in the SCSI, gendisk, and block layers, but it seems likely that at least one of these might actually happen. The object lifetime rules require that in your disconnect() routine, you must tell all your users that your structure is going away, but you must not free the structure until your users have notified you that they won't try to use it any more. When the scsi_disk is on its way out, sd.c tells the gendisk but doesn't wait for a notification in return. When the scsi_device is on its way out the SCSI core tells sd.c, but sd.c doesn't send back its notification at the right time. Alan Stern ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-05 3:17 ` Alan Stern @ 2004-04-05 14:59 ` Mike Anderson 2004-04-05 21:27 ` James Bottomley 2004-04-05 22:10 ` Patrick Mansfield 2 siblings, 0 replies; 61+ messages in thread From: Mike Anderson @ 2004-04-05 14:59 UTC (permalink / raw) To: Alan Stern Cc: James Bottomley, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List PS, I am traveling today so future comments will be delayed a bit. Alan Stern [stern@rowland.harvard.edu] wrote: > All right, let's look at sd.c. I'll show you that _it_ doesn't obey the > object lifetime rules. In sd_open we see this code (lightly edited): > > > static int sd_open(struct inode *inode, struct file *filp) > { > struct gendisk *disk = inode->i_bdev->bd_disk; > struct scsi_disk *sdkp = scsi_disk(disk); > struct scsi_device *sdev; > int retval; > > retval = scsi_disk_get(sdkp); > if (retval) > return retval; > > sdev = sdkp->device; > > > As it turns out, the block layer guarantees that when sd_open runs the > bd_disk pointer will be valid. It does this by following the pattern I > mentioned in an earlier message -- drivers/base/map.c uses a > subsystem-wide semaphore, domain_sem, to properly synchronize lookups and > deletes. > > Next, the scsi_disk inline function returns: > > container_of(disk->private_data, struct scsi_disk, driver); > > How do you know that the scsi_disk pointed to by disk->private_data still > exists? So far as I can see, the gendisk doesn't take any references to > it. Correct me if I'm wrong, but there doesn't seem to be anything > preventing a disconnect event from arriving after the open() call has got > a valid reference to the gendisk, and succeeding in deallocating the > scsi_disk before this code executes. There's only one reference between > the scsi_disk and the gendisk, and it goes the wrong way: the scsi_disk > owns a reference to the gendisk. > > But let's suppose that works okay, so sdkp is a valid pointer. Then > the code calls scsi_disk_get(), which in turn calls scsi_device_get() for > sdkp->device. How do you know that this doesn't point to deallocated > storage? The only reference to the scsi_device is taken (in a rather > convoluted way) by the gendisk, and it is dropped during del_gendisk() -- > not when the gendisk is released. Hence it is entirely possible for a > disconnect event to have freed the scsi_device when this code executes. > > > There's two potential oopses for you. I don't have a full grasp of the > web of interlocking references (and interlocking code) in the SCSI, > gendisk, and block layers, but it seems likely that at least one of > these might actually happen. > > The object lifetime rules require that in your disconnect() routine, you > must tell all your users that your structure is going away, but you must > not free the structure until your users have notified you that they won't > try to use it any more. When the scsi_disk is on its way out, sd.c tells > the gendisk but doesn't wait for a notification in return. When the > scsi_device is on its way out the SCSI core tells sd.c, but sd.c doesn't > send back its notification at the right time. As I previously stated there is no notification back from the block layer that users are complete with a structure. Currently the only method to prevent an oops looks like sd_remove would need to use lock_kernel so that scsi could ensure that a user would be through open which means the sd_remove would not take references to zero or the user has not made it far enough through do open that they have received a gendisk structure so that del_gendisk will ensure they do not call sd_open. I would like another method, but this looks like it would need to be another shared sync mechanism between scsi layer (an other blk interfaces) and block layer or a lookup method similar to kobj_lookup in scsi open routines so that a object can be unmapped atomically. -andmike -- Michael Anderson andmike@us.ibm.com ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-05 3:17 ` Alan Stern 2004-04-05 14:59 ` Mike Anderson @ 2004-04-05 21:27 ` James Bottomley 2004-04-06 14:00 ` Alan Stern 2004-04-05 22:10 ` Patrick Mansfield 2 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2004-04-05 21:27 UTC (permalink / raw) To: Alan Stern Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Sun, 2004-04-04 at 22:17, Alan Stern wrote: > On 4 Apr 2004, James Bottomley wrote: > > On Sun, 2004-04-04 at 12:46, Alan Stern wrote: > > > Ah, you have left out the third, bad alternative: open succeeds, user gets > > > an fd that points to a deallocated device. More details below... > > > > No, that would be a bug. > > Of course it would! That's exactly what this thread is about: bugs caused > by improper handling of open/disconnect races. So you're quietly shifting your ground away from this assertion: On Sat, 2004-04-03 at 19:40, Alan Stern wrote: > The problem _can_ be solved by introducing a lock higher up, such as at > the driver level or at the bus level. (A kernel lock would work too but > it would be extravagantly excessive.) For example, the bus subsystem > rwsem in the driver model prevents analogous problems there. But you > don't want to get a read lock on a bus-wide semaphore every time your > open() procedure runs! A driver-wide lock makes a good solution. > > Another possible solution would be to have disconnect() perform an RCU > update to the device pointer. I haven't seen any code that does this, but > I think it ought to work. ? My contention is that the races can be solved by proper refcounting (without the need for locks and RCUs) not that we don't have any bugs in sd.c (I'll be happy if I can pull them all out of sr at the moment). James ------------------------------------------------------- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-05 21:27 ` James Bottomley @ 2004-04-06 14:00 ` Alan Stern 0 siblings, 0 replies; 61+ messages in thread From: Alan Stern @ 2004-04-06 14:00 UTC (permalink / raw) To: James Bottomley Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On 5 Apr 2004, James Bottomley wrote: > On Sun, 2004-04-04 at 22:17, Alan Stern wrote: > > Of course it would! That's exactly what this thread is about: bugs caused > > by improper handling of open/disconnect races. > > So you're quietly shifting your ground away from this assertion: > > On Sat, 2004-04-03 at 19:40, Alan Stern wrote: > > The problem _can_ be solved by introducing a lock higher up, such as at > > the driver level or at the bus level. (A kernel lock would work too but > > it would be extravagantly excessive.) For example, the bus subsystem > > rwsem in the driver model prevents analogous problems there. But you > > don't want to get a read lock on a bus-wide semaphore every time your > > open() procedure runs! A driver-wide lock makes a good solution. > > > > Another possible solution would be to have disconnect() perform an RCU > > update to the device pointer. I haven't seen any code that does this, but > > I think it ought to work. > > ? Not at all. Open/disconnect races can appear at several points in a driver. The earlier assertion referred to one of those points: the place where the code gets a pointer to a device structure from an open inode's private data. > My contention is that the races can be solved by proper refcounting > (without the need for locks and RCUs) not that we don't have any bugs in > sd.c (I'll be happy if I can pull them all out of sr at the moment). Your contention is correct, I agree. However, not all parts of the kernel implement proper refcounting and, equally important, proper delete-then-release notifications. In particular, the VFS layer doesn't. If a driver notifies VFS that a device has been disconnected so that the private data in the corresponding /dev inode entry can be removed, the driver does _not_ receive a release notification in return when the last process actively using that inode has finished with it. Given the lack of proper adherence to the object-lifetime rules in VFS, a driver has no choice but to adopt a scheme similar to what I outlined in the section you quoted above. _That's_ the manifestation of the open/disconnect race I was referring to. (It's different from the problems in sd.c, incidentally.) Alan Stern ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-05 3:17 ` Alan Stern 2004-04-05 14:59 ` Mike Anderson 2004-04-05 21:27 ` James Bottomley @ 2004-04-05 22:10 ` Patrick Mansfield 2004-04-06 14:10 ` Alan Stern 2004-04-08 14:09 ` Alan Stern 2 siblings, 2 replies; 61+ messages in thread From: Patrick Mansfield @ 2004-04-05 22:10 UTC (permalink / raw) To: Alan Stern Cc: James Bottomley, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Sun, Apr 04, 2004 at 11:17:55PM -0400, Alan Stern wrote: > As it turns out, the block layer guarantees that when sd_open runs the > bd_disk pointer will be valid. It does this by following the pattern I > mentioned in an earlier message -- drivers/base/map.c uses a > subsystem-wide semaphore, domain_sem, to properly synchronize lookups and > deletes. > > Next, the scsi_disk inline function returns: > > container_of(disk->private_data, struct scsi_disk, driver); > > How do you know that the scsi_disk pointed to by disk->private_data still > exists? So far as I can see, the gendisk doesn't take any references to > it. Correct me if I'm wrong, but there doesn't seem to be anything > preventing a disconnect event from arriving after the open() call has got > a valid reference to the gendisk, and succeeding in deallocating the > scsi_disk before this code executes. There's only one reference between > the scsi_disk and the gendisk, and it goes the wrong way: the scsi_disk > owns a reference to the gendisk. > > But let's suppose that works okay, so sdkp is a valid pointer. Then > the code calls scsi_disk_get(), which in turn calls scsi_device_get() for > sdkp->device. How do you know that this doesn't point to deallocated > storage? The only reference to the scsi_device is taken (in a rather > convoluted way) by the gendisk, and it is dropped during del_gendisk() -- > not when the gendisk is released. Hence it is entirely possible for a > disconnect event to have freed the scsi_device when this code executes. > > > There's two potential oopses for you. I don't have a full grasp of the > web of interlocking references (and interlocking code) in the SCSI, > gendisk, and block layers, but it seems likely that at least one of > these might actually happen. Yes. If we are in middle opening an sd, and are anywhere between the kobj_lookup having completed call to up_read(domain->sem) [called via do_open calling get_gendisk] and sd_open calling scsi_device_get(), and separately an sd_remove call completes [freeing the scsi_disk and the scsi_device] there could be an oops. Much as I hate it, simply adding a lock_kernel in sd_remove would solve the problem, since do_open uses lock_kernel. And I don't see why the kobject_get/put use an atomic_inc/dec, it just obscures the need for a lock. We already have a reference to the kobject (it's passed as an argument). The calls to the get/put must either be protected with a lock or other method, or we have already gotten a reference to the object and getting another is meaningless. -- Patrick Mansfield ------------------------------------------------------- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-05 22:10 ` Patrick Mansfield @ 2004-04-06 14:10 ` Alan Stern 2004-04-08 14:09 ` Alan Stern 1 sibling, 0 replies; 61+ messages in thread From: Alan Stern @ 2004-04-06 14:10 UTC (permalink / raw) To: Patrick Mansfield Cc: James Bottomley, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Mon, 5 Apr 2004, Patrick Mansfield wrote: > Yes. > > If we are in middle opening an sd, and are anywhere between the > kobj_lookup having completed call to up_read(domain->sem) [called via > do_open calling get_gendisk] and sd_open calling scsi_device_get(), and > separately an sd_remove call completes [freeing the scsi_disk and the > scsi_device] there could be an oops. > > Much as I hate it, simply adding a lock_kernel in sd_remove would solve > the problem, since do_open uses lock_kernel. There must be a better solution, although I can't recommend anything offhand. It might require changing the genhd interface somewhat. > And I don't see why the kobject_get/put use an atomic_inc/dec, it just > obscures the need for a lock. We already have a reference to the kobject > (it's passed as an argument). The calls to the get/put must either be > protected with a lock or other method, or we have already gotten a > reference to the object and getting another is meaningless. This is really a matter of definitions. The kobject/driver-model people use "reference" to mean, very specifically, an incremented refcounter. They do not (so far as I can tell) intend the more general meaning, i.e., any sort of pointer to the object. So yes, the call to kobject_get passes as an argument a "reference" in the general sense, and the call returns a "reference" in the more specific sense. The point is, the refcounting model is meant only to guarantee that an object will not be deallocated when it has any outstanding "references" in the specific sense -- i.e., when its refcount is positive. It does _not_ guarantee that there are no pointers to the object lurking somewhere else in the kernel; that job is up to the driver's author. With careful programming it can be done. The code in sd.c is not sufficiently careful. Alan Stern ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-05 22:10 ` Patrick Mansfield 2004-04-06 14:10 ` Alan Stern @ 2004-04-08 14:09 ` Alan Stern 2004-04-08 16:24 ` Matt Gulick 1 sibling, 1 reply; 61+ messages in thread From: Alan Stern @ 2004-04-08 14:09 UTC (permalink / raw) To: Patrick Mansfield Cc: James Bottomley, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Mon, 5 Apr 2004, Patrick Mansfield wrote: > Much as I hate it, simply adding a lock_kernel in sd_remove would solve > the problem, since do_open uses lock_kernel. On Fri, 2 Apr 2004, Mike Anderson wrote: > I believe as indicated above that all cross subsystem registrations need > a release / put callback. This would allow the release chain to be > called from block -> ULD -> scsi core -> LLDD. > > Recently I have not been spending the proper time looking at this, but > last look it appeared that we needed to add a release / put method call > to the gendisk disk_release routine. The release function or object to do > the put on would need to be set prior to the call to add_disk. On Mon, 5 Apr 2004, Mike Anderson wrote: > As I previously stated there is no notification back from the block > layer that users are complete with a structure. Currently the only > method to prevent an oops looks like sd_remove would need to use > lock_kernel so that scsi could ensure that a user would be through open > which means the sd_remove would not take references to zero or the user > has not made it far enough through do open that they have received a > gendisk structure so that del_gendisk will ensure they do not call > sd_open. > > I would like another method, but this looks like it would need to be > another shared sync mechanism between scsi layer (an other blk > interfaces) and block layer or a lookup method similar to kobj_lookup > in scsi open routines so that a object can be unmapped atomically. Mike's first comment was exactly right. If a pointer to a kobject (or kref) were added to the genhd structure, and if the genhd release routine would do a put() on this pointer, then there would be no problem. The pointer could be set to the scsi_disk's embedded kobject, and then the scsi_disk structure would be guaranteed to exist when it is dereferenced in sd_open(). Likewise, if the scsi_disk took a reference to its scsi_device and dropped that reference during its release routine, that would cure the other potential oops I identified earlier. In neither case would a kernel lock be needed. This discussion has stimulated some ideas about the driver model in general. It turns out that many drivers take a struct device from one subsystem and present it (in different form) to another subsystem. Some examples: A SCSI host adapter driver takes a PCI device (or in the case of usb-storage, a USB device interface) and presents it to the SCSI core as a host adapter. The sd driver takes a SCSI device and presents it to the genhd layer as a disk drive. The USB UHCI driver takes a PCI device and presents it to the USB core as a USB bus. In each case there are two driver-model structures involved: the parent which belongs to the upper subsystem, and the child which belongs to the lower subsystem. No driver-model entity belongs to the intermediate driver! As a result, when a disconnect occurs the driver has no way to know when the lower subsystem is finished using the device: The SCSI core does not notify the LLDD in any way following scsi_remove_host(). The genhd layer doesn't inform sd.c when then genhd structure is released. usbcore _does_ notify the HC drivers via a callback when a bus is released; it's an exception. There is implicitly _a_ notification, because when the child device is released it drops its reference to the parent (I still think this would be better if it were done differently, but never mind). However that doesn't do the intermediate driver any good, because the driver doesn't own the parent! The parent is owned by the higher subsystem. A general solution is to have the lower subsystem take a pointer to a kobject (or even a kref) belonging to the intermediate driver, and have it drop its reference when the child device is released. This approach is just as flexible as the callback made by usbcore. It could even be formalized as part of the driver model, although I'm not at all certain this would be a good idea. The concept is simple enough: A struct device (or maybe a struct kobject) would have as an additional member a pointer to a struct kobject (or maybe to a struct kref). Call this additional member "master". During the device release (or kobject cleanup) the core would do a put on the master pointer if it is set. It's not clear that a sufficiently high percentage of all kobjects follow this intermediate-driver pattern for it to be worthwhile adding the master pointer. But individual systems can still implement it on their own, and they should. Alan Stern ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-08 14:09 ` Alan Stern @ 2004-04-08 16:24 ` Matt Gulick 2004-04-08 18:33 ` Alan Stern 0 siblings, 1 reply; 61+ messages in thread From: Matt Gulick @ 2004-04-08 16:24 UTC (permalink / raw) To: Alan Stern Cc: Patrick Mansfield, James Bottomley, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List > There is implicitly _a_ notification, because when the child device is > released it drops its reference to the parent (I still think this would be > better if it were done differently, but never mind). However that doesn't > do the intermediate driver any good, because the driver doesn't own the > parent! The parent is owned by the higher subsystem. > > A general solution is to have the lower subsystem take a pointer to a > kobject (or even a kref) belonging to the intermediate driver, and have it > drop its reference when the child device is released. This approach is > just as flexible as the callback made by usbcore. > > It could even be formalized as part of the driver model, although I'm not > at all certain this would be a good idea. The concept is simple enough: A > struct device (or maybe a struct kobject) would have as an additional > member a pointer to a struct kobject (or maybe to a struct kref). Call > this additional member "master". During the device release (or kobject > cleanup) the core would do a put on the master pointer if it is set. > > It's not clear that a sufficiently high percentage of all kobjects follow > this intermediate-driver pattern for it to be worthwhile adding the master > pointer. But individual systems can still implement it on their own, and > they should. OK, Silly question or maybe not. When writing drivers for MacOS ( 7-9 & X) and Windose (98 - XP) and when I architected the USB 2.0 stack at Adaptec for 98SE, ME & 2k, we solved this issue with a simple heart beat task. Every so often (1-3 seconds) any device that was at risk of removal would receive a TEST UNIT READY cdb. Using the model of 1394, USB, ... being treated as a device with no media inserted (like a CD drive is treated), then you can query the device for media availability. Using the USB model of 7 tiers of devices and most hubs having 4 ports (7 port hubs are just two 4 port hubs internally connected) you can have way more than 15 SCSI ID's. By treating each USB as having its own ID (EHCI USB chips typically have three USB identities of 1 EHCI and 2 OHCI interfaces) and the devices on that bus that are mass storage class devices using SBP-2 or SBP-3 would be a LUN on that device. By treating each bus as a virtual device, the main struct can be static with LUN children added or removed as needed. Any thoughts on this? Matt ---------------------------------------- Matt Gulick Sr. Staff Engineer Adaptec, Inc. gulickconsulting@direcway.com matt_gulick@adaptec.com (715) 426-0884 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-08 16:24 ` Matt Gulick @ 2004-04-08 18:33 ` Alan Stern 2004-04-08 19:44 ` Matt Gulick 0 siblings, 1 reply; 61+ messages in thread From: Alan Stern @ 2004-04-08 18:33 UTC (permalink / raw) To: Matt Gulick Cc: Patrick Mansfield, James Bottomley, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Thu, 8 Apr 2004, Matt Gulick wrote: > OK, Silly question or maybe not. > > When writing drivers for MacOS ( 7-9 & X) and Windose (98 - XP) and when > I architected the USB 2.0 stack at Adaptec for 98SE, ME & 2k, we solved > this issue with a simple heart beat task. > > Every so often (1-3 seconds) any device that was at risk of removal > would receive a TEST UNIT READY cdb. > > Using the model of 1394, USB, ... being treated as a device with no > media inserted (like a CD drive is treated), then you can query the > device for media availability. > > Using the USB model of 7 tiers of devices and most hubs having 4 ports > (7 port hubs are just two 4 port hubs internally connected) you can have > way more than 15 SCSI ID's. By treating each USB as having its own ID > (EHCI USB chips typically have three USB identities of 1 EHCI and 2 OHCI > interfaces) and the devices on that bus that are mass storage class > devices using SBP-2 or SBP-3 would be a LUN on that device. > > By treating each bus as a virtual device, the main struct can be static > with LUN children added or removed as needed. > > Any thoughts on this? > > Matt I think you're talking about a different problem. Sending heartbeats solves the problem of detecting media availability and device availability. It doesn't solve the problem we're discussing here, which is how to tear down the device driver stack without causing any errors, particularly if the user tries to access the device while the stack is being deconstructed. Alan Stern ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-08 18:33 ` Alan Stern @ 2004-04-08 19:44 ` Matt Gulick 0 siblings, 0 replies; 61+ messages in thread From: Matt Gulick @ 2004-04-08 19:44 UTC (permalink / raw) To: Alan Stern Cc: Patrick Mansfield, James Bottomley, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Thu, 2004-04-08 at 13:33, Alan Stern wrote: > On Thu, 8 Apr 2004, Matt Gulick wrote: > > > OK, Silly question or maybe not. > > > > When writing drivers for MacOS ( 7-9 & X) and Windose (98 - XP) and > when > > I architected the USB 2.0 stack at Adaptec for 98SE, ME & 2k, we > solved > > this issue with a simple heart beat task. > > > > Every so often (1-3 seconds) any device that was at risk of removal > > would receive a TEST UNIT READY cdb. > > > > Using the model of 1394, USB, ... being treated as a device with no > > media inserted (like a CD drive is treated), then you can query the > > device for media availability. > > > > Using the USB model of 7 tiers of devices and most hubs having 4 > ports > > (7 port hubs are just two 4 port hubs internally connected) you can > have > > way more than 15 SCSI ID's. By treating each USB as having its own > ID > > (EHCI USB chips typically have three USB identities of 1 EHCI and 2 > OHCI > > interfaces) and the devices on that bus that are mass storage class > > devices using SBP-2 or SBP-3 would be a LUN on that device. > > > > By treating each bus as a virtual device, the main struct can be > static > > with LUN children added or removed as needed. > > > > Any thoughts on this? > > > > Matt > > I think you're talking about a different problem. Sending heartbeats > solves the problem of detecting media availability and device > availability. It doesn't solve the problem we're discussing here, > which > is how to tear down the device driver stack without causing any > errors, > particularly if the user tries to access the device while the stack is > being deconstructed. > > Alan Stern True. This only mitigates the need for the SCSI subsystem from having to release device structures that might be used for logical SCSI Bus housekeeping. I will have to dig into the Linux architecture model for SCSI to put this in Linux form vs what is done elsewhere. I'll be back. ;-) Matt ---------------------------------------- Matt Gulick Sr. Staff Engineer Adaptec, Inc. gulickconsulting@direcway.com matt_gulick@adaptec.com (715) 426-0884 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [linux-usb-devel] Re: bug 2400 2004-04-04 17:04 ` James Bottomley 2004-04-05 3:17 ` Alan Stern @ 2004-04-05 13:30 ` Oliver Neukum 1 sibling, 0 replies; 61+ messages in thread From: Oliver Neukum @ 2004-04-05 13:30 UTC (permalink / raw) To: James Bottomley, Alan Stern Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List > > Open process: Disconnect process: > > > > Get minor number from inode > > Lookup USB interface using > > minor number > > Get device pointer from the > > interface's private data > > and check it's not NULL > > This is what's wrong. Here you should get a reference to the device > pointer (that's what scsi_device_get() actually does for us). If the > ref getting routine comes back with an error, you may not proceed. If > it comes back with a device, you own a reference to it and may go on > (even if the device is now gone, the structure will behave correctly). You are correct, if and only if the code doing the lookup and getting the reference is atomic with respect to freeing the data structure. Which lock is supposed to assure that? Regards Oliver ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [linux-usb-devel] Re: bug 2400 2004-04-04 15:23 ` James Bottomley 2004-04-04 16:46 ` Alan Stern @ 2004-04-04 18:16 ` David Brownell 2004-04-04 18:42 ` James Bottomley 1 sibling, 1 reply; 61+ messages in thread From: David Brownell @ 2004-04-04 18:16 UTC (permalink / raw) To: James Bottomley Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List James Bottomley wrote: > Let me illustrate: the user may disconnect the device then open it. If > they open it before even the USB subsystem gets notified of the > disconnection then all the elaborate synchronisation in the world isn't > going to be able to prevent that (the device was gone when they opened > it, just nothing in the kernel knew that). Since we cannot solve that > race, there's no reason to try to solve the "some parts of the kernel > know but others don't" part of the race. You're assuming that synchronization is there to establish a single global notion of state. Clearly that's impossible; also undesirable. The synchronization is actually there to let the "device gone" state spread cleanly through the software stack. By the time USB disconnect() is called, host controller drivers (and khubd) have normally cleaned up all hardware state, and usbcore is never going to accept another operation on that device. The disconnect() callback is there to prevent that raciness from making trouble ... closing windows from the bottom up. The way usb-storage passes that up to the SCSI layer is by calling scsi_remove_host(). Bug 2400 shows up later, through the block layer (or is it just cdrom?) code. Did someone actually post the specific source code line in cdrom_release() that's oopsing? - Dave ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [linux-usb-devel] Re: bug 2400 2004-04-04 18:16 ` David Brownell @ 2004-04-04 18:42 ` James Bottomley 2004-04-05 3:54 ` David Brownell 0 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2004-04-04 18:42 UTC (permalink / raw) To: David Brownell Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Sun, 2004-04-04 at 14:16, David Brownell wrote: > You're assuming that synchronization is there to establish a single > global notion of state. Clearly that's impossible; also undesirable. Thank you. > The synchronization is actually there to let the "device gone" state > spread cleanly through the software stack. By the time USB disconnect() > is called, host controller drivers (and khubd) have normally cleaned up > all hardware state, and usbcore is never going to accept another operation > on that device. The disconnect() callback is there to prevent that > raciness from making trouble ... closing windows from the bottom up. So you dispute this assertion in the email you quoted above: On Sun, 2004-04-04 at 14:16, David Brownell wrote: > James Bottomley wrote: > > Since we cannot solve that > > race, there's no reason to try to solve the "some parts of the kernel > > know but others don't" part of the race. On what basis? This, I think, is the core of the differences between us. I don't see why an asynchronous event should proceed up the stack in an orderly synchronised manner. It goes like this: - Initially, only the device knows, so commands outstanding time out - Then, the USB driver knows, so it errors incoming commands (and presumably returns with error any outstanding untimed out ones) - Then, SCSI knows, so we forbid user I/O The point is, that any I/O after disconnection gets an error ... the error just comes from different places as the knowledge propagates upwards. > The way usb-storage passes that up to the SCSI layer is by calling > scsi_remove_host(). Bug 2400 shows up later, through the block > layer (or is it just cdrom?) code. Did someone actually post the > specific source code line in cdrom_release() that's oopsing? Well, someone posted a patch a lot earlier in the thread ... we're on to general hotplug principles now. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [linux-usb-devel] Re: bug 2400 2004-04-04 18:42 ` James Bottomley @ 2004-04-05 3:54 ` David Brownell 2004-04-05 21:44 ` James Bottomley 0 siblings, 1 reply; 61+ messages in thread From: David Brownell @ 2004-04-05 3:54 UTC (permalink / raw) To: James Bottomley Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List James Bottomley wrote: > So you dispute this assertion in the email you quoted above: > >>>Since we cannot solve that >>>race, there's no reason to try to solve the "some parts of the kernel >>>know but others don't" part of the race. > > > On what basis? This, I think, is the core of the differences between On the basis of faulty assumptions embedded therein. Notably "cannot solve that race" (I elaborate below on exactly how that information _is_ passed along from USB, which is specifically what you said "cannot" be solved) ... so also the conclusion that there's "no reason to try" fixing things. > us. I don't see why an asynchronous event should proceed up the stack > in an orderly synchronised manner. Some such events don't; but this isn't one of them. And the reason it _does_ is to make sure drivers can always prevent oopsing on device unplug ... and to help eliminate HCD-specific behaviors from the equation. (There were a lot of those in 2.4, which made robust device unplugging rather hard to achieve.) > It goes like this: > > - Initially, only the device knows, so commands outstanding time out Well, hardware timeouts or other protocol faults will happen when the device doesn't respond to I/O from the host. Those will be reported in the usual way. When they happen in other contexts, such faults may be recoverable ... there's no way yet to know that these particular ones won't be. That ends when khubd gets notified, by the hub, that the device is gone. At which point (a) the USB device gets marked as gone, so that usbcore will reject further requests with -ESHUTDOWN, (b) all pending I/Os are canceled with -ESHUTDOWN status, and except for UHCI, (c) HCDs force those the URBs back to drivers, as part of cleaning up the remaining hardware state. At that point, usbcore and the HCD are all but done with the device, and the only question is when the memory associated with the usb_interface and usb_device objects gets its last reference dropped, so the memory gets freed. The current UHCI answer may surprise some drivers, even though the 2.4 stack could do the same thing. The time from (a) to (c) will usually be tens of milliseconds, at most; average time before khubd notices will be around 1/8 second. Then (d) for each driver bound to an interface on that device, the USB disconnect() method is called. Driver responsibility is to drop all those interface and device references ASAP. Plus, in the typical case where drivers are using implicit refcounts to claim interface/device handles, never issuing requests using them after the disconnect() returns, since that releases the implicit reference acquired during probe(). > - Then, the USB driver knows, so it errors incoming commands (and > presumably returns with error any outstanding untimed out ones) I don't think any single point (a)-(d) matches that description. As soon as it gets -ESHUTDOWN it "knows", even before (d), for example ... though it's more natural to wait till (d) before starting to clean up the device state. > - Then, SCSI knows, so we forbid user I/O SCSI knows by the fact that the host was unregistered in (d), rarely more than ~300 msec after physical disconnect. The problem here was that the "forbid" is borked. Maybe not in SCSI (there are other layers involved), but oopsably. > The point is, that any I/O after disconnection gets an error ... the > error just comes from different places as the knowledge propagates > upwards. Which directly follows from what I said ... USB propagates that knowledge in carefully defined ways. Other layers can do the same, although clearly state associated with open file descriptors needs to use a slightly different strategy. And that strategy is what Alan's original comment was about. - Dave ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Re: bug 2400 2004-04-05 3:54 ` David Brownell @ 2004-04-05 21:44 ` James Bottomley 2004-04-05 23:23 ` [linux-usb-devel] " David Brownell 0 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2004-04-05 21:44 UTC (permalink / raw) To: David Brownell Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Sun, 2004-04-04 at 22:54, David Brownell wrote: > Which directly follows from what I said ... USB propagates > that knowledge in carefully defined ways. Other layers can > do the same, although clearly state associated with open file > descriptors needs to use a slightly different strategy. And > that strategy is what Alan's original comment was about. Well, again, this is at the core of the argument. I say that as long as we have all our objects correctly refcounted, how disconnections propagate up the stack is irrelevant. If you have to have a "carefully defined" order for the propagation of an asynchronous event to save you from oopsing, it's a sure sign of bugs in the code. My reasoning is that I/O down the stack will ultimately hit the part of the kernel (or even just timeout in transmission to the device if it's disappeared without trace) that contains the knowledge and be returned with an error. Ordering the disconnection propagation cannot change this fact, merely alter *which* component returns the error. As long as we're robust to the error there is no problem. At this point, the object lifetime does nothing more that count how long we have to keep the object around to return the error. James ------------------------------------------------------- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [linux-usb-devel] Re: bug 2400 2004-04-05 21:44 ` James Bottomley @ 2004-04-05 23:23 ` David Brownell 2004-04-06 1:19 ` James Bottomley 0 siblings, 1 reply; 61+ messages in thread From: David Brownell @ 2004-04-05 23:23 UTC (permalink / raw) To: James Bottomley Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List James Bottomley wrote: > On Sun, 2004-04-04 at 22:54, David Brownell wrote: > >>Which directly follows from what I said ... USB propagates >>that knowledge in carefully defined ways. Other layers can >>do the same, although clearly state associated with open file >>descriptors needs to use a slightly different strategy. And >>that strategy is what Alan's original comment was about. > > > Well, again, this is at the core of the argument. > > I say that as long as we have all our objects correctly refcounted, how > disconnections propagate up the stack is irrelevant. "Irrelevant" is pointlessly strong, even for what I'm guessing you really mean to be saying. Minimally, there needs to be synchronization to prevent open() on one CPU from using data structures disconnect() just invalidated on another CPU. That's a basic "how to write multi-threaded code" kind of issue, which I won't bother to explain (yet again). > If you have to have a "carefully defined" order for the propagation of > an asynchronous event to save you from oopsing, it's a sure sign of bugs > in the code. It was "carefully defined" to ensure that there are clearly defined points where it's safe to delete the hardware state. Those are needed in situations other than disconnect(). Not having such synchronization points meant that HCDs were reduced to _guessing_ whether it's safe to free that state. Which was a sure sign of API bugs (inside usbcore), and made for way too many oops-on-unplug reports in 2.4 kernels (and kept various other things from working right, too). > My reasoning is that I/O down the stack will ultimately hit the part of > the kernel (or even just timeout in transmission to the device if it's > disappeared without trace) that contains the knowledge and be returned > with an error. Ordering the disconnection propagation cannot change > this fact, merely alter *which* component returns the error. Ensuring such a behaviour certainly requires "careful design". As does ensuring that the event propagation _completes_ every time, rather than sometimes just refusing to finish. Surely you've seen both kinds of design botch. > As long as we're robust to the error there is no problem. The only system "error" in this discussion was that oops. It may be a minor point ... but "disconnecting device while in use" isn't an error at all. It's a fault, maybe sometimes undesirable, but one with behavior specified as fully as reading or writing a block of data. > At this point, the object lifetime does nothing more that count how long > we have to keep the object around to return the error. And that's basically what I said about the careful design of the disconnect sequence, ensuring that there _is_ such a point: one where the only issues left are when references get dropped, so that the only issue left is memory management. - Dave > James > > > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [linux-usb-devel] Re: bug 2400 2004-04-05 23:23 ` [linux-usb-devel] " David Brownell @ 2004-04-06 1:19 ` James Bottomley 2004-04-06 6:52 ` Oliver Neukum 2004-04-06 15:10 ` David Brownell 0 siblings, 2 replies; 61+ messages in thread From: James Bottomley @ 2004-04-06 1:19 UTC (permalink / raw) To: David Brownell Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Mon, 2004-04-05 at 18:23, David Brownell wrote: > "Irrelevant" is pointlessly strong, even for what I'm guessing > you really mean to be saying. No, it's the very point. If we have to enforce ordering on the event propagation across subsystems, that can't be done without inter subsystem synchronisation. In order to get away without any inter subsystem synchronisation the order has to not matter. i.e. be irrelevant. > Minimally, there needs to be synchronization to prevent open() on > one CPU from using data structures disconnect() just invalidated > on another CPU. That's a basic "how to write multi-threaded code" > kind of issue, which I won't bother to explain (yet again). Yes, but that's intra subsystem synchronisation, not inter subsystem synchronisation. Look, the rules are very simple: - Every subsystem gives out refcounted objects. - Every subsystem takes in a disconnection event for another object. Now what the subsystem does with the event is up to it. The only guarantee is if the subsystem holds no references to the other object after the disconnection completes in that subsystem, then it may be garbage collected. However, if not, the other object must remain until the last reference is dropped. The point is that all the disconnection event does is inform a subsystem that the entity represented by the other object has gone. What it chooses to do with the information is up to it. It could set it's own connected object to degraded, never use the other object again and drop the reference. Or, it could simply ignore the indication and continue to use the other object (even though it will return errors for every operation). Now, if the subsystem is going to garbage collect its own object as a result of the other object disconnect, then it is responsible for synchronising that with reference gets on its own object. However, that is easily achievable via *intra* subsystem synchronisation. If you follow these rules, there's no *inter* subsystem ordering or synchronisation requirement. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [linux-usb-devel] Re: bug 2400 2004-04-06 1:19 ` James Bottomley @ 2004-04-06 6:52 ` Oliver Neukum 2004-04-06 14:03 ` James Bottomley 2004-04-06 15:10 ` David Brownell 1 sibling, 1 reply; 61+ messages in thread From: Oliver Neukum @ 2004-04-06 6:52 UTC (permalink / raw) To: James Bottomley, David Brownell Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List > The point is that all the disconnection event does is inform a subsystem > that the entity represented by the other object has gone. What it > chooses to do with the information is up to it. It could set it's own > connected object to degraded, never use the other object again and drop > the reference. Or, it could simply ignore the indication and continue > to use the other object (even though it will return errors for every > operation). Pure refcounting can never protect you against races with freeing objects. The counters themselves must be protected. Try as you might you need locks for that and rules on how this locks are to be used. Regards Oliver ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [linux-usb-devel] Re: bug 2400 2004-04-06 6:52 ` Oliver Neukum @ 2004-04-06 14:03 ` James Bottomley 2004-04-07 9:19 ` Oliver.Neukum 0 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2004-04-06 14:03 UTC (permalink / raw) To: Oliver Neukum Cc: David Brownell, Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Tue, 2004-04-06 at 01:52, Oliver Neukum wrote: > Pure refcounting can never protect you against races with freeing objects. > The counters themselves must be protected. Try as you might you need > locks for that and rules on how this locks are to be used. Which part of On Mon, 2004-04-05 at 20:19, James Bottomley wrote: > Now, if the subsystem is going to garbage collect its own object as a > result of the other object disconnect, then it is responsible for > synchronising that with reference gets on its own object. However, that > is easily achievable via *intra* subsystem synchronisation. didn't you understand? However, how a subsystem resolves this intra subsystem synchronisation is up to it ... and it doesn't have to do it with locks. So there are no exposed locks for this and no rules therefore, for how to use them. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [linux-usb-devel] Re: bug 2400 2004-04-06 14:03 ` James Bottomley @ 2004-04-07 9:19 ` Oliver.Neukum 0 siblings, 0 replies; 61+ messages in thread From: Oliver.Neukum @ 2004-04-07 9:19 UTC (permalink / raw) To: James Bottomley Cc: Oliver Neukum, David Brownell, Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Tue, 6 Apr 2004, James Bottomley wrote: > On Tue, 2004-04-06 at 01:52, Oliver Neukum wrote: > > Pure refcounting can never protect you against races with freeing objects. > > The counters themselves must be protected. Try as you might you need > > locks for that and rules on how this locks are to be used. > > Which part of > > On Mon, 2004-04-05 at 20:19, James Bottomley wrote: > > Now, if the subsystem is going to garbage collect its own object as a > > result of the other object disconnect, then it is responsible for > > synchronising that with reference gets on its own object. However, that > > is easily achievable via *intra* subsystem synchronisation. > > didn't you understand? > > However, how a subsystem resolves this intra subsystem synchronisation > is up to it ... and it doesn't have to do it with locks. So there are > no exposed locks for this and no rules therefore, for how to use them. How you do this notification is irrelevant. You must address the basic race of increasing the refcount from 0 to 1. After that all is fine. At some point you have to handle a pointer to an object whose refcount is unknown to you. Regards Oliver ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [linux-usb-devel] Re: bug 2400 2004-04-06 1:19 ` James Bottomley 2004-04-06 6:52 ` Oliver Neukum @ 2004-04-06 15:10 ` David Brownell 2004-04-06 15:47 ` James Bottomley 1 sibling, 1 reply; 61+ messages in thread From: David Brownell @ 2004-04-06 15:10 UTC (permalink / raw) To: James Bottomley Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List James Bottomley wrote: > that's intra subsystem synchronisation, not inter subsystem I'd say that device_driver.remove() calls from the bus management code to a given device driver are inter-subsystem... > Look, the rules are very simple: > > - Every subsystem gives out refcounted objects. > > - Every subsystem takes in a disconnection event for another object. > .. > > If you follow these rules, there's no *inter* subsystem ordering or > synchronisation requirement. Well, either handling the disconnection event from the lower level subsystem (bus) is such a requirement, or it's not necessary! I've got to assume the former, since the Linux kernel doesn't garbage collect its memory. Refcounting systems need loop-breaking protocols ("disconnect"); this is needed for that sort of reason, if no other. - Dave ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [linux-usb-devel] Re: bug 2400 2004-04-06 15:10 ` David Brownell @ 2004-04-06 15:47 ` James Bottomley 2004-04-06 16:16 ` David Brownell 2004-04-06 16:55 ` Alan Stern 0 siblings, 2 replies; 61+ messages in thread From: James Bottomley @ 2004-04-06 15:47 UTC (permalink / raw) To: David Brownell Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Tue, 2004-04-06 at 10:10, David Brownell wrote: > James Bottomley wrote: > > that's intra subsystem synchronisation, not inter subsystem > > I'd say that device_driver.remove() calls from the bus management > code to a given device driver are inter-subsystem... so using a subsystem provided API from another subsystem constitutes inter subsytem synchronisation? I don't think so. > Refcounting systems need loop-breaking protocols ("disconnect"); > this is needed for that sort of reason, if no other. No, they don't. Supposing we did no hotplug at all and just relied on the fact that a driver responds DID_NO_CONNECT to a missing device. We have a mounted fs on a CD on SCSI on USB which gets disconnected. No event is generated because no hotplug. Now every read of the fs gets EIO. The system can continue in this state forever if necessary. However, eventually the user will get bored and unmount the fs If we just generated a single event saying "device gone" and did nothing else, userland helpers could pull apart the entire stack (clean up application, kill processes, unmount fs, remove scsi device etc). You're trying to make a particular event special. That's not correct. There are many types of disconnect (user unmount, user offlines device) coming from essentially all parts of the stack. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [linux-usb-devel] Re: bug 2400 2004-04-06 15:47 ` James Bottomley @ 2004-04-06 16:16 ` David Brownell 2004-04-06 16:55 ` Alan Stern 1 sibling, 0 replies; 61+ messages in thread From: David Brownell @ 2004-04-06 16:16 UTC (permalink / raw) To: James Bottomley Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List James Bottomley wrote: >>> that's intra subsystem synchronisation, not inter subsystem >> >>I'd say that device_driver.remove() calls from the bus management >>code to a given device driver are inter-subsystem... > > so using a subsystem provided API from another subsystem constitutes > inter subsytem synchronisation? I don't think so. So it's now "I won't listen to those facts, they bother me" ... Or maybe you're just trolling? "From another subsystem" is "between subsystems" is "inter-subsystem"; Q.E.D. Either way, it's not worth the time to respond to you any more, this will be my last post on this thread. >>Refcounting systems need loop-breaking protocols ("disconnect"); >>this is needed for that sort of reason, if no other. > > > No, they don't. Without protocols for breaking referencing loops, systems that use refcounting _alone_ (as you're advocating) leak memory unless they prevent such loops ... but requests sent from one subsystem to the next, then later returned, each embed such a loop. That's in addition to the need for locking in certain paths, on multi-threaded systems (like Linux). (And for that matter, there were no WMDs in Iraq.) Sorry for mentioning all those inconvenient facts; I'll go away now. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [linux-usb-devel] Re: bug 2400 2004-04-06 15:47 ` James Bottomley 2004-04-06 16:16 ` David Brownell @ 2004-04-06 16:55 ` Alan Stern 2004-04-06 17:13 ` James Bottomley 1 sibling, 1 reply; 61+ messages in thread From: Alan Stern @ 2004-04-06 16:55 UTC (permalink / raw) To: James Bottomley Cc: David Brownell, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On 6 Apr 2004, James Bottomley wrote: > Supposing we did no hotplug at all and just relied on the fact that a > driver responds DID_NO_CONNECT to a missing device. > > We have a mounted fs on a CD on SCSI on USB which gets disconnected. No > event is generated because no hotplug. Now every read of the fs gets > EIO. The system can continue in this state forever if necessary. > However, eventually the user will get bored and unmount the fs > > If we just generated a single event saying "device gone" and did nothing > else, userland helpers could pull apart the entire stack (clean up > application, kill processes, unmount fs, remove scsi device etc). In principle, yes. However... The kernel still has to process requests correctly while the stack is partially deconstructed. It also has to protect against userland helpers trying to pull things apart in the wrong order. I'm not trying to say that anything you wrote was wrong -- just that the situation is more complicated than you make it appear. Alan Stern ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [linux-usb-devel] Re: bug 2400 2004-04-06 16:55 ` Alan Stern @ 2004-04-06 17:13 ` James Bottomley 0 siblings, 0 replies; 61+ messages in thread From: James Bottomley @ 2004-04-06 17:13 UTC (permalink / raw) To: Alan Stern Cc: David Brownell, Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Tue, 2004-04-06 at 11:55, Alan Stern wrote: > In principle, yes. However... The kernel still has to process requests > correctly while the stack is partially deconstructed. It also has to > protect against userland helpers trying to pull things apart in the wrong > order. This I agree with ... that's where the "robust to errors" statement comes in. I'm not claiming we currently are, merely that we have to be (regardless of ordered or disordered event propagation). > I'm not trying to say that anything you wrote was wrong -- just that the > situation is more complicated than you make it appear. Well, I think the principle is simple. The practice is less so ... I can still panic my test system by offlining the ext2 root filesystem in the middle of heavy stress. What I don't believe is that enforcing some kind of ordering can relieve us of the necessity of making the error paths robust. Since the largely hitherto untested error path becomes our main handler for forced disconnect, we're necessarily running into a lot of bugs that didn't show up before. However, they have always been bugs regardless of whether they got tripped. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-01 23:32 ` James Bottomley 2004-04-02 0:29 ` Steven Dake 2004-04-02 8:43 ` Mike Anderson @ 2004-04-02 23:36 ` James Bottomley 2004-04-03 0:11 ` Mike Anderson ` (2 more replies) 2 siblings, 3 replies; 61+ messages in thread From: James Bottomley @ 2004-04-02 23:36 UTC (permalink / raw) To: James Bottomley Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Thu, 2004-04-01 at 18:32, James Bottomley wrote: > Now, the questions are, whose issue is this and how do we fix it? I can > see that a driver needs early notification of unplugs so it can deny all > access to a gone device. On the other hand, for a user land open where > we still have to hold resources in the driver, we'd like the driver to > have a notify when the device reference count drops to zero so we can > clean up. OK, I think this is easily fixable in sr.c by moving around the release code to do the right thing (and not drop the reference to sdev_gendev until we're completely finished). Jens, does this look OK (or have I just opened up another race window somewhere else)? James ===== drivers/scsi/sr.c 1.103 vs edited ===== --- 1.103/drivers/scsi/sr.c Fri Apr 2 11:30:44 2004 +++ edited/drivers/scsi/sr.c Fri Apr 2 17:29:06 2004 @@ -424,8 +424,19 @@ static int sr_block_release(struct inode *inode, struct file *file) { + int ret; struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk); - return cdrom_release(&cd->cdi, file); + struct scsi_device *sdev = cd->device; + ret = cdrom_release(&cd->cdi, file); + if(ret) + return ret; + + unregister_cdrom(&cd->cdi); + kfree(cd); + + scsi_device_put(sdev); + + return 0; } static int sr_block_ioctl(struct inode *inode, struct file *file, unsigned cmd, @@ -500,7 +511,6 @@ if (cd->device->sector_size > 2048) sr_set_blocklength(cd, 2048); - scsi_device_put(cd->device); } static int sr_probe(struct device *dev) @@ -874,9 +884,6 @@ spin_unlock(&sr_index_lock); put_disk(cd->disk); - unregister_cdrom(&cd->cdi); - kfree(cd); - return 0; } ------------------------------------------------------- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-02 23:36 ` James Bottomley @ 2004-04-03 0:11 ` Mike Anderson 2004-04-03 0:16 ` James Bottomley 2004-04-05 4:33 ` Patrick Mansfield 2004-04-05 14:03 ` Jens Axboe 2 siblings, 1 reply; 61+ messages in thread From: Mike Anderson @ 2004-04-03 0:11 UTC (permalink / raw) To: James Bottomley Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List James Bottomley [James.Bottomley@steeleye.com] wrote: > ===== drivers/scsi/sr.c 1.103 vs edited ===== > --- 1.103/drivers/scsi/sr.c Fri Apr 2 11:30:44 2004 > +++ edited/drivers/scsi/sr.c Fri Apr 2 17:29:06 2004 > @@ -424,8 +424,19 @@ > > static int sr_block_release(struct inode *inode, struct file *file) > { > + int ret; > struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk); > - return cdrom_release(&cd->cdi, file); > + struct scsi_device *sdev = cd->device; > + ret = cdrom_release(&cd->cdi, file); > + if(ret) > + return ret; It looks like cdrom_release always returns 0? Is there some other patch outstanding that changes this. -andmike -- Michael Anderson andmike@us.ibm.com ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-03 0:11 ` Mike Anderson @ 2004-04-03 0:16 ` James Bottomley 0 siblings, 0 replies; 61+ messages in thread From: James Bottomley @ 2004-04-03 0:16 UTC (permalink / raw) To: Mike Anderson Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Fri, 2004-04-02 at 19:11, Mike Anderson wrote: > It looks like cdrom_release always returns 0? Is there some other patch > outstanding that changes this. Yes, not that I know of. However, while it apparently has a return code indicating success or failure, we should respect it [unless Jens wants to make it a void function]. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-02 23:36 ` James Bottomley 2004-04-03 0:11 ` Mike Anderson @ 2004-04-05 4:33 ` Patrick Mansfield 2004-04-05 14:09 ` James Bottomley 2004-04-05 21:07 ` James Bottomley 2004-04-05 14:03 ` Jens Axboe 2 siblings, 2 replies; 61+ messages in thread From: Patrick Mansfield @ 2004-04-05 4:33 UTC (permalink / raw) To: James Bottomley Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Fri, Apr 02, 2004 at 06:36:55PM -0500, James Bottomley wrote: > ===== drivers/scsi/sr.c 1.103 vs edited ===== > --- 1.103/drivers/scsi/sr.c Fri Apr 2 11:30:44 2004 > +++ edited/drivers/scsi/sr.c Fri Apr 2 17:29:06 2004 > @@ -424,8 +424,19 @@ > > static int sr_block_release(struct inode *inode, struct file *file) > { > + int ret; > struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk); > - return cdrom_release(&cd->cdi, file); > + struct scsi_device *sdev = cd->device; > + ret = cdrom_release(&cd->cdi, file); > + if(ret) > + return ret; > + > + unregister_cdrom(&cd->cdi); > + kfree(cd); > + > + scsi_device_put(sdev); > + > + return 0; > } > > static int sr_block_ioctl(struct inode *inode, struct file *file, unsigned cmd, > @@ -500,7 +511,6 @@ > if (cd->device->sector_size > 2048) > sr_set_blocklength(cd, 2048); > > - scsi_device_put(cd->device); > } > > static int sr_probe(struct device *dev) > @@ -874,9 +884,6 @@ > spin_unlock(&sr_index_lock); > > put_disk(cd->disk); > - unregister_cdrom(&cd->cdi); > - kfree(cd); > - > return 0; > } How is unregister_cdrom(&cd->cdi) called if the device is not open? -- Patrick Mansfield ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-05 4:33 ` Patrick Mansfield @ 2004-04-05 14:09 ` James Bottomley 2004-04-05 21:07 ` James Bottomley 1 sibling, 0 replies; 61+ messages in thread From: James Bottomley @ 2004-04-05 14:09 UTC (permalink / raw) To: Patrick Mansfield Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Sun, 2004-04-04 at 23:33, Patrick Mansfield wrote: > How is unregister_cdrom(&cd->cdi) called if the device is not open? Yes, I need to pull the same kobject trick that sd does for the same reason. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-05 4:33 ` Patrick Mansfield 2004-04-05 14:09 ` James Bottomley @ 2004-04-05 21:07 ` James Bottomley 2004-04-06 9:22 ` Jens Axboe 2004-04-08 23:06 ` Greg KH 1 sibling, 2 replies; 61+ messages in thread From: James Bottomley @ 2004-04-05 21:07 UTC (permalink / raw) To: Patrick Mansfield Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel, SCSI Mailing List On Sun, 2004-04-04 at 23:33, Patrick Mansfield wrote: > How is unregister_cdrom(&cd->cdi) called if the device is not open? OK, the attached does everything correctly (as I should have done at first) by using a kobject to hold the compound references. I've stressed this one nicely. Unfortunately, it's showing up a bug in scsi_sysfs (hold device open, remove it and re-add it at the same HCTL and you'll see an "error 1"). James ===== drivers/scsi/sr.c 1.103 vs edited ===== --- 1.103/drivers/scsi/sr.c Fri Apr 2 11:30:44 2004 +++ edited/drivers/scsi/sr.c Mon Apr 5 15:59:34 2004 @@ -113,6 +113,28 @@ .generic_packet = sr_packet, }; +static void sr_kobject_release(struct kobject *kobj); + +static struct kobj_type scsi_cdrom_kobj_type = { + .release = sr_kobject_release, +}; + +/* + * The get and put routines for the struct scsi_cd. Note this entity + * has a scsi_device pointer and owns a reference to this. + */ +static inline int scsi_cd_get(struct scsi_cd *cd) +{ + if (!kobject_get(&cd->kobj)) + return -ENODEV; + return 0; +} + +static inline void scsi_cd_put(struct scsi_cd *cd) +{ + kobject_put(&cd->kobj); +} + /* * This function checks to see if the media has been changed in the * CDROM drive. It is possible that we have already sensed a change, @@ -424,8 +446,15 @@ static int sr_block_release(struct inode *inode, struct file *file) { + int ret; struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk); - return cdrom_release(&cd->cdi, file); + ret = cdrom_release(&cd->cdi, file); + if(ret) + return ret; + + scsi_cd_put(cd); + + return 0; } static int sr_block_ioctl(struct inode *inode, struct file *file, unsigned cmd, @@ -467,7 +496,7 @@ struct scsi_device *sdev = cd->device; int retval; - retval = scsi_device_get(sdev); + retval = scsi_cd_get(cd); if (retval) return retval; @@ -489,7 +518,7 @@ return 0; error_out: - scsi_device_put(sdev); + scsi_cd_put(cd); return retval; } @@ -500,7 +529,6 @@ if (cd->device->sector_size > 2048) sr_set_blocklength(cd, 2048); - scsi_device_put(cd->device); } static int sr_probe(struct device *dev) @@ -514,12 +542,18 @@ if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM) goto fail; + if ((err = scsi_device_get(sdev)) != 0) + goto fail_put_sdev; + error = -ENOMEM; cd = kmalloc(sizeof(*cd), GFP_KERNEL); if (!cd) goto fail; memset(cd, 0, sizeof(*cd)); + kobject_init(&cd->kobj); + cd->kobj.ktype = &scsi_cdrom_kobj_type; + disk = alloc_disk(1); if (!disk) goto fail_free; @@ -588,6 +622,8 @@ put_disk(disk); fail_free: kfree(cd); +fail_put_sdev: + scsi_device_put(sdev); fail: return error; } @@ -863,19 +899,31 @@ return cgc->stat; } -static int sr_remove(struct device *dev) +static void sr_kobject_release(struct kobject *kobj) { - struct scsi_cd *cd = dev_get_drvdata(dev); - - del_gendisk(cd->disk); + struct scsi_cd *cd = container_of(kobj, struct scsi_cd, kobj); + struct scsi_device *sdev = cd->device; spin_lock(&sr_index_lock); clear_bit(cd->disk->first_minor, sr_index_bits); spin_unlock(&sr_index_lock); - put_disk(cd->disk); unregister_cdrom(&cd->cdi); + + put_disk(cd->disk); + kfree(cd); + + scsi_device_put(sdev); +} + +static int sr_remove(struct device *dev) +{ + struct scsi_cd *cd = dev_get_drvdata(dev); + + del_gendisk(cd->disk); + + scsi_cd_put(cd); return 0; } ===== drivers/scsi/sr.h 1.10 vs edited ===== --- 1.10/drivers/scsi/sr.h Mon May 26 04:50:43 2003 +++ edited/drivers/scsi/sr.h Mon Apr 5 15:51:37 2004 @@ -36,6 +36,9 @@ unsigned readcd_known:1; /* drive supports READ_CD (0xbe) */ unsigned readcd_cdda:1; /* reading audio data using READ_CD */ struct cdrom_device_info cdi; + /* We hold gendisk and scsi_device references on probe and use + * the refs on this kobj to decide when to release them */ + struct kobject kobj; struct gendisk *disk; } Scsi_CD; ------------------------------------------------------- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-05 21:07 ` James Bottomley @ 2004-04-06 9:22 ` Jens Axboe 2004-04-06 13:56 ` James Bottomley 2004-04-08 23:06 ` Greg KH 1 sibling, 1 reply; 61+ messages in thread From: Jens Axboe @ 2004-04-06 9:22 UTC (permalink / raw) To: James Bottomley Cc: Patrick Mansfield, Andrew Morton, greg, linux-usb-devel, SCSI Mailing List On Mon, Apr 05 2004, James Bottomley wrote: > On Sun, 2004-04-04 at 23:33, Patrick Mansfield wrote: > > How is unregister_cdrom(&cd->cdi) called if the device is not open? > > OK, the attached does everything correctly (as I should have done at > first) by using a kobject to hold the compound references. > > I've stressed this one nicely. Unfortunately, it's showing up a bug in > scsi_sysfs (hold device open, remove it and re-add it at the same HCTL > and you'll see an "error 1"). Really? It doesn't even compile :-) > @@ -514,12 +542,18 @@ > if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM) > goto fail; > > + if ((err = scsi_device_get(sdev)) != 0) > + goto fail_put_sdev; > + > error = -ENOMEM; > cd = kmalloc(sizeof(*cd), GFP_KERNEL); > if (!cd) > goto fail; > memset(cd, 0, sizeof(*cd)); -- Jens Axboe ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-06 9:22 ` Jens Axboe @ 2004-04-06 13:56 ` James Bottomley 2004-04-06 14:04 ` Jens Axboe 0 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2004-04-06 13:56 UTC (permalink / raw) To: Jens Axboe Cc: Patrick Mansfield, Andrew Morton, greg, linux-usb-devel, SCSI Mailing List On Tue, 2004-04-06 at 04:22, Jens Axboe wrote: > Really? It doesn't even compile :-) Heh, I really must learn that I have to copy the file from the test machine to the email machine *before* attaching it. You got a stale copy of an older incarnation, I think. The attached (hopefully) is what I compiled and tested with. The test, incidentally, is simply to hold the device open and then forcibly remove it using scsi remove-single-device before closing it. James ===== drivers/scsi/sr.c 1.103 vs edited ===== --- 1.103/drivers/scsi/sr.c Fri Apr 2 11:30:44 2004 +++ edited/drivers/scsi/sr.c Tue Apr 6 08:49:30 2004 @@ -113,6 +113,28 @@ .generic_packet = sr_packet, }; +static void sr_kobject_release(struct kobject *kobj); + +static struct kobj_type scsi_cdrom_kobj_type = { + .release = sr_kobject_release, +}; + +/* + * The get and put routines for the struct scsi_cd. Note this entity + * has a scsi_device pointer and owns a reference to this. + */ +static inline int scsi_cd_get(struct scsi_cd *cd) +{ + if (!kobject_get(&cd->kobj)) + return -ENODEV; + return 0; +} + +static inline void scsi_cd_put(struct scsi_cd *cd) +{ + kobject_put(&cd->kobj); +} + /* * This function checks to see if the media has been changed in the * CDROM drive. It is possible that we have already sensed a change, @@ -424,8 +446,15 @@ static int sr_block_release(struct inode *inode, struct file *file) { + int ret; struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk); - return cdrom_release(&cd->cdi, file); + ret = cdrom_release(&cd->cdi, file); + if(ret) + return ret; + + scsi_cd_put(cd); + + return 0; } static int sr_block_ioctl(struct inode *inode, struct file *file, unsigned cmd, @@ -467,7 +496,7 @@ struct scsi_device *sdev = cd->device; int retval; - retval = scsi_device_get(sdev); + retval = scsi_cd_get(cd); if (retval) return retval; @@ -489,7 +518,7 @@ return 0; error_out: - scsi_device_put(sdev); + scsi_cd_put(cd); return retval; } @@ -500,7 +529,6 @@ if (cd->device->sector_size > 2048) sr_set_blocklength(cd, 2048); - scsi_device_put(cd->device); } static int sr_probe(struct device *dev) @@ -514,12 +542,18 @@ if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM) goto fail; + if ((error = scsi_device_get(sdev)) != 0) + goto fail; + error = -ENOMEM; cd = kmalloc(sizeof(*cd), GFP_KERNEL); if (!cd) - goto fail; + goto fail_put_sdev; memset(cd, 0, sizeof(*cd)); + kobject_init(&cd->kobj); + cd->kobj.ktype = &scsi_cdrom_kobj_type; + disk = alloc_disk(1); if (!disk) goto fail_free; @@ -588,6 +622,8 @@ put_disk(disk); fail_free: kfree(cd); +fail_put_sdev: + scsi_device_put(sdev); fail: return error; } @@ -863,19 +899,31 @@ return cgc->stat; } -static int sr_remove(struct device *dev) +static void sr_kobject_release(struct kobject *kobj) { - struct scsi_cd *cd = dev_get_drvdata(dev); - - del_gendisk(cd->disk); + struct scsi_cd *cd = container_of(kobj, struct scsi_cd, kobj); + struct scsi_device *sdev = cd->device; spin_lock(&sr_index_lock); clear_bit(cd->disk->first_minor, sr_index_bits); spin_unlock(&sr_index_lock); - put_disk(cd->disk); unregister_cdrom(&cd->cdi); + + put_disk(cd->disk); + kfree(cd); + + scsi_device_put(sdev); +} + +static int sr_remove(struct device *dev) +{ + struct scsi_cd *cd = dev_get_drvdata(dev); + + del_gendisk(cd->disk); + + scsi_cd_put(cd); return 0; } ------------------------------------------------------- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-06 13:56 ` James Bottomley @ 2004-04-06 14:04 ` Jens Axboe 2004-04-06 14:09 ` James Bottomley 0 siblings, 1 reply; 61+ messages in thread From: Jens Axboe @ 2004-04-06 14:04 UTC (permalink / raw) To: James Bottomley Cc: Patrick Mansfield, Andrew Morton, greg, linux-usb-devel, SCSI Mailing List On Tue, Apr 06 2004, James Bottomley wrote: > On Tue, 2004-04-06 at 04:22, Jens Axboe wrote: > > Really? It doesn't even compile :-) > > Heh, I really must learn that I have to copy the file from the test > machine to the email machine *before* attaching it. You got a stale > copy of an older incarnation, I think. Only change is the err -> error, correct? > The attached (hopefully) is what I compiled and tested with. The test, > incidentally, is simply to hold the device open and then forcibly remove > it using scsi remove-single-device before closing it. Better than what is there, whether it needs other synchronization is a different question. -- Jens Axboe ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-06 14:04 ` Jens Axboe @ 2004-04-06 14:09 ` James Bottomley 0 siblings, 0 replies; 61+ messages in thread From: James Bottomley @ 2004-04-06 14:09 UTC (permalink / raw) To: Jens Axboe Cc: Patrick Mansfield, Andrew Morton, greg, linux-usb-devel, SCSI Mailing List On Tue, 2004-04-06 at 09:04, Jens Axboe wrote: > Only change is the err -> error, correct? Not quite; there was a cleanup goto issue as well (don't want to goto the release if you couldn't get the object). > > The attached (hopefully) is what I compiled and tested with. The test, > > incidentally, is simply to hold the device open and then forcibly remove > > it using scsi remove-single-device before closing it. > > Better than what is there, whether it needs other synchronization is a > different question. Heh, join the debate, why don't you... James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-05 21:07 ` James Bottomley 2004-04-06 9:22 ` Jens Axboe @ 2004-04-08 23:06 ` Greg KH 2004-04-09 11:28 ` James Bottomley 1 sibling, 1 reply; 61+ messages in thread From: Greg KH @ 2004-04-08 23:06 UTC (permalink / raw) To: James Bottomley Cc: Patrick Mansfield, Andrew Morton, Jens Axboe, linux-usb-devel, SCSI Mailing List On Mon, Apr 05, 2004 at 04:07:12PM -0500, James Bottomley wrote: > On Sun, 2004-04-04 at 23:33, Patrick Mansfield wrote: > > How is unregister_cdrom(&cd->cdi) called if the device is not open? > > OK, the attached does everything correctly (as I should have done at > first) by using a kobject to hold the compound references. Why not just use a kref for this? That's what they were created for? :) thanks, greg k-h ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-08 23:06 ` Greg KH @ 2004-04-09 11:28 ` James Bottomley 0 siblings, 0 replies; 61+ messages in thread From: James Bottomley @ 2004-04-09 11:28 UTC (permalink / raw) To: Greg KH Cc: Patrick Mansfield, Andrew Morton, Jens Axboe, linux-usb-devel, SCSI Mailing List On Thu, 2004-04-08 at 18:06, Greg KH wrote: > Why not just use a kref for this? That's what they were created for? > :) Heh, I was waiting for you to say that. Took you five days to notice, though .... I'll convert both sr and sd to use krefs. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-02 23:36 ` James Bottomley 2004-04-03 0:11 ` Mike Anderson 2004-04-05 4:33 ` Patrick Mansfield @ 2004-04-05 14:03 ` Jens Axboe 2004-04-05 21:08 ` James Bottomley 2 siblings, 1 reply; 61+ messages in thread From: Jens Axboe @ 2004-04-05 14:03 UTC (permalink / raw) To: James Bottomley; +Cc: Andrew Morton, greg, linux-usb-devel, SCSI Mailing List On Fri, Apr 02 2004, James Bottomley wrote: > On Thu, 2004-04-01 at 18:32, James Bottomley wrote: > > Now, the questions are, whose issue is this and how do we fix it? I can > > see that a driver needs early notification of unplugs so it can deny all > > access to a gone device. On the other hand, for a user land open where > > we still have to hold resources in the driver, we'd like the driver to > > have a notify when the device reference count drops to zero so we can > > clean up. > > OK, I think this is easily fixable in sr.c by moving around the release > code to do the right thing (and not drop the reference to sdev_gendev > until we're completely finished). > > Jens, does this look OK (or have I just opened up another race window > somewhere else)? Doesn't quite work, how about something like this as a work-around? diff -ur /mnt/kscratch/linux-2.6.4/drivers/cdrom/cdrom.c linux-2.6.4-SUSE-20040402/drivers/cdrom/cdrom.c --- /mnt/kscratch/linux-2.6.4/drivers/cdrom/cdrom.c 2004-04-02 14:53:42.000000000 +0200 +++ linux-2.6.4-SUSE-20040402/drivers/cdrom/cdrom.c 2004-04-05 15:43:01.603675727 +0200 @@ -379,6 +379,8 @@ #endif /* CONFIG_SYSCTL */ } + atomic_set(&cdi->all_use, 1); + ENSURE(drive_status, CDC_DRIVE_STATUS ); ENSURE(media_changed, CDC_MEDIA_CHANGED); ENSURE(tray_move, CDC_CLOSE_TRAY | CDC_OPEN_TRAY); @@ -425,6 +427,9 @@ struct cdrom_device_info *cdi, *prev; cdinfo(CD_OPEN, "entering unregister_cdrom\n"); + if (!atomic_dec_and_test(&unreg->all_use)) + return -EBUSY; + prev = NULL; spin_lock(&cdrom_lock); cdi = topCdromPtr; @@ -891,6 +896,8 @@ if (ret) goto err; + atomic_inc(&cdi->all_use); + cdinfo(CD_OPEN, "Use count for \"/dev/%s\" now %d\n", cdi->name, cdi->use_count); /* Do this on open. Don't wait for mount, because they might @@ -1096,6 +1103,7 @@ cdi->options & CDO_AUTO_EJECT && CDROM_CAN(CDC_OPEN_TRAY)) cdo->tray_move(cdi, 1); } + return 0; } diff -ur /mnt/kscratch/linux-2.6.4/drivers/scsi/sr.c linux-2.6.4-SUSE-20040402/drivers/scsi/sr.c --- /mnt/kscratch/linux-2.6.4/drivers/scsi/sr.c 2004-04-02 14:53:42.000000000 +0200 +++ linux-2.6.4-SUSE-20040402/drivers/scsi/sr.c 2004-04-05 15:57:35.577714398 +0200 @@ -419,13 +419,36 @@ static int sr_block_open(struct inode *inode, struct file *file) { struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk); - return cdrom_open(&cd->cdi, inode, file); + struct scsi_device *sdev = cd->device; + int retval; + + retval = scsi_device_get(sdev); + if (retval) + return retval; + + retval = cdrom_open(&cd->cdi, inode, file); + if (retval) + scsi_device_put(sdev); + + return retval; +} + +static void sr_cleanup(struct scsi_cd *cd) +{ + if (!unregister_cdrom(&cd->cdi)) { + scsi_device_put(cd->device); + kfree(cd); + } } static int sr_block_release(struct inode *inode, struct file *file) { struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk); - return cdrom_release(&cd->cdi, file); + int ret; + + ret = cdrom_release(&cd->cdi, file); + sr_cleanup(cd); + return ret; } static int sr_block_ioctl(struct inode *inode, struct file *file, unsigned cmd, @@ -467,10 +490,6 @@ struct scsi_device *sdev = cd->device; int retval; - retval = scsi_device_get(sdev); - if (retval) - return retval; - /* * If the device is in error recovery, wait until it is done. * If the device is offline, then disallow any access to it. @@ -499,8 +518,6 @@ if (cd->device->sector_size > 2048) sr_set_blocklength(cd, 2048); - - scsi_device_put(cd->device); } static int sr_probe(struct device *dev) @@ -874,8 +891,8 @@ spin_unlock(&sr_index_lock); put_disk(cd->disk); - unregister_cdrom(&cd->cdi); - kfree(cd); + + sr_cleanup(cd); return 0; } diff -ur /mnt/kscratch/linux-2.6.4/include/linux/cdrom.h linux-2.6.4-SUSE-20040402/include/linux/cdrom.h --- /mnt/kscratch/linux-2.6.4/include/linux/cdrom.h 2004-04-02 14:53:42.000000000 +0200 +++ linux-2.6.4-SUSE-20040402/include/linux/cdrom.h 2004-04-05 15:10:46.347243055 +0200 @@ -916,6 +916,7 @@ /* Uniform cdrom data structures for cdrom.c */ struct cdrom_device_info { + atomic_t all_use; struct cdrom_device_ops *ops; /* link to device_ops */ struct cdrom_device_info *next; /* next device_info for this major */ struct gendisk *disk; /* matching block layer disk */ -- Jens Axboe ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-05 14:03 ` Jens Axboe @ 2004-04-05 21:08 ` James Bottomley 2004-04-06 9:22 ` Jens Axboe 0 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2004-04-05 21:08 UTC (permalink / raw) To: Jens Axboe; +Cc: Andrew Morton, greg, linux-usb-devel, SCSI Mailing List On Mon, 2004-04-05 at 09:03, Jens Axboe wrote: > Doesn't quite work, how about something like this as a work-around? I'm not sure we want to be introducing yet another refcount. What was the problem? James ------------------------------------------------------- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: bug 2400 2004-04-05 21:08 ` James Bottomley @ 2004-04-06 9:22 ` Jens Axboe 0 siblings, 0 replies; 61+ messages in thread From: Jens Axboe @ 2004-04-06 9:22 UTC (permalink / raw) To: James Bottomley; +Cc: Andrew Morton, greg, linux-usb-devel, SCSI Mailing List On Mon, Apr 05 2004, James Bottomley wrote: > On Mon, 2004-04-05 at 09:03, Jens Axboe wrote: > > Doesn't quite work, how about something like this as a work-around? > > I'm not sure we want to be introducing yet another refcount. What was > the problem? Same as in this thread, cdrom gets unregister'ed and freed too early. Your patch looks better though, the new one that is. -- Jens Axboe ^ permalink raw reply [flat|nested] 61+ messages in thread
end of thread, other threads:[~2004-04-09 11:28 UTC | newest] Thread overview: 61+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-04-01 21:15 bug 2400 Andrew Morton 2004-04-01 21:52 ` Matt Gulick 2004-04-01 22:08 ` Andrew Morton 2004-04-01 22:48 ` Matt Gulick 2004-04-01 22:40 ` James Bottomley 2004-04-01 22:53 ` Matt Gulick 2004-04-01 23:07 ` Matthew Dharm 2004-04-01 23:32 ` James Bottomley 2004-04-02 0:29 ` Steven Dake 2004-04-02 8:43 ` Mike Anderson 2004-04-02 15:57 ` James Bottomley 2004-04-02 16:45 ` Mike Anderson 2004-04-02 17:05 ` James Bottomley 2004-04-02 17:44 ` Mike Anderson 2004-04-02 18:13 ` James Bottomley 2004-04-02 23:40 ` Mike Anderson 2004-04-03 0:25 ` James Bottomley 2004-04-04 1:40 ` Alan Stern 2004-04-04 15:23 ` James Bottomley 2004-04-04 16:46 ` Alan Stern 2004-04-04 17:04 ` James Bottomley 2004-04-05 3:17 ` Alan Stern 2004-04-05 14:59 ` Mike Anderson 2004-04-05 21:27 ` James Bottomley 2004-04-06 14:00 ` Alan Stern 2004-04-05 22:10 ` Patrick Mansfield 2004-04-06 14:10 ` Alan Stern 2004-04-08 14:09 ` Alan Stern 2004-04-08 16:24 ` Matt Gulick 2004-04-08 18:33 ` Alan Stern 2004-04-08 19:44 ` Matt Gulick 2004-04-05 13:30 ` [linux-usb-devel] " Oliver Neukum 2004-04-04 18:16 ` David Brownell 2004-04-04 18:42 ` James Bottomley 2004-04-05 3:54 ` David Brownell 2004-04-05 21:44 ` James Bottomley 2004-04-05 23:23 ` [linux-usb-devel] " David Brownell 2004-04-06 1:19 ` James Bottomley 2004-04-06 6:52 ` Oliver Neukum 2004-04-06 14:03 ` James Bottomley 2004-04-07 9:19 ` Oliver.Neukum 2004-04-06 15:10 ` David Brownell 2004-04-06 15:47 ` James Bottomley 2004-04-06 16:16 ` David Brownell 2004-04-06 16:55 ` Alan Stern 2004-04-06 17:13 ` James Bottomley 2004-04-02 23:36 ` James Bottomley 2004-04-03 0:11 ` Mike Anderson 2004-04-03 0:16 ` James Bottomley 2004-04-05 4:33 ` Patrick Mansfield 2004-04-05 14:09 ` James Bottomley 2004-04-05 21:07 ` James Bottomley 2004-04-06 9:22 ` Jens Axboe 2004-04-06 13:56 ` James Bottomley 2004-04-06 14:04 ` Jens Axboe 2004-04-06 14:09 ` James Bottomley 2004-04-08 23:06 ` Greg KH 2004-04-09 11:28 ` James Bottomley 2004-04-05 14:03 ` Jens Axboe 2004-04-05 21:08 ` James Bottomley 2004-04-06 9:22 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox