bug 2400

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* bug 2400
@ 2004-04-01 21:15 Andrew Morton
  2004-04-01 21:52 ` Matt Gulick
                   ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Andrew Morton @ 2004-04-01 21:15 UTC (permalink / raw)
  To: linux-usb-devel, linux-scsi

Apparently there is some controversy over whether this:
http://bugme.osdl.org/show_bug.cgi?id=2400 is a usb-storage problem or a
SCSI problem.

Can someone please shed some light, propose a way to get it fixed?

Thanks.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-01 21:15 bug 2400 Andrew Morton
@ 2004-04-01 21:52 ` Matt Gulick
  2004-04-01 22:08   ` Andrew Morton
  2004-04-01 22:40   ` James Bottomley
  2004-04-01 23:07 ` Matthew Dharm
  2004-04-01 23:32 ` James Bottomley
  2 siblings, 2 replies; 62+ messages in thread
From: Matt Gulick @ 2004-04-01 21:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-usb-devel, linux-scsi

On Thu, 2004-04-01 at 15:15, Andrew Morton wrote:
> Apparently there is some controversy over whether this:
> http://bugme.osdl.org/show_bug.cgi?id=2400 is a usb-storage problem or
> a
> SCSI problem.
> 
> Can someone please shed some light, propose a way to get it fixed?
> 
> Thanks.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Andrew,

This could be in the SCSI stack.  USB is just a transport mechanism or
physical layer.  The commands are actually handled by SCSI.  The
commands are packaged into SBP-2 (SCSI Bus Protocol) data grams for
delivery.

Where you will get into trouble is that SCSI is not considered to be a
'Dynamic' bus.  Devices do not normally come and go, while USB and 1394
devices are 'Hot Plugable'.

Think about what happens on MacOS X or Windose, the device must first be
stopped and ejected (virtual ejection) prior to unplugging.  This allows
the driver to tear down the device control structures and to release the
associated memory.

When the device was disconnected, these structures may have been torn
down and released without the SCSI drivers knowledge, hence the zero
pointer issue.

Remember, SCSI is not hot plugable.  The SCSI driver must receive a
START/STOP cdb with the bits set for STOP.  An EJECT call would not hurt
if supported by the device in question.

Just like I am constantly telling my kids, "Just 'cause you can, does
not mean you should"

Matt

----------------------------------------
Matt Gulick
Sr. Staff Engineer
Adaptec, Inc.
gulickconsulting@direcway.com
matt_gulick@adaptec.com
(715) 426-0884

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-01 21:52 ` Matt Gulick
@ 2004-04-01 22:08   ` Andrew Morton
  2004-04-01 22:48     ` Matt Gulick
  2004-04-01 22:40   ` James Bottomley
  1 sibling, 1 reply; 62+ messages in thread
From: Andrew Morton @ 2004-04-01 22:08 UTC (permalink / raw)
  To: gulickconsulting; +Cc: linux-usb-devel, linux-scsi

Matt Gulick <gulickconsulting@direcway.com> wrote:
>
> Remember, SCSI is not hot plugable.

oopsing the kernel is not a particularly friendly way of telling this to
our users.

If we have to bandaid over this, say, by leaking some memory and emitting
some rude printk's then OK.  Allowing the machine to kill itself,
potentially losing all the user's unsaved work is distinctly windows 95ish,
and is not OK, agree?

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-01 21:52 ` Matt Gulick
  2004-04-01 22:08   ` Andrew Morton
@ 2004-04-01 22:40   ` James Bottomley
  2004-04-01 22:53     ` Matt Gulick
  1 sibling, 1 reply; 62+ messages in thread
From: James Bottomley @ 2004-04-01 22:40 UTC (permalink / raw)
  To: gulickconsulting; +Cc: Andrew Morton, linux-usb-devel, SCSI Mailing List

On Thu, 2004-04-01 at 16:52, Matt Gulick wrote:
> Where you will get into trouble is that SCSI is not considered to be a
> 'Dynamic' bus.  Devices do not normally come and go, while USB and 1394
> devices are 'Hot Plugable'.

It isn't? considering all the refcounting work that's gone into the SCSI
stack to make it support hotplug transports, I don't think such a
blanket statement is supportable.

James



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-01 22:08   ` Andrew Morton
@ 2004-04-01 22:48     ` Matt Gulick
  0 siblings, 0 replies; 62+ messages in thread
From: Matt Gulick @ 2004-04-01 22:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-usb-devel, linux-scsi

On Thu, 2004-04-01 at 16:08, Andrew Morton wrote:
> Matt Gulick <gulickconsulting@direcway.com> wrote:
> >
> > Remember, SCSI is not hot plugable.
> 
> oopsing the kernel is not a particularly friendly way of telling this
> to
> our users.
> 
> If we have to bandaid over this, say, by leaking some memory and
> emitting
> some rude printk's then OK.  Allowing the machine to kill itself,
> potentially losing all the user's unsaved work is distinctly windows
> 95ish,
> and is not OK, agree?
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Very true.  Oopsing the kernel is never good.(the BSOD is my favorite
screen though, when a PC crashes and my iBook chugs along)

The thing to do is to run through a clean-up proc when this happens. 
This is also why I use multiple ptrs in my code.  The structures would
then be torn down and released.  An alert might also be generated to
inform the user that the device was not properly removed which could
result in data loss.

Anything is better than an Oops.

I would volunteer to take this on but I am between jobs at the moment
and would not want to start something that I did not have time to
properly complete.  I will look at it and see if I have any suggestions
for whoever takes this on.

Matt

----------------------------------------
Matt Gulick
Sr. Staff Engineer
Adaptec, Inc.
gulickconsulting@direcway.com
matt_gulick@adaptec.com
(715) 426-0884

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-01 22:40   ` James Bottomley
@ 2004-04-01 22:53     ` Matt Gulick
  0 siblings, 0 replies; 62+ messages in thread
From: Matt Gulick @ 2004-04-01 22:53 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Morton, linux-usb-devel, SCSI Mailing List

On Thu, 2004-04-01 at 16:40, James Bottomley wrote:
> On Thu, 2004-04-01 at 16:52, Matt Gulick wrote:
> > Where you will get into trouble is that SCSI is not considered to be
> a
> > 'Dynamic' bus.  Devices do not normally come and go, while USB and
> 1394
> > devices are 'Hot Plugable'.
> 
> It isn't? considering all the refcounting work that's gone into the
> SCSI
> stack to make it support hotplug transports, I don't think such a
> blanket statement is supportable.
> 
> James

James,

You may be right.  I have not fully dissected the Linux SCSI stack.  I
am very intimately familiar with the Darwin SCSI stack as well as the
Windose stacks (I had to fully architect USB 2.0 stack with support for
SBP-2 under SE, ME and 2k, [I feel so dirty]).

The Linux stack may be more forward looking.

Matt

----------------------------------------
Matt Gulick
Sr. Staff Engineer
Adaptec, Inc.
gulickconsulting@direcway.com
matt_gulick@adaptec.com
(715) 426-0884



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-01 21:15 bug 2400 Andrew Morton
  2004-04-01 21:52 ` Matt Gulick
@ 2004-04-01 23:07 ` Matthew Dharm
  2004-04-01 23:32 ` James Bottomley
  2 siblings, 0 replies; 62+ messages in thread
From: Matthew Dharm @ 2004-04-01 23:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-usb-devel, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 729 bytes --]

On Thu, Apr 01, 2004 at 01:15:02PM -0800, Andrew Morton wrote:
> 
> Apparently there is some controversy over whether this:
> http://bugme.osdl.org/show_bug.cgi?id=2400 is a usb-storage problem or a
> SCSI problem.
> 
> Can someone please shed some light, propose a way to get it fixed?

There's controversy over this?

It looks pretty clearly to be a SCSI or CD-ROM bug, not a usb-storage bug.
The OOPS looks like it's all about object handling, and usb-storage does
nothing directly with objects.

Matt

-- 
Matthew Dharm                              Home: mdharm-usb@one-eyed-alien.net 
Maintainer, Linux USB Mass Storage Driver

Why am I talking to a toilet brush?
					-- CEO
User Friendly, 4/30/1998

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-01 21:15 bug 2400 Andrew Morton
  2004-04-01 21:52 ` Matt Gulick
  2004-04-01 23:07 ` Matthew Dharm
@ 2004-04-01 23:32 ` James Bottomley
  2004-04-02  0:29   ` Steven Dake
                     ` (2 more replies)
  2 siblings, 3 replies; 62+ messages in thread
From: James Bottomley @ 2004-04-01 23:32 UTC (permalink / raw)
  To: Andrew Morton, greg, Jens Axboe; +Cc: linux-usb-devel, SCSI Mailing List

On Thu, 2004-04-01 at 16:15, Andrew Morton wrote:
> Apparently there is some controversy over whether this:
> http://bugme.osdl.org/show_bug.cgi?id=2400 is a usb-storage problem or a
> SCSI problem.

Well, I know why this happens, but I'm not entirely clear how to fix it.

The problem comes because the cdrom open and close take and release
references to the SCSI generic device (as they're supposed to).

However, Upper level Drivers like sr are implemented as generic device
drivers in the driverfs model.

When a USB unplug comes along, it calls scsi_remove_device, which
eventually calls device_del().  The problem is that device_del triggers
the ->remove methods of all the attached drivers and the sr_remove
method calls cdrom_unregister which throws away the cdrom device state,
even though the actual device has active references.

Some time later, the device is closed but there's now bogus state
because the sr_remove method has kfreed the struct scsi_cd which
contains the struct cdrom_device_info.

Now, the questions are, whose issue is this and how do we fix it?  I can
see that a driver needs early notification of unplugs so it can deny all
access to a gone device.  On the other hand, for a user land open where
we still have to hold resources in the driver, we'd like the driver to
have a notify when the device reference count drops to zero so we can
clean up.

This problem, by the way, exists in a lesser form for sd: the sd remove
method will free the device for reattachment even though it might have
active references.

James

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-01 23:32 ` James Bottomley
@ 2004-04-02  0:29   ` Steven Dake
  2004-04-02  8:43   ` Mike Anderson
  2004-04-02 23:36   ` James Bottomley
  2 siblings, 0 replies; 62+ messages in thread
From: Steven Dake @ 2004-04-02  0:29 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On Thu, 2004-04-01 at 16:32, James Bottomley wrote:
> On Thu, 2004-04-01 at 16:15, Andrew Morton wrote:
> > Apparently there is some controversy over whether this:
> > http://bugme.osdl.org/show_bug.cgi?id=2400 is a usb-storage problem or a
> > SCSI problem.
> 
> Well, I know why this happens, but I'm not entirely clear how to fix it.
> 
> The problem comes because the cdrom open and close take and release
> references to the SCSI generic device (as they're supposed to).
> 
> However, Upper level Drivers like sr are implemented as generic device
> drivers in the driverfs model.
> 
> When a USB unplug comes along, it calls scsi_remove_device, which
> eventually calls device_del().  The problem is that device_del triggers
> the ->remove methods of all the attached drivers and the sr_remove
> method calls cdrom_unregister which throws away the cdrom device state,
> even though the actual device has active references.
> 
> Some time later, the device is closed but there's now bogus state
> because the sr_remove method has kfreed the struct scsi_cd which
> contains the struct cdrom_device_info.
> 
> Now, the questions are, whose issue is this and how do we fix it?  I can
> see that a driver needs early notification of unplugs so it can deny all
> access to a gone device.  On the other hand, for a user land open where
> we still have to hold resources in the driver, we'd like the driver to
> have a notify when the device reference count drops to zero so we can
> clean up.
> 
> This problem, by the way, exists in a lesser form for sd: the sd remove
> method will free the device for reattachment even though it might have
> active references.
> 

One option to fix all this is to add a mechanism to blow away active
references to any block devices.  These references can be files,
filesystems, raid devices, lvm devices, etc.  This way, the remove can
be called because no active references are left.  The user gets a slew
of errors in the user application that has the open file on the
filesystem or block device, but atleast these can be managed (vs an
oopsed kernel).

Of course, adding forced unmount (required to blow away the references)
may not be desireable for 2.6 since it is invasive..  Perhaps something
to be considered for 2.7?

Thanks
-steve

> James
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-01 23:32 ` James Bottomley
  2004-04-02  0:29   ` Steven Dake
@ 2004-04-02  8:43   ` Mike Anderson
  2004-04-02 15:57     ` James Bottomley
  2004-04-02 23:36   ` James Bottomley
  2 siblings, 1 reply; 62+ messages in thread
From: Mike Anderson @ 2004-04-02  8:43 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List, stern

James Bottomley [James.Bottomley@steeleye.com] wrote:
> 
> Well, I know why this happens, but I'm not entirely clear how to fix it.
> 
> The problem comes because the cdrom open and close take and release
> references to the SCSI generic device (as they're supposed to).
> 
> However, Upper level Drivers like sr are implemented as generic device
> drivers in the driverfs model.
> 
> When a USB unplug comes along, it calls scsi_remove_device, which
> eventually calls device_del().  The problem is that device_del triggers
> the ->remove methods of all the attached drivers and the sr_remove
> method calls cdrom_unregister which throws away the cdrom device state,
> even though the actual device has active references.

Yes, we reordered some of this in sd. As your comment down below
indicates reordering will reduce the window but not eliminate it.

> 
> Some time later, the device is closed but there's now bogus state
> because the sr_remove method has kfreed the struct scsi_cd which
> contains the struct cdrom_device_info.
> 
> Now, the questions are, whose issue is this and how do we fix it?  I can
> see that a driver needs early notification of unplugs so it can deny all
> access to a gone device.  On the other hand, for a user land open where
> we still have to hold resources in the driver, we'd like the driver to
> have a notify when the device reference count drops to zero so we can
> clean up.
> 
> This problem, by the way, exists in a lesser form for sd: the sd remove
> method will free the device for reattachment even though it might have
> active references.
> 

I have looked at the sd issue off and on due to the previous open race
reported by Alan Stern. While the window can be narrowed inside SCSI you
need help for the calling subsystem to know when it is OK to cleanup and
your routine will not be called anymore. A similar problem also showed
up in the tear down the host directory entry in /proc/scsi but was only
fixed up so far due to its depreciated status.

http://marc.theaimsgroup.com/?t=105545175900001&r=1&w=2

I believe as indicated above that all cross subsystem registrations need
a release / put callback. This would allow the release chain to be
called from block -> ULD -> scsi core -> LLDD. 

Recently I have not been spending the proper time looking at this, but
last look it appeared that we needed to  add a release / put method call
to the gendisk disk_release routine. The release function or object to do
the put on would need to be set prior to the call to add_disk.

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-02  8:43   ` Mike Anderson
@ 2004-04-02 15:57     ` James Bottomley
  2004-04-02 16:45       ` Mike Anderson
  0 siblings, 1 reply; 62+ messages in thread
From: James Bottomley @ 2004-04-02 15:57 UTC (permalink / raw)
  To: Mike Anderson
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List, stern

On Fri, 2004-04-02 at 03:43, Mike Anderson wrote:
> I have looked at the sd issue off and on due to the previous open race
> reported by Alan Stern. While the window can be narrowed inside SCSI you
> need help for the calling subsystem to know when it is OK to cleanup and
> your routine will not be called anymore. A similar problem also showed
> up in the tear down the host directory entry in /proc/scsi but was only
> fixed up so far due to its depreciated status.
> 
> http://marc.theaimsgroup.com/?t=105545175900001&r=1&w=2
> 
> I believe as indicated above that all cross subsystem registrations need
> a release / put callback. This would allow the release chain to be
> called from block -> ULD -> scsi core -> LLDD. 
> 
> Recently I have not been spending the proper time looking at this, but
> last look it appeared that we needed to  add a release / put method call
> to the gendisk disk_release routine. The release function or object to do
> the put on would need to be set prior to the call to add_disk.

Actually, I can't believe this problem to be local entirely to SCSI. 
So, a simpler mechanism (and more globally useful) might be to have a
two phase driver release in sysfs:

- the current ->remove would stay where it is (as a notify on
device_del).  On receving this the driver begins clean up enough to drop
any internal references to the device it is holding.
- then introduce a ->release which is called as part of dropping the
last device reference where the driver cleans up any resources the
driver was keeping to service the removed but not released device.

Then, we'd obviously not call unregister_cdrom and kfree the scsi_cd
structure until ->release time.

James



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-02 15:57     ` James Bottomley
@ 2004-04-02 16:45       ` Mike Anderson
  2004-04-02 17:05         ` James Bottomley
  0 siblings, 1 reply; 62+ messages in thread
From: Mike Anderson @ 2004-04-02 16:45 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List, stern

James Bottomley [James.Bottomley@steeleye.com] wrote:
> > Recently I have not been spending the proper time looking at this, but
> > last look it appeared that we needed to  add a release / put method call
> > to the gendisk disk_release routine. The release function or object to do
> > the put on would need to be set prior to the call to add_disk.
> 
> Actually, I can't believe this problem to be local entirely to SCSI. 
> So, a simpler mechanism (and more globally useful) might be to have a
> two phase driver release in sysfs:

I would agree this should be more general than SCSI, but if kobjects are
being passed during the registration / setup in other subsystem
interfaces than this could already be handled.

> 
> - the current ->remove would stay where it is (as a notify on
> device_del).  On receving this the driver begins clean up enough to drop
> any internal references to the device it is holding.
> - then introduce a ->release which is called as part of dropping the
> last device reference where the driver cleans up any resources the
> driver was keeping to service the removed but not released device.
> 
> Then, we'd obviously not call unregister_cdrom and kfree the scsi_cd
> structure until ->release time.

Maybe some clarification here as I am unsure if we both think there
needs to be a notification (a put call) from outside SCSI. We have
release functions available on most objects in SCSI now. The issue is
that when we register (add_disk, dev_set_drvdata, etc.) or pass a handle
to another subsystem we need a reference count agreement to know when
the other subsystem is done with the the object. Something like the
put_device(parent) used in scsi_host_dev_release.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-02 16:45       ` Mike Anderson
@ 2004-04-02 17:05         ` James Bottomley
  2004-04-02 17:44           ` Mike Anderson
  0 siblings, 1 reply; 62+ messages in thread
From: James Bottomley @ 2004-04-02 17:05 UTC (permalink / raw)
  To: Mike Anderson
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List, stern

On Fri, 2004-04-02 at 11:45, Mike Anderson wrote:
> Maybe some clarification here as I am unsure if we both think there
> needs to be a notification (a put call) from outside SCSI. We have
> release functions available on most objects in SCSI now. The issue is
> that when we register (add_disk, dev_set_drvdata, etc.) or pass a handle
> to another subsystem we need a reference count agreement to know when
> the other subsystem is done with the the object. Something like the
> put_device(parent) used in scsi_host_dev_release.

Actually, no, that's not the issue here, if I understand you.  The
reference counting model on the sdev->sdev_gendev seems to be working
correctly because sr.c takes a reference to the sdev_gendev on open and
drops it on close.

The problem is that ULDs are implemented as struct device_drivers and as
such, their ->remove method gets called *not* on last put of sdev_gendev
but on device_del (when there are still active references).

sr.c frees the cdinfo structure on ->remove, but still has its own
reference to sdev_gendev (because the device is still open). On final
close, the generic cdrom code tries to use cdinfo to close the device
and references a kfree'd structure.  Really what sr.c wants to be doing
is freeing the cdinfo structure on last put, not on device_del.

James

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-02 17:05         ` James Bottomley
@ 2004-04-02 17:44           ` Mike Anderson
  2004-04-02 18:13             ` James Bottomley
  0 siblings, 1 reply; 62+ messages in thread
From: Mike Anderson @ 2004-04-02 17:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List, stern

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> On Fri, 2004-04-02 at 11:45, Mike Anderson wrote:
> > Maybe some clarification here as I am unsure if we both think there
> > needs to be a notification (a put call) from outside SCSI. We have
> > release functions available on most objects in SCSI now. The issue is
> > that when we register (add_disk, dev_set_drvdata, etc.) or pass a handle
> > to another subsystem we need a reference count agreement to know when
> > the other subsystem is done with the the object. Something like the
> > put_device(parent) used in scsi_host_dev_release.
> 
> Actually, no, that's not the issue here, if I understand you.  The
> reference counting model on the sdev->sdev_gendev seems to be working
> correctly because sr.c takes a reference to the sdev_gendev on open and
> drops it on close.
> 
> The problem is that ULDs are implemented as struct device_drivers and as
> such, their ->remove method gets called *not* on last put of sdev_gendev
> but on device_del (when there are still active references).

The remove can do as much or as little as the implementor wishes, but I
believe there is still a under lying issue here (see comment below).
> 
> sr.c frees the cdinfo structure on ->remove, but still has its own
> reference to sdev_gendev (because the device is still open). On final
> close, the generic cdrom code tries to use cdinfo to close the device
> and references a kfree'd structure.  Really what sr.c wants to be doing
> is freeing the cdinfo structure on last put, not on device_del.
> 

Where does the last put come from? How do you close the open race or
know when the final put_disk has been called? SCSI cannot do this alone
as we have created and registered an object in another subsystem
(alloc_disk and add_disk) and we have no indication when that objects
ref count has reached zero. 

Or

As I previous stated in the thread below I have the gendisk /
block layer locking mis-understood and there is something that SCSI can
do.

For reference you can look at this thread sent by Alan about a sd race.
http://marc.theaimsgroup.com/?t=107591185800003&r=1&w=2

While freeing in sr could be rearranged more like what sd does there is
still the issue of a cross subsystem put to know that the ULDs open
function will not be called again.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-02 17:44           ` Mike Anderson
@ 2004-04-02 18:13             ` James Bottomley
  2004-04-02 23:40               ` Mike Anderson
  0 siblings, 1 reply; 62+ messages in thread
From: James Bottomley @ 2004-04-02 18:13 UTC (permalink / raw)
  To: Mike Anderson
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List, stern

On Fri, 2004-04-02 at 12:44, Mike Anderson wrote:
> Where does the last put come from? How do you close the open race or
> know when the final put_disk has been called? SCSI cannot do this alone
> as we have created and registered an object in another subsystem
> (alloc_disk and add_disk) and we have no indication when that objects
> ref count has reached zero. 
> 
> Or
> 
> As I previous stated in the thread below I have the gendisk /
> block layer locking mis-understood and there is something that SCSI can
> do.

well, sr has elected to merge these, so it takes a reference to
sdev_gendev on first open and releases it on last close of the block
device.  This is what ties the SCSI model into the final put_disk().

We founder on calling driver ->remove before the final put of
sdev_gendev.

Anything with objects in more than one refcounted subsystem is
responsible for tying the refcounts together uniformly.

There should be no open race as long as we error out correctly if a
reference to the underlying sdev_gendev cannot be obtained (because the
object is being destroyed).  sr seems to do this correctly.  The
indication when the non-scsi object's refcount reaches zero is given to
us because at that point the sr code does a put of the sdev_gendev (and
if this is the last put, that should trigger cleanup).

James

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-01 23:32 ` James Bottomley
  2004-04-02  0:29   ` Steven Dake
  2004-04-02  8:43   ` Mike Anderson
@ 2004-04-02 23:36   ` James Bottomley
  2004-04-03  0:11     ` Mike Anderson
                       ` (2 more replies)
  2 siblings, 3 replies; 62+ messages in thread
From: James Bottomley @ 2004-04-02 23:36 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On Thu, 2004-04-01 at 18:32, James Bottomley wrote:
> Now, the questions are, whose issue is this and how do we fix it?  I can
> see that a driver needs early notification of unplugs so it can deny all
> access to a gone device.  On the other hand, for a user land open where
> we still have to hold resources in the driver, we'd like the driver to
> have a notify when the device reference count drops to zero so we can
> clean up.

OK, I think this is easily fixable in sr.c by moving around the release
code to do the right thing (and not drop the reference to sdev_gendev
until we're completely finished).

Jens, does this look OK (or have I just opened up another race window
somewhere else)?

James

===== drivers/scsi/sr.c 1.103 vs edited =====
--- 1.103/drivers/scsi/sr.c	Fri Apr  2 11:30:44 2004
+++ edited/drivers/scsi/sr.c	Fri Apr  2 17:29:06 2004
@@ -424,8 +424,19 @@
 
 static int sr_block_release(struct inode *inode, struct file *file)
 {
+	int ret;
 	struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk);
-	return cdrom_release(&cd->cdi, file);
+	struct scsi_device *sdev = cd->device;
+	ret = cdrom_release(&cd->cdi, file);
+	if(ret)
+		return ret;
+	
+	unregister_cdrom(&cd->cdi);
+	kfree(cd);
+
+	scsi_device_put(sdev);
+
+	return 0;
 }
 
 static int sr_block_ioctl(struct inode *inode, struct file *file, unsigned cmd,
@@ -500,7 +511,6 @@
 	if (cd->device->sector_size > 2048)
 		sr_set_blocklength(cd, 2048);
 
-	scsi_device_put(cd->device);
 }
 
 static int sr_probe(struct device *dev)
@@ -874,9 +884,6 @@
 	spin_unlock(&sr_index_lock);
 
 	put_disk(cd->disk);
-	unregister_cdrom(&cd->cdi);
-	kfree(cd);
-
 	return 0;
 }
 



-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-02 18:13             ` James Bottomley
@ 2004-04-02 23:40               ` Mike Anderson
  2004-04-03  0:25                 ` James Bottomley
  0 siblings, 1 reply; 62+ messages in thread
From: Mike Anderson @ 2004-04-02 23:40 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List, stern

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> On Fri, 2004-04-02 at 12:44, Mike Anderson wrote:
> > Where does the last put come from? How do you close the open race or
> > know when the final put_disk has been called? SCSI cannot do this alone
> > as we have created and registered an object in another subsystem
> > (alloc_disk and add_disk) and we have no indication when that objects
> > ref count has reached zero. 
> > 
> > Or
> > 
> > As I previous stated in the thread below I have the gendisk /
> > block layer locking mis-understood and there is something that SCSI can
> > do.
> 
> well, sr has elected to merge these, so it takes a reference to
> sdev_gendev on first open and releases it on last close of the block
> device.  This is what ties the SCSI model into the final put_disk().

I think there is a little more to tie this together. see below.

> 
> We founder on calling driver ->remove before the final put of
> sdev_gendev.

I do not believe we founder by calling, but what we do in the remove
function. The way sd is split out seems to match what you think remove
should do, but I do not have externally callable a release as it is
handled inside sd.

> 
> Anything with objects in more than one refcounted subsystem is
> responsible for tying the refcounts together uniformly.
> 
> There should be no open race as long as we error out correctly if a
> reference to the underlying sdev_gendev cannot be obtained (because the
> object is being destroyed).  sr seems to do this correctly.  The
> indication when the non-scsi object's refcount reaches zero is given to
> us because at that point the sr code does a put of the sdev_gendev (and
> if this is the last put, that should trigger cleanup).

Greg stopped by and after talking this over I think I see why sd is
racing in its current form. The race happens when sd_remove and do_open
race. Even though I do not like adding a lock_kernel it would appear
adding on to sd_remove would serialize sd_remove and do_open. This would
ensure either do_open's get_gendisk returns a gendisk struct and sd
ref's are incremented or we will start cleaning up and sd_open will not
be called.

I would believe similar alignment in sr.c to what sd is doing plus
agreement on the lock_kernel should fix both drivers.

I think the "error out correctly" on trying to get a ref on sdev_gendev
may need some higher serialization as I think there is a race on a release
function starting and the reference count trying to be taken to 1 (i.e.
you need something subsystem wide as you cannot look at the item you
maybe deleting.

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-02 23:36   ` James Bottomley
@ 2004-04-03  0:11     ` Mike Anderson
  2004-04-03  0:16       ` James Bottomley
  2004-04-05  4:33     ` Patrick Mansfield
  2004-04-05 14:03     ` Jens Axboe
  2 siblings, 1 reply; 62+ messages in thread
From: Mike Anderson @ 2004-04-03  0:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

James Bottomley [James.Bottomley@steeleye.com] wrote:
> ===== drivers/scsi/sr.c 1.103 vs edited =====
> --- 1.103/drivers/scsi/sr.c	Fri Apr  2 11:30:44 2004
> +++ edited/drivers/scsi/sr.c	Fri Apr  2 17:29:06 2004
> @@ -424,8 +424,19 @@
>  
>  static int sr_block_release(struct inode *inode, struct file *file)
>  {
> +	int ret;
>  	struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk);
> -	return cdrom_release(&cd->cdi, file);
> +	struct scsi_device *sdev = cd->device;
> +	ret = cdrom_release(&cd->cdi, file);
> +	if(ret)
> +		return ret;

It looks like cdrom_release always returns 0? Is there some other patch
outstanding that changes this.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-03  0:11     ` Mike Anderson
@ 2004-04-03  0:16       ` James Bottomley
  0 siblings, 0 replies; 62+ messages in thread
From: James Bottomley @ 2004-04-03  0:16 UTC (permalink / raw)
  To: Mike Anderson
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On Fri, 2004-04-02 at 19:11, Mike Anderson wrote:
> It looks like cdrom_release always returns 0? Is there some other patch
> outstanding that changes this.

Yes, not that I know of.

However, while it apparently has a return code indicating success or
failure, we should respect it [unless Jens wants to make it a void
function].

James
 


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-02 23:40               ` Mike Anderson
@ 2004-04-03  0:25                 ` James Bottomley
  2004-04-04  1:40                   ` Alan Stern
  0 siblings, 1 reply; 62+ messages in thread
From: James Bottomley @ 2004-04-03  0:25 UTC (permalink / raw)
  To: Mike Anderson
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List, stern

On Fri, 2004-04-02 at 18:40, Mike Anderson wrote:
> Greg stopped by and after talking this over I think I see why sd is
> racing in its current form. The race happens when sd_remove and do_open
> race. Even though I do not like adding a lock_kernel it would appear
> adding on to sd_remove would serialize sd_remove and do_open. This would
> ensure either do_open's get_gendisk returns a gendisk struct and sd
> ref's are incremented or we will start cleaning up and sd_open will not
> be called.
> 
> I would believe similar alignment in sr.c to what sd is doing plus
> agreement on the lock_kernel should fix both drivers.
> 
> I think the "error out correctly" on trying to get a ref on sdev_gendev
> may need some higher serialization as I think there is a race on a release
> function starting and the reference count trying to be taken to 1 (i.e.
> you need something subsystem wide as you cannot look at the item you
> maybe deleting.

I'm not convinced we need any other serialisation.  As long as we get
the reference we may use the device (of course if the device is being
removed and sd_remove is being called, then I/O to it will fail, but I
think that's fine). Leaving the user with an open device that won't
accept I/O is also fine (as long as it returns the correct error codes).

Could you outline what the consequences of this race are?

James



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-03  0:25                 ` James Bottomley
@ 2004-04-04  1:40                   ` Alan Stern
  2004-04-04 15:23                     ` James Bottomley
  0 siblings, 1 reply; 62+ messages in thread
From: Alan Stern @ 2004-04-04  1:40 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On 2 Apr 2004, James Bottomley wrote:

> On Fri, 2004-04-02 at 18:40, Mike Anderson wrote:
> > Greg stopped by and after talking this over I think I see why sd is
> > racing in its current form. The race happens when sd_remove and do_open
> > race. Even though I do not like adding a lock_kernel it would appear
> > adding on to sd_remove would serialize sd_remove and do_open. This would
> > ensure either do_open's get_gendisk returns a gendisk struct and sd
> > ref's are incremented or we will start cleaning up and sd_open will not
> > be called.
> > 
> > I would believe similar alignment in sr.c to what sd is doing plus
> > agreement on the lock_kernel should fix both drivers.
> > 
> > I think the "error out correctly" on trying to get a ref on sdev_gendev
> > may need some higher serialization as I think there is a race on a release
> > function starting and the reference count trying to be taken to 1 (i.e.
> > you need something subsystem wide as you cannot look at the item you
> > maybe deleting.
> 
> I'm not convinced we need any other serialisation.  As long as we get
> the reference we may use the device (of course if the device is being
> removed and sd_remove is being called, then I/O to it will fail, but I
> think that's fine). Leaving the user with an open device that won't
> accept I/O is also fine (as long as it returns the correct error codes).
> 
> Could you outline what the consequences of this race are?

Without having looked recently in any detail at the specific code in sd.c
or sr.c, I want to comment in general terms on the nature of the
open/disconnect race.

It's a generic problem that affects every driver whose devices can be
opened through the filesystem.  It _cannot_ be solved by adding any sort
of lock to the device structure.  This is because the device structure is
not available to the open() routine until it has followed some pointer
from the major/minor device table entry, and the disconnect() routine will
erase the pointer.  That (or someplace equivalent) is where the race
occurs.

The problem _can_ be solved by introducing a lock higher up, such as at
the driver level or at the bus level.  (A kernel lock would work too but
it would be extravagantly excessive.)  For example, the bus subsystem
rwsem in the driver model prevents analogous problems there.  But you
don't want to get a read lock on a bus-wide semaphore every time your
open() procedure runs!  A driver-wide lock makes a good solution.

Another possible solution would be to have disconnect() perform an RCU 
update to the device pointer.  I haven't seen any code that does this, but 
I think it ought to work.

Alan Stern

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-04  1:40                   ` Alan Stern
@ 2004-04-04 15:23                     ` James Bottomley
  2004-04-04 16:46                       ` Alan Stern
  2004-04-04 18:16                       ` David Brownell
  0 siblings, 2 replies; 62+ messages in thread
From: James Bottomley @ 2004-04-04 15:23 UTC (permalink / raw)
  To: Alan Stern
  Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On Sat, 2004-04-03 at 20:40, Alan Stern wrote:
> Without having looked recently in any detail at the specific code in sd.c
> or sr.c, I want to comment in general terms on the nature of the
> open/disconnect race.
> 
> It's a generic problem that affects every driver whose devices can be
> opened through the filesystem.  It _cannot_ be solved by adding any sort
> of lock to the device structure.  This is because the device structure is
> not available to the open() routine until it has followed some pointer
> from the major/minor device table entry, and the disconnect() routine will
> erase the pointer.  That (or someplace equivalent) is where the race
> occurs.

OK, your "problem" definition is that "there's a race", which I agree
with, I just don't agree that it's a problem.

Disconnections are fundamentally asynchronous events (a device may be
disconnected by the user at any stage regardless of what any kernel
internal state model is doing).  Trying to impose synchronisation on
asynchronous events is asking for trouble.

In the open race scenario, either the open is refused or the user gets a
fd that cannot do anything (because the device isn't there) and simply
returns errors to all operations.  Both cases are correct, so who wins
the race is irrelevant.

Let me illustrate: the user may disconnect the device then open it.  If
they open it before even the USB subsystem gets notified of the
disconnection then all the elaborate synchronisation in the world isn't
going to be able to prevent that (the device was gone when they opened
it, just nothing in the kernel knew that).  Since we cannot solve that
race, there's no reason to try to solve the "some parts of the kernel
know but others don't" part of the race.

James

> The problem _can_ be solved by introducing a lock higher up, such as at
> the driver level or at the bus level.  (A kernel lock would work too but
> it would be extravagantly excessive.)  For example, the bus subsystem
> rwsem in the driver model prevents analogous problems there.  But you
> don't want to get a read lock on a bus-wide semaphore every time your
> open() procedure runs!  A driver-wide lock makes a good solution.
> 
> Another possible solution would be to have disconnect() perform an RCU 
> update to the device pointer.  I haven't seen any code that does this, but 
> I think it ought to work.
> 
> Alan Stern
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-04 15:23                     ` James Bottomley
@ 2004-04-04 16:46                       ` Alan Stern
  2004-04-04 17:04                         ` James Bottomley
  2004-04-04 18:16                       ` David Brownell
  1 sibling, 1 reply; 62+ messages in thread
From: Alan Stern @ 2004-04-04 16:46 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On 4 Apr 2004, James Bottomley wrote:

> OK, your "problem" definition is that "there's a race", which I agree
> with, I just don't agree that it's a problem.
> 
> Disconnections are fundamentally asynchronous events (a device may be
> disconnected by the user at any stage regardless of what any kernel
> internal state model is doing).  Trying to impose synchronisation on
> asynchronous events is asking for trouble.
> 
> In the open race scenario, either the open is refused or the user gets a
> fd that cannot do anything (because the device isn't there) and simply
> returns errors to all operations.  Both cases are correct, so who wins
> the race is irrelevant.

Ah, you have left out the third, bad alternative: open succeeds, user gets 
an fd that points to a deallocated device.  More details below...

> Let me illustrate: the user may disconnect the device then open it.  If
> they open it before even the USB subsystem gets notified of the
> disconnection then all the elaborate synchronisation in the world isn't
> going to be able to prevent that (the device was gone when they opened
> it, just nothing in the kernel knew that).  Since we cannot solve that
> race, there's no reason to try to solve the "some parts of the kernel
> know but others don't" part of the race.

I agree with everything except your conclusion. :-)  Just because some
outcomes of the race lead to a benign result no matter what, that doesn't
mean all outcomes will or that we can ignore the race.

Let's consider a simple example, one that doesn't have all the
complexities of the sr driver with its multiple driver layers.  The
usb-skeleton program in drivers/usb makes a good illustration; I added a
semaphore to it some time ago to protect against exactly this sort of
race.  Without that semaphore, here's what can happen:

	Open process:			Disconnect process:

	Get minor number from inode
	Lookup USB interface using
	  minor number
	Get device pointer from the
	  interface's private data
	  and check it's not NULL
					Get device pointer from the
					  interface's private data
					Set the private data to NULL
					Lock the device sem
					Unregister the minor number
					Terminate ongoing I/O operations
					Clear the device->present flag
					Unlock the device
					Since the open count is 0,
					  deallocate the device structure
	Lock the device sem --> oops

The idea is that at some stage the open process has got far enough along
to believe the device exists, but not far enough to hold a reference to
it.  (That's inevitable, since you can't try to acquire a reference until
you're sure the device does exist.)  If the disconnect process deallocates
the device at that time, there will be trouble.

Alan Stern

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-04 16:46                       ` Alan Stern
@ 2004-04-04 17:04                         ` James Bottomley
  2004-04-05  3:17                           ` Alan Stern
  2004-04-05 13:30                           ` [linux-usb-devel] " Oliver Neukum
  0 siblings, 2 replies; 62+ messages in thread
From: James Bottomley @ 2004-04-04 17:04 UTC (permalink / raw)
  To: Alan Stern
  Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On Sun, 2004-04-04 at 12:46, Alan Stern wrote:
> Ah, you have left out the third, bad alternative: open succeeds, user gets 
> an fd that points to a deallocated device.  More details below...

No, that would be a bug. I'm looking for evidence that such a bug exists
in sd.

> Let's consider a simple example, one that doesn't have all the
> complexities of the sr driver with its multiple driver layers.  The
> usb-skeleton program in drivers/usb makes a good illustration; I added a
> semaphore to it some time ago to protect against exactly this sort of
> race.  Without that semaphore, here's what can happen:
> 
> 	Open process:			Disconnect process:
> 
> 	Get minor number from inode
> 	Lookup USB interface using
> 	  minor number
> 	Get device pointer from the
> 	  interface's private data
> 	  and check it's not NULL

This is what's wrong.  Here you should get a reference to the device
pointer (that's what scsi_device_get() actually does for us).  If the
ref getting routine comes back with an error, you may not proceed.  If
it comes back with a device, you own a reference to it and may go on
(even if the device is now gone, the structure will behave correctly).

> 					Get device pointer from the
> 					  interface's private data
> 					Set the private data to NULL
> 					Lock the device sem
> 					Unregister the minor number
> 					Terminate ongoing I/O operations
> 					Clear the device->present flag
> 					Unlock the device
> 					Since the open count is 0,
> 					  deallocate the device structure

You're not obeying the object lifetime rules here.  The device may be
gone but the device structure must stay around (obviously in a special
state) until the last reference is dropped.  Only after that may you
start nulling things out and killing it.

> 	Lock the device sem --> oops

This is because you're not using refcounted objects correctly.

If anyone can find a similar bug in the SCSI ULD's, I'll fix it, that's
why I asked for an example in sd (or sr with the current fixes) way back
in this thread.

James




-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [linux-usb-devel] Re: bug 2400
  2004-04-04 15:23                     ` James Bottomley
  2004-04-04 16:46                       ` Alan Stern
@ 2004-04-04 18:16                       ` David Brownell
  2004-04-04 18:42                         ` James Bottomley
  1 sibling, 1 reply; 62+ messages in thread
From: David Brownell @ 2004-04-04 18:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List

James Bottomley wrote:
> Let me illustrate: the user may disconnect the device then open it.  If
> they open it before even the USB subsystem gets notified of the
> disconnection then all the elaborate synchronisation in the world isn't
> going to be able to prevent that (the device was gone when they opened
> it, just nothing in the kernel knew that).  Since we cannot solve that
> race, there's no reason to try to solve the "some parts of the kernel
> know but others don't" part of the race.

You're assuming that synchronization is there to establish a single
global notion of state.  Clearly that's impossible; also undesirable.

The synchronization is actually there to let the "device gone" state
spread cleanly through the software stack.  By the time USB disconnect()
is called, host controller drivers (and khubd) have normally cleaned up
all hardware state, and usbcore is never going to accept another operation
on that device.  The disconnect() callback is there to prevent that
raciness from making trouble ... closing windows from the bottom up.

The way usb-storage passes that up to the SCSI layer is by calling
scsi_remove_host().  Bug 2400 shows up later, through the block
layer (or is it just cdrom?) code.  Did someone actually post the
specific source code line in cdrom_release() that's oopsing?

- Dave

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [linux-usb-devel] Re: bug 2400
  2004-04-04 18:16                       ` David Brownell
@ 2004-04-04 18:42                         ` James Bottomley
  2004-04-05  3:54                           ` David Brownell
  0 siblings, 1 reply; 62+ messages in thread
From: James Bottomley @ 2004-04-04 18:42 UTC (permalink / raw)
  To: David Brownell
  Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List

On Sun, 2004-04-04 at 14:16, David Brownell wrote:
> You're assuming that synchronization is there to establish a single
> global notion of state.  Clearly that's impossible; also undesirable.

Thank you.

> The synchronization is actually there to let the "device gone" state
> spread cleanly through the software stack.  By the time USB disconnect()
> is called, host controller drivers (and khubd) have normally cleaned up
> all hardware state, and usbcore is never going to accept another operation
> on that device.  The disconnect() callback is there to prevent that
> raciness from making trouble ... closing windows from the bottom up.

So you dispute this assertion in the email you quoted above:

On Sun, 2004-04-04 at 14:16, David Brownell wrote:
> James Bottomley wrote:
> > Since we cannot solve that
> > race, there's no reason to try to solve the "some parts of the kernel
> > know but others don't" part of the race.

On what basis?  This, I think, is the core of the differences between
us.  I don't see why an asynchronous event should proceed up the stack
in an orderly synchronised manner.

It goes like this:

- Initially, only the device knows, so commands outstanding time out
- Then, the USB driver knows, so it errors incoming commands (and
presumably returns with error any outstanding untimed out ones)
- Then, SCSI knows, so we forbid user I/O

The point is, that any I/O after disconnection gets an error ... the
error just comes from different places as the knowledge propagates
upwards.

> The way usb-storage passes that up to the SCSI layer is by calling
> scsi_remove_host().  Bug 2400 shows up later, through the block
> layer (or is it just cdrom?) code.  Did someone actually post the
> specific source code line in cdrom_release() that's oopsing?

Well, someone posted a patch a lot earlier in the thread ... we're on to
general hotplug principles now.

James



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-04 17:04                         ` James Bottomley
@ 2004-04-05  3:17                           ` Alan Stern
  2004-04-05 14:59                             ` Mike Anderson
                                               ` (2 more replies)
  2004-04-05 13:30                           ` [linux-usb-devel] " Oliver Neukum
  1 sibling, 3 replies; 62+ messages in thread
From: Alan Stern @ 2004-04-05  3:17 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On 4 Apr 2004, James Bottomley wrote:

> On Sun, 2004-04-04 at 12:46, Alan Stern wrote:
> > Ah, you have left out the third, bad alternative: open succeeds, user gets 
> > an fd that points to a deallocated device.  More details below...
> 
> No, that would be a bug.

Of course it would!  That's exactly what this thread is about: bugs caused
by improper handling of open/disconnect races.

>  I'm looking for evidence that such a bug exists
> in sd.

Put up or shut up, eh?  Okay...

> This is what's wrong.  Here you should get a reference to the device
> pointer (that's what scsi_device_get() actually does for us).

That's too vague; I can't tell what you're referring to.  In sd.c 
there's the struct scsi_device, the struct scsi_disk, the struct gendisk, 
and their embedded struct devices and struct kobjects.  Maybe you mean 
scsi_disk_get(), the one place in sd.c that calls scsi_device_get().  I 
would say that it acquires a reference to the device, not to the device 
pointer (whatever that might mean).  But let's not dwell on this point.

>  If the
> ref getting routine comes back with an error, you may not proceed.  If
> it comes back with a device, you own a reference to it and may go on
> (even if the device is now gone, the structure will behave correctly).

What if the ref getting routine causes an oops because the device has been 
deallocated?  I know, you'll say that should never happen...  See below.

> You're not obeying the object lifetime rules here.  The device may be
> gone but the device structure must stay around (obviously in a special
> state) until the last reference is dropped.  Only after that may you
> start nulling things out and killing it.

In the example I gave, the device structure _did_ stay around until the 
last reference was dropped.  The problem was that _another_ reference was 
in the middle of being acquired at the time.

> If anyone can find a similar bug in the SCSI ULD's, I'll fix it, that's
> why I asked for an example in sd (or sr with the current fixes) way back
> in this thread.

All right, let's look at sd.c.  I'll show you that _it_ doesn't obey the 
object lifetime rules.  In sd_open we see this code (lightly edited):

	static int sd_open(struct inode *inode, struct file *filp)
	{
		struct gendisk *disk = inode->i_bdev->bd_disk;
		struct scsi_disk *sdkp = scsi_disk(disk);
		struct scsi_device *sdev;
		int retval;

		retval = scsi_disk_get(sdkp);
		if (retval)
			return retval;

		sdev = sdkp->device;

As it turns out, the block layer guarantees that when sd_open runs the
bd_disk pointer will be valid.  It does this by following the pattern I
mentioned in an earlier message -- drivers/base/map.c uses a
subsystem-wide semaphore, domain_sem, to properly synchronize lookups and
deletes.

Next, the scsi_disk inline function returns:

	container_of(disk->private_data, struct scsi_disk, driver);

How do you know that the scsi_disk pointed to by disk->private_data still 
exists?  So far as I can see, the gendisk doesn't take any references to 
it.  Correct me if I'm wrong, but there doesn't seem to be anything 
preventing a disconnect event from arriving after the open() call has got 
a valid reference to the gendisk, and succeeding in deallocating the 
scsi_disk before this code executes.  There's only one reference between 
the scsi_disk and the gendisk, and it goes the wrong way: the scsi_disk 
owns a reference to the gendisk.

But let's suppose that works okay, so sdkp is a valid pointer.  Then 
the code calls scsi_disk_get(), which in turn calls scsi_device_get() for 
sdkp->device.  How do you know that this doesn't point to deallocated 
storage?  The only reference to the scsi_device is taken (in a rather 
convoluted way) by the gendisk, and it is dropped during del_gendisk() -- 
not when the gendisk is released.  Hence it is entirely possible for a 
disconnect event to have freed the scsi_device when this code executes.

There's two potential oopses for you.  I don't have a full grasp of the
web of interlocking references (and interlocking code) in the SCSI,
gendisk, and block layers, but it seems likely that at least one of
these might actually happen.

The object lifetime rules require that in your disconnect() routine, you
must tell all your users that your structure is going away, but you must
not free the structure until your users have notified you that they won't
try to use it any more.  When the scsi_disk is on its way out, sd.c tells
the gendisk but doesn't wait for a notification in return.  When the
scsi_device is on its way out the SCSI core tells sd.c, but sd.c doesn't
send back its notification at the right time.

Alan Stern

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [linux-usb-devel] Re: bug 2400
  2004-04-04 18:42                         ` James Bottomley
@ 2004-04-05  3:54                           ` David Brownell
  2004-04-05 21:44                             ` James Bottomley
  0 siblings, 1 reply; 62+ messages in thread
From: David Brownell @ 2004-04-05  3:54 UTC (permalink / raw)
  To: James Bottomley
  Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List

James Bottomley wrote:
> So you dispute this assertion in the email you quoted above:
> 
>>>Since we cannot solve that
>>>race, there's no reason to try to solve the "some parts of the kernel
>>>know but others don't" part of the race.
> 
> 
> On what basis?  This, I think, is the core of the differences between

On the basis of faulty assumptions embedded therein.  Notably
"cannot solve that race" (I elaborate below on exactly how that
information _is_ passed along from USB, which is specifically
what you said "cannot" be solved) ... so also the conclusion
that there's "no reason to try" fixing things.

> us.  I don't see why an asynchronous event should proceed up the stack
> in an orderly synchronised manner.

Some such events don't; but this isn't one of them.  And the
reason it _does_ is to make sure drivers can always prevent
oopsing on device unplug ... and to help eliminate HCD-specific
behaviors from the equation.  (There were a lot of those in 2.4,
which made robust device unplugging rather hard to achieve.)

> It goes like this:
> 
> - Initially, only the device knows, so commands outstanding time out

Well, hardware timeouts or other protocol faults will happen
when the device doesn't respond to I/O from the host.  Those
will be reported in the usual way.  When they happen in other
contexts, such faults may be recoverable ... there's no way
yet to know that these particular ones won't be.

That ends when khubd gets notified, by the hub, that the device
is gone.  At which point (a) the USB device gets marked as gone,
so that usbcore will reject further requests with -ESHUTDOWN,
(b) all pending I/Os are canceled with -ESHUTDOWN status, and
except for UHCI, (c) HCDs force those the URBs back to drivers,
as part of cleaning up the remaining hardware state.

At that point, usbcore and the HCD are all but done with the
device, and the only question is when the memory associated
with the usb_interface and usb_device objects gets its last
reference dropped, so the memory gets freed.  The current UHCI
answer may surprise some drivers, even though the 2.4 stack
could do the same thing.  The time from (a) to (c) will usually
be tens of milliseconds, at most; average time before khubd
notices will be around 1/8 second.

Then (d) for each driver bound to an interface on that device,
the USB disconnect() method is called.  Driver responsibility
is to drop all those interface and device references ASAP.

Plus, in the typical case where drivers are using implicit
refcounts to claim interface/device handles, never issuing
requests using them after the disconnect() returns, since that
releases the implicit reference acquired during probe().

> - Then, the USB driver knows, so it errors incoming commands (and
> presumably returns with error any outstanding untimed out ones)

I don't think any single point (a)-(d) matches that
description.  As soon as it gets -ESHUTDOWN it "knows",
even before (d), for example ... though it's more
natural to wait till (d) before starting to clean up
the device state.

> - Then, SCSI knows, so we forbid user I/O

SCSI knows by the fact that the host was unregistered in (d),
rarely more than ~300 msec after physical disconnect.

The problem here was that the "forbid" is borked.  Maybe not
in SCSI (there are other layers involved), but oopsably.

> The point is, that any I/O after disconnection gets an error ... the
> error just comes from different places as the knowledge propagates
> upwards.

Which directly follows from what I said ... USB propagates
that knowledge in carefully defined ways.   Other layers can
do the same, although clearly state associated with open file
descriptors needs to use a slightly different strategy.  And
that strategy is what Alan's original comment was about.

- Dave

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-02 23:36   ` James Bottomley
  2004-04-03  0:11     ` Mike Anderson
@ 2004-04-05  4:33     ` Patrick Mansfield
  2004-04-05 14:09       ` James Bottomley
  2004-04-05 21:07       ` James Bottomley
  2004-04-05 14:03     ` Jens Axboe
  2 siblings, 2 replies; 62+ messages in thread
From: Patrick Mansfield @ 2004-04-05  4:33 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On Fri, Apr 02, 2004 at 06:36:55PM -0500, James Bottomley wrote:

> ===== drivers/scsi/sr.c 1.103 vs edited =====
> --- 1.103/drivers/scsi/sr.c	Fri Apr  2 11:30:44 2004
> +++ edited/drivers/scsi/sr.c	Fri Apr  2 17:29:06 2004
> @@ -424,8 +424,19 @@
>  
>  static int sr_block_release(struct inode *inode, struct file *file)
>  {
> +	int ret;
>  	struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk);
> -	return cdrom_release(&cd->cdi, file);
> +	struct scsi_device *sdev = cd->device;
> +	ret = cdrom_release(&cd->cdi, file);
> +	if(ret)
> +		return ret;
> +	
> +	unregister_cdrom(&cd->cdi);
> +	kfree(cd);
> +
> +	scsi_device_put(sdev);
> +
> +	return 0;
>  }
>  
>  static int sr_block_ioctl(struct inode *inode, struct file *file, unsigned cmd,
> @@ -500,7 +511,6 @@
>  	if (cd->device->sector_size > 2048)
>  		sr_set_blocklength(cd, 2048);
>  
> -	scsi_device_put(cd->device);
>  }
>  
>  static int sr_probe(struct device *dev)
> @@ -874,9 +884,6 @@
>  	spin_unlock(&sr_index_lock);
>  
>  	put_disk(cd->disk);
> -	unregister_cdrom(&cd->cdi);
> -	kfree(cd);
> -
>  	return 0;
>  }

How is unregister_cdrom(&cd->cdi) called if the device is not open?

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [linux-usb-devel] Re: bug 2400
  2004-04-04 17:04                         ` James Bottomley
  2004-04-05  3:17                           ` Alan Stern
@ 2004-04-05 13:30                           ` Oliver Neukum
  1 sibling, 0 replies; 62+ messages in thread
From: Oliver Neukum @ 2004-04-05 13:30 UTC (permalink / raw)
  To: James Bottomley, Alan Stern
  Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List


> > 	Open process:			Disconnect process:
> >
> > 	Get minor number from inode
> > 	Lookup USB interface using
> > 	  minor number
> > 	Get device pointer from the
> > 	  interface's private data
> > 	  and check it's not NULL
>
> This is what's wrong.  Here you should get a reference to the device
> pointer (that's what scsi_device_get() actually does for us).  If the
> ref getting routine comes back with an error, you may not proceed.  If
> it comes back with a device, you own a reference to it and may go on
> (even if the device is now gone, the structure will behave correctly).

You are correct, if and only if the code doing the lookup and getting
the reference is atomic with respect to freeing the data structure.
Which lock is supposed to assure that?

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-02 23:36   ` James Bottomley
  2004-04-03  0:11     ` Mike Anderson
  2004-04-05  4:33     ` Patrick Mansfield
@ 2004-04-05 14:03     ` Jens Axboe
  2004-04-05 21:08       ` James Bottomley
  2 siblings, 1 reply; 62+ messages in thread
From: Jens Axboe @ 2004-04-05 14:03 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Morton, greg, linux-usb-devel, SCSI Mailing List

On Fri, Apr 02 2004, James Bottomley wrote:
> On Thu, 2004-04-01 at 18:32, James Bottomley wrote:
> > Now, the questions are, whose issue is this and how do we fix it?  I can
> > see that a driver needs early notification of unplugs so it can deny all
> > access to a gone device.  On the other hand, for a user land open where
> > we still have to hold resources in the driver, we'd like the driver to
> > have a notify when the device reference count drops to zero so we can
> > clean up.
> 
> OK, I think this is easily fixable in sr.c by moving around the release
> code to do the right thing (and not drop the reference to sdev_gendev
> until we're completely finished).
> 
> Jens, does this look OK (or have I just opened up another race window
> somewhere else)?

Doesn't quite work, how about something like this as a work-around?

diff -ur /mnt/kscratch/linux-2.6.4/drivers/cdrom/cdrom.c linux-2.6.4-SUSE-20040402/drivers/cdrom/cdrom.c
--- /mnt/kscratch/linux-2.6.4/drivers/cdrom/cdrom.c	2004-04-02 14:53:42.000000000 +0200
+++ linux-2.6.4-SUSE-20040402/drivers/cdrom/cdrom.c	2004-04-05 15:43:01.603675727 +0200
@@ -379,6 +379,8 @@
 #endif /* CONFIG_SYSCTL */ 
 	}
 
+	atomic_set(&cdi->all_use, 1);
+
 	ENSURE(drive_status, CDC_DRIVE_STATUS );
 	ENSURE(media_changed, CDC_MEDIA_CHANGED);
 	ENSURE(tray_move, CDC_CLOSE_TRAY | CDC_OPEN_TRAY);
@@ -425,6 +427,9 @@
 	struct cdrom_device_info *cdi, *prev;
 	cdinfo(CD_OPEN, "entering unregister_cdrom\n"); 
 
+	if (!atomic_dec_and_test(&unreg->all_use))
+		return -EBUSY;
+
 	prev = NULL;
 	spin_lock(&cdrom_lock);
 	cdi = topCdromPtr;
@@ -891,6 +896,8 @@
 	if (ret)
 		goto err;
 
+	atomic_inc(&cdi->all_use);
+
 	cdinfo(CD_OPEN, "Use count for \"/dev/%s\" now %d\n",
 			cdi->name, cdi->use_count);
 	/* Do this on open.  Don't wait for mount, because they might
@@ -1096,6 +1103,7 @@
 		    cdi->options & CDO_AUTO_EJECT && CDROM_CAN(CDC_OPEN_TRAY))
 			cdo->tray_move(cdi, 1);
 	}
+
 	return 0;
 }
 
diff -ur /mnt/kscratch/linux-2.6.4/drivers/scsi/sr.c linux-2.6.4-SUSE-20040402/drivers/scsi/sr.c
--- /mnt/kscratch/linux-2.6.4/drivers/scsi/sr.c	2004-04-02 14:53:42.000000000 +0200
+++ linux-2.6.4-SUSE-20040402/drivers/scsi/sr.c	2004-04-05 15:57:35.577714398 +0200
@@ -419,13 +419,36 @@
 static int sr_block_open(struct inode *inode, struct file *file)
 {
 	struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk);
-	return cdrom_open(&cd->cdi, inode, file);
+	struct scsi_device *sdev = cd->device;
+	int retval;
+
+	retval = scsi_device_get(sdev);
+	if (retval)
+		return retval;
+	
+	retval = cdrom_open(&cd->cdi, inode, file);
+	if (retval)
+		scsi_device_put(sdev);
+
+	return retval;
+}
+
+static void sr_cleanup(struct scsi_cd *cd)
+{
+	if (!unregister_cdrom(&cd->cdi)) {
+		scsi_device_put(cd->device);
+		kfree(cd);
+	}
 }
 
 static int sr_block_release(struct inode *inode, struct file *file)
 {
 	struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk);
-	return cdrom_release(&cd->cdi, file);
+	int ret;
+
+	ret = cdrom_release(&cd->cdi, file);
+	sr_cleanup(cd);
+	return ret;
 }
 
 static int sr_block_ioctl(struct inode *inode, struct file *file, unsigned cmd,
@@ -467,10 +490,6 @@
 	struct scsi_device *sdev = cd->device;
 	int retval;
 
-	retval = scsi_device_get(sdev);
-	if (retval)
-		return retval;
-	
 	/*
 	 * If the device is in error recovery, wait until it is done.
 	 * If the device is offline, then disallow any access to it.
@@ -499,8 +518,6 @@
 
 	if (cd->device->sector_size > 2048)
 		sr_set_blocklength(cd, 2048);
-
-	scsi_device_put(cd->device);
 }
 
 static int sr_probe(struct device *dev)
@@ -874,8 +891,8 @@
 	spin_unlock(&sr_index_lock);
 
 	put_disk(cd->disk);
-	unregister_cdrom(&cd->cdi);
-	kfree(cd);
+
+	sr_cleanup(cd);
 
 	return 0;
 }
diff -ur /mnt/kscratch/linux-2.6.4/include/linux/cdrom.h linux-2.6.4-SUSE-20040402/include/linux/cdrom.h
--- /mnt/kscratch/linux-2.6.4/include/linux/cdrom.h	2004-04-02 14:53:42.000000000 +0200
+++ linux-2.6.4-SUSE-20040402/include/linux/cdrom.h	2004-04-05 15:10:46.347243055 +0200
@@ -916,6 +916,7 @@
 
 /* Uniform cdrom data structures for cdrom.c */
 struct cdrom_device_info {
+	atomic_t all_use;
 	struct cdrom_device_ops  *ops;  /* link to device_ops */
 	struct cdrom_device_info *next; /* next device_info for this major */
 	struct gendisk *disk;		/* matching block layer disk */

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-05  4:33     ` Patrick Mansfield
@ 2004-04-05 14:09       ` James Bottomley
  2004-04-05 21:07       ` James Bottomley
  1 sibling, 0 replies; 62+ messages in thread
From: James Bottomley @ 2004-04-05 14:09 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On Sun, 2004-04-04 at 23:33, Patrick Mansfield wrote:
> How is unregister_cdrom(&cd->cdi) called if the device is not open?

Yes, I need to pull the same kobject trick that sd does for the same
reason.

James



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-05  3:17                           ` Alan Stern
@ 2004-04-05 14:59                             ` Mike Anderson
  2004-04-05 21:27                             ` James Bottomley
  2004-04-05 22:10                             ` Patrick Mansfield
  2 siblings, 0 replies; 62+ messages in thread
From: Mike Anderson @ 2004-04-05 14:59 UTC (permalink / raw)
  To: Alan Stern
  Cc: James Bottomley, Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List


PS, I am traveling today so future comments will be delayed a bit.

Alan Stern [stern@rowland.harvard.edu] wrote:
> All right, let's look at sd.c.  I'll show you that _it_ doesn't obey the 
> object lifetime rules.  In sd_open we see this code (lightly edited):
> 
> 
> 	static int sd_open(struct inode *inode, struct file *filp)
> 	{
> 		struct gendisk *disk = inode->i_bdev->bd_disk;
> 		struct scsi_disk *sdkp = scsi_disk(disk);
> 		struct scsi_device *sdev;
> 		int retval;
> 
> 		retval = scsi_disk_get(sdkp);
> 		if (retval)
> 			return retval;
> 
> 		sdev = sdkp->device;
> 
> 
> As it turns out, the block layer guarantees that when sd_open runs the
> bd_disk pointer will be valid.  It does this by following the pattern I
> mentioned in an earlier message -- drivers/base/map.c uses a
> subsystem-wide semaphore, domain_sem, to properly synchronize lookups and
> deletes.
> 
> Next, the scsi_disk inline function returns:
> 
> 	container_of(disk->private_data, struct scsi_disk, driver);
> 
> How do you know that the scsi_disk pointed to by disk->private_data still 
> exists?  So far as I can see, the gendisk doesn't take any references to 
> it.  Correct me if I'm wrong, but there doesn't seem to be anything 
> preventing a disconnect event from arriving after the open() call has got 
> a valid reference to the gendisk, and succeeding in deallocating the 
> scsi_disk before this code executes.  There's only one reference between 
> the scsi_disk and the gendisk, and it goes the wrong way: the scsi_disk 
> owns a reference to the gendisk.
> 
> But let's suppose that works okay, so sdkp is a valid pointer.  Then 
> the code calls scsi_disk_get(), which in turn calls scsi_device_get() for 
> sdkp->device.  How do you know that this doesn't point to deallocated 
> storage?  The only reference to the scsi_device is taken (in a rather 
> convoluted way) by the gendisk, and it is dropped during del_gendisk() -- 
> not when the gendisk is released.  Hence it is entirely possible for a 
> disconnect event to have freed the scsi_device when this code executes.
> 
> 
> There's two potential oopses for you.  I don't have a full grasp of the
> web of interlocking references (and interlocking code) in the SCSI,
> gendisk, and block layers, but it seems likely that at least one of
> these might actually happen.
> 
> The object lifetime rules require that in your disconnect() routine, you
> must tell all your users that your structure is going away, but you must
> not free the structure until your users have notified you that they won't
> try to use it any more.  When the scsi_disk is on its way out, sd.c tells
> the gendisk but doesn't wait for a notification in return.  When the
> scsi_device is on its way out the SCSI core tells sd.c, but sd.c doesn't
> send back its notification at the right time.

As I previously stated there is no notification back from the block
layer that users are complete with a structure. Currently the only
method to prevent an oops looks like sd_remove would need to use
lock_kernel so that scsi could ensure that a user would be through open
which means the sd_remove would not take references to zero or the user
has not made it far enough through do open that they have received a
gendisk structure so that del_gendisk will ensure they do not call
sd_open.

I would like another method, but this looks like it would need to be
another shared sync mechanism between scsi layer (an other blk
interfaces) and block layer or a lookup method similar to kobj_lookup
in scsi open routines so that a object can be unmapped atomically.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-05  4:33     ` Patrick Mansfield
  2004-04-05 14:09       ` James Bottomley
@ 2004-04-05 21:07       ` James Bottomley
  2004-04-06  9:22         ` Jens Axboe
  2004-04-08 23:06         ` Greg KH
  1 sibling, 2 replies; 62+ messages in thread
From: James Bottomley @ 2004-04-05 21:07 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On Sun, 2004-04-04 at 23:33, Patrick Mansfield wrote:
> How is unregister_cdrom(&cd->cdi) called if the device is not open?

OK, the attached does everything correctly (as I should have done at
first) by using a kobject to hold the compound references.

I've stressed this one nicely.  Unfortunately, it's showing up a bug in
scsi_sysfs (hold device open, remove it and re-add it at the same HCTL
and you'll see an "error 1").

James

===== drivers/scsi/sr.c 1.103 vs edited =====
--- 1.103/drivers/scsi/sr.c	Fri Apr  2 11:30:44 2004
+++ edited/drivers/scsi/sr.c	Mon Apr  5 15:59:34 2004
@@ -113,6 +113,28 @@
 	.generic_packet		= sr_packet,
 };
 
+static void sr_kobject_release(struct kobject *kobj);
+
+static struct kobj_type scsi_cdrom_kobj_type = {
+	.release = sr_kobject_release,
+};
+
+/*
+ * The get and put routines for the struct scsi_cd.  Note this entity
+ * has a scsi_device pointer and owns a reference to this.
+ */
+static inline int scsi_cd_get(struct scsi_cd *cd)
+{
+	if (!kobject_get(&cd->kobj))
+		return -ENODEV;
+	return 0;
+}
+
+static inline void scsi_cd_put(struct scsi_cd *cd)
+{
+	kobject_put(&cd->kobj);
+}
+
 /*
  * This function checks to see if the media has been changed in the
  * CDROM drive.  It is possible that we have already sensed a change,
@@ -424,8 +446,15 @@
 
 static int sr_block_release(struct inode *inode, struct file *file)
 {
+	int ret;
 	struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk);
-	return cdrom_release(&cd->cdi, file);
+	ret = cdrom_release(&cd->cdi, file);
+	if(ret)
+		return ret;
+	
+	scsi_cd_put(cd);
+
+	return 0;
 }
 
 static int sr_block_ioctl(struct inode *inode, struct file *file, unsigned cmd,
@@ -467,7 +496,7 @@
 	struct scsi_device *sdev = cd->device;
 	int retval;
 
-	retval = scsi_device_get(sdev);
+	retval = scsi_cd_get(cd);
 	if (retval)
 		return retval;
 	
@@ -489,7 +518,7 @@
 	return 0;
 
 error_out:
-	scsi_device_put(sdev);
+	scsi_cd_put(cd);
 	return retval;	
 }
 
@@ -500,7 +529,6 @@
 	if (cd->device->sector_size > 2048)
 		sr_set_blocklength(cd, 2048);
 
-	scsi_device_put(cd->device);
 }
 
 static int sr_probe(struct device *dev)
@@ -514,12 +542,18 @@
 	if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
 		goto fail;
 
+	if ((err = scsi_device_get(sdev)) != 0)
+		goto fail_put_sdev;
+
 	error = -ENOMEM;
 	cd = kmalloc(sizeof(*cd), GFP_KERNEL);
 	if (!cd)
 		goto fail;
 	memset(cd, 0, sizeof(*cd));
 
+	kobject_init(&cd->kobj);
+	cd->kobj.ktype = &scsi_cdrom_kobj_type;
+
 	disk = alloc_disk(1);
 	if (!disk)
 		goto fail_free;
@@ -588,6 +622,8 @@
 	put_disk(disk);
 fail_free:
 	kfree(cd);
+fail_put_sdev:
+	scsi_device_put(sdev);
 fail:
 	return error;
 }
@@ -863,19 +899,31 @@
 	return cgc->stat;
 }
 
-static int sr_remove(struct device *dev)
+static void sr_kobject_release(struct kobject *kobj)
 {
-	struct scsi_cd *cd = dev_get_drvdata(dev);
-
-	del_gendisk(cd->disk);
+	struct scsi_cd *cd = container_of(kobj, struct scsi_cd, kobj);
+	struct scsi_device *sdev = cd->device;
 
 	spin_lock(&sr_index_lock);
 	clear_bit(cd->disk->first_minor, sr_index_bits);
 	spin_unlock(&sr_index_lock);
 
-	put_disk(cd->disk);
 	unregister_cdrom(&cd->cdi);
+
+	put_disk(cd->disk);
+
 	kfree(cd);
+
+	scsi_device_put(sdev);
+}
+
+static int sr_remove(struct device *dev)
+{
+	struct scsi_cd *cd = dev_get_drvdata(dev);
+
+	del_gendisk(cd->disk);
+
+	scsi_cd_put(cd);
 
 	return 0;
 }
===== drivers/scsi/sr.h 1.10 vs edited =====
--- 1.10/drivers/scsi/sr.h	Mon May 26 04:50:43 2003
+++ edited/drivers/scsi/sr.h	Mon Apr  5 15:51:37 2004
@@ -36,6 +36,9 @@
 	unsigned readcd_known:1;	/* drive supports READ_CD (0xbe) */
 	unsigned readcd_cdda:1;	/* reading audio data using READ_CD */
 	struct cdrom_device_info cdi;
+	/* We hold gendisk and scsi_device references on probe and use
+	 * the refs on this kobj to decide when to release them */
+	struct kobject kobj;
 	struct gendisk *disk;
 } Scsi_CD;
 



-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-05 14:03     ` Jens Axboe
@ 2004-04-05 21:08       ` James Bottomley
  2004-04-06  9:22         ` Jens Axboe
  0 siblings, 1 reply; 62+ messages in thread
From: James Bottomley @ 2004-04-05 21:08 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, greg, linux-usb-devel, SCSI Mailing List

On Mon, 2004-04-05 at 09:03, Jens Axboe wrote:
> Doesn't quite work, how about something like this as a work-around?

I'm not sure we want to be introducing yet another refcount.  What was
the problem?

James




-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-05  3:17                           ` Alan Stern
  2004-04-05 14:59                             ` Mike Anderson
@ 2004-04-05 21:27                             ` James Bottomley
  2004-04-06 14:00                               ` Alan Stern
  2004-04-05 22:10                             ` Patrick Mansfield
  2 siblings, 1 reply; 62+ messages in thread
From: James Bottomley @ 2004-04-05 21:27 UTC (permalink / raw)
  To: Alan Stern
  Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On Sun, 2004-04-04 at 22:17, Alan Stern wrote:
> On 4 Apr 2004, James Bottomley wrote:
> > On Sun, 2004-04-04 at 12:46, Alan Stern wrote:
> > > Ah, you have left out the third, bad alternative: open succeeds, user gets 
> > > an fd that points to a deallocated device.  More details below...
> > 
> > No, that would be a bug.
> 
> Of course it would!  That's exactly what this thread is about: bugs caused
> by improper handling of open/disconnect races.

So you're quietly shifting your ground away from this assertion:

On Sat, 2004-04-03 at 19:40, Alan Stern wrote:
> The problem _can_ be solved by introducing a lock higher up, such as at
> the driver level or at the bus level.  (A kernel lock would work too but
> it would be extravagantly excessive.)  For example, the bus subsystem
> rwsem in the driver model prevents analogous problems there.  But you
> don't want to get a read lock on a bus-wide semaphore every time your
> open() procedure runs!  A driver-wide lock makes a good solution.
> 
> Another possible solution would be to have disconnect() perform an RCU 
> update to the device pointer.  I haven't seen any code that does this, but 
> I think it ought to work.

?

My contention is that the races can be solved by proper refcounting
(without the need for locks and RCUs) not that we don't have any bugs in
sd.c (I'll be happy if I can pull them all out of sr at the moment).

James




-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Re: bug 2400
  2004-04-05  3:54                           ` David Brownell
@ 2004-04-05 21:44                             ` James Bottomley
  2004-04-05 23:23                               ` [linux-usb-devel] " David Brownell
  0 siblings, 1 reply; 62+ messages in thread
From: James Bottomley @ 2004-04-05 21:44 UTC (permalink / raw)
  To: David Brownell
  Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List

On Sun, 2004-04-04 at 22:54, David Brownell wrote:
> Which directly follows from what I said ... USB propagates
> that knowledge in carefully defined ways.   Other layers can
> do the same, although clearly state associated with open file
> descriptors needs to use a slightly different strategy.  And
> that strategy is what Alan's original comment was about.

Well, again, this is at the core of the argument.

I say that as long as we have all our objects correctly refcounted, how
disconnections propagate up the stack is irrelevant.

If you have to have a "carefully defined" order for the propagation of
an asynchronous event to save you from oopsing, it's a sure sign of bugs
in the code.

My reasoning is that I/O down the stack will ultimately hit the part of
the kernel (or even just timeout in transmission to the device if it's
disappeared without trace) that contains the knowledge and be returned
with an error.  Ordering the disconnection propagation cannot change
this fact, merely alter *which* component returns the error.

As long as we're robust to the error there is no problem.

At this point, the object lifetime does nothing more that count how long
we have to keep the object around to return the error.

James

-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-05  3:17                           ` Alan Stern
  2004-04-05 14:59                             ` Mike Anderson
  2004-04-05 21:27                             ` James Bottomley
@ 2004-04-05 22:10                             ` Patrick Mansfield
  2004-04-06 14:10                               ` Alan Stern
  2004-04-08 14:09                               ` Alan Stern
  2 siblings, 2 replies; 62+ messages in thread
From: Patrick Mansfield @ 2004-04-05 22:10 UTC (permalink / raw)
  To: Alan Stern
  Cc: James Bottomley, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List

On Sun, Apr 04, 2004 at 11:17:55PM -0400, Alan Stern wrote:

> As it turns out, the block layer guarantees that when sd_open runs the
> bd_disk pointer will be valid.  It does this by following the pattern I
> mentioned in an earlier message -- drivers/base/map.c uses a
> subsystem-wide semaphore, domain_sem, to properly synchronize lookups and
> deletes.
> 
> Next, the scsi_disk inline function returns:
> 
> 	container_of(disk->private_data, struct scsi_disk, driver);
> 
> How do you know that the scsi_disk pointed to by disk->private_data still 
> exists?  So far as I can see, the gendisk doesn't take any references to 
> it.  Correct me if I'm wrong, but there doesn't seem to be anything 
> preventing a disconnect event from arriving after the open() call has got 
> a valid reference to the gendisk, and succeeding in deallocating the 
> scsi_disk before this code executes.  There's only one reference between 
> the scsi_disk and the gendisk, and it goes the wrong way: the scsi_disk 
> owns a reference to the gendisk.
> 
> But let's suppose that works okay, so sdkp is a valid pointer.  Then 
> the code calls scsi_disk_get(), which in turn calls scsi_device_get() for 
> sdkp->device.  How do you know that this doesn't point to deallocated 
> storage?  The only reference to the scsi_device is taken (in a rather 
> convoluted way) by the gendisk, and it is dropped during del_gendisk() -- 
> not when the gendisk is released.  Hence it is entirely possible for a 
> disconnect event to have freed the scsi_device when this code executes.
> 
> 
> There's two potential oopses for you.  I don't have a full grasp of the
> web of interlocking references (and interlocking code) in the SCSI,
> gendisk, and block layers, but it seems likely that at least one of
> these might actually happen.

Yes. 

If we are in middle opening an sd, and are anywhere between the
kobj_lookup having completed call to up_read(domain->sem) [called via
do_open calling get_gendisk] and sd_open calling scsi_device_get(), and
separately an sd_remove call completes [freeing the scsi_disk and the
scsi_device] there could be an oops.

Much as I hate it, simply adding a lock_kernel in sd_remove would solve
the problem, since do_open uses lock_kernel.

And I don't see why the kobject_get/put use an atomic_inc/dec, it just
obscures the need for a lock. We already have a reference to the kobject
(it's passed as an argument). The calls to the get/put must either be
protected with a lock or other method, or we have already gotten a
reference to the object and getting another is meaningless.

-- Patrick Mansfield


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [linux-usb-devel] Re: bug 2400
  2004-04-05 21:44                             ` James Bottomley
@ 2004-04-05 23:23                               ` David Brownell
  2004-04-06  1:19                                 ` James Bottomley
  0 siblings, 1 reply; 62+ messages in thread
From: David Brownell @ 2004-04-05 23:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List

James Bottomley wrote:
> On Sun, 2004-04-04 at 22:54, David Brownell wrote:
> 
>>Which directly follows from what I said ... USB propagates
>>that knowledge in carefully defined ways.   Other layers can
>>do the same, although clearly state associated with open file
>>descriptors needs to use a slightly different strategy.  And
>>that strategy is what Alan's original comment was about.
> 
> 
> Well, again, this is at the core of the argument.
> 
> I say that as long as we have all our objects correctly refcounted, how
> disconnections propagate up the stack is irrelevant.

"Irrelevant" is pointlessly strong, even for what I'm guessing
you really mean to be saying.

Minimally, there needs to be synchronization to prevent open() on
one CPU from using data structures disconnect() just invalidated
on another CPU.  That's a basic "how to write multi-threaded code"
kind of issue, which I won't bother to explain (yet again).

> If you have to have a "carefully defined" order for the propagation of
> an asynchronous event to save you from oopsing, it's a sure sign of bugs
> in the code.

It was "carefully defined" to ensure that there are clearly
defined points where it's safe to delete the hardware state.
Those are needed in situations other than disconnect().

Not having such synchronization points meant that HCDs were
reduced to _guessing_ whether it's safe to free that state.
Which was a sure sign of API bugs (inside usbcore), and made
for way too many oops-on-unplug reports in 2.4 kernels (and
kept various other things from working right, too).

> My reasoning is that I/O down the stack will ultimately hit the part of
> the kernel (or even just timeout in transmission to the device if it's
> disappeared without trace) that contains the knowledge and be returned
> with an error.  Ordering the disconnection propagation cannot change
> this fact, merely alter *which* component returns the error.

Ensuring such a behaviour certainly requires "careful design".
As does ensuring that the event propagation _completes_ every
time, rather than sometimes just refusing to finish.  Surely
you've seen both kinds of design botch.

> As long as we're robust to the error there is no problem.

The only system "error" in this discussion was that oops.

It may be a minor point ... but "disconnecting device while in use"
isn't an error at all.  It's a fault, maybe sometimes undesirable,
but one with behavior specified as fully as reading or writing a
block of data.

> At this point, the object lifetime does nothing more that count how long
> we have to keep the object around to return the error.

And that's basically what I said about the careful design of the
disconnect sequence, ensuring that there _is_ such a point:  one
where the only issues left are when references get dropped, so
that the only issue left is memory management.

- Dave

> James
> 
> 
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [linux-usb-devel] Re: bug 2400
  2004-04-05 23:23                               ` [linux-usb-devel] " David Brownell
@ 2004-04-06  1:19                                 ` James Bottomley
  2004-04-06  6:52                                   ` Oliver Neukum
  2004-04-06 15:10                                   ` David Brownell
  0 siblings, 2 replies; 62+ messages in thread
From: James Bottomley @ 2004-04-06  1:19 UTC (permalink / raw)
  To: David Brownell
  Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List

On Mon, 2004-04-05 at 18:23, David Brownell wrote:
> "Irrelevant" is pointlessly strong, even for what I'm guessing
> you really mean to be saying.

No, it's the very point.

If we have to enforce ordering on the event propagation across
subsystems, that can't be done without inter subsystem synchronisation.

In order to get away without any inter subsystem synchronisation the
order has to not matter.  i.e. be irrelevant.

> Minimally, there needs to be synchronization to prevent open() on
> one CPU from using data structures disconnect() just invalidated
> on another CPU.  That's a basic "how to write multi-threaded code"
> kind of issue, which I won't bother to explain (yet again).

Yes, but that's intra subsystem synchronisation, not inter subsystem
synchronisation.

Look, the rules are very simple:

- Every subsystem gives out refcounted objects.

- Every subsystem takes in a disconnection event for another object. 
Now what the subsystem does with the event is up to it.  The only
guarantee is if the subsystem holds no references to the other object
after the disconnection completes in that subsystem, then it may be
garbage collected.  However, if not, the other object must remain until
the last reference is dropped.

The point is that all the disconnection event does is inform a subsystem
that the entity represented by the other object has gone.  What it
chooses to do with the information is up to it.  It could set it's own
connected object to degraded, never use the other object again and drop
the reference.  Or, it could simply ignore the indication and continue
to use the other object (even though it will return errors for every
operation).

Now, if the subsystem is going to garbage collect its own object as a
result of the other object disconnect, then it is responsible for
synchronising that with reference gets on its own object.  However, that
is easily achievable via *intra* subsystem synchronisation.

If you follow these rules, there's no *inter* subsystem ordering or
synchronisation requirement.

James

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [linux-usb-devel] Re: bug 2400
  2004-04-06  1:19                                 ` James Bottomley
@ 2004-04-06  6:52                                   ` Oliver Neukum
  2004-04-06 14:03                                     ` James Bottomley
  2004-04-06 15:10                                   ` David Brownell
  1 sibling, 1 reply; 62+ messages in thread
From: Oliver Neukum @ 2004-04-06  6:52 UTC (permalink / raw)
  To: James Bottomley, David Brownell
  Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List


> The point is that all the disconnection event does is inform a subsystem
> that the entity represented by the other object has gone.  What it
> chooses to do with the information is up to it.  It could set it's own
> connected object to degraded, never use the other object again and drop
> the reference.  Or, it could simply ignore the indication and continue
> to use the other object (even though it will return errors for every
> operation).

Pure refcounting can never protect you against races with freeing objects.
The counters themselves must be protected. Try as you might you need
locks for that and rules on how this locks are to be used.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-05 21:08       ` James Bottomley
@ 2004-04-06  9:22         ` Jens Axboe
  0 siblings, 0 replies; 62+ messages in thread
From: Jens Axboe @ 2004-04-06  9:22 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Morton, greg, linux-usb-devel, SCSI Mailing List

On Mon, Apr 05 2004, James Bottomley wrote:
> On Mon, 2004-04-05 at 09:03, Jens Axboe wrote:
> > Doesn't quite work, how about something like this as a work-around?
> 
> I'm not sure we want to be introducing yet another refcount.  What was
> the problem?

Same as in this thread, cdrom gets unregister'ed and freed too early.
Your patch looks better though, the new one that is.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-05 21:07       ` James Bottomley
@ 2004-04-06  9:22         ` Jens Axboe
  2004-04-06 13:56           ` James Bottomley
  2004-04-08 23:06         ` Greg KH
  1 sibling, 1 reply; 62+ messages in thread
From: Jens Axboe @ 2004-04-06  9:22 UTC (permalink / raw)
  To: James Bottomley
  Cc: Patrick Mansfield, Andrew Morton, greg, linux-usb-devel,
	SCSI Mailing List

On Mon, Apr 05 2004, James Bottomley wrote:
> On Sun, 2004-04-04 at 23:33, Patrick Mansfield wrote:
> > How is unregister_cdrom(&cd->cdi) called if the device is not open?
> 
> OK, the attached does everything correctly (as I should have done at
> first) by using a kobject to hold the compound references.
> 
> I've stressed this one nicely.  Unfortunately, it's showing up a bug in
> scsi_sysfs (hold device open, remove it and re-add it at the same HCTL
> and you'll see an "error 1").

Really? It doesn't even compile :-)

> @@ -514,12 +542,18 @@
>  	if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
>  		goto fail;
>  
> +	if ((err = scsi_device_get(sdev)) != 0)
> +		goto fail_put_sdev;
> +
>  	error = -ENOMEM;
>  	cd = kmalloc(sizeof(*cd), GFP_KERNEL);
>  	if (!cd)
>  		goto fail;
>  	memset(cd, 0, sizeof(*cd));

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-06  9:22         ` Jens Axboe
@ 2004-04-06 13:56           ` James Bottomley
  2004-04-06 14:04             ` Jens Axboe
  0 siblings, 1 reply; 62+ messages in thread
From: James Bottomley @ 2004-04-06 13:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Patrick Mansfield, Andrew Morton, greg, linux-usb-devel,
	SCSI Mailing List

On Tue, 2004-04-06 at 04:22, Jens Axboe wrote:
> Really? It doesn't even compile :-)

Heh, I really must learn that I have to copy the file from the test
machine to the email machine *before* attaching it.  You got a stale
copy of an older incarnation, I think.

The attached (hopefully) is what I compiled and tested with.  The test,
incidentally, is simply to hold the device open and then forcibly remove
it using scsi remove-single-device before closing it.

James

===== drivers/scsi/sr.c 1.103 vs edited =====
--- 1.103/drivers/scsi/sr.c	Fri Apr  2 11:30:44 2004
+++ edited/drivers/scsi/sr.c	Tue Apr  6 08:49:30 2004
@@ -113,6 +113,28 @@
 	.generic_packet		= sr_packet,
 };
 
+static void sr_kobject_release(struct kobject *kobj);
+
+static struct kobj_type scsi_cdrom_kobj_type = {
+	.release = sr_kobject_release,
+};
+
+/*
+ * The get and put routines for the struct scsi_cd.  Note this entity
+ * has a scsi_device pointer and owns a reference to this.
+ */
+static inline int scsi_cd_get(struct scsi_cd *cd)
+{
+	if (!kobject_get(&cd->kobj))
+		return -ENODEV;
+	return 0;
+}
+
+static inline void scsi_cd_put(struct scsi_cd *cd)
+{
+	kobject_put(&cd->kobj);
+}
+
 /*
  * This function checks to see if the media has been changed in the
  * CDROM drive.  It is possible that we have already sensed a change,
@@ -424,8 +446,15 @@
 
 static int sr_block_release(struct inode *inode, struct file *file)
 {
+	int ret;
 	struct scsi_cd *cd = scsi_cd(inode->i_bdev->bd_disk);
-	return cdrom_release(&cd->cdi, file);
+	ret = cdrom_release(&cd->cdi, file);
+	if(ret)
+		return ret;
+	
+	scsi_cd_put(cd);
+
+	return 0;
 }
 
 static int sr_block_ioctl(struct inode *inode, struct file *file, unsigned cmd,
@@ -467,7 +496,7 @@
 	struct scsi_device *sdev = cd->device;
 	int retval;
 
-	retval = scsi_device_get(sdev);
+	retval = scsi_cd_get(cd);
 	if (retval)
 		return retval;
 	
@@ -489,7 +518,7 @@
 	return 0;
 
 error_out:
-	scsi_device_put(sdev);
+	scsi_cd_put(cd);
 	return retval;	
 }
 
@@ -500,7 +529,6 @@
 	if (cd->device->sector_size > 2048)
 		sr_set_blocklength(cd, 2048);
 
-	scsi_device_put(cd->device);
 }
 
 static int sr_probe(struct device *dev)
@@ -514,12 +542,18 @@
 	if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
 		goto fail;
 
+	if ((error = scsi_device_get(sdev)) != 0)
+		goto fail;
+
 	error = -ENOMEM;
 	cd = kmalloc(sizeof(*cd), GFP_KERNEL);
 	if (!cd)
-		goto fail;
+		goto fail_put_sdev;
 	memset(cd, 0, sizeof(*cd));
 
+	kobject_init(&cd->kobj);
+	cd->kobj.ktype = &scsi_cdrom_kobj_type;
+
 	disk = alloc_disk(1);
 	if (!disk)
 		goto fail_free;
@@ -588,6 +622,8 @@
 	put_disk(disk);
 fail_free:
 	kfree(cd);
+fail_put_sdev:
+	scsi_device_put(sdev);
 fail:
 	return error;
 }
@@ -863,19 +899,31 @@
 	return cgc->stat;
 }
 
-static int sr_remove(struct device *dev)
+static void sr_kobject_release(struct kobject *kobj)
 {
-	struct scsi_cd *cd = dev_get_drvdata(dev);
-
-	del_gendisk(cd->disk);
+	struct scsi_cd *cd = container_of(kobj, struct scsi_cd, kobj);
+	struct scsi_device *sdev = cd->device;
 
 	spin_lock(&sr_index_lock);
 	clear_bit(cd->disk->first_minor, sr_index_bits);
 	spin_unlock(&sr_index_lock);
 
-	put_disk(cd->disk);
 	unregister_cdrom(&cd->cdi);
+
+	put_disk(cd->disk);
+
 	kfree(cd);
+
+	scsi_device_put(sdev);
+}
+
+static int sr_remove(struct device *dev)
+{
+	struct scsi_cd *cd = dev_get_drvdata(dev);
+
+	del_gendisk(cd->disk);
+
+	scsi_cd_put(cd);
 
 	return 0;
 }



-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-05 21:27                             ` James Bottomley
@ 2004-04-06 14:00                               ` Alan Stern
  0 siblings, 0 replies; 62+ messages in thread
From: Alan Stern @ 2004-04-06 14:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Anderson, Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On 5 Apr 2004, James Bottomley wrote:

> On Sun, 2004-04-04 at 22:17, Alan Stern wrote:
> > Of course it would!  That's exactly what this thread is about: bugs caused
> > by improper handling of open/disconnect races.
> 
> So you're quietly shifting your ground away from this assertion:
> 
> On Sat, 2004-04-03 at 19:40, Alan Stern wrote:
> > The problem _can_ be solved by introducing a lock higher up, such as at
> > the driver level or at the bus level.  (A kernel lock would work too but
> > it would be extravagantly excessive.)  For example, the bus subsystem
> > rwsem in the driver model prevents analogous problems there.  But you
> > don't want to get a read lock on a bus-wide semaphore every time your
> > open() procedure runs!  A driver-wide lock makes a good solution.
> > 
> > Another possible solution would be to have disconnect() perform an RCU 
> > update to the device pointer.  I haven't seen any code that does this, but 
> > I think it ought to work.
> 
> ?

Not at all.  Open/disconnect races can appear at several points in a 
driver.  The earlier assertion referred to one of those points: the place 
where the code gets a pointer to a device structure from an open inode's 
private data.

> My contention is that the races can be solved by proper refcounting
> (without the need for locks and RCUs) not that we don't have any bugs in
> sd.c (I'll be happy if I can pull them all out of sr at the moment).

Your contention is correct, I agree.  However, not all parts of the kernel
implement proper refcounting and, equally important, proper
delete-then-release notifications.  In particular, the VFS layer doesn't.  
If a driver notifies VFS that a device has been disconnected so that the
private data in the corresponding /dev inode entry can be removed, the
driver does _not_ receive a release notification in return when the last
process actively using that inode has finished with it.

Given the lack of proper adherence to the object-lifetime rules in VFS,
a driver has no choice but to adopt a scheme similar to what I outlined
in the section you quoted above.  _That's_ the manifestation of the
open/disconnect race I was referring to.  (It's different from the 
problems in sd.c, incidentally.)

Alan Stern

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [linux-usb-devel] Re: bug 2400
  2004-04-06  6:52                                   ` Oliver Neukum
@ 2004-04-06 14:03                                     ` James Bottomley
  2004-04-07  9:19                                       ` Oliver.Neukum
  0 siblings, 1 reply; 62+ messages in thread
From: James Bottomley @ 2004-04-06 14:03 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: David Brownell, Alan Stern, Mike Anderson, Andrew Morton, greg,
	Jens Axboe, linux-usb-devel, SCSI Mailing List

On Tue, 2004-04-06 at 01:52, Oliver Neukum wrote:
> Pure refcounting can never protect you against races with freeing objects.
> The counters themselves must be protected. Try as you might you need
> locks for that and rules on how this locks are to be used.

Which part of

On Mon, 2004-04-05 at 20:19, James Bottomley wrote:
> Now, if the subsystem is going to garbage collect its own object as a
> result of the other object disconnect, then it is responsible for
> synchronising that with reference gets on its own object.  However, that
> is easily achievable via *intra* subsystem synchronisation.

didn't you understand?

However, how a subsystem resolves this intra subsystem synchronisation
is up to it ... and it doesn't have to do it with locks.  So there are
no exposed locks for this and no rules therefore, for how to use them.

James



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-06 13:56           ` James Bottomley
@ 2004-04-06 14:04             ` Jens Axboe
  2004-04-06 14:09               ` James Bottomley
  0 siblings, 1 reply; 62+ messages in thread
From: Jens Axboe @ 2004-04-06 14:04 UTC (permalink / raw)
  To: James Bottomley
  Cc: Patrick Mansfield, Andrew Morton, greg, linux-usb-devel,
	SCSI Mailing List

On Tue, Apr 06 2004, James Bottomley wrote:
> On Tue, 2004-04-06 at 04:22, Jens Axboe wrote:
> > Really? It doesn't even compile :-)
> 
> Heh, I really must learn that I have to copy the file from the test
> machine to the email machine *before* attaching it.  You got a stale
> copy of an older incarnation, I think.

Only change is the err -> error, correct?

> The attached (hopefully) is what I compiled and tested with.  The test,
> incidentally, is simply to hold the device open and then forcibly remove
> it using scsi remove-single-device before closing it.

Better than what is there, whether it needs other synchronization is a
different question.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-06 14:04             ` Jens Axboe
@ 2004-04-06 14:09               ` James Bottomley
  0 siblings, 0 replies; 62+ messages in thread
From: James Bottomley @ 2004-04-06 14:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Patrick Mansfield, Andrew Morton, greg, linux-usb-devel,
	SCSI Mailing List

On Tue, 2004-04-06 at 09:04, Jens Axboe wrote:
> Only change is the err -> error, correct?

Not quite; there was a cleanup goto issue as well (don't want to goto
the release if you couldn't get the object).

> > The attached (hopefully) is what I compiled and tested with.  The test,
> > incidentally, is simply to hold the device open and then forcibly remove
> > it using scsi remove-single-device before closing it.
> 
> Better than what is there, whether it needs other synchronization is a
> different question.

Heh, join the debate, why don't you...

James



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-05 22:10                             ` Patrick Mansfield
@ 2004-04-06 14:10                               ` Alan Stern
  2004-04-08 14:09                               ` Alan Stern
  1 sibling, 0 replies; 62+ messages in thread
From: Alan Stern @ 2004-04-06 14:10 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: James Bottomley, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List

On Mon, 5 Apr 2004, Patrick Mansfield wrote:

> Yes. 
> 
> If we are in middle opening an sd, and are anywhere between the
> kobj_lookup having completed call to up_read(domain->sem) [called via
> do_open calling get_gendisk] and sd_open calling scsi_device_get(), and
> separately an sd_remove call completes [freeing the scsi_disk and the
> scsi_device] there could be an oops.
> 
> Much as I hate it, simply adding a lock_kernel in sd_remove would solve
> the problem, since do_open uses lock_kernel.

There must be a better solution, although I can't recommend anything 
offhand.  It might require changing the genhd interface somewhat.

> And I don't see why the kobject_get/put use an atomic_inc/dec, it just
> obscures the need for a lock. We already have a reference to the kobject
> (it's passed as an argument). The calls to the get/put must either be
> protected with a lock or other method, or we have already gotten a
> reference to the object and getting another is meaningless.

This is really a matter of definitions.  The kobject/driver-model people 
use "reference" to mean, very specifically, an incremented refcounter.  
They do not (so far as I can tell) intend the more general meaning, i.e., 
any sort of pointer to the object.  So yes, the call to kobject_get passes 
as an argument a "reference" in the general sense, and the call returns a 
"reference" in the more specific sense.

The point is, the refcounting model is meant only to guarantee that an
object will not be deallocated when it has any outstanding "references" in
the specific sense -- i.e., when its refcount is positive.  It does _not_
guarantee that there are no pointers to the object lurking somewhere else
in the kernel; that job is up to the driver's author.  With careful
programming it can be done.  The code in sd.c is not sufficiently careful.

Alan Stern

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
@ 2004-04-06 15:09 Heiko Carstens
  0 siblings, 0 replies; 62+ messages in thread
From: Heiko Carstens @ 2004-04-06 15:09 UTC (permalink / raw)
  To: Mike Anderson; +Cc: James Bottomley, SCSI Mailing List

Hello,

>> When a USB unplug comes along, it calls scsi_remove_device, which
>> eventually calls device_del().  The problem is that device_del triggers
>> the ->remove methods of all the attached drivers and the sr_remove
>> method calls cdrom_unregister which throws away the cdrom device state,
>> even though the actual device has active references.
>Yes, we reordered some of this in sd. As your comment down below
>indicates reordering will reduce the window but not eliminate it.
>>[...]
>> This problem, by the way, exists in a lesser form for sd: the sd remove
>> method will free the device for reattachment even though it might have
>> active references.
>I have looked at the sd issue off and on due to the previous open race
>reported by Alan Stern. While the window can be narrowed inside SCSI you
>need help for the calling subsystem to know when it is OK to cleanup and
>your routine will not be called anymore. A similar problem also showed
>up in the tear down the host directory entry in /proc/scsi but was only
>fixed up so far due to its depreciated status.

I think we hit a problem which is related to this on s390 using the zfcp
lldd. What happened is that the lldd called scsi_add_device for a new lun
just before scsi_remove_host is called. The result is the Oops below.
If there is a delay of about a second between scsi_add_device and
scsi_remove_host this Oops doesn't happen anymore.

Is this related to what you described or could it be something different?
I haven't looked deeply into this yet...

Unable to handle kernel pointer dereference at virtual kernel address 
0000000000000000
Oops: 0004 [#1]
CPU:    0    Not tainted
Process fcp_crash.sh (pid: 687, task: 0000000005ec0100, ksp: 
000000000519b7b0)
Krnl PSW : 0700100180000000 00000000000ad574 
(sysfs_hash_and_remove+0x38/0xec)
Krnl GPRS: 0000000000200200 0000000000000000 00000000000000b0 
0000000000000001
           00000000ffffffff 0000000000000000 0000000000000000 
0000000000000002
           000000000519b808 00000000003609a0 00000000060beaf0 
0000000000000000
           00000000002adfbc 0000000000295538 000000000519b428 
000000000519b388
Krnl Code: ba 54 20 00 a7 44 ff fc 12 44 a7 44 00 4f b9 04 00 3c b9 04 
Call Trace:
 [<00000000000ae758>] sysfs_remove_link+0x20/0x30
 [<00000000001240cc>] class_device_dev_unlink+0x40/0x44
 [<00000000001246c4>] class_device_del+0xf4/0x150
 [<000000000012473e>] class_device_unregister+0x1e/0x38
 [<000000000013f9c8>] scsi_remove_device+0x50/0xd0
 [<000000000013eb82>] scsi_forget_host+0x86/0xec
 [<00000000001368fa>] scsi_remove_host+0x36/0x80
 [<00000000001b24e4>] zfcp_adapter_scsi_unregister+0x30/0x70
 [<00000000001b0ea2>] zfcp_ccw_set_offline+0x66/0xbc
 [<000000000016602c>] ccw_device_set_offline+0x60/0x188
 [<0000000000166658>] online_store+0x264/0x380
 [<0000000000121982>] dev_attr_store+0x46/0x50
 [<00000000000ad94c>] flush_write_buffer+0x44/0x58
 [<00000000000ad9bc>] sysfs_write_file+0x5c/0x7c
 [<000000000006e180>] vfs_write+0xe8/0x13c
 [<000000000006e290>] sys_write+0x48/0x74
 [<0000000000020b4a>] sysc_do_restart+0x16/0x1c

Thanks,
Heiko


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [linux-usb-devel] Re: bug 2400
  2004-04-06  1:19                                 ` James Bottomley
  2004-04-06  6:52                                   ` Oliver Neukum
@ 2004-04-06 15:10                                   ` David Brownell
  2004-04-06 15:47                                     ` James Bottomley
  1 sibling, 1 reply; 62+ messages in thread
From: David Brownell @ 2004-04-06 15:10 UTC (permalink / raw)
  To: James Bottomley
  Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List

James Bottomley wrote:
>  that's intra subsystem synchronisation, not inter subsystem

I'd say that device_driver.remove() calls from the bus management
code to a given device driver are inter-subsystem...

> Look, the rules are very simple:
> 
> - Every subsystem gives out refcounted objects.
> 
> - Every subsystem takes in a disconnection event for another object. 
> ..
> 
> If you follow these rules, there's no *inter* subsystem ordering or
> synchronisation requirement.

Well, either handling the disconnection event from the lower level
subsystem (bus) is such a requirement, or it's not necessary!  I've
got to assume the former, since the Linux kernel doesn't garbage
collect its memory.

Refcounting systems need loop-breaking protocols ("disconnect");
this is needed for that sort of reason, if no other.

- Dave

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [linux-usb-devel] Re: bug 2400
  2004-04-06 15:10                                   ` David Brownell
@ 2004-04-06 15:47                                     ` James Bottomley
  2004-04-06 16:16                                       ` David Brownell
  2004-04-06 16:55                                       ` Alan Stern
  0 siblings, 2 replies; 62+ messages in thread
From: James Bottomley @ 2004-04-06 15:47 UTC (permalink / raw)
  To: David Brownell
  Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List

On Tue, 2004-04-06 at 10:10, David Brownell wrote:
> James Bottomley wrote:
> >  that's intra subsystem synchronisation, not inter subsystem
> 
> I'd say that device_driver.remove() calls from the bus management
> code to a given device driver are inter-subsystem...

so using a subsystem provided API from another subsystem constitutes
inter subsytem synchronisation?  I don't think so.

> Refcounting systems need loop-breaking protocols ("disconnect");
> this is needed for that sort of reason, if no other.

No, they don't.

Supposing we did no hotplug at all and just relied on the fact that a
driver responds DID_NO_CONNECT to a missing device.

We have a mounted fs on a CD on SCSI on USB which gets disconnected. No
event is generated because no hotplug. Now every read of the fs gets
EIO.  The system can continue in this state forever if necessary. 
However, eventually the user will get bored and unmount the fs

If we just generated a single event saying "device gone" and did nothing
else, userland helpers could pull apart the entire stack (clean up
application, kill processes, unmount fs, remove scsi device etc).

You're trying to make a particular event special.  That's not correct. 
There are many types of disconnect (user unmount, user offlines device)
coming from essentially all parts of the stack.

James

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [linux-usb-devel] Re: bug 2400
  2004-04-06 15:47                                     ` James Bottomley
@ 2004-04-06 16:16                                       ` David Brownell
  2004-04-06 16:55                                       ` Alan Stern
  1 sibling, 0 replies; 62+ messages in thread
From: David Brownell @ 2004-04-06 16:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Alan Stern, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List

James Bottomley wrote:
>>> that's intra subsystem synchronisation, not inter subsystem
>>
>>I'd say that device_driver.remove() calls from the bus management
>>code to a given device driver are inter-subsystem...
> 
> so using a subsystem provided API from another subsystem constitutes
> inter subsytem synchronisation?  I don't think so.

So it's now "I won't listen to those facts, they bother me" ...
Or maybe you're just trolling?  "From another subsystem" is
"between subsystems" is "inter-subsystem"; Q.E.D.

Either way, it's not worth the time to respond to you any more,
this will be my last post on this thread.

>>Refcounting systems need loop-breaking protocols ("disconnect");
>>this is needed for that sort of reason, if no other.
> 
> 
> No, they don't.

Without protocols for breaking referencing loops, systems that
use refcounting _alone_ (as you're advocating) leak memory unless
they prevent such loops ... but requests sent from one subsystem
to the next, then later returned, each embed such a loop.

That's in addition to the need for locking in certain paths,
on multi-threaded systems (like Linux).

(And for that matter, there were no WMDs in Iraq.)

Sorry for mentioning all those inconvenient facts;
I'll go away now.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [linux-usb-devel] Re: bug 2400
  2004-04-06 15:47                                     ` James Bottomley
  2004-04-06 16:16                                       ` David Brownell
@ 2004-04-06 16:55                                       ` Alan Stern
  2004-04-06 17:13                                         ` James Bottomley
  1 sibling, 1 reply; 62+ messages in thread
From: Alan Stern @ 2004-04-06 16:55 UTC (permalink / raw)
  To: James Bottomley
  Cc: David Brownell, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List

On 6 Apr 2004, James Bottomley wrote:

> Supposing we did no hotplug at all and just relied on the fact that a
> driver responds DID_NO_CONNECT to a missing device.
> 
> We have a mounted fs on a CD on SCSI on USB which gets disconnected. No
> event is generated because no hotplug. Now every read of the fs gets
> EIO.  The system can continue in this state forever if necessary. 
> However, eventually the user will get bored and unmount the fs
> 
> If we just generated a single event saying "device gone" and did nothing
> else, userland helpers could pull apart the entire stack (clean up
> application, kill processes, unmount fs, remove scsi device etc).

In principle, yes.  However...  The kernel still has to process requests 
correctly while the stack is partially deconstructed.  It also has to 
protect against userland helpers trying to pull things apart in the wrong 
order.

I'm not trying to say that anything you wrote was wrong -- just that the 
situation is more complicated than you make it appear.

Alan Stern


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [linux-usb-devel] Re: bug 2400
  2004-04-06 16:55                                       ` Alan Stern
@ 2004-04-06 17:13                                         ` James Bottomley
  0 siblings, 0 replies; 62+ messages in thread
From: James Bottomley @ 2004-04-06 17:13 UTC (permalink / raw)
  To: Alan Stern
  Cc: David Brownell, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List

On Tue, 2004-04-06 at 11:55, Alan Stern wrote:
> In principle, yes.  However...  The kernel still has to process requests 
> correctly while the stack is partially deconstructed.  It also has to 
> protect against userland helpers trying to pull things apart in the wrong 
> order.

This I agree with ... that's where the "robust to errors" statement
comes in.  I'm not claiming we currently are, merely that we have to be
(regardless of ordered or disordered event propagation).

> I'm not trying to say that anything you wrote was wrong -- just that the 
> situation is more complicated than you make it appear.

Well, I think the principle is simple.  The practice is less so ... I
can still panic my test system by offlining the ext2 root filesystem in
the middle of heavy stress.

What I don't believe is that enforcing some kind of ordering can relieve
us of the necessity of making the error paths robust.

Since the largely hitherto untested error path becomes our main handler
for forced disconnect, we're necessarily running into a lot of bugs that
didn't show up before.  However, they have always been bugs regardless
of whether they got tripped.

James

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [linux-usb-devel] Re: bug 2400
  2004-04-06 14:03                                     ` James Bottomley
@ 2004-04-07  9:19                                       ` Oliver.Neukum
  0 siblings, 0 replies; 62+ messages in thread
From: Oliver.Neukum @ 2004-04-07  9:19 UTC (permalink / raw)
  To: James Bottomley
  Cc: Oliver Neukum, David Brownell, Alan Stern, Mike Anderson,
	Andrew Morton, greg, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On Tue, 6 Apr 2004, James Bottomley wrote:

> On Tue, 2004-04-06 at 01:52, Oliver Neukum wrote:
> > Pure refcounting can never protect you against races with freeing objects.
> > The counters themselves must be protected. Try as you might you need
> > locks for that and rules on how this locks are to be used.
>
> Which part of
>
> On Mon, 2004-04-05 at 20:19, James Bottomley wrote:
> > Now, if the subsystem is going to garbage collect its own object as a
> > result of the other object disconnect, then it is responsible for
> > synchronising that with reference gets on its own object.  However, that
> > is easily achievable via *intra* subsystem synchronisation.
>
> didn't you understand?
>
> However, how a subsystem resolves this intra subsystem synchronisation
> is up to it ... and it doesn't have to do it with locks.  So there are
> no exposed locks for this and no rules therefore, for how to use them.

How you do this notification is irrelevant. You must address the basic
race of increasing the refcount from 0 to 1. After that all is fine.
At some point you have to handle a pointer to an object whose refcount
is unknown to you.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-05 22:10                             ` Patrick Mansfield
  2004-04-06 14:10                               ` Alan Stern
@ 2004-04-08 14:09                               ` Alan Stern
  2004-04-08 16:24                                 ` Matt Gulick
  1 sibling, 1 reply; 62+ messages in thread
From: Alan Stern @ 2004-04-08 14:09 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: James Bottomley, Mike Anderson, Andrew Morton, greg, Jens Axboe,
	linux-usb-devel, SCSI Mailing List

On Mon, 5 Apr 2004, Patrick Mansfield wrote:

> Much as I hate it, simply adding a lock_kernel in sd_remove would solve
> the problem, since do_open uses lock_kernel.

On Fri, 2 Apr 2004, Mike Anderson wrote:

> I believe as indicated above that all cross subsystem registrations need
> a release / put callback. This would allow the release chain to be
> called from block -> ULD -> scsi core -> LLDD.
> 
> Recently I have not been spending the proper time looking at this, but
> last look it appeared that we needed to  add a release / put method call
> to the gendisk disk_release routine. The release function or object to do
> the put on would need to be set prior to the call to add_disk.

On Mon, 5 Apr 2004, Mike Anderson wrote:

> As I previously stated there is no notification back from the block
> layer that users are complete with a structure. Currently the only 
> method to prevent an oops looks like sd_remove would need to use   
> lock_kernel so that scsi could ensure that a user would be through open
> which means the sd_remove would not take references to zero or the user
> has not made it far enough through do open that they have received a   
> gendisk structure so that del_gendisk will ensure they do not call     
> sd_open.
> 
> I would like another method, but this looks like it would need to be
> another shared sync mechanism between scsi layer (an other blk
> interfaces) and block layer or a lookup method similar to kobj_lookup
> in scsi open routines so that a object can be unmapped atomically.

Mike's first comment was exactly right.

If a pointer to a kobject (or kref) were added to the genhd structure, and 
if the genhd release routine would do a put() on this pointer, then there 
would be no problem.  The pointer could be set to the scsi_disk's embedded 
kobject, and then the scsi_disk structure would be guaranteed to exist 
when it is dereferenced in sd_open().

Likewise, if the scsi_disk took a reference to its scsi_device and dropped 
that reference during its release routine, that would cure the other 
potential oops I identified earlier.  In neither case would a kernel lock 
be needed.

This discussion has stimulated some ideas about the driver model in 
general.  It turns out that many drivers take a struct device from one 
subsystem and present it (in different form) to another subsystem.  Some 
examples:

	A SCSI host adapter driver takes a PCI device (or in the case
	of usb-storage, a USB device interface) and presents it to the
	SCSI core as a host adapter.

	The sd driver takes a SCSI device and presents it to the genhd
	layer as a disk drive.

	The USB UHCI driver takes a PCI device and presents it to the
	USB core as a USB bus.

In each case there are two driver-model structures involved: the parent 
which belongs to the upper subsystem, and the child which belongs to the 
lower subsystem.  No driver-model entity belongs to the intermediate 
driver!

As a result, when a disconnect occurs the driver has no way to know when 
the lower subsystem is finished using the device:

	The SCSI core does not notify the LLDD in any way following
	scsi_remove_host().

	The genhd layer doesn't inform sd.c when then genhd structure
	is released.

	usbcore _does_ notify the HC drivers via a callback when a bus is 
	released; it's an exception.

There is implicitly _a_ notification, because when the child device is 
released it drops its reference to the parent (I still think this would be 
better if it were done differently, but never mind).  However that doesn't 
do the intermediate driver any good, because the driver doesn't own the 
parent!  The parent is owned by the higher subsystem.

A general solution is to have the lower subsystem take a pointer to a 
kobject (or even a kref) belonging to the intermediate driver, and have it 
drop its reference when the child device is released.  This approach is 
just as flexible as the callback made by usbcore.

It could even be formalized as part of the driver model, although I'm not
at all certain this would be a good idea.  The concept is simple enough: A
struct device (or maybe a struct kobject) would have as an additional
member a pointer to a struct kobject (or maybe to a struct kref).  Call
this additional member "master".  During the device release (or kobject
cleanup) the core would do a put on the master pointer if it is set.

It's not clear that a sufficiently high percentage of all kobjects follow 
this intermediate-driver pattern for it to be worthwhile adding the master 
pointer.  But individual systems can still implement it on their own, and 
they should.

Alan Stern

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-08 14:09                               ` Alan Stern
@ 2004-04-08 16:24                                 ` Matt Gulick
  2004-04-08 18:33                                   ` Alan Stern
  0 siblings, 1 reply; 62+ messages in thread
From: Matt Gulick @ 2004-04-08 16:24 UTC (permalink / raw)
  To: Alan Stern
  Cc: Patrick Mansfield, James Bottomley, Mike Anderson, Andrew Morton,
	greg, Jens Axboe, linux-usb-devel, SCSI Mailing List

> There is implicitly _a_ notification, because when the child device is 
> released it drops its reference to the parent (I still think this would be 
> better if it were done differently, but never mind).  However that doesn't 
> do the intermediate driver any good, because the driver doesn't own the 
> parent!  The parent is owned by the higher subsystem.
> 
> A general solution is to have the lower subsystem take a pointer to a 
> kobject (or even a kref) belonging to the intermediate driver, and have it 
> drop its reference when the child device is released.  This approach is 
> just as flexible as the callback made by usbcore.
> 
> It could even be formalized as part of the driver model, although I'm not
> at all certain this would be a good idea.  The concept is simple enough: A
> struct device (or maybe a struct kobject) would have as an additional
> member a pointer to a struct kobject (or maybe to a struct kref).  Call
> this additional member "master".  During the device release (or kobject
> cleanup) the core would do a put on the master pointer if it is set.
> 
> It's not clear that a sufficiently high percentage of all kobjects follow 
> this intermediate-driver pattern for it to be worthwhile adding the master 
> pointer.  But individual systems can still implement it on their own, and 
> they should.

OK, Silly question or maybe not.

When writing drivers for MacOS ( 7-9 & X) and Windose (98 - XP) and when
I architected the USB 2.0 stack at Adaptec for 98SE, ME & 2k, we solved
this issue with a simple heart beat task.

Every so often (1-3 seconds) any device that was at risk of removal
would receive a TEST UNIT READY cdb.

Using the model of 1394, USB, ... being treated as a device with no
media inserted (like a CD drive is treated), then you can query the
device for media availability.

Using the USB model of 7 tiers of devices and most hubs having 4 ports
(7 port hubs are just two 4 port hubs internally connected) you can have
way more than 15 SCSI ID's.  By treating each USB as having its own ID
(EHCI USB chips typically have three USB identities of 1 EHCI and 2 OHCI
interfaces) and the devices on that bus that are mass storage class
devices using SBP-2 or SBP-3 would be a LUN on that device.

By treating each bus as a virtual device, the main struct can be static
with LUN children added or removed as needed.

Any thoughts on this?

Matt

----------------------------------------
Matt Gulick
Sr. Staff Engineer
Adaptec, Inc.
gulickconsulting@direcway.com
matt_gulick@adaptec.com
(715) 426-0884

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-08 16:24                                 ` Matt Gulick
@ 2004-04-08 18:33                                   ` Alan Stern
  2004-04-08 19:44                                     ` Matt Gulick
  0 siblings, 1 reply; 62+ messages in thread
From: Alan Stern @ 2004-04-08 18:33 UTC (permalink / raw)
  To: Matt Gulick
  Cc: Patrick Mansfield, James Bottomley, Mike Anderson, Andrew Morton,
	greg, Jens Axboe, linux-usb-devel, SCSI Mailing List

On Thu, 8 Apr 2004, Matt Gulick wrote:

> OK, Silly question or maybe not.
> 
> When writing drivers for MacOS ( 7-9 & X) and Windose (98 - XP) and when
> I architected the USB 2.0 stack at Adaptec for 98SE, ME & 2k, we solved
> this issue with a simple heart beat task.
> 
> Every so often (1-3 seconds) any device that was at risk of removal
> would receive a TEST UNIT READY cdb.
> 
> Using the model of 1394, USB, ... being treated as a device with no
> media inserted (like a CD drive is treated), then you can query the
> device for media availability.
> 
> Using the USB model of 7 tiers of devices and most hubs having 4 ports
> (7 port hubs are just two 4 port hubs internally connected) you can have
> way more than 15 SCSI ID's.  By treating each USB as having its own ID
> (EHCI USB chips typically have three USB identities of 1 EHCI and 2 OHCI
> interfaces) and the devices on that bus that are mass storage class
> devices using SBP-2 or SBP-3 would be a LUN on that device.
> 
> By treating each bus as a virtual device, the main struct can be static
> with LUN children added or removed as needed.
> 
> Any thoughts on this?
> 
> Matt

I think you're talking about a different problem.  Sending heartbeats 
solves the problem of detecting media availability and device 
availability.  It doesn't solve the problem we're discussing here, which 
is how to tear down the device driver stack without causing any errors, 
particularly if the user tries to access the device while the stack is 
being deconstructed.

Alan Stern


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-08 18:33                                   ` Alan Stern
@ 2004-04-08 19:44                                     ` Matt Gulick
  0 siblings, 0 replies; 62+ messages in thread
From: Matt Gulick @ 2004-04-08 19:44 UTC (permalink / raw)
  To: Alan Stern
  Cc: Patrick Mansfield, James Bottomley, Mike Anderson, Andrew Morton,
	greg, Jens Axboe, linux-usb-devel, SCSI Mailing List

On Thu, 2004-04-08 at 13:33, Alan Stern wrote:
> On Thu, 8 Apr 2004, Matt Gulick wrote:
> 
> > OK, Silly question or maybe not.
> > 
> > When writing drivers for MacOS ( 7-9 & X) and Windose (98 - XP) and
> when
> > I architected the USB 2.0 stack at Adaptec for 98SE, ME & 2k, we
> solved
> > this issue with a simple heart beat task.
> > 
> > Every so often (1-3 seconds) any device that was at risk of removal
> > would receive a TEST UNIT READY cdb.
> > 
> > Using the model of 1394, USB, ... being treated as a device with no
> > media inserted (like a CD drive is treated), then you can query the
> > device for media availability.
> > 
> > Using the USB model of 7 tiers of devices and most hubs having 4
> ports
> > (7 port hubs are just two 4 port hubs internally connected) you can
> have
> > way more than 15 SCSI ID's.  By treating each USB as having its own
> ID
> > (EHCI USB chips typically have three USB identities of 1 EHCI and 2
> OHCI
> > interfaces) and the devices on that bus that are mass storage class
> > devices using SBP-2 or SBP-3 would be a LUN on that device.
> > 
> > By treating each bus as a virtual device, the main struct can be
> static
> > with LUN children added or removed as needed.
> > 
> > Any thoughts on this?
> > 
> > Matt
> 
> I think you're talking about a different problem.  Sending heartbeats 
> solves the problem of detecting media availability and device 
> availability.  It doesn't solve the problem we're discussing here,
> which 
> is how to tear down the device driver stack without causing any
> errors, 
> particularly if the user tries to access the device while the stack is
> being deconstructed.
> 
> Alan Stern

True.  This only mitigates the need for the SCSI subsystem from having
to release device structures that might be used for logical SCSI Bus
housekeeping.

I will have to dig into the Linux architecture model for SCSI to put
this in Linux form vs what is done elsewhere.

I'll be back.  ;-)

Matt


----------------------------------------
Matt Gulick
Sr. Staff Engineer
Adaptec, Inc.
gulickconsulting@direcway.com
matt_gulick@adaptec.com
(715) 426-0884



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-05 21:07       ` James Bottomley
  2004-04-06  9:22         ` Jens Axboe
@ 2004-04-08 23:06         ` Greg KH
  2004-04-09 11:28           ` James Bottomley
  1 sibling, 1 reply; 62+ messages in thread
From: Greg KH @ 2004-04-08 23:06 UTC (permalink / raw)
  To: James Bottomley
  Cc: Patrick Mansfield, Andrew Morton, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On Mon, Apr 05, 2004 at 04:07:12PM -0500, James Bottomley wrote:
> On Sun, 2004-04-04 at 23:33, Patrick Mansfield wrote:
> > How is unregister_cdrom(&cd->cdi) called if the device is not open?
> 
> OK, the attached does everything correctly (as I should have done at
> first) by using a kobject to hold the compound references.

Why not just use a kref for this?  That's what they were created for?
:)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: bug 2400
  2004-04-08 23:06         ` Greg KH
@ 2004-04-09 11:28           ` James Bottomley
  0 siblings, 0 replies; 62+ messages in thread
From: James Bottomley @ 2004-04-09 11:28 UTC (permalink / raw)
  To: Greg KH
  Cc: Patrick Mansfield, Andrew Morton, Jens Axboe, linux-usb-devel,
	SCSI Mailing List

On Thu, 2004-04-08 at 18:06, Greg KH wrote:
> Why not just use a kref for this?  That's what they were created for?
> :)

Heh, I was waiting for you to say that.  Took you five days to notice,
though ....

I'll convert both sr and sd to use krefs.

James



^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2004-04-09 11:28 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-04-01 21:15 bug 2400 Andrew Morton
2004-04-01 21:52 ` Matt Gulick
2004-04-01 22:08   ` Andrew Morton
2004-04-01 22:48     ` Matt Gulick
2004-04-01 22:40   ` James Bottomley
2004-04-01 22:53     ` Matt Gulick
2004-04-01 23:07 ` Matthew Dharm
2004-04-01 23:32 ` James Bottomley
2004-04-02  0:29   ` Steven Dake
2004-04-02  8:43   ` Mike Anderson
2004-04-02 15:57     ` James Bottomley
2004-04-02 16:45       ` Mike Anderson
2004-04-02 17:05         ` James Bottomley
2004-04-02 17:44           ` Mike Anderson
2004-04-02 18:13             ` James Bottomley
2004-04-02 23:40               ` Mike Anderson
2004-04-03  0:25                 ` James Bottomley
2004-04-04  1:40                   ` Alan Stern
2004-04-04 15:23                     ` James Bottomley
2004-04-04 16:46                       ` Alan Stern
2004-04-04 17:04                         ` James Bottomley
2004-04-05  3:17                           ` Alan Stern
2004-04-05 14:59                             ` Mike Anderson
2004-04-05 21:27                             ` James Bottomley
2004-04-06 14:00                               ` Alan Stern
2004-04-05 22:10                             ` Patrick Mansfield
2004-04-06 14:10                               ` Alan Stern
2004-04-08 14:09                               ` Alan Stern
2004-04-08 16:24                                 ` Matt Gulick
2004-04-08 18:33                                   ` Alan Stern
2004-04-08 19:44                                     ` Matt Gulick
2004-04-05 13:30                           ` [linux-usb-devel] " Oliver Neukum
2004-04-04 18:16                       ` David Brownell
2004-04-04 18:42                         ` James Bottomley
2004-04-05  3:54                           ` David Brownell
2004-04-05 21:44                             ` James Bottomley
2004-04-05 23:23                               ` [linux-usb-devel] " David Brownell
2004-04-06  1:19                                 ` James Bottomley
2004-04-06  6:52                                   ` Oliver Neukum
2004-04-06 14:03                                     ` James Bottomley
2004-04-07  9:19                                       ` Oliver.Neukum
2004-04-06 15:10                                   ` David Brownell
2004-04-06 15:47                                     ` James Bottomley
2004-04-06 16:16                                       ` David Brownell
2004-04-06 16:55                                       ` Alan Stern
2004-04-06 17:13                                         ` James Bottomley
2004-04-02 23:36   ` James Bottomley
2004-04-03  0:11     ` Mike Anderson
2004-04-03  0:16       ` James Bottomley
2004-04-05  4:33     ` Patrick Mansfield
2004-04-05 14:09       ` James Bottomley
2004-04-05 21:07       ` James Bottomley
2004-04-06  9:22         ` Jens Axboe
2004-04-06 13:56           ` James Bottomley
2004-04-06 14:04             ` Jens Axboe
2004-04-06 14:09               ` James Bottomley
2004-04-08 23:06         ` Greg KH
2004-04-09 11:28           ` James Bottomley
2004-04-05 14:03     ` Jens Axboe
2004-04-05 21:08       ` James Bottomley
2004-04-06  9:22         ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2004-04-06 15:09 Heiko Carstens

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox