* Re: [linux-usb-devel] BUG when removing USB flash drive
[not found] <20040514004132.GA10537@telpin.com.ar>
@ 2004-05-14 3:11 ` Alan Stern
2004-05-14 5:36 ` Mike Anderson
0 siblings, 1 reply; 9+ messages in thread
From: Alan Stern @ 2004-05-14 3:11 UTC (permalink / raw)
To: Alberto Bertogli; +Cc: USB development list, SCSI development list
On Thu, 13 May 2004, Alberto Bertogli wrote:
> Hi!
>
> I've just hit a bug after removing a Kingston USB flash drive.
>
> I have removed it several (probably more than 40) times today without
> problems until now. I'm removing it while doing streaming write()s on
> the device (using /dev/sda) because I'm testing some things with the
> drive. This was just like all the other times, so it looks like a race
> somewhere.
>
> This is a stock 2.6.6 kernel, on a Pentium 4 with HT (the kernel is
> compiled with both SMP and preempt).
>
> After the BUG the kernel doesn't detect the device anymore.
>
> Please let me know if you need any more information or if I can give you a
> hand with testing.
>
> Thanks a lot,
> Alberto
>
>
> kernel BUG at drivers/usb/storage/usb.c:848!
This BUG happened because the SCSI layer was still using the drive after
usb-storage had called scsi_remove_host().
This is an aspect of SCSI midlayer <-> low-level driver interaction that
I'm not very clear about. How long after the host is removed can the
midlayer continue to use it? How does the driver know when the midlayer
is finished using the host so the driver can exit (and unload from
memory)?
Alan Stern
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-usb-devel] BUG when removing USB flash drive
2004-05-14 3:11 ` [linux-usb-devel] BUG when removing USB flash drive Alan Stern
@ 2004-05-14 5:36 ` Mike Anderson
2004-05-14 15:12 ` Alan Stern
0 siblings, 1 reply; 9+ messages in thread
From: Mike Anderson @ 2004-05-14 5:36 UTC (permalink / raw)
To: Alan Stern; +Cc: Alberto Bertogli, USB development list, SCSI development list
Alan Stern [stern@rowland.harvard.edu] wrote:
> This BUG happened because the SCSI layer was still using the drive after
> usb-storage had called scsi_remove_host().
In this case the scsi_remove_host is being called in a unexpected
disconnect case. Any IOs in flight will be canceled in the mid-layer
(i.e., they will not be chased down with calls to eh_abort_handler).
>
> This is an aspect of SCSI midlayer <-> low-level driver interaction that
> I'm not very clear about. How long after the host is removed can the
> midlayer continue to use it? How does the driver know when the midlayer
> is finished using the host so the driver can exit (and unload from
> memory)?
The LLDD queuecommand should not be called after the return of
scsi_remove_host (unless we have a bug). The LLDD should not free its
resources until the release function is called on the struct device
passed in during the scsi_add_host call. This release function could be
called post the scsi_host_dev_release function being called as it does a
put on this struct device. This chain of release calls is undefined as a
hotplug goes out on the remove calls, but if user space never umounts /
closes the device things will not fully cleanup.
-andmike
--
Michael Anderson
andmike@us.ibm.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-usb-devel] BUG when removing USB flash drive
2004-05-14 5:36 ` Mike Anderson
@ 2004-05-14 15:12 ` Alan Stern
2004-05-14 16:59 ` Mike Anderson
0 siblings, 1 reply; 9+ messages in thread
From: Alan Stern @ 2004-05-14 15:12 UTC (permalink / raw)
To: Mike Anderson
Cc: Alberto Bertogli, USB development list, SCSI development list
On Thu, 13 May 2004, Mike Anderson wrote:
> The LLDD queuecommand should not be called after the return of
> scsi_remove_host (unless we have a bug). The LLDD should not free its
> resources until the release function is called on the struct device
> passed in during the scsi_add_host call.
Ah. We have a problem here, because usb-storage doesn't own that struct
device. It is owned by the USB core, and usb-storage has no way to know
when its release function has been called.
Even worse, the release function may never be called at all. The device
itself may still exist and be attached to the system even though the
usb-storage driver has been unbound from it.
There must be many other LLDDs with the same problem. Any driver that
manages a PCI host controller, for example. The struct device it receives
from the PCI layer isn't owned by the LLDD, yet that's what it must pass
to scsi_add_host().
How should we handle this?
Alan Stern
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-usb-devel] BUG when removing USB flash drive
2004-05-14 15:12 ` Alan Stern
@ 2004-05-14 16:59 ` Mike Anderson
2004-05-14 19:33 ` Alan Stern
0 siblings, 1 reply; 9+ messages in thread
From: Mike Anderson @ 2004-05-14 16:59 UTC (permalink / raw)
To: Alan Stern; +Cc: Alberto Bertogli, USB development list, SCSI development list
Alan Stern [stern@rowland.harvard.edu] wrote:
> On Thu, 13 May 2004, Mike Anderson wrote:
>
> > The LLDD queuecommand should not be called after the return of
> > scsi_remove_host (unless we have a bug). The LLDD should not free its
> > resources until the release function is called on the struct device
> > passed in during the scsi_add_host call.
>
> Ah. We have a problem here, because usb-storage doesn't own that struct
> device. It is owned by the USB core, and usb-storage has no way to know
> when its release function has been called.
>
> Even worse, the release function may never be called at all. The device
> itself may still exist and be attached to the system even though the
> usb-storage driver has been unbound from it.
>
> There must be many other LLDDs with the same problem. Any driver that
> manages a PCI host controller, for example. The struct device it receives
> from the PCI layer isn't owned by the LLDD, yet that's what it must pass
> to scsi_add_host().
>
> How should we handle this?
>
I was incorrect in my previous statement. Well we may hold a reference
count on the struct device passed into the scsi_add_host until all
references on the host instance are gone and there maybe a positive ref
count on the hostt module until all devices are closed a instance of
LLDD should be able to do cleanup post return of scsi_remove_host. This
needs to be true or SCSI mid-layer would not honor the requirement for
the pci remove call.
I believe in the past you or someone else has mentioned that post the
return of scsi_remove_host a Scsi_Host instance still has a hostt and a
transportt which may be dangerous if all code paths are not correctly
handling a host in the SHOST_DEL state.
I guess we still have this current issue on what is happening in the
usb storage that it believes SCSI mid is still using the device. Is
something not cleaning up, or is something in SCSI mid not honoring the
SHOST_DEL state.
-andmike
--
Michael Anderson
andmike@us.ibm.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-usb-devel] BUG when removing USB flash drive
2004-05-14 16:59 ` Mike Anderson
@ 2004-05-14 19:33 ` Alan Stern
2004-05-14 20:04 ` James Bottomley
0 siblings, 1 reply; 9+ messages in thread
From: Alan Stern @ 2004-05-14 19:33 UTC (permalink / raw)
To: Mike Anderson
Cc: Alberto Bertogli, USB development list, SCSI development list
On Fri, 14 May 2004, Mike Anderson wrote:
> I was incorrect in my previous statement. Well we may hold a reference
> count on the struct device passed into the scsi_add_host until all
> references on the host instance are gone and there maybe a positive ref
> count on the hostt module until all devices are closed a instance of
> LLDD should be able to do cleanup post return of scsi_remove_host. This
> needs to be true or SCSI mid-layer would not honor the requirement for
> the pci remove call.
>
> I believe in the past you or someone else has mentioned that post the
> return of scsi_remove_host a Scsi_Host instance still has a hostt and a
> transportt which may be dangerous if all code paths are not correctly
> handling a host in the SHOST_DEL state.
Yes, that was me.
> I guess we still have this current issue on what is happening in the
> usb storage that it believes SCSI mid is still using the device. Is
> something not cleaning up, or is something in SCSI mid not honoring the
> SHOST_DEL state.
usb-storage checks the state of its kernel thread after calling
scsi_remove_host(), while preparing to kill the thread. This can only
be done while the thread is idle, not processing any queued commands or
error handler requests. The BUG in Alberto's log was triggered because
the thread was not idle. Unfortunately the log doesn't say whether it was
executing a queued command or an error handler request at the time.
It does appear to have been some sort of race, as Alberto mentioned in his
original message:
http://marc.theaimsgroup.com/?l=linux-usb-devel&m=108449593410567&w=2
It might not be easy to find out where the race is occurring.
Alan Stern
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-usb-devel] BUG when removing USB flash drive
2004-05-14 19:33 ` Alan Stern
@ 2004-05-14 20:04 ` James Bottomley
2004-05-14 21:13 ` Alan Stern
0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2004-05-14 20:04 UTC (permalink / raw)
To: Alan Stern
Cc: Mike Anderson, Alberto Bertogli, USB development list,
SCSI development list
On Fri, 2004-05-14 at 14:33, Alan Stern wrote:
> usb-storage checks the state of its kernel thread after calling
> scsi_remove_host(), while preparing to kill the thread. This can only
> be done while the thread is idle, not processing any queued commands or
> error handler requests. The BUG in Alberto's log was triggered because
> the thread was not idle. Unfortunately the log doesn't say whether it was
> executing a queued command or an error handler request at the time.
The idea behind scsi_remove_host() was that it was going to be called
from the module_exit() routine. This means that the reference count on
the LLD has already dropped to zero and it's safe for removal.
If you call scsi_remove_host() outside of the module_exit() code then
what it does is remove all hosts from visibility (and makes it
impossible to open or do anything with them), but it cannot revoke
outstanding references (which may continue to use the device). The
mid-layer takes the devices through CANCEL and then DEL. When
scsi_remove_host() returns, you are guaranteed (barring any state model
cockups) that the devices on the LLD with outstanding references are all
in the DEL state. It will also block the device from above (no commands
will be sent down). Any currently outstanding commands will be errored
from the mid-layer; however, it will do nothing about outstanding
commands in the LLD, you have to deal with them yourself.
The way this is supposed to work is that when you get a disconnection,
you start refusing commands (and you also discard any active ones).
Then you call scsi_remove_host(). We discard all current commands
So, if the usb storage LLD isn't discarding the commands, then they may
be active.
James
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-usb-devel] BUG when removing USB flash drive
2004-05-14 20:04 ` James Bottomley
@ 2004-05-14 21:13 ` Alan Stern
2004-05-14 21:55 ` Alberto Bertogli
0 siblings, 1 reply; 9+ messages in thread
From: Alan Stern @ 2004-05-14 21:13 UTC (permalink / raw)
To: James Bottomley
Cc: Mike Anderson, Alberto Bertogli, USB development list,
SCSI development list
On 14 May 2004, James Bottomley wrote:
> The idea behind scsi_remove_host() was that it was going to be called
> from the module_exit() routine. This means that the reference count on
> the LLD has already dropped to zero and it's safe for removal.
>
> If you call scsi_remove_host() outside of the module_exit() code then
> what it does is remove all hosts from visibility (and makes it
> impossible to open or do anything with them), but it cannot revoke
> outstanding references (which may continue to use the device). The
> mid-layer takes the devices through CANCEL and then DEL. When
> scsi_remove_host() returns, you are guaranteed (barring any state model
> cockups) that the devices on the LLD with outstanding references are all
> in the DEL state. It will also block the device from above (no commands
> will be sent down). Any currently outstanding commands will be errored
> from the mid-layer; however, it will do nothing about outstanding
> commands in the LLD, you have to deal with them yourself.
>
> The way this is supposed to work is that when you get a disconnection,
> you start refusing commands (and you also discard any active ones).
> Then you call scsi_remove_host(). We discard all current commands
>
> So, if the usb storage LLD isn't discarding the commands, then they may
> be active.
Yes, that would explain it. We do arrange to abort the current command,
but this is done before calling scsi_remove_host() when it should be done
after. And we don't wait for the kernel thread to become idle afterwards.
I'll work on a patch. Testing it won't be easy, though; it's hard to
prove that a race _isn't_ happening!
Alan Stern
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-usb-devel] BUG when removing USB flash drive
2004-05-14 21:13 ` Alan Stern
@ 2004-05-14 21:55 ` Alberto Bertogli
2004-05-15 19:56 ` Alan Stern
0 siblings, 1 reply; 9+ messages in thread
From: Alberto Bertogli @ 2004-05-14 21:55 UTC (permalink / raw)
To: Alan Stern
Cc: James Bottomley, Mike Anderson, USB development list,
SCSI development list
On Fri, May 14, 2004 at 05:13:38PM -0400, Alan Stern wrote:
> I'll work on a patch. Testing it won't be easy, though; it's hard to
> prove that a race _isn't_ happening!
I'm going to have to repeat this tests a few more times (in fact, I'll
boot -mm2 in a couple of minutes to see if I can reproduce the bug there),
so if you want me to try some patch let me know. As you say, I can't
_prove_ it, but at least I can confirm I can't hit it =)
Thanks,
Alberto
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-usb-devel] BUG when removing USB flash drive
2004-05-14 21:55 ` Alberto Bertogli
@ 2004-05-15 19:56 ` Alan Stern
0 siblings, 0 replies; 9+ messages in thread
From: Alan Stern @ 2004-05-15 19:56 UTC (permalink / raw)
To: James Bottomley
Cc: Alberto Bertogli, Mike Anderson, USB development list,
SCSI development list
I took a look at scsi_host_cancel(), scsi_device_cancel(), and a few other
routines. Three things stood out:
When a host is removed, currently executing requests are
aborted within the midlevel but the low-level driver is not
informed of this. Presumably it shouldn't care since it
ought to cancel its own outstanding I/O?
There's a race between scsi_host_cancel()/scsi_device_cancel()
and scsi_times_out(). If the timer fires before
scsi_device_cancel() can delete it, and scsi_host_cancel()
tests the SHOST_RECOVERY bit before the error handler manages
to set it, then you will remove a host that's still in error
recovery.
There's no protection against accessing the host template
via procfs, if a user process pins the file during
scsi_remove_host(). You probably are already aware of this.
Alan Stern
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2004-05-15 19:56 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20040514004132.GA10537@telpin.com.ar>
2004-05-14 3:11 ` [linux-usb-devel] BUG when removing USB flash drive Alan Stern
2004-05-14 5:36 ` Mike Anderson
2004-05-14 15:12 ` Alan Stern
2004-05-14 16:59 ` Mike Anderson
2004-05-14 19:33 ` Alan Stern
2004-05-14 20:04 ` James Bottomley
2004-05-14 21:13 ` Alan Stern
2004-05-14 21:55 ` Alberto Bertogli
2004-05-15 19:56 ` Alan Stern
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox