* Re: [Patch] Fix oops on rmmod usb-storage
[not found] <415A67B8.2080003@suse.de>
@ 2004-09-29 12:03 ` Christoph Hellwig
2004-09-29 12:31 ` Hannes Reinecke
2004-09-29 13:56 ` James Bottomley
1 sibling, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2004-09-29 12:03 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: linux-scsi, James.Bottomley, Andrew Morton
[please send scsi issues to linux-scsi, thanks]
> It turned out that in drivers/scsi/hosts.c:scsi_remove_host()
> first the host is removed with scsi_forget_host() and _then_ all
> outstanding I/O to this host is cancelled with scsi_host_cancel().
> Sounds a bit fishy as scsi_host_cancel() tries to talk to a host which
> we just have deleted ...
> (Incidentally, this is most likely the same bug as Bug #2752 and #3480
> from bugme.osdl.org :-).
> (And also #133249 from bugzilla.redhat.com :-).
>
> The attached patch corrects this.
> Please apply.
I'ts not that easy. If we cancel the host first we won't get our write
caches flushed because the drivers don't accept the SYNCRHONIZE_CACHE.
command in cancelled state. Mike just changed the order to what it is
now a short while ago.
So we'll have to find a way to send a SYNCHRNOZIE_CACHE command even
in canncelled state.
In fact I can't see how these problems could happen, and that they only
happen with usb-storage seems strange.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 12:03 ` [Patch] Fix oops on rmmod usb-storage Christoph Hellwig
@ 2004-09-29 12:31 ` Hannes Reinecke
2004-09-29 17:12 ` Mike Anderson
0 siblings, 1 reply; 22+ messages in thread
From: Hannes Reinecke @ 2004-09-29 12:31 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-scsi, James.Bottomley, Andrew Morton
Christoph Hellwig wrote:
> [please send scsi issues to linux-scsi, thanks]
>
>
>>It turned out that in drivers/scsi/hosts.c:scsi_remove_host()
>>first the host is removed with scsi_forget_host() and _then_ all
>>outstanding I/O to this host is cancelled with scsi_host_cancel().
>>Sounds a bit fishy as scsi_host_cancel() tries to talk to a host which
>>we just have deleted ...
>>(Incidentally, this is most likely the same bug as Bug #2752 and #3480
>>from bugme.osdl.org :-).
>>(And also #133249 from bugzilla.redhat.com :-).
>>
>>The attached patch corrects this.
>>Please apply.
>
>
> I'ts not that easy. If we cancel the host first we won't get our write
> caches flushed because the drivers don't accept the SYNCRHONIZE_CACHE.
> command in cancelled state. Mike just changed the order to what it is
> now a short while ago.
>
Yeah, Jens Axboe just pointed that out.
> So we'll have to find a way to send a SYNCHRNOZIE_CACHE command even
> in canncelled state.
>
> In fact I can't see how these problems could happen, and that they only
> happen with usb-storage seems strange.
Well, the thing is: When usb-storage calls scsi_remove_host() the device
is already gone; whether or not we can synchronize the cache is a moot
point there.
And if I understood the comment to the Changeset properly, the patch was
just to remove the 'Synchronizing SCSI cache' message.
I can live with that. Kernel Oops is much worse.
Cheers,
Hannes
--
Dr. Hannes Reinecke hare@suse.de
SuSE Linux AG S390 & zSeries
Maxfeldstraße 5 +49 911 74053 688
90409 Nürnberg http://www.suse.de
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 13:56 ` James Bottomley
@ 2004-09-29 13:17 ` Alan Cox
2004-09-29 14:24 ` James Bottomley
0 siblings, 1 reply; 22+ messages in thread
From: Alan Cox @ 2004-09-29 13:17 UTC (permalink / raw)
To: James Bottomley
Cc: Hannes Reinecke, Linux Kernel Mailing List, Andrew Morton,
SCSI Mailing List
On Mer, 2004-09-29 at 14:56, James Bottomley wrote:
> The key to the solution of this problem is to know what USB is trying to
> do with the dead device. SCSI is trying to be polite and explicitly
> kill the outstanding commands before it removes the HBA. Presumably USB
> is returning something that says this can't be done so the EH gets all
> the way up to offlining.
Its nothing to do with USB, rmmod with eh running crashes all the other
SCSI drivers I've tested too. After the state transition fails you get
kobject related errors and a crash.
That makes me suspect whatever is ill is in the scsi core.
Alan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
[not found] <415A67B8.2080003@suse.de>
2004-09-29 12:03 ` [Patch] Fix oops on rmmod usb-storage Christoph Hellwig
@ 2004-09-29 13:56 ` James Bottomley
2004-09-29 13:17 ` Alan Cox
1 sibling, 1 reply; 22+ messages in thread
From: James Bottomley @ 2004-09-29 13:56 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Linux Kernel, Andrew Morton, SCSI Mailing List
On Wed, 2004-09-29 at 03:43, Hannes Reinecke wrote:
> usbcore: deregistering driver usb-storage
> scsi: Device offlined - not ready after error recovery: host 0 channel 0
> id 0 lun 0
> sr 0:0:0:0: Illegal state transition cancel->offline
> Badness in scsi_device_set_state at drivers/scsi/scsi_lib.c:1688
> [<e12bab6e>] scsi_device_set_state+0x9e/0xd0 [scsi_mod]
> [<e12b8a6e>] scsi_eh_offline_sdevs+0x4e/0x70 [scsi_mod]
> [<e12b8f1a>] scsi_unjam_host+0x9a/0x1b0 [scsi_mod]
> [<e12b90f5>] scsi_error_handler+0xc5/0x160 [scsi_mod]
> [<e12b9030>] scsi_error_handler+0x0/0x160 [scsi_mod]
> [<c0104255>] kernel_thread_helper+0x5/0x10
This isn't an oops, it's a state transition warning. Apparently the
customary attempt to cancel the commands failed. This is really only a
warning and will probably go away eventually (we'll just silently fail
the transition attempt).
> It turned out that in drivers/scsi/hosts.c:scsi_remove_host()
> first the host is removed with scsi_forget_host() and _then_ all
> outstanding I/O to this host is cancelled with scsi_host_cancel().
> Sounds a bit fishy as scsi_host_cancel() tries to talk to a host which
> we just have deleted ...
> (Incidentally, this is most likely the same bug as Bug #2752 and #3480
> from bugme.osdl.org :-).
> (And also #133249 from bugzilla.redhat.com :-).
>
> The attached patch corrects this.
> Please apply.
No, the patch is wrong. we do forget first to make the host
inaccessible from above then cancel the outstanding commands.
The key to the solution of this problem is to know what USB is trying to
do with the dead device. SCSI is trying to be polite and explicitly
kill the outstanding commands before it removes the HBA. Presumably USB
is returning something that says this can't be done so the EH gets all
the way up to offlining.
Also, please at least cc linux-scsi@vger.kernel.org on SCSI problems.
Thanks,
James
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 13:17 ` Alan Cox
@ 2004-09-29 14:24 ` James Bottomley
2004-09-29 14:44 ` Hannes Reinecke
0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2004-09-29 14:24 UTC (permalink / raw)
To: Alan Cox
Cc: Hannes Reinecke, Linux Kernel Mailing List, Andrew Morton,
SCSI Mailing List
On Wed, 2004-09-29 at 09:17, Alan Cox wrote:
> On Mer, 2004-09-29 at 14:56, James Bottomley wrote:
> > The key to the solution of this problem is to know what USB is trying to
> > do with the dead device. SCSI is trying to be polite and explicitly
> > kill the outstanding commands before it removes the HBA. Presumably USB
> > is returning something that says this can't be done so the EH gets all
> > the way up to offlining.
>
> Its nothing to do with USB, rmmod with eh running crashes all the other
> SCSI drivers I've tested too. After the state transition fails you get
> kobject related errors and a crash.
There is no crash in the log ... there was only a state transition
complaint.
I think the solution is in the eh and its simply not to try to ready the
device on SDEV_CANCEL (no point readying a device you're being asked to
kill).
James
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 14:24 ` James Bottomley
@ 2004-09-29 14:44 ` Hannes Reinecke
2004-09-29 15:15 ` James Bottomley
0 siblings, 1 reply; 22+ messages in thread
From: Hannes Reinecke @ 2004-09-29 14:44 UTC (permalink / raw)
To: James Bottomley
Cc: Alan Cox, Linux Kernel Mailing List, Andrew Morton,
SCSI Mailing List
[-- Attachment #1: Type: text/plain, Size: 1006 bytes --]
James Bottomley wrote:
> On Wed, 2004-09-29 at 09:17, Alan Cox wrote:
>
>>On Mer, 2004-09-29 at 14:56, James Bottomley wrote:
>>
>>>The key to the solution of this problem is to know what USB is trying to
>>>do with the dead device. SCSI is trying to be polite and explicitly
>>>kill the outstanding commands before it removes the HBA. Presumably USB
>>>is returning something that says this can't be done so the EH gets all
>>>the way up to offlining.
>>
>>Its nothing to do with USB, rmmod with eh running crashes all the other
>>SCSI drivers I've tested too. After the state transition fails you get
>>kobject related errors and a crash.
>
>
> There is no crash in the log ... there was only a state transition
> complaint.
>
Oh, that can be fixed. Attached is the full trace (including USB
debugging output).
It does crash. Hard.
Cheers,
Hannes
--
Dr. Hannes Reinecke hare@suse.de
SuSE Linux AG S390 & zSeries
Maxfeldstraße 5 +49 911 74053 688
90409 Nürnberg http://www.suse.de
[-- Attachment #2: usb-scsi-remove.oops --]
[-- Type: text/plain, Size: 6685 bytes --]
ehci_hcd 0000:00:1d.7: GetStatus port 5 status 001002 POWER sig=se0
CSC
hub 4-0:1.0: port 5, status 0100, change 0001, 12 Mb/s
usb 4-5: USB disconnect, address 4
usb 4-5: usb_disable_device nuking all URBs
usb 4-5: unregistering interface 4-5:1.0
bus usb: remove device 4-5:1.0
usb-storage: storage_disconnect() called
usb-storage: usb_stor_stop_transport called
CLASS: Unregistering class device. ID = '0:0:0:0'
CLASS: Unregistering class device. ID = 'sg0'
class_hotplug - name = sg0
device class 'sg0': release.
class_hotplug - name = 0:0:0:0
device class '0:0:0:0': release.
bus scsi: remove device 0:0:0:0
usb-storage: queuecommand called
usb-storage: *** thread awakened.
usb-storage: No command during disconnect
usb-storage: *** thread sleeping.
usb-storage: command_abort called
usb-storage: -- nothing to abort
usb-storage: device_reset called
usb-storage: No reset during disconnect
usb-storage: bus_reset called
usb-storage: No reset during disconnect
scsi: Device offlined - not ready after error recovery: host 0 channel
0 id 0 lun 0
sr 0:0:0:0: Illegal state transition cancel->offline
Badness in scsi_device_set_state at drivers/scsi/scsi_lib.c:1688
[<c0107235>] dump_stack+0x15/0x20
[<e0ef8e46>] scsi_device_set_state+0xa6/0xe0 [scsi_mod]
[<e0ef6c62>] scsi_eh_offline_sdevs+0x52/0x70 [scsi_mod]
[<e0ef7128>] scsi_unjam_host+0x98/0x1b0 [scsi_mod]
[<e0ef7305>] scsi_error_handler+0xc5/0x160 [scsi_mod]
[<c0104269>] kernel_thread_helper+0x5/0xc
Badness in kref_get at lib/kref.c:32
[<c0107235>] dump_stack+0x15/0x20
[<c01d575e>] kref_get+0x2e/0x40
[<c01d53e2>] kobject_get+0x12/0x20
[<c0246d41>] get_device+0x11/0x20
[<e0ef85c1>] scsi_request_fn+0x21/0x390 [scsi_mod]
[<c024d24e>] blk_insert_request+0x7e/0xa0
[<e0ef7673>] scsi_queue_insert+0x63/0xa0 [scsi_mod]
[<e0ef6fe8>] scsi_eh_flush_done_q+0x58/0x100 [scsi_mod]
[<e0ef7103>] scsi_unjam_host+0x73/0x1b0 [scsi_mod]
[<e0ef7305>] scsi_error_handler+0xc5/0x160 [scsi_mod]
[<c0104269>] kernel_thread_helper+0x5/0xc
Unable to handle kernel paging request at virtual address 00100104
printing eip:
e0efa735
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: usb_storage rfcomm hidp l2cap hci_usb bluetooth
snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device usbhid joydev sg
st sd_mod sr_mod scsi_mod ide_cd cdrom nvram usbserial parport_pc lp
parport autofs cpufreq_userspace edd speedstep_centrino freq_table
thermal processor fan button battery ac snd_pcm_oss snd_mixer_oss
snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd soundcore
snd_page_alloc ipv6 af_packet ds ohci_hcd e100 mii ehci_hcd intel_agp
agpgart ohci1394 uhci_hcd yenta_socket ieee1394 pcmcia_core evdev
dm_mod usbcore reiserfs
CPU: 0
EIP: 0060:[<e0efa735>] Tainted: G U VLI
EFLAGS: 00010082 (2.6.8-0-defaultbt )
EIP is at scsi_device_dev_release+0x25/0x100 [scsi_mod]
eax: d2e82184 ebx: d2e82008 ecx: 00200200 edx: 00100100
esi: d2e82000 edi: 00000282 ebp: d25f3efc esp: d25f3eec
ds: 007b es: 007b ss: 0068
Process scsi_eh_0 (pid: 7063, threadinfo=d25f2000 task=d7a84000)
Stack: c157dcb4 d2e821a8 c038ad08 c038ad20 d25f3f04 c0246a83 d25f3f1c
c01d546a
c157dcd8 d2e821c0 c01d5470 c157dc00 d25f3f2c c01d5799 c157deb0
d2e82000
d25f3f48 e0ef8829 d2e82184 cdca00e8 c157deb0 00000001 cdca00e8
d25f3f60
Call Trace:
[<c010720b>] show_stack+0x9b/0xb0
[<c010735a>] show_registers+0x11a/0x190
[<c0107517>] die+0xb7/0x130
[<c0118dde>] do_page_fault+0x38e/0x5ca
[<c0106dfd>] error_code+0x2d/0x40
[<c0246a83>] device_release+0x43/0x50
[<c01d546a>] kobject_cleanup+0x7a/0x80
[<c01d5799>] kref_put+0x29/0x70
[<e0ef8829>] scsi_request_fn+0x289/0x390 [scsi_mod]
[<c024d24e>] blk_insert_request+0x7e/0xa0
[<e0ef7673>] scsi_queue_insert+0x63/0xa0 [scsi_mod]
[<e0ef6fe8>] scsi_eh_flush_done_q+0x58/0x100 [scsi_mod]
[<e0ef7103>] scsi_unjam_host+0x73/0x1b0 [scsi_mod]
[<e0ef7305>] scsi_error_handler+0xc5/0x160 [scsi_mod]
[<c0104269>] kernel_thread_helper+0x5/0xc
Code: 42 c6 34 df 89 f6 55 89 e5 57 56 53 51 8d b0 7c fe ff ff 8b 50
20 89 55 f0 9c 5f fa 8d 98 84 fe ff ff 8b 90 84 fe ff ff 8b 4b 04 <89>
4a 04 c7 43 04 00 02 20 00 89 11 8d 98 8c fe ff ff 8b 90 8c
Badness in kref_get at lib/kref.c:32
[<c0107235>] dump_stack+0x15/0x20
[<c01d575e>] kref_get+0x2e/0x40
[<c01d53e2>] kobject_get+0x12/0x20
[<c0246d41>] get_device+0x11/0x20
[<e0ef85c1>] scsi_request_fn+0x21/0x390 [scsi_mod]
[<c024ca01>] __generic_unplug_device+0x31/0x40
[<c024ca49>] blk_unplug_work+0x9/0x10
[<c012b125>] worker_thread+0x155/0x1f0
[<c012e905>] kthread+0x85/0xb0
[<c0104269>] kernel_thread_helper+0x5/0xc
Unable to handle kernel paging request at virtual address 00100104
printing eip:
e0efa735
*pde = 00000000
Oops: 0002 [#2]
Modules linked in: usb_storage rfcomm hidp l2cap hci_usb bluetooth
snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device usbhid joydev sg
st sd_mod sr_mod scsi_mod ide_cd cdrom nvram usbserial parport_pc lp
parport autofs cpufreq_userspace edd speedstep_centrino freq_table
thermal processor fan button battery ac snd_pcm_oss snd_mixer_oss
snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd soundcore
snd_page_alloc ipv6 af_packet ds ohci_hcd e100 mii ehci_hcd intel_agp
agpgart ohci1394 uhci_hcd yenta_socket ieee1394 pcmcia_core evdev
dm_mod usbcore reiserfs
CPU: 0
EIP: 0060:[<e0efa735>] Tainted: G U VLI
EFLAGS: 00010082 (2.6.8-0-defaultbt )
EIP is at scsi_device_dev_release+0x25/0x100 [scsi_mod]
eax: d2e82184 ebx: d2e82008 ecx: 00200200 edx: 00100100
esi: d2e82000 edi: 00000282 ebp: c1551ef0 esp: c1551ee0
ds: 007b es: 007b ss: 0068
Process kblockd/0 (pid: 32, threadinfo=c1550000 task=cdf8baa0)
Stack: c157dcb4 d2e821a8 c038ad08 c038ad20 c1551ef8 c0246a83 c1551f10
c01d546a
c157dcd8 d2e821c0 c01d5470 c157dc00 c1551f20 c01d5799 c157deb0
d2e82000
c1551f3c e0ef8829 d2e82184 cdca00e8 cdca00e8 c14d6e80 cdca01e0
c1551f48
Call Trace:
[<c010720b>] show_stack+0x9b/0xb0
[<c010735a>] show_registers+0x11a/0x190
[<c0107517>] die+0xb7/0x130
[<c0118dde>] do_page_fault+0x38e/0x5ca
[<c0106dfd>] error_code+0x2d/0x40
[<c0246a83>] device_release+0x43/0x50
[<c01d546a>] kobject_cleanup+0x7a/0x80
[<e0ef8829>] scsi_request_fn+0x289/0x390 [scsi_mod]
[<c024ca01>] __generic_unplug_device+0x31/0x40
[<c024ca19>] generic_unplug_device+0x9/0x10
[<c024ca49>] blk_unplug_work+0x9/0x10
[<c012b125>] worker_thread+0x155/0x1f0
[<c012e905>] kthread+0x85/0xb0
[<c0104269>] kernel_thread_helper+0x5/0xc
Code: 42 c6 34 df 89 f6 55 89 e5 57 56 53 51 8d b0 7c fe ff ff 8b 50
20 89 55 f0 9c 5f fa 8d 98 84 fe ff ff 8b 90 84 fe ff ff 8b 4b 04 <89>
4a 04 c7 43 04 00 02 20 00 89 11 8d 98 8c fe ff ff 8b 90 8c
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 14:44 ` Hannes Reinecke
@ 2004-09-29 15:15 ` James Bottomley
2004-09-29 15:28 ` Matthew Wilcox
0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2004-09-29 15:15 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Alan Cox, Linux Kernel Mailing List, Andrew Morton,
SCSI Mailing List
[-- Attachment #1: Type: text/plain, Size: 269 bytes --]
On Wed, 2004-09-29 at 10:44, Hannes Reinecke wrote:
> Oh, that can be fixed. Attached is the full trace (including USB
> debugging output).
> It does crash. Hard.
OK, looks like a refcounting problem again.
Try the attached and see if it goes away.
Thanks,
James
[-- Attachment #2: Type: message/rfc822, Size: 4917 bytes --]
From: James Bottomley <James.Bottomley@steeleye.com>
To: SCSI Mailing List <linux-scsi@vger.kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>, Andrew Morton <akpm@osdl.org>, USB development list <linux-usb-devel@lists.sourceforge.net>
Subject: Re: [linux-usb-devel] Fw: [Bug 3466] New: Bug while connecting USB-HDD (fwd)
Date: 28 Sep 2004 12:40:37 -0400
Message-ID: <1096389644.1717.45.camel@mulgrave>
On Sun, 2004-09-26 at 18:05, James Bottomley wrote:
> There's not enough information to say why it happened. However, all the
> SCSI code checks out (it's dated ... open coded reference counting
> instead of kref, but it looks sound). The scenario described could be
> seen if there's a problem in the host reference counting.
>
> In that case, there should have been a slab error earlier on in the logs
> at the point the error occurred saying something like "slab error in
> kmem_cache_destory(): can't free all objects"
>
> It's possible this could be caused by a refcounting race on the
> commands.
OK, I have a definite theory about this, but it hinges on finding the
above message in the logs.
I think we tried to destroy the command slab while some commands were
still active. The refcounting only applies to in-flight commands, but
commands can also be allocated and queued in the block layer.
If I'm right, the attached will close this refcounting hole.
James
===== drivers/scsi/scsi.c 1.146 vs edited =====
--- 1.146/drivers/scsi/scsi.c 2004-08-09 12:55:05 -05:00
+++ edited/drivers/scsi/scsi.c 2004-09-28 11:23:31 -05:00
@@ -244,7 +244,13 @@
*/
struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, int gfp_mask)
{
- struct scsi_cmnd *cmd = __scsi_get_command(dev->host, gfp_mask);
+ struct scsi_cmnd *cmd;
+
+ /* Bail if we can't get a reference to the device */
+ if (!get_device(&dev->sdev_gendev))
+ return NULL;
+
+ cmd = __scsi_get_command(dev->host, gfp_mask);
if (likely(cmd != NULL)) {
unsigned long flags;
@@ -258,7 +264,8 @@
spin_lock_irqsave(&dev->list_lock, flags);
list_add_tail(&cmd->list, &dev->cmd_list);
spin_unlock_irqrestore(&dev->list_lock, flags);
- }
+ } else
+ put_device(&dev->sdev_gendev);
return cmd;
}
@@ -276,7 +283,8 @@
*/
void scsi_put_command(struct scsi_cmnd *cmd)
{
- struct Scsi_Host *shost = cmd->device->host;
+ struct scsi_device *sdev = cmd->device;
+ struct Scsi_Host *shost = sdev->host;
unsigned long flags;
/* serious error if the command hasn't come from a device list */
@@ -294,6 +302,8 @@
if (likely(cmd != NULL))
kmem_cache_free(shost->cmd_pool->slab, cmd);
+
+ put_device(&sdev->sdev_gendev);
}
/*
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 15:15 ` James Bottomley
@ 2004-09-29 15:28 ` Matthew Wilcox
2004-09-29 15:35 ` James Bottomley
0 siblings, 1 reply; 22+ messages in thread
From: Matthew Wilcox @ 2004-09-29 15:28 UTC (permalink / raw)
To: James Bottomley
Cc: Hannes Reinecke, Alan Cox, Linux Kernel Mailing List,
Andrew Morton, SCSI Mailing List
On Wed, Sep 29, 2004 at 11:15:13AM -0400, James Bottomley wrote:
> struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, int gfp_mask)
> {
> - struct scsi_cmnd *cmd = __scsi_get_command(dev->host, gfp_mask);
> + struct scsi_cmnd *cmd;
> +
> + /* Bail if we can't get a reference to the device */
> + if (!get_device(&dev->sdev_gendev))
> + return NULL;
How can this happen? You're taking the address of dev->sdev_gendev, so it
can't be NULL:
struct device * get_device(struct device * dev)
{
return dev ? to_dev(kobject_get(&dev->kobj)) : NULL;
}
(kobject_get returns its argument).
--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 15:28 ` Matthew Wilcox
@ 2004-09-29 15:35 ` James Bottomley
0 siblings, 0 replies; 22+ messages in thread
From: James Bottomley @ 2004-09-29 15:35 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Hannes Reinecke, Alan Cox, Linux Kernel Mailing List,
Andrew Morton, SCSI Mailing List
On Wed, 2004-09-29 at 11:28, Matthew Wilcox wrote:
> How can this happen? You're taking the address of dev->sdev_gendev, so it
> can't be NULL:
Just in case we link it into the device state model.
James
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 12:31 ` Hannes Reinecke
@ 2004-09-29 17:12 ` Mike Anderson
2004-09-29 17:19 ` James Bottomley
0 siblings, 1 reply; 22+ messages in thread
From: Mike Anderson @ 2004-09-29 17:12 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Christoph Hellwig, linux-scsi, James.Bottomley, Andrew Morton,
Matthew Dharm, Alan Stern
Hannes Reinecke [hare@suse.de] wrote:
> Christoph Hellwig wrote:
> >[please send scsi issues to linux-scsi, thanks]
> >
> >
> >>It turned out that in drivers/scsi/hosts.c:scsi_remove_host()
> >>first the host is removed with scsi_forget_host() and _then_ all
> >>outstanding I/O to this host is cancelled with scsi_host_cancel().
> >>Sounds a bit fishy as scsi_host_cancel() tries to talk to a host which
> >>we just have deleted ...
> >>(Incidentally, this is most likely the same bug as Bug #2752 and #3480
> >>from bugme.osdl.org :-).
> >>(And also #133249 from bugzilla.redhat.com :-).
> >>
> >>The attached patch corrects this.
> >>Please apply.
> >
> >
> >I'ts not that easy. If we cancel the host first we won't get our write
> >caches flushed because the drivers don't accept the SYNCRHONIZE_CACHE.
> >command in cancelled state. Mike just changed the order to what it is
> >now a short while ago.
> >
> Yeah, Jens Axboe just pointed that out.
>
> >So we'll have to find a way to send a SYNCHRNOZIE_CACHE command even
> >in canncelled state.
> >
> >In fact I can't see how these problems could happen, and that they only
> >happen with usb-storage seems strange.
> Well, the thing is: When usb-storage calls scsi_remove_host() the device
> is already gone; whether or not we can synchronize the cache is a moot
> point there.
>
> And if I understood the comment to the Changeset properly, the patch was
> just to remove the 'Synchronizing SCSI cache' message.
> I can live with that. Kernel Oops is much worse.
>
Just a clarification. The patch was also to allow the sending of
the 'Synchronizing SCSI cache' message on rmmod. Prior to the change not
only was the message generated, but also no command was sent either for
devices that wanted one.
The scsi_remove_host interface does not have a method to distinguish
between clean and unexpected removals.
A pointer to the archives that has a partial discussion on a previously
more complicated change and the current change of reordering.
http://marc.theaimsgroup.com/?t=108701426000002&r=1&w=2
While James patch looks like it will address the issues in the error
handler ref counting. It would be best not to enter the error handler at
all.
We may want to reconsider a method to distinguish unexpected removals
from clean removals.
-andmike
--
Michael Anderson
andmike@us.ibm.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 17:12 ` Mike Anderson
@ 2004-09-29 17:19 ` James Bottomley
2004-09-29 17:22 ` Christoph Hellwig
0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2004-09-29 17:19 UTC (permalink / raw)
To: Mike Anderson
Cc: Hannes Reinecke, Christoph Hellwig, SCSI Mailing List,
Andrew Morton, Matthew Dharm, Alan Stern
On Wed, 2004-09-29 at 13:12, Mike Anderson wrote:
> We may want to reconsider a method to distinguish unexpected removals
> from clean removals.
Technically we have this: it's the recovery flag to scsi_host_cancel().
Unfortunately, scsi_remove_host doesn't have such a flag so it's always
set to zero at the moment.
James
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 17:19 ` James Bottomley
@ 2004-09-29 17:22 ` Christoph Hellwig
2004-09-29 17:36 ` Mike Anderson
0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2004-09-29 17:22 UTC (permalink / raw)
To: James Bottomley
Cc: Mike Anderson, Hannes Reinecke, Christoph Hellwig,
SCSI Mailing List, Andrew Morton, Matthew Dharm, Alan Stern
On Wed, Sep 29, 2004 at 01:19:28PM -0400, James Bottomley wrote:
> On Wed, 2004-09-29 at 13:12, Mike Anderson wrote:
> > We may want to reconsider a method to distinguish unexpected removals
> > from clean removals.
>
> Technically we have this: it's the recovery flag to scsi_host_cancel().
> Unfortunately, scsi_remove_host doesn't have such a flag so it's always
> set to zero at the moment.
and ->remove for the bus drivers doesn't know either, so adding such
a flag wouldn't help us much.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 17:22 ` Christoph Hellwig
@ 2004-09-29 17:36 ` Mike Anderson
2004-09-29 17:38 ` Christoph Hellwig
0 siblings, 1 reply; 22+ messages in thread
From: Mike Anderson @ 2004-09-29 17:36 UTC (permalink / raw)
To: Christoph Hellwig
Cc: James Bottomley, Hannes Reinecke, SCSI Mailing List,
Andrew Morton, Matthew Dharm, Alan Stern
Christoph Hellwig [hch@infradead.org] wrote:
> On Wed, Sep 29, 2004 at 01:19:28PM -0400, James Bottomley wrote:
> > On Wed, 2004-09-29 at 13:12, Mike Anderson wrote:
> > > We may want to reconsider a method to distinguish unexpected removals
> > > from clean removals.
> >
> > Technically we have this: it's the recovery flag to scsi_host_cancel().
> > Unfortunately, scsi_remove_host doesn't have such a flag so it's always
> > set to zero at the moment.
>
> and ->remove for the bus drivers doesn't know either, so adding such
> a flag wouldn't help us much.
In some cases the knowledge would be in the driver would it not. When the
drivers remove function is called the driver knows the state of the
transport or in the case of the usb storage device it knows it will
not process anymore commands.
Alan / Matthew / others can correct if I have mis-read something in
usb_stor_release_resources but it knows that it will not process anymore
commands prior to calling scsi_remove_host so if we had a method to flag
this it would help in our shutdown if we want to support both devices
that can and cannot process commands.
-andmike
--
Michael Anderson
andmike@us.ibm.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 17:36 ` Mike Anderson
@ 2004-09-29 17:38 ` Christoph Hellwig
2004-09-29 17:50 ` Alan Stern
2004-09-29 17:52 ` Mike Anderson
0 siblings, 2 replies; 22+ messages in thread
From: Christoph Hellwig @ 2004-09-29 17:38 UTC (permalink / raw)
To: Christoph Hellwig, James Bottomley, Hannes Reinecke,
SCSI Mailing List, Andrew Morton, Matthew Dharm, Alan Stern
On Wed, Sep 29, 2004 at 10:36:42AM -0700, Mike Anderson wrote:
> Christoph Hellwig [hch@infradead.org] wrote:
> > On Wed, Sep 29, 2004 at 01:19:28PM -0400, James Bottomley wrote:
> > > On Wed, 2004-09-29 at 13:12, Mike Anderson wrote:
> > > > We may want to reconsider a method to distinguish unexpected removals
> > > > from clean removals.
> > >
> > > Technically we have this: it's the recovery flag to scsi_host_cancel().
> > > Unfortunately, scsi_remove_host doesn't have such a flag so it's always
> > > set to zero at the moment.
> >
> > and ->remove for the bus drivers doesn't know either, so adding such
> > a flag wouldn't help us much.
>
> In some cases the knowledge would be in the driver would it not. When the
> drivers remove function is called the driver knows the state of the
> transport or in the case of the usb storage device it knows it will
> not process anymore commands.
>
> Alan / Matthew / others can correct if I have mis-read something in
> usb_stor_release_resources but it knows that it will not process anymore
> commands prior to calling scsi_remove_host so if we had a method to flag
> this it would help in our shutdown if we want to support both devices
> that can and cannot process commands.
Well, we're certainly not going to change the scsi_remove_host signature,
but we could add a separate one.
But I don't like that change at all as we would still have that problem
with all driver that don't have a way to magically find out. We should
really try to make scsi_remove_host foolprof.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 17:38 ` Christoph Hellwig
@ 2004-09-29 17:50 ` Alan Stern
2004-09-29 18:32 ` Mike Anderson
2004-09-29 17:52 ` Mike Anderson
1 sibling, 1 reply; 22+ messages in thread
From: Alan Stern @ 2004-09-29 17:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Anderson, James Bottomley, Hannes Reinecke,
SCSI Mailing List, Andrew Morton, Matthew Dharm
Mike, thanks for adding me to this thread...
On Wed, 29 Sep 2004, Christoph Hellwig wrote:
> > In some cases the knowledge would be in the driver would it not. When the
> > drivers remove function is called the driver knows the state of the
> > transport or in the case of the usb storage device it knows it will
> > not process anymore commands.
> >
> > Alan / Matthew / others can correct if I have mis-read something in
> > usb_stor_release_resources but it knows that it will not process anymore
> > commands prior to calling scsi_remove_host so if we had a method to flag
> > this it would help in our shutdown if we want to support both devices
> > that can and cannot process commands.
That's right; when usb-storage calls scsi_remove_host it will not process
any more commands.
> Well, we're certainly not going to change the scsi_remove_host signature,
> but we could add a separate one.
>
> But I don't like that change at all as we would still have that problem
> with all driver that don't have a way to magically find out. We should
> really try to make scsi_remove_host foolprof.
Please take a look at my questions in
http://marc.theaimsgroup.com/?l=linux-scsi&m=109647697621208&w=2
Part of the problem here may be that ownership of SCSI commands isn't
clear during the host removal process. As things stand, usb-storage has
no choice but to ignore commands queued after scsi_remove_host is called.
This certainly will cause problems if it means we have to wait around for
the commands to time out and be aborted.
Alan Stern
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 17:38 ` Christoph Hellwig
2004-09-29 17:50 ` Alan Stern
@ 2004-09-29 17:52 ` Mike Anderson
1 sibling, 0 replies; 22+ messages in thread
From: Mike Anderson @ 2004-09-29 17:52 UTC (permalink / raw)
To: Christoph Hellwig
Cc: James Bottomley, Hannes Reinecke, SCSI Mailing List,
Andrew Morton, Matthew Dharm, Alan Stern
Christoph Hellwig [hch@infradead.org] wrote:
> Well, we're certainly not going to change the scsi_remove_host signature,
> but we could add a separate one.
No we would not want to change the signature of scsi_remove_host. That
is why I tried to come up with the more complicated patch previously,
but as you state below it would be better if the function was more
foolproof
>
> But I don't like that change at all as we would still have that problem
> with all driver that don't have a way to magically find out. We should
> really try to make scsi_remove_host foolprof.
>
Well I believe the issue is not that we sent the command. It is that the
command will not be processed and we start up the error handler. We
could know that we are in the context of scsi_remove_host (possibly with
a state change or other method) and do not start of the error handler. We
would just try to send then and then cancel them with the call to
scsi_host_cancel in scsi_remove_host post the call to scsi_forget_host.
-andmike
--
Michael Anderson
andmike@us.ibm.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 17:50 ` Alan Stern
@ 2004-09-29 18:32 ` Mike Anderson
2004-09-29 18:58 ` Alan Stern
0 siblings, 1 reply; 22+ messages in thread
From: Mike Anderson @ 2004-09-29 18:32 UTC (permalink / raw)
To: Alan Stern
Cc: Christoph Hellwig, James Bottomley, Hannes Reinecke,
SCSI Mailing List, Andrew Morton, Matthew Dharm
Alan Stern [stern@rowland.harvard.edu] wrote:
> Mike, thanks for adding me to this thread...
>
> On Wed, 29 Sep 2004, Christoph Hellwig wrote:
>
> > > In some cases the knowledge would be in the driver would it not. When the
> > > drivers remove function is called the driver knows the state of the
> > > transport or in the case of the usb storage device it knows it will
> > > not process anymore commands.
> > >
> > > Alan / Matthew / others can correct if I have mis-read something in
> > > usb_stor_release_resources but it knows that it will not process anymore
> > > commands prior to calling scsi_remove_host so if we had a method to flag
> > > this it would help in our shutdown if we want to support both devices
> > > that can and cannot process commands.
>
> That's right; when usb-storage calls scsi_remove_host it will not process
> any more commands.
>
> > Well, we're certainly not going to change the scsi_remove_host signature,
> > but we could add a separate one.
> >
> > But I don't like that change at all as we would still have that problem
> > with all driver that don't have a way to magically find out. We should
> > really try to make scsi_remove_host foolprof.
>
> Please take a look at my questions in
>
> http://marc.theaimsgroup.com/?l=linux-scsi&m=109647697621208&w=2
>
> Part of the problem here may be that ownership of SCSI commands isn't
> clear during the host removal process. As things stand, usb-storage has
> no choice but to ignore commands queued after scsi_remove_host is called.
> This certainly will cause problems if it means we have to wait around for
> the commands to time out and be aborted.
>
Is it possible usb storage could do something else beside just ignore
the commands like return them with DID_NO_CONNECT.
In storage_disconnect it looks like you set some state, wait for the
current command, but do not set anything that queuecommand can look at
to start rejecting commands.
-andmike
--
Michael Anderson
andmike@us.ibm.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 18:32 ` Mike Anderson
@ 2004-09-29 18:58 ` Alan Stern
2004-09-30 8:09 ` Hannes Reinecke
0 siblings, 1 reply; 22+ messages in thread
From: Alan Stern @ 2004-09-29 18:58 UTC (permalink / raw)
To: Mike Anderson
Cc: Christoph Hellwig, James Bottomley, Hannes Reinecke,
SCSI Mailing List, Andrew Morton, Matthew Dharm
On Wed, 29 Sep 2004, Mike Anderson wrote:
> Is it possible usb storage could do something else beside just ignore
> the commands like return them with DID_NO_CONNECT.
>
> In storage_disconnect it looks like you set some state, wait for the
> current command, but do not set anything that queuecommand can look at
> to start rejecting commands.
No, there is a flag set: US_FLIDX_DISCONNECTING. queuecommand could use
that to reject commands.
On the other hand, considering the race between queuecommand and
scsi_remove_host, it shouldn't matter if queuecommand accepts the
commands. After all, it's not really possible for the SCSI core to tell
whether queuecommand was called before or after scsi_remove_host.
Alan Stern
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-29 18:58 ` Alan Stern
@ 2004-09-30 8:09 ` Hannes Reinecke
2004-09-30 18:14 ` Alan Stern
0 siblings, 1 reply; 22+ messages in thread
From: Hannes Reinecke @ 2004-09-30 8:09 UTC (permalink / raw)
To: Alan Stern
Cc: Mike Anderson, Christoph Hellwig, James Bottomley,
SCSI Mailing List, Andrew Morton, Matthew Dharm
[-- Attachment #1: Type: text/plain, Size: 1355 bytes --]
Alan Stern wrote:
> On Wed, 29 Sep 2004, Mike Anderson wrote:
>
>
>>Is it possible usb storage could do something else beside just ignore
>>the commands like return them with DID_NO_CONNECT.
>>
>>In storage_disconnect it looks like you set some state, wait for the
>>current command, but do not set anything that queuecommand can look at
>>to start rejecting commands.
>
>
> No, there is a flag set: US_FLIDX_DISCONNECTING. queuecommand could use
> that to reject commands.
>
> On the other hand, considering the race between queuecommand and
> scsi_remove_host, it shouldn't matter if queuecommand accepts the
> commands. After all, it's not really possible for the SCSI core to tell
> whether queuecommand was called before or after scsi_remove_host.
>
Ok, since this is essential the same thread then on linux-scsi
(cf. "Bug: CD driver sends commands during host removal") the same
solution applies here also.
Alan, I did a different patch which omit the special case handling in
scsiglue.c and rather reorders usb_stor_control_thread. I'd prefer mine
(naturally), but any of those will do.
And yes, both patches fix the problem.
Back to the maintainer for patch inclusion.
Cheers,
Hannes
--
Dr. Hannes Reinecke hare@suse.de
SuSE Linux AG S390 & zSeries
Maxfeldstraße 5 +49 911 74053 688
90409 Nürnberg http://www.suse.de
[-- Attachment #2: usb-drop-requests-for-disconnect --]
[-- Type: text/plain, Size: 1427 bytes --]
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
# 2004/09/30 09:59:42+02:00 hare@lammermuir.suse.de
# When a device is disconnected, we should mark the state accordingly
# and drop the packet as there is no device we could talk to.
# Fix suggested by James Bottomley.
#
# Signed-off-by: Hannes Reinecke <hare@suse.de>
#
# drivers/usb/storage/usb.c
# 2004/09/30 09:59:34+02:00 hare@lammermuir.suse.de +3 -2
# Fix SCSI command handling on disconnect.
#
diff -Nru a/drivers/usb/storage/usb.c b/drivers/usb/storage/usb.c
--- a/drivers/usb/storage/usb.c 2004-09-30 10:09:01 +02:00
+++ b/drivers/usb/storage/usb.c 2004-09-30 10:09:01 +02:00
@@ -320,7 +320,8 @@
/* don't do anything if we are disconnecting */
if (test_bit(US_FLIDX_DISCONNECTING, &us->flags)) {
- US_DEBUGP("No command during disconnect\n");
+ US_DEBUGP("Ignoring command during disconnect\n");
+ us->srb->result = DID_NO_CONNECT << 16;
goto SkipForDisconnect;
}
@@ -374,6 +375,7 @@
/* lock access to the state */
scsi_lock(host);
+SkipForDisconnect:
/* indicate that the command is done */
if (us->srb->result != DID_ABORT << 16) {
US_DEBUGP("scsi cmd done, result=0x%x\n",
@@ -393,7 +395,6 @@
complete(&(us->notify));
/* empty the queue, reset the state, and release the lock */
-SkipForDisconnect:
us->srb = NULL;
us->sm_state = US_STATE_IDLE;
scsi_unlock(host);
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-30 8:09 ` Hannes Reinecke
@ 2004-09-30 18:14 ` Alan Stern
2004-10-01 7:11 ` Hannes Reinecke
0 siblings, 1 reply; 22+ messages in thread
From: Alan Stern @ 2004-09-30 18:14 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Mike Anderson, Christoph Hellwig, James Bottomley,
SCSI Mailing List, Andrew Morton, Matthew Dharm
On Thu, 30 Sep 2004, Hannes Reinecke wrote:
> Ok, since this is essential the same thread then on linux-scsi
> (cf. "Bug: CD driver sends commands during host removal") the same
> solution applies here also.
>
> Alan, I did a different patch which omit the special case handling in
> scsiglue.c and rather reorders usb_stor_control_thread. I'd prefer mine
> (naturally), but any of those will do.
> And yes, both patches fix the problem.
>
> Back to the maintainer for patch inclusion.
>
> Cheers,
>
> Hannes
No, your patch is incorrect. In fact, it resembles the way the driver
used to work -- which caused an occasional oops. The problem is that by
the time the usb_stor_control_thread() wakes up and sees that it has a new
command to process, the SCSI core may already have deallocated the
scsi_cmnd structure. So the line where you assign to us->srb->result is
liable to raise an addressing exception.
Alan Stern
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-09-30 18:14 ` Alan Stern
@ 2004-10-01 7:11 ` Hannes Reinecke
2004-10-01 16:07 ` Alan Stern
0 siblings, 1 reply; 22+ messages in thread
From: Hannes Reinecke @ 2004-10-01 7:11 UTC (permalink / raw)
To: Alan Stern
Cc: Mike Anderson, Christoph Hellwig, James Bottomley,
SCSI Mailing List, Andrew Morton, Matthew Dharm
Alan Stern wrote:
> On Thu, 30 Sep 2004, Hannes Reinecke wrote:
>
>
>>Ok, since this is essential the same thread then on linux-scsi
>>(cf. "Bug: CD driver sends commands during host removal") the same
>>solution applies here also.
>>
>>Alan, I did a different patch which omit the special case handling in
>>scsiglue.c and rather reorders usb_stor_control_thread. I'd prefer mine
>>(naturally), but any of those will do.
>>And yes, both patches fix the problem.
>>
>>Back to the maintainer for patch inclusion.
>>
>>Cheers,
>>
>>Hannes
>
>
> No, your patch is incorrect. In fact, it resembles the way the driver
> used to work -- which caused an occasional oops. The problem is that by
> the time the usb_stor_control_thread() wakes up and sees that it has a new
> command to process, the SCSI core may already have deallocated the
> scsi_cmnd structure. So the line where you assign to us->srb->result is
> liable to raise an addressing exception.
>
> Alan Stern
>
Ok. (I knew there was something subtle going on).
What about the check for disconnecting later in
usb.c:usb_stor_control_thread?
It will only trigger if usb_stor_control_thread has been awakened by
some other command and the device has been disconnected between
awakening and executing this check.
(This may well be impossible to trigger, but the set_bit() in
storage_disconnect() appears not to be protected by any lock).
And I doubt it will work properly in that case as no scsi_done is not
called nor a proper status is set.
Is this check still needed or can it be removed?
Cheers,
Hannes
--
Dr. Hannes Reinecke hare@suse.de
SuSE Linux AG S390 & zSeries
Maxfeldstraße 5 +49 911 74053 688
90409 Nürnberg http://www.suse.de
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Patch] Fix oops on rmmod usb-storage
2004-10-01 7:11 ` Hannes Reinecke
@ 2004-10-01 16:07 ` Alan Stern
0 siblings, 0 replies; 22+ messages in thread
From: Alan Stern @ 2004-10-01 16:07 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Mike Anderson, Christoph Hellwig, James Bottomley,
SCSI Mailing List, Andrew Morton, Matthew Dharm
On Fri, 1 Oct 2004, Hannes Reinecke wrote:
> Ok. (I knew there was something subtle going on).
> What about the check for disconnecting later in
> usb.c:usb_stor_control_thread?
> It will only trigger if usb_stor_control_thread has been awakened by
> some other command and the device has been disconnected between
> awakening and executing this check.
If by "awakened" you mean the call to up(&us->sema), then that's right.
Remember that there will be some time between the call to up() and when
the thread starts running.
> (This may well be impossible to trigger,
No, it isn't. Uncommon yes, but not impossible.
> but the set_bit() in
> storage_disconnect() appears not to be protected by any lock).
set_bit, test_bit, test_and_clear_bit, etc. don't need to be protected by
locks because they execute atomically with appropriate memory barriers.
That's one of the reasons they are useful.
> And I doubt it will work properly in that case as no scsi_done is not
> called nor a proper status is set.
It's impossible for the control thread to set a status or call scsi_done
because us->srb will point to storage that probably has been deallocated.
It works correctly nevertheless, because the SCSI core terminates the
command automatically within scsi_remove_host.
> Is this check still needed or can it be removed?
The check is definitely needed.
Here's a diagram to help make things clearer. Things don't have to
execute in this order, but it is a possible scenario:
User unplugs the USB cable and USB core calls storage_disconnect
SCSI core calls queuecommand
queuecommand does up(&us->sema);
storage_disconnect does set_bit(US_FLIDX_DISCONNECTING...)
storage_disconnect does down(&us->dev_semaphore),
up(&us->dev_semaphore)
storage_disconnect calls scsi_remove_host
The SCSI core terminates the command and deallocates the srb
usb_stor_control_thread wakes up, sees the DISCONNECTING bit is
set, does nothing, and goes back to sleep
Now you see, if that test weren't present then the control thread would
try to dereference the srb pointer, causing an oops.
Alan Stern
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2004-10-01 16:07 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <415A67B8.2080003@suse.de>
2004-09-29 12:03 ` [Patch] Fix oops on rmmod usb-storage Christoph Hellwig
2004-09-29 12:31 ` Hannes Reinecke
2004-09-29 17:12 ` Mike Anderson
2004-09-29 17:19 ` James Bottomley
2004-09-29 17:22 ` Christoph Hellwig
2004-09-29 17:36 ` Mike Anderson
2004-09-29 17:38 ` Christoph Hellwig
2004-09-29 17:50 ` Alan Stern
2004-09-29 18:32 ` Mike Anderson
2004-09-29 18:58 ` Alan Stern
2004-09-30 8:09 ` Hannes Reinecke
2004-09-30 18:14 ` Alan Stern
2004-10-01 7:11 ` Hannes Reinecke
2004-10-01 16:07 ` Alan Stern
2004-09-29 17:52 ` Mike Anderson
2004-09-29 13:56 ` James Bottomley
2004-09-29 13:17 ` Alan Cox
2004-09-29 14:24 ` James Bottomley
2004-09-29 14:44 ` Hannes Reinecke
2004-09-29 15:15 ` James Bottomley
2004-09-29 15:28 ` Matthew Wilcox
2004-09-29 15:35 ` James Bottomley
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).