* BUG: CD driver sends command during host removal [not found] <20040926082926.GA1944@uniball> @ 2004-09-27 18:18 ` Alan Stern 2004-09-27 18:51 ` Mohammed Sameer 2004-09-29 16:06 ` Luben Tuikov 0 siblings, 2 replies; 28+ messages in thread From: Alan Stern @ 2004-09-27 18:18 UTC (permalink / raw) To: SCSI development list; +Cc: Mohammed Sameer, USB users list I received the following error report, showing that something in the CD driver attempts to send a command to a USB CD-RW drive when the host is removed, leading to an oops. This was on a 2.6.9-rc2 system (Mohammed please correct me if that's wrong). The initial part of the log shows a perfectly normal probe and scan of the drive. Here are the important bits: On Sun, 26 Sep 2004, Mohammed Sameer wrote: > Sep 26 11:20:48 localhost kernel: usb-storage: Command INQUIRY (6 bytes) > Sep 26 11:20:48 localhost kernel: usb-storage: 12 00 00 00 24 00 > Sep 26 11:20:48 localhost kernel: usb-storage: scsi cmd done, result=0x0 > Sep 26 11:20:48 localhost kernel: Vendor: MSI Model: CD-RW CR52 Rev: 3.70 > Sep 26 11:20:48 localhost kernel: Type: CD-ROM ANSI SCSI revision: 02 > Sep 26 11:20:48 localhost kernel: usb-storage: Command TEST_UNIT_READY (6 bytes) > Sep 26 11:20:48 localhost kernel: usb-storage: 00 00 00 00 00 00 > Sep 26 11:20:48 localhost kernel: usb-storage: -- transport indicates command failure > Sep 26 11:20:48 localhost kernel: usb-storage: Issuing auto-REQUEST_SENSE > Sep 26 11:20:48 localhost kernel: usb-storage: -- code: 0x70, key: 0x6, ASC: 0x29, ASCQ: 0x0 > Sep 26 11:20:48 localhost kernel: usb-storage: Unit Attention: Power on, reset, or bus device reset occurred > Sep 26 11:20:48 localhost kernel: usb-storage: scsi cmd done, result=0x2 > Sep 26 11:20:48 localhost kernel: usb-storage: Command TEST_UNIT_READY (6 bytes) > Sep 26 11:20:48 localhost kernel: usb-storage: 00 00 00 00 00 00 > Sep 26 11:20:48 localhost kernel: usb-storage: scsi cmd done, result=0x0 > Sep 26 11:20:48 localhost kernel: 
usb-storage: Command MODE_SENSE_10 (10 bytes) > Sep 26 11:20:48 localhost kernel: usb-storage: 5a 00 2a 00 00 00 00 00 80 00 > Sep 26 11:20:48 localhost kernel: usb-storage: scsi cmd done, result=0x0 > Sep 26 11:20:48 localhost kernel: sr0: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray > Sep 26 11:20:48 localhost kernel: Attached scsi CD-ROM sr0 at scsi0, channel 0, id 0, lun 0Sep 26 11:20:54 localhost kernel: usb 1-2: USB disconnect, address 2 Up to here everything looks okay. Then the user unplugged the USB cable: > Sep 26 11:20:54 localhost kernel: usb-storage: storage_disconnect() called > Sep 26 11:20:54 localhost kernel: usb-storage: usb_stor_stop_transport called Next usb-storage called scsi_remove_host(). Apparently this caused some component of the CD driver to queue a command: > Sep 26 11:20:54 localhost kernel: usb-storage: queuecommand called > Sep 26 11:20:54 localhost kernel: usb-storage: *** thread awakened. > Sep 26 11:20:54 localhost kernel: usb-storage: No command during disconnect > Sep 26 11:20:54 localhost kernel: usb-storage: *** thread sleeping. usb-storage accepted the command but then ignored it because the host was in process of removal. Should the queuecommand routine have rejected the command? This would involve a race, because it's possible for queuecommand to accept a command and then scsi_remove_host() to be called before the command is carried out. After five seconds the command timed out: > Sep 26 11:20:59 localhost kernel: usb-storage: command_abort called > Sep 26 11:20:59 localhost kernel: usb-storage: -- nothing to abort usb-storage ignored the request to abort the command (the command_abort routine returned FAILED because no command was running). So error recovery proceeded to try a device reset and then a bus reset. 
Neither one was allowed: > Sep 26 11:20:59 localhost kernel: usb-storage: device_reset called > Sep 26 11:20:59 localhost kernel: usb-storage: No reset during disconnect > Sep 26 11:20:59 localhost kernel: usb-storage: bus_reset called > Sep 26 11:20:59 localhost kernel: usb-storage: No reset during disconnect usb-storage doesn't define a host reset, so error recovery gave up: > Sep 26 11:20:59 localhost kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0 > Sep 26 11:20:59 localhost kernel: Badness in scsi_device_set_state at drivers/scsi/scsi_lib.c:1688 > Sep 26 11:20:59 localhost kernel: [<cf92d24f>] scsi_device_set_state+0xc4/0x112 [scsi_mod]Sep 26 11:20:59 localhost kernel: [<cf92afa0>] scsi_eh_offline_sdevs+0x64/0x80 [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<cf92b4b0>] scsi_unjam_host+0xb6/0x1eb [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<c0115777>] default_wake_function+0x0/0x12 > Sep 26 11:20:59 localhost kernel: [<cf92b6b4>] scsi_error_handler+0xcf/0x16b [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<cf92b5e5>] scsi_error_handler+0x0/0x16b [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<c010425d>] kernel_thread_helper+0x5/0xb > Sep 26 11:20:59 localhost kernel: Badness in kref_get at lib/kref.c:32 > Sep 26 11:20:59 localhost kernel: [<c01aa017>] kref_get+0x44/0x46 > Sep 26 11:20:59 localhost kernel: [<c01a9c65>] kobject_get+0x1a/0x24 > Sep 26 11:20:59 localhost kernel: [<c0218ec6>] get_device+0x18/0x21 > Sep 26 11:20:59 localhost kernel: [<cf92c9c4>] scsi_request_fn+0x25/0x367 [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<c021f1e2>] blk_insert_request+0xae/0xcc > Sep 26 11:20:59 localhost kernel: [<c0106428>] dump_stack+0x1c/0x20 > Sep 26 11:20:59 localhost kernel: [<cf92ba11>] scsi_queue_insert+0x89/0xd0 [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<cf92b381>] scsi_eh_flush_done_q+0x6f/0xe8 [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<cf92b47c>] scsi_unjam_host+0x82/0x1eb [scsi_mod] > Sep 26 
11:20:59 localhost kernel: [<c0115777>] default_wake_function+0x0/0x12 > Sep 26 11:20:59 localhost kernel: [<cf92b6b4>] scsi_error_handler+0xcf/0x16b [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<cf92b5e5>] scsi_error_handler+0x0/0x16b [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<c010425d>] kernel_thread_helper+0x5/0xb > Sep 26 11:20:59 localhost kernel: cf92ee79 > Sep 26 11:20:59 localhost kernel: Modules linked in: sr_mod usb_storage ipv6 thermal fan button processor ac battery microcode e100 yenta_socket pcmcia_core ehci_hcd usbcore ext2 dm_mod eepro100 mii toshiba_acpi psmouse pcspkr msr snd_seq_midi snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_oss snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore ide_cd cdrom sd_mod scsi_mod rtc unix > Sep 26 11:20:59 localhost kernel: CPU: 0 > Sep 26 11:20:59 localhost kernel: EIP: 0060:[<cf92ee79>] Not tainted VLI > Sep 26 11:20:59 localhost kernel: EFLAGS: 00010082 (2.6.9-rc2-Uniball-1) > Sep 26 11:20:59 localhost kernel: EIP is at scsi_device_dev_release+0x26/0xeb [scsi_mod] > Sep 26 11:20:59 localhost kernel: eax: c974dd84 ebx: c974dc08 ecx: 00200200 edx: 00100100 > Sep 26 11:20:59 localhost kernel: esi: c974dc00 edi: 00000282 ebp: cef084b4 esp: c9091e80 > Sep 26 11:20:59 localhost kernel: ds: 007b es: 007b ss: 0068 > Sep 26 11:20:59 localhost kernel: Process scsi_eh_0 (pid: 2103, threadinfo=c9090000 task=cd2b8aa0) > Sep 26 11:20:59 localhost kernel: Stack: 00000046 c974dda8 c0334488 c03344a0 cef084d8 c0218bf8 c974dd84 c974dda8 > Sep 26 11:20:59 localhost kernel: c0334488 c03344a0 c01a9d07 c974dda8 c974ddc0 c01a9d09 cef08400 cec76128 > Sep 26 11:20:59 localhost kernel: c01aa052 c974dda8 cef08400 cec76128 c974dd84 cef302b0 c974dc00 c01a9d31 > Sep 26 11:20:59 localhost kernel: Call Trace: > Sep 26 11:20:59 localhost kernel: [<c0218bf8>] device_release+0x58/0x5c > Sep 26 11:20:59 localhost kernel: [<c01a9d07>] 
kobject_cleanup+0x98/0x9a > Sep 26 11:20:59 localhost kernel: [<c01a9d09>] kobject_release+0x0/0xa > Sep 26 11:20:59 localhost kernel: [<c01aa052>] kref_put+0x39/0x93 > Sep 26 11:20:59 localhost kernel: [<c01a9d31>] kobject_put+0x1e/0x22 > Sep 26 11:20:59 localhost kernel: [<c01a9d09>] kobject_release+0x0/0xa > Sep 26 11:20:59 localhost kernel: [<cf92cb82>] scsi_request_fn+0x1e3/0x367 [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<c021f1e2>] blk_insert_request+0xae/0xcc > Sep 26 11:20:59 localhost kernel: [<c0106428>] dump_stack+0x1c/0x20 > Sep 26 11:20:59 localhost kernel: [<cf92ba11>] scsi_queue_insert+0x89/0xd0 [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<cf92b381>] scsi_eh_flush_done_q+0x6f/0xe8 [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<cf92b47c>] scsi_unjam_host+0x82/0x1eb [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<c0115777>] default_wake_function+0x0/0x12 > Sep 26 11:20:59 localhost kernel: [<cf92b6b4>] scsi_error_handler+0xcf/0x16b [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<cf92b5e5>] scsi_error_handler+0x0/0x16b [scsi_mod] > Sep 26 11:20:59 localhost kernel: [<c010425d>] kernel_thread_helper+0x5/0xb > Sep 26 11:20:59 localhost kernel: Code: e9 7c a0 8e f0 55 57 56 53 83 ec 04 8b 44 24 18 8b 68 20 8d b0 7c fe ff ff 9c 5f fa 8d 98 84 fe ff ff 8b 90 84 fe ff ff 8b 4b 04 <89> 4a 04 89 11 c7 43 04 00 02 20 00 8d 98 8c fe ff ff 8b 90 8c I've omitted the remaining parts of the fault cascade. Can anyone help? Alan Stern ^ permalink raw reply [flat|nested] 28+ messages in thread
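A note on the register dump above: ecx holds 00200200 and edx holds 00100100, which match the LIST_POISON2 and LIST_POISON1 values that the kernel's list_del() writes into an entry after unlinking it. That pattern is consistent with scsi_device_dev_release running list_del() on an entry that had already been removed, although nothing in the thread confirms this. The sketch below is a minimal userspace model of that failure mode, not kernel code; list_del_checked() and list_entry_is_poisoned() are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head { struct list_head *next, *prev; };

/* Poison values the kernel's list_del() stores in an unlinked entry. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Hypothetical helper: true if the entry was already unlinked. */
static int list_entry_is_poisoned(const struct list_head *e)
{
    return e->next == LIST_POISON1 && e->prev == LIST_POISON2;
}

/*
 * Hypothetical variant of list_del(): a plain list_del() on a poisoned
 * entry would store through LIST_POISON1 (the faulting write in the
 * oops); this version reports the double delete instead.
 * Returns 0 on success, -1 if the entry was already deleted.
 */
static int list_del_checked(struct list_head *e)
{
    assert(e != NULL);
    if (list_entry_is_poisoned(e))
        return -1;
    e->next->prev = e->prev;
    e->prev->next = e->next;
    e->next = LIST_POISON1;
    e->prev = LIST_POISON2;
    return 0;
}
```

In this model the first list_del_checked() call unlinks the entry and poisons it, and a second call on the same entry returns -1 rather than dereferencing the poison pointers.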
* Re: BUG: CD driver sends command during host removal 2004-09-27 18:18 ` BUG: CD driver sends command during host removal Alan Stern @ 2004-09-27 18:51 ` Mohammed Sameer 2004-09-29 16:06 ` Luben Tuikov 1 sibling, 0 replies; 28+ messages in thread From: Mohammed Sameer @ 2004-09-27 18:51 UTC (permalink / raw) To: SCSI development list, USB users list [-- Attachment #1: Type: text/plain, Size: 1148 bytes --] On Mon, Sep 27, 2004 at 02:18:51PM -0400, Alan Stern wrote: > I received the following error report, showing that something in the CD > driver attempts to send a command to a USB CD-RW drive when the host is > removed, leading to an oops. This was on a 2.6.9-rc2 system (Mohammed > please correct me if that's wrong). Thanks, Alan, for posting this to the mailing list. Yes, I'm using a vanilla 2.6.9-rc2 on Debian testing. I'm now subscribed to this mailing list, and I can provide any required information. Best regards, -- ---------------- -- Katoob Main Developer Linux registered user #224950, ICQ #58475622 -- Don't send me any attachment in Micro$oft (.DOC, .PPT) format please Read http://www.fsf.org/philosophy/no-word-attachments.html Preferable attachments: .PDF, .HTML, .TXT Thanx for adding this text to Your signature -- -----BEGIN GEEK CODE BLOCK----- Version: 3.1 GCM/IT d-(++)@ s+(++):->+++ a-- C+++$>++++ UL+++$>++++ P+++$>+++++ L+++(++++)$>+++++ E>+++ W++?>$ N+>+++ o? K-? !w++ !O !M !V !PS@ !PE@ Y+ PGP=+++ t? 5? !X R? tv-- b+@ DI D+ G-- e++>+++ h-->++ !r y? ------END GEEK CODE BLOCK------ [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal 2004-09-27 18:18 ` BUG: CD driver sends command during host removal Alan Stern 2004-09-27 18:51 ` Mohammed Sameer @ 2004-09-29 16:06 ` Luben Tuikov 2004-09-29 16:55 ` Alan Stern 1 sibling, 1 reply; 28+ messages in thread From: Luben Tuikov @ 2004-09-29 16:06 UTC (permalink / raw) To: Alan Stern; +Cc: SCSI development list, Mohammed Sameer, USB users list Alan Stern wrote: > I received the following error report, showing that something in the CD > driver attempts to send a command to a USB CD-RW drive when the host is > removed, leading to an oops. This was on a 2.6.9-rc2 system (Mohammed > please correct me if that's wrong). > > The initial part of the log shows a perfectly normal probe and scan of the > drive. Here are the important bits: > > On Sun, 26 Sep 2004, Mohammed Sameer wrote: > > > Sep 26 11:20:48 localhost kernel: usb-storage: Command INQUIRY (6 bytes) > > Sep 26 11:20:48 localhost kernel: usb-storage: 12 00 00 00 24 00 > > Sep 26 11:20:48 localhost kernel: usb-storage: scsi cmd done, result=0x0 > > Sep 26 11:20:48 localhost kernel: Vendor: MSI Model: CD-RW > CR52 Rev: 3.70 > > Sep 26 11:20:48 localhost kernel: Type: > CD-ROM ANSI SCSI revision: 02 > > > Sep 26 11:20:48 localhost kernel: usb-storage: Command > TEST_UNIT_READY (6 bytes) > > Sep 26 11:20:48 localhost kernel: usb-storage: 00 00 00 00 00 00 > > Sep 26 11:20:48 localhost kernel: usb-storage: -- transport indicates > command failure > > Sep 26 11:20:48 localhost kernel: usb-storage: Issuing > auto-REQUEST_SENSE > > Sep 26 11:20:48 localhost kernel: usb-storage: -- code: 0x70, key: > 0x6, ASC: 0x29, ASCQ: 0x0 > > Sep 26 11:20:48 localhost kernel: usb-storage: Unit Attention: Power > on, reset, or bus device reset occurred > > Sep 26 11:20:48 localhost kernel: usb-storage: scsi cmd done, result=0x2 > > > Sep 26 11:20:48 localhost kernel: usb-storage: Command > TEST_UNIT_READY (6 bytes) > > Sep 26 11:20:48 localhost kernel: usb-storage: 00 
00 00 00 00 00 > > Sep 26 11:20:48 localhost kernel: usb-storage: scsi cmd done, result=0x0 > > > Sep 26 11:20:48 localhost kernel: usb-storage: Command MODE_SENSE_10 > (10 bytes) > > Sep 26 11:20:48 localhost kernel: usb-storage: 5a 00 2a 00 00 00 00 > 00 80 00 > > Sep 26 11:20:48 localhost kernel: usb-storage: scsi cmd done, result=0x0 > > Sep 26 11:20:48 localhost kernel: sr0: scsi3-mmc drive: 40x/40x > writer cd/rw xa/form2 cdda tray > > Sep 26 11:20:48 localhost kernel: Attached scsi CD-ROM sr0 at scsi0, > channel 0, id 0, lun 0Sep 26 11:20:54 localhost kernel: usb 1-2: USB > disconnect, address 2 > > Up to here everything looks okay. Then the user unplugged the USB cable: > > > Sep 26 11:20:54 localhost kernel: usb-storage: storage_disconnect() > called > > Sep 26 11:20:54 localhost kernel: usb-storage: > usb_stor_stop_transport called > > Next usb-storage called scsi_remove_host(). Apparently this caused some > component of the CD driver to queue a command: > > > Sep 26 11:20:54 localhost kernel: usb-storage: queuecommand called > > Sep 26 11:20:54 localhost kernel: usb-storage: *** thread awakened. > > Sep 26 11:20:54 localhost kernel: usb-storage: No command during > disconnect > > Sep 26 11:20:54 localhost kernel: usb-storage: *** thread sleeping. > > usb-storage accepted the command but then ignored it because the host was > in process of removal. Should the queuecommand routine have rejected the > command? Yes, if the service delivery subsystem (SDS) knows that the device is gone and the command wouldn't be delivered, it should *not* "ignore" the command, but return it with error. I.e. if the LLDD has active/most recent knowledge about the device whereto the command is destined, it should act on that and return an appropriate error. After all, this is what a properly implemented SDS would do. 
> This would involve a race, because it's possible for > queuecommand to accept a command and then scsi_remove_host() to be called > before the command is carried out. If the command hasn't been carried out, then delivery would fail and SDS would return the appropriate error back to SCSI Core. If the command has already been delivered, SCSI Core would preempt it without waiting for it to time out. This is part of proper error recovery, as it knows that the device disappeared -- notified by usb-storage. > > After five seconds the command timed out: > > > Sep 26 11:20:59 localhost kernel: usb-storage: command_abort called > > Sep 26 11:20:59 localhost kernel: usb-storage: -- nothing to abort > > usb-storage ignored the request to abort the command (the command_abort > routine returned FAILED because no command was running). So error Where *was* the command? From the point in time when queuecommand() is called until scsi_done() is called, the command belongs to the LLDD. It should honor any TMF, regardless of the _state_ of the task. Luben > recovery proceeded to try a device reset and then a bus reset. 
Neither > one was allowed: > > > Sep 26 11:20:59 localhost kernel: usb-storage: device_reset called > > Sep 26 11:20:59 localhost kernel: usb-storage: No reset during > disconnect > > Sep 26 11:20:59 localhost kernel: usb-storage: bus_reset called > > Sep 26 11:20:59 localhost kernel: usb-storage: No reset during > disconnect > > usb-storage doesn't define a host reset, so error recovery gave up: > > > Sep 26 11:20:59 localhost kernel: scsi: Device offlined - not ready > after error recovery: host 0 channel 0 id 0 lun 0 > > Sep 26 11:20:59 localhost kernel: Badness in scsi_device_set_state at > drivers/scsi/scsi_lib.c:1688 > > Sep 26 11:20:59 localhost kernel: [<cf92d24f>] > scsi_device_set_state+0xc4/0x112 [scsi_mod]Sep 26 11:20:59 localhost > kernel: [<cf92afa0>] scsi_eh_offline_sdevs+0x64/0x80 [scsi_mod] > > > Sep 26 11:20:59 localhost kernel: [<cf92b4b0>] > scsi_unjam_host+0xb6/0x1eb [scsi_mod] > > Sep 26 11:20:59 localhost kernel: [<c0115777>] > default_wake_function+0x0/0x12 > > Sep 26 11:20:59 localhost kernel: [<cf92b6b4>] > scsi_error_handler+0xcf/0x16b [scsi_mod] > > Sep 26 11:20:59 localhost kernel: [<cf92b5e5>] > scsi_error_handler+0x0/0x16b [scsi_mod] > > Sep 26 11:20:59 localhost kernel: [<c010425d>] > kernel_thread_helper+0x5/0xb > > Sep 26 11:20:59 localhost kernel: Badness in kref_get at lib/kref.c:32 > > Sep 26 11:20:59 localhost kernel: [<c01aa017>] kref_get+0x44/0x46 > > Sep 26 11:20:59 localhost kernel: [<c01a9c65>] kobject_get+0x1a/0x24 > > Sep 26 11:20:59 localhost kernel: [<c0218ec6>] get_device+0x18/0x21 > > Sep 26 11:20:59 localhost kernel: [<cf92c9c4>] > scsi_request_fn+0x25/0x367 [scsi_mod] > > Sep 26 11:20:59 localhost kernel: [<c021f1e2>] > blk_insert_request+0xae/0xcc > > Sep 26 11:20:59 localhost kernel: [<c0106428>] dump_stack+0x1c/0x20 > > Sep 26 11:20:59 localhost kernel: [<cf92ba11>] > scsi_queue_insert+0x89/0xd0 [scsi_mod] > > Sep 26 11:20:59 localhost kernel: [<cf92b381>] > scsi_eh_flush_done_q+0x6f/0xe8 [scsi_mod] > > Sep 26 
11:20:59 localhost kernel: [<cf92b47c>] > scsi_unjam_host+0x82/0x1eb [scsi_mod] > > Sep 26 11:20:59 localhost kernel: [<c0115777>] > default_wake_function+0x0/0x12 > > Sep 26 11:20:59 localhost kernel: [<cf92b6b4>] > scsi_error_handler+0xcf/0x16b [scsi_mod] > > Sep 26 11:20:59 localhost kernel: [<cf92b5e5>] > scsi_error_handler+0x0/0x16b [scsi_mod] > > Sep 26 11:20:59 localhost kernel: [<c010425d>] > kernel_thread_helper+0x5/0xb > > Sep 26 11:20:59 localhost kernel: cf92ee79 > > Sep 26 11:20:59 localhost kernel: Modules linked in: sr_mod > usb_storage ipv6 thermal fan button processor ac battery microcode e100 > yenta_socket pcmcia_core ehci_hcd usbcore ext2 dm_mod eepro100 mii > toshiba_acpi psmouse pcspkr msr snd_seq_midi snd_intel8x0 snd_ac97_codec > snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc gameport > snd_mpu401_uart snd_rawmidi snd_seq_oss snd_seq_midi_event snd_seq > snd_timer snd_seq_device snd soundcore ide_cd cdrom sd_mod scsi_mod rtc unix > > > Sep 26 11:20:59 localhost kernel: CPU: 0 > > Sep 26 11:20:59 localhost kernel: EIP: 0060:[<cf92ee79>] Not > tainted VLI > > Sep 26 11:20:59 localhost kernel: EFLAGS: 00010082 > (2.6.9-rc2-Uniball-1) > > Sep 26 11:20:59 localhost kernel: EIP is at > scsi_device_dev_release+0x26/0xeb [scsi_mod] > > Sep 26 11:20:59 localhost kernel: eax: c974dd84 ebx: c974dc08 > ecx: 00200200 edx: 00100100 > > Sep 26 11:20:59 localhost kernel: esi: c974dc00 edi: 00000282 > ebp: cef084b4 esp: c9091e80 > > Sep 26 11:20:59 localhost kernel: ds: 007b es: 007b ss: 0068 > > Sep 26 11:20:59 localhost kernel: Process scsi_eh_0 (pid: 2103, > threadinfo=c9090000 task=cd2b8aa0) > > Sep 26 11:20:59 localhost kernel: Stack: 00000046 c974dda8 c0334488 > c03344a0 cef084d8 c0218bf8 c974dd84 c974dda8 > > Sep 26 11:20:59 localhost kernel: c0334488 c03344a0 c01a9d07 > c974dda8 c974ddc0 c01a9d09 cef08400 cec76128 > > Sep 26 11:20:59 localhost kernel: c01aa052 c974dda8 cef08400 > cec76128 c974dd84 cef302b0 c974dc00 c01a9d31 > > Sep 26 11:20:59 
localhost kernel: Call Trace: > > Sep 26 11:20:59 localhost kernel: [<c0218bf8>] device_release+0x58/0x5c > > Sep 26 11:20:59 localhost kernel: [<c01a9d07>] > kobject_cleanup+0x98/0x9a > > Sep 26 11:20:59 localhost kernel: [<c01a9d09>] kobject_release+0x0/0xa > > Sep 26 11:20:59 localhost kernel: [<c01aa052>] kref_put+0x39/0x93 > > Sep 26 11:20:59 localhost kernel: [<c01a9d31>] kobject_put+0x1e/0x22 > > Sep 26 11:20:59 localhost kernel: [<c01a9d09>] kobject_release+0x0/0xa > > Sep 26 11:20:59 localhost kernel: [<cf92cb82>] > scsi_request_fn+0x1e3/0x367 [scsi_mod] > > Sep 26 11:20:59 localhost kernel: [<c021f1e2>] > blk_insert_request+0xae/0xcc > > Sep 26 11:20:59 localhost kernel: [<c0106428>] dump_stack+0x1c/0x20 > > Sep 26 11:20:59 localhost kernel: [<cf92ba11>] > scsi_queue_insert+0x89/0xd0 [scsi_mod] > > Sep 26 11:20:59 localhost kernel: [<cf92b381>] > scsi_eh_flush_done_q+0x6f/0xe8 [scsi_mod] > > Sep 26 11:20:59 localhost kernel: [<cf92b47c>] > scsi_unjam_host+0x82/0x1eb [scsi_mod] > > Sep 26 11:20:59 localhost kernel: [<c0115777>] > default_wake_function+0x0/0x12 > > Sep 26 11:20:59 localhost kernel: [<cf92b6b4>] > scsi_error_handler+0xcf/0x16b [scsi_mod] > > Sep 26 11:20:59 localhost kernel: [<cf92b5e5>] > scsi_error_handler+0x0/0x16b [scsi_mod] > > Sep 26 11:20:59 localhost kernel: [<c010425d>] > kernel_thread_helper+0x5/0xb > > Sep 26 11:20:59 localhost kernel: Code: e9 7c a0 8e f0 55 57 56 53 83 > ec 04 8b 44 24 18 8b 68 20 8d b0 7c fe ff ff 9c 5f fa 8d 98 84 fe ff ff > 8b 90 84 fe ff ff 8b 4b 04 <89> 4a 04 89 11 c7 43 04 00 02 20 00 8d 98 > 8c fe ff ff 8b 90 8c > > I've omitted the remaining parts of the fault cascade. > > Can anyone help? > > Alan Stern ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal 2004-09-29 16:06 ` Luben Tuikov @ 2004-09-29 16:55 ` Alan Stern 2004-09-29 17:09 ` Mike Anderson 2004-09-29 18:02 ` Luben Tuikov 0 siblings, 2 replies; 28+ messages in thread From: Alan Stern @ 2004-09-29 16:55 UTC (permalink / raw) To: Luben Tuikov; +Cc: SCSI development list, Mohammed Sameer, USB users list On Wed, 29 Sep 2004, Luben Tuikov wrote: > > Next usb-storage called scsi_remove_host(). Apparently this caused some > > component of the CD driver to queue a command: This sounds like a bug, by the way. Commands shouldn't be queued because of a call to scsi_remove_host! > > > Sep 26 11:20:54 localhost kernel: usb-storage: queuecommand called > > > Sep 26 11:20:54 localhost kernel: usb-storage: *** thread awakened. > > > Sep 26 11:20:54 localhost kernel: usb-storage: No command during > > disconnect > > > Sep 26 11:20:54 localhost kernel: usb-storage: *** thread sleeping. > > > > usb-storage accepted the command but then ignored it because the host was > > in process of removal. Should the queuecommand routine have rejected the > > command? > > Yes, if the service delivery subsystem (SDS) knows that the device is gone > and the command wouldn't be delivered, it should *not* "ignore" the > command, but return it with error. > > I.e. if the LLDD has active/most recent knowledge about the device > whereto the command is destined, it should act on that and return > an appropriate error. After all, this is what a properly implemented > SDS would do. According to Documentation/scsi/scsi_mid_low_api.txt, the only possible error returns are SCSI_MLQUEUE_DEVICE_BUSY and SCSI_MLQUEUE_HOST_BUSY. Neither is appropriate; should the second one be returned? > > This would involve a race, because it's possible for > > queuecommand to accept a command and then scsi_remove_host() to be called > > before the command is carried out. 
> > If the command hasn't been carried out, then delivery would fail and SDS > would return the appropriate error back to SCSI Core. How? The SCSI core deallocates the scsi_cmnd before the SDS has a chance to return anything. > Where *was* the command? From the point of time when queuecommand() is > called until scsi_done() is called, the command belongs to the LLDD. > It should honor any TMF, regardless of the _state_ of the task. If the command belongs to the LLDD, why does scsi_remove_host do the following: calls scsi_host_cancel, which calls scsi_device_cancel_cb for each device, which calls scsi_device_cancel, which calls scsi_finish_command for each active command, which passes the command back to the upper layer Either there's a bug in the host removal sequence, or else the LLDD doesn't own any requests once scsi_remove_host has been called. Alan Stern ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal 2004-09-29 16:55 ` Alan Stern @ 2004-09-29 17:09 ` Mike Anderson 2004-09-29 18:02 ` Luben Tuikov 1 sibling, 0 replies; 28+ messages in thread From: Mike Anderson @ 2004-09-29 17:09 UTC (permalink / raw) To: Alan Stern Cc: Luben Tuikov, SCSI development list, Mohammed Sameer, USB users list Alan Stern [stern@rowland.harvard.edu] wrote: > On Wed, 29 Sep 2004, Luben Tuikov wrote: > > > > Next usb-storage called scsi_remove_host(). Apparently this caused some > > > component of the CD driver to queue a command: > > This sounds like a bug, by the way. Commands shouldn't be queued because > of a call to scsi_remove_host! > I made a reordering change in scsi_remove_host that may be causing your problem. The change was to allow devices to receive their SYNCHRONIZE_CACHE commands on rmmod. You can see part of the thread here. http://marc.theaimsgroup.com/?t=108701426000002&r=1&w=2 -andmike -- Michael Anderson andmike@us.ibm.com ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal 2004-09-29 16:55 ` Alan Stern 2004-09-29 17:09 ` Mike Anderson @ 2004-09-29 18:02 ` Luben Tuikov 2004-09-29 18:09 ` James Bottomley 2004-09-29 19:01 ` Alan Stern 1 sibling, 2 replies; 28+ messages in thread From: Luben Tuikov @ 2004-09-29 18:02 UTC (permalink / raw) To: Alan Stern; +Cc: SCSI development list, Mohammed Sameer, USB users list Alan Stern wrote: > >>>Next usb-storage called scsi_remove_host(). Apparently this caused some >>>component of the CD driver to queue a command: > > > This sounds like a bug, by the way. Commands shouldn't be queued because > of a call to scsi_remove_host! Yes. >>>usb-storage accepted the command but then ignored it because the host was >>>in process of removal. Should the queuecommand routine have rejected the >>>command? >> >>Yes, if the service delivery subsystem (SDS) knows that the device is gone >>and the command wouldn't be delivered, it should *not* "ignore" the >>command, but return it with error. >> >>I.e. if the LLDD has active/most recent knowledge about the device >>whereto the command is destined, it should act on that and return >>an appropriate error. After all, this is what a properly implemented >>SDS would do. > > > According to Documentation/scsi/scsi_mid_low_api.txt, the only possible > error returns are SCSI_MLQUEUE_DEVICE_BUSY and SCSI_MLQUEUE_HOST_BUSY. > Neither is appropriate; should the second one be returned? I believe internally SCSI Core returns DID_ERROR. > >>> This would involve a race, because it's possible for >>>queuecommand to accept a command and then scsi_remove_host() to be called >>>before the command is carried out. >> >>If the command hasn't been carried out, then delivery would fail and SDS >>would return the appropriate error back to SCSI Core. > > > How? The SCSI core deallocates the scsi_cmnd before the SDS has a chance > to return anything. 
Hmm, once queuecommand() has been called, SCSI Core *should NOT* touch the struct command until the LLDD calls scsi_done() or it times out and ownership is given back indirectly via the appropriate return result of the times_out() function. >>Where *was* the command? From the point of time when queuecommand() is >>called until scsi_done() is called, the command belongs to the LLDD. >>It should honor any TMF, regardless of the _state_ of the task. > > > If the command belongs to the LLDD, why does scsi_remove_host do the > following: > > calls scsi_host_cancel, > which calls scsi_device_cancel_cb for each device, > which calls scsi_device_cancel, > which calls scsi_finish_command for each active command, > which passes the command back to the upper layer > > Either there's a bug in the host removal sequence, or else the LLDD > doesn't own any requests once scsi_remove_host has been called. Ah, definitely sounds like a bug -- the LLDD has not been given a chance to "return" the struct command. One thing I wanted to point out is that in scsi_remove_host() the _very_ first thing which should be done is setting the proper shost_state, SHOST_DEL, which should imply SHOST_CANCEL (by virtue of meaning), as opposed to "doubly" setting it. _Thought_ experiment: is it possible to "catch" a command between a non-canceled host but canceled device (of that host)? So, first the host state is set to "cancelled", then each device is set accordingly, then commands sent to each device are "recovered" (all this top->down); and then the resources freed in opposite order: commands, devices, hosts. This may involve waiting for the LLDD to respond in the recovery process. Luben ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal 2004-09-29 18:02 ` Luben Tuikov @ 2004-09-29 18:09 ` James Bottomley 2004-09-29 18:58 ` Luben Tuikov 2004-09-29 19:01 ` Alan Stern 1 sibling, 1 reply; 28+ messages in thread From: James Bottomley @ 2004-09-29 18:09 UTC (permalink / raw) To: Luben Tuikov Cc: Alan Stern, SCSI development list, Mohammed Sameer, USB users list On Wed, 2004-09-29 at 14:02, Luben Tuikov wrote: > > According to Documentation/scsi/scsi_mid_low_api.txt, the only possible > > error returns are SCSI_MLQUEUE_DEVICE_BUSY and SCSI_MLQUEUE_HOST_BUSY. > > Neither is appropriate; should the second one be returned? > > I believe internally SCSI Core returns DID_ERROR. For a device that no-longer exists, DID_NO_CONNECT is probably the most appropriately descriptive. > >>> This would involve a race, because it's possible for > >>>queuecommand to accept a command and then scsi_remove_host() to be called > >>>before the command is carried out. Correct, scsi_remove_host() is asynchronous ... you can get requests after calling it while the queue is halting ... you must error these. > > If the command belongs to the LLDD, why does scsi_remove_host do the > > following: > > > > calls scsi_host_cancel, > > which calls scsi_device_cancel_cb for each device, > > which calls scsi_device_cancel, > > which calls scsi_finish_command for each active command, > > which passes the command back to the upper layer > > > > Either there's a bug in the host removal sequence, or else the LLDD > > doesn't own any requests once scsi_remove_host has been called. Right. scsi_remove_host tells the mid-layer that it's OK to trash all inflight commands because you removed all their users before calling it. It also tells us that you won't accept any future commands for this host (because you'll error any attempt in queuecommand). James ^ permalink raw reply [flat|nested] 28+ messages in thread
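James's advice (error any command that arrives after scsi_remove_host() has been called) can be sketched as a queuecommand that completes a late command at once instead of parking it until it times out. This is a userspace model under stated assumptions: mock_host, mock_cmnd, the disconnecting flag, and mock_queuecommand() are all hypothetical names, not the usb-storage code; only the DID_NO_CONNECT host-byte value is taken from the kernel's scsi.h.

```c
#include <assert.h>
#include <stddef.h>

#define DID_NO_CONNECT 0x01  /* host byte: could not reach the device */

struct mock_cmnd {
    int result;                             /* host byte lives in bits 16..23 */
    void (*scsi_done)(struct mock_cmnd *);  /* completion callback */
    int completed;
};

struct mock_host {
    int disconnecting;          /* assumed to be set before host removal */
    struct mock_cmnd *pending;  /* command handed to the worker thread */
};

static void mock_done(struct mock_cmnd *c)
{
    c->completed = 1;
}

/* Hypothetical queuecommand: returns 0 in both paths, because the error
 * is reported through c->result, not through the return value. */
static int mock_queuecommand(struct mock_host *h, struct mock_cmnd *c)
{
    if (h->disconnecting) {
        /* Device is gone: fail the command now, do not queue it. */
        c->result = DID_NO_CONNECT << 16;
        c->scsi_done(c);
        return 0;
    }
    assert(h->pending == NULL);  /* this model holds one command at a time */
    h->pending = c;
    return 0;
}
```

With this shape, the sequence in the log (queuecommand called, "No command during disconnect", then a five-second wait and error recovery) would collapse into an immediate completion with a no-connect status.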
* Re: BUG: CD driver sends command during host removal
2004-09-29 18:09 ` James Bottomley
@ 2004-09-29 18:58 ` Luben Tuikov
2004-09-29 19:39 ` James Bottomley
0 siblings, 1 reply; 28+ messages in thread
From: Luben Tuikov @ 2004-09-29 18:58 UTC (permalink / raw)
To: James Bottomley
Cc: Alan Stern, SCSI development list, Mohammed Sameer, USB users list

James Bottomley wrote:
> On Wed, 2004-09-29 at 14:02, Luben Tuikov wrote:
>
>>>According to Documentation/scsi/scsi_mid_low_api.txt, the only possible
>>>error returns are SCSI_MLQUEUE_DEVICE_BUSY and SCSI_MLQUEUE_HOST_BUSY.
>>>Neither is appropriate; should the second one be returned?
>>
>>I believe internally SCSI Core returns DID_ERROR.
>
>
> For a device that no longer exists, DID_NO_CONNECT is probably the most
> appropriately descriptive.

Does this mean that scsi_device_cancel() should set the result
code to DID_NO_CONNECT as well?

>>>>>This would involve a race, because it's possible for
>>>>>queuecommand to accept a command and then scsi_remove_host() to be called
>>>>>before the command is carried out.
>
>
> Correct, scsi_remove_host() is asynchronous ... you can get requests
> after calling it while the queue is halting ... you must error these.
>
>
>>>If the command belongs to the LLDD, why does scsi_remove_host do the
>>>following:
>>>
>>> calls scsi_host_cancel,
>>> which calls scsi_device_cancel_cb for each device,
>>> which calls scsi_device_cancel,
>>> which calls scsi_finish_command for each active command,
>>> which passes the command back to the upper layer
>>>
>>>Either there's a bug in the host removal sequence, or else the LLDD
>>>doesn't own any requests once scsi_remove_host has been called.
>
>
> Right. scsi_remove_host tells the mid-layer that it's OK to trash all
> inflight commands because you removed all their users before calling
> it. It also tells us that you won't accept any future commands for this
> host (because you'll error any attempt in queuecommand).

Do you mean to say that when scsi_remove_host() is called,
the LLDD must not own any commands? This is good, it means
that the LLDD plugged queuecommand(), did "recovery" of
pending commands and is ready for slave_destroy().

But does this also mean that scsi_device_cancel() (cancel_pending_io)
is unnecessary in the scsi_remove_host() path as there's no
outstanding IO to the host?

Luben

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-09-29 18:58 ` Luben Tuikov
@ 2004-09-29 19:39 ` James Bottomley
0 siblings, 0 replies; 28+ messages in thread
From: James Bottomley @ 2004-09-29 19:39 UTC (permalink / raw)
To: Luben Tuikov
Cc: Alan Stern, SCSI development list, Mohammed Sameer, USB users list

On Wed, 2004-09-29 at 14:58, Luben Tuikov wrote:
> Does this mean that scsi_device_cancel() should set the result
> code to DID_NO_CONNECT as well?

No, I think for in-flight commands, DID_ERROR is the safest. They
could, after all, have partially deposited their data load before being
cancelled. DID_NO_CONNECT means that no processing was ever done.

> > Right. scsi_remove_host tells the mid-layer that it's OK to trash all
> > inflight commands because you removed all their users before calling
> > it. It also tells us that you won't accept any future commands for this
> > host (because you'll error any attempt in queuecommand).
>
> Do you mean to say that when scsi_remove_host() is called,
> the LLDD must not own any commands? This is good, it means
> that the LLDD plugged queuecommand(), did "recovery" of
> pending commands and is ready for slave_destroy().

No. It's up to the LLD. If it wants to complete all its commands and
then call scsi_remove_host(), that's fine. However it's also fine for it
to call scsi_remove_host() with pending in-flight commands; if it does,
ownership of these is transferred back to the mid-layer and we cancel
them all.

James

^ permalink raw reply [flat|nested] 28+ messages in thread
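[Editor's sketch, not part of the original thread.] The distinction James
draws in the message above (DID_ERROR for commands that may already have
moved data, DID_NO_CONNECT for commands that were never processed) can be
written down as a small user-space C model. The DID_* values and the
16-bit shift are the kernel's host-byte encoding; removal_result() and
its was_started argument are hypothetical names invented here for
illustration and do not exist in the SCSI core.

```c
/* Host-byte codes, with the values used by the kernel's SCSI headers. */
#define DID_OK         0x00
#define DID_NO_CONNECT 0x01
#define DID_ERROR      0x07

/* The host byte occupies bits 16..23 of the 32-bit result word. */
static int host_byte(int result)
{
	return (result >> 16) & 0xff;
}

/*
 * Hypothetical helper: choose the result for a command that is being
 * failed because its host is going away.
 *   was_started != 0: the command may have transferred some data,
 *                     so report a generic error.
 *   was_started == 0: no processing was ever done on the command.
 */
static int removal_result(int was_started)
{
	return (was_started ? DID_ERROR : DID_NO_CONNECT) << 16;
}
```

A caller would store the returned value in the command's result field
before completing it; host_byte() recovers the code on the receiving
side.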
* Re: BUG: CD driver sends command during host removal
2004-09-29 18:02 ` Luben Tuikov
2004-09-29 18:09 ` James Bottomley
@ 2004-09-29 19:01 ` Alan Stern
2004-09-29 19:27 ` Mike Anderson
` (2 more replies)
1 sibling, 3 replies; 28+ messages in thread
From: Alan Stern @ 2004-09-29 19:01 UTC (permalink / raw)
To: Luben Tuikov; +Cc: SCSI development list, Mohammed Sameer, USB users list

On 29 Sep 2004, James Bottomley wrote:

> On Wed, 2004-09-29 at 14:02, Luben Tuikov wrote:
> > > According to Documentation/scsi/scsi_mid_low_api.txt, the only possible
> > > error returns are SCSI_MLQUEUE_DEVICE_BUSY and SCSI_MLQUEUE_HOST_BUSY.
> > > Neither is appropriate; should the second one be returned?
> >
> > I believe internally SCSI Core returns DID_ERROR.
>
> For a device that no longer exists, DID_NO_CONNECT is probably the most
> appropriately descriptive.

Regardless of how descriptive the value is, the code in
scsi_dispatch_cmd treats anything other than SCSI_MLQUEUE_DEVICE_BUSY
as SCSI_MLQUEUE_HOST_BUSY. Will this matter?

On Wed, 29 Sep 2004, Luben Tuikov wrote:

> Hmm, once queuecommand() has been called, SCSI Core *should NOT* touch
> the struct command until the LLDD calls scsi_done() or it times
> out and ownership is given back indirectly via the appropriate
> return result of the times_out() function.

On 29 Sep 2004, James Bottomley wrote:

> Right. scsi_remove_host tells the mid-layer that it's OK to trash all
> inflight commands because you removed all their users before calling
> it. It also tells us that you won't accept any future commands for this
> host (because you'll error any attempt in queuecommand).

It sounds like the two of you are in contradiction. Should the SCSI core
deallocate in-flight commands without consulting the LLDD or shouldn't it?

Alan Stern

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-09-29 19:01 ` Alan Stern
@ 2004-09-29 19:27 ` Mike Anderson
2004-09-29 19:33 ` Luben Tuikov
2004-09-29 19:50 ` James Bottomley
2 siblings, 0 replies; 28+ messages in thread
From: Mike Anderson @ 2004-09-29 19:27 UTC (permalink / raw)
To: Alan Stern
Cc: Luben Tuikov, SCSI development list, Mohammed Sameer, USB users list

Alan Stern [stern@rowland.harvard.edu] wrote:
> On 29 Sep 2004, James Bottomley wrote:
>
> > On Wed, 2004-09-29 at 14:02, Luben Tuikov wrote:
> > > > According to Documentation/scsi/scsi_mid_low_api.txt, the only possible
> > > > error returns are SCSI_MLQUEUE_DEVICE_BUSY and SCSI_MLQUEUE_HOST_BUSY.
> > > > Neither is appropriate; should the second one be returned?
> > >
> > > I believe internally SCSI Core returns DID_ERROR.
> >
> > For a device that no longer exists, DID_NO_CONNECT is probably the most
> > appropriately descriptive.
>
> Regardless of how descriptive the value is, the code in
> scsi_dispatch_cmd treats anything other than SCSI_MLQUEUE_DEVICE_BUSY
> as SCSI_MLQUEUE_HOST_BUSY. Will this matter?
>

You would not want to return a nonzero status from your queuecommand;
instead, set the result of the command to DID_NO_CONNECT and call done.

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-09-29 19:01 ` Alan Stern
2004-09-29 19:27 ` Mike Anderson
@ 2004-09-29 19:33 ` Luben Tuikov
2004-09-29 19:50 ` James Bottomley
2 siblings, 0 replies; 28+ messages in thread
From: Luben Tuikov @ 2004-09-29 19:33 UTC (permalink / raw)
To: Alan Stern; +Cc: SCSI development list, Mohammed Sameer, USB users list

Alan Stern wrote:
> On Wed, 29 Sep 2004, Luben Tuikov wrote:
>
> > Hmm, once queuecommand() has been called, SCSI Core *should NOT* touch
> > the struct command until the LLDD calls scsi_done() or it times
> > out and ownership is given back indirectly via the appropriate
> > return result of the times_out() function.
>
> On 29 Sep 2004, James Bottomley wrote:
>
> > Right. scsi_remove_host tells the mid-layer that it's OK to trash all
> > inflight commands because you removed all their users before calling
> > it. It also tells us that you won't accept any future commands for this
> > host (because you'll error any attempt in queuecommand).
>
> It sounds like the two of you are in contradiction. Should the SCSI core
> deallocate in-flight commands without consulting the LLDD or shouldn't it?

If the host is really gone, then no pending commands would ever "return".
This means that the LLDD can call scsi_done() immediately, without any
waiting or delay of any sort.

Basically, when your host is gone, plug queuecommand(), call scsi_done()
on all your pending commands, and call scsi_remove_host().

The only problem would be if you have to sleep while "recovering" the
pending commands, but I cannot imagine why you'd do that as your host
is gone.

Now since LLDD shouldn't keep internal queues of pending commands, ideally,
this function is left to SCSI Core.

Luben

^ permalink raw reply [flat|nested] 28+ messages in thread
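[Editor's sketch, not part of the original thread.] Luben's three steps
(plug queuecommand(), complete every pending command, then call
scsi_remove_host()) can be modelled in user-space C as below. Every type
and function here (mock_cmd, mock_host, mock_done, mock_remove_host,
mock_disconnect) is a hypothetical stand-in, not a kernel API. The use of
DID_ERROR for the pending commands is an assumption taken from James's
earlier remark that DID_ERROR is the safest code for in-flight commands.

```c
#include <stdbool.h>
#include <stddef.h>

#define DID_ERROR   0x07	/* kernel host-byte value */
#define MAX_PENDING 4

/* Hypothetical stand-in for struct scsi_cmnd. */
struct mock_cmd {
	int result;
	bool completed;
};

/* Hypothetical stand-in for the driver's per-host private data. */
struct mock_host {
	bool plugged;			/* queuecommand refuses new work */
	bool removed;			/* scsi_remove_host() was called */
	size_t pending_at_removal;	/* commands left over at removal */
	struct mock_cmd *pending[MAX_PENDING];
	size_t npending;
};

/* Models the completion callback handed to queuecommand. */
static void mock_done(struct mock_cmd *cmd)
{
	cmd->completed = true;
}

/* Models scsi_remove_host(); records what was still outstanding. */
static void mock_remove_host(struct mock_host *h)
{
	h->pending_at_removal = h->npending;
	h->removed = true;
}

/* The plug, complete, remove sequence described in the message above. */
static void mock_disconnect(struct mock_host *h)
{
	size_t i;

	h->plugged = true;			/* step 1: plug */

	for (i = 0; i < h->npending; i++) {	/* step 2: complete */
		h->pending[i]->result = DID_ERROR << 16;
		mock_done(h->pending[i]);
		h->pending[i] = NULL;
	}
	h->npending = 0;

	mock_remove_host(h);			/* step 3: remove */
}
```

Because every pending command is completed before the removal call, the
model never leaves anything for the mid-layer to cancel, which is the
property the ordering is meant to give.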
* Re: BUG: CD driver sends command during host removal
2004-09-29 19:01 ` Alan Stern
2004-09-29 19:27 ` Mike Anderson
2004-09-29 19:33 ` Luben Tuikov
@ 2004-09-29 19:50 ` James Bottomley
2004-09-29 20:31 ` Mike Anderson
2 siblings, 1 reply; 28+ messages in thread
From: James Bottomley @ 2004-09-29 19:50 UTC (permalink / raw)
To: Alan Stern
Cc: Luben Tuikov, SCSI development list, Mohammed Sameer, USB users list

On Wed, 2004-09-29 at 15:01, Alan Stern wrote:
> Regardless of how descriptive the value is, the code in
> scsi_dispatch_cmd treats anything other than SCSI_MLQUEUE_DEVICE_BUSY
> as SCSI_MLQUEUE_HOST_BUSY. Will this matter?

Yes, you have to call cmd->done() having set the result field to
DID_NO_CONNECT; then return zero from queuecommand() indicating that
you're dealing with the command.

> It sounds like the two of you are in contradiction. Should the SCSI core
> deallocate in-flight commands without consulting the LLDD or shouldn't it?

Once you've called scsi_remove_host() the mid-layer will take control of
your in-flight commands (if there are any) and error them back to the
user. You need to take any actions to clean up internal structures
belonging to the commands *before* you do a scsi_remove_host.

James

^ permalink raw reply [flat|nested] 28+ messages in thread
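[Editor's sketch, not part of the original thread.] The rule James states
above (set the result to DID_NO_CONNECT, call done, return zero) looks
roughly like this in a user-space C model. The names mock_cmd, mock_host,
mock_done and mock_queuecommand are hypothetical and exist only for this
illustration; a real driver would work with struct scsi_cmnd and its own
private data.

```c
#include <stdbool.h>
#include <stddef.h>

#define DID_NO_CONNECT 0x01	/* kernel host-byte value */

/* Hypothetical stand-in for struct scsi_cmnd. */
struct mock_cmd {
	int result;
	bool completed;
};

/* Hypothetical stand-in for the driver's per-host private data. */
struct mock_host {
	bool disconnecting;		/* set when the device goes away */
	struct mock_cmd *active;	/* command the driver now owns */
};

typedef void (*done_fn)(struct mock_cmd *);

/* Models the completion callback supplied by the mid-layer. */
static void mock_done(struct mock_cmd *cmd)
{
	cmd->completed = true;
}

/*
 * Returns 0 in both branches: zero means "the driver is dealing with
 * this command", whether that is a real submission or an immediate
 * failure. A nonzero return would be read as a busy condition.
 */
static int mock_queuecommand(struct mock_host *h, struct mock_cmd *cmd,
			     done_fn done)
{
	if (h->disconnecting) {
		cmd->result = DID_NO_CONNECT << 16;
		done(cmd);
		return 0;
	}

	h->active = cmd;	/* normal path: driver takes ownership */
	return 0;
}
```

The point of returning zero even on failure is that the mid-layer then
treats the command as finished, instead of requeueing it as it would for
a busy return.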
* Re: BUG: CD driver sends command during host removal
2004-09-29 19:50 ` James Bottomley
@ 2004-09-29 20:31 ` Mike Anderson
2004-09-29 20:41 ` James Bottomley
0 siblings, 1 reply; 28+ messages in thread
From: Mike Anderson @ 2004-09-29 20:31 UTC (permalink / raw)
To: James Bottomley
Cc: Alan Stern, Luben Tuikov, SCSI development list, Mohammed Sameer, USB users list

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> Once you've called scsi_remove_host() the mid-layer will take control of
> your in-flight commands (if there are any) and error them back to the
> user. You need to take any actions to clean up internal structures
> belonging to the commands *before* you do a scsi_remove_host.
>

How do we address the problem we get into with the reordering in
scsi_remove_host (i.e. the call to scsi_forget_host prior to
scsi_host_cancel) if we possibly generate new io from sd_sync_cache
that will cause the error handler to fire up if the LLDD just lets the
command fall on the floor with no response.

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-09-29 20:31 ` Mike Anderson
@ 2004-09-29 20:41 ` James Bottomley
2004-09-29 21:07 ` Mike Anderson
0 siblings, 1 reply; 28+ messages in thread
From: James Bottomley @ 2004-09-29 20:41 UTC (permalink / raw)
To: Mike Anderson
Cc: Alan Stern, Luben Tuikov, SCSI development list, Mohammed Sameer, USB users list

On Wed, 2004-09-29 at 16:31, Mike Anderson wrote:
> James Bottomley [James.Bottomley@SteelEye.com] wrote:
> > Once you've called scsi_remove_host() the mid-layer will take control of
> > your in-flight commands (if there are any) and error them back to the
> > user. You need to take any actions to clean up internal structures
> > belonging to the commands *before* you do a scsi_remove_host.
> >
>
> How do we address the problem we get into with the reordering in
> scsi_remove_host (i.e. the call to scsi_forget_host prior to
> scsi_host_cancel) if we possibly generate new io from sd_sync_cache
> that will cause the error handler to fire up if the LLDD just lets the
> command fall on the floor with no response.

That's why LLD's are responsible for erroring all commands issued at
this time if the removal is a surprise ejection. The commands have to
be errored in a way (like DID_NO_CONNECT) that won't excite the error
handler.

James

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-09-29 20:41 ` James Bottomley
@ 2004-09-29 21:07 ` Mike Anderson
2004-09-29 21:14 ` James Bottomley
2004-09-29 21:20 ` Alan Stern
0 siblings, 2 replies; 28+ messages in thread
From: Mike Anderson @ 2004-09-29 21:07 UTC (permalink / raw)
To: James Bottomley
Cc: Alan Stern, Luben Tuikov, SCSI development list, Mohammed Sameer, USB users list

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> On Wed, 2004-09-29 at 16:31, Mike Anderson wrote:
> > James Bottomley [James.Bottomley@SteelEye.com] wrote:
> > > Once you've called scsi_remove_host() the mid-layer will take control of
> > > your in-flight commands (if there are any) and error them back to the
> > > user. You need to take any actions to clean up internal structures
> > > belonging to the commands *before* you do a scsi_remove_host.
> > >
> >
> > How do we address the problem we get into with the reordering in
> > scsi_remove_host (i.e. the call to scsi_forget_host prior to
> > scsi_host_cancel) if we possibly generate new io from sd_sync_cache
> > that will cause the error handler to fire up if the LLDD just lets the
> > command fall on the floor with no response.
>
> That's why LLD's are responsible for erroring all commands issued at
> this time if the removal is a surprise ejection. The commands have to
> be errored in a way (like DID_NO_CONNECT) that won't excite the error
> handler.
>

ok, thanks for the clarification. Your previous statement seemed to imply
once scsi_remove_host was called the LLDD had no responsibility for
calling done on commands sent to the LLDD's queuecommand.

So now that this thread and the other thread related to similar
shutdown issues has grown long is the next step to see if we can get the
usb queuecommand to return DID_NO_CONNECT in this shutdown case.

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-09-29 21:07 ` Mike Anderson
@ 2004-09-29 21:14 ` James Bottomley
2004-09-29 21:20 ` Luben Tuikov
2004-09-29 21:20 ` Alan Stern
1 sibling, 1 reply; 28+ messages in thread
From: James Bottomley @ 2004-09-29 21:14 UTC (permalink / raw)
To: Mike Anderson
Cc: Alan Stern, Luben Tuikov, SCSI development list, Mohammed Sameer, USB users list

On Wed, 2004-09-29 at 17:07, Mike Anderson wrote:
> ok, thanks for the clarification. Your previous statement seemed to imply
> once scsi_remove_host was called the LLDD had no responsibility for
> calling done on commands sent to the LLDD's queuecommand.

Hang on a minute, there are two cases:

Existing in-flight commands:

Here, either the LLD returns done on them or the mid-layer will cancel
them during the invocation of scsi_remove_host(). In the latter case,
the LLD doesn't need to call done on them.

New Commands going down into queuecommand:

These the LLD must error out (by returning done with an error status).

> So now that this thread and the other thread related to similar
> shutdown issues has grown long is the next step to see if we can get the
> usb queuecommand to return DID_NO_CONNECT in this shutdown case.

Yes, that sounds correct.

James

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-09-29 21:14 ` James Bottomley
@ 2004-09-29 21:20 ` Luben Tuikov
2004-09-29 21:26 ` James Bottomley
0 siblings, 1 reply; 28+ messages in thread
From: Luben Tuikov @ 2004-09-29 21:20 UTC (permalink / raw)
To: James Bottomley
Cc: Mike Anderson, Alan Stern, SCSI development list, Mohammed Sameer, USB users list

James Bottomley wrote:
> On Wed, 2004-09-29 at 17:07, Mike Anderson wrote:
> > ok, thanks for the clarification. Your previous statement seemed to imply
> > once scsi_remove_host was called the LLDD had no responsibility for
> > calling done on commands sent to the LLDD's queuecommand.
>
> Hang on a minute, there are two cases:
>
> Existing in-flight commands:
>
> Here, either the LLD returns done on them or the mid-layer will cancel
> them during the invocation of scsi_remove_host(). In the latter case,
> the LLD doesn't need to call done on them.

Is it possible that there could be a slight race when
the LLDD starts calling scsi_done() on pending commands,
_after_ it has called scsi_remove_host()?

BTW, why have/leave two alternate paths for this behavior?
This may bite us back in the future when things get more
complicated, i.e. dealing with more complicated devices,
transports, etc.

> New Commands going down into queuecommand:
>
> These the LLD must error out (by returning done with an error status).

Right at queuecommand() invocation.

Luben

> > So now that this thread and the other thread related to similar
> > shutdown issues has grown long is the next step to see if we can get the
> > usb queuecommand to return DID_NO_CONNECT in this shutdown case.
>
> Yes, that sounds correct.
>
> James
>
>

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-09-29 21:20 ` Luben Tuikov
@ 2004-09-29 21:26 ` James Bottomley
0 siblings, 0 replies; 28+ messages in thread
From: James Bottomley @ 2004-09-29 21:26 UTC (permalink / raw)
To: Luben Tuikov
Cc: Mike Anderson, Alan Stern, SCSI development list, Mohammed Sameer, USB users list

On Wed, 2004-09-29 at 17:20, Luben Tuikov wrote:
> Is it possible that there could be a slight race when
> the LLDD starts calling scsi_done() on pending commands,
> _after_ it has called scsi_remove_host()?

No, because it starts failing commands *before* it calls
scsi_remove_host().

> BTW, why have/leave two alternate paths for this behavior?
> This may bite us back in the future when things get more
> complicated, i.e. dealing with more complicated devices,
> transports, etc.

Certain LLDs have no internal track of outstanding commands.

James

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-09-29 21:07 ` Mike Anderson
2004-09-29 21:14 ` James Bottomley
@ 2004-09-29 21:20 ` Alan Stern
2004-10-02 23:57 ` Mohammed Sameer
1 sibling, 1 reply; 28+ messages in thread
From: Alan Stern @ 2004-09-29 21:20 UTC (permalink / raw)
To: Mike Anderson
Cc: James Bottomley, Luben Tuikov, SCSI development list, Mohammed Sameer, USB users list

On Wed, 29 Sep 2004, Mike Anderson wrote:

> So now that this thread and the other thread related to similar
> shutdown issues has grown long is the next step to see if we can get the
> usb queuecommand to return DID_NO_CONNECT in this shutdown case.

Here's a patch. Mohammed, please try it out and tell us how it works. I
just wrote it, so I haven't had a chance to test it myself yet.

Alan Stern


===== drivers/usb/storage/scsiglue.c 1.84 vs edited =====
--- 1.84/drivers/usb/storage/scsiglue.c	2004-09-13 08:11:34 -04:00
+++ edited/drivers/usb/storage/scsiglue.c	2004-09-29 16:08:47 -04:00
@@ -183,6 +183,14 @@
 		return SCSI_MLQUEUE_HOST_BUSY;
 	}
 
+	/* fail the command if we are disconnecting */
+	if (test_bit(US_FLIDX_DISCONNECTING, &us->flags)) {
+		US_DEBUGP("Command failed for disconnect\n");
+		srb->result = DID_NO_CONNECT << 16;
+		done(srb);
+		return 0;
+	}
+
 	srb->scsi_done = done;
 	us->srb = srb;
 

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-09-29 21:20 ` Alan Stern
@ 2004-10-02 23:57 ` Mohammed Sameer
0 siblings, 0 replies; 28+ messages in thread
From: Mohammed Sameer @ 2004-10-02 23:57 UTC (permalink / raw)
To: USB users list; +Cc: SCSI development list

[-- Attachment #1: Type: text/plain, Size: 1704 bytes --]

On Wed, Sep 29, 2004 at 05:20:21PM -0400, Alan Stern wrote:
> On Wed, 29 Sep 2004, Mike Anderson wrote:
>
> > So now that this thread and the other thread related to similar
> > shutdown issues has grown long is the next step to see if we can get the
> > usb queuecommand to return DID_NO_CONNECT in this shutdown case.
>
> Here's a patch. Mohammed, please try it out and tell us how it works. I
> just wrote it, so I haven't had a chance to test it myself yet.
>
> Alan Stern
>
>
> ===== drivers/usb/storage/scsiglue.c 1.84 vs edited =====
> --- 1.84/drivers/usb/storage/scsiglue.c	2004-09-13 08:11:34 -04:00
> +++ edited/drivers/usb/storage/scsiglue.c	2004-09-29 16:08:47 -04:00
> @@ -183,6 +183,14 @@
>  		return SCSI_MLQUEUE_HOST_BUSY;
>  	}
>
> +	/* fail the command if we are disconnecting */
> +	if (test_bit(US_FLIDX_DISCONNECTING, &us->flags)) {
> +		US_DEBUGP("Command failed for disconnect\n");
> +		srb->result = DID_NO_CONNECT << 16;
> +		done(srb);
> +		return 0;
> +	}
> +
>  	srb->scsi_done = done;
>  	us->srb = srb;
>
>

though patch complained:
patching file drivers/usb/storage/scsiglue.c
Hunk #1 succeeded at 166 (offset -17 lines).

But the patch works fine, Thanks!
May god bless you all!!

--
----------------
-- Katoob Main Developer, Arabbix Maintainer.
Linux registered user #224950
Proud Egyptian GNU/Linux User Group <www.eglug.org> Admin.
Life powered by Debian, Homepage: www.foolab.org
--
Don't send me any attachment in Micro$oft (.DOC, .PPT) format please
Read http://www.fsf.org/philosophy/no-word-attachments.html
Preferable attachments: .PDF, .HTML, .TXT
Thanx for adding this text to Your signature

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
@ 2004-10-11 19:20 Alan Stern
2004-10-11 19:36 ` James Bottomley
0 siblings, 1 reply; 28+ messages in thread
From: Alan Stern @ 2004-10-11 19:20 UTC (permalink / raw)
To: James Bottomley; +Cc: Mike Anderson, SCSI development list
James Bottomley [James.Bottomley@SteelEye.com] wrote:
> On Wed, 2004-09-29 at 16:31, Mike Anderson wrote:
> > James Bottomley [James.Bottomley@SteelEye.com] wrote:
> > > Once you've called scsi_remove_host() the mid-layer will take control of
> > > your in-flight commands (if there are any) and error them back to the
> > > user. You need to take any actions to clean up internal structures
> > > belonging to the commands *before* you do a scsi_remove_host.
> > >
> >
> > How do we address the problem we get into with the reordering in
> > scsi_remove_host (i.e. the call to scsi_forget_host prior to
> > scsi_host_cancel) if we possibly generate new io from sd_sync_cache
> > that will cause the error handler to fire up if the LLDD just lets the
> > command fall on the floor with no response.
>
> That's why LLD's are responsible for erroring all commands issued at
> this time if the removal is a surprise ejection. The commands have to
> be errored in a way (like DID_NO_CONNECT) that won't excite the error
> handler.
This raises a question for the case where the host removal is not a
surprise ejection. The LLD knows to clean up commands from _before_ doing
scsi_remove_host, and it knows that it should handle commands from _after_
calling scsi_remove_host.
But scsi_remove_host isn't synchronized at all with queuecommand, so what
about commands that arrive at just about _the same time_ as when
scsi_remove_host is called? The LLD has no way to tell which class such a
command belongs to. What then?
Alan Stern
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-10-11 19:20 Alan Stern
@ 2004-10-11 19:36 ` James Bottomley
2004-10-11 20:03 ` Alan Stern
0 siblings, 1 reply; 28+ messages in thread
From: James Bottomley @ 2004-10-11 19:36 UTC (permalink / raw)
To: Alan Stern; +Cc: Mike Anderson, SCSI development list

On Mon, 2004-10-11 at 14:20, Alan Stern wrote:
> James Bottomley [James.Bottomley@SteelEye.com] wrote:
> > That's why LLD's are responsible for erroring all commands issued at
> > this time if the removal is a surprise ejection. The commands have to
> > be errored in a way (like DID_NO_CONNECT) that won't excite the error
> > handler.
>
> This raises a question for the case where the host removal is not a
> surprise ejection. The LLD knows to clean up commands from _before_ doing
> scsi_remove_host, and it knows that it should handle commands from _after_
> calling scsi_remove_host.
>
> But scsi_remove_host isn't synchronized at all with queuecommand, so what
> about commands that arrive at just about _the same time_ as when
> scsi_remove_host is called? The LLD has no way to tell which class such a
> command belongs to. What then?

The host's job is simply to send commands to the device ... if the
device (or host) still exists in the system, you send them; if it
doesn't, you error them.

Even in the notified ejection case, there's no upper bound on the time
it will take hotplug to clean up ... this can be particularly long if
the user has things mounted and doesn't let us simply kill everything on
the filesystem.

So the only difference as far as the HBA sees it is that for notified
ejection, when it calls scsi_remove_host() it can still send commands.
For surprise ejection, the device is gone and it must error.

James

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-10-11 19:36 ` James Bottomley
@ 2004-10-11 20:03 ` Alan Stern
2004-10-11 20:12 ` James Bottomley
0 siblings, 1 reply; 28+ messages in thread
From: Alan Stern @ 2004-10-11 20:03 UTC (permalink / raw)
To: James Bottomley; +Cc: Mike Anderson, SCSI development list

On 11 Oct 2004, James Bottomley wrote:

> > But scsi_remove_host isn't synchronized at all with queuecommand, so what
> > about commands that arrive at just about _the same time_ as when
> > scsi_remove_host is called? The LLD has no way to tell which class such a
> > command belongs to. What then?
>
> The host's job is simply to send commands to the device ... if the
> device (or host) still exists in the system, you send them; if it
> doesn't, you error them.
>
> Even in the notified ejection case, there's no upper bound on the time
> it will take hotplug to clean up ... this can be particularly long if
> the user has things mounted and doesn't let us simply kill everything on
> the filesystem.
>
> So the only difference as far as the HBA sees it is that for notified
> ejection, when it calls scsi_remove_host() it can still send commands.
> For surprise ejection, the device is gone and it must error.

The problem arises when a command completes. In the notified ejection
case, if the LLD tries to call the scsi_done routine for a completed
command which was submitted before scsi_remove_host and which the SCSI
core has already cleaned up, it will cause an oops. Since the LLD has no
way to tell whether the core has cleaned up a command or not, its only
choice is to fail _every_ command starting shortly before it calls
scsi_remove_host.

Alan Stern

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-10-11 20:03 ` Alan Stern
@ 2004-10-11 20:12 ` James Bottomley
2004-10-11 20:40 ` Mike Anderson
0 siblings, 1 reply; 28+ messages in thread
From: James Bottomley @ 2004-10-11 20:12 UTC (permalink / raw)
To: Alan Stern; +Cc: Mike Anderson, SCSI development list

On Mon, 2004-10-11 at 15:03, Alan Stern wrote:
> The problem arises when a command completes. In the notified ejection
> case, if the LLD tries to call the scsi_done routine for a completed
> command which was submitted before scsi_remove_host and which the SCSI
> core has already cleaned up, it will cause an oops. Since the LLD has no
> way to tell whether the core has cleaned up a command or not, its only
> choice is to fail _every_ command starting shortly before it calls
> scsi_remove_host.

This is the remove implies cancel issue that was discussed earlier. I
thought the proposal was to have a remove that wouldn't automatically
cancel all the commands? ... although I don't think I've seen any code
for that case yet.

James

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-10-11 20:12 ` James Bottomley
@ 2004-10-11 20:40 ` Mike Anderson
2004-10-11 21:15 ` James Bottomley
0 siblings, 1 reply; 28+ messages in thread
From: Mike Anderson @ 2004-10-11 20:40 UTC (permalink / raw)
To: James Bottomley; +Cc: Alan Stern, SCSI development list

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> On Mon, 2004-10-11 at 15:03, Alan Stern wrote:
> > The problem arises when a command completes. In the notified ejection
> > case, if the LLD tries to call the scsi_done routine for a completed
> > command which was submitted before scsi_remove_host and which the SCSI
> > core has already cleaned up, it will cause an oops. Since the LLD has no
> > way to tell whether the core has cleaned up a command or not, its only
> > choice is to fail _every_ command starting shortly before it calls
> > scsi_remove_host.
>
> This is the remove implies cancel issue that was discussed earlier. I
> thought the proposal was to have a remove that wouldn't automatically
> cancel all the commands? ... although I don't think I've seen any code
> for that case yet.
>

Clarification. James, are you indicating that there needs to be a new
scsi mid api that performs similar function to scsi_remove_host except
does not cancel commands?

It is unclear to me if a LLDD provides a slave_destroy which is called
from scsi_remove_device during the scsi_forget_host function that we
would hit a case where the LLDD has good IO to complete still
outstanding when we complete scsi_forget_host and call scsi_host_cancel.

Is there some locking / synchronization issue with our state changes and
the scsi_prep_fn / scsi_request_fn / scsi_dispatch_cmd sequence?

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-10-11 20:40 ` Mike Anderson
@ 2004-10-11 21:15 ` James Bottomley
2004-10-11 23:13 ` Mike Anderson
0 siblings, 1 reply; 28+ messages in thread
From: James Bottomley @ 2004-10-11 21:15 UTC (permalink / raw)
To: Mike Anderson; +Cc: Alan Stern, SCSI development list

On Mon, 2004-10-11 at 15:40, Mike Anderson wrote:
> > This is the remove implies cancel issue that was discussed earlier. I
> > thought the proposal was to have a remove that wouldn't automatically
> > cancel all the commands? ... although I don't think I've seen any code
> > for that case yet.
> >
>
> Clarification. James, are you indicating that there needs to be a new
> scsi mid api that performs similar function to scsi_remove_host except
> does not cancel commands?

Sorry, by "a remove that .." I was meaning "another remove method that
..."

> It is unclear to me if a LLDD provides a slave_destroy which is called
> from scsi_remove_device during the scsi_forget_host function that we
> would hit a case where the LLDD has good IO to complete still
> outstanding when we complete scsi_forget_host and call scsi_host_cancel.

That depends what the LLD does with the slave destroy really. The API
doesn't say the LLD has to chase all I/Os down when slave destroy is
called, so we can't assume it has.

> Is there some locking / synchronization issue with our state changes and
> the scsi_prep_fn / scsi_request_fn / scsi_dispatch_cmd sequence?

In the device, no ... I still need to check the host.

James

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: BUG: CD driver sends command during host removal
2004-10-11 21:15 ` James Bottomley
@ 2004-10-11 23:13 ` Mike Anderson
0 siblings, 0 replies; 28+ messages in thread
From: Mike Anderson @ 2004-10-11 23:13 UTC (permalink / raw)
To: James Bottomley; +Cc: Alan Stern, SCSI development list

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> On Mon, 2004-10-11 at 15:40, Mike Anderson wrote:
> > > This is the remove implies cancel issue that was discussed earlier. I
> > > thought the proposal was to have a remove that wouldn't automatically
> > > cancel all the commands? ... although I don't think I've seen any code
> > > for that case yet.
> > >
> >
> > Clarification. James, are you indicating that there needs to be a new
> > scsi mid api that performs similar function to scsi_remove_host except
> > does not cancel commands?
>
> Sorry, by "a remove that .." I was meaning "another remove method that
> ..."
>

Well, based on Mike C's mail it looks like when I moved scsi_forget_host
up in scsi_remove_host I broke scsi_host_cancel; as it stands now,
scsi_remove_host is not really doing any cancels.

http://marc.theaimsgroup.com/?l=linux-scsi&m=109743460011434&w=2

> > It is unclear to me if a LLDD provides a slave_destroy which is called
> > from scsi_remove_device during the scsi_forget_host function that we
> > would hit a case where the LLDD has good IO to complete still
> > outstanding when we complete scsi_forget_host and call scsi_host_cancel.
>
> That depends what the LLD does with the slave destroy really. The API
> doesn't say the LLD has to chase all I/Os down when slave destroy is
> called, so we can't assume it has.
>

ok.

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2004-10-11 23:13 UTC | newest]
Thread overview: 28+ messages
[not found] <20040926082926.GA1944@uniball>
2004-09-27 18:18 ` BUG: CD driver sends command during host removal Alan Stern
2004-09-27 18:51 ` Mohammed Sameer
2004-09-29 16:06 ` Luben Tuikov
2004-09-29 16:55 ` Alan Stern
2004-09-29 17:09 ` Mike Anderson
2004-09-29 18:02 ` Luben Tuikov
2004-09-29 18:09 ` James Bottomley
2004-09-29 18:58 ` Luben Tuikov
2004-09-29 19:39 ` James Bottomley
2004-09-29 19:01 ` Alan Stern
2004-09-29 19:27 ` Mike Anderson
2004-09-29 19:33 ` Luben Tuikov
2004-09-29 19:50 ` James Bottomley
2004-09-29 20:31 ` Mike Anderson
2004-09-29 20:41 ` James Bottomley
2004-09-29 21:07 ` Mike Anderson
2004-09-29 21:14 ` James Bottomley
2004-09-29 21:20 ` Luben Tuikov
2004-09-29 21:26 ` James Bottomley
2004-09-29 21:20 ` Alan Stern
2004-10-02 23:57 ` Mohammed Sameer
2004-10-11 19:20 Alan Stern
2004-10-11 19:36 ` James Bottomley
2004-10-11 20:03 ` Alan Stern
2004-10-11 20:12 ` James Bottomley
2004-10-11 20:40 ` Mike Anderson
2004-10-11 21:15 ` James Bottomley
2004-10-11 23:13 ` Mike Anderson