* Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
@ 2005-01-05 8:14 Andrew Morton
2005-01-05 14:17 ` James Bottomley
2005-01-05 15:50 ` Alan Stern
0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2005-01-05 8:14 UTC (permalink / raw)
To: linux-usb-devel, linux-scsi
Looks like the scsi-vs-usb-storage bunfight isn't yet settled...
Begin forwarded message:
Date: Fri, 31 Dec 2004 14:44:44 +1100
From: Srihari Vijayaraghavan <harisri@internode.on.net>
To: linux-kernel@vger.kernel.org
Subject: [BUG] USB Storage OOPS and a D state process in 2.6.10
Can easily reproduce this by plugging and unplugging an external USB DVD-RW a
few times on my Athlon 64 desktop:
usb-storage: device scan complete
usb 1-2: USB disconnect, address 22
Unable to handle kernel paging request at 000000000044c8c6 RIP:
<ffffffffa0108252>{:usb_storage:bus_reset+66}
PML4 331b9067 PGD 331ba067 PMD 331bb067 PTE 0
Oops: 0000 [1]
CPU 0
Modules linked in: radeon ipt_conntrack ip_conntrack nfsd exportfs lockd
autofs4 sunrpc iptable_filter ip_tables af_packet sr_mod dm_mod video button
ohci1394 ieee1394 usb_storage uhci_hcd ehci_hcd snd_via82xx snd_ac97_codec
snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart
snd_rawmidi snd_seq_device snd soundcore via_rhine mii r8169 crc32 floppy
unix ext3 mbcache jbd sata_via libata sd_mod scsi_mod
Pid: 8843, comm: scsi_eh_22 Not tainted 2.6.10
RIP: 0010:[<ffffffffa0108252>] <ffffffffa0108252>{:usb_storage:bus_reset+66}
RSP: 0018:000001002f9a7e68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 000001002f8ec200 RCX: 0000010037d6f670
RDX: 0000000000002003 RSI: 0000000000000000 RDI: 000000000044c68e
RBP: 00000000fffffff0 R08: 0000000000000000 R09: 0000000000000001
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000010037d6f670
R13: 0000010036acf400 R14: 000001002f6c4c20 R15: 0000000000000001
FS: 0000002a955792c0(0000) GS:ffffffff803c9f00(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000044c8c6 CR3: 0000000000101000 CR4: 00000000000006e0
Process scsi_eh_22 (pid: 8843, threadinfo 000001002f9a6000, task
00000100369f70f0)
Stack: 0000010037d6f640 0000010037d6f640 0000010037d6f670 ffffffffa0003c31
0000000000000282 0000000000000000 0000010037d6f640 ffffffffa000463b
ff000000ff000000 ff000000ff000000
Call Trace:<ffffffffa0003c31>{:scsi_mod:scsi_try_bus_reset+113}
<ffffffffa000463b>{:scsi_mod:scsi_error_handler+2251}
<ffffffff8010ebf3>{child_rip+8}
<ffffffffa0003d70>{:scsi_mod:scsi_error_handler+0}
<ffffffff8010ebeb>{child_rip+0}
Code: 48 8b 87 38 02 00 00 80 78 04 01 75 31 48 8b 73 20 e8 e8 c8
RIP <ffffffffa0108252>{:usb_storage:bus_reset+66} RSP <000001002f9a7e68>
CR2: 000000000044c8c6
<6>agpgart: Found an AGP 3.0 compliant device at 0000:00:00.0.
And a D state hald process:
Dec 31 13:52:09 desktop kernel: hald D 00000100376c3938 0 3677
1 4600 2304 (NOTLB)
Dec 31 13:52:09 desktop kernel: 00000100376c3898 0000000000000006
000000732f5c80b8 000001003f7555d0
Dec 31 13:52:09 desktop kernel: 0000000000000736 000001003fed2070
0000010037d6f640 00000100368f8000
Dec 31 13:52:09 desktop kernel: 000001002f5c80b8 00000100376c3938
Dec 31 13:52:09 desktop kernel: Call
Trace:<ffffffff802c11eb>{wait_for_completion+139}
<ffffffff8012dc20>{default_wake_function+0}
Dec 31 13:52:09 desktop kernel:
<ffffffff8012dc20>{default_wake_function+0}
<ffffffffa0004f7b>{:scsi_mod:scsi_wait_req+91}
Dec 31 13:52:09 desktop kernel:
<ffffffffa0000038>{:scsi_mod:scsi_allocate_request+56}
Dec 31 13:52:09 desktop kernel:
<ffffffffa018a33c>{:sr_mod:sr_do_ioctl+156}
<ffffffffa018a654>{:sr_mod:sr_audio_ioctl+372}
Dec 31 13:52:09 desktop kernel:
<ffffffffa0004fac>{:scsi_mod:scsi_wait_req+140}
<ffffffff8023f71e>{cdrom_count_tracks+222}
Dec 31 13:52:09 desktop kernel: <ffffffff8023f9a0>{cdrom_open+448}
<ffffffffa00471f4>{:ext3:ext3_get_block_handle+228}
Dec 31 13:52:09 desktop kernel: <ffffffff8014ef4e>{find_get_page+14}
<ffffffff8016ce4c>{__find_get_block_slow+220}
Dec 31 13:52:09 desktop kernel:
<ffffffff8016d519>{__find_get_block+377} <ffffffff8016f87f>{__getblk+31}
Dec 31 13:52:09 desktop kernel: <ffffffff801af4ba>{avc_has_perm+90}
<ffffffff801af4ba>{avc_has_perm+90}
Dec 31 13:52:09 desktop kernel: <ffffffff801af4ba>{avc_has_perm+90}
<ffffffff801b02b4>{task_has_capability+100}
Dec 31 13:52:09 desktop kernel: <ffffffff801c2992>{kobject_get+18}
<ffffffff801c2992>{kobject_get+18}
Dec 31 13:52:09 desktop kernel:
<ffffffffa0189730>{:sr_mod:sr_block_open+176} <ffffffff8017286a>{do_open+170}
Dec 31 13:52:09 desktop kernel: <ffffffff80172c5f>{blkdev_open+47}
<ffffffff8016aa36>{dentry_open+230}
Dec 31 13:52:09 desktop kernel: <ffffffff8016ab7e>{filp_open+62}
<ffffffff8016abc7>{get_unused_fd+55}
Dec 31 13:52:09 desktop kernel: <ffffffff8016ad4c>{sys_open+76}
<ffffffff8010e1da>{system_call+126}
Dec 31 13:52:09 desktop kernel:
Thanks
Hari.
PS: I am not subscribed to LKML, so please cc me on replies.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 2005-01-05 8:14 Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 Andrew Morton @ 2005-01-05 14:17 ` James Bottomley 2005-01-05 15:50 ` Alan Stern 1 sibling, 0 replies; 15+ messages in thread From: James Bottomley @ 2005-01-05 14:17 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-usb-devel, SCSI Mailing List On Wed, 2005-01-05 at 00:14 -0800, Andrew Morton wrote: > Looks like the scsi-vs-usb-storage bunfight isn't yet settled... Actually, this one looks pretty clear cut: According to the trace, we get into error handling for a removed device. This: > RIP <ffffffffa0108252>{:usb_storage:bus_reset+66} RSP <000001002f9a7e68> says that the usb_storage bus_reset routine never returns. So the system is hung in error handling and the kernel is waiting for everything to return to normal. The fix is to work out why usb_storage hung there instead of returning success or fail. James ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 2005-01-05 8:14 Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 Andrew Morton 2005-01-05 14:17 ` James Bottomley @ 2005-01-05 15:50 ` Alan Stern 2005-01-06 10:38 ` [linux-usb-devel] " Srihari Vijayaraghavan 2005-01-08 0:43 ` Srihari Vijayaraghavan 1 sibling, 2 replies; 15+ messages in thread From: Alan Stern @ 2005-01-05 15:50 UTC (permalink / raw) To: Srihari Vijayaraghavan, James Bottomley Cc: USB development list, SCSI development list On Wed, 5 Jan 2005, Andrew Morton wrote: > Looks like the scsi-vs-usb-storage bunfight isn't yet settled... > > > Begin forwarded message: > > Date: Fri, 31 Dec 2004 14:44:44 +1100 > From: Srihari Vijayaraghavan <harisri@internode.on.net> > To: linux-kernel@vger.kernel.org > Subject: [BUG] USB Storage OOPS and a D state process in 2.6.10 > > > Can easily reproduce this by plugging and unplugging an external USB DVD-RW a > few times on my Athlon 64 desktop: > > usb-storage: device scan complete > usb 1-2: USB disconnect, address 22 > Unable to handle kernel paging request at 000000000044c8c6 RIP: > <ffffffffa0108252>{:usb_storage:bus_reset+66} > Process scsi_eh_22 (pid: 8843, threadinfo 000001002f9a6000, task > 00000100369f70f0) > Stack: 0000010037d6f640 0000010037d6f640 0000010037d6f670 ffffffffa0003c31 > 0000000000000282 0000000000000000 0000010037d6f640 ffffffffa000463b > ff000000ff000000 ff000000ff000000 > Call Trace:<ffffffffa0003c31>{:scsi_mod:scsi_try_bus_reset+113} > <ffffffffa000463b>{:scsi_mod:scsi_error_handler+2251} > <ffffffff8010ebf3>{child_rip+8} > <ffffffffa0003d70>{:scsi_mod:scsi_error_handler+0} > <ffffffff8010ebeb>{child_rip+0} > And a D state hald process: On Wed, 5 Jan 2005, James Bottomley wrote: > Actually, this one looks pretty clear cut: > > According to the trace, we get into error handling for a removed device. > This: > > > RIP <ffffffffa0108252>{:usb_storage:bus_reset+66} RSP <000001002f9a7e68> > > says that the usb_storage bus_reset routine never returns. > > So the system is hung in error handling and the kernel is waiting for > everything to return to normal. The fix is to work out why usb_storage > hung there instead of returning success or fail. I agree with James, although it's clear from the oops report why the routine doesn't return: It tries to reference an invalid address. Not being at all familiar with 64-bit code, I can't tell exactly where the invalid reference occurred. It looks like it's near the start of the bus_reset routine, but that doesn't make sense because none of the data structures referred to there should have been deallocated yet. That doesn't happen until after scsi_remove_host returns, and once that happens the error handler shouldn't run at all. Another question is why we get into the error handler in the first place. No way to tell from the information provided, but I suspect the haldaemon process isn't doing what it should. Srihari, can you try reproducing this after rebuilding the usb-storage driver with CONFIG_USB_STORAGE_DEBUG=y? Alan Stern ------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 2005-01-05 15:50 ` Alan Stern @ 2005-01-06 10:38 ` Srihari Vijayaraghavan 2005-01-08 0:43 ` Srihari Vijayaraghavan 1 sibling, 0 replies; 15+ messages in thread From: Srihari Vijayaraghavan @ 2005-01-06 10:38 UTC (permalink / raw) To: Alan Stern; +Cc: James Bottomley, USB development list, SCSI development list On Thursday 06 January 2005 02:50, Alan Stern wrote: > ... > Srihari, can you try reproducing this after rebuilding the usb-storage > driver with CONFIG_USB_STORAGE_DEBUG=y? Alan, Sure, I shall try to trigger this bug with the debug option. (But I am afraid it would take a couple of days, as that USB peripheral belongs to a friend, who has a need for it ATM, and I shall borrow it off him, when it is available.) Thanks. Hari ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 2005-01-05 15:50 ` Alan Stern 2005-01-06 10:38 ` [linux-usb-devel] " Srihari Vijayaraghavan @ 2005-01-08 0:43 ` Srihari Vijayaraghavan 2005-01-08 2:11 ` Srihari Vijayaraghavan 2005-01-08 4:03 ` Alan Stern 1 sibling, 2 replies; 15+ messages in thread From: Srihari Vijayaraghavan @ 2005-01-08 0:43 UTC (permalink / raw) To: Alan Stern; +Cc: James Bottomley, USB development list, SCSI development list On Thursday 06 January 2005 02:50, Alan Stern wrote: > ... > Srihari, can you try reproducing this after rebuilding the usb-storage > driver with CONFIG_USB_STORAGE_DEBUG=y? Alan, I have made a couple of observations: 1. While it is still easy to trigger this bug without USB Storage Debug Option in vanilla 2.6.10, I cannot reproduce the bug with debug option. Perhaps it changes some timings. (Without debug option, within a dozen plug/unplug events I can reliably trigger the bug, but OTOH with debug options I cannot even after 50 events. Maybe I should run my desktop with debug option for better stability. :-)) 2. When running without debug option, the D state "hald" process appears first, and upon unplugging the drive an OOPS appears on the very next plugging of the drive. Here is the new OOPS: usb-storage: device found at 7 usb-storage: waiting for device to settle before scanning Unable to handle kernel paging request at 0000001600000019 RIP: <ffffffffa0108259>{:usb_storage:bus_reset+73} PML4 32fca067 PGD 0 Oops: 0000 [1] CPU 0 Modules linked in: radeon ipt_conntrack ip_conntrack nfsd exportfs lockd autofs4 sunrpc iptable_filter ip_tables af_packet sr_mod dm_mod video button ohci1394 ieee1394 usb_storage uhci_hcd ehci_hcd snd_via82xx snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore via_rhine mii r8169 crc32 floppy unix ext3 mbcache jbd sata_via libata sd_mod scsi_mod Pid: 5435, comm: scsi_eh_6 Not tainted 2.6.10 RIP: 0010:[<ffffffffa0108259>] <ffffffffa0108259>{:usb_storage:bus_reset+73} RSP: 0018:000001002fedde68 EFLAGS: 00010246 RAX: 0000001600000015 RBX: 0000010034fb7c00 RCX: 0000010037d73270 RDX: 0000000000002003 RSI: 0000000000000000 RDI: 0000003796e00000 RBP: 00000000fffffff0 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000ffffffff R11: 0000000000000000 R12: 0000010037d73270 R13: 000001003e4cc800 R14: 000001003572ac20 R15: 0000000000000001 FS: 0000002a955792c0(0000) GS:ffffffff803c9f00(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000001600000019 CR3: 0000000000101000 CR4: 00000000000006e0 Process scsi_eh_6 (pid: 5435, threadinfo 000001002fedc000, task 0000010036f1d650) Stack: 0000010037d73240 0000010037d73240 0000010037d73270 ffffffffa0003c31 0000000000000282 0000000000000000 0000010037d73240 ffffffffa000463b ff000000ff000000 ff000000ff000000 Call Trace:<ffffffffa0003c31>{:scsi_mod:scsi_try_bus_reset+113} <ffffffffa000463b>{:scsi_mod:scsi_error_handler+2251} <ffffffff8010ebf3>{child_rip+8} <ffffffffa0003d70>{:scsi_mod:scsi_error_handler+0} <ffffffff8010ebeb>{child_rip+0} Code: 80 78 04 01 75 31 48 8b 73 20 e8 e8 c8 13 e0 85 c0 41 89 c4 RIP <ffffffffa0108259>{:usb_storage:bus_reset+73} RSP <000001002fedde68> CR2: 0000001600000019 <6>usb 1-5: USB disconnect, address 7 And the D state "hald" process: Jan 8 11:29:24 desktop kernel: hald D 0000010037507938 0 3539 1 4354 3176 (NOTLB) Jan 8 11:29:24 desktop kernel: 0000010037507898 0000000000000006 000000732f812078 000001003811c0b0 Jan 8 11:29:24 desktop kernel: 0000000000003d15 000001003fd73490 0000010037d73240 000001003eb86000 Jan 8 11:29:24 desktop kernel: 000001002f812078 0000010037507938 Jan 8 11:29:24 desktop kernel: Call Trace:<ffffffff802c11eb>{wait_for_completion+139} <ffffffff8012dc20>{default_wake_function+0} Jan 8 11:29:24 desktop kernel: <ffffffff8012dc20>{default_wake_function+0} <ffffffffa0004f7b>{:scsi_mod:scsi_wait_req+91} Jan 8 11:29:24 desktop kernel: <ffffffffa0000038>{:scsi_mod:scsi_allocate_request+56} Jan 8 11:29:24 desktop kernel: <ffffffffa018a33c>{:sr_mod:sr_do_ioctl+156} <ffffffffa018a654>{:sr_mod:sr_audio_ioctl+372} Jan 8 11:29:24 desktop kernel: <ffffffffa0004fac>{:scsi_mod:scsi_wait_req+140} <ffffffff8023f71e>{cdrom_count_tracks+222} Jan 8 11:29:24 desktop kernel: <ffffffff8023f9a0>{cdrom_open+448} <ffffffffa00471f4>{:ext3:ext3_get_block_handle+228} Jan 8 11:29:24 desktop kernel: <ffffffff8014ef4e>{find_get_page+14} <ffffffff8016ce4c>{__find_get_block_slow+220} Jan 8 11:29:24 desktop kernel: <ffffffff8016d519>{__find_get_block+377} <ffffffff8016f87f>{__getblk+31} Jan 8 11:29:24 desktop kernel: <ffffffff801af4ba>{avc_has_perm+90} <ffffffff801af4ba>{avc_has_perm+90} Jan 8 11:29:24 desktop kernel: <ffffffff801af4ba>{avc_has_perm+90} <ffffffff801b02b4>{task_has_capability+100} Jan 8 11:29:24 desktop kernel: <ffffffff801c2992>{kobject_get+18} <ffffffff801c2992>{kobject_get+18} Jan 8 11:29:24 desktop kernel: <ffffffffa0189730>{:sr_mod:sr_block_open+176} <ffffffff8017286a>{do_open+170} Jan 8 11:29:24 desktop kernel: <ffffffff80172c5f>{blkdev_open+47} <ffffffff8016aa36>{dentry_open+230} Jan 8 11:29:24 desktop kernel: <ffffffff8016ab7e>{filp_open+62} <ffffffff8016abc7>{get_unused_fd+55} Jan 8 11:29:24 desktop kernel: <ffffffff8016ad4c>{sys_open+76} <ffffffff8010e1da>{system_call+126} Jan 8 11:29:24 desktop kernel: Thank you. Hari ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 2005-01-08 0:43 ` Srihari Vijayaraghavan @ 2005-01-08 2:11 ` Srihari Vijayaraghavan 2005-01-08 3:51 ` Alan Stern 2005-01-08 4:03 ` Alan Stern 1 sibling, 1 reply; 15+ messages in thread From: Srihari Vijayaraghavan @ 2005-01-08 2:11 UTC (permalink / raw) To: Alan Stern; +Cc: James Bottomley, USB development list, SCSI development list [-- Attachment #1: Type: text/plain, Size: 5906 bytes --] On Saturday 08 January 2005 11:43, Srihari Vijayaraghavan wrote: > ... > 1. While it is still easy to trigger this bug without USB Storage Debug > Option in vanilla 2.6.10, I cannot reproduce the bug with debug option. > Perhaps it changes some timings. (Am sorry for replying to my own email.) Though I could not trigger an OOPS with debug option yet, I have managed to trigger a D state "hald" process: Jan 8 12:41:45 desktop kernel: hald D 000001003fb24478 0 3369 1 4163 3006 (NOTLB) Jan 8 12:41:45 desktop kernel: 00000100374bb9b8 0000000000000006 000000732d933078 000001003fb241f0 Jan 8 12:41:45 desktop kernel: 00000000000b08ff 00000100371bc170 0000010037d71440 000001002e0db800 Jan 8 12:41:45 desktop kernel: 000001002d933078 00000100374bba58 Jan 8 12:41:45 desktop kernel: Call Trace:<ffffffff802c11eb>{wait_for_completion+139} <ffffffff8012dc20>{default_wake_function+0} Jan 8 12:41:45 desktop kernel: <ffffffff8012dc20>{default_wake_function+0} <ffffffffa0004f7b>{:scsi_mod:scsi_wait_req+91} Jan 8 12:41:45 desktop kernel: <ffffffffa0000038>{:scsi_mod:scsi_allocate_request+56} Jan 8 12:41:45 desktop kernel: <ffffffffa023e99a>{:sr_mod:sr_open+234} <ffffffff80110000>{do_stack_segment+128} Jan 8 12:41:45 desktop kernel: <ffffffff80240000>{cdrom_open+2080} <ffffffff8026cbfc>{alloc_skb+108} Jan 8 12:41:45 desktop kernel: <ffffffff8026b4e1>{sock_alloc_send_pskb+113} <ffffffff8026e747>{memcpy_toiovec+55} Jan 8 12:41:45 desktop kernel: <ffffffff8015f119>{find_mergeable_anon_vma+137} <ffffffff801af4ba>{avc_has_perm+90} Jan 8 12:41:45 desktop kernel: <ffffffff801af4ba>{avc_has_perm+90} <ffffffff801af4ba>{avc_has_perm+90} Jan 8 12:41:45 desktop kernel: <ffffffff801b02b4>{task_has_capability+100} <ffffffff801c2992>{kobject_get+18} Jan 8 12:41:45 desktop kernel: <ffffffff8021e61c>{get_disk+44} <ffffffff801c2992>{kobject_get+18} Jan 8 12:41:45 desktop kernel: <ffffffffa023e730>{:sr_mod:sr_block_open+176} <ffffffff8017286a>{do_open+170} Jan 8 12:41:45 desktop kernel: <ffffffff80172c5f>{blkdev_open+47} <ffffffff8016aa36>{dentry_open+230} Jan 8 12:41:45 desktop kernel: <ffffffff8016ab7e>{filp_open+62} <ffffffff8016abc7>{get_unused_fd+55} Jan 8 12:41:45 desktop kernel: <ffffffff8016ad4c>{sys_open+76} <ffffffff8010e1da>{system_call+126} Jan 8 12:41:45 desktop kernel: This (kernel badness in scsi_lib.c:1717) I have it in the system logs just before the process went in to D state: Jan 8 12:34:00 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110 Jan 8 12:34:00 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = -32) Jan 8 12:34:00 desktop kernel: usb 1-5: new high speed USB device using ehci_hcd and address 9 Jan 8 12:34:00 desktop kernel: scsi9 : SCSI emulation for USB Mass Storage devices Jan 8 12:34:05 desktop kernel: Vendor: PIONEER Model: DVD-RW DVR-107D Rev: 1.13 Jan 8 12:34:05 desktop kernel: Type: CD-ROM ANSI SCSI revision: 00 Jan 8 12:34:05 desktop kernel: sr1: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray Jan 8 12:34:05 desktop scsi.agent[6565]: cdrom at /devices/pci0000:00/0000:00:10.4/usb1/1-5/1-5:1.0/host9/target9:0:0/9:0:0:0 Jan 8 12:34:06 desktop kernel: usb 1-5: USB disconnect, address 9 Jan 8 12:34:36 desktop kernel: scsi: Device offlined - not ready after error recovery: host 9 channel 0 id 0 lun 0 Jan 8 12:34:36 desktop kernel: 9:0:0:0: Illegal state transition deleted->offline Jan 8 12:34:36 desktop kernel: Badness in scsi_device_set_state at drivers/scsi/scsi_lib.c:1717 Jan 8 12:34:36 desktop kernel: Jan 8 12:34:36 desktop kernel: Call Trace:<ffffffffa0006808>{:scsi_mod:scsi_device_set_state+264} Jan 8 12:34:36 desktop kernel: <ffffffffa000481c>{:scsi_mod:scsi_error_handler+2732} Jan 8 12:34:36 desktop kernel: <ffffffff8010ebf3>{child_rip+8} <ffffffffa0003d70>{:scsi_mod:scsi_error_handler+0} Jan 8 12:34:36 desktop kernel: <ffffffff8010ebeb>{child_rip+0} Jan 8 12:35:10 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110 Jan 8 12:35:10 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = -32) Jan 8 12:35:10 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110 Jan 8 12:35:10 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = -32) Jan 8 12:35:11 desktop kernel: usb 1-5: new high speed USB device using ehci_hcd and address 10 Jan 8 12:35:11 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110 Jan 8 12:35:11 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = -32) Jan 8 12:35:11 desktop kernel: scsi10 : SCSI emulation for USB Mass Storage devices Jan 8 12:35:15 desktop kernel: usb 1-5: USB disconnect, address 10 Jan 8 12:35:28 desktop kernel: usb 1-5: new high speed USB device using ehci_hcd and address 11 Jan 8 12:35:28 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110 Jan 8 12:35:28 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = -32) Jan 8 12:35:28 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110 Jan 8 12:35:28 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = -32) Jan 8 12:35:29 desktop kernel: scsi11 : SCSI emulation for USB Mass Storage devices Jan 8 12:35:34 desktop kernel: Vendor: PIONEER Model: DVD-RW DVR-107D Rev: 1.13 Jan 8 12:35:34 desktop kernel: Type: CD-ROM ANSI SCSI revision: 00 Jan 8 12:35:34 desktop kernel: sr2: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray Jan 8 12:35:36 desktop kernel: usb 1-5: USB disconnect, address 11 The complete copy of system logs is attached to this email. I think I have seen this "Badness in scsi_lib.c:1717" in 2.6.10 before (even without USB Storage Debug option that is). Thanks. Hari [-- Attachment #2: dmesg-2.6.10-1.bz2 --] [-- Type: application/x-bzip2, Size: 20270 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 2005-01-08 2:11 ` Srihari Vijayaraghavan @ 2005-01-08 3:51 ` Alan Stern 2005-01-09 4:26 ` Srihari Vijayaraghavan 0 siblings, 1 reply; 15+ messages in thread From: Alan Stern @ 2005-01-08 3:51 UTC (permalink / raw) To: Srihari Vijayaraghavan Cc: James Bottomley, USB development list, SCSI development list On Sat, 8 Jan 2005, Srihari Vijayaraghavan wrote: > Though I could not trigger an OOPS with debug option yet, I have managed to > trigger a D state "hald" process: > > Jan 8 12:41:45 desktop kernel: hald D 000001003fb24478 0 3369 > 1 4163 3006 (NOTLB) > Jan 8 12:41:45 desktop kernel: 00000100374bb9b8 0000000000000006 > 000000732d933078 000001003fb241f0 > Jan 8 12:41:45 desktop kernel: 00000000000b08ff 00000100371bc170 > 0000010037d71440 000001002e0db800 > Jan 8 12:41:45 desktop kernel: 000001002d933078 00000100374bba58 > Jan 8 12:41:45 desktop kernel: Call > Trace:<ffffffff802c11eb>{wait_for_completion+139} > <ffffffff8012dc20>{default_wake_function+0} > Jan 8 12:41:45 desktop kernel: > <ffffffff8012dc20>{default_wake_function+0} > <ffffffffa0004f7b>{:scsi_mod:scsi_wait_req+91} That's not surprising. It's waiting for the error handler to finish. > This (kernel badness in scsi_lib.c:1717) I have it in the system logs just > before the process went in to D state: > > Jan 8 12:34:00 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110 > Jan 8 12:34:00 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = > -32) > Jan 8 12:34:00 desktop kernel: usb 1-5: new high speed USB device using > ehci_hcd and address 9 > Jan 8 12:34:00 desktop kernel: scsi9 : SCSI emulation for USB Mass Storage > devices > Jan 8 12:34:05 desktop kernel: Vendor: PIONEER Model: DVD-RW DVR-107D > Rev: 1.13 > Jan 8 12:34:05 desktop kernel: Type: CD-ROM > ANSI SCSI revision: 00 > Jan 8 12:34:05 desktop kernel: sr1: scsi3-mmc drive: 40x/40x writer cd/rw > xa/form2 cdda tray I see you didn't have CONFIG_USB_STORAGE_DEBUG set when you ran this test. Or else you didn't have your syslog daemon configured to store the debugging messages. If you can get anything unusual to happen with the debugging enabled, it would make things much easier to track down. > Jan 8 12:34:05 desktop scsi.agent[6565]: cdrom > at /devices/pci0000:00/0000:00:10.4/usb1/1-5/1-5:1.0/host9/target9:0:0/9:0:0:0 > Jan 8 12:34:06 desktop kernel: usb 1-5: USB disconnect, address 9 > Jan 8 12:34:36 desktop kernel: scsi: Device offlined - not ready after error > recovery: host 9 channel 0 id 0 lun 0 > Jan 8 12:34:36 desktop kernel: 9:0:0:0: Illegal state transition > deleted->offline > Jan 8 12:34:36 desktop kernel: Badness in scsi_device_set_state at > drivers/scsi/scsi_lib.c:1717 > Jan 8 12:34:36 desktop kernel: > Jan 8 12:34:36 desktop kernel: Call > Trace:<ffffffffa0006808>{:scsi_mod:scsi_device_set_state+264} > Jan 8 12:34:36 desktop kernel: > <ffffffffa000481c>{:scsi_mod:scsi_error_handler+2732} > Jan 8 12:34:36 desktop kernel: <ffffffff8010ebf3>{child_rip+8} > <ffffffffa0003d70>{:scsi_mod:scsi_error_handler+0} > Jan 8 12:34:36 desktop kernel: <ffffffff8010ebeb>{child_rip+0} I thought these state transition problems had all been cleaned up? > Jan 8 12:35:10 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110 > Jan 8 12:35:10 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = > -32) > Jan 8 12:35:10 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110 > Jan 8 12:35:10 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = > -32) Those errors are quite unusual. They seem to indicate there's a problem with your computer's USB hardware, as do the numerous spontaneous (?) disconnections in your log. However I'm not familiar with all the details of the EHCI driver. It's faintly possible that plugging the drive into a different USB port would work better. If you still can't get anything with the debugging option, try running without the debugging code and rmmod ehci-hcd before plugging in the device. Maybe if you use a UHCI controller instead the errors won't occur. Alan Stern ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 2005-01-08 3:51 ` Alan Stern @ 2005-01-09 4:26 ` Srihari Vijayaraghavan 2005-01-09 17:32 ` Alan Stern 0 siblings, 1 reply; 15+ messages in thread From: Srihari Vijayaraghavan @ 2005-01-09 4:26 UTC (permalink / raw) To: Alan Stern; +Cc: James Bottomley, USB development list, SCSI development list On Saturday 08 January 2005 14:51, Alan Stern wrote: > ... > I see you didn't have CONFIG_USB_STORAGE_DEBUG set when you ran this test. Actually I did. I do not know what log level the USB debug messages were, but the silly syslog did not capture them. Now I have set it to write all kernel messages. > Or else you didn't have your syslog daemon configured to store the > debugging messages. If you can get anything unusual to happen with the > debugging enabled, it would make things much easier to track down. That is right. I am trying to trigger the bug, and I have ensured that the USB debug messages are being written to the log file. But I have not triggered the bug yet. > ... > > Jan 8 12:35:10 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error > > -110 Jan 8 12:35:10 desktop kernel: hub 1-0:1.0: hub_port_status failed > > (err = -32) > > Jan 8 12:35:10 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error > > -110 Jan 8 12:35:10 desktop kernel: hub 1-0:1.0: hub_port_status failed > > (err = -32) > > Those errors are quite unusual. They seem to indicate there's a problem > with your computer's USB hardware, as do the numerous spontaneous (?) > disconnections in your log. However I'm not familiar with all the details > of the EHCI driver. It's faintly possible that plugging the drive into a > different USB port would work better. The numerous disconnections (because without debug, I can trigger the D state process and oops within a dozen connection/disconnection, but with debug it takes 10 times more connection/disconnection to trigger D state process, and I do not know how many are needed for oops :-( ) are purposefully generated to trigger the bug. Of course UHCI (USB 1) is not as fast as EHCI (USB 2). Would that not matter for CD/DVD burning? > If you still can't get anything with the debugging option, try running > without the debugging code and rmmod ehci-hcd before plugging in the > device. Maybe if you use a UHCI controller instead the errors won't > occur. OK I shall try with UHCI. Thank you. Hari ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 2005-01-09 4:26 ` Srihari Vijayaraghavan @ 2005-01-09 17:32 ` Alan Stern 2005-01-10 10:14 ` Srihari Vijayaraghavan 0 siblings, 1 reply; 15+ messages in thread From: Alan Stern @ 2005-01-09 17:32 UTC (permalink / raw) To: Srihari Vijayaraghavan Cc: James Bottomley, USB development list, SCSI development list On Sun, 9 Jan 2005, Srihari Vijayaraghavan wrote: > On Saturday 08 January 2005 14:51, Alan Stern wrote: > > ... > > I see you didn't have CONFIG_USB_STORAGE_DEBUG set when you ran this test. > > Actually I did. I do not know what log level the USB debug messages were, but > the silly syslog did not capture them. Now I have set it to write all kernel > messages. It's a very common thing to overlook. And by the way, the log level for the debug messages is DEBUG (what a surprise!). > > Or else you didn't have your syslog daemon configured to store the > > debugging messages. If you can get anything unusual to happen with the > > debugging enabled, it would make things much easier to track down. > > That is right. I am trying to trigger the bug, and I have ensured that the USB > debug messages are being written to the log file. But I have not triggered > the bug yet. If you're unable to trigger the bug, you can try turning the debug configuration option back off and using that patch I sent earlier. It wouldn't hurt to use that patch even with the debugging messages... > > Those errors are quite unusual. They seem to indicate there's a problem > > with your computer's USB hardware, as do the numerous spontaneous (?) > > disconnections in your log. However I'm not familiar with all the details > > of the EHCI driver. It's faintly possible that plugging the drive into a > > different USB port would work better. > > The numerous disconnections (because without debug, I can trigger the D state > process and oops within a dozen connection/disconnection, but with debug it > takes 10 times more connection/disconnection to trigger D state process, and > I do not know how many are needed for oops :-( ) are purposefully generated > to trigger the bug. Oh, so you did those by hand. All right. Probably the fact that they come fairly close together is related to the bug. I wonder if something could be causing the SCSI error handler to run when it shouldn't, using an out-of-date host structure? > Of course UHCI (USB 1) is not as fast as EHCI (USB 2). Would that not matter > for CD/DVD burning? It definitely will slow things down a lot. For normal operation you would not want to rely on it. Alan Stern ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 2005-01-09 17:32 ` Alan Stern @ 2005-01-10 10:14 ` Srihari Vijayaraghavan 2005-01-10 17:33 ` Alan Stern 0 siblings, 1 reply; 15+ messages in thread From: Srihari Vijayaraghavan @ 2005-01-10 10:14 UTC (permalink / raw) To: Alan Stern; +Cc: James Bottomley, USB development list, SCSI development list [-- Attachment #1: Type: text/plain, Size: 4297 bytes --] On Monday 10 January 2005 04:32, Alan Stern wrote: > If you're unable to trigger the bug, you can try turning the debug > configuration option back off and using that patch I sent earlier. It > wouldn't hurt to use that patch even with the debugging messages... I have disabled the USB storage debug option. Your patch unfortunately did not apply cleanly, so I did it manually, and here is the correct patch: --- test/drivers/usb/storage/scsiglue.c.orig 2005-01-10 18:31:36.453932344 +1100 +++ test/drivers/usb/storage/scsiglue.c 2005-01-10 18:37:57.329030528 +1100 @@ -268,9 +268,15 @@ /* This is always called with scsi_lock(srb->host) held */ static int bus_reset(struct scsi_cmnd *srb) { - struct us_data *us = (struct us_data *)srb->device->host->hostdata[0]; + struct us_data *us; int result, rc; + printk(KERN_INFO "In bus_reset, srb: %p\n", srb); + printk(" device: %p\n", srb->device); + printk(" host: %p\n", srb->device->host); + printk(" hostdata: %p\n", (void *) srb->device->host->hostdata[0]); + + us = (struct us_data *)srb->device->host->hostdata[0]; US_DEBUGP("%s called\n", __FUNCTION__); scsi_unlock(srb->device->host); I am unable to trigger the bug, despite my sincere efforts. But I have managed to push "hald" process in to D state. Jan 10 21:06:41 desktop kernel: hald D 000001003776d938 0 3367 1 3429 3357 (NOTLB) Jan 10 21:06:41 desktop kernel: 000001003776d898 0000000000000002 000000732b17d078 000001003f771650 Jan 10 21:06:41 desktop kernel: 00000000000007fd 000001003fed2070 0000010037d71640 0000010038e3d000 Jan 10 21:06:41 desktop kernel: 000001002b17d078 000001003776d938 Jan 10 21:06:41 desktop kernel: Call Trace:<ffffffff802c11eb>{wait_for_completion+139} <ffffffff8012dc20>{default_wake_function+0} Jan 10 21:06:41 desktop kernel: <ffffffff8012dc20>{default_wake_function+0} <ffffffffa0004f7b>{:scsi_mod:scsi_wait_req+91} Jan 10 21:06:41 desktop kernel: <ffffffffa0000038>{:scsi_mod:scsi_allocate_request+56} Jan 10 21:06:41 desktop kernel: <ffffffffa023a33c>{:sr_mod:sr_do_ioctl+156} <ffffffffa023a654>{:sr_mod:sr_audio_ioctl+372} Jan 10 21:06:41 desktop kernel: <ffffffffa0004fac>{:scsi_mod:scsi_wait_req+140} <ffffffff8023f71e>{cdrom_count_tracks+222} Jan 10 21:06:41 desktop kernel: <ffffffff8023f9a0>{cdrom_open+448} <ffffffffa00471f4>{:ext3:ext3_get_block_handle+228} Jan 10 21:06:41 desktop kernel: <ffffffff8014ef4e>{find_get_page+14} <ffffffff8016ce4c>{__find_get_block_slow+220} Jan 10 21:06:41 desktop kernel: <ffffffff8016d519>{__find_get_block+377} <ffffffff8016f87f>{__getblk+31} Jan 10 21:06:41 desktop kernel: <ffffffff801af4ba>{avc_has_perm+90} <ffffffff801af4ba>{avc_has_perm+90} Jan 10 21:06:41 desktop kernel: <ffffffff801af4ba>{avc_has_perm+90} <ffffffff801b02b4>{task_has_capability+100} Jan 10 21:06:41 desktop kernel: <ffffffff801c2992>{kobject_get+18} <ffffffff801c2992>{kobject_get+18} Jan 10 21:06:41 desktop kernel: <ffffffffa0239730>{:sr_mod:sr_block_open+176} <ffffffff8017286a>{do_open+170} Jan 10 21:06:41 desktop kernel: <ffffffff80172c5f>{blkdev_open+47} <ffffffff8016aa36>{dentry_open+230} Jan 10 21:06:41 desktop kernel: <ffffffff8016ab7e>{filp_open+62} <ffffffff8016abc7>{get_unused_fd+55} Jan 10 21:06:41 desktop kernel: <ffffffff8016ad4c>{sys_open+76} <ffffffff8010e1da>{system_call+126} Jan 10 21:06:41 desktop kernel: Please refer to the attachment for the complete kernel log. I think I understand the problem (conceptually I guess): 1. Connect the USB DVD-RW drive 2. USB Storage says: "usb-storage: device found at nnn" and "usb-storage: waiting for device to settle before scanning" 3. And then it waits for 5+ seconds 4. Then it initialises the peripheral and then says: "Vendor: PIONEER Model: DVD-RW DVR-107D Rev: 1.13 ..." and "usb-storage: device scan complete" I think if I unplug the device precisely the same moment when it initialises (step 4), and by time that routine completes kernel is really unhappy (D state process, Badness in scsi_device_set_state at drivers/scsi/scsi_lib.c:1717, and/or oops) to discover that the drive is not there anymore :-). I do not know what else to do. Thank you. Hari [-- Attachment #2: kernel.log.bz2 --] [-- Type: application/x-bzip2, Size: 6591 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 2005-01-10 10:14 ` Srihari Vijayaraghavan @ 2005-01-10 17:33 ` Alan Stern 2005-01-10 17:39 ` Greg KH 2005-01-10 20:14 ` Mike Anderson 0 siblings, 2 replies; 15+ messages in thread From: Alan Stern @ 2005-01-10 17:33 UTC (permalink / raw) To: James Bottomley, Srihari Vijayaraghavan, Greg KH Cc: USB development list, SCSI development list If James Bottomley, Mike Anderson, or anyone else from linux-scsi can comment on the material below, I would appreciate it. On Mon, 10 Jan 2005, Srihari Vijayaraghavan wrote: > I am unable to trigger the bug, despite my sincere efforts. If nothing else, I have to admire your persistence in plugging and unplugging the drive 120 times over the course of a half-hour! > Please refer to the attachment for the complete kernel log. > > I think I understand the problem (conceptually I guess): > 1. Connect the USB DVD-RW drive > 2. USB Storage says: "usb-storage: device found at nnn" and "usb-storage: > waiting for device to settle before scanning" > 3. And then it waits for 5+ seconds > 4. Then it initialises the peripheral and then says: "Vendor: PIONEER Model: > DVD-RW DVR-107D Rev: 1.13 ..." and "usb-storage: device scan complete" That's right. But there's more going on behind the scenes that you aren't aware of. Hotplug programs, haldaemon, automounters, and who knows what else. > I think if I unplug the device precisely the same moment when it initialises > (step 4), and by time that routine completes kernel is really unhappy (D > state process, Badness in scsi_device_set_state at > drivers/scsi/scsi_lib.c:1717, and/or oops) to discover that the drive is not > there anymore :-). It's not that simple. Your log contained two different sorts of errors. The first occurred right near the start: Jan 10 20:24:46 desktop kernel: usb 1-3: USB disconnect, address 3 Jan 10 20:24:46 desktop kernel: 50-fstab-sync.h[5147]: segfault at 0000000000000000 rip 000000379706f6b2 rsp 0000007fbffff388 error 4 Jan 10 20:24:46 desktop kernel: 50-fstab-sync.h[5161]: segfault at 0000000000000000 rip 000000379706f6b2 rsp 0000007fbffff6d8 error 4 This looks like a problem in a user program, not in the kernel. I have no idea what 50-fstab-sync.h is. Part of udev perhaps? Greg, do you know? The second problem occurred during a period in which you replugged the drive three times! Here's the relevant part of the log: Jan 10 20:48:37 desktop kernel: usb 1-3: new high speed USB device using ehci_hcd and address 111 Jan 10 20:48:37 desktop kernel: scsi111 : SCSI emulation for USB Mass Storage devices Jan 10 20:48:37 desktop kernel: usb-storage: device found at 111 Jan 10 20:48:37 desktop kernel: usb-storage: waiting for device to settle before scanning Jan 10 20:48:42 desktop kernel: Vendor: PIONEER Model: DVD-RW DVR-107D Rev: 1.13 Jan 10 20:48:42 desktop kernel: Type: CD-ROM ANSI SCSI revision: 00 Jan 10 20:48:42 desktop kernel: sr1: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray Jan 10 20:48:42 desktop kernel: Attached scsi CD-ROM sr1 at scsi111, channel 0, id 0, lun 0 Jan 10 20:48:42 desktop kernel: usb-storage: device scan complete Jan 10 20:48:42 desktop kernel: usb 1-3: USB disconnect, address 111 That looks okay. But you don't see the hotplug etc. programs that are still running, keeping the device below scsi111 open. Jan 10 20:48:56 desktop kernel: usb 1-3: new high speed USB device using ehci_hcd and address 112 Jan 10 20:48:56 desktop kernel: scsi112 : SCSI emulation for USB Mass Storage devices Jan 10 20:48:56 desktop kernel: usb-storage: device found at 112 Jan 10 20:48:56 desktop kernel: usb-storage: waiting for device to settle before scanning Jan 10 20:49:01 desktop kernel: usb-storage: device scan complete Jan 10 20:49:01 desktop kernel: usb 1-3: USB disconnect, address 112 That's normal also. Host scsi112 was created and apparently destroyed. Jan 10 20:49:08 desktop kernel: usb 1-3: new high speed USB device using ehci_hcd and address 113 Jan 10 20:49:08 desktop kernel: scsi113 : SCSI emulation for USB Mass Storage devices Jan 10 20:49:08 desktop kernel: usb-storage: device found at 113 Jan 10 20:49:08 desktop kernel: usb-storage: waiting for device to settle before scanning Jan 10 20:49:12 desktop kernel: In bus_reset, srb: 0000010037d71640 Jan 10 20:49:12 desktop kernel: device: 0000010038e3d000 Jan 10 20:49:12 desktop kernel: host: 000001002b500c00 Jan 10 20:49:12 desktop kernel: hostdata: 000001002b549c00 This is the first indication of something funny. I can't think how a bus reset would get initiated by the kernel -- that only happens after a timeout. Maybe it came from a user program? Jan 10 20:49:12 desktop kernel: ehci_hcd 0000:00:10.4: port 3 reset error -110 Jan 10 20:49:12 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = -32) Jan 10 20:49:13 desktop kernel: usb 1-3: reset high speed USB device using ehci_hcd and address 113 Jan 10 20:49:13 desktop kernel: scsi: Device offlined - not ready after error recovery: host 111 channel 0 id 0 lun 0 Jan 10 20:49:13 desktop kernel: 111:0:0:0: Illegal state transition deleted->offline Jan 10 20:49:13 desktop kernel: Badness in scsi_device_set_state at drivers/scsi/scsi_lib.c:1717 Jan 10 20:49:13 desktop kernel: Jan 10 20:49:13 desktop kernel: Call Trace:<ffffffffa0006808>{:scsi_mod:scsi_device_set_state+264} Jan 10 20:49:13 desktop kernel: <ffffffffa000481c>{:scsi_mod:scsi_error_handler+2732} Jan 10 20:49:13 desktop kernel: <ffffffff8010ebf3>{child_rip+8} <ffffffffa0003d70>{:scsi_mod:scsi_error_handler+0} Jan 10 20:49:13 desktop kernel: <ffffffff8010ebeb>{child_rip+0} And here's the weird part. Apparently the bus reset was not for the current device attached to scsi113, but for the old incarnation attached to scsi111 -- which is long since gone! That accounts for the illegal state transition. But why did it happen like this in the first place? Could it be caused by a user program doing an SG_SCSI_RESET ioctl while still holding the device open after scsi_remove_host has finished? Jan 10 20:49:13 desktop kernel: usb 1-3: USB disconnect, address 113 Jan 10 20:49:18 desktop kernel: In bus_reset, srb: 0000010037d71240 Jan 10 20:49:18 desktop kernel: device: 0000010038f09000 Jan 10 20:49:18 desktop kernel: host: 000001002b500400 Jan 10 20:49:18 desktop kernel: hostdata: 000001002b549c00 Jan 10 20:49:18 desktop kernel: scsi: Device offlined - not ready after error recovery: host 113 channel 0 id 0 lun 0 Jan 10 20:49:18 desktop kernel: usb-storage: device scan complete Now it's not surprising that the reset, which was intended for scsi111, has interfered with the proper operation of scsi113. Alan Stern ------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 2005-01-10 17:33 ` Alan Stern @ 2005-01-10 17:39 ` Greg KH 2005-01-10 20:14 ` Mike Anderson 1 sibling, 0 replies; 15+ messages in thread From: Greg KH @ 2005-01-10 17:39 UTC (permalink / raw) To: Alan Stern Cc: James Bottomley, Srihari Vijayaraghavan, USB development list, SCSI development list On Mon, Jan 10, 2005 at 12:33:49PM -0500, Alan Stern wrote: > It's not that simple. Your log contained two different sorts of errors. > The first occurred right near the start: > > Jan 10 20:24:46 desktop kernel: usb 1-3: USB disconnect, address 3 > Jan 10 20:24:46 desktop kernel: 50-fstab-sync.h[5147]: segfault at 0000000000000000 rip 000000379706f6b2 rsp 0000007fbffff388 error 4 > Jan 10 20:24:46 desktop kernel: 50-fstab-sync.h[5161]: segfault at 0000000000000000 rip 000000379706f6b2 rsp 0000007fbffff6d8 error 4 > > This looks like a problem in a user program, not in the kernel. I have no > idea what 50-fstab-sync.h is. Part of udev perhaps? Greg, do you know? I think it might be part of HAL, it's not part of udev. Ah yes, it's at /etc/hal/device.d/50-fstab-sync.hal on my box. thanks, greg k-h ------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 2005-01-10 17:33 ` Alan Stern 2005-01-10 17:39 ` Greg KH @ 2005-01-10 20:14 ` Mike Anderson 2005-01-10 22:29 ` [linux-usb-devel] " Alan Stern 1 sibling, 1 reply; 15+ messages in thread From: Mike Anderson @ 2005-01-10 20:14 UTC (permalink / raw) To: Alan Stern Cc: James Bottomley, Srihari Vijayaraghavan, Greg KH, USB development list, SCSI development list Alan Stern [stern@rowland.harvard.edu] wrote: > Jan 10 20:49:08 desktop kernel: scsi113 : SCSI emulation for USB Mass Storage devices > Jan 10 20:49:08 desktop kernel: usb-storage: device found at 113 > Jan 10 20:49:08 desktop kernel: usb-storage: waiting for device to settle before scanning > Jan 10 20:49:12 desktop kernel: In bus_reset, srb: 0000010037d71640 > Jan 10 20:49:12 desktop kernel: device: 0000010038e3d000 > Jan 10 20:49:12 desktop kernel: host: 000001002b500c00 > Jan 10 20:49:12 desktop kernel: hostdata: 000001002b549c00 > > This is the first indication of something funny. I can't think how a bus > reset would get initiated by the kernel -- that only happens after a > timeout. Maybe it came from a user program? Well it can get initiated in the scsi error handler. This can be verified by setting the logging_level (sysctl -w dev.scsi.logging_level=0x4). You probably do not need to do this as I assume what is happening is that with all the plug and unplug that is being initiated that a IO is in flight when the scsi_remove_host functions is called. It looks like we still have to decide on the removal cleanup. Currently scsi_host_cancel being called from scsi_remove_host appears to not do anything as scsi_forget_host removes the devices from the list it iterates over. Also as previously mentioned by you there are races with scsi_mid canceling IOs that might be completing. Both of these issue where mentioned in an old thread (link provided below), but not completely resolved. http://marc.theaimsgroup.com/?t=109630920600005&r=1&w=2 What is needed is for the LLDD to ensure all IO sent through the LLDD's queuecommand functions gets a scsi_done call (even a fail call). The scsi mid applies choke points to stop IO being set to the LLDDs queuecommand, but there can be races during a host removal (you previously mentioned this). This would also imply that scsi mid should not be making the call to scsi_host_cancel (which appears to not do anything anyway). -andmike -- Michael Anderson andmike@us.ibm.com ------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 2005-01-10 20:14 ` Mike Anderson @ 2005-01-10 22:29 ` Alan Stern 0 siblings, 0 replies; 15+ messages in thread From: Alan Stern @ 2005-01-10 22:29 UTC (permalink / raw) To: Mike Anderson Cc: James Bottomley, Srihari Vijayaraghavan, Greg KH, USB development list, SCSI development list On Mon, 10 Jan 2005, Mike Anderson wrote: > > This is the first indication of something funny. I can't think how a bus > > reset would get initiated by the kernel -- that only happens after a > > timeout. Maybe it came from a user program? > > Well it can get initiated in the scsi error handler. This can be verified > by setting the logging_level (sysctl -w dev.scsi.logging_level=0x4). It could be, but it's very unlikely. Especially since the error handler only does a bus reset in response to repeated timeouts (which don't happen here; commands either succeed or quickly get an error) and since the host has already been removed. > You probably do not need to do this as I assume what is happening is that > with all the plug and unplug that is being initiated that a IO is in > flight when the scsi_remove_host functions is called. It looks like we > still have to decide on the removal cleanup. Currently scsi_host_cancel > being called from scsi_remove_host appears to not do anything as > scsi_forget_host removes the devices from the list it iterates over. That could easily be the problem. > Also > as previously mentioned by you there are races with scsi_mid canceling IOs > that might be completing. Both of these issue where mentioned in an old > thread (link provided below), but not completely resolved. > > http://marc.theaimsgroup.com/?t=109630920600005&r=1&w=2 > > What is needed is for the LLDD to ensure all IO sent through the LLDD's > queuecommand functions gets a scsi_done call (even a fail call). The scsi > mid applies choke points to stop IO being set to the LLDDs queuecommand, > but there can be races during a host removal (you previously mentioned > this). This would also imply that scsi mid should not be making the call > to scsi_host_cancel (which appears to not do anything anyway). This _has_ been resolved in usb-storage. Every command sent to the queuecommand routine either gets executed and sent to scsi_done or else is rejected immediately with a failure call to scsi_done. The exception is commands that were aborted through the command_abort routine; scsi_done is not called for them. My concern is about reset requests coming in through an ioctl call; they don't pass through queuecommand but are sent directly to the bus_reset routine. Hence they aren't subject to the choke point treatment you describe. Alan Stern ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 2005-01-08 0:43 ` Srihari Vijayaraghavan 2005-01-08 2:11 ` Srihari Vijayaraghavan @ 2005-01-08 4:03 ` Alan Stern 1 sibling, 0 replies; 15+ messages in thread From: Alan Stern @ 2005-01-08 4:03 UTC (permalink / raw) To: Srihari Vijayaraghavan Cc: James Bottomley, USB development list, SCSI development list On Sat, 8 Jan 2005, Srihari Vijayaraghavan wrote: > Alan, > > I have made a couple of observations: > 1. While it is still easy to trigger this bug without USB Storage Debug Option > in vanilla 2.6.10, I cannot reproduce the bug with debug option. Perhaps it > changes some timings. Certainly it changes timings. > (Without debug option, within a dozen plug/unplug events I can reliably > trigger the bug, but OTOH with debug options I cannot even after 50 events. > Maybe I should run my desktop with debug option for better stability. :-)) > > 2. When running without debug option, the D state "hald" process appears > first, and upon unplugging the drive an OOPS appears on the very next > plugging of the drive. It sounds like the problem occurs in hald, and the later oops is just a symptom whose cause came much earlier. > Here is the new OOPS: > > usb-storage: device found at 7 > usb-storage: waiting for device to settle before scanning > Unable to handle kernel paging request at 0000001600000019 RIP: > <ffffffffa0108259>{:usb_storage:bus_reset+73} I don't suppose you're familiar enough with 64-bit assembler to be able to decode the low-level instruction listing and tell me which line of the bus_reset routine corresponds to offset 73? Failing that, maybe you can try applying this patch. It will print some extra debugging information in the system log, even with CONFIG_USB_STORAGE_DEBUG not set. Alan Stern --- 2.6.10-orig/drivers/usb/storage/scsiglue.c Mon Jan 3 09:39:36 2005 +++ 2.6.10/drivers/usb/storage/scsiglue.c Fri Jan 7 22:59:51 2005 @@ -268,9 +268,15 @@ /* This is always called with scsi_lock(srb->host) held */ static int bus_reset(struct scsi_cmnd *srb) { - struct us_data *us = (struct us_data *)srb->device->host->hostdata[0]; + struct us_data *us; int result, rc; + printk(KERN_INFO "In bus_reset, srb: %p\n", srb); + printk(" device: %p\n", srb->device); + printk(" host: %p\n", srb->device->host); + printk(" hostdata: %p\n", (void *) srb->device->host->hostdata[0]); + + us = (struct us_data *)srb->device->host->hostdata[0]; US_DEBUGP("%s called\n", __FUNCTION__); scsi_unlock(srb->device->host); ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2005-01-10 22:29 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-01-05 8:14 Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 Andrew Morton 2005-01-05 14:17 ` James Bottomley 2005-01-05 15:50 ` Alan Stern 2005-01-06 10:38 ` [linux-usb-devel] " Srihari Vijayaraghavan 2005-01-08 0:43 ` Srihari Vijayaraghavan 2005-01-08 2:11 ` Srihari Vijayaraghavan 2005-01-08 3:51 ` Alan Stern 2005-01-09 4:26 ` Srihari Vijayaraghavan 2005-01-09 17:32 ` Alan Stern 2005-01-10 10:14 ` Srihari Vijayaraghavan 2005-01-10 17:33 ` Alan Stern 2005-01-10 17:39 ` Greg KH 2005-01-10 20:14 ` Mike Anderson 2005-01-10 22:29 ` [linux-usb-devel] " Alan Stern 2005-01-08 4:03 ` Alan Stern
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox