public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
@ 2005-01-05  8:14 Andrew Morton
  2005-01-05 14:17 ` James Bottomley
  2005-01-05 15:50 ` Alan Stern
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2005-01-05  8:14 UTC (permalink / raw)
  To: linux-usb-devel, linux-scsi


Looks like the scsi-vs-usb-storage bunfight isn't yet settled...


Begin forwarded message:

Date: Fri, 31 Dec 2004 14:44:44 +1100
From: Srihari Vijayaraghavan <harisri@internode.on.net>
To: linux-kernel@vger.kernel.org
Subject: [BUG] USB Storage OOPS and a D state process in 2.6.10


Can easily reproduce this by plugging and unplugging an external USB DVD-RW a 
few times on my Athlon 64 desktop:

usb-storage: device scan complete
usb 1-2: USB disconnect, address 22
Unable to handle kernel paging request at 000000000044c8c6 RIP: 
<ffffffffa0108252>{:usb_storage:bus_reset+66}
PML4 331b9067 PGD 331ba067 PMD 331bb067 PTE 0
Oops: 0000 [1] 
CPU 0 
Modules linked in: radeon ipt_conntrack ip_conntrack nfsd exportfs lockd 
autofs4 sunrpc iptable_filter ip_tables af_packet sr_mod dm_mod video button 
ohci1394 ieee1394 usb_storage uhci_hcd ehci_hcd snd_via82xx snd_ac97_codec 
snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart 
snd_rawmidi snd_seq_device snd soundcore via_rhine mii r8169 crc32 floppy 
unix ext3 mbcache jbd sata_via libata sd_mod scsi_mod
Pid: 8843, comm: scsi_eh_22 Not tainted 2.6.10
RIP: 0010:[<ffffffffa0108252>] <ffffffffa0108252>{:usb_storage:bus_reset+66}
RSP: 0018:000001002f9a7e68  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 000001002f8ec200 RCX: 0000010037d6f670
RDX: 0000000000002003 RSI: 0000000000000000 RDI: 000000000044c68e
RBP: 00000000fffffff0 R08: 0000000000000000 R09: 0000000000000001
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000010037d6f670
R13: 0000010036acf400 R14: 000001002f6c4c20 R15: 0000000000000001
FS:  0000002a955792c0(0000) GS:ffffffff803c9f00(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000044c8c6 CR3: 0000000000101000 CR4: 00000000000006e0
Process scsi_eh_22 (pid: 8843, threadinfo 000001002f9a6000, task 
00000100369f70f0)
Stack: 0000010037d6f640 0000010037d6f640 0000010037d6f670 ffffffffa0003c31 
       0000000000000282 0000000000000000 0000010037d6f640 ffffffffa000463b 
       ff000000ff000000 ff000000ff000000 
Call Trace:<ffffffffa0003c31>{:scsi_mod:scsi_try_bus_reset+113} 
       <ffffffffa000463b>{:scsi_mod:scsi_error_handler+2251} 
       <ffffffff8010ebf3>{child_rip+8} 
<ffffffffa0003d70>{:scsi_mod:scsi_error_handler+0} 
       <ffffffff8010ebeb>{child_rip+0} 

Code: 48 8b 87 38 02 00 00 80 78 04 01 75 31 48 8b 73 20 e8 e8 c8 
RIP <ffffffffa0108252>{:usb_storage:bus_reset+66} RSP <000001002f9a7e68>
CR2: 000000000044c8c6
 <6>agpgart: Found an AGP 3.0 compliant device at 0000:00:00.0.

And a D state hald process:

Dec 31 13:52:09 desktop kernel: hald          D 00000100376c3938     0  3677      
1          4600  2304 (NOTLB)
Dec 31 13:52:09 desktop kernel: 00000100376c3898 0000000000000006 
000000732f5c80b8 000001003f7555d0 
Dec 31 13:52:09 desktop kernel:        0000000000000736 000001003fed2070 
0000010037d6f640 00000100368f8000 
Dec 31 13:52:09 desktop kernel:        000001002f5c80b8 00000100376c3938 
Dec 31 13:52:09 desktop kernel: Call 
Trace:<ffffffff802c11eb>{wait_for_completion+139} 
<ffffffff8012dc20>{default_wake_function+0} 
Dec 31 13:52:09 desktop kernel:        
<ffffffff8012dc20>{default_wake_function+0} 
<ffffffffa0004f7b>{:scsi_mod:scsi_wait_req+91} 
Dec 31 13:52:09 desktop kernel:        
<ffffffffa0000038>{:scsi_mod:scsi_allocate_request+56} 
Dec 31 13:52:09 desktop kernel:        
<ffffffffa018a33c>{:sr_mod:sr_do_ioctl+156} 
<ffffffffa018a654>{:sr_mod:sr_audio_ioctl+372} 
Dec 31 13:52:09 desktop kernel:        
<ffffffffa0004fac>{:scsi_mod:scsi_wait_req+140} 
<ffffffff8023f71e>{cdrom_count_tracks+222} 
Dec 31 13:52:09 desktop kernel:        <ffffffff8023f9a0>{cdrom_open+448} 
<ffffffffa00471f4>{:ext3:ext3_get_block_handle+228} 
Dec 31 13:52:09 desktop kernel:        <ffffffff8014ef4e>{find_get_page+14} 
<ffffffff8016ce4c>{__find_get_block_slow+220} 
Dec 31 13:52:09 desktop kernel:        
<ffffffff8016d519>{__find_get_block+377} <ffffffff8016f87f>{__getblk+31} 
Dec 31 13:52:09 desktop kernel:        <ffffffff801af4ba>{avc_has_perm+90} 
<ffffffff801af4ba>{avc_has_perm+90} 
Dec 31 13:52:09 desktop kernel:        <ffffffff801af4ba>{avc_has_perm+90} 
<ffffffff801b02b4>{task_has_capability+100} 
Dec 31 13:52:09 desktop kernel:        <ffffffff801c2992>{kobject_get+18} 
<ffffffff801c2992>{kobject_get+18} 
Dec 31 13:52:09 desktop kernel:        
<ffffffffa0189730>{:sr_mod:sr_block_open+176} <ffffffff8017286a>{do_open+170} 
Dec 31 13:52:09 desktop kernel:        <ffffffff80172c5f>{blkdev_open+47} 
<ffffffff8016aa36>{dentry_open+230} 
Dec 31 13:52:09 desktop kernel:        <ffffffff8016ab7e>{filp_open+62} 
<ffffffff8016abc7>{get_unused_fd+55} 
Dec 31 13:52:09 desktop kernel:        <ffffffff8016ad4c>{sys_open+76} 
<ffffffff8010e1da>{system_call+126} 
Dec 31 13:52:09 desktop kernel:        

Thanks
Hari.

PS: I am not subscribed to LKML, so please cc me on replies.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
  2005-01-05  8:14 Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 Andrew Morton
@ 2005-01-05 14:17 ` James Bottomley
  2005-01-05 15:50 ` Alan Stern
  1 sibling, 0 replies; 15+ messages in thread
From: James Bottomley @ 2005-01-05 14:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-usb-devel, SCSI Mailing List

On Wed, 2005-01-05 at 00:14 -0800, Andrew Morton wrote:
> Looks like the scsi-vs-usb-storage bunfight isn't yet settled...

Actually, this one looks pretty clear cut:

According to the trace, we get into error handling for a removed device.
This:

> RIP <ffffffffa0108252>{:usb_storage:bus_reset+66} RSP <000001002f9a7e68>

says that the usb_storage bus_reset routine never returns.

So the system is hung in error handling and the kernel is waiting for
everything to return to normal.  The fix is to work out why usb_storage
hung there instead of returning success or fail.

James



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
  2005-01-05  8:14 Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 Andrew Morton
  2005-01-05 14:17 ` James Bottomley
@ 2005-01-05 15:50 ` Alan Stern
  2005-01-06 10:38   ` [linux-usb-devel] " Srihari Vijayaraghavan
  2005-01-08  0:43   ` Srihari Vijayaraghavan
  1 sibling, 2 replies; 15+ messages in thread
From: Alan Stern @ 2005-01-05 15:50 UTC (permalink / raw)
  To: Srihari Vijayaraghavan, James Bottomley
  Cc: USB development list, SCSI development list

On Wed, 5 Jan 2005, Andrew Morton wrote:

> Looks like the scsi-vs-usb-storage bunfight isn't yet settled...
> 
> 
> Begin forwarded message:
> 
> Date: Fri, 31 Dec 2004 14:44:44 +1100
> From: Srihari Vijayaraghavan <harisri@internode.on.net>
> To: linux-kernel@vger.kernel.org
> Subject: [BUG] USB Storage OOPS and a D state process in 2.6.10
> 
> 
> Can easily reproduce this by plugging and unplugging an external USB DVD-RW a 
> few times on my Athlon 64 desktop:
> 
> usb-storage: device scan complete
> usb 1-2: USB disconnect, address 22
> Unable to handle kernel paging request at 000000000044c8c6 RIP: 
> <ffffffffa0108252>{:usb_storage:bus_reset+66}

> Process scsi_eh_22 (pid: 8843, threadinfo 000001002f9a6000, task 
> 00000100369f70f0)
> Stack: 0000010037d6f640 0000010037d6f640 0000010037d6f670 ffffffffa0003c31 
>        0000000000000282 0000000000000000 0000010037d6f640 ffffffffa000463b 
>        ff000000ff000000 ff000000ff000000 
> Call Trace:<ffffffffa0003c31>{:scsi_mod:scsi_try_bus_reset+113} 
>        <ffffffffa000463b>{:scsi_mod:scsi_error_handler+2251} 
>        <ffffffff8010ebf3>{child_rip+8} 
> <ffffffffa0003d70>{:scsi_mod:scsi_error_handler+0} 
>        <ffffffff8010ebeb>{child_rip+0} 

> And a D state hald process:


On Wed, 5 Jan 2005, James Bottomley wrote:

> Actually, this one looks pretty clear cut:
>
> According to the trace, we get into error handling for a removed device.
> This:
>
> > RIP <ffffffffa0108252>{:usb_storage:bus_reset+66} RSP 
<000001002f9a7e68>
>
> says that the usb_storage bus_reset routine never returns.
>
> So the system is hung in error handling and the kernel is waiting for
> everything to return to normal.  The fix is to work out why usb_storage
> hung there instead of returning success or fail.


I agree with James, although it's clear from the oops report why the 
routine doesn't return: It tries to reference an invalid address.

Not being at all familiar with 64-bit code, I can't tell exactly where the
invalid reference occurred.  It looks like it's near the start of the
bus_reset routine, but that doesn't make sense because none of the data
structures referred to there should have been deallocated yet.  That 
doesn't happen until after scsi_remove_host returns, and once that happens 
the error handler shouldn't run at all.

Another question is why we get into the error handler in the first place.  
No way to tell from the information provided, but I suspect the haldaemon 
process isn't doing what it should.

Srihari, can you try reproducing this after rebuilding the usb-storage 
driver with CONFIG_USB_STORAGE_DEBUG=y?

Alan Stern



-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
  2005-01-05 15:50 ` Alan Stern
@ 2005-01-06 10:38   ` Srihari Vijayaraghavan
  2005-01-08  0:43   ` Srihari Vijayaraghavan
  1 sibling, 0 replies; 15+ messages in thread
From: Srihari Vijayaraghavan @ 2005-01-06 10:38 UTC (permalink / raw)
  To: Alan Stern; +Cc: James Bottomley, USB development list, SCSI development list

On Thursday 06 January 2005 02:50, Alan Stern wrote:
> ...
> Srihari, can you try reproducing this after rebuilding the usb-storage
> driver with CONFIG_USB_STORAGE_DEBUG=y?

Alan,

Sure, I shall try to trigger this bug with the debug option.

(But I am afraid it would take a couple of days, as that USB peripheral 
belongs to a friend, who has a need for it ATM, and I shall borrow it off 
him, when it is available.)

Thanks.
Hari

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
  2005-01-05 15:50 ` Alan Stern
  2005-01-06 10:38   ` [linux-usb-devel] " Srihari Vijayaraghavan
@ 2005-01-08  0:43   ` Srihari Vijayaraghavan
  2005-01-08  2:11     ` Srihari Vijayaraghavan
  2005-01-08  4:03     ` Alan Stern
  1 sibling, 2 replies; 15+ messages in thread
From: Srihari Vijayaraghavan @ 2005-01-08  0:43 UTC (permalink / raw)
  To: Alan Stern; +Cc: James Bottomley, USB development list, SCSI development list

On Thursday 06 January 2005 02:50, Alan Stern wrote:
> ...
> Srihari, can you try reproducing this after rebuilding the usb-storage
> driver with CONFIG_USB_STORAGE_DEBUG=y?

Alan,

I have made a couple of observations:
1. While it is still easy to trigger this bug without USB Storage Debug Option 
in vanilla 2.6.10, I cannot reproduce the bug with debug option. Perhaps it 
changes some timings.
(Without debug option, within a dozen plug/unplug events I can reliably 
trigger the bug, but OTOH with debug options I cannot even after 50 events. 
Maybe I should run my desktop with debug option for better stability. :-))

2. When running without debug option, the D state "hald" process appears 
first, and upon unplugging the drive an OOPS appears on the very next 
plugging of the drive.

Here is the new OOPS:

usb-storage: device found at 7
usb-storage: waiting for device to settle before scanning
Unable to handle kernel paging request at 0000001600000019 RIP:
<ffffffffa0108259>{:usb_storage:bus_reset+73}
PML4 32fca067 PGD 0
Oops: 0000 [1]
CPU 0
Modules linked in: radeon ipt_conntrack ip_conntrack nfsd exportfs lockd 
autofs4 sunrpc iptable_filter ip_tables af_packet sr_mod dm_mod video button 
ohci1394 ieee1394 usb_storage uhci_hcd ehci_hcd snd_via82xx snd_ac97_codec 
snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart 
snd_rawmidi snd_seq_device snd soundcore via_rhine mii r8169 crc32 floppy 
unix ext3 mbcache jbd sata_via libata sd_mod scsi_mod
Pid: 5435, comm: scsi_eh_6 Not tainted 2.6.10
RIP: 0010:[<ffffffffa0108259>] <ffffffffa0108259>{:usb_storage:bus_reset+73}
RSP: 0018:000001002fedde68  EFLAGS: 00010246
RAX: 0000001600000015 RBX: 0000010034fb7c00 RCX: 0000010037d73270
RDX: 0000000000002003 RSI: 0000000000000000 RDI: 0000003796e00000
RBP: 00000000fffffff0 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000010037d73270
R13: 000001003e4cc800 R14: 000001003572ac20 R15: 0000000000000001
FS:  0000002a955792c0(0000) GS:ffffffff803c9f00(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000001600000019 CR3: 0000000000101000 CR4: 00000000000006e0
Process scsi_eh_6 (pid: 5435, threadinfo 000001002fedc000, task 
0000010036f1d650)
Stack: 0000010037d73240 0000010037d73240 0000010037d73270 ffffffffa0003c31
       0000000000000282 0000000000000000 0000010037d73240 ffffffffa000463b
       ff000000ff000000 ff000000ff000000
Call Trace:<ffffffffa0003c31>{:scsi_mod:scsi_try_bus_reset+113}
       <ffffffffa000463b>{:scsi_mod:scsi_error_handler+2251}
       <ffffffff8010ebf3>{child_rip+8} 
<ffffffffa0003d70>{:scsi_mod:scsi_error_handler+0}
       <ffffffff8010ebeb>{child_rip+0}

Code: 80 78 04 01 75 31 48 8b 73 20 e8 e8 c8 13 e0 85 c0 41 89 c4
RIP <ffffffffa0108259>{:usb_storage:bus_reset+73} RSP <000001002fedde68>
CR2: 0000001600000019
 <6>usb 1-5: USB disconnect, address 7


And the D state "hald" process:

Jan  8 11:29:24 desktop kernel: hald          D 0000010037507938     0  3539      
1          4354  3176 (NOTLB)
Jan  8 11:29:24 desktop kernel: 0000010037507898 0000000000000006 
000000732f812078 000001003811c0b0
Jan  8 11:29:24 desktop kernel:        0000000000003d15 000001003fd73490 
0000010037d73240 000001003eb86000
Jan  8 11:29:24 desktop kernel:        000001002f812078 0000010037507938
Jan  8 11:29:24 desktop kernel: Call 
Trace:<ffffffff802c11eb>{wait_for_completion+139} 
<ffffffff8012dc20>{default_wake_function+0}
Jan  8 11:29:24 desktop kernel:        
<ffffffff8012dc20>{default_wake_function+0} 
<ffffffffa0004f7b>{:scsi_mod:scsi_wait_req+91}
Jan  8 11:29:24 desktop kernel:        
<ffffffffa0000038>{:scsi_mod:scsi_allocate_request+56}
Jan  8 11:29:24 desktop kernel:        
<ffffffffa018a33c>{:sr_mod:sr_do_ioctl+156} 
<ffffffffa018a654>{:sr_mod:sr_audio_ioctl+372}
Jan  8 11:29:24 desktop kernel:        
<ffffffffa0004fac>{:scsi_mod:scsi_wait_req+140} 
<ffffffff8023f71e>{cdrom_count_tracks+222}
Jan  8 11:29:24 desktop kernel:        <ffffffff8023f9a0>{cdrom_open+448} 
<ffffffffa00471f4>{:ext3:ext3_get_block_handle+228}
Jan  8 11:29:24 desktop kernel:        <ffffffff8014ef4e>{find_get_page+14} 
<ffffffff8016ce4c>{__find_get_block_slow+220}
Jan  8 11:29:24 desktop kernel:        
<ffffffff8016d519>{__find_get_block+377} <ffffffff8016f87f>{__getblk+31}
Jan  8 11:29:24 desktop kernel:        <ffffffff801af4ba>{avc_has_perm+90} 
<ffffffff801af4ba>{avc_has_perm+90}
Jan  8 11:29:24 desktop kernel:        <ffffffff801af4ba>{avc_has_perm+90} 
<ffffffff801b02b4>{task_has_capability+100}
Jan  8 11:29:24 desktop kernel:        <ffffffff801c2992>{kobject_get+18} 
<ffffffff801c2992>{kobject_get+18}
Jan  8 11:29:24 desktop kernel:        
<ffffffffa0189730>{:sr_mod:sr_block_open+176} <ffffffff8017286a>{do_open+170}
Jan  8 11:29:24 desktop kernel:        <ffffffff80172c5f>{blkdev_open+47} 
<ffffffff8016aa36>{dentry_open+230}
Jan  8 11:29:24 desktop kernel:        <ffffffff8016ab7e>{filp_open+62} 
<ffffffff8016abc7>{get_unused_fd+55}
Jan  8 11:29:24 desktop kernel:        <ffffffff8016ad4c>{sys_open+76} 
<ffffffff8010e1da>{system_call+126}
Jan  8 11:29:24 desktop kernel:

Thank you.
Hari

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
  2005-01-08  0:43   ` Srihari Vijayaraghavan
@ 2005-01-08  2:11     ` Srihari Vijayaraghavan
  2005-01-08  3:51       ` Alan Stern
  2005-01-08  4:03     ` Alan Stern
  1 sibling, 1 reply; 15+ messages in thread
From: Srihari Vijayaraghavan @ 2005-01-08  2:11 UTC (permalink / raw)
  To: Alan Stern; +Cc: James Bottomley, USB development list, SCSI development list

[-- Attachment #1: Type: text/plain, Size: 5906 bytes --]

On Saturday 08 January 2005 11:43, Srihari Vijayaraghavan wrote:
> ...
> 1. While it is still easy to trigger this bug without USB Storage Debug
> Option in vanilla 2.6.10, I cannot reproduce the bug with debug option.
> Perhaps it changes some timings.

(Am sorry for replying to my own email.)
Though I could not trigger an OOPS with debug option yet, I have managed to 
trigger a D state "hald" process:

Jan  8 12:41:45 desktop kernel: hald          D 000001003fb24478     0  3369      
1          4163  3006 (NOTLB)
Jan  8 12:41:45 desktop kernel: 00000100374bb9b8 0000000000000006 
000000732d933078 000001003fb241f0
Jan  8 12:41:45 desktop kernel:        00000000000b08ff 00000100371bc170 
0000010037d71440 000001002e0db800
Jan  8 12:41:45 desktop kernel:        000001002d933078 00000100374bba58
Jan  8 12:41:45 desktop kernel: Call 
Trace:<ffffffff802c11eb>{wait_for_completion+139} 
<ffffffff8012dc20>{default_wake_function+0}
Jan  8 12:41:45 desktop kernel:        
<ffffffff8012dc20>{default_wake_function+0} 
<ffffffffa0004f7b>{:scsi_mod:scsi_wait_req+91}
Jan  8 12:41:45 desktop kernel:        
<ffffffffa0000038>{:scsi_mod:scsi_allocate_request+56}
Jan  8 12:41:45 desktop kernel:        <ffffffffa023e99a>{:sr_mod:sr_open+234} 
<ffffffff80110000>{do_stack_segment+128}
Jan  8 12:41:45 desktop kernel:        <ffffffff80240000>{cdrom_open+2080} 
<ffffffff8026cbfc>{alloc_skb+108}
Jan  8 12:41:45 desktop kernel:        
<ffffffff8026b4e1>{sock_alloc_send_pskb+113} 
<ffffffff8026e747>{memcpy_toiovec+55}
Jan  8 12:41:45 desktop kernel:        
<ffffffff8015f119>{find_mergeable_anon_vma+137} 
<ffffffff801af4ba>{avc_has_perm+90}
Jan  8 12:41:45 desktop kernel:        <ffffffff801af4ba>{avc_has_perm+90} 
<ffffffff801af4ba>{avc_has_perm+90}
Jan  8 12:41:45 desktop kernel:        
<ffffffff801b02b4>{task_has_capability+100} 
<ffffffff801c2992>{kobject_get+18}
Jan  8 12:41:45 desktop kernel:        <ffffffff8021e61c>{get_disk+44} 
<ffffffff801c2992>{kobject_get+18}
Jan  8 12:41:45 desktop kernel:        
<ffffffffa023e730>{:sr_mod:sr_block_open+176} <ffffffff8017286a>{do_open+170}
Jan  8 12:41:45 desktop kernel:        <ffffffff80172c5f>{blkdev_open+47} 
<ffffffff8016aa36>{dentry_open+230}
Jan  8 12:41:45 desktop kernel:        <ffffffff8016ab7e>{filp_open+62} 
<ffffffff8016abc7>{get_unused_fd+55}
Jan  8 12:41:45 desktop kernel:        <ffffffff8016ad4c>{sys_open+76} 
<ffffffff8010e1da>{system_call+126}
Jan  8 12:41:45 desktop kernel:


This (kernel badness in scsi_lib.c:1717) I have it in the system logs just 
before the process went in to D state:

Jan  8 12:34:00 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110
Jan  8 12:34:00 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = 
-32)
Jan  8 12:34:00 desktop kernel: usb 1-5: new high speed USB device using 
ehci_hcd and address 9
Jan  8 12:34:00 desktop kernel: scsi9 : SCSI emulation for USB Mass Storage 
devices
Jan  8 12:34:05 desktop kernel:   Vendor: PIONEER   Model: DVD-RW  DVR-107D  
Rev: 1.13
Jan  8 12:34:05 desktop kernel:   Type:   CD-ROM                             
ANSI SCSI revision: 00
Jan  8 12:34:05 desktop kernel: sr1: scsi3-mmc drive: 40x/40x writer cd/rw 
xa/form2 cdda tray
Jan  8 12:34:05 desktop scsi.agent[6565]: cdrom 
at /devices/pci0000:00/0000:00:10.4/usb1/1-5/1-5:1.0/host9/target9:0:0/9:0:0:0
Jan  8 12:34:06 desktop kernel: usb 1-5: USB disconnect, address 9
Jan  8 12:34:36 desktop kernel: scsi: Device offlined - not ready after error 
recovery: host 9 channel 0 id 0 lun 0
Jan  8 12:34:36 desktop kernel:  9:0:0:0: Illegal state transition 
deleted->offline
Jan  8 12:34:36 desktop kernel: Badness in scsi_device_set_state at 
drivers/scsi/scsi_lib.c:1717
Jan  8 12:34:36 desktop kernel:
Jan  8 12:34:36 desktop kernel: Call 
Trace:<ffffffffa0006808>{:scsi_mod:scsi_device_set_state+264}
Jan  8 12:34:36 desktop kernel:        
<ffffffffa000481c>{:scsi_mod:scsi_error_handler+2732}
Jan  8 12:34:36 desktop kernel:        <ffffffff8010ebf3>{child_rip+8} 
<ffffffffa0003d70>{:scsi_mod:scsi_error_handler+0}
Jan  8 12:34:36 desktop kernel:        <ffffffff8010ebeb>{child_rip+0}
Jan  8 12:35:10 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110
Jan  8 12:35:10 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = 
-32)
Jan  8 12:35:10 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110
Jan  8 12:35:10 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = 
-32)
Jan  8 12:35:11 desktop kernel: usb 1-5: new high speed USB device using 
ehci_hcd and address 10
Jan  8 12:35:11 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110
Jan  8 12:35:11 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = 
-32)
Jan  8 12:35:11 desktop kernel: scsi10 : SCSI emulation for USB Mass Storage 
devices
Jan  8 12:35:15 desktop kernel: usb 1-5: USB disconnect, address 10
Jan  8 12:35:28 desktop kernel: usb 1-5: new high speed USB device using 
ehci_hcd and address 11
Jan  8 12:35:28 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110
Jan  8 12:35:28 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = 
-32)
Jan  8 12:35:28 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110
Jan  8 12:35:28 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = 
-32)
Jan  8 12:35:29 desktop kernel: scsi11 : SCSI emulation for USB Mass Storage 
devices
Jan  8 12:35:34 desktop kernel:   Vendor: PIONEER   Model: DVD-RW  DVR-107D  
Rev: 1.13
Jan  8 12:35:34 desktop kernel:   Type:   CD-ROM                             
ANSI SCSI revision: 00
Jan  8 12:35:34 desktop kernel: sr2: scsi3-mmc drive: 40x/40x writer cd/rw 
xa/form2 cdda tray
Jan  8 12:35:36 desktop kernel: usb 1-5: USB disconnect, address 11


The complete copy of system logs is attached to this email.

I think I have seen this "Badness in scsi_lib.c:1717" in 2.6.10 before (even 
without USB Storage Debug option that is).

Thanks.
Hari

[-- Attachment #2: dmesg-2.6.10-1.bz2 --]
[-- Type: application/x-bzip2, Size: 20270 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
  2005-01-08  2:11     ` Srihari Vijayaraghavan
@ 2005-01-08  3:51       ` Alan Stern
  2005-01-09  4:26         ` Srihari Vijayaraghavan
  0 siblings, 1 reply; 15+ messages in thread
From: Alan Stern @ 2005-01-08  3:51 UTC (permalink / raw)
  To: Srihari Vijayaraghavan
  Cc: James Bottomley, USB development list, SCSI development list

On Sat, 8 Jan 2005, Srihari Vijayaraghavan wrote:

> Though I could not trigger an OOPS with debug option yet, I have managed to 
> trigger a D state "hald" process:
> 
> Jan  8 12:41:45 desktop kernel: hald          D 000001003fb24478     0  3369      
> 1          4163  3006 (NOTLB)
> Jan  8 12:41:45 desktop kernel: 00000100374bb9b8 0000000000000006 
> 000000732d933078 000001003fb241f0
> Jan  8 12:41:45 desktop kernel:        00000000000b08ff 00000100371bc170 
> 0000010037d71440 000001002e0db800
> Jan  8 12:41:45 desktop kernel:        000001002d933078 00000100374bba58
> Jan  8 12:41:45 desktop kernel: Call 
> Trace:<ffffffff802c11eb>{wait_for_completion+139} 
> <ffffffff8012dc20>{default_wake_function+0}
> Jan  8 12:41:45 desktop kernel:        
> <ffffffff8012dc20>{default_wake_function+0} 
> <ffffffffa0004f7b>{:scsi_mod:scsi_wait_req+91}

That's not surprising.  It's waiting for the error handler to finish.

> This (kernel badness in scsi_lib.c:1717) I have it in the system logs just 
> before the process went in to D state:
> 
> Jan  8 12:34:00 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110
> Jan  8 12:34:00 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = 
> -32)
> Jan  8 12:34:00 desktop kernel: usb 1-5: new high speed USB device using 
> ehci_hcd and address 9
> Jan  8 12:34:00 desktop kernel: scsi9 : SCSI emulation for USB Mass Storage 
> devices
> Jan  8 12:34:05 desktop kernel:   Vendor: PIONEER   Model: DVD-RW  DVR-107D  
> Rev: 1.13
> Jan  8 12:34:05 desktop kernel:   Type:   CD-ROM                             
> ANSI SCSI revision: 00
> Jan  8 12:34:05 desktop kernel: sr1: scsi3-mmc drive: 40x/40x writer cd/rw 
> xa/form2 cdda tray

I see you didn't have CONFIG_USB_STORAGE_DEBUG set when you ran this test.  
Or else you didn't have your syslog daemon configured to store the
debugging messages.  If you can get anything unusual to happen with the
debugging enabled, it would make things much easier to track down.

> Jan  8 12:34:05 desktop scsi.agent[6565]: cdrom 
> at /devices/pci0000:00/0000:00:10.4/usb1/1-5/1-5:1.0/host9/target9:0:0/9:0:0:0
> Jan  8 12:34:06 desktop kernel: usb 1-5: USB disconnect, address 9
> Jan  8 12:34:36 desktop kernel: scsi: Device offlined - not ready after error 
> recovery: host 9 channel 0 id 0 lun 0
> Jan  8 12:34:36 desktop kernel:  9:0:0:0: Illegal state transition 
> deleted->offline
> Jan  8 12:34:36 desktop kernel: Badness in scsi_device_set_state at 
> drivers/scsi/scsi_lib.c:1717
> Jan  8 12:34:36 desktop kernel:
> Jan  8 12:34:36 desktop kernel: Call 
> Trace:<ffffffffa0006808>{:scsi_mod:scsi_device_set_state+264}
> Jan  8 12:34:36 desktop kernel:        
> <ffffffffa000481c>{:scsi_mod:scsi_error_handler+2732}
> Jan  8 12:34:36 desktop kernel:        <ffffffff8010ebf3>{child_rip+8} 
> <ffffffffa0003d70>{:scsi_mod:scsi_error_handler+0}
> Jan  8 12:34:36 desktop kernel:        <ffffffff8010ebeb>{child_rip+0}

I thought these state transition problems had all been cleaned up?

> Jan  8 12:35:10 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110
> Jan  8 12:35:10 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = 
> -32)
> Jan  8 12:35:10 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error -110
> Jan  8 12:35:10 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = 
> -32)

Those errors are quite unusual.  They seem to indicate there's a problem
with your computer's USB hardware, as do the numerous spontaneous (?)  
disconnections in your log.  However I'm not familiar with all the details
of the EHCI driver.  It's faintly possible that plugging the drive into a 
different USB port would work better.

If you still can't get anything with the debugging option, try running
without the debugging code and rmmod ehci-hcd before plugging in the
device.  Maybe if you use a UHCI controller instead the errors won't
occur.

Alan Stern


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
  2005-01-08  0:43   ` Srihari Vijayaraghavan
  2005-01-08  2:11     ` Srihari Vijayaraghavan
@ 2005-01-08  4:03     ` Alan Stern
  1 sibling, 0 replies; 15+ messages in thread
From: Alan Stern @ 2005-01-08  4:03 UTC (permalink / raw)
  To: Srihari Vijayaraghavan
  Cc: James Bottomley, USB development list, SCSI development list

On Sat, 8 Jan 2005, Srihari Vijayaraghavan wrote:

> Alan,
> 
> I have made a couple of observations:
> 1. While it is still easy to trigger this bug without USB Storage Debug Option 
> in vanilla 2.6.10, I cannot reproduce the bug with debug option. Perhaps it 
> changes some timings.

Certainly it changes timings.

> (Without debug option, within a dozen plug/unplug events I can reliably 
> trigger the bug, but OTOH with debug options I cannot even after 50 events. 
> Maybe I should run my desktop with debug option for better stability. :-))
> 
> 2. When running without debug option, the D state "hald" process appears 
> first, and upon unplugging the drive an OOPS appears on the very next 
> plugging of the drive.

It sounds like the problem occurs in hald, and the later oops is just a 
symptom whose cause came much earlier.

> Here is the new OOPS:
> 
> usb-storage: device found at 7
> usb-storage: waiting for device to settle before scanning
> Unable to handle kernel paging request at 0000001600000019 RIP:
> <ffffffffa0108259>{:usb_storage:bus_reset+73}

I don't suppose you're familiar enough with 64-bit assembler to be able to 
decode the low-level instruction listing and tell me which line of the 
bus_reset routine corresponds to offset 73?

Failing that, maybe you can try applying this patch.  It will print some 
extra debugging information in the system log, even with 
CONFIG_USB_STORAGE_DEBUG not set.

Alan Stern



--- 2.6.10-orig/drivers/usb/storage/scsiglue.c	Mon Jan  3 09:39:36 2005
+++ 2.6.10/drivers/usb/storage/scsiglue.c	Fri Jan  7 22:59:51 2005
@@ -268,9 +268,15 @@
 /* This is always called with scsi_lock(srb->host) held */
 static int bus_reset(struct scsi_cmnd *srb)
 {
-	struct us_data *us = (struct us_data *)srb->device->host->hostdata[0];
+	struct us_data *us;
 	int result, rc;
 
+	printk(KERN_INFO "In bus_reset, srb: %p\n", srb);
+	printk("  device: %p\n", srb->device);
+	printk("  host: %p\n", srb->device->host);
+	printk("  hostdata: %p\n", (void *) srb->device->host->hostdata[0]);
+
+	us = (struct us_data *)srb->device->host->hostdata[0];
 	US_DEBUGP("%s called\n", __FUNCTION__);
 
 	scsi_unlock(srb->device->host);


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
  2005-01-08  3:51       ` Alan Stern
@ 2005-01-09  4:26         ` Srihari Vijayaraghavan
  2005-01-09 17:32           ` Alan Stern
  0 siblings, 1 reply; 15+ messages in thread
From: Srihari Vijayaraghavan @ 2005-01-09  4:26 UTC (permalink / raw)
  To: Alan Stern; +Cc: James Bottomley, USB development list, SCSI development list

On Saturday 08 January 2005 14:51, Alan Stern wrote:
> ...
> I see you didn't have CONFIG_USB_STORAGE_DEBUG set when you ran this test.

Actually I did. I do not know what log level the USB debug messages were, but 
the silly syslog did not capture them. Now I have set it to write all kernel 
messages.

> Or else you didn't have your syslog daemon configured to store the
> debugging messages.  If you can get anything unusual to happen with the
> debugging enabled, it would make things much easier to track down.

That is right. I am trying to trigger the bug, and I have ensured that the USB 
debug messages are being written to the log file. But I have not triggered 
the bug yet.

> ...
> > Jan  8 12:35:10 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error
> > -110 Jan  8 12:35:10 desktop kernel: hub 1-0:1.0: hub_port_status failed
> > (err = -32)
> > Jan  8 12:35:10 desktop kernel: ehci_hcd 0000:00:10.4: port 5 reset error
> > -110 Jan  8 12:35:10 desktop kernel: hub 1-0:1.0: hub_port_status failed
> > (err = -32)
>
> Those errors are quite unusual.  They seem to indicate there's a problem
> with your computer's USB hardware, as do the numerous spontaneous (?)
> disconnections in your log.  However I'm not familiar with all the details
> of the EHCI driver.  It's faintly possible that plugging the drive into a
> different USB port would work better.

The numerous disconnections (because without debug, I can trigger the D state 
process and oops within a dozen connection/disconnection, but with debug it 
takes 10 times more connection/disconnection to trigger D state process, and 
I do not know how many are needed for oops :-( ) are purposefully generated 
to trigger the bug.

Of course UHCI (USB 1) is not as fast as EHCI (USB 2). Would that not matter 
for CD/DVD burning?

> If you still can't get anything with the debugging option, try running
> without the debugging code and rmmod ehci-hcd before plugging in the
> device.  Maybe if you use a UHCI controller instead the errors won't
> occur.

OK I shall try with UHCI.

Thank you.
Hari

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
  2005-01-09  4:26         ` Srihari Vijayaraghavan
@ 2005-01-09 17:32           ` Alan Stern
  2005-01-10 10:14             ` Srihari Vijayaraghavan
  0 siblings, 1 reply; 15+ messages in thread
From: Alan Stern @ 2005-01-09 17:32 UTC (permalink / raw)
  To: Srihari Vijayaraghavan
  Cc: James Bottomley, USB development list, SCSI development list

On Sun, 9 Jan 2005, Srihari Vijayaraghavan wrote:

> On Saturday 08 January 2005 14:51, Alan Stern wrote:
> > ...
> > I see you didn't have CONFIG_USB_STORAGE_DEBUG set when you ran this test.
> 
> Actually I did. I do not know what log level the USB debug messages were, but 
> the silly syslog did not capture them. Now I have set it to write all kernel 
> messages.

It's a very common thing to overlook.  And by the way, the log level for 
the debug messages is DEBUG (what a surprise!).

> > Or else you didn't have your syslog daemon configured to store the
> > debugging messages.  If you can get anything unusual to happen with the
> > debugging enabled, it would make things much easier to track down.
> 
> That is right. I am trying to trigger the bug, and I have ensured that the USB 
> debug messages are being written to the log file. But I have not triggered 
> the bug yet.

If you're unable to trigger the bug, you can try turning the debug 
configuration option back off and using that patch I sent earlier.  It 
wouldn't hurt to use that patch even with the debugging messages...


> > Those errors are quite unusual.  They seem to indicate there's a problem
> > with your computer's USB hardware, as do the numerous spontaneous (?)
> > disconnections in your log.  However I'm not familiar with all the details
> > of the EHCI driver.  It's faintly possible that plugging the drive into a
> > different USB port would work better.
> 
> The numerous disconnections (because without debug, I can trigger the D state 
> process and oops within a dozen connection/disconnection, but with debug it 
> takes 10 times more connection/disconnection to trigger D state process, and 
> I do not know how many are needed for oops :-( ) are purposefully generated 
> to trigger the bug.

Oh, so you did those by hand.  All right.  Probably the fact that they 
come fairly close together is related to the bug.  I wonder if something 
could be causing the SCSI error handler to run when it shouldn't, using an 
out-of-date host structure?

> Of course UHCI (USB 1) is not as fast as EHCI (USB 2). Would that not matter 
> for CD/DVD burning?

It definitely will slow things down a lot.  For normal operation you would 
not want to rely on it.

Alan Stern


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
  2005-01-09 17:32           ` Alan Stern
@ 2005-01-10 10:14             ` Srihari Vijayaraghavan
  2005-01-10 17:33               ` Alan Stern
  0 siblings, 1 reply; 15+ messages in thread
From: Srihari Vijayaraghavan @ 2005-01-10 10:14 UTC (permalink / raw)
  To: Alan Stern; +Cc: James Bottomley, USB development list, SCSI development list

[-- Attachment #1: Type: text/plain, Size: 4297 bytes --]

On Monday 10 January 2005 04:32, Alan Stern wrote:
> If you're unable to trigger the bug, you can try turning the debug
> configuration option back off and using that patch I sent earlier.  It
> wouldn't hurt to use that patch even with the debugging messages...

I have disabled the USB storage debug option. Your patch unfortunately did not 
apply cleanly, so I did it manually, and here is the correct patch:
--- test/drivers/usb/storage/scsiglue.c.orig 2005-01-10 18:31:36.453932344 
+1100
+++ test/drivers/usb/storage/scsiglue.c 2005-01-10 18:37:57.329030528 +1100
@@ -268,9 +268,15 @@
 /* This is always called with scsi_lock(srb->host) held */
 static int bus_reset(struct scsi_cmnd *srb)
 {
- struct us_data *us = (struct us_data *)srb->device->host->hostdata[0];
+ struct us_data *us;
  int result, rc;
 
+ printk(KERN_INFO "In bus_reset, srb: %p\n", srb);
+ printk("  device: %p\n", srb->device);
+ printk("  host: %p\n", srb->device->host);
+ printk("  hostdata: %p\n", (void *) srb->device->host->hostdata[0]);
+
+ us = (struct us_data *)srb->device->host->hostdata[0];
  US_DEBUGP("%s called\n", __FUNCTION__);
 
  scsi_unlock(srb->device->host);


I am unable to trigger the bug, despite my sincere efforts. But I have managed 
to push "hald" process in to D state.
Jan 10 21:06:41 desktop kernel: hald          D 000001003776d938     0  3367      
1          3429  3357 (NOTLB)
Jan 10 21:06:41 desktop kernel: 000001003776d898 0000000000000002 
000000732b17d078 000001003f771650
Jan 10 21:06:41 desktop kernel:        00000000000007fd 000001003fed2070 
0000010037d71640 0000010038e3d000
Jan 10 21:06:41 desktop kernel:        000001002b17d078 000001003776d938
Jan 10 21:06:41 desktop kernel: Call 
Trace:<ffffffff802c11eb>{wait_for_completion+139} 
<ffffffff8012dc20>{default_wake_function+0}
Jan 10 21:06:41 desktop kernel:        
<ffffffff8012dc20>{default_wake_function+0} 
<ffffffffa0004f7b>{:scsi_mod:scsi_wait_req+91}
Jan 10 21:06:41 desktop kernel:        
<ffffffffa0000038>{:scsi_mod:scsi_allocate_request+56}
Jan 10 21:06:41 desktop kernel:        
<ffffffffa023a33c>{:sr_mod:sr_do_ioctl+156} 
<ffffffffa023a654>{:sr_mod:sr_audio_ioctl+372}
Jan 10 21:06:41 desktop kernel:        
<ffffffffa0004fac>{:scsi_mod:scsi_wait_req+140} 
<ffffffff8023f71e>{cdrom_count_tracks+222}
Jan 10 21:06:41 desktop kernel:        <ffffffff8023f9a0>{cdrom_open+448} 
<ffffffffa00471f4>{:ext3:ext3_get_block_handle+228}
Jan 10 21:06:41 desktop kernel:        <ffffffff8014ef4e>{find_get_page+14} 
<ffffffff8016ce4c>{__find_get_block_slow+220}
Jan 10 21:06:41 desktop kernel:        
<ffffffff8016d519>{__find_get_block+377} <ffffffff8016f87f>{__getblk+31}
Jan 10 21:06:41 desktop kernel:        <ffffffff801af4ba>{avc_has_perm+90} 
<ffffffff801af4ba>{avc_has_perm+90}
Jan 10 21:06:41 desktop kernel:        <ffffffff801af4ba>{avc_has_perm+90} 
<ffffffff801b02b4>{task_has_capability+100}
Jan 10 21:06:41 desktop kernel:        <ffffffff801c2992>{kobject_get+18} 
<ffffffff801c2992>{kobject_get+18}
Jan 10 21:06:41 desktop kernel:        
<ffffffffa0239730>{:sr_mod:sr_block_open+176} <ffffffff8017286a>{do_open+170}
Jan 10 21:06:41 desktop kernel:        <ffffffff80172c5f>{blkdev_open+47} 
<ffffffff8016aa36>{dentry_open+230}
Jan 10 21:06:41 desktop kernel:        <ffffffff8016ab7e>{filp_open+62} 
<ffffffff8016abc7>{get_unused_fd+55}
Jan 10 21:06:41 desktop kernel:        <ffffffff8016ad4c>{sys_open+76} 
<ffffffff8010e1da>{system_call+126}
Jan 10 21:06:41 desktop kernel:


Please refer to the attachment for the complete kernel log.

I think I understand the problem (conceptually I guess):
1. Connect the USB DVD-RW drive
2. USB Storage says: "usb-storage: device found at nnn" and "usb-storage: 
waiting for device to settle before scanning"
3. And then it waits for 5+ seconds
4. Then it initialises the peripheral and then says: "Vendor: PIONEER   Model: 
DVD-RW  DVR-107D  Rev: 1.13 ..." and "usb-storage: device scan complete"

I think if I unplug the device precisely the same moment when it initialises 
(step 4), and by time that routine completes kernel is really unhappy (D 
state process,  Badness in scsi_device_set_state at 
drivers/scsi/scsi_lib.c:1717, and/or oops) to discover that the drive is not 
there anymore :-).

I do not know what else to do.

Thank you.
Hari

[-- Attachment #2: kernel.log.bz2 --]
[-- Type: application/x-bzip2, Size: 6591 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
  2005-01-10 10:14             ` Srihari Vijayaraghavan
@ 2005-01-10 17:33               ` Alan Stern
  2005-01-10 17:39                 ` Greg KH
  2005-01-10 20:14                 ` Mike Anderson
  0 siblings, 2 replies; 15+ messages in thread
From: Alan Stern @ 2005-01-10 17:33 UTC (permalink / raw)
  To: James Bottomley, Srihari Vijayaraghavan, Greg KH
  Cc: USB development list, SCSI development list

If James Bottomley, Mike Anderson, or anyone else from linux-scsi can
comment on the material below, I would appreciate it.


On Mon, 10 Jan 2005, Srihari Vijayaraghavan wrote:

> I am unable to trigger the bug, despite my sincere efforts.

If nothing else, I have to admire your persistence in plugging and 
unplugging the drive 120 times over the course of a half-hour!

> Please refer to the attachment for the complete kernel log.
> 
> I think I understand the problem (conceptually I guess):
> 1. Connect the USB DVD-RW drive
> 2. USB Storage says: "usb-storage: device found at nnn" and "usb-storage: 
> waiting for device to settle before scanning"
> 3. And then it waits for 5+ seconds
> 4. Then it initialises the peripheral and then says: "Vendor: PIONEER   Model: 
> DVD-RW  DVR-107D  Rev: 1.13 ..." and "usb-storage: device scan complete"

That's right.  But there's more going on behind the scenes that you aren't 
aware of.  Hotplug programs, haldaemon, automounters, and who knows what 
else.

> I think if I unplug the device precisely the same moment when it initialises 
> (step 4), and by time that routine completes kernel is really unhappy (D 
> state process,  Badness in scsi_device_set_state at 
> drivers/scsi/scsi_lib.c:1717, and/or oops) to discover that the drive is not 
> there anymore :-).

It's not that simple.  Your log contained two different sorts of errors.  
The first occurred right near the start:

Jan 10 20:24:46 desktop kernel: usb 1-3: USB disconnect, address 3
Jan 10 20:24:46 desktop kernel: 50-fstab-sync.h[5147]: segfault at 0000000000000000 rip 000000379706f6b2 rsp 0000007fbffff388 error 4
Jan 10 20:24:46 desktop kernel: 50-fstab-sync.h[5161]: segfault at 0000000000000000 rip 000000379706f6b2 rsp 0000007fbffff6d8 error 4

This looks like a problem in a user program, not in the kernel.  I have no 
idea what 50-fstab-sync.h is.  Part of udev perhaps?  Greg, do you know?


The second problem occurred during a period in which you replugged the
drive three times!  Here's the relevant part of the log:

Jan 10 20:48:37 desktop kernel: usb 1-3: new high speed USB device using ehci_hcd and address 111
Jan 10 20:48:37 desktop kernel: scsi111 : SCSI emulation for USB Mass Storage devices
Jan 10 20:48:37 desktop kernel: usb-storage: device found at 111
Jan 10 20:48:37 desktop kernel: usb-storage: waiting for device to settle before scanning
Jan 10 20:48:42 desktop kernel:   Vendor: PIONEER   Model: DVD-RW  DVR-107D  Rev: 1.13
Jan 10 20:48:42 desktop kernel:   Type:   CD-ROM                             ANSI SCSI revision: 00
Jan 10 20:48:42 desktop kernel: sr1: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray
Jan 10 20:48:42 desktop kernel: Attached scsi CD-ROM sr1 at scsi111, channel 0, id 0, lun 0
Jan 10 20:48:42 desktop kernel: usb-storage: device scan complete
Jan 10 20:48:42 desktop kernel: usb 1-3: USB disconnect, address 111

That looks okay.  But you don't see the hotplug etc. programs that are 
still running, keeping the device below scsi111 open.

Jan 10 20:48:56 desktop kernel: usb 1-3: new high speed USB device using ehci_hcd and address 112
Jan 10 20:48:56 desktop kernel: scsi112 : SCSI emulation for USB Mass Storage devices
Jan 10 20:48:56 desktop kernel: usb-storage: device found at 112
Jan 10 20:48:56 desktop kernel: usb-storage: waiting for device to settle before scanning
Jan 10 20:49:01 desktop kernel: usb-storage: device scan complete
Jan 10 20:49:01 desktop kernel: usb 1-3: USB disconnect, address 112

That's normal also.  Host scsi112 was created and apparently destroyed.

Jan 10 20:49:08 desktop kernel: usb 1-3: new high speed USB device using ehci_hcd and address 113
Jan 10 20:49:08 desktop kernel: scsi113 : SCSI emulation for USB Mass Storage devices
Jan 10 20:49:08 desktop kernel: usb-storage: device found at 113
Jan 10 20:49:08 desktop kernel: usb-storage: waiting for device to settle before scanning
Jan 10 20:49:12 desktop kernel: In bus_reset, srb: 0000010037d71640
Jan 10 20:49:12 desktop kernel:   device: 0000010038e3d000
Jan 10 20:49:12 desktop kernel:   host: 000001002b500c00
Jan 10 20:49:12 desktop kernel:   hostdata: 000001002b549c00

This is the first indication of something funny.  I can't think how a bus
reset would get initiated by the kernel -- that only happens after a
timeout.  Maybe it came from a user program?

Jan 10 20:49:12 desktop kernel: ehci_hcd 0000:00:10.4: port 3 reset error -110
Jan 10 20:49:12 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Jan 10 20:49:13 desktop kernel: usb 1-3: reset high speed USB device using ehci_hcd and address 113
Jan 10 20:49:13 desktop kernel: scsi: Device offlined - not ready after error recovery: host 111 channel 0 id 0 lun 0
Jan 10 20:49:13 desktop kernel:  111:0:0:0: Illegal state transition deleted->offline
Jan 10 20:49:13 desktop kernel: Badness in scsi_device_set_state at drivers/scsi/scsi_lib.c:1717
Jan 10 20:49:13 desktop kernel: 
Jan 10 20:49:13 desktop kernel: Call Trace:<ffffffffa0006808>{:scsi_mod:scsi_device_set_state+264} 
Jan 10 20:49:13 desktop kernel:        <ffffffffa000481c>{:scsi_mod:scsi_error_handler+2732} 
Jan 10 20:49:13 desktop kernel:        <ffffffff8010ebf3>{child_rip+8} <ffffffffa0003d70>{:scsi_mod:scsi_error_handler+0} 
Jan 10 20:49:13 desktop kernel:        <ffffffff8010ebeb>{child_rip+0} 

And here's the weird part.  Apparently the bus reset was not for the
current device attached to scsi113, but for the old incarnation attached
to scsi111 -- which is long since gone!  That accounts for the illegal
state transition.  But why did it happen like this in the first place?  
Could it be caused by a user program doing an SG_SCSI_RESET ioctl while
still holding the device open after scsi_remove_host has finished?

Jan 10 20:49:13 desktop kernel: usb 1-3: USB disconnect, address 113
Jan 10 20:49:18 desktop kernel: In bus_reset, srb: 0000010037d71240
Jan 10 20:49:18 desktop kernel:   device: 0000010038f09000
Jan 10 20:49:18 desktop kernel:   host: 000001002b500400
Jan 10 20:49:18 desktop kernel:   hostdata: 000001002b549c00
Jan 10 20:49:18 desktop kernel: scsi: Device offlined - not ready after error recovery: host 113 channel 0 id 0 lun 0
Jan 10 20:49:18 desktop kernel: usb-storage: device scan complete

Now it's not surprising that the reset, which was intended for scsi111, 
has interfered with the proper operation of scsi113.

Alan Stern



-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
  2005-01-10 17:33               ` Alan Stern
@ 2005-01-10 17:39                 ` Greg KH
  2005-01-10 20:14                 ` Mike Anderson
  1 sibling, 0 replies; 15+ messages in thread
From: Greg KH @ 2005-01-10 17:39 UTC (permalink / raw)
  To: Alan Stern
  Cc: James Bottomley, Srihari Vijayaraghavan, USB development list,
	SCSI development list

On Mon, Jan 10, 2005 at 12:33:49PM -0500, Alan Stern wrote:
> It's not that simple.  Your log contained two different sorts of errors.  
> The first occurred right near the start:
> 
> Jan 10 20:24:46 desktop kernel: usb 1-3: USB disconnect, address 3
> Jan 10 20:24:46 desktop kernel: 50-fstab-sync.h[5147]: segfault at 0000000000000000 rip 000000379706f6b2 rsp 0000007fbffff388 error 4
> Jan 10 20:24:46 desktop kernel: 50-fstab-sync.h[5161]: segfault at 0000000000000000 rip 000000379706f6b2 rsp 0000007fbffff6d8 error 4
> 
> This looks like a problem in a user program, not in the kernel.  I have no 
> idea what 50-fstab-sync.h is.  Part of udev perhaps?  Greg, do you know?

I think it might be part of HAL, it's not part of udev.  Ah yes, it's at
/etc/hal/device.d/50-fstab-sync.hal on my box.

thanks,

greg k-h


-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
  2005-01-10 17:33               ` Alan Stern
  2005-01-10 17:39                 ` Greg KH
@ 2005-01-10 20:14                 ` Mike Anderson
  2005-01-10 22:29                   ` [linux-usb-devel] " Alan Stern
  1 sibling, 1 reply; 15+ messages in thread
From: Mike Anderson @ 2005-01-10 20:14 UTC (permalink / raw)
  To: Alan Stern
  Cc: James Bottomley, Srihari Vijayaraghavan, Greg KH,
	USB development list, SCSI development list


Alan Stern [stern@rowland.harvard.edu] wrote:
> Jan 10 20:49:08 desktop kernel: scsi113 : SCSI emulation for USB Mass Storage devices
> Jan 10 20:49:08 desktop kernel: usb-storage: device found at 113
> Jan 10 20:49:08 desktop kernel: usb-storage: waiting for device to settle before scanning
> Jan 10 20:49:12 desktop kernel: In bus_reset, srb: 0000010037d71640
> Jan 10 20:49:12 desktop kernel:   device: 0000010038e3d000
> Jan 10 20:49:12 desktop kernel:   host: 000001002b500c00
> Jan 10 20:49:12 desktop kernel:   hostdata: 000001002b549c00
> 
> This is the first indication of something funny.  I can't think how a bus
> reset would get initiated by the kernel -- that only happens after a
> timeout.  Maybe it came from a user program?

Well it can get initiated in the scsi error handler. This can be verified
by setting the logging_level (sysctl -w dev.scsi.logging_level=0x4).

You probably do not need to do this as I assume what is happening is that
with all the plug and unplug that is being initiated that a IO is in
flight when the scsi_remove_host functions is called. It looks like we
still have to decide on the removal cleanup. Currently scsi_host_cancel
being called from scsi_remove_host appears to not do anything as
scsi_forget_host removes the devices from the list it iterates over. Also
as previously mentioned by you there are races with scsi_mid canceling IOs
that might be completing.  Both of these issue where mentioned in an old
thread (link provided below), but not completely resolved.

http://marc.theaimsgroup.com/?t=109630920600005&r=1&w=2

What is needed is for the LLDD to ensure all IO sent through the LLDD's
queuecommand functions gets a scsi_done call (even a fail call). The scsi
mid applies choke points to stop IO being set to the LLDDs queuecommand,
but there can be races during a host removal (you previously mentioned
this). This would also imply that scsi mid should not be making the call
to scsi_host_cancel (which appears to not do anything anyway).


-andmike
--
Michael Anderson
andmike@us.ibm.com



-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10
  2005-01-10 20:14                 ` Mike Anderson
@ 2005-01-10 22:29                   ` Alan Stern
  0 siblings, 0 replies; 15+ messages in thread
From: Alan Stern @ 2005-01-10 22:29 UTC (permalink / raw)
  To: Mike Anderson
  Cc: James Bottomley, Srihari Vijayaraghavan, Greg KH,
	USB development list, SCSI development list

On Mon, 10 Jan 2005, Mike Anderson wrote:

> > This is the first indication of something funny.  I can't think how a bus
> > reset would get initiated by the kernel -- that only happens after a
> > timeout.  Maybe it came from a user program?
> 
> Well it can get initiated in the scsi error handler. This can be verified
> by setting the logging_level (sysctl -w dev.scsi.logging_level=0x4).

It could be, but it's very unlikely.  Especially since the error handler 
only does a bus reset in response to repeated timeouts (which don't happen 
here; commands either succeed or quickly get an error) and since the host 
has already been removed.

> You probably do not need to do this as I assume what is happening is that
> with all the plug and unplug that is being initiated that a IO is in
> flight when the scsi_remove_host functions is called. It looks like we
> still have to decide on the removal cleanup. Currently scsi_host_cancel
> being called from scsi_remove_host appears to not do anything as
> scsi_forget_host removes the devices from the list it iterates over.

That could easily be the problem.

>  Also
> as previously mentioned by you there are races with scsi_mid canceling IOs
> that might be completing.  Both of these issue where mentioned in an old
> thread (link provided below), but not completely resolved.
> 
> http://marc.theaimsgroup.com/?t=109630920600005&r=1&w=2
> 
> What is needed is for the LLDD to ensure all IO sent through the LLDD's
> queuecommand functions gets a scsi_done call (even a fail call). The scsi
> mid applies choke points to stop IO being set to the LLDDs queuecommand,
> but there can be races during a host removal (you previously mentioned
> this). This would also imply that scsi mid should not be making the call
> to scsi_host_cancel (which appears to not do anything anyway).

This _has_ been resolved in usb-storage.  Every command sent to the 
queuecommand routine either gets executed and sent to scsi_done or else is 
rejected immediately with a failure call to scsi_done.  The exception is 
commands that were aborted through the command_abort routine; scsi_done is 
not called for them.

My concern is about reset requests coming in through an ioctl call; they
don't pass through queuecommand but are sent directly to the bus_reset
routine.  Hence they aren't subject to the choke point treatment you
describe.

Alan Stern


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2005-01-10 22:29 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-01-05  8:14 Fw: [BUG] USB Storage OOPS and a D state process in 2.6.10 Andrew Morton
2005-01-05 14:17 ` James Bottomley
2005-01-05 15:50 ` Alan Stern
2005-01-06 10:38   ` [linux-usb-devel] " Srihari Vijayaraghavan
2005-01-08  0:43   ` Srihari Vijayaraghavan
2005-01-08  2:11     ` Srihari Vijayaraghavan
2005-01-08  3:51       ` Alan Stern
2005-01-09  4:26         ` Srihari Vijayaraghavan
2005-01-09 17:32           ` Alan Stern
2005-01-10 10:14             ` Srihari Vijayaraghavan
2005-01-10 17:33               ` Alan Stern
2005-01-10 17:39                 ` Greg KH
2005-01-10 20:14                 ` Mike Anderson
2005-01-10 22:29                   ` [linux-usb-devel] " Alan Stern
2005-01-08  4:03     ` Alan Stern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox