* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
[not found] <Pine.LNX.4.44.0409050434540.8779-100000@sasami.anime.net>
@ 2004-10-24 14:42 ` Olaf Hering
2004-10-24 14:53 ` James Bottomley
2004-10-31 21:05 ` Herbert Schmid
0 siblings, 2 replies; 17+ messages in thread
From: Olaf Hering @ 2004-10-24 14:42 UTC (permalink / raw)
To: Dan Hollis, linux-scsi; +Cc: bcollins, linux1394-devel
On Sun, Sep 05, Dan Hollis wrote:
> Vendor: PIONEER Model: DVD-RW DVR-104 Rev: 1.31
> Type: CD-ROM ANSI SCSI revision: 02
> cat /proc/scsi/scsi, it shows up
>
> now turn drive off
>
> ieee1394: Node changed: 0-01:1023 -> 0-00:1023
> ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[00309995505516f8]
>
> and now...
> cat /proc/scsi/scsi
>
> entire firewire system is permanently out to lunch. the cat hangs and
> can't be kill -9'd, no hotplug ever works again, the only solution is a
> complete reboot.
This is the backtrace:
knodemgrd_0 D 00000000 0 296 1 307 293 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c017736c] schedule+0x2b8/0x5e0
[c0177914] wait_for_completion+0x7c/0xec
[c5d2fb94] scsi_wait_req+0x64/0xac [scsi_mod]
[c54f7598] sr_do_ioctl+0x70/0x240 [sr_mod]
[c54f6a00] sr_packet+0x5c/0x9c [sr_mod]
[c5d11a34] cdrom_get_disc_info+0x60/0xc4 [cdrom]
[c5d121e4] cdrom_mrw_exit+0x1c/0x104 [cdrom]
[c5d10b6c] unregister_cdrom+0xd0/0x104 [cdrom]
[c54f61a8] sr_kref_release+0x54/0x80 [sr_mod]
[c00ad96c] kref_put+0x60/0x70
[c54f6924] sr_remove+0x50/0xd0 [sr_mod]
[c00eb100] device_release_driver+0x1b8/0x1bc
[c00eb2f0] bus_remove_device+0xc0/0x12c
[c00e9620] device_del+0xa4/0x114
The trouble starts in register_cdrom(): it sets the ->exit() function if
the CD can do CDC_MRW_W.
On unregister, it tries to send a packet to the device, which is already
gone.
How about this patch?
diff -purN linux-2.6.9.orig/drivers/scsi/sr.c linux-2.6.9-olh/drivers/scsi/sr.c
--- linux-2.6.9.orig/drivers/scsi/sr.c 2004-10-22 19:02:43.545400072 +0200
+++ linux-2.6.9-olh/drivers/scsi/sr.c 2004-10-24 16:32:10.765682704 +0200
@@ -916,6 +918,7 @@ static void sr_kref_release(struct kref
struct gendisk *disk = cd->disk;
spin_lock(&sr_index_lock);
+ cd->use = 0;
clear_bit(disk->first_minor, sr_index_bits);
spin_unlock(&sr_index_lock);
diff -purN linux-2.6.9.orig/drivers/scsi/sr_ioctl.c linux-2.6.9-olh/drivers/scsi/sr_ioctl.c
--- linux-2.6.9.orig/drivers/scsi/sr_ioctl.c 2004-10-22 19:02:43.547399768 +0200
+++ linux-2.6.9-olh/drivers/scsi/sr_ioctl.c 2004-10-24 16:31:05.921540512 +0200
@@ -86,6 +86,11 @@ int sr_do_ioctl(Scsi_CD *cd, struct pack
struct request *req;
int result, err = 0, retries = 0;
+ if(!cd->use) {
+ err = -ENODEV;
+ goto out;
+ }
+
SDev = cd->device;
SRpnt = scsi_allocate_request(SDev, GFP_KERNEL);
if (!SRpnt) {
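
The guard this patch adds can be modelled in user space. The following is
only a minimal sketch, not kernel code: struct model_cd, model_do_ioctl()
and model_release() are hypothetical stand-ins for Scsi_CD, sr_do_ioctl()
and sr_kref_release(), and the packets_sent counter exists only to make
the effect visible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for Scsi_CD: only the flag the patch touches,
 * plus a counter so the effect is observable. */
struct model_cd {
	int use;           /* nonzero while the drive is still attached */
	int packets_sent;  /* commands that would have reached the drive */
};

/* Models the check added to sr_do_ioctl(): refuse once `use` is cleared. */
static int model_do_ioctl(struct model_cd *cd)
{
	assert(cd != NULL);
	if (!cd->use)
		return -ENODEV;    /* drive is gone: do not queue a command */
	cd->packets_sent++;        /* otherwise the command would be issued */
	return 0;
}

/* Models the line added to sr_kref_release(): clear `use` first, then let
 * the cdrom layer's exit hook make its attempt to talk to the drive. */
static int model_release(struct model_cd *cd)
{
	assert(cd != NULL);
	cd->use = 0;
	return model_do_ioctl(cd); /* stands in for the ->exit() packet */
}
```

In this model a release on a drive that was in use returns -ENODEV without
incrementing packets_sent, which is the behaviour the patch is aiming for:
the exit hook fails fast instead of waiting for a reply that never comes.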
--
USB is for mice, FireWire is for men!
sUse lINUX ag, nÜRNBERG
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
2004-10-24 14:42 ` 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1 Olaf Hering
@ 2004-10-24 14:53 ` James Bottomley
2004-10-24 15:00 ` Olaf Hering
2004-10-31 21:05 ` Herbert Schmid
1 sibling, 1 reply; 17+ messages in thread
From: James Bottomley @ 2004-10-24 14:53 UTC (permalink / raw)
To: Olaf Hering; +Cc: Dan Hollis, SCSI Mailing List, bcollins, linux1394-devel
On Sun, 2004-10-24 at 10:42, Olaf Hering wrote:
> This is the backtrace:
>
> knodemgrd_0 D 00000000 0 296 1 307 293 (L-TLB)
> Call trace:
> [c000a228] __switch_to+0x48/0x70
> [c017736c] schedule+0x2b8/0x5e0
> [c0177914] wait_for_completion+0x7c/0xec
> [c5d2fb94] scsi_wait_req+0x64/0xac [scsi_mod]
> [c54f7598] sr_do_ioctl+0x70/0x240 [sr_mod]
> [c54f6a00] sr_packet+0x5c/0x9c [sr_mod]
> [c5d11a34] cdrom_get_disc_info+0x60/0xc4 [cdrom]
> [c5d121e4] cdrom_mrw_exit+0x1c/0x104 [cdrom]
> [c5d10b6c] unregister_cdrom+0xd0/0x104 [cdrom]
> [c54f61a8] sr_kref_release+0x54/0x80 [sr_mod]
> [c00ad96c] kref_put+0x60/0x70
> [c54f6924] sr_remove+0x50/0xd0 [sr_mod]
> [c00eb100] device_release_driver+0x1b8/0x1bc
> [c00eb2f0] bus_remove_device+0xc0/0x12c
> [c00e9620] device_del+0xa4/0x114
>
>
> The trouble starts in register_cdrom(): it sets the ->exit() function if
> the CD can do CDC_MRW_W.
> On unregister, it tries to send a packet to the device, which is already
> gone.
There's something else going on here. The CD should hold a reference to
the device so is entitled to send it a command. If the device isn't
there (and the firewire driver doesn't reject it immediately) it should
go into error handling and eventually offline the device ... what do the
kernel messages say is going on?
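
The expected sequence described here (command fails, error handling runs,
the device ends up offline, later commands are rejected at once) can be
sketched as a toy state machine. This is an assumption-laden model, not
the SCSI mid-layer: model_dev, model_send() and the two states are
hypothetical names, and a real timeout is collapsed into a single step.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum model_state { MODEL_RUNNING, MODEL_OFFLINE };

struct model_dev {
	enum model_state state;
	int present;   /* 1 while the hardware still answers */
	int eh_runs;   /* how often error handling was entered */
};

/* Send one command to the modelled device. */
static int model_send(struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->state == MODEL_OFFLINE)
		return -ENODEV;        /* rejected without touching the bus */
	if (dev->present)
		return 0;              /* normal completion */
	dev->eh_runs++;                /* command timed out: error handling */
	dev->state = MODEL_OFFLINE;    /* recovery failed: offline the device */
	return -EIO;
}
```

In the model, the first command after the unplug costs one error-handling
pass and every later one returns -ENODEV immediately. The hang reported in
this thread corresponds to the first command never returning at all.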
James
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
2004-10-24 14:53 ` James Bottomley
@ 2004-10-24 15:00 ` Olaf Hering
2004-10-24 15:22 ` James Bottomley
0 siblings, 1 reply; 17+ messages in thread
From: Olaf Hering @ 2004-10-24 15:00 UTC (permalink / raw)
To: James Bottomley; +Cc: Dan Hollis, SCSI Mailing List, bcollins, linux1394-devel
On Sun, Oct 24, James Bottomley wrote:
> There's something else going on here. The CD should hold a reference to
> the device so is entitled to send it a command. If the device isn't
> there (and the firewire driver doesn't reject it immediately) it should
> go into error handling and eventually offline the device ... what do the
> kernel messages say is going on?
turn off cdrom:
ohci1394: fw-host0: IntEvent: 00030010
ohci1394: fw-host0: irq_handler: Bus reset requested
ohci1394: fw-host0: Cancel request received
ohci1394: fw-host0: Got RQPkt interrupt status=0x00008409
ohci1394: fw-host0: SelfID interrupt received (phyid 0, root)
ohci1394: fw-host0: SelfID packet 0x807f8c56 received
ieee1394: Including SelfID 0x807f8c56
ohci1394: fw-host0: SelfID for this node is 0x807f8c56
ohci1394: fw-host0: SelfID complete
ohci1394: fw-host0: PhyReqFilter=ffffffffffffffff
ieee1394: selfid_complete called with successful SelfID stage ... irm_id: 0xFFC0 node_id: 0xFFC0
ieee1394: NodeMgr: Processing host reset for knodemgrd_0
ohci1394: fw-host0: Single packet rcv'd
ohci1394: fw-host0: Got phy packet ctx=0 ... discarded
ieee1394: send packet local: ffc09940 ffc0ffff f0000400
ieee1394: received packet: ffc09940 ffc0ffff f0000400
ieee1394: send packet local: ffc09960 ffc00000 00000000 040462b0
ieee1394: received packet: ffc09960 ffc00000 00000000 040462b0
ieee1394: send packet local: ffc09d40 ffc0ffff f0000404
ieee1394: received packet: ffc09d40 ffc0ffff f0000404
ieee1394: send packet local: ffc09d60 ffc00000 00000000 31333934
ieee1394: received packet: ffc09d60 ffc00000 00000000 31333934
ieee1394: send packet local: ffc0a140 ffc0ffff f0000408
ieee1394: received packet: ffc0a140 ffc0ffff f0000408
ieee1394: send packet local: ffc0a160 ffc00000 00000000 e064a232
ieee1394: received packet: ffc0a160 ffc00000 00000000 e064a232
ieee1394: send packet local: ffc0a540 ffc0ffff f000040c
ieee1394: received packet: ffc0a540 ffc0ffff f000040c
ieee1394: send packet local: ffc0a560 ffc00000 00000000 00601d00
ieee1394: received packet: ffc0a560 ffc00000 00000000 00601d00
ieee1394: send packet local: ffc0a940 ffc0ffff f0000410
ieee1394: received packet: ffc0a940 ffc0ffff f0000410
ieee1394: send packet local: ffc0a960 ffc00000 00000000 000000dd
ieee1394: received packet: ffc0a960 ffc00000 00000000 000000dd
ieee1394: send packet local: ffc0ad50 ffc0ffff f0000400 04000000
ieee1394: received packet: ffc0ad50 ffc0ffff f0000400 04000000
ieee1394: send packet local: ffc0ad70 ffc00000 00000000 04000000
ieee1394: received packet: ffc0ad70 ffc00000 00000000 04000000
ieee1394: Node changed: 0-02:1023 -> 0-00:1023
ieee1394: send packet 100: ffff1100 ffc0ffff f0000234 c000001f
ohci1394: fw-host0: Inserting packet for node 0-63:1023, tlabel=4, tcode=0x0, speed=0
ohci1394: fw-host0: Starting transmit DMA ctx=0
ohci1394: fw-host0: IntEvent: 00000001
ohci1394: fw-host0: Got reqTxComplete interrupt status=0x00008011
ohci1394: fw-host0: Packet sent to node 63 tcode=0x0 tLabel=0x04 ack=0x11 spd=0 data=0x1F0000C0 ctx=0
ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[00010410100036e0]
ieee1394: Node suspended: ID:BUS[0-01:1023] GUID[0001d200500601fb]
SysRq : Show State
sibling
task PC pid father child younger older
bash S 0FE3BC9C 0 1 0 2 (NOTLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c0177140] schedule+0x2b8/0x5e0
[c001cf7c] do_wait+0x1f8/0xd7c
[c00064d0] ret_from_syscall+0x0/0x4c
ksoftirqd/0 R running 0 2 1 3 (L-TLB)
events/0 S 00000000 0 3 1 4 26 2 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c0177140] schedule+0x2b8/0x5e0
[c002c288] worker_thread+0x224/0x228
[c0031134] kthread+0xf0/0x12c
[c0009834] kernel_thread+0x44/0x60
khelper S 00000000 0 4 3 13 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c0177140] schedule+0x2b8/0x5e0
[c002c288] worker_thread+0x224/0x228
[c0031134] kthread+0xf0/0x12c
[c0009834] kernel_thread+0x44/0x60
kblockd/0 R running 0 13 3 24 4 (L-TLB)
pdflush S 00000000 0 24 3 25 13 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c0177140] schedule+0x2b8/0x5e0
[c003e9d0] pdflush+0xc0/0x1fc
[c0031134] kthread+0xf0/0x12c
[c0009834] kernel_thread+0x44/0x60
pdflush S 00000000 0 25 3 27 24 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c0177140] schedule+0x2b8/0x5e0
[c003e9d0] pdflush+0xc0/0x1fc
[c0031134] kthread+0xf0/0x12c
[c0009834] kernel_thread+0x44/0x60
aio/0 S 00000000 0 27 3 25 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c0177140] schedule+0x2b8/0x5e0
[c002c288] worker_thread+0x224/0x228
[c0031134] kthread+0xf0/0x12c
[c0009834] kernel_thread+0x44/0x60
kswapd0 S 00000000 0 26 1 257 3 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c0177140] schedule+0x2b8/0x5e0
[c00454b4] kswapd+0x88/0xb8
[c0009834] kernel_thread+0x44/0x60
rpciod S 00000000 0 257 1 289 26 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c0177140] schedule+0x2b8/0x5e0
[c548749c] rpciod+0x1c8/0x2dc [sunrpc]
[c0009834] kernel_thread+0x44/0x60
sh R running 0 289 1 293 257 (NOTLB)
khpsbpkt S 00000000 0 293 1 296 289 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c0177140] schedule+0x2b8/0x5e0
[c0176c34] __down_interruptible+0xcc/0x15c
[c54cc364] hpsbpkt_thread+0xc8/0xe0 [ieee1394]
[c0009834] kernel_thread+0x44/0x60
knodemgrd_0 D 00000000 0 296 1 321 293 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c0177140] schedule+0x2b8/0x5e0
[c01776e8] wait_for_completion+0x7c/0xec
[c5d2fb94] scsi_wait_req+0x64/0xac [scsi_mod]
[c54f5590] sr_do_ioctl+0x9c/0x26c [sr_mod]
[c54f49cc] sr_packet+0x28/0x68 [sr_mod]
[c5d11a34] cdrom_get_disc_info+0x60/0xc4 [cdrom]
[c5d121e4] cdrom_mrw_exit+0x1c/0x104 [cdrom]
[c5d10b6c] unregister_cdrom+0xd0/0x104 [cdrom]
[c54f41a8] sr_kref_release+0x54/0x80 [sr_mod]
[c00ad96c] kref_put+0x60/0x70
[c54f4924] sr_remove+0x50/0xd0 [sr_mod]
[c00eafcc] device_release_driver+0x84/0x88
[c00eb1bc] bus_remove_device+0xc0/0x12c
[c00e9620] device_del+0xa4/0x114
scsi_eh_1 S 00000000 0 321 1 296 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c0177140] schedule+0x2b8/0x5e0
[c0176c34] __down_interruptible+0xcc/0x15c
[c5d2db80] scsi_error_handler+0x590/0xe68 [scsi_mod]
[c0009834] kernel_thread+0x44/0x60
Here is a run with my debug output added:
ohci1394: fw-host0: IntEvent: 00030010
ohci1394: fw-host0: irq_handler: Bus reset requested
ohci1394: fw-host0: Cancel request received
ohci1394: fw-host0: Got RQPkt interrupt status=0x00008409
ohci1394: fw-host0: SelfID interrupt received (phyid 0, root)
ohci1394: fw-host0: SelfID packet 0x807f8c56 received
ieee1394: Including SelfID 0x807f8c56
ohci1394: fw-host0: SelfID for this node is 0x807f8c56
ohci1394: fw-host0: SelfID complete
ohci1394: fw-host0: PhyReqFilter=ffffffffffffffff
ieee1394: selfid_complete called with successful SelfID stage ... irm_id: 0xFFC0 node_id: 0xFFC0
ieee1394: NodeMgr: Processing host reset for knodemgrd_0
ohci1394: fw-host0: Single packet rcv'd
ohci1394: fw-host0: Got phy packet ctx=0 ... discarded
ieee1394: send packet local: ffc05d40 ffc0ffff f0000400
ieee1394: received packet: ffc05d40 ffc0ffff f0000400
ieee1394: send packet local: ffc05d60 ffc00000 00000000 040462b0
ieee1394: received packet: ffc05d60 ffc00000 00000000 040462b0
ieee1394: send packet local: ffc06140 ffc0ffff f0000404
ieee1394: received packet: ffc06140 ffc0ffff f0000404
ieee1394: send packet local: ffc06160 ffc00000 00000000 31333934
ieee1394: received packet: ffc06160 ffc00000 00000000 31333934
ieee1394: send packet local: ffc06540 ffc0ffff f0000408
ieee1394: received packet: ffc06540 ffc0ffff f0000408
ieee1394: send packet local: ffc06560 ffc00000 00000000 e064a232
ieee1394: received packet: ffc06560 ffc00000 00000000 e064a232
ieee1394: send packet local: ffc06940 ffc0ffff f000040c
ieee1394: received packet: ffc06940 ffc0ffff f000040c
ieee1394: send packet local: ffc06960 ffc00000 00000000 00601d00
ieee1394: received packet: ffc06960 ffc00000 00000000 00601d00
ieee1394: send packet local: ffc06d40 ffc0ffff f0000410
ieee1394: received packet: ffc06d40 ffc0ffff f0000410
ieee1394: send packet local: ffc06d60 ffc00000 00000000 000000dd
ieee1394: received packet: ffc06d60 ffc00000 00000000 000000dd
ieee1394: send packet local: ffc07150 ffc0ffff f0000400 04000000
ieee1394: received packet: ffc07150 ffc0ffff f0000400 04000000
ieee1394: send packet local: ffc07170 ffc00000 00000000 04000000
ieee1394: received packet: ffc07170 ffc00000 00000000 04000000
ieee1394: Node changed: 0-01:1023 -> 0-00:1023
ieee1394: send packet 100: ffff0d00 ffc0ffff f0000234 c000001f
ohci1394: fw-host0: Inserting packet for node 0-63:1023, tlabel=3, tcode=0x0, speed=0
ohci1394: fw-host0: Starting transmit DMA ctx=0
ohci1394: fw-host0: IntEvent: 00000001
ohci1394: fw-host0: Got reqTxComplete interrupt status=0x00008011
ohci1394: fw-host0: Packet sent to node 63 tcode=0x0 tLabel=0x03 ack=0x11 spd=0 data=0x1F0000C0 ctx=0
nodemgr_node_probe(1354) knodemgrd_0(296) enter
nodemgr_probe_ne(1322) knodemgrd_0(296) enter
nodemgr_probe_ne(1344) knodemgrd_0(296) leave
nodemgr_probe_ne(1322) knodemgrd_0(296) enter
ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[0001d200500601fb]
nodemgr_suspend_ne(1253) knodemgrd_0(296) ud->ne c0fbd400(<NULL>) ne c0fbd400(<NULL>)
nodemgr_suspend_ne(1257) knodemgrd_0(296) suspend 00000000
device_release_driver(370) knodemgrd_0(296) enter
device_release_driver(372) knodemgrd_0(296) drv sbp2 hpsb_protocol_driver c54ed5f8
device_release_driver(373) knodemgrd_0(296) sysfs_remove_link
device_release_driver(375) knodemgrd_0(296) list_del_init
device_release_driver(377) knodemgrd_0(296) device_detach_shutdown
device_detach_shutdown(25) knodemgrd_0(296) enter
device_release_driver(379) knodemgrd_0(296) remove
ieee1394: sbp2: sbp2_remove
sbp2_remove(634) knodemgrd_0(296) enter
sbp2_remove(639) knodemgrd_0(296) scsi_id c0207760
ieee1394: sbp2: sbp2_logout_device
ieee1394: sbp2: sbp2_remove_device
device_release_driver(370) knodemgrd_0(296) enter
device_release_driver(372) knodemgrd_0(296) drv sr c54f68d4
device_release_driver(373) knodemgrd_0(296) sysfs_remove_link
device_release_driver(375) knodemgrd_0(296) list_del_init
device_release_driver(377) knodemgrd_0(296) device_detach_shutdown
device_detach_shutdown(25) knodemgrd_0(296) enter
device_release_driver(379) knodemgrd_0(296) remove
sr_packet(896) knodemgrd_0(296)
SysRq : Show State
sibling
task PC pid father child younger older
bash S 0FE3BC9C 0 1 0 2 (NOTLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c017736c] schedule+0x2b8/0x5e0
[c001cf7c] do_wait+0x1f8/0xd7c
[c00064d0] ret_from_syscall+0x0/0x4c
ksoftirqd/0 R running 0 2 1 3 (L-TLB)
events/0 S 00000000 0 3 1 4 26 2 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c017736c] schedule+0x2b8/0x5e0
[c002c288] worker_thread+0x224/0x228
[c0031134] kthread+0xf0/0x12c
[c0009834] kernel_thread+0x44/0x60
khelper S 00000000 0 4 3 13 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c017736c] schedule+0x2b8/0x5e0
[c002c288] worker_thread+0x224/0x228
[c0031134] kthread+0xf0/0x12c
[c0009834] kernel_thread+0x44/0x60
kblockd/0 R running 0 13 3 24 4 (L-TLB)
pdflush S 00000000 0 24 3 25 13 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c017736c] schedule+0x2b8/0x5e0
[c003e9d0] pdflush+0xc0/0x1fc
[c0031134] kthread+0xf0/0x12c
[c0009834] kernel_thread+0x44/0x60
pdflush S 00000000 0 25 3 27 24 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c017736c] schedule+0x2b8/0x5e0
[c003e9d0] pdflush+0xc0/0x1fc
[c0031134] kthread+0xf0/0x12c
[c0009834] kernel_thread+0x44/0x60
aio/0 S 00000000 0 27 3 25 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c017736c] schedule+0x2b8/0x5e0
[c002c288] worker_thread+0x224/0x228
[c0031134] kthread+0xf0/0x12c
[c0009834] kernel_thread+0x44/0x60
kswapd0 S 00000000 0 26 1 257 3 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c017736c] schedule+0x2b8/0x5e0
[c00454b4] kswapd+0x88/0xb8
[c0009834] kernel_thread+0x44/0x60
rpciod S 00000000 0 257 1 289 26 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c017736c] schedule+0x2b8/0x5e0
[c548749c] rpciod+0x1c8/0x2dc [sunrpc]
[c0009834] kernel_thread+0x44/0x60
sh R running 0 289 1 293 257 (NOTLB)
khpsbpkt S 00000000 0 293 1 296 289 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c017736c] schedule+0x2b8/0x5e0
[c0176e60] __down_interruptible+0xcc/0x15c
[c54cc364] hpsbpkt_thread+0xc8/0xe0 [ieee1394]
[c0009834] kernel_thread+0x44/0x60
knodemgrd_0 D 00000000 0 296 1 307 293 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c017736c] schedule+0x2b8/0x5e0
[c0177914] wait_for_completion+0x7c/0xec
[c5d2fb94] scsi_wait_req+0x64/0xac [scsi_mod]
[c54f7598] sr_do_ioctl+0x70/0x240 [sr_mod]
[c54f6a00] sr_packet+0x5c/0x9c [sr_mod]
[c5d11a34] cdrom_get_disc_info+0x60/0xc4 [cdrom]
[c5d121e4] cdrom_mrw_exit+0x1c/0x104 [cdrom]
[c5d10b6c] unregister_cdrom+0xd0/0x104 [cdrom]
[c54f61a8] sr_kref_release+0x54/0x80 [sr_mod]
[c00ad96c] kref_put+0x60/0x70
[c54f6924] sr_remove+0x50/0xd0 [sr_mod]
[c00eb100] device_release_driver+0x1b8/0x1bc
[c00eb2f0] bus_remove_device+0xc0/0x12c
[c00e9620] device_del+0xa4/0x114
scsi_eh_0 S 00000000 0 307 1 296 (L-TLB)
Call trace:
[c000a228] __switch_to+0x48/0x70
[c017736c] schedule+0x2b8/0x5e0
[c0176e60] __down_interruptible+0xcc/0x15c
[c5d2db80] scsi_error_handler+0x590/0xe68 [scsi_mod]
[c0009834] kernel_thread+0x44/0x60
--
USB is for mice, FireWire is for men!
sUse lINUX ag, nÜRNBERG
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
2004-10-24 15:00 ` Olaf Hering
@ 2004-10-24 15:22 ` James Bottomley
2004-10-24 17:22 ` Dmitry Torokhov
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: James Bottomley @ 2004-10-24 15:22 UTC (permalink / raw)
To: Olaf Hering; +Cc: Dan Hollis, SCSI Mailing List, bcollins, linux1394-devel
The trace doesn't show any error handler activity at all. Are there no
messages in the log about offlining the device? If not, it sounds like
there's a problem somewhere in the firewire system.
James
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
2004-10-24 15:22 ` James Bottomley
@ 2004-10-24 17:22 ` Dmitry Torokhov
2004-10-24 18:02 ` Olaf Hering
2004-10-24 19:29 ` Olaf Hering
2 siblings, 0 replies; 17+ messages in thread
From: Dmitry Torokhov @ 2004-10-24 17:22 UTC (permalink / raw)
To: linux1394-devel
Cc: James Bottomley, Olaf Hering, Dan Hollis, SCSI Mailing List,
bcollins
On Sunday 24 October 2004 10:22 am, James Bottomley wrote:
> The trace doesn't show any error handler activity at all. Are there no
> messages in the log about offlining the device? If not, it sounds like
> there's a problem somewhere in the firewire system.
>
The thing is that the hang is happening only with CD/DVD devices (sr_mod)
and not with hard drives (sd_mod). As far as I can tell, SBP2 does not
differentiate between an HD and a CD, so I think there is something going
on in the sr_mod code.
--
Dmitry
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
2004-10-24 15:22 ` James Bottomley
2004-10-24 17:22 ` Dmitry Torokhov
@ 2004-10-24 18:02 ` Olaf Hering
2004-10-24 19:29 ` Olaf Hering
2 siblings, 0 replies; 17+ messages in thread
From: Olaf Hering @ 2004-10-24 18:02 UTC (permalink / raw)
To: James Bottomley; +Cc: Dan Hollis, SCSI Mailing List, bcollins, linux1394-devel
On Sun, Oct 24, James Bottomley wrote:
> The trace doesn't show any error handler activity at all. Are there no
> messages in the log about offlining the device? If not, it sounds like
> there's a problem somewhere in the firewire system.
sbp2_start_device doesn't save sdev; maybe that's a starting point for
setting the device offline.
--
USB is for mice, FireWire is for men!
sUse lINUX ag, nÜRNBERG
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
[not found] <200410241930.47104.chrivers@iversen-net.dk>
@ 2004-10-24 18:12 ` Stefan Richter
0 siblings, 0 replies; 17+ messages in thread
From: Stefan Richter @ 2004-10-24 18:12 UTC (permalink / raw)
To: linux-scsi; +Cc: linux1394-devel
Christian Iversen wrote to linux1394-devel:
> On Sunday 24 October 2004 19:22, Dmitry Torokhov wrote:
>> On Sunday 24 October 2004 10:22 am, James Bottomley wrote:
>>> The trace doesn't show any error handler activity at all. Are there no
>>> messages in the log about offlining the device? If not, it sounds like
>>> there's a problem somewhere in the firewire system.
>>
>> The thing is that the hang is happening only with CD/DVD devices (sr_mod)
>> and not with hard drives (sd_mod). As far as I can tell, SBP2 does not
>> differentiate between an HD and a CD, so I think there is something going
>> on in the sr_mod code.
>
> I can't see the start of the thread, but I have a very similar problem with
> external CD-ROMs in usb-2.0 cases. Whenever I unplug them, they kill the
> entire system (usb subsystem at first, then the whole kernel locks up).
...
> This is tested on 3 different controllers, 2 different motherboards and 3
> different kernels. 2.6.7-2.6.9 all have this problem. 2.6.5-rc3 works like
> a charm, and I haven't tested 2.6.6.
I have 2 different SBP-2 CD-RW drives, 3 different SBP-2 hard drives, and
use a USB memory stick occasionally. After the last kernel update (from
2.6.6-rc3 to 2.6.8.1) I noticed problems with device removals concerning
/dev/scd? for the first time. I did not use the drives very extensively
under the former kernel, so I am not sure whether that worked fine. One
thing is certain, however: there is no hassle with sd devices.
There were some more reports about the CD/DVD removal problems with
recent 2.6 kernels on linux1394-user/-devel, but nothing similar about
hard disks.
--
Stefan Richter
-=====-=-=-- =-=- ==---
http://arcgraph.de/sr/
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
2004-10-24 15:22 ` James Bottomley
2004-10-24 17:22 ` Dmitry Torokhov
2004-10-24 18:02 ` Olaf Hering
@ 2004-10-24 19:29 ` Olaf Hering
2005-07-31 14:43 ` Dan Hollis
2 siblings, 1 reply; 17+ messages in thread
From: Olaf Hering @ 2004-10-24 19:29 UTC (permalink / raw)
To: James Bottomley; +Cc: Dan Hollis, SCSI Mailing List, bcollins, linux1394-devel
On Sun, Oct 24, James Bottomley wrote:
> The trace doesn't show any error handler activity at all. Are there no
> messages in the log about offlining the device? If not, it sounds like
> there's a problem somewhere in the firewire system.
How should sbp2_remove_device get rid of the device? It calls
scsi_remove_host, which calls scsi_remove_device, which sets the state to
SDEV_CANCEL and then calls device_del. sr_do_ioctl expects SDEV_OFFLINE.
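
The mismatch can be shown with a small user-space model. It is a sketch
under the assumption stated above, namely that the ioctl path only refuses
devices in the offline state. The MODEL_SDEV_* names merely echo the kernel
states, and model_check_running_only() is a hypothetical stricter check,
not an existing kernel helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum model_sdev_state {
	MODEL_SDEV_RUNNING,
	MODEL_SDEV_CANCEL,   /* set on the removal path described above */
	MODEL_SDEV_OFFLINE   /* set when error handling gives up */
};

struct model_sdev { enum model_sdev_state state; };

/* Removal path as described: the state becomes CANCEL, never OFFLINE. */
static void model_remove_device(struct model_sdev *sdev)
{
	assert(sdev != NULL);
	sdev->state = MODEL_SDEV_CANCEL;
}

/* Check that only rejects OFFLINE (the behaviour attributed to sr_do_ioctl). */
static int model_check_offline_only(const struct model_sdev *sdev)
{
	assert(sdev != NULL);
	return sdev->state == MODEL_SDEV_OFFLINE ? -ENODEV : 0;
}

/* Hypothetical stricter check: anything but RUNNING is refused. */
static int model_check_running_only(const struct model_sdev *sdev)
{
	assert(sdev != NULL);
	return sdev->state == MODEL_SDEV_RUNNING ? 0 : -ENODEV;
}
```

After model_remove_device(), the offline-only check still returns 0, so a
command would be issued to a device that is being torn down, while the
stricter check returns -ENODEV.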
--
USB is for mice, FireWire is for men!
sUse lINUX ag, nÜRNBERG
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
2004-10-24 14:42 ` 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1 Olaf Hering
2004-10-24 14:53 ` James Bottomley
@ 2004-10-31 21:05 ` Herbert Schmid
1 sibling, 0 replies; 17+ messages in thread
From: Herbert Schmid @ 2004-10-31 21:05 UTC (permalink / raw)
To: Olaf Hering; +Cc: Dan Hollis, linux-scsi, bcollins, linux1394-devel
Hi,
I've tested the SCSI bugs with 2.6.9, 2.6.9 with ieee1394 svn-trunk 1233
from today, and 2.6.10-rc1, with and without the patch from Olaf.
Results:
Removing an unused, unmounted hard disk from 1394 was no problem at all.
Removing an unused, unmounted CD-ROM drive from 1394 FAILS if
scsi-cdrom is compiled in and
Olaf's patch is not used.
FAIL means that cat /proc/scsi/scsi can't even be killed with kill -9.
Using only scsi generic, adding and removing was detected correctly.
Using the patch, you can even remove the CD-ROM drive while a disk is
mounted (ls shows nothing, not even . and ..). You can
unmount the disk later. With Linux 2.6.10-rc1, SCSI is unstable afterwards
and may lock up later, especially if you use USB in addition.
Removing any FireWire device (disk or CD) during use always kills
SCSI. From my (user) point of view, this seems to be a different problem,
which doesn't make Olaf's patch useless.
USB
Removing a USB CD-ROM device without the patch will kill the whole kernel.
Using the patch, the behavior is like with 1394, i.e. cat /proc/scsi/scsi
works.
dd if=/dev/scdX
(only tested with the patch)
On 2.6.9 with the trunk from today: dd becomes unkillable, but
/proc/scsi/scsi still works.
(On 2.6.10, /proc/scsi/scsi or the whole kernel fails.)
USB stops this with an I/O error.
Overview:

test                        2.6.9        2.6.9fw1233    2.6.10rc1

add/remove without patch
  hd                        +            +              +
  cd
    no scsicd               +            +              +
    with scsicd             -            -              -
add/remove with patch
  with scsicd               +            +              +
  remove mounted            +            +              unstable
remove while access
  hd                        -            -              -
  with patch and scsi-cd
    cd                      -            only dd locked -

usb
add/remove without patch
  cd                        (untested)   (untested)     -
add/remove with patch       (untested)   +              unstable
remove while access         (untested)   i/o error      (untested)

Summary (only my opinion as user):
Comparing 2.6.10rc1 to 2.6.9, scsi lost stability.
The patch is good at least for 2.6.9.
Hot unplugging during use seems not to be managed well by scsi or 1394.
Yours,
Herbert
P.S.: 2.4.27 simply never removes the SCSI drive object from memory and
passes all tests. When a mounted device is removed, it keeps wrong data in
the cache, which is OK for CD-ROMs.
On Sun, 24 Oct 2004, Olaf Hering wrote:
> On Sun, Sep 05, Dan Hollis wrote:
>
> > Vendor: PIONEER Model: DVD-RW DVR-104 Rev: 1.31
> > Type: CD-ROM ANSI SCSI revision: 02
>
> > cat /proc/scsi/scsi, it shows up
> >
> > now turn drive off
> >
> > ieee1394: Node changed: 0-01:1023 -> 0-00:1023
> > ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[00309995505516f8]
> >
> > and now...
> > cat /proc/scsi/scsi
> >
> > entire firewire system is permanently out to lunch. the cat hangs and
> > can't be kill -9'd, no hotplug ever works again, the only solution is a
> > complete reboot.
>
> This is the backtrace:
>
> knodemgrd_0 D 00000000 0 296 1 307 293 (L-TLB)
> Call trace:
> [c000a228] __switch_to+0x48/0x70
> [c017736c] schedule+0x2b8/0x5e0
> [c0177914] wait_for_completion+0x7c/0xec
> [c5d2fb94] scsi_wait_req+0x64/0xac [scsi_mod]
> [c54f7598] sr_do_ioctl+0x70/0x240 [sr_mod]
> [c54f6a00] sr_packet+0x5c/0x9c [sr_mod]
> [c5d11a34] cdrom_get_disc_info+0x60/0xc4 [cdrom]
> [c5d121e4] cdrom_mrw_exit+0x1c/0x104 [cdrom]
> [c5d10b6c] unregister_cdrom+0xd0/0x104 [cdrom]
> [c54f61a8] sr_kref_release+0x54/0x80 [sr_mod]
> [c00ad96c] kref_put+0x60/0x70
> [c54f6924] sr_remove+0x50/0xd0 [sr_mod]
> [c00eb100] device_release_driver+0x1b8/0x1bc
> [c00eb2f0] bus_remove_device+0xc0/0x12c
> [c00e9620] device_del+0xa4/0x114
>
>
> The trouble starts in register_cdrom(): it sets the ->exit() function if
> the CD can do CDC_MRW_W.
> On unregister, it tries to send a packet to the device, which is already
> gone.
>
> How about this patch?
>
>
> diff -purN linux-2.6.9.orig/drivers/scsi/sr.c linux-2.6.9-olh/drivers/scsi/sr.c
> --- linux-2.6.9.orig/drivers/scsi/sr.c 2004-10-22 19:02:43.545400072 +0200
> +++ linux-2.6.9-olh/drivers/scsi/sr.c 2004-10-24 16:32:10.765682704 +0200
> @@ -916,6 +918,7 @@ static void sr_kref_release(struct kref
> struct gendisk *disk = cd->disk;
>
> spin_lock(&sr_index_lock);
> + cd->use = 0;
> clear_bit(disk->first_minor, sr_index_bits);
> spin_unlock(&sr_index_lock);
>
> diff -purN linux-2.6.9.orig/drivers/scsi/sr_ioctl.c linux-2.6.9-olh/drivers/scsi/sr_ioctl.c
> --- linux-2.6.9.orig/drivers/scsi/sr_ioctl.c 2004-10-22 19:02:43.547399768 +0200
> +++ linux-2.6.9-olh/drivers/scsi/sr_ioctl.c 2004-10-24 16:31:05.921540512 +0200
> @@ -86,6 +86,11 @@ int sr_do_ioctl(Scsi_CD *cd, struct pack
> struct request *req;
> int result, err = 0, retries = 0;
>
> + if(!cd->use) {
> + err = -ENODEV;
> + goto out;
> + }
> +
> SDev = cd->device;
> SRpnt = scsi_allocate_request(SDev, GFP_KERNEL);
> if (!SRpnt) {
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
2004-10-24 19:29 ` Olaf Hering
@ 2005-07-31 14:43 ` Dan Hollis
2005-07-31 17:31 ` Stefan Richter
2005-07-31 19:16 ` Olaf Hering
0 siblings, 2 replies; 17+ messages in thread
From: Dan Hollis @ 2005-07-31 14:43 UTC (permalink / raw)
To: Olaf Hering; +Cc: James Bottomley, SCSI Mailing List, bcollins, linux1394-devel
On Sun, 24 Oct 2004, Olaf Hering wrote:
> On Sun, Oct 24, James Bottomley wrote:
> > The trace doesn't show any error handler activity at all. Are there no
> > messages in the log about offlining the device? If not, it sounds like
> > there's a problem somewhere in the firewire system.
> How should sbp2_remove_device get rid of the device? It calls
> scsi_remove_host, which calls scsi_remove_device, which sets the state to
> SDEV_CANCEL and then calls device_del. sr_do_ioctl expects SDEV_OFFLINE.
Has there been any progress on this issue? I just got my _entire scsi
subsystem_ sent out to lunch because a firewire device lost power.
-Dan
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
2005-07-31 14:43 ` Dan Hollis
@ 2005-07-31 17:31 ` Stefan Richter
2005-07-31 17:54 ` Stefan Richter
2005-07-31 19:16 ` Olaf Hering
1 sibling, 1 reply; 17+ messages in thread
From: Stefan Richter @ 2005-07-31 17:31 UTC (permalink / raw)
To: SCSI Mailing List, linux1394-devel
Cc: Dan Hollis, Olaf Hering, James Bottomley, bcollins
Dan Hollis wrote:
> On Sun, 24 Oct 2004, Olaf Hering wrote:
>> On Sun, Oct 24, James Bottomley wrote:
>>>The trace doesn't show any error handler activity at all. Are there no
>>>messages in the log about offlining the device? If not, it sounds like
>>>there's a problem somewhere in the firewire system.
>>
>>How should sbp2_remove_device get rid of the device? It calls
>>scsi_remove_host, which calls scsi_remove_device, which sets the mode to
>>SDEV_CANCEL, then calls to device_del. sr_do_ioctl expects SDEV_OFFLINE.
This has been subtly changed in the latest sbp2 source from linux1394.org.
scsi_remove_device is now called before sbp2_logout_device and
sbp2_remove_device. The latter is still the caller of scsi_remove_host.
However, this little tweak only helps when sbp2 is unloaded or detached
from a device while the device is still physically present and working.
It does not improve the situation when a device was physically removed
while the drivers were still attached.
> Has there been any progress on this issue? I just got my _entire scsi
> subsystem_ sent out to lunch because a firewire device lost power.
The situation has got even worse. Since RBC conversions were moved from
sbp2 to sd_mod in linux-2.6.13-rc2, the same or a similar problem is now
exhibited by sd_mod too, with most FireWire harddisks. The culprit
in sd_mod is obviously sd_shutdown/sd_sync_cache, which blocks in
scsi_wait_req if the device is physically gone, thus effectively freezing
ieee1394's nodemgr thread. I have not yet looked closer into what the
equivalent in sr_mod might be.
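The blocking pattern described above can be sketched in user space. This
is only a rough model, not kernel code: req_model, device_poll and
wait_req_model are hypothetical names, and the poll limit exists solely
so that the demo terminates. It assumes that nothing other than the
device completing the request ever ends the wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a SCSI request awaiting completion. */
struct req_model { bool completed; };

/* The device side: only a device that is still there answers. */
static void device_poll(struct req_model *req, bool device_present)
{
    if (device_present)
        req->completed = true;    /* models the completion callback firing */
}

/*
 * Models a synchronous wait in the style of scsi_wait_req(): the caller
 * cannot proceed until the request completes. max_polls is an artifact
 * of the model so that it terminates.
 * Returns true if the request completed, false if the caller would
 * still be asleep.
 */
bool wait_req_model(bool device_present, int max_polls)
{
    struct req_model req = { .completed = false };

    assert(max_polls > 0);
    for (int i = 0; i < max_polls && !req.completed; i++)
        device_poll(&req, device_present);
    return req.completed;
}
```

With the device present the wait ends on the first poll; with the device
gone it never ends, which in the kernel means the calling thread (here
nodemgr) stays asleep.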
However, there is a positive side to this:
1. With harddisks triggering the same or a similar problem now, the
   pressure to finally fix this extremely annoying bug has become really
   high. I am trying to understand the nature of the problem right now,
   but I am new to linux-scsi land.
2. At least unplugging should be less of a problem if you always remember
   to detach or unload sbp2 first. (Or alternatively sr_mod/sd_mod?) Of
   course this does not help if power to the SBP-2 device failed or
   somebody tripped over the wire. And detaching the driver manually is
   of course quite awkward.
--
Stefan Richter
-=====-=-=-= -=== =====
http://arcgraph.de/sr/
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
2005-07-31 17:31 ` Stefan Richter
@ 2005-07-31 17:54 ` Stefan Richter
2005-07-31 17:59 ` Christoph Hellwig
0 siblings, 1 reply; 17+ messages in thread
From: Stefan Richter @ 2005-07-31 17:54 UTC (permalink / raw)
To: linux-scsi, linux1394-devel
PS: Documentation/scsi/scsi_mid_low_api.txt suggests:
> An LLD that detects the removal of a SCSI device can instigate
> its removal from upper layers with this sequence:
>
> SCSI DEVICE hot unplug
> LLD mid level LLD
> ===----------------------=========-----------------===------
> scsi_remove_device() -------+
> |
> slave_destroy()
> ------------------------------------------------------------
Well, the LLD can _try_ it, but it may go horribly wrong.
--
Stefan Richter
-=====-=-=-= -=== =====
http://arcgraph.de/sr/
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
2005-07-31 17:54 ` Stefan Richter
@ 2005-07-31 17:59 ` Christoph Hellwig
2005-07-31 18:08 ` Stefan Richter
0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2005-07-31 17:59 UTC (permalink / raw)
To: Stefan Richter; +Cc: linux-scsi, linux1394-devel
On Sun, Jul 31, 2005 at 07:54:59PM +0200, Stefan Richter wrote:
> Well, the LLD can _try_ it, but it may go horribly wrong.
It seems to work nicely for everyone but sbp2..
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
2005-07-31 17:59 ` Christoph Hellwig
@ 2005-07-31 18:08 ` Stefan Richter
0 siblings, 0 replies; 17+ messages in thread
From: Stefan Richter @ 2005-07-31 18:08 UTC (permalink / raw)
To: linux-scsi, linux1394-devel
Christoph Hellwig wrote:
> On Sun, Jul 31, 2005 at 07:54:59PM +0200, Stefan Richter wrote:
>>Well, the LLD can _try_ it, but it may go horribly wrong.
>
> It seems to work nicely for everyone but sbp2..
Sure. There is probably something in sbp2 or nodemgr that leads
to a deadlock, although the end of the road as we can see it is
located in the scsi highlevel.
--
Stefan Richter
-=====-=-=-= -=== =====
http://arcgraph.de/sr/
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
2005-07-31 14:43 ` Dan Hollis
2005-07-31 17:31 ` Stefan Richter
@ 2005-07-31 19:16 ` Olaf Hering
2005-07-31 22:57 ` [PATCH] ieee1394/sbp2: fix for hot-unplug Stefan Richter
2005-07-31 23:14 ` 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1 Stefan Richter
1 sibling, 2 replies; 17+ messages in thread
From: Olaf Hering @ 2005-07-31 19:16 UTC (permalink / raw)
To: Dan Hollis; +Cc: James Bottomley, SCSI Mailing List, bcollins, linux1394-devel
On Sun, Jul 31, Dan Hollis wrote:
> On Sun, 24 Oct 2004, Olaf Hering wrote:
> > On Sun, Oct 24, James Bottomley wrote:
> > > The trace doesn't show any error handler activity at all. Are there no
> > > messages in the log about offlining the device? If not, it sounds like
> > > there's a problem somewhere in the firewire system.
> > How should sbp2_remove_device get rid of the device? It calls
> > scsi_remove_host, which calls scsi_remove_device, which sets the mode to
> > SDEV_CANCEL, then calls to device_del. sr_do_ioctl expects SDEV_OFFLINE.
>
> Has there been any progress on this issue? I just got my _entire scsi
> subsystem_ sent out to lunch because a firewire device lost power.
I can't help you with that; sr_mod is likely still broken with
MRW-capable drives. But sbp2 is still the culprit: incomplete error
handling.
* [PATCH] ieee1394/sbp2: fix for hot-unplug
2005-07-31 19:16 ` Olaf Hering
@ 2005-07-31 22:57 ` Stefan Richter
2005-07-31 23:14 ` 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1 Stefan Richter
1 sibling, 0 replies; 17+ messages in thread
From: Stefan Richter @ 2005-07-31 22:57 UTC (permalink / raw)
To: linux-scsi, linux1394-devel
When a FireWire CD-ROM or RBC harddisk was detached, the ieee1394
subsystem and probably the scsi subsystem were locked up. The fix
unblocks the sbp2 host before attempting to remove scsi devices.
Commands from scsi highlevel are no longer blocked, but rather
completed with DID_NO_CONNECT when the FireWire node was detached.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Index: sbp2.c
===================================================================
--- sbp2.c (revision 1316)
+++ sbp2.c (working copy)
@@ -637,8 +637,11 @@
scsi_id = ud->device.driver_data;
/* Trigger shutdown functions in scsi's highlevel. */
- if (scsi_id && scsi_id->sdev)
+ if (scsi_id) {
+ BUG_ON(!scsi_id->scsi_host || !scsi_id->sdev);
+ scsi_unblock_requests(scsi_id->scsi_host);
scsi_remove_device(scsi_id->sdev);
+ }
sbp2_logout_device(scsi_id);
sbp2_remove_device(scsi_id);
@@ -2310,6 +2313,7 @@
struct scsi_id_instance_data *scsi_id =
(struct scsi_id_instance_data *)SCpnt->device->host->hostdata[0];
struct sbp2scsi_host_info *hi;
+ int result = DID_NO_CONNECT << 16;
SBP2_DEBUG("sbp2scsi_queuecommand");
@@ -2317,30 +2321,28 @@
* If scsi_id is null, it means there is no device in this slot,
* so we should return selection timeout.
*/
- if (!scsi_id) {
- SCpnt->result = DID_NO_CONNECT << 16;
- done (SCpnt);
- return 0;
- }
+ if (!scsi_id)
+ goto done;
+ /*
+ * Node was detached.
+ */
+ if (scsi_id->ne->in_limbo)
+ goto done;
+
hi = scsi_id->hi;
if (!hi) {
SBP2_ERR("sbp2scsi_host_info is NULL - this is bad!");
- SCpnt->result = DID_NO_CONNECT << 16;
- done (SCpnt);
- return(0);
+ goto done;
}
/*
* Until we handle multiple luns, just return selection time-out
* to any IO directed at non-zero LUNs
*/
- if (SCpnt->device->lun) {
- SCpnt->result = DID_NO_CONNECT << 16;
- done (SCpnt);
- return(0);
- }
+ if (SCpnt->device->lun)
+ goto done;
/*
* Check for request sense command, and handle it here
@@ -2351,7 +2353,7 @@
memcpy(SCpnt->request_buffer, SCpnt->sense_buffer, SCpnt->request_bufflen);
memset(SCpnt->sense_buffer, 0, sizeof(SCpnt->sense_buffer));
sbp2scsi_complete_command(scsi_id, SBP2_SCSI_STATUS_GOOD, SCpnt, done);
- return(0);
+ return 0;
}
/*
@@ -2359,9 +2361,8 @@
*/
if (!hpsb_node_entry_valid(scsi_id->ne)) {
SBP2_ERR("Bus reset in progress - rejecting command");
- SCpnt->result = DID_BUS_BUSY << 16;
- done (SCpnt);
- return(0);
+ result = DID_BUS_BUSY << 16;
+ goto done;
}
/*
@@ -2372,8 +2373,12 @@
sbp2scsi_complete_command(scsi_id, SBP2_SCSI_STATUS_SELECTION_TIMEOUT,
SCpnt, done);
}
+ return 0;
- return(0);
+done:
+ SCpnt->result = result;
+ done(SCpnt);
+ return 0;
}
/*
* Re: 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1
2005-07-31 19:16 ` Olaf Hering
2005-07-31 22:57 ` [PATCH] ieee1394/sbp2: fix for hot-unplug Stefan Richter
@ 2005-07-31 23:14 ` Stefan Richter
1 sibling, 0 replies; 17+ messages in thread
From: Stefan Richter @ 2005-07-31 23:14 UTC (permalink / raw)
To: linux-scsi, linux1394-devel; +Cc: goemon, James.Bottomley, bcollins, olh
On 31 Jul, Olaf Hering wrote:
> On Sun, Jul 31, Dan Hollis wrote:
>> On Sun, 24 Oct 2004, Olaf Hering wrote:
>> > On Sun, Oct 24, James Bottomley wrote:
>> > > The trace doesn't show any error handler activity at all. Are there no
>> > > messages in the log about offlining the device? If not, it sounds like
>> > > there's a problem somewhere in the firewire system.
>> > How should sbp2_remove_device get rid of the device? It calls
>> > scsi_remove_host, which calls scsi_remove_device, which sets the mode to
>> > SDEV_CANCEL, then calls to device_del. sr_do_ioctl expects SDEV_OFFLINE.
>>
>> Has there been any progress on this issue? I just got my _entire scsi
>> subsystem_ sent out to lunch because a firewire device lost power.
_Now_ there has been progress. :-) The patch I just posted should fix
it. I tested it with a CD-R/W as well as with a harddisk that was
affected. The patch applies to the latest sbp2 revision at linux1394.org
which is not yet merged into mainline.
> I can't help you with that; sr_mod is likely still broken with
> MRW-capable drives. But sbp2 is still the culprit: incomplete error
> handling.
Sbp2 is one of the very few LLDs that use scsi_block_requests(). It is
called whenever the FireWire bus is reset, e.g. when a node is
attached, detached, or switched off. The patch simply unblocks sbp2's
host before device removal is attempted in the upper scsi layers. It
also changes how commands are handled once a FireWire node has been
detached: they are completed with DID_NO_CONNECT instead of staying
blocked. (I think we could get rid of scsi_block_requests() altogether
if we get the .queuecommand and .eh_*_handler right.)
So sbp2 was indeed the culprit, although the fault was not in its error
handling.
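For readers who want the two halves of the patch in one place, here is a
minimal user-space sketch. It is a model under stated assumptions, not
the sbp2 code: node_model, host_model, cmd_model, RESULT_PENDING and all
three functions are hypothetical, and only the DID_* values are taken
from the kernel's scsi.h. It shows a blocked host leaving the final
command unanswered, and an unblocked host completing it immediately with
DID_NO_CONNECT through a single exit path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Host byte codes, with the values the kernel's scsi.h gives them. */
#define DID_OK         0x00
#define DID_NO_CONNECT 0x01
#define DID_BUS_BUSY   0x02

#define RESULT_PENDING (-1)   /* hypothetical marker: never completed */

/* Hypothetical stand-ins for the sbp2 and SCSI structures. */
struct node_model { bool in_limbo; bool valid; };
struct host_model { bool blocked; struct node_model *ne; };
struct cmd_model  { unsigned int lun; int result; };

/* Models the patched queuecommand: one exit path for all failures. */
static int queuecommand_model(struct host_model *host, struct cmd_model *cmd)
{
    int result = DID_NO_CONNECT << 16;   /* default: the target is gone */

    if (!host->ne)                /* no device in this slot */
        goto done;
    if (host->ne->in_limbo)       /* node was detached */
        goto done;
    if (cmd->lun)                 /* only LUN 0 is served */
        goto done;
    if (!host->ne->valid) {       /* bus reset in progress */
        result = DID_BUS_BUSY << 16;
        goto done;
    }
    result = DID_OK << 16;        /* stand-in for a real transfer */
done:
    cmd->result = result;         /* models setting result and calling done() */
    return 0;
}

/* Models the mid layer: a blocked host never sees the command. */
static void dispatch_model(struct host_model *host, struct cmd_model *cmd)
{
    assert(host != NULL && cmd != NULL);
    cmd->result = RESULT_PENDING;
    if (host->blocked)
        return;                   /* stays queued; a waiter would sleep on */
    queuecommand_model(host, cmd);
}

/*
 * Issue the last command that removal triggers (for example a cache
 * flush) to a detached node and return its result.
 */
int remove_detached_device(bool unblock_first)
{
    struct node_model ne = { .in_limbo = true, .valid = false };
    struct host_model host = { .blocked = true, .ne = &ne };
    struct cmd_model cmd = { .lun = 0, .result = RESULT_PENDING };

    if (unblock_first)
        host.blocked = false;     /* models scsi_unblock_requests() */
    dispatch_model(&host, &cmd);
    return cmd.result;
}
```

Calling remove_detached_device(false) leaves the command pending, which
corresponds to the hang; remove_detached_device(true) returns
DID_NO_CONNECT << 16, so a waiter in the upper layers gets an answer and
removal can continue.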
--
Stefan Richter
-=====-=-=-= =--- ----=
http://arcgraph.de/sr/
end of thread, other threads:[~2005-07-31 23:18 UTC | newest]
Thread overview: 17+ messages
[not found] <Pine.LNX.4.44.0409050434540.8779-100000@sasami.anime.net>
2004-10-24 14:42 ` 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1 Olaf Hering
2004-10-24 14:53 ` James Bottomley
2004-10-24 15:00 ` Olaf Hering
2004-10-24 15:22 ` James Bottomley
2004-10-24 17:22 ` Dmitry Torokhov
2004-10-24 18:02 ` Olaf Hering
2004-10-24 19:29 ` Olaf Hering
2005-07-31 14:43 ` Dan Hollis
2005-07-31 17:31 ` Stefan Richter
2005-07-31 17:54 ` Stefan Richter
2005-07-31 17:59 ` Christoph Hellwig
2005-07-31 18:08 ` Stefan Richter
2005-07-31 19:16 ` Olaf Hering
2005-07-31 22:57 ` [PATCH] ieee1394/sbp2: fix for hot-unplug Stefan Richter
2005-07-31 23:14 ` 100% repeatable way to send firewire out to lunch permanently on 2.6.8.1 Stefan Richter
2004-10-31 21:05 ` Herbert Schmid
[not found] <200410241930.47104.chrivers@iversen-net.dk>
2004-10-24 18:12 ` Stefan Richter