* [Bug 9734] New: I/O error when inserting a second firewire sata disk
@ 2008-01-12 15:47 bugme-daemon
2008-01-12 16:02 ` [Bug 9734] " bugme-daemon
2008-01-13 1:11 ` [Bug 9734] New: " James Bottomley
0 siblings, 2 replies; 7+ messages in thread
From: bugme-daemon @ 2008-01-12 15:47 UTC (permalink / raw)
To: linux-scsi
http://bugzilla.kernel.org/show_bug.cgi?id=9734
Summary: I/O error when inserting a second firewire sata disk
Product: IO/Storage
Version: 2.5
KernelVersion: 2.6.24 rc7
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: SCSI
AssignedTo: linux-scsi@vger.kernel.org
ReportedBy: sbu@squadrainformatica.com
Latest working kernel version: 2.6.18-5
(can't test any other kernel between 2.6.18-5 and 2.6.22-3 because aren't in
the repo)
Earliest failing kernel version: 2.6.22-3
Distribution: debian lenny 32bit AND 64 bit (it happens the same on both)
Hardware Environment: ibm x3400 quad core 2GB ram type/number 7976-KBG bios
version 1.56 http://www-03.ibm.com/systems/x/tower/x3400/specs.html
cpu info: -> http://www.pastebin.org/15078
lspci firewire: -> http://www.pastebin.org/15081
lsmod: -> http://www.pastebin.org/15083
Software Environment: bash
Problem Description:
We have a 64bit pci double firewire 800 port to which I attach 2 sata hdd (it
doesn't matter the brand of the hdd. We tried many)
Everything works well until we use only one disk. Connecting a second disk when
we are working on the first one (i.e. writing to the first device) causes the
interruption of the first job. once happened also a kernel freeze but we can't
document and reproduce it right now.
Steps to reproduce:
We attach a sata disk to the first port of the pci firewire B controller
08:02.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer
Controller (rev 01)
Jan 12 16:19:13 x3400 kernel: ieee1394: Error parsing configrom for node
1-00:1023
Jan 12 16:19:13 x3400 kernel: ieee1394: Node changed: 1-00:1023 -> 1-01:1023
Jan 12 16:19:14 x3400 kernel: ieee1394: Node resumed: ID:BUS[1-00:1023]
GUID[0030e002e0454697]
Jan 12 16:19:14 x3400 kernel: scsi9 : SBP-2 IEEE-1394
Jan 12 16:19:15 x3400 kernel: ieee1394: sbp2: Logged into SBP-2 device
Jan 12 16:19:15 x3400 kernel: ieee1394: Node 1-00:1023: Max speed [S800] - Max
payload [4096]
Jan 12 16:19:15 x3400 kernel: Vendor: WDC WD16 Model: 00JD-00HBC0 Rev:
08.0
Jan 12 16:19:15 x3400 kernel: Type: Direct-Access-RBC ANSI
SCSI revision: 04
Jan 12 16:19:15 x3400 kernel: SCSI device sdc: 312579695 512-byte hdwr sectors
(160041 MB)
Jan 12 16:19:15 x3400 kernel: sdc: Write Protect is off
Jan 12 16:19:15 x3400 kernel: sdc: Mode Sense: 11 00 00 00
Jan 12 16:19:15 x3400 kernel: SCSI device sdc: drive cache: write back
Jan 12 16:19:15 x3400 kernel: SCSI device sdc: 312579695 512-byte hdwr sectors
(160041 MB)
Jan 12 16:19:15 x3400 kernel: sdc: Write Protect is off
Jan 12 16:19:15 x3400 kernel: sdc: Mode Sense: 11 00 00 00
Jan 12 16:19:15 x3400 kernel: SCSI device sdc: drive cache: write back
Jan 12 16:19:15 x3400 kernel: sdc: unknown partition table
Jan 12 16:19:15 x3400 kernel: sd 9:0:0:0: Attached scsi disk sdc
now we launch :
dd if=/dev/zero of=/dev/sdc
everything ok until now
now we attach another sata disc to the second port of the pci firewire
controller:
Jan 12 16:20:09 x3400 kernel: ieee1394: Error parsing configrom for node
1-00:1023
Jan 12 16:20:09 x3400 kernel: ieee1394: Node changed: 1-00:1023 -> 1-01:1023
Jan 12 16:20:09 x3400 kernel: ieee1394: Node changed: 1-01:1023 -> 1-02:1023
Jan 12 16:20:10 x3400 kernel: ieee1394: Reconnected to SBP-2 device
Jan 12 16:20:10 x3400 kernel: ieee1394: Node 1-01:1023: Max speed [S800] - Max
payload [4096]
using 2.6.18-5-686 everything works well
dd still works
now we disconnect the disk from the second port
Jan 12 16:21:13 x3400 kernel: ieee1394: Node changed: 1-01:1023 -> 1-00:1023
Jan 12 16:21:13 x3400 kernel: ieee1394: Node changed: 1-02:1023 -> 1-01:1023
Jan 12 16:21:13 x3400 kernel: ieee1394: Reconnected to SBP-2 device
Jan 12 16:21:13 x3400 kernel: ieee1394: Node 1-00:1023: Max speed [S800] - Max
payload [4096]
everything ok also disconnecting the second device
-------------------------------------------------------
now the same issue using 2.6.24-rc7-686:
we attach a sata disk to the first port of the pci firewire B controller
Jan 12 16:49:45 x3400 kernel: firewire_core: phy config: card 1, new root=ffc1,
gap_count=5
Jan 12 16:49:46 x3400 kernel: scsi11 : SBP-2 IEEE-1394
Jan 12 16:49:46 x3400 kernel: firewire_core: created new fw device fw2 (2
config rom retries, S800)
Jan 12 16:49:46 x3400 kernel: firewire_sbp2: logged in to fw2.0 LUN 0000 (0
retries)
Jan 12 16:49:46 x3400 kernel: scsi 11:0:0:0: Direct-Access-RBC WDC WD16
00JD-00HBC0 08.0 PQ: 0 ANSI: 4
Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] 312579695 512-byte hardware
sectors (160041 MB)
Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] Write Protect is off
Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] Mode Sense: 11 00 00 00
Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] Write cache: enabled, read
cache: enabled, doesn't support DPO or FUA
Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] 312579695 512-byte hardware
sectors (160041 MB)
Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] Write Protect is off
Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] Mode Sense: 11 00 00 00
Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] Write cache: enabled, read
cache: enabled, doesn't support DPO or FUA
Jan 12 16:49:46 x3400 kernel: sdc: unknown partition table
Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] Attached SCSI disk
now we launch :
dd if=/dev/zero of=/dev/sdc
now we attach another sata disk to the second port of the pci firewire B
controller:
Jan 12 16:50:49 x3400 kernel: firewire_sbp2: orb reply timed out, rcode=0x11
Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
driverbyte=DRIVER_OK,SUGGEST_OK
Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2942302
Jan 12 16:50:49 x3400 kernel: printk: 571588 messages suppressed.
Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
2942302
Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
2942303
Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
2942304
Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
2942305
Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
2942306
Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
2942307
Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
2942308
Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
2942309
Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
2942310
Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
2942311
Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
driverbyte=DRIVER_OK,SUGGEST_OK
Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2942557
Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
driverbyte=DRIVER_OK,SUGGEST_OK
Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2942812
Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
driverbyte=DRIVER_OK,SUGGEST_OK
Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2943067
Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
driverbyte=DRIVER_OK,SUGGEST_OK
Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2943322
Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
driverbyte=DRIVER_OK,SUGGEST_OK
Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2943577
Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
driverbyte=DRIVER_OK,SUGGEST_OK
Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2943832
Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
driverbyte=DRIVER_OK,SUGGEST_OK
Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2944087
Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
driverbyte=DRIVER_OK,SUGGEST_OK
Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2944342
Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
driverbyte=DRIVER_OK,SUGGEST_OK
Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2944597
it goes on this way until when we kill the dd
thanks in advance
damko & divilinux
--
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
^ permalink raw reply [flat|nested] 7+ messages in thread* [Bug 9734] I/O error when inserting a second firewire sata disk
2008-01-12 15:47 [Bug 9734] New: I/O error when inserting a second firewire sata disk bugme-daemon
@ 2008-01-12 16:02 ` bugme-daemon
2008-01-13 1:11 ` [Bug 9734] New: " James Bottomley
1 sibling, 0 replies; 7+ messages in thread
From: bugme-daemon @ 2008-01-12 16:02 UTC (permalink / raw)
To: linux-scsi
http://bugzilla.kernel.org/show_bug.cgi?id=9734
bunk@kernel.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |bunk@kernel.org
AssignedTo|linux-scsi@vger.kernel.org |drivers_ieee1394@kernel-
| |bugs.osdl.org
Component|SCSI |IEEE1394
Product|IO/Storage |Drivers
Regression|0 |1
--
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
You are the assignee for the bug, or are watching the assignee.
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [Bug 9734] New: I/O error when inserting a second firewire sata disk
2008-01-12 15:47 [Bug 9734] New: I/O error when inserting a second firewire sata disk bugme-daemon
2008-01-12 16:02 ` [Bug 9734] " bugme-daemon
@ 2008-01-13 1:11 ` James Bottomley
2008-01-13 9:06 ` Stefan Richter
2008-01-19 13:20 ` Stefan Richter
1 sibling, 2 replies; 7+ messages in thread
From: James Bottomley @ 2008-01-13 1:11 UTC (permalink / raw)
To: bugme-daemon; +Cc: linux-scsi, Kristian Hoegsberg, linux1394-devel
Firewire list cc'd
On Sat, 2008-01-12 at 07:47 -0800, bugme-daemon@bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9734
>
> Summary: I/O error when inserting a second firewire sata disk
> Product: IO/Storage
> Version: 2.5
> KernelVersion: 2.6.24 rc7
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: high
> Priority: P1
> Component: SCSI
> AssignedTo: linux-scsi@vger.kernel.org
> ReportedBy: sbu@squadrainformatica.com
>
>
> Latest working kernel version: 2.6.18-5
> (can't test any other kernel between 2.6.18-5 and 2.6.22-3 because aren't in
> the repo)
>
> Earliest failing kernel version: 2.6.22-3
>
> Distribution: debian lenny 32bit AND 64 bit (it happens the same on both)
>
> Hardware Environment: ibm x3400 quad core 2GB ram type/number 7976-KBG bios
> version 1.56 http://www-03.ibm.com/systems/x/tower/x3400/specs.html
> cpu info: -> http://www.pastebin.org/15078
> lspci firewire: -> http://www.pastebin.org/15081
> lsmod: -> http://www.pastebin.org/15083
>
> Software Environment: bash
>
> Problem Description:
> We have a 64bit pci double firewire 800 port to which I attach 2 sata hdd (it
> doesn't matter the brand of the hdd. We tried many)
> Everything works well until we use only one disk. Connecting a second disk when
> we are working on the first one (i.e. writing to the first device) causes the
> interruption of the first job. once happened also a kernel freeze but we can't
> document and reproduce it right now.
>
> Steps to reproduce:
> We attach a sata disk to the first port of the pci firewire B controller
>
> 08:02.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer
> Controller (rev 01)
> Jan 12 16:19:13 x3400 kernel: ieee1394: Error parsing configrom for node
> 1-00:1023
> Jan 12 16:19:13 x3400 kernel: ieee1394: Node changed: 1-00:1023 -> 1-01:1023
> Jan 12 16:19:14 x3400 kernel: ieee1394: Node resumed: ID:BUS[1-00:1023]
> GUID[0030e002e0454697]
> Jan 12 16:19:14 x3400 kernel: scsi9 : SBP-2 IEEE-1394
> Jan 12 16:19:15 x3400 kernel: ieee1394: sbp2: Logged into SBP-2 device
> Jan 12 16:19:15 x3400 kernel: ieee1394: Node 1-00:1023: Max speed [S800] - Max
> payload [4096]
> Jan 12 16:19:15 x3400 kernel: Vendor: WDC WD16 Model: 00JD-00HBC0 Rev:
> 08.0
> Jan 12 16:19:15 x3400 kernel: Type: Direct-Access-RBC ANSI
> SCSI revision: 04
> Jan 12 16:19:15 x3400 kernel: SCSI device sdc: 312579695 512-byte hdwr sectors
> (160041 MB)
> Jan 12 16:19:15 x3400 kernel: sdc: Write Protect is off
> Jan 12 16:19:15 x3400 kernel: sdc: Mode Sense: 11 00 00 00
> Jan 12 16:19:15 x3400 kernel: SCSI device sdc: drive cache: write back
> Jan 12 16:19:15 x3400 kernel: SCSI device sdc: 312579695 512-byte hdwr sectors
> (160041 MB)
> Jan 12 16:19:15 x3400 kernel: sdc: Write Protect is off
> Jan 12 16:19:15 x3400 kernel: sdc: Mode Sense: 11 00 00 00
> Jan 12 16:19:15 x3400 kernel: SCSI device sdc: drive cache: write back
> Jan 12 16:19:15 x3400 kernel: sdc: unknown partition table
> Jan 12 16:19:15 x3400 kernel: sd 9:0:0:0: Attached scsi disk sdc
>
> now we launch :
> dd if=/dev/zero of=/dev/sdc
>
> everything ok until now
>
> now we attach another sata disc to the second port of the pci firewire
> controller:
>
> Jan 12 16:20:09 x3400 kernel: ieee1394: Error parsing configrom for node
> 1-00:1023
> Jan 12 16:20:09 x3400 kernel: ieee1394: Node changed: 1-00:1023 -> 1-01:1023
> Jan 12 16:20:09 x3400 kernel: ieee1394: Node changed: 1-01:1023 -> 1-02:1023
> Jan 12 16:20:10 x3400 kernel: ieee1394: Reconnected to SBP-2 device
> Jan 12 16:20:10 x3400 kernel: ieee1394: Node 1-01:1023: Max speed [S800] - Max
> payload [4096]
>
> using 2.6.18-5-686 everything works well
> dd still works
>
> now we disconnect the disk from the second port
>
> Jan 12 16:21:13 x3400 kernel: ieee1394: Node changed: 1-01:1023 -> 1-00:1023
> Jan 12 16:21:13 x3400 kernel: ieee1394: Node changed: 1-02:1023 -> 1-01:1023
> Jan 12 16:21:13 x3400 kernel: ieee1394: Reconnected to SBP-2 device
> Jan 12 16:21:13 x3400 kernel: ieee1394: Node 1-00:1023: Max speed [S800] - Max
> payload [4096]
>
> everything ok also disconnecting the second device
>
> -------------------------------------------------------
>
> now the same issue using 2.6.24-rc7-686:
>
> we attach a sata disk to the first port of the pci firewire B controller
>
> Jan 12 16:49:45 x3400 kernel: firewire_core: phy config: card 1, new root=ffc1,
> gap_count=5
> Jan 12 16:49:46 x3400 kernel: scsi11 : SBP-2 IEEE-1394
> Jan 12 16:49:46 x3400 kernel: firewire_core: created new fw device fw2 (2
> config rom retries, S800)
> Jan 12 16:49:46 x3400 kernel: firewire_sbp2: logged in to fw2.0 LUN 0000 (0
> retries)
> Jan 12 16:49:46 x3400 kernel: scsi 11:0:0:0: Direct-Access-RBC WDC WD16
> 00JD-00HBC0 08.0 PQ: 0 ANSI: 4
> Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] 312579695 512-byte hardware
> sectors (160041 MB)
> Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] Write Protect is off
> Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] Mode Sense: 11 00 00 00
> Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] Write cache: enabled, read
> cache: enabled, doesn't support DPO or FUA
> Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] 312579695 512-byte hardware
> sectors (160041 MB)
> Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] Write Protect is off
> Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] Mode Sense: 11 00 00 00
> Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] Write cache: enabled, read
> cache: enabled, doesn't support DPO or FUA
> Jan 12 16:49:46 x3400 kernel: sdc: unknown partition table
> Jan 12 16:49:46 x3400 kernel: sd 11:0:0:0: [sdc] Attached SCSI disk
>
> now we launch :
> dd if=/dev/zero of=/dev/sdc
>
> now we attach another sata disk to the second port of the pci firewire B
> controller:
>
> Jan 12 16:50:49 x3400 kernel: firewire_sbp2: orb reply timed out, rcode=0x11
> Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
> driverbyte=DRIVER_OK,SUGGEST_OK
Best I can tell, this is the source of the problem. The sbp2 driver is
replying DID_BUS_BUSY until that gets sorted out, which seems to be
never.
So, first pass analysis indicates the error to be in the firewire
subsystem. I'm guessing from the message that it's actually
drivers/firewire, not drivers iee1934?
> Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2942302
> Jan 12 16:50:49 x3400 kernel: printk: 571588 messages suppressed.
> Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
> 2942302
> Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
> Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
> 2942303
> Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
> Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
> 2942304
> Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
> Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
> 2942305
> Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
> Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
> 2942306
> Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
> Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
> 2942307
> Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
> Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
> 2942308
> Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
> Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
> 2942309
> Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
> Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
> 2942310
> Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
> Jan 12 16:50:49 x3400 kernel: Buffer I/O error on device sdc, logical block
> 2942311
> Jan 12 16:50:49 x3400 kernel: lost page write due to I/O error on sdc
> Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
> driverbyte=DRIVER_OK,SUGGEST_OK
> Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2942557
> Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
> driverbyte=DRIVER_OK,SUGGEST_OK
> Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2942812
> Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
> driverbyte=DRIVER_OK,SUGGEST_OK
> Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2943067
> Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
> driverbyte=DRIVER_OK,SUGGEST_OK
> Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2943322
> Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
> driverbyte=DRIVER_OK,SUGGEST_OK
> Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2943577
> Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
> driverbyte=DRIVER_OK,SUGGEST_OK
> Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2943832
> Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
> driverbyte=DRIVER_OK,SUGGEST_OK
> Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2944087
> Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
> driverbyte=DRIVER_OK,SUGGEST_OK
> Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2944342
> Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
> driverbyte=DRIVER_OK,SUGGEST_OK
> Jan 12 16:50:49 x3400 kernel: end_request: I/O error, dev sdc, sector 2944597
>
> it goes on this way until when we kill the dd
>
> thanks in advance
>
> damko & divilinux
James
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bug 9734] New: I/O error when inserting a second firewire sata disk
2008-01-13 1:11 ` [Bug 9734] New: " James Bottomley
@ 2008-01-13 9:06 ` Stefan Richter
2008-02-03 2:10 ` Stefan Richter
2008-01-19 13:20 ` Stefan Richter
1 sibling, 1 reply; 7+ messages in thread
From: Stefan Richter @ 2008-01-13 9:06 UTC (permalink / raw)
To: James Bottomley
Cc: Kristian Hoegsberg, linux1394-devel, linux-scsi, bugme-daemon
James Bottomley wrote:
> Firewire list cc'd
It's indeed a problem in the firewire stack.
...
>> now we attach another sata disk to the second port of the pci firewire B
>> controller:
>>
>> Jan 12 16:50:49 x3400 kernel: firewire_sbp2: orb reply timed out, rcode=0x11
>> Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
>> driverbyte=DRIVER_OK,SUGGEST_OK
>
> Best I can tell, this is the source of the problem. The sbp2 driver is
> replying DID_BUS_BUSY until that gets sorted out, which seems to be
> never.
When something was plugged in or out at the same bus, fw-sbp2 has to
reconnect == renew the login to each logical unit. The syslog in the
report is inconclusive whether that happened or failed. There should be
one of
"reconnected to ... LUN ... (... retries)"
or "failed to reconnect to ..."
logged eventually. As long as the reconnect didn't succeed, fw-sbp2
quits newly enqueued commands with DID_BUS_BUSY.
I need a bit spare time and a way to reproduce this or similar things to
get to the cause of this. (I had random reconnect problems with fw-sbp2
now and then, but not yet 100% reproducible ones like the reporter has.)
The last tested kernel in the report, 2.6.24-rc7, already contains the
latest and greatest firewire code as far as problem at hand is concerned.
> So, first pass analysis indicates the error to be in the firewire
> subsystem. I'm guessing from the message that it's actually
> drivers/firewire, not drivers iee1934?
Yes.
As a side note, the old sbp2 driver does not quit commands with
DID_BUS_BUSY between bus reset and reconnect. Instead it blocks the
Scsi_Host in order to not receive commands during that time at all.
The time from bus reset to successful reconnect is typically circa a
second, on bigger buses with lots of protocol activity after bus reset
perhaps a few seconds. (Could IMO still be optimized, in both of the
drivers/ieee1394 and drivers/firewire implementations.)
--
Stefan Richter
-=====-==--- ---= -==-=
http://arcgraph.de/sr/
-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bug 9734] New: I/O error when inserting a second firewire sata disk
2008-01-13 9:06 ` Stefan Richter
@ 2008-02-03 2:10 ` Stefan Richter
0 siblings, 0 replies; 7+ messages in thread
From: Stefan Richter @ 2008-02-03 2:10 UTC (permalink / raw)
To: linux1394-devel
Cc: James Bottomley, Kristian Hoegsberg, linux-scsi, bugme-daemon
I wrote on 2008-01-13:
> James Bottomley wrote:
>> Firewire list cc'd
>>> Jan 12 16:50:49 x3400 kernel: firewire_sbp2: orb reply timed out, rcode=0x11
>>> Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
>>> driverbyte=DRIVER_OK,SUGGEST_OK
>> Best I can tell, this is the source of the problem. The sbp2 driver is
>> replying DID_BUS_BUSY until that gets sorted out, which seems to be
>> never.
>
> When something was plugged in or out at the same bus, fw-sbp2 has to
> reconnect == renew the login to each logical unit. The syslog in the
> report is inconclusive whether that happened or failed.
In any case, there are frequently commands retried or newly enqueued
while fw-sbp2 waits to get the login renewed. (And fw-sbp2 continues to
complete them with DID_BUS_BUSY until the reconnection didn't succeed.
Whoever caused that I/O, e.g. dd like in the reporter's and my own
tests, will quickly fail.
> As a side note, the old sbp2 driver does not quit commands with
> DID_BUS_BUSY between bus reset and reconnect. Instead it blocks the
> Scsi_Host in order to not receive commands during that time at all.
I experimented with this yesterday. First I tried
scsi_internal_device_block() because we
- want to block logical units individually if possible,
- need to block from within atomic context (softirq context).
However, this failed miserably with all sorts of lock inversion bug
backtraces (alleged ones or real ones, I don't know) and with occasional
kernel lock-ups (so it were probably real lock inversions). These
locking issues cannot be solved easily because block layer and scsi_lib
play nauseating games with their locks.
So, I switched over to scsi_block_requests(), i.e. blocking the whole
host like the old sbp2 driver does. This doesn't seem to have
scsi_internal_device_block()'s locking issues. However, the sbp2 driver
has one Scsi_Host for each logical unit while the new fw-sbp2 driver
however has one Scsi_Host for each target. Hence there are difficulties
with targets with multiple logical units, but I probably got them sorted
out now.
There remain frequent problems with reconnection + re-login failures
though. These failures don't happen with exactly the same bus topology
if I don't run I/O during the bus resets. I have an idea though what to
try next...
--
Stefan Richter
-=====-==--- --=- ---==
http://arcgraph.de/sr/
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bug 9734] New: I/O error when inserting a second firewire sata disk
2008-01-13 1:11 ` [Bug 9734] New: " James Bottomley
2008-01-13 9:06 ` Stefan Richter
@ 2008-01-19 13:20 ` Stefan Richter
2008-01-19 20:47 ` Stefan Richter
1 sibling, 1 reply; 7+ messages in thread
From: Stefan Richter @ 2008-01-19 13:20 UTC (permalink / raw)
To: linux1394-devel
Cc: James Bottomley, Kristian Hoegsberg, linux-scsi, bugme-daemon
I did a quick and simple test now:
1.) switch on 1st disk (sdd)
Jan 19 13:46:18 stein sd 88:0:0:0: [sdd] Attached SCSI disk
Jan 19 13:46:18 stein sd 88:0:0:0: Attached scsi generic sg3 type 14
2.) start "dd if=/dev/sdd of=/dev/null"
3.) switch on 2nd disk (sde)
Jan 19 13:48:11 stein firewire_sbp2: orb reply timed out, rcode=0x11
Jan 19 13:48:11 stein sd 88:0:0:0: [sdd] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK,SUGGEST_OK
Jan 19 13:48:11 stein end_request: I/O error, dev sdd, sector 7871464
Jan 19 13:48:11 stein Buffer I/O error on device sdd, logical block 983933
Jan 19 13:48:11 stein Buffer I/O error on device sdd, logical block 983934
Jan 19 13:48:11 stein Buffer I/O error on device sdd, logical block 983935
Jan 19 13:48:11 stein Buffer I/O error on device sdd, logical block 983936
Jan 19 13:48:11 stein Buffer I/O error on device sdd, logical block 983937
Jan 19 13:48:11 stein Buffer I/O error on device sdd, logical block 983938
Jan 19 13:48:11 stein Buffer I/O error on device sdd, logical block 983939
Jan 19 13:48:11 stein Buffer I/O error on device sdd, logical block 983940
Jan 19 13:48:11 stein Buffer I/O error on device sdd, logical block 983941
Jan 19 13:48:11 stein Buffer I/O error on device sdd, logical block 983942
Jan 19 13:48:11 stein sd 88:0:0:0: [sdd] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK,SUGGEST_OK
Jan 19 13:48:11 stein end_request: I/O error, dev sdd, sector 7871712
Jan 19 13:48:11 stein sd 88:0:0:0: [sdd] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK,SUGGEST_OK
Jan 19 13:48:11 stein end_request: I/O error, dev sdd, sector 7871720
Jan 19 13:48:11 stein sd 88:0:0:0: [sdd] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK,SUGGEST_OK
Jan 19 13:48:11 stein end_request: I/O error, dev sdd, sector 7871464
Jan 19 13:48:11 stein firewire_sbp2: error status: 0:9
Jan 19 13:48:11 stein firewire_sbp2: error status: 0:9
Jan 19 13:48:11 stein firewire_sbp2: error status: 0:9
Jan 19 13:48:12 stein firewire_sbp2: error status: 0:9
Jan 19 13:48:12 stein firewire_sbp2: error status: 0:9
Jan 19 13:48:12 stein firewire_sbp2: failed to reconnect to fw3.0
Jan 19 13:48:12 stein firewire_sbp2: logged in to fw3.0 LUN 0000 (0 retries)
Jan 19 13:48:26 stein firewire_sbp2: orb reply timed out, rcode=0x11
Jan 19 13:48:27 stein firewire_sbp2: error status: 0:9
Jan 19 13:48:27 stein firewire_sbp2: error status: 0:9
Jan 19 13:48:27 stein firewire_sbp2: error status: 0:9
Jan 19 13:48:27 stein scsi89 : SBP-2 IEEE-1394
Jan 19 13:48:27 stein firewire_core: created new fw device fw4 (6 config rom retries, S800)
Jan 19 13:48:27 stein firewire_sbp2: error status: 0:9
Jan 19 13:48:27 stein firewire_sbp2: logged in to fw4.0 LUN 0000 (0 retries)
Jan 19 13:48:27 stein scsi 89:0:0:0: Direct-Access-RBC HDS72404 0KLAT80 KFAO PQ: 0 ANSI: 4
Jan 19 13:48:27 stein sd 89:0:0:0: [sde] 1562845488 512-byte hardware sectors (800177 MB)
Jan 19 13:48:27 stein sd 89:0:0:0: [sde] Write Protect is off
Jan 19 13:48:27 stein sd 89:0:0:0: [sde] Mode Sense: 11 00 00 00
Jan 19 13:48:27 stein sd 89:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 19 13:48:27 stein sde: sde1
Jan 19 13:48:27 stein sd 89:0:0:0: [sde] Attached SCSI disk
Jan 19 13:48:27 stein sd 89:0:0:0: Attached scsi generic sg4 type 14
Jan 19 13:48:27 stein firewire_sbp2: error status: 0:9
Jan 19 13:48:27 stein firewire_sbp2: failed to reconnect to fw3.0
Jan 19 13:48:28 stein firewire_sbp2: logged in to fw3.0 LUN 0000 (0 retries)
Doing that, dd aborted:
dd: reading `/dev/sdd': Input/output error
7871464+0 records in
7871464+0 records out
4030189568 bytes (4.0 GB) copied, 57.6538 s, 69.9 MB/s
sdd was however accessible again after the login at 13:48:28. As you
see, the SCSI stack did not take the disk offline after the I/O errors.
However, the desired behavior is that dd is simply put into I/O wait
state rather than failing with errors while the SBP-2 transport is busy.
Both disks used in this test are based on the OXFW912 bridge. The
"error status: 0:9" before reconnect is typical for them, at least as I
have set them up right now (1394b card, 1394b hub, disks on the hub).
The status means "request complete, function rejected". It has always
been recoverable here when I switched on the 2nd disk with the 1st disk
connected but idle.
Perhaps we should do it in fw-sbp2 like in sbp2: Put the scsi_device
(in sbp2: the Scsi_Host) into blocked state between bus reset event and
successful reconnection. We took a while to get this right in sbp2
because blocking the host is prone to deadlocks. Therefore I was so far
satisfied with fw-sbp2 not using that scheme.
However, the first thing I shall try is to insert a "return
SCSI_MLQUEUE_HOST_BUSY" early in sbp2_scsi_queuecommand, depending on a
check of the logical unit's generation.
A variation of this problem is if the bus reset -- reconnect phase
happens while the SCSI stack is just in the middle of INQUIRY or READ
CAPACITY. During this, the SCSI stack is very quick to offline a
device. We need to do something there, because this situation is not
too unusual (e.g. when several FireWire devices are being powered up
together).
I will proceed to experiment with this as spare time permits. Of course
any advice on how to best interact with the SCSI core while the
transport is busy would be appreciated.
--
Stefan Richter
-=====-==--- ---= =--==
http://arcgraph.de/sr/
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-02-03 2:10 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-12 15:47 [Bug 9734] New: I/O error when inserting a second firewire sata disk bugme-daemon
2008-01-12 16:02 ` [Bug 9734] " bugme-daemon
2008-01-13 1:11 ` [Bug 9734] New: " James Bottomley
2008-01-13 9:06 ` Stefan Richter
2008-02-03 2:10 ` Stefan Richter
2008-01-19 13:20 ` Stefan Richter
2008-01-19 20:47 ` Stefan Richter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox