linux-ide.vger.kernel.org archive mirror
* sata_mv, io stucks
@ 2008-10-17 12:25 Artem Bokhan
  2008-10-23  8:53 ` Artem Bokhan
  2008-10-23 13:31 ` Harri Olin
  0 siblings, 2 replies; 18+ messages in thread
From: Artem Bokhan @ 2008-10-17 12:25 UTC (permalink / raw)
  To: linux-ide, tj

I am trying to simulate random reads with "sysbench --test=fileio
--num-threads=16 --max-requests=9999999 --max-time=60 --init-rng=on
--file-num=16 --file-fsync-freq=0 --file-test-mode=rndrd
--file-total-size=30G run".

Two Marvell controllers, 16 disks, software RAID10, kernel 2.6.26.5; I/O
gets stuck on different disks.
With Ubuntu 8.04's default 2.6.24 kernel the problem cannot be reproduced.


[  289.851609] ata11.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 
0x6 frozen
[  289.851695] ata11.00: cmd 61/08:00:60:1e:bf/00:00:01:00:00/40 tag 0 
ncq 4096 out
[  289.851697]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 
0x4 (timeout)
[  289.851774] ata11.00: status: { DRDY }
[  289.851834] ata11: hard resetting link
[  290.649259] ata11: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  290.749239] ata11.00: max_sectors limited to 256 for NCQ
[  290.809189] ata11.00: max_sectors limited to 256 for NCQ
[  290.809194] ata11.00: configured for UDMA/133
[  290.809200] ata11: EH complete
[  290.809242] sd 10:0:0:0: [sdk] 1953525168 512-byte hardware sectors 
(1000205 MB)
[  290.809258] sd 10:0:0:0: [sdk] Write Protect is off
[  290.809263] sd 10:0:0:0: [sdk] Mode Sense: 00 3a 00 00
[  290.809286] sd 10:0:0:0: [sdk] Write cache: enabled, read cache: 
enabled, doesn't support DPO or FUA
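
As a reading aid for the "cmd" line in the log above, here is a minimal Python sketch that decodes the taskfile dump. The helper name decode_taskfile is hypothetical, and the field order (command/feature:nsect:lbal:lbam:lbah/hob fields/device) is assumed from how libata prints these dumps; the three opcode names are the standard ATA ones.

```python
# Hypothetical decoder for the "cmd ..." taskfile dump printed by libata EH.
# Assumed field order:
#   command/feature:nsect:lbal:lbam:lbah/
#   hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
OPCODES = {
    0x35: "WRITE DMA EXT",
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
}
NCQ = {0x60, 0x61}
SECTOR_BYTES = 512  # the sd lines report 512-byte hardware sectors


def decode_taskfile(dump):
    """Decode a dump such as '61/08:00:60:1e:bf/00:00:01:00:00/40'."""
    cmd, low, high, dev = dump.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":")
    )
    opcode = int(cmd, 16)
    # 48-bit LBA: the "hob" (high order byte) registers carry bits 24..47.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    if opcode in NCQ:
        # NCQ commands carry the sector count in the feature registers
        # and the queue tag in bits 7:3 of the sector count register.
        sectors = (hob_feature << 8 | feature) or 65536
        tag = nsect >> 3
    else:
        sectors = (hob_nsect << 8 | nsect) or 65536
        tag = None
    return {
        "name": OPCODES.get(opcode, "unknown"),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * SECTOR_BYTES,
        "tag": tag,
        "device": int(dev, 16),
    }


if __name__ == "__main__":
    print(decode_taskfile("61/08:00:60:1e:bf/00:00:01:00:00/40"))
```

Under these assumptions the timed-out command decodes as an NCQ write of 8 sectors (4096 bytes, matching "ncq 4096 out") with queue tag 0.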


lspci:

05:01.0 SCSI storage controller: Marvell Technology Group Ltd. 
MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
        Subsystem: Marvell Technology Group Ltd. Unknown device 11ab
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- 
ParErr+ Stepping- SERR+ FastB2B+
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 24
        Region 0: Memory at d8200000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at 3000 [size=256]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ 
Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=512 OST=4
                Status: Dev=05:01.0 64bit+ 133MHz+ SCD- USC- DC=simple 
DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-

05:02.0 SCSI storage controller: Marvell Technology Group Ltd. 
MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
        Subsystem: Marvell Technology Group Ltd. Unknown device 11ab
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- 
ParErr+ Stepping- SERR+ FastB2B+
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 28
        Region 0: Memory at d8300000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at 3400 [size=256]
        Region 3: [virtual] Memory at d8800000 (32-bit, 
non-prefetchable) [size=4M]
        [virtual] Expansion ROM at d9000000 [disabled] [size=4M]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ 
Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=512 OST=4
                Status: Dev=05:02.0 64bit+ 133MHz+ SCD- USC- DC=simple 
DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: sata_mv, io stucks
  2008-10-17 12:25 sata_mv, io stucks Artem Bokhan
@ 2008-10-23  8:53 ` Artem Bokhan
  2008-10-23 16:07   ` Mark Lord
  2008-10-23 13:31 ` Harri Olin
  1 sibling, 1 reply; 18+ messages in thread
From: Artem Bokhan @ 2008-10-23  8:53 UTC (permalink / raw)
  To: Artem Bokhan; +Cc: linux-ide

Any thoughts?

Artem Bokhan wrote:
> I am trying to simulate random reads with "sysbench --test=fileio
> --num-threads=16 --max-requests=9999999 --max-time=60 --init-rng=on
> --file-num=16 --file-fsync-freq=0 --file-test-mode=rndrd
> --file-total-size=30G run".
>
> Two Marvell controllers, 16 disks, software RAID10, kernel 2.6.26.5; I/O
> gets stuck on different disks.
> With Ubuntu 8.04's default 2.6.24 kernel the problem cannot be reproduced.



* Re: sata_mv, io stucks
  2008-10-17 12:25 sata_mv, io stucks Artem Bokhan
  2008-10-23  8:53 ` Artem Bokhan
@ 2008-10-23 13:31 ` Harri Olin
  2008-10-23 16:32   ` Bokhan Artem
  1 sibling, 1 reply; 18+ messages in thread
From: Harri Olin @ 2008-10-23 13:31 UTC (permalink / raw)
  To: Artem Bokhan; +Cc: linux-ide, tj, liml

Artem Bokhan wrote:
> I am trying to simulate random reads with "sysbench --test=fileio
> --num-threads=16 --max-requests=9999999 --max-time=60 --init-rng=on
> --file-num=16 --file-fsync-freq=0 --file-test-mode=rndrd
> --file-total-size=30G run".
>
> Two Marvell controllers, 16 disks, software RAID10, kernel 2.6.26.5; I/O
> gets stuck on different disks.
> With Ubuntu 8.04's default 2.6.24 kernel the problem cannot be reproduced.

I have the same problem with recent kernels with the updated sata_mv driver.
First I/O stops for a while, and after EH runs, everything works again for
a while. This happens on 3 different computers using WD5000ABYS, WD5000YS and
WD7500AYYS hard disks, in RAID5 and RAID6 configurations using Linux MD.

Stalls seem to happen only on controller ports 0-3; ports 4-7 work
without problems.
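
One way to check a per-port pattern like this is to tally the EH exception lines per libata port in the kernel log. The Python sketch below is only an illustration: count_exceptions is a hypothetical helper, and the mapping from an ataN number to a physical controller port depends on probe order, so it has to be established separately on each machine.

```python
import re
from collections import Counter

# Matches the start of the libata EH summary line, for example
# "ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen".
# Only the part before "Emask" is needed, so mail-wrapped lines still match.
EXCEPTION_RE = re.compile(r"\bata(\d+)\.\d+: exception Emask")


def count_exceptions(log_text):
    """Count EH exception events per libata port number (the N in ataN)."""
    return Counter(int(m.group(1)) for m in EXCEPTION_RE.finditer(log_text))


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python count_eh.py < /var/log/kern.log
    for port, hits in sorted(count_exceptions(sys.stdin.read()).items()):
        print(f"ata{port}: {hits} exception(s)")
```

A port that never appears in the output simply logged no exceptions in the examined text; that alone does not prove the port is unaffected.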

The controller is a Supermicro AOC-SAT2-MV8, connected to a 133MHz PCI-X slot
on one computer, a 66MHz 64-bit PCI slot on the second machine and a normal
32-bit PCI slot on the third computer.
http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm

At the moment I don't have disks connected to the failing ports, but if
needed, I can test patches.

Oct 10 18:56:17 mizar kernel: ata10.00: exception Emask 0x0 SAct 0x0 
SErr 0x0 action 0x6 frozen
Oct 10 18:56:17 mizar kernel: ata10.00: cmd 
35/00:08:3f:52:54/00:00:57:00:00/e0 tag 0 dma 4096 out
Oct 10 18:56:17 mizar kernel:          res 
40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Oct 10 18:56:17 mizar kernel: ata10.00: status: { DRDY }
Oct 10 18:56:17 mizar kernel: ata10: hard resetting link
Oct 10 18:56:17 mizar kernel: ata10: SATA link up 1.5 Gbps (SStatus 113 
SControl 310)
Oct 10 18:56:17 mizar kernel: ata10.00: max_sectors limited to 256 for NCQ
Oct 10 18:56:17 mizar kernel: ata10.00: max_sectors limited to 256 for NCQ
Oct 10 18:56:17 mizar kernel: ata10.00: configured for UDMA/33
Oct 10 18:56:17 mizar kernel: ata10: EH complete
Oct 10 18:56:17 mizar kernel: sd 9:0:0:0: [sdg] 1465149168 512-byte 
hardware sectors (750156 MB)
Oct 10 18:56:17 mizar kernel: sd 9:0:0:0: [sdg] Write Protect is off
Oct 10 18:56:17 mizar kernel: sd 9:0:0:0: [sdg] Mode Sense: 00 3a 00 00
Oct 10 18:56:17 mizar kernel: sd 9:0:0:0: [sdg] Write cache: enabled, 
read cache: enabled, doesn't support DPO or FUA
Oct 10 19:34:58 mizar kernel: ata10.00: exception Emask 0x0 SAct 0x0 
SErr 0x0 action 0x6 frozen
Oct 10 19:34:58 mizar kernel: ata10.00: cmd 
35/00:08:3f:52:54/00:00:57:00:00/e0 tag 0 dma 4096 out
Oct 10 19:34:58 mizar kernel:          res 
40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Oct 10 19:34:58 mizar kernel: ata10.00: status: { DRDY }
Oct 10 19:34:58 mizar kernel: ata10: hard resetting link
Oct 10 19:34:58 mizar kernel: ata10: SATA link up 1.5 Gbps (SStatus 113 
SControl 310)
Oct 10 19:34:58 mizar kernel: ata10.00: max_sectors limited to 256 for NCQ
Oct 10 19:34:58 mizar kernel: ata10.00: max_sectors limited to 256 for NCQ
Oct 10 19:34:58 mizar kernel: ata10.00: configured for UDMA/33
Oct 10 19:34:58 mizar kernel: ata10: EH complete
Oct 10 19:34:58 mizar kernel: sd 9:0:0:0: [sdg] 1465149168 512-byte 
hardware sectors (750156 MB)
Oct 10 19:34:58 mizar kernel: sd 9:0:0:0: [sdg] Write Protect is off
Oct 10 19:34:58 mizar kernel: sd 9:0:0:0: [sdg] Mode Sense: 00 3a 00 00
Oct 10 19:34:58 mizar kernel: sd 9:0:0:0: [sdg] Write cache: enabled, 
read cache: enabled, doesn't support DPO or FUA

Oct 10 19:37:05 mizar kernel: ata10.00: exception Emask 0x0 SAct 0x0 
SErr 0x0 action 0x6 frozen
Oct 10 19:37:05 mizar kernel: ata10.00: cmd 
35/00:08:3f:52:54/00:00:57:00:00/e0 tag 0 dma 4096 out
Oct 10 19:37:05 mizar kernel:          res 
40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Oct 10 19:37:05 mizar kernel: ata10.00: status: { DRDY }
Oct 10 19:37:05 mizar kernel: ata10: hard resetting link
Oct 10 19:37:06 mizar kernel: ata10: SATA link up 1.5 Gbps (SStatus 113 
SControl 310)
Oct 10 19:37:06 mizar kernel: ata10.00: max_sectors limited to 256 for NCQ
Oct 10 19:37:06 mizar kernel: ata10.00: max_sectors limited to 256 for NCQ
Oct 10 19:37:06 mizar kernel: ata10.00: configured for UDMA/33
Oct 10 19:37:06 mizar kernel: ata10: EH complete
Oct 10 19:37:06 mizar kernel: sd 9:0:0:0: [sdg] 1465149168 512-byte 
hardware sectors (750156 MB)
Oct 10 19:37:06 mizar kernel: sd 9:0:0:0: [sdg] Write Protect is off
Oct 10 19:37:06 mizar kernel: sd 9:0:0:0: [sdg] Mode Sense: 00 3a 00 00
Oct 10 19:37:06 mizar kernel: sd 9:0:0:0: [sdg] Write cache: enabled, 
read cache: enabled, doesn't support DPO or FUA
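
The "SStatus 113 SControl 310" values in the log above are hexadecimal SATA register dumps. A minimal Python sketch of how they split into fields follows; decode_sata_reg is a hypothetical name, and the DET/SPD/IPM layout (bits 3:0, 7:4 and 11:8) is the standard one for both registers.

```python
# Hypothetical helper: split an SStatus or SControl value into its fields.
# Both registers use the same layout: DET bits 3:0, SPD bits 7:4, IPM bits 11:8.
SPEEDS = {1: "1.5 Gbps", 2: "3.0 Gbps"}


def decode_sata_reg(value):
    return {
        "det": value & 0xF,
        "spd": (value >> 4) & 0xF,
        "ipm": (value >> 8) & 0xF,
    }


def describe(sstatus, scontrol):
    st = decode_sata_reg(sstatus)
    ct = decode_sata_reg(scontrol)
    negotiated = SPEEDS.get(st["spd"], "unknown")
    # In SControl, SPD == 0 means no speed restriction is requested.
    limit = SPEEDS.get(ct["spd"], "none") if ct["spd"] else "none"
    return f"negotiated {negotiated}, speed limit {limit}"


if __name__ == "__main__":
    print(describe(0x113, 0x310))
    print(describe(0x123, 0x300))
```

Read this way, SControl 310 requests a 1.5 Gbps limit, which matches the link coming up at 1.5 Gbps on ata10, while SControl 300 on the other machine requests no limit. Why the limit is set cannot be told from the log alone.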

Sep 26 15:47:14 mvsrv02 kernel: ata5.00: exception Emask 0x0 SAct 0xf 
SErr 0x0 action 0x6 frozen
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: cmd 
60/40:00:7f:a1:e2/00:00:28:00:00/40 tag 0 ncq 32768 in
Sep 26 15:47:14 mvsrv02 kernel:          res 
40/00:00:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: status: { DRDY }
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: cmd 
60/40:08:3f:a1:e2/00:00:28:00:00/40 tag 1 ncq 32768 in
Sep 26 15:47:14 mvsrv02 kernel:          res 
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: status: { DRDY }
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: cmd 
60/40:10:3f:a2:e2/00:00:28:00:00/40 tag 2 ncq 32768 in
Sep 26 15:47:14 mvsrv02 kernel:          res 
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: status: { DRDY }
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: cmd 
60/c0:18:7f:a2:e2/00:00:28:00:00/40 tag 3 ncq 98304 in
Sep 26 15:47:14 mvsrv02 kernel:          res 
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: status: { DRDY }
Sep 26 15:47:14 mvsrv02 kernel: ata5: hard resetting link
Sep 26 15:47:14 mvsrv02 kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 
SControl 300)
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: max_sectors limited to 256 for NCQ
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: max_sectors limited to 256 for NCQ
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: configured for UDMA/133
Sep 26 15:47:14 mvsrv02 kernel: ata5: EH complete
Sep 26 15:47:14 mvsrv02 kernel: sd 4:0:0:0: [sdb] 976773168 512-byte 
hardware sectors (500108 MB)
Sep 26 15:47:14 mvsrv02 kernel: sd 4:0:0:0: [sdb] Write Protect is off
Sep 26 15:47:14 mvsrv02 kernel: sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Sep 26 15:47:14 mvsrv02 kernel: sd 4:0:0:0: [sdb] Write cache: enabled, 
read cache: enabled, doesn't support DPO or FUA
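
The SAct value in the exception line is a bitmask of outstanding NCQ tags, which is why "SAct 0xf" above is followed by four "cmd" entries (tags 0 to 3). A small Python sketch of that reading, with active_tags as a hypothetical helper name:

```python
from typing import List


def active_tags(sact: int) -> List[int]:
    """Return the NCQ tags whose bits are set in an SAct value (32 tags max)."""
    return [tag for tag in range(32) if sact & (1 << tag)]


if __name__ == "__main__":
    for value in (0xF, 0x1, 0x0):
        print(hex(value), active_tags(value))
```

An SAct of 0x0, as in the ata10 entries, means no NCQ command was outstanding, which fits the non-NCQ "dma" command shown there.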

1st computer: 133MHz PCI-X slot
03:01.0 SCSI storage controller: Marvell Technology Group Ltd. 
MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
        Subsystem: Marvell Technology Group Ltd. Unknown device 11ab
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- 
ParErr- Stepping- SERR- FastB2B+ DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 48
        Region 0: Memory at d8800000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at 3000 [size=256]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ 
Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=512 OST=4
                Status: Dev=03:01.0 64bit+ 133MHz+ SCD- USC- DC=simple 
DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-
        Kernel driver in use: sata_mv

2nd computer: 66MHz 64-bit PCI
02:01.0 SCSI storage controller: Marvell Technology Group Ltd. 
MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
        Subsystem: Marvell Technology Group Ltd. Unknown device 11ab
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- 
ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, Cache Line Size: 128 bytes
        Interrupt: pin A routed to IRQ 24
        Region 0: Memory at f2800000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at c000 [size=256]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ 
Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=512 OST=4
                Status: Dev=02:01.0 64bit+ 133MHz+ SCD- USC- DC=simple 
DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-

3rd computer: 32bit 33MHz PCI
00:0a.0 SCSI storage controller: Marvell Technology Group Ltd. 
MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
        Subsystem: Marvell Technology Group Ltd. Unknown device 11ab
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- 
ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at cfe00000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at dc00 [size=256]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Message Signalled Interrupts: 64bit+ 
Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=512 OST=4
                Status: Dev=ff:1f.0 64bit+ 133MHz+ SCD- USC- DC=simple 
DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-


-- 
Harri.


* Re: sata_mv, io stucks
  2008-10-23  8:53 ` Artem Bokhan
@ 2008-10-23 16:07   ` Mark Lord
  2008-11-15 15:18     ` Harri Olin
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Lord @ 2008-10-23 16:07 UTC (permalink / raw)
  To: Artem Bokhan; +Cc: linux-ide

Artem Bokhan wrote:
> Any thoughts?
> 
> Artem Bokhan wrote:
>> I am trying to simulate random reads with "sysbench --test=fileio
>> --num-threads=16 --max-requests=9999999 --max-time=60 --init-rng=on
>> --file-num=16 --file-fsync-freq=0 --file-test-mode=rndrd
>> --file-total-size=30G run".
>>
>> Two Marvell controllers, 16 disks, software RAID10, kernel 2.6.26.5; I/O
>> gets stuck on different disks.
>> With Ubuntu 8.04's default 2.6.24 kernel the problem cannot be reproduced.
>>
>>
>> [  289.851609] ata11.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 
>> 0x6 frozen
>> [  289.851695] ata11.00: cmd 61/08:00:60:1e:bf/00:00:01:00:00/40 tag 0 
>> ncq 4096 out
>> [  289.851697]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 
>> 0x4 (timeout)
>> [  289.851774] ata11.00: status: { DRDY }
>> [  289.851834] ata11: hard resetting link
>> [  290.649259] ata11: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [  290.749239] ata11.00: max_sectors limited to 256 for NCQ
>> [  290.809189] ata11.00: max_sectors limited to 256 for NCQ
>> [  290.809194] ata11.00: configured for UDMA/133
>> [  290.809200] ata11: EH complete
>> [  290.809242] sd 10:0:0:0: [sdk] 1953525168 512-byte hardware sectors 
>> (1000205 MB)
>> [  290.809258] sd 10:0:0:0: [sdk] Write Protect is off
>> [  290.809263] sd 10:0:0:0: [sdk] Mode Sense: 00 3a 00 00
>> [  290.809286] sd 10:0:0:0: [sdk] Write cache: enabled, read cache: 
>> enabled, doesn't support DPO or FUA
...

I've just returned here from a month holiday in Italy,
and I'll have a look at this and other sata_mv issues
next week or so.

Cheers


* Re: sata_mv, io stucks
  2008-10-23 13:31 ` Harri Olin
@ 2008-10-23 16:32   ` Bokhan Artem
  0 siblings, 0 replies; 18+ messages in thread
From: Bokhan Artem @ 2008-10-23 16:32 UTC (permalink / raw)
  To: Harri Olin; +Cc: linux-ide, liml

The controller is AOC-SAT2-MV8 too.


Harri Olin wrote:
> Artem Bokhan wrote:
>> I am trying to simulate random reads with "sysbench --test=fileio
>> --num-threads=16 --max-requests=9999999 --max-time=60 --init-rng=on
>> --file-num=16 --file-fsync-freq=0 --file-test-mode=rndrd
>> --file-total-size=30G run".
>>
>> Two Marvell controllers, 16 disks, software RAID10, kernel 2.6.26.5; I/O
>> gets stuck on different disks.
>> With Ubuntu 8.04's default 2.6.24 kernel the problem cannot be reproduced.
>
> I have the same problem with recent kernels with the updated sata_mv
> driver. First I/O stops for a while, and after EH runs, everything works
> again for a while. This happens on 3 different computers using WD5000ABYS,
> WD5000YS and WD7500AYYS hard disks, in RAID5 and RAID6 configurations using
> Linux MD.
>
> Stalls seem to happen only on controller ports 0-3; ports 4-7 work
> without problems.
>
> The controller is a Supermicro AOC-SAT2-MV8, connected to a 133MHz PCI-X
> slot on one computer, a 66MHz 64-bit PCI slot on the second machine and a
> normal 32-bit PCI slot on the third computer.
> http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm
>
> At the moment I don't have disks connected to the failing ports, but if
> needed, I can test patches.



* Re: sata_mv, io stucks
  2008-10-23 16:07   ` Mark Lord
@ 2008-11-15 15:18     ` Harri Olin
  2008-11-15 21:35       ` Mark Lord
  0 siblings, 1 reply; 18+ messages in thread
From: Harri Olin @ 2008-11-15 15:18 UTC (permalink / raw)
  To: Mark Lord; +Cc: Artem Bokhan, linux-ide

Mark Lord wrote:
>>> Two marvell controllers, 16 disks, software raid10, IO stucks on 
>>> different disks, kernel 2.6.26.5.
>>> With default ubuntu's 8.04 2.6.24 kernel the problem can not be 
>>> repeated
>>>
>>>
>>> [  289.851609] ata11.00: exception Emask 0x0 SAct 0x1 SErr 0x0 
>>> action 0x6 frozen
>>> [  289.851695] ata11.00: cmd 61/08:00:60:1e:bf/00:00:01:00:00/40 tag 
>>> 0 ncq 4096 out
>>> [  289.851697]          res 40/00:00:00:00:00/00:00:00:00:00/00 
>>> Emask 0x4 (timeout)
>>> [  289.851774] ata11.00: status: { DRDY }
>>> [  289.851834] ata11: hard resetting link
>>> [  290.649259] ata11: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> [  290.749239] ata11.00: max_sectors limited to 256 for NCQ
>>> [  290.809189] ata11.00: max_sectors limited to 256 for NCQ
>>> [  290.809194] ata11.00: configured for UDMA/133
>>> [  290.809200] ata11: EH complete
>>> [  290.809242] sd 10:0:0:0: [sdk] 1953525168 512-byte hardware 
>>> sectors (1000205 MB)
>>> [  290.809258] sd 10:0:0:0: [sdk] Write Protect is off
>>> [  290.809263] sd 10:0:0:0: [sdk] Mode Sense: 00 3a 00 00
>>> [  290.809286] sd 10:0:0:0: [sdk] Write cache: enabled, read cache: 
>>> enabled, doesn't support DPO or FUA
> ...
>
> I've just returned here from a month holiday in Italy,
> and I'll have a look at this and other sata_mv issues
> next week or so.

I ran git-bisect on it and it returned 
a3718c1f230240361ed92d3e53342df0ff7efa8c as the first bad commit. I also 
verified by hand that applying it to a working tree breaks it.

-- 
Harri.



* Re: sata_mv, io stucks
  2008-11-15 15:18     ` Harri Olin
@ 2008-11-15 21:35       ` Mark Lord
  2008-11-15 23:41         ` Harri Olin
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Lord @ 2008-11-15 21:35 UTC (permalink / raw)
  To: Harri Olin; +Cc: Artem Bokhan, linux-ide

Harri Olin wrote:
> Mark Lord wrote:
>>>> Two marvell controllers, 16 disks, software raid10, IO stucks on 
>>>> different disks, kernel 2.6.26.5.
>>>> With default ubuntu's 8.04 2.6.24 kernel the problem can not be 
>>>> repeated
>>>>
>>>>
>>>> [  289.851609] ata11.00: exception Emask 0x0 SAct 0x1 SErr 0x0 
>>>> action 0x6 frozen
>>>> [  289.851695] ata11.00: cmd 61/08:00:60:1e:bf/00:00:01:00:00/40 tag 
>>>> 0 ncq 4096 out
>>>> [  289.851697]          res 40/00:00:00:00:00/00:00:00:00:00/00 
>>>> Emask 0x4 (timeout)
>>>> [  289.851774] ata11.00: status: { DRDY }
>>>> [  289.851834] ata11: hard resetting link
>>>> [  290.649259] ata11: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>>> [  290.749239] ata11.00: max_sectors limited to 256 for NCQ
>>>> [  290.809189] ata11.00: max_sectors limited to 256 for NCQ
>>>> [  290.809194] ata11.00: configured for UDMA/133
>>>> [  290.809200] ata11: EH complete
>>>> [  290.809242] sd 10:0:0:0: [sdk] 1953525168 512-byte hardware 
>>>> sectors (1000205 MB)
>>>> [  290.809258] sd 10:0:0:0: [sdk] Write Protect is off
>>>> [  290.809263] sd 10:0:0:0: [sdk] Mode Sense: 00 3a 00 00
>>>> [  290.809286] sd 10:0:0:0: [sdk] Write cache: enabled, read cache: 
>>>> enabled, doesn't support DPO or FUA
>> ...
>>
>> I've just returned here from a month holiday in Italy,
>> and I'll have a look at this and other sata_mv issues
>> next week or so.
> 
> I ran git-bisect on it and it returned 
> a3718c1f230240361ed92d3e53342df0ff7efa8c as first bad commit. Also 
> verified by hand that patching it on working tree breaks it.
..

Wow.. thanks for all of the hard work there.

So it has something to do with (qc->tf.flags & ATA_TFLAG_POLLING).
But what I don't see (yet), is exactly which path through
the driver is reporting this error.

Any path through mv_unexpected_intr() should have the
string "unexpected device interrupt" as part of the error report.

Similarly, any path through mv_err_intr() should have
the string "edma_err_cause=xxxxxxxx" in the error report.

I see neither.  So I wonder what path is being taken
through the driver that results in the error?
(That's the trouble with having been away from the code for five months..).

Since cmd 61 is an NCQ write, it pretty much has to be in EDMA mode,
which means it will likely be calling mv_process_crpb_entries()
and then dropping through to check ERR_IRQ and DEV_IRQ.

The old code checked ERR_IRQ above, and never saw it,
so the new code is probably not seeing ERR_IRQ either.

So it must be seeing DEV_IRQ after process_crpb_entries(),
something that the old code never checked for.
And which is not supposed to happen here.

Weird.
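
For anyone decoding the failing command by hand, here is a minimal sketch,
assuming the field order libata prints on its "cmd" line
(command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device).
The struct and helper names are made up for illustration and are not part
of sata_mv or libata.

```c
#include <stdint.h>
#include <stdio.h>

/* Registers in the order they appear on a libata "cmd" log line. */
struct tf_fields {
    unsigned cmd, feature, nsect, lbal, lbam, lbah;
    unsigned hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
    unsigned device;
};

/* Parse e.g. "61/08:00:60:1e:bf/00:00:01:00:00/40"; returns 0 on success. */
static int parse_tf(const char *s, struct tf_fields *tf)
{
    int n = sscanf(s, "%x/%x:%x:%x:%x:%x/%x:%x:%x:%x:%x/%x",
                   &tf->cmd, &tf->feature, &tf->nsect,
                   &tf->lbal, &tf->lbam, &tf->lbah,
                   &tf->hob_feature, &tf->hob_nsect,
                   &tf->hob_lbal, &tf->hob_lbam, &tf->hob_lbah,
                   &tf->device);
    return n == 12 ? 0 : -1;
}

/* 0x60 / 0x61 are READ / WRITE FPDMA QUEUED, i.e. the NCQ commands. */
static int tf_is_ncq(const struct tf_fields *tf)
{
    return tf->cmd == 0x60 || tf->cmd == 0x61;
}

/* For NCQ commands the sector count is carried in the feature registers. */
static unsigned ncq_sectors(const struct tf_fields *tf)
{
    return (tf->hob_feature << 8) | tf->feature;
}

/* For NCQ commands the tag sits in bits 7:3 of the sector count register. */
static unsigned ncq_tag(const struct tf_fields *tf)
{
    return tf->nsect >> 3;
}

/* Assemble the 48-bit LBA from the six LBA bytes. */
static uint64_t tf_lba48(const struct tf_fields *tf)
{
    return ((uint64_t)tf->hob_lbah << 40) | ((uint64_t)tf->hob_lbam << 32) |
           ((uint64_t)tf->hob_lbal << 24) | ((uint64_t)tf->lbah << 16) |
           ((uint64_t)tf->lbam << 8)      |  (uint64_t)tf->lbal;
}
```

Under those assumptions, the quoted 61/08:00:60:1e:bf/00:00:01:00:00/40
decodes to command 0x61 (an NCQ write), tag 0, 8 sectors (4096 bytes,
matching "ncq 4096 out") at LBA 0x01bf1e60.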

Looking at later kernels (after the commit in question), I see that
the code was further fixed to remove some possible races and stuff,
but that's still just 2.6.26.5, which you guys see failures on.

So here's some instrumentation to help us figure it out.
Please apply and report back once it triggers again.
Thanks.


--- linux-2.6.26.5/drivers/ata/sata_mv.c	2008-09-08 13:40:20.000000000 -0400
+++ linux/drivers/ata/sata_mv.c	2008-11-15 16:32:23.000000000 -0500
@@ -1999,12 +1999,15 @@
 				 * Error will be seen/handled by mv_err_intr().
 				 * So do nothing at all here.
 				 */
+				ata_port_printk(ap, KERN_WARNING, "mv_process_crpb_response1: err_cause=0x%x\n", err_cause);
 				return;
 			}
 		}
 		ata_status = edma_status >> CRPB_FLAG_STATUS_SHIFT;
 		if (!ac_err_mask(ata_status))
 			ata_qc_complete(qc);
+		else
+			ata_port_printk(ap, KERN_WARNING, "mv_process_crpb_response2: edma_status=0x%x\n", edma_status);
 		/* else: leave it for mv_err_intr() */
 	} else {
 		ata_port_printk(ap, KERN_ERR, "%s: no qc for tag=%d\n",
@@ -2070,20 +2073,25 @@
 	 */
 	if (edma_was_enabled && (port_cause & DONE_IRQ)) {
 		mv_process_crpb_entries(ap, pp);
-		if (pp->pp_flags & MV_PP_FLAG_DELAYED_EH)
+		if (pp->pp_flags & MV_PP_FLAG_DELAYED_EH) {
+			ata_port_printk(ap, KERN_WARNING, "mv_port_intr1: port_cause=0x%x(ERR_IRQ), ppflags=0x%x\n", port_cause, pp->pp_flags);
 			mv_handle_fbs_ncq_dev_err(ap);
+		}
 	}
 	/*
 	 * Handle chip-reported errors, or continue on to handle PIO.
 	 */
 	if (unlikely(port_cause & ERR_IRQ)) {
+		ata_port_printk(ap, KERN_WARNING, "mv_port_intr2: port_cause=0x%x(ERR_IRQ), edma=%d, ppflags=0x%x\n", port_cause, edma_was_enabled, pp->pp_flags);
 		mv_err_intr(ap);
 	} else if (!edma_was_enabled) {
 		struct ata_queued_cmd *qc = mv_get_active_qc(ap);
 		if (qc)
 			ata_sff_host_intr(ap, qc);
-		else
+		else {
+			ata_port_printk(ap, KERN_WARNING, "mv_port_intr3: port_cause=0x%x(ERR_IRQ), ppflags=0x%x\n", port_cause, pp->pp_flags);
 			mv_unexpected_intr(ap, edma_was_enabled);
+		}
 	}
 }
 


* Re: sata_mv, io stucks
  2008-11-15 21:35       ` Mark Lord
@ 2008-11-15 23:41         ` Harri Olin
  2008-11-15 23:44           ` Justin Piszcz
  2008-11-16  4:43           ` Mark Lord
  0 siblings, 2 replies; 18+ messages in thread
From: Harri Olin @ 2008-11-15 23:41 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-ide

Mark Lord wrote:
> Harri Olin wrote:
>> Mark Lord wrote:
>>>>> Two marvell controllers, 16 disks, software raid10, IO stucks on 
>>>>> different disks, kernel 2.6.26.5.
>>>>> With default ubuntu's 8.04 2.6.24 kernel the problem can not be 
>>>>> repeated
>>>>>
>>>>>
>>>>> [  289.851609] ata11.00: exception Emask 0x0 SAct 0x1 SErr 0x0 
>>>>> action 0x6 frozen
>>>>> [  289.851695] ata11.00: cmd 61/08:00:60:1e:bf/00:00:01:00:00/40 
>>>>> tag 0 ncq 4096 out
>>>>> [  289.851697]          res 40/00:00:00:00:00/00:00:00:00:00/00 
>>>>> Emask 0x4 (timeout)
>>>>> [  289.851774] ata11.00: status: { DRDY }
>>>>> [  289.851834] ata11: hard resetting link
>>>>> [  290.649259] ata11: SATA link up 3.0 Gbps (SStatus 123 SControl 
>>>>> 300)
>>>>> [  290.749239] ata11.00: max_sectors limited to 256 for NCQ
>>>>> [  290.809189] ata11.00: max_sectors limited to 256 for NCQ
>>>>> [  290.809194] ata11.00: configured for UDMA/133
>>>>> [  290.809200] ata11: EH complete
>>>>> [  290.809242] sd 10:0:0:0: [sdk] 1953525168 512-byte hardware 
>>>>> sectors (1000205 MB)
>>>>> [  290.809258] sd 10:0:0:0: [sdk] Write Protect is off
>>>>> [  290.809263] sd 10:0:0:0: [sdk] Mode Sense: 00 3a 00 00
>>>>> [  290.809286] sd 10:0:0:0: [sdk] Write cache: enabled, read 
>>>>> cache: enabled, doesn't support DPO or FUA
>>> ...
>>>
>>> I've just returned here from a month holiday in Italy,
>>> and I'll have a look at this and other sata_mv issues
>>> next week or so.
>>
>> I ran git-bisect on it and it returned 
>> a3718c1f230240361ed92d3e53342df0ff7efa8c as first bad commit. Also 
>> verified by hand that patching it on working tree breaks it.
> Looking at later kernels (after the commit in question), I see that
> the code was further fixed to remove some possible races and stuff,
> but that's still just 2.6.26.5, which you guys see failures on.
>
> So here's some instrumentation to help us figure it out.
> Please apply and report back once it triggers again.
> Thanks.

I have to take back that bisect, as just a couple of minutes ago it 
happened again with the last 'good' kernel from the bisect. Only the 
frequency of stalls has dropped considerably. I also noticed that 
current kernels are much better too.
pre-..0ff7efa8c: only once after 6 hours of testing
post-..0ff7efa8c: one hd stalled while filesystem was mounting. Before 
boot was complete, 3 stalls. Also at shutdown kernel hung at 
Synchronizing SCSI cache for a while.
2.6.27: once in 5 minutes or so on heavy load

When some hd/port stalls, other ports still work fine.

I applied your patch on 2.6.27.1, no results:

ata14.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata14.00: cmd 61/08:00:3f:52:54/00:00:57:00:00/40 tag 0 ncq 4096 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata14.00: status: { DRDY }
ata14: hard resetting link
ata14: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata14.00: max_sectors limited to 256 for NCQ
ata14.00: max_sectors limited to 256 for NCQ
ata14.00: configured for UDMA/133
ata14: EH complete
sd 13:0:0:0: [sdh] 1465149168 512-byte hardware sectors (750156 MB)
sd 13:0:0:0: [sdh] Write Protect is off
sd 13:0:0:0: [sdh] Mode Sense: 00 3a 00 00
sd 13:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA
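
The "SStatus 123 SControl 300" values in the log above are hexadecimal
register dumps. A minimal sketch of splitting them into fields, assuming
the standard SATA layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits
11:8); the names decode_sreg() and spd_name() are hypothetical helpers,
not libata functions.

```c
/* The three 4-bit fields shared by the SStatus and SControl layouts. */
struct sata_link_fields {
    unsigned det;   /* device detection */
    unsigned spd;   /* negotiated speed (SStatus) or speed limit (SControl) */
    unsigned ipm;   /* interface power management */
};

/* Split a raw SStatus/SControl value into its DET/SPD/IPM nibbles. */
static struct sata_link_fields decode_sreg(unsigned reg)
{
    struct sata_link_fields f;

    f.det = reg & 0xf;
    f.spd = (reg >> 4) & 0xf;
    f.ipm = (reg >> 8) & 0xf;
    return f;
}

/* Human-readable meaning of the SStatus SPD field. */
static const char *spd_name(unsigned spd)
{
    switch (spd) {
    case 0:  return "no speed negotiated";
    case 1:  return "1.5 Gbps";
    case 2:  return "3.0 Gbps";
    case 3:  return "6.0 Gbps";
    default: return "unknown";
    }
}
```

With that layout, SStatus 0x123 reads as DET=3 (device present, phy
communication established), SPD=2 (3.0 Gbps) and IPM=1 (interface active),
so the link itself appears healthy after the reset. SControl 0x300 reads
as IPM=3 with no speed limit set.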

Do I have to enable something somewhere else too?

I also compiled and patched the linux-2.6-stable tree from git, but it just 
panicked after the stall instead of recovering. I'm currently trying to 
reproduce that on a second computer where I can capture the panic.

-- 
Harri.



* Re: sata_mv, io stucks
  2008-11-15 23:41         ` Harri Olin
@ 2008-11-15 23:44           ` Justin Piszcz
  2008-11-15 23:47             ` Harri Olin
  2008-11-16  4:43           ` Mark Lord
  1 sibling, 1 reply; 18+ messages in thread
From: Justin Piszcz @ 2008-11-15 23:44 UTC (permalink / raw)
  To: Harri Olin; +Cc: Mark Lord, linux-ide



On Sun, 16 Nov 2008, Harri Olin wrote:

> Mark Lord wrote:
>> Harri Olin wrote:
>>> Mark Lord wrote:
>>>>>> Two marvell controllers, 16 disks, software raid10, IO stucks on 
>>>>>> different disks, kernel 2.6.26.5.
>>>>>> With default ubuntu's 8.04 2.6.24 kernel the problem can not be 
>>>>>> repeated
>>>>>> 
>>>>>> 
>>>>>> [  289.851609] ata11.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 
>>>>>> 0x6 frozen
>>>>>> [  289.851695] ata11.00: cmd 61/08:00:60:1e:bf/00:00:01:00:00/40 tag 0 
>>>>>> ncq 4096 out
>>>>>> [  289.851697]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 
>>>>>> 0x4 (timeout)
>>>>>> [  289.851774] ata11.00: status: { DRDY }
>>>>>> [  289.851834] ata11: hard resetting link
>>>>>> [  290.649259] ata11: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>>>>> [  290.749239] ata11.00: max_sectors limited to 256 for NCQ
>>>>>> [  290.809189] ata11.00: max_sectors limited to 256 for NCQ
>>>>>> [  290.809194] ata11.00: configured for UDMA/133
>>>>>> [  290.809200] ata11: EH complete
>>>>>> [  290.809242] sd 10:0:0:0: [sdk] 1953525168 512-byte hardware sectors 
>>>>>> (1000205 MB)
>>>>>> [  290.809258] sd 10:0:0:0: [sdk] Write Protect is off
>>>>>> [  290.809263] sd 10:0:0:0: [sdk] Mode Sense: 00 3a 00 00
>>>>>> [  290.809286] sd 10:0:0:0: [sdk] Write cache: enabled, read cache: 
>>>>>> enabled, doesn't support DPO or FUA
>>>> ...
>>>> 
>>>> I've just returned here from a month holiday in Italy,
>>>> and I'll have a look at this and other sata_mv issues
>>>> next week or so.
>>> 
>>> I ran git-bisect on it and it returned 
>>> a3718c1f230240361ed92d3e53342df0ff7efa8c as first bad commit. Also 
>>> verified by hand that patching it on working tree breaks it.
>> Looking at later kernels (after the commit in question), I see that
>> the code was further fixed to remove some possible races and stuff,
>> but that's still just 2.6.26.5, which you guys see failures on.
>> 
>> So here's some instrumentation to help us figure it out.
>> Please apply and report back once it triggers again.
>> Thanks.
>
> I have to take back that bisect, as just couple of minutes ago it happened 
> again, with last 'good' kernel from bisect. Just the frequency of stalls has 
> dropped quite much. I also noticed that on current kernels are much better 
> too.
> pre-..0ff7efa8c: only once after 6 hours of testing
> post-..0ff7efa8c: one hd stalled while filesystem was mounting. Before boot 
> was complete, 3 stalls. Also at shutdown kernel hung at Synchronizing SCSI 
> cache for a while.
> 2.6.27: once in 5 minutes or so on heavy load
>
> When some hd/port stalls, other ports still work fine.
>
> I applied your patch on 2.6.27.1, no results:
>
> ata14.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
> ata14.00: cmd 61/08:00:3f:52:54/00:00:57:00:00/40 tag 0 ncq 4096 out
>        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata14.00: status: { DRDY }
> ata14: hard resetting link
> ata14: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata14.00: max_sectors limited to 256 for NCQ
> ata14.00: max_sectors limited to 256 for NCQ
> ata14.00: configured for UDMA/133
> ata14: EH complete
> sd 13:0:0:0: [sdh] 1465149168 512-byte hardware sectors (750156 MB)
> sd 13:0:0:0: [sdh] Write Protect is off
> sd 13:0:0:0: [sdh] Mode Sense: 00 3a 00 00
> sd 13:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support 
> DPO or FUA
>
> Do I have to enable something somewhere else too?
>
> I also compiled and patched linux-2.6-stable tree from git but it just 
> panicked after stall instead of recovering. I'm currently trying to reproduce 
> that on second computer where I can capture the panic.

What type of disks are you using?

Justin.


* Re: sata_mv, io stucks
  2008-11-15 23:44           ` Justin Piszcz
@ 2008-11-15 23:47             ` Harri Olin
  2008-11-15 23:52               ` Justin Piszcz
  0 siblings, 1 reply; 18+ messages in thread
From: Harri Olin @ 2008-11-15 23:47 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Mark Lord, linux-ide

Justin Piszcz wrote:
>
>
> On Sun, 16 Nov 2008, Harri Olin wrote:
>
>> Mark Lord wrote:
>>> Harri Olin wrote:
>>>> Mark Lord wrote:
>>>>>>> Two marvell controllers, 16 disks, software raid10, IO stucks on 
>>>>>>> different disks, kernel 2.6.26.5.
>>>>>>> With default ubuntu's 8.04 2.6.24 kernel the problem can not be 
>>>>>>> repeated
>>>>>>>
>>>>>>>
>>>>>>> [  289.851609] ata11.00: exception Emask 0x0 SAct 0x1 SErr 0x0 
>>>>>>> action 0x6 frozen
>>>>>>> [  289.851695] ata11.00: cmd 61/08:00:60:1e:bf/00:00:01:00:00/40 
>>>>>>> tag 0 ncq 4096 out
>>>>>>> [  289.851697]          res 40/00:00:00:00:00/00:00:00:00:00/00 
>>>>>>> Emask 0x4 (timeout)
>>>>>>> [  289.851774] ata11.00: status: { DRDY }
>>>>>>> [  289.851834] ata11: hard resetting link
>>>>>>> [  290.649259] ata11: SATA link up 3.0 Gbps (SStatus 123 
>>>>>>> SControl 300)
>>>>>>> [  290.749239] ata11.00: max_sectors limited to 256 for NCQ
>>>>>>> [  290.809189] ata11.00: max_sectors limited to 256 for NCQ
>>>>>>> [  290.809194] ata11.00: configured for UDMA/133
>>>>>>> [  290.809200] ata11: EH complete
>>>>>>> [  290.809242] sd 10:0:0:0: [sdk] 1953525168 512-byte hardware 
>>>>>>> sectors (1000205 MB)
>>>>>>> [  290.809258] sd 10:0:0:0: [sdk] Write Protect is off
>>>>>>> [  290.809263] sd 10:0:0:0: [sdk] Mode Sense: 00 3a 00 00
>>>>>>> [  290.809286] sd 10:0:0:0: [sdk] Write cache: enabled, read 
>>>>>>> cache: enabled, doesn't support DPO or FUA
>>>>> ...
>>>>>
>>>>> I've just returned here from a month holiday in Italy,
>>>>> and I'll have a look at this and other sata_mv issues
>>>>> next week or so.
>>>>
>>>> I ran git-bisect on it and it returned 
>>>> a3718c1f230240361ed92d3e53342df0ff7efa8c as first bad commit. Also 
>>>> verified by hand that patching it on working tree breaks it.
>>> Looking at later kernels (after the commit in question), I see that
>>> the code was further fixed to remove some possible races and stuff,
>>> but that's still just 2.6.26.5, which you guys see failures on.
>>>
>>> So here's some instrumentation to help us figure it out.
>>> Please apply and report back once it triggers again.
>>> Thanks.
>>
>> I have to take back that bisect, as just couple of minutes ago it 
>> happened again, with last 'good' kernel from bisect. Just the 
>> frequency of stalls has dropped quite much. I also noticed that on 
>> current kernels are much better too.
>> pre-..0ff7efa8c: only once after 6 hours of testing
>> post-..0ff7efa8c: one hd stalled while filesystem was mounting. 
>> Before boot was complete, 3 stalls. Also at shutdown kernel hung at 
>> Synchronizing SCSI cache for a while.
>> 2.6.27: once in 5 minutes or so on heavy load
>>
>> When some hd/port stalls, other ports still work fine.
>>
>> I applied your patch on 2.6.27.1, no results:
>>
>> ata14.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
>> ata14.00: cmd 61/08:00:3f:52:54/00:00:57:00:00/40 tag 0 ncq 4096 out
>>        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>> ata14.00: status: { DRDY }
>> ata14: hard resetting link
>> ata14: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> ata14.00: max_sectors limited to 256 for NCQ
>> ata14.00: max_sectors limited to 256 for NCQ
>> ata14.00: configured for UDMA/133
>> ata14: EH complete
>> sd 13:0:0:0: [sdh] 1465149168 512-byte hardware sectors (750156 MB)
>> sd 13:0:0:0: [sdh] Write Protect is off
>> sd 13:0:0:0: [sdh] Mode Sense: 00 3a 00 00
>> sd 13:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't 
>> support DPO or FUA
>>
>> Do I have to enable something somewhere else too?
>>
>> I also compiled and patched linux-2.6-stable tree from git but it 
>> just panicked after stall instead of recovering. I'm currently trying 
>> to reproduce that on second computer where I can capture the panic.
>
> What type of disks are you using?
>
> Justin.
I have seen this happening on 3 different computers using WD5000ABYS, 
WD5000YS and WD7500AYYS hard disks. All have the same Supermicro controller. 
Stalls happen only on controller ports 0-3, never on ports 4-7. Moving 
cables around doesn't help.

-- 
Harri.





* Re: sata_mv, io stucks
  2008-11-15 23:47             ` Harri Olin
@ 2008-11-15 23:52               ` Justin Piszcz
  0 siblings, 0 replies; 18+ messages in thread
From: Justin Piszcz @ 2008-11-15 23:52 UTC (permalink / raw)
  To: Harri Olin; +Cc: Mark Lord, linux-ide



On Sun, 16 Nov 2008, Harri Olin wrote:

> Justin Piszcz wrote:
>> 
>> 
>> On Sun, 16 Nov 2008, Harri Olin wrote:
>> 
>>> Mark Lord wrote:
>>>> Harri Olin wrote:
>>>>> Mark Lord wrote:
>>>>>>>> Two marvell controllers, 16 disks, software raid10, IO stucks on 
>>>>>>>> different disks, kernel 2.6.26.5.
>>>>>>>> With default ubuntu's 8.04 2.6.24 kernel the problem can not be 
>>>>>>>> repeated
>> 
>> What type of disks are you using?
>> 
>> Justin.
> I have seen this happening on 3 different computers using WD5000ABYS, 
> WD5000YS and WD7500AYYS hard disks. All have same Supermicro controller. 
> Stalls happen only on controller ports 0-3, never on ports 4-7. Moving cables 
> around doesn't help.

I have been compiling my own list of reports on this problem:

Bug 462425 -  Kernel 2.6.26.3-29.fc9.x86_64 drive goes offline
https://bugzilla.redhat.com/show_bug.cgi?id=462425

hardy / ibex - raid5 - ata#: hard resetting link  [edit]
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263160/

exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
http://groups.google.com/group/linux.kernel/browse_thread/thread/8f3c7ea69a900e51?q="exception+Emask+0x0+SAct"

Harri,

It is interesting that it never happens on ports 4-7.  I have two machines,
each with Sil 3132 cards, sata_mv and Intel ICH8, and the same problem happens
across all (12) Velociraptors; I have RMA'd 8 disks so far.  It also happens on
my Raptor 150s in a different box, but MUCH less.  I am currently testing
a patch from Alan Cox and will report if/when the problem recurs:

Subject: [PATCH] libata: Drain data on errors

From: Alan Cox <alan@redhat.com>

If the device is signalling that there is data to drain after an error we
should read the bytes out and throw them away. Without this some devices
and controllers get wedged and don't recover.

Based on earlier work by Mark Lord

Signed-off-by: Alan Cox <alan@redhat.com>
---

  drivers/ata/libata-sff.c  |   44 +++++++++++++++++++++++++++++++++++++++++++-
  drivers/ata/pata_pcmcia.c |   34 +++++++++++++++++++++++++++++++++-
  include/linux/libata.h    |    3 +++
  3 files changed, 79 insertions(+), 2 deletions(-)

Justin.



* Re: sata_mv, io stucks
  2008-11-15 23:41         ` Harri Olin
  2008-11-15 23:44           ` Justin Piszcz
@ 2008-11-16  4:43           ` Mark Lord
  2008-11-16  4:59             ` Mark Lord
                               ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Mark Lord @ 2008-11-16  4:43 UTC (permalink / raw)
  To: Harri Olin; +Cc: linux-ide, Artem Bokhan

Harri Olin wrote:
..
>>>>>> Two marvell controllers, 16 disks, software raid10, IO stucks on 
>>>>>> different disks, kernel 2.6.26.5.
>>>>>> With default ubuntu's 8.04 2.6.24 kernel the problem can not be 
>>>>>> repeated
>>>>>>
>>>>>>
>>>>>> [  289.851609] ata11.00: exception Emask 0x0 SAct 0x1 SErr 0x0 
>>>>>> action 0x6 frozen
>>>>>> [  289.851695] ata11.00: cmd 61/08:00:60:1e:bf/00:00:01:00:00/40 
>>>>>> tag 0 ncq 4096 out
>>>>>> [  289.851697]          res 40/00:00:00:00:00/00:00:00:00:00/00 
>>>>>> Emask 0x4 (timeout)
>>>>>> [  289.851774] ata11.00: status: { DRDY }
>>>>>> [  289.851834] ata11: hard resetting link
>>>>>> [  290.649259] ata11: SATA link up 3.0 Gbps (SStatus 123 SControl 
>>>>>> 300)
>>>>>> [  290.749239] ata11.00: max_sectors limited to 256 for NCQ
>>>>>> [  290.809189] ata11.00: max_sectors limited to 256 for NCQ
>>>>>> [  290.809194] ata11.00: configured for UDMA/133
>>>>>> [  290.809200] ata11: EH complete
>>>>>> [  290.809242] sd 10:0:0:0: [sdk] 1953525168 512-byte hardware 
>>>>>> sectors (1000205 MB)
>>>>>> [  290.809258] sd 10:0:0:0: [sdk] Write Protect is off
>>>>>> [  290.809263] sd 10:0:0:0: [sdk] Mode Sense: 00 3a 00 00
>>>>>> [  290.809286] sd 10:0:0:0: [sdk] Write cache: enabled, read 
>>>>>> cache: enabled, doesn't support DPO or FUA
..
> I have to take back that bisect, as just couple of minutes ago it 
> happened again, with last 'good' kernel from bisect. Just the frequency 
> of stalls has dropped quite much. I also noticed that on current kernels 
> are much better too.
> pre-..0ff7efa8c: only once after 6 hours of testing
> post-..0ff7efa8c: one hd stalled while filesystem was mounting. Before 
> boot was complete, 3 stalls. Also at shutdown kernel hung at 
> Synchronizing SCSI cache for a while.
> 2.6.27: once in 5 minutes or so on heavy load
> 
> When some hd/port stalls, other ports sill work fine.
> 
> I applied your patch on 2.6.27.1, no results:
> 
> ata14.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
> ata14.00: cmd 61/08:00:3f:52:54/00:00:57:00:00/40 tag 0 ncq 4096 out
>         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
..

Yeah, I see what I was missing earlier:   "(timeout)".
So it's "none of" the driver paths.
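
The "Emask 0x4 (timeout)" on the res line is a bitmask. Here is a minimal
sketch of decoding it, assuming the AC_ERR_* bit positions from
include/linux/libata.h in kernels of this era (worth checking against the
tree being debugged). emask_describe() is a hypothetical helper, not a
libata function.

```c
#include <stddef.h>
#include <stdio.h>

/* Index i names bit (1 << i) of the error mask; assumed from libata.h. */
static const char *const emask_names[] = {
    "AC_ERR_DEV",        /* 0x001 device reported error */
    "AC_ERR_HSM",        /* 0x002 host state machine violation */
    "AC_ERR_TIMEOUT",    /* 0x004 command timed out */
    "AC_ERR_MEDIA",      /* 0x008 media error */
    "AC_ERR_ATA_BUS",    /* 0x010 ATA bus error */
    "AC_ERR_HOST_BUS",   /* 0x020 host bus error */
    "AC_ERR_SYSTEM",     /* 0x040 system error */
    "AC_ERR_INVALID",    /* 0x080 invalid argument */
    "AC_ERR_OTHER",      /* 0x100 unknown */
    "AC_ERR_NODEV_HINT", /* 0x200 polling device detection hint */
    "AC_ERR_NCQ",        /* 0x400 marker for offending NCQ command */
};

/*
 * Write the names of all set bits into buf, separated by '|'.
 * Returns how many names were written; stops early if buf is too small.
 */
static int emask_describe(unsigned emask, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(emask_names) / sizeof(emask_names[0]); i++) {
        int n;

        if (!(emask & (1u << i)))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     count ? "|" : "", emask_names[i]);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output was truncated */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

With these assumptions 0x4 decodes to AC_ERR_TIMEOUT alone. That would be
consistent with the command never completing through any interrupt path,
which is why none of the added printks fired.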

This could very well be due to one/several of the as-yet un-addressed
chipset errata for the 6081.  Someday we'll have software workarounds
for those, but I'm (still) waiting on Marvell for stuff.

I will look and see whether this makes sense based on the errata info
that I already have, though (under NDA).

Harri / Artem:  what type/speed of slots are your 6081 controllers in?
PCI, or PCI-X?   Bus speed?

Thanks



* Re: sata_mv, io stucks
  2008-11-16  4:43           ` Mark Lord
@ 2008-11-16  4:59             ` Mark Lord
  2008-11-16  9:13               ` Justin Piszcz
  2008-11-17 14:10               ` Bokhan Artem
  2008-11-16 12:35             ` Harri Olin
  2008-11-16 17:32             ` Harri Olin
  2 siblings, 2 replies; 18+ messages in thread
From: Mark Lord @ 2008-11-16  4:59 UTC (permalink / raw)
  To: Harri Olin; +Cc: linux-ide, Artem Bokhan

Mark Lord wrote:
>
> Yeah, I see what I was missing earlier:   "(timeout)".
> So it's "none of" the driver paths.
> 
> This could very well be due to one/several of the as-yet un-addressed
> chipset errata for the 6081.  Someday we'll have software workarounds
> for those, but I'm (still) waiting on Marvell for stuff.
> 
> I will look and see if this makes sense based on the errata info
> that I have already though (under NDA).
> 
> Harri / Artem:  what type/speed of slots are your 6081 controllers in?
> PCI, or PCI-X?   Bus speed?
..

Mmm.. Harri at least previously said: 
> Controller is Supermicro AOC-SAT2-MV8, connected to 133MHz PCI-X slot on one computer,
> 66MHz 64bit PCI slot on the second machine and to normal 32bit PCI slot on third computer.
..

And Artem said:
> The controller is AOC-SAT2-MV8 too.
..

So I guess I need to know if Artem is using PCI-X, and the bus width + speed.

Cheers


* Re: sata_mv, io stucks
  2008-11-16  4:59             ` Mark Lord
@ 2008-11-16  9:13               ` Justin Piszcz
  2008-11-17  5:22                 ` Mark Lord
  2008-11-17 14:10               ` Bokhan Artem
  1 sibling, 1 reply; 18+ messages in thread
From: Justin Piszcz @ 2008-11-16  9:13 UTC (permalink / raw)
  To: Mark Lord; +Cc: Harri Olin, linux-ide, Artem Bokhan



On Sat, 15 Nov 2008, Mark Lord wrote:

> Mark Lord wrote:
>> 
>> Yeah, I see what I was missing earlier:   "(timeout)".
>> So it's "none of" the driver paths.
>> 
>> This could very well be due to one/several of the as-yet un-addressed
>> chipset errata for the 6081.  Someday we'll have software workarounds
>> for those, but I'm (still) waiting on Marvell for stuff.
>> 
>> I will look and see if this makes sense based on the errata info
>> that I have already though (under NDA).
>> 
>> Harri / Artem:  what type/speed of slots are your 6081 controllers in?
>> PCI, or PCI-X?   Bus speed?
> ..
>
> Mmm.. Harri at least previously said: 
>> Controller is Supermicro AOC-SAT2-MV8, connected to 133MHz PCI-X slot on one 
>> computer,
>> 66MHz 64bit PCI slot on the second machine and to normal 32bit PCI slot on 
>> third computer.
> ..
>
> And Artem said:
>> The controller is AOC-SAT2-MV8 too.
> ..

As I mentioned earlier, it happens on all of my drives, 4 of 12 disks are 
connected to a Marvell controller, in my case:

01:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SX7042 PCI-e 4-port SATA-II (rev 02)

It is in the x16 slot.

Justin.



* Re: sata_mv, io stucks
  2008-11-16  4:43           ` Mark Lord
  2008-11-16  4:59             ` Mark Lord
@ 2008-11-16 12:35             ` Harri Olin
  2008-11-16 17:32             ` Harri Olin
  2 siblings, 0 replies; 18+ messages in thread
From: Harri Olin @ 2008-11-16 12:35 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-ide, Artem Bokhan

Mark Lord wrote:
> Harri Olin wrote:
> ..
>>>>>>> Two marvell controllers, 16 disks, software raid10, IO stucks on 
>>>>>>> different disks, kernel 2.6.26.5.
>>>>>>> With default ubuntu's 8.04 2.6.24 kernel the problem can not be 
>>>>>>> repeated
> This could very well be due to one/several of the as-yet un-addressed
> chipset errata for the 6081.  Someday we'll have software workarounds
> for those, but I'm (still) waiting on Marvell for stuff.
>

I was going through the logs from last night and noticed something: 
currently I have only 7 disks connected to the 6081 controller, port 5 is 
free, and stalls have been happening only on ports 0, 2 and 3. So it 
seems that something on a high port is affecting the corresponding lower port.
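
A minimal sketch of that hypothesis, assuming port n+4 pairs with port n
and that a low port can stall only when both it and its high partner have
a disk attached. This is only a model of the observation above, not
documented 6081 behaviour, and predicted_stall_ports() is a made-up name.

```c
/*
 * Bit n of `occupied` is set when a disk is attached to port n (0-7).
 * Returns a bitmask of low ports (0-3) that the pairing hypothesis
 * predicts can stall.
 */
static unsigned predicted_stall_ports(unsigned occupied)
{
    unsigned low  = occupied & 0x0f;        /* ports 0-3 */
    unsigned high = (occupied >> 4) & 0x0f; /* ports 4-7, shifted onto 0-3 */

    return low & high;
}
```

With port 5 empty (occupied = 0xdf) this gives 0x0d, i.e. ports 0, 2 and
3, and with all eight ports populated it gives ports 0-3. Both match what
was seen so far.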

> Harri / Artem:  what type/speed of slots are your 6081 controllers in?
> PCI, or PCI-X?   Bus speed?
>

As mentioned earlier, 33MHz, 66MHz and 133MHz. Motherboards are: some 
Asus board with VIA chipset, Supermicro P4SCI and Supermicro X7SBE. I 
think P4SCI has 66MHz 64bit PCI, not PCI-X.

-- 
Harri.



* Re: sata_mv, io stucks
  2008-11-16  4:43           ` Mark Lord
  2008-11-16  4:59             ` Mark Lord
  2008-11-16 12:35             ` Harri Olin
@ 2008-11-16 17:32             ` Harri Olin
  2 siblings, 0 replies; 18+ messages in thread
From: Harri Olin @ 2008-11-16 17:32 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-ide, Artem Bokhan

Mark Lord wrote:
>> ata14.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
>> ata14.00: cmd 61/08:00:3f:52:54/00:00:57:00:00/40 tag 0 ncq 4096 out
>>         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Yeah, I see what I was missing earlier:   "(timeout)".
> So it's "none of" the driver paths.
>
> This could very well be due to one/several of the as-yet un-addressed
> chipset errata for the 6081.  Someday we'll have software workarounds
> for those, but I'm (still) waiting on Marvell for stuff.
>

After a bit of testing, it seems that writing is required to trigger the 
bug. dstat output follows:

--dsk/sde-----dsk/sdf-----dsk/sdg-----dsk/sdh-----dsk/sdi-----dsk/sdj-----dsk/sdk--
read  writ: read  writ: read  writ: read  writ: read  writ: read  writ: read  writ
 37M    0 :  35M    0 :  35M    0 :  37M    0 :  34M    0 :  35M    0 :  32M    0
 35M    0 :  34M    0 :  34M    0 :  35M    0 :  37M    0 :  37M    0 :  36M    0
 34M    0 :  35M    0 :  35M    0 :  40M    0 :  36M    0 :  33M    0 :  35M    0
 30M 8192B:  28M 8192B:  30M 8192B:  30M    0 :  28M 8192B:  30M 8192B:  28M 8192B
 35M    0 :  37M    0 :  33M    0 :   0     0 :  36M    0 :  34M    0 :  35M    0
 36M    0 :  35M    0 :  35M    0 :   0     0 :  35M    0 :  34M    0 :  34M    0
 34M    0 :  37M    0 :  38M    0 :   0     0 :  36M    0 :  36M    0 :  35M    0

I was running fio, reading from all drives connected to the 6081. After 
nothing happened for a while, I decided to mount the xfs filesystem 
read-write, and it hung immediately, before the mount was even complete.
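
A minimal sketch of how the stalled disk can be picked out of samples like
the dstat output above: look for a column whose read rate stays at zero
while the other disks keep moving data. The table holds the read column
(MB/s) copied from that output; NDISKS, dstat_reads and
find_stalled_disk() are names invented for this illustration.

```c
#include <stddef.h>

#define NDISKS 7 /* sde..sdk */

/* Read throughput in MB/s per sample, one column per disk (sde..sdk). */
static const unsigned dstat_reads[7][NDISKS] = {
    { 37, 35, 35, 37, 34, 35, 32 },
    { 35, 34, 34, 35, 37, 37, 36 },
    { 34, 35, 35, 40, 36, 33, 35 },
    { 30, 28, 30, 30, 28, 30, 28 }, /* 8192B writes on every disk but sdh */
    { 35, 37, 33,  0, 36, 34, 35 },
    { 36, 35, 35,  0, 35, 34, 34 },
    { 34, 37, 38,  0, 36, 36, 35 },
};

/*
 * Return the index of the first disk that read nothing during the last
 * `window` samples while at least one other disk did, or -1 if none.
 */
static int find_stalled_disk(const unsigned reads[][NDISKS],
                             size_t nrows, size_t window)
{
    if (window == 0 || nrows < window)
        return -1;

    for (size_t d = 0; d < NDISKS; d++) {
        int idle = 1, others_busy = 0;

        for (size_t r = nrows - window; r < nrows; r++) {
            if (reads[r][d] != 0)
                idle = 0;
            for (size_t o = 0; o < NDISKS; o++)
                if (o != d && reads[r][o] != 0)
                    others_busy = 1;
        }
        if (idle && others_busy)
            return (int)d;
    }
    return -1;
}
```

On the samples above with a window of 3 this returns index 3, i.e. sdh,
the one disk that shows no 8192B write in the sample just before its
reads drop to zero.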

I also managed to catch the panic I mentioned, running kernel 2.6.28-rc5:

[  503.918122] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000000
[  503.918399] IP: [<ffffffff804d3938>] scsi_times_out+0x8/0x70
[  503.918561] PGD 229068067 PUD 22a1f0067 PMD 0
[  503.918814] Oops: 0000 [#1] SMP
[  503.919009] last sysfs file: /sys/block/sdk/stat
[  503.919123] CPU 2
[  503.919273] Modules linked in: kvm_intel kvm coretemp w83627hf w83793 hwmon_vid hwmon nf_conntrack_ftp 3c59x i2c_i801 i2c_core e100 iTCO_wdt
[  503.920074] Pid: 0, comm: swapper Not tainted 2.6.28-rc5 #4
[  503.920190] RIP: 0010:[<ffffffff804d3938>]  [<ffffffff804d3938>] scsi_times_out+0x8/0x70
[  503.920417] RSP: 0018:ffff88022f0f3e60  EFLAGS: 00010046
[  503.920540] RAX: ffff88022d4f5470 RBX: 0000000000000000 RCX: ffff88022d4f5ac8
[  503.920659] RDX: ffff88022d4f57e8 RSI: 0000000000000eae RDI: ffff8801f8188848
[  503.920777] RBP: ffff88022d4f5988 R08: 0000000000000000 R09: 0000000000000000
[  503.920897] R10: ffffffff804d6142 R11: ffffffff805dc480 R12: ffff88022f0e4000
[  503.921015] R13: ffff88022d4f57e8 R14: 0000000000000000 R15: ffff88022d4f5470
[  503.921134] FS:  0000000000000000(0000) GS:ffff88022f08bac0(0000) knlGS:0000000000000000
[  503.921317] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  503.921434] CR2: 0000000000000000 CR3: 000000022a0cf000 CR4: 00000000000026e0
[  503.921553] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  503.921674] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  503.921793] Process swapper (pid: 0, threadinfo ffff88022f0ee000, task ffff88022f0e2c30)
[  503.921985] Stack:
[  503.922094]  ffff8801f8188848 ffffffff80416eee ffff8801f8188848 ffffffff80416fea
[  503.922116]  0000000000000282 ffff88022d4f5470 0000000000000100 ffff88022f0e4000
[  503.922116]  ffff88022f0f3ee0 ffffffff80416f30 ffff88022f0e5018 ffffffff8024393b
[  503.922116] Call Trace:
[  503.922116]  <IRQ> <0> [<ffffffff80416eee>] ? blk_rq_timed_out+0xe/0x50
[  503.922116]  [<ffffffff80416fea>] ? blk_rq_timed_out_timer+0xba/0x120
[  503.922116]  [<ffffffff80416f30>] ? blk_rq_timed_out_timer+0x0/0x120
[  503.922116]  [<ffffffff8024393b>] ? run_timer_softirq+0x1bb/0x230
[  503.922116]  [<ffffffff8023f00b>] ? __do_softirq+0x8b/0x150
[  503.922116]  [<ffffffff8020e7db>] ? profile_pc+0x3b/0x80
[  503.922116]  [<ffffffff8020c8fc>] ? call_softirq+0x1c/0x40
[  503.922116]  [<ffffffff8020db55>] ? do_softirq+0x35/0x70
[  503.922116]  [<ffffffff802205b5>] ? smp_apic_timer_interrupt+0x85/0xd0
[  503.922116]  [<ffffffff8020c34b>] ? apic_timer_interrupt+0x6b/0x70
[  503.922116]  <EOI> <0> [<ffffffff805dc480>] ? udp_poll+0x0/0x150
[  503.922116]  [<ffffffff80212d8c>] ? mwait_idle+0x3c/0x40
[  503.922116]  [<ffffffff80209d5a>] ? cpu_idle+0x3a/0x70
[  503.922116] Code: 18 4c 8b 74 24 20 48 83 c4 28 c3 be 06 00 00 00 48 89 df e8 9b c8 ff ff 85 c0 75 c3 eb 87 0f 1f 44 00 00 53 48 8b 9f e0 00 00 00 <48> 8b 03 48
[  503.922116] RIP  [<ffffffff804d3938>] scsi_times_out+0x8/0x70
[  503.922116]  RSP <ffff88022f0f3e60>
[  503.922116] CR2: 0000000000000000
[  503.922116] Kernel panic - not syncing: Fatal exception in interrupt
[  503.922116] ------------[ cut here ]------------
[  503.922116] WARNING: at kernel/smp.c:333 smp_call_function_mask+0x236/0x240()
[  503.922116] Modules linked in: kvm_intel kvm coretemp w83627hf w83793 hwmon_vid hwmon nf_conntrack_ftp 3c59x i2c_i801 i2c_core e100 iTCO_wdt
[  503.922116] Pid: 0, comm: swapper Tainted: G      D    2.6.28-rc5 #4
[  503.922116] Call Trace:
[  503.922116]  <IRQ>  [<ffffffff80239ea4>] warn_on_slowpath+0x64/0xa0
[  503.922116]  [<ffffffff80252396>] up+0x16/0x50
[  503.922116]  [<ffffffff8023a657>] release_console_sem+0x197/0x1e0
[  503.922116]  [<ffffffff8025c126>] smp_call_function_mask+0x236/0x240
[  503.922116]  [<ffffffff8023b0fe>] printk+0x4e/0x60
[  503.922116]  [<ffffffff80252396>] up+0x16/0x50
[  503.922116]  [<ffffffff8021f290>] native_smp_send_stop+0x20/0x30
[  503.922116]  [<ffffffff80239f7e>] panic+0x8e/0x150
[  503.922116]  [<ffffffff8020e582>] show_registers+0x192/0x250
[  503.922116]  [<ffffffff8047d745>] do_unblank_screen+0x15/0x140
[  503.922116]  [<ffffffff80636370>] oops_end+0xa0/0xb0
[  503.922116]  [<ffffffff80637f43>] do_page_fault+0x6a3/0x830
[  503.922116]  [<ffffffff80635799>] error_exit+0x0/0x51
[  503.922116]  [<ffffffff805dc480>] udp_poll+0x0/0x150
[  503.922116]  [<ffffffff804d6142>] scsi_request_fn+0xe2/0x400
[  503.922116]  [<ffffffff804d3938>] scsi_times_out+0x8/0x70
[  503.922116]  [<ffffffff80416eee>] blk_rq_timed_out+0xe/0x50
[  503.922116]  [<ffffffff80416fea>] blk_rq_timed_out_timer+0xba/0x120
[  503.922116]  [<ffffffff80416f30>] blk_rq_timed_out_timer+0x0/0x120
[  503.922116]  [<ffffffff8024393b>] run_timer_softirq+0x1bb/0x230
[  503.922116]  [<ffffffff8023f00b>] __do_softirq+0x8b/0x150
[  503.922116]  [<ffffffff8020e7db>] profile_pc+0x3b/0x80
[  503.922116]  [<ffffffff8020c8fc>] call_softirq+0x1c/0x40
[  503.922116]  [<ffffffff8020db55>] do_softirq+0x35/0x70
[  503.922116]  [<ffffffff802205b5>] smp_apic_timer_interrupt+0x85/0xd0
[  503.922116]  [<ffffffff8020c34b>] apic_timer_interrupt+0x6b/0x70
[  503.922116]  <EOI>  [<ffffffff805dc480>] udp_poll+0x0/0x150
[  503.922116]  [<ffffffff80212d8c>] mwait_idle+0x3c/0x40
[  503.922116]  [<ffffffff80209d5a>] cpu_idle+0x3a/0x70
[  503.922116] ---[ end trace 3eef0898db52fd7a ]---
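
For readers who want to pick the oops apart, here is a small sketch that
pulls the faulting instruction bytes out of the "Code:" line and the
symbol and offset out of the "RIP" line. Both strings are copied from the
trace above; the function names (split_code_bytes, parse_rip) are made up
for this example. On x86-64 the eight bytes before the marker appear to
decode as "push %rbx; mov 0xe0(%rdi),%rbx" and the marked bytes as
"mov (%rbx),%rax", which would be consistent with RBX and CR2 both being
zero in the register dump. Treat that reading as an interpretation, not a
confirmed diagnosis.

```python
import re

# Copied from the oops above; <..> marks the byte at the faulting RIP.
CODE_LINE = (
    "Code: 18 4c 8b 74 24 20 48 83 c4 28 c3 be 06 00 00 00 48 "
    "89 df e8 9b c8 ff ff 85 c0 75 c3 eb 87 0f 1f 44 00 00 53 48 8b 9f e0 00 "
    "00 00 <48> 8b 03 48"
)
RIP_LINE = "RIP  [<ffffffff804d3938>] scsi_times_out+0x8/0x70"


def split_code_bytes(code_line):
    """Return (bytes before the fault, bytes from the fault on)."""
    tokens = code_line.split("Code:", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            before = bytes(int(t, 16) for t in tokens[:i])
            after = bytes(int(t.strip("<>"), 16) for t in tokens[i:])
            return before, after
    raise ValueError("no faulting-byte marker in Code: line")


def parse_rip(rip_line):
    """Return (symbol, offset, function size) from a 'sym+0xOFF/0xLEN' field."""
    match = re.search(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", rip_line)
    if match is None:
        raise ValueError("no symbol+offset found in RIP line")
    return match.group(1), int(match.group(2), 16), int(match.group(3), 16)


def hexdump(data):
    """Format bytes as space-separated two-digit hex."""
    return " ".join(f"{b:02x}" for b in data)


before, after = split_code_bytes(CODE_LINE)
symbol, offset, size = parse_rip(RIP_LINE)

# The last `offset` bytes before the marker run from the start of the
# function up to the faulting instruction.
func_prefix = before[-offset:]

print(f"{symbol}+{offset:#x} (function size {size:#x})")
print("from function start:", hexdump(func_prefix))
print("faulting bytes:     ", hexdump(after))
```

The byte sequences can then be fed to a disassembler to check the
decoding suggested above.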


-- 
Harri.


* Re: sata_mv, io stucks
  2008-11-16  9:13               ` Justin Piszcz
@ 2008-11-17  5:22                 ` Mark Lord
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Lord @ 2008-11-17  5:22 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Harri Olin, linux-ide, Artem Bokhan

Justin Piszcz wrote:
> ..
> As I mentioned earlier, it happens on all of my drives, 4 of 12 disks 
> are connected to a Marvell controller, in my case:
> 
> 01:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SX7042 
> PCI-e 4-port SATA-II (rev 02)
..

Whatever problem you are seeing is likely different from what the
others see.  The 7042 is a much newer rev of the chipset, with a lot
less in the way of errata and stuff to go wrong.

So, yes, you're apparently seeing some problems, but there's nothing
to suggest that those are the *same* problems as the 6081 users are seeing.

Cheers


* Re: sata_mv, io stucks
  2008-11-16  4:59             ` Mark Lord
  2008-11-16  9:13               ` Justin Piszcz
@ 2008-11-17 14:10               ` Bokhan Artem
  1 sibling, 0 replies; 18+ messages in thread
From: Bokhan Artem @ 2008-11-17 14:10 UTC (permalink / raw)
  To: Mark Lord; +Cc: Harri Olin, linux-ide

Mark Lord wrote:
>
> And Artem said:
>> The controller is AOC-SAT2-MV8 too.
> ..
>
> So I guess I need to know if Artem is using PCI-X, and the bus width + 
> speed.
>
The only configuration I tried was 2 controllers on 133 MHz PCI-X.
> Cheers



end of thread, other threads:[~2008-11-17 14:10 UTC | newest]

Thread overview: 18+ messages
2008-10-17 12:25 sata_mv, io stucks Artem Bokhan
2008-10-23  8:53 ` Artem Bokhan
2008-10-23 16:07   ` Mark Lord
2008-11-15 15:18     ` Harri Olin
2008-11-15 21:35       ` Mark Lord
2008-11-15 23:41         ` Harri Olin
2008-11-15 23:44           ` Justin Piszcz
2008-11-15 23:47             ` Harri Olin
2008-11-15 23:52               ` Justin Piszcz
2008-11-16  4:43           ` Mark Lord
2008-11-16  4:59             ` Mark Lord
2008-11-16  9:13               ` Justin Piszcz
2008-11-17  5:22                 ` Mark Lord
2008-11-17 14:10               ` Bokhan Artem
2008-11-16 12:35             ` Harri Olin
2008-11-16 17:32             ` Harri Olin
2008-10-23 13:31 ` Harri Olin
2008-10-23 16:32   ` Bokhan Artem
