* sata_mv, io stucks
@ 2008-10-17 12:25 Artem Bokhan
2008-10-23 8:53 ` Artem Bokhan
2008-10-23 13:31 ` Harri Olin
0 siblings, 2 replies; 18+ messages in thread
From: Artem Bokhan @ 2008-10-17 12:25 UTC (permalink / raw)
To: linux-ide, tj
I am trying to simulate random reads with "sysbench --test=fileio
--num-threads=16 --max-requests=9999999 --max-time=60 --init-rng=on
--file-num=16 --file-fsync-freq=0 --file-test-mode=rndrd
--file-total-size=30G run"
Two Marvell controllers, 16 disks, software RAID10. I/O gets stuck on
different disks with kernel 2.6.26.5.
With Ubuntu 8.04's default 2.6.24 kernel the problem cannot be reproduced.
[ 289.851609] ata11.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[ 289.851695] ata11.00: cmd 61/08:00:60:1e:bf/00:00:01:00:00/40 tag 0 ncq 4096 out
[ 289.851697] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 289.851774] ata11.00: status: { DRDY }
[ 289.851834] ata11: hard resetting link
[ 290.649259] ata11: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 290.749239] ata11.00: max_sectors limited to 256 for NCQ
[ 290.809189] ata11.00: max_sectors limited to 256 for NCQ
[ 290.809194] ata11.00: configured for UDMA/133
[ 290.809200] ata11: EH complete
[ 290.809242] sd 10:0:0:0: [sdk] 1953525168 512-byte hardware sectors (1000205 MB)
[ 290.809258] sd 10:0:0:0: [sdk] Write Protect is off
[ 290.809263] sd 10:0:0:0: [sdk] Mode Sense: 00 3a 00 00
[ 290.809286] sd 10:0:0:0: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
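For readers decoding the failed command in the log above, here is a minimal sketch. It assumes the field order libata uses on the "cmd" line (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the NCQ convention that the sector count travels in the feature registers and the tag in bits 7:3 of nsect. The struct and function names are made up for illustration and are not part of libata.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Taskfile registers in the order they appear on the "cmd" line. */
struct taskfile {
	unsigned int command, feature, nsect, lbal, lbam, lbah;
	unsigned int hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
	unsigned int device;
};

/* Parse the hex fields of a "cmd" line. Returns 0 on success, -1 if malformed. */
int parse_taskfile(const char *s, struct taskfile *tf)
{
	int n;

	assert(s != NULL && tf != NULL);
	n = sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
		   &tf->command, &tf->feature, &tf->nsect,
		   &tf->lbal, &tf->lbam, &tf->lbah,
		   &tf->hob_feature, &tf->hob_nsect,
		   &tf->hob_lbal, &tf->hob_lbam, &tf->hob_lbah,
		   &tf->device);
	return n == 12 ? 0 : -1;
}

/* 48-bit LBA: the "hob" (high order byte) registers hold bits 47:24. */
uint64_t tf_lba48(const struct taskfile *tf)
{
	return ((uint64_t)tf->hob_lbah << 40) | ((uint64_t)tf->hob_lbam << 32) |
	       ((uint64_t)tf->hob_lbal << 24) | ((uint64_t)tf->lbah << 16) |
	       ((uint64_t)tf->lbam << 8) | (uint64_t)tf->lbal;
}

/* For NCQ commands the sector count is carried in the feature registers. */
unsigned int ncq_sector_count(const struct taskfile *tf)
{
	return (tf->hob_feature << 8) | tf->feature;
}

/* For NCQ commands the tag sits in bits 7:3 of the sector count register. */
unsigned int ncq_tag(const struct taskfile *tf)
{
	return tf->nsect >> 3;
}

/* Usage example with the command from the report. */
void taskfile_example(void)
{
	struct taskfile tf;

	assert(parse_taskfile("61/08:00:60:1e:bf/00:00:01:00:00/40", &tf) == 0);
	assert(tf.command == 0x61);               /* WRITE FPDMA QUEUED */
	assert(ncq_sector_count(&tf) * 512 == 4096);
	assert(ncq_tag(&tf) == 0);
	assert(tf_lba48(&tf) == 0x01bf1e60ULL);
}
```

Fed the line from the report, this gives command 0x61 (WRITE FPDMA QUEUED), tag 0, 8 sectors (8 x 512 = 4096 bytes, matching "ncq 4096 out") and LBA 0x01bf1e60.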
lspci:
05:01.0 SCSI storage controller: Marvell Technology Group Ltd.
MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
Subsystem: Marvell Technology Group Ltd. Unknown device 11ab
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr+ Stepping- SERR+ FastB2B+
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 32, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 24
Region 0: Memory at d8200000 (64-bit, non-prefetchable) [size=1M]
Region 2: I/O ports at 3000 [size=256]
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+
Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [60] PCI-X non-bridge device
Command: DPERE- ERO- RBC=512 OST=4
Status: Dev=05:01.0 64bit+ 133MHz+ SCD- USC- DC=simple
DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-
05:02.0 SCSI storage controller: Marvell Technology Group Ltd.
MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
Subsystem: Marvell Technology Group Ltd. Unknown device 11ab
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr+ Stepping- SERR+ FastB2B+
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 32, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 28
Region 0: Memory at d8300000 (64-bit, non-prefetchable) [size=1M]
Region 2: I/O ports at 3400 [size=256]
Region 3: [virtual] Memory at d8800000 (32-bit,
non-prefetchable) [size=4M]
[virtual] Expansion ROM at d9000000 [disabled] [size=4M]
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+
Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [60] PCI-X non-bridge device
Command: DPERE- ERO- RBC=512 OST=4
Status: Dev=05:02.0 64bit+ 133MHz+ SCD- USC- DC=simple
DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-
* Re: sata_mv, io stucks
2008-10-17 12:25 sata_mv, io stucks Artem Bokhan
@ 2008-10-23 8:53 ` Artem Bokhan
2008-10-23 16:07 ` Mark Lord
2008-10-23 13:31 ` Harri Olin
1 sibling, 1 reply; 18+ messages in thread
From: Artem Bokhan @ 2008-10-23 8:53 UTC (permalink / raw)
To: Artem Bokhan; +Cc: linux-ide
Any thoughts?
Artem Bokhan wrote:
> I try to simulate random reads with "sysbench --test=fileio
> --num-threads=16 --max-requests=9999999 --max-time=60 --init-rng=on
> --file-num=16 --file-fsync-freq=0 --file-test-mode=rndrd
> --file-total-size=30G run"
>
> Two marvell controllers, 16 disks, software raid10, IO stucks on
> different disks, kernel 2.6.26.5.
> With default ubuntu's 8.04 2.6.24 kernel the problem can not be repeated
> ...
* Re: sata_mv, io stucks
2008-10-23 8:53 ` Artem Bokhan
@ 2008-10-23 16:07 ` Mark Lord
2008-11-15 15:18 ` Harri Olin
0 siblings, 1 reply; 18+ messages in thread
From: Mark Lord @ 2008-10-23 16:07 UTC (permalink / raw)
To: Artem Bokhan; +Cc: linux-ide
Artem Bokhan wrote:
> Any thought?
>
> Artem Bokhan wrote:
>> Two marvell controllers, 16 disks, software raid10, IO stucks on
>> different disks, kernel 2.6.26.5.
>> With default ubuntu's 8.04 2.6.24 kernel the problem can not be repeated
>> ...
...
I've just returned here from a month holiday in Italy,
and I'll have a look at this and other sata_mv issues
next week or so.
Cheers
* Re: sata_mv, io stucks
2008-10-23 16:07 ` Mark Lord
@ 2008-11-15 15:18 ` Harri Olin
2008-11-15 21:35 ` Mark Lord
0 siblings, 1 reply; 18+ messages in thread
From: Harri Olin @ 2008-11-15 15:18 UTC (permalink / raw)
To: Mark Lord; +Cc: Artem Bokhan, linux-ide
Mark Lord wrote:
>>> Two marvell controllers, 16 disks, software raid10, IO stucks on
>>> different disks, kernel 2.6.26.5.
>>> With default ubuntu's 8.04 2.6.24 kernel the problem can not be
>>> repeated
>>> ...
> ...
>
> I've just returned here from a month holiday in Italy,
> and I'll have a look at this and other sata_mv issues
> next week or so.
I ran git-bisect on it and it returned
a3718c1f230240361ed92d3e53342df0ff7efa8c as the first bad commit. I also
verified by hand that applying that commit to a working tree breaks it.
--
Harri.
* Re: sata_mv, io stucks
2008-11-15 15:18 ` Harri Olin
@ 2008-11-15 21:35 ` Mark Lord
2008-11-15 23:41 ` Harri Olin
0 siblings, 1 reply; 18+ messages in thread
From: Mark Lord @ 2008-11-15 21:35 UTC (permalink / raw)
To: Harri Olin; +Cc: Artem Bokhan, linux-ide
Harri Olin wrote:
> Mark Lord wrote:
>>>> Two marvell controllers, 16 disks, software raid10, IO stucks on
>>>> different disks, kernel 2.6.26.5.
>>>> ...
>> ...
>
> I ran git-bisect on it and it returned
> a3718c1f230240361ed92d3e53342df0ff7efa8c as first bad commit. Also
> verified by hand that patching it on working tree breaks it.
..
Wow.. thanks for all of the hard work there.
So it has something to do with (qc->tf.flags & ATA_TFLAG_POLLING).
But what I don't see (yet), is exactly which path through the driver
is reporting this error. Any path through mv_unexpected_intr() should
have the string "unexpected device interrupt" as part of the error report.
Similarly, any path through mv_err_intr() should have the string
"edma_err_cause=xxxxxxxx" in the error report.
I see neither. So I wonder what path is being taken through the driver
that results in the error? (That's the trouble with having been away
from the code for five months..).
Since cmd 61 is an NCQ write, it pretty much has to be in EDMA mode,
which means it will likely be calling mv_process_crpb_entries()
and then dropping through to check ERR_IRQ and DEV_IRQ.
The old code checked ERR_IRQ above, and never saw it,
so the new code is probably not seeing ERR_IRQ either.
So it must be seeing DEV_IRQ after process_crpb_entries(),
something that the old code never checked for.
And which is not supposed to happen here. Weird.
Looking at later kernels (after the commit in question), I see that
the code was further fixed to remove some possible races and stuff,
but that's still just 2.6.26.5, which you guys see failures on.
So here's some instrumentation to help us figure it out.
Please apply and report back once it triggers again.
Thanks.
--- linux-2.6.26.5/drivers/ata/sata_mv.c 2008-09-08 13:40:20.000000000 -0400
+++ linux/drivers/ata/sata_mv.c 2008-11-15 16:32:23.000000000 -0500
@@ -1999,12 +1999,15 @@
 				 * Error will be seen/handled by mv_err_intr().
 				 * So do nothing at all here.
 				 */
+				ata_port_printk(ap, KERN_WARNING, "mv_process_crpb_response1: err_cause=0x%x\n", err_cause);
 				return;
 			}
 		}
 		ata_status = edma_status >> CRPB_FLAG_STATUS_SHIFT;
 		if (!ac_err_mask(ata_status))
 			ata_qc_complete(qc);
+		else
+			ata_port_printk(ap, KERN_WARNING, "mv_process_crpb_response2: edma_status=0x%x\n", edma_status);
 		/* else: leave it for mv_err_intr() */
 	} else {
 		ata_port_printk(ap, KERN_ERR, "%s: no qc for tag=%d\n",
@@ -2070,20 +2073,25 @@
 	 */
 	if (edma_was_enabled && (port_cause & DONE_IRQ)) {
 		mv_process_crpb_entries(ap, pp);
-		if (pp->pp_flags & MV_PP_FLAG_DELAYED_EH)
+		if (pp->pp_flags & MV_PP_FLAG_DELAYED_EH) {
+			ata_port_printk(ap, KERN_WARNING, "mv_port_intr1: port_cause=0x%x(ERR_IRQ), ppflags=0x%x\n", port_cause, pp->pp_flags);
 			mv_handle_fbs_ncq_dev_err(ap);
+		}
 	}
 	/*
 	 * Handle chip-reported errors, or continue on to handle PIO.
 	 */
 	if (unlikely(port_cause & ERR_IRQ)) {
+		ata_port_printk(ap, KERN_WARNING, "mv_port_intr2: port_cause=0x%x(ERR_IRQ), edma=%d, ppflags=0x%x\n", port_cause, edma_was_enabled, pp->pp_flags);
 		mv_err_intr(ap);
 	} else if (!edma_was_enabled) {
 		struct ata_queued_cmd *qc = mv_get_active_qc(ap);
 		if (qc)
 			ata_sff_host_intr(ap, qc);
-		else
+		else {
+			ata_port_printk(ap, KERN_WARNING, "mv_port_intr3: port_cause=0x%x(ERR_IRQ), ppflags=0x%x\n", port_cause, pp->pp_flags);
 			mv_unexpected_intr(ap, edma_was_enabled);
+		}
 	}
 }
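To make it easier to see which of the three mv_port_intr messages in the instrumentation patch above can fire for a given interrupt, here is a small user-space model of the patched branch structure. It is only a sketch: the numeric values of ERR_IRQ and DONE_IRQ are assumptions, the function and TRACE_* names are hypothetical, the MV_PP_FLAG_DELAYED_EH state is passed in as a plain boolean, and the two messages added inside mv_process_crpb_response() are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Interrupt-cause bits. The numeric values are assumptions for this sketch. */
enum { ERR_IRQ = 1 << 0, DONE_IRQ = 1 << 1 };

/* One bit per debug message that the patch adds to mv_port_intr(). */
enum {
	TRACE_INTR1 = 1 << 0,	/* "mv_port_intr1": delayed EH pending after CRPB processing */
	TRACE_INTR2 = 1 << 1,	/* "mv_port_intr2": chip reported ERR_IRQ */
	TRACE_INTR3 = 1 << 2,	/* "mv_port_intr3": non-EDMA interrupt with no active qc */
};

/* Mirror the branch structure of the patched mv_port_intr(). */
unsigned int trace_port_intr(unsigned int port_cause, bool edma_was_enabled,
			     bool delayed_eh, bool have_active_qc)
{
	unsigned int fired = 0;

	if (edma_was_enabled && (port_cause & DONE_IRQ)) {
		/* mv_process_crpb_entries() would run here */
		if (delayed_eh)
			fired |= TRACE_INTR1;
	}
	if (port_cause & ERR_IRQ)
		fired |= TRACE_INTR2;
	else if (!edma_was_enabled && !have_active_qc)
		fired |= TRACE_INTR3;
	return fired;
}

/* Usage example: three representative interrupt situations. */
void trace_port_intr_example(void)
{
	/* Normal EDMA completion: nothing is logged. */
	assert(trace_port_intr(DONE_IRQ, true, false, true) == 0);
	/* Chip-reported error: only the second message. */
	assert(trace_port_intr(ERR_IRQ, true, false, true) == TRACE_INTR2);
	/* Interrupt outside EDMA mode with no command in flight. */
	assert(trace_port_intr(DONE_IRQ, false, false, false) == TRACE_INTR3);
}
```

In this model every message requires the interrupt handler to be entered for the port. If no interrupt arrives at all, none of the messages can appear.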
* Re: sata_mv, io stucks
2008-11-15 21:35 ` Mark Lord
@ 2008-11-15 23:41 ` Harri Olin
2008-11-15 23:44 ` Justin Piszcz
2008-11-16 4:43 ` Mark Lord
0 siblings, 2 replies; 18+ messages in thread
From: Harri Olin @ 2008-11-15 23:41 UTC (permalink / raw)
To: Mark Lord; +Cc: linux-ide
Mark Lord wrote:
> Harri Olin wrote:
>> I ran git-bisect on it and it returned
>> a3718c1f230240361ed92d3e53342df0ff7efa8c as first bad commit. Also
>> verified by hand that patching it on working tree breaks it.
> Looking at later kernels (after the commit in question), I see that
> the code was further fixed to remove some possible races and stuff,
> but that's still just 2.6.26.5, which you guys see failures on.
>
> So here's some instrumentation to help us figure it out.
> Please apply and report back once it triggers again.
> Thanks.
I have to take back that bisect: just a couple of minutes ago it happened
again with the last 'good' kernel from the bisect. Only the frequency of
stalls has dropped considerably. I also noticed that current kernels are
much better too.
pre-..0ff7efa8c: only once after 6 hours of testing
post-..0ff7efa8c: one hd stalled while the filesystem was mounting. Before
boot was complete, 3 stalls. Also at shutdown the kernel hung at
Synchronizing SCSI cache for a while.
2.6.27: once in 5 minutes or so under heavy load
When some hd/port stalls, the other ports still work fine.
I applied your patch on 2.6.27.1, no results:
ata14.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata14.00: cmd 61/08:00:3f:52:54/00:00:57:00:00/40 tag 0 ncq 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata14.00: status: { DRDY }
ata14: hard resetting link
ata14: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata14.00: max_sectors limited to 256 for NCQ
ata14.00: max_sectors limited to 256 for NCQ
ata14.00: configured for UDMA/133
ata14: EH complete
sd 13:0:0:0: [sdh] 1465149168 512-byte hardware sectors (750156 MB)
sd 13:0:0:0: [sdh] Write Protect is off
sd 13:0:0:0: [sdh] Mode Sense: 00 3a 00 00
sd 13:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Do I have to enable something somewhere else too?
I also compiled and patched the linux-2.6-stable tree from git, but it just
panicked after a stall instead of recovering. I'm currently trying to
reproduce that on a second computer where I can capture the panic.
--
Harri.
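As a side note on the "SATA link up 3.0 Gbps (SStatus 123 SControl 300)" line in the log above: both values are printed in hex, and each register packs three 4-bit fields (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). The following is a minimal sketch of the decoding; the struct and function names are hypothetical and only the 1.5 and 3.0 Gbps speed codes are named.

```c
#include <assert.h>
#include <string.h>

/* The three 4-bit fields shared by the SStatus and SControl layouts. */
struct scr_fields {
	unsigned int det;	/* bits 3:0: device detection */
	unsigned int spd;	/* bits 7:4: interface speed */
	unsigned int ipm;	/* bits 11:8: interface power management */
};

/* Split a raw SStatus or SControl value into its fields. */
struct scr_fields decode_scr(unsigned int reg)
{
	struct scr_fields f;

	f.det = reg & 0xf;
	f.spd = (reg >> 4) & 0xf;
	f.ipm = (reg >> 8) & 0xf;
	return f;
}

/* Name the negotiated speed reported in SStatus.SPD. */
const char *sstatus_speed_name(unsigned int spd)
{
	switch (spd) {
	case 0:
		return "no link speed negotiated";
	case 1:
		return "1.5 Gbps";
	case 2:
		return "3.0 Gbps";
	default:
		return "<unknown>";
	}
}

/* Usage example with the values from the log. */
void scr_example(void)
{
	struct scr_fields ss = decode_scr(0x123);
	struct scr_fields sc = decode_scr(0x300);

	/* SStatus 123: device present with phy communication, Gen2, active. */
	assert(ss.det == 3 && ss.spd == 2 && ss.ipm == 1);
	assert(strcmp(sstatus_speed_name(ss.spd), "3.0 Gbps") == 0);
	/* SControl 300: no reset requested, no speed limit, IPM field 3. */
	assert(sc.det == 0 && sc.spd == 0 && sc.ipm == 3);
}
```

So after the hard reset the link comes back at full speed with the device detected, and SControl (IPM = 3) asks that transitions to the partial and slumber power states be disabled.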
* Re: sata_mv, io stucks
2008-11-15 23:41 ` Harri Olin
@ 2008-11-15 23:44 ` Justin Piszcz
2008-11-15 23:47 ` Harri Olin
2008-11-16 4:43 ` Mark Lord
1 sibling, 1 reply; 18+ messages in thread
From: Justin Piszcz @ 2008-11-15 23:44 UTC (permalink / raw)
To: Harri Olin; +Cc: Mark Lord, linux-ide
On Sun, 16 Nov 2008, Harri Olin wrote:
> I have to take back that bisect, as just couple of minutes ago it happened
> again, with last 'good' kernel from bisect. Just the frequency of stalls has
> dropped quite much. I also noticed that on current kernels are much better
> too.
> ...
> I also compiled and patched linux-2.6-stable tree from git but it just
> paniced after stall instead of recovering. I'm currently trying to reproduce
> that on second computer where I can capture the panic.
What type of disks are you using?
Justin.
* Re: sata_mv, io stucks
2008-11-15 23:44 ` Justin Piszcz
@ 2008-11-15 23:47 ` Harri Olin
2008-11-15 23:52 ` Justin Piszcz
0 siblings, 1 reply; 18+ messages in thread
From: Harri Olin @ 2008-11-15 23:47 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Mark Lord, linux-ide
Justin Piszcz wrote:
> On Sun, 16 Nov 2008, Harri Olin wrote:
>> ...
>
> What type of disks are you using?
>
> Justin.
I have seen this happen on 3 different computers using WD5000ABYS,
WD5000YS and WD7500AYYS hard disks. All have the same Supermicro controller.
Stalls happen only on controller ports 0-3, never on ports 4-7. Moving
cables around doesn't help.
--
Harri.
* Re: sata_mv, io stucks
2008-11-15 23:47 ` Harri Olin
@ 2008-11-15 23:52 ` Justin Piszcz
0 siblings, 0 replies; 18+ messages in thread
From: Justin Piszcz @ 2008-11-15 23:52 UTC (permalink / raw)
To: Harri Olin; +Cc: Mark Lord, linux-ide
On Sun, 16 Nov 2008, Harri Olin wrote:
> Justin Piszcz wrote:
>> What type of disks are you using?
>>
>> Justin.
> I have seen this happening on on 3 different computers using WD5000ABYS,
> WD5000YS and WD7500AYYS hard disks. All have same Supermicro controller.
> Stalls happen only on controller ports 0-3, never on ports 4-7. Moving cables
> around doesn't help.
I have been compiling my own list of reports on this problem:
Bug 462425 - Kernel 2.6.26.3-29.fc9.x86_64 drive goes offline
https://bugzilla.redhat.com/show_bug.cgi?id=462425
hardy / ibex - raid5 - ata#: hard resetting link
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263160/
exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
http://groups.google.com/group/linux.kernel/browse_thread/thread/8f3c7ea69a900e51?q="exception+Emask+0x0+SAct"
Harri,
It is interesting that it never happens on ports 4-7.
I have two machines, each with Sil 3132 cards, sata_mv and Intel ICH8, and
the same problem happens across all (12) Velociraptors; I have RMA'd 8
disks so far. It also happens on my Raptor 150s in a different box, but
MUCH less.
I am currently testing a patch from Alan Cox and will report if/when the
problem recurs:
Subject: [PATCH] libata: Drain data on errors
From: Alan Cox <alan@redhat.com>
If the device is signalling that there is data to drain after an error we
should read the bytes out and throw them away. Without this some devices
and controllers get wedged and don't recover.
Based on earlier work by Mark Lord
Signed-off-by: Alan Cox <alan@redhat.com>
---
drivers/ata/libata-sff.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
drivers/ata/pata_pcmcia.c | 34 +++++++++++++++++++++++++++++++++-
include/linux/libata.h | 3 +++
3 files changed, 79 insertions(+), 2 deletions(-)
Justin.
* Re: sata_mv, io stucks
2008-11-15 23:41 ` Harri Olin
2008-11-15 23:44 ` Justin Piszcz
@ 2008-11-16 4:43 ` Mark Lord
2008-11-16 4:59 ` Mark Lord
` (2 more replies)
1 sibling, 3 replies; 18+ messages in thread
From: Mark Lord @ 2008-11-16 4:43 UTC (permalink / raw)
To: Harri Olin; +Cc: linux-ide, Artem Bokhan

Harri Olin wrote:
..
>>>>>> Two marvell controllers, 16 disks, software raid10, IO stucks on
>>>>>> different disks, kernel 2.6.26.5.
>>>>>> With default ubuntu's 8.04 2.6.24 kernel the problem can not be
>>>>>> repeated
>>>>>>
>>>>>>
>>>>>> [ 289.851609] ata11.00: exception Emask 0x0 SAct 0x1 SErr 0x0
>>>>>> action 0x6 frozen
>>>>>> [ 289.851695] ata11.00: cmd 61/08:00:60:1e:bf/00:00:01:00:00/40
>>>>>> tag 0 ncq 4096 out
>>>>>> [ 289.851697] res 40/00:00:00:00:00/00:00:00:00:00/00
>>>>>> Emask 0x4 (timeout)
>>>>>> [ 289.851774] ata11.00: status: { DRDY }
>>>>>> [ 289.851834] ata11: hard resetting link
>>>>>> [ 290.649259] ata11: SATA link up 3.0 Gbps (SStatus 123 SControl
>>>>>> 300)
>>>>>> [ 290.749239] ata11.00: max_sectors limited to 256 for NCQ
>>>>>> [ 290.809189] ata11.00: max_sectors limited to 256 for NCQ
>>>>>> [ 290.809194] ata11.00: configured for UDMA/133
>>>>>> [ 290.809200] ata11: EH complete
>>>>>> [ 290.809242] sd 10:0:0:0: [sdk] 1953525168 512-byte hardware
>>>>>> sectors (1000205 MB)
>>>>>> [ 290.809258] sd 10:0:0:0: [sdk] Write Protect is off
>>>>>> [ 290.809263] sd 10:0:0:0: [sdk] Mode Sense: 00 3a 00 00
>>>>>> [ 290.809286] sd 10:0:0:0: [sdk] Write cache: enabled, read
>>>>>> cache: enabled, doesn't support DPO or FUA
..
> I have to take back that bisect, as just a couple of minutes ago it
> happened again, with the last 'good' kernel from the bisect. Just the
> frequency of stalls has dropped quite a lot. I also noticed that current
> kernels are much better too.
> pre-..0ff7efa8c: only once after 6 hours of testing
> post-..0ff7efa8c: one hd stalled while filesystem was mounting. Before
> boot was complete, 3 stalls. Also at shutdown kernel hung at
> Synchronizing SCSI cache for a while.
> 2.6.27: once in 5 minutes or so on heavy load
>
> When some hd/port stalls, other ports still work fine.
>
> I applied your patch on 2.6.27.1, no results:
>
> ata14.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
> ata14.00: cmd 61/08:00:3f:52:54/00:00:57:00:00/40 tag 0 ncq 4096 out
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
..

Yeah, I see what I was missing earlier: "(timeout)".
So it's "none of" the driver paths.

This could very well be due to one/several of the as-yet un-addressed
chipset errata for the 6081. Someday we'll have software workarounds
for those, but I'm (still) waiting on Marvell for stuff.

I will look and see if this makes sense based on the errata info
that I have already though (under NDA).

Harri / Artem: what type/speed of slots are your 6081 controllers in?
PCI, or PCI-X? Bus speed?

Thanks

^ permalink raw reply [flat|nested] 18+ messages in thread
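For readers decoding the "(timeout)" report quoted above: the cmd line packs the failed command into twelve hex bytes. The sketch below is one way to unpack it, assuming the bytes appear in the order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and that NCQ commands carry the sector count in the feature bytes and the tag in the upper bits of nsect. The helper names (parse_taskfile, decode) are made up for illustration and are not part of any kernel tool.

```python
import re

# Assumed byte order of the taskfile as printed in libata EH reports.
FIELDS = ("command", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
          "device")

# Only the opcodes that show up in this thread.
NCQ_OPCODES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}
OTHER_OPCODES = {0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT"}

# Twelve two-digit hex bytes separated by '/' or ':'.
TASKFILE_RE = re.compile(r"[0-9a-f]{2}(?:[/:][0-9a-f]{2}){11}")


def parse_taskfile(line):
    """Return the twelve taskfile bytes of a 'cmd' or 'res' log line as a dict."""
    match = TASKFILE_RE.search(line)
    if match is None:
        raise ValueError("no taskfile found in: %r" % line)
    values = [int(tok, 16) for tok in re.split(r"[/:]", match.group(0))]
    return dict(zip(FIELDS, values))


def decode(line):
    """Decode opcode name, 48-bit LBA, sector count and NCQ tag from a 'cmd' line."""
    tf = parse_taskfile(line)
    lba = (tf["hob_lbah"] << 40 | tf["hob_lbam"] << 32 | tf["hob_lbal"] << 24
           | tf["lbah"] << 16 | tf["lbam"] << 8 | tf["lbal"])
    if tf["command"] in NCQ_OPCODES:
        name = NCQ_OPCODES[tf["command"]]
        sectors = tf["hob_feature"] << 8 | tf["feature"]
        tag = tf["nsect"] >> 3          # tag sits in bits 7:3 of the count byte
    else:
        name = OTHER_OPCODES.get(tf["command"], "unknown")
        sectors = tf["hob_nsect"] << 8 | tf["nsect"]
        tag = None
    # A count of zero means 65536 sectors for 48-bit commands.
    return {"name": name, "lba": lba, "sectors": sectors or 65536, "tag": tag}


if __name__ == "__main__":
    line = "ata14.00: cmd 61/08:00:3f:52:54/00:00:57:00:00/40 tag 0 ncq 4096 out"
    print(decode(line))
```

Under those assumptions the quoted command is an 8-sector (4096-byte) NCQ write with tag 0, which agrees with the "tag 0 ncq 4096 out" text the kernel prints next to it.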
* Re: sata_mv, io stucks
2008-11-16 4:43 ` Mark Lord
@ 2008-11-16 4:59 ` Mark Lord
2008-11-16 9:13 ` Justin Piszcz
2008-11-17 14:10 ` Bokhan Artem
2008-11-16 12:35 ` Harri Olin
2008-11-16 17:32 ` Harri Olin
2 siblings, 2 replies; 18+ messages in thread
From: Mark Lord @ 2008-11-16 4:59 UTC (permalink / raw)
To: Harri Olin; +Cc: linux-ide, Artem Bokhan

Mark Lord wrote:
>
> Yeah, I see what I was missing earlier: "(timeout)".
> So it's "none of" the driver paths.
>
> This could very well be due to one/several of the as-yet un-addressed
> chipset errata for the 6081. Someday we'll have software workarounds
> for those, but I'm (still) waiting on Marvell for stuff.
>
> I will look and see if this makes sense based on the errata info
> that I have already though (under NDA).
>
> Harri / Artem: what type/speed of slots are your 6081 controllers in?
> PCI, or PCI-X? Bus speed?
..

Mmm.. Harri at least previously said:

> Controller is Supermicro AOC-SAT2-MV8, connected to 133MHz PCI-X slot on one computer,
> 66MHz 64bit PCI slot on the second machine and to normal 32bit PCI slot on third computer.
..

And Artem said:

> The controller is AOC-SAT2-MV8 too.
..

So I guess I need to know if Artem is using PCI-X, and the bus width + speed.

Cheers

^ permalink raw reply [flat|nested] 18+ messages in thread
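As background to the bus width + speed question, here is a rough sketch of the theoretical peak bandwidth of the three slot types mentioned in the thread. It only multiplies bus width by nominal clock rate, so real throughput will be lower; the function name and the slot table are illustrative, and the 33.33/66.66/133.33 MHz clock figures are the usual nominal values rather than anything measured here.

```python
def peak_mb_per_s(width_bits, clock_mhz):
    """Theoretical peak of a parallel PCI/PCI-X bus: bytes per cycle times MHz."""
    return width_bits / 8 * clock_mhz


# Slot types mentioned in the thread, with nominal clock rates (assumed values).
SLOTS = {
    "32-bit 33 MHz PCI": (32, 33.33),
    "64-bit 66 MHz PCI": (64, 66.66),
    "64-bit 133 MHz PCI-X": (64, 133.33),
}

if __name__ == "__main__":
    for name, (width, clock) in SLOTS.items():
        print("%-22s ~%4.0f MB/s peak" % (name, peak_mb_per_s(width, clock)))
```

The roughly eightfold spread between the slowest and fastest slot is worth keeping in mind when comparing stall frequency across the three machines, although nothing in the thread establishes that bus bandwidth is involved.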
* Re: sata_mv, io stucks
2008-11-16 4:59 ` Mark Lord
@ 2008-11-16 9:13 ` Justin Piszcz
2008-11-17 5:22 ` Mark Lord
2008-11-17 14:10 ` Bokhan Artem
1 sibling, 1 reply; 18+ messages in thread
From: Justin Piszcz @ 2008-11-16 9:13 UTC (permalink / raw)
To: Mark Lord; +Cc: Harri Olin, linux-ide, Artem Bokhan

On Sat, 15 Nov 2008, Mark Lord wrote:

> Mark Lord wrote:
>>
>> Yeah, I see what I was missing earlier: "(timeout)".
>> So it's "none of" the driver paths.
>>
>> This could very well be due to one/several of the as-yet un-addressed
>> chipset errata for the 6081. Someday we'll have software workarounds
>> for those, but I'm (still) waiting on Marvell for stuff.
>>
>> I will look and see if this makes sense based on the errata info
>> that I have already though (under NDA).
>>
>> Harri / Artem: what type/speed of slots are your 6081 controllers in?
>> PCI, or PCI-X? Bus speed?
> ..
>
> Mmm.. Harri at least previously said:
>> Controller is Supermicro AOC-SAT2-MV8, connected to 133MHz PCI-X slot on one
>> computer, 66MHz 64bit PCI slot on the second machine and to normal 32bit PCI
>> slot on third computer.
> ..
>
> And Artem said:
>> The controller is AOC-SAT2-MV8 too.
> ..

As I mentioned earlier, it happens on all of my drives; 4 of 12 disks
are connected to a Marvell controller, in my case:

01:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SX7042
PCI-e 4-port SATA-II (rev 02)

It is in the x16 slot.

Justin.

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: sata_mv, io stucks
2008-11-16 9:13 ` Justin Piszcz
@ 2008-11-17 5:22 ` Mark Lord
0 siblings, 0 replies; 18+ messages in thread
From: Mark Lord @ 2008-11-17 5:22 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Harri Olin, linux-ide, Artem Bokhan

Justin Piszcz wrote:
> ..
> As I mentioned earlier, it happens on all of my drives, 4 of 12 disks
> are connected to a Marvell controller, in my case:
>
> 01:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SX7042
> PCI-e 4-port SATA-II (rev 02)
..

Whatever problem you are seeing is likely different from what the others see.
The 7042 is a much newer rev of the chipset, with a lot less in the way of
errata and stuff to go wrong.

So, yes, you're apparently seeing some problems, but there's nothing
to suggest that those are the *same* problems as the 6081 users are seeing.

Cheers

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: sata_mv, io stucks
2008-11-16 4:59 ` Mark Lord
2008-11-16 9:13 ` Justin Piszcz
@ 2008-11-17 14:10 ` Bokhan Artem
1 sibling, 0 replies; 18+ messages in thread
From: Bokhan Artem @ 2008-11-17 14:10 UTC (permalink / raw)
To: Mark Lord; +Cc: Harri Olin, linux-ide

Mark Lord wrote:
>
> And Artem said:
>> The controller is AOC-SAT2-MV8 too.
> ..
>
> So I guess I need to know if Artem is using PCI-X, and the bus width +
> speed.
>

The only configuration I tried was 2 controllers on 133 MHz PCI-X.

> Cheers

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: sata_mv, io stucks
2008-11-16 4:43 ` Mark Lord
2008-11-16 4:59 ` Mark Lord
@ 2008-11-16 12:35 ` Harri Olin
2008-11-16 17:32 ` Harri Olin
2 siblings, 0 replies; 18+ messages in thread
From: Harri Olin @ 2008-11-16 12:35 UTC (permalink / raw)
To: Mark Lord; +Cc: linux-ide, Artem Bokhan

Mark Lord wrote:
> Harri Olin wrote:
> ..
>>>>>>> Two marvell controllers, 16 disks, software raid10, IO stucks on
>>>>>>> different disks, kernel 2.6.26.5.
>>>>>>> With default ubuntu's 8.04 2.6.24 kernel the problem can not be
>>>>>>> repeated
> This could very well be due to one/several of the as-yet un-addressed
> chipset errata for the 6081. Someday we'll have software workarounds
> for those, but I'm (still) waiting on Marvell for stuff.
>

I was going through the logs from last night and noticed something:
currently I have only 7 disks connected to the 6081 controller, port 5 is
free, and now stalls have been happening only on ports 0, 2 and 3. So it
seems something on a high port is affecting the corresponding lower port too.

> Harri / Artem: what type/speed of slots are your 6081 controllers in?
> PCI, or PCI-X? Bus speed?
>

As mentioned earlier, 33MHz, 66MHz and 133MHz.
Motherboards are: some Asus board with VIA chipset, Supermicro P4SCI and
Supermicro X7SBE. I think P4SCI has 66MHz 64bit PCI, not PCI-X.

--
Harri.

^ permalink raw reply [flat|nested] 18+ messages in thread
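Harri's port observation can be checked mechanically. The sketch below assumes the 6081's eight ports form two groups of four, so that port n and port n+4 are "corresponding" ports; that grouping is an assumption made for illustration, not something stated in the thread, and the names are hypothetical.

```python
PORTS_PER_GROUP = 4  # assumption: ports 0-3 and 4-7 form two groups of four


def split_port(port):
    """Return (group, index within group) for a controller port number."""
    return port // PORTS_PER_GROUP, port % PORTS_PER_GROUP


def counterpart(port):
    """Return the port with the same index in the other group (n <-> n+4)."""
    group, index = split_port(port)
    return (1 - group) * PORTS_PER_GROUP + index


# From the message above: 7 disks attached, port 5 free, stalls on 0, 2 and 3.
occupied = set(range(8)) - {5}
stalled = {0, 2, 3}

# Low ports whose high counterpart has a disk attached.
low_with_busy_partner = {
    p for p in occupied
    if p < PORTS_PER_GROUP and counterpart(p) in occupied
}

if __name__ == "__main__":
    print(sorted(low_with_busy_partner))
```

With port 5 empty, the only low port without an occupied counterpart is port 1, which is also the only low port that did not stall. That matches the hypothesis but is a single data point, not proof.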
* Re: sata_mv, io stucks
2008-11-16 4:43 ` Mark Lord
2008-11-16 4:59 ` Mark Lord
2008-11-16 12:35 ` Harri Olin
@ 2008-11-16 17:32 ` Harri Olin
2 siblings, 0 replies; 18+ messages in thread
From: Harri Olin @ 2008-11-16 17:32 UTC (permalink / raw)
To: Mark Lord; +Cc: linux-ide, Artem Bokhan

Mark Lord wrote:
>> ata14.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
>> ata14.00: cmd 61/08:00:3f:52:54/00:00:57:00:00/40 tag 0 ncq 4096 out
>>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Yeah, I see what I was missing earlier: "(timeout)".
> So it's "none of" the driver paths.
>
> This could very well be due to one/several of the as-yet un-addressed
> chipset errata for the 6081. Someday we'll have software workarounds
> for those, but I'm (still) waiting on Marvell for stuff.
>

After a bit of testing, it seems that writing is required to trigger the
bug, dstat output follows:

--dsk/sde-----dsk/sdf-----dsk/sdg-----dsk/sdh-----dsk/sdi-----dsk/sdj-----dsk/sdk--
 read  writ: read  writ: read  writ: read  writ: read  writ: read  writ: read  writ
  37M    0 :  35M    0 :  35M    0 :  37M    0 :  34M    0 :  35M    0 :  32M    0
  35M    0 :  34M    0 :  34M    0 :  35M    0 :  37M    0 :  37M    0 :  36M    0
  34M    0 :  35M    0 :  35M    0 :  40M    0 :  36M    0 :  33M    0 :  35M    0
  30M 8192B:  28M 8192B:  30M 8192B:  30M    0 :  28M 8192B:  30M 8192B:  28M 8192B
  35M    0 :  37M    0 :  33M    0 :   0     0 :  36M    0 :  34M    0 :  35M    0
  36M    0 :  35M    0 :  35M    0 :   0     0 :  35M    0 :  34M    0 :  34M    0
  34M    0 :  37M    0 :  38M    0 :   0     0 :  36M    0 :  36M    0 :  35M    0

I was running fio, reading from all drives connected to 6081. After
nothing happened for a while, I decided to mount the xfs filesystem
read-write and it hung immediately before mount was even complete.

I also managed to catch the panic I mentioned, running kernel 2.6.28-rc5:

[ 503.918122] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 503.918399] IP: [<ffffffff804d3938>] scsi_times_out+0x8/0x70
[ 503.918561] PGD 229068067 PUD 22a1f0067 PMD 0
[ 503.918814] Oops: 0000 [#1] SMP
[ 503.919009] last sysfs file: /sys/block/sdk/stat
[ 503.919123] CPU 2
[ 503.919273] Modules linked in: kvm_intel kvm coretemp w83627hf w83793 hwmon_vid hwmon nf_conntrack_ftp 3c59x i2c_i801 i2c_core e100 iTCO_wdt
[ 503.920074] Pid: 0, comm: swapper Not tainted 2.6.28-rc5 #4
[ 503.920190] RIP: 0010:[<ffffffff804d3938>] [<ffffffff804d3938>] scsi_times_out+0x8/0x70
[ 503.920417] RSP: 0018:ffff88022f0f3e60 EFLAGS: 00010046
[ 503.920540] RAX: ffff88022d4f5470 RBX: 0000000000000000 RCX: ffff88022d4f5ac8
[ 503.920659] RDX: ffff88022d4f57e8 RSI: 0000000000000eae RDI: ffff8801f8188848
[ 503.920777] RBP: ffff88022d4f5988 R08: 0000000000000000 R09: 0000000000000000
[ 503.920897] R10: ffffffff804d6142 R11: ffffffff805dc480 R12: ffff88022f0e4000
[ 503.921015] R13: ffff88022d4f57e8 R14: 0000000000000000 R15: ffff88022d4f5470
[ 503.921134] FS: 0000000000000000(0000) GS:ffff88022f08bac0(0000) knlGS:0000000000000000
[ 503.921317] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 503.921434] CR2: 0000000000000000 CR3: 000000022a0cf000 CR4: 00000000000026e0
[ 503.921553] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 503.921674] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 503.921793] Process swapper (pid: 0, threadinfo ffff88022f0ee000, task ffff88022f0e2c30)
[ 503.921985] Stack:
[ 503.922094] ffff8801f8188848 ffffffff80416eee ffff8801f8188848 ffffffff80416fea
[ 503.922116] 0000000000000282 ffff88022d4f5470 0000000000000100 ffff88022f0e4000
[ 503.922116] ffff88022f0f3ee0 ffffffff80416f30 ffff88022f0e5018 ffffffff8024393b
[ 503.922116] Call Trace:
[ 503.922116] <IRQ> <0> [<ffffffff80416eee>] ? blk_rq_timed_out+0xe/0x50
[ 503.922116] [<ffffffff80416fea>] ? blk_rq_timed_out_timer+0xba/0x120
[ 503.922116] [<ffffffff80416f30>] ? blk_rq_timed_out_timer+0x0/0x120
[ 503.922116] [<ffffffff8024393b>] ? run_timer_softirq+0x1bb/0x230
[ 503.922116] [<ffffffff8023f00b>] ? __do_softirq+0x8b/0x150
[ 503.922116] [<ffffffff8020e7db>] ? profile_pc+0x3b/0x80
[ 503.922116] [<ffffffff8020c8fc>] ? call_softirq+0x1c/0x40
[ 503.922116] [<ffffffff8020db55>] ? do_softirq+0x35/0x70
[ 503.922116] [<ffffffff802205b5>] ? smp_apic_timer_interrupt+0x85/0xd0
[ 503.922116] [<ffffffff8020c34b>] ? apic_timer_interrupt+0x6b/0x70
[ 503.922116] <EOI> <0> [<ffffffff805dc480>] ? udp_poll+0x0/0x150
[ 503.922116] [<ffffffff80212d8c>] ? mwait_idle+0x3c/0x40
[ 503.922116] [<ffffffff80209d5a>] ? cpu_idle+0x3a/0x70
[ 503.922116] Code: 18 4c 8b 74 24 20 48 83 c4 28 c3 be 06 00 00 00 48 89 df e8 9b c8 ff ff 85 c0 75 c3 eb 87 0f 1f 44 00 00 53 48 8b 9f e0 00 00 00 <48> 8b 03 48
[ 503.922116] RIP [<ffffffff804d3938>] scsi_times_out+0x8/0x70
[ 503.922116] RSP <ffff88022f0f3e60>
[ 503.922116] CR2: 0000000000000000
[ 503.922116] Kernel panic - not syncing: Fatal exception in interrupt
[ 503.922116] ------------[ cut here ]------------
[ 503.922116] WARNING: at kernel/smp.c:333 smp_call_function_mask+0x236/0x240()
[ 503.922116] Modules linked in: kvm_intel kvm coretemp w83627hf w83793 hwmon_vid hwmon nf_conntrack_ftp 3c59x i2c_i801 i2c_core e100 iTCO_wdt
[ 503.922116] Pid: 0, comm: swapper Tainted: G D 2.6.28-rc5 #4
[ 503.922116] Call Trace:
[ 503.922116] <IRQ> [<ffffffff80239ea4>] warn_on_slowpath+0x64/0xa0
[ 503.922116] [<ffffffff80252396>] up+0x16/0x50
[ 503.922116] [<ffffffff8023a657>] release_console_sem+0x197/0x1e0
[ 503.922116] [<ffffffff8025c126>] smp_call_function_mask+0x236/0x240
[ 503.922116] [<ffffffff8023b0fe>] printk+0x4e/0x60
[ 503.922116] [<ffffffff80252396>] up+0x16/0x50
[ 503.922116] [<ffffffff8021f290>] native_smp_send_stop+0x20/0x30
[ 503.922116] [<ffffffff80239f7e>] panic+0x8e/0x150
[ 503.922116] [<ffffffff8020e582>] show_registers+0x192/0x250
[ 503.922116] [<ffffffff8047d745>] do_unblank_screen+0x15/0x140
[ 503.922116] [<ffffffff80636370>] oops_end+0xa0/0xb0
[ 503.922116] [<ffffffff80637f43>] do_page_fault+0x6a3/0x830
[ 503.922116] [<ffffffff80635799>] error_exit+0x0/0x51
[ 503.922116] [<ffffffff805dc480>] udp_poll+0x0/0x150
[ 503.922116] [<ffffffff804d6142>] scsi_request_fn+0xe2/0x400
[ 503.922116] [<ffffffff804d3938>] scsi_times_out+0x8/0x70
[ 503.922116] [<ffffffff80416eee>] blk_rq_timed_out+0xe/0x50
[ 503.922116] [<ffffffff80416fea>] blk_rq_timed_out_timer+0xba/0x120
[ 503.922116] [<ffffffff80416f30>] blk_rq_timed_out_timer+0x0/0x120
[ 503.922116] [<ffffffff8024393b>] run_timer_softirq+0x1bb/0x230
[ 503.922116] [<ffffffff8023f00b>] __do_softirq+0x8b/0x150
[ 503.922116] [<ffffffff8020e7db>] profile_pc+0x3b/0x80
[ 503.922116] [<ffffffff8020c8fc>] call_softirq+0x1c/0x40
[ 503.922116] [<ffffffff8020db55>] do_softirq+0x35/0x70
[ 503.922116] [<ffffffff802205b5>] smp_apic_timer_interrupt+0x85/0xd0
[ 503.922116] [<ffffffff8020c34b>] apic_timer_interrupt+0x6b/0x70
[ 503.922116] <EOI> [<ffffffff805dc480>] udp_poll+0x0/0x150
[ 503.922116] [<ffffffff80212d8c>] mwait_idle+0x3c/0x40
[ 503.922116] [<ffffffff80209d5a>] cpu_idle+0x3a/0x70
[ 503.922116] ---[ end trace 3eef0898db52fd7a ]---

--
Harri.

^ permalink raw reply [flat|nested] 18+ messages in thread
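The dstat table in the message above shows the stall fairly clearly once it is parsed: reads on one disk drop to zero right after the first sample that contains writes, while the other disks keep going. The sketch below is one hypothetical way to flag that pattern automatically. It assumes dstat's k/M/G suffixes are powers of 1024 and treats a disk as stalled in a sample when it reads nothing while every other disk reads something; the sample text is copied from the message with whitespace collapsed.

```python
import re

# dstat output from the message above, whitespace collapsed.
DSTAT_SAMPLE = """\
--dsk/sde-----dsk/sdf-----dsk/sdg-----dsk/sdh-----dsk/sdi-----dsk/sdj-----dsk/sdk--
read writ: read writ: read writ: read writ: read writ: read writ: read writ
37M 0 : 35M 0 : 35M 0 : 37M 0 : 34M 0 : 35M 0 : 32M 0
35M 0 : 34M 0 : 34M 0 : 35M 0 : 37M 0 : 37M 0 : 36M 0
34M 0 : 35M 0 : 35M 0 : 40M 0 : 36M 0 : 33M 0 : 35M 0
30M 8192B: 28M 8192B: 30M 8192B: 30M 0 : 28M 8192B: 30M 8192B: 28M 8192B
35M 0 : 37M 0 : 33M 0 : 0 0 : 36M 0 : 34M 0 : 35M 0
36M 0 : 35M 0 : 35M 0 : 0 0 : 35M 0 : 34M 0 : 34M 0
34M 0 : 37M 0 : 38M 0 : 0 0 : 36M 0 : 36M 0 : 35M 0
"""

UNITS = {"B": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}  # assumed multipliers


def to_bytes(token):
    """Convert a dstat cell such as '37M', '8192B' or '0' to a byte count."""
    if token[-1] in UNITS:
        return int(float(token[:-1]) * UNITS[token[-1]])
    return int(token)


def parse(sample):
    """Return one {disk: (read_bytes, write_bytes)} dict per data row."""
    lines = [line for line in sample.splitlines() if line.strip()]
    disks = re.findall(r"dsk/(\w+)", lines[0])
    rows = []
    for line in lines[2:]:  # skip the two header lines
        cells = [cell.split() for cell in line.split(":")]
        rows.append({disk: (to_bytes(r), to_bytes(w))
                     for disk, (r, w) in zip(disks, cells)})
    return rows


def stalled(rows):
    """Map disk -> sample indexes where it read nothing while all others read."""
    hits = {}
    for index, row in enumerate(rows):
        for disk, (read, _write) in row.items():
            others = [r for d, (r, _w) in row.items() if d != disk]
            if read == 0 and all(r > 0 for r in others):
                hits.setdefault(disk, []).append(index)
    return hits


if __name__ == "__main__":
    print(stalled(parse(DSTAT_SAMPLE)))
```

On this sample only sdh is flagged, starting with the sample that follows the 8192-byte writes, which fits the observation that writing seems to be needed to trigger the stall.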
* Re: sata_mv, io stucks
2008-10-17 12:25 sata_mv, io stucks Artem Bokhan
2008-10-23 8:53 ` Artem Bokhan
@ 2008-10-23 13:31 ` Harri Olin
2008-10-23 16:32 ` Bokhan Artem
1 sibling, 1 reply; 18+ messages in thread
From: Harri Olin @ 2008-10-23 13:31 UTC (permalink / raw)
To: Artem Bokhan; +Cc: linux-ide, tj, liml

Artem Bokhan wrote:
> I try to simulate random reads with "sysbench --test=fileio
> --num-threads=16 --max-requests=9999999 --max-time=60 --init-rng=on
> --file-num=16 --file-fsync-freq=0 --file-test-mode=rndrd
> --file-total-size=30G run"
>
> Two marvell controllers, 16 disks, software raid10, IO stucks on
> different disks, kernel 2.6.26.5.
> With default ubuntu's 8.04 2.6.24 kernel the problem can not be repeated

I have the same problem with recent kernels with updated sata_mv driver.
First IO stops for a while and after EH runs, everything works again for a
while. Happens on 3 different computers using WD5000ABYS, WD5000YS and
WD7500AYYS hard disks, RAID5 and 6 configurations using Linux MD.

Stalls seem to happen only on controller ports 0-3, ports 4-7 work
without problems.

Controller is Supermicro AOC-SAT2-MV8, connected to 133MHz PCI-X slot on
one computer, 66MHz 64bit PCI slot on the second machine and to normal
32bit PCI slot on third computer.
http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm

At the moment I don't have disks connected to failing ports, but if
needed, I can test patches.

Oct 10 18:56:17 mizar kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 10 18:56:17 mizar kernel: ata10.00: cmd 35/00:08:3f:52:54/00:00:57:00:00/e0 tag 0 dma 4096 out
Oct 10 18:56:17 mizar kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Oct 10 18:56:17 mizar kernel: ata10.00: status: { DRDY }
Oct 10 18:56:17 mizar kernel: ata10: hard resetting link
Oct 10 18:56:17 mizar kernel: ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 10 18:56:17 mizar kernel: ata10.00: max_sectors limited to 256 for NCQ
Oct 10 18:56:17 mizar kernel: ata10.00: max_sectors limited to 256 for NCQ
Oct 10 18:56:17 mizar kernel: ata10.00: configured for UDMA/33
Oct 10 18:56:17 mizar kernel: ata10: EH complete
Oct 10 18:56:17 mizar kernel: sd 9:0:0:0: [sdg] 1465149168 512-byte hardware sectors (750156 MB)
Oct 10 18:56:17 mizar kernel: sd 9:0:0:0: [sdg] Write Protect is off
Oct 10 18:56:17 mizar kernel: sd 9:0:0:0: [sdg] Mode Sense: 00 3a 00 00
Oct 10 18:56:17 mizar kernel: sd 9:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 10 19:34:58 mizar kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 10 19:34:58 mizar kernel: ata10.00: cmd 35/00:08:3f:52:54/00:00:57:00:00/e0 tag 0 dma 4096 out
Oct 10 19:34:58 mizar kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Oct 10 19:34:58 mizar kernel: ata10.00: status: { DRDY }
Oct 10 19:34:58 mizar kernel: ata10: hard resetting link
Oct 10 19:34:58 mizar kernel: ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 10 19:34:58 mizar kernel: ata10.00: max_sectors limited to 256 for NCQ
Oct 10 19:34:58 mizar kernel: ata10.00: max_sectors limited to 256 for NCQ
Oct 10 19:34:58 mizar kernel: ata10.00: configured for UDMA/33
Oct 10 19:34:58 mizar kernel: ata10: EH complete
Oct 10 19:34:58 mizar kernel: sd 9:0:0:0: [sdg] 1465149168 512-byte hardware sectors (750156 MB)
Oct 10 19:34:58 mizar kernel: sd 9:0:0:0: [sdg] Write Protect is off
Oct 10 19:34:58 mizar kernel: sd 9:0:0:0: [sdg] Mode Sense: 00 3a 00 00
Oct 10 19:34:58 mizar kernel: sd 9:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Oct 10 19:37:05 mizar kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 10 19:37:05 mizar kernel: ata10.00: cmd 35/00:08:3f:52:54/00:00:57:00:00/e0 tag 0 dma 4096 out
Oct 10 19:37:05 mizar kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Oct 10 19:37:05 mizar kernel: ata10.00: status: { DRDY }
Oct 10 19:37:05 mizar kernel: ata10: hard resetting link
Oct 10 19:37:06 mizar kernel: ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 10 19:37:06 mizar kernel: ata10.00: max_sectors limited to 256 for NCQ
Oct 10 19:37:06 mizar kernel: ata10.00: max_sectors limited to 256 for NCQ
Oct 10 19:37:06 mizar kernel: ata10.00: configured for UDMA/33
Oct 10 19:37:06 mizar kernel: ata10: EH complete
Oct 10 19:37:06 mizar kernel: sd 9:0:0:0: [sdg] 1465149168 512-byte hardware sectors (750156 MB)
Oct 10 19:37:06 mizar kernel: sd 9:0:0:0: [sdg] Write Protect is off
Oct 10 19:37:06 mizar kernel: sd 9:0:0:0: [sdg] Mode Sense: 00 3a 00 00
Oct 10 19:37:06 mizar kernel: sd 9:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Sep 26 15:47:14 mvsrv02 kernel: ata5.00: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x6 frozen
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: cmd 60/40:00:7f:a1:e2/00:00:28:00:00/40 tag 0 ncq 32768 in
Sep 26 15:47:14 mvsrv02 kernel:          res 40/00:00:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: status: { DRDY }
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: cmd 60/40:08:3f:a1:e2/00:00:28:00:00/40 tag 1 ncq 32768 in
Sep 26 15:47:14 mvsrv02 kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: status: { DRDY }
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: cmd 60/40:10:3f:a2:e2/00:00:28:00:00/40 tag 2 ncq 32768 in
Sep 26 15:47:14 mvsrv02 kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: status: { DRDY }
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: cmd 60/c0:18:7f:a2:e2/00:00:28:00:00/40 tag 3 ncq 98304 in
Sep 26 15:47:14 mvsrv02 kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: status: { DRDY }
Sep 26 15:47:14 mvsrv02 kernel: ata5: hard resetting link
Sep 26 15:47:14 mvsrv02 kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: max_sectors limited to 256 for NCQ
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: max_sectors limited to 256 for NCQ
Sep 26 15:47:14 mvsrv02 kernel: ata5.00: configured for UDMA/133
Sep 26 15:47:14 mvsrv02 kernel: ata5: EH complete
Sep 26 15:47:14 mvsrv02 kernel: sd 4:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Sep 26 15:47:14 mvsrv02 kernel: sd 4:0:0:0: [sdb] Write Protect is off
Sep 26 15:47:14 mvsrv02 kernel: sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Sep 26 15:47:14 mvsrv02 kernel: sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

1st computer: 133MHz PCI-X slot
03:01.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
        Subsystem: Marvell Technology Group Ltd. Unknown device 11ab
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B+ DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 48
        Region 0: Memory at d8800000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at 3000 [size=256]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000 Data: 0000
        Capabilities: [60] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=512 OST=4
                Status: Dev=03:01.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-
        Kernel driver in use: sata_mv

2nd: 66MHz 64bit PCI
02:01.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
        Subsystem: Marvell Technology Group Ltd. Unknown device 11ab
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, Cache Line Size: 128 bytes
        Interrupt: pin A routed to IRQ 24
        Region 0: Memory at f2800000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at c000 [size=256]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000 Data: 0000
        Capabilities: [60] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=512 OST=4
                Status: Dev=02:01.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-

3rd computer: 32bit 33MHz PCI
00:0a.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
        Subsystem: Marvell Technology Group Ltd. Unknown device 11ab
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at cfe00000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at dc00 [size=256]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000 Data: 0000
        Capabilities: [60] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=512 OST=4
                Status: Dev=ff:1f.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-

--
Harri.

^ permalink raw reply [flat|nested] 18+ messages in thread
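One detail in the logs above that is easy to miss: the two machines come back from the reset with different SStatus/SControl values (113/310 on mizar, 123/300 on mvsrv02). The sketch below splits those values into fields, assuming the usual SATA register layout with DET in bits 3:0, SPD in bits 7:4 and IPM in bits 11:8, and assuming the kernel prints the registers in hex without a prefix. The function name is made up for illustration.

```python
# Negotiated speed as encoded in the SPD field (assumed meaning of the values).
SPEEDS = {0: "none", 1: "1.5 Gbps", 2: "3.0 Gbps"}


def decode_scr(hex_text):
    """Split an SStatus or SControl value, given as printed hex, into fields."""
    value = int(hex_text, 16)
    return {
        "det": value & 0xF,          # device detection / interface control
        "spd": (value >> 4) & 0xF,   # negotiated speed or speed limit
        "ipm": (value >> 8) & 0xF,   # power management state or restrictions
    }


if __name__ == "__main__":
    for host, sstatus, scontrol in (("mizar", "113", "310"),
                                    ("mvsrv02", "123", "300")):
        status = decode_scr(sstatus)
        control = decode_scr(scontrol)
        print("%s: link %s, SControl speed limit field = %d"
              % (host, SPEEDS.get(status["spd"], "unknown"), control["spd"]))
```

Read this way, the mizar link is not only running at 1.5 Gbps but also has a non-zero speed limit in SControl, while mvsrv02 has no limit set. The logs do not say why the limit is there, so this is only a pointer for further digging.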
* Re: sata_mv, io stucks
2008-10-23 13:31 ` Harri Olin
@ 2008-10-23 16:32 ` Bokhan Artem
0 siblings, 0 replies; 18+ messages in thread
From: Bokhan Artem @ 2008-10-23 16:32 UTC (permalink / raw)
To: Harri Olin; +Cc: linux-ide, liml

The controller is AOC-SAT2-MV8 too.

Harri Olin wrote:
> Artem Bokhan wrote:
>> I try to simulate random reads with "sysbench --test=fileio
>> --num-threads=16 --max-requests=9999999 --max-time=60 --init-rng=on
>> --file-num=16 --file-fsync-freq=0 --file-test-mode=rndrd
>> --file-total-size=30G run"
>>
>> Two marvell controllers, 16 disks, software raid10, IO stucks on
>> different disks, kernel 2.6.26.5.
>> With default ubuntu's 8.04 2.6.24 kernel the problem can not be repeated
>
> I have the same problem with recent kernels with updated sata_mv
> driver. First IO stops for a while and after EH runs, everything works
> again for a while. Happens on 3 different computers using WD5000ABYS,
> WD5000YS and WD7500AYYS hard disks, RAID5 and 6 configurations using
> Linux MD.
>
> Stalls seem to happen only on controller ports 0-3, ports 4-7 work
> without problems.
>
> Controller is Supermicro AOC-SAT2-MV8, connected to 133MHz PCI-X slot
> on one computer, 66MHz 64bit PCI slot on the second machine and to
> normal 32bit PCI slot on third computer.
> http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm
>
> At the moment I don't have disks connected to failing ports, but if
> needed, I can test patches.
> ..

^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2008-11-17 14:10 UTC | newest]

Thread overview: 18+ messages
2008-10-17 12:25 sata_mv, io stucks Artem Bokhan
2008-10-23 8:53 ` Artem Bokhan
2008-10-23 16:07 ` Mark Lord
2008-11-15 15:18 ` Harri Olin
2008-11-15 21:35 ` Mark Lord
2008-11-15 23:41 ` Harri Olin
2008-11-15 23:44 ` Justin Piszcz
2008-11-15 23:47 ` Harri Olin
2008-11-15 23:52 ` Justin Piszcz
2008-11-16 4:43 ` Mark Lord
2008-11-16 4:59 ` Mark Lord
2008-11-16 9:13 ` Justin Piszcz
2008-11-17 5:22 ` Mark Lord
2008-11-17 14:10 ` Bokhan Artem
2008-11-16 12:35 ` Harri Olin
2008-11-16 17:32 ` Harri Olin
2008-10-23 13:31 ` Harri Olin
2008-10-23 16:32 ` Bokhan Artem