* spurious completions during NCQ on rt kernel
@ 2007-11-24 23:14 jozwicki
2007-11-26 4:32 ` Mark Lord
0 siblings, 1 reply; 3+ messages in thread
From: jozwicki @ 2007-11-24 23:14 UTC (permalink / raw)
To: jgarzik; +Cc: linux-ide, mingo
Hello,
On Linux localhost 2.6.24-rc2-rt1 #5 SMP PREEMPT RT Sat Nov 24 13:39:15
CET 2007 i686 AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
AuthenticAMD GNU/Linux
under heavy load I get this kind of error:
[27442.074715] ata3.00: exception Emask 0x2 SAct 0xd SErr 0x0 action
0x2 frozen
[27442.074722] ata3.00: spurious completions during NCQ issue=0x1
SAct=0xd FIS=004040a1:00000002
[27442.074735] ata3.00: cmd 61/01:00:55:69:43/00:00:00:00:00/40 tag 0
cdb 0x0 data 512 out
[27442.074736] res 40/00:18:03:77:af/00:00:05:00:00/40 Emask
0x2 (HSM violation)
[27442.074741] ata3.00: status: { DRDY }
[27442.074746] ata3.00: cmd 61/08:10:73:d1:e4/00:00:03:00:00/40 tag 2
cdb 0x0 data 4096 out
[27442.074748] res 40/00:00:55:69:43/00:00:00:00:00/40 Emask
0x2 (HSM violation)
[27442.074752] ata3.00: status: { DRDY }
[27442.074757] ata3.00: cmd 61/08:18:03:77:af/00:00:05:00:00/40 tag 3
cdb 0x0 data 4096 out
[27442.074758] res 40/00:00:55:69:43/00:00:00:00:00/40 Emask
0x2 (HSM violation)
[27442.074762] ata3.00: status: { DRDY }
[27442.377525] ata3: soft resetting link
[27442.530488] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[27442.845995] ata3.00: configured for UDMA/133
[27442.846019] ata3: EH complete
[27442.849003] sd 2:0:0:0: [sda] 488395055 512-byte hardware sectors
(250058 MB)
[27442.849054] sd 2:0:0:0: [sda] Write Protect is off
[27442.849057] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[27442.849097] sd 2:0:0:0: [sda] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
It doesn't happen with vanilla kernel.
Is it problem with RT patch? Error comes from ahci_port_intr.
[ 29.975568] scsi0 : pata_atiixp
[ 29.975702] scsi1 : pata_atiixp
[ 29.974885] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xf900
irq 14
[ 29.974888] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xf908
irq 15
[ 30.280976] ata1.00: ATAPI: LITE-ON DVDRW SOHW-832S, VS0B, max
UDMA/33
[ 30.437898] ata1.00: configured for UDMA/33
[ 31.252252] scsi 0:0:0:0: CD-ROM LITE-ON DVDRW SOHW-832S
VS0B PQ: 0 ANSI: 5
[ 31.258278] sr0: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda
tray
[ 31.258282] Uniform CD-ROM driver Revision: 3.20
[ 31.258402] sr 0:0:0:0: Attached scsi CD-ROM sr0
[ 31.258516] sr 0:0:0:0: Attached scsi generic sg0 type 5
[ 32.964056] ahci 0000:00:12.0: version 3.0
[ 32.964083] ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 22 (level,
low) -> IRQ 16
[ 32.964105] ahci 0000:00:12.0: controller can't do 64bit DMA,
forcing 32bit
[ 32.964107] ahci 0000:00:12.0: controller can't do PMP, turning off
CAP_PMP
[ 33.964321] ahci 0000:00:12.0: AHCI 0001.0100 32 slots 4 ports 3
Gbps 0xf impl SATA mode
[ 33.964324] ahci 0000:00:12.0: flags: ncq sntf ilck pm led clo pio
slum part
[ 33.965715] scsi2 : ahci
[ 33.965848] scsi3 : ahci
[ 33.965940] scsi4 : ahci
[ 33.966027] scsi5 : ahci
[ 33.965136] ata3: SATA max UDMA/133 abar m1024@0xfe02f000 port
0xfe02f100 irq 16
[ 33.965140] ata4: SATA max UDMA/133 abar m1024@0xfe02f000 port
0xfe02f180 irq 16
[ 33.965144] ata5: SATA max UDMA/133 abar m1024@0xfe02f000 port
0xfe02f200 irq 16
[ 33.965148] ata6: SATA max UDMA/133 abar m1024@0xfe02f000 port
0xfe02f280 irq 16
[ 34.421241] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 34.467810] ata3.00: HPA detected: current 488395055, native
488397168
[ 34.467814] ata3.00: ATA-7: ST3250820AS, 3.AAC, max UDMA/133
[ 34.467817] ata3.00: 488395055 sectors, multi 16: LBA48 NCQ (depth
31/32)
[ 34.526081] ata3.00: configured for UDMA/133
[ 34.983061] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 35.023125] ata4.00: HPA detected: current 488395055, native
488397168
[ 35.023129] ata4.00: ATA-7: ST3250820AS, 3.AAE, max UDMA/133
[ 35.023131] ata4.00: 488395055 sectors, multi 16: LBA48 NCQ (depth
31/32)
[ 35.081404] ata4.00: configured for UDMA/133
[ 35.384933] ata5: SATA link down (SStatus 0 SControl 300)
[ 35.687837] ata6: SATA link down (SStatus 0 SControl 300)
[ 35.687928] scsi 2:0:0:0: Direct-Access ATA ST3250820AS
3.AA PQ: 0 ANSI: 5
[ 35.688070] sd 2:0:0:0: [sda] 488395055 512-byte hardware sectors
(250058 MB)
[ 35.688083] sd 2:0:0:0: [sda] Write Protect is off
[ 35.688086] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 35.688106] sd 2:0:0:0: [sda] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[ 35.688153] sd 2:0:0:0: [sda] 488395055 512-byte hardware sectors
(250058 MB)
[ 35.688164] sd 2:0:0:0: [sda] Write Protect is off
[ 35.688166] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 35.688185] sd 2:0:0:0: [sda] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[ 35.688188] sda: sda1 sda2 sda3
[ 35.707963] sd 2:0:0:0: [sda] Attached SCSI disk
[ 35.708057] sd 2:0:0:0: Attached scsi generic sg1 type 0
[ 35.708173] scsi 3:0:0:0: Direct-Access ATA ST3250820AS
3.AA PQ: 0 ANSI: 5
[ 35.708296] sd 3:0:0:0: [sdb] 488395055 512-byte hardware sectors
(250058 MB)
[ 35.708308] sd 3:0:0:0: [sdb] Write Protect is off
[ 35.708311] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 35.708331] sd 3:0:0:0: [sdb] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[ 35.708370] sd 3:0:0:0: [sdb] 488395055 512-byte hardware sectors
(250058 MB)
[ 35.708382] sd 3:0:0:0: [sdb] Write Protect is off
[ 35.708384] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 35.708405] sd 3:0:0:0: [sdb] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[ 35.708408] sdb: sdb1 sdb2 sdb3
[ 35.730412] sd 3:0:0:0: [sdb] Attached SCSI disk
There are 2 Seagates with md RAID0 configured.
00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA
(prog-if 01 [AHCI 1.0])
Subsystem: Giga-byte Technology Unknown device b005
Flags: bus master, 66MHz, medium devsel, latency 128, IRQ 16
I/O ports at ff00 [size=8]
I/O ports at fe00 [size=4]
I/O ports at fd00 [size=8]
I/O ports at fc00 [size=4]
I/O ports at fb00 [size=16]
Memory at fe02f000 (32-bit, non-prefetchable) [size=1K]
Capabilities: [60] Power Management version 2
Regards,
Jakub
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: spurious completions during NCQ on rt kernel
2007-11-24 23:14 spurious completions during NCQ on rt kernel jozwicki
@ 2007-11-26 4:32 ` Mark Lord
2007-11-28 0:37 ` Tejun Heo
0 siblings, 1 reply; 3+ messages in thread
From: Mark Lord @ 2007-11-26 4:32 UTC (permalink / raw)
To: jozwicki; +Cc: jgarzik, linux-ide, mingo, Tejun Heo
jozwicki wrote:
> Hello,
> On Linux localhost 2.6.24-rc2-rt1 #5 SMP PREEMPT RT Sat Nov 24 13:39:15
> CET 2007 i686 AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
> AuthenticAMD GNU/Linux
> under heavy load I get this kind of error:
>
> [27442.074715] ata3.00: exception Emask 0x2 SAct 0xd SErr 0x0 action
> 0x2 frozen
> [27442.074722] ata3.00: spurious completions during NCQ issue=0x1
> SAct=0xd FIS=004040a1:00000002
> [27442.074735] ata3.00: cmd 61/01:00:55:69:43/00:00:00:00:00/40 tag 0
> cdb 0x0 data 512 out
> [27442.074736] res 40/00:18:03:77:af/00:00:05:00:00/40 Emask
> 0x2 (HSM violation)
> [27442.074741] ata3.00: status: { DRDY }
> [27442.074746] ata3.00: cmd 61/08:10:73:d1:e4/00:00:03:00:00/40 tag 2
> cdb 0x0 data 4096 out
> [27442.074748] res 40/00:00:55:69:43/00:00:00:00:00/40 Emask
> 0x2 (HSM violation)
> [27442.074752] ata3.00: status: { DRDY }
> [27442.074757] ata3.00: cmd 61/08:18:03:77:af/00:00:05:00:00/40 tag 3
> cdb 0x0 data 4096 out
> [27442.074758] res 40/00:00:55:69:43/00:00:00:00:00/40 Emask
> 0x2 (HSM violation)
> [27442.074762] ata3.00: status: { DRDY }
> [27442.377525] ata3: soft resetting link
> [27442.530488] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [27442.845995] ata3.00: configured for UDMA/133
> [27442.846019] ata3: EH complete
> [27442.849003] sd 2:0:0:0: [sda] 488395055 512-byte hardware sectors
> (250058 MB)
> [27442.849054] sd 2:0:0:0: [sda] Write Protect is off
> [27442.849057] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [27442.849097] sd 2:0:0:0: [sda] Write cache: enabled, read cache:
> enabled, doesn't support DPO or FUA
>
>
> It doesn't happen with vanilla kernel.
> Is it problem with RT patch? Error comes from ahci_port_intr.
>
>
> [ 29.975568] scsi0 : pata_atiixp
> [ 29.975702] scsi1 : pata_atiixp
> [ 29.974885] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xf900
> irq 14
> [ 29.974888] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xf908
> irq 15
> [ 30.280976] ata1.00: ATAPI: LITE-ON DVDRW SOHW-832S, VS0B, max
> UDMA/33
> [ 30.437898] ata1.00: configured for UDMA/33
> [ 31.252252] scsi 0:0:0:0: CD-ROM LITE-ON DVDRW SOHW-832S
> VS0B PQ: 0 ANSI: 5
> [ 31.258278] sr0: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda
> tray
> [ 31.258282] Uniform CD-ROM driver Revision: 3.20
> [ 31.258402] sr 0:0:0:0: Attached scsi CD-ROM sr0
> [ 31.258516] sr 0:0:0:0: Attached scsi generic sg0 type 5
> [ 32.964056] ahci 0000:00:12.0: version 3.0
> [ 32.964083] ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 22 (level,
> low) -> IRQ 16
> [ 32.964105] ahci 0000:00:12.0: controller can't do 64bit DMA,
> forcing 32bit
> [ 32.964107] ahci 0000:00:12.0: controller can't do PMP, turning off
> CAP_PMP
> [ 33.964321] ahci 0000:00:12.0: AHCI 0001.0100 32 slots 4 ports 3
> Gbps 0xf impl SATA mode
> [ 33.964324] ahci 0000:00:12.0: flags: ncq sntf ilck pm led clo pio
> slum part
> [ 33.965715] scsi2 : ahci
> [ 33.965848] scsi3 : ahci
> [ 33.965940] scsi4 : ahci
> [ 33.966027] scsi5 : ahci
> [ 33.965136] ata3: SATA max UDMA/133 abar m1024@0xfe02f000 port
> 0xfe02f100 irq 16
> [ 33.965140] ata4: SATA max UDMA/133 abar m1024@0xfe02f000 port
> 0xfe02f180 irq 16
> [ 33.965144] ata5: SATA max UDMA/133 abar m1024@0xfe02f000 port
> 0xfe02f200 irq 16
> [ 33.965148] ata6: SATA max UDMA/133 abar m1024@0xfe02f000 port
> 0xfe02f280 irq 16
> [ 34.421241] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 34.467810] ata3.00: HPA detected: current 488395055, native
> 488397168
> [ 34.467814] ata3.00: ATA-7: ST3250820AS, 3.AAC, max UDMA/133
> [ 34.467817] ata3.00: 488395055 sectors, multi 16: LBA48 NCQ (depth
> 31/32)
> [ 34.526081] ata3.00: configured for UDMA/133
> [ 34.983061] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 35.023125] ata4.00: HPA detected: current 488395055, native
> 488397168
> [ 35.023129] ata4.00: ATA-7: ST3250820AS, 3.AAE, max UDMA/133
> [ 35.023131] ata4.00: 488395055 sectors, multi 16: LBA48 NCQ (depth
> 31/32)
> [ 35.081404] ata4.00: configured for UDMA/133
> [ 35.384933] ata5: SATA link down (SStatus 0 SControl 300)
> [ 35.687837] ata6: SATA link down (SStatus 0 SControl 300)
> [ 35.687928] scsi 2:0:0:0: Direct-Access ATA ST3250820AS
> 3.AA PQ: 0 ANSI: 5
> [ 35.688070] sd 2:0:0:0: [sda] 488395055 512-byte hardware sectors
> (250058 MB)
> [ 35.688083] sd 2:0:0:0: [sda] Write Protect is off
> [ 35.688086] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [ 35.688106] sd 2:0:0:0: [sda] Write cache: enabled, read cache:
> enabled, doesn't support DPO or FUA
> [ 35.688153] sd 2:0:0:0: [sda] 488395055 512-byte hardware sectors
> (250058 MB)
> [ 35.688164] sd 2:0:0:0: [sda] Write Protect is off
> [ 35.688166] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [ 35.688185] sd 2:0:0:0: [sda] Write cache: enabled, read cache:
> enabled, doesn't support DPO or FUA
> [ 35.688188] sda: sda1 sda2 sda3
> [ 35.707963] sd 2:0:0:0: [sda] Attached SCSI disk
> [ 35.708057] sd 2:0:0:0: Attached scsi generic sg1 type 0
> [ 35.708173] scsi 3:0:0:0: Direct-Access ATA ST3250820AS
> 3.AA PQ: 0 ANSI: 5
> [ 35.708296] sd 3:0:0:0: [sdb] 488395055 512-byte hardware sectors
> (250058 MB)
> [ 35.708308] sd 3:0:0:0: [sdb] Write Protect is off
> [ 35.708311] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> [ 35.708331] sd 3:0:0:0: [sdb] Write cache: enabled, read cache:
> enabled, doesn't support DPO or FUA
> [ 35.708370] sd 3:0:0:0: [sdb] 488395055 512-byte hardware sectors
> (250058 MB)
> [ 35.708382] sd 3:0:0:0: [sdb] Write Protect is off
> [ 35.708384] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> [ 35.708405] sd 3:0:0:0: [sdb] Write cache: enabled, read cache:
> enabled, doesn't support DPO or FUA
> [ 35.708408] sdb: sdb1 sdb2 sdb3
> [ 35.730412] sd 3:0:0:0: [sdb] Attached SCSI disk
>
> There are 2 Seagates with md RAID0 configured.
>
> 00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA
> (prog-if 01 [AHCI 1.0])
> Subsystem: Giga-byte Technology Unknown device b005
> Flags: bus master, 66MHz, medium devsel, latency 128, IRQ 16
> I/O ports at ff00 [size=8]
> I/O ports at fe00 [size=4]
> I/O ports at fd00 [size=8]
> I/O ports at fc00 [size=4]
> I/O ports at fb00 [size=16]
> Memory at fe02f000 (32-bit, non-prefetchable) [size=1K]
> Capabilities: [60] Power Management version 2
..
Mmmm.. that's interesting. I suppose a big difference between -rt and vanilla
is in latency between interrupt signal and the interrupt handler actually running.
I think in -rt it is slightly *longer*, since the handler is run by a scheduled
thread instead of directly.
Anyway, for me, the intriguing point of data here, is that there could be some
tight soft timing issue that sometimes can make NCQ fail. And our response to date
with this is to blacklist the drives.. which may or may not be correct now.
Cheers
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: spurious completions during NCQ on rt kernel
2007-11-26 4:32 ` Mark Lord
@ 2007-11-28 0:37 ` Tejun Heo
0 siblings, 0 replies; 3+ messages in thread
From: Tejun Heo @ 2007-11-28 0:37 UTC (permalink / raw)
To: Mark Lord; +Cc: jozwicki, jgarzik, linux-ide, mingo
Hello,
Mark Lord wrote:
> Mmmm.. that's interesting. I suppose a big difference between -rt and
> vanilla
> is in latency between interrupt signal and the interrupt handler
> actually running.
>
> I think in -rt it is slightly *longer*, since the handler is run by a
> scheduled
> thread instead of directly.
>
> Anyway, for me, the intriguing point of data here, is that there could
> be some
> tight soft timing issue that sometimes can make NCQ fail. And our
> response to date
> with this is to blacklist the drives.. which may or may not be correct now.
I think it's interrupt timing problem too. I just received my order of
problematic drives. I'll try to reproduce it and find out. At any
rate, I think we'll probably have to drop spurious NCQ completion check.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2007-11-28 0:38 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-11-24 23:14 spurious completions during NCQ on rt kernel jozwicki
2007-11-26 4:32 ` Mark Lord
2007-11-28 0:37 ` Tejun Heo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).