linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* timeout errors on sata_sil24 with port multiplier & PCI-X
@ 2007-03-09 22:16 Jim Paris
  2007-03-09 23:31 ` Robin H. Johnson
  2007-03-15 19:36 ` Jim Paris
  0 siblings, 2 replies; 4+ messages in thread
From: Jim Paris @ 2007-03-09 22:16 UTC (permalink / raw)
  To: htejun; +Cc: linux-ide

Hi Tejun,

I know you're not working on PMP at the moment, but maybe you have some insight here:

I have a SiI3124 PCI-X controller, currently plugged into a PCI slot:

02:06.0 RAID bus controller [0104]: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller [1095:3124] (rev 01)
        Subsystem: Silicon Image, Inc. Unknown device [1095:7124]
        Flags: bus master, stepping, 66MHz, medium devsel, latency 32, IRQ 177
        Memory at fdfff000 (64-bit, non-prefetchable) [size=128]
        Memory at fdff0000 (64-bit, non-prefetchable) [size=32K]
        I/O ports at 9c00 [size=16]
        Expansion ROM at 80000000 [disabled] [size=512K]
        Capabilities: [64] Power Management version 2
        Capabilities: [40] PCI-X non-bridge device
        Capabilities: [54] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-

I have been using this configuration with a DS-1220 external enclosure + port multiplier:

Mar  9 01:09:32 localhost kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar  9 01:09:32 localhost kernel: ata7.15: Port Multiplier 1.1, 0x1095:0x4726 r0, 5 ports, feat 0x9/0xb

The system is currently running linux-2.6.18.8 + libata-tj-2.6.18.1-20061020
but behaves the same with linux-2.6.17.4 + libata-tj-2.6.17.4-20060710 

Generally this configuration has been working well for a long time.
Occasionally, I get an error like this:

Mar  9 12:53:18 localhost kernel: ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
Mar  9 12:53:18 localhost kernel: ata8.00: tag 0 cmd 0x61 Emask 0x4 stat 0x40 err 0x0 (timeout)
Mar  9 12:53:18 localhost kernel: ata8.15: hard resetting port
Mar  9 12:53:20 localhost kernel: ata8.15: SATA link down (SStatus 0 SControl 300)
Mar  9 12:53:20 localhost kernel: ata8.15: failed to read PMP GSCR[0] (errno=-16)
Mar  9 12:53:20 localhost kernel: ata8.15: PMP revalidation failed (errno=-16)
Mar  9 12:53:20 localhost kernel: ata8.15: retrying hardreset in 5 secs
Mar  9 12:53:25 localhost kernel: ata8.15: hard resetting port
Mar  9 12:53:28 localhost kernel: ata8.15: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar  9 12:53:28 localhost kernel: ata8.00: hard resetting port
Mar  9 12:53:29 localhost kernel: ata8.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar  9 12:53:29 localhost kernel: ata8.01: hard resetting port
Mar  9 12:53:29 localhost kernel: ata8.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar  9 12:53:29 localhost kernel: ata8.02: hard resetting port
Mar  9 12:53:30 localhost kernel: ata8.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar  9 12:53:30 localhost kernel: ata8.03: hard resetting port
Mar  9 12:53:30 localhost kernel: ata8.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar  9 12:53:30 localhost kernel: ata8.04: hard resetting port
Mar  9 12:53:31 localhost kernel: ata8.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar  9 12:53:31 localhost kernel: ata8.00: configured for UDMA/100
Mar  9 12:53:32 localhost kernel: ata8.01: configured for UDMA/100
Mar  9 12:53:32 localhost kernel: ata8.02: configured for UDMA/100
Mar  9 12:53:32 localhost kernel: ata8.03: configured for UDMA/100
Mar  9 12:53:32 localhost kernel: ata8.04: configured for UDMA/100
Mar  9 12:53:32 localhost kernel: ata8: EH complete

On the "old" machine, these were infrequent -- maybe once every few
hours, or every couple days.  They just caused a small delay and
everything keeps running fine.  However, I've recently upgraded the
motherboard to one with PCI-X and tried placing the card in a PCI-X
slot.  Then the errors sometimes occur so frequently as to render the
disks almost unusable:

$ grep timeout syslog.0
Mar  8 22:02:03 localhost kernel: ata7.03: tag 3 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
Mar  8 22:02:47 localhost kernel: ata8.02: tag 4 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
Mar  8 22:03:33 localhost kernel: ata7.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
Mar  8 22:04:48 localhost kernel: ata7.04: tag 4 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
Mar  8 22:05:32 localhost kernel: ata8.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
Mar  8 22:06:16 localhost kernel: ata8.02: tag 4 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
Mar  8 22:07:00 localhost kernel: ata7.03: tag 4 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
Mar  8 22:08:26 localhost kernel: ata8.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
Mar  8 22:09:12 localhost kernel: ata7.03: tag 3 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
Mar  8 22:11:04 localhost kernel: ata8.00: tag 0 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
Mar  8 22:11:48 localhost kernel: ata7.03: tag 3 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
Mar  8 22:12:48 localhost kernel: ata7.04: tag 4 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
Mar  8 22:13:32 localhost kernel: ata7.04: tag 1 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
Mar  8 22:14:19 localhost kernel: ata8.03: tag 4 cmd 0x60 Emask 0x4 stat 0x40 err 0x0 (timeout)
..

As far as I can tell, there was never any data corruption, just
frequent timeouts.  It definitely seems related to load, something
like deleting large files or running fsck.ext3 would trigger it
continuously, while copying files from the array to another disk
triggered it maybe every once every 5-10 minutes.

After noticing this in the syslog:

Mar  9 01:17:28 localhost kernel: sata_sil24 0000:04:01.0: Applying completion IRQ loss on PCI-X errata fix

I decided to try putting the card into a plain PCI slot on the new
motherboard and it's much better again, back to the point where the
timeouts are infrequent enough to be not a big problem.

Any idea what the cause of these timeouts are?  

If not I think my next course of action will be to try swapping the
controller to a PCIe SiI3132, then maybe give Robin H. Johnson's
2.6.20 port of the PMP patches a try.

-jim

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: timeout errors on sata_sil24 with port multiplier & PCI-X
  2007-03-09 22:16 timeout errors on sata_sil24 with port multiplier & PCI-X Jim Paris
@ 2007-03-09 23:31 ` Robin H. Johnson
  2007-03-10  3:13   ` Jim Paris
  2007-03-15 19:36 ` Jim Paris
  1 sibling, 1 reply; 4+ messages in thread
From: Robin H. Johnson @ 2007-03-09 23:31 UTC (permalink / raw)
  To: Jim Paris; +Cc: htejun, linux-ide

[-- Attachment #1: Type: text/plain, Size: 7374 bytes --]

On Fri, Mar 09, 2007 at 05:16:28PM -0500, Jim Paris wrote:
> If not I think my next course of action will be to try swapping the
> controller to a PCIe SiI3132, then maybe give Robin H. Johnson's
> 2.6.20 port of the PMP patches a try.

Here's paste of the equivalent stuff on my patches. Notice that it takes 49
seconds to bring the disk array up :-(. I've had plenty of people complain to
me that they have struggled to apply my patches (because they were against the
Linus tree with libata-upstream as of a specific point), and they definitely
need a lot more work to apply to newer kernels.

Also, afaik there has been no resolution of the performance issue - while the
controller shows 3Gb, the actual combined speeds I get are only at the 1.5Gb
level.

03:00.0 Mass storage controller [0180]: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller [1095:3132] (rev 01)
	Subsystem: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller [1095:3132]
	Flags: bus master, fast devsel, latency 0, IRQ 18
	Memory at b0404000 (64-bit, non-prefetchable) [size=128]
	Memory at b0400000 (64-bit, non-prefetchable) [size=16K]
	I/O ports at 3000 [size=128]
	Expansion ROM at b0b00000 [disabled] [size=512K]
	Capabilities: [54] Power Management version 2
	Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
	Capabilities: [70] Express Legacy Endpoint IRQ 0

Feb 28 20:30:52 [kernel] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x42 frozen
Feb 28 20:30:52 [kernel] ata1: (irq_stat 0x00b00090, PHY RDY changed)
Feb 28 20:30:52 [kernel] ata1: waiting for device to spin up (8 secs)
Feb 28 20:31:01 [kernel] ata1: soft resetting port
Feb 28 20:31:01 [kernel] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:01 [kernel] ata1.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 5 ports, feat 0x9/0x9
Feb 28 20:31:01 [kernel] ata1.00: hard resetting port
Feb 28 20:31:01 [kernel] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:01 [kernel] ata1.01: hard resetting port
Feb 28 20:31:02 [kernel] ata1.01: softreset failed (SRST command error)
Feb 28 20:31:02 [kernel] ata1.01: follow-up softreset failed, retrying in 5 secs
Feb 28 20:31:07 [kernel] ata1.01: hard resetting port
Feb 28 20:31:07 [kernel] ata1.01: hardreset failed (port not ready)
Feb 28 20:31:07 [kernel] ata1.01: reset failed, giving up
Feb 28 20:31:07 [kernel] ata1.15: hard resetting port
Feb 28 20:31:07 [kernel] ata1: controller in dubious state, performing PORT_RST
Feb 28 20:31:09 [kernel] ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Feb 28 20:31:09 [kernel] ata1.00: hard resetting port
Feb 28 20:31:10 [kernel] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:10 [kernel] ata1.01: hard resetting port
Feb 28 20:31:11 [kernel] ata1.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:11 [kernel] ata1.02: hard resetting port
Feb 28 20:31:11 [kernel] ata1.02: softreset failed (SRST command error)
Feb 28 20:31:11 [kernel] ata1.02: follow-up softreset failed, retrying in 5 secs
Feb 28 20:31:16 [kernel] ata1.02: hard resetting port
Feb 28 20:31:16 [kernel] ata1.02: COMRESET failed (errno=-5)
Feb 28 20:31:16 [kernel] ata1.02: reset failed, giving up
Feb 28 20:31:16 [kernel] ata1.15: hard resetting port
Feb 28 20:31:16 [kernel] ata1: controller in dubious state, performing PORT_RST
Feb 28 20:31:18 [kernel] ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Feb 28 20:31:19 [kernel] ata1.00: hard resetting port
Feb 28 20:31:19 [kernel] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:19 [kernel] ata1.01: hard resetting port
Feb 28 20:31:20 [kernel] ata1.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:20 [kernel] ata1.02: hard resetting port
Feb 28 20:31:20 [kernel] ata1.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:20 [kernel] ata1.03: hard resetting port
Feb 28 20:31:21 [kernel] ata1.03: softreset failed (SRST command error)
Feb 28 20:31:21 [kernel] ata1.03: follow-up softreset failed, retrying in 5 secs
Feb 28 20:31:26 [kernel] ata1.03: hard resetting port
Feb 28 20:31:26 [kernel] ata1.03: COMRESET failed (errno=-5)
Feb 28 20:31:26 [kernel] ata1.03: reset failed, giving up
Feb 28 20:31:26 [kernel] ata1.15: hard resetting port
Feb 28 20:31:26 [kernel] ata1: controller in dubious state, performing PORT_RST
Feb 28 20:31:28 [kernel] ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Feb 28 20:31:28 [kernel] ata1.00: hard resetting port
Feb 28 20:31:29 [kernel] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:29 [kernel] ata1.01: hard resetting port
Feb 28 20:31:29 [kernel] ata1.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:29 [kernel] ata1.02: hard resetting port
Feb 28 20:31:30 [kernel] ata1.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:30 [kernel] ata1.03: hard resetting port
Feb 28 20:31:30 [kernel] ata1.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:30 [kernel] ata1.04: hard resetting port
Feb 28 20:31:31 [kernel] ata1.04: softreset failed (SRST command error)
Feb 28 20:31:31 [kernel] ata1.04: follow-up softreset failed, retrying in 5 secs
Feb 28 20:31:36 [kernel] ata1.04: hard resetting port
Feb 28 20:31:36 [kernel] ata1.04: hardreset failed (port not ready)
Feb 28 20:31:36 [kernel] ata1.04: reset failed, giving up
Feb 28 20:31:36 [kernel] ata1.15: hard resetting port
Feb 28 20:31:36 [kernel] ata1: controller in dubious state, performing PORT_RST
Feb 28 20:31:38 [kernel] ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Feb 28 20:31:38 [kernel] ata1.00: hard resetting port
Feb 28 20:31:39 [kernel] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:39 [kernel] ata1.01: hard resetting port
Feb 28 20:31:40 [kernel] ata1.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:40 [kernel] ata1.02: hard resetting port
Feb 28 20:31:40 [kernel] ata1.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:40 [kernel] ata1.03: hard resetting port
Feb 28 20:31:41 [kernel] ata1.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:41 [kernel] ata1.04: hard resetting port
Feb 28 20:31:41 [kernel] ata1.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 28 20:31:41 [kernel] ata1.00: ATA-7, max UDMA/133, 625142448 sectors: LBA48 NCQ (depth 31/32)
Feb 28 20:31:41 [kernel] ata1.00: configured for UDMA/133
Feb 28 20:31:41 [kernel] ata1.01: ATA-7, max UDMA/133, 625142448 sectors: LBA48 NCQ (depth 31/32)
Feb 28 20:31:41 [kernel] ata1.01: configured for UDMA/133
Feb 28 20:31:41 [kernel] ata1.02: ATA-7, max UDMA/133, 625142448 sectors: LBA48 NCQ (depth 31/32)
Feb 28 20:31:41 [kernel] ata1.02: configured for UDMA/133
Feb 28 20:31:41 [kernel] ata1.03: ATA-7, max UDMA/133, 625142448 sectors: LBA48 NCQ (depth 31/32)
Feb 28 20:31:41 [kernel] ata1.03: configured for UDMA/133
Feb 28 20:31:41 [kernel] ata1.04: ATA-7, max UDMA/133, 625142448 sectors: LBA48 NCQ (depth 31/32)
Feb 28 20:31:41 [kernel] ata1.04: configured for UDMA/133
Feb 28 20:31:41 [kernel] ata1: EH complete

-- 
Robin Hugh Johnson
Gentoo Linux Developer
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85

[-- Attachment #2: Type: application/pgp-signature, Size: 321 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: timeout errors on sata_sil24 with port multiplier & PCI-X
  2007-03-09 23:31 ` Robin H. Johnson
@ 2007-03-10  3:13   ` Jim Paris
  0 siblings, 0 replies; 4+ messages in thread
From: Jim Paris @ 2007-03-10  3:13 UTC (permalink / raw)
  To: Robin H. Johnson; +Cc: htejun, linux-ide

> > If not I think my next course of action will be to try swapping the
> > controller to a PCIe SiI3132, then maybe give Robin H. Johnson's
> > 2.6.20 port of the PMP patches a try.
> 
> Here's paste of the equivalent stuff on my patches. Notice that it takes 49
> seconds to bring the disk array up :-(.
>
> Feb 28 20:30:52 [kernel] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x42 frozen
> Feb 28 20:30:52 [kernel] ata1: (irq_stat 0x00b00090, PHY RDY changed)

>From the "PHY RDY changed" I assume that's when you're bringing it up
for the first time?  Mine also takes a while, but that doesn't bother
me -- my problem is frequent random timeouts that cause the same sorts
of delays during normal use.

> Also, afaik there has been no resolution of the performance issue - while the
> controller shows 3Gb, the actual combined speeds I get are only at the 1.5Gb
> level.

1.5Gb without the timeouts/hard resets would be fine with me :)

-jim

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: timeout errors on sata_sil24 with port multiplier & PCI-X
  2007-03-09 22:16 timeout errors on sata_sil24 with port multiplier & PCI-X Jim Paris
  2007-03-09 23:31 ` Robin H. Johnson
@ 2007-03-15 19:36 ` Jim Paris
  1 sibling, 0 replies; 4+ messages in thread
From: Jim Paris @ 2007-03-15 19:36 UTC (permalink / raw)
  To: htejun, Robin H. Johnson; +Cc: linux-ide

I wrote:
> I have a SiI3124 PCI-X controller
..
> I have been using this configuration with a DS-1220 external
> enclosure + port multiplier:
..
> Occasionally, I get an error like this:
> 
> Mar  9 12:53:18 localhost kernel: ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
> Mar  9 12:53:18 localhost kernel: ata8.00: tag 0 cmd 0x61 Emask 0x4 stat 0x40 err 0x0 (timeout)
> Mar  9 12:53:18 localhost kernel: ata8.15: hard resetting port

After moving the port multiplier from the PCI-X SiI3124 to a PCIe
SiI3132, the timeouts are gone.  No errors at all while moving 1T back
and forth and resyncing the RAID.  I wonder if it's a bad card
(Norco 4618, came with the enclosure) or a driver issue.


Robin H. Johnson wrote:
> Also, afaik there has been no resolution of the performance issue - while the
> controller shows 3Gb, the actual combined speeds I get are only at the 1.5Gb
> level.

I agree, on a single port combined speeds are right around 150MB/sec.
On two ports it hits 180MB/sec.  I guess that's approaching the PCIe x1
limit.

One interesting note is that the original SiI3124 controller always
reported the host->PMP link as 1.5Gb.  The SiI3132 reports it as
3.0Gb.

-jim

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2007-03-15 19:36 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-03-09 22:16 timeout errors on sata_sil24 with port multiplier & PCI-X Jim Paris
2007-03-09 23:31 ` Robin H. Johnson
2007-03-10  3:13   ` Jim Paris
2007-03-15 19:36 ` Jim Paris

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).