linux-ide.vger.kernel.org archive mirror
* Problems with >3 drives on an eSATA portmultiplier
@ 2009-04-01 13:54 Justin Fletcher
  2009-04-01 16:45 ` Grant Grundler
From: Justin Fletcher @ 2009-04-01 13:54 UTC
  To: linux-ide

Hiya,

I've been having problems recently with my external eSATA drives failing 
to be recognised when there are more than 3 plugged in at one time.

Summary of problem:

When one drive is connected in the external box, everything is fine.

When two are connected, everything is fine.

When three are connected, it can sometimes take a while for them all to 
be detected and mounted.

When four are connected, it almost never detects them properly or mounts
them. Occasionally I get all 4 mounted, and rarely I get just 1 or 2 of
the drives mounted.

When five are connected, it's not mounting the drives.


More details:

The kernel I'm using now is 2.6.29 with no patches applied.

The system I'm using is an MSI motherboard, with a SiI eSATA controller
(a 3132, specifically this
one: http://www.span.com/catalog/product_info.php?products_id=15995 )
connected through the only PCI express card on the MB.

The bridgeboard in my external box is a NA910C, with a SiI3726 onboard 
(specifically this 
one: http://www.span.com/catalog/product_info.php?products_id=15709 ).

The method of disconnecting the drives is to remove the SATA cable from 
the bridge board.

The eSATA cable has been replaced with another one (both 1M long) and 
this has had no effect.

All the drives in the external box are Western Digital. 3 are 500G 
drives, 2 are 1T 'Green Power' drives.

Once detected, the drives are mounted (and subsequently unmounted) by 
udev rules.


History:

The full 5 drives were working and being mounted correctly in the past. 
However, due to many upgrades and confusing hardware problems at the 
same time, trying to identify when that was has become a problem for me 
- I can't say when it was working. When it was working I had a JMB362 
PCIexpress card (specifically this one: 
http://www.span.com/catalog/product_info.php?products_id=16361 ). This 
has been replaced by the SiI card in order to determine if the card is a 
problem; the problems persist and have the same symptoms. (should it be 
necessary for diagnosis, I can put the JMB362 card back). I can say for 
certain that the failures I'm seeing have happened at least on kernels 
2.6.28.3, 2.6.28.4 and 2.6.29.

During testing I have changed both the combination of drives and the
bridge board ports that they are plugged in to. This has not appeared to
make any difference - the deciding factor seems to be the number of
drives that are connected.


Typical failure:

A typical failure reads something like this (taken from kern.log, from
messages collected during initialisation):

Apr  1 11:43:23 buttercup kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Apr  1 11:43:23 buttercup kernel: ata1.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  1 11:43:23 buttercup kernel: ata1.02: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.02: failed to write SCR 1 (Emask=0x1)
Apr  1 11:43:23 buttercup kernel: ata1.02: failed to read SCR 0 (Emask=0x40)
Apr  1 11:43:23 buttercup kernel: ata1.03: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.03: hardreset failed (port not ready)
Apr  1 11:43:23 buttercup kernel: ata1.03: failed to read SCR 0 (Emask=0x40)
Apr  1 11:43:23 buttercup kernel: ata1.03: reset failed, giving up
Apr  1 11:43:23 buttercup kernel: ata1.15: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1: controller in dubious state, performing PORT_RST
Apr  1 11:43:23 buttercup kernel: ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  1 11:43:23 buttercup kernel: ata1.02: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  1 11:43:23 buttercup kernel: ata1.03: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.03: failed to write SCR 2 (Emask=0x1)
Apr  1 11:43:23 buttercup kernel: ata1.03: COMRESET failed (errno=-5)
Apr  1 11:43:23 buttercup kernel: ata1.03: failed to read SCR 0 (Emask=0x40)
Apr  1 11:43:23 buttercup kernel: ata1.03: reset failed, giving up
Apr  1 11:43:23 buttercup kernel: ata1.15: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1: controller in dubious state, performing PORT_RST
Apr  1 11:43:23 buttercup kernel: ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  1 11:43:23 buttercup kernel: ata1.02: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  1 11:43:23 buttercup kernel: ata1.03: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.03: failed to read SCR 2 (Emask=0x1)
Apr  1 11:43:23 buttercup kernel: ata1.03: COMRESET failed (errno=-5)
Apr  1 11:43:23 buttercup kernel: ata1.03: failed to read SCR 0 (Emask=0x40)
Apr  1 11:43:23 buttercup kernel: ata1.03: reset failed, giving up
Apr  1 11:43:23 buttercup kernel: ata1.03: failed to recover link after 3 tries, disabling
Apr  1 11:43:23 buttercup kernel: ata1.15: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1: controller in dubious state, performing PORT_RST

... and so on until it tries detaching the port multiplier ...

Apr  1 11:43:23 buttercup kernel: ata1: controller in dubious state, performing PORT_RST
Apr  1 11:43:23 buttercup kernel: ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.01: failed to read SCR 2 (Emask=0x40)
Apr  1 11:43:23 buttercup kernel: ata1.01: COMRESET failed (errno=-5)
Apr  1 11:43:23 buttercup kernel: ata1.01: failed to read SCR 0 (Emask=0x40)
Apr  1 11:43:23 buttercup kernel: ata1.01: reset failed, giving up
Apr  1 11:43:23 buttercup kernel: ata1.01: failed to recover link after 3 tries, disabling
Apr  1 11:43:23 buttercup kernel: ata1.15: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Apr  1 11:43:23 buttercup kernel: ata1.04: failed to read SCR 0 (Emask=0x40)
Apr  1 11:43:23 buttercup kernel: ata1.04: COMRESET failed (errno=-5)
Apr  1 11:43:23 buttercup kernel: ata1.04: failed to write SCR 1 (Emask=0x40)
Apr  1 11:43:23 buttercup kernel: ata1.04: failed to clear SError.N (errno=-5)
Apr  1 11:43:23 buttercup kernel: ata1: failed to recover PMP after 5 tries, giving up
Apr  1 11:43:23 buttercup kernel: ata1.15: Port Multiplier detaching
Apr  1 11:43:23 buttercup kernel: ata1.00: disabled
Apr  1 11:43:23 buttercup kernel: ata1: exception Emask 0x13 SAct 0x0 SErr 0x40d0000 action 0xe frozen t4
Apr  1 11:43:23 buttercup kernel: ata1: irq_stat 0x01100010, PHY RDY changed
Apr  1 11:43:23 buttercup kernel: ata1: SError: { PHYRdyChg CommWake 10B8B DevExch }
Apr  1 11:43:23 buttercup kernel: ata1: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Apr  1 11:43:23 buttercup kernel: ata1.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  1 11:43:23 buttercup kernel: ata1.02: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  1 11:43:23 buttercup kernel: ata1.03: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.03: failed to read SCR 0 (Emask=0x40)
Apr  1 11:43:23 buttercup kernel: ata1.03: COMRESET failed (errno=-5)
Apr  1 11:43:23 buttercup kernel: ata1.03: failed to read SCR 0 (Emask=0x40)
Apr  1 11:43:23 buttercup kernel: ata1.03: reset failed, giving up
Apr  1 11:43:23 buttercup kernel: ata1.15: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  1 11:43:23 buttercup kernel: ata1.02: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.02: failed to read SCR 0 (Emask=0x1)
Apr  1 11:43:23 buttercup kernel: ata1.02: COMRESET failed (errno=-5)
Apr  1 11:43:23 buttercup kernel: ata1.02: failed to read SCR 0 (Emask=0x40)
Apr  1 11:43:23 buttercup kernel: ata1.02: reset failed, giving up
Apr  1 11:43:23 buttercup kernel: ata1.15: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  1 11:43:23 buttercup kernel: ata1.02: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.02: failed to read SCR 0 (Emask=0x1)
Apr  1 11:43:23 buttercup kernel: ata1.02: failed to read SCR 1 (Emask=0x40)
Apr  1 11:43:23 buttercup kernel: ata1.02: failed to read SCR 0 (Emask=0x40)
Apr  1 11:43:23 buttercup kernel: ata1.03: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.03: hardreset failed (port not ready)
Apr  1 11:43:23 buttercup kernel: ata1.03: failed to read SCR 0 (Emask=0x40)
Apr  1 11:43:23 buttercup kernel: ata1.03: reset failed, giving up
Apr  1 11:43:23 buttercup kernel: ata1.15: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1: controller in dubious state, performing PORT_RST

... and the sequence repeats until it gets fed up ...

Apr  1 11:43:23 buttercup kernel: ata1: controller in dubious state, performing PORT_RST
Apr  1 11:43:23 buttercup kernel: ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.01: SATA link down (SStatus 221 SControl 300)
Apr  1 11:43:23 buttercup kernel: ata1.05: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
Apr  1 11:43:23 buttercup kernel: ata1.00: ATA-8: WDC WD5000AAKS-00YGA0, 12.01C02, max UDMA/133
Apr  1 11:43:23 buttercup kernel: ata1.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 31/32)
Apr  1 11:43:23 buttercup kernel: ata1.00: configured for UDMA/100
Apr  1 11:43:23 buttercup kernel: ata1.04: PHY status changed but maxed out on retries, giving up
Apr  1 11:43:23 buttercup kernel: ata1.04: Manully issue scan to resume this link
Apr  1 11:43:23 buttercup kernel: ata1: PMP SError.N set for some ports, repeating recovery
Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Apr  1 11:43:23 buttercup kernel: ata1.00: configured for UDMA/100
Apr  1 11:43:23 buttercup kernel: ata1: EH pending after 5 tries, giving up
Apr  1 11:43:23 buttercup kernel: ata1: EH complete


As can be seen, it got as far as identifying one of the drives in this 
configuration on the final attempt, but the other 3 were not detected 
properly.
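
For anyone reading the SStatus values in the excerpt above, here is a
minimal sketch of a decoder. It assumes the standard SATA SStatus field
layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8) and that the
kernel prints the register in hex without a prefix. The function and
table names are made up for illustration and are not part of libata.

```python
# Field meanings follow the SATA SStatus register definition.
DET = {
    0x0: "no device detected",
    0x1: "device detected, PHY communication not established",
    0x3: "device detected, PHY communication established",
    0x4: "PHY offline",
}
SPD = {0x0: "no negotiated speed", 0x1: "1.5 Gbps", 0x2: "3.0 Gbps", 0x3: "6.0 Gbps"}
IPM = {0x0: "not present", 0x1: "active", 0x2: "partial", 0x6: "slumber"}


def decode_sstatus(text):
    """Split an SStatus value, as printed in the log, into its three fields."""
    value = int(text, 16)  # assumption: the log prints hex digits, e.g. "123"
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {
        "det": DET.get(det, f"reserved ({det:#x})"),
        "spd": SPD.get(spd, f"reserved ({spd:#x})"),
        "ipm": IPM.get(ipm, f"reserved ({ipm:#x})"),
    }


if __name__ == "__main__":
    for raw in ("123", "113", "221"):
        print(raw, decode_sstatus(raw))
```

Read this way, 123 is an established 3.0 Gbps link, 113 an established
1.5 Gbps link, and 221 a device that is sensed but has not completed PHY
communication, which matches the "link down" message on ata1.01.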


My gut feeling:

There's some timing problem involved here - either the drives are being
sent commands when they're not ready, or they're being timed out before
they have a chance to respond after a reset. As the problem gets worse
(to the point of always failing) with more drives, I suspect an overall
timeout that stays fixed while each additional drive gets less and less
of it. For example: drive 1 is reset at 1s, drive 2 at 2s, drive 3 at
3s, and so on, but the overall timeout is 8s, so by the time drive 5 has
been reset it only has 3s to respond, its initialisation takes longer
than that, and so it never does. Not knowing what is involved here, this
may be complete rubbish and is purely guesswork on my part.
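
To make the guess concrete, here is a small sketch of the arithmetic
only. The 1s reset spacing and the 8s overall timeout are the made-up
figures from the example, and the 3.5s initialisation time is a further
placeholder; none of these are values taken from libata.

```python
def remaining_time(drive_count, reset_interval_s=1.0, overall_timeout_s=8.0):
    """Seconds each drive has left to respond under the guessed model.

    Drive n is assumed to be reset at n * reset_interval_s, and every drive
    must answer before the single shared deadline overall_timeout_s.
    """
    return {
        drive: overall_timeout_s - drive * reset_interval_s
        for drive in range(1, drive_count + 1)
    }


def drives_that_miss(drive_count, init_time_s, **model):
    """Drives whose (hypothetical) initialisation time exceeds their budget."""
    budget = remaining_time(drive_count, **model)
    return [drive for drive, left in budget.items() if left < init_time_s]


if __name__ == "__main__":
    print(remaining_time(5))
    print(drives_that_miss(5, init_time_s=3.5))  # 3.5s is a placeholder value
```

Under this model only the later drives run out of time, so it would
predict that failures depend on position as well as on drive count.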


More details from kernel logs:

Because I'm not sure what's useful, and I wanted to capture some timings
for the sequences of events, I've captured kernel logs of a number of
drive combinations. In each case the PC was turned off, the box was
turned off, the SATA leads were connected as required for the test, then
the box was turned on, a few seconds were allowed for the box to settle,
and then the PC was turned on. The system booted into 2.6.29 and was
left until it had settled to a login prompt. At this point, the drive
box was turned off. The system then shut down whatever drives it had
detected after determining that the PMP had gone away. The drive box was
then turned on again. This second initialisation of the box should
ensure that the kernel logs contain timings that show how long elapsed
between events.

The numbering of the logs indicates which drives were connected - these
are drive numbers from 1-5, not the numbers used in the log messages,
which are 0-4 (it just makes more sense for me to think of them as
drives 1-5 not 0-4).

Drives 1-3 are 500G, drives 4-5 are 1T.
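
As a sketch of the naming scheme just described, the snippet below turns
a log filename into the list of connected drives and the link names they
should appear under in the kernel messages. It assumes drive N sits on
PMP port N-1 of ata1; the helper names are invented for illustration.

```python
import re


def drives_in_log(filename):
    """Return the drive numbers (1-5) encoded in a name like sata-1235-kern.log."""
    match = re.fullmatch(r"sata-([1-5]+)-kern\.log", filename)
    if match is None:
        raise ValueError(f"unexpected log name: {filename!r}")
    return [int(digit) for digit in match.group(1)]


def link_name(drive, port="ata1"):
    """Map a 1-based drive number to the 0-based link name used in the log."""
    return f"{port}.{drive - 1:02d}"


if __name__ == "__main__":
    for name in ("sata-15-kern.log", "sata-1235-kern.log"):
        drives = drives_in_log(name)
        print(name, drives, [link_name(d) for d in drives])
```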

In the logs it can also be seen that there are two ATA drives connected
to the MB, and two SATA drives connected to the MB. None of these
appears to exhibit any problems.

The logs can be found at:

http://usenet.gerph.org/SATA/


sata-15-kern.log:
    2 drives connected.
    All detected during initialisation.
    All detected on restarting box.

sata-45-kern.log:
    2 drives connected.
    All detected during initialisation.
    All detected on restarting box, although it reset the port 3 times.

sata-125-kern.log:
    3 drives connected.
    All detected during initialisation, but after doing so it then tried
    to re-detect later (which was successful)
    All detected on restarting box, although it reset the port 2 times
    and had SCSI errors reported which it recovered from.

sata-345-kern.log:
    3 drives connected.
    1 detected during initialisation, only drive 4 was initialised
    properly; during init 3 had been IDENTIFYd but the port was then
    reset and more attempts made.
    All detected on restarting box, although it reset the port 2 times
    and had other errors reported which it recovered from.

sata-1235-kern.log:
    4 drives connected.
    1 detected during initialisation (drive 1), many attempts made.
    None detected on restarting box, although it retried many times.

sata-12345-kern.log:
    5 drives connected.
    None detected during initialisation, many attempts made.
    Ineffective - no output when the external box was turned off, nor
    when it was turned on.
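
The summaries above were written by hand. A rough sketch of how similar
per-link counts could be pulled out of a kern.log follows; it only
matches message texts that appear in the excerpt earlier in this mail,
and the event names are arbitrary labels chosen for this example.

```python
import re
from collections import Counter, defaultdict

# Matches e.g. "... kernel: ata1.03: reset failed, giving up"
LINE = re.compile(r"kernel: (ata\d+(?:\.\d+)?): (.*)$")

# Label -> pattern for message texts seen in the excerpt.
EVENTS = {
    "link_up": re.compile(r"SATA link up"),
    "link_down": re.compile(r"SATA link down"),
    "gave_up": re.compile(r"reset failed, giving up"),
    "disabled": re.compile(r"failed to recover link after \d+ tries"),
    "identified": re.compile(r"^ATA-\d+: "),
}


def summarise(lines):
    """Count the events of interest for each link (ata1, ata1.00, ...)."""
    summary = defaultdict(Counter)
    for line in lines:
        match = LINE.search(line)
        if match is None:
            continue  # not a libata message
        link, message = match.groups()
        for label, pattern in EVENTS.items():
            if pattern.search(message):
                summary[link][label] += 1
    return summary


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], errors="replace") as log:
        for link, counts in sorted(summarise(log).items()):
            print(link, dict(counts))
```

Run as `python summarise.py sata-1235-kern.log` (the script name is just
an example) to print one line of counts per link.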


Finally:

I can provide more information, more combinations and try different 
kernel configurations if it's found to be useful for this. I'm sorry if 
this information is too verbose, or if I've missed something out - 
please let me know and I'll try to do tests or fill in the blanks.


Hope someone can help with this!


-- 
Gerph <http://gerph.org/>
[ All information, speculation, opinion or data within, or attached to,
   this email is private and confidential. Such content may not be
   disclosed to third parties, or a public forum, without explicit
   permission being granted. ]


* Re: Problems with >3 drives on an eSATA portmultiplier
  2009-04-01 13:54 Problems with >3 drives on an eSATA portmultiplier Justin Fletcher
@ 2009-04-01 16:45 ` Grant Grundler
  2009-04-01 17:09   ` Justin Fletcher
From: Grant Grundler @ 2009-04-01 16:45 UTC
  To: Justin Fletcher; +Cc: linux-ide

On Wed, Apr 1, 2009 at 6:54 AM, Justin Fletcher <gerph@gerph.org> wrote:
> Hiya,
>
> I've been having problems recently with my external eSATA drives failing to
> be recognised when there are more than 3 plugged in at one time.
>
> Summary of problem:
>
> When one drive is connected in the external box, everything is fine.
>
> When two are connected, everything is fine.
>
> When three are connected, it can sometimes take a while for them all to be
> detected and mounted.
>
> When four are connected, it almost never detects them properly or mounts
> them. Occasionally I get all 4 mounted, and rarely I get just 1 or 2 of the
> drives mounted.
>
> When five are connected, it's not mounting the drives.
>
>
> More details:
>
> The kernel I'm using now is 2.6.29 with no patches applied.
>
> The system I'm using is an MSI motherboard, with a SiI eSATA controller (a
> 3132, specifically this one:
> http://www.span.com/catalog/product_info.php?products_id=15995 ) connected
> through the only PCI express card on the MB.

2.6.29 kernel + SII 3132 SATA controller should work fine with 3726 PMP.

I'm skeptical it's a driver problem. But I've not tested recent
kernels with that config.
I do know 2.6.26 does work with that config.

...
> History:
>
> The full 5 drives were working and being mounted correctly in the past.
> However, due to many upgrades and confusing hardware problems at the same
> time, trying to identify when that was has become a problem for me - I can't
> say when it was working. When it was working I had a JMB362 PCIexpress card
> (specifically this one:
> http://www.span.com/catalog/product_info.php?products_id=16361 ). This has
> been replaced by the SiI card in order to determine if the card is a
> problem; the problems persist and have the same symptoms. (should it be
> necessary for diagnosis, I can put the JMB362 card back). I can say for
> certain that the failures I'm seeing have happened at least on kernels
> 2.6.28.3, 2.6.28.4 and 2.6.29.
>
> During testing combinations of drives have been changed, and the bridge
> board ports that they are plugged in to. This has not appeared to make any
> difference - the factor in this equation is the number of drives that are
> connected.

This suggests the power supply is now failing to provide adequate power
for drive spin-up. The WD "Green" drives certainly use less power during
normal operation (IIRC, they are 5400 RPM and only 3 platters). But they
will need substantially more to spin up.

Happen to have another PSU that could provide power to the drives?
I.e. build the same topology with 3132 + 3726 but power it with a
different (or multiple) PSU.
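
A back-of-envelope sketch of the spin-up budget idea is below. The
current figures are placeholders chosen only to show the arithmetic, not
measurements from this enclosure or values from any datasheet; the real
numbers would need to come from the drive and PSU labels.

```python
def max_simultaneous_spinups(psu_amps_12v, spinup_amps_12v):
    """How many drives can spin up at once within the 12 V rail's rating."""
    return int(psu_amps_12v // spinup_amps_12v)


def spinup_fits(drive_count, psu_amps_12v, spinup_amps_12v):
    """True if all drives spinning up together stay within the 12 V rating."""
    return drive_count * spinup_amps_12v <= psu_amps_12v


if __name__ == "__main__":
    PSU_AMPS = 7.0      # placeholder 12 V rating of the enclosure PSU
    SPINUP_AMPS = 2.0   # placeholder peak 12 V draw per drive at spin-up
    print("max drives:", max_simultaneous_spinups(PSU_AMPS, SPINUP_AMPS))
    for count in range(1, 6):
        print(count, "drives fit:", spinup_fits(count, PSU_AMPS, SPINUP_AMPS))
```

With these placeholder values the budget runs out at four drives, which
is the kind of drive-count threshold an undersized or ageing supply
would produce.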

hth,
grant

>
>
> Typical failure:
>
> A typical reads something like this (taken from kern.log from messages
> collected during initialisation):
>
> Apr  1 11:43:23 buttercup kernel: ata1: SATA link up 3.0 Gbps (SStatus 123
> SControl 0)
> Apr  1 11:43:23 buttercup kernel: ata1.15: Port Multiplier 1.1,
> 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
> Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus
> 123 SControl 320)
> Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.01: SATA link up 3.0 Gbps (SStatus
> 123 SControl 300)
> Apr  1 11:43:23 buttercup kernel: ata1.02: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.02: failed to write SCR 1 (Emask=0x1)
> Apr  1 11:43:23 buttercup kernel: ata1.02: failed to read SCR 0 (Emask=0x40)
> Apr  1 11:43:23 buttercup kernel: ata1.03: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.03: hardreset failed (port not ready)
> Apr  1 11:43:23 buttercup kernel: ata1.03: failed to read SCR 0 (Emask=0x40)
> Apr  1 11:43:23 buttercup kernel: ata1.03: reset failed, giving up
> Apr  1 11:43:23 buttercup kernel: ata1.15: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1: controller in dubious state,
> performing PORT_RST
> Apr  1 11:43:23 buttercup kernel: ata1.15: SATA link up 3.0 Gbps (SStatus
> 123 SControl 0)
> Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus
> 123 SControl 320)
> Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.01: SATA link up 3.0 Gbps (SStatus
> 123 SControl 300)
> Apr  1 11:43:23 buttercup kernel: ata1.02: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.02: SATA link up 3.0 Gbps (SStatus
> 123 SControl 300)
> Apr  1 11:43:23 buttercup kernel: ata1.03: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.03: failed to write SCR 2 (Emask=0x1)
> Apr  1 11:43:23 buttercup kernel: ata1.03: COMRESET failed (errno=-5)
> Apr  1 11:43:23 buttercup kernel: ata1.03: failed to read SCR 0 (Emask=0x40)
> Apr  1 11:43:23 buttercup kernel: ata1.03: reset failed, giving up
> Apr  1 11:43:23 buttercup kernel: ata1.15: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1: controller in dubious state,
> performing PORT_RST
> Apr  1 11:43:23 buttercup kernel: ata1.15: SATA link up 3.0 Gbps (SStatus
> 123 SControl 0)
> Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus
> 123 SControl 320)
> Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.01: SATA link up 3.0 Gbps (SStatus
> 123 SControl 300)
> Apr  1 11:43:23 buttercup kernel: ata1.02: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.02: SATA link up 3.0 Gbps (SStatus
> 123 SControl 300)
> Apr  1 11:43:23 buttercup kernel: ata1.03: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.03: failed to read SCR 2 (Emask=0x1)
> Apr  1 11:43:23 buttercup kernel: ata1.03: COMRESET failed (errno=-5)
> Apr  1 11:43:23 buttercup kernel: ata1.03: failed to read SCR 0 (Emask=0x40)
> Apr  1 11:43:23 buttercup kernel: ata1.03: reset failed, giving up
> Apr  1 11:43:23 buttercup kernel: ata1.03: failed to recover link after 3
> tries, disabling
> Apr  1 11:43:23 buttercup kernel: ata1.15: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1: controller in dubious state,
> performing PORT_RST
>
> ... and so on until it tries detaching the port multiplier ...
>
> Apr  1 11:43:23 buttercup kernel: ata1: controller in dubious state,
> performing PORT_RST
> Apr  1 11:43:23 buttercup kernel: ata1.15: SATA link up 3.0 Gbps (SStatus
> 123 SControl 0)
> Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus
> 123 SControl 320)
> Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.01: failed to read SCR 2 (Emask=0x40)
> Apr  1 11:43:23 buttercup kernel: ata1.01: COMRESET failed (errno=-5)
> Apr  1 11:43:23 buttercup kernel: ata1.01: failed to read SCR 0 (Emask=0x40)
> Apr  1 11:43:23 buttercup kernel: ata1.01: reset failed, giving up
> Apr  1 11:43:23 buttercup kernel: ata1.01: failed to recover link after 3
> tries, disabling
> Apr  1 11:43:23 buttercup kernel: ata1.15: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.15: SATA link up 3.0 Gbps (SStatus
> 123 SControl 0)
> Apr  1 11:43:23 buttercup kernel: ata1.04: failed to read SCR 0 (Emask=0x40)
> Apr  1 11:43:23 buttercup kernel: ata1.04: COMRESET failed (errno=-5)
> Apr  1 11:43:23 buttercup kernel: ata1.04: failed to write SCR 1
> (Emask=0x40)
> Apr  1 11:43:23 buttercup kernel: ata1.04: failed to clear SError.N
> (errno=-5)
> Apr  1 11:43:23 buttercup kernel: ata1: failed to recover PMP after 5 tries,
> giving up
> Apr  1 11:43:23 buttercup kernel: ata1.15: Port Multiplier detaching
> Apr  1 11:43:23 buttercup kernel: ata1.00: disabled
> Apr  1 11:43:23 buttercup kernel: ata1: exception Emask 0x13 SAct 0x0 SErr
> 0x40d0000 action 0xe frozen t4
> Apr  1 11:43:23 buttercup kernel: ata1: irq_stat 0x01100010, PHY RDY changed
> Apr  1 11:43:23 buttercup kernel: ata1: SError: { PHYRdyChg CommWake 10B8B
> DevExch }
> Apr  1 11:43:23 buttercup kernel: ata1: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1: SATA link up 3.0 Gbps (SStatus 123
> SControl 0)
> Apr  1 11:43:23 buttercup kernel: ata1.15: Port Multiplier 1.1,
> 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
> Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus
> 123 SControl 320)
> Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.01: SATA link up 3.0 Gbps (SStatus
> 123 SControl 300)
> Apr  1 11:43:23 buttercup kernel: ata1.02: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.02: SATA link up 3.0 Gbps (SStatus
> 123 SControl 300)
> Apr  1 11:43:23 buttercup kernel: ata1.03: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.03: failed to read SCR 0 (Emask=0x40)
> Apr  1 11:43:23 buttercup kernel: ata1.03: COMRESET failed (errno=-5)
> Apr  1 11:43:23 buttercup kernel: ata1.03: failed to read SCR 0 (Emask=0x40)
> Apr  1 11:43:23 buttercup kernel: ata1.03: reset failed, giving up
> Apr  1 11:43:23 buttercup kernel: ata1.15: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.15: SATA link up 3.0 Gbps (SStatus
> 123 SControl 0)
> Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus
> 123 SControl 320)
> Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.01: SATA link up 3.0 Gbps (SStatus
> 123 SControl 300)
> Apr  1 11:43:23 buttercup kernel: ata1.02: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.02: failed to read SCR 0 (Emask=0x1)
> Apr  1 11:43:23 buttercup kernel: ata1.02: COMRESET failed (errno=-5)
> Apr  1 11:43:23 buttercup kernel: ata1.02: failed to read SCR 0 (Emask=0x40)
> Apr  1 11:43:23 buttercup kernel: ata1.02: reset failed, giving up
> Apr  1 11:43:23 buttercup kernel: ata1.15: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.15: SATA link up 3.0 Gbps (SStatus
> 123 SControl 0)
> Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus
> 123 SControl 320)
> Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.01: SATA link up 3.0 Gbps (SStatus
> 123 SControl 300)
> Apr  1 11:43:23 buttercup kernel: ata1.02: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.02: failed to read SCR 0 (Emask=0x1)
> Apr  1 11:43:23 buttercup kernel: ata1.02: failed to read SCR 1 (Emask=0x40)
> Apr  1 11:43:23 buttercup kernel: ata1.02: failed to read SCR 0 (Emask=0x40)
> Apr  1 11:43:23 buttercup kernel: ata1.03: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.03: hardreset failed (port not ready)
> Apr  1 11:43:23 buttercup kernel: ata1.03: failed to read SCR 0 (Emask=0x40)
> Apr  1 11:43:23 buttercup kernel: ata1.03: reset failed, giving up
> Apr  1 11:43:23 buttercup kernel: ata1.15: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1: controller in dubious state,
> performing PORT_RST
>
> ... and the sequence repeats until it gets fed up ...
>
> Apr  1 11:43:23 buttercup kernel: ata1: controller in dubious state,
> performing PORT_RST
> Apr  1 11:43:23 buttercup kernel: ata1.15: SATA link up 3.0 Gbps (SStatus
> 123 SControl 0)
> Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus
> 123 SControl 320)
> Apr  1 11:43:23 buttercup kernel: ata1.01: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.01: SATA link down (SStatus 221
> SControl 300)
> Apr  1 11:43:23 buttercup kernel: ata1.05: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.05: SATA link up 1.5 Gbps (SStatus
> 113 SControl 320)
> Apr  1 11:43:23 buttercup kernel: ata1.00: ATA-8: WDC WD5000AAKS-00YGA0,
> 12.01C02, max UDMA/133
> Apr  1 11:43:23 buttercup kernel: ata1.00: 976773168 sectors, multi 16:
> LBA48 NCQ (depth 31/32)
> Apr  1 11:43:23 buttercup kernel: ata1.00: configured for UDMA/100
> Apr  1 11:43:23 buttercup kernel: ata1.04: PHY status changed but maxed out
> on retries, giving up
> Apr  1 11:43:23 buttercup kernel: ata1.04: Manully issue scan to resume this
> link
> Apr  1 11:43:23 buttercup kernel: ata1: PMP SError.N set for some ports,
> repeating recovery
> Apr  1 11:43:23 buttercup kernel: ata1.00: hard resetting link
> Apr  1 11:43:23 buttercup kernel: ata1.00: SATA link up 3.0 Gbps (SStatus
> 123 SControl 320)
> Apr  1 11:43:23 buttercup kernel: ata1.00: configured for UDMA/100
> Apr  1 11:43:23 buttercup kernel: ata1: EH pending after 5 tries, giving up
> Apr  1 11:43:23 buttercup kernel: ata1: EH complete
>
>
> As can be seen, it got as far as identifying one of the drives in this
> configuration on the final attempt, but the other 3 were not detected
> properly.
>
>
> My gut feeling:
>
> There's some timing problem involved here - either the drives are being sent
> commands when they're not ready, or they're being timed out before they have
> a chance to respond after a reset. As the problem gets worse (to the point
> of always failing) with more drives, I'm thinking of some overall timeout
> that's being triggered but the individual drives are getting less and less
> time to handle it. For example, drive 1 reset at 1s, drive 2 reset at 2s,
> drive 3 reset at 3s, etc, but an overall timeout of 8s, so by the time that
> drive 5 has been reset, it only has 3s to respond and its initialisation
> takes longer than that so it never does). Not knowing what is involved here,
> this may be complete rubbish and is purely guesswork on my part.
>
>
> More details from kernel logs:
>
> Because I'm not sure what's useful, and I wanted to capture some timings for
> the sequences of events, I've captured kernel logs of the a number of drive
> combinations. In each case the PC was turned off, the box was turned off,
> the SATA leads were connected as required for the test, then the box turned
> on, a few seconds waited for the box to settle, then the PC turned on. The
> system booted into 2.6.29 and then waited until it had settled to a login
> prompt. At this point, the drive box was turned off. The system then shut
> down whatever drives it had detected after determining that the PMP had gone
> away. The drive box was then turned on again. This second initialisation of
> the box should ensure that there are timings present in the kernel logs
> which determine how long it was between events.
>
> The numbering of the logs indicates which drives were connected - these are
> drive numbers from 1-5, not the numbers used in the log messages which are
> 0-4 (it just makes more sense for me to think of them as drives 1-5 not
> 0-4).
>
> Drives 1-3 are 500G, drives 4-5 are 1T.
>
> In the logs it can also be seen that there are two ATA drives connected to
> the MB, and two SATA drives connected to the MB. Neither of these appear to
> exhibit any other problems.
>
> The logs can be found at:
>
> http://usenet.gerph.org/SATA/
>
>
> sata-15-kern.log:
>   2 drives connected.
>   All detected during initialisation.
>   All detected on restarting box.
>
> sata-45-kern.log:
>   2 drives connected.
>   All detected during initialisation.
>   All detected on restarting box, although it reset the port 3 times.
>
> sata-125-kern.log:
>   3 drives connected.
>   All detected during initialisation, but after doing so it then tried
>   to re-detect later (which was successful)
>   All detected on restarting box, although it reset the port 2 times
>   and had SCSI errors reported which it recovered from.
>
> sata-345-kern.log:
>   3 drives connected.
>   1 detected during initialisation, only drive 4 was initialised
>   properly; during init 3 had been IDENTIFYd but the port was then
>   reset and more attempts made.
>   All detected on restarting box, although it reset the port 2 times
>   and had other errors reported which it recovered from.
>
> sata-1235-kern.log:
>   4 drives connected.
>   1 detected during initialisation (drive 1), many attempts made.
>   None detected on restarting box, although it retried many times.
>
> sata-12345-kern.log:
>   5 drives connected.
>   None detected during initialisation, many attempts made.
>   Ineffective - no output when the external box was turned off, nor
>   when it was turned on.
>
>
> Finally:
>
> I can provide more information, more combinations and try different kernel
> configurations if it's found to be useful for this. I'm sorry if this
> information is too verbose, or if I've missed something out - please let me
> know and I'll try to do tests or fill in the blanks.
>
>
> Hope someone can help with this!
>
>
> --
> Gerph <http://gerph.org/>
> [ All information, speculation, opinion or data within, or attached to,
>  this email is private and confidential. Such content may not be
>  disclosed to third parties, or a public forum, without explicit
>  permission being granted. ]

* Re: Problems with >3 drives on an eSATA portmultiplier
  2009-04-01 16:45 ` Grant Grundler
@ 2009-04-01 17:09   ` Justin Fletcher
  2009-04-01 17:46     ` Grant Grundler
  0 siblings, 1 reply; 8+ messages in thread
From: Justin Fletcher @ 2009-04-01 17:09 UTC (permalink / raw)
  To: Grant Grundler; +Cc: linux-ide

On Wed, 1 Apr 2009, Grant Grundler wrote:

> On Wed, Apr 1, 2009 at 6:54 AM, Justin Fletcher <gerph@gerph.org> wrote:
>> Hiya,
>>
>>
>> The system I'm using is a MSI motherboard, with a SiI eSATA controller (a
>> 3132, specifically this one:
>> http://www.span.com/catalog/product_info.php?products_id=15995 ) connected
>> through the only PCI express card on the MB.
>
> 2.6.29 kernel + SII 3132 SATA controller should work fine with 3726 PMP.
>
> I'm skeptical it's a driver problem. But I've not tested recent
> kernels with that config.
> I do know 2.6.26 does work with that config.

I could drop down to 2.6.26 and rebuild my kernel with support for that 
controller - I hadn't done this before because I cannot then use my DVB-S 
card, but... I could live without that for the test.

I'm unsure of where the problem might lie; if it's not a driver problem 
and the PSU isn't the issue, then the only variable left (I think) is the 
bridge board itself. AKA "The expensive bit" :-)

[snip]

>>
>> During testing, both the combinations of drives and the bridge board ports
>> that they are plugged into have been changed. This has not appeared to make
>> any difference - the deciding factor is the number of drives that are
>> connected.
>
> This suggests the power supply is now failing to provide adequate power
> for drive spinup. The WD "Green" drives certainly use less power during
> normal operation (IIRC, they are 5400 RPM and only 3 platters). But they
> will need substantially more to spin up.
>
> Happen to have another PSU that could provide power to the drives?
> I.e., build the same topology with 3132 + 3726 but power it with a
> different (or multiple) PSU.

I've replaced the PSU once already - it was replaced on 6th Feb, with a 
Sumvision 450W 20+4pin SATA PSU. Those 5 drives + the bridge board and fan 
are the only things being powered by that PSU. I should have thought that 
even during drive initialisation 5 drives wouldn't exceed 450W. Anyhow, 
this was one of my first thoughts. I could get another PSU (I think I 
might be able to find one around here), but having replaced it so 
recently, I'm uncomfortable doing so.

Is a 450W PSU going to be able to supply enough power for those drives ?

-- 
Gerph <http://gerph.org/>
[ All information, speculation, opinion or data within, or attached to,
   this email is private and confidential. Such content may not be
   disclosed to third parties, or a public forum, without explicit
   permission being granted. ]

* Re: Problems with >3 drives on an eSATA portmultiplier
  2009-04-01 17:09   ` Justin Fletcher
@ 2009-04-01 17:46     ` Grant Grundler
  2009-04-02  0:28       ` Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Grant Grundler @ 2009-04-01 17:46 UTC (permalink / raw)
  To: Justin Fletcher; +Cc: linux-ide

On Wed, Apr 1, 2009 at 10:09 AM, Justin Fletcher <gerph@gerph.org> wrote:
> On Wed, 1 Apr 2009, Grant Grundler wrote:
>
>> On Wed, Apr 1, 2009 at 6:54 AM, Justin Fletcher <gerph@gerph.org> wrote:
>>>
>>> Hiya,
>>>
>>>
>>> The system I'm using is a MSI motherboard, with a SiI eSATA controller (a
>>> 3132, specifically this one:
>>> http://www.span.com/catalog/product_info.php?products_id=15995 )
>>> connected
>>> through the only PCI express card on the MB.
>>
>> 2.6.29 kernel + SII 3132 SATA controller should work fine with 3726 PMP.
>>
>> I'm skeptical it's a driver problem. But I've not tested recent
>> kernels with that config.
>> I do know 2.6.26 does work with that config.
>
> I could drop down to 2.6.26 and rebuild my kernel with support for that
> controller - I hadn't done this before because I cannot then use my DVB-S
> card, but... I could live without that for the test.

If someone could comment on  2.6.28 or 2.6.29, you could avoid running
this test.

...
> I've replaced the PSU once already - it was replaced on 6th Feb, with a
> Sumvision 450W 20+4pin SATA PSU. Those 5 drives + the bridge board and fan
> are the only things being powered by that PSU. I should have thought that
> even during drive initialisation 5 drives wouldn't exceed 450W. Anyhow, this
> was one of my first thoughts. I could get another PSU (I think I might be
> able to find one around here), but having replaced it so recently, I'm
> uncomfortable doing so.
>
> Is a 450W PSU going to be able to supply enough power for those drives ?

It's not just about how many watts. Both 12v and 5v "rails" need to
provide enough power. But if this worked before, I would assume the
replacement is working too.

And in general, yes, a 450W should be plenty for 5 drives plus PMP
board. Especially if the OS is "staggering" the spinup so only one
or two drives are spinning up at the same time. I know I've had to
fix "staggered spinup" in the past (2.6.18) but don't recall if it's
fixed in current 2.6.28 or .29 kernels.
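
To put the point about rails into numbers, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function name rails_ok is hypothetical, and the per-drive currents and rail limits are made-up placeholders, not measurements or datasheet figures for these drives or this PSU. It only shows that each rail has to be checked separately rather than comparing total watts.

```python
def rails_ok(n_drives, amps_12v_per_drive, amps_5v_per_drive,
             limit_12v, limit_5v):
    """Check each rail separately for n_drives spinning up at once.

    All currents are in amps. The values passed in are assumptions
    supplied by the caller, not figures from any real datasheet.
    """
    need_12v = n_drives * amps_12v_per_drive  # spindle motors draw from 12V
    need_5v = n_drives * amps_5v_per_drive    # drive electronics draw from 5V
    return {"12V": need_12v <= limit_12v, "5V": need_5v <= limit_5v}


# Placeholder numbers only: 5 drives, 2.0 A (12V) and 0.7 A (5V) each.
print(rails_ok(5, 2.0, 0.7, limit_12v=18.0, limit_5v=20.0))
# Same drives on a hypothetical PSU with a weak 12V rail.
print(rails_ok(5, 2.0, 0.7, limit_12v=8.0, limit_5v=20.0))
```

With real figures from the drive datasheets and the PSU label, the same check would show whether simultaneous spinup of all 5 drives fits on each rail, or whether it only works when the drives happen to be probed one after another.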

grant

* Re: Problems with >3 drives on an eSATA portmultiplier
  2009-04-01 17:46     ` Grant Grundler
@ 2009-04-02  0:28       ` Tejun Heo
  2009-04-03 21:06         ` Justin Fletcher
  0 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2009-04-02  0:28 UTC (permalink / raw)
  To: Grant Grundler; +Cc: Justin Fletcher, linux-ide

Grant Grundler wrote:
>> I could drop down to 2.6.26 and rebuild my kernel with support for that
>> controller - I hadn't done this before because I cannot then use my DVB-S
>> card, but... I could live without that for the test.
> 
> If someone could comment on  2.6.28 or 2.6.29, you could avoid running
> this test.

Hmmm... I don't really remember changing much, so it should be fine.
Can you please give a shot at 2.6.26 anyway?  Also, kernel logs with
JMB controller would be helpful too as ahci tends to better show what's
really going on and failing during reset.

>> I've replaced the PSU once already - it was replaced on 6th Feb, with a
>> Sumvision 450W 20+4pin SATA PSU. Those 5 drives + the bridge board and fan
>> are the only things being powered by that PSU. I should have thought that
>> even during drive initialisation 5 drives wouldn't exceed 450W. Anyhow, this
>> was one of my first thoughts. I could get another PSU (I think I might be
>> able to find one around here), but having replaced it so recently, I'm
>> uncomfortable doing so.
>>
>> Is a 450W PSU going to be able to supply enough power for those drives ?
> 
> It's not just about how many watts. Both 12v and 5v "rails" need to
> provide enough power.  But if this worked before, I would assume the
> replacement is working too.
> 
> And in general, yes, a 450W should be plenty for 5 drives plus PMP
> board. Especially if the OS is "staggering" the spinup so only one
> or two drives are spinning up at the same time.  I know I've had to
> fix "staggered spinup" in the past (2.6.18) but don't recall if it's
> fixed in current 2.6.28 or .29 kernels.

libata never had proper provision for staggered spin up.  If it works,
it's just because they're being probed sequentially.  :-)

Anyways, 3132 + 3726 should work.  Both 3726 and 4726 pretty much
require the first port to be always occupied (or the fake config
drive jumps around and freaks out) and hotplug is often flaky, but
other than that it should generally work.

Thanks.

-- 
tejun

* Re: Problems with >3 drives on an eSATA portmultiplier
  2009-04-02  0:28       ` Tejun Heo
@ 2009-04-03 21:06         ` Justin Fletcher
  2009-04-07 21:20           ` Justin Fletcher
  0 siblings, 1 reply; 8+ messages in thread
From: Justin Fletcher @ 2009-04-03 21:06 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Grant Grundler, linux-ide

On Thu, 2 Apr 2009, Tejun Heo wrote:

> Grant Grundler wrote:
>>> I could drop down to 2.6.26 and rebuild my kernel with support for that
>>> controller - I hadn't done this before because I cannot then use my DVB-S
>>> card, but... I could live without that for the test.
>>
>> If someone could comment on  2.6.28 or 2.6.29, you could avoid running
>> this test.
>
> Hmmm... I don't really remember changing much, so it should be fine.
> Can you please give a shot at 2.6.26 anyway?  Also, kernel logs with
> JMB controller would be helpful too as ahci tends to better show what's
> really going on and failing during reset.

I tried the JMB controller again with 2.6.26, and with drives 1,2,3 
connected. This was different to what had happened previously - there are 
two lights on the bridgeboard, green and red. Most of the time they're 
both on. When the JMB controller was connected (even in the BIOS screen) 
these lights flashed on and off about once a second. Occasionally they'd 
settle into the on state. The log of booting into 2.6.26 is here:

http://usenet.gerph.org/SATA/sata-123-2.6.26-jmb-kern.log

and mostly consists of:

Apr  3 20:33:43 buttercup kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Apr  3 20:33:43 buttercup kernel: ata1: irq_stat 0x00000040, connection status changed
Apr  3 20:33:43 buttercup kernel: ata1: SError: { DevExch }
Apr  3 20:33:43 buttercup kernel: ata1: hard resetting link
Apr  3 20:33:44 buttercup kernel: ata1: SATA link down (SStatus 0 SControl 300)
Apr  3 20:33:44 buttercup kernel: ata1: EH complete

repeated over and over whilst the light was flashing (not in time with 
it).
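
As a rough way to quantify how often this sequence repeats in a log like the one above, here is a minimal Python sketch. It assumes the syslog-style line format shown in the excerpt ("... kernel: ataN: message"); the function name count_events and the sample list are made up for illustration and are not part of any kernel tooling.

```python
import re
from collections import Counter

# Matches syslog-style libata lines such as:
#   Apr  3 20:33:43 buttercup kernel: ata1: hard resetting link
# Group 1 is the port or device (ata1, ata1.00), group 2 the message.
LINE_RE = re.compile(r"kernel: (ata\d+(?:\.\d+)?): (.*)$")


def count_events(lines, needle="hard resetting link"):
    """Count, per ATA port, the log lines whose message contains needle."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and needle in match.group(2):
            counts[match.group(1)] += 1
    return counts


# Sample taken from the excerpt above.
sample = [
    "Apr  3 20:33:43 buttercup kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen",
    "Apr  3 20:33:43 buttercup kernel: ata1: irq_stat 0x00000040, connection status changed",
    "Apr  3 20:33:43 buttercup kernel: ata1: SError: { DevExch }",
    "Apr  3 20:33:43 buttercup kernel: ata1: hard resetting link",
    "Apr  3 20:33:44 buttercup kernel: ata1: SATA link down (SStatus 0 SControl 300)",
    "Apr  3 20:33:44 buttercup kernel: ata1: EH complete",
]

print(count_events(sample))
print(count_events(sample, needle="SATA link down"))
```

Running the same function over the lines of a saved kern.log (for example via open(path) as the lines argument) would give a per-port count of resets, which makes it easier to compare one drive combination against another.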

An interesting (?) thing here is that at 20:34:27 in the log the light 
stayed on for a bit and we managed to read details from the drives. Then 
the light went back to flashing regularly again.

I also booted into 2.6.29 with this configuration (but with drives 3,4,5) 
and got a similar effect, except for the constant light:

http://usenet.gerph.org/SATA/sata-345-2.6.29-jmb-kern.log

I don't think it's particularly surprising that this exhibits the same 
results, given that even during the BIOS the light flashes in the same 
way.

I removed and re-fitted the JMB card as part of these tests, in case it 
was not seated properly in the slot (unlikely, as it was detected, but 
just to be sure, I did this anyhow). I also exchanged the eSATA cables to 
try to eliminate them. No change in behaviour was observed.

Having tried this, I reinserted the SiI card and rebuilt 2.6.26 with 
support for it (and increased the debug buffer). Whilst in the BIOS and 
after booting normally, the bridge board lights are almost constantly on. 
They only seem to go off when the link is reset. Booting with drives 
1,2,3,4 connected (which should be a failure, or only partially detect the 
drives according to the 2.6.29 results) gave me:

http://usenet.gerph.org/SATA/sata-1234-2.6.26-sil-kern.log

which did many resets during the initialisation and never detected any 
drives. Turning the external box off and on again did not cause it to 
re-detect the drives (as if it was just completely ignoring the interface 
now).

Repeating the test with drives 3,4,5 and the 2.6.26 kernel and the sil 
card gave me the following log:

http://usenet.gerph.org/SATA/sata-345-2.6.26-sil-kern.log

This managed to identify 2 of the 3 drives during initialisation. When the 
box was turned off and on again it only detected 2 of the 3.

>>> I've replaced the PSU once already - it was replaced on 6th Feb, with a
>>> Sumvision 450W 20+4pin SATA PSU. Those 5 drives + the bridge board and fan
>>> are the only things being powered by that PSU. I should have thought that
>>> even during drive initialisation 5 drives wouldn't exceed 450W. Anyhow, this
>>> was one of my first thoughts. I could get another PSU (I think I might be
>>> able to find one around here), but having replaced it so recently, I'm
>>> uncomfortable doing so.
>>>
>>> Is a 450W PSU going to be able to supply enough power for those drives ?
>>
>> It's not just about how many watts. Both 12v and 5v "rails" need to
>> provide enough power.  But if this worked before, I would assume the
>> replacement is working too.
>>
>> And in general, yes, a 450W should be plenty for 5 drives plus PMP
>> board. Especially if the OS is "staggering" the spinup so only one
>> or two drives are spinning up at the same time.  I know I've had to
>> fix "staggered spinup" in the past (2.6.18) but don't recall if it's
>> fixed in current 2.6.28 or .29 kernels.
>
> libata never had proper provision for staggered spin up.  If it works,
> it's just because they're being probed sequentially.  :-)
>
> Anyways, 3132 + 3726 should work.  Both 3726 and 4726 pretty much
> require the first port to be always occupied (or the fake config
> drive jumps around and freaks out) and hotplug is often flaky, but
> other than that it should generally work.

So my using ports 3,4,5 only probably isn't all that useful for it ?

Even still, I'm not sure that the behaviour that I'm seeing is down to a 
driver issue now. I'd appreciate any other thoughts based on the extra 
info I've given. I may just bite the bullet and try a replacement bridge 
board and see if that helps.

-- 
Gerph <http://gerph.org/>
[ All information, speculation, opinion or data within, or attached to,
   this email is private and confidential. Such content may not be
   disclosed to third parties, or a public forum, without explicit
   permission being granted. ]

* Re: Problems with >3 drives on an eSATA portmultiplier
  2009-04-03 21:06         ` Justin Fletcher
@ 2009-04-07 21:20           ` Justin Fletcher
  2009-04-14 10:45             ` Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Justin Fletcher @ 2009-04-07 21:20 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Grant Grundler, linux-ide

On Fri, 3 Apr 2009, Justin Fletcher wrote:

> On Thu, 2 Apr 2009, Tejun Heo wrote:
>
>> Grant Grundler wrote:
>>>> I could drop down to 2.6.26 and rebuild my kernel with support for that
>>>> controller - I hadn't done this before because I cannot then use my DVB-S
>>>> card, but... I could live without that for the test.
>>> 
>>> If someone could comment on  2.6.28 or 2.6.29, you could avoid running
>>> this test.
>> 
>> Hmmm... I don't really remember changing much, so it should be fine.
>> Can you please give a shot at 2.6.26 anyway?  Also, kernel logs with
>> JMB controller would be helpful too as ahci tends to better show what's
>> really going on and failing during reset.

For future reference if anyone else sees similar issues with their 
systems... I have replaced the bridge board with another and the problem 
has gone away - all 5 drives are accessible and I have no issues.

Thanks for the help, and sorry to have wasted your time.

-- 
Gerph <http://gerph.org/>
[ All information, speculation, opinion or data within, or attached to,
   this email is private and confidential. Such content may not be
   disclosed to third parties, or a public forum, without explicit
   permission being granted. ]

* Re: Problems with >3 drives on an eSATA portmultiplier
  2009-04-07 21:20           ` Justin Fletcher
@ 2009-04-14 10:45             ` Tejun Heo
  0 siblings, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2009-04-14 10:45 UTC (permalink / raw)
  To: Justin Fletcher; +Cc: Grant Grundler, linux-ide

Hello,

Justin Fletcher wrote:
> For future reference if anyone else sees similar issues with their
> systems... I have replaced the bridge board with another and the problem
> has gone away - all 5 drives are accessible and I have no issues.
> 
> Thanks for the help, and sorry to have wasted your time.

Ah... thanks for reporting.  I was worried that you had discovered a
yet-unknown 3726 detection problem.  :-)

-- 
tejun

