linux-ide.vger.kernel.org archive mirror
From: Yan Seiner <yan@seiner.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-ide@vger.kernel.org
Subject: Re: Port multiplier resets
Date: Wed, 19 Nov 2008 07:15:42 -0800
Message-ID: <49242D9E.4050500@seiner.com>
In-Reply-To: <49194B20.8000209@kernel.org>

Tejun Heo wrote:
> Yan Seiner wrote:
>   
>> I'm seeing errors on a new port multiplier install.  The mobo has a
>> JMicron SATA controller: JMicron Technologies, Inc. JMicron
>> 20360/20363 AHCI Controller (rev 03).  The port multiplier has a SiI
>> 3726 chipset.  AFAICT, this is a supported combination.  The kernel
>> recognizes the chipset.  The system has an Adaptec SCSI controller
>> w/ 2 15K SCSI drives, 6 internal SATA drives, and 3 external SATA
>> drives using the port multiplier. Because of a broken BIOS (ARGH!) I
>> have to power up the external drives after the system scans the SCSI
>> bus.  This seems to cause no problems.  Here's a log of a recent
>> boot (scroll down for the error I see after the system is booted):
>>
>> Nov  2 05:26:07 selene kernel: [   28.691524] ata8: SATA max UDMA/133 abar m8192@0xfdcfe000 port 0xfdcfe180 irq 16
>> Nov  2 05:26:07 selene kernel: [   34.104580] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> Nov  2 05:26:07 selene kernel: [   34.104580] ata8.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
>> Nov  2 05:26:07 selene kernel: [   34.104580] ata8.00: hard resetting link
>> Nov  2 05:26:07 selene kernel: [   34.425000] ata8.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov  2 05:26:07 selene kernel: [   34.425003] ata8.01: hard resetting link
>> Nov  2 05:26:07 selene kernel: [   35.001232] ata8.01: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov  2 05:26:07 selene kernel: [   35.001234] ata8.02: hard resetting link
>> Nov  2 05:26:07 selene kernel: [   35.321389] ata8.02: SATA link down (SStatus 0 SControl 320)
>> Nov  2 05:26:07 selene kernel: [   35.321419] ata8.03: hard resetting link
>> Nov  2 05:26:07 selene kernel: [   35.869739] ata8.03: SATA link down (SStatus 0 SControl 320)
>> Nov  2 05:26:07 selene kernel: [   35.869769] ata8.04: hard resetting link
>> Nov  2 05:26:07 selene kernel: [   36.621491] ata8.04: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov  2 05:26:07 selene kernel: [   36.621493] ata8.05: hard resetting link
>> Nov  2 05:26:07 selene kernel: [   36.941025] ata8.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov  2 05:26:07 selene kernel: [   36.942522] ata8.00: ATA-8: ST31000340AS, SD15, max UDMA/133
>> Nov  2 05:26:07 selene kernel: [   36.942525] ata8.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>> Nov  2 05:26:07 selene kernel: [   36.944479] ata8.00: configured for UDMA/133
>> Nov  2 05:26:07 selene kernel: [   53.613888] ata8.01: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
>> Nov  2 05:26:07 selene kernel: [   53.613888] ata8.04: failed to IDENTIFY (I/O error, err_mask=0x100)
>> Nov  2 05:26:07 selene kernel: [   53.613888] ata8: failed to recover some devices, retrying in 5 secs
>> Nov  2 05:26:07 selene kernel: [   61.420677] ata8.01: hard resetting link
>> Nov  2 05:26:07 selene kernel: [   66.420763] ata8.15: qc timeout (cmd 0xe4)
>> Nov  2 05:26:07 selene kernel: [   66.420770] ata8.01: failed to read SCR 2 (Emask=0x4)
>> Nov  2 05:26:07 selene kernel: [   66.420773] ata8.01: failed to read SCR 2 (Emask=0x40)
>> Nov  2 05:26:07 selene kernel: [   66.420775] ata8.01: COMRESET failed (errno=-5)
>> Nov  2 05:26:07 selene kernel: [   66.420804] ata8.01: failed to read SCR 0 (Emask=0x40)
>> Nov  2 05:26:07 selene kernel: [   66.420806] ata8.01: reset failed, giving up
>> Nov  2 05:26:07 selene kernel: [   66.420835] ata8.15: hard resetting link
>> Nov  2 05:26:07 selene kernel: [   68.284883] ata8.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> Nov  2 05:26:07 selene kernel: [   68.285116] ata8.00: hard resetting link
>> Nov  2 05:26:07 selene kernel: [   68.604947] ata8.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov  2 05:26:07 selene kernel: [   68.604949] ata8.01: hard resetting link
>> Nov  2 05:26:07 selene kernel: [   69.329438] ata8.01: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov  2 05:26:07 selene kernel: [   69.329440] ata8.02: hard resetting link
>> Nov  2 05:26:07 selene kernel: [   69.649777] ata8.02: SATA link down (SStatus 0 SControl 320)
>> Nov  2 05:26:07 selene kernel: [   69.649807] ata8.03: hard resetting link
>> Nov  2 05:26:07 selene kernel: [   69.969439] ata8.03: SATA link down (SStatus 0 SControl 320)
>> Nov  2 05:26:07 selene kernel: [   69.969469] ata8.04: hard resetting link
>> Nov  2 05:26:07 selene kernel: [   70.813852] ata8.04: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov  2 05:26:07 selene kernel: [   70.813855] ata8.05: hard resetting link
>> Nov  2 05:26:07 selene kernel: [   71.377665] ata8.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov  2 05:26:07 selene kernel: [   71.381475] ata8.00: configured for UDMA/133
>> Nov  2 05:26:07 selene kernel: [   71.381475] ata8.01: ATA-8: ST31000340AS, SD15, max UDMA/133
>> Nov  2 05:26:07 selene kernel: [   71.381475] ata8.01: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>> Nov  2 05:26:07 selene kernel: [   71.381879] ata8.01: configured for UDMA/133
>> Nov  2 05:26:07 selene kernel: [   71.381879] ata8.04: ATA-8: ST31000340AS, SD15, max UDMA/133
>> Nov  2 05:26:07 selene kernel: [   71.381879] ata8.04: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>> Nov  2 05:26:07 selene kernel: [   71.381890] ata8.04: configured for UDMA/133
>> Nov  2 05:26:07 selene kernel: [   71.389769] ata8: EH complete
>>     
>
> The link probably got a PHY event after the reset sequence completed.
> Maybe our timing is too aggressive or the PMP is just quirky, but as
> long as detection succeeds in the end, it should be okay.
>
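The `SStatus`/`SControl` values in the boot log above are bare hex register dumps. As a reading aid, here is a minimal Python sketch that splits them into the DET (bits 3:0), SPD (bits 7:4) and IPM (bits 11:8) fields, assuming the field encodings from the SATA/AHCI specifications; the table and function names are hypothetical and are not libata code.

```python
# Hypothetical reading aid, not libata code. Assumes the SATA/AHCI SCR
# layout shared by SStatus and SControl: DET = bits 3:0, SPD = bits 7:4,
# IPM = bits 11:8.

SSTATUS = {
    "DET": {0x0: "no device detected, PHY offline",
            0x1: "device detected, PHY communication not established",
            0x3: "device detected, PHY communication established",
            0x4: "PHY offline (interface disabled or in loopback)"},
    "SPD": {0x0: "no negotiated speed",
            0x1: "Gen1 (1.5 Gbps)",
            0x2: "Gen2 (3.0 Gbps)"},
    "IPM": {0x0: "device not present or communication not established",
            0x1: "active",
            0x2: "partial power-management state",
            0x6: "slumber power-management state"},
}

SCONTROL = {
    "DET": {0x0: "no detection/initialization action requested",
            0x1: "perform interface initialization (COMRESET)",
            0x4: "disable interface, PHY offline"},
    "SPD": {0x0: "no speed restriction",
            0x1: "limit to Gen1 (1.5 Gbps)",
            0x2: "limit to Gen2 (3.0 Gbps)"},
    "IPM": {0x0: "no IPM transitions disabled",
            0x1: "transitions to partial disabled",
            0x2: "transitions to slumber disabled",
            0x3: "transitions to partial and slumber disabled"},
}

# Bit offset of each 4-bit field inside the register.
SHIFTS = {"DET": 0, "SPD": 4, "IPM": 8}


def decode(value, table):
    """Split an SCR value into DET/SPD/IPM and describe each field."""
    result = {}
    for name, shift in SHIFTS.items():
        field = (value >> shift) & 0xF
        result[name] = table[name].get(field, f"reserved ({field:#x})")
    return result


def decode_log_pair(sstatus_hex, scontrol_hex):
    """libata prints both registers as bare hex, e.g. 'SStatus 113 SControl 320'."""
    return (decode(int(sstatus_hex, 16), SSTATUS),
            decode(int(scontrol_hex, 16), SCONTROL))


if __name__ == "__main__":
    for ss, sc in [("123", "300"), ("113", "320"), ("0", "320")]:
        status, control = decode_log_pair(ss, sc)
        print(f"SStatus {ss}: {status}")
        print(f"SControl {sc}: {control}")
```

Read this way, `SStatus 123 SControl 300` on `ata8`/`ata8.15` is an active 3.0 Gbps link with no speed limit, `SStatus 113 SControl 320` on the fan-out ports is an active 1.5 Gbps link capped at Gen2, and `SStatus 0` marks the empty ports (ata8.02, ata8.03).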
>   
>> Once the system is up and running, I get these errors.  They appear
>> anywhere from a few minutes to several hours apart.  What exactly
>> does this mean?  It doesn't seem to have any impact on the
>> performance of the drives.  Typically, these drives are very heavily
>> loaded; I've been writing more than 1 TB of data to them while
>> rebuilding the RAID-5 array, so they're pretty well maxed out.
>>
>> Nov  2 05:40:13 selene kernel: [  984.254543] ata8.15: exception Emask 0x10 SAct 0x0 SErr 0x780101 action 0x7
>> Nov  2 05:40:13 selene kernel: [  984.254549] ata8.15: irq_stat 0x0c000000
>> Nov  2 05:40:13 selene kernel: [  984.254552] ata8: SError: { RecovData UnrecovData 10B8B Dispar BadCRC Handshk }
>> Nov  2 05:40:13 selene kernel: [  984.254557] ata8.01: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x0
>>     
>
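The `SErr 0x780101` value in the log above is a bitmask. A minimal Python sketch of how it maps to the names libata prints is below; the bit positions follow the SATA SError register and the short names mirror libata's `SError: { ... }` report, but the helper functions themselves are hypothetical.

```python
# Hypothetical decoder. Bit positions follow the SATA SError register;
# the short names mirror the ones libata prints in "SError: { ... }".
SERROR_BITS = [
    (0, "RecovData"),    # recovered data integrity error
    (1, "RecovComm"),    # recovered communications error
    (8, "UnrecovData"),  # non-recovered transient data integrity error
    (9, "Persist"),      # persistent communication or data integrity error
    (10, "Proto"),       # protocol error
    (11, "HostInt"),     # internal host error
    (16, "PHYRdyChg"),   # PHY ready state changed
    (17, "PHYInt"),      # PHY internal error
    (18, "CommWake"),    # COMWAKE signal detected
    (19, "10B8B"),       # 10b to 8b decode error
    (20, "Dispar"),      # running disparity error
    (21, "BadCRC"),      # CRC error in a received FIS
    (22, "Handshk"),     # handshake error (R_ERR received)
    (23, "LinkSeq"),     # link sequence error
    (24, "TrStaTrns"),   # transport state transition error
    (25, "UnrecFIS"),    # unrecognized FIS type
    (26, "DevExch"),     # device presence changed (exchanged)
]

# Every bit position listed above, as one mask.
KNOWN_MASK = sum(1 << bit for bit, _ in SERROR_BITS)


def decode_serror(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    names = [name for bit, name in SERROR_BITS if value & (1 << bit)]
    unknown = value & ~KNOWN_MASK
    if unknown:
        names.append(f"unknown({unknown:#x})")
    return names


def format_serror(value):
    """Render the value the way the 'ata8: SError: { ... }' log line lists it."""
    return "SError: { " + " ".join(decode_serror(value)) + " }"


if __name__ == "__main__":
    print(format_serror(0x780101))
```

Applied to `0x780101`, it reproduces the `{ RecovData UnrecovData 10B8B Dispar BadCRC Handshk }` list from the log: recovered and unrecovered data errors together with 10b/8b decode, disparity, CRC and handshake errors, all of which point at the physical link rather than the drive's media.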
> It indicates that the link experienced ATA bus transmission failures.
> Heh... Not only that, the controller turned on all possible link error
> bits.  Hmm... The second device (ata8.01) has active commands, and it's
> the same device that caused problems during boot.  Can you please try
> the following?
>
> 1. Swap the second drive with another one and see whether the error
>    follows the drive or stays with the slot.
>
> 2. Specify libata.force=8.15:1.5Gbps
>   
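The `8.15` in the suggested `libata.force=8.15:1.5Gbps` is an `ID:VAL` pair. Per the kernel parameter documentation, the ID is `PORT[.DEVICE]` matching the ATA ID libata prints on the console, and DEVICE 15 selects the host link. The sketch below is a hypothetical illustration of that syntax (comma-separated entries; an entry without an ID reuses the last PORT/DEVICE), not the kernel's parser.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ForceEntry:
    port: Optional[int]    # None: no ID given yet, applies to every port
    device: Optional[int]  # None: the port and all links/devices behind it
    value: str

    def console_id(self) -> str:
        """Console prefix this entry is meant to match, e.g. 'ata8.15'."""
        if self.port is None:
            return "ata*"
        if self.device is None:
            return f"ata{self.port}"
        return f"ata{self.port}.{self.device:02d}"


def parse_force(param: str) -> List[ForceEntry]:
    """Hypothetical splitter for a libata.force string of [ID:]VAL entries."""
    entries: List[ForceEntry] = []
    port: Optional[int] = None
    device: Optional[int] = None
    for item in param.split(","):
        item = item.strip()
        if ":" in item:
            ident, value = item.split(":", 1)
            port_str, _, dev_str = ident.partition(".")
            port = int(port_str)
            device = int(dev_str) if dev_str else None
        else:
            # ID omitted: reuse the last PORT/DEVICE seen (or "all" if none).
            value = item
        entries.append(ForceEntry(port, device, value))
    return entries


if __name__ == "__main__":
    for entry in parse_force("8.15:1.5Gbps"):
        print(entry.console_id(), "->", entry.value)
```

Under this reading, `8.15` targets the same `ata8.15` link that reports `Port Multiplier 1.1` at boot and raised the `SErr 0x780101` exception, i.e. the link between the JMicron controller and the PMP.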

Sorry for the delayed response.  I've installed another controller based 
on the SiI 3132 chipset.  The eSATA array is the same; I've simply moved 
the eSATA cables to the new controller.  My observations:

1.  The JMicron 20360/20363 AHCI Controller (rev 03) is far, far 
slower.  hdparm clocks it at 20 MB/s, and a RAID check takes about 
15 hours.

2. The JMicron controller experiences the above errors.

3.  The SiI 3132 controller is faster.  hdparm says 110 MB/s, and a 
RAID check takes about 5 hours.  This is confirmed by usage: I can 
stream 2 videos, record 2 videos, and run commercial flagging on a video 
in MythTV with no visible stuttering.  The JMicron controller stutters 
even with a single video streaming and recording.

4.  The SiI 3132 controller doesn't have any errors, even under high load.

From what I found on Google, rev 3 of the JMicron controller has these 
issues, and the usual advice is to turn off NCQ on it.  The same sources 
suggest these problems should have been resolved in my kernel (2.6.27.4), 
but apparently not.  :-(

Is there a way to turn off NCQ on that controller without affecting the 
SiI 3132 controller?  Anything else to try?

--Yan

Thread overview: 11+ messages
2008-11-02 15:10 Port multiplier resets Yan Seiner
2008-11-11  9:06 ` Tejun Heo
2008-11-19 15:15   ` Yan Seiner [this message]
2008-11-20  2:27     ` Tejun Heo
2008-11-20  3:05       ` Yan Seiner
2008-11-20  3:23         ` Tejun Heo
2008-11-20  3:19   ` Yan Seiner
2008-11-20  3:24     ` Tejun Heo
2008-11-22 16:48       ` Yan Seiner
2008-11-23  0:53         ` Tejun Heo
2008-11-25  4:44           ` Yan Seiner