From: Yan Seiner <yan@seiner.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-ide@vger.kernel.org
Subject: Re: Port multiplier resets
Date: Wed, 19 Nov 2008 07:15:42 -0800 [thread overview]
Message-ID: <49242D9E.4050500@seiner.com> (raw)
In-Reply-To: <49194B20.8000209@kernel.org>
Tejun Heo wrote:
> Yan Seiner wrote:
>
>> I'm seeing errors on a new port multiplier install. The mobo has a
>> J-Micron SATA controller: JMicron Technologies, Inc. JMicron
>> 20360/20363 AHCI Controller (rev 03) The port multiplier has a SIL
>> 3726 chipset. AFAICT, this is a supported comnbination. The kernel
>> recognizes the chipset. The system has an adaptec SCSI controller
>> w/ 2 15K SCSI drives, 6 internal SATA drives, and 3 external SATA
>> drives using the port multiplier. Because of a broken bios (ARGH!) I
>> have to power up the external drives after the system scans the SCSI
>> bus. This seems to cause no problems. Here's a log of a recent
>> boot (scroll down for the error I see after the system is booted):
>>
>> Nov 2 05:26:07 selene kernel: [ 28.691524] ata8: SATA max UDMA/133 abar m8192@0xfdcfe000 port 0xfdcfe180 irq 16
>> Nov 2 05:26:07 selene kernel: [ 34.104580] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> Nov 2 05:26:07 selene kernel: [ 34.104580] ata8.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
>> Nov 2 05:26:07 selene kernel: [ 34.104580] ata8.00: hard resetting link
>> Nov 2 05:26:07 selene kernel: [ 34.425000] ata8.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov 2 05:26:07 selene kernel: [ 34.425003] ata8.01: hard resetting link
>> Nov 2 05:26:07 selene kernel: [ 35.001232] ata8.01: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov 2 05:26:07 selene kernel: [ 35.001234] ata8.02: hard resetting link
>> Nov 2 05:26:07 selene kernel: [ 35.321389] ata8.02: SATA link down (SStatus 0 SControl 320)
>> Nov 2 05:26:07 selene kernel: [ 35.321419] ata8.03: hard resetting link
>> Nov 2 05:26:07 selene kernel: [ 35.869739] ata8.03: SATA link down (SStatus 0 SControl 320)
>> Nov 2 05:26:07 selene kernel: [ 35.869769] ata8.04: hard resetting link
>> Nov 2 05:26:07 selene kernel: [ 36.621491] ata8.04: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov 2 05:26:07 selene kernel: [ 36.621493] ata8.05: hard resetting link
>> Nov 2 05:26:07 selene kernel: [ 36.941025] ata8.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov 2 05:26:07 selene kernel: [ 36.942522] ata8.00: ATA-8: ST31000340AS, SD15, max UDMA/133
>> Nov 2 05:26:07 selene kernel: [ 36.942525] ata8.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>> Nov 2 05:26:07 selene kernel: [ 36.944479] ata8.00: configured for UDMA/133
>> Nov 2 05:26:07 selene kernel: [ 53.613888] ata8.01: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
>> Nov 2 05:26:07 selene kernel: [ 53.613888] ata8.04: failed to IDENTIFY (I/O error, err_mask=0x100)
>> Nov 2 05:26:07 selene kernel: [ 53.613888] ata8: failed to recover some devices, retrying in 5 secs
>> Nov 2 05:26:07 selene kernel: [ 61.420677] ata8.01: hard resetting link
>> Nov 2 05:26:07 selene kernel: [ 66.420763] ata8.15: qc timeout (cmd 0xe4)
>> Nov 2 05:26:07 selene kernel: [ 66.420770] ata8.01: failed to read SCR 2 (Emask=0x4)
>> Nov 2 05:26:07 selene kernel: [ 66.420773] ata8.01: failed to read SCR 2 (Emask=0x40)
>> Nov 2 05:26:07 selene kernel: [ 66.420775] ata8.01: COMRESET failed (errno=-5)
>> Nov 2 05:26:07 selene kernel: [ 66.420804] ata8.01: failed to read SCR 0 (Emask=0x40)
>> Nov 2 05:26:07 selene kernel: [ 66.420806] ata8.01: reset failed, giving up
>> Nov 2 05:26:07 selene kernel: [ 66.420835] ata8.15: hard resetting link
>> Nov 2 05:26:07 selene kernel: [ 68.284883] ata8.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> Nov 2 05:26:07 selene kernel: [ 68.285116] ata8.00: hard resetting link
>> Nov 2 05:26:07 selene kernel: [ 68.604947] ata8.00: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov 2 05:26:07 selene kernel: [ 68.604949] ata8.01: hard resetting link
>> Nov 2 05:26:07 selene kernel: [ 69.329438] ata8.01: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov 2 05:26:07 selene kernel: [ 69.329440] ata8.02: hard resetting link
>> Nov 2 05:26:07 selene kernel: [ 69.649777] ata8.02: SATA link down (SStatus 0 SControl 320)
>> Nov 2 05:26:07 selene kernel: [ 69.649807] ata8.03: hard resetting link
>> Nov 2 05:26:07 selene kernel: [ 69.969439] ata8.03: SATA link down (SStatus 0 SControl 320)
>> Nov 2 05:26:07 selene kernel: [ 69.969469] ata8.04: hard resetting link
>> Nov 2 05:26:07 selene kernel: [ 70.813852] ata8.04: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov 2 05:26:07 selene kernel: [ 70.813855] ata8.05: hard resetting link
>> Nov 2 05:26:07 selene kernel: [ 71.377665] ata8.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> Nov 2 05:26:07 selene kernel: [ 71.381475] ata8.00: configured for UDMA/133
>> Nov 2 05:26:07 selene kernel: [ 71.381475] ata8.01: ATA-8: ST31000340AS, SD15, max UDMA/133
>> Nov 2 05:26:07 selene kernel: [ 71.381475] ata8.01: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>> Nov 2 05:26:07 selene kernel: [ 71.381879] ata8.01: configured for UDMA/133
>> Nov 2 05:26:07 selene kernel: [ 71.381879] ata8.04: ATA-8: ST31000340AS, SD15, max UDMA/133
>> Nov 2 05:26:07 selene kernel: [ 71.381879] ata8.04: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>> Nov 2 05:26:07 selene kernel: [ 71.381890] ata8.04: configured for UDMA/133
>> Nov 2 05:26:07 selene kernel: [ 71.389769] ata8: EH complete
>>
>
> The link probably got a PHY event after reset sequence is complete.
> Maybe our timinig is too aggressive or the PMP is just quirky but as
> long as detection succeeds in the end, it should be okay.
>
>
>> Once the system is up and running, I get these errors. They appear
>> anywhere from several hours to a few minutes apart. What exactly
>> does this mean? It doesn't seem to have any impact on the
>> performance of the drives. Typically, these drives are very heavily
>> loaded; I've been dumping > 1TB of data to them while rebuilding the
>> RAID-5 array, so they're pretty well maxed out.
>>
>> Nov 2 05:40:13 selene kernel: [ 984.254543] ata8.15: exception Emask 0x10 SAct 0x0 SErr 0x780101 action 0x7
>> Nov 2 05:40:13 selene kernel: [ 984.254549] ata8.15: irq_stat 0x0c000000
>> Nov 2 05:40:13 selene kernel: [ 984.254552] ata8: SError: { RecovData UnrecovData 10B8B Dispar BadCRC Handshk }
>> Nov 2 05:40:13 selene kernel: [ 984.254557] ata8.01: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x0
>>
>
> It indicates that the link experienced ATA bus transmission failure.
> Heh... Not only that the controller turned on all possible link error
> bits. Hmmm... The second device has active commands and it's the same
> device which caused problem during boot too. Can you please try the
> followings?
>
> 1. Swap the second drive with another one and see whether the error
> follows the drive or stays with the slot.
>
> 2. Specify libata.force=8.15:1.5Gbps
>
Sorry for the delayed response. I've installed another controller based
on the SiI 3132 chipset. The esata array is the same; I've simply moved
the esata cables to the new controller. My observations:
1. The JMicron 20360/20363 AHCI Controller (rev 03) controller is far,
far slower. hdparm clocks it at 20 mb/sec and a raid check takes about
15 hours.
2. The JMicron controller experiences the above errors.
3. The Sil 3132 controller is faster. hdparm says 110 mb/sec and a
raid check takes about 5 hours. This is confirmed by usage - I can
stream 2 videos, record 2 videos, and commercial flag a video in myth
with no visible stuttering. The JMicron controller stutters even with a
single video streaming and recording.
4. The Sil 3132 controller doesn't have any errors even under high load.
Google says the rev 3 version of the JMicron controller has these
issues, and to turn off NCQ on it. It also indicates that these
problems should have been resolved in my kernel - version 2.6.27.4 - but
apparently not. :-(
Is there a way to turn off NCQ on that controller without impacting the
Sil 3132 controller? Any other things to try?
--Yan
next prev parent reply other threads:[~2008-11-19 15:15 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-02 15:10 Port multiplier resets Yan Seiner
2008-11-11 9:06 ` Tejun Heo
2008-11-19 15:15 ` Yan Seiner [this message]
2008-11-20 2:27 ` Tejun Heo
2008-11-20 3:05 ` Yan Seiner
2008-11-20 3:23 ` Tejun Heo
2008-11-20 3:19 ` Yan Seiner
2008-11-20 3:24 ` Tejun Heo
2008-11-22 16:48 ` Yan Seiner
2008-11-23 0:53 ` Tejun Heo
2008-11-25 4:44 ` Yan Seiner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49242D9E.4050500@seiner.com \
--to=yan@seiner.com \
--cc=linux-ide@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).