From: Peter Favrholdt <linux-ide@how.dk>
To: Mikael Pettersson <mikpe@it.uu.se>
Cc: linux-ide@vger.kernel.org
Subject: Re: sata_promise SATA300TX4 "intermittent problems"
Date: Thu, 08 Mar 2007 17:26:37 +0100
Message-ID: <45F0393D.80301@how.dk>
In-Reply-To: <17903.7366.875191.751728@alkaid.it.uu.se>
Hi Mikael,
Thanks for the reply; I've commented below:
Mikael Pettersson wrote:
> SErr 0x01380000 would indicate:
> transport state transition error (bit 24)
> CRC error (bit 21)
> disparity error (bit 20) [whatever that is]
> 10b_to_8b decoding error (bit 19)
>
> I.e., serious transmission issues.
:-)
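For reference, a minimal Python sketch of the bit decoding quoted above. The table holds only the four SError bits named in this mail, so anything else is reported as unknown; `SERR_BITS` and `decode_serror` are hypothetical names chosen for the example, and a complete decoder would need the remaining bit definitions from the SATA specification.

```python
# Only the SError bits named in this thread; not a complete table.
SERR_BITS = {
    19: "10b_to_8b decoding error",
    20: "disparity error",
    21: "CRC error",
    24: "transport state transition error",
}


def decode_serror(value):
    """Return (names of known set bits, leftover bits not in the table)."""
    names = [name for bit, name in sorted(SERR_BITS.items())
             if value & (1 << bit)]
    known_mask = sum(1 << bit for bit in SERR_BITS)
    return names, value & ~known_mask


names, unknown = decode_serror(0x01380000)
for name in names:
    print(name)
print("unknown bits:", hex(unknown))
```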
> > [52849.930755] pdc_error_intr: port_status 0x00001000 serror 0x00000000
> > [52849.930880] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> > [52849.930883] ata2.00: (port_status 0x00001000)
>
> "host bus timeout error" (bit 12).
> I wonder why SError was clear now.
I can't say - this whole ata thing is much too complex for me ;-)
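To compare port_status and serror across many log lines, the two values can be pulled out of each pdc_error_intr line with a small script. This is only a sketch: the regular expression is written against the log lines shown in this mail, bit 12 is labelled "host bus timeout" purely on the basis of the reading quoted above, and `parse_pdc_error` is a hypothetical helper name.

```python
import re

# Pattern matched against the pdc_error_intr lines quoted in this mail.
PDC_LINE = re.compile(
    r"pdc_error_intr: port_status (0x[0-9a-fA-F]+) serror (0x[0-9a-fA-F]+)"
)

# Bit 12 of port_status, per the interpretation quoted above.
HOST_BUS_TIMEOUT = 1 << 12


def parse_pdc_error(line):
    """Extract port_status and serror from one dmesg line, or return None."""
    match = PDC_LINE.search(line)
    if match is None:
        return None
    port_status = int(match.group(1), 16)
    serror = int(match.group(2), 16)
    return {
        "port_status": port_status,
        "serror": serror,
        "host_bus_timeout": bool(port_status & HOST_BUS_TIMEOUT),
    }


line = "[52849.930755] pdc_error_intr: port_status 0x00001000 serror 0x00000000"
print(parse_pdc_error(line))
```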
> > I would be very happy to help debug this issue. Any suggestions on what
> > I should try next?
>
> Well, at the moment I have only one possible cure: to forcibly
> limit 3Gbps drives to 1.5Gbps operation, as the patch below does.
I haven't tried your 1.5Gbps patch (yet). But I have been running more
tests on my experiment system with the kernels I have handy. My
procedure is as follows:
1. power cycle
2. boot selected kernel
3. start dd if=/dev/sdx of=/dev/null bs=1M for x=a,b,c,d
4. wait until one fails
5. record dmesg output
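Steps 3 to 5 of this procedure could be scripted roughly as follows. This is only a sketch under stated assumptions: the device list, the `dmesg-after-failure.txt` file name and all function names are made up for illustration, a reader counts as failed when `dd` exits non-zero, and steps 1 and 2 (power cycle and kernel selection) stay manual.

```python
import subprocess
import time

# Assumed device names, matching "x=a,b,c,d" in step 3.
DEVICES = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]


def reader_cmd(device):
    """Build the dd command from step 3 for one device."""
    return ["dd", f"if={device}", "of=/dev/null", "bs=1M"]


def wait_for_first_failure(procs, poll_interval=1.0):
    """Poll the readers; return (device, returncode) for the first one that
    exits non-zero, or None if every reader finishes cleanly."""
    running = dict(procs)
    while running:
        for device, proc in list(running.items()):
            returncode = proc.poll()
            if returncode is None:
                continue  # still reading
            if returncode != 0:
                return device, returncode
            del running[device]  # finished without error
        if running:
            time.sleep(poll_interval)
    return None


def run_test(devices=DEVICES, log_path="dmesg-after-failure.txt"):
    """Start one reader per device, wait for a failure, then save dmesg."""
    start = time.monotonic()
    procs = {dev: subprocess.Popen(reader_cmd(dev)) for dev in devices}
    failure = wait_for_first_failure(procs)
    elapsed_minutes = (time.monotonic() - start) / 60
    for proc in procs.values():
        if proc.poll() is None:
            proc.terminate()  # stop the readers that are still running
    with open(log_path, "w") as log_file:
        subprocess.run(["dmesg"], stdout=log_file, check=False)
    return failure, elapsed_minutes
```

Calling `run_test()` as root after booting the kernel under test would return the failing device (if any) and the elapsed time in minutes.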
So far here are my results:
2.6.18.1 fails (in 25 minutes)
2.6.19 fails (in 4 minutes)
2.6.19.2 fails (in 5 minutes)
2.6.20.1 fails (in 48 minutes)
2.6.21-rc2+p (with additional patches) doesn't fail
This is very consistent. 2.6.21-rc2+p has been tested for more than 10
hours without a hiccup :-)
In the above tests it is always ata3 or ata4 (sdc or sdd) which fails.
Another strange thing which happens on 2.6.21-rc2+p but not the other
kernels: using smartctl -a -d ata while dd is running gives errors (I
also mentioned this in my first mail, but wasn't sure then):
[11046.005178] pdc_error_intr: port_status 0x00001000 serror 0x00000000
[11046.005286] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[11046.005374] ata4.00: (port_status 0x00001000)
[11046.005383] ata4.00: cmd 25/00:00:00:3b:a0/00:01:27:00:00/e0 tag 0 cdb 0x0 data 131072 in
[11046.005385] res 50/00:00:ff:3b:a0/00:00:00:00:00/e0 Emask 0x4 (timeout)
[11046.313769] ata4: soft resetting port
[11046.469806] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11046.496254] pdc_error_intr: port_status 0x00001000 serror 0x00000000
[11046.496580] ata4.00: failed to set xfermode (err_mask=0x104)
[11046.496585] ata4: failed to recover some devices, retrying in 5 secs
[11051.495393] ata4: hard resetting port
[11051.971276] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11052.005267] ata4.00: configured for UDMA/133
[11052.005285] ata4: EH complete
[11052.042615] SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
[11052.051769] sdd: Write Protect is off
[11052.051778] sdd: Mode Sense: 00 3a 00 00
[11052.059455] SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11052.066354] SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
[11052.070822] sdd: Write Protect is off
[11052.070830] sdd: Mode Sense: 00 3a 00 00
[11052.073297] SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Then it recovers and dd continues :-)
Note that using smartctl this way on the other kernels does not show
this problem!
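The "cmd" string in the log above can be unpacked to see which request timed out. The sketch below assumes the libata field order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device and 512-byte sectors; `decode_taskfile` is a hypothetical helper name. Under that assumption the command is 0x25 (READ DMA EXT) for 256 sectors, which agrees with the "data 131072 in" printed on the same log line.

```python
def decode_taskfile(text):
    """Decode a libata 'cmd' string into command, LBA and transfer size.

    Assumed layout: command/feature:nsect:lbal:lbam:lbah/
    hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device.
    A sector count of 0 is not special-cased here.
    """
    cmd, low, high, dev = text.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":")
    )
    # 48-bit LBA: the "hob" (high order byte) registers hold the upper half.
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    sectors = (hob_nsect << 8) | nsect
    return {
        "command": int(cmd, 16),
        "device": int(dev, 16),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,  # assumes 512-byte sectors
    }


print(decode_taskfile("25/00:00:00:3b:a0/00:01:27:00:00/e0"))
```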
> On one of my test machines (an old UltraSPARC), a SATA300 TX2plus
> with a Seagate 3Gbps drive (don't have the model number handy),
> will quickly experience "DMA S/G overrun" errors during an fsck
> of a large but clean ext3 partition. With the patch below things
> work solidly on that particular machine. OTOH, on another test
> machine (a 440BX chipset Intel PIII), the same card/cable/disk
> combination works flawlessly at 3Gbps. Mysterious.
My feeling is this is not caused by 1.5Gbps or 3.0Gbps operation.
I was thinking about adding the speed selection jumpers on the
hard drives, but so far I'm not touching the system as I don't want
hardware problems (e.g. a loose cable) disturbing the test results. I'll
stick to replacing software.
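On the speed question, the link-up lines in the log above report "SStatus 123 SControl 300". A small sketch of how those hex values split into fields, assuming the standard SATA register layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8); `decode_scr` is a hypothetical helper name. With that layout, SStatus 0x123 has SPD=2 (the 3.0 Gbps reported in the log) and SControl 0x300 has SPD=0, i.e. no speed limit requested.

```python
def decode_scr(value):
    """Split an SStatus or SControl value into its DET, SPD and IPM fields.

    Assumes the standard SATA layout: DET bits 3:0, SPD bits 7:4,
    IPM bits 11:8.
    """
    return {
        "det": value & 0xF,
        "spd": (value >> 4) & 0xF,
        "ipm": (value >> 8) & 0xF,
    }


print("SStatus ", decode_scr(0x123))
print("SControl", decode_scr(0x300))
```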
My next test will be a plain 2.6.21-rc2. Then I'll apply the patches one
by one.
One thought is this could be a bug/race condition which only shows under
certain lucky circumstances - maybe the robustness of 2.6.21-rc2+p is
due to local-apic not being enabled or some other subtle kernel build thing?
Any suggestions on what I could do to help track this down are much
appreciated.
Best regards,
Peter Favrholdt