Re: sata_promise SATA300TX4 "intermittent problems"

linux-ide.vger.kernel.org archive mirror

From: Peter Favrholdt <linux-ide@how.dk>
To: Mikael Pettersson <mikpe@it.uu.se>
Cc: linux-ide@vger.kernel.org
Subject: Re: sata_promise SATA300TX4 "intermittent problems"
Date: Thu, 08 Mar 2007 17:26:37 +0100
Message-ID: <45F0393D.80301@how.dk>
In-Reply-To: <17903.7366.875191.751728@alkaid.it.uu.se>

Hi Mikael,

Thanks for the reply, I've commented below:

Mikael Pettersson wrote:
> SErr 0x01380000 would indicate:
> transport state transmission error (bit 24)
> CRC error (bit 21)
> disparity error (bit 20) [whatever that is]
> 10b_to_8b decoding error (bit 19)
> 
> I.e., serious transmission issues.

:-)
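
For reference, the decode above can be done mechanically. This is a minimal Python sketch. It assumes the standard SATA SError DIAG bit assignments (bits 16 to 26). The table strings are the spec's short descriptions, not identifiers from libata or sata_promise. Note that the spec calls bit 24 a transport state *transition* error.

```python
# SATA SError DIAG field, bits 16..26, as defined for the SError register.
# The ERR field (bits 0..11) is left out of this sketch.
SERR_DIAG_BITS = {
    16: "PhyRdy change (N)",
    17: "Phy internal error (I)",
    18: "Comm Wake (W)",
    19: "10b to 8b decode error (B)",
    20: "disparity error (D)",
    21: "CRC error (C)",
    22: "handshake error (H)",
    23: "link sequence error (S)",
    24: "transport state transition error (T)",
    25: "unknown FIS type (F)",
    26: "exchanged (X)",
}

def decode_serror(serr):
    """Return (bit, description) for every DIAG bit set in serr, highest first."""
    return [(bit, name)
            for bit, name in sorted(SERR_DIAG_BITS.items(), reverse=True)
            if serr & (1 << bit)]

for bit, name in decode_serror(0x01380000):
    print(f"bit {bit}: {name}")
```

For 0x01380000 this lists bits 24, 21, 20 and 19, the same four that Mikael named.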

> > [52849.930755] pdc_error_intr: port_status 0x00001000 serror 0x00000000
> > [52849.930880] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> > [52849.930883] ata2.00: (port_status 0x00001000)
> 
> "host bus timeout error" (bit 12).
> I wonder why SError was clear now.

I can't say - this whole ata thing is much too complex for me ;-)

> > I would be very happy to help debug this issue. Any suggestions on what 
> > I should try next?
> 
> Well, at the moment I have only one possible cure: to forcibly
> limit 3Gbps drives to 1.5Gbps operation, as the patch below does.

I haven't tried your 1.5Gbps patch (yet). But I have been running more 
tests on my experiment system with the kernels I have handy. My 
procedure is as follows:

1. power cycle
2. boot selected kernel
3. start dd if=/dev/sdx of=/dev/null bs=1M for x=a,b,c,d (all four in parallel)
4. wait until one fails
5. record dmesg output
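
Steps 3 to 5 can be scripted. This is a minimal Python sketch. It assumes that "fails" means one dd exits with a non-zero status (an I/O error), which the procedure does not spell out. The script name ddstress.py and its helpers are hypothetical. The script records the time to the first failure, which is what the results below report, and saves dmesg for step 5. Steps 1 and 2 stay manual.

```python
import subprocess
import sys
import time

def dd_command(dev):
    # Step 3: sequential read of the whole device, data discarded.
    return ["dd", f"if={dev}", "of=/dev/null", "bs=1M"]

def run_until_first_failure(commands, poll_interval=1.0):
    """Start all commands in parallel and wait until one exits non-zero.
    Returns (index, returncode, elapsed_seconds) for the first failure,
    or None if every command finishes successfully."""
    start = time.monotonic()
    procs = [subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL) for cmd in commands]
    try:
        running = set(range(len(procs)))
        while running:
            for i in sorted(running):
                rc = procs[i].poll()
                if rc is None:
                    continue  # still reading
                running.discard(i)
                if rc != 0:
                    return i, rc, time.monotonic() - start
            if running:
                time.sleep(poll_interval)
        return None
    finally:
        # Stop any readers that are still going once we have a result.
        for p in procs:
            if p.poll() is None:
                p.terminate()
                p.wait()

if __name__ == "__main__" and len(sys.argv) > 1:
    devices = sys.argv[1:]
    result = run_until_first_failure([dd_command(d) for d in devices])
    if result is None:
        print("all dd runs completed without error")
    else:
        i, rc, elapsed = result
        print(f"{devices[i]}: dd exited {rc} after {elapsed / 60:.1f} minutes")
        # Step 5: keep the kernel log for the report.
        with open("dmesg.txt", "w") as f:
            subprocess.run(["dmesg"], stdout=f, check=False)
```

Run it as root, for example `python3 ddstress.py /dev/sda /dev/sdb /dev/sdc /dev/sdd`. If "failure" turns out to mean only libata error messages while dd keeps going, the script needs to watch dmesg instead of dd's exit status.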

So far here are my results:

2.6.18.1 fails (in 25 minutes)
2.6.19   fails (in 4 minutes)
2.6.19.2 fails (in 5 minutes)
2.6.20.1 fails (in 48 minutes)
2.6.21-rc2+p (with additional patches) doesn't fail

This is very consistent. 2.6.21-rc2+p has been tested for more than 10
hours without a hiccup :-)

In the above tests it is always ata3 or ata4 (sdc or sdd) that fails.

Another strange thing happens on 2.6.21-rc2+p but not on the other
kernels: running smartctl -a -d ata while dd is running gives errors (I
also mentioned this in my first mail, but wasn't sure then):

[11046.005178] pdc_error_intr: port_status 0x00001000 serror 0x00000000
[11046.005286] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[11046.005374] ata4.00: (port_status 0x00001000)
[11046.005383] ata4.00: cmd 25/00:00:00:3b:a0/00:01:27:00:00/e0 tag 0 cdb 0x0 data 131072 in
[11046.005385]          res 50/00:00:ff:3b:a0/00:00:00:00:00/e0 Emask 0x4 (timeout)
[11046.313769] ata4: soft resetting port
[11046.469806] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11046.496254] pdc_error_intr: port_status 0x00001000 serror 0x00000000
[11046.496580] ata4.00: failed to set xfermode (err_mask=0x104)
[11046.496585] ata4: failed to recover some devices, retrying in 5 secs
[11051.495393] ata4: hard resetting port
[11051.971276] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11052.005267] ata4.00: configured for UDMA/133
[11052.005285] ata4: EH complete
[11052.042615] SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
[11052.051769] sdd: Write Protect is off
[11052.051778] sdd: Mode Sense: 00 3a 00 00
[11052.059455] SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11052.066354] SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
[11052.070822] sdd: Write Protect is off
[11052.070830] sdd: Mode Sense: 00 3a 00 00
[11052.073297] SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Then it recovers and dd continues :-)

Note that using smartctl this way on the other kernels does not show 
this problem!
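
You can decode the "cmd" taskfile in the log to check that the timed-out command is an ordinary dd read. This is a minimal sketch. It assumes libata's report layout from this period:

command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device

The helper name is hypothetical, and the opcode table covers only two commands.

```python
import re

# Only the two opcodes relevant here; anything else is shown as hex.
ATA_OPCODES = {0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT"}

_TF = re.compile(r"([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
                 r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})")

def decode_taskfile(field):
    """Decode a libata 'cmd' taskfile such as
    '25/00:00:00:3b:a0/00:01:27:00:00/e0' (LBA48 layout assumed)."""
    m = _TF.fullmatch(field)
    if m is None:
        raise ValueError(f"unrecognised taskfile: {field!r}")
    command = int(m.group(1), 16)
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group(2).split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in m.group(3).split(":"))
    # LBA48: the "hob" (high order byte) registers hold the upper bytes.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    sectors = hob_nsect << 8 | nsect
    if sectors == 0:
        sectors = 65536  # a count of 0 means 65536 sectors in LBA48 commands
    return {
        "command": ATA_OPCODES.get(command, f"0x{command:02x}"),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,  # the drive reports 512-byte sectors
        "device": int(m.group(4), 16),
    }

print(decode_taskfile("25/00:00:00:3b:a0/00:01:27:00:00/e0"))
```

For the logged command this gives READ DMA EXT at LBA 664812288 for 256 sectors. That is 131072 bytes, which matches "data 131072 in".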

> On one of my test machines (an old UltraSPARC), a SATA300 TX2plus
> with a Seagate 3Gbps drive (don't have the model number handy),
> will quickly experience "DMA S/G overrun" errors during an fsck
> of a large but clean ext3 partition. With the patch below things
> work solidly on that particular machine. OTOH, on another test
> machine (a 440BX chipset Intel PIII), the same card/cable/disk
> combination works flawlessly at 3Gbps. Mysterious.

My feeling is this is not caused by 1.5Gbps or 3.0Gbps operation.
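
The link speed actually in use can be read from the values libata prints on link-up ("SStatus 123 SControl 300" in the log). This is a minimal sketch. It assumes the standard SATA field layout: DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8. The helper names are hypothetical. The last line shows, in general terms, how a 1.5 Gbps cap can be expressed through the SControl SPD field. It is not necessarily how Mikael's patch implements it.

```python
# Field layout shared by SStatus and SControl: DET [3:0], SPD [7:4], IPM [11:8].
def fields(reg):
    return {"DET": reg & 0xF, "SPD": (reg >> 4) & 0xF, "IPM": (reg >> 8) & 0xF}

# In SStatus, SPD is the negotiated speed; in SControl it is the allowed maximum.
SSTATUS_SPD = {0: "no link established", 1: "1.5 Gbps", 2: "3.0 Gbps"}
SCONTROL_SPD = {0: "no restriction", 1: "limit to 1.5 Gbps", 2: "limit to 3.0 Gbps"}

def with_spd_limit(scontrol, gen):
    """Return scontrol with its SPD field replaced (1 = cap at 1.5 Gbps)."""
    return (scontrol & ~0xF0) | ((gen & 0xF) << 4)

st, ctl = fields(0x123), fields(0x300)
print("negotiated:", SSTATUS_SPD[st["SPD"]])  # DET=3: device present, phy up
print("allowed:", SCONTROL_SPD[ctl["SPD"]])
print("capped SControl:", hex(with_spd_limit(0x300, 1)))
```

With the logged values, the link negotiated 3.0 Gbps and SControl imposed no speed restriction. After a 1.5 Gbps cap, a link-up line would be expected to show SPD=1 in SStatus.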

I was thinking about setting the speed selection jumpers on the hard
drives, but so far I'm not touching the system, as I don't want
hardware problems (e.g. a loose cable) to disturb the test results. I'll
stick to changing software.

My next test will be a plain 2.6.21-rc2. Then I'll apply the patches one
by one.

One thought is that this could be a bug/race condition that only shows
up under certain lucky circumstances. Maybe the robustness of
2.6.21-rc2+p is due to the local APIC not being enabled, or some other
subtle kernel build difference?
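
You can check the kernel build theory by comparing the .config files of a failing kernel and 2.6.21-rc2+p option by option, for example CONFIG_X86_LOCAL_APIC. This is a minimal sketch. It assumes the usual Kconfig output format: `CONFIG_FOO=value` lines and `# CONFIG_FOO is not set` comments. The script name configdiff.py and its helpers are hypothetical.

```python
import re
import sys

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")

def parse_config(text):
    """Map each CONFIG_ option to its value; 'is not set' becomes 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts

def diff_configs(a, b):
    """Return (option, value_in_a, value_in_b) for options that differ.
    An option missing from one file counts as 'n' when comparing (it may
    simply not exist in that kernel version) but is reported as None."""
    return [(k, a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k, "n") != b.get(k, "n")]

if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        a, b = parse_config(fa.read()), parse_config(fb.read())
    for opt, va, vb in diff_configs(a, b):
        print(f"{opt}: {va} -> {vb}")
```

Usage would be along the lines of `python3 configdiff.py config-2.6.20.1 config-2.6.21-rc2+p`. Between different kernel versions the diff will be noisy, so options related to interrupts and timers are the ones to look at first.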

Any suggestions on what I could do to help track this down would be much
appreciated.

Best regards,

Peter Favrholdt

Thread overview: 7+ messages
2007-03-07 14:32 sata_promise SATA300TX4 "intermittent problems" Peter Favrholdt
2007-03-07 20:12 ` Mikael Pettersson
2007-03-08 16:26   ` Peter Favrholdt [this message]
2007-03-09  6:27     ` Peter Favrholdt
2007-03-09  7:01       ` Tomi Orava
2007-03-09  7:29         ` Peter Favrholdt
2007-03-13  7:11       ` Tomi Orava