From: Tejun Heo <htejun@gmail.com>
To: "Martin Ammermüller" <tenco@gmx.de>
Cc: linux-ide@vger.kernel.org, jgarzik@pobox.com
Subject: Re: [sata_sil] kernel 2.6.17(-mm2) test - timeout issue
Date: Mon, 07 Aug 2006 00:51:32 +0900
Message-ID: <44D61004.3070109@gmail.com>
In-Reply-To: <1154785005.9220.1.camel@localhost>
Hello,
Martin Ammermüller wrote:
> I tried the patch, but i couldn't see any changes in kerneloutput. I
> also noticed, that there are actually two slightly different
> error-messages.
>
> #1 (shorter one, without HSM violation):
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: (BMDMA stat 0x21)
> ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
DRDY (device ready) is set and the DMA engine is active, but DRQ (data
request) is not set during READ DMA. This looks to me like packet
loss/corruption during data transfer, but DRDY being set is a bit weird.
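For reference, the stat values in these logs can be decoded with a small helper. This is only a sketch: the bit names follow the standard ATA status register layout, and decode_ata_status() is a hypothetical name, not a libata function.

```python
# Bit layout of the ATA status register (standard ATA definitions).
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete (legacy meaning)
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data (obsolete)
    (0x02, "IDX"),   # index (obsolete)
    (0x01, "ERR"),   # error
]


def decode_ata_status(stat):
    """Return the names of the bits set in an ATA status byte (hypothetical helper)."""
    return [name for mask, name in ATA_STATUS_BITS if stat & mask]


# stat 0x40 is the value reported with the timeout above.
print(decode_ata_status(0x40))
```

The same helper applies to the other status values in this report (0x58 and 0xD8).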
> ata1: port is slow to respond, please be patient
> ata1: port failed to respond (30 secs)
Again, weird. libata times out waiting for DRDY even though DRDY was set
when the timeout occurred (as reported above). When the EH reset code is
executed (which should have followed immediately), it sees !DRDY and
times out waiting for it.
> ata1: soft resetting port
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata1.00: configured for UDMA/100
> ata1: EH complete
SRST successfully recovers the device in this case.
> #2 (longer, with HSM violation):
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x400000 action 0x2 frozen
SErr reports the handshake error diagnostic bit (R_ERR seen) but does
not report any error bit.
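To make the "diagnostic bit but no error bit" reading concrete, here is a rough sketch that splits SErr into its two halves. It assumes the SError layout from the SATA specification (ERR in bits 0-15, DIAG in bits 16-31, handshake error at bit 22); split_serr() and the constant names are made up for illustration.

```python
# SError register layout per the SATA specification:
# bits 0-15 form the ERR field, bits 16-31 form the DIAG field.
SERR_ERR_MASK = 0x0000FFFF
SERR_DIAG_HANDSHAKE = 1 << 22  # DIAG "H": handshake error (R_ERR received)


def split_serr(serr):
    """Split an SError value into (err, diag) halves (hypothetical helper)."""
    return serr & SERR_ERR_MASK, (serr >> 16) & 0xFFFF


serr = 0x400000  # value from the log above
err, diag = split_serr(serr)
handshake = bool(serr & SERR_DIAG_HANDSHAKE)
print(hex(err), hex(diag), handshake)
```

With SErr 0x400000 the ERR half is zero and only the handshake diagnostic bit is set.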
> ata1.00: (BMDMA stat 0x20)
> ata1.00: tag 0 cmd 0xc8 Emask 0x2 stat 0x58 err 0x0 (HSM violation)
The DMA engine is off, and both DRDY and DRQ are set. Again, this looks
like a data transmission error, but since the data transmission
direction is from device to host (READ_DMA), the error status is
confusing.
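The difference between the two BMDMA stat values (0x21 in the first case, 0x20 here) comes down to the active bit. A minimal sketch, assuming the usual bus master IDE status register layout; the constant and function names are hypothetical, not libata identifiers.

```python
# Bus master IDE status register bits (conventional BMDMA layout).
BMDMA_ACTIVE = 0x01    # DMA engine is running
BMDMA_ERR = 0x02       # DMA error
BMDMA_INTR = 0x04      # interrupt pending
BMDMA_DRV0_CAP = 0x20  # drive 0 is DMA capable


def bmdma_engine_active(stat):
    """Return True if the BMDMA engine is reported as active (hypothetical helper)."""
    return bool(stat & BMDMA_ACTIVE)


for stat in (0x21, 0x20):
    print(hex(stat), "active" if bmdma_engine_active(stat) else "off")
```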
> ata1: soft resetting port
> ata1: port is slow to respond, please be patient
> ata1: port failed to respond (30 secs)
prereset saw DRDY set this time, but after SRST, BSY is stuck at 1.
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ATA: abnormal status 0xD8 on port 0xDCA44087
> ATA: abnormal status 0xD8 on port 0xDCA44087
> ATA: abnormal status 0xD8 on port 0xDCA44087
> ATA: abnormal status 0xD8 on port 0xDCA44087
> ATA: abnormal status 0xD8 on port 0xDCA44087
> ata1.00: qc timeout (cmd 0xec)
We should really fail softreset if ata_busy_sleep() fails in
ata_bus_post_reset(). In this case, softreset reports success after the
timeout, causing the following revalidation to time out too.
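The idea is simply to propagate the post-reset wait failure instead of swallowing it. The following is a toy model of that control flow, not the libata code: busy_sleep() and softreset() are stand-ins, and the polling count and the -EBUSY return value are assumptions made for the illustration.

```python
import errno

ATA_BUSY = 0x80  # BSY bit of the ATA status register


def busy_sleep(read_status, polls):
    """Poll until BSY clears; return 0 on success, -EBUSY if it never clears."""
    for _ in range(polls):
        if not read_status() & ATA_BUSY:
            return 0
    return -errno.EBUSY


def softreset(issue_srst, read_status, polls=5):
    """Model of a softreset that fails when the post-reset wait times out."""
    issue_srst()
    rc = busy_sleep(read_status, polls)
    if rc:
        # Report the failure so EH can escalate instead of
        # going on to revalidate a device that is still busy.
        return rc
    return 0


# A device stuck at status 0xD8 (BSY set) makes softreset fail.
print(softreset(lambda: None, lambda: 0xD8))
```

With the failure reported, the caller can move on to hardreset right away rather than waiting for IDENTIFY to time out as well.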
> ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata1.00: revalidation failed (errno=-5)
> ata1: failed to recover some devices, retrying in 5 secs
> ata1: hard resetting port
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata1.00: configured for UDMA/100
> ata1: EH complete
hardreset successfully recovers the device.
Hmmm.. with the patch, sata_sil should have tried hardreset the first
time in the second case. That's our third oddity. I think the
problem can be worked around by...
1. having shorter timeout value on READ/WRITE commands. 30s is *way*
too long.
2. making reset procedure more intelligent. There's no reason to wait
full 30s before and after softreset if it's not for hotplug. It should
switch to hardreset if the device doesn't respond in several secs.
Being responsive && giving device enough time eventually shouldn't be
too difficult.
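Roughly, the second point could look like the sketch below: a short deadline for softreset, then escalation to hardreset with a longer one. Everything here is hypothetical (recover_port(), the callback signatures, and the 3s/30s values are only placeholders), so take it as an outline of the policy rather than a proposed implementation.

```python
def recover_port(softreset, hardreset, soft_deadline=3.0, hard_deadline=30.0):
    """Try a short softreset first, then escalate to hardreset.

    softreset and hardreset are callables taking a deadline in seconds
    and returning 0 on success or a negative value on failure.
    """
    if softreset(soft_deadline) == 0:
        return "softreset"
    # The device did not respond within a few seconds: stop waiting and
    # escalate, giving the stronger reset the longer deadline.
    if hardreset(hard_deadline) == 0:
        return "hardreset"
    return "failed"


# Example: softreset fails quickly, hardreset recovers the device.
print(recover_port(lambda deadline: -1, lambda deadline: 0))
```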
Item 1 shouldn't be difficult but we need to be careful. Item 2 might
take some time to implement.
I'm not sure why the previous patch didn't kick in. The condition
should have been caught and EH_HARDRESET requested. Can you please
double check the patched kernel is running? You can put a little
printk() after the freeze: label in sil_interrupt() to be sure. That's
the only place where sata_sil freezes the port except for timing out.
Thanks.
--
tejun