linux-ide.vger.kernel.org archive mirror
From: Tejun Heo <htejun@gmail.com>
To: ric@emc.com
Cc: linux-ide@vger.kernel.org, Jeff Garzik <jgarzik@pobox.com>,
	Mark Lord <mlord@pobox.com>, Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: error handling - DMA to PIO step down sequence
Date: Fri, 22 Sep 2006 00:09:11 +0900
Message-ID: <4512AB17.7040907@gmail.com>
In-Reply-To: <4511907F.1010104@emc.com>

Ric Wheeler wrote:
[--snip--]
> Derating should probably never happen on normal drive errors - even 
> those that might take 10's of seconds.  Often, drives will try really, 
> really hard to recover and might eventually respond after internally 
> giving up after up to 30 seconds.

We definitely need to improve that part of EH.  At the moment it's more 
of a proof of concept, showing that EH can do derating and all the fancy 
stuff.

However, I'm not so sure it's being 'too' aggressive.  As long as the 
device reports a proper error condition that is not a transmission 
error, EH doesn't derate the device.  In your test case, libata couldn't 
determine anything about the error other than that it occurred on a 
known supported IO command, so after enough retries, it started to lower 
the transmission speed.  I want to note two things here.

1. The reason why EH took so long is not derating but _probably_ that 
libata didn't know, and so couldn't tell the upper layers, much about 
the error condition.  We definitely need to improve this part.  I 
believe some of the problems are in libata and some in the SCSI midlayer.

2. The derating sequence should be refined.  For example,

     * if sata
        * excessive aborts and NCQ on
                -> turn off NCQ
        * frequent tx errs or tons of unknown errs on known supported
          cmds, and link is at 3gbps
                -> use 1.5gbps
     * if pata
        * frequent tx errs or tons of unknown errs on known supported
          cmds, and in udma mode
                -> step down once or twice (the first step goes to the
                   next lower mode, the second to UDMA2 to cover the
                   40c-cbl case)

     * commands are failing so often that no meaningful work gets done,
       or many DMA errors are reported (note that this often results in
       timeouts)
        -> fall back to PIO; if still unusable, fall back to PIO0,
           nothing much to lose anyway.

The above usually results in at most four derating steps.  Hmmm.. some 
SATA devices may find one or two UDMA slow down steps useful if they're 
bridged.  Anyway, the bottom line is that the current sequence has more 
steps than necessary.
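
To make the proposed sequence concrete, here is a rough sketch in Python 
(for illustration only, this is not libata code).  All the names (Dev, 
Errs, next_derate_step) are made up, the error conditions are reduced to 
flags that something else would have to compute, and picking PIO4 as the 
first PIO fallback is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dev:
    """Hypothetical device state; not a libata structure."""
    is_sata: bool
    ncq_on: bool = False
    gbps: float = 1.5            # SATA link speed
    udma: Optional[int] = None   # current UDMA mode, None if not in UDMA
    udma_steps: int = 0          # UDMA step-downs taken so far
    pio: Optional[int] = None    # PIO mode once we have fallen back


@dataclass
class Errs:
    """Hypothetical summary of recent errors, precomputed elsewhere."""
    excessive_aborts: bool = False
    tx_or_unknown: bool = False  # frequent tx / unknown errs on known cmds
    unusable: bool = False       # no meaningful work done, or many DMA errs


def next_derate_step(dev: Dev, errs: Errs) -> Optional[str]:
    """Apply at most one derating step and return its name, else None."""
    if dev.is_sata:
        if errs.excessive_aborts and dev.ncq_on:
            dev.ncq_on = False
            return "ncq-off"
        if errs.tx_or_unknown and dev.gbps == 3.0:
            dev.gbps = 1.5
            return "1.5gbps"
    elif errs.tx_or_unknown and dev.udma is not None:
        # First step: next lower UDMA mode.
        if dev.udma_steps == 0 and dev.udma > 0:
            dev.udma -= 1
            dev.udma_steps = 1
            return f"udma{dev.udma}"
        # Second step: straight to UDMA2 (40c cable case).
        if dev.udma_steps == 1 and dev.udma > 2:
            dev.udma = 2
            dev.udma_steps = 2
            return "udma2"
    # Last resort for both sata and pata.
    if errs.unusable:
        if dev.pio is None:
            dev.udma, dev.pio = None, 4   # assumption: start at PIO4
            return "pio"
        if dev.pio > 0:
            dev.pio = 0
            return "pio0"
    return None
```

Either branch takes at most two steps before the shared PIO fallback, 
which is where the "at most four" figure comes from.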

Please note that derating steps isn't the biggest problem.  It just 
looks prominent because of the first problem.

> Also, NACK's from unsupported commands or any type of media errors 
> should not kick off this sequence.

No, it doesn't.  Only abort or unknown failures on known supported 
commands (READ/WRITE) or transmission errors cause the sequence.  Again, 
it's the NQ bit that's offending here.
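
Put as a predicate, the rule looks roughly like the sketch below 
(Python, illustration only).  ErrKind, KNOWN_SUPPORTED and 
counts_toward_derating are hypothetical names, and commands are plain 
strings here just to keep it short.

```python
from enum import Enum, auto


class ErrKind(Enum):
    """Hypothetical error classes; not libata's own classification."""
    MEDIA = auto()
    ABORT = auto()
    UNKNOWN = auto()
    TRANSMISSION = auto()


# Commands every device is known to support.
KNOWN_SUPPORTED = {"READ", "WRITE"}


def counts_toward_derating(kind: ErrKind, cmd: str) -> bool:
    """Return True if this failure should count toward derating."""
    if kind is ErrKind.TRANSMISSION:
        return True
    if kind in (ErrKind.ABORT, ErrKind.UNKNOWN):
        # An abort on a possibly unsupported command is just a NACK.
        return cmd in KNOWN_SUPPORTED
    # Media errors and everything else never trigger derating.
    return False
```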

> Would this be a reasonable thing for a config option? Better to add yet 
> another blacklist for devices that might have a justified need for this 
> derating?

No, I don't think this justifies a config option or a blacklist.  We 
just need to make the default behavior good enough.  For your case, 
with the sequence outlined above, libata will turn off NCQ after several 
such errors and will then get the media error reported correctly.  It 
will cost some performance, but if you have a drive with faulty 
firmware + a media error on that device, that's a fair price to pay, 
isn't it?

Thanks.

-- 
tejun
