linux-ide.vger.kernel.org archive mirror
From: Tejun Heo <htejun@gmail.com>
To: jeff@garzik.org, linux-ide@vger.kernel.org,
	alan@lxorguk.ukuu.org.uk, liml@rtr.ca
Subject: [PATCHSET #upstream] libata: improve FLUSH error handling
Date: Thu, 27 Mar 2008 19:14:22 +0900
Message-ID: <12066128663306-git-send-email-htejun@gmail.com>

Hello, all.

I was going through my mailbox and saw Alan's patch[A], which didn't
get the love it deserved.  It turned out that the ata_flush_cache()
function the patch modifies had been dead code for some time, so I
ended up re-doing it in the EH framework, and it turned out okay.

With this patchset applied, a failed FLUSH is handled as follows.

  1. It's always retried with a longer timeout (60s for now) after
     failure.

  2. If the device is making progress across retries (a different
     failed sector each time), humor it for longer (20 tries for now).

  3. If the device fails the command again with the same failed
     sector, retry fewer times (log2 of the original number of tries).

  4. If a retried FLUSH fails for something other than a device error,
     don't keep retrying.  We're likely wasting time.

As the code avoids retrying needlessly, it shouldn't be too dangerous
to increase the 20 tries (taken from Alan's patch), but I think 20 is
as good as any other random number.  If anyone knows a more meaningful
number, please chime in.  The same goes for the 60s timeout.

I made a debug patch to trigger timeouts and device errors on FLUSH,
which I'll post as a reply.  It adds the following four module params,
which can be written at runtime via /sys/module/libata/parameters.

  flush_dbg_do_timeout:
	If a non-zero value is written, the specified number of FLUSHes
	will time out.

  flush_dbg_do_deverr:
	If a non-zero value is written, the specified number of FLUSHes
	will be terminated with a device error.

  flush_dbg_fail_sector:
	The failed sector to report for the next device error.

  flush_dbg_fail_increment:
	Number of sectors to add to flush_dbg_fail_sector after each
	device error.
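The knobs above behave like counters consumed one FLUSH at a time.  The
following is a hypothetical model of those semantics, not the debug
patch itself: struct flush_dbg and flush_dbg_next() are illustrative
names, and consuming pending timeouts before device errors is an
assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum flush_dbg_action { FLUSH_DBG_PASS, FLUSH_DBG_TIMEOUT, FLUSH_DBG_DEVERR };

/* Hypothetical mirror of the four module params. */
struct flush_dbg {
	unsigned int do_timeout;  /* flush_dbg_do_timeout */
	unsigned int do_deverr;   /* flush_dbg_do_deverr */
	uint64_t fail_sector;     /* flush_dbg_fail_sector */
	uint64_t fail_increment;  /* flush_dbg_fail_increment */
};

/*
 * Decide what to do with the next FLUSH.  For a device error, *sector
 * receives the failed sector and fail_sector advances by fail_increment.
 * Assumption: pending timeouts are consumed before device errors.
 */
enum flush_dbg_action flush_dbg_next(struct flush_dbg *d, uint64_t *sector)
{
	assert(d != NULL && sector != NULL);

	if (d->do_timeout) {
		d->do_timeout--;
		return FLUSH_DBG_TIMEOUT;
	}
	if (d->do_deverr) {
		d->do_deverr--;
		*sector = d->fail_sector;
		d->fail_sector += d->fail_increment;
		return FLUSH_DBG_DEVERR;
	}
	return FLUSH_DBG_PASS;
}
```

With fail_increment set to 0, every injected error reports the same
sector, which should exercise the "same failed sector" path (rule 3); a
non-zero increment looks like progress and should exercise rule 2.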

I tested different scenarios and it all seems to work fine, but it
would be really great if someone could test this on a (hmmm....) live
dying drive.

This patchset is for #upstream but is generated on top of

  #upstream-fixes (4cde32fc4b32e96a99063af3183acdfd54c563f0)
+ [1] libata: ATA_EHI_LPM should be ATA_EH_LPM

as there is a humongous patchset pending review for #upstream.  Once
this gets acked, I'll move it over to #upstream.  It shouldn't
interfere too much anyway.

Thanks.

--
tejun

[A] http://article.gmane.org/gmane.linux.ide/28835
[1] http://article.gmane.org/gmane.linux.ide/30077


Thread overview: 32+ messages
2008-03-27 10:14 Tejun Heo [this message]
2008-03-27 10:14 ` [PATCH 1/4] libata: make ata_tf_to_lba[48]() generic Tejun Heo
2008-04-04  7:45   ` Jeff Garzik
2008-03-27 10:14 ` [PATCH 2/4] libata: implement ATA_QCFLAG_RETRY Tejun Heo
2008-03-27 10:14 ` [PATCH 3/4] libata: kill unused ata_flush_cache() Tejun Heo
2008-03-27 10:14 ` [PATCH 4/4] libata: improve FLUSH error handling Tejun Heo
2008-04-04  7:46   ` Jeff Garzik
2008-03-27 10:23 ` Debug patch to induce errors on FLUSH Tejun Heo
2008-03-27 14:24 ` [PATCHSET #upstream] libata: improve FLUSH error handling Mark Lord
2008-03-27 14:35   ` Mark Lord
2008-03-27 15:31     ` Alan Cox
2008-03-27 18:01     ` Ric Wheeler
2008-03-28  1:57     ` Tejun Heo
2008-03-28  2:33       ` Mark Lord
2008-03-28 13:36         ` Ric Wheeler
2008-03-28 14:52           ` Tejun Heo
2008-03-28 14:53             ` Ric Wheeler
2008-03-28 15:16               ` Alan Cox
2008-03-28 16:57                 ` Ric Wheeler
2008-03-28 16:04             ` Mark Lord
2008-03-27 17:53   ` Ric Wheeler
2008-03-27 18:52     ` Jeff Garzik
2008-03-27 20:23       ` Ric Wheeler
2008-03-28  7:46   ` Andi Kleen
2008-03-28  8:30     ` Tejun Heo
2008-03-28  8:48       ` Andi Kleen
2008-03-28  8:53         ` Tejun Heo
2008-03-27 17:51 ` Ric Wheeler
2008-03-27 18:53   ` Jeff Garzik
2008-03-27 22:00   ` Alan Cox
2008-03-28  2:02   ` Tejun Heo
2008-03-28  9:48     ` Alan Cox
