From: Ric Wheeler <ric@emc.com>
To: Tejun Heo <htejun@gmail.com>
Cc: jeff@garzik.org, linux-ide@vger.kernel.org,
	alan@lxorguk.ukuu.org.uk, liml@rtr.ca
Subject: Re: [PATCHSET #upstream] libata: improve FLUSH error handling
Date: Thu, 27 Mar 2008 13:51:00 -0400
Message-ID: <47EBDE84.1070802@emc.com>
In-Reply-To: <12066128663306-git-send-email-htejun@gmail.com>



Tejun Heo wrote:
> Hello, all.
> 
> I was going through my mailbox and saw Alan's patch[A] which didn't get
> the love it deserved.  It turned out that ata_flush_cache() function
> the patch modifies had been dead for some time.  I ended up re-doing
> it in the EH framework and it turned out okay.
> 
> After the patchset, on FLUSH failure, the following is done.
> 
>   1. It's always retried with longer timeout (60s for now) after
>      failure.
> 
>   2. If the device is making progress by retrying, humor it for longer
>      (20 tries for now).
> 
>   3. If the device fails the command with the same failed sector,
>      retry fewer times (log2 of the original number of tries).
> 
>   4. If retried FLUSH fails for something other than device error,
>      don't keep retrying.  We're likely wasting time.

Are we sure that it is ever the right thing to do to reissue a flush command?

I am worried that this might be much closer to the media error class of device 
errors than something that will benefit from a retry of any type.

Also, I am unclear as to how we measure the progress of the device if the flush 
command has failed?
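
To make the questions above concrete, here is one possible reading of the
four quoted rules as a small userspace C sketch. It is not the patch itself.
All names are hypothetical, "progress" is assumed to mean that the failed
sector reported by the device changes between attempts, and rule 3 is assumed
to cap the remaining tries at log2 of the original count (4 for 20 tries).

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the patchset description; all names are hypothetical. */
#define FLUSH_RETRY_TIMEOUT_SECS 60   /* timeout the caller would use for a reissued FLUSH */
#define FLUSH_MAX_TRIES          20   /* retries allowed while progress is seen */

enum flush_result { FLUSH_OK, FLUSH_DEV_ERROR, FLUSH_OTHER_ERROR };

struct flush_retry_state {
	unsigned int tries_left;     /* retries still allowed */
	uint64_t last_fail_sector;   /* sector reported by the previous device error */
	bool have_fail_sector;       /* last_fail_sector is valid */
	bool retried;                /* at least one retry has been issued */
};

/* Integer log2, rounded down; returns 0 for n <= 1. */
static unsigned int flush_ilog2(unsigned int n)
{
	unsigned int r = 0;

	while (n >>= 1)
		r++;
	return r;
}

void flush_retry_init(struct flush_retry_state *st)
{
	st->tries_left = FLUSH_MAX_TRIES;
	st->last_fail_sector = 0;
	st->have_fail_sector = false;
	st->retried = false;
}

/*
 * Decide whether a failed FLUSH should be reissued.
 * fail_sector is only meaningful when res == FLUSH_DEV_ERROR.
 */
bool flush_should_retry(struct flush_retry_state *st,
			enum flush_result res, uint64_t fail_sector)
{
	if (res == FLUSH_OK)
		return false;

	/* Rule 4: a retried FLUSH that fails with a non-device error stops. */
	if (st->retried && res != FLUSH_DEV_ERROR)
		return false;

	if (res == FLUSH_DEV_ERROR) {
		/* Rule 3 (assumed reading): same sector means no progress,
		 * so cap the remaining tries at log2 of the original count. */
		if (st->have_fail_sector && fail_sector == st->last_fail_sector) {
			unsigned int cap = flush_ilog2(FLUSH_MAX_TRIES);

			if (st->tries_left > cap)
				st->tries_left = cap;
		}
		st->last_fail_sector = fail_sector;
		st->have_fail_sector = true;
	}

	/* Rules 1 and 2: retry while the budget lasts. */
	if (st->tries_left == 0)
		return false;
	st->tries_left--;
	st->retried = true;
	return true;
}
```

Under these assumptions a drive that keeps failing on one sector is retried
only a handful of times, while a drive whose failed sector keeps moving gets
the full 20. The sketch only decides whether to retry; it does not address
whether reissuing FLUSH is the right response to a media error.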



> As the code is being smart against retrying needlessly, it won't be
> too dangerous to increase the 20 tries (taken from Alan's patch) but I
> think it's as good as any other random number.  If anyone knows any
> meaningful number, please chime in.  The same goes for 60 secs timeout
> too.
> 
> I made a debug patch to trigger timeouts and device errors on FLUSH.
> I'll post the patch as a reply.  It adds the following four module
> params which can be written runtime via /sys/module/libata/parameters.
> 
>   flush_dbg_do_timeout:
> 	If non-zero value is written, the specified number of FLUSHes
> 	will be timed out.
> 
>   flush_dbg_do_deverr:
> 	If non-zero value is written, the specified number of FLUSHes
> 	will be terminated with device error.
> 
>   flush_dbg_fail_sector:
> 	The failed sector for the next deverr.
> 
>   flush_dbg_fail_increment:
> 	Number of sectors to add to fail_sector after each deverr.
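
For anyone scripting a test run, a minimal C sketch of setting these
parameters follows. It assumes the debug patch is applied (the parameters do
not exist otherwise) and that each parameter accepts an unsigned decimal
value. The helper names are hypothetical.

```c
#include <stdio.h>

/* Directory named in the quoted description. */
#define LIBATA_PARAM_DIR "/sys/module/libata/parameters"

/* Build "<dir>/<name>" into buf; returns 0 on success, -1 on truncation. */
int flush_dbg_param_path(char *buf, size_t len,
			 const char *dir, const char *name)
{
	int n = snprintf(buf, len, "%s/%s", dir, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a decimal value to one parameter file; returns 0 on success. */
int flush_dbg_write(const char *dir, const char *name,
		    unsigned long long value)
{
	char path[256];
	FILE *f;
	int ok;

	if (flush_dbg_param_path(path, sizeof(path), dir, name) != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;   /* parameter missing or insufficient privileges */

	ok = fprintf(f, "%llu\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;      /* the kernel may reject the value at flush time */
	return ok ? 0 : -1;
}
```

As an illustration with made-up values, calling
flush_dbg_write(LIBATA_PARAM_DIR, "flush_dbg_fail_sector", 1000), then
"flush_dbg_fail_increment" with 8, then "flush_dbg_do_deverr" with 3 would
request three device errors whose failed sector advances by 8 each time.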
> 
> I tested different scenarios and it all seems to work fine but it
> would be really great if someone can test this on a (hmmm....) live
> dying drive.
> 
> This patchset is for #upstream but generated on top of
> 
>   #upstream-fixes (4cde32fc4b32e96a99063af3183acdfd54c563f0)
> + [1] libata: ATA_EHI_LPM should be ATA_EH_LPM
> 
> as there is a humongous patchset pending review #upstream.  Once this
> gets acked, I'll move it over to #upstream.  It shouldn't interfere
> too much anyway.
> 
> Thanks.
> 

We have access to a fair number of flaky drives; I can see if we can test some
of this...

ric
