From: Ric Wheeler <ric@emc.com>
To: Tejun Heo <htejun@gmail.com>
Cc: jeff@garzik.org, linux-ide@vger.kernel.org,
alan@lxorguk.ukuu.org.uk, liml@rtr.ca
Subject: Re: [PATCHSET #upstream] libata: improve FLUSH error handling
Date: Thu, 27 Mar 2008 13:51:00 -0400
Message-ID: <47EBDE84.1070802@emc.com>
In-Reply-To: <12066128663306-git-send-email-htejun@gmail.com>
Tejun Heo wrote:
> Hello, all.
>
> I was going through my mailbox and saw Alan's patch[A] which didn't get
> the love it deserved. It turned out that ata_flush_cache() function
> the patch modifies had been dead for some time. I ended up re-doing
> it in the EH framework and it turned out okay.
>
> After the patchset, on FLUSH failure, the following is done.
>
> 1. It's always retried with longer timeout (60s for now) after
> failure.
>
> 2. If the device is making progress by retrying, humor it for longer
> (20 tries for now).
>
> 3. If the device fails the command with the same failed sector,
> retry fewer times (log2 of the original number of tries).
>
> 4. If retried FLUSH fails for something other than device error,
> don't keep retrying. We're likely wasting time.

Are we sure that it is ever the right thing to do to reissue a flush command?
I am worried that this might be much closer to the media error class of device
errors than something that will benefit from a retry of any type.

Also, it is unclear to me how we measure the progress of the device if the
flush command has failed.

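For discussion, here is one reading of the four quoted rules as a standalone C sketch. It is not the patch itself. It assumes that "progress" means the reported failed sector changed since the previous failure (item 3 suggests this, but the mail does not spell it out), and that the log2 rule caps consecutive same-sector retries at ilog2(20) = 4. The names flush_result, flush_retry_state, flush_retry_init and flush_should_retry are hypothetical and do not come from libata.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLUSH_MAX_TRIES         20  /* "20 tries for now" in the quoted mail */
#define FLUSH_RETRY_TIMEOUT_SEC 60  /* "60s for now"; a caller would use this when reissuing */

/* Hypothetical classification of a FLUSH completion. */
enum flush_result { FLUSH_OK, FLUSH_DEV_ERR, FLUSH_OTHER_ERR };

/* Hypothetical per-device bookkeeping; not a libata structure. */
struct flush_retry_state {
    unsigned int retries;        /* retries issued so far */
    unsigned int stalled;        /* consecutive failures at the same sector */
    bool have_last;              /* last_failed_sector is valid */
    uint64_t last_failed_sector;
};

/* Integer log2, rounded down; ilog2_uint(20) == 4. */
static unsigned int ilog2_uint(unsigned int v)
{
    unsigned int r = 0;

    while (v >>= 1)
        r++;
    return r;
}

void flush_retry_init(struct flush_retry_state *st)
{
    st->retries = 0;
    st->stalled = 0;
    st->have_last = false;
    st->last_failed_sector = 0;
}

/* Call after each FLUSH completion; returns true if FLUSH should be reissued. */
bool flush_should_retry(struct flush_retry_state *st, enum flush_result res,
                        uint64_t failed_sector)
{
    if (res == FLUSH_OK)
        return false;

    if (res == FLUSH_OTHER_ERR) {
        /* Item 4: a retried FLUSH failing without a device error ends the loop. */
        if (st->retries > 0)
            return false;
    } else {
        /* Items 2 and 3: assumed progress test is "failed sector moved". */
        if (st->have_last && failed_sector == st->last_failed_sector)
            st->stalled++;
        else
            st->stalled = 0;
        st->have_last = true;
        st->last_failed_sector = failed_sector;

        if (st->stalled >= ilog2_uint(FLUSH_MAX_TRIES))
            return false;
    }

    if (st->retries >= FLUSH_MAX_TRIES)
        return false;

    /* Item 1: the first failure always reaches this point and is retried. */
    st->retries++;
    return true;
}
```

Under these assumptions, a drive that keeps failing at one sector gets 4 retries, a drive whose failed sector moves on every attempt gets the full 20, and a second timeout in a row stops the loop. If progress is measured differently in the real patch, only the comparison in the device-error branch would change.
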
> As the code is being smart against retrying needlessly, it won't be
> too dangerous to increase the 20 tries (taken from Alan's patch) but I
> think it's as good as any other random number. If anyone knows any
> meaningful number, please chime in. The same goes for 60 secs timeout
> too.
>
> I made a debug patch to trigger timeouts and device errors on FLUSH.
> I'll post the patch as a reply. It adds the following four module
> params which can be written at runtime via /sys/module/libata/parameters.
>
> flush_dbg_do_timeout:
> If non-zero value is written, the specified number of FLUSHes
> will be timed out.
>
> flush_dbg_do_deverr:
> If non-zero value is written, the specified number of FLUSHes
> will be terminated with device error.
>
> flush_dbg_fail_sector:
> The failed sector for the next deverr.
>
> flush_dbg_fail_increment:
> Number of sectors to add to fail_sector after each deverr.
>
> I tested different scenarios and it all seems to work fine but it
> would be really great if someone can test this on a (hmmm....) live
> dying drive.
>
> This patchset is for #upstream but generated on top of
>
> #upstream-fixes (4cde32fc4b32e96a99063af3183acdfd54c563f0)
> + [1] libata: ATA_EHI_LPM should be ATA_EH_LPM
>
> as there is a humongous patchset pending review #upstream. Once this
> gets acked, I'll move it over to #upstream. It shouldn't interfere
> too much anyway.
>
> Thanks.
>
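To make the interaction of the four debug parameters concrete, the following C sketch models them in user space. It is a guess at the semantics from the descriptions quoted above, not the debug patch: the struct fields mirror the parameter names, but flush_dbg_state, flush_dbg_outcome and flush_dbg_next are hypothetical, and the choice to serve pending timeouts before pending device errors is an assumption.

```c
#include <stdint.h>

/* Hypothetical mirror of the four module parameters described in the mail. */
struct flush_dbg_state {
    unsigned int do_timeout;   /* number of FLUSHes still to time out */
    unsigned int do_deverr;    /* number of FLUSHes still to fail with device error */
    uint64_t fail_sector;      /* sector reported by the next device error */
    uint64_t fail_increment;   /* added to fail_sector after each device error */
};

enum flush_dbg_outcome { FLUSH_DBG_PASS, FLUSH_DBG_TIMEOUT, FLUSH_DBG_DEVERR };

/*
 * Decide what happens to the next FLUSH. On FLUSH_DBG_DEVERR the reported
 * sector is stored in *failed_sector; otherwise *failed_sector is untouched.
 * Assumption: timeouts are consumed before device errors.
 */
enum flush_dbg_outcome flush_dbg_next(struct flush_dbg_state *d,
                                      uint64_t *failed_sector)
{
    if (d->do_timeout > 0) {
        d->do_timeout--;
        return FLUSH_DBG_TIMEOUT;
    }
    if (d->do_deverr > 0) {
        d->do_deverr--;
        *failed_sector = d->fail_sector;
        d->fail_sector += d->fail_increment;
        return FLUSH_DBG_DEVERR;
    }
    return FLUSH_DBG_PASS;
}
```

In this model, a fail_increment of 0 makes every injected device error report the same sector, which would exercise the "same failed sector" case, while a non-zero increment would imitate a drive that gets a little further on each retry.
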
We have access to a fair number of flaky drives, so I can see if we can test
some of this...
ric
Thread overview: 32+ messages
2008-03-27 10:14 [PATCHSET #upstream] libata: improve FLUSH error handling Tejun Heo
2008-03-27 10:14 ` [PATCH 1/4] libata: make ata_tf_to_lba[48]() generic Tejun Heo
2008-04-04 7:45 ` Jeff Garzik
2008-03-27 10:14 ` [PATCH 2/4] libata: implement ATA_QCFLAG_RETRY Tejun Heo
2008-03-27 10:14 ` [PATCH 3/4] libata: kill unused ata_flush_cache() Tejun Heo
2008-03-27 10:14 ` [PATCH 4/4] libata: improve FLUSH error handling Tejun Heo
2008-04-04 7:46 ` Jeff Garzik
2008-03-27 10:23 ` Debug patch to induce errors on FLUSH Tejun Heo
2008-03-27 14:24 ` [PATCHSET #upstream] libata: improve FLUSH error handling Mark Lord
2008-03-27 14:35 ` Mark Lord
2008-03-27 15:31 ` Alan Cox
2008-03-27 18:01 ` Ric Wheeler
2008-03-28 1:57 ` Tejun Heo
2008-03-28 2:33 ` Mark Lord
2008-03-28 13:36 ` Ric Wheeler
2008-03-28 14:52 ` Tejun Heo
2008-03-28 14:53 ` Ric Wheeler
2008-03-28 15:16 ` Alan Cox
2008-03-28 16:57 ` Ric Wheeler
2008-03-28 16:04 ` Mark Lord
2008-03-27 17:53 ` Ric Wheeler
2008-03-27 18:52 ` Jeff Garzik
2008-03-27 20:23 ` Ric Wheeler
2008-03-28 7:46 ` Andi Kleen
2008-03-28 8:30 ` Tejun Heo
2008-03-28 8:48 ` Andi Kleen
2008-03-28 8:53 ` Tejun Heo
2008-03-27 17:51 ` Ric Wheeler [this message]
2008-03-27 18:53 ` Jeff Garzik
2008-03-27 22:00 ` Alan Cox
2008-03-28 2:02 ` Tejun Heo
2008-03-28 9:48 ` Alan Cox