linux-ide.vger.kernel.org archive mirror
* [PATCHSET #upstream] libata: improve FLUSH error handling
@ 2008-03-27 10:14 Tejun Heo
  2008-03-27 10:14 ` [PATCH 1/4] libata: make ata_tf_to_lba[48]() generic Tejun Heo
                   ` (6 more replies)
  0 siblings, 7 replies; 32+ messages in thread
From: Tejun Heo @ 2008-03-27 10:14 UTC
  To: jeff, linux-ide, alan, liml

Hello, all.

I was going through my mailbox and saw Alan's patch[A], which didn't get
the love it deserved.  It turned out that the ata_flush_cache() function
the patch modifies had been dead for some time.  I ended up redoing
it in the EH framework and it turned out okay.

After the patchset, on FLUSH failure, the following is done.

  1. It's always retried with a longer timeout (60s for now) after
     failure.

  2. If the device is making progress by retrying, humor it for longer
     (20 tries for now).

  3. If the device fails the command with the same failed sector,
     retry fewer times (log2 of the original number of tries).

  4. If the retried FLUSH fails for something other than a device
     error, don't keep retrying.  We're likely wasting time.
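
The four rules above can be sketched as a small user-space decision
function.  This is not the code from the patchset; every name here
(flush_state, flush_should_retry(), the FLUSH_* constants) is made up
for illustration, and two readings are assumptions: that "20 tries"
counts the original attempt, and that "log2 of the original number of
tries" means at most ilog2(20) = 4 retries while the failed sector
does not move.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; only the numbers 20 and 60 come from the text. */
#define FLUSH_MAX_TRIES          20   /* total attempts while progressing */
#define FLUSH_RETRY_TIMEOUT_SECS 60   /* longer timeout used for retries  */

enum flush_result { FLUSH_OK, FLUSH_DEV_ERR, FLUSH_OTHER_ERR };

struct flush_state {
	int failures;          /* failed FLUSH attempts so far            */
	int same_sector;       /* consecutive failures at last_sector     */
	bool have_sector;      /* last_sector holds a valid value         */
	uint64_t last_sector;  /* failed sector of the last device error  */
};

/* Integer log2, rounded down; ilog2_int(20) is 4. */
int ilog2_int(int n)
{
	int r = 0;

	while (n > 1) {
		n >>= 1;
		r++;
	}
	return r;
}

/* Record a device error at a sector different from the previous one. */
static void note_new_sector(struct flush_state *st, uint64_t sector)
{
	st->have_sector = true;
	st->last_sector = sector;
	st->same_sector = 1;
}

/*
 * Feed in the result of the latest FLUSH attempt; returns true if the
 * command should be issued again (with the longer timeout).
 */
bool flush_should_retry(struct flush_state *st, enum flush_result res,
			uint64_t fail_sector)
{
	assert(st != NULL);

	if (res == FLUSH_OK)
		return false;

	st->failures++;

	/* Rule 1: the first failure is always retried. */
	if (st->failures == 1) {
		if (res == FLUSH_DEV_ERR)
			note_new_sector(st, fail_sector);
		return true;
	}

	/* Rule 4: a retried FLUSH failing without a device error gives up. */
	if (res != FLUSH_DEV_ERR)
		return false;

	if (st->have_sector && st->last_sector == fail_sector) {
		/* Rule 3: stuck on the same sector, allow fewer retries. */
		st->same_sector++;
		if (st->same_sector > ilog2_int(FLUSH_MAX_TRIES))
			return false;
	} else {
		/* Rule 2: the failed sector moved, so progress is being made. */
		note_new_sector(st, fail_sector);
	}

	return st->failures < FLUSH_MAX_TRIES;
}
```

With a zero-initialized flush_state, a device that keeps failing at
one sector is retried four times under these assumptions, while one
that reports a different sector each time gets all 20 attempts.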

Since the code is smart about not retrying needlessly, it wouldn't be
too dangerous to increase the 20 tries (taken from Alan's patch), but I
think it's as good as any other arbitrary number.  If anyone knows a
more meaningful number, please chime in.  The same goes for the 60
second timeout.

I made a debug patch to trigger timeouts and device errors on FLUSH.
I'll post the patch as a reply.  It adds the following four module
params which can be written at runtime via /sys/module/libata/parameters.

  flush_dbg_do_timeout:
	If a non-zero value is written, the specified number of FLUSHes
	will be timed out.

  flush_dbg_do_deverr:
	If a non-zero value is written, the specified number of FLUSHes
	will be terminated with device error.

  flush_dbg_fail_sector:
	The failed sector for the next deverr.

  flush_dbg_fail_increment:
	Number of sectors to add to fail_sector after each deverr.
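
As a rough illustration of driving these knobs from a test program,
here is a small user-space helper.  It assumes the debug patch is
applied (the parameters do not exist otherwise) and that each
parameter accepts a plain decimal value; write_param() and
arm_moving_deverr() are hypothetical names, and the values 1000, 8
and 3 are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Directory named in the text; the files exist only with the debug patch. */
#define LIBATA_PARAM_DIR "/sys/module/libata/parameters"

/* Write a decimal value to <dir>/<name>.  Returns 0 on success, -1 on error. */
int write_param(const char *dir, const char *name, unsigned long long value)
{
	char path[512];
	FILE *f;
	int n, ok;

	assert(dir != NULL && name != NULL);

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%llu\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/*
 * Arm three device errors whose failed sector starts at 1000 and moves
 * by 8 sectors after each one, so the device looks like it is making
 * progress.  The count is written last so the sector settings are in
 * place before any FLUSH is failed.
 */
int arm_moving_deverr(const char *dir)
{
	if (write_param(dir, "flush_dbg_fail_sector", 1000))
		return -1;
	if (write_param(dir, "flush_dbg_fail_increment", 8))
		return -1;
	return write_param(dir, "flush_dbg_do_deverr", 3);
}
```

Calling arm_moving_deverr(LIBATA_PARAM_DIR) as root and then issuing a
FLUSH (for example via fsync on a file on the disk) should exercise the
"making progress" path; setting the increment to 0 instead should
exercise the same-sector path.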

I tested different scenarios and it all seems to work fine but it
would be really great if someone can test this on a (hmmm....) live
dying drive.

This patchset is for #upstream but generated on top of

  #upstream-fixes (4cde32fc4b32e96a99063af3183acdfd54c563f0)
+ [1] libata: ATA_EHI_LPM should be ATA_EH_LPM

as there is a humongous patchset pending review in #upstream.  Once this
gets acked, I'll move it over to #upstream.  It shouldn't interfere
too much anyway.

Thanks.

--
tejun

[A] http://article.gmane.org/gmane.linux.ide/28835
[1] http://article.gmane.org/gmane.linux.ide/30077

Thread overview: 32+ messages
2008-03-27 10:14 [PATCHSET #upstream] libata: improve FLUSH error handling Tejun Heo
2008-03-27 10:14 ` [PATCH 1/4] libata: make ata_tf_to_lba[48]() generic Tejun Heo
2008-04-04  7:45   ` Jeff Garzik
2008-03-27 10:14 ` [PATCH 2/4] libata: implement ATA_QCFLAG_RETRY Tejun Heo
2008-03-27 10:14 ` [PATCH 3/4] libata: kill unused ata_flush_cache() Tejun Heo
2008-03-27 10:14 ` [PATCH 4/4] libata: improve FLUSH error handling Tejun Heo
2008-04-04  7:46   ` Jeff Garzik
2008-03-27 10:23 ` Debug patch to induce errors on FLUSH Tejun Heo
2008-03-27 14:24 ` [PATCHSET #upstream] libata: improve FLUSH error handling Mark Lord
2008-03-27 14:35   ` Mark Lord
2008-03-27 15:31     ` Alan Cox
2008-03-27 18:01     ` Ric Wheeler
2008-03-28  1:57     ` Tejun Heo
2008-03-28  2:33       ` Mark Lord
2008-03-28 13:36         ` Ric Wheeler
2008-03-28 14:52           ` Tejun Heo
2008-03-28 14:53             ` Ric Wheeler
2008-03-28 15:16               ` Alan Cox
2008-03-28 16:57                 ` Ric Wheeler
2008-03-28 16:04             ` Mark Lord
2008-03-27 17:53   ` Ric Wheeler
2008-03-27 18:52     ` Jeff Garzik
2008-03-27 20:23       ` Ric Wheeler
2008-03-28  7:46   ` Andi Kleen
2008-03-28  8:30     ` Tejun Heo
2008-03-28  8:48       ` Andi Kleen
2008-03-28  8:53         ` Tejun Heo
2008-03-27 17:51 ` Ric Wheeler
2008-03-27 18:53   ` Jeff Garzik
2008-03-27 22:00   ` Alan Cox
2008-03-28  2:02   ` Tejun Heo
2008-03-28  9:48     ` Alan Cox
