Re: [PATCHSET #upstream] libata: improve FLUSH error handling

From: Tejun Heo <htejun@gmail.com>
To: Mark Lord <liml@rtr.ca>
Cc: jeff@garzik.org, linux-ide@vger.kernel.org, alan@lxorguk.ukuu.org.uk
Subject: Re: [PATCHSET #upstream] libata: improve FLUSH error handling
Date: Fri, 28 Mar 2008 10:57:13 +0900
Message-ID: <47EC5079.5020105@gmail.com>
In-Reply-To: <47EBB09F.9070607@rtr.ca>

Hello, Mark.

Mark Lord wrote:
> Speaking of which.. these are all WRITEs.
> 
> In 18 years of IDE/ATA development,
> I have *never* seen a hard disk drive report a WRITE error.
>
> Which makes sense, if you think about it -- it's rewriting the sector
> with new ECC info, so it *should* succeed.  The only case where it won't,
> is if the sector has been marked as "bad" internally, and the drive is
> too dumb to try anyways after it runs out of remap space.
> 
> In which case we've already lost data, and taking more than a hundred
> and twenty seconds isn't going to make a serious difference.

Yeah, the disk must be knee deep in shit to report WRITE failure.  I
don't really expect the code to be exercised often but was mainly trying
to fill the loophole in libata error handling, as this type of behavior
is what the spec requires on FLUSH errors.

I didn't add global timeout because retries are done iff the drive is
reporting progress.

1. Drives genuinely deep in shit and getting lots of WRITE errors would
report a different sector on each FLUSH and we NEED to keep retrying.
That's what the spec requires, and the FLUSH could be from shutdown, in
which case it would be the last chance to get the data written to the
drive.

2. There are other issues causing the command to fail (e.g. timeout, HSM
violation or somesuch).  This is the case where EH could take a really
long time if it kept retrying, but the posted code doesn't retry in this
case.

3. The drive is crazy and reporting errors for no good reason.  Unless
the drive is really anti-social and raises such an error condition only
after tens of seconds, this shouldn't take too long.  Also, if the
reported LBA doesn't change between retries, the tries count is halved.
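
A minimal userland sketch of the retry policy described in the three
cases above.  This is not the posted patch: the type and function names,
the initial budget, and the exact bookkeeping are all hypothetical, and
it assumes that a retry which reports a new LBA leaves the budget
untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical failure classes; these are not libata's real error masks. */
enum flush_result {
	FLUSH_OK,		/* command completed */
	FLUSH_DEV_ERR,		/* device error with a failed LBA reported */
	FLUSH_OTHER_ERR		/* timeout, HSM violation or similar */
};

/* Hypothetical per-FLUSH retry state. */
struct flush_state {
	unsigned int tries;	/* remaining budget for no-progress retries */
	uint64_t last_lba;	/* LBA reported by the previous failure */
	bool have_last;		/* false until the first failure is seen */
};

static void flush_state_init(struct flush_state *st, unsigned int tries)
{
	assert(tries > 0);	/* a zero budget would never retry */
	st->tries = tries;
	st->last_lba = 0;
	st->have_last = false;
}

/* Decide whether FLUSH should be reissued after one attempt. */
static bool flush_should_retry(struct flush_state *st,
			       enum flush_result res, uint64_t failed_lba)
{
	/* Success needs no retry; case 2 (timeout etc.) is not retried. */
	if (res != FLUSH_DEV_ERR)
		return false;

	/* Case 3: same LBA as last time means no progress, halve budget. */
	if (st->have_last && failed_lba == st->last_lba)
		st->tries /= 2;

	/* Case 1: a new LBA counts as progress and costs nothing here. */
	st->last_lba = failed_lba;
	st->have_last = true;

	return st->tries > 0;
}
```

With a budget of 8 in this sketch, a drive stuck on one LBA is retried
four times before giving up, while a drive that keeps reporting new LBAs
is retried for as long as it does so.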

So, I think the code should be safe.  Do you still think we need a
global timeout?  It is easy to add.  I'm just not sure whether we need
it or not.

> Mmm.. anyone got a spare modern-ish drive to risk destroying?
> Say, one of the few still-functioning DeathStars, or a buggy-NCQ Maxtor?
> 
> If so, it might be fun to try and produce a no-more-remaps scenario on it.
> One could use "hdparm --make-bad-sector" to corrupt a few hundred/thousand
> sectors in a row (sequentially numbered).
>
> Then loop and attempt to read from them individually with
> "hdparm --read-sector"
> (should fail on all, but it might force the drive to remap them).
> 
> Then finally try and write back to them with "hdparm --write-sector",
> and see if a WRITE ERROR is ever reported.  Maybe time the individual
> WRITEs to see if any of them take more than a few milliseconds.
> 
> Perhaps try this whole thing with/without the write cache enabled.
> 
> Mmm...

Heh... :-)

-- 
tejun
