From: Ric Wheeler <rwheeler@redhat.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: Tejun Heo <tj@kernel.org>, Eric Anopolsky <erpo41@gmail.com>,
	Stephan von Krawczynski <skraw@ithnet.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: Some very basic questions
Date: Wed, 22 Oct 2008 09:38:56 -0400	[thread overview]
Message-ID: <48FF2CF0.60708@redhat.com> (raw)
In-Reply-To: <1224681593.6448.7.camel@think.oraclecorp.com>

Chris Mason wrote:
> On Wed, 2008-10-22 at 22:15 +0900, Tejun Heo wrote:
>   
>> Ric Wheeler wrote:
>>     
>>> I think that we do handle a failure in the case that you outline above
>>> since the FS will be able to notice the error before it sends a commit
>>> down (and that commit is wrapped in the barrier flush calls). This is
>>> the easy case since we still have the context for the IO.
>>>       
>> I'm no FS guy but for that to be true FS should be waiting for all the
>> outstanding IOs to finish before issuing a barrier and actually
>> doesn't need barriers at all - it can do the same with flush_cache.
>>
>>     
>
> We wait and then barrier.  If the barrier returned status that a
> previously ack'd IO had actually failed, we could do something to make
> sure the FS was consistent.
>
> -chris
>
>
>   
As I mentioned in a reply to Tejun, I am not sure that we can count on 
the barrier op giving us status for IOs that failed to destage cleanly.

Waiting and then doing the FLUSH seems to give us the best coverage for 
normal failures (and your own testing shows that it is hugely effective 
in reducing some types of corruption at least :-)).
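To make the "wait, then flush, then commit" ordering concrete, here is a minimal user-space sketch, assuming POSIX `pwrite()`/`fsync()` as a stand-in for the in-kernel wait plus barrier/FLUSH. `fsync()` waits for the writes and, on filesystems configured to issue cache flushes, asks the drive to flush its write cache. The helpers `write_full()` and `ordered_commit()` are hypothetical names for illustration, not btrfs code. The point is that any error surfaces before the commit record is written, while we still have the context for the IO:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf at offset off, retrying on short writes and EINTR. */
static int write_full(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/*
 * Hypothetical user-space analogue of "wait, then flush, then commit":
 *   1. write the data blocks;
 *   2. fsync() to wait for them and request a cache flush, checking the result;
 *   3. only if that succeeded, write the commit record and flush again.
 * Returns 0 on success, -1 on any failure. The commit record is never
 * written after a failed data write or flush.
 */
int ordered_commit(const char *path, const void *data, size_t data_len,
                   const void *commit, size_t commit_len)
{
    assert(path != NULL && data != NULL && commit != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int rc = -1;
    /* && short-circuits, so each step runs only if the previous one succeeded. */
    if (write_full(fd, data, data_len, 0) == 0 &&
        fsync(fd) == 0 &&
        write_full(fd, commit, commit_len, (off_t)data_len) == 0 &&
        fsync(fd) == 0)
        rc = 0;

    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

As the thread notes, this only catches errors reported while the IO is still in flight; it cannot help if the drive acked a write and later failed to destage it.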

If you look at the types of common drive failures, I would break them 
into two big groups. 

The first group would be transient errors - i.e., this IO fails (usually 
a read), but a subsequent IO will succeed with or without a sector 
remapping happening.  Causes might be:

    (1) just a bad read due to dirt on the surface of the drive - reads of 
that sector will keep failing, but a write might clean the surface and 
restore it to useful life.
    (2) vibrations (dropping your laptop, rolling a big machine across the 
data center floor, passing trains :-))
    (3) adjacent sector writes - hot spotting (repeated writes to one area 
of the drive) can degrade the data on adjacent tracks. This causes read 
errors for data that was written successfully earlier, even though the 
track itself is still perfectly fine.

All of the errors in this first group need robust error handling 
(i.e., fail quickly, check for IO errors and isolate the impact of the 
error as best we can), but they do not indicate a bad drive.
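A minimal sketch of "fail quickly and isolate the impact" for a multi-block read, assuming a hypothetical per-block reader callback (`read_block_fn`) and a hypothetical small retry budget (`MAX_RETRIES`). A block that still fails is marked bad and the rest of the range is read anyway, rather than failing the whole request. `flaky_dev`/`flaky_read` are a simulated device for experimentation, not a real driver interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical retry budget: small, so a failing block is reported quickly. */
#define MAX_RETRIES 2

/* Hypothetical block reader: fills out[0..block_size) and returns 0, or -1 on I/O error. */
typedef int (*read_block_fn)(void *ctx, size_t block, char *out);

/*
 * Read blocks [0, nblocks) into buf. Each failing block is retried at most
 * MAX_RETRIES times; if it still fails it is marked in bad[] and the read
 * moves on, so one bad block does not fail the whole range.
 * Returns the number of blocks that could not be read.
 */
size_t read_range_isolated(read_block_fn rd, void *ctx, size_t nblocks,
                           size_t block_size, char *buf, bool *bad)
{
    assert(rd != NULL && buf != NULL && bad != NULL && block_size > 0);

    size_t nbad = 0;
    for (size_t b = 0; b < nblocks; b++) {
        int rc = -1;
        for (int attempt = 0; attempt <= MAX_RETRIES && rc != 0; attempt++)
            rc = rd(ctx, b, buf + b * block_size);
        bad[b] = (rc != 0);
        if (bad[b])
            nbad++;
    }
    return nbad;
}

/* Simulated device: block bad_block fails fails_left more times
 * (a negative value means it always fails, i.e. a persistent error). */
struct flaky_dev {
    size_t bad_block;
    int fails_left;
};

int flaky_read(void *ctx, size_t block, char *out)
{
    struct flaky_dev *d = ctx;
    if (block == d->bad_block && d->fails_left != 0) {
        if (d->fails_left > 0)
            d->fails_left--;
        return -1;
    }
    out[0] = (char)('a' + block);  /* writes one byte; assumes block_size >= 1 */
    return 0;
}
```

A transient error (for example, a vibration during one read) is absorbed by the retry, while a block that keeps failing is reported without blocking access to its neighbours.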

The second group would be persistent failures - no matter what you do to 
the drive, it is going to kick the bucket! Common causes might be:

    (1) a few bad sectors (1-5% of the drive's remapped sector table for 
example).
    (2) a bad disk head - this is a very common failure; you will see a 
large number of bad sectors.
    (3) bad components (say bad memory chips in the write cache) can 
produce consistent errors
    (4) failure to spin up (total drive failure).

The challenging part is to figure out as best as we can how to 
differentiate the causes of IO failures or checksum failures and to 
respond correctly.  Array vendors spend a lot of time pulling their hair 
out trying to do predictive drive failure, but it is really, really hard 
to get correct...

ric




Thread overview: 79+ messages
2008-10-21 11:23 Some very basic questions Stephan von Krawczynski
2008-10-21 12:13 ` Andi Kleen
2008-10-21 14:22   ` Stephan von Krawczynski
2008-10-21 15:34     ` jim owens
2008-10-22 11:36       ` Stephan von Krawczynski
2008-10-22 12:15         ` Avi Kivity
2008-10-22 13:03           ` Ric Wheeler
2008-10-22 13:13             ` Chris Mason
2008-10-22 13:16             ` Avi Kivity
2008-10-21 13:20 ` jim owens
2008-10-21 17:01   ` Stephan von Krawczynski
2008-10-21 17:15     ` Christoph Hellwig
2008-10-21 17:31       ` Ric Wheeler
2008-10-22 12:27         ` Stephan von Krawczynski
2008-10-22 13:15           ` Chris Mason
2008-10-22 13:27             ` Ric Wheeler
2008-10-22 14:32               ` Avi Kivity
2008-10-22 14:36                 ` Chris Mason
2008-10-22 14:40                   ` Avi Kivity
2008-10-22 14:46                 ` Ric Wheeler
2008-10-22 14:54                   ` Avi Kivity
2008-10-22 15:02                     ` Ric Wheeler
2008-10-22 15:13                       ` Avi Kivity
2008-10-22 15:25                         ` Ric Wheeler
2008-10-22 15:33                           ` Chris Mason
2008-10-22 15:43                             ` Avi Kivity
2008-10-22 15:54                               ` Ric Wheeler
2008-10-22 18:28                                 ` Avi Kivity
2008-10-22 15:39                           ` Avi Kivity
2008-10-22 13:52             ` Stephan von Krawczynski
2008-10-22 15:56               ` Michel Salim
2008-10-22 16:56                 ` jim owens
2008-10-23  9:47                 ` Stephan von Krawczynski
2008-10-22 11:40       ` Stephan von Krawczynski
2008-10-21 13:59 ` Chris Mason
2008-10-21 16:09   ` Andi Kleen
2008-10-22 11:43     ` Stephan von Krawczynski
2008-10-21 16:27   ` Stephan von Krawczynski
2008-10-21 16:59     ` Andi Kleen
2008-10-22 11:46       ` Stephan von Krawczynski
2008-10-21 17:49     ` Chris Mason
2008-10-22 12:19       ` Stephan von Krawczynski
2008-10-22 12:48         ` Jeff Schroeder
2008-10-22 14:02           ` Stephan von Krawczynski
2008-10-22 13:50         ` Chris Mason
2008-10-22 14:04           ` Matthias Wächter
2008-10-22 14:32             ` Ric Wheeler
2008-10-22 14:44               ` jim owens
2008-10-24  8:42           ` Chris Samuel
2008-10-24  8:39         ` Chris Samuel
2008-10-21 20:54   ` Eric Anopolsky
2008-10-21 22:18     ` Ric Wheeler
2008-10-22  2:29       ` Eric Anopolsky
2008-10-22 10:42         ` Ric Wheeler
2008-10-22 10:53           ` Tejun Heo
2008-10-22 12:57             ` Ric Wheeler
2008-10-22 13:15               ` Tejun Heo
2008-10-22 13:19                 ` Chris Mason
2008-10-22 13:38                   ` Ric Wheeler [this message]
2008-10-22 13:59                     ` Chris Mason
2008-10-22 14:23                       ` Ric Wheeler
2008-10-22 13:23                 ` Ric Wheeler
2008-10-22 16:14                   ` Tejun Heo
2008-10-22 16:34                     ` Ric Wheeler
2008-10-23  3:59                       ` Tejun Heo
2008-10-22 18:32                     ` Avi Kivity
2008-10-22 19:13                       ` jim owens
2008-10-22 19:22                         ` Avi Kivity
2008-10-22 19:59                       ` Ric Wheeler
2008-10-22 21:31                     ` Eric Anopolsky
2008-10-22 21:56                       ` Ric Wheeler
  -- strict thread matches above, loose matches on Subject: below --
2008-10-21 17:37 calin
2008-10-21 20:08 ` jim owens
2008-10-22  7:15   ` Avi Kivity
2008-10-22 14:13     ` jim owens
2008-10-22 14:25       ` Avi Kivity
2008-10-22 14:35 dbz
2008-10-27 15:43 ` Stephan von Krawczynski
