From: Ric Wheeler <rwheeler@redhat.com>
To: Eric Anopolsky <erpo41@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>,
Stephan von Krawczynski <skraw@ithnet.com>,
linux-btrfs@vger.kernel.org, Tejun Heo <tj@kernel.org>
Subject: Re: Some very basic questions
Date: Wed, 22 Oct 2008 06:42:18 -0400
Message-ID: <48FF038A.4010105@redhat.com>
In-Reply-To: <1224642544.7189.17.camel@telesto>

Eric Anopolsky wrote:
> On Tue, 2008-10-21 at 18:18 -0400, Ric Wheeler wrote:
>
>> Eric Anopolsky wrote:
>>
>>> On Tue, 2008-10-21 at 09:59 -0400, Chris Mason wrote:
>>>
>>>
>>>>> - power loss at any time must not corrupt the fs (atomic fs modification)
>>>>> (new-data loss is acceptable)
>>>>>
>>>>>
>>>> Done. Btrfs already uses barriers as required for sata drives.
>>>>
>>>>
>>> Aren't there situations in which write barriers don't do what they're
>>> supposed to do?
>>>
>>> Cheers,
>>> Eric
>>>
>>>
>>>
>> If the drive effectively "lies" to you about flushing the write cache,
>> you might have an issue. I have not seen that first hand with recent
>> disk drives (and I have seen a lot :-))
>>
>
> That does not match the understanding I get from reading the
> notes/caveats section of Documentation/block/barrier.txt:
>
> "Note that block drivers must not requeue preceding requests while
> completing latter requests in an ordered sequence. Currently, no
> error checking is done against this."
>
> and perhaps more importantly:
>
> "[a technical scenario involving disk writes]
> The problem here is that the barrier request is *supposed* to indicate
> that filesystem update requests [2] and [3] made it safely to the
> physical medium and, if the machine crashes after the barrier is
> written, filesystem recovery code can depend on that. Sadly, that
> isn't true in this case anymore. IOW, the success of a I/O barrier
> should also be dependent on success of some of the preceding requests,
> where only upper layer (filesystem) knows what 'some' is.
>
> This can be solved by implementing a way to tell the block layer which
> requests affect the success of the following barrier request and
> making lower lever drivers to resume operation on error only after
> block layer tells it to do so.
>
> As the probability of this happening is very low and the drive should
> be faulty, implementing the fix is probably an overkill. But, still,
> it's there."
>
> Cheers,
> Eric
>
>

The cache flush command for ATA devices blocks until the device's entire
write cache has been written back.

What I assume Tejun was referring to here is the case where some IO has
already been written out to the device and an error then happens when the
device tries to write its cache back (say, during the drive microcode's
normal cache destaging). The problem is that there is no outstanding IO
context between the host and the storage to report the error against
(i.e., the drive has already ack'ed the write).
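
To make that failure mode concrete, here is a toy Python model of a drive
with a volatile write-back cache. Everything in it (the ToyDrive class, its
method names, the bad_lbas parameter) is hypothetical and exists only for
illustration; it is a sketch of the scenario described above, not of how any
real drive firmware or the Linux block layer behaves.

```python
class ToyDrive:
    """Hypothetical drive with a volatile write-back cache (illustration only)."""

    def __init__(self, bad_lbas=()):
        self.cache = {}                 # acked by the drive, not yet on the media
        self.media = {}                 # what actually reached the media
        self.bad_lbas = set(bad_lbas)   # sectors where the media write fails

    def write(self, lba, data):
        # Ack as soon as the data is in cache; the host's request is now complete.
        self.cache[lba] = data
        return True

    def destage(self):
        # Drive-initiated write-back. Returns the LBAs that failed, but when
        # this runs in the background nobody on the host is waiting for them.
        failed = []
        for lba, data in list(self.cache.items()):
            if lba in self.bad_lbas:
                failed.append(lba)
            else:
                self.media[lba] = data
            del self.cache[lba]
        return failed

    def flush(self):
        # Host-initiated: blocks until the cache is empty, and can report an
        # error only for data that was still cached when the flush arrived.
        return not self.destage()


drive = ToyDrive(bad_lbas={7})
acked = drive.write(7, b"new")     # host sees success
drive.destage()                    # background write-back fails, unreported
flushed = drive.flush()            # cache is already empty, so the flush succeeds
```

In this model both the write and the later flush report success, yet LBA 7
never reaches the media.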

If this is what is being described, there is a non-zero chance of it
happening, but it is extremely infrequent. The checksumming that we have
in btrfs will catch these bad writes when you replay the journal after a
crash (or even when you read data blocks), so I would contend that this is
about as good as we can do.
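
As a rough illustration of why a checksum catches such a lost write, the
Python sketch below keeps the expected checksum apart from the block itself
and verifies it on every read. The names (csum, read_verified,
ChecksumError) and the dictionary layout are assumptions made for the
example, and zlib.crc32 is only a stand-in: btrfs uses crc32c, which is a
different polynomial.

```python
import zlib


class ChecksumError(IOError):
    """Raised when a block's contents do not match its recorded checksum."""


def csum(data: bytes) -> int:
    # Stand-in checksum for the example (plain CRC-32, not btrfs's crc32c).
    return zlib.crc32(data)


def read_verified(media, expected, lba):
    # 'media' maps LBA -> bytes on disk; 'expected' maps LBA -> the checksum
    # recorded elsewhere when the write was issued.
    data = media[lba]
    if csum(data) != expected[lba]:
        raise ChecksumError(f"checksum mismatch at LBA {lba}")
    return data


# The write of b"new contents" was acked but never reached the media,
# so the block still holds whatever was there before.
media = {7: b"old contents", 8: b"intact block"}
expected = {7: csum(b"new contents"), 8: csum(b"intact block")}

try:
    read_verified(media, expected, 7)
    stale_detected = False
except ChecksumError:
    stale_detected = True
```

The stale block is rejected at read time even though no error was ever
reported for the original write; the intact block at LBA 8 reads back
normally.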

Tejun, Chris, does this match your understanding?

Thanks!

Ric