From: Chris Mason <chris.mason@oracle.com>
To: Patrick Goetz <pgoetz@math.utexas.edu>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Atomicity or the ext4 open-write-close-rename debacle
Date: Wed, 08 Apr 2009 12:28:15 -0400
Message-ID: <1239208095.22111.4.camel@think.oraclecorp.com>
In-Reply-To: <49DCC0FD.3010200@math.utexas.edu>

On Wed, 2009-04-08 at 10:21 -0500, Patrick Goetz wrote:
> Hi -
>
> I've been trying to get up to speed on new linux filesystem efforts and
> stumbled upon the following post from a btrfs developer to lwn.net:
>
> -----------------------------------------------------------
> Posted Mar 16, 2009 16:50 UTC (Mon) by masoncl (subscriber, #47138)
> The btrfs data=ordered implementation is different from ext3/4 and
> reiserfs. It decouples data writes from the metadata transaction, and
> simply updates the metadata for file extents after the data blocks are
> on disk.
>
> This means the transaction commit doesn't have to wait for the data
> blocks because the metadata for the file extents always reflects extents
> that are actually on disk.
>
> When you rename one file over another, the destination file is
> atomically replaced with the new file. The new file is fully consistent
> with the data that has already been written, which in the worst case
> means it has a size of zero after a crash.
> ...
> -----------------------------------------------------------
>
> Frankly this comment doesn't make any sense to me at all. First of all,
> "this means the transaction commit doesn't have to wait for the data
> blocks...". Is the data ordered or not? If you commit the transaction
> -- i.e. update the metadata before the data blocks are committed -- then
> the operations are occurring out of order and ext4
> open-write-close-rename mayhem ensues.
>
> Second, atomicity in this context means that when executing a rename,
> you always get either the old data (exactly) or the new data (exactly)
> even after a crash. The "worst case scenario" described above -- a size
> of zero after crash -- precisely violates atomicity.
>
> Any comments on this?

There isn't a quick and short description for this. Before 2.6.30,
btrfs would allow renames to result in zero length files after a crash.
Filesystem developers have always considered the rename-is-atomic
requirement to refer only to the directory entries themselves.

With 2.6.30, extra ordering is added to btrfs, making sure that metadata
and data are both atomically replaced during a rename. In other words,
for renames it will work like ext3 data=ordered mode.

-chris
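
For reference, here is a minimal sketch of the open-write-close-rename
pattern under discussion, written so that it does not depend on the
filesystem ordering data ahead of the rename. The explicit fsync calls
are an assumption about what a careful application would add, not
something the message above requires, and atomic_replace is a
hypothetical helper name. It assumes a POSIX system where a directory
can be opened and fsynced.

```python
import os
import tempfile


def atomic_replace(path: str, data: bytes) -> None:
    """Replace `path` so that a crash leaves either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Force the data blocks to disk before the rename, instead of
            # relying on the filesystem to order them for us.
            os.fsync(tmp.fileno())
        # rename(2) semantics: the directory entry switches atomically.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory so the new entry itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the fsync on the temporary file, the rename can reach disk
before the data does on a filesystem that does not order the two, which
is the zero-length-file case described above. Note that mkstemp creates
the file with mode 0600, so the replacement does not inherit the old
file's permissions.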