public inbox for linux-btrfs@vger.kernel.org
From: Chris Mason <chris.mason@oracle.com>
To: Patrick Goetz <pgoetz@math.utexas.edu>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Atomicity or the ext4 open-write-close-rename debacle
Date: Wed, 08 Apr 2009 12:28:15 -0400
Message-ID: <1239208095.22111.4.camel@think.oraclecorp.com>
In-Reply-To: <49DCC0FD.3010200@math.utexas.edu>

On Wed, 2009-04-08 at 10:21 -0500, Patrick Goetz wrote:
> Hi -
> 
> I've been trying to get up to speed on new linux filesystem efforts and 
> stumbled upon the following post from a btrfs developer to lwn.net:
> 
> -----------------------------------------------------------
> Posted Mar 16, 2009 16:50 UTC (Mon) by masoncl (subscriber, #47138)
> The btrfs data=ordered implementation is different from ext3/4 and 
> reiserfs. It decouples data writes from the metadata transaction, and 
> simply updates the metadata for file extents after the data blocks are 
> on disk.
> 
> This means the transaction commit doesn't have to wait for the data 
> blocks because the metadata for the file extents always reflects extents 
> that are actually on disk.
> 
> When you rename one file over another, the destination file is 
> atomically replaced with the new file. The new file is fully consistent 
> with the data that has already been written, which in the worst case 
> means it has a size of zero after a crash.
> ...
> -----------------------------------------------------------
> 
> Frankly this comment doesn't make any sense to me at all.  First of all, 
> "this means the transaction commit doesn't have to wait for the data 
> blocks...".  Is the data ordered or not? If you commit the transaction 
> -- i.e. update the metadata before the data blocks are committed -- then 
> the operations are occurring out of order and ext4 
> open-write-close-rename mayhem ensues.
> 
> Second, atomicity in this context means that when executing a rename, 
> you always get either the old data (exactly) or the new data (exactly) 
> even after a crash. The "worst case scenario" described above -- a size 
> of zero after crash -- precisely violates atomicity.
> 
> Any comments on this?

There isn't a short answer to this.  Before 2.6.30, btrfs could
leave the destination of a rename as a zero-length file after a crash.
Filesystem developers have always considered the rename-is-atomic
requirement to refer only to the directory entries themselves.
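
Because rename(2) only promises an atomic swap of directory entries, applications that want "exactly the old data or exactly the new data" on any filesystem usually fsync the new file before renaming it over the old one. Below is a minimal sketch of that open-write-fsync-close-rename pattern, assuming a POSIX system. `replace_file()` and the `.tmp` suffix are hypothetical names chosen for illustration. btrfs does not provide them.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: replace the contents of `path` so that after a
 * crash it holds either the complete old data or the complete new data. */
int replace_file(const char *path, const char *data, size_t len)
{
    char tmp[4096], dir[4096];
    int fd, dfd, rc;

    assert(path != NULL && data != NULL);
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp ||
        snprintf(dir, sizeof dir, "%s", path) >= (int)sizeof dir)
        return -1;

    /* open-write: put the new contents in a temporary file. */
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* fsync before close: the data blocks must be on disk before the
     * rename makes the new file visible under the final name. */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0 || rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }

    /* fsync the parent directory so the renamed entry itself is durable.
     * dirname() may modify its argument, hence the copy in `dir`. */
    dfd = open(dirname(dir), O_RDONLY);
    if (dfd < 0)
        return -1;
    rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

Without the fsync() on the temporary file, nothing orders the data writes before the rename. That is the window in which a crash can leave a zero-length file under the final name.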

With 2.6.30, extra ordering is added to btrfs, making sure that metadata
and data are both atomically replaced during a rename.  In other words,
for renames it will work like ext3 data=ordered mode.
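
The difference between the two behaviours can be reduced to a toy model. This is a hypothetical simulation written for illustration, not btrfs code. `struct toy_inode`, `writeback()` and `rename_then_crash()` are invented names. The model assumes the crash happens after the transaction containing the rename commits, but before background writeback of the new file's data has run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only (not btrfs code): the state of one newly written file. */
struct toy_inode {
    const char *cached;  /* dirty data still only in the page cache */
    const char *on_disk; /* data referenced by committed extent metadata; NULL = no extents */
};

/* Data blocks reach the disk first; only then is the file's extent
 * metadata updated to reference them (the ordering described above). */
static void writeback(struct toy_inode *ino)
{
    assert(ino != NULL);
    if (ino->cached != NULL) {
        ino->on_disk = ino->cached;
        ino->cached = NULL;
    }
}

/* Rename `newf` over an existing name, commit the transaction, then
 * "crash" before background writeback runs. Returns the size the
 * destination name has after the crash. */
static size_t rename_then_crash(struct toy_inode *newf, bool flush_on_rename)
{
    if (flush_on_rename)
        writeback(newf); /* extra ordering: data is written before the commit */

    /* The committed rename makes the name point at newf, but newf only
     * owns the extents that were recorded before the crash. */
    return newf->on_disk != NULL ? strlen(newf->on_disk) : 0;
}
```

With `flush_on_rename` false, the destination survives with size zero, which is the pre-2.6.30 outcome. With it true, the commit already carries the new extents, which matches the data=ordered-like rename behaviour described for 2.6.30.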

-chris


