Atomicity or the ext4 open-write-close-rename debacle

public inbox for linux-btrfs@vger.kernel.org

* Atomicity or the ext4 open-write-close-rename debacle
From: Patrick Goetz @ 2009-04-08 15:21 UTC
  To: linux-btrfs

Hi -

I've been trying to get up to speed on new linux filesystem efforts and 
stumbled upon the following post from a btrfs developer to lwn.net:

-----------------------------------------------------------
Posted Mar 16, 2009 16:50 UTC (Mon) by masoncl (subscriber, #47138)
The btrfs data=ordered implementation is different from ext3/4 and 
reiserfs. It decouples data writes from the metadata transaction, and 
simply updates the metadata for file extents after the data blocks are 
on disk.

This means the transaction commit doesn't have to wait for the data 
blocks because the metadata for the file extents always reflects extents 
that are actually on disk.

When you rename one file over another, the destination file is 
atomically replaced with the new file. The new file is fully consistent 
with the data that has already been written, which in the worst case 
means it has a size of zero after a crash.
...
-----------------------------------------------------------

Frankly this comment doesn't make any sense to me at all.  First of all, 
"this means the transaction commit doesn't have to wait for the data 
blocks...".  Is the data ordered or not? If you commit the transaction 
-- i.e. update the metadata before the data blocks are committed -- then 
the operations are occurring out of order and ext4 
open-write-close-rename mayhem ensues.

Second, atomicity in this context means that when executing a rename, 
you always get either the old data (exactly) or the new data (exactly) 
even after a crash. The "worst case scenario" described above -- a size 
of zero after crash -- precisely violates atomicity.
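For reference, the application-side pattern under discussion can be sketched as follows. This is a minimal sketch assuming a POSIX system; the helper name `atomic_replace` and the `.tmp-` prefix are made up for illustration. The explicit fsync before the rename is what makes "old data or new data" hold regardless of how the filesystem orders data and metadata; leaving it out gives the plain open-write-close-rename sequence.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace `path` with `data` (bytes) via write, fsync, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so that the rename
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # Python buffer -> kernel page cache
            os.fsync(tmp.fileno())  # page cache -> disk, before the rename
        os.replace(tmp_path, path)  # rename(2) over the destination
    except BaseException:
        # The rename did not happen, so remove the leftover temporary file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory as well so the new directory entry is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Note that `mkstemp` creates the file with mode 0600, so a real caller may want to copy the permissions of the file being replaced.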

Any comments on this?


* Re: Atomicity or the ext4 open-write-close-rename debacle
From: Chris Mason @ 2009-04-08 16:28 UTC
  To: Patrick Goetz; +Cc: linux-btrfs

On Wed, 2009-04-08 at 10:21 -0500, Patrick Goetz wrote:
> Hi -
> 
> I've been trying to get up to speed on new linux filesystem efforts and 
> stumbled upon the following post from a btrfs developer to lwn.net:
> 
> -----------------------------------------------------------
> Posted Mar 16, 2009 16:50 UTC (Mon) by masoncl (subscriber, #47138)
> The btrfs data=ordered implementation is different from ext3/4 and 
> reiserfs. It decouples data writes from the metadata transaction, and 
> simply updates the metadata for file extents after the data blocks are 
> on disk.
> 
> This means the transaction commit doesn't have to wait for the data 
> blocks because the metadata for the file extents always reflects extents 
> that are actually on disk.
> 
> When you rename one file over another, the destination file is 
> atomically replaced with the new file. The new file is fully consistent 
> with the data that has already been written, which in the worst case 
> means it has a size of zero after a crash.
> ...
> -----------------------------------------------------------
> 
> Frankly this comment doesn't make any sense to me at all.  First of all, 
> "this means the transaction commit doesn't have to wait for the data 
> blocks...".  Is the data ordered or not? If you commit the transaction 
> -- i.e. update the metadata before the data blocks are committed -- then 
> the operations are occurring out of order and ext4 
> open-write-close-rename mayhem ensues.
> 
> Second, atomicity in this context means that when executing a rename, 
> you always get either the old data (exactly) or the new data (exactly) 
> even after a crash. The "worst case scenario" described above -- a size 
> of zero after crash -- precisely violates atomicity.
> 
> Any comments on this?

There isn't a quick and short description for this.  Before 2.6.30,
btrfs would allow renames to result in zero length files after a crash.
Filesystem developers have always considered the rename-is-atomic
requirement to refer only to the directory entries themselves.

With 2.6.30, extra ordering is added to btrfs, making sure that metadata
and data are both atomically replaced during a rename.  In other words,
for renames it will work like ext3 data=ordered mode.

-chris




* Re: Atomicity or the ext4 open-write-close-rename debacle
From: Patrick Goetz @ 2009-04-08 18:28 UTC
  Cc: linux-btrfs

Chris Mason wrote:
> 
> With 2.6.30, extra ordering is added to btrfs, making sure that metadata
> and data are both atomically replaced during a rename.  In other words,
> for renames it will work like ext3 data=ordered mode.
> 

Thanks for the speedy response.

After spending several hours slogging through the discussion on Ted 
Ts'o's blog and spending much more time than anticipated learning about 
FUA, write barriers, fsync vs. fdatasync, how fsync is implemented in 
Linux, etc., I'm curious about the technical details of how this is 
accomplished.  Is there anywhere I can find this, short of reading 
through the source code?


* Re: Atomicity or the ext4 open-write-close-rename debacle
From: Chris Mason @ 2009-04-08 23:12 UTC
  To: Patrick Goetz; +Cc: linux-btrfs

On Wed, 2009-04-08 at 13:28 -0500, Patrick Goetz wrote:
> Chris Mason wrote:
> > 
> > With 2.6.30, extra ordering is added to btrfs, making sure that metadata
> > and data are both atomically replaced during a rename.  In other words,
> > for renames it will work like ext3 data=ordered mode.
> > 
> 
> Thanks for the speedy response.
> 
> After spending several hours slogging through the discussion on Ted 
> Tso's blog and spending much more time than anticipated learning about 
> FUA, write barriers, fsync vs. fdatasync, how fsync is implemented in 
> linux, etc., I'm curious about the technical details of how this is 
> accomplished.  Any place where I can find this short of reading through 
> the source code?

The rename flushing is pretty simple.  When one file replaces another
during rename, btrfs puts the new file into a list of things that must
be flushed before the transaction commits.

This way, we know the data is on disk before the rename metadata changes
are on disk.
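As a rough illustration of that ordering, here is a toy in-memory model. It is not btrfs code: the `ToyFS` class, its fields, and the `replace_then_crash` scenario are all invented for this sketch. It assumes, as in the LWN comment quoted earlier, that a file's on-disk metadata only ever covers data that has reached disk, so unflushed data shows up as a zero-length file after a crash.

```python
class ToyFS:
    """Toy model: disk_* survives a crash, everything else is in memory."""

    def __init__(self, flush_on_rename):
        self.flush_on_rename = flush_on_rename
        self.names = {}        # in-memory directory: name -> inode
        self.cache_data = {}   # inode -> data not yet written back
        self.disk_names = {}   # directory as of the last commit
        self.disk_data = {}    # inode -> data that has reached disk
        self.must_flush = []   # inodes to flush before the next commit
        self.next_inode = 1

    def create(self, name, data):
        inode = self.next_inode
        self.next_inode += 1
        self.names[name] = inode
        self.cache_data[inode] = data  # buffered write, nothing on disk yet

    def writeback(self, inode):
        if inode in self.cache_data:
            self.disk_data[inode] = self.cache_data.pop(inode)

    def rename(self, src, dst):
        replacing = dst in self.names
        inode = self.names.pop(src)
        self.names[dst] = inode
        if replacing and self.flush_on_rename:
            # The new file replaces an existing one: queue its data so it
            # is written before the rename becomes permanent.
            self.must_flush.append(inode)

    def commit(self):
        for inode in self.must_flush:
            self.writeback(inode)
        self.must_flush.clear()
        self.disk_names = dict(self.names)  # metadata goes to disk last

    def after_crash(self):
        # Files whose data never reached disk come back with length zero.
        return {name: self.disk_data.get(inode, b"")
                for name, inode in self.disk_names.items()}


def replace_then_crash(flush_on_rename):
    fs = ToyFS(flush_on_rename)
    fs.create("config", b"old")
    fs.writeback(fs.names["config"])
    fs.commit()                        # old version is safely on disk
    fs.create("config.tmp", b"new")    # new version only in the cache
    fs.rename("config.tmp", "config")
    fs.commit()                        # crash right after this commit
    return fs.after_crash()
```

With `flush_on_rename=False` the scenario leaves `config` empty after the crash, which is the zero-length case; with `flush_on_rename=True` it holds the new contents.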

-chris


