linux-ext4.vger.kernel.org archive mirror
Subject: ext4 file replace guarantees
From: Ryan Lortie
Date: 2013-06-20 21:34 UTC
To: linux-ext4

Hi,

I recently read the kernel documentation on the topic of guarantees
provided by ext4 when renaming-over-existing.  I found this:

(*) == default

auto_da_alloc(*)    Many broken applications don't use fsync() when
noauto_da_alloc     replacing existing files via patterns such as
                    fd = open("foo.new")/write(fd,..)/close(fd)/
                    rename("foo.new", "foo"), or worse yet,
                    fd = open("foo", O_TRUNC)/write(fd,..)/close(fd).
                    If auto_da_alloc is enabled, ext4 will detect
                    the replace-via-rename and replace-via-truncate
                    patterns and force that any delayed allocation
                    blocks are allocated such that at the next
                    journal commit, in the default data=ordered
                    mode, the data blocks of the new file are forced
                    to disk before the rename() operation is
                    committed.  This provides roughly the same level
                    of guarantees as ext3, and avoids the
                    "zero-length" problem that can happen when a
                    system crashes before the delayed allocation
                    blocks are forced to disk.


in https://www.kernel.org/doc/Documentation/filesystems/ext4.txt

which says to me "replace by rename is guaranteed safe in modern ext4,
under default mount options".
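
For reference, here is a minimal sketch of the conservative form of the replace-via-rename pattern quoted above. It calls fsync() explicitly before rename(), so it does not depend on auto_da_alloc. The helper names replace_file() and write_all() and the 0644 mode are illustrative assumptions, not taken from GLib or from the kernel documentation. The sketch does not fsync the containing directory.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: replace `path` with `len` bytes from `buf`,
 * staging the new contents in `tmp_path` (same directory assumed).
 * Returns 0 on success, -1 on failure.
 */
int replace_file(const char *path, const char *tmp_path,
                 const char *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Push the new data to stable storage before it becomes visible. */
    if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Atomically swap the new file into place. */
    if (rename(tmp_path, path) < 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

A caller would use it as replace_file("foo", "foo.new", data, data_len). The question in this mail is whether the fsync() step can be dropped on ext4.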

I understand that this was added after the "ext4 is eating my data"
panic in 2009.

Knowing that ext4 provides this guarantee caused me to modify GLib to
remove the fsync() that we used to do from g_file_set_contents(), if we
detect that we are on ext2/3/4:

  https://git.gnome.org/browse/glib/commit/?id=9d0c17b50102267a5029b58b1f44efbad82d8f03

(we already skipped the fsync() on btrfs since this filesystem
guarantees that replace-by-rename is safe):

"""
What are the crash guarantees of overwrite-by-rename?

Overwriting an existing file using a rename is atomic. That means that
either the old content of the file is there or the new content. A
sequence like this: 
"""

in
https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_overwrite-by-rename.3F

We don't really care too much about ext2 (although it would be great if
there was a convenient API to detect the difference between
ext2/ext3/ext4 filesystems since they all share one magic number).
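
To illustrate the shared magic number, here is a rough sketch of filesystem detection with statfs(2). The sketch_* names are hypothetical and this is not the GLib code. The constants are the values that <linux/magic.h> gives for EXT2/EXT3/EXT4_SUPER_MAGIC (all 0xEF53) and BTRFS_SUPER_MAGIC.

```c
#include <sys/vfs.h>

/* Values as found in <linux/magic.h>; redefined here to keep the sketch standalone. */
#define SKETCH_EXT_SUPER_MAGIC   0xEF53UL      /* shared by ext2, ext3 and ext4 */
#define SKETCH_BTRFS_SUPER_MAGIC 0x9123683EUL

enum sketch_fs_kind {
    SKETCH_FS_ERROR = -1,
    SKETCH_FS_OTHER = 0,
    SKETCH_FS_EXT,      /* ext2, ext3 or ext4: f_type cannot tell them apart */
    SKETCH_FS_BTRFS
};

/* Classify a raw f_type value. */
enum sketch_fs_kind sketch_classify_magic(unsigned long magic)
{
    if (magic == SKETCH_EXT_SUPER_MAGIC)
        return SKETCH_FS_EXT;
    if (magic == SKETCH_BTRFS_SUPER_MAGIC)
        return SKETCH_FS_BTRFS;
    return SKETCH_FS_OTHER;
}

/* Classify the filesystem that holds `path`. */
enum sketch_fs_kind sketch_fs_kind_of(const char *path)
{
    struct statfs st;

    if (statfs(path, &st) != 0)
        return SKETCH_FS_ERROR;
    return sketch_classify_magic((unsigned long)st.f_type);
}
```

Because all three ext variants report the same f_type, a check like this cannot distinguish ext2 from ext4 on its own.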

Anyway... by mistake, this patch (removing fsync on ext4) got backported
into one of our stable releases and landed in Debian and the Fedora 19
beta, where many users started reporting data loss.

So what's the story here?  Is this safe or not?


The _only_ thing that I can think of is that GLib also does an
fallocate() before writing the data.  Does doing fallocate() before
write() void the rename-is-safe guarantees or is this just a filesystem
bug?
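
The sequence being asked about looks roughly like the sketch below: preallocate, write, close, then rename, with no fsync(). This is an illustration only, not the GLib implementation. The function name is hypothetical, and posix_fallocate() is used as a portable stand-in for the Linux fallocate() call mentioned above (an assumption). The sketch takes no position on whether the preallocation affects the rename guarantee.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: stage `len` bytes from `buf` in `tmp_path`,
 * preallocating space first, then rename over `path` WITHOUT fsync().
 * Returns 0 on success, -1 on failure.
 */
int replace_with_prealloc(const char *path, const char *tmp_path,
                          const char *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Preallocation is treated as a hint; its failure is ignored. */
    if (len > 0)
        (void)posix_fallocate(fd, 0, (off_t)len);

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        done += (size_t)n;
    }

    /* No fsync() here: durability is left entirely to the filesystem. */
    if (close(fd) < 0 || rename(tmp_path, path) < 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```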

In any case, we have reverted the patch for now to work around the
issue.

It would be great if I could find out some official word on what the
guaranteed behaviour of the filesystem is with respect to
replace-by-rename.  Trying to dance around these issues is starting to
get a bit annoying...

Thanks in advance.

Thread overview: 20+ messages
2013-06-20 21:34 ext4 file replace guarantees Ryan Lortie
2013-06-21  0:59 ` Theodore Ts'o
2013-06-21 12:43   ` Ryan Lortie
2013-06-21 13:15     ` Theodore Ts'o
2013-06-21 13:51       ` Ryan Lortie
2013-06-21 14:33         ` Theodore Ts'o
2013-06-21 15:24           ` Ryan Lortie
2013-06-21 20:35             ` Theodore Ts'o
2013-06-22  3:29               ` Dave Chinner
2013-06-22  4:47                 ` Theodore Ts'o
2013-06-22 13:40                   ` Sidorov, Andrei
2013-06-22 14:06                     ` Theodore Ts'o
2013-06-22 14:41                       ` Sidorov, Andrei
2013-06-23  1:58                   ` Dave Chinner
2013-06-21 16:25         ` Joseph D. Wagner
2013-06-21 21:05           ` Theodore Ts'o
2013-06-21 21:49             ` Sidorov, Andrei
2013-06-22 12:56               ` Theodore Ts'o
2013-06-22 14:01                 ` Sidorov, Andrei
2013-06-22 14:30                   ` Theodore Ts'o
