From: Theodore Ts'o <tytso@mit.edu>
To: Ryan Lortie <desrt@desrt.ca>
Cc: linux-ext4@vger.kernel.org
Subject: Re: ext4 file replace guarantees
Date: Fri, 21 Jun 2013 09:15:21 -0400
Message-ID: <20130621131521.GE10730@thunk.org>
In-Reply-To: <1371818596.20553.140661246775057.0F7160F3@webmail.messagingengine.com>

On Fri, Jun 21, 2013 at 08:43:16AM -0400, Ryan Lortie wrote:
>
> On Thu, Jun 20, 2013, at 20:59, Theodore Ts'o wrote:
> > It's not _guaranteed_ safe. It significantly reduces the chances of
> > data loss in case of a crash, but it's possible for the transaction
> > containing the rename to close before the blocks are written back.
>
> I think you need to update the documentation. Specifically, this:
>
> "avoids the "zero-length" problem that can happen
> when a system crashes before the delayed allocation
> blocks are forced to disk"
>
> makes it sound like the replace-by-rename-without-fsync problem has been
> solved. This is the statement that caused me to remove the extra
> fsync() we had in GLib.

I agree it can be read that way, although we were very careful to
avoid the word "guarantee".

> Note that "significantly reduces the chances of" is not good enough to
> prevent about a dozen reports of lost data that I alone have heard about
> in the past few days...
>
> https://bugzilla.gnome.org/show_bug.cgi?id=701560#c30

So in at least a few of these bugs, people are failing to fsync()
after creating a new file.  The implied flush *only* happens in two
cases:

#1) If an existing file was opened using O_TRUNC, an implied flush is
    sent after the file is closed.

#2) If an existing file is replaced via a rename, any delayed
    allocation blocks in the new file will be flushed out.

One of the test cases created lots of new files, and that's not one of
the two cases shown above.

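For illustration, here is a minimal sketch of a replace-by-rename that does not rely on either implied flush: it calls fsync() on the new file explicitly before the rename. The function name replace_file_durably is hypothetical, the code assumes a POSIX system, and the final fsync() of the containing directory is an extra precaution that this message does not discuss.

```python
import os
import tempfile


def replace_file_durably(path, data):
    """Replace `path` with `data` (bytes), calling fsync() before the rename.

    Hypothetical helper for illustration only; assumes a POSIX system.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename stays
    # within one file system. Note that mkstemp() creates it with mode 0600,
    # which may differ from the permissions of the file being replaced.
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's userspace buffer to the kernel
            os.fsync(f.fileno())   # ask the kernel to write the data blocks out
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Assumption beyond the message: also fsync() the directory so that the
    # rename itself has reached stable storage.
    dir_fd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Because the explicit fsync() happens before the rename, this also covers the case of creating a brand-new file, which neither of the two implied-flush cases above handles.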
> It would be great if the docs would just say "If you want safety with
> ext4, it's going to be slow. Please always call fsync()." instead of
> making it sound like I'll probably be mostly OKish if I don't.

This is going to be true for all file systems.  If application writers
are trying to surf the boundaries of what is safe or not, inevitably
they will eventually run into problems.

Also, although fsync() is not free, it's not the performance disaster
it was in ext3.  Precisely because of delayed allocation, we don't
have to flush out pending writes of unrelated files when you do a
fsync().

Finally, I strongly encourage you to think very carefully about your
strategy for storing this sort of registry data.  Even if it is
"safe" for btrfs, if the desktop applications are constantly writing
back files for no good reason, it's going to burn battery, SSD write
cycles, etc.  And if said registry files are large XML files that
have to be completely rewritten every single time an application
modifies an element or two (and said application is doing this several
times a second), the user is going to have a bad time, in shortened
battery and SSD life if nothing else.  And this is not a problem a
file system can protect you against, since while we can try to be more
clever about how we manage the metadata, the data blocks still have to
be written to disk.
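
One application-side way to avoid writing files back "for no good reason" is to skip the write when the contents have not changed. The sketch below is only an illustration of that idea, not something this message prescribes; the function name write_if_changed and the ".tmp" suffix are hypothetical, and it does nothing about the cost of rewriting a large file whose contents really did change.

```python
import os


def write_if_changed(path, data):
    """Write `data` (bytes) to `path` only if it differs from what is there.

    Returns True if the file was written, False if the write was skipped.
    Hypothetical helper for illustration only.
    """
    try:
        with open(path, "rb") as f:
            if f.read() == data:
                return False  # nothing changed: no data blocks to write
    except FileNotFoundError:
        pass  # no existing file, so fall through and create it

    tmp_path = path + ".tmp"  # assumed naming scheme for the temporary file
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # flush explicitly rather than relying on rename
    os.replace(tmp_path, path)
    return True
```

An application that saves several times a second could additionally batch its changes and save at most once per interval, which reduces writes further at the cost of a wider window of unsaved state.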

The fact that you are trying to optimize out the fsync() makes me
wonder if there is something fundamentally flawed in the design of
either the application or its underlying libraries....

						- Ted