From: "Theodore Ts'o" <tytso@mit.edu>
To: Sahitya Tummala <stummala@codeaurora.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>, linux-ext4@vger.kernel.org
Subject: Re: fsync_mode mount option for ext4
Date: Wed, 29 May 2019 01:23:32 -0400
Message-ID: <20190529052332.GB6210@mit.edu>
In-Reply-To: <20190529040757.GI10043@codeaurora.org>

On Wed, May 29, 2019 at 09:37:58AM +0530, Sahitya Tummala wrote:
>
> Here is what I think on these mount options. Please correct me if my
> understanding is wrong.
>
> The nobarrier mount option poses risk even if there is a battery
> protection against sudden power down, as it doesn't guarantee the ordering
> of important data such as journal writes on the disk. On the storage
> devices with internal cache, if the cache flush policy is out-of-order,
> then the places where FS is trying to enforce barriers will be at risk,
> causing FS to be inconsistent.
If you have protection against sudden shutdown, then nobarrier is
perfectly safe --- which is to say, if it is guaranteed that any
writes sent to the device will be persisted after a crash, then
nobarrier is perfectly safe. So for example, if you are using ext4
connected to a million dollar EMC Storage Array, which has battery
backup, using nobarrier is perfectly safe.

That's because we still send writes to the device in an appropriate
order in nobarrier mode --- in particular, we send the journal updates
to the device in order. The cache flush policy on the HDD may be
out-of-order, but so long as all of the writes make it out to
persistent store in the end, it'll be fine.
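A toy model may make the ordering argument concrete. The sketch below
is an illustration only, not ext4 or jbd2 code: the block names and the
persist() and journal_is_consistent() helpers are hypothetical. It
enumerates every order in which a device cache could write back three
journal blocks, once assuming the cache always drains completely (power
fail protection) and once assuming power is lost after a single block
reaches the media.

```python
from itertools import permutations

# Toy journal transaction, in the order the file system submits it.
# The block names are hypothetical; real jbd2 transactions look different.
SUBMITTED = [
    ("journal_data_1", "A"),
    ("journal_data_2", "B"),
    ("journal_commit", "C"),
]


def persist(writeback_order, power_fails_after=None):
    """Return the blocks that reach stable media.

    writeback_order: the order in which the device drains its cache.
    power_fails_after: number of blocks written before power is lost,
    or None if the device is protected and always drains completely.
    """
    if power_fails_after is None:
        survivors = writeback_order
    else:
        survivors = writeback_order[:power_fails_after]
    return dict(survivors)


def journal_is_consistent(media):
    """A commit block is only valid if all of its journal blocks landed."""
    if "journal_commit" not in media:
        return True  # the transaction is simply ignored on replay
    return "journal_data_1" in media and "journal_data_2" in media


orders = list(permutations(SUBMITTED))

# Power fail protection: every order ends in the same, consistent state.
protected_ok = all(journal_is_consistent(persist(o)) for o in orders)

# No protection and no CACHE FLUSH: some order persists only the commit block.
unprotected_ok = all(
    journal_is_consistent(persist(o, power_fails_after=1)) for o in orders
)

print(protected_ok, unprotected_ok)
```

In this model the write-back order never matters when the cache is
guaranteed to drain, and it matters as soon as that guarantee is
removed, because the commit block can reach the media before the blocks
it describes.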
> But whereas with fsync_mode=nobarrier, FS is not trying to enforce
> any ordering of data on the disk except to ensure the data is flushed
> from the internal cache to non-volatile memory. Thus, I see this
> fsync_mode=nobarrier is much better than a general nobarrier. And it
> provides better performance too as with nobarrier but without
> compromising much on FS consistency.
"without compromising much on FS consistency" doesn't have any meaning.
If you care about FS consistency, and you don't have power fail
protection, then at least for ext4, you *must* send a CACHE FLUSH
any time you modify any file system metadata block --- and that's
true for 99% of all fsync(2)'s.

I suppose you could do something where, if there are times when no
metadata updates are necessary, but just data block writes, the CACHE
FLUSH could be suppressed. But (a) this won't actually provide much
of a performance improvement for the vast majority of workloads,
especially on an Android system, and (b) you're making a value
judgement that FS consistency is more important than application data
consistency.
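For readers unsure what a "data block only" write looks like from user
space, here is a minimal sketch, assuming Linux and Python's os module.
The write_kind() helper is hypothetical and only compares offsets with
the current file size; it says nothing about what ext4 actually sends
to the device, and it ignores other metadata such as timestamps, holes,
and unwritten extents.

```python
import os
import tempfile


def write_kind(fd, offset, length):
    """Classify a write by whether it would change the file size.

    Returns "extends" if the write grows the file (the inode size must be
    updated and blocks may need to be allocated), else "overwrite".
    """
    size = os.fstat(fd).st_size
    return "extends" if offset + length > size else "overwrite"


with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    os.pwrite(fd, b"x" * 4096, 0)
    os.fsync(fd)  # metadata changed: new size, newly allocated block

    in_place = write_kind(fd, 0, 4096)     # rewrite the same 4 KiB
    appended = write_kind(fd, 4096, 4096)  # grow the file by 4 KiB

    os.pwrite(fd, b"y" * 4096, 0)
    os.fdatasync(fd)  # data integrity sync; timestamps need not be flushed

print(in_place, appended)
```

Only the in-place rewrite is a candidate for the data-only case
described above, and most real workloads that call fsync(2) are
appending or creating files, which is the point of (a).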
You didn't answer my question directly --- exactly what goal are you
trying to achieve, and what assumptions are you willing to make? If
you have power fail protection (this might require making some
adjustments to the EC), then you can use nobarrier and just not worry
about it.

If you don't have power fail protection, and you care about FS
consistency, then you pretty much have to leave the CACHE FLUSH
commands in.

If the problem is that some applications are fsync-happy, then I'd
suggest fixing the applications. Or if you really don't care about
the applications working correctly or users suffering application data
loss after a crash, you could hack in a mode, so that for non-root
users, or maybe certain specific users, fsync is turned into a no-op,
or a background, asynchronous (non-integrity) writeback.
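One common way to fix an fsync-happy application is to batch records
and sync once per batch instead of once per record. The sketch below is
a minimal illustration under that assumption, not code from any
particular application: write_records() and its sync_every parameter
are hypothetical names, and the right batch size depends on how much
recent data the application can afford to lose after a crash.

```python
import os
import tempfile


def write_records(path, records, sync_every):
    """Append records, calling fsync once per `sync_every` records.

    Returns the number of fsync calls made. Short writes are not handled,
    which is acceptable for this small example on a regular file.
    """
    syncs = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for i, rec in enumerate(records, start=1):
            os.write(fd, rec)
            if i % sync_every == 0:
                os.fsync(fd)
                syncs += 1
        if len(records) % sync_every != 0:
            os.fsync(fd)  # make the final partial batch durable
            syncs += 1
    finally:
        os.close(fd)
    return syncs


records = [b"record %d\n" % i for i in range(100)]

with tempfile.TemporaryDirectory() as d:
    per_record = write_records(os.path.join(d, "a.log"), records, sync_every=1)
    batched = write_records(os.path.join(d, "b.log"), records, sync_every=25)

print(per_record, batched)
```

Every fsync still gets its full integrity guarantee, CACHE FLUSH
included; the application simply asks for it less often, and accepts
that records written since the last fsync may be lost after a crash.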
Are you trying to hit some benchmark target? I'm really confused why
you would want to be so cavalier with application data safety.

- Ted