From: Sahitya Tummala <stummala@codeaurora.org>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>,
linux-ext4@vger.kernel.org, stummala@codeaurora.org
Subject: Re: fsync_mode mount option for ext4
Date: Wed, 29 May 2019 16:18:09 +0530
Message-ID: <20190529104809.GJ10043@codeaurora.org>
In-Reply-To: <20190529052332.GB6210@mit.edu>
Hi Ted,
On Wed, May 29, 2019 at 01:23:32AM -0400, Theodore Ts'o wrote:
> On Wed, May 29, 2019 at 09:37:58AM +0530, Sahitya Tummala wrote:
> >
> > Here is what I think on these mount options. Please correct me if my
> > understanding is wrong.
> >
> > The nobarrier mount option poses risk even if there is a battery
> > protection against sudden power down, as it doesn't guarantee the ordering
> > of important data such as journal writes on the disk. On the storage
> > devices with internal cache, if the cache flush policy is out-of-order,
> > then the places where FS is trying to enforce barriers will be at risk,
> > causing FS to be inconsistent.
>
> If you have protection against sudden shutdown, then nobarrier is
> perfectly safe --- which is to say, if it is guaranteed that any
> writes sent to device will be persisted after a crash, then nobarrier
> is perfectly safe. So for example, if you are using ext4 connected to
> a million dollar EMC Storage Array, which has battery backup, using
> nobarrier is perfectly safe.
>
> That's because we still send writes to the device in an appropriate
> order in nobarrier mode --- in particular, we send the journal updates
> to the device in order. The cache flush policy on the HDD is
> out-of-order, but so long as they all make it out to persistent store
> in the end, it'll be fine.
>
Got it.
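
The distinction above can be illustrated with a toy model. This is a
minimal sketch under stated assumptions, not jbd2 or real device
behaviour: ToyDisk, commit_transaction, is_consistent and
crash_after_commit are hypothetical names, the power cut is assumed to
hit right after the commit record is submitted, and the 50% chance that
a cached write survives an unprotected power cut is arbitrary.

```python
import random


class ToyDisk:
    """Hypothetical device model: writes land in a volatile cache first."""

    def __init__(self, power_fail_protected, seed=0):
        self.power_fail_protected = power_fail_protected
        self.cache = []          # writes accepted but not yet persistent
        self.persistent = set()  # writes that survive a power cut
        self.rng = random.Random(seed)

    def write(self, block):
        self.cache.append(block)

    def flush(self):
        # CACHE FLUSH: everything accepted so far becomes persistent.
        self.persistent.update(self.cache)
        self.cache.clear()

    def power_cut(self):
        if self.power_fail_protected:
            self.flush()  # assumption: battery backup lets the cache drain
        else:
            # Out-of-order cache: an arbitrary subset reached the media.
            survivors = [b for b in self.cache if self.rng.random() < 0.5]
            self.persistent.update(survivors)
            self.cache.clear()
        return self.persistent


JOURNAL_BLOCKS = ("journal-1", "journal-2", "journal-3")


def commit_transaction(disk, barrier):
    # The submission order is the same with or without barriers.
    for block in JOURNAL_BLOCKS:
        disk.write(block)
    if barrier:
        disk.flush()  # journal blocks are durable before the commit record
    disk.write("commit")


def is_consistent(persistent):
    # A persisted commit record is only valid if all journal blocks persisted.
    return "commit" not in persistent or set(JOURNAL_BLOCKS) <= persistent


def crash_after_commit(barrier, power_fail_protected, seed):
    disk = ToyDisk(power_fail_protected, seed)
    commit_transaction(disk, barrier)
    return disk.power_cut()


if __name__ == "__main__":
    cases = (("barrier, unprotected", True, False),
             ("nobarrier, protected", False, True),
             ("nobarrier, unprotected", False, False))
    for label, barrier, protected in cases:
        bad = sum(not is_consistent(crash_after_commit(barrier, protected, s))
                  for s in range(100))
        print(f"{label}: {bad} inconsistent outcomes out of 100")
```

In this model only the "nobarrier, unprotected" case can leave a commit
record on the media without the journal blocks it covers.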
> > With fsync_mode=nobarrier, on the other hand, the FS is not trying to
> > enforce any ordering of data on the disk except to ensure the data is
> > flushed from the internal cache to non-volatile memory. Thus, I see
> > fsync_mode=nobarrier as much better than a general nobarrier: it
> > provides better performance, as nobarrier does, but without
> > compromising much on FS consistency.
>
> "without compromising much on FS consistency" doesn't have any meaning.
> If you care about FS consistency, and you don't have power fail
> protection, then at least for ext4, you *must* send a CACHE FLUSH
> after any time that you modify any file system metadata block --- and
> that's true for 99% of all fsync(2)'s.
>
> I suppose you could do something where if there are times when no
> metadata updates are necessary, but just data block writes, the CACHE
> FLUSH could be suppressed. But (a) this won't actually provide much
> performance improvement for the vast majority of workloads,
> especially on an Android system, and (b) you're making a value
> judgement that FS consistency is more important than application data
> consistency.
>
>
> You didn't answer my question directly --- exactly what is the goal
> that you are trying to achieve, and what assumptions are you willing
> to make? If you have power fail protection (this might require making
> some adjustments to the EC), then you can use nobarrier and just not
> worry about it.
>
> If you don't have power fail protection, and you care about FS
> consistency, then you pretty much have to leave the CACHE FLUSH
> commands in.
>
> If the problem is that some applications are fsync-happy, then I'd
> suggest fixing the applications. Or if you really don't care about
> the applications working correctly or users suffering application data
> loss after a crash, you could hack in a mode, so that for non-root
> users, or maybe certain specific users, fsync is turned into a no-op,
> or a background, asynchronous (non-integrity) writeback.
>
> Are you trying to hit some benchmark target? I'm really confused why
> you would want to be so cavalier with application data safety.
>
Yes, benchmarks for random write/fsync show a huge improvement.
For example, without issuing a flush in the ext4 fsync(), the
random write score improves from 13 MB/s to 62 MB/s on eMMC,
using Androbench.
fsync_mode=nobarrier is also enabled by default on Pixel phones,
where f2fs is used:
https://android.googlesource.com/device/google/crosshatch/+/e02e4813256e51bacdecb93ffd8340f6efbe68e0
We have been getting requests to evaluate the same for ext4, and
hence I was checking with the community on its feasibility.
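
The random write/fsync access pattern behind such a score can be
sketched in user space as below. This is not Androbench and it cannot
reproduce the in-kernel change (skipping only the device flush):
do_fsync=False skips the whole fsync(), so it only measures page cache
writes and gives a loose upper bound. The function name, file size,
block size and write count are illustrative assumptions.

```python
import os
import random
import tempfile
import time


def random_write_throughput(path, file_size=4 * 1024 * 1024, block_size=4096,
                            writes=256, do_fsync=True, seed=0):
    """Return MB/s for `writes` random block-aligned writes to `path`."""
    rng = random.Random(seed)
    payload = b"\xa5" * block_size
    blocks = file_size // block_size
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)
        start = time.perf_counter()
        for _ in range(writes):
            offset = rng.randrange(blocks) * block_size
            os.pwrite(fd, payload, offset)
            if do_fsync:
                # Asks the kernel to make this file's data durable; this is
                # where a file system would normally issue a CACHE FLUSH.
                os.fsync(fd)
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
    return (writes * block_size) / elapsed / 1e6


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "bench.dat")
        for flag in (True, False):
            rate = random_write_throughput(target, do_fsync=flag)
            print(f"fsync={flag}: {rate:.1f} MB/s")
```

The temporary directory may sit on tmpfs, where fsync() costs almost
nothing, so for a meaningful number the target path should be on the
file system and device under test.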
Thanks,
Sahitya.
> - Ted