linux-ext4.vger.kernel.org archive mirror
From: Theodore Ts'o <tytso@mit.edu>
To: Dave Chinner <david@fromorbit.com>
Cc: Lucas Nussbaum <lucas.nussbaum@loria.fr>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Emmanuel Jeanvoine <emmanuel.jeanvoine@inria.fr>,
	Dmitry Monakhov <dmonakhov@openvz.org>
Subject: Re: [PATCH, RFC] fs: only call sync_filesystem() when remounting read-only
Date: Wed, 12 Mar 2014 23:14:40 -0400
Message-ID: <20140313031440.GA13367@thunk.org>
In-Reply-To: <20140313011629.GA2796@thunk.org>

On Wed, Mar 12, 2014 at 09:16:29PM -0400, Theodore Ts'o wrote:
> > IMO, I think that you should be looking to fix ext4 syncfs issues,
> > not changing the VFS behaviour that might cause subtle and unnoticed
> > problems for other filesystems. We should not be moving data
> > integrity operations without first auditing all of the filesystem
> > remount operations for issues.
> 
> The issue is that it's forcing a CACHE FLUSH if we don't need to force
> a journal commit, since it's possible that data writes could have been
> sent to the disk without modifying fs metadata that would require a
> commit.  So arguably what we're doing with ext4 is _correct_, whereas
> with ext3 we would simply not be calling blkdev_issue_barrier() in that
> situation.

Doing some more digging, ext4 is currently interpreting syncfs() as
requiring a data integrity sync.  So we go through some extra work to
guarantee that we call blkdev_issue_barrier(), even if a journal
commit is not required.

This change was made by Dmitry last June:

commit 06a407f13daf9e48f0ef7189c7e54082b53940c7
Author: Dmitry Monakhov <dmonakhov@openvz.org>
Date:   Wed Jun 12 22:25:07 2013 -0400

    ext4: fix data integrity for ext4_sync_fs
    
    Inode's data or non journaled quota may be written w/o journal so we
    _must_ send a barrier at the end of ext4_sync_fs. But it can be
    skipped if journal commit will do it for us.
    
    Also fix data integrity for nojournal mode.
    
    Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Both ext3 and xfs do *not* do this.
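
To make the rule in that commit message concrete, here is a rough
userspace model of the decision, not the kernel code itself.  The
struct, its fields and the function name are all hypothetical; in
particular the "wait" and "barriers_enabled" conditions are my
assumptions about when an integrity guarantee is wanted at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state a sync_fs implementation consults.
 * None of these names come from the kernel. */
struct sync_model {
	bool wait;              /* assumption: caller wants a blocking, integrity sync */
	bool barriers_enabled;  /* assumption: cache flushes are enabled for this mount */
	bool commit_will_flush; /* a pending journal commit will flush the cache itself */
};

/* Returns true when an explicit cache flush has to be issued at the end
 * of the sync: data may have reached the disk without any journal
 * commit, and no commit is going to flush the cache on our behalf. */
bool needs_explicit_flush(const struct sync_model *m)
{
	assert(m != NULL);
	return m->wait && m->barriers_enabled && !m->commit_will_flush;
}
```

In this model the expensive case is exactly the one seen in the bug
report: nothing to commit, so the flush is issued explicitly every time.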

Looking more closely at the syncfs(2) manpage, it's not clear it
requires this:

       sync() causes all buffered modifications to file metadata and
       data to be written to the underlying filesystems.

       syncfs() is like sync(), but synchronizes just the filesystem
       containing file referred to by the open file descriptor fd.

Unlike the fsync(2) system call, it does *not* state that the data
flushed to the disk is guaranteed to be there after a crash, which I
suppose justifies ext3 and xfs's current behavior.
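
For reference, this is roughly how the two calls look from userspace.
It is only a sketch: the helper names are made up for illustration,
and the code says nothing about what a given filesystem does
underneath.  syncfs() needs _GNU_SOURCE with glibc.

```c
#define _GNU_SOURCE   /* syncfs() is a Linux-specific call exposed by glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write back the whole filesystem containing
 * `path`.  Returns 0 on success, -1 with errno set on failure. */
int sync_containing_fs(const char *path)
{
	assert(path != NULL);
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	int ret = syncfs(fd);
	int saved = errno;      /* keep syncfs()'s errno across close() */
	close(fd);
	errno = saved;
	return ret;
}

/* Hypothetical helper: flush a single file with fsync(2), the call
 * whose manpage does promise the data survives a crash. */
int sync_one_file(const char *path)
{
	assert(path != NULL);
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	int ret = fsync(fd);
	int saved = errno;      /* keep fsync()'s errno across close() */
	close(fd);
	errno = saved;
	return ret;
}
```

An application that needs a crash guarantee for specific files would
use the second helper; the first only asks for writeback of the whole
filesystem.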

So the way I see it, we have three choices.

1) Nowhere in the remount system call is it stated that it has
   ***any*** data integrity implications.  If you are making the rw->ro
   transition, sure, you'll need to flush out any pending changes.  But
   there doesn't seem to be any justification for requiring this if the
   remount is a no-op.  So I think changing the remount code path as I
   suggested is a valid option.

2) We could revert Dmitry's change from last June.  This would make
   ext4 work the same way as ext3 and xfs.  Which I think is also
   valid, since the syncfs(2) system call says nothing about
   guaranteeing data being preserved after a crash, unlike fsync(2).

3) We could declare a workload that calls thousands of no-op remounts
   to be stupid/busted/silly, and not do anything at all.
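
As an illustration of option 1, the check could be modeled like this.
This is one reading of "only sync on the rw->ro transition", written
as a standalone userspace sketch; MODEL_RDONLY and both function names
are hypothetical stand-ins, not the actual VFS code or flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's read-only mount flag. */
#define MODEL_RDONLY 0x1u

/* Returns true only for the rw->ro transition, the one case where
 * pending changes have to be flushed before the remount completes. */
bool remount_needs_sync(unsigned int cur_flags, unsigned int new_flags)
{
	bool was_ro = (cur_flags & MODEL_RDONLY) != 0;
	bool will_be_ro = (new_flags & MODEL_RDONLY) != 0;

	return !was_ro && will_be_ro;
}

/* Small usage example covering the four possible transitions. */
void remount_model_selfcheck(void)
{
	assert(remount_needs_sync(0, MODEL_RDONLY));             /* rw -> ro */
	assert(!remount_needs_sync(0, 0));                       /* rw -> rw, no-op */
	assert(!remount_needs_sync(MODEL_RDONLY, MODEL_RDONLY)); /* ro -> ro, no-op */
	assert(!remount_needs_sync(MODEL_RDONLY, 0));            /* ro -> rw */
}
```

With a check like this, the thousands of no-op remounts in the
reported workload would skip the sync entirely.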


#1 requires core VFS changes, and Dave seems unhappy with it.

#2 requires rolling back an ext4 change, and I wonder if Dmitry had a
situation where he really needed syncfs(2) to have data integrity
guarantees.

#3 is the default if we can't come to consensus over what else to do.

Cheers,

						- Ted

