[PATCH] Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"

linux-ext4.vger.kernel.org archive mirror

* [PATCH] Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
@ 2019-02-01  4:42 Theodore Ts'o
  2019-02-01 21:21 ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Theodore Ts'o @ 2019-02-01  4:42 UTC
  To: Ext4 Developers List; +Cc: jack, Theodore Ts'o

This reverts commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a.

As Jan Kara pointed out, this change was unsafe since it means we lose
the call to sync_mapping_buffers() in the nojournal case.  The
original point of the commit was to avoid taking the inode mutex (since
it causes a lockdep warning in generic/113); but we need the mutex in
order to call sync_mapping_buffers().

The real fix to this problem was discussed here:

https://lore.kernel.org/lkml/20181025150540.259281-4-bvanassche@acm.org

The proposed patch was to fix a syzbot complaint, but the problem can
also be demonstrated via "kvm-xfstests -c nojournal generic/113".
Multiple solutions were discussed in the e-mail thread, but none have
landed in the kernel as of this writing.  Anyway, commit
ad211f3e94b314 is absolutely the wrong way to suppress the lockdep
warning, so revert it.

Fixes: ad211f3e94b314a910d4af03178a0b52a7d1ee0a ("ext4: use ext4_write_inode() when fsyncing w/o a journal")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/fsync.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 712f00995390..5508baa11bb6 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -116,16 +116,8 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		goto out;
 	}

-	ret = file_write_and_wait_range(file, start, end);
-	if (ret)
-		return ret;
-
 	if (!journal) {
-		struct writeback_control wbc = {
-			.sync_mode = WB_SYNC_ALL
-		};
-
-		ret = ext4_write_inode(inode, &wbc);
+		ret = __generic_file_fsync(file, start, end, datasync);
 		if (!ret)
 			ret = ext4_sync_parent(inode);
 		if (test_opt(inode->i_sb, BARRIER))
@@ -133,6 +125,9 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		goto out;
 	}

+	ret = file_write_and_wait_range(file, start, end);
+	if (ret)
+		return ret;
 	/*
 	 * data=writeback,ordered:
 	 *  The caller's filemap_fdatawrite()/wait will sync the data.
-- 
2.19.1

* Re: [PATCH] Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
  2019-02-01  4:42 [PATCH] Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal" Theodore Ts'o
@ 2019-02-01 21:21 ` Jan Kara
  2019-02-02  4:08   ` Theodore Y. Ts'o
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Kara @ 2019-02-01 21:21 UTC
  To: Theodore Ts'o; +Cc: Ext4 Developers List, jack

On Thu 31-01-19 23:42:19, Theodore Ts'o wrote:
> This reverts commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a.
> 
> As Jan Kara pointed out, this change was unsafe since it means we lose
> the call to sync_mapping_buffers() in the nojournal case.  The
> original point of the commit was to avoid taking the inode mutex (since
> it causes a lockdep warning in generic/113); but we need the mutex in
> order to call sync_mapping_buffers().

Actually, I don't think sync_mapping_buffers() needs inode mutex (i_rwsem
these days). It uses blkdev_mapping->private_lock for synchronization of
operations on the list of buffers and fsync_buffers_list() seems to be
pretty careful about races with mark_buffer_dirty_inode(). So why do you
think we need i_rwsem?
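
The locking pattern described above can be illustrated with a small
userspace sketch. It is a toy model only, not the kernel implementation:
every toy_* name is hypothetical, the spinlock stands in for
private_lock, toy_mark_dirty() loosely plays the role of
mark_buffer_dirty_inode(), and toy_sync_buffers() the role of the sync
path. The point is that the list and the dirty flag are only ever touched
under the per-mapping lock, so no outer inode lock is needed to keep the
list itself coherent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer associated with an inode. */
struct toy_buffer {
	struct toy_buffer *next;
	bool dirty;
	bool queued;	/* true while linked on the mapping's list */
	int writes;	/* number of times the buffer was "written out" */
};

/* Hypothetical stand-in for the mapping that owns the lock and list. */
struct toy_mapping {
	atomic_flag private_lock;
	struct toy_buffer *list;
};

static void toy_mapping_init(struct toy_mapping *m)
{
	atomic_flag_clear(&m->private_lock);
	m->list = NULL;
}

static void toy_lock(struct toy_mapping *m)
{
	while (atomic_flag_test_and_set_explicit(&m->private_lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
}

static void toy_unlock(struct toy_mapping *m)
{
	atomic_flag_clear_explicit(&m->private_lock, memory_order_release);
}

/* Mark a buffer dirty and queue it once; all state changes under the lock. */
static void toy_mark_dirty(struct toy_mapping *m, struct toy_buffer *b)
{
	assert(m != NULL && b != NULL);
	toy_lock(m);
	b->dirty = true;
	if (!b->queued) {
		b->next = m->list;
		m->list = b;
		b->queued = true;
	}
	toy_unlock(m);
}

/*
 * Drain the list one buffer at a time. Each pop happens under the lock,
 * so a concurrent toy_mark_dirty() either queues the buffer before the
 * pop or re-queues it afterwards. Returns the number of buffers written.
 */
static int toy_sync_buffers(struct toy_mapping *m)
{
	int written = 0;

	assert(m != NULL);
	for (;;) {
		struct toy_buffer *b;

		toy_lock(m);
		b = m->list;
		if (b == NULL) {
			toy_unlock(m);
			break;
		}
		m->list = b->next;
		b->next = NULL;
		b->queued = false;
		if (b->dirty) {
			b->dirty = false;
			b->writes++;	/* accounting only; real I/O would be issued after unlocking */
			written++;
		}
		toy_unlock(m);
	}
	return written;
}
```

Marking the same buffer dirty twice before a sync queues it only once,
and a buffer dirtied after a sync is picked up by the next one.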

> The real fix to this problem was discussed here:
> 
> https://lore.kernel.org/lkml/20181025150540.259281-4-bvanassche@acm.org
> 
> The proposed patch was to fix a syzbot complaint, but the problem can
> also be demonstrated via "kvm-xfstests -c nojournal generic/113".
> Multiple solutions were discussed in the e-mail thread, but none have
> landed in the kernel as of this writing.  Anyway, commit
> ad211f3e94b314 is absolutely the wrong way to suppress the lockdep, so
> revert it.
> 
> Fixes: ad211f3e94b314a910d4af03178a0b52a7d1ee0a ("ext4: use ext4_write_inode() when fsyncing w/o a journal")
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Reported-by: Jan Kara <jack@suse.cz>

So if you decide to take the safe route and revert the change, I'm fine
with that, so feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/fsync.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index 712f00995390..5508baa11bb6 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -116,16 +116,8 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  		goto out;
>  	}
>  
> -	ret = file_write_and_wait_range(file, start, end);
> -	if (ret)
> -		return ret;
> -
>  	if (!journal) {
> -		struct writeback_control wbc = {
> -			.sync_mode = WB_SYNC_ALL
> -		};
> -
> -		ret = ext4_write_inode(inode, &wbc);
> +		ret = __generic_file_fsync(file, start, end, datasync);
>  		if (!ret)
>  			ret = ext4_sync_parent(inode);
>  		if (test_opt(inode->i_sb, BARRIER))
> @@ -133,6 +125,9 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  		goto out;
>  	}
>  
> +	ret = file_write_and_wait_range(file, start, end);
> +	if (ret)
> +		return ret;
>  	/*
>  	 * data=writeback,ordered:
>  	 *  The caller's filemap_fdatawrite()/wait will sync the data.
> -- 
> 2.19.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH] Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
  2019-02-01 21:21 ` Jan Kara
@ 2019-02-02  4:08   ` Theodore Y. Ts'o
  2019-02-04  9:45     ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Theodore Y. Ts'o @ 2019-02-02  4:08 UTC
  To: Jan Kara; +Cc: Ext4 Developers List

On Fri, Feb 01, 2019 at 10:21:20PM +0100, Jan Kara wrote:
> On Thu 31-01-19 23:42:19, Theodore Ts'o wrote:
> > This reverts commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a.
> > 
> > As Jan Kara pointed out, this change was unsafe since it means we lose
> > the call to sync_mapping_buffers() in the nojournal case.  The
> > original point of the commit was to avoid taking the inode mutex (since
> > it causes a lockdep warning in generic/113); but we need the mutex in
> > order to call sync_mapping_buffers().
> 
> Actually, I don't think sync_mapping_buffers() needs inode mutex (i_rwsem
> these days). It uses blkdev_mapping->private_lock for synchronization of
> operations on the list of buffers and fsync_buffers_list() seems to be
> pretty careful about races with mark_buffer_dirty_inode(). So why do you
> think we need i_rwsem?

Hmm, I think you're right.  I wonder if we can therefore remove the
inode_lock() in __generic_file_fsync() then...   What do you think?

						- Ted

* Re: [PATCH] Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
  2019-02-02  4:08   ` Theodore Y. Ts'o
@ 2019-02-04  9:45     ` Jan Kara
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Kara @ 2019-02-04  9:45 UTC
  To: Theodore Y. Ts'o; +Cc: Jan Kara, Ext4 Developers List

On Fri 01-02-19 23:08:11, Theodore Y. Ts'o wrote:
> On Fri, Feb 01, 2019 at 10:21:20PM +0100, Jan Kara wrote:
> > On Thu 31-01-19 23:42:19, Theodore Ts'o wrote:
> > > This reverts commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a.
> > > 
> > > As Jan Kara pointed out, this change was unsafe since it means we lose
> > > the call to sync_mapping_buffers() in the nojournal case.  The
> > > original point of the commit was to avoid taking the inode mutex (since
> > > it causes a lockdep warning in generic/113); but we need the mutex in
> > > order to call sync_mapping_buffers().
> > 
> > Actually, I don't think sync_mapping_buffers() needs inode mutex (i_rwsem
> > these days). It uses blkdev_mapping->private_lock for synchronization of
> > operations on the list of buffers and fsync_buffers_list() seems to be
> > pretty careful about races with mark_buffer_dirty_inode(). So why do you
> > think we need i_rwsem?
> 
> Hmm, I think you're right.  I wonder if we can therefore remove the
> inode_lock() in __generic_file_fsync() then...   What do you think?

That's actually a good question. I was thinking about why we have
inode_lock() in __generic_file_fsync().  The only reason I could come up
with is that when fsync(2) races with write(2) or truncate(2), holding
inode_lock() in __generic_file_fsync() means you get either the old or the
new metadata state on disk. Without inode_lock() you could get some
intermediate metadata state, and thus after a crash you may not be able to
see even the old data. We are on thin ice here regarding how much data
consistency we provide after a crash for non-journalling filesystems. It is
never going to be perfect, but this change would seem like a noticeable
regression to me. What do you think?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
