* [PATCH] Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
From: Theodore Ts'o @ 2019-02-01 4:42 UTC
To: Ext4 Developers List; +Cc: jack, Theodore Ts'o
This reverts commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a.
As Jan Kara pointed out, this change was unsafe since it means we lose
the call to sync_mapping_buffers() in the nojournal case. The
original point of the commit was to avoid taking the inode mutex (since
it causes a lockdep warning in generic/113); but we need the mutex in
order to call sync_mapping_buffers().
The real fix to this problem was discussed here:
https://lore.kernel.org/lkml/20181025150540.259281-4-bvanassche@acm.org
The proposed patch was to fix a syzbot complaint, but the problem can
also be demonstrated via "kvm-xfstests -c nojournal generic/113".
Multiple solutions were discussed in the e-mail thread, but none have
landed in the kernel as of this writing. Anyway, commit
ad211f3e94b314 is absolutely the wrong way to suppress the lockdep
warning, so revert it.
Fixes: ad211f3e94b314a910d4af03178a0b52a7d1ee0a ("ext4: use ext4_write_inode() when fsyncing w/o a journal")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Jan Kara <jack@suse.cz>
---
fs/ext4/fsync.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 712f00995390..5508baa11bb6 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -116,16 +116,8 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		goto out;
 	}
 
-	ret = file_write_and_wait_range(file, start, end);
-	if (ret)
-		return ret;
-
 	if (!journal) {
-		struct writeback_control wbc = {
-			.sync_mode = WB_SYNC_ALL
-		};
-
-		ret = ext4_write_inode(inode, &wbc);
+		ret = __generic_file_fsync(file, start, end, datasync);
 		if (!ret)
 			ret = ext4_sync_parent(inode);
 		if (test_opt(inode->i_sb, BARRIER))
@@ -133,6 +125,9 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		goto out;
 	}
 
+	ret = file_write_and_wait_range(file, start, end);
+	if (ret)
+		return ret;
 	/*
 	 * data=writeback,ordered:
 	 * The caller's filemap_fdatawrite()/wait will sync the data.
--
2.19.1
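What the revert restores in the no-journal case can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: struct model_inode, its fields, and the model_* functions are invented for illustration, and the restored path assumes that __generic_file_fsync() writes back the data range, then takes the inode lock and calls sync_mapping_buffers() before writing the inode. ext4_sync_parent(), the cache flush and the datasync short-cuts are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of what a no-journal fsync must make durable.
 * None of these names exist in the kernel. */
struct model_inode {
	bool data_dirty;        /* dirty page cache in [start, end] */
	bool buffers_dirty;     /* metadata buffers attached via mark_buffer_dirty_inode() */
	bool inode_dirty;       /* the on-disk inode itself */
	bool locked;            /* stands in for inode_lock()/i_rwsem */
	int lock_acquisitions;  /* how often the lock was taken */
};

static void model_inode_lock(struct model_inode *inode)
{
	assert(!inode->locked);  /* the model does not allow recursive locking */
	inode->locked = true;
	inode->lock_acquisitions++;
}

static void model_inode_unlock(struct model_inode *inode)
{
	assert(inode->locked);
	inode->locked = false;
}

/* Behaviour of commit ad211f3e94b3 (being reverted): data, then the inode.
 * No lock is taken, but attached metadata buffers are never written. */
static void model_fsync_reverted(struct model_inode *inode)
{
	inode->data_dirty = false;   /* file_write_and_wait_range() */
	inode->inode_dirty = false;  /* ext4_write_inode() with WB_SYNC_ALL */
}

/* Behaviour after the revert, assuming the __generic_file_fsync() order:
 * write data, lock, sync attached buffers, write the inode, unlock. */
static void model_fsync_restored(struct model_inode *inode)
{
	inode->data_dirty = false;     /* file_write_and_wait_range() */
	model_inode_lock(inode);       /* inode_lock() */
	inode->buffers_dirty = false;  /* sync_mapping_buffers() */
	inode->inode_dirty = false;    /* sync_inode_metadata() */
	model_inode_unlock(inode);     /* inode_unlock() */
}
```

In this model the reverted path never takes the lock (which is what avoided the lockdep report in generic/113) but leaves buffers_dirty set, while the restored path flushes everything at the cost of one lock acquisition.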
* Re: [PATCH] Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
From: Jan Kara @ 2019-02-01 21:21 UTC
To: Theodore Ts'o; +Cc: Ext4 Developers List, jack
On Thu 31-01-19 23:42:19, Theodore Ts'o wrote:
> This reverts commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a.
>
> As Jan Kara pointed out, this change was unsafe since it means we lose
> the call to sync_mapping_buffers() in the nojournal case. The
> original point of the commit was to avoid taking the inode mutex (since
> it causes a lockdep warning in generic/113); but we need the mutex in
> order to call sync_mapping_buffers().
Actually, I don't think sync_mapping_buffers() needs inode mutex (i_rwsem
these days). It uses blkdev_mapping->private_lock for synchronization of
operations on the list of buffers and fsync_buffers_list() seems to be
pretty careful about races with mark_buffer_dirty_inode(). So why do you
think we need i_rwsem?
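The pattern described here, where a dedicated spinlock rather than the inode lock protects the list of attached buffers, can be sketched in userspace. This is a simplified, single-threaded model under stated assumptions: struct buf, struct mapping, mark_dirty(), fsync_list() and the redirty_during_write hook are hypothetical; an atomic_flag spinlock stands in for private_lock; and the whole list is detached at once rather than one buffer at a time as fsync_buffers_list() does. A genuinely concurrent version would also need atomic dirty bits and the memory barriers the kernel uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer attached to a file's mapping. */
struct buf {
	bool dirty;
	bool assoc;        /* on the mapping list or on fsync's private list */
	int writes;        /* number of simulated writes */
	struct buf *next;
};

/* Hypothetical mapping: the list is guarded only by this spinlock. */
struct mapping {
	atomic_flag lock;  /* stand-in for private_lock */
	struct buf *list;  /* stand-in for the list of attached buffers */
};

/* Test hook: buffer to redirty while fsync_list() has dropped the lock,
 * simulating a concurrent mark_buffer_dirty_inode(). */
static struct buf *redirty_during_write;

static void mapping_init(struct mapping *m)
{
	atomic_flag_clear(&m->lock);
	m->list = NULL;
}

static void map_lock(struct mapping *m)
{
	while (atomic_flag_test_and_set_explicit(&m->lock, memory_order_acquire))
		;  /* spin */
}

static void map_unlock(struct mapping *m)
{
	atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

/* Loosely modelled on mark_buffer_dirty_inode(): set dirty, attach once. */
static void mark_dirty(struct mapping *m, struct buf *b)
{
	b->dirty = true;
	map_lock(m);
	if (!b->assoc) {
		b->assoc = true;
		b->next = m->list;
		m->list = b;
	}
	map_unlock(m);
}

/* Loosely modelled on fsync_buffers_list(): detach under the lock, write
 * with the lock dropped, then re-attach anything redirtied meanwhile. */
static void fsync_list(struct mapping *m)
{
	struct buf *tmp, *b, *next;

	map_lock(m);
	tmp = m->list;  /* buffers stay "assoc", so mark_dirty() won't re-add */
	m->list = NULL;
	map_unlock(m);

	for (b = tmp; b; b = b->next) {
		if (b->dirty) {
			b->dirty = false;
			b->writes++;  /* the "I/O" runs without the lock held */
			if (b == redirty_during_write)
				mark_dirty(m, b);
		}
	}

	map_lock(m);
	for (b = tmp; b; b = next) {
		next = b->next;
		assert(b->assoc);
		if (b->dirty) {  /* redirtied during the write: keep it attached */
			b->next = m->list;
			m->list = b;
		} else {
			b->assoc = false;
			b->next = NULL;
		}
	}
	map_unlock(m);
}
```

Because the simulated I/O happens with the lock dropped, a concurrent dirtier only contends on the short list manipulations, and a buffer redirtied mid-write ends up back on the list instead of being lost. In this model none of that relies on i_rwsem.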
> The real fix to this problem was discussed here:
>
> https://lore.kernel.org/lkml/20181025150540.259281-4-bvanassche@acm.org
>
> The proposed patch was to fix a syzbot complaint, but the problem can
> also be demonstrated via "kvm-xfstests -c nojournal generic/113".
> Multiple solutions were discussed in the e-mail thread, but none have
> landed in the kernel as of this writing. Anyway, commit
> ad211f3e94b314 is absolutely the wrong way to suppress the lockdep
> warning, so revert it.
>
> Fixes: ad211f3e94b314a910d4af03178a0b52a7d1ee0a ("ext4: use ext4_write_inode() when fsyncing w/o a journal")
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Reported-by: Jan Kara <jack@suse.cz>
So if you decide to go the safe way and revert the change, I'm fine
with that, so feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH] Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
From: Theodore Y. Ts'o @ 2019-02-02 4:08 UTC
To: Jan Kara; +Cc: Ext4 Developers List
On Fri, Feb 01, 2019 at 10:21:20PM +0100, Jan Kara wrote:
> Actually, I don't think sync_mapping_buffers() needs inode mutex (i_rwsem
> these days). It uses blkdev_mapping->private_lock for synchronization of
> operations on the list of buffers and fsync_buffers_list() seems to be
> pretty careful about races with mark_buffer_dirty_inode(). So why do you
> think we need i_rwsem?
Hmm, I think you're right. I wonder if we can therefore remove the
inode_lock() in __generic_file_fsync() then... What do you think?
- Ted
* Re: [PATCH] Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
From: Jan Kara @ 2019-02-04 9:45 UTC
To: Theodore Y. Ts'o; +Cc: Jan Kara, Ext4 Developers List
On Fri 01-02-19 23:08:11, Theodore Y. Ts'o wrote:
> On Fri, Feb 01, 2019 at 10:21:20PM +0100, Jan Kara wrote:
> > Actually, I don't think sync_mapping_buffers() needs inode mutex (i_rwsem
> > these days). It uses blkdev_mapping->private_lock for synchronization of
> > operations on the list of buffers and fsync_buffers_list() seems to be
> > pretty careful about races with mark_buffer_dirty_inode(). So why do you
> > think we need i_rwsem?
>
> Hmm, I think you're right. I wonder if we can therefore remove the
> inode_lock() in __generic_file_fsync() then... What do you think?
That's actually a good question. I was thinking about why we have
inode_lock() in __generic_file_fsync(). The only reason I could come up
with is that when fsync(2) races with write(2) or truncate(2), with
inode_lock() in __generic_file_fsync() you will get either the old or the
new metadata state on disk. Without inode_lock() you could get some
intermediate metadata state, and thus after a crash you may not be able to
see even the old data. We are on thin ice here regarding how much data
consistency we provide after a crash for non-journalling filesystems. It
is never going to be perfect, but this change would seem like a noticeable
regression to me. What do you think?
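The "old or new state, never an intermediate one" argument can be illustrated with a deterministic toy model. Everything here is hypothetical: struct meta, struct model_inode, the two-step truncate, the 4 KiB block size and the fsync_* helpers are invented, and a boolean stands in for i_rwsem. The model replays one fixed interleaving in which fsync runs between the two halves of a truncate.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_BLOCK 4096L  /* assumed block size */

/* Hypothetical metadata that must agree after a crash. */
struct meta {
	long size;    /* i_size in bytes */
	long blocks;  /* allocated blocks */
};

struct model_inode {
	struct meta mem;   /* in-memory state */
	struct meta disk;  /* what a crash would leave on disk */
	bool locked;       /* stand-in for inode_lock()/i_rwsem */
};

static bool meta_consistent(const struct meta *m)
{
	return m->blocks == (m->size + MODEL_BLOCK - 1) / MODEL_BLOCK;
}

/* First half of a truncate: take the lock and shrink i_size. */
static void truncate_begin(struct model_inode *inode, long new_size)
{
	assert(!inode->locked);
	inode->locked = true;
	inode->mem.size = new_size;
}

/* Second half: free the now-unused blocks and drop the lock. */
static void truncate_end(struct model_inode *inode)
{
	assert(inode->locked);
	inode->mem.blocks = (inode->mem.size + MODEL_BLOCK - 1) / MODEL_BLOCK;
	inode->locked = false;
}

/* fsync without the inode lock: persists whatever state it sees. */
static void fsync_nolock(struct model_inode *inode)
{
	inode->disk = inode->mem;
}

/* fsync with the inode lock: returns false where it would have to wait. */
static bool fsync_locked(struct model_inode *inode)
{
	if (inode->locked)
		return false;  /* a real fsync would sleep on i_rwsem here */
	inode->locked = true;
	inode->disk = inode->mem;
	inode->locked = false;
	return true;
}
```

With the lock, fsync can only persist the state before or after the truncate; without it, the on-disk copy in this model pairs the new size with the old block count. Real ext4 metadata and crash behaviour are considerably more involved.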
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR