* [4.14-rc1 bug] fstests generic/441 failure on ext2
From: Eryu Guan @ 2017-09-18 11:23 UTC
  To: linux-fsdevel; +Cc: linux-ext4, Jeff Layton

Hi all,

With ext2 driven by ext4 module (or ext4 without journal, I haven't
tested ext2 module, but I guess the result is the same), v4.14-rc1
kernel starts to fail fstests generic/441 as:

    +First fsync after reopen of fd[0] failed: Input/output error

git bisect shows that this is uncovered by commit ffb959bbdf92 ("mm:
remove optimizations based on i_size in mapping writeback waits"), which
removed (i_size == 0) check in filemap_fdatawait().

I say "uncovered" because test fails with 4.13 kernel too if we re-open
the test file without O_TRUNC flag in src/fsync-err.c (so file size is
not zero, and fails the i_size == 0 check).

The EIO was returned by sync_inode_metadata() in __generic_file_fsync(),
the call trace is like:

    do_fsync
      vfs_fsync_range
        ext4_sync_file
          __generic_file_fsync
            sync_inode_metadata
              writeback_single_inode
                __writeback_single_inode
                  filemap_fdatawait => EIO here

Thanks,
Eryu

* Re: [4.14-rc1 bug] fstests generic/441 failure on ext2
From: Jeff Layton @ 2017-09-18 12:10 UTC
  To: Eryu Guan, linux-fsdevel; +Cc: linux-ext4, Jan Kara, linux-fsdevel

On Mon, 2017-09-18 at 19:23 +0800, Eryu Guan wrote:
> Hi all,
>
> With ext2 driven by ext4 module (or ext4 without journal, I haven't
> tested ext2 module, but I guess the result is the same), v4.14-rc1
> kernel starts to fail fstests generic/441 as:
>
>     +First fsync after reopen of fd[0] failed: Input/output error
>
> git bisect shows that this is uncovered by commit ffb959bbdf92 ("mm:
> remove optimizations based on i_size in mapping writeback waits"), which
> removed (i_size == 0) check in filemap_fdatawait().
>
> I say "uncovered" because test fails with 4.13 kernel too if we re-open
> the test file without O_TRUNC flag in src/fsync-err.c (so file size is
> not zero, and fails the i_size == 0 check).
>
> The EIO was returned by sync_inode_metadata() in __generic_file_fsync(),
> the call trace is like:
>
>     do_fsync
>       vfs_fsync_range
>         ext4_sync_file
>           __generic_file_fsync
>             sync_inode_metadata
>               writeback_single_inode
>                 __writeback_single_inode
>                   filemap_fdatawait => EIO here
>
> Thanks,
> Eryu

(cc'ing Jan and linux-fsdevel)

Thanks for the bug report. The analysis looks spot-on.

So yeah...we have this "legacy" filemap_fdatawait call in
__writeback_single_inode, and that is returning -EIO, likely because
AS_EIO was set on the inode from the earlier wb errors.

That error return is pretty sketchy since it could be cleared at any
time, and pretty much everything we care about here is now using
errseq_t for error reporting at fsync. I don't think we really care too
much about that flag in this codepath anymore. Based on the comments in
that function, all we really care about there is waiting until writeback
completes.

One possible fix would be to just have __writeback_single_inode ignore
the error return from filemap_fdatawait. Since we know that AS_EIO can
be cleared at any time, we'll just assume that it always is. Longer
term, I think we need to consider how we can rid ourselves of
AS_EIO/AS_ENOSPC altogether.

Anyway, something like this should fix it, I'd think. Anyone relying on
getting the error there is probably subtly broken, and should be using
errseq_t anyway. Thoughts?

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 245c430a2e41..b9f523ac07b8 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1325,11 +1325,8 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 	 * separate, external IO completion path and ->sync_fs for guaranteeing
 	 * inode metadata is written back correctly.
 	 */
-	if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync) {
-		int err = filemap_fdatawait(mapping);
-		if (ret == 0)
-			ret = err;
-	}
+	if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync)
+		filemap_fdatawait(mapping);
 
 	/*
 	 * Some filesystems may redirty the inode during the writeback

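[Editorial sketch, not part of the original message.] The stale error
described above can be reproduced in a small userspace model. This is only
a sketch under stated assumptions: every toy_* name is hypothetical, the
"reopen" step simply samples the current error sequence (the real errseq_t
sampling rules are more involved), and the ordering in toy_fsync() is
modeled on the call trace in the report rather than copied from kernel
code. It shows a sticky flag that is only cleared by the legacy check
sitting next to a per-file error cursor.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct address_space. All toy_* names are hypothetical. */
struct toy_mapping {
	bool as_eio;		/* models the sticky AS_EIO bit */
	unsigned int wb_seq;	/* bumped on every recorded writeback error */
	int wb_err;		/* most recent error, as a negative errno */
};

/* Toy stand-in for struct file: remembers the last error sequence it saw. */
struct toy_file {
	struct toy_mapping *mapping;
	unsigned int seen_seq;
};

/* A failed writeback records the error in both places. */
static void toy_set_error(struct toy_mapping *m, int err)
{
	m->as_eio = true;
	m->wb_err = err;
	m->wb_seq++;
}

/* Assumption: open samples the current sequence, so old errors are not replayed. */
static void toy_open(struct toy_file *f, struct toy_mapping *m)
{
	f->mapping = m;
	f->seen_seq = m->wb_seq;
}

/* errseq_t-style check: report an error once per file, then advance the cursor. */
static int toy_check_and_advance(struct toy_file *f)
{
	if (f->seen_seq == f->mapping->wb_seq)
		return 0;
	f->seen_seq = f->mapping->wb_seq;
	return f->mapping->wb_err;
}

/* Legacy-style check: test and clear the sticky flag. */
static int toy_legacy_wait(struct toy_mapping *m)
{
	if (!m->as_eio)
		return 0;
	m->as_eio = false;
	return -EIO;
}

/* Data errors are checked first; the metadata sync path runs only on success. */
static int toy_fsync(struct toy_file *f)
{
	int err = toy_check_and_advance(f);

	if (err)
		return err;
	return toy_legacy_wait(f->mapping);
}

/* Runs the reported sequence and stores the three fsync results. */
void toy_run_scenario(int results[3])
{
	struct toy_mapping m = { 0 };
	struct toy_file f;

	toy_open(&f, &m);
	toy_set_error(&m, -EIO);
	results[0] = toy_fsync(&f);	/* genuine error, reported via the cursor */
	toy_open(&f, &m);		/* reopen after the error has been seen */
	results[1] = toy_fsync(&f);	/* stale: only the sticky flag is left */
	results[2] = toy_fsync(&f);	/* flag was consumed by the previous call */
}
```

In this model the second fsync fails even though no new writeback error
occurred, which mirrors the "First fsync after reopen" failure; the third
call succeeds because the legacy check has consumed the flag by then.
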
* Re: [4.14-rc1 bug] fstests generic/441 failure on ext2
From: Jan Kara @ 2017-09-19 14:57 UTC
  To: Jeff Layton; +Cc: Eryu Guan, linux-fsdevel, linux-ext4, Jan Kara

On Mon 18-09-17 08:10:24, Jeff Layton wrote:
> [...]
>
> So yeah...we have this "legacy" filemap_fdatawait call in
> __writeback_single_inode, and that is returning -EIO, likely because
> AS_EIO was set on the inode from the earlier wb errors.
>
> That error return is pretty sketchy since it could be cleared at any
> time, and pretty much everything we care about here is now using
> errseq_t for error reporting at fsync. I don't think we really care too
> much about that flag in this codepath anymore.

So I agree fsync(2) path is covered but that fdatawait() call is also
responsible for reporting error e.g. for write_inode_now() calls and there
we still have some unconverted users. So for now I don't have a better
solution than to live with this additional somewhat stale EIO error. Or
possibly we can have truncate to 0 clear writeback error which would mask
the problem again and kind of makes sense...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [4.14-rc1 bug] fstests generic/441 failure on ext2
From: Jeff Layton @ 2017-09-19 15:09 UTC
  To: Jan Kara; +Cc: Eryu Guan, linux-fsdevel, linux-ext4, Jan Kara

On Tue, 2017-09-19 at 16:57 +0200, Jan Kara wrote:
> [...]
>
> So I agree fsync(2) path is covered but that fdatawait() call is also
> responsible for reporting error e.g. for write_inode_now() calls and there
> we still have some unconverted users. So for now I don't have a better
> solution than to live with this additional somewhat stale EIO error. Or
> possibly we can have truncate to 0 clear writeback error which would mask
> the problem again and kind of makes sense...
>

Another thought would be to have file_check_and_advance_wb_err (and
maybe filemap_check_wb_err) also clear AS_EIO and AS_ENOSPC. That sort
of makes sense since legacy users would have cleared those flags.

I had a patch that did that at one point, but dropped it since it didn't
seem necessary. That might be the best fix for this though.

-- 
Jeff Layton <jlayton@redhat.com>

* Re: [4.14-rc1 bug] fstests generic/441 failure on ext2
From: Jan Kara @ 2017-09-20 11:12 UTC
  To: Jeff Layton; +Cc: Jan Kara, Eryu Guan, linux-fsdevel, linux-ext4, Jan Kara

On Tue 19-09-17 11:09:11, Jeff Layton wrote:
> [...]
>
> Another thought would be to have file_check_and_advance_wb_err (and
> maybe filemap_check_wb_err) also clear AS_EIO and AS_ENOSPC. That sort
> of makes sense since legacy users would have cleared those flags.
>
> I had a patch that did that at one point, but dropped it since it didn't
> seem necessary. That might be the best fix for this though.

Yeah, that would also be a workable option.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* [PATCH] mm: have filemap_check_and_advance_wb_err clear AS_EIO/AS_ENOSPC
From: Jeff Layton @ 2017-09-22 13:33 UTC
  To: linux-fsdevel; +Cc: Eryu Guan, Jan Kara, linux-ext4

From: Jeff Layton <jlayton@redhat.com>

Eryu noticed that he could sometimes get a leftover error reported when
it shouldn't be on fsync with ext2 and non-journalled ext4. The problem
is that writeback_single_inode still uses filemap_fdatawait. That picks
up a previously set AS_EIO flag, which would ordinarily have been
cleared before.

Since we're mostly using this function as a replacement for
filemap_check_errors, have file_check_and_advance_wb_err clear AS_EIO
and AS_ENOSPC when reporting an error. That should allow the new
function to better emulate the behavior of the old with respect to these
flags.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 mm/filemap.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 870971e20967..404722ea0fdd 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -620,6 +620,14 @@ int file_check_and_advance_wb_err(struct file *file)
 		trace_file_check_and_advance_wb_err(file, old);
 		spin_unlock(&file->f_lock);
 	}
+
+	/*
+	 * We're mostly using this function as a drop in replacement for
+	 * filemap_check_errors. Clear AS_EIO/AS_ENOSPC to emulate the effect
+	 * that the legacy code would have had on these flags.
+	 */
+	clear_bit(AS_EIO, &mapping->flags);
+	clear_bit(AS_ENOSPC, &mapping->flags);
 	return err;
 }
 EXPORT_SYMBOL(file_check_and_advance_wb_err);
-- 
2.13.5

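[Editorial sketch, not part of the original message.] The effect of the
patch can be illustrated with a small userspace model. Assumptions are
labeled loudly: all toy_* names and the TOY_AS_* bit values are
hypothetical, the "reopen" step just samples the current error sequence,
and the `patched` switch exists only so both behaviours can be compared
side by side. It is not the kernel implementation.

```c
#include <errno.h>

/* Hypothetical bit masks standing in for the AS_EIO/AS_ENOSPC mapping flags. */
#define TOY_AS_EIO	(1UL << 0)
#define TOY_AS_ENOSPC	(1UL << 1)

/* Toy stand-in for struct address_space. */
struct toy_mapping {
	unsigned long flags;	/* legacy sticky error flags */
	unsigned int wb_seq;	/* bumped on every recorded writeback error */
	int wb_err;		/* most recent error, as a negative errno */
};

/* Toy stand-in for struct file: remembers the last error sequence it saw. */
struct toy_file {
	struct toy_mapping *mapping;
	unsigned int seen_seq;
};

/* A failed writeback sets the legacy flag and records the error. */
static void toy_set_error(struct toy_mapping *m, int err)
{
	m->flags |= (err == -ENOSPC) ? TOY_AS_ENOSPC : TOY_AS_EIO;
	m->wb_err = err;
	m->wb_seq++;
}

/* Per-file check; when `patched` is nonzero it also drops the legacy flags. */
static int toy_check_and_advance(struct toy_file *f, int patched)
{
	struct toy_mapping *m = f->mapping;
	int err = 0;

	if (f->seen_seq != m->wb_seq) {
		f->seen_seq = m->wb_seq;
		err = m->wb_err;
	}
	if (patched)
		m->flags &= ~(TOY_AS_EIO | TOY_AS_ENOSPC);
	return err;
}

/* Legacy-style check: test and clear each flag, with EIO taking precedence. */
static int toy_legacy_wait(struct toy_mapping *m)
{
	int ret = 0;

	if (m->flags & TOY_AS_ENOSPC) {
		m->flags &= ~TOY_AS_ENOSPC;
		ret = -ENOSPC;
	}
	if (m->flags & TOY_AS_EIO) {
		m->flags &= ~TOY_AS_EIO;
		ret = -EIO;
	}
	return ret;
}

/* Returns the result of the first fsync after a simplified reopen. */
int toy_fsync_after_reopen(int patched)
{
	struct toy_mapping m = { 0 };
	struct toy_file f = { &m, 0 };
	int err;

	toy_set_error(&m, -EIO);
	(void)toy_check_and_advance(&f, patched);	/* first fsync reports -EIO */
	f.seen_seq = m.wb_seq;				/* reopen: sample current sequence */
	err = toy_check_and_advance(&f, patched);
	if (err)
		return err;
	return toy_legacy_wait(&m);			/* legacy wait in the metadata sync */
}
```

With `patched` set to 0 the model returns -EIO after reopen because the
sticky flag survives the first fsync; with `patched` set to 1 the flag is
already gone by the time the legacy wait runs, so the call returns 0.
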
* Re: [PATCH] mm: have filemap_check_and_advance_wb_err clear AS_EIO/AS_ENOSPC
From: Jan Kara @ 2017-09-25  8:17 UTC
  To: Jeff Layton; +Cc: linux-fsdevel, Eryu Guan, Jan Kara, linux-ext4

On Fri 22-09-17 09:33:31, Jeff Layton wrote:
> [...]
>
> Reported-by: Eryu Guan <eguan@redhat.com>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>

Looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH] mm: have filemap_check_and_advance_wb_err clear AS_EIO/AS_ENOSPC
From: Jeff Layton @ 2017-09-25 19:53 UTC
  To: Andrew Morton; +Cc: Eryu Guan, Jan Kara, linux-ext4, Jeff Layton, linux-fsdevel

On Fri, 2017-09-22 at 09:33 -0400, Jeff Layton wrote:
> [...]

Andrew, would you mind picking this patch up? It seems to work fine for
me, but it wouldn't hurt to let it stew in linux-next for a bit.

Thanks,
-- 
Jeff Layton <jlayton@poochiereds.net>
