* [PATCH v2 01/12] fs: fix lazytime expiration handling in __writeback_single_inode()
       [not found] <20210109075903.208222-1-ebiggers@kernel.org>
@ 2021-01-09  7:58 ` Eric Biggers
  2021-01-11 10:48   ` Christoph Hellwig
  2021-01-11 14:46   ` Jan Kara
  0 siblings, 2 replies; 3+ messages in thread
From: Eric Biggers @ 2021-01-09  7:58 UTC
  To: linux-fsdevel
  Cc: linux-xfs, linux-ext4, linux-f2fs-devel, Theodore Ts'o,
	Christoph Hellwig, stable, Jan Kara

From: Eric Biggers <ebiggers@google.com>

When lazytime is enabled and an inode is being written due to its
in-memory updated timestamps having expired, either due to a sync() or
syncfs() system call or due to dirtytime_expire_interval having elapsed,
the VFS needs to inform the filesystem so that the filesystem can copy
the inode's timestamps out to the on-disk data structures.

This is done by __writeback_single_inode() calling
mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).

However, this occurs after __writeback_single_inode() has already
cleared the dirty flags from ->i_state.  This causes two bugs:

- mark_inode_dirty_sync() redirties the inode, causing it to remain
  dirty.  This wastefully causes the inode to be written twice.  But
  more importantly, it breaks cases where sync_filesystem() is expected
  to clean dirty inodes.  This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
  ioctl (as reported at
  https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
  as possibly filesystem freezing (freeze_super()).

- Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
  called from __writeback_single_inode() for lazytime expiration,
  xfs_fs_dirty_inode() ignores the notification.  (XFS only cares about
  lazytime expirations, and it assumes that i_state will contain
  I_DIRTY_TIME during those.)  Therefore, lazy timestamps aren't
  persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS.

Fix this by moving the call to mark_inode_dirty_sync() to earlier in
__writeback_single_inode(), before the dirty flags are cleared from
i_state.  This makes filesystems be properly notified of the timestamp
expiration, and it avoids incorrectly redirtying the inode.

This fixes xfstest generic/580 (which tests
FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
enabled.  It also fixes the new lazytime xfstest I've proposed, which
reproduces the above-mentioned XFS bug
(https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).

Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly.  But
due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
right thing to do because mark_inode_dirty_sync() now knows not to move
the inode to a writeback list if it is currently queued for sync.

Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Cc: stable@vger.kernel.org
Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback")
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/fs-writeback.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index acfb55834af23..c41cb887eb7d3 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1474,21 +1474,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 	}
 
 	/*
-	 * Some filesystems may redirty the inode during the writeback
-	 * due to delalloc, clear dirty metadata flags right before
-	 * write_inode()
+	 * If the inode has dirty timestamps and we need to write them, call
+	 * mark_inode_dirty_sync() to notify the filesystem about it and to
+	 * change I_DIRTY_TIME into I_DIRTY_SYNC.
 	 */
-	spin_lock(&inode->i_lock);
-
-	dirty = inode->i_state & I_DIRTY;
 	if ((inode->i_state & I_DIRTY_TIME) &&
-	    ((dirty & I_DIRTY_INODE) ||
-	     wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
+	    (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
 	     time_after(jiffies, inode->dirtied_time_when +
 			dirtytime_expire_interval * HZ))) {
-		dirty |= I_DIRTY_TIME;
 		trace_writeback_lazytime(inode);
+		mark_inode_dirty_sync(inode);
 	}
+
+	/*
+	 * Some filesystems may redirty the inode during the writeback
+	 * due to delalloc, clear dirty metadata flags right before
+	 * write_inode()
+	 */
+	spin_lock(&inode->i_lock);
+	dirty = inode->i_state & I_DIRTY;
 	inode->i_state &= ~dirty;
 
 	/*
@@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 
 	spin_unlock(&inode->i_lock);
 
-	if (dirty & I_DIRTY_TIME)
-		mark_inode_dirty_sync(inode);
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & ~I_DIRTY_PAGES) {
 		int err = write_inode(inode, wbc);
-- 
2.30.0

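The ordering problem described in the patch above can be reproduced in a small user-space model. This is a minimal sketch, not kernel code: the flag values, `struct model_inode`, the `fs_saw_dirty_time` field, and every `model_*` function are hypothetical names invented for illustration, and locking, tracing, and the real write_inode() call are left out. The `expired` argument stands in for the sync/expiry condition. The model assumes only what the commit message states: the filesystem's ->dirty_inode() hook inspects i_state, and mark_inode_dirty_sync() turns I_DIRTY_TIME into I_DIRTY_SYNC.

```c
/* Hypothetical flag values for this model; the kernel's real values differ. */
#define I_DIRTY_SYNC     (1u << 0)
#define I_DIRTY_DATASYNC (1u << 1)
#define I_DIRTY_PAGES    (1u << 2)
#define I_DIRTY_TIME     (1u << 3)
#define I_DIRTY_INODE    (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
#define I_DIRTY          (I_DIRTY_INODE | I_DIRTY_PAGES)

struct model_inode {
	unsigned int i_state;
	int fs_saw_dirty_time;	/* did the modeled ->dirty_inode() see I_DIRTY_TIME? */
};

/*
 * Simplified stand-in for mark_inode_dirty_sync(): notify the filesystem
 * first (it looks at i_state, as the commit message says XFS does), then
 * convert I_DIRTY_TIME into I_DIRTY_SYNC.
 */
static void model_mark_inode_dirty_sync(struct model_inode *inode)
{
	if (inode->i_state & I_DIRTY_TIME)
		inode->fs_saw_dirty_time = 1;
	inode->i_state &= ~I_DIRTY_TIME;
	inode->i_state |= I_DIRTY_SYNC;
}

/* Pre-patch ordering: flags are cleared first, the filesystem is notified after. */
static unsigned int model_writeback_old(struct model_inode *inode, int expired)
{
	unsigned int dirty = inode->i_state & I_DIRTY;

	if ((inode->i_state & I_DIRTY_TIME) &&
	    ((dirty & I_DIRTY_INODE) || expired))
		dirty |= I_DIRTY_TIME;
	inode->i_state &= ~dirty;

	if (dirty & I_DIRTY_TIME)
		model_mark_inode_dirty_sync(inode);	/* redirties the inode */

	/* write_inode() would run here */
	return inode->i_state;	/* flags still set after writeback */
}

/* Post-patch ordering: the filesystem is notified first, then flags are cleared. */
static unsigned int model_writeback_new(struct model_inode *inode, int expired)
{
	unsigned int dirty;

	if ((inode->i_state & I_DIRTY_TIME) && expired)
		model_mark_inode_dirty_sync(inode);

	dirty = inode->i_state & I_DIRTY;
	inode->i_state &= ~dirty;

	/* write_inode() would run here */
	return inode->i_state;
}
```

Starting from an inode whose i_state is just I_DIRTY_TIME, with `expired` set, the old ordering returns with I_DIRTY_SYNC still set and the modeled filesystem never observing I_DIRTY_TIME, which corresponds to the two bugs listed in the commit message. The new ordering returns with no dirty flags set and with the filesystem having seen I_DIRTY_TIME.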
* Re: [PATCH v2 01/12] fs: fix lazytime expiration handling in __writeback_single_inode()
  2021-01-09  7:58 ` [PATCH v2 01/12] fs: fix lazytime expiration handling in __writeback_single_inode() Eric Biggers
@ 2021-01-11 10:48   ` Christoph Hellwig
  2021-01-11 14:46   ` Jan Kara
  1 sibling, 0 replies; 3+ messages in thread
From: Christoph Hellwig @ 2021-01-11 10:48 UTC
  To: Eric Biggers
  Cc: linux-fsdevel, linux-xfs, linux-ext4, linux-f2fs-devel,
	Theodore Ts'o, Christoph Hellwig, stable, Jan Kara

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH v2 01/12] fs: fix lazytime expiration handling in __writeback_single_inode()
  2021-01-09  7:58 ` [PATCH v2 01/12] fs: fix lazytime expiration handling in __writeback_single_inode() Eric Biggers
  2021-01-11 10:48   ` Christoph Hellwig
@ 2021-01-11 14:46   ` Jan Kara
  1 sibling, 0 replies; 3+ messages in thread
From: Jan Kara @ 2021-01-11 14:46 UTC
  To: Eric Biggers
  Cc: linux-fsdevel, linux-xfs, linux-ext4, linux-f2fs-devel,
	Theodore Ts'o, Christoph Hellwig, stable, Jan Kara

On Fri 08-01-21 23:58:52, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> When lazytime is enabled and an inode is being written due to its
> in-memory updated timestamps having expired, either due to a sync() or
> syncfs() system call or due to dirtytime_expire_interval having elapsed,
> the VFS needs to inform the filesystem so that the filesystem can copy
> the inode's timestamps out to the on-disk data structures.
>
> This is done by __writeback_single_inode() calling
> mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).
>
> However, this occurs after __writeback_single_inode() has already
> cleared the dirty flags from ->i_state.  This causes two bugs:
>
> - mark_inode_dirty_sync() redirties the inode, causing it to remain
>   dirty.  This wastefully causes the inode to be written twice.  But
>   more importantly, it breaks cases where sync_filesystem() is expected
>   to clean dirty inodes.  This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
>   ioctl (as reported at
>   https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
>   as possibly filesystem freezing (freeze_super()).
>
> - Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
>   called from __writeback_single_inode() for lazytime expiration,
>   xfs_fs_dirty_inode() ignores the notification.  (XFS only cares about
>   lazytime expirations, and it assumes that i_state will contain
>   I_DIRTY_TIME during those.)  Therefore, lazy timestamps aren't
>   persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS.
>
> Fix this by moving the call to mark_inode_dirty_sync() to earlier in
> __writeback_single_inode(), before the dirty flags are cleared from
> i_state.  This makes filesystems be properly notified of the timestamp
> expiration, and it avoids incorrectly redirtying the inode.
>
> This fixes xfstest generic/580 (which tests
> FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
> enabled.  It also fixes the new lazytime xfstest I've proposed, which
> reproduces the above-mentioned XFS bug
> (https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).
>
> Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly.  But
> due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
> right thing to do because mark_inode_dirty_sync() now knows not to move
> the inode to a writeback list if it is currently queued for sync.
>
> Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
> Cc: stable@vger.kernel.org
> Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback")
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Thanks for writing this fix! It looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/fs-writeback.c | 24 +++++++++++++-----------
>  1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index acfb55834af23..c41cb887eb7d3 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1474,21 +1474,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>  	}
>  
>  	/*
> -	 * Some filesystems may redirty the inode during the writeback
> -	 * due to delalloc, clear dirty metadata flags right before
> -	 * write_inode()
> +	 * If the inode has dirty timestamps and we need to write them, call
> +	 * mark_inode_dirty_sync() to notify the filesystem about it and to
> +	 * change I_DIRTY_TIME into I_DIRTY_SYNC.
>  	 */
> -	spin_lock(&inode->i_lock);
> -
> -	dirty = inode->i_state & I_DIRTY;
>  	if ((inode->i_state & I_DIRTY_TIME) &&
> -	    ((dirty & I_DIRTY_INODE) ||
> -	     wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
> +	    (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
>  	     time_after(jiffies, inode->dirtied_time_when +
>  			dirtytime_expire_interval * HZ))) {
> -		dirty |= I_DIRTY_TIME;
>  		trace_writeback_lazytime(inode);
> +		mark_inode_dirty_sync(inode);
>  	}
> +
> +	/*
> +	 * Some filesystems may redirty the inode during the writeback
> +	 * due to delalloc, clear dirty metadata flags right before
> +	 * write_inode()
> +	 */
> +	spin_lock(&inode->i_lock);
> +	dirty = inode->i_state & I_DIRTY;
>  	inode->i_state &= ~dirty;
>  
>  	/*
> @@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>  
>  	spin_unlock(&inode->i_lock);
>  
> -	if (dirty & I_DIRTY_TIME)
> -		mark_inode_dirty_sync(inode);
>  	/* Don't write the inode if only I_DIRTY_PAGES was set */
>  	if (dirty & ~I_DIRTY_PAGES) {
>  		int err = write_inode(inode, wbc);
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

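The condition that decides whether lazy timestamps must be written can also be sketched in user space. The following is a hedged illustration of the post-patch test in the diff (sync mode, for_sync, or an elapsed expiry interval); `MODEL_HZ`, `struct model_wbc`, `model_time_after()`, and `model_lazytime_needs_write()` are hypothetical names, and `MODEL_HZ` is an arbitrary tick rate chosen for the example. The comparison follows the usual wraparound-safe signed-difference idiom that the kernel's time_after() is based on, and it relies on the common two's-complement behavior when converting the unsigned difference to `long`.

```c
#include <limits.h>
#include <stdbool.h>

#define MODEL_HZ 250UL	/* hypothetical tick rate for this model */

enum model_sync_mode { MODEL_WB_SYNC_NONE, MODEL_WB_SYNC_ALL };

struct model_wbc {
	enum model_sync_mode sync_mode;
	bool for_sync;
};

/* Wraparound-safe "a is later than b" for free-running tick counters. */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Model of the post-patch check: an inode with dirty timestamps is written
 * when the writeback is a data-integrity sync, is on behalf of sync(), or
 * the expiry interval (in seconds) has elapsed since the timestamps were
 * dirtied.
 */
static bool model_lazytime_needs_write(bool has_dirty_time,
				       const struct model_wbc *wbc,
				       unsigned long now,
				       unsigned long dirtied_time_when,
				       unsigned long expire_interval_secs)
{
	return has_dirty_time &&
	       (wbc->sync_mode == MODEL_WB_SYNC_ALL || wbc->for_sync ||
		model_time_after(now, dirtied_time_when +
				      expire_interval_secs * MODEL_HZ));
}
```

Because the comparison is done on the signed difference, the check still behaves sensibly when the deadline wraps past ULONG_MAX, which a plain `now > deadline` comparison would not. Note that this model, like the patched condition, no longer contains the `dirty & I_DIRTY_INODE` clause that the diff removes.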
Thread overview: 3+ messages
     [not found] <20210109075903.208222-1-ebiggers@kernel.org>
2021-01-09  7:58 ` [PATCH v2 01/12] fs: fix lazytime expiration handling in __writeback_single_inode() Eric Biggers
2021-01-11 10:48   ` Christoph Hellwig
2021-01-11 14:46   ` Jan Kara