* [PATCH v2 01/12] fs: fix lazytime expiration handling in __writeback_single_inode()
[not found] <20210109075903.208222-1-ebiggers@kernel.org>
@ 2021-01-09 7:58 ` Eric Biggers
2021-01-11 10:48 ` Christoph Hellwig
2021-01-11 14:46 ` Jan Kara
0 siblings, 2 replies; 3+ messages in thread
From: Eric Biggers @ 2021-01-09 7:58 UTC
To: linux-fsdevel
Cc: linux-xfs, linux-ext4, linux-f2fs-devel, Theodore Ts'o,
Christoph Hellwig, stable, Jan Kara
From: Eric Biggers <ebiggers@google.com>
When lazytime is enabled and an inode is being written due to its
in-memory updated timestamps having expired, either due to a sync() or
syncfs() system call or due to dirtytime_expire_interval having elapsed,
the VFS needs to inform the filesystem so that the filesystem can copy
the inode's timestamps out to the on-disk data structures.
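As a rough illustration of the conditions above, here is a userspace sketch of when lazily updated timestamps are considered due for writeback. The names demo_wbc, demo_time_after() and demo_timestamps_expired() are made up for this example; the real check lives in __writeback_single_inode() and uses jiffies, HZ and time_after().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant writeback_control fields. */
struct demo_wbc {
	bool sync_all;	/* models wbc->sync_mode == WB_SYNC_ALL */
	bool for_sync;	/* models wbc->for_sync (sync() or syncfs()) */
};

/* Wrap-safe "a is after b", in the spirit of the kernel's time_after(). */
static bool demo_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* True if the in-memory timestamps should be pushed to the filesystem. */
static bool demo_timestamps_expired(const struct demo_wbc *wbc,
				    unsigned long now,
				    unsigned long dirtied_time_when,
				    unsigned long expire_ticks)
{
	return wbc->sync_all || wbc->for_sync ||
	       demo_time_after(now, dirtied_time_when + expire_ticks);
}

/* Example: background writeback before and after the interval elapses. */
void demo_expiry_example(void)
{
	struct demo_wbc background = { .sync_all = false, .for_sync = false };

	assert(!demo_timestamps_expired(&background, 100, 50, 100));
	assert(demo_timestamps_expired(&background, 151, 50, 100));
}
```

The tick values are arbitrary; only the ordering of "now" relative to the dirtied time plus the interval matters.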
This is done by __writeback_single_inode() calling
mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).
However, this occurs after __writeback_single_inode() has already
cleared the dirty flags from ->i_state. This causes two bugs:
- mark_inode_dirty_sync() redirties the inode, causing it to remain
dirty. This wastefully causes the inode to be written twice. But
more importantly, it breaks cases where sync_filesystem() is expected
to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
ioctl (as reported at
https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
as possibly filesystem freezing (freeze_super()).
- Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
called from __writeback_single_inode() for lazytime expiration,
xfs_fs_dirty_inode() ignores the notification. (XFS only cares about
lazytime expirations, and it assumes that i_state will contain
I_DIRTY_TIME during those.) Therefore, lazy timestamps aren't
persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS.
Fix this by moving the call to mark_inode_dirty_sync() to earlier in
__writeback_single_inode(), before the dirty flags are cleared from
i_state. This makes filesystems be properly notified of the timestamp
expiration, and it avoids incorrectly redirtying the inode.
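To make the ordering problem concrete, here is a simplified userspace model of the old and new orders. It is only a sketch: the flag values, struct demo_inode and all demo_* functions are invented for illustration, it assumes the timestamps have already expired, and it ignores locking, I_DIRTY_PAGES and the writeback lists.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the real ones are in include/linux/fs.h. */
#define DEMO_DIRTY_SYNC	(1u << 0)
#define DEMO_DIRTY_TIME	(1u << 1)
#define DEMO_DIRTY_ALL	(DEMO_DIRTY_SYNC | DEMO_DIRTY_TIME)

struct demo_inode {
	unsigned int state;	/* models ->i_state */
	bool timestamps_copied;	/* set once the "filesystem" saw the expiry */
};

/* Models a ->dirty_inode() handler that only acts on lazytime expiration. */
static void demo_dirty_inode(struct demo_inode *inode)
{
	if (inode->state & DEMO_DIRTY_TIME)
		inode->timestamps_copied = true;
}

/* Models mark_inode_dirty_sync(): notify, then DIRTY_TIME becomes DIRTY_SYNC. */
static void demo_mark_inode_dirty_sync(struct demo_inode *inode)
{
	demo_dirty_inode(inode);
	inode->state = (inode->state & ~DEMO_DIRTY_TIME) | DEMO_DIRTY_SYNC;
}

/* Old order: the flags are cleared first, the notification comes afterwards. */
static void demo_writeback_old(struct demo_inode *inode)
{
	unsigned int dirty = inode->state & DEMO_DIRTY_ALL;

	inode->state &= ~dirty;
	if (dirty & DEMO_DIRTY_TIME)
		demo_mark_inode_dirty_sync(inode);
}

/* New order: notify while DIRTY_TIME is still set, then clear the flags. */
static void demo_writeback_new(struct demo_inode *inode)
{
	unsigned int dirty;

	if (inode->state & DEMO_DIRTY_TIME)
		demo_mark_inode_dirty_sync(inode);
	dirty = inode->state & DEMO_DIRTY_ALL;
	inode->state &= ~dirty;
}

void demo_compare_orders(void)
{
	struct demo_inode a = { .state = DEMO_DIRTY_TIME };
	struct demo_inode b = { .state = DEMO_DIRTY_TIME };

	demo_writeback_old(&a);
	assert(a.state == DEMO_DIRTY_SYNC);	/* redirtied after writeback */
	assert(!a.timestamps_copied);		/* notification was ignored */

	demo_writeback_new(&b);
	assert(b.state == 0);			/* clean after writeback */
	assert(b.timestamps_copied);		/* handler saw DIRTY_TIME */
}
```

In this model the old order shows both bugs at once (the inode stays dirty and the handler never sees the time flag), while the new order leaves the inode clean with the timestamps copied.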
This fixes xfstest generic/580 (which tests
FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
enabled. It also fixes the new lazytime xfstest I've proposed, which
reproduces the above-mentioned XFS bug
(https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).
Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But
due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
right thing to do because mark_inode_dirty_sync() now knows not to move
the inode to a writeback list if it is currently queued for sync.
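A minimal sketch of that behaviour, assuming a heavily simplified model of the writeback lists: the enum demo_list, struct demo_inode and demo_mark_inode_dirty_sync() below are hypothetical and only mirror the idea that an inode already queued for sync keeps its list position when it is marked dirty.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values, not the kernel's. */
#define DEMO_DIRTY_SYNC		(1u << 0)
#define DEMO_SYNC_QUEUED	(1u << 1)

/* Hypothetical list membership; the kernel uses several per-wb lists. */
enum demo_list { DEMO_LIST_NONE, DEMO_LIST_IO, DEMO_LIST_DIRTY };

struct demo_inode {
	unsigned int state;	/* models ->i_state */
	enum demo_list list;	/* which list the inode currently sits on */
	int notifications;	/* how often the "filesystem" was notified */
};

/* Models mark_inode_dirty_sync() with the queued-for-sync check. */
static void demo_mark_inode_dirty_sync(struct demo_inode *inode)
{
	bool was_dirty = inode->state & DEMO_DIRTY_SYNC;

	inode->notifications++;		/* the ->dirty_inode() call */
	inode->state |= DEMO_DIRTY_SYNC;

	/* Queued for sync: only update the state, leave the list alone. */
	if (inode->state & DEMO_SYNC_QUEUED)
		return;

	if (!was_dirty)
		inode->list = DEMO_LIST_DIRTY;
}

void demo_queued_example(void)
{
	struct demo_inode queued = { .state = DEMO_SYNC_QUEUED,
				     .list = DEMO_LIST_IO };

	demo_mark_inode_dirty_sync(&queued);
	assert(queued.list == DEMO_LIST_IO);	/* not moved */
	assert(queued.notifications == 1);	/* still notified */
}
```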
Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Cc: stable@vger.kernel.org
Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback")
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
fs/fs-writeback.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index acfb55834af23..c41cb887eb7d3 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1474,21 +1474,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 	}
 
 	/*
-	 * Some filesystems may redirty the inode during the writeback
-	 * due to delalloc, clear dirty metadata flags right before
-	 * write_inode()
+	 * If the inode has dirty timestamps and we need to write them, call
+	 * mark_inode_dirty_sync() to notify the filesystem about it and to
+	 * change I_DIRTY_TIME into I_DIRTY_SYNC.
 	 */
-	spin_lock(&inode->i_lock);
-
-	dirty = inode->i_state & I_DIRTY;
 	if ((inode->i_state & I_DIRTY_TIME) &&
-	    ((dirty & I_DIRTY_INODE) ||
-	     wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
+	    (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
 	     time_after(jiffies, inode->dirtied_time_when +
 			dirtytime_expire_interval * HZ))) {
-		dirty |= I_DIRTY_TIME;
 		trace_writeback_lazytime(inode);
+		mark_inode_dirty_sync(inode);
 	}
+
+	/*
+	 * Some filesystems may redirty the inode during the writeback
+	 * due to delalloc, clear dirty metadata flags right before
+	 * write_inode()
+	 */
+	spin_lock(&inode->i_lock);
+	dirty = inode->i_state & I_DIRTY;
 	inode->i_state &= ~dirty;
 
 	/*
@@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 
 	spin_unlock(&inode->i_lock);
 
-	if (dirty & I_DIRTY_TIME)
-		mark_inode_dirty_sync(inode);
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & ~I_DIRTY_PAGES) {
 		int err = write_inode(inode, wbc);
--
2.30.0
* Re: [PATCH v2 01/12] fs: fix lazytime expiration handling in __writeback_single_inode()
From: Christoph Hellwig @ 2021-01-11 10:48 UTC
To: Eric Biggers
Cc: linux-fsdevel, linux-xfs, linux-ext4, linux-f2fs-devel,
Theodore Ts'o, Christoph Hellwig, stable, Jan Kara
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH v2 01/12] fs: fix lazytime expiration handling in __writeback_single_inode()
From: Jan Kara @ 2021-01-11 14:46 UTC
To: Eric Biggers
Cc: linux-fsdevel, linux-xfs, linux-ext4, linux-f2fs-devel,
Theodore Ts'o, Christoph Hellwig, stable, Jan Kara
On Fri 08-01-21 23:58:52, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> When lazytime is enabled and an inode is being written due to its
> in-memory updated timestamps having expired, either due to a sync() or
> syncfs() system call or due to dirtytime_expire_interval having elapsed,
> the VFS needs to inform the filesystem so that the filesystem can copy
> the inode's timestamps out to the on-disk data structures.
>
> This is done by __writeback_single_inode() calling
> mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).
>
> However, this occurs after __writeback_single_inode() has already
> cleared the dirty flags from ->i_state. This causes two bugs:
>
> - mark_inode_dirty_sync() redirties the inode, causing it to remain
> dirty. This wastefully causes the inode to be written twice. But
> more importantly, it breaks cases where sync_filesystem() is expected
> to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
> ioctl (as reported at
> https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
> as possibly filesystem freezing (freeze_super()).
>
> - Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
> called from __writeback_single_inode() for lazytime expiration,
> xfs_fs_dirty_inode() ignores the notification. (XFS only cares about
> lazytime expirations, and it assumes that i_state will contain
> I_DIRTY_TIME during those.) Therefore, lazy timestamps aren't
> persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS.
>
> Fix this by moving the call to mark_inode_dirty_sync() to earlier in
> __writeback_single_inode(), before the dirty flags are cleared from
> i_state. This makes filesystems be properly notified of the timestamp
> expiration, and it avoids incorrectly redirtying the inode.
>
> This fixes xfstest generic/580 (which tests
> FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
> enabled. It also fixes the new lazytime xfstest I've proposed, which
> reproduces the above-mentioned XFS bug
> (https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).
>
> Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But
> due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
> right thing to do because mark_inode_dirty_sync() now knows not to move
> the inode to a writeback list if it is currently queued for sync.
>
> Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
> Cc: stable@vger.kernel.org
> Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback")
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Eric Biggers <ebiggers@google.com>
Thanks for writing this fix! It looks good to me. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/fs-writeback.c | 24 +++++++++++++-----------
> 1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index acfb55834af23..c41cb887eb7d3 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1474,21 +1474,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>  	}
>
>  	/*
> -	 * Some filesystems may redirty the inode during the writeback
> -	 * due to delalloc, clear dirty metadata flags right before
> -	 * write_inode()
> +	 * If the inode has dirty timestamps and we need to write them, call
> +	 * mark_inode_dirty_sync() to notify the filesystem about it and to
> +	 * change I_DIRTY_TIME into I_DIRTY_SYNC.
>  	 */
> -	spin_lock(&inode->i_lock);
> -
> -	dirty = inode->i_state & I_DIRTY;
>  	if ((inode->i_state & I_DIRTY_TIME) &&
> -	    ((dirty & I_DIRTY_INODE) ||
> -	     wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
> +	    (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
>  	     time_after(jiffies, inode->dirtied_time_when +
>  			dirtytime_expire_interval * HZ))) {
> -		dirty |= I_DIRTY_TIME;
>  		trace_writeback_lazytime(inode);
> +		mark_inode_dirty_sync(inode);
>  	}
> +
> +	/*
> +	 * Some filesystems may redirty the inode during the writeback
> +	 * due to delalloc, clear dirty metadata flags right before
> +	 * write_inode()
> +	 */
> +	spin_lock(&inode->i_lock);
> +	dirty = inode->i_state & I_DIRTY;
>  	inode->i_state &= ~dirty;
>
>  	/*
> @@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>
>  	spin_unlock(&inode->i_lock);
>
> -	if (dirty & I_DIRTY_TIME)
> -		mark_inode_dirty_sync(inode);
>  	/* Don't write the inode if only I_DIRTY_PAGES was set */
>  	if (dirty & ~I_DIRTY_PAGES) {
>  		int err = write_inode(inode, wbc);
> --
> 2.30.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR