* [PATCH] fs-writeback: do not requeue a clean inode having skipped pages
@ 2023-09-16 4:51 Chunhai Guo
2023-09-18 9:03 ` Jan Kara
2023-09-18 11:10 ` Christian Brauner
0 siblings, 2 replies; 3+ messages in thread
From: Chunhai Guo @ 2023-09-16 4:51 UTC (permalink / raw)
To: jack; +Cc: brauner, viro, linux-fsdevel, linux-kernel, Chunhai Guo
When writing back an inode and performing an fsync on it concurrently, a
deadlock issue may arise as shown below. In each writeback iteration, a
clean inode is requeued to the wb->b_dirty queue due to non-zero
pages_skipped, without anything actually being written. This causes an
infinite loop and prevents the plug from being flushed, resulting in a
deadlock. We now avoid requeuing the clean inode to prevent this issue.
wb_writeback fsync (inode-Y)
blk_start_plug(&plug)
for (;;) {
iter i-1: some reqs with page-X added into plug->mq_list // f2fs node page-X with PG_writeback
filemap_fdatawrite
__filemap_fdatawrite_range // write inode-Y with sync_mode WB_SYNC_ALL
do_writepages
f2fs_write_data_pages
__f2fs_write_data_pages // wb_sync_req[DATA]++ for WB_SYNC_ALL
f2fs_write_cache_pages
f2fs_write_single_data_page
f2fs_do_write_data_page
f2fs_outplace_write_data
f2fs_update_data_blkaddr
f2fs_wait_on_page_writeback
wait_on_page_writeback // wait for f2fs node page-X
iter i:
progress = __writeback_inodes_wb(wb, work)
. writeback_sb_inodes
. __writeback_single_inode // write inode-Y with sync_mode WB_SYNC_NONE
. . do_writepages
. . f2fs_write_data_pages
. . . __f2fs_write_data_pages // skip writepages due to (wb_sync_req[DATA]>0)
. . . wbc->pages_skipped += get_dirty_pages(inode) // wbc->pages_skipped = 1
. if (!(inode->i_state & I_DIRTY_ALL)) // i_state = I_SYNC | I_SYNC_QUEUED
. total_wrote++; // total_wrote = 1
. requeue_inode // requeue inode-Y to wb->b_dirty queue due to non-zero pages_skipped
if (progress) // progress = 1
continue;
iter i+1:
queue_io
// similar process with iter i, infinite for-loop !
}
blk_finish_plug(&plug) // flush plug won't be called
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
---
fs/fs-writeback.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 969ce991b0b0..c1af01b2c42d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1535,10 +1535,15 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
if (wbc->pages_skipped) {
/*
- * writeback is not making progress due to locked
- * buffers. Skip this inode for now.
+ * Writeback is not making progress due to locked buffers.
+ * Skip this inode for now. Although having skipped pages
+ * is odd for clean inodes, it can happen for some
+ * filesystems so handle that gracefully.
*/
- redirty_tail_locked(inode, wb);
+ if (inode->i_state & I_DIRTY_ALL)
+ redirty_tail_locked(inode, wb);
+ else
+ inode_cgwb_move_to_attached(inode, wb);
return;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] fs-writeback: do not requeue a clean inode having skipped pages
2023-09-16 4:51 [PATCH] fs-writeback: do not requeue a clean inode having skipped pages Chunhai Guo
@ 2023-09-18 9:03 ` Jan Kara
2023-09-18 11:10 ` Christian Brauner
1 sibling, 0 replies; 3+ messages in thread
From: Jan Kara @ 2023-09-18 9:03 UTC (permalink / raw)
To: Chunhai Guo; +Cc: jack, brauner, viro, linux-fsdevel, linux-kernel
On Fri 15-09-23 22:51:31, Chunhai Guo wrote:
> When writing back an inode and performing an fsync on it concurrently, a
> deadlock issue may arise as shown below. In each writeback iteration, a
> clean inode is requeued to the wb->b_dirty queue due to non-zero
> pages_skipped, without anything actually being written. This causes an
> infinite loop and prevents the plug from being flushed, resulting in a
> deadlock. We now avoid requeuing the clean inode to prevent this issue.
>
> wb_writeback fsync (inode-Y)
> blk_start_plug(&plug)
> for (;;) {
> iter i-1: some reqs with page-X added into plug->mq_list // f2fs node page-X with PG_writeback
> filemap_fdatawrite
> __filemap_fdatawrite_range // write inode-Y with sync_mode WB_SYNC_ALL
> do_writepages
> f2fs_write_data_pages
> __f2fs_write_data_pages // wb_sync_req[DATA]++ for WB_SYNC_ALL
> f2fs_write_cache_pages
> f2fs_write_single_data_page
> f2fs_do_write_data_page
> f2fs_outplace_write_data
> f2fs_update_data_blkaddr
> f2fs_wait_on_page_writeback
> wait_on_page_writeback // wait for f2fs node page-X
> iter i:
> progress = __writeback_inodes_wb(wb, work)
> . writeback_sb_inodes
> . __writeback_single_inode // write inode-Y with sync_mode WB_SYNC_NONE
> . . do_writepages
> . . f2fs_write_data_pages
> . . . __f2fs_write_data_pages // skip writepages due to (wb_sync_req[DATA]>0)
> . . . wbc->pages_skipped += get_dirty_pages(inode) // wbc->pages_skipped = 1
> . if (!(inode->i_state & I_DIRTY_ALL)) // i_state = I_SYNC | I_SYNC_QUEUED
> . total_wrote++; // total_wrote = 1
> . requeue_inode // requeue inode-Y to wb->b_dirty queue due to non-zero pages_skipped
> if (progress) // progress = 1
> continue;
> iter i+1:
> queue_io
> // similar process with iter i, infinite for-loop !
> }
> blk_finish_plug(&plug) // flush plug won't be called
>
> Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 969ce991b0b0..c1af01b2c42d 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1535,10 +1535,15 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
>
> if (wbc->pages_skipped) {
> /*
> - * writeback is not making progress due to locked
> - * buffers. Skip this inode for now.
> + * Writeback is not making progress due to locked buffers.
> + * Skip this inode for now. Although having skipped pages
> + * is odd for clean inodes, it can happen for some
> + * filesystems so handle that gracefully.
> */
> - redirty_tail_locked(inode, wb);
> + if (inode->i_state & I_DIRTY_ALL)
> + redirty_tail_locked(inode, wb);
> + else
> + inode_cgwb_move_to_attached(inode, wb);
> return;
> }
>
> --
> 2.25.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] fs-writeback: do not requeue a clean inode having skipped pages
2023-09-16 4:51 [PATCH] fs-writeback: do not requeue a clean inode having skipped pages Chunhai Guo
2023-09-18 9:03 ` Jan Kara
@ 2023-09-18 11:10 ` Christian Brauner
1 sibling, 0 replies; 3+ messages in thread
From: Christian Brauner @ 2023-09-18 11:10 UTC (permalink / raw)
To: Chunhai Guo; +Cc: Christian Brauner, viro, linux-fsdevel, linux-kernel, jack
On Fri, 15 Sep 2023 22:51:31 -0600, Chunhai Guo wrote:
> When writing back an inode and performing an fsync on it concurrently, a
> deadlock issue may arise as shown below. In each writeback iteration, a
> clean inode is requeued to the wb->b_dirty queue due to non-zero
> pages_skipped, without anything actually being written. This causes an
> infinite loop and prevents the plug from being flushed, resulting in a
> deadlock. We now avoid requeuing the clean inode to prevent this issue.
>
> [...]
Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc
[1/1] fs-writeback: do not requeue a clean inode having skipped pages
https://git.kernel.org/vfs/vfs/c/3d744ef7ea7a
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-09-18 11:12 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-16 4:51 [PATCH] fs-writeback: do not requeue a clean inode having skipped pages Chunhai Guo
2023-09-18 9:03 ` Jan Kara
2023-09-18 11:10 ` Christian Brauner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).