linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fs-writeback: do not requeue a clean inode having skipped pages
@ 2023-09-16  4:51 Chunhai Guo
  2023-09-18  9:03 ` Jan Kara
  2023-09-18 11:10 ` Christian Brauner
  0 siblings, 2 replies; 3+ messages in thread
From: Chunhai Guo @ 2023-09-16  4:51 UTC (permalink / raw)
  To: jack; +Cc: brauner, viro, linux-fsdevel, linux-kernel, Chunhai Guo

When writing back an inode and performing an fsync on it concurrently, a
deadlock issue may arise as shown below. In each writeback iteration, a
clean inode is requeued to the wb->b_dirty queue due to non-zero
pages_skipped, without anything actually being written. This causes an
infinite loop and prevents the plug from being flushed, resulting in a
deadlock. We now avoid requeuing the clean inode to prevent this issue.

    wb_writeback        fsync (inode-Y)
blk_start_plug(&plug)
for (;;) {
  iter i-1: some reqs with page-X added into plug->mq_list // f2fs node page-X with PG_writeback
                        filemap_fdatawrite
                          __filemap_fdatawrite_range // write inode-Y with sync_mode WB_SYNC_ALL
                           do_writepages
                            f2fs_write_data_pages
                             __f2fs_write_data_pages // wb_sync_req[DATA]++ for WB_SYNC_ALL
                              f2fs_write_cache_pages
                               f2fs_write_single_data_page
                                f2fs_do_write_data_page
                                 f2fs_outplace_write_data
                                  f2fs_update_data_blkaddr
                                   f2fs_wait_on_page_writeback
                                     wait_on_page_writeback // wait for f2fs node page-X
  iter i:
    progress = __writeback_inodes_wb(wb, work)
    . writeback_sb_inodes
    .   __writeback_single_inode // write inode-Y with sync_mode WB_SYNC_NONE
    .   . do_writepages
    .   .   f2fs_write_data_pages
    .   .   .  __f2fs_write_data_pages // skip writepages due to (wb_sync_req[DATA]>0)
    .   .   .   wbc->pages_skipped += get_dirty_pages(inode) // wbc->pages_skipped = 1
    .   if (!(inode->i_state & I_DIRTY_ALL)) // i_state = I_SYNC | I_SYNC_QUEUED
    .    total_wrote++;  // total_wrote = 1
    .   requeue_inode // requeue inode-Y to wb->b_dirty queue due to non-zero pages_skipped
    if (progress) // progress = 1
      continue;
  iter i+1:
      queue_io
      // similar process with iter i, infinite for-loop !
}
blk_finish_plug(&plug)   // flush plug won't be called

Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
---
 fs/fs-writeback.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 969ce991b0b0..c1af01b2c42d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1535,10 +1535,15 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
 
 	if (wbc->pages_skipped) {
 		/*
-		 * writeback is not making progress due to locked
-		 * buffers. Skip this inode for now.
+		 * Writeback is not making progress due to locked buffers.
+		 * Skip this inode for now. Although having skipped pages
+		 * is odd for clean inodes, it can happen for some
+		 * filesystems so handle that gracefully.
 		 */
-		redirty_tail_locked(inode, wb);
+		if (inode->i_state & I_DIRTY_ALL)
+			redirty_tail_locked(inode, wb);
+		else
+			inode_cgwb_move_to_attached(inode, wb);
 		return;
 	}
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] fs-writeback: do not requeue a clean inode having skipped pages
  2023-09-16  4:51 [PATCH] fs-writeback: do not requeue a clean inode having skipped pages Chunhai Guo
@ 2023-09-18  9:03 ` Jan Kara
  2023-09-18 11:10 ` Christian Brauner
  1 sibling, 0 replies; 3+ messages in thread
From: Jan Kara @ 2023-09-18  9:03 UTC (permalink / raw)
  To: Chunhai Guo; +Cc: jack, brauner, viro, linux-fsdevel, linux-kernel

On Fri 15-09-23 22:51:31, Chunhai Guo wrote:
> When writing back an inode and performing an fsync on it concurrently, a
> deadlock issue may arise as shown below. In each writeback iteration, a
> clean inode is requeued to the wb->b_dirty queue due to non-zero
> pages_skipped, without anything actually being written. This causes an
> infinite loop and prevents the plug from being flushed, resulting in a
> deadlock. We now avoid requeuing the clean inode to prevent this issue.
> 
>     wb_writeback        fsync (inode-Y)
> blk_start_plug(&plug)
> for (;;) {
>   iter i-1: some reqs with page-X added into plug->mq_list // f2fs node page-X with PG_writeback
>                         filemap_fdatawrite
>                           __filemap_fdatawrite_range // write inode-Y with sync_mode WB_SYNC_ALL
>                            do_writepages
>                             f2fs_write_data_pages
>                              __f2fs_write_data_pages // wb_sync_req[DATA]++ for WB_SYNC_ALL
>                               f2fs_write_cache_pages
>                                f2fs_write_single_data_page
>                                 f2fs_do_write_data_page
>                                  f2fs_outplace_write_data
>                                   f2fs_update_data_blkaddr
>                                    f2fs_wait_on_page_writeback
>                                      wait_on_page_writeback // wait for f2fs node page-X
>   iter i:
>     progress = __writeback_inodes_wb(wb, work)
>     . writeback_sb_inodes
>     .   __writeback_single_inode // write inode-Y with sync_mode WB_SYNC_NONE
>     .   . do_writepages
>     .   .   f2fs_write_data_pages
>     .   .   .  __f2fs_write_data_pages // skip writepages due to (wb_sync_req[DATA]>0)
>     .   .   .   wbc->pages_skipped += get_dirty_pages(inode) // wbc->pages_skipped = 1
>     .   if (!(inode->i_state & I_DIRTY_ALL)) // i_state = I_SYNC | I_SYNC_QUEUED
>     .    total_wrote++;  // total_wrote = 1
>     .   requeue_inode // requeue inode-Y to wb->b_dirty queue due to non-zero pages_skipped
>     if (progress) // progress = 1
>       continue;
>   iter i+1:
>       queue_io
>       // similar process with iter i, infinite for-loop !
> }
> blk_finish_plug(&plug)   // flush plug won't be called
> 
> Signed-off-by: Chunhai Guo <guochunhai@vivo.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 969ce991b0b0..c1af01b2c42d 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1535,10 +1535,15 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
>  
>  	if (wbc->pages_skipped) {
>  		/*
> -		 * writeback is not making progress due to locked
> -		 * buffers. Skip this inode for now.
> +		 * Writeback is not making progress due to locked buffers.
> +		 * Skip this inode for now. Although having skipped pages
> +		 * is odd for clean inodes, it can happen for some
> +		 * filesystems so handle that gracefully.
>  		 */
> -		redirty_tail_locked(inode, wb);
> +		if (inode->i_state & I_DIRTY_ALL)
> +			redirty_tail_locked(inode, wb);
> +		else
> +			inode_cgwb_move_to_attached(inode, wb);
>  		return;
>  	}
>  
> -- 
> 2.25.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] fs-writeback: do not requeue a clean inode having skipped pages
  2023-09-16  4:51 [PATCH] fs-writeback: do not requeue a clean inode having skipped pages Chunhai Guo
  2023-09-18  9:03 ` Jan Kara
@ 2023-09-18 11:10 ` Christian Brauner
  1 sibling, 0 replies; 3+ messages in thread
From: Christian Brauner @ 2023-09-18 11:10 UTC (permalink / raw)
  To: Chunhai Guo; +Cc: Christian Brauner, viro, linux-fsdevel, linux-kernel, jack

On Fri, 15 Sep 2023 22:51:31 -0600, Chunhai Guo wrote:
> When writing back an inode and performing an fsync on it concurrently, a
> deadlock issue may arise as shown below. In each writeback iteration, a
> clean inode is requeued to the wb->b_dirty queue due to non-zero
> pages_skipped, without anything actually being written. This causes an
> infinite loop and prevents the plug from being flushed, resulting in a
> deadlock. We now avoid requeuing the clean inode to prevent this issue.
> 
> [...]

Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc

[1/1] fs-writeback: do not requeue a clean inode having skipped pages
      https://git.kernel.org/vfs/vfs/c/3d744ef7ea7a

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-09-18 11:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-16  4:51 [PATCH] fs-writeback: do not requeue a clean inode having skipped pages Chunhai Guo
2023-09-18  9:03 ` Jan Kara
2023-09-18 11:10 ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).