All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sahitya Tummala <stummala@codeaurora.org>
To: Chao Yu <yuchao0@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>, linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [f2fs-dev] IO hang due to f2fs checkpoint and writeback stuck
Date: Fri, 10 Jul 2020 15:37:52 +0530	[thread overview]
Message-ID: <20200710100752.GF2916@codeaurora.org> (raw)
In-Reply-To: <1de32d22-caa3-c69c-66c8-e3a8d920ca6c@huawei.com>

On Fri, Jul 10, 2020 at 04:40:19PM +0800, Chao Yu wrote:
> On 2020/7/10 11:39, Sahitya Tummala wrote:
> > Hi Chao,
> > 
> > On Fri, Jul 10, 2020 at 10:54:13AM +0800, Chao Yu wrote:
> >> Hi Sahitya,
> >>
> >> It looks block plug has already been removed by Jaegeuk with
> >> below commit:
> >>
> >> commit 1f5f11a3c41e2b23288b2769435a00f74e02496b
> >> Author: Jaegeuk Kim <jaegeuk@kernel.org>
> >> Date:   Fri May 8 12:25:45 2020 -0700
> >>
> >>     f2fs: remove blk_plugging in block_operations
> >>
> >>     blk_plugging doesn't seem to give any benefit.
> >>
> >> How about backporting this patch?
> > 
> > Yes, I have noticed that patch. But we have nested pluglists in
> > the block_operations path. Hence, I was not sure if that patch alone
> > can help.
> > 1. At the start of  block_operations
> > 2. Inside __f2fs_write_data_pages() that gets called from
> > f2fs_sync_dirty_inodes()->filemap_fdatawrite()
> 
> Would be more safe to call io_schedule() in f2fs_sync_dirty_inodes()?
> 
> - io_schedule()
>  - schedule()
>   - sched_submit_work()
>    - blk_schedule_flush_plug()
>     - blk_flush_plug_list()
> 

With Jaegeuk's patch, we will only have the plug list in CP path - inside
__f2fs_write_data_pages(), which can now be flushed as its not nested
anymore.

        blk_start_plug(&plug);
        ret = f2fs_write_cache_pages(mapping, wbc, io_type);
        blk_finish_plug(&plug);

Earlier this blk_finish_plug() will not flush the plug list as the bios
are attached to the outer plug from block_operations. So I think Jaegeuk's
patch alone also can help this issue.

> > 
> > Do you know the possible path for this issue scenario to happen?> Where does in the CP path before even f2fs_sync_node_pages() is done, the
> > node pages cab be submitted for io and get attached to CP plug list?
> 
> Maybe:
> 
> writeback_inodes_wb
>  blk_start_plug                     ---plugged
>   __writeback_inodes_wb
>    writeback_sb_inodes
>     __writeback_single_inode
>      do_writepages
>       f2fs_write_node_pages
>        blk_start_plug               ---plugged
>        f2fs_sync_node_pages
>         __write_node_page(do_balance=true)  --submit node page to plug
>          f2fs_balance_fs
>           f2fs_balance_fs_bg
>            f2fs_sync_fs
>             f2fs_write_checkpoint
>           or
>           f2fs_gc
>            f2fs_write_checkpoint
>        blk_start_plug
>  blk_finish_plug
> 

Hmmm, but from ramdumps the CP thread stack is shown as below.

f2fs_sync_file
f2fs_do_sync_file
f2fs_sync_fs
f2fs_write_checkpoint

Thanks,

> Thanks,
> 
> > 
> > Thanks,
> > 
> >>
> >> On 2020/7/10 10:30, Sahitya Tummala wrote:
> >>> Hi Chao, Jaegeuk,
> >>>
> >>> I have received an issue report that indicates that system is stuck
> >>> on IO due to f2fs checkpoint and writeback stuck waiting on each other
> >>> as explained below.
> >>>
> >>> WB thread -
> >>> ----------
> >>>
> >>> io_schedule
> >>> wait_on_page_bit
> >>> f2fs_wait_on_page_writeback -> It is waiting for node
> >>> 			node page writeback whose bio is in the
> >>> 			plug list of CP thread below.
> >>> f2fs_update_data_blkaddr
> >>> f2fs_outplace_write_data
> >>> f2fs_do_write_data_page
> >>> __write_data_page
> >>> __f2fs_write_data_pages
> >>> f2fs_write_data_pages
> >>> do_writepages
> >>>
> >>> CP thread -
> >>> -----------
> >>>
> >>> __f2fs_write_data_pages -> It is for the same inode above that is under WB (which
> >>> 	is waiting for node page writeback). In this context, there is nothing to
> >>> 	be written as the data is already under WB. 
> >>> filemap_fdatawrite
> >>> f2fs_sync_dirty_inodes -> It just loops here in f2fs_sync_dirty_inodes() until
> >>> 			f2fs_remove_dirty_inode() has been done by the WB thread above.
> >>> block_operations
> >>> f2fs_write_checkpoint
> >>>
> >>> The CP thread somehow has the node page bio in its plug list that cannot be submitted 
> >>> until end of block_operations() and CP thread is blocked on WB of an inode who is again
> >>> waiting for io pending in CP plug list. Both the stacks are stuck on for each other.
> >>>
> >>> The below patch helped to solve the issue, please review and suggest if this seems to 
> >>> be okay. Since anyways we are doing cond_resched(), I thought it will be good to flush
> >>> the plug list as well (in this issue case, it will loop for the same inode again and again).
> >>>
> >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> >>> index e460d90..152df48 100644
> >>> --- a/fs/f2fs/checkpoint.c
> >>> +++ b/fs/f2fs/checkpoint.c
> >>> @@ -1071,10 +1071,12 @@ int f2fs_sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type)
> >>>
> >>>                 iput(inode);
> >>>                 /* We need to give cpu to another writers. */
> >>> -               if (ino == cur_ino)
> >>> +               if (ino == cur_ino) {
> >>> +                       blk_flush_plug(current);
> >>>                         cond_resched();
> >>> -               else
> >>> +                } else {
> >>>                         ino = cur_ino;
> >>> +                }
> >>>         } else {
> >>>                 /*
> >>>                  * We should submit bio, since it exists several
> >>>
> >>> Thanks,
> >>>
> > 

-- 
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

  reply	other threads:[~2020-07-10 10:08 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-10  2:30 [f2fs-dev] IO hang due to f2fs checkpoint and writeback stuck Sahitya Tummala
2020-07-10  2:54 ` Chao Yu
2020-07-10  3:02   ` Gao Xiang via Linux-f2fs-devel
2020-07-10  7:33     ` Chao Yu
2020-07-10  3:39   ` Sahitya Tummala
2020-07-10  8:40     ` Chao Yu
2020-07-10 10:07       ` Sahitya Tummala [this message]
2020-07-14 12:13         ` Chao Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200710100752.GF2916@codeaurora.org \
    --to=stummala@codeaurora.org \
    --cc=jaegeuk@kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=yuchao0@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.