Re: [f2fs-dev] [PATCH] f2fs: avoid deadlock in gc thread under low memory

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Wu Yan <wu-yan@tcl.com>
Cc: tang.ding@tcl.com, linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [f2fs-dev] [PATCH] f2fs: avoid deadlock in gc thread under low memory
Date: Wed, 13 Apr 2022 19:18:04 -0700	[thread overview]
Message-ID: <YleEXOHl1Vhlr3x3@google.com> (raw)
In-Reply-To: <39c4ded0-09c0-3e38-85cb-5535099b177d@tcl.com>

On 04/14, Wu Yan wrote:
> On 4/14/22 01:00, Jaegeuk Kim wrote:
> > On 04/13, Rokudo Yan wrote:
> > > There is a potential deadlock in gc thread may happen
> > > under low memory as below:
> > > 
> > > gc_thread_func
> > >   -f2fs_gc
> > >    -do_garbage_collect
> > >     -gc_data_segment
> > >      -move_data_block
> > >       -set_page_writeback(fio.encrypted_page);
> > >       -f2fs_submit_page_write
> > > as f2fs_submit_page_write try to do io merge when possible, so the
> > > encrypted_page is marked PG_writeback but may not submit to block
> > > layer immediately, if system enter low memory when gc thread try
> > > to move next data block, it may do direct reclaim and enter fs layer
> > > as below:
> > >     -move_data_block
> > >      -f2fs_grab_cache_page(index=?, for_write=false)
> > >       -grab_cache_page
> > >        -find_or_create_page
> > >         -pagecache_get_page
> > >          -__page_cache_alloc --  __GFP_FS is set
> > >           -alloc_pages_node
> > >            -__alloc_pages
> > >             -__alloc_pages_slowpath
> > >              -__alloc_pages_direct_reclaim
> > >               -__perform_reclaim
> > >                -try_to_free_pages
> > >                 -do_try_to_free_pages
> > >                  -shrink_zones
> > >                   -mem_cgroup_soft_limit_reclaim
> > >                    -mem_cgroup_soft_reclaim
> > >                     -mem_cgroup_shrink_node
> > >                      -shrink_node_memcg
> > >                       -shrink_list
> > >                        -shrink_inactive_list
> > >                         -shrink_page_list
> > >                          -wait_on_page_writeback -- the page is marked
> > >                         writeback during previous move_data_block call
> > > 
> > > the gc thread wait for the encrypted_page writeback complete,
> > > but as gc thread held sbi->gc_lock, the writeback & sync thread
> > > may blocked waiting for sbi->gc_lock, so the bio contain the
> > > encrypted_page may nerver submit to block layer and complete the
> > > writeback, which cause deadlock. To avoid this deadlock condition,
> > > we mark the gc thread with PF_MEMALLOC_NOFS flag, then it will nerver
> > > enter fs layer when try to alloc cache page during move_data_block.
> > > 
> > > Signed-off-by: Rokudo Yan <wu-yan@tcl.com>
> > > ---
> > >   fs/f2fs/gc.c | 6 ++++++
> > >   1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > > index e020804f7b07..cc71f77b98c8 100644
> > > --- a/fs/f2fs/gc.c
> > > +++ b/fs/f2fs/gc.c
> > > @@ -38,6 +38,12 @@ static int gc_thread_func(void *data)
> > >   	wait_ms = gc_th->min_sleep_time;
> > > +	/*
> > > +	 * Make sure that no allocations from gc thread will ever
> > > +	 * recurse to the fs layer to avoid deadlock as it will
> > > +	 * hold sbi->gc_lock during garbage collection
> > > +	 */
> > > +	memalloc_nofs_save();
> > 
> > I think this cannot cover all the f2fs_gc() call cases. Can we just avoid by:
> > 
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -1233,7 +1233,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
> >                                  CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;
> > 
> >          /* do not read out */
> > -       page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
> > +       page = f2fs_grab_cache_page(inode->i_mapping, bidx, true);
> >          if (!page)
> >                  return -ENOMEM;
> > 
> > Thanks,
> > 
> > >   	set_freezable();
> > >   	do {
> > >   		bool sync_mode, foreground = false;
> > > -- 
> > > 2.25.1
> 
> Hi, Jaegeuk
> 
> I'm not sure if any other case may trigger the issue, but the stack traces I
> have caught so far are all the same as below:
> 
> f2fs_gc-253:12  D 226966.808196 572 302561 150976 0x1200840 0x0 572
> 237207473347056
> <ffffff889d88668c> __switch_to+0x134/0x150
> <ffffff889e764b6c> __schedule+0xd5c/0x1100
> <ffffff889e76554c> io_schedule+0x90/0xc0
> <ffffff889d9fb880> wait_on_page_bit+0x194/0x208
> <ffffff889da167b4> shrink_page_list+0x62c/0xe74
> <ffffff889da1d354> shrink_inactive_list+0x2c0/0x698
> <ffffff889da181f4> shrink_node_memcg+0x3dc/0x97c
> <ffffff889da17d44> mem_cgroup_shrink_node+0x144/0x218
> <ffffff889da6610c> mem_cgroup_soft_limit_reclaim+0x188/0x47c
> <ffffff889da17a40> do_try_to_free_pages+0x204/0x3a0
> <ffffff889da176c8> try_to_free_pages+0x35c/0x4d0
> <ffffff889da05d60> __alloc_pages_nodemask+0x7a4/0x10d0
> <ffffff889d9fc82c> pagecache_get_page+0x184/0x2ec

Is this deadlock trying to grab a lock, instead of waiting for writeback?
Could you share all the backtraces of the tasks?

For writeback above, looking at the code, f2fs_gc uses three mappings, meta,
node, and data, and meta/node inodes are masking GFP_NOFS in f2fs_iget(),
while data inode does not. So, the above f2fs_grab_cache_page() in
move_data_block() is actually calling w/o NOFS.

> <ffffff889dbf8860> do_garbage_collect+0xfe0/0x2828
> <ffffff889dbf7434> f2fs_gc+0x4a0/0x8ec
> <ffffff889dbf6bf4> gc_thread_func+0x240/0x4d4
> <ffffff889d8de9b0> kthread+0x17c/0x18c
> <ffffff889d88567c> ret_from_fork+0x10/0x18
> 
> Thanks
> yanwu


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

WARNING: multiple messages have this Message-ID (diff)

From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Wu Yan <wu-yan@tcl.com>
Cc: linux-f2fs-devel@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, tang.ding@tcl.com
Subject: Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory
Date: Wed, 13 Apr 2022 19:18:04 -0700	[thread overview]
Message-ID: <YleEXOHl1Vhlr3x3@google.com> (raw)
In-Reply-To: <39c4ded0-09c0-3e38-85cb-5535099b177d@tcl.com>

On 04/14, Wu Yan wrote:
> On 4/14/22 01:00, Jaegeuk Kim wrote:
> > On 04/13, Rokudo Yan wrote:
> > > There is a potential deadlock in gc thread may happen
> > > under low memory as below:
> > > 
> > > gc_thread_func
> > >   -f2fs_gc
> > >    -do_garbage_collect
> > >     -gc_data_segment
> > >      -move_data_block
> > >       -set_page_writeback(fio.encrypted_page);
> > >       -f2fs_submit_page_write
> > > as f2fs_submit_page_write try to do io merge when possible, so the
> > > encrypted_page is marked PG_writeback but may not submit to block
> > > layer immediately, if system enter low memory when gc thread try
> > > to move next data block, it may do direct reclaim and enter fs layer
> > > as below:
> > >     -move_data_block
> > >      -f2fs_grab_cache_page(index=?, for_write=false)
> > >       -grab_cache_page
> > >        -find_or_create_page
> > >         -pagecache_get_page
> > >          -__page_cache_alloc --  __GFP_FS is set
> > >           -alloc_pages_node
> > >            -__alloc_pages
> > >             -__alloc_pages_slowpath
> > >              -__alloc_pages_direct_reclaim
> > >               -__perform_reclaim
> > >                -try_to_free_pages
> > >                 -do_try_to_free_pages
> > >                  -shrink_zones
> > >                   -mem_cgroup_soft_limit_reclaim
> > >                    -mem_cgroup_soft_reclaim
> > >                     -mem_cgroup_shrink_node
> > >                      -shrink_node_memcg
> > >                       -shrink_list
> > >                        -shrink_inactive_list
> > >                         -shrink_page_list
> > >                          -wait_on_page_writeback -- the page is marked
> > >                         writeback during previous move_data_block call
> > > 
> > > the gc thread wait for the encrypted_page writeback complete,
> > > but as gc thread held sbi->gc_lock, the writeback & sync thread
> > > may blocked waiting for sbi->gc_lock, so the bio contain the
> > > encrypted_page may nerver submit to block layer and complete the
> > > writeback, which cause deadlock. To avoid this deadlock condition,
> > > we mark the gc thread with PF_MEMALLOC_NOFS flag, then it will nerver
> > > enter fs layer when try to alloc cache page during move_data_block.
> > > 
> > > Signed-off-by: Rokudo Yan <wu-yan@tcl.com>
> > > ---
> > >   fs/f2fs/gc.c | 6 ++++++
> > >   1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > > index e020804f7b07..cc71f77b98c8 100644
> > > --- a/fs/f2fs/gc.c
> > > +++ b/fs/f2fs/gc.c
> > > @@ -38,6 +38,12 @@ static int gc_thread_func(void *data)
> > >   	wait_ms = gc_th->min_sleep_time;
> > > +	/*
> > > +	 * Make sure that no allocations from gc thread will ever
> > > +	 * recurse to the fs layer to avoid deadlock as it will
> > > +	 * hold sbi->gc_lock during garbage collection
> > > +	 */
> > > +	memalloc_nofs_save();
> > 
> > I think this cannot cover all the f2fs_gc() call cases. Can we just avoid by:
> > 
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -1233,7 +1233,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
> >                                  CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;
> > 
> >          /* do not read out */
> > -       page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
> > +       page = f2fs_grab_cache_page(inode->i_mapping, bidx, true);
> >          if (!page)
> >                  return -ENOMEM;
> > 
> > Thanks,
> > 
> > >   	set_freezable();
> > >   	do {
> > >   		bool sync_mode, foreground = false;
> > > -- 
> > > 2.25.1
> 
> Hi, Jaegeuk
> 
> I'm not sure if any other case may trigger the issue, but the stack traces I
> have caught so far are all the same as below:
> 
> f2fs_gc-253:12  D 226966.808196 572 302561 150976 0x1200840 0x0 572
> 237207473347056
> <ffffff889d88668c> __switch_to+0x134/0x150
> <ffffff889e764b6c> __schedule+0xd5c/0x1100
> <ffffff889e76554c> io_schedule+0x90/0xc0
> <ffffff889d9fb880> wait_on_page_bit+0x194/0x208
> <ffffff889da167b4> shrink_page_list+0x62c/0xe74
> <ffffff889da1d354> shrink_inactive_list+0x2c0/0x698
> <ffffff889da181f4> shrink_node_memcg+0x3dc/0x97c
> <ffffff889da17d44> mem_cgroup_shrink_node+0x144/0x218
> <ffffff889da6610c> mem_cgroup_soft_limit_reclaim+0x188/0x47c
> <ffffff889da17a40> do_try_to_free_pages+0x204/0x3a0
> <ffffff889da176c8> try_to_free_pages+0x35c/0x4d0
> <ffffff889da05d60> __alloc_pages_nodemask+0x7a4/0x10d0
> <ffffff889d9fc82c> pagecache_get_page+0x184/0x2ec

Is this deadlock trying to grab a lock, instead of waiting for writeback?
Could you share all the backtraces of the tasks?

For writeback above, looking at the code, f2fs_gc uses three mappings, meta,
node, and data, and meta/node inodes are masking GFP_NOFS in f2fs_iget(),
while data inode does not. So, the above f2fs_grab_cache_page() in
move_data_block() is actually calling w/o NOFS.

> <ffffff889dbf8860> do_garbage_collect+0xfe0/0x2828
> <ffffff889dbf7434> f2fs_gc+0x4a0/0x8ec
> <ffffff889dbf6bf4> gc_thread_func+0x240/0x4d4
> <ffffff889d8de9b0> kthread+0x17c/0x18c
> <ffffff889d88567c> ret_from_fork+0x10/0x18
> 
> Thanks
> yanwu

next prev parent reply	other threads:[~2022-04-14  2:18 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13  8:44 [f2fs-dev] [PATCH] f2fs: avoid deadlock in gc thread under low memory Rokudo Yan
2022-04-13  8:44 ` Rokudo Yan
2022-04-13  9:46 ` [f2fs-dev] " Michal Hocko via Linux-f2fs-devel
2022-04-13  9:46   ` Michal Hocko
2022-04-13 11:20   ` [f2fs-dev] " Wu Yan
2022-04-13 11:20     ` Wu Yan
2022-04-13 11:35     ` [f2fs-dev] " Michal Hocko via Linux-f2fs-devel
2022-04-13 11:35       ` Michal Hocko
2022-04-13 17:00 ` [f2fs-dev] " Jaegeuk Kim
2022-04-13 17:00   ` Jaegeuk Kim
2022-04-14  1:54   ` [f2fs-dev] " Wu Yan
2022-04-14  1:54     ` Wu Yan
2022-04-14  2:18     ` Jaegeuk Kim [this message]
2022-04-14  2:18       ` Jaegeuk Kim
2022-04-14  2:27       ` [f2fs-dev] " Wu Yan
2022-04-14  2:27         ` Wu Yan
2022-04-14  2:42         ` [f2fs-dev] " Jaegeuk Kim
2022-04-14  2:42           ` Jaegeuk Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YleEXOHl1Vhlr3x3@google.com \
    --to=jaegeuk@kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tang.ding@tcl.com \
    --cc=wu-yan@tcl.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.