Re: [f2fs-dev] [PATCH] f2fs: Fix deadlock under storage almost full/dirty condition

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sahitya Tummala <stummala@codeaurora.org>
To: Chao Yu <yuchao0@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>,
	linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [f2fs-dev] [PATCH] f2fs: Fix deadlock under storage almost full/dirty condition
Date: Mon, 11 Nov 2019 12:14:41 +0530	[thread overview]
Message-ID: <20191111064441.GC15669@codeaurora.org> (raw)
In-Reply-To: <9ece86fd-ff53-3a70-627e-c6acb03b9264@huawei.com>

Hi Chao,

On Mon, Nov 11, 2019 at 02:28:47PM +0800, Chao Yu wrote:
> Hi Sahitya,
> 
> On 2019/11/11 11:40, Sahitya Tummala wrote:
> > Hi Chao,
> > 
> > On Mon, Nov 11, 2019 at 10:51:10AM +0800, Chao Yu wrote:
> >> On 2019/11/8 19:03, Sahitya Tummala wrote:
> >>> There could be a potential deadlock when the storage capacity
> >>> is almost full and theren't enough free segments available, due
> >>> to which FG_GC is needed in the atomic commit ioctl as shown in
> >>> the below callstack -
> >>>
> >>> schedule_timeout
> >>> io_schedule_timeout
> >>> congestion_wait
> >>> f2fs_drop_inmem_pages_all
> >>> f2fs_gc
> >>> f2fs_balance_fs
> >>> __write_node_page
> >>> f2fs_fsync_node_pages
> >>> f2fs_do_sync_file
> >>> f2fs_ioctl
> >>>
> >>> If this inode doesn't have i_gc_failures[GC_FAILURE_ATOMIC] set,
> >>> then it waits forever in f2fs_drop_inmem_pages_all(), for this
> >>> atomic inode to be dropped. And the rest of the system is stuck
> >>> waiting for sbi->gc_mutex lock, which is acquired by f2fs_balance_fs()
> >>> in the stack above.
> >>
> >> I think the root cause of this issue is there is potential infinite loop in
> >> f2fs_drop_inmem_pages_all() for the case of gc_failure is true, because once the
> >> first inode in inode_list[ATOMIC_FILE] list didn't suffer gc failure, we will
> >> skip dropping its in-memory cache and calling iput(), and traverse the list
> >> again, most possibly there is the same inode in the head of that list.
> >>
> > 
> > I thought we are expecting for those atomic updates (without any gc failures) to be
> > committed by doing congestion_wait() and thus retrying again. Hence, I just
> 
> Nope, we only need to drop inode which encounter gc failures, and keep the rest
> inodes.
> 
> > fixed only if we are ending up waiting for commit to happen in the atomic
> > commit path itself, which will be a deadlock.
> 
> Look into call stack you provide, I don't think it's correct to drop such inode,
> as its dirty pages should be committed before f2fs_fsync_node_pages(), so
> calling f2fs_drop_inmem_pages won't release any inmem pages, and won't help
> looped GC caused by skipping due to inmem pages.
> 
> And then I figure out below fix...
> 

Thanks for the explanation.
The fix below looks good to me.

Thanks,
Sahitya.

> > 
> >> Could you please check below fix:
> >>
> >> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >> index 7bf7b0194944..8a3a35b42a37 100644
> >> --- a/fs/f2fs/f2fs.h
> >> +++ b/fs/f2fs/f2fs.h
> >> @@ -1395,6 +1395,7 @@ struct f2fs_sb_info {
> >>  	unsigned int gc_mode;			/* current GC state */
> >>  	unsigned int next_victim_seg[2];	/* next segment in victim section */
> >>  	/* for skip statistic */
> >> +	unsigned int atomic_files;		/* # of opened atomic file */
> >>  	unsigned long long skipped_atomic_files[2];	/* FG_GC and BG_GC */
> >>  	unsigned long long skipped_gc_rwsem;		/* FG_GC only */
> >>
> >> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> >> index ecd063239642..79f4b348951a 100644
> >> --- a/fs/f2fs/file.c
> >> +++ b/fs/f2fs/file.c
> >> @@ -2047,6 +2047,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
> >>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
> >>  	if (list_empty(&fi->inmem_ilist))
> >>  		list_add_tail(&fi->inmem_ilist, &sbi->inode_list[ATOMIC_FILE]);
> >> +	sbi->atomic_files++;
> >>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
> >>
> >>  	/* add inode in inmem_list first and set atomic_file */
> >> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >> index 8b977bbd6822..6aa0bb693697 100644
> >> --- a/fs/f2fs/segment.c
> >> +++ b/fs/f2fs/segment.c
> >> @@ -288,6 +288,8 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi,
> >> bool gc_failure)
> >>  	struct list_head *head = &sbi->inode_list[ATOMIC_FILE];
> >>  	struct inode *inode;
> >>  	struct f2fs_inode_info *fi;
> >> +	unsigned int count = sbi->atomic_files;
> > 
> > If the sbi->atomic_files decrements just after this, then the below exit condition
> > may not work. In that case, looped will never be >= count.
> > 
> >> +	unsigned int looped = 0;
> >>  next:
> >>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
> >>  	if (list_empty(head)) {
> >> @@ -296,22 +298,29 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi,
> >> bool gc_failure)
> >>  	}
> >>  	fi = list_first_entry(head, struct f2fs_inode_info, inmem_ilist);
> >>  	inode = igrab(&fi->vfs_inode);
> >> +	if (inode)
> >> +		list_move_tail(&fi->inmem_ilist, head);
> >>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
> >>
> >>  	if (inode) {
> >>  		if (gc_failure) {
> >> -			if (fi->i_gc_failures[GC_FAILURE_ATOMIC])
> >> -				goto drop;
> >> -			goto skip;
> >> +			if (!fi->i_gc_failures[GC_FAILURE_ATOMIC])
> >> +				goto skip;
> >>  		}
> >> -drop:
> >>  		set_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
> >>  		f2fs_drop_inmem_pages(inode);
> >> +skip:
> >>  		iput(inode);
> > 
> > Does this result into f2fs_evict_inode() in this context for this inode?
> 
> Yup, we need to call igrab/iput in pair in f2fs_drop_inmem_pages_all() anyway.
> 
> Previously, we may have .i_count leak...
> 
> Thanks,
> 
> > 
> > thanks,
> > 
> >>  	}
> >> -skip:
> >> +
> >>  	congestion_wait(BLK_RW_ASYNC, HZ/50);
> >>  	cond_resched();
> >> +
> >> +	if (gc_failure) {
> >> +		if (++looped >= count)
> >> +			return;
> >> +	}
> >> +
> >>  	goto next;
> >>  }
> >>
> >> @@ -334,6 +343,7 @@ void f2fs_drop_inmem_pages(struct inode *inode)
> >>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
> >>  	if (!list_empty(&fi->inmem_ilist))
> >>  		list_del_init(&fi->inmem_ilist);
> >> +	sbi->atomic_files--;
> >>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
> >>  }
> >>
> >> Thanks,
> > 

-- 
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

WARNING: multiple messages have this Message-ID (diff)

From: Sahitya Tummala <stummala@codeaurora.org>
To: Chao Yu <yuchao0@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] f2fs: Fix deadlock under storage almost full/dirty condition
Date: Mon, 11 Nov 2019 12:14:41 +0530	[thread overview]
Message-ID: <20191111064441.GC15669@codeaurora.org> (raw)
In-Reply-To: <9ece86fd-ff53-3a70-627e-c6acb03b9264@huawei.com>

Hi Chao,

On Mon, Nov 11, 2019 at 02:28:47PM +0800, Chao Yu wrote:
> Hi Sahitya,
> 
> On 2019/11/11 11:40, Sahitya Tummala wrote:
> > Hi Chao,
> > 
> > On Mon, Nov 11, 2019 at 10:51:10AM +0800, Chao Yu wrote:
> >> On 2019/11/8 19:03, Sahitya Tummala wrote:
> >>> There could be a potential deadlock when the storage capacity
> >>> is almost full and theren't enough free segments available, due
> >>> to which FG_GC is needed in the atomic commit ioctl as shown in
> >>> the below callstack -
> >>>
> >>> schedule_timeout
> >>> io_schedule_timeout
> >>> congestion_wait
> >>> f2fs_drop_inmem_pages_all
> >>> f2fs_gc
> >>> f2fs_balance_fs
> >>> __write_node_page
> >>> f2fs_fsync_node_pages
> >>> f2fs_do_sync_file
> >>> f2fs_ioctl
> >>>
> >>> If this inode doesn't have i_gc_failures[GC_FAILURE_ATOMIC] set,
> >>> then it waits forever in f2fs_drop_inmem_pages_all(), for this
> >>> atomic inode to be dropped. And the rest of the system is stuck
> >>> waiting for sbi->gc_mutex lock, which is acquired by f2fs_balance_fs()
> >>> in the stack above.
> >>
> >> I think the root cause of this issue is there is potential infinite loop in
> >> f2fs_drop_inmem_pages_all() for the case of gc_failure is true, because once the
> >> first inode in inode_list[ATOMIC_FILE] list didn't suffer gc failure, we will
> >> skip dropping its in-memory cache and calling iput(), and traverse the list
> >> again, most possibly there is the same inode in the head of that list.
> >>
> > 
> > I thought we are expecting for those atomic updates (without any gc failures) to be
> > committed by doing congestion_wait() and thus retrying again. Hence, I just
> 
> Nope, we only need to drop inode which encounter gc failures, and keep the rest
> inodes.
> 
> > fixed only if we are ending up waiting for commit to happen in the atomic
> > commit path itself, which will be a deadlock.
> 
> Look into call stack you provide, I don't think it's correct to drop such inode,
> as its dirty pages should be committed before f2fs_fsync_node_pages(), so
> calling f2fs_drop_inmem_pages won't release any inmem pages, and won't help
> looped GC caused by skipping due to inmem pages.
> 
> And then I figure out below fix...
> 

Thanks for the explanation.
The fix below looks good to me.

Thanks,
Sahitya.

> > 
> >> Could you please check below fix:
> >>
> >> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >> index 7bf7b0194944..8a3a35b42a37 100644
> >> --- a/fs/f2fs/f2fs.h
> >> +++ b/fs/f2fs/f2fs.h
> >> @@ -1395,6 +1395,7 @@ struct f2fs_sb_info {
> >>  	unsigned int gc_mode;			/* current GC state */
> >>  	unsigned int next_victim_seg[2];	/* next segment in victim section */
> >>  	/* for skip statistic */
> >> +	unsigned int atomic_files;		/* # of opened atomic file */
> >>  	unsigned long long skipped_atomic_files[2];	/* FG_GC and BG_GC */
> >>  	unsigned long long skipped_gc_rwsem;		/* FG_GC only */
> >>
> >> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> >> index ecd063239642..79f4b348951a 100644
> >> --- a/fs/f2fs/file.c
> >> +++ b/fs/f2fs/file.c
> >> @@ -2047,6 +2047,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
> >>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
> >>  	if (list_empty(&fi->inmem_ilist))
> >>  		list_add_tail(&fi->inmem_ilist, &sbi->inode_list[ATOMIC_FILE]);
> >> +	sbi->atomic_files++;
> >>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
> >>
> >>  	/* add inode in inmem_list first and set atomic_file */
> >> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >> index 8b977bbd6822..6aa0bb693697 100644
> >> --- a/fs/f2fs/segment.c
> >> +++ b/fs/f2fs/segment.c
> >> @@ -288,6 +288,8 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi,
> >> bool gc_failure)
> >>  	struct list_head *head = &sbi->inode_list[ATOMIC_FILE];
> >>  	struct inode *inode;
> >>  	struct f2fs_inode_info *fi;
> >> +	unsigned int count = sbi->atomic_files;
> > 
> > If the sbi->atomic_files decrements just after this, then the below exit condition
> > may not work. In that case, looped will never be >= count.
> > 
> >> +	unsigned int looped = 0;
> >>  next:
> >>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
> >>  	if (list_empty(head)) {
> >> @@ -296,22 +298,29 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi,
> >> bool gc_failure)
> >>  	}
> >>  	fi = list_first_entry(head, struct f2fs_inode_info, inmem_ilist);
> >>  	inode = igrab(&fi->vfs_inode);
> >> +	if (inode)
> >> +		list_move_tail(&fi->inmem_ilist, head);
> >>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
> >>
> >>  	if (inode) {
> >>  		if (gc_failure) {
> >> -			if (fi->i_gc_failures[GC_FAILURE_ATOMIC])
> >> -				goto drop;
> >> -			goto skip;
> >> +			if (!fi->i_gc_failures[GC_FAILURE_ATOMIC])
> >> +				goto skip;
> >>  		}
> >> -drop:
> >>  		set_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
> >>  		f2fs_drop_inmem_pages(inode);
> >> +skip:
> >>  		iput(inode);
> > 
> > Does this result into f2fs_evict_inode() in this context for this inode?
> 
> Yup, we need to call igrab/iput in pair in f2fs_drop_inmem_pages_all() anyway.
> 
> Previously, we may have .i_count leak...
> 
> Thanks,
> 
> > 
> > thanks,
> > 
> >>  	}
> >> -skip:
> >> +
> >>  	congestion_wait(BLK_RW_ASYNC, HZ/50);
> >>  	cond_resched();
> >> +
> >> +	if (gc_failure) {
> >> +		if (++looped >= count)
> >> +			return;
> >> +	}
> >> +
> >>  	goto next;
> >>  }
> >>
> >> @@ -334,6 +343,7 @@ void f2fs_drop_inmem_pages(struct inode *inode)
> >>  	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
> >>  	if (!list_empty(&fi->inmem_ilist))
> >>  		list_del_init(&fi->inmem_ilist);
> >> +	sbi->atomic_files--;
> >>  	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
> >>  }
> >>
> >> Thanks,
> > 

-- 
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

next prev parent reply	other threads:[~2019-11-11  6:45 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-08 11:03 [f2fs-dev] [PATCH] f2fs: Fix deadlock under storage almost full/dirty condition Sahitya Tummala
2019-11-08 11:03 ` Sahitya Tummala
2019-11-11  2:51 ` [f2fs-dev] " Chao Yu
2019-11-11  2:51   ` Chao Yu
2019-11-11  3:40   ` [f2fs-dev] " Sahitya Tummala
2019-11-11  3:40     ` Sahitya Tummala
2019-11-11  6:28     ` [f2fs-dev] " Chao Yu
2019-11-11  6:28       ` Chao Yu
2019-11-11  6:44       ` Sahitya Tummala [this message]
2019-11-11  6:44         ` Sahitya Tummala
2019-11-11  7:18         ` [f2fs-dev] " Chao Yu
2019-11-11  7:18           ` Chao Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191111064441.GC15669@codeaurora.org \
    --to=stummala@codeaurora.org \
    --cc=jaegeuk@kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=yuchao0@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.