All of lore.kernel.org
 help / color / mirror / Atom feed
From: Liu Bo <bo.li.liu@oracle.com>
To: Zhaolei <zhaolei@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 6/9] btrfs: Fix NO_SPACE bug caused by delayed-iput
Date: Tue, 7 Jul 2015 16:13:55 +0800	[thread overview]
Message-ID: <20150707081352.GA1967@localhost.localdomain> (raw)
In-Reply-To: <581e0ffa1ce1dd6345519e7953bf4e895bda35ae.1428554023.git.zhaolei@cn.fujitsu.com>

Hi,

On Thu, Apr 09, 2015 at 12:34:41PM +0800, Zhaolei wrote:
> From: Zhao Lei <zhaolei@cn.fujitsu.com>
> 
> Steps to reproduce:
>   while true; do
>     dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%]
>     rm /btrfs_dir/file
>     sync
>   done
> 
>   And we'll see dd failed because btrfs return NO_SPACE.
> 
> Reason:
>   Normally, btrfs_commit_transaction() call btrfs_run_delayed_iputs()
>   in end to free fs space for next write, but sometimes it hadn't
>   done work on time, because btrfs-cleaner thread get delayed-iputs
>   from list before, but do iput() after next write.
> 
>   This is log:
>   [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin
> 
>   [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs()
>   [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs()
>   [ 2569.087554] comm=sync func=btrfs_commit_transaction() end
> 
>   [ 2569.191081] comm=dd begin
>   [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28
> 
>   [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888
>   [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512
>   ...
>   [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496
>   [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end
> 
> Fix:
>   Make btrfs_commit_transaction() wait current running btrfs-cleaner's
>   delayed-iputs() done in end.
> 
> Test:
>   Use script similar to above(more complex),
>   before patch:
>     7 failed in 100 * 20 loop.
>   after patch:
>     0 failed in 100 * 20 loop.
> 
> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
> ---
>  fs/btrfs/ctree.h       | 1 +
>  fs/btrfs/disk-io.c     | 3 ++-
>  fs/btrfs/extent-tree.c | 6 ++++++
>  fs/btrfs/inode.c       | 4 ++++
>  4 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index f9c89ca..54d4d78 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1513,6 +1513,7 @@ struct btrfs_fs_info {
>  
>  	spinlock_t delayed_iput_lock;
>  	struct list_head delayed_iputs;
> +	struct rw_semaphore delayed_iput_sem;
>  
>  	/* this protects tree_mod_seq_list */
>  	spinlock_t tree_mod_seq_lock;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 639f266..6867471 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2241,11 +2241,12 @@ int open_ctree(struct super_block *sb,
>  	spin_lock_init(&fs_info->qgroup_op_lock);
>  	spin_lock_init(&fs_info->buffer_lock);
>  	spin_lock_init(&fs_info->unused_bgs_lock);
> -	mutex_init(&fs_info->unused_bg_unpin_mutex);
>  	rwlock_init(&fs_info->tree_mod_log_lock);
> +	mutex_init(&fs_info->unused_bg_unpin_mutex);
>  	mutex_init(&fs_info->reloc_mutex);
>  	mutex_init(&fs_info->delalloc_root_mutex);
>  	seqlock_init(&fs_info->profiles_lock);
> +	init_rwsem(&fs_info->delayed_iput_sem);
>  
>  	init_completion(&fs_info->kobj_unregister);
>  	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 203ac63..6fd7dca 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3732,6 +3732,12 @@ commit_trans:
>  				ret = btrfs_commit_transaction(trans, root);
>  				if (ret)
>  					return ret;
> +				/*
> +				 * make sure that all running delayed iput are
> +				 * done
> +				 */
> +				down_write(&root->fs_info->delayed_iput_sem);
> +				up_write(&root->fs_info->delayed_iput_sem);
>  				goto again;
>  			} else {
>  				btrfs_end_transaction(trans, root);
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index d2e732d..34d10be 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3110,6 +3110,8 @@ void btrfs_run_delayed_iputs(struct btrfs_root *root)
>  	if (empty)
>  		return;
>  
> +	down_read(&fs_info->delayed_iput_sem);
> +
>  	spin_lock(&fs_info->delayed_iput_lock);
>  	list_splice_init(&fs_info->delayed_iputs, &list);
>  	spin_unlock(&fs_info->delayed_iput_lock);
> @@ -3120,6 +3122,8 @@ void btrfs_run_delayed_iputs(struct btrfs_root *root)
>  		iput(delayed->inode);
>  		kfree(delayed);
>  	}
> +
> +	up_read(&root->fs_info->delayed_iput_sem);

By introducing this "delayed_iput_sem", fstests's generic/241 cries
about a possible deadlock, can we use a waitqueue to avoid this?


[ 2061.345955] =============================================
[ 2061.346027] [ INFO: possible recursive locking detected ]
[ 2061.346027] 4.1.0+ #268 Tainted: G        W      
[ 2061.346027] ---------------------------------------------
[ 2061.346027] btrfs-cleaner/3045 is trying to acquire lock:
[ 2061.346027]  (&fs_info->delayed_iput_sem){++++..}, at:
[<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100
[ 2061.346027] but task is already holding lock:
[ 2061.346027]  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100
[ 2061.346027] other info that might help us debug this:
[ 2061.346027]  Possible unsafe locking scenario:

[ 2061.346027]        CPU0
[ 2061.346027]        ----
[ 2061.346027]   lock(&fs_info->delayed_iput_sem);
[ 2061.346027]   lock(&fs_info->delayed_iput_sem);
[ 2061.346027] 
 *** DEADLOCK ***

[ 2061.346027]  May be due to missing lock nesting notation

[ 2061.346027] 2 locks held by btrfs-cleaner/3045:
[ 2061.346027]  #0:  (&fs_info->cleaner_mutex){+.+.+.}, at: [<ffffffff813f6cc4>] cleaner_kthread+0x64/0x1a0
[ 2061.346027]  #1:  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100
[ 2061.346027] stack backtrace:
[ 2061.346027] CPU: 0 PID: 3045 Comm: btrfs-cleaner Tainted: G        W 4.1.0+ #268
[ 2061.346027] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[ 2061.346027]  ffff88013760a700 ffff8800b4aff888 ffffffff818bb68e 0000000000000007
[ 2061.346027]  ffffffff828fa0a0 ffff8800b4aff968 ffffffff810db18d ffff8800b4aff978
[ 2061.346027]  ffffffff00000000 ffff88013760a750 ffffffff00000001 ffff88013760b470
[ 2061.346027] Call Trace:
[ 2061.346027]  [<ffffffff818bb68e>] dump_stack+0x4c/0x65
[ 2061.346027]  [<ffffffff810db18d>] __lock_acquire+0x214d/0x22f0
[ 2061.346027]  [<ffffffff810d5e3d>] ? trace_hardirqs_off+0xd/0x10
[ 2061.346027]  [<ffffffff810d649d>] ? __lock_is_held+0x4d/0x70
[ 2061.346027]  [<ffffffff810dc050>] lock_acquire+0xd0/0x280
[ 2061.346027]  [<ffffffff814063ab>] ?  btrfs_run_delayed_iputs+0x6b/0x100
[ 2061.346027]  [<ffffffff818c275c>] down_read+0x4c/0x70
[ 2061.346027]  [<ffffffff814063ab>] ?  btrfs_run_delayed_iputs+0x6b/0x100
[ 2061.346027]  [<ffffffff818c48cb>] ? _raw_spin_unlock+0x2b/0x40
[ 2061.346027]  [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100
[ 2061.346027]  [<ffffffff8140042f>] ?  btrfs_commit_transaction+0xa1f/0xc50
[ 2061.346027]  [<ffffffff81400454>] btrfs_commit_transaction+0xa44/0xc50
[ 2061.346027]  [<ffffffff810cdbd0>] ? wait_woken+0xa0/0xa0
[ 2061.346027]  [<ffffffff813e5247>] flush_space+0x97/0x4b0
[ 2061.346027]  [<ffffffff818c48cb>] ? _raw_spin_unlock+0x2b/0x40
[ 2061.346027]  [<ffffffff813e5a3b>] ?  reserve_metadata_bytes+0x20b/0x6d0
[ 2061.346027]  [<ffffffff813e5a52>] reserve_metadata_bytes+0x222/0x6d0
[ 2061.346027]  [<ffffffff813e6245>] ? btrfs_block_rsv_refill+0x65/0x90
[ 2061.346027]  [<ffffffff813e6257>] btrfs_block_rsv_refill+0x77/0x90
[ 2061.346027]  [<ffffffff8140c723>] btrfs_evict_inode+0x473/0x700
[ 2061.346027]  [<ffffffff8123dcbb>] evict+0xab/0x170
[ 2061.346027]  [<ffffffff8123e6fd>] iput+0x1cd/0x350
[ 2061.346027]  [<ffffffff81406409>] btrfs_run_delayed_iputs+0xc9/0x100
[ 2061.346027]  [<ffffffff813f6cc4>] ? cleaner_kthread+0x64/0x1a0
[ 2061.346027]  [<ffffffff813f6dba>] cleaner_kthread+0x15a/0x1a0
[ 2061.346027]  [<ffffffff813f6c60>] ? btree_invalidatepage+0x90/0x90
[ 2061.346027]  [<ffffffff810a3d5f>] kthread+0xef/0x110
[ 2061.346027]  [<ffffffff810a3c70>] ?  kthread_create_on_node+0x210/0x210
[ 2061.346027]  [<ffffffff818c550f>] ret_from_fork+0x3f/0x70
[ 2061.346027]  [<ffffffff810a3c70>] ?  kthread_create_on_node+0x210/0x210


Thanks,

-liubo
>  }
>  
>  /*
> -- 
> 1.8.5.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2015-07-07  8:14 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-09  4:34 [PATCH v2 0/9] btrfs: Fix no_space on dd and rm loop Zhaolei
2015-04-09  4:34 ` [PATCH 1/9] btrfs: fix condition of commit transaction Zhaolei
2015-04-09  4:34 ` [PATCH 2/9] btrfs: Fix tail space processing in find_free_dev_extent() Zhaolei
2015-04-09  4:34 ` [PATCH 3/9] btrfs: Adjust commit-transaction condition to avoid NO_SPACE more Zhaolei
2015-04-09  4:34 ` [PATCH 4/9] btrfs: Set relative data on clear btrfs_block_group_cache->pinned Zhaolei
2015-04-09  4:34 ` [PATCH 5/9] btrfs: add WARN_ON() to check is space_info op current Zhaolei
2015-04-09  4:34 ` [PATCH 6/9] btrfs: Fix NO_SPACE bug caused by delayed-iput Zhaolei
2015-07-07  8:13   ` Liu Bo [this message]
2015-07-07  9:16     ` Zhao Lei
2015-07-07  9:26       ` Liu Bo
2015-07-07  9:38         ` Zhao Lei
2015-04-09  4:34 ` [PATCH 7/9] btrfs: Support busy loop of write and delete Zhaolei
2015-04-09  4:34 ` [PATCH 8/9] btrfs: wait for delayed iputs on no space Zhaolei
2015-04-13 14:54   ` Chris Mason
2015-04-14  4:06     ` Zhao Lei
2015-04-09  4:34 ` [PATCH 9/9] btrfs: cleanup unused alloc_chunk varible Zhaolei
  -- strict thread matches above, loose matches on Subject: below --
2015-04-03 12:11 [PATCH 0/9] btrfs: Fix no_space on dd and rm loop Zhaolei
2015-04-03 12:12 ` [PATCH 6/9] btrfs: Fix NO_SPACE bug caused by delayed-iput Zhaolei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150707081352.GA1967@localhost.localdomain \
    --to=bo.li.liu@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=zhaolei@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.