From: Liu Bo <bo.li.liu@oracle.com>
To: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 6/9] btrfs: Fix NO_SPACE bug caused by delayed-iput
Date: Tue, 7 Jul 2015 17:26:42 +0800 [thread overview]
Message-ID: <20150707092640.GB1967@localhost.localdomain> (raw)
In-Reply-To: <017501d0b895$9b9b52b0$d2d1f810$@cn.fujitsu.com>
On Tue, Jul 07, 2015 at 05:16:29PM +0800, Zhao Lei wrote:
> Hi, Liubo
>
> > -----Original Message-----
> > From: Liu Bo [mailto:bo.li.liu@oracle.com]
> > Sent: Tuesday, July 07, 2015 4:14 PM
> > To: Zhaolei
> > Cc: linux-btrfs@vger.kernel.org
> > Subject: Re: [PATCH 6/9] btrfs: Fix NO_SPACE bug caused by delayed-iput
> >
> > Hi,
> >
> > On Thu, Apr 09, 2015 at 12:34:41PM +0800, Zhaolei wrote:
> > > From: Zhao Lei <zhaolei@cn.fujitsu.com>
> > >
> > > Steps to reproduce:
> > > while true; do
> > > dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%]
> > > rm /btrfs_dir/file
> > > sync
> > > done
> > >
> > > And we'll see dd failed because btrfs return NO_SPACE.
> > >
> > > Reason:
> > > Normally, btrfs_commit_transaction() call btrfs_run_delayed_iputs()
> > > in end to free fs space for next write, but sometimes it hadn't
> > > done work on time, because btrfs-cleaner thread get delayed-iputs
> > > from list before, but do iput() after next write.
> > >
> > > This is log:
> > > [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin
> > >
> > > [ 2569.084280] comm=sync func=btrfs_commit_transaction() call
> > btrfs_run_delayed_iputs()
> > > [ 2569.085418] comm=sync func=btrfs_commit_transaction() done
> > btrfs_run_delayed_iputs()
> > > [ 2569.087554] comm=sync func=btrfs_commit_transaction() end
> > >
> > > [ 2569.191081] comm=dd begin
> > > [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28
> > >
> > > [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 +
> > 32677888 = 32677888
> > > [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 +
> > 23834624 = 56512512
> > > ...
> > > [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448
> > + 21762048 = 965738496
> > > [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end
> > >
> > > Fix:
> > > Make btrfs_commit_transaction() wait current running btrfs-cleaner's
> > > delayed-iputs() done in end.
> > >
> > > Test:
> > > Use script similar to above(more complex),
> > > before patch:
> > > 7 failed in 100 * 20 loop.
> > > after patch:
> > > 0 failed in 100 * 20 loop.
> > >
> > > Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
> > > ---
> > > fs/btrfs/ctree.h | 1 +
> > > fs/btrfs/disk-io.c | 3 ++-
> > > fs/btrfs/extent-tree.c | 6 ++++++
> > > fs/btrfs/inode.c | 4 ++++
> > > 4 files changed, 13 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index
> > > f9c89ca..54d4d78 100644
> > > --- a/fs/btrfs/ctree.h
> > > +++ b/fs/btrfs/ctree.h
> > > @@ -1513,6 +1513,7 @@ struct btrfs_fs_info {
> > >
> > > spinlock_t delayed_iput_lock;
> > > struct list_head delayed_iputs;
> > > + struct rw_semaphore delayed_iput_sem;
> > >
> > > /* this protects tree_mod_seq_list */
> > > spinlock_t tree_mod_seq_lock;
> > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index
> > > 639f266..6867471 100644
> > > --- a/fs/btrfs/disk-io.c
> > > +++ b/fs/btrfs/disk-io.c
> > > @@ -2241,11 +2241,12 @@ int open_ctree(struct super_block *sb,
> > > spin_lock_init(&fs_info->qgroup_op_lock);
> > > spin_lock_init(&fs_info->buffer_lock);
> > > spin_lock_init(&fs_info->unused_bgs_lock);
> > > - mutex_init(&fs_info->unused_bg_unpin_mutex);
> > > rwlock_init(&fs_info->tree_mod_log_lock);
> > > + mutex_init(&fs_info->unused_bg_unpin_mutex);
> > > mutex_init(&fs_info->reloc_mutex);
> > > mutex_init(&fs_info->delalloc_root_mutex);
> > > seqlock_init(&fs_info->profiles_lock);
> > > + init_rwsem(&fs_info->delayed_iput_sem);
> > >
> > > init_completion(&fs_info->kobj_unregister);
> > > INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
> > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index
> > > 203ac63..6fd7dca 100644
> > > --- a/fs/btrfs/extent-tree.c
> > > +++ b/fs/btrfs/extent-tree.c
> > > @@ -3732,6 +3732,12 @@ commit_trans:
> > > ret = btrfs_commit_transaction(trans, root);
> > > if (ret)
> > > return ret;
> > > + /*
> > > + * make sure that all running delayed iput are
> > > + * done
> > > + */
> > > + down_write(&root->fs_info->delayed_iput_sem);
> > > + up_write(&root->fs_info->delayed_iput_sem);
> > > goto again;
> > > } else {
> > > btrfs_end_transaction(trans, root); diff --git
> > a/fs/btrfs/inode.c
> > > b/fs/btrfs/inode.c index d2e732d..34d10be 100644
> > > --- a/fs/btrfs/inode.c
> > > +++ b/fs/btrfs/inode.c
> > > @@ -3110,6 +3110,8 @@ void btrfs_run_delayed_iputs(struct btrfs_root
> > *root)
> > > if (empty)
> > > return;
> > >
> > > + down_read(&fs_info->delayed_iput_sem);
> > > +
> > > spin_lock(&fs_info->delayed_iput_lock);
> > > list_splice_init(&fs_info->delayed_iputs, &list);
> > > spin_unlock(&fs_info->delayed_iput_lock);
> > > @@ -3120,6 +3122,8 @@ void btrfs_run_delayed_iputs(struct btrfs_root
> > *root)
> > > iput(delayed->inode);
> > > kfree(delayed);
> > > }
> > > +
> > > + up_read(&root->fs_info->delayed_iput_sem);
> >
> > By introducing this "delayed_iput_sem", fstests's generic/241 cries about a
> > possible deadlock, can we use a waitqueue to avoid this?
> >
> >
> > [ 2061.345955] =============================================
> > [ 2061.346027] [ INFO: possible recursive locking detected ]
> > [ 2061.346027] 4.1.0+ #268 Tainted: G W
> > [ 2061.346027] ---------------------------------------------
> > [ 2061.346027] btrfs-cleaner/3045 is trying to acquire lock:
> > [ 2061.346027] (&fs_info->delayed_iput_sem){++++..}, at:
> > [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100
> > [ 2061.346027] but task is already holding lock:
> > [ 2061.346027] (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffff814063ab>]
> > btrfs_run_delayed_iputs+0x6b/0x100
> > [ 2061.346027] other info that might help us debug this:
> > [ 2061.346027] Possible unsafe locking scenario:
> >
> > [ 2061.346027] CPU0
> > [ 2061.346027] ----
> > [ 2061.346027] lock(&fs_info->delayed_iput_sem);
> > [ 2061.346027] lock(&fs_info->delayed_iput_sem);
> > [ 2061.346027]
> > *** DEADLOCK ***
> >
> > [ 2061.346027] May be due to missing lock nesting notation
> >
> > [ 2061.346027] 2 locks held by btrfs-cleaner/3045:
> > [ 2061.346027] #0: (&fs_info->cleaner_mutex){+.+.+.}, at:
> > [<ffffffff813f6cc4>] cleaner_kthread+0x64/0x1a0 [ 2061.346027] #1:
> > (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffff814063ab>]
> > btrfs_run_delayed_iputs+0x6b/0x100
> > [ 2061.346027] stack backtrace:
> > [ 2061.346027] CPU: 0 PID: 3045 Comm: btrfs-cleaner Tainted: G W
> > 4.1.0+ #268
> > [ 2061.346027] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > 1.7.5-20140709_153950- 04/01/2014 [ 2061.346027] ffff88013760a700
> > ffff8800b4aff888 ffffffff818bb68e 0000000000000007 [ 2061.346027]
> > ffffffff828fa0a0 ffff8800b4aff968 ffffffff810db18d ffff8800b4aff978
> > [ 2061.346027] ffffffff00000000 ffff88013760a750 ffffffff00000001
> > ffff88013760b470 [ 2061.346027] Call Trace:
> > [ 2061.346027] [<ffffffff818bb68e>] dump_stack+0x4c/0x65 [ 2061.346027]
> > [<ffffffff810db18d>] __lock_acquire+0x214d/0x22f0 [ 2061.346027]
> > [<ffffffff810d5e3d>] ? trace_hardirqs_off+0xd/0x10 [ 2061.346027]
> > [<ffffffff810d649d>] ? __lock_is_held+0x4d/0x70 [ 2061.346027]
> > [<ffffffff810dc050>] lock_acquire+0xd0/0x280 [ 2061.346027]
> > [<ffffffff814063ab>] ? btrfs_run_delayed_iputs+0x6b/0x100
> > [ 2061.346027] [<ffffffff818c275c>] down_read+0x4c/0x70 [ 2061.346027]
> > [<ffffffff814063ab>] ? btrfs_run_delayed_iputs+0x6b/0x100
> > [ 2061.346027] [<ffffffff818c48cb>] ? _raw_spin_unlock+0x2b/0x40
> > [ 2061.346027] [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100
> > [ 2061.346027] [<ffffffff8140042f>] ?
> > btrfs_commit_transaction+0xa1f/0xc50
> > [ 2061.346027] [<ffffffff81400454>] btrfs_commit_transaction+0xa44/0xc50
> > [ 2061.346027] [<ffffffff810cdbd0>] ? wait_woken+0xa0/0xa0 [ 2061.346027]
> > [<ffffffff813e5247>] flush_space+0x97/0x4b0 [ 2061.346027]
> > [<ffffffff818c48cb>] ? _raw_spin_unlock+0x2b/0x40 [ 2061.346027]
> > [<ffffffff813e5a3b>] ? reserve_metadata_bytes+0x20b/0x6d0
> > [ 2061.346027] [<ffffffff813e5a52>] reserve_metadata_bytes+0x222/0x6d0
> > [ 2061.346027] [<ffffffff813e6245>] ? btrfs_block_rsv_refill+0x65/0x90
> > [ 2061.346027] [<ffffffff813e6257>] btrfs_block_rsv_refill+0x77/0x90
> > [ 2061.346027] [<ffffffff8140c723>] btrfs_evict_inode+0x473/0x700
> > [ 2061.346027] [<ffffffff8123dcbb>] evict+0xab/0x170 [ 2061.346027]
> > [<ffffffff8123e6fd>] iput+0x1cd/0x350 [ 2061.346027] [<ffffffff81406409>]
> > btrfs_run_delayed_iputs+0xc9/0x100
> > [ 2061.346027] [<ffffffff813f6cc4>] ? cleaner_kthread+0x64/0x1a0
> > [ 2061.346027] [<ffffffff813f6dba>] cleaner_kthread+0x15a/0x1a0
> > [ 2061.346027] [<ffffffff813f6c60>] ? btree_invalidatepage+0x90/0x90
> > [ 2061.346027] [<ffffffff810a3d5f>] kthread+0xef/0x110 [ 2061.346027]
> > [<ffffffff810a3c70>] ? kthread_create_on_node+0x210/0x210
> > [ 2061.346027] [<ffffffff818c550f>] ret_from_fork+0x3f/0x70 [ 2061.346027]
> > [<ffffffff810a3c70>] ? kthread_create_on_node+0x210/0x210
> >
> Thanks for report above problem.
> Seems it is caused by recursion of btrfs_run_delayed_iputs:
>
> cleaner_kthread
> -> btrfs_run_delayed_iputs() *1
> -> get delayed_iput_sem lock *2
> -> iput()
> -> ...
> -> btrfs_commit_transaction()
> -> btrfs_run_delayed_iputs() *1
> -> get delayed_iput_sem lock (dead lock) *2
>
> *1: recursion of btrfs_run_delayed_iputs()
> *2: dead lock of delayed_iput_sem
Yep, that's what stack trace shows, but I guess it hardly runs to
deadlock, since we've checked if the list is empty before acquiring
the semephore.
>
> I'll reproduce this problem in my env, and comfirm fix of
> above problem.
It didn't occur every time, but quite often on my box.
Thanks,
-liubo
>
> Thanks
> Zhaolei
>
>
> >
> > Thanks,
> >
> > -liubo
> > > }
> > >
> > > /*
> > > --
> > > 1.8.5.1
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
> > > in the body of a message to majordomo@vger.kernel.org More majordomo
> > > info at http://vger.kernel.org/majordomo-info.html
>
next prev parent reply other threads:[~2015-07-07 9:27 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-09 4:34 [PATCH v2 0/9] btrfs: Fix no_space on dd and rm loop Zhaolei
2015-04-09 4:34 ` [PATCH 1/9] btrfs: fix condition of commit transaction Zhaolei
2015-04-09 4:34 ` [PATCH 2/9] btrfs: Fix tail space processing in find_free_dev_extent() Zhaolei
2015-04-09 4:34 ` [PATCH 3/9] btrfs: Adjust commit-transaction condition to avoid NO_SPACE more Zhaolei
2015-04-09 4:34 ` [PATCH 4/9] btrfs: Set relative data on clear btrfs_block_group_cache->pinned Zhaolei
2015-04-09 4:34 ` [PATCH 5/9] btrfs: add WARN_ON() to check is space_info op current Zhaolei
2015-04-09 4:34 ` [PATCH 6/9] btrfs: Fix NO_SPACE bug caused by delayed-iput Zhaolei
2015-07-07 8:13 ` Liu Bo
2015-07-07 9:16 ` Zhao Lei
2015-07-07 9:26 ` Liu Bo [this message]
2015-07-07 9:38 ` Zhao Lei
2015-04-09 4:34 ` [PATCH 7/9] btrfs: Support busy loop of write and delete Zhaolei
2015-04-09 4:34 ` [PATCH 8/9] btrfs: wait for delayed iputs on no space Zhaolei
2015-04-13 14:54 ` Chris Mason
2015-04-14 4:06 ` Zhao Lei
2015-04-09 4:34 ` [PATCH 9/9] btrfs: cleanup unused alloc_chunk varible Zhaolei
-- strict thread matches above, loose matches on Subject: below --
2015-04-03 12:11 [PATCH 0/9] btrfs: Fix no_space on dd and rm loop Zhaolei
2015-04-03 12:12 ` [PATCH 6/9] btrfs: Fix NO_SPACE bug caused by delayed-iput Zhaolei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150707092640.GB1967@localhost.localdomain \
--to=bo.li.liu@oracle.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=zhaolei@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).