From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from cn.fujitsu.com ([59.151.112.132]:43937 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1752904AbcJYHM3 (ORCPT ); Tue, 25 Oct 2016 03:12:29 -0400 Subject: Re: [PATCH] btrfs: imporve delayed refs iterations To: References: <20161021090507.28425-1-wangxg.fnst@cn.fujitsu.com> <20161024190003.GA16018@localhost.localdomain> CC: From: Wang Xiaoguang Message-ID: <580F0428.7000002@cn.fujitsu.com> Date: Tue, 25 Oct 2016 15:05:12 +0800 MIME-Version: 1.0 In-Reply-To: <20161024190003.GA16018@localhost.localdomain> Content-Type: text/plain; charset="windows-1252"; format=flowed Sender: linux-btrfs-owner@vger.kernel.org List-ID: hi, On 10/25/2016 03:00 AM, Liu Bo wrote: > On Fri, Oct 21, 2016 at 05:05:07PM +0800, Wang Xiaoguang wrote: >> This issue was found when I tried to delete a heavily reflinked file, >> when deleting such files, other transaction operation will not have a >> chance to make progress, for example, start_transaction() will blocked >> in wait_current_trans(root) for long time, sometimes it even triggers >> soft lockups, and the time taken to delete such heavily reflinked file >> is also very large, often hundreds of seconds. Using perf top, it reports >> that: >> >> PerfTop: 7416 irqs/sec kernel:99.8% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs) >> --------------------------------------------------------------------------------------- >> 84.37% [btrfs] [k] __btrfs_run_delayed_refs.constprop.80 >> 11.02% [kernel] [k] delay_tsc >> 0.79% [kernel] [k] _raw_spin_unlock_irq >> 0.78% [kernel] [k] _raw_spin_unlock_irqrestore >> 0.45% [kernel] [k] do_raw_spin_lock >> 0.18% [kernel] [k] __slab_alloc >> It seems __btrfs_run_delayed_refs() took most cpu time, after some debug >> work, I found it's select_delayed_ref() causing this issue, for a delayed >> head, in our case, it'll be full of BTRFS_DROP_DELAYED_REF nodes, but >> select_delayed_ref() will firstly try to iterate node list to find >> BTRFS_ADD_DELAYED_REF nodes, obviously it's a disaster in this case, and >> waste much time. >> >> To fix this issue, we introduce a new ref_add_list in struct btrfs_delayed_ref_head, >> then in select_delayed_ref(), if this list is not empty, we can directly use >> nodes in this list. With this patch, it just took about 10~15 seconds to >> delte the same file. Now using perf top, it reports that: >> >> PerfTop: 2734 irqs/sec kernel:99.5% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs) >> ---------------------------------------------------------------------------------------- >> >> 20.74% [kernel] [k] _raw_spin_unlock_irqrestore >> 16.33% [kernel] [k] __slab_alloc >> 5.41% [kernel] [k] lock_acquired >> 4.42% [kernel] [k] lock_acquire >> 4.05% [kernel] [k] lock_release >> 3.37% [kernel] [k] _raw_spin_unlock_irq >> >> For normal files, this patch also gives help, at least we do not need to >> iterate whole list to found BTRFS_ADD_DELAYED_REF nodes. >> >> Signed-off-by: Wang Xiaoguang >> --- >> fs/btrfs/delayed-ref.c | 14 ++++++++++++++ >> fs/btrfs/delayed-ref.h | 8 ++++++++ >> fs/btrfs/disk-io.c | 2 ++ >> fs/btrfs/extent-tree.c | 15 +++++++++------ >> 4 files changed, 33 insertions(+), 6 deletions(-) >> >> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c >> index 8d93854..39c28e0 100644 >> --- a/fs/btrfs/delayed-ref.c >> +++ b/fs/btrfs/delayed-ref.c >> @@ -189,6 +189,8 @@ static inline void drop_delayed_ref(struct btrfs_trans_handle *trans, >> } else { >> assert_spin_locked(&head->lock); >> list_del(&ref->list); >> + if (!list_empty(&ref->add_list)) >> + list_del(&ref->add_list); >> } >> ref->in_tree = 0; >> btrfs_put_delayed_ref(ref); >> @@ -431,6 +433,11 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans, >> exist->action = ref->action; >> mod = -exist->ref_mod; >> exist->ref_mod = ref->ref_mod; >> + if (ref->action == BTRFS_ADD_DELAYED_REF) >> + list_add_tail(&exist->add_list, >> + &href->ref_add_list); >> + else if (!list_empty(&exist->add_list)) >> + list_del(&exist->add_list); > ->action is either BTRFS_ADD_DELAYED_REF or BTRFS_DROP_DELAYED_REF, so > in 'else' section, (!list_empty(&exist->add_list)) is true indeed. Oh, you're right, I'll remove this "if" statement, thanks. > >> } else >> mod = -ref->ref_mod; >> } >> @@ -444,6 +451,8 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans, >> >> add_tail: >> list_add_tail(&ref->list, &href->ref_list); >> + if (ref->action == BTRFS_ADD_DELAYED_REF) >> + list_add_tail(&ref->add_list, &href->ref_add_list); >> atomic_inc(&root->num_entries); >> trans->delayed_ref_updates++; >> spin_unlock(&href->lock); >> @@ -590,6 +599,7 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, >> head_ref->must_insert_reserved = must_insert_reserved; >> head_ref->is_data = is_data; >> INIT_LIST_HEAD(&head_ref->ref_list); >> + INIT_LIST_HEAD(&head_ref->ref_add_list); >> head_ref->processing = 0; >> head_ref->total_ref_mod = count_mod; >> head_ref->qgroup_reserved = 0; >> @@ -671,6 +681,8 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info, >> ref->is_head = 0; >> ref->in_tree = 1; >> ref->seq = seq; >> + INIT_LIST_HEAD(&ref->list); >> + INIT_LIST_HEAD(&ref->add_list); >> >> full_ref = btrfs_delayed_node_to_tree_ref(ref); >> full_ref->parent = parent; >> @@ -726,6 +738,8 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info, >> ref->is_head = 0; >> ref->in_tree = 1; >> ref->seq = seq; >> + INIT_LIST_HEAD(&ref->list); >> + INIT_LIST_HEAD(&ref->add_list); >> >> full_ref = btrfs_delayed_node_to_data_ref(ref); >> full_ref->parent = parent; >> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h >> index 43f3629..dba9784 100644 >> --- a/fs/btrfs/delayed-ref.h >> +++ b/fs/btrfs/delayed-ref.h >> @@ -42,6 +42,12 @@ struct btrfs_delayed_ref_node { >> >> /*data/tree ref use list, stored in ref_head->ref_list. */ >> struct list_head list; >> + /* >> + * If action is BTRFS_ADD_DELAYED_REF, also link this node to >> + * ref_head->ref_add_list, then we do not need to iterate the >> + * whole ref_head->ref_list to find BTRFS_ADD_DELAYED_REF nodes. >> + */ >> + struct list_head add_list; >> >> /* the starting bytenr of the extent */ >> u64 bytenr; >> @@ -99,6 +105,8 @@ struct btrfs_delayed_ref_head { >> >> spinlock_t lock; >> struct list_head ref_list; >> + /* accumulate add BTRFS_ADD_DELAYED_REF nodes to this ref_add_list. */ >> + struct list_head ref_add_list; >> >> struct rb_node href_node; >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index 3a57f99..bc2edaf 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -4354,6 +4354,8 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, >> list) { >> ref->in_tree = 0; >> list_del(&ref->list); >> + if (!list_empty(&ref->add_list)) >> + list_del(&ref->add_list); >> atomic_dec(&delayed_refs->num_entries); >> btrfs_put_delayed_ref(ref); >> } >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >> index 210c94a..1284222 100644 >> --- a/fs/btrfs/extent-tree.c >> +++ b/fs/btrfs/extent-tree.c >> @@ -2454,13 +2454,14 @@ select_delayed_ref(struct btrfs_delayed_ref_head *head) >> * the extent item from the extent tree, when there still are references >> * to add, which would fail because they would not find the extent item. >> */ >> - list_for_each_entry(ref, &head->ref_list, list) { >> - if (ref->action == BTRFS_ADD_DELAYED_REF) >> - return ref; >> - } >> + if (!list_empty(&head->ref_add_list)) >> + return list_entry(head->ref_add_list.next, >> + struct btrfs_delayed_ref_node, add_list); >> >> - return list_entry(head->ref_list.next, struct btrfs_delayed_ref_node, >> - list); >> + ref = list_entry(head->ref_list.next, struct btrfs_delayed_ref_node, >> + list); >> + WARN_ON(!list_empty(&ref->add_list)); > I'd prefer ASSERT for only developers troubleshooting. Agree, I'll update it in v2, thanks. Regards, Xiaoguang Wang > > Others look good to me. > > Reviewed-by: Liu Bo > > I had a patch[1] while working on dedupe back then, it was trying to > resolve the same problem, somehow it didn't make it to this retry of dedupe. > > [1]: https://patchwork.kernel.org/patch/3959021/ > > > Thanks, > > -liubo >> + return ref; >> } >> >> /* >> @@ -2620,6 +2621,8 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, >> actual_count++; >> ref->in_tree = 0; >> list_del(&ref->list); >> + if (!list_empty(&ref->add_list)) >> + list_del(&ref->add_list); >> } >> atomic_dec(&delayed_refs->num_entries); >> >> -- >> 2.9.0 >> >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >