* [PATCH 0/3] btrfs: Delay block group auto removal to avoid interfering qgroups
@ 2018-05-28  9:20 Qu Wenruo
  2018-05-28  9:20 ` [PATCH 1/3] btrfs: trace: Add trace points for unused block groups Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Qu Wenruo @ 2018-05-28  9:20 UTC (permalink / raw)
  To: linux-btrfs
The patchset can be fetched from github:
https://github.com/adam900710/linux/tree/delayed_bg_removal
It's based on v4.17-rc5 branch.
This bug is reported from SUSE openQA, although it's pretty hard to hit
in real world (even real world VM), it's believed block group auto
removal (anyway, there isn't much thing can remove a chunk mapping in
btrfs) could interfere with qgroup's search on commit root.
Full details can be found in the 3rd patch.
The patchset uses 2 submitted cleanup/refactor patches as basis, and the
3rd patch will ensure unused block group will only be deleted after
current transaction is done.
Qu Wenruo (3):
  btrfs: trace: Add trace points for unused block groups
  btrfs: Use btrfs_mark_bg_unused() to replace open code
  btrfs: Delayed empty block group auto removal to next transaction
 fs/btrfs/ctree.h             |  2 ++
 fs/btrfs/extent-tree.c       | 62 ++++++++++++++++++++++++++----------
 fs/btrfs/scrub.c             |  8 +----
 fs/btrfs/transaction.c       |  7 ++++
 fs/btrfs/transaction.h       | 10 ++++++
 include/trace/events/btrfs.h | 42 ++++++++++++++++++++++++
 6 files changed, 107 insertions(+), 24 deletions(-)
-- 
2.17.0
^ permalink raw reply	[flat|nested] 9+ messages in thread* [PATCH 1/3] btrfs: trace: Add trace points for unused block groups 2018-05-28 9:20 [PATCH 0/3] btrfs: Delay block group auto removal to avoid interfering qgroups Qu Wenruo @ 2018-05-28 9:20 ` Qu Wenruo 2018-05-28 9:20 ` [PATCH 2/3] btrfs: Use btrfs_mark_bg_unused() to replace open code Qu Wenruo ` (2 subsequent siblings) 3 siblings, 0 replies; 9+ messages in thread From: Qu Wenruo @ 2018-05-28 9:20 UTC (permalink / raw) To: linux-btrfs This patch will add the following trace events: 1) btrfs_remove_block_group For btrfs_remove_block_group() function. Triggered when a block group is really removed. 2) btrfs_add_unused_block_group Triggered which block group is added to unused_bgs list. 3) btrfs_skip_unused_block_group Triggered which unused block group is not deleted. These trace events is pretty handy to debug case related to block group auto remove. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/extent-tree.c | 4 ++++ fs/btrfs/scrub.c | 1 + include/trace/events/btrfs.h | 42 ++++++++++++++++++++++++++++++++++++ 3 files changed, 47 insertions(+) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 51b5e2da708c..8241de81089a 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6354,6 +6354,7 @@ static int update_block_group(struct btrfs_trans_handle *trans, spin_lock(&info->unused_bgs_lock); if (list_empty(&cache->bg_list)) { btrfs_get_block_group(cache); + trace_btrfs_add_unused_block_group(cache); list_add_tail(&cache->bg_list, &info->unused_bgs); } @@ -10204,6 +10205,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) /* Should always be true but just in case. */ if (list_empty(&cache->bg_list)) { btrfs_get_block_group(cache); + trace_btrfs_add_unused_block_group(cache); list_add_tail(&cache->bg_list, &info->unused_bgs); } @@ -10391,6 +10393,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, BUG_ON(!block_group); BUG_ON(!block_group->ro); + trace_btrfs_remove_block_group(block_group); /* * Free the reserved super bytes from this block group before * remove it. @@ -10755,6 +10758,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) * the ro check in case balance is currently acting on * this block group. */ + trace_btrfs_skip_unused_block_group(block_group); spin_unlock(&block_group->lock); up_write(&space_info->groups_sem); goto next; diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 52b39a0924e9..a59005862010 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -3984,6 +3984,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, spin_lock(&fs_info->unused_bgs_lock); if (list_empty(&cache->bg_list)) { btrfs_get_block_group(cache); + trace_btrfs_add_unused_block_group(cache); list_add_tail(&cache->bg_list, &fs_info->unused_bgs); } diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index 965c650a5273..c226bc9a61db 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -1810,6 +1810,48 @@ TRACE_EVENT(btrfs_inode_mod_outstanding_extents, show_root_type(__entry->root_objectid), (unsigned long long)__entry->ino, __entry->mod) ); + +DECLARE_EVENT_CLASS(btrfs__block_group, + TP_PROTO(const struct btrfs_block_group_cache *bg_cache), + + TP_ARGS(bg_cache), + + TP_STRUCT__entry_btrfs( + __field( u64, bytenr ) + __field( u64, flags ) + __field( u64, len ) + __field( u64, used ) + ), + + TP_fast_assign_btrfs(bg_cache->fs_info, + __entry->bytenr = bg_cache->key.objectid, + __entry->len = bg_cache->key.offset, + __entry->flags = bg_cache->flags; + __entry->used = btrfs_block_group_used(&bg_cache->item); + ), + + TP_printk_btrfs("bg bytenr=%llu len=%llu used=%llu flags=%llu(%s)", + __entry->bytenr, __entry->len, __entry->used, __entry->flags, + __print_flags(__entry->flags, "|", BTRFS_GROUP_FLAGS)) +); + +DEFINE_EVENT(btrfs__block_group, btrfs_remove_block_group, + TP_PROTO(const struct btrfs_block_group_cache *bg_cache), + + TP_ARGS(bg_cache) +); + +DEFINE_EVENT(btrfs__block_group, btrfs_add_unused_block_group, + TP_PROTO(const struct btrfs_block_group_cache *bg_cache), + + TP_ARGS(bg_cache) +); + +DEFINE_EVENT(btrfs__block_group, btrfs_skip_unused_block_group, + TP_PROTO(const struct btrfs_block_group_cache *bg_cache), + + TP_ARGS(bg_cache) +); #endif /* _TRACE_BTRFS_H */ /* This part must be outside protection */ -- 2.17.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/3] btrfs: Use btrfs_mark_bg_unused() to replace open code 2018-05-28 9:20 [PATCH 0/3] btrfs: Delay block group auto removal to avoid interfering qgroups Qu Wenruo 2018-05-28 9:20 ` [PATCH 1/3] btrfs: trace: Add trace points for unused block groups Qu Wenruo @ 2018-05-28 9:20 ` Qu Wenruo 2018-05-28 21:45 ` Nikolay Borisov 2018-05-28 9:20 ` [PATCH 3/3] btrfs: Delayed empty block group auto removal to next transaction Qu Wenruo 2018-05-29 16:17 ` [PATCH 0/3] btrfs: Delay block group auto removal to avoid interfering qgroups David Sterba 3 siblings, 1 reply; 9+ messages in thread From: Qu Wenruo @ 2018-05-28 9:20 UTC (permalink / raw) To: linux-btrfs Introduce a small helper, btrfs_mark_bg_unused(), to accquire needed locks and add a block group to unused_bgs list. No functional modification, and only 3 callers are involved. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/ctree.h | 1 + fs/btrfs/extent-tree.c | 36 +++++++++++++++++------------------- fs/btrfs/scrub.c | 9 +-------- 3 files changed, 19 insertions(+), 27 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 2771cc56a622..b19b673485fd 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2829,6 +2829,7 @@ void check_system_chunk(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, const u64 type); u64 add_new_free_space(struct btrfs_block_group_cache *block_group, struct btrfs_fs_info *info, u64 start, u64 end); +void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg); /* ctree.c */ int btrfs_bin_search(struct extent_buffer *eb, const struct btrfs_key *key, diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 8241de81089a..f0d7e19feca7 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6350,16 +6350,8 @@ static int update_block_group(struct btrfs_trans_handle *trans, * dirty list to avoid races between cleaner kthread and space * cache writeout. */ - if (!alloc && old_val == 0) { - spin_lock(&info->unused_bgs_lock); - if (list_empty(&cache->bg_list)) { - btrfs_get_block_group(cache); - trace_btrfs_add_unused_block_group(cache); - list_add_tail(&cache->bg_list, - &info->unused_bgs); - } - spin_unlock(&info->unused_bgs_lock); - } + if (!alloc && old_val == 0) + btrfs_mark_bg_unused(cache); btrfs_put_block_group(cache); total -= num_bytes; @@ -10201,15 +10193,8 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) if (btrfs_chunk_readonly(info, cache->key.objectid)) { inc_block_group_ro(cache, 1); } else if (btrfs_block_group_used(&cache->item) == 0) { - spin_lock(&info->unused_bgs_lock); - /* Should always be true but just in case. */ - if (list_empty(&cache->bg_list)) { - btrfs_get_block_group(cache); - trace_btrfs_add_unused_block_group(cache); - list_add_tail(&cache->bg_list, - &info->unused_bgs); - } - spin_unlock(&info->unused_bgs_lock); + ASSERT(list_empty(&cache->bg_list)); + btrfs_mark_bg_unused(cache); } } @@ -11133,3 +11118,16 @@ void btrfs_wait_for_snapshot_creation(struct btrfs_root *root) !atomic_read(&root->will_be_snapshotted)); } } + +void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg) +{ + struct btrfs_fs_info *fs_info = bg->fs_info; + + spin_lock(&fs_info->unused_bgs_lock); + if (list_empty(&bg->bg_list)) { + btrfs_get_block_group(bg); + trace_btrfs_add_unused_block_group(bg); + list_add_tail(&bg->bg_list, &fs_info->unused_bgs); + } + spin_unlock(&fs_info->unused_bgs_lock); +} diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index a59005862010..40086b47a65f 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -3981,14 +3981,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, if (!cache->removed && !cache->ro && cache->reserved == 0 && btrfs_block_group_used(&cache->item) == 0) { spin_unlock(&cache->lock); - spin_lock(&fs_info->unused_bgs_lock); - if (list_empty(&cache->bg_list)) { - btrfs_get_block_group(cache); - trace_btrfs_add_unused_block_group(cache); - list_add_tail(&cache->bg_list, - &fs_info->unused_bgs); - } - spin_unlock(&fs_info->unused_bgs_lock); + btrfs_mark_bg_unused(cache); } else { spin_unlock(&cache->lock); } -- 2.17.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/3] btrfs: Use btrfs_mark_bg_unused() to replace open code 2018-05-28 9:20 ` [PATCH 2/3] btrfs: Use btrfs_mark_bg_unused() to replace open code Qu Wenruo @ 2018-05-28 21:45 ` Nikolay Borisov 0 siblings, 0 replies; 9+ messages in thread From: Nikolay Borisov @ 2018-05-28 21:45 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs On 28.05.2018 12:20, Qu Wenruo wrote: > Introduce a small helper, btrfs_mark_bg_unused(), to accquire needed > locks and add a block group to unused_bgs list. > > No functional modification, and only 3 callers are involved. > > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> > --- > fs/btrfs/ctree.h | 1 + > fs/btrfs/extent-tree.c | 36 +++++++++++++++++------------------- > fs/btrfs/scrub.c | 9 +-------- > 3 files changed, 19 insertions(+), 27 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 2771cc56a622..b19b673485fd 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -2829,6 +2829,7 @@ void check_system_chunk(struct btrfs_trans_handle *trans, > struct btrfs_fs_info *fs_info, const u64 type); > u64 add_new_free_space(struct btrfs_block_group_cache *block_group, > struct btrfs_fs_info *info, u64 start, u64 end); > +void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg); > > /* ctree.c */ > int btrfs_bin_search(struct extent_buffer *eb, const struct btrfs_key *key, > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 8241de81089a..f0d7e19feca7 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -6350,16 +6350,8 @@ static int update_block_group(struct btrfs_trans_handle *trans, > * dirty list to avoid races between cleaner kthread and space > * cache writeout. > */ > - if (!alloc && old_val == 0) { > - spin_lock(&info->unused_bgs_lock); > - if (list_empty(&cache->bg_list)) { > - btrfs_get_block_group(cache); > - trace_btrfs_add_unused_block_group(cache); > - list_add_tail(&cache->bg_list, > - &info->unused_bgs); > - } > - spin_unlock(&info->unused_bgs_lock); > - } > + if (!alloc && old_val == 0) > + btrfs_mark_bg_unused(cache); > > btrfs_put_block_group(cache); > total -= num_bytes; > @@ -10201,15 +10193,8 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) > if (btrfs_chunk_readonly(info, cache->key.objectid)) { > inc_block_group_ro(cache, 1); > } else if (btrfs_block_group_used(&cache->item) == 0) { > - spin_lock(&info->unused_bgs_lock); > - /* Should always be true but just in case. */ > - if (list_empty(&cache->bg_list)) { > - btrfs_get_block_group(cache); > - trace_btrfs_add_unused_block_group(cache); > - list_add_tail(&cache->bg_list, > - &info->unused_bgs); > - } > - spin_unlock(&info->unused_bgs_lock); > + ASSERT(list_empty(&cache->bg_list)); > + btrfs_mark_bg_unused(cache); > } > } > > @@ -11133,3 +11118,16 @@ void btrfs_wait_for_snapshot_creation(struct btrfs_root *root) > !atomic_read(&root->will_be_snapshotted)); > } > } > + > +void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg) > +{ > + struct btrfs_fs_info *fs_info = bg->fs_info; > + > + spin_lock(&fs_info->unused_bgs_lock); > + if (list_empty(&bg->bg_list)) { > + btrfs_get_block_group(bg); > + trace_btrfs_add_unused_block_group(bg); > + list_add_tail(&bg->bg_list, &fs_info->unused_bgs); > + } > + spin_unlock(&fs_info->unused_bgs_lock); > +} > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c > index a59005862010..40086b47a65f 100644 > --- a/fs/btrfs/scrub.c > +++ b/fs/btrfs/scrub.c > @@ -3981,14 +3981,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, > if (!cache->removed && !cache->ro && cache->reserved == 0 && > btrfs_block_group_used(&cache->item) == 0) { > spin_unlock(&cache->lock); > - spin_lock(&fs_info->unused_bgs_lock); > - if (list_empty(&cache->bg_list)) { > - btrfs_get_block_group(cache); > - trace_btrfs_add_unused_block_group(cache); > - list_add_tail(&cache->bg_list, > - &fs_info->unused_bgs); > - } > - spin_unlock(&fs_info->unused_bgs_lock); > + btrfs_mark_bg_unused(cache); > } else { > spin_unlock(&cache->lock); > } > ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/3] btrfs: Delayed empty block group auto removal to next transaction 2018-05-28 9:20 [PATCH 0/3] btrfs: Delay block group auto removal to avoid interfering qgroups Qu Wenruo 2018-05-28 9:20 ` [PATCH 1/3] btrfs: trace: Add trace points for unused block groups Qu Wenruo 2018-05-28 9:20 ` [PATCH 2/3] btrfs: Use btrfs_mark_bg_unused() to replace open code Qu Wenruo @ 2018-05-28 9:20 ` Qu Wenruo 2018-05-29 6:20 ` Nikolay Borisov 2018-05-29 16:17 ` [PATCH 0/3] btrfs: Delay block group auto removal to avoid interfering qgroups David Sterba 3 siblings, 1 reply; 9+ messages in thread From: Qu Wenruo @ 2018-05-28 9:20 UTC (permalink / raw) To: linux-btrfs Under certain KVM load and LTP tests, we are possible to hit the following calltrace if quota is enabled: ------ BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 blk_status_to_errno+0x1a/0x30 CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 (unreleased) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] task: ffff9f827b340bc0 task.stack: ffffb4f8c0304000 RIP: 0010:blk_status_to_errno+0x1a/0x30 Call Trace: submit_extent_page+0x191/0x270 [btrfs] ? btrfs_create_repair_bio+0x130/0x130 [btrfs] __do_readpage+0x2d2/0x810 [btrfs] ? btrfs_create_repair_bio+0x130/0x130 [btrfs] ? run_one_async_done+0xc0/0xc0 [btrfs] __extent_read_full_page+0xe7/0x100 [btrfs] ? run_one_async_done+0xc0/0xc0 [btrfs] read_extent_buffer_pages+0x1ab/0x2d0 [btrfs] ? run_one_async_done+0xc0/0xc0 [btrfs] btree_read_extent_buffer_pages+0x94/0xf0 [btrfs] read_tree_block+0x31/0x60 [btrfs] read_block_for_search.isra.35+0xf0/0x2e0 [btrfs] btrfs_search_slot+0x46b/0xa00 [btrfs] ? kmem_cache_alloc+0x1a8/0x510 ? btrfs_get_token_32+0x5b/0x120 [btrfs] find_parent_nodes+0x11d/0xeb0 [btrfs] ? leaf_space_used+0xb8/0xd0 [btrfs] ? btrfs_leaf_free_space+0x49/0x90 [btrfs] ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs] btrfs_find_all_roots_safe+0x93/0x100 [btrfs] btrfs_find_all_roots+0x45/0x60 [btrfs] btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs] btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs] btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs] insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs] btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs] ? pick_next_task_fair+0x2cd/0x530 ? __switch_to+0x92/0x4b0 btrfs_worker_helper+0x81/0x300 [btrfs] process_one_work+0x1da/0x3f0 worker_thread+0x2b/0x3f0 ? process_one_work+0x3f0/0x3f0 kthread+0x11a/0x130 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x35/0x40 Code: 00 00 5b c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 40 80 ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 00 bf c8 bd c3 <0f> 0b b8 fb ff ff ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 ---[ end trace f079fb809e7a862b ]--- BTRFS critical (device vda2): unable to find logical 8820195328 length 16384 BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO failure BTRFS info (device vda2): forced readonly BTRFS error (device vda2): pending csums is 2887680 ------ Although we haven't hit the same bug in real world, it shows that since qgroup is heavily relying on commit root, if block group auto removal happens and removed the empty block group, qgroup could failed to find its old referencers. This patch will add a new member for btrfs_transaction, pending_unused_bgs. Now unused bgs will go trans->transaction->pending_unused_bgs, then fs_info->unused_bgs, and finally get removed by btrfs_deleted_unused_bgs(). Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/ctree.h | 3 ++- fs/btrfs/extent-tree.c | 34 ++++++++++++++++++++++++++++++---- fs/btrfs/scrub.c | 2 +- fs/btrfs/transaction.c | 7 +++++++ fs/btrfs/transaction.h | 10 ++++++++++ 5 files changed, 50 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index b19b673485fd..19d7532fa4cf 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2829,7 +2829,8 @@ void check_system_chunk(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, const u64 type); u64 add_new_free_space(struct btrfs_block_group_cache *block_group, struct btrfs_fs_info *info, u64 start, u64 end); -void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg); +void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg, + struct btrfs_trans_handle *trans); /* ctree.c */ int btrfs_bin_search(struct extent_buffer *eb, const struct btrfs_key *key, diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index f0d7e19feca7..80e17bd99f8e 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6351,7 +6351,7 @@ static int update_block_group(struct btrfs_trans_handle *trans, * cache writeout. */ if (!alloc && old_val == 0) - btrfs_mark_bg_unused(cache); + btrfs_mark_bg_unused(cache, trans); btrfs_put_block_group(cache); total -= num_bytes; @@ -10194,7 +10194,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) inc_block_group_ro(cache, 1); } else if (btrfs_block_group_used(&cache->item) == 0) { ASSERT(list_empty(&cache->bg_list)); - btrfs_mark_bg_unused(cache); + btrfs_mark_bg_unused(cache, NULL); } } @@ -11119,15 +11119,41 @@ void btrfs_wait_for_snapshot_creation(struct btrfs_root *root) } } -void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg) +/* + * Get a block group cache and mark it as unused (if not already + * unused/deleted) + * + * NOTE: + * Since qgroup could search commit root at any time, we can't just + * delete the block group and its chunk mapping in current trans. + * So we record bgs to be deleted in trans->transaction and then + * queue them into fs_info->unused_bgs. + * + * @bg: The block group cache + * @trans: The trans handler + */ +void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg, + struct btrfs_trans_handle *trans) { struct btrfs_fs_info *fs_info = bg->fs_info; spin_lock(&fs_info->unused_bgs_lock); if (list_empty(&bg->bg_list)) { + /* + * Get one reference, will be freed at btrfs_delete_unused_bgs() + */ btrfs_get_block_group(bg); trace_btrfs_add_unused_block_group(bg); - list_add_tail(&bg->bg_list, &fs_info->unused_bgs); + + /* + * If we don't have a trans, it means we are at mount time. + * It's completely fine to mark it ready to be deleted. + */ + if (!trans) + list_add_tail(&bg->bg_list, &fs_info->unused_bgs); + else + list_add_tail(&bg->bg_list, + &trans->transaction->pending_unused_bgs); } spin_unlock(&fs_info->unused_bgs_lock); } diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 40086b47a65f..d9e2420da457 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -3981,7 +3981,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, if (!cache->removed && !cache->ro && cache->reserved == 0 && btrfs_block_group_used(&cache->item) == 0) { spin_unlock(&cache->lock); - btrfs_mark_bg_unused(cache); + btrfs_mark_bg_unused(cache, NULL); } else { spin_unlock(&cache->lock); } diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index c944b4769e3c..5518215fc388 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -264,6 +264,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, INIT_LIST_HEAD(&cur_trans->pending_snapshots); INIT_LIST_HEAD(&cur_trans->pending_chunks); + INIT_LIST_HEAD(&cur_trans->pending_unused_bgs); INIT_LIST_HEAD(&cur_trans->switch_commits); INIT_LIST_HEAD(&cur_trans->dirty_bgs); INIT_LIST_HEAD(&cur_trans->io_bgs); @@ -2269,6 +2270,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) wake_up(&cur_trans->commit_wait); clear_bit(BTRFS_FS_NEED_ASYNC_COMMIT, &fs_info->flags); + /* transaction is done, queue pending unused bgs to cleaner */ + spin_lock(&fs_info->unused_bgs_lock); + list_splice_tail_init(&cur_trans->pending_unused_bgs, + &fs_info->unused_bgs); + spin_unlock(&fs_info->unused_bgs_lock); + spin_lock(&fs_info->trans_lock); list_del_init(&cur_trans->list); spin_unlock(&fs_info->trans_lock); diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index d8c0826bc2c7..9f160e1811f7 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -57,6 +57,16 @@ struct btrfs_transaction { struct list_head switch_commits; struct list_head dirty_bgs; + /* + * Unused block groups waiting to be deleted after current transaction. + * Protected by fs_info->unused_bgs_lock. + * + * Since qgroup could search commit root, we can't delete unused block + * group in the same trans when it get emptied, but delay it to next + * transaction (or next mount if power loss happens) + */ + struct list_head pending_unused_bgs; + /* * There is no explicit lock which protects io_bgs, rather its * consistency is implied by the fact that all the sites which modify -- 2.17.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 3/3] btrfs: Delayed empty block group auto removal to next transaction 2018-05-28 9:20 ` [PATCH 3/3] btrfs: Delayed empty block group auto removal to next transaction Qu Wenruo @ 2018-05-29 6:20 ` Nikolay Borisov 2018-05-29 6:30 ` Qu Wenruo 0 siblings, 1 reply; 9+ messages in thread From: Nikolay Borisov @ 2018-05-29 6:20 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs On 28.05.2018 12:20, Qu Wenruo wrote: > Under certain KVM load and LTP tests, we are possible to hit the > following calltrace if quota is enabled: > ------ > BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 > BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 blk_status_to_errno+0x1a/0x30 > CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 (unreleased) > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 > Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] > task: ffff9f827b340bc0 task.stack: ffffb4f8c0304000 > RIP: 0010:blk_status_to_errno+0x1a/0x30 > Call Trace: > submit_extent_page+0x191/0x270 [btrfs] > ? btrfs_create_repair_bio+0x130/0x130 [btrfs] > __do_readpage+0x2d2/0x810 [btrfs] > ? btrfs_create_repair_bio+0x130/0x130 [btrfs] > ? run_one_async_done+0xc0/0xc0 [btrfs] > __extent_read_full_page+0xe7/0x100 [btrfs] > ? run_one_async_done+0xc0/0xc0 [btrfs] > read_extent_buffer_pages+0x1ab/0x2d0 [btrfs] > ? run_one_async_done+0xc0/0xc0 [btrfs] > btree_read_extent_buffer_pages+0x94/0xf0 [btrfs] > read_tree_block+0x31/0x60 [btrfs] > read_block_for_search.isra.35+0xf0/0x2e0 [btrfs] > btrfs_search_slot+0x46b/0xa00 [btrfs] > ? kmem_cache_alloc+0x1a8/0x510 > ? btrfs_get_token_32+0x5b/0x120 [btrfs] > find_parent_nodes+0x11d/0xeb0 [btrfs] > ? leaf_space_used+0xb8/0xd0 [btrfs] > ? btrfs_leaf_free_space+0x49/0x90 [btrfs] > ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs] > btrfs_find_all_roots_safe+0x93/0x100 [btrfs] > btrfs_find_all_roots+0x45/0x60 [btrfs] > btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs] > btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs] > btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs] > insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs] > btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs] > ? pick_next_task_fair+0x2cd/0x530 > ? __switch_to+0x92/0x4b0 > btrfs_worker_helper+0x81/0x300 [btrfs] > process_one_work+0x1da/0x3f0 > worker_thread+0x2b/0x3f0 > ? process_one_work+0x3f0/0x3f0 > kthread+0x11a/0x130 > ? kthread_create_on_node+0x40/0x40 > ret_from_fork+0x35/0x40 > Code: 00 00 5b c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 40 80 ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 00 bf c8 bd c3 <0f> 0b b8 fb ff ff ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 > ---[ end trace f079fb809e7a862b ]--- > BTRFS critical (device vda2): unable to find logical 8820195328 length 16384 > BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO failure > BTRFS info (device vda2): forced readonly > BTRFS error (device vda2): pending csums is 2887680 > ------ > > Although we haven't hit the same bug in real world, it shows that since > qgroup is heavily relying on commit root, if block group auto removal > happens and removed the empty block group, qgroup could failed to find > its old referencers. > > This patch will add a new member for btrfs_transaction, > pending_unused_bgs. > Now unused bgs will go trans->transaction->pending_unused_bgs, then > fs_info->unused_bgs, and finally get removed by > btrfs_deleted_unused_bgs(). > > Signed-off-by: Qu Wenruo <wqu@suse.com> Overall looks good and simple, couple of minor nits which are not really critical. > --- > fs/btrfs/ctree.h | 3 ++- > fs/btrfs/extent-tree.c | 34 ++++++++++++++++++++++++++++++---- > fs/btrfs/scrub.c | 2 +- > fs/btrfs/transaction.c | 7 +++++++ > fs/btrfs/transaction.h | 10 ++++++++++ > 5 files changed, 50 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index b19b673485fd..19d7532fa4cf 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -2829,7 +2829,8 @@ void check_system_chunk(struct btrfs_trans_handle *trans, > struct btrfs_fs_info *fs_info, const u64 type); > u64 add_new_free_space(struct btrfs_block_group_cache *block_group, > struct btrfs_fs_info *info, u64 start, u64 end); > -void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg); > +void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg, > + struct btrfs_trans_handle *trans); > > /* ctree.c */ > int btrfs_bin_search(struct extent_buffer *eb, const struct btrfs_key *key, > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index f0d7e19feca7..80e17bd99f8e 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -6351,7 +6351,7 @@ static int update_block_group(struct btrfs_trans_handle *trans, > * cache writeout. > */ > if (!alloc && old_val == 0) > - btrfs_mark_bg_unused(cache); > + btrfs_mark_bg_unused(cache, trans); > > btrfs_put_block_group(cache); > total -= num_bytes; > @@ -10194,7 +10194,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) > inc_block_group_ro(cache, 1); > } else if (btrfs_block_group_used(&cache->item) == 0) { > ASSERT(list_empty(&cache->bg_list)); > - btrfs_mark_bg_unused(cache); > + btrfs_mark_bg_unused(cache, NULL); nit: btrfs_mark_bg_unused(cache, false); looks better than passing NULL, see below for full sugestion. > } > } > > @@ -11119,15 +11119,41 @@ void btrfs_wait_for_snapshot_creation(struct btrfs_root *root) > } > } > > -void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg) > +/* > + * Get a block group cache and mark it as unused (if not already > + * unused/deleted) > + * > + * NOTE: > + * Since qgroup could search commit root at any time, we can't just > + * delete the block group and its chunk mapping in current trans. > + * So we record bgs to be deleted in trans->transaction and then > + * queue them into fs_info->unused_bgs. > + * > + * @bg: The block group cache > + * @trans: The trans handler > + */ > +void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg, > + struct btrfs_trans_handle *trans) > { > struct btrfs_fs_info *fs_info = bg->fs_info; > > spin_lock(&fs_info->unused_bgs_lock); With this patch fs_info->unused_bgs_lock actually become sort-of global, since it protects the fs_info->unused_bgs, but it also protects the per-transaction pending_unused_bgs. I guess there shouldn't be much contention for that lock since we don't really delete that many unused bgs. > if (list_empty(&bg->bg_list)) { > + /* > + * Get one reference, will be freed at btrfs_delete_unused_bgs() > + */ nit: Why not make the coment a single line /* text */ > btrfs_get_block_group(bg); > trace_btrfs_add_unused_block_group(bg); > - list_add_tail(&bg->bg_list, &fs_info->unused_bgs); > + > + /* > + * If we don't have a trans, it means we are at mount time. > + * It's completely fine to mark it ready to be deleted. > + */ > + if (!trans) Since you don't really use the transaction struct, perhaps make this parameter a boolean and name it something like fast_delete or direct_delete or quick_delete or some such? Then you can put this comment as the description of the parameter rather than in the code. > + list_add_tail(&bg->bg_list, &fs_info->unused_bgs); > + else > + list_add_tail(&bg->bg_list, > + &trans->transaction->pending_unused_bgs); > } > spin_unlock(&fs_info->unused_bgs_lock); > } > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c > index 40086b47a65f..d9e2420da457 100644 > --- a/fs/btrfs/scrub.c > +++ b/fs/btrfs/scrub.c > @@ -3981,7 +3981,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, > if (!cache->removed && !cache->ro && cache->reserved == 0 && > btrfs_block_group_used(&cache->item) == 0) { > spin_unlock(&cache->lock); > - btrfs_mark_bg_unused(cache); > + btrfs_mark_bg_unused(cache, NULL); > } else { > spin_unlock(&cache->lock); > } > diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c > index c944b4769e3c..5518215fc388 100644 > --- a/fs/btrfs/transaction.c > +++ b/fs/btrfs/transaction.c > @@ -264,6 +264,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, > > INIT_LIST_HEAD(&cur_trans->pending_snapshots); > INIT_LIST_HEAD(&cur_trans->pending_chunks); > + INIT_LIST_HEAD(&cur_trans->pending_unused_bgs); > INIT_LIST_HEAD(&cur_trans->switch_commits); > INIT_LIST_HEAD(&cur_trans->dirty_bgs); > INIT_LIST_HEAD(&cur_trans->io_bgs); > @@ -2269,6 +2270,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) > wake_up(&cur_trans->commit_wait); > clear_bit(BTRFS_FS_NEED_ASYNC_COMMIT, &fs_info->flags); > > + /* transaction is done, queue pending unused bgs to cleaner */ > + spin_lock(&fs_info->unused_bgs_lock); > + list_splice_tail_init(&cur_trans->pending_unused_bgs, > + &fs_info->unused_bgs); > + spin_unlock(&fs_info->unused_bgs_lock); > + > spin_lock(&fs_info->trans_lock); > list_del_init(&cur_trans->list); > spin_unlock(&fs_info->trans_lock); > diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h > index d8c0826bc2c7..9f160e1811f7 100644 > --- a/fs/btrfs/transaction.h > +++ b/fs/btrfs/transaction.h > @@ -57,6 +57,16 @@ struct btrfs_transaction { > struct list_head switch_commits; > struct list_head dirty_bgs; > > + /* > + * Unused block groups waiting to be deleted after current transaction. > + * Protected by fs_info->unused_bgs_lock. I think the right way to document the lock protecting the list would be : btrfs_fs_info::unuused_bgs_lock, since you want to refer to the type and not a de-factor naming standard (which might as well change in the future). > + * > + * Since qgroup could search commit root, we can't delete unused block > + * group in the same trans when it get emptied, but delay it to next > + * transaction (or next mount if power loss happens) > + */ > + struct list_head pending_unused_bgs; > + > /* > * There is no explicit lock which protects io_bgs, rather its > * consistency is implied by the fact that all the sites which modify > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 3/3] btrfs: Delayed empty block group auto removal to next transaction 2018-05-29 6:20 ` Nikolay Borisov @ 2018-05-29 6:30 ` Qu Wenruo 0 siblings, 0 replies; 9+ messages in thread From: Qu Wenruo @ 2018-05-29 6:30 UTC (permalink / raw) To: Nikolay Borisov, Qu Wenruo, linux-btrfs On 2018年05月29日 14:20, Nikolay Borisov wrote: > > > On 28.05.2018 12:20, Qu Wenruo wrote: >> Under certain KVM load and LTP tests, we are possible to hit the >> following calltrace if quota is enabled: >> ------ >> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 >> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 >> ------------[ cut here ]------------ >> WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 blk_status_to_errno+0x1a/0x30 >> CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 (unreleased) >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 >> Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] >> task: ffff9f827b340bc0 task.stack: ffffb4f8c0304000 >> RIP: 0010:blk_status_to_errno+0x1a/0x30 >> Call Trace: >> submit_extent_page+0x191/0x270 [btrfs] >> ? btrfs_create_repair_bio+0x130/0x130 [btrfs] >> __do_readpage+0x2d2/0x810 [btrfs] >> ? btrfs_create_repair_bio+0x130/0x130 [btrfs] >> ? run_one_async_done+0xc0/0xc0 [btrfs] >> __extent_read_full_page+0xe7/0x100 [btrfs] >> ? run_one_async_done+0xc0/0xc0 [btrfs] >> read_extent_buffer_pages+0x1ab/0x2d0 [btrfs] >> ? run_one_async_done+0xc0/0xc0 [btrfs] >> btree_read_extent_buffer_pages+0x94/0xf0 [btrfs] >> read_tree_block+0x31/0x60 [btrfs] >> read_block_for_search.isra.35+0xf0/0x2e0 [btrfs] >> btrfs_search_slot+0x46b/0xa00 [btrfs] >> ? kmem_cache_alloc+0x1a8/0x510 >> ? btrfs_get_token_32+0x5b/0x120 [btrfs] >> find_parent_nodes+0x11d/0xeb0 [btrfs] >> ? leaf_space_used+0xb8/0xd0 [btrfs] >> ? btrfs_leaf_free_space+0x49/0x90 [btrfs] >> ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs] >> btrfs_find_all_roots_safe+0x93/0x100 [btrfs] >> btrfs_find_all_roots+0x45/0x60 [btrfs] >> btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs] >> btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs] >> btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs] >> insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs] >> btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs] >> ? pick_next_task_fair+0x2cd/0x530 >> ? __switch_to+0x92/0x4b0 >> btrfs_worker_helper+0x81/0x300 [btrfs] >> process_one_work+0x1da/0x3f0 >> worker_thread+0x2b/0x3f0 >> ? process_one_work+0x3f0/0x3f0 >> kthread+0x11a/0x130 >> ? kthread_create_on_node+0x40/0x40 >> ret_from_fork+0x35/0x40 >> Code: 00 00 5b c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 40 80 ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 00 bf c8 bd c3 <0f> 0b b8 fb ff ff ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 >> ---[ end trace f079fb809e7a862b ]--- >> BTRFS critical (device vda2): unable to find logical 8820195328 length 16384 >> BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO failure >> BTRFS info (device vda2): forced readonly >> BTRFS error (device vda2): pending csums is 2887680 >> ------ >> >> Although we haven't hit the same bug in real world, it shows that since >> qgroup is heavily relying on commit root, if block group auto removal >> happens and removed the empty block group, qgroup could failed to find >> its old referencers. >> >> This patch will add a new member for btrfs_transaction, >> pending_unused_bgs. >> Now unused bgs will go trans->transaction->pending_unused_bgs, then >> fs_info->unused_bgs, and finally get removed by >> btrfs_deleted_unused_bgs(). >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> > > Overall looks good and simple, couple of minor nits which are not really > critical. >> --- >> fs/btrfs/ctree.h | 3 ++- >> fs/btrfs/extent-tree.c | 34 ++++++++++++++++++++++++++++++---- >> fs/btrfs/scrub.c | 2 +- >> fs/btrfs/transaction.c | 7 +++++++ >> fs/btrfs/transaction.h | 10 ++++++++++ >> 5 files changed, 50 insertions(+), 6 deletions(-) >> >> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h >> index b19b673485fd..19d7532fa4cf 100644 >> --- a/fs/btrfs/ctree.h >> +++ b/fs/btrfs/ctree.h >> @@ -2829,7 +2829,8 @@ void check_system_chunk(struct btrfs_trans_handle *trans, >> struct btrfs_fs_info *fs_info, const u64 type); >> u64 add_new_free_space(struct btrfs_block_group_cache *block_group, >> struct btrfs_fs_info *info, u64 start, u64 end); >> -void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg); >> +void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg, >> + struct btrfs_trans_handle *trans); >> >> /* ctree.c */ >> int btrfs_bin_search(struct extent_buffer *eb, const struct btrfs_key *key, >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >> index f0d7e19feca7..80e17bd99f8e 100644 >> --- a/fs/btrfs/extent-tree.c >> +++ b/fs/btrfs/extent-tree.c >> @@ -6351,7 +6351,7 @@ static int update_block_group(struct btrfs_trans_handle *trans, >> * cache writeout. >> */ >> if (!alloc && old_val == 0) >> - btrfs_mark_bg_unused(cache); >> + btrfs_mark_bg_unused(cache, trans); >> >> btrfs_put_block_group(cache); >> total -= num_bytes; >> @@ -10194,7 +10194,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) >> inc_block_group_ro(cache, 1); >> } else if (btrfs_block_group_used(&cache->item) == 0) { >> ASSERT(list_empty(&cache->bg_list)); >> - btrfs_mark_bg_unused(cache); >> + btrfs_mark_bg_unused(cache, NULL); > > nit: btrfs_mark_bg_unused(cache, false); looks better than passing NULL, > see below for full sugestion. > >> } >> } >> >> @@ -11119,15 +11119,41 @@ void btrfs_wait_for_snapshot_creation(struct btrfs_root *root) >> } >> } >> >> -void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg) >> +/* >> + * Get a block group cache and mark it as unused (if not already >> + * unused/deleted) >> + * >> + * NOTE: >> + * Since qgroup could search commit root at any time, we can't just >> + * delete the block group and its chunk mapping in current trans. >> + * So we record bgs to be deleted in trans->transaction and then >> + * queue them into fs_info->unused_bgs. >> + * >> + * @bg: The block group cache >> + * @trans: The trans handler >> + */ >> +void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg, >> + struct btrfs_trans_handle *trans) >> { >> struct btrfs_fs_info *fs_info = bg->fs_info; >> >> spin_lock(&fs_info->unused_bgs_lock); > > With this patch fs_info->unused_bgs_lock actually become sort-of global, > since it protects the fs_info->unused_bgs, but it also protects the > per-transaction pending_unused_bgs. Yep, that's true. > I guess there shouldn't be much > contention for that lock since we don't really delete that many unused bgs. Right, that's one of the reasons I'm reusing the lock. Another problem is introducing new lock will increase the possibility about ABBA deadlock. Since it's not really performance critical, I prefer using the global lock other than introducing a new one. > >> if (list_empty(&bg->bg_list)) { >> + /* >> + * Get one reference, will be freed at btrfs_delete_unused_bgs() >> + */ > nit: Why not make the coment a single line /* text */ IIRC it's over 80 chars if single lined. The last ')' is just the 80th character. > >> btrfs_get_block_group(bg); >> trace_btrfs_add_unused_block_group(bg); >> - list_add_tail(&bg->bg_list, &fs_info->unused_bgs); >> + >> + /* >> + * If we don't have a trans, it means we are at mount time. >> + * It's completely fine to mark it ready to be deleted. >> + */ >> + if (!trans) > Since you don't really use the transaction struct, perhaps make this > parameter a boolean and name it something like fast_delete or > direct_delete or quick_delete or some such? Then you can put this > comment as the description of the parameter rather than in the code. I also considered this, however the problem here is we need extra function calls to get current transaction and need to keep an eye on the other call sites like in btrfs_read_block_groups(). Where if we're trying to start a transaction at improper time, it could cause extra problem. Thus here using transaction handler provides a good balance for caller and the complexity of the function. For caller, they don't really care about too much thing, if you have a trans handler, pass it. If not, pass NULL. Pretty instinctive. In fact, it's even better than a bool, in which case caller needs to determine which bool should be passed. And the function has no need to try to start/get a transaction. Thanks, Qu >> + list_add_tail(&bg->bg_list, &fs_info->unused_bgs); >> + else >> + list_add_tail(&bg->bg_list, >> + &trans->transaction->pending_unused_bgs); >> } >> spin_unlock(&fs_info->unused_bgs_lock); >> } >> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c >> index 40086b47a65f..d9e2420da457 100644 >> --- a/fs/btrfs/scrub.c >> +++ b/fs/btrfs/scrub.c >> @@ -3981,7 +3981,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, >> if (!cache->removed && !cache->ro && cache->reserved == 0 && >> btrfs_block_group_used(&cache->item) == 0) { >> spin_unlock(&cache->lock); >> - btrfs_mark_bg_unused(cache); >> + btrfs_mark_bg_unused(cache, NULL); >> } else { >> spin_unlock(&cache->lock); >> } >> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c >> index c944b4769e3c..5518215fc388 100644 >> --- a/fs/btrfs/transaction.c >> +++ b/fs/btrfs/transaction.c >> @@ -264,6 +264,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, >> >> INIT_LIST_HEAD(&cur_trans->pending_snapshots); >> INIT_LIST_HEAD(&cur_trans->pending_chunks); >> + INIT_LIST_HEAD(&cur_trans->pending_unused_bgs); >> INIT_LIST_HEAD(&cur_trans->switch_commits); >> INIT_LIST_HEAD(&cur_trans->dirty_bgs); >> INIT_LIST_HEAD(&cur_trans->io_bgs); >> @@ -2269,6 +2270,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) >> wake_up(&cur_trans->commit_wait); >> clear_bit(BTRFS_FS_NEED_ASYNC_COMMIT, &fs_info->flags); >> >> + /* transaction is done, queue pending unused bgs to cleaner */ >> + spin_lock(&fs_info->unused_bgs_lock); >> + list_splice_tail_init(&cur_trans->pending_unused_bgs, >> + &fs_info->unused_bgs); >> + spin_unlock(&fs_info->unused_bgs_lock); >> + >> spin_lock(&fs_info->trans_lock); >> list_del_init(&cur_trans->list); >> spin_unlock(&fs_info->trans_lock); >> diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h >> index d8c0826bc2c7..9f160e1811f7 100644 >> --- a/fs/btrfs/transaction.h >> +++ b/fs/btrfs/transaction.h >> @@ -57,6 +57,16 @@ struct btrfs_transaction { >> struct list_head switch_commits; >> struct list_head dirty_bgs; >> >> + /* >> + * Unused block groups waiting to be deleted after current transaction. >> + * Protected by fs_info->unused_bgs_lock. > > I think the right way to document the lock protecting the list would be > : btrfs_fs_info::unuused_bgs_lock, since you want to refer to the type > and not a de-factor naming standard (which might as well change in the > future). >> + * >> + * Since qgroup could search commit root, we can't delete unused block >> + * group in the same trans when it get emptied, but delay it to next >> + * transaction (or next mount if power loss happens) >> + */ >> + struct list_head pending_unused_bgs; >> + >> /* >> * There is no explicit lock which protects io_bgs, rather its >> * consistency is implied by the fact that all the sites which modify >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] btrfs: Delay block group auto removal to avoid interfering qgroups 2018-05-28 9:20 [PATCH 0/3] btrfs: Delay block group auto removal to avoid interfering qgroups Qu Wenruo ` (2 preceding siblings ...) 2018-05-28 9:20 ` [PATCH 3/3] btrfs: Delayed empty block group auto removal to next transaction Qu Wenruo @ 2018-05-29 16:17 ` David Sterba 3 siblings, 0 replies; 9+ messages in thread From: David Sterba @ 2018-05-29 16:17 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Mon, May 28, 2018 at 05:20:06PM +0800, Qu Wenruo wrote: > The patchset can be fetched from github: > https://github.com/adam900710/linux/tree/delayed_bg_removal > > It's based on v4.17-rc5 branch. Please refresh the patches on top of today's misc-next. The patch 2/3 has been there already and now is deleted as it conflicts with 1/3, so I'm going to apply them in the order as they are in this pathset. Patch 3/3 is nontrivial and I'd like to have a closer look at the bg tracking and locking, 1 and 2 can go to misc-next now. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 0/3] btrfs: Delay block group auto removal to avoid interfering qgroups @ 2018-05-28 6:58 Qu Wenruo 2018-05-28 6:58 ` [PATCH 3/3] btrfs: Delayed empty block group auto removal to next transaction Qu Wenruo 0 siblings, 1 reply; 9+ messages in thread From: Qu Wenruo @ 2018-05-28 6:58 UTC (permalink / raw) To: linux-btrfs The patchset can be fetched from github: https://github.com/adam900710/linux/tree/delayed_bg_removal It's based on v4.17-rc5 branch. This bug is reported from SUSE openQA, although it's pretty hard to hit in real world (even real world VM), it's believed block group auto removal (anyway, there isn't much thing can remove a chunk mapping in btrfs) could interfere with qgroup's search on commit root. Full details can be found in the 3rd patch. The patchset uses 2 submitted cleanup/refactor patches as basis, and the 3rd patch will ensure unused block group will only be deleted after current transaction is done. Qu Wenruo (3): btrfs: trace: Add trace points for unused block groups btrfs: Use btrfs_mark_bg_unused() to replace open code btrfs: Delayed empty block group auto removal to next transaction fs/btrfs/ctree.h | 2 ++ fs/btrfs/extent-tree.c | 62 ++++++++++++++++++++++++++---------- fs/btrfs/scrub.c | 8 +---- fs/btrfs/transaction.c | 7 ++++ fs/btrfs/transaction.h | 10 ++++++ include/trace/events/btrfs.h | 42 ++++++++++++++++++++++++ 6 files changed, 107 insertions(+), 24 deletions(-) -- 2.17.0 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/3] btrfs: Delayed empty block group auto removal to next transaction 2018-05-28 6:58 Qu Wenruo @ 2018-05-28 6:58 ` Qu Wenruo 0 siblings, 0 replies; 9+ messages in thread From: Qu Wenruo @ 2018-05-28 6:58 UTC (permalink / raw) To: linux-btrfs Under certain KVM load and LTP tests, we are possible to hit the following calltrace if quota is enabled: ------ BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 blk_status_to_errno+0x1a/0x30 CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 (unreleased) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] task: ffff9f827b340bc0 task.stack: ffffb4f8c0304000 RIP: 0010:blk_status_to_errno+0x1a/0x30 Call Trace: submit_extent_page+0x191/0x270 [btrfs] ? btrfs_create_repair_bio+0x130/0x130 [btrfs] __do_readpage+0x2d2/0x810 [btrfs] ? btrfs_create_repair_bio+0x130/0x130 [btrfs] ? run_one_async_done+0xc0/0xc0 [btrfs] __extent_read_full_page+0xe7/0x100 [btrfs] ? run_one_async_done+0xc0/0xc0 [btrfs] read_extent_buffer_pages+0x1ab/0x2d0 [btrfs] ? run_one_async_done+0xc0/0xc0 [btrfs] btree_read_extent_buffer_pages+0x94/0xf0 [btrfs] read_tree_block+0x31/0x60 [btrfs] read_block_for_search.isra.35+0xf0/0x2e0 [btrfs] btrfs_search_slot+0x46b/0xa00 [btrfs] ? kmem_cache_alloc+0x1a8/0x510 ? btrfs_get_token_32+0x5b/0x120 [btrfs] find_parent_nodes+0x11d/0xeb0 [btrfs] ? leaf_space_used+0xb8/0xd0 [btrfs] ? btrfs_leaf_free_space+0x49/0x90 [btrfs] ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs] btrfs_find_all_roots_safe+0x93/0x100 [btrfs] btrfs_find_all_roots+0x45/0x60 [btrfs] btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs] btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs] btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs] insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs] btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs] ? pick_next_task_fair+0x2cd/0x530 ? __switch_to+0x92/0x4b0 btrfs_worker_helper+0x81/0x300 [btrfs] process_one_work+0x1da/0x3f0 worker_thread+0x2b/0x3f0 ? process_one_work+0x3f0/0x3f0 kthread+0x11a/0x130 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x35/0x40 Code: 00 00 5b c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 40 80 ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 00 bf c8 bd c3 <0f> 0b b8 fb ff ff ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 ---[ end trace f079fb809e7a862b ]--- BTRFS critical (device vda2): unable to find logical 8820195328 length 16384 BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO failure BTRFS info (device vda2): forced readonly BTRFS error (device vda2): pending csums is 2887680 ------ Although we haven't hit the same bug in real world, it shows that since qgroup is heavily relying on commit root, if block group auto removal happens and removed the empty block group, qgroup could failed to find its old referencers. This patch will add a new member for btrfs_transaction, pending_unused_bgs. Now unused bgs will go trans->transaction->pending_unused_bgs, then fs_info->unused_bgs, and finally get removed by btrfs_deleted_unused_bgs(). Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/ctree.h | 3 ++- fs/btrfs/extent-tree.c | 34 ++++++++++++++++++++++++++++++---- fs/btrfs/scrub.c | 2 +- fs/btrfs/transaction.c | 7 +++++++ fs/btrfs/transaction.h | 10 ++++++++++ 5 files changed, 50 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index b19b673485fd..19d7532fa4cf 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2829,7 +2829,8 @@ void check_system_chunk(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, const u64 type); u64 add_new_free_space(struct btrfs_block_group_cache *block_group, struct btrfs_fs_info *info, u64 start, u64 end); -void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg); +void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg, + struct btrfs_trans_handle *trans); /* ctree.c */ int btrfs_bin_search(struct extent_buffer *eb, const struct btrfs_key *key, diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index f0d7e19feca7..80e17bd99f8e 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6351,7 +6351,7 @@ static int update_block_group(struct btrfs_trans_handle *trans, * cache writeout. */ if (!alloc && old_val == 0) - btrfs_mark_bg_unused(cache); + btrfs_mark_bg_unused(cache, trans); btrfs_put_block_group(cache); total -= num_bytes; @@ -10194,7 +10194,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) inc_block_group_ro(cache, 1); } else if (btrfs_block_group_used(&cache->item) == 0) { ASSERT(list_empty(&cache->bg_list)); - btrfs_mark_bg_unused(cache); + btrfs_mark_bg_unused(cache, NULL); } } @@ -11119,15 +11119,41 @@ void btrfs_wait_for_snapshot_creation(struct btrfs_root *root) } } -void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg) +/* + * Get a block group cache and mark it as unused (if not already + * unused/deleted) + * + * NOTE: + * Since qgroup could search commit root at any time, we can't just + * delete the block group and its chunk mapping in current trans. + * So we record bgs to be deleted in trans->transaction and then + * queue them into fs_info->unused_bgs. + * + * @bg: The block group cache + * @trans: The trans handler + */ +void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg, + struct btrfs_trans_handle *trans) { struct btrfs_fs_info *fs_info = bg->fs_info; spin_lock(&fs_info->unused_bgs_lock); if (list_empty(&bg->bg_list)) { + /* + * Get one reference, will be freed at btrfs_delete_unused_bgs() + */ btrfs_get_block_group(bg); trace_btrfs_add_unused_block_group(bg); - list_add_tail(&bg->bg_list, &fs_info->unused_bgs); + + /* + * If we don't have a trans, it means we are at mount time. + * It's completely fine to mark it ready to be deleted. + */ + if (!trans) + list_add_tail(&bg->bg_list, &fs_info->unused_bgs); + else + list_add_tail(&bg->bg_list, + &trans->transaction->pending_unused_bgs); } spin_unlock(&fs_info->unused_bgs_lock); } diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 40086b47a65f..d9e2420da457 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -3981,7 +3981,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, if (!cache->removed && !cache->ro && cache->reserved == 0 && btrfs_block_group_used(&cache->item) == 0) { spin_unlock(&cache->lock); - btrfs_mark_bg_unused(cache); + btrfs_mark_bg_unused(cache, NULL); } else { spin_unlock(&cache->lock); } diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index c944b4769e3c..5518215fc388 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -264,6 +264,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, INIT_LIST_HEAD(&cur_trans->pending_snapshots); INIT_LIST_HEAD(&cur_trans->pending_chunks); + INIT_LIST_HEAD(&cur_trans->pending_unused_bgs); INIT_LIST_HEAD(&cur_trans->switch_commits); INIT_LIST_HEAD(&cur_trans->dirty_bgs); INIT_LIST_HEAD(&cur_trans->io_bgs); @@ -2269,6 +2270,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) wake_up(&cur_trans->commit_wait); clear_bit(BTRFS_FS_NEED_ASYNC_COMMIT, &fs_info->flags); + /* transaction is done, queue pending unused bgs to cleaner */ + spin_lock(&fs_info->unused_bgs_lock); + list_splice_tail_init(&cur_trans->pending_unused_bgs, + &fs_info->unused_bgs); + spin_unlock(&fs_info->unused_bgs_lock); + spin_lock(&fs_info->trans_lock); list_del_init(&cur_trans->list); spin_unlock(&fs_info->trans_lock); diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index d8c0826bc2c7..9f160e1811f7 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -57,6 +57,16 @@ struct btrfs_transaction { struct list_head switch_commits; struct list_head dirty_bgs; + /* + * Unused block groups waiting to be deleted after current transaction. + * Protected by fs_info->unused_bgs_lock. + * + * Since qgroup could search commit root, we can't delete unused block + * group in the same trans when it get emptied, but delay it to next + * transaction (or next mount if power loss happens) + */ + struct list_head pending_unused_bgs; + /* * There is no explicit lock which protects io_bgs, rather its * consistency is implied by the fact that all the sites which modify -- 2.17.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2018-05-29 16:20 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-05-28 9:20 [PATCH 0/3] btrfs: Delay block group auto removal to avoid interfering qgroups Qu Wenruo 2018-05-28 9:20 ` [PATCH 1/3] btrfs: trace: Add trace points for unused block groups Qu Wenruo 2018-05-28 9:20 ` [PATCH 2/3] btrfs: Use btrfs_mark_bg_unused() to replace open code Qu Wenruo 2018-05-28 21:45 ` Nikolay Borisov 2018-05-28 9:20 ` [PATCH 3/3] btrfs: Delayed empty block group auto removal to next transaction Qu Wenruo 2018-05-29 6:20 ` Nikolay Borisov 2018-05-29 6:30 ` Qu Wenruo 2018-05-29 16:17 ` [PATCH 0/3] btrfs: Delay block group auto removal to avoid interfering qgroups David Sterba -- strict thread matches above, loose matches on Subject: below -- 2018-05-28 6:58 Qu Wenruo 2018-05-28 6:58 ` [PATCH 3/3] btrfs: Delayed empty block group auto removal to next transaction Qu Wenruo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).