From: Nikolay Borisov <nborisov@suse.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Cc: "Klemens Schölhorn" <klemens@schoelhorn.eu>
Subject: Re: [PATCH v2 1/2] btrfs-progs: Exhaust delayed refs and dirty block groups to prevent delayed refs lost
Date: Mon, 8 Jul 2019 13:44:34 +0300 [thread overview]
Message-ID: <6846c7ad-a5da-abdb-92f7-ebf619acf2e8@suse.com> (raw)
In-Reply-To: <20190708073352.6095-2-wqu@suse.com>
On 8.07.19 г. 10:33 ч., Qu Wenruo wrote:
> [BUG]
> Btrfs-progs sometimes fails to find certain extent backref when
> committing transaction.
> The most reliable way to reproduce it is fsck-test/013 on 64K page sized
> system:
> [...]
> adding new data backref on 315859712 root 287 owner 292 offset 0 found 1
> btrfs unable to find ref byte nr 31850496 parent 0 root 2 owner 0 offset 0
> Failed to find [30867456, 168, 65536]
>
> Also there are some github bug reports related to this problem.
>
> [CAUSE]
> Commit 909357e86799 ("btrfs-progs: Wire up delayed refs") introduced
> delayed refs in btrfs-progs.
>
> However in that commit, delayed refs are not run at correct timing.
> That commit calls btrfs_run_delayed_refs() before
> btrfs_write_dirty_block_groups(), which needs to update
> BLOCK_GROUP_ITEMs in extent tree, thus could cause new delayed refs.
>
> This means each time we commit a transaction, we may screw up the extent
> tree by dropping some pending delayed refs, like:
>
> Transaction 711:
> btrfs_commit_transaction()
> |- btrfs_run_delayed_refs()
> | Now all delayed refs are written to extent tree
> |
> |- btrfs_write_dirty_block_groups()
> | Needs to update extent tree root
> | ADD_DELAYED_REF to 315859712.
> | Delayed refs are attached to current trans handle.
> |
> |- __commit_transaction()
> |- write_ctree_super()
> |- btrfs_finish_extent_commit()
> |- kfree(trans)
> Now delayed ref for 315859712 are lost
>
> Transaction 712:
> Tree block 315859712 get dropped
> btrfs_commit_transaction()
> |- btrfs_run_delayed_refs()
> |- run_one_delayed_ref()
> |- __free_extent()
> As previous ADD_DELAYED_REF to 315859712 is lost, extent tree
> doesn't has any backref for 315859712, causing the bug
>
> In fact, commit c31edf610cbe ("btrfs-progs: Fix false ENOSPC alert by
> tracking used space correctly") detects the tree block leakage, but in
> the reproducer we have too many noise, thus nobody notices the leakage
> warning.
>
> [FIX]
> We can't just move btrfs_run_delayed_refs() after
> btrfs_write_dirty_block_groups(), as during btrfs_run_delayed_refs(), we
> can re-dirty block groups.
> Thus we need to exhaust both delayed refs and dirty blocks.
>
> This patch will call btrfs_write_dirty_block_groups() and
> btrfs_run_delayed_refs() in a loop until both delayed refs and dirty
> blocks are exhausted. Much like what we do in commit_cowonly_roots() in
> kernel.
>
> Also, to prevent such problem from happening again (and not to debug
> such problem again), add extra check on delayed refs before freeing the
> trans handle.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>
> Reported-by: Klemens Schölhorn <klemens@schoelhorn.eu>
> Issue: 187
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> transaction.c | 27 +++++++++++++++++++++------
> 1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/transaction.c b/transaction.c
> index 551fb24e674d..3b0a428db76e 100644
> --- a/transaction.c
> +++ b/transaction.c
> @@ -193,17 +193,32 @@ commit_tree:
> ret = commit_tree_roots(trans, fs_info);
> if (ret < 0)
> goto error;
> +
> /*
> - * Ensure that all committed roots are properly accounted in the
> - * extent tree
> + * btrfs_write_dirty_block_groups() can cause CoW thus new delayed
> + * tree refs, while run such delayed tree refs can dirty block groups
> + * again, we need to exhause both dirty blocks and delayed refs
> */
> - ret = btrfs_run_delayed_refs(trans, -1);
> - if (ret < 0)
> - goto error;
> - btrfs_write_dirty_block_groups(trans);
> + while (!RB_EMPTY_ROOT(&trans->delayed_refs.href_root) ||
> + test_range_bit(&fs_info->block_group_cache, 0, (u64)-1,
> + BLOCK_GROUP_DIRTY, 0))
> + {
> + ret = btrfs_write_dirty_block_groups(trans);
> + if (ret < 0)
> + goto error;
> + ret = btrfs_run_delayed_refs(trans, -1);
> + if (ret < 0)
> + goto error;
> + }
> __commit_transaction(trans, root);
> if (ret < 0)
> goto error;
> +
> + /* There should be no pending delayed refs now */
> + if (!RB_EMPTY_ROOT(&trans->delayed_refs.href_root)) {
> + error("Uncommitted delayed refs detected");
> + goto error;
> + }
> ret = write_ctree_super(trans);
> btrfs_finish_extent_commit(trans);
> kfree(trans);
>
next prev parent reply other threads:[~2019-07-08 10:44 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-08 7:33 [PATCH v2 0/2] btrfs-progs: Fix delayed ref leakage Qu Wenruo
2019-07-08 7:33 ` [PATCH v2 1/2] btrfs-progs: Exhaust delayed refs and dirty block groups to prevent delayed refs lost Qu Wenruo
2019-07-08 10:44 ` Nikolay Borisov [this message]
2019-07-22 12:59 ` David Sterba
2019-07-08 7:33 ` [PATCH v2 2/2] btrfs-progs: Avoid unnecessary block group item COW if the content hasn't changed Qu Wenruo
2019-07-08 10:43 ` Nikolay Borisov
2019-07-08 12:50 ` Qu Wenruo
2019-07-08 13:07 ` Nikolay Borisov
2019-07-08 13:30 ` Qu Wenruo
2019-07-22 12:59 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6846c7ad-a5da-abdb-92f7-ebf619acf2e8@suse.com \
--to=nborisov@suse.com \
--cc=klemens@schoelhorn.eu \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox