From: Josef Bacik <josef@toxicpanda.com>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 01/21] btrfs: fix race when refilling delayed refs block reserve
Date: Fri, 8 Sep 2023 10:46:11 -0400 [thread overview]
Message-ID: <20230908144611.GB1977092@perftesting> (raw)
In-Reply-To: <ad77c4035370a5db4e0b813da9eff73e52fd30c0.1694174371.git.fdmanana@suse.com>
On Fri, Sep 08, 2023 at 01:09:03PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> If we have two (or more) tasks attempting to refill the delayed refs block
> reserve we can end up with the delayed block reserve being over reserved,
> that is, with a reserved space greater than its size. If this happens, we
> are holding to more reserved space than necessary, however it may or may
> not be harmless. In case the delayed refs block reserve size later
> increases to match or go beyond the reserved space, then nothing bad
> happens, we end up simply avoiding doing a space reservation later at the
> expense of doing it now. However if the delayed refs block reserve size
> never increases to match its reserved size, then at unmount time we will
> trigger the following warning from btrfs_release_global_block_rsv():
>
> WARN_ON(fs_info->delayed_block_rsv.reserved > 0);
>
> The race happens like this:
>
> 1) The delayed refs block reserve has a size of 8M and a reserved space of
> 6M for example;
>
> 2) Task A calls btrfs_delayed_refs_rsv_refill();
>
> 3) Task B also calls btrfs_delayed_refs_rsv_refill();
>
> 4) Task A sees there's a 2M difference between the size and the reserved
> space of the delayed refs rsv, so it will reserve 2M of space by
> calling btrfs_reserve_metadata_bytes();
>
> 5) Task B also sees that 2M difference, and like task A, it reserves
> another 2M of metadata space;
>
> 6) Both task A and task B increase the reserved space of block reserve
> by 2M, by calling btrfs_block_rsv_add_bytes(), so the block reserve
> ends up with a size of 8M and a reserved space of 10M;
>
> 7) As delayed refs are run and their space released, the delayed refs
> block reserve ends up with a size of 0 and a reserved space of 2M;
>
This part shouldn't happen, delayed refs space only manipulates
delayed_refs_rsv->size, so when we drop their space, we do
btrfs_delayed_refs_rsv_release
-> btrfs_block_rsv_release
-> block_rsv_release_bytes
which does
spin_lock(&block_rsv->lock);
if (num_bytes == (u64)-1) {
num_bytes = block_rsv->size;
qgroup_to_release = block_rsv->qgroup_rsv_size;
}
block_rsv->size -= num_bytes;
if (block_rsv->reserved >= block_rsv->size) {
num_bytes = block_rsv->reserved - block_rsv->size;
block_rsv->reserved = block_rsv->size;
block_rsv->full = true;
}
so if A and B race and the reservation is now much larger than size, the extra
will be cut off at the end when we call btrfs_delayed_refs_rsv_release.
Now I'm all for not holding the over-reserve for that long, but this analysis is
incorrect, we would have to somehow do the btrfs_delayed_refs_rsv_refill() and
then not call btrfs_delayed_refs_rsv_release() at some point.
This *could* happen I suppose by starting a trans handle and not modifying
anything while the delayed refs are run and finish before we do our reserve, and
in that case we didn't generate a delayed ref that could later on call
btrfs_delayed_refs_rsv_release() to clear the excess reservation, but that's a
decidedly different problem with a different solution. Thanks,
Josef
next prev parent reply other threads:[~2023-09-08 14:46 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-08 12:09 [PATCH 00/21] btrfs: updates to delayed refs accounting and space reservation fdmanana
2023-09-08 12:09 ` [PATCH 01/21] btrfs: fix race when refilling delayed refs block reserve fdmanana
2023-09-08 14:46 ` Josef Bacik [this message]
2023-09-08 17:21 ` Filipe Manana
2023-09-08 12:09 ` [PATCH 02/21] btrfs: prevent transaction block reserve underflow when starting transaction fdmanana
2023-09-08 14:57 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 03/21] btrfs: pass a space_info argument to btrfs_reserve_metadata_bytes() fdmanana
2023-09-08 14:58 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 04/21] btrfs: remove unnecessary logic when running new delayed references fdmanana
2023-09-08 14:59 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 05/21] btrfs: remove the refcount warning/check at btrfs_put_delayed_ref() fdmanana
2023-09-08 15:00 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 06/21] btrfs: return -EUCLEAN for delayed tree ref with a ref count not equals to 1 fdmanana
2023-09-08 15:01 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 07/21] btrfs: remove redundant BUG_ON() from __btrfs_inc_extent_ref() fdmanana
2023-09-08 15:02 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 08/21] btrfs: remove refs_to_add argument " fdmanana
2023-09-08 15:02 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 09/21] btrfs: remove refs_to_drop argument from __btrfs_free_extent() fdmanana
2023-09-08 15:03 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 10/21] btrfs: initialize key where it's used when running delayed data ref fdmanana
2023-09-08 15:04 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 11/21] btrfs: remove pointless 'ref_root' variable from run_delayed_data_ref() fdmanana
2023-09-08 15:07 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 12/21] btrfs: log message if extent item not found when running delayed extent op fdmanana
2023-09-08 15:08 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 13/21] btrfs: use a single variable for return value at run_delayed_extent_op() fdmanana
2023-09-08 15:09 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 14/21] btrfs: use a single variable for return value at lookup_inline_extent_backref() fdmanana
2023-09-08 15:10 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 15/21] btrfs: return -EUCLEAN if extent item is missing when searching inline backref fdmanana
2023-09-08 15:10 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 16/21] btrfs: simplify check for extent item overrun at lookup_inline_extent_backref() fdmanana
2023-09-08 15:11 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 17/21] btrfs: allow to run delayed refs by bytes to be released instead of count fdmanana
2023-09-08 15:15 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 18/21] btrfs: reserve space for delayed refs on a per ref basis fdmanana
2023-09-08 15:23 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 19/21] btrfs: remove pointless initialization at btrfs_delayed_refs_rsv_release() fdmanana
2023-09-08 15:16 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 20/21] btrfs: stop doing excessive space reservation for csum deletion fdmanana
2023-09-08 15:26 ` Josef Bacik
2023-09-08 12:09 ` [PATCH 21/21] btrfs: always reserve space for delayed refs when starting transaction fdmanana
2023-09-08 15:32 ` Josef Bacik
2023-09-08 17:20 ` [PATCH v2 00/21] btrfs: updates to delayed refs accounting and space reservation fdmanana
2023-09-08 17:20 ` [PATCH v2 01/21] btrfs: fix race when refilling delayed refs block reserve fdmanana
2023-09-08 17:20 ` [PATCH v2 02/21] btrfs: prevent transaction block reserve underflow when starting transaction fdmanana
2023-09-08 17:20 ` [PATCH v2 03/21] btrfs: pass a space_info argument to btrfs_reserve_metadata_bytes() fdmanana
2023-09-08 17:20 ` [PATCH v2 04/21] btrfs: remove unnecessary logic when running new delayed references fdmanana
2023-09-08 17:20 ` [PATCH v2 05/21] btrfs: remove the refcount warning/check at btrfs_put_delayed_ref() fdmanana
2023-09-08 17:20 ` [PATCH v2 06/21] btrfs: return -EUCLEAN for delayed tree ref with a ref count not equals to 1 fdmanana
2023-09-08 17:20 ` [PATCH v2 07/21] btrfs: remove redundant BUG_ON() from __btrfs_inc_extent_ref() fdmanana
2023-09-08 17:20 ` [PATCH v2 08/21] btrfs: remove refs_to_add argument " fdmanana
2023-09-08 17:20 ` [PATCH v2 09/21] btrfs: remove refs_to_drop argument from __btrfs_free_extent() fdmanana
2023-09-08 17:20 ` [PATCH v2 10/21] btrfs: initialize key where it's used when running delayed data ref fdmanana
2023-09-08 17:20 ` [PATCH v2 11/21] btrfs: remove pointless 'ref_root' variable from run_delayed_data_ref() fdmanana
2023-09-08 17:20 ` [PATCH v2 12/21] btrfs: log message if extent item not found when running delayed extent op fdmanana
2023-09-08 17:20 ` [PATCH v2 13/21] btrfs: use a single variable for return value at run_delayed_extent_op() fdmanana
2023-09-08 17:20 ` [PATCH v2 14/21] btrfs: use a single variable for return value at lookup_inline_extent_backref() fdmanana
2023-09-08 17:20 ` [PATCH v2 15/21] btrfs: return -EUCLEAN if extent item is missing when searching inline backref fdmanana
2023-09-08 17:20 ` [PATCH v2 16/21] btrfs: simplify check for extent item overrun at lookup_inline_extent_backref() fdmanana
2023-09-08 17:20 ` [PATCH v2 17/21] btrfs: allow to run delayed refs by bytes to be released instead of count fdmanana
2023-09-08 17:20 ` [PATCH v2 18/21] btrfs: reserve space for delayed refs on a per ref basis fdmanana
2023-09-08 17:20 ` [PATCH v2 19/21] btrfs: remove pointless initialization at btrfs_delayed_refs_rsv_release() fdmanana
2023-09-08 17:20 ` [PATCH v2 20/21] btrfs: stop doing excessive space reservation for csum deletion fdmanana
2023-09-08 17:20 ` [PATCH v2 21/21] btrfs: always reserve space for delayed refs when starting transaction fdmanana
2026-02-03 17:42 ` Alex Lyakas
2026-02-03 18:03 ` Filipe Manana
2023-09-11 17:20 ` [PATCH v2 00/21] btrfs: updates to delayed refs accounting and space reservation David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230908144611.GB1977092@perftesting \
--to=josef@toxicpanda.com \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.