From: Nikolay Borisov <nborisov@suse.com>
To: Josef Bacik <josef@toxicpanda.com>,
linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 24/24] btrfs: add a comment explaining the data flush steps
Date: Tue, 4 Feb 2020 11:47:16 +0200 [thread overview]
Message-ID: <593bda62-3f25-28d6-bb5c-efaa677872c6@suse.com> (raw)
In-Reply-To: <20200203204951.517751-25-josef@toxicpanda.com>
On 3.02.20 г. 22:49 ч., Josef Bacik wrote:
> The data flushing steps are not obvious to people other than myself and
> Chris. Write a giant comment explaining the reasoning behind each flush
> step for data as well as why it is in that particular order.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
> fs/btrfs/space-info.c | 46 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 46 insertions(+)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 18a31d96bbbd..d3befc536a7f 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -780,6 +780,52 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
> } while (flush_state <= COMMIT_TRANS);
> }
>
> +/*
> + * FLUSH_DELALLOC_WAIT:
> + * Space is free'd from flushing delalloc in one of two ways.
> + *
> + * 1) compression is on and we allocate less space than we reserved.
> + * 2) We are overwriting existing space.
> + *
> + * For #1 that extra space is reclaimed as soon as the delalloc pages are
> + * cow'ed, by way of btrfs_add_reserved_bytes() which adds the actual extent
> + * length to ->bytes_reserved, and subtracts the reserved space from
> + * ->bytes_may_use.
> + *
> + * For #2 this is trickier. Once the ordered extent runs we will drop the
> + * extent in the range we are overwriting, which creates a delayed ref for
> + * that freed extent. This however is not reclaimed until the transaction
> + * commits, thus the next stages.
> + *
> + * RUN_DELAYED_IPUTS
> + * If we are freeing inodes, we want to make sure all delayed iputs have
> + * completed, because they could have been on an inode with i_nlink == 0, and
> + * thus have been trunated and free'd up space. But again this space is not
> + * immediately re-usable, it comes in the form of a delayed ref, which must be
> + * run and then the transaction must be committed.
> + *
> + * FLUSH_DELAYED_REFS
> + * The above two cases generate delayed refs that will affect
> + * ->total_bytes_pinned. However this counter can be inconsistent with
> + * reality if there are outstanding delayed refs. This is because we adjust
> + * the counter based on the other delayed refs that exist. So for example, if
IMO this sentence would be clearer if it simply says something along the
lines of " This is because we adjust the counter based solely on the
current set of delayed refs and disregard any on-disk state which might
include more refs".
> + * we have an extent with 2 references, but we only drop 1, we'll see that
> + * there is a negative delayed ref count for the extent and assume that the
> + * space will be free'd, and thus increase ->total_bytes_pinned.
> + *
> + * Running the delayed refs gives us the actual real view of what will be
> + * freed at the transaction commit time. This stage will not actually free
> + * space for us, it just makes sure that may_commit_transaction() has all of
> + * the information it needs to make the right decision.
Is there any particular reason why total_bytes_pinned is sort of double
accounted. I.e first add_pinned_bytes is called when a DROP del ref is
added with negative ref. Then during processing of that delref
__btrfs_free_extent either:
a) Removes the ref but doesn't free the extent if there were other,
non-del refs for this extent
b) Removes the extent and calls btrfs_update_block_group to account it
again as pinned (this time also setting the respective ranges as pinned)
This double accounting doesn't really happen because after the
processing of the DROP del ref is finished
cleanup_ref_head->btrfs_cleanup_ref_head_accounting will actually clean
the pinned bytes added at creation time of the DROP del ref. Can we
avoid this and simply rely on the update of total_bytes_pinned when an
extent is freed.
> + *
> + * COMMIT_TRANS
> + * This is where we reclaim all of the pinned space generated by the previous
> + * two stages. We will not commit the transaction if we don't think we're
> + * likely to satisfy our request, which means if our current free space +
> + * total_bytes_pinned < reservation we will not commit. This is why the
> + * previous states are actually important, to make sure we know for sure
> + * wether committing the transaction will allow us to make progress.
typo: s/wether/whether/
> + */
> static const enum btrfs_flush_state data_flush_states[] = {
> FLUSH_DELALLOC_WAIT,
> RUN_DELAYED_IPUTS,
>
This looks good and helped me understand the machinery and put some
context into the code.
next prev parent reply other threads:[~2020-02-04 9:47 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-03 20:49 [PATCH 00/24][v3] Convert data reservations to the ticketing infrastructure Josef Bacik
2020-02-03 20:49 ` [PATCH 01/24] btrfs: change nr to u64 in btrfs_start_delalloc_roots Josef Bacik
2020-02-04 7:40 ` Nikolay Borisov
2020-02-03 20:49 ` [PATCH 02/24] btrfs: remove orig from shrink_delalloc Josef Bacik
2020-02-03 20:49 ` [PATCH 03/24] btrfs: handle U64_MAX for shrink_delalloc Josef Bacik
2020-02-03 20:49 ` [PATCH 04/24] btrfs: make shrink_delalloc take space_info as an arg Josef Bacik
2020-02-03 20:49 ` [PATCH 05/24] btrfs: make ALLOC_CHUNK use the space info flags Josef Bacik
2020-02-03 20:49 ` [PATCH 06/24] btrfs: call btrfs_try_granting_tickets when freeing reserved bytes Josef Bacik
2020-02-03 20:49 ` [PATCH 07/24] btrfs: call btrfs_try_granting_tickets when unpinning anything Josef Bacik
2020-02-03 20:49 ` [PATCH 08/24] btrfs: call btrfs_try_granting_tickets when reserving space Josef Bacik
2020-02-03 20:49 ` [PATCH 09/24] btrfs: use the btrfs_space_info_free_bytes_may_use helper for delalloc Josef Bacik
2020-02-03 20:49 ` [PATCH 10/24] btrfs: use btrfs_start_delalloc_roots in shrink_delalloc Josef Bacik
2020-02-03 20:49 ` [PATCH 11/24] btrfs: check tickets after waiting on ordered extents Josef Bacik
2020-02-03 20:49 ` [PATCH 12/24] btrfs: add flushing states for handling data reservations Josef Bacik
2020-02-03 20:49 ` [PATCH 13/24] btrfs: add the data transaction commit logic into may_commit_transaction Josef Bacik
2020-02-03 20:49 ` [PATCH 14/24] btrfs: add btrfs_reserve_data_bytes and use it Josef Bacik
2020-02-03 20:49 ` [PATCH 15/24] btrfs: use ticketing for data space reservations Josef Bacik
2020-02-03 20:49 ` [PATCH 16/24] btrfs: serialize data reservations if we are flushing Josef Bacik
2020-02-03 20:49 ` [PATCH 17/24] btrfs: use the same helper for data and metadata reservations Josef Bacik
2020-02-03 20:49 ` [PATCH 18/24] btrfs: drop the commit_cycles stuff for data reservations Josef Bacik
2020-02-04 7:39 ` Nikolay Borisov
2020-02-03 20:49 ` [PATCH 19/24] btrfs: don't pass bytes_needed to may_commit_transaction Josef Bacik
2020-02-04 7:42 ` Nikolay Borisov
2020-02-03 20:49 ` [PATCH 20/24] btrfs: don't force commit if we are data Josef Bacik
2020-02-04 7:43 ` Nikolay Borisov
2020-02-03 20:49 ` [PATCH 21/24] btrfs: run delayed iputs before committing the transaction for data Josef Bacik
2020-02-04 7:43 ` Nikolay Borisov
2020-02-03 20:49 ` [PATCH 22/24] btrfs: flush delayed refs when trying to reserve data space Josef Bacik
2020-02-03 20:49 ` [PATCH 23/24] btrfs: do async reclaim for data reservations Josef Bacik
2020-02-03 20:49 ` [PATCH 24/24] btrfs: add a comment explaining the data flush steps Josef Bacik
2020-02-04 9:47 ` Nikolay Borisov [this message]
2020-02-04 16:16 ` Josef Bacik
2020-02-04 9:48 ` [PATCH 00/24][v3] Convert data reservations to the ticketing infrastructure Nikolay Borisov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=593bda62-3f25-28d6-bb5c-efaa677872c6@suse.com \
--to=nborisov@suse.com \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox