From: Josef Bacik <josef@toxicpanda.com>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 1/3] btrfs: add a comment describing block-rsvs
Date: Mon, 3 Feb 2020 15:44:34 -0500
Message-ID: <20200203204436.517473-2-josef@toxicpanda.com>
In-Reply-To: <20200203204436.517473-1-josef@toxicpanda.com>
This is a giant comment at the top of block-rsv.c describing generally
how block rsvs work. It is purely about the block rsvs themselves, and
has nothing to do with how the actual reservation system works.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/block-rsv.c | 81 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 81 insertions(+)
diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index d07bd41a7c1e..54380f477f80 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -6,6 +6,87 @@
#include "space-info.h"
#include "transaction.h"
+/*
+ * HOW DO BLOCK RSVS WORK
+ *
+ * Think of block_rsv's as buckets for logically grouped reservations. Each
+ * block_rsv has a ->size and a ->reserved. ->size is how large we want our
+ * block rsv to be, ->reserved is how much space is currently reserved for
+ * this block reserve.
+ *
+ * ->failfast exists for the truncate case, and is described below.
+ *
+ * NORMAL OPERATION
+ * We determine we need N items of reservation and use the appropriate
+ * btrfs_calc*() helper to determine the number of bytes. We call into
+ * reserve_metadata_bytes() to get our bytes, then add this space to our
+ * ->size and our ->reserved.
+ *
+ * When we modify the tree for our operation we allocate a tree block, which
+ * calls btrfs_use_block_rsv() and subtracts nodesize from
+ * block_rsv->reserved.
+ *
+ * When we finish our operation we subtract our original reservation from
+ * ->size. If ->reserved now exceeds ->size, we trim ->reserved to ->size and
+ * free the excess back to the space info by reducing
+ * space_info->bytes_may_use by the excess amount.
+ *
+ * In some cases we may return this excess to the global block reserve or
+ * delayed refs reserve if either of their ->size is greater than their
+ * ->reserved.
+ *
+ * BLOCK_RSV_TRANS, BLOCK_RSV_DELOPS, BLOCK_RSV_CHUNK
+ * These behave normally, as described above, just within the confines of the
+ * lifetime of their particular operation (transaction for the whole trans
+ * handle lifetime, for example).
+ *
+ * BLOCK_RSV_GLOBAL
+ * This has existed forever, with diminishing degrees of importance.
+ * Currently it exists to save us from ourselves. We definitely over-reserve
+ * space most of the time, but the nature of COW is that we do not know how
+ * much space we may need to use for any given operation. This is
+ * particularly true about the extent tree. Modifying one extent could
+ * balloon into 1000 modifications of the extent tree, which we have no way of
+ * properly predicting. To cover this case we have the global reserve act as
+ * the "root" space to allow us to not abort the transaction when things are
+ * very tight. As such we tend to treat this space as sacred, and only use it
+ * if we are desperate. Generally we should no longer be depending on its
+ * space, and if new use cases arise we need to address them elsewhere.
+ *
+ * BLOCK_RSV_DELALLOC
+ * The individual item sizes are determined by the per-inode size
+ * calculations, which are described with the delalloc code. This is pretty
+ * straightforward; it's just that the calculation of ->size encodes a lot of
+ * different items, and thus it gets used when updating inodes, inserting file
+ * extents, and inserting checksums.
+ *
+ * BLOCK_RSV_DELREFS
+ * We keep a running tally of how many delayed refs we have on the system.
+ * We assume each one of these delayed refs is going to use a full
+ * reservation. We use the transaction items and pre-reserve space for every
+ * operation, and use this reservation to refill any gap between ->size and
+ * ->reserved that may exist.
+ *
+ * From there it's straightforward: removing a delayed ref means we remove its
+ * count from ->size and free up reservations as necessary. Since this is the
+ * most dynamic block rsv in the system, we will try to refill this block rsv
+ * first with any excess returned by any other block reserve.
+ *
+ * BLOCK_RSV_EMPTY
+ * This is the fallback block rsv to make us try to reserve space if we don't
+ * have a specific bucket for this allocation. It is mostly used for updating
+ * the device tree and such; since that is a separate pool, we're content to
+ * just reserve space from the space_info on demand.
+ *
+ * BLOCK_RSV_TEMP
+ * This is used by things like truncate and iput. We will temporarily
+ * allocate a block rsv, set it to some size, and then truncate bytes until we
+ * have no space left. With ->failfast set we'll simply return ENOSPC from
+ * btrfs_use_block_rsv() to signal that we need to unwind and try to make a
+ * new reservation. This is because these operations are unbounded, so we
+ * want to do as much work as we can, and then back off and re-reserve.
+ */
+
static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
struct btrfs_block_rsv *block_rsv,
struct btrfs_block_rsv *dest, u64 num_bytes,
--
2.24.1
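
For anyone who wants to follow the arithmetic, here is a minimal userspace
sketch of the NORMAL OPERATION and BLOCK_RSV_TEMP flows described in the
comment above. It is not kernel code: every toy_* name is hypothetical, the
16K nodesize is only an example value, and the way consumed bytes leave
bytes_may_use is an assumption of this model rather than something the
comment states. It also only models the "return ENOSPC" behaviour that the
comment attributes to ->failfast, not any fallback path.

```c
/* Toy userspace model of block rsv accounting; all toy_* names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct toy_space_info {
	uint64_t bytes_may_use;	/* reserved, not yet backed by an allocation */
};

struct toy_block_rsv {
	uint64_t size;		/* how large we want this reserve to be */
	uint64_t reserved;	/* how much is currently reserved for it */
};

/* A reservation succeeded: add the bytes to both ->size and ->reserved. */
static void toy_rsv_add(struct toy_block_rsv *rsv, struct toy_space_info *si,
			uint64_t bytes)
{
	si->bytes_may_use += bytes;
	rsv->size += bytes;
	rsv->reserved += bytes;
}

/*
 * Allocating a tree block consumes bytes from ->reserved.  Returns -ENOSPC
 * when the reserve cannot cover it, as the failfast case does.
 */
static int toy_rsv_use(struct toy_block_rsv *rsv, struct toy_space_info *si,
		       uint64_t bytes)
{
	if (rsv->reserved < bytes)
		return -ENOSPC;
	rsv->reserved -= bytes;
	/* Model assumption: consumed bytes are no longer "may use". */
	si->bytes_may_use -= bytes;
	return 0;
}

/* Drop a reservation from ->size and free any excess to the space info. */
static uint64_t toy_rsv_release(struct toy_block_rsv *rsv,
				struct toy_space_info *si, uint64_t bytes)
{
	uint64_t excess = 0;

	if (bytes > rsv->size)
		bytes = rsv->size;
	rsv->size -= bytes;
	if (rsv->reserved > rsv->size) {
		excess = rsv->reserved - rsv->size;
		rsv->reserved = rsv->size;
	}
	si->bytes_may_use -= excess;
	return excess;
}

/* NORMAL OPERATION: reserve 3 blocks, use 1, release; returns the excess. */
static uint64_t toy_normal_operation(void)
{
	const uint64_t nodesize = 16384;	/* example value only */
	struct toy_space_info si = { 0 };
	struct toy_block_rsv rsv = { 0 };
	uint64_t excess;
	int ret;

	toy_rsv_add(&rsv, &si, 3 * nodesize);
	ret = toy_rsv_use(&rsv, &si, nodesize);
	assert(ret == 0);
	excess = toy_rsv_release(&rsv, &si, 3 * nodesize);
	assert(rsv.size == 0 && rsv.reserved == 0 && si.bytes_may_use == 0);
	return excess;
}

/* BLOCK_RSV_TEMP: consume blocks until -ENOSPC; returns how many fit. */
static unsigned int toy_failfast_blocks(uint64_t budget, uint64_t nodesize)
{
	struct toy_space_info si = { 0 };
	struct toy_block_rsv rsv = { 0 };
	unsigned int blocks = 0;

	if (nodesize == 0)
		return 0;
	toy_rsv_add(&rsv, &si, budget);
	while (toy_rsv_use(&rsv, &si, nodesize) == 0)
		blocks++;
	/* A real caller would unwind here and make a new reservation. */
	return blocks;
}
```

In this model, reserving three blocks and using one leaves two blocks' worth
of excess to hand back on release, and a temporary reserve stops handing out
blocks as soon as fewer than nodesize bytes remain.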