From: Josef Bacik <josef@toxicpanda.com>
To: Nikolay Borisov <nborisov@suse.com>
Cc: Josef Bacik <josef@toxicpanda.com>,
kernel-team@fb.com, linux-btrfs@vger.kernel.org,
Josef Bacik <jbacik@fb.com>
Subject: Re: [PATCH 06/42] btrfs: introduce delayed_refs_rsv
Date: Fri, 28 Sep 2018 07:58:07 -0400
Message-ID: <20180928115805.2f747knbtvhvgfuw@destiny>
In-Reply-To: <9a0183be-6aa9-05b4-4058-9d14f6617795@suse.com>
On Fri, Sep 28, 2018 at 02:51:10PM +0300, Nikolay Borisov wrote:
>
>
> On 28.09.2018 14:17, Josef Bacik wrote:
> > From: Josef Bacik <jbacik@fb.com>
> >
> > Traditionally we've had voodoo in btrfs to account for the space that
> > delayed refs may take up by having a global_block_rsv. This works most
> > of the time, except when it doesn't. We've had issues reported and seen
> > in production where sometimes the global reserve is exhausted during
> > transaction commit before we can run all of our delayed refs, resulting
> > in an aborted transaction. Because of this voodoo we have equally
> > dubious flushing semantics around throttling delayed refs which we often
> > get wrong.
> >
> > So instead give them their own block_rsv. This way we can always know
> > exactly how much outstanding space we need for delayed refs. This
> > allows us to make sure we are constantly filling that reservation up
> > with space, and allows us to put more precise pressure on the enospc
> > system. Instead of doing math to see if it's a good time to throttle,
> > the normal enospc code will be invoked if we have a lot of delayed refs
> > pending, and they will be run via the normal flushing mechanism.
> >
> > For now the delayed_refs_rsv will hold the reservations for the delayed
> > refs, the block group updates, and deleting csums. We could have a
> > separate rsv for the block group updates, but the csum deletion stuff is
> > still handled via the delayed_refs so that will stay there.
> >
> > Signed-off-by: Josef Bacik <jbacik@fb.com>
> > ---
> > fs/btrfs/ctree.h | 27 +++--
> > fs/btrfs/delayed-ref.c | 28 ++++-
> > fs/btrfs/disk-io.c | 4 +
> > fs/btrfs/extent-tree.c | 279 +++++++++++++++++++++++++++++++++++--------
> > fs/btrfs/inode.c | 2 +-
> > fs/btrfs/transaction.c | 77 ++++++------
> > include/trace/events/btrfs.h | 2 +
> > 7 files changed, 312 insertions(+), 107 deletions(-)
> >
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 66f1d3895bca..1a2c3b629af2 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -452,8 +452,9 @@ struct btrfs_space_info {
> > #define BTRFS_BLOCK_RSV_TRANS 3
> > #define BTRFS_BLOCK_RSV_CHUNK 4
> > #define BTRFS_BLOCK_RSV_DELOPS 5
> > -#define BTRFS_BLOCK_RSV_EMPTY 6
> > -#define BTRFS_BLOCK_RSV_TEMP 7
> > +#define BTRFS_BLOCK_RSV_DELREFS 6
> > +#define BTRFS_BLOCK_RSV_EMPTY 7
> > +#define BTRFS_BLOCK_RSV_TEMP 8
> >
> > struct btrfs_block_rsv {
> > u64 size;
> > @@ -794,6 +795,8 @@ struct btrfs_fs_info {
> > struct btrfs_block_rsv chunk_block_rsv;
> > /* block reservation for delayed operations */
> > struct btrfs_block_rsv delayed_block_rsv;
> > + /* block reservation for delayed refs */
> > + struct btrfs_block_rsv delayed_refs_rsv;
> >
> > struct btrfs_block_rsv empty_block_rsv;
> >
> > @@ -2608,8 +2611,7 @@ static inline u64 btrfs_calc_trunc_metadata_size(struct btrfs_fs_info *fs_info,
> >
> > int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans,
> > struct btrfs_fs_info *fs_info);
> > -int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans,
> > - struct btrfs_fs_info *fs_info);
> > +bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
> > void btrfs_dec_block_group_reservations(struct btrfs_fs_info *fs_info,
> > const u64 start);
> > void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg);
> > @@ -2723,10 +2725,12 @@ enum btrfs_reserve_flush_enum {
> > enum btrfs_flush_state {
> > FLUSH_DELAYED_ITEMS_NR = 1,
> > FLUSH_DELAYED_ITEMS = 2,
> > - FLUSH_DELALLOC = 3,
> > - FLUSH_DELALLOC_WAIT = 4,
> > - ALLOC_CHUNK = 5,
> > - COMMIT_TRANS = 6,
> > + FLUSH_DELAYED_REFS_NR = 3,
> > + FLUSH_DELAYED_REFS = 4,
> > + FLUSH_DELALLOC = 5,
> > + FLUSH_DELALLOC_WAIT = 6,
> > + ALLOC_CHUNK = 7,
> > + COMMIT_TRANS = 8,
> > };
> >
> > int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes);
> > @@ -2777,6 +2781,13 @@ int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info,
> > void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
> > struct btrfs_block_rsv *block_rsv,
> > u64 num_bytes);
> > +void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr);
> > +void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans);
> > +int btrfs_throttle_delayed_refs(struct btrfs_fs_info *fs_info,
> > + enum btrfs_reserve_flush_enum flush);
> > +void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
> > + struct btrfs_block_rsv *src,
> > + u64 num_bytes);
> > int btrfs_inc_block_group_ro(struct btrfs_block_group_cache *cache);
> > void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache);
> > void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
> > diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> > index 27f7dd4e3d52..96ce087747b2 100644
> > --- a/fs/btrfs/delayed-ref.c
> > +++ b/fs/btrfs/delayed-ref.c
> > @@ -467,11 +467,14 @@ static int insert_delayed_ref(struct btrfs_trans_handle *trans,
> > * existing and update must have the same bytenr
> > */
> > static noinline void
> > -update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
> > +update_existing_head_ref(struct btrfs_trans_handle *trans,
> > struct btrfs_delayed_ref_head *existing,
> > struct btrfs_delayed_ref_head *update,
> > int *old_ref_mod_ret)
> > {
> > + struct btrfs_delayed_ref_root *delayed_refs =
> > + &trans->transaction->delayed_refs;
> > + struct btrfs_fs_info *fs_info = trans->fs_info;
> > int old_ref_mod;
> >
> > BUG_ON(existing->is_data != update->is_data);
> > @@ -529,10 +532,18 @@ update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
> > * versa we need to make sure to adjust pending_csums accordingly.
> > */
> > if (existing->is_data) {
> > - if (existing->total_ref_mod >= 0 && old_ref_mod < 0)
> > + u64 csum_items =
> > + btrfs_csum_bytes_to_leaves(fs_info,
> > + existing->num_bytes);
> > +
> > + if (existing->total_ref_mod >= 0 && old_ref_mod < 0) {
> > delayed_refs->pending_csums -= existing->num_bytes;
> > - if (existing->total_ref_mod < 0 && old_ref_mod >= 0)
> > + btrfs_delayed_refs_rsv_release(fs_info, csum_items);
> > + }
> > + if (existing->total_ref_mod < 0 && old_ref_mod >= 0) {
> > delayed_refs->pending_csums += existing->num_bytes;
> > + trans->delayed_ref_updates += csum_items;
> > + }
> > }
> > spin_unlock(&existing->lock);
> > }
> > @@ -638,7 +649,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
> > && head_ref->qgroup_reserved
> > && existing->qgroup_ref_root
> > && existing->qgroup_reserved);
> > - update_existing_head_ref(delayed_refs, existing, head_ref,
> > + update_existing_head_ref(trans, existing, head_ref,
> > old_ref_mod);
> > /*
> > * we've updated the existing ref, free the newly
> > @@ -649,8 +660,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
> > } else {
> > if (old_ref_mod)
> > *old_ref_mod = 0;
> > - if (head_ref->is_data && head_ref->ref_mod < 0)
> > + if (head_ref->is_data && head_ref->ref_mod < 0) {
> > delayed_refs->pending_csums += head_ref->num_bytes;
> > + trans->delayed_ref_updates +=
> > + btrfs_csum_bytes_to_leaves(trans->fs_info,
> > + head_ref->num_bytes);
> > + }
> > delayed_refs->num_heads++;
> > delayed_refs->num_heads_ready++;
> > atomic_inc(&delayed_refs->num_entries);
> > @@ -785,6 +800,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
> >
> > ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
> > spin_unlock(&delayed_refs->lock);
> > + btrfs_update_delayed_refs_rsv(trans);
>
> You haven't addressed my initial point about merging the modification of
> delayed_ref_updates and the call to btrfs_update_delayed_refs_rsv into one
> function; otherwise this seems error prone. I don't see why this cannot
> be done. If there is some reason which I'm missing, then explain it.
>
> As it stands, this btrfs_update_delayed_refs_rsv call is paired with the
> modifications made in one of the 2nd level callees:
>
> btrfs_add_delayed_tree_ref
>   add_delayed_ref_head
>     update_existing_head_ref
>
> I'd rather have btrfs_update_delayed_refs_rsv renamed to something else
> with 'inc' in its name and called every time we modify
> delayed_ref_updates. I'm willing to bet 50 bucks that in 6 months' time
> someone will change delayed_ref_updates and will forget to call
> btrfs_update_delayed_refs_rsv.
>
Because this helper has to take the delayed_refs_rsv lock, and that is a
fs-wide lock, I want to take it as little as possible. So the counts are
accumulated in trans->delayed_ref_updates and the helper applies them in one
batch. There's no reason to change it. Thanks,
Josef
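
For readers following the locking argument, here is a minimal user-space
sketch of the batching pattern described in the reply. It is not the kernel
code: struct rsv, struct trans_handle, note_delayed_ref_updates(),
update_delayed_refs_rsv() and BYTES_PER_ITEM are hypothetical stand-ins, a
pthread mutex stands in for the rsv spinlock, and the lock_acquisitions
counter exists only to make the effect visible.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Assumed per-item cost, chosen only for illustration. */
#define BYTES_PER_ITEM 16384ULL

/* Hypothetical stand-in for the shared, fs-wide reservation. */
struct rsv {
	pthread_mutex_t lock;            /* models the fs-wide rsv lock */
	uint64_t size;                   /* bytes the reservation must cover */
	unsigned long lock_acquisitions; /* instrumentation for this sketch only */
};

/* Hypothetical stand-in for a per-task transaction handle. */
struct trans_handle {
	struct rsv *delayed_refs_rsv;
	uint64_t delayed_ref_updates;    /* private to the handle, no lock needed */
};

/* Cheap step: record pending items on the handle without touching the lock. */
void note_delayed_ref_updates(struct trans_handle *trans, uint64_t nr)
{
	trans->delayed_ref_updates += nr;
}

/* Batched step: fold everything recorded so far into the shared rsv. */
void update_delayed_refs_rsv(struct trans_handle *trans)
{
	struct rsv *rsv = trans->delayed_refs_rsv;
	uint64_t nr = trans->delayed_ref_updates;

	assert(rsv != NULL);
	if (nr == 0)
		return; /* nothing pending, so skip the lock entirely */

	pthread_mutex_lock(&rsv->lock);
	rsv->size += nr * BYTES_PER_ITEM;
	rsv->lock_acquisitions++;
	pthread_mutex_unlock(&rsv->lock);

	trans->delayed_ref_updates = 0;
}
```

In this sketch, any number of note_delayed_ref_updates() calls followed by one
update_delayed_refs_rsv() call costs a single lock acquisition. The trade-off
is the one raised in the review: a caller that records updates and never calls
the batched helper leaves the shared reservation too small.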
2018-09-28 11:17 [PATCH 00/42][v3] My current patch queue Josef Bacik
2018-09-28 11:17 ` [PATCH 01/42] btrfs: add btrfs_delete_ref_head helper Josef Bacik
2018-09-28 11:17 ` [PATCH 02/42] btrfs: add cleanup_ref_head_accounting helper Josef Bacik
2018-09-28 11:17 ` [PATCH 03/42] btrfs: cleanup extent_op handling Josef Bacik
2018-09-28 11:17 ` [PATCH 04/42] btrfs: only track ref_heads in delayed_ref_updates Josef Bacik
2018-09-28 11:17 ` [PATCH 05/42] btrfs: only count ref heads run in __btrfs_run_delayed_refs Josef Bacik
2018-09-28 11:17 ` [PATCH 06/42] btrfs: introduce delayed_refs_rsv Josef Bacik
2018-09-28 11:51 ` Nikolay Borisov
2018-09-28 11:58 ` Josef Bacik [this message]
2018-09-28 11:17 ` [PATCH 07/42] btrfs: check if free bgs for commit Josef Bacik
2018-10-04 11:24 ` David Sterba
2018-10-11 18:33 ` Josef Bacik
2018-10-12 16:50 ` David Sterba
2018-09-28 11:17 ` [PATCH 08/42] btrfs: dump block_rsv whe dumping space info Josef Bacik
2018-10-01 17:08 ` David Sterba
2018-09-28 11:17 ` [PATCH 09/42] btrfs: release metadata before running delayed refs Josef Bacik
2018-09-28 11:17 ` [PATCH 10/42] btrfs: protect space cache inode alloc with nofs Josef Bacik
2018-10-01 17:08 ` David Sterba
2018-09-28 11:17 ` [PATCH 11/42] btrfs: fix truncate throttling Josef Bacik
2018-09-28 11:17 ` [PATCH 12/42] btrfs: don't use global rsv for chunk allocation Josef Bacik
2018-09-28 11:17 ` [PATCH 13/42] btrfs: add ALLOC_CHUNK_FORCE to the flushing code Josef Bacik
2018-09-28 11:17 ` [PATCH 14/42] btrfs: reset max_extent_size properly Josef Bacik
2018-09-28 11:17 ` [PATCH 15/42] btrfs: don't enospc all tickets on flush failure Josef Bacik
2018-09-28 11:17 ` [PATCH 16/42] btrfs: loop in inode_rsv_refill Josef Bacik
2018-10-02 13:47 ` David Sterba
2018-09-28 11:17 ` [PATCH 17/42] btrfs: run delayed iputs before committing Josef Bacik
2018-09-28 11:17 ` [PATCH 18/42] btrfs: move the dio_sem higher up the callchain Josef Bacik
2018-10-03 12:27 ` David Sterba
2018-10-03 14:54 ` Filipe Manana
2018-09-28 11:17 ` [PATCH 19/42] btrfs: set max_extent_size properly Josef Bacik
2018-09-28 11:17 ` [PATCH 20/42] btrfs: don't use ctl->free_space for max_extent_size Josef Bacik
2018-09-28 11:18 ` [PATCH 21/42] btrfs: reset max_extent_size on clear in a bitmap Josef Bacik
2018-09-28 11:18 ` [PATCH 22/42] btrfs: only run delayed refs if we're committing Josef Bacik
2018-09-28 11:18 ` [PATCH 23/42] btrfs: make sure we create all new bgs Josef Bacik
2018-10-08 13:45 ` David Sterba
2018-09-28 11:18 ` [PATCH 24/42] btrfs: assert on non-empty delayed iputs Josef Bacik
2018-10-08 13:44 ` David Sterba
2018-09-28 11:18 ` [PATCH 25/42] btrfs: pass delayed_refs_root to btrfs_delayed_ref_lock Josef Bacik
2018-09-28 11:18 ` [PATCH 26/42] btrfs: make btrfs_destroy_delayed_refs use btrfs_delayed_ref_lock Josef Bacik
2018-09-28 11:18 ` [PATCH 27/42] btrfs: make btrfs_destroy_delayed_refs use btrfs_delete_ref_head Josef Bacik
2018-09-28 11:18 ` [PATCH 28/42] btrfs: handle delayed ref head accounting cleanup in abort Josef Bacik
2018-09-28 11:18 ` [PATCH 29/42] btrfs: call btrfs_create_pending_block_groups unconditionally Josef Bacik
2018-09-28 11:18 ` [PATCH 30/42] btrfs: just delete pending bgs if we are aborted Josef Bacik
2018-09-28 11:18 ` [PATCH 31/42] btrfs: cleanup pending bgs on transaction abort Josef Bacik
2018-09-28 11:18 ` [PATCH 32/42] btrfs: only free reserved extent if we didn't insert it Josef Bacik
2018-09-28 11:18 ` [PATCH 33/42] btrfs: fix insert_reserved error handling Josef Bacik
2018-09-28 11:18 ` [PATCH 34/42] btrfs: wait on ordered extents on abort cleanup Josef Bacik
2018-09-28 11:18 ` [PATCH 35/42] MAINTAINERS: update my email address for btrfs Josef Bacik
2018-09-28 11:18 ` [PATCH 36/42] btrfs: wait on caching when putting the bg cache Josef Bacik
2018-10-01 17:17 ` David Sterba
2018-09-28 11:18 ` [PATCH 37/42] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik
2018-10-08 10:59 ` Filipe Manana
2018-09-28 11:18 ` [PATCH 38/42] btrfs: be more explicit about allowed flush states Josef Bacik
2018-09-28 11:18 ` [PATCH 39/42] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue Josef Bacik
2018-09-28 11:18 ` [PATCH 40/42] btrfs: drop min_size from evict_refill_and_join Josef Bacik
2018-10-03 12:52 ` David Sterba
2018-09-28 11:18 ` [PATCH 41/42] btrfs: reserve extra space during evict() Josef Bacik
2018-09-28 11:18 ` [PATCH 42/42] btrfs: don't run delayed_iputs in commit Josef Bacik
-- strict thread matches above, loose matches on Subject: below --
2018-10-11 19:53 [PATCH 00/42][v4] My current patch queue Josef Bacik
2018-10-11 19:53 ` [PATCH 06/42] btrfs: introduce delayed_refs_rsv Josef Bacik
2018-10-12 19:32 [PATCH 00/42][v5] My current patch queue Josef Bacik
2018-10-12 19:32 ` [PATCH 06/42] btrfs: introduce delayed_refs_rsv Josef Bacik