public inbox for linux-btrfs@vger.kernel.org
From: Josef Bacik <josef@toxicpanda.com>
To: Boris Burkov <boris@bur.io>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 18/18] btrfs: track data relocation with simple quota
Date: Thu, 13 Jul 2023 13:37:52 -0400
Message-ID: <20230713173752.GR207541@perftesting>
In-Reply-To: <d9e6a4525095ec5abb1818547d565fcf3ef58460.1688597211.git.boris@bur.io>

On Wed, Jul 05, 2023 at 04:20:55PM -0700, Boris Burkov wrote:
> Relocation data allocations are quite tricky for simple quotas. The
> basic data relocation sequence is (ignoring details that aren't relevant
> to this fix):
> - create a fake relocation data fs root
> - create a fake relocation inode in that root
> - foreach data extent:
>   - preallocate a data extent on behalf of the fake inode
>   - copy over the data
> - foreach extent
>   - swap the refs so that the original file extent now refers to the new
>     extent item
> - drop the fake root, dropping its refs on the old extents, which lets
>   us delete them.
> 
> Done naively, this results in storing an extent item in the extent tree
> whose owner_ref points at the relocation data root, and in recording a
> no-op squota delta, since the reloc root is not a legit fstree. So far,
> that's OK. The problem comes when you do the swap and leave extent items
> owned by this bogus root as the real, permanent extents of the file. If
> the file then drops that ref, we free the extent and no-op account that
> against the fake relocation root. Essentially, this means that
> relocation is simple quota "extent laundering", since we re-own the
> extents into a fake root.
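
A toy C model of the laundering problem described above may help. It assumes, as elsewhere in this series, that a simple quota delta is charged to the root recorded in the extent's owner ref and is a no-op when that root is not an fs tree. All names here (model_is_fstree(), model_record_delta(), model_relocate_naive(), the MODEL_* ids) are hypothetical illustrations, not kernel APIs, and model_is_fstree() only approximates the kernel's is_fstree(). Under these assumptions the subvolume's usage drops to zero once the old extent is freed, even though the file still references the relocated data, and the later free of the new extent is charged to nobody.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ids: the top-level fs tree, the first subvolume id, and a
 * stand-in for the (negative-valued) data reloc tree id. */
#define MODEL_FS_TREE         5ULL
#define MODEL_FIRST_SUBVOL    256ULL
#define MODEL_DATA_RELOC_ROOT ((uint64_t)-9)

/* Simplified stand-in for is_fstree(): fs tree or a regular subvolume id. */
static bool model_is_fstree(uint64_t root_id)
{
	return root_id == MODEL_FS_TREE ||
	       (int64_t)root_id >= (int64_t)MODEL_FIRST_SUBVOL;
}

struct model_extent {
	uint64_t bytes;
	uint64_t owner;	/* root id stored in the extent's owner ref */
};

/* Usage tracked for a single subvolume. */
struct model_squota {
	uint64_t subvol;
	int64_t usage;
};

/* Charge +/- bytes to the extent's owner; non-fstree owners are a no-op. */
static void model_record_delta(struct model_squota *q,
			       const struct model_extent *e, int sign)
{
	if (!model_is_fstree(e->owner))
		return;
	if (e->owner == q->subvol)
		q->usage += sign * (int64_t)e->bytes;
}

/*
 * Naive relocation of one extent: the replacement is allocated with the
 * data reloc root as its owner, the file's ref is swapped onto it, and the
 * old extent is freed when the fake root is dropped.
 */
static struct model_extent model_relocate_naive(struct model_squota *q,
						const struct model_extent *old)
{
	struct model_extent new_extent = {
		.bytes = old->bytes,
		.owner = MODEL_DATA_RELOC_ROOT,
	};

	model_record_delta(q, &new_extent, 1);	/* no-op: not an fstree */
	/* ...copy the data and swap the file extent ref onto new_extent... */
	model_record_delta(q, old, -1);		/* charged to the real owner */
	return new_extent;
}
```

For example, charging a 4096-byte extent to subvolume 256 and then calling model_relocate_naive() leaves q.usage at 0 while the file still holds the data.
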
> 
> Simple quotas very intentionally do not have a mechanism for
> transferring ownership of extents, as that is exactly the complicated
> thing we are trying to avoid with the new design. Further, it cannot be
> done correctly in this case, since at the time you create the new
> "real" refs, there is no way to know which root was the original owner
> before relocation unless we track it.
> 
> Therefore, it makes more sense to trick the preallocation into handling
> relocation as a special case and noting the proper owner ref from the
> beginning. That way, we never write out an extent item without the
> correct owner ref that it will eventually have.
> 
> This could be done by wiring a special root parameter all the way
> through the allocation code path, but to avoid that special case
> touching all the code, take advantage of the serial nature of relocation
> to store the src root on the relocation root object. Then, when we finish
> the prealloc, if the allocating root is the data reloc root and the
> stored src root is an fstree, prepare the delayed ref with the src root
> as its owner.
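
The approach in the paragraph above can be sketched as a minimal C model; this is not the kernel code. struct model_root stands in for struct btrfs_root, model_note_src_root() for the assignment added to relocate_block_group(), and model_prealloc_owner() for the check added to btrfs_alloc_reserved_file_extent(). It assumes owning_root otherwise defaults to the allocating root's id, since that initialization is not visible in the quoted hunk; the ids and the simplified model_is_fstree() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ids; only the fstree vs. non-fstree distinction matters. */
#define MODEL_FS_TREE         5ULL
#define MODEL_FIRST_SUBVOL    256ULL
#define MODEL_DATA_RELOC_ROOT ((uint64_t)-9)

/* Simplified stand-in for is_fstree(). */
static bool model_is_fstree(uint64_t root_id)
{
	return root_id == MODEL_FS_TREE ||
	       (int64_t)root_id >= (int64_t)MODEL_FIRST_SUBVOL;
}

/* Only the struct btrfs_root fields this sketch needs. */
struct model_root {
	uint64_t objectid;
	uint64_t relocation_src_root;	/* mirrors the field the patch adds */
};

/* Relocation is serial, so the owner of the extent being moved can be
 * stashed on the reloc root before its replacement is preallocated. */
static void model_note_src_root(struct model_root *reloc_root, uint64_t owner)
{
	reloc_root->relocation_src_root = owner;
}

/* Owner recorded in the owner ref of a newly allocated data extent. */
static uint64_t model_prealloc_owner(const struct model_root *root)
{
	uint64_t owning_root = root->objectid;	/* assumed default */

	if (root->objectid == MODEL_DATA_RELOC_ROOT &&
	    model_is_fstree(root->relocation_src_root))
		owning_root = root->relocation_src_root;
	return owning_root;
}
```

Until a source root is noted, the data reloc root keeps itself as the owner, which corresponds to the is_fstree() guard in the patch; ordinary subvolumes are unaffected.
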
> 
> This is obviously a smelly bit of code, but I think it is the best
> solution to the problem, given the relocation implementation.
> 
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>  fs/btrfs/ctree.h       |  1 +
>  fs/btrfs/extent-tree.c | 13 +++++++------
>  fs/btrfs/relocation.c  | 15 +++++++++++++++
>  3 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index f2d2b313bde5..577186994188 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -333,6 +333,7 @@ struct btrfs_root {
>  #ifdef CONFIG_BTRFS_DEBUG
>  	struct list_head leak_list;
>  #endif
> +	u64 relocation_src_root;
>  };
>  
>  static inline bool btrfs_root_readonly(const struct btrfs_root *root)
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 99845a54e168..10e026d5b684 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -57,7 +57,7 @@ static void __run_delayed_extent_op(struct btrfs_delayed_extent_op *extent_op,
>  static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
>  				      u64 parent, u64 root_objectid,
>  				      u64 flags, u64 owner, u64 offset,
> -				      struct btrfs_key *ins, int ref_mod);
> +				      struct btrfs_key *ins, int ref_mod, u64 oref_root);
>  static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
>  				     struct btrfs_delayed_ref_node *node,
>  				     struct btrfs_delayed_extent_op *extent_op);
> @@ -1541,7 +1541,7 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
>  		ret = alloc_reserved_file_extent(trans, parent, ref_root,
>  						 flags, ref->objectid,
>  						 ref->offset, &ins,
> -						 node->ref_mod);
> +						 node->ref_mod, href->owning_root);
>  		if (!ret)
>  			ret = btrfs_record_simple_quota_delta(trans->fs_info, &delta);
>  	} else if (node->action == BTRFS_ADD_DELAYED_REF) {
> @@ -4683,7 +4683,7 @@ static int alloc_reserved_extent(struct btrfs_trans_handle *trans, u64 bytenr,
>  static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
>  				      u64 parent, u64 root_objectid,
>  				      u64 flags, u64 owner, u64 offset,
> -				      struct btrfs_key *ins, int ref_mod)
> +				      struct btrfs_key *ins, int ref_mod, u64 oref_root)
>  {
>  	struct btrfs_fs_info *fs_info = trans->fs_info;
>  	struct btrfs_root *extent_root;
> @@ -4731,7 +4731,7 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
>  	if (simple_quota) {
>  		btrfs_set_extent_inline_ref_type(leaf, iref, BTRFS_EXTENT_OWNER_REF_KEY);
>  		oref = (struct btrfs_extent_owner_ref *)(&iref->offset);
> -		btrfs_set_extent_owner_ref_root_id(leaf, oref, root_objectid);
> +		btrfs_set_extent_owner_ref_root_id(leaf, oref, oref_root);
>  		iref = (struct btrfs_extent_inline_ref *)(oref + 1);
>  	}
>  	btrfs_set_extent_inline_ref_type(leaf, iref, type);
> @@ -4842,7 +4842,8 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
>  
>  	BUG_ON(root_objectid == BTRFS_TREE_LOG_OBJECTID);
>  
> -	BUG_ON(root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID);
> +	if (btrfs_is_data_reloc_root(root) && is_fstree(root->relocation_src_root))
> +		owning_root = root->relocation_src_root;
>  
>  	btrfs_init_generic_ref(&generic_ref, BTRFS_ADD_DELAYED_EXTENT,
>  			       ins->objectid, ins->offset, 0, owning_root);
> @@ -4899,7 +4900,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
>  	spin_unlock(&space_info->lock);
>  
>  	ret = alloc_reserved_file_extent(trans, 0, root_objectid, 0, owner,
> -					 offset, ins, 1);
> +					 offset, ins, 1, root_objectid);
>  	if (ret)
>  		btrfs_pin_extent(trans, ins->objectid, ins->offset, 1);
>  	ret = btrfs_record_simple_quota_delta(fs_info, &delta);
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 119f670538f7..e12377c818c0 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -3665,6 +3665,21 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
>  				    struct btrfs_extent_item);
>  		flags = btrfs_extent_flags(path->nodes[0], ei);
>  
> +		/*
> +		 * If we are relocating a simple quota owned extent item, we need
> +		 * to note the owner on the reloc data root so that when we
> +		 * allocate the replacement item, we can attribute it to the
> +		 * correct eventual owner (rather than the reloc data root)
> +		 */
> +		if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) {
> +			struct btrfs_root *root = BTRFS_I(rc->data_inode)->root;
> +			u64 owning_root_id = btrfs_get_extent_owner_root(fs_info,
> +									 path->nodes[0],
> +									 path->slots[0]);
> +
> +			root->relocation_src_root = owning_root_id;
> +		}
> +

This is almost correct, but it can go wrong if adjacent extents are owned
by different roots.  Further down, we move the extents via
relocate_data_extent(), which clusters adjacent ranges together to limit
the number of preallocations you do.  You're going to have to add a check
in there for owning_root_id != root->relocation_src_root, do the prealloc
for the pending cluster, and then set root->relocation_src_root in there
and carry on.  Thanks,

Josef
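
The check suggested in the review can be sketched as a small C model of the clustering loop. Everything here is hypothetical: struct model_reloc, model_relocate_data_extent() and model_flush_cluster() only stand in for relocate_data_extent() and the cluster preallocation, and the model ignores cluster size limits and the actual data copy. The point is the flush condition: the pending cluster is preallocated before it would be extended with an extent that is either non-contiguous or owned by a different root, and only then is relocation_src_root updated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_PREALLOCS 16

/* One data extent found in the block group being relocated. */
struct model_extent {
	uint64_t start;
	uint64_t len;
	uint64_t owner;	/* squota owner root id from the extent item */
};

/* A run of contiguous extents that will share one preallocation. */
struct model_cluster {
	size_t nr;	/* extents currently in the cluster */
	uint64_t end;	/* last byte covered by the cluster */
};

struct model_reloc {
	uint64_t relocation_src_root;	/* mirrors the new btrfs_root field */
	struct model_cluster cluster;
	uint64_t prealloc_owner[MODEL_MAX_PREALLOCS];	/* owner per prealloc */
	size_t nr_preallocs;
};

/* Stand-in for preallocating the cluster: record its attributed owner. */
static void model_flush_cluster(struct model_reloc *rc)
{
	if (rc->cluster.nr == 0)
		return;
	assert(rc->nr_preallocs < MODEL_MAX_PREALLOCS);
	rc->prealloc_owner[rc->nr_preallocs++] = rc->relocation_src_root;
	rc->cluster.nr = 0;
}

/*
 * Add one extent to the pending cluster, flushing first if the extent is
 * not contiguous with it or if its owner differs from the noted owner.
 */
static void model_relocate_data_extent(struct model_reloc *rc,
				       const struct model_extent *e)
{
	if (rc->cluster.nr > 0 &&
	    (e->start != rc->cluster.end + 1 ||
	     e->owner != rc->relocation_src_root))
		model_flush_cluster(rc);

	rc->relocation_src_root = e->owner;
	rc->cluster.end = e->start + e->len - 1;
	rc->cluster.nr++;
}
```

With three contiguous 4 KiB extents owned by roots 256, 256 and 257, followed by a final flush, this produces two preallocations attributed to 256 and 257. Without the owner comparison the model would form a single cluster and attribute all of it to 257, the last owner noted.
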

Thread overview: 41+ messages
2023-07-05 23:20 [PATCH 00/18] btrfs: simple quotas Boris Burkov
2023-07-05 23:20 ` [PATCH 01/18] btrfs: free qgroup rsv on io failure Boris Burkov
2023-07-13 14:01   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 02/18] btrfs: fix start transaction qgroup rsv double free Boris Burkov
2023-07-13 14:02   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 03/18] btrfs: introduce quota mode Boris Burkov
2023-07-13 14:02   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 04/18] btrfs: add new quota mode for simple quotas Boris Burkov
2023-07-13 14:07   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 05/18] btrfs: expose quota mode via sysfs Boris Burkov
2023-07-13 14:11   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 06/18] btrfs: flush reservations during quota disable Boris Burkov
2023-07-13 14:20   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 07/18] btrfs: create qgroup earlier in snapshot creation Boris Burkov
2023-07-13 14:26   ` Josef Bacik
2023-07-13 19:00     ` Boris Burkov
2023-07-13 20:37       ` Josef Bacik
2023-07-13 23:13         ` Boris Burkov
2023-07-05 23:20 ` [PATCH 08/18] btrfs: function for recording simple quota deltas Boris Burkov
2023-07-13 14:34   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 09/18] btrfs: rename tree_ref and data_ref owning_root Boris Burkov
2023-07-13 16:33   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 10/18] btrfs: track owning root in btrfs_ref Boris Burkov
2023-07-13 16:58   ` Josef Bacik
2023-07-13 21:21     ` Boris Burkov
2023-07-05 23:20 ` [PATCH 11/18] btrfs: track original extent owner in head_ref Boris Burkov
2023-07-13 17:09   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 12/18] btrfs: new inline ref storing owning subvol of data extents Boris Burkov
2023-07-13 17:16   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 13/18] btrfs: inline owner ref lookup helper Boris Burkov
2023-07-13 17:18   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 14/18] btrfs: record simple quota deltas Boris Burkov
2023-07-13 17:23   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 15/18] btrfs: simple quota auto hierarchy for nested subvols Boris Burkov
2023-07-13 17:28   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 16/18] btrfs: check generation when recording simple quota delta Boris Burkov
2023-07-13 17:29   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 17/18] btrfs: track metadata relocation cow with simple quota Boris Burkov
2023-07-13 17:31   ` Josef Bacik
2023-07-05 23:20 ` [PATCH 18/18] btrfs: track data relocation " Boris Burkov
2023-07-13 17:37   ` Josef Bacik [this message]