Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Boris Burkov <boris@bur.io>
To: Mark Harmstone <mark@harmstone.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v3 08/17] btrfs: redirect I/O for remapped block groups
Date: Mon, 20 Oct 2025 10:44:56 -0700	[thread overview]
Message-ID: <20251020174456.GC696792@zen.localdomain> (raw)
In-Reply-To: <f74da96e-7b50-48f2-b9ad-a2aefc8ed9aa@harmstone.com>

On Mon, Oct 20, 2025 at 03:31:39PM +0100, Mark Harmstone wrote:
> On 15/10/2025 5.21 am, Boris Burkov wrote:
> > On Thu, Oct 09, 2025 at 12:28:03PM +0100, Mark Harmstone wrote:
> > > Change btrfs_map_block() so that if the block group has the REMAPPED
> > > flag set, we call btrfs_translate_remap() to obtain a new address.
> > > 
> > > btrfs_translate_remap() searches the remap tree for a range
> > > corresponding to the logical address passed to btrfs_map_block(). If it
> > > is within an identity remap, this part of the block group hasn't yet
> > > been relocated, and so we use the existing address.
> > > 
> > > If it is within an actual remap, we subtract the start of the remap
> > > range and add the address of its destination, contained in the item's
> > > payload.
> > > 
> > > Signed-off-by: Mark Harmstone <mark@harmstone.com>
> > > ---
> > >   fs/btrfs/relocation.c | 59 +++++++++++++++++++++++++++++++++++++++++++
> > >   fs/btrfs/relocation.h |  2 ++
> > >   fs/btrfs/volumes.c    | 31 +++++++++++++++++++++++
> > >   3 files changed, 92 insertions(+)
> > > 
> > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> > > index 748290758459..4f5d3aaf0f65 100644
> > > --- a/fs/btrfs/relocation.c
> > > +++ b/fs/btrfs/relocation.c
> > > @@ -3870,6 +3870,65 @@ static const char *stage_to_string(enum reloc_stage stage)
> > >   	return "unknown";
> > >   }
> > > +int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
> > > +			  u64 *length, bool nolock)
> > > +{
> > > +	int ret;
> > > +	struct btrfs_key key, found_key;
> > > +	struct extent_buffer *leaf;
> > > +	struct btrfs_remap *remap;
> > > +	BTRFS_PATH_AUTO_FREE(path);
> > > +
> > > +	path = btrfs_alloc_path();
> > > +	if (!path)
> > > +		return -ENOMEM;
> > > +
> > > +	if (nolock) {
> > > +		path->search_commit_root = 1;
> > > +		path->skip_locking = 1;
> > > +	}
> > > +
> > > +	key.objectid = *logical;
> > > +	key.type = (u8)-1;
> > > +	key.offset = (u64)-1;
> > > +
> > > +	ret = btrfs_search_slot(NULL, fs_info->remap_root, &key, path,
> > > +				0, 0);
> > > +	if (ret < 0)
> > > +		return ret;
> > > +
> > > +	leaf = path->nodes[0];
> > > +
> > > +	if (path->slots[0] == 0)
> > > +		return -ENOENT;
> > > +
> > > +	path->slots[0]--;
> > > +
> > > +	btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> > > +
> > > +	if (found_key.type != BTRFS_REMAP_KEY &&
> > > +	    found_key.type != BTRFS_IDENTITY_REMAP_KEY) {
> > > +		return -ENOENT;
> > > +	}
> > > +
> > > +	if (found_key.objectid > *logical ||
> > > +	    found_key.objectid + found_key.offset <= *logical) {
> > > +		return -ENOENT;
> > > +	}
> > > +
> > > +	if (*logical + *length > found_key.objectid + found_key.offset)
> > > +		*length = found_key.objectid + found_key.offset - *logical;
> > > +
> > > +	if (found_key.type == BTRFS_IDENTITY_REMAP_KEY)
> > > +		return 0;
> > > +
> > > +	remap = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_remap);
> > > +
> > > +	*logical += btrfs_remap_address(leaf, remap) - found_key.objectid;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >   /*
> > >    * function to relocate all extents in a block group.
> > >    */
> > > diff --git a/fs/btrfs/relocation.h b/fs/btrfs/relocation.h
> > > index 5c36b3f84b57..a653c42a25a3 100644
> > > --- a/fs/btrfs/relocation.h
> > > +++ b/fs/btrfs/relocation.h
> > > @@ -31,5 +31,7 @@ int btrfs_should_cancel_balance(const struct btrfs_fs_info *fs_info);
> > >   struct btrfs_root *find_reloc_root(struct btrfs_fs_info *fs_info, u64 bytenr);
> > >   bool btrfs_should_ignore_reloc_root(const struct btrfs_root *root);
> > >   u64 btrfs_get_reloc_bg_bytenr(const struct btrfs_fs_info *fs_info);
> > > +int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
> > > +			  u64 *length, bool nolock);
> > >   #endif
> > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > > index 0abe02a7072f..f2d1203778aa 100644
> > > --- a/fs/btrfs/volumes.c
> > > +++ b/fs/btrfs/volumes.c
> > > @@ -6635,6 +6635,37 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
> > >   	if (IS_ERR(map))
> > >   		return PTR_ERR(map);
> > > +	if (map->type & BTRFS_BLOCK_GROUP_REMAPPED) {
> > > +		u64 new_logical = logical;
> > > +		bool nolock = !(map->type & BTRFS_BLOCK_GROUP_DATA);
> > 
> > Why not data?
> 
> Because I thought we'd maxed out our lockdep classes so this was the only way to
> prevent lockdep from complaining about metadata reads - not yet knowing about
> the btrfs_lockdep_keysets array.
> 
> I think I'm going to remove the search_commit_root stuff from this patchset, and
> leave it for a future optimization. As you know it's full of gotchas.
> 
> > > +
> > > +		/*
> > > +		 * We use search_commit_root in btrfs_translate_remap for
> > > +		 * metadata blocks, to avoid lockdep complaining about
> > > +		 * recursive locking.
> > > +		 * If we get -ENOENT this means this is a BG that has just had
> > > +		 * its REMAPPED flag set, and so nothing has yet been actually
> > > +		 * remapped.
> > > +		 */
> > 
> > I'm actually kind of worried about this now. What is preventing the
> > following racy interleaving:
> > 
> >        T1                                         T2
> >                                          start_block_group_remapping() // in TXN-K set REMAPPED
> > btrfs_map_block()
> >    btrfs_translate_remap()
> >      ENOENT // searched in commit root
> >                                          do_remap_tree_reloc() // in TXN-K do all remaps
> >                                          // fully remapped, removed, discarded
> >                                          // TXN-K committed
> >    // not remapped! but the original chunk map is gone gone
> >    num_copies = ...
> 
> The bits on the right will be separate transactions, I thought that would save
> us - does it not? I'm guessing this means that search_commit_root means to
> search the last transaction that was flushed to disk, rather than the last
> in-memory transaction.
> 
> Yes, I'm definitely saving this bit until later.
> 

I think that's a good idea.

> > > +		ret = btrfs_translate_remap(fs_info, &new_logical, length,
> > > +					    nolock);
> > > +		if (ret && (!nolock || ret != -ENOENT))
> > > +			return ret;
> > > +
> > > +		if (ret != -ENOENT && new_logical != logical) {
> > > +			btrfs_free_chunk_map(map);
> > > +
> > > +			map = btrfs_get_chunk_map(fs_info, new_logical,
> > > +						  *length);
> > > +			if (IS_ERR(map))
> > > +				return PTR_ERR(map);
> > > +
> > > +			logical = new_logical;
> > > +		}
> > > +
> > > +		ret = 0;
> > > +	}
> > > +
> > >   	num_copies = btrfs_chunk_map_num_copies(map);
> > >   	if (io_geom.mirror_num > num_copies)
> > >   		return -EINVAL;
> > > -- 
> > > 2.49.1
> > > 
> 

  reply	other threads:[~2025-10-20 17:45 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-09 11:27 [PATCH v3 00/17] Remap tree Mark Harmstone
2025-10-09 11:27 ` [PATCH v3 01/17] btrfs: add definitions and constants for remap-tree Mark Harmstone
2025-10-09 11:27 ` [PATCH v3 02/17] btrfs: add REMAP chunk type Mark Harmstone
2025-10-15  3:37   ` Boris Burkov
2025-10-20  9:58     ` Mark Harmstone
2025-10-20 17:35       ` Boris Burkov
2025-10-09 11:27 ` [PATCH v3 03/17] btrfs: allow remapped chunks to have zero stripes Mark Harmstone
2025-10-15  3:47   ` Boris Burkov
2025-10-20 12:15     ` Mark Harmstone
2025-10-09 11:27 ` [PATCH v3 04/17] btrfs: remove remapped block groups from the free-space tree Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 05/17] btrfs: don't add metadata items for the remap tree to the extent tree Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 06/17] btrfs: add extended version of struct block_group_item Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 07/17] btrfs: allow mounting filesystems with remap-tree incompat flag Mark Harmstone
2025-10-15  3:55   ` Boris Burkov
2025-10-20 11:32     ` Mark Harmstone
2025-10-20 17:44       ` Boris Burkov
2025-10-09 11:28 ` [PATCH v3 08/17] btrfs: redirect I/O for remapped block groups Mark Harmstone
2025-10-15  4:21   ` Boris Burkov
2025-10-20 14:31     ` Mark Harmstone
2025-10-20 17:44       ` Boris Burkov [this message]
2025-10-09 11:28 ` [PATCH v3 09/17] btrfs: release BG lock before calling btrfs_link_bg_list() Mark Harmstone
2025-10-09 11:56   ` Filipe Manana
2025-10-09 14:58     ` Mark Harmstone
2025-10-09 15:16       ` Filipe Manana
2025-10-09 16:30         ` Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 10/17] btrfs: handle deletions from remapped block group Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 11/17] btrfs: handle setting up relocation of block group with remap-tree Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 12/17] btrfs: move existing remaps before relocating block group Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 13/17] btrfs: replace identity maps with actual remaps when doing relocations Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 14/17] btrfs: add do_remap param to btrfs_discard_extent() Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 15/17] btrfs: allow balancing remap tree Mark Harmstone
2025-10-15  4:24   ` Boris Burkov
2025-10-09 11:28 ` [PATCH v3 16/17] btrfs: handle discarding fully-remapped block groups Mark Harmstone
2025-10-15  4:54   ` Boris Burkov
2025-10-23 17:35     ` Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 17/17] btrfs: add stripe removal pending flag Mark Harmstone
2025-10-15  5:05   ` Boris Burkov
2025-10-20 14:52     ` Mark Harmstone

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251020174456.GC696792@zen.localdomain \
    --to=boris@bur.io \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=mark@harmstone.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox