From: Boris Burkov <boris@bur.io>
To: Mark Harmstone <mark@harmstone.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v3 08/17] btrfs: redirect I/O for remapped block groups
Date: Mon, 20 Oct 2025 10:44:56 -0700 [thread overview]
Message-ID: <20251020174456.GC696792@zen.localdomain> (raw)
In-Reply-To: <f74da96e-7b50-48f2-b9ad-a2aefc8ed9aa@harmstone.com>
On Mon, Oct 20, 2025 at 03:31:39PM +0100, Mark Harmstone wrote:
> On 15/10/2025 5.21 am, Boris Burkov wrote:
> > On Thu, Oct 09, 2025 at 12:28:03PM +0100, Mark Harmstone wrote:
> > > Change btrfs_map_block() so that if the block group has the REMAPPED
> > > flag set, we call btrfs_translate_remap() to obtain a new address.
> > >
> > > btrfs_translate_remap() searches the remap tree for a range
> > > corresponding to the logical address passed to btrfs_map_block(). If it
> > > is within an identity remap, this part of the block group hasn't yet
> > > been relocated, and so we use the existing address.
> > >
> > > If it is within an actual remap, we subtract the start of the remap
> > > range and add the address of its destination, contained in the item's
> > > payload.
> > >
> > > Signed-off-by: Mark Harmstone <mark@harmstone.com>
> > > ---
> > > fs/btrfs/relocation.c | 59 +++++++++++++++++++++++++++++++++++++++++++
> > > fs/btrfs/relocation.h | 2 ++
> > > fs/btrfs/volumes.c | 31 +++++++++++++++++++++++
> > > 3 files changed, 92 insertions(+)
> > >
> > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> > > index 748290758459..4f5d3aaf0f65 100644
> > > --- a/fs/btrfs/relocation.c
> > > +++ b/fs/btrfs/relocation.c
> > > @@ -3870,6 +3870,65 @@ static const char *stage_to_string(enum reloc_stage stage)
> > > return "unknown";
> > > }
> > > +int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
> > > + u64 *length, bool nolock)
> > > +{
> > > + int ret;
> > > + struct btrfs_key key, found_key;
> > > + struct extent_buffer *leaf;
> > > + struct btrfs_remap *remap;
> > > + BTRFS_PATH_AUTO_FREE(path);
> > > +
> > > + path = btrfs_alloc_path();
> > > + if (!path)
> > > + return -ENOMEM;
> > > +
> > > + if (nolock) {
> > > + path->search_commit_root = 1;
> > > + path->skip_locking = 1;
> > > + }
> > > +
> > > + key.objectid = *logical;
> > > + key.type = (u8)-1;
> > > + key.offset = (u64)-1;
> > > +
> > > + ret = btrfs_search_slot(NULL, fs_info->remap_root, &key, path,
> > > + 0, 0);
> > > + if (ret < 0)
> > > + return ret;
> > > +
> > > + leaf = path->nodes[0];
> > > +
> > > + if (path->slots[0] == 0)
> > > + return -ENOENT;
> > > +
> > > + path->slots[0]--;
> > > +
> > > + btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> > > +
> > > + if (found_key.type != BTRFS_REMAP_KEY &&
> > > + found_key.type != BTRFS_IDENTITY_REMAP_KEY) {
> > > + return -ENOENT;
> > > + }
> > > +
> > > + if (found_key.objectid > *logical ||
> > > + found_key.objectid + found_key.offset <= *logical) {
> > > + return -ENOENT;
> > > + }
> > > +
> > > + if (*logical + *length > found_key.objectid + found_key.offset)
> > > + *length = found_key.objectid + found_key.offset - *logical;
> > > +
> > > + if (found_key.type == BTRFS_IDENTITY_REMAP_KEY)
> > > + return 0;
> > > +
> > > + remap = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_remap);
> > > +
> > > + *logical += btrfs_remap_address(leaf, remap) - found_key.objectid;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > /*
> > > * function to relocate all extents in a block group.
> > > */
> > > diff --git a/fs/btrfs/relocation.h b/fs/btrfs/relocation.h
> > > index 5c36b3f84b57..a653c42a25a3 100644
> > > --- a/fs/btrfs/relocation.h
> > > +++ b/fs/btrfs/relocation.h
> > > @@ -31,5 +31,7 @@ int btrfs_should_cancel_balance(const struct btrfs_fs_info *fs_info);
> > > struct btrfs_root *find_reloc_root(struct btrfs_fs_info *fs_info, u64 bytenr);
> > > bool btrfs_should_ignore_reloc_root(const struct btrfs_root *root);
> > > u64 btrfs_get_reloc_bg_bytenr(const struct btrfs_fs_info *fs_info);
> > > +int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
> > > + u64 *length, bool nolock);
> > > #endif
> > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > > index 0abe02a7072f..f2d1203778aa 100644
> > > --- a/fs/btrfs/volumes.c
> > > +++ b/fs/btrfs/volumes.c
> > > @@ -6635,6 +6635,37 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
> > > if (IS_ERR(map))
> > > return PTR_ERR(map);
> > > + if (map->type & BTRFS_BLOCK_GROUP_REMAPPED) {
> > > + u64 new_logical = logical;
> > > + bool nolock = !(map->type & BTRFS_BLOCK_GROUP_DATA);
> >
> > Why not data?
>
> Because I thought we'd maxed out our lockdep classes so this was the only way to
> prevent lockdep from complaining about metadata reads - not yet knowing about
> the btrfs_lockdep_keysets array.
>
> I think I'm going to remove the search_commit_root stuff from this patchset, and
> leave it for a future optimization. As you know it's full of gotchas.
>
> > > +
> > > + /*
> > > + * We use search_commit_root in btrfs_translate_remap for
> > > + * metadata blocks, to avoid lockdep complaining about
> > > + * recursive locking.
> > > + * If we get -ENOENT this means this is a BG that has just had
> > > + * its REMAPPED flag set, and so nothing has yet been actually
> > > + * remapped.
> > > + */
> >
> > I'm actually kind of worried about this now. What is preventing the
> > following racy interleaving:
> >
> > T1 T2
> > start_block_group_remapping() // in TXN-K set REMAPPED
> > btrfs_map_block()
> > btrfs_translate_remap()
> > ENOENT // searched in commit root
> > do_remap_tree_reloc() // in TXN-K do all remaps
> > // fully remapped, removed, discarded
> > // TXN-K committed
> > // not remapped! but the original chunk map is gone gone
> > num_copies = ...
>
> The bits on the right will be separate transactions, I thought that would save
> us - does it not? I'm guessing this means that search_commit_root means to
> search the last transaction that was flushed to disk, rather than the last
> in-memory transaction.
>
> Yes, I'm definitely saving this bit until later.
>
I think that's a good idea.
> > > + ret = btrfs_translate_remap(fs_info, &new_logical, length,
> > > + nolock);
> > > + if (ret && (!nolock || ret != -ENOENT))
> > > + return ret;
> > > +
> > > + if (ret != -ENOENT && new_logical != logical) {
> > > + btrfs_free_chunk_map(map);
> > > +
> > > + map = btrfs_get_chunk_map(fs_info, new_logical,
> > > + *length);
> > > + if (IS_ERR(map))
> > > + return PTR_ERR(map);
> > > +
> > > + logical = new_logical;
> > > + }
> > > +
> > > + ret = 0;
> > > + }
> > > +
> > > num_copies = btrfs_chunk_map_num_copies(map);
> > > if (io_geom.mirror_num > num_copies)
> > > return -EINVAL;
> > > --
> > > 2.49.1
> > >
>
next prev parent reply other threads:[~2025-10-20 17:45 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-09 11:27 [PATCH v3 00/17] Remap tree Mark Harmstone
2025-10-09 11:27 ` [PATCH v3 01/17] btrfs: add definitions and constants for remap-tree Mark Harmstone
2025-10-09 11:27 ` [PATCH v3 02/17] btrfs: add REMAP chunk type Mark Harmstone
2025-10-15 3:37 ` Boris Burkov
2025-10-20 9:58 ` Mark Harmstone
2025-10-20 17:35 ` Boris Burkov
2025-10-09 11:27 ` [PATCH v3 03/17] btrfs: allow remapped chunks to have zero stripes Mark Harmstone
2025-10-15 3:47 ` Boris Burkov
2025-10-20 12:15 ` Mark Harmstone
2025-10-09 11:27 ` [PATCH v3 04/17] btrfs: remove remapped block groups from the free-space tree Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 05/17] btrfs: don't add metadata items for the remap tree to the extent tree Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 06/17] btrfs: add extended version of struct block_group_item Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 07/17] btrfs: allow mounting filesystems with remap-tree incompat flag Mark Harmstone
2025-10-15 3:55 ` Boris Burkov
2025-10-20 11:32 ` Mark Harmstone
2025-10-20 17:44 ` Boris Burkov
2025-10-09 11:28 ` [PATCH v3 08/17] btrfs: redirect I/O for remapped block groups Mark Harmstone
2025-10-15 4:21 ` Boris Burkov
2025-10-20 14:31 ` Mark Harmstone
2025-10-20 17:44 ` Boris Burkov [this message]
2025-10-09 11:28 ` [PATCH v3 09/17] btrfs: release BG lock before calling btrfs_link_bg_list() Mark Harmstone
2025-10-09 11:56 ` Filipe Manana
2025-10-09 14:58 ` Mark Harmstone
2025-10-09 15:16 ` Filipe Manana
2025-10-09 16:30 ` Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 10/17] btrfs: handle deletions from remapped block group Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 11/17] btrfs: handle setting up relocation of block group with remap-tree Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 12/17] btrfs: move existing remaps before relocating block group Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 13/17] btrfs: replace identity maps with actual remaps when doing relocations Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 14/17] btrfs: add do_remap param to btrfs_discard_extent() Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 15/17] btrfs: allow balancing remap tree Mark Harmstone
2025-10-15 4:24 ` Boris Burkov
2025-10-09 11:28 ` [PATCH v3 16/17] btrfs: handle discarding fully-remapped block groups Mark Harmstone
2025-10-15 4:54 ` Boris Burkov
2025-10-23 17:35 ` Mark Harmstone
2025-10-09 11:28 ` [PATCH v3 17/17] btrfs: add stripe removal pending flag Mark Harmstone
2025-10-15 5:05 ` Boris Burkov
2025-10-20 14:52 ` Mark Harmstone
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251020174456.GC696792@zen.localdomain \
--to=boris@bur.io \
--cc=linux-btrfs@vger.kernel.org \
--cc=mark@harmstone.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox