Re: [PATCH 03/12] btrfs: allow remapped chunks to have zero stripes

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Mark Harmstone <mark@harmstone.com>
To: Boris Burkov <boris@bur.io>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 03/12] btrfs: allow remapped chunks to have zero stripes
Date: Fri, 8 Aug 2025 15:12:10 +0100	[thread overview]
Message-ID: <5918c2a5-7145-46a9-9ab4-779e172c7c83@harmstone.com> (raw)
In-Reply-To: <20250613214107.GC3621880@zen.localdomain>

On 13/06/2025 10.41 pm, Boris Burkov wrote:

...snip...

 >> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
>> index 0505f8d76581..fd83df06e3fb 100644
>> --- a/fs/btrfs/tree-checker.c
>> +++ b/fs/btrfs/tree-checker.c
>> @@ -829,7 +829,7 @@ int btrfs_check_chunk_valid(const struct btrfs_fs_info *fs_info,
>>   	u64 type;
>>   	u64 features;
>>   	u32 chunk_sector_size;
>> -	bool mixed = false;
>> +	bool mixed = false, remapped;
> 
> nit:
> I *personally* don't like such declarations. I think they are lightly if
> not decisively discouraged in the Linux Coding Style document as well
> (at least to my reading). In our code, I found some examples where they
> are used where both values are related and get assigned. But I haven't
> seen any like this for unrelated variables with mixed assignment.
> 
>>   	int raid_index;
>>   	int nparity;
>>   	int ncopies;
>> @@ -853,12 +853,14 @@ int btrfs_check_chunk_valid(const struct btrfs_fs_info *fs_info, >  	ncopies = btrfs_raid_array[raid_index].ncopies;
>>   	nparity = btrfs_raid_array[raid_index].nparity;
>>   
>> -	if (unlikely(!num_stripes)) {
>> +	remapped = type & BTRFS_BLOCK_GROUP_REMAPPED;
>> +
>> +	if (unlikely(!remapped && !num_stripes)) {
>>   		chunk_err(fs_info, leaf, chunk, logical,
>>   			  "invalid chunk num_stripes, have %u", num_stripes);
>>   		return -EUCLEAN;
>>   	}
>> -	if (unlikely(num_stripes < ncopies)) {
>> +	if (unlikely(!remapped && num_stripes < ncopies)) {
> 
> I think remapped only permits you exactly num_stripes == 0, not
> num_stripes = 2 if ncopies = 3, right? Though it makes the logic less
> neat, I would make the check as precise as possible on the invariants.

I'm changing the check here to num_stripes != 0 && num_stripes < ncopies,
which I hope makes things a little clearer. It also means that we have an
error if something goes wrong with a partially remapped chunk.

The current logic is that we keep the RAID flags as they are when a chunk
is remapped, and ncopies is derived from the RAID flags. I'm not entirely
sure this is the right thing to do, but it hopefully might make it easier
to discover what happened if something goes wrong. Similarly with
sub_stripes etc.

>>   		chunk_err(fs_info, leaf, chunk, logical,
>>   			  "invalid chunk num_stripes < ncopies, have %u < %d",
>>   			  num_stripes, ncopies);
>> @@ -960,7 +962,7 @@ int btrfs_check_chunk_valid(const struct btrfs_fs_info *fs_info,
>>   		}
>>   	}
>>   
>> -	if (unlikely((type & BTRFS_BLOCK_GROUP_RAID10 &&
>> +	if (unlikely(!remapped && ((type & BTRFS_BLOCK_GROUP_RAID10 &&
>>   		      sub_stripes != btrfs_raid_array[BTRFS_RAID_RAID10].sub_stripes) ||
>>   		     (type & BTRFS_BLOCK_GROUP_RAID1 &&
>>   		      num_stripes != btrfs_raid_array[BTRFS_RAID_RAID1].devs_min) ||
>> @@ -975,7 +977,7 @@ int btrfs_check_chunk_valid(const struct btrfs_fs_info *fs_info,
>>   		     (type & BTRFS_BLOCK_GROUP_DUP &&
>>   		      num_stripes != btrfs_raid_array[BTRFS_RAID_DUP].dev_stripes) ||
>>   		     ((type & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0 &&
>> -		      num_stripes != btrfs_raid_array[BTRFS_RAID_SINGLE].dev_stripes))) {
>> +		      num_stripes != btrfs_raid_array[BTRFS_RAID_SINGLE].dev_stripes)))) {
>>   		chunk_err(fs_info, leaf, chunk, logical,
>>   			"invalid num_stripes:sub_stripes %u:%u for profile %llu",
>>   			num_stripes, sub_stripes,
>> @@ -999,11 +1001,11 @@ static int check_leaf_chunk_item(struct extent_buffer *leaf,
>>   	struct btrfs_fs_info *fs_info = leaf->fs_info;
>>   	int num_stripes;
>>   
>> -	if (unlikely(btrfs_item_size(leaf, slot) < sizeof(struct btrfs_chunk))) {
>> +	if (unlikely(btrfs_item_size(leaf, slot) < offsetof(struct btrfs_chunk, stripe))) {
>>   		chunk_err(fs_info, leaf, chunk, key->offset,
>>   			"invalid chunk item size: have %u expect [%zu, %u)",
>>   			btrfs_item_size(leaf, slot),
>> -			sizeof(struct btrfs_chunk),
>> +			offsetof(struct btrfs_chunk, stripe),
>>   			BTRFS_LEAF_DATA_SIZE(fs_info));
>>   		return -EUCLEAN;
> 
> Same complaint as above for nstripes < ncopies. Is there some way to
> more generically bypass stripe checking if we detect the case we care
> about: (remapped && num_stripes == 0) ?

I'm presuming this comment is for the if beginning on line 964. I'm moving this whole
bit to a helper function, it's too confusing as it is.

> 
>>   	}
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index e7c467b6af46..9159d11cb143 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -6133,6 +6133,12 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
>>   		goto out_free_map;
>>   	}
>>   
>> +	/* avoid divide by zero on fully-remapped chunks */
>> +	if (map->num_stripes == 0) {
>> +		ret = -EOPNOTSUPP;
>> +		goto out_free_map;
>> +	}
>> +
> 
> This seems kinda sketchy to me. Presumably once we have remapped a block
> group, we do want to discard it. But this makes that impossible? Does
> the discarding happen before we set num_stripes to 0? Even with
> discard=async?
> 
>>   	offset = logical - map->start;
>>   	length = min_t(u64, map->start + map->chunk_len - logical, length);
>>   	*length_ret = length;
>> @@ -6953,7 +6959,7 @@ u64 btrfs_calc_stripe_length(const struct btrfs_chunk_map *map)
>>   {
>>   	const int data_stripes = calc_data_stripes(map->type, map->num_stripes);
>>   
>> -	return div_u64(map->chunk_len, data_stripes);
>> +	return data_stripes ? div_u64(map->chunk_len, data_stripes) : 0;
> 
> Rather than making 0 a special value meaning remapped, would it be
> useful/clearer to also include a remapped flag on the btrfs_chunk_map?
> This function would have to behave the same way, presumably, but it
> would be less implicit, and we could make assertions on chunk maps that
> they have 0 stripes iff remapped at various places where we use the
> stripe length. That would raise confidence that all the uses of stripe
> logic account for remapped chunks.

It's less that 0 is a special value, more that btrfs_calc_stripe_length()
gets called from read_one_chunk(), and will divide-by-zero otherwise. I could
have put a check in read_one_chunk() instead, but this seems less footgunny.
If a chunk is fully remapped all I/O to it will be redirected elsewhere, so its
stripe length won't actually be used.

The BTRFS_BLOCK_GROUP_REMAPPED flag already gets set in the btrfs_chunk_map,
as does the num_stripes == 0 meaning fully remapped.

> 
>>   }
>>   
>>   #if BITS_PER_LONG == 32
>> -- 
>> 2.49.0
>>
>

next prev parent reply	other threads:[~2025-08-08 14:21 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-05 16:23 [PATCH 00/12] btrfs: remap tree Mark Harmstone
2025-06-05 16:23 ` [PATCH 01/12] btrfs: add definitions and constants for remap-tree Mark Harmstone
2025-06-13 21:02   ` Boris Burkov
2025-06-05 16:23 ` [PATCH 02/12] btrfs: add REMAP chunk type Mark Harmstone
2025-06-13 21:22   ` Boris Burkov
2025-06-05 16:23 ` [PATCH 03/12] btrfs: allow remapped chunks to have zero stripes Mark Harmstone
2025-06-13 21:41   ` Boris Burkov
2025-08-08 14:12     ` Mark Harmstone [this message]
2025-06-05 16:23 ` [PATCH 04/12] btrfs: remove remapped block groups from the free-space tree Mark Harmstone
2025-06-06  6:41   ` kernel test robot
2025-06-13 22:00   ` Boris Burkov
2025-08-12 14:50     ` Mark Harmstone
2025-06-05 16:23 ` [PATCH 05/12] btrfs: don't add metadata items for the remap tree to the extent tree Mark Harmstone
2025-06-13 22:39   ` Boris Burkov
2025-06-05 16:23 ` [PATCH 06/12] btrfs: add extended version of struct block_group_item Mark Harmstone
2025-06-05 16:23 ` [PATCH 07/12] btrfs: allow mounting filesystems with remap-tree incompat flag Mark Harmstone
2025-06-05 16:23 ` [PATCH 08/12] btrfs: redirect I/O for remapped block groups Mark Harmstone
2025-06-05 16:23 ` [PATCH 09/12] btrfs: handle deletions from remapped block group Mark Harmstone
2025-06-13 23:42   ` Boris Burkov
2025-08-11 16:48     ` Mark Harmstone
2025-08-11 16:59     ` Mark Harmstone
2025-06-05 16:23 ` [PATCH 10/12] btrfs: handle setting up relocation of block group with remap-tree Mark Harmstone
2025-06-13 23:25   ` Boris Burkov
2025-08-12 11:20     ` Mark Harmstone
2025-06-05 16:23 ` [PATCH 11/12] btrfs: move existing remaps before relocating block group Mark Harmstone
2025-06-06 11:20   ` kernel test robot
2025-06-05 16:23 ` [PATCH 12/12] btrfs: replace identity maps with actual remaps when doing relocations Mark Harmstone
2025-06-05 16:43 ` [PATCH 00/12] btrfs: remap tree Jonah Sabean
2025-06-06 13:35   ` Mark Harmstone
2025-06-09 16:05     ` Anand Jain
2025-06-09 18:51 ` David Sterba
2025-06-10  9:19   ` Mark Harmstone
2025-06-10 14:31 ` Mark Harmstone
2025-06-10 23:56   ` Qu Wenruo
2025-06-11  8:06     ` Mark Harmstone
2025-06-11 15:28 ` Mark Harmstone
2025-06-14  0:04 ` Boris Burkov
2025-06-26 22:10 ` Mark Harmstone
2025-06-27  5:59   ` Neal Gompa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5918c2a5-7145-46a9-9ab4-779e172c7c83@harmstone.com \
    --to=mark@harmstone.com \
    --cc=boris@bur.io \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox