Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: Johannes Thumshirn <jth@kernel.org>, Chris Mason <clm@fb.com>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	Filipe Manana <fdmanana@suse.com>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>
Subject: Re: [PATCH v2 3/3] btrfs: update stripe_extent delete loop assumptions
Date: Thu, 11 Jul 2024 17:25:02 +0930	[thread overview]
Message-ID: <5838503b-a4aa-4023-901b-99d637cadac4@suse.com> (raw)
In-Reply-To: <ca001842-92f4-46ff-80ee-e7a8a97fc433@suse.com>



在 2024/7/11 17:14, Qu Wenruo 写道:
> 
> 
> 在 2024/7/11 16:25, Qu Wenruo 写道:
>>
>>
>> 在 2024/7/11 15:51, Johannes Thumshirn 写道:
>>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>>
>>> btrfs_delete_raid_extent() was written under the assumption, that it's
>>> call-chain always passes a start, length tuple that matches a single
>>> extent. But btrfs_delete_raid_extent() is called by
>>> do_free_extent_acounting() which in term is called by > 
>>> __btrfs_free_extent().
>>
>> But from the call site __btrfs_free_extent(), it is still called for a 
>> single extent.
>>
>> Or we will get an error and abort the current transaction.
> 
> Or does it mean, one data extent can have multiple RST entries?
> 
> Is that a non-zoned RST specific behavior?
> As I still remember that we split ordered extents for zoned devices, so 
> that it should always have one extent for each split OE.


OK, it's indeed an RST specific behavior (at least for RST with 
non-zoned devices).

I can have the following layout:

         item 15 key (258 EXTENT_DATA 419430400) itemoff 15306 itemsize 53
                 generation 10 type 1 (regular)
                 extent data disk byte 1808793600 nr 117440512
                 extent data offset 0 nr 117440512 ram 117440512
                 extent compression 0 (none)

Which is a large data extent with 112MiB length.

Meanwhile for the RST entries there are 3 split ones:

         item 13 key (1808793600 RAID_STRIPE 33619968) itemoff 15835 
itemsize 32
                         stripe 0 devid 2 physical 1787822080
                         stripe 1 devid 1 physical 1808793600
         item 14 key (1842413568 RAID_STRIPE 58789888) itemoff 15803 
itemsize 32
                         stripe 0 devid 2 physical 1821442048
                         stripe 1 devid 1 physical 1842413568
         item 15 key (1901203456 RAID_STRIPE 25030656) itemoff 15771 
itemsize 32
                         stripe 0 devid 2 physical 1880231936
                         stripe 1 devid 1 physical 1901203456

So yes, it's possible to have multiple RST entries for a single data 
extent, it's no longer the old zoned behavior.

In that case, the patch looks fine to me.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu


> 
> Thanks,
> Qu
>>
>>>
>>> But this call-chain passes in a start address and a length that can
>>> possibly match multiple on-disk extents.
>>
>> Mind to give a more detailed example on this?
>>
>> Thanks,
>> Qu
>>
>>>
>>> To make this possible, we have to adjust the start and length of each
>>> btree node lookup, to not delete beyond the requested range.
>>>
>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>> ---
>>>   fs/btrfs/raid-stripe-tree.c | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
>>> index fd56535b2289..6f65be334637 100644
>>> --- a/fs/btrfs/raid-stripe-tree.c
>>> +++ b/fs/btrfs/raid-stripe-tree.c
>>> @@ -66,6 +66,11 @@ int btrfs_delete_raid_extent(struct 
>>> btrfs_trans_handle *trans, u64 start, u64 le
>>>           if (ret)
>>>               break;
>>> +        start += key.offset;
>>> +        length -= key.offset;
>>> +        if (length == 0)
>>> +            break;
>>> +
>>>           btrfs_release_path(path);
>>>       }
>>>
>>
> 

      reply	other threads:[~2024-07-11  7:55 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-11  6:21 [PATCH v2 0/3] btrfs: more RAID stripe tree updates Johannes Thumshirn
2024-07-11  6:21 ` [PATCH v2 1/3] btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block Johannes Thumshirn
2024-07-11  6:21 ` [PATCH v2 2/3] btrfs: replace stripe extents Johannes Thumshirn
2024-07-11  7:51   ` Naohiro Aota
2024-07-12  6:34     ` Johannes Thumshirn
2024-07-11  6:21 ` [PATCH v2 3/3] btrfs: update stripe_extent delete loop assumptions Johannes Thumshirn
2024-07-11  6:55   ` Qu Wenruo
2024-07-11  7:44     ` Qu Wenruo
2024-07-11  7:55       ` Qu Wenruo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5838503b-a4aa-4023-901b-99d637cadac4@suse.com \
    --to=wqu@suse.com \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=johannes.thumshirn@wdc.com \
    --cc=josef@toxicpanda.com \
    --cc=jth@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox