* [PATCH v3 0/3] btrfs: more RAID stripe tree updates
@ 2024-07-12 7:48 Johannes Thumshirn
2024-07-12 7:48 ` [PATCH v3 1/3] btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block Johannes Thumshirn
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2024-07-12 7:48 UTC (permalink / raw)
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, Qu Wenru, Filipe Manana,
Johannes Thumshirn
Three further RST updates targeted for 6.11 (hopefully).
The first one is a reworked version of the scrub vs dev-replace deadlock
fix. It does have reviews from Josef and Qu but I'd love to hear Filipe's
take on it.
The second one updates a stripe extent in case a write to an already
present logical address happens.
The third one corrects assumptions in the delete code. My assumption was
that we are deleting a single stripe extent on each call to
btrfs_delete_raid_extent(). But do_free_extent_accounting() passes in a
start address and range of bytes that is deleted, so we need to keep track
of how many bytes we already have deleted and update the loop accordingly.
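
The range bookkeeping described for the third patch can be modelled in
userspace. This is only a sketch under stated assumptions: the struct and
function names are hypothetical, a flat array stands in for the stripe tree,
and the requested range is assumed to be covered by whole, contiguous
extents. It is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a stripe extent item: [start, start + len). */
struct model_extent {
	uint64_t start;
	uint64_t len;
	int live;	/* 1 while the item exists in the model "tree" */
};

/*
 * Delete every extent inside [start, start + length).
 * Returns the number of items removed, or -1 if no item starts at the
 * current position or an item would reach beyond the requested range.
 */
static int model_delete_range(struct model_extent *tree, size_t nr,
			      uint64_t start, uint64_t length)
{
	int removed = 0;

	while (length > 0) {
		struct model_extent *found = NULL;

		for (size_t i = 0; i < nr; i++) {
			if (tree[i].live && tree[i].start == start) {
				found = &tree[i];
				break;
			}
		}
		if (!found || found->len > length)
			return -1;

		found->live = 0;
		removed++;

		/* Advance past the item just deleted, shrink what is left. */
		start += found->len;
		length -= found->len;
	}
	return removed;
}
```

With three extents of 4096, 8192 and 4096 bytes laid out back to back, a
request for the first 12288 bytes removes two items in one call and leaves
the third untouched.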
NOTE:
The next big bug in RST is related to relocation. When relocation is
reading from disk (via relocate_file_extent_cluster() ->
page_cache_ra_unbounded()) we're trying to look up logical addresses that
for some reason RST does not know about and this leads to a tree dump and
ultimately a panic afterwards.
---
Changes in v3:
- Add Qu's Reviewed-by on patch 3
- Change patch 2 to using write_extent_buffer() (and drop Qu's R-b again)
- Link to v2: https://lore.kernel.org/r/20240711-b4-rst-updates-v2-0-d7b8113d88b7@kernel.org
Changes in v2:
- Add Qu's Reviewed-by on patch 2
- Add patch 3
- Link to v1: https://lore.kernel.org/r/20240709-b4-rst-updates-v1-0-200800dfe0fd@kernel.org
---
Johannes Thumshirn (3):
btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block
btrfs: replace stripe extents
btrfs: update stripe_extent delete loop assumptions
fs/btrfs/raid-stripe-tree.c | 38 ++++++++++++++++++++++++++++++++++++++
fs/btrfs/volumes.c | 28 +++++++++++++++++-----------
2 files changed, 55 insertions(+), 11 deletions(-)
---
base-commit: 584df860cac6e35e364ada101ccd13495b954644
change-id: 20240709-b4-rst-updates-bb9c0e49cd5b
Best regards,
--
Johannes Thumshirn <jth@kernel.org>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v3 1/3] btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block
2024-07-12 7:48 [PATCH v3 0/3] btrfs: more RAID stripe tree updates Johannes Thumshirn
@ 2024-07-12 7:48 ` Johannes Thumshirn
2024-07-15 11:29 ` Filipe Manana
2024-07-12 7:48 ` [PATCH v3 2/3] btrfs: replace stripe extents Johannes Thumshirn
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Johannes Thumshirn @ 2024-07-12 7:48 UTC (permalink / raw)
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, Qu Wenru, Filipe Manana,
Johannes Thumshirn
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Don't hold the dev_replace rwsem for the entirety of btrfs_map_block().
It is only needed to protect
a) calls to find_live_mirror() and
b) calling into handle_ops_on_dev_replace().
But there is no need to hold the rwsem for any kind of set_io_stripe()
calls.
So relax taking the dev_replace rwsem to only protect both cases and check
if the device replace status has changed in the meantime, for which we have
to re-do the find_live_mirror() calls.
This fixes a deadlock on raid-stripe-tree where device replace performs a
scrub operation, which in turn calls into btrfs_map_block() to find the
physical location of the block.
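
The locking pattern described above (sample the state under the lock, drop
the lock for the middle section, then retake it and start over if the state
moved) can be modelled as follows. This is a single-threaded sketch, not the
kernel code: a reader counter stands in for the rwsem, and all model_* names
and the callback are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the replace state and its lock. */
struct model_replace {
	int readers;	/* stand-in for the rwsem: count of read holders */
	bool ongoing;
};

static void model_down_read(struct model_replace *r)
{
	r->readers++;
}

static void model_up_read(struct model_replace *r)
{
	assert(r->readers > 0);
	r->readers--;
}

/*
 * The callback stands in for the unlocked middle section and may flip
 * the state to simulate a race. Returns how many passes were needed.
 */
static int model_map_block(struct model_replace *r,
			   void (*unlocked_work)(struct model_replace *, int))
{
	int passes = 0;
	bool seen;

again:
	passes++;

	/* First locked section: sample the state. */
	model_down_read(r);
	seen = r->ongoing;
	model_up_read(r);

	/* The lock is not held here. */
	unlocked_work(r, passes);

	/* Second locked section: redo everything if the state moved. */
	model_down_read(r);
	if (seen != r->ongoing) {
		model_up_read(r);
		goto again;
	}
	/* Work that depends on the replace state would run here, locked. */
	model_up_read(r);

	return passes;
}

/* Simulated writer that starts a replace during the first pass only. */
static void model_racing_writer(struct model_replace *r, int pass)
{
	assert(r->readers == 0);	/* a writer needs all readers gone */
	if (pass == 1)
		r->ongoing = true;
}

static void model_no_writer(struct model_replace *r, int pass)
{
	(void)r;
	(void)pass;
}
```

Without interference the model finishes in one pass. When the simulated
writer flips the state during the first pass, the recheck fails once and the
second pass completes.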
Cc: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/volumes.c | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fcedc43ef291..4209419244a1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6650,14 +6650,9 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
max_len = btrfs_max_io_len(map, map_offset, &io_geom);
*length = min_t(u64, map->chunk_len - map_offset, max_len);
+again:
down_read(&dev_replace->rwsem);
dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(dev_replace);
- /*
- * Hold the semaphore for read during the whole operation, write is
- * requested at commit time but must wait.
- */
- if (!dev_replace_is_ongoing)
- up_read(&dev_replace->rwsem);
switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
case BTRFS_BLOCK_GROUP_RAID0:
@@ -6695,6 +6690,7 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
"stripe index math went horribly wrong, got stripe_index=%u, num_stripes=%u",
io_geom.stripe_index, map->num_stripes);
ret = -EINVAL;
+ up_read(&dev_replace->rwsem);
goto out;
}
@@ -6710,6 +6706,8 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
*/
num_alloc_stripes += 2;
+ up_read(&dev_replace->rwsem);
+
/*
* If this I/O maps to a single device, try to return the device and
* physical block information on the stack instead of allocating an
@@ -6782,6 +6780,18 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
goto out;
}
+ /*
+ * Check if something changed the dev_replace state since
+ * we've checked it for the last time and if redo the whole
+ * mapping operation.
+ */
+ down_read(&dev_replace->rwsem);
+ if (dev_replace_is_ongoing !=
+ btrfs_dev_replace_is_ongoing(dev_replace)) {
+ up_read(&dev_replace->rwsem);
+ goto again;
+ }
+
if (op != BTRFS_MAP_READ)
io_geom.max_errors = btrfs_chunk_max_errors(map);
@@ -6789,6 +6799,7 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
op != BTRFS_MAP_READ) {
handle_ops_on_dev_replace(bioc, dev_replace, logical, &io_geom);
}
+ up_read(&dev_replace->rwsem);
*bioc_ret = bioc;
bioc->num_stripes = io_geom.num_stripes;
@@ -6796,11 +6807,6 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
bioc->mirror_num = io_geom.mirror_num;
out:
- if (dev_replace_is_ongoing) {
- lockdep_assert_held(&dev_replace->rwsem);
- /* Unlock and let waiting writers proceed */
- up_read(&dev_replace->rwsem);
- }
btrfs_free_chunk_map(map);
return ret;
}
--
2.43.0
* [PATCH v3 2/3] btrfs: replace stripe extents
2024-07-12 7:48 [PATCH v3 0/3] btrfs: more RAID stripe tree updates Johannes Thumshirn
2024-07-12 7:48 ` [PATCH v3 1/3] btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block Johannes Thumshirn
@ 2024-07-12 7:48 ` Johannes Thumshirn
2024-07-12 10:19 ` Qu Wenruo
2024-07-18 15:35 ` David Sterba
2024-07-12 7:48 ` [PATCH v3 3/3] btrfs: update stripe_extent delete loop assumptions Johannes Thumshirn
2024-07-15 12:41 ` [PATCH v3 0/3] btrfs: more RAID stripe tree updates David Sterba
3 siblings, 2 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2024-07-12 7:48 UTC (permalink / raw)
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, Qu Wenru, Filipe Manana,
Johannes Thumshirn
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Update stripe extents in case a write to an already existing address is
incoming.
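
The insert-then-update fallback can be illustrated with a small userspace
model. This is a sketch under assumptions: a fixed array stands in for the
stripe tree, every model_* name is hypothetical, and all items share one
fixed size.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SLOTS	8
#define MODEL_ITEM_SIZE	16

struct model_item {
	uint64_t key;
	int used;
	unsigned char data[MODEL_ITEM_SIZE];
};

static struct model_item model_tree[MODEL_SLOTS];

static struct model_item *model_lookup(uint64_t key)
{
	for (int i = 0; i < MODEL_SLOTS; i++)
		if (model_tree[i].used && model_tree[i].key == key)
			return &model_tree[i];
	return NULL;
}

/* Strict insert: fails with -EEXIST when the key is already present. */
static int model_insert(uint64_t key, const void *data)
{
	if (model_lookup(key))
		return -EEXIST;

	for (int i = 0; i < MODEL_SLOTS; i++) {
		if (!model_tree[i].used) {
			model_tree[i].used = 1;
			model_tree[i].key = key;
			memcpy(model_tree[i].data, data, MODEL_ITEM_SIZE);
			return 0;
		}
	}
	return -ENOSPC;
}

/* Overwrite the payload of an existing key in place. */
static int model_update(uint64_t key, const void *data)
{
	struct model_item *item = model_lookup(key);

	if (!item)
		return -ENOENT;
	memcpy(item->data, data, MODEL_ITEM_SIZE);
	return 0;
}

/* Try the insert first and fall back to an update on -EEXIST. */
static int model_insert_or_update(uint64_t key, const void *data)
{
	int ret = model_insert(key, data);

	if (ret == -EEXIST)
		ret = model_update(key, data);
	return ret;
}
```

A second write to the same key therefore replaces the stored payload instead
of failing the whole operation.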
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/raid-stripe-tree.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index e6f7a234b8f6..53ca2c1a32ac 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -73,6 +73,36 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
return ret;
}
+static int update_raid_extent_item(struct btrfs_trans_handle *trans,
+ struct btrfs_key *key,
+ struct btrfs_stripe_extent *stripe_extent,
+ const size_t item_size)
+{
+ struct btrfs_path *path;
+ struct extent_buffer *leaf;
+ int ret;
+ int slot;
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+
+ ret = btrfs_search_slot(trans, trans->fs_info->stripe_root, key, path,
+ 0, 1);
+ if (ret)
+ return ret == 1 ? ret : -EINVAL;
+
+ leaf = path->nodes[0];
+ slot = path->slots[0];
+
+ write_extent_buffer(leaf, stripe_extent,
+ btrfs_item_ptr_offset(leaf, slot), item_size);
+ btrfs_mark_buffer_dirty(trans, leaf);
+ btrfs_free_path(path);
+
+ return ret;
+}
+
static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
struct btrfs_io_context *bioc)
{
@@ -112,6 +142,9 @@ static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
ret = btrfs_insert_item(trans, stripe_root, &stripe_key, stripe_extent,
item_size);
+ if (ret == -EEXIST)
+ ret = update_raid_extent_item(trans, &stripe_key, stripe_extent,
+ item_size);
if (ret)
btrfs_abort_transaction(trans, ret);
--
2.43.0
* [PATCH v3 3/3] btrfs: update stripe_extent delete loop assumptions
2024-07-12 7:48 [PATCH v3 0/3] btrfs: more RAID stripe tree updates Johannes Thumshirn
2024-07-12 7:48 ` [PATCH v3 1/3] btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block Johannes Thumshirn
2024-07-12 7:48 ` [PATCH v3 2/3] btrfs: replace stripe extents Johannes Thumshirn
@ 2024-07-12 7:48 ` Johannes Thumshirn
2024-07-15 12:41 ` [PATCH v3 0/3] btrfs: more RAID stripe tree updates David Sterba
3 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2024-07-12 7:48 UTC (permalink / raw)
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, Qu Wenru, Filipe Manana,
Johannes Thumshirn
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
btrfs_delete_raid_extent() was written under the assumption that its
call chain always passes a (start, length) tuple that matches a single
extent. But btrfs_delete_raid_extent() is called by
do_free_extent_accounting(), which in turn is called by
__btrfs_free_extent().
But this call-chain passes in a start address and a length that can
possibly match multiple on-disk extents.
To handle this, we have to adjust the start and length of each
btree node lookup, so that we do not delete beyond the requested range.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/raid-stripe-tree.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 53ca2c1a32ac..684d4744f02d 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -66,6 +66,11 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
if (ret)
break;
+ start += key.offset;
+ length -= key.offset;
+ if (length == 0)
+ break;
+
btrfs_release_path(path);
}
--
2.43.0
* Re: [PATCH v3 2/3] btrfs: replace stripe extents
2024-07-12 7:48 ` [PATCH v3 2/3] btrfs: replace stripe extents Johannes Thumshirn
@ 2024-07-12 10:19 ` Qu Wenruo
2024-07-18 15:35 ` David Sterba
1 sibling, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2024-07-12 10:19 UTC (permalink / raw)
To: Johannes Thumshirn, Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, Qu Wenru, Filipe Manana,
Johannes Thumshirn
On 2024/7/12 17:18, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Update stripe extents in case a write to an already existing address
> incoming.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/raid-stripe-tree.c | 33 +++++++++++++++++++++++++++++++++
> 1 file changed, 33 insertions(+)
>
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index e6f7a234b8f6..53ca2c1a32ac 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -73,6 +73,36 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
> return ret;
> }
>
> +static int update_raid_extent_item(struct btrfs_trans_handle *trans,
> + struct btrfs_key *key,
> + struct btrfs_stripe_extent *stripe_extent,
> + const size_t item_size)
> +{
> + struct btrfs_path *path;
> + struct extent_buffer *leaf;
> + int ret;
> + int slot;
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> +
> + ret = btrfs_search_slot(trans, trans->fs_info->stripe_root, key, path,
> + 0, 1);
> + if (ret)
> + return ret == 1 ? ret : -EINVAL;
> +
> + leaf = path->nodes[0];
> + slot = path->slots[0];
> +
> + write_extent_buffer(leaf, stripe_extent,
> + btrfs_item_ptr_offset(leaf, slot), item_size);
Since the replace one should be the same size, an ASSERT() would make it
easier to catch future problems.
Thanks,
Qu
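
The size check suggested above might look roughly like the following
userspace sketch. It assumes a hypothetical item struct with a recorded
size and uses the C library assert() in place of the kernel's ASSERT(). In
the kernel, the stored size would presumably come from something like
btrfs_item_size(leaf, slot), which is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stored item: a recorded size plus a fixed-capacity payload. */
struct model_stored {
	size_t size;
	unsigned char data[64];
};

/*
 * Overwrite an existing item in place. The asserts document the
 * precondition that the replacement has exactly the stored size, so a
 * caller passing a different size trips them in debug builds.
 */
static void model_overwrite(struct model_stored *item, const void *src,
			    size_t src_size)
{
	assert(item->size <= sizeof(item->data));
	assert(src_size == item->size);
	memcpy(item->data, src, src_size);
}
```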
> + btrfs_mark_buffer_dirty(trans, leaf);
> + btrfs_free_path(path);
> +
> + return ret;
> +}
> +
> static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
> struct btrfs_io_context *bioc)
> {
> @@ -112,6 +142,9 @@ static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
>
> ret = btrfs_insert_item(trans, stripe_root, &stripe_key, stripe_extent,
> item_size);
> + if (ret == -EEXIST)
> + ret = update_raid_extent_item(trans, &stripe_key, stripe_extent,
> + item_size);
> if (ret)
> btrfs_abort_transaction(trans, ret);
>
>
* Re: [PATCH v3 1/3] btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block
2024-07-12 7:48 ` [PATCH v3 1/3] btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block Johannes Thumshirn
@ 2024-07-15 11:29 ` Filipe Manana
2024-07-15 11:38 ` Johannes Thumshirn
0 siblings, 1 reply; 9+ messages in thread
From: Filipe Manana @ 2024-07-15 11:29 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs, linux-kernel,
Qu Wenru, Johannes Thumshirn
On Fri, Jul 12, 2024 at 8:49 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Don't hold the dev_replace rwsem for the entirety of btrfs_map_block().
>
> It is only needed to protect
> a) calls to find_live_mirror() and
> b) calling into handle_ops_on_dev_replace().
>
> But there is no need to hold the rwsem for any kind of set_io_stripe()
> calls.
>
> So relax taking the dev_replace rwsem to only protect both cases and check
> if the device replace status has changed in the meantime, for which we have
> to re-do the find_live_mirror() calls.
>
> This fixes a deadlock on raid-stripe-tree where device replace performs a
> scrub operation, which in turn calls into btrfs_map_block() to find the
> physical location of the block.
>
> Cc: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/volumes.c | 28 +++++++++++++++++-----------
> 1 file changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index fcedc43ef291..4209419244a1 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6650,14 +6650,9 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
> max_len = btrfs_max_io_len(map, map_offset, &io_geom);
> *length = min_t(u64, map->chunk_len - map_offset, max_len);
>
> +again:
> down_read(&dev_replace->rwsem);
> dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(dev_replace);
> - /*
> - * Hold the semaphore for read during the whole operation, write is
> - * requested at commit time but must wait.
> - */
> - if (!dev_replace_is_ongoing)
> - up_read(&dev_replace->rwsem);
>
> switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
> case BTRFS_BLOCK_GROUP_RAID0:
> @@ -6695,6 +6690,7 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
> "stripe index math went horribly wrong, got stripe_index=%u, num_stripes=%u",
> io_geom.stripe_index, map->num_stripes);
> ret = -EINVAL;
> + up_read(&dev_replace->rwsem);
> goto out;
> }
>
> @@ -6710,6 +6706,8 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
> */
> num_alloc_stripes += 2;
>
> + up_read(&dev_replace->rwsem);
> +
> /*
> * If this I/O maps to a single device, try to return the device and
> * physical block information on the stack instead of allocating an
> @@ -6782,6 +6780,18 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
> goto out;
> }
>
> + /*
> + * Check if something changed the dev_replace state since
> + * we've checked it for the last time and if redo the whole
> + * mapping operation.
> + */
> + down_read(&dev_replace->rwsem);
> + if (dev_replace_is_ongoing !=
> + btrfs_dev_replace_is_ongoing(dev_replace)) {
> + up_read(&dev_replace->rwsem);
> + goto again;
We previously allocated bioc, so before the goto we have to free it
(call btrfs_put_bioc(bioc)), otherwise we'll leak it as after the goto
we end up allocating a new one.
Otherwise it looks fine, thanks.
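
The leak described here can be shown with a small userspace model. This is
only a sketch: the model_* names and the live-object counter are
hypothetical, and malloc()/free() stand in for allocating the bioc and
dropping it with btrfs_put_bioc().

```c
#include <assert.h>
#include <stdlib.h>

static int model_live_ctx;	/* number of contexts currently allocated */

struct model_ctx {
	int pass;	/* which pass allocated this context */
};

static struct model_ctx *model_alloc_ctx(int pass)
{
	struct model_ctx *ctx = malloc(sizeof(*ctx));

	if (ctx) {
		ctx->pass = pass;
		model_live_ctx++;
	}
	return ctx;
}

static void model_put_ctx(struct model_ctx *ctx)
{
	if (ctx) {
		model_live_ctx--;
		free(ctx);
	}
}

/*
 * Build a context, retrying stale_passes times as if the state recheck
 * had failed. Each failed pass drops its context before jumping back.
 */
static struct model_ctx *model_build(int stale_passes)
{
	struct model_ctx *ctx;
	int pass = 0;

again:
	pass++;
	ctx = model_alloc_ctx(pass);
	if (!ctx)
		return NULL;

	if (pass <= stale_passes) {
		/* Without this put, the old context leaks on every retry. */
		model_put_ctx(ctx);
		goto again;
	}
	return ctx;
}
```

After two failed passes, only the context from the third pass is still
alive, so the counter reads one until the caller drops it.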
> + }
> +
> if (op != BTRFS_MAP_READ)
> io_geom.max_errors = btrfs_chunk_max_errors(map);
>
> @@ -6789,6 +6799,7 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
> op != BTRFS_MAP_READ) {
> handle_ops_on_dev_replace(bioc, dev_replace, logical, &io_geom);
> }
> + up_read(&dev_replace->rwsem);
>
> *bioc_ret = bioc;
> bioc->num_stripes = io_geom.num_stripes;
> @@ -6796,11 +6807,6 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
> bioc->mirror_num = io_geom.mirror_num;
>
> out:
> - if (dev_replace_is_ongoing) {
> - lockdep_assert_held(&dev_replace->rwsem);
> - /* Unlock and let waiting writers proceed */
> - up_read(&dev_replace->rwsem);
> - }
> btrfs_free_chunk_map(map);
> return ret;
> }
>
> --
> 2.43.0
>
>
* Re: [PATCH v3 1/3] btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block
2024-07-15 11:29 ` Filipe Manana
@ 2024-07-15 11:38 ` Johannes Thumshirn
0 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2024-07-15 11:38 UTC (permalink / raw)
To: Filipe Manana, Johannes Thumshirn
Cc: Chris Mason, Josef Bacik, David Sterba,
linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
Qu Wenru
On 15.07.24 13:29, Filipe Manana wrote:
> On Fri, Jul 12, 2024 at 8:49 AM Johannes Thumshirn <jth@kernel.org> wrote:
>>
>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
>> Don't hold the dev_replace rwsem for the entirety of btrfs_map_block().
>>
>> It is only needed to protect
>> a) calls to find_live_mirror() and
>> b) calling into handle_ops_on_dev_replace().
>>
>> But there is no need to hold the rwsem for any kind of set_io_stripe()
>> calls.
>>
>> So relax taking the dev_replace rwsem to only protect both cases and check
>> if the device replace status has changed in the meantime, for which we have
>> to re-do the find_live_mirror() calls.
>>
>> This fixes a deadlock on raid-stripe-tree where device replace performs a
>> scrub operation, which in turn calls into btrfs_map_block() to find the
>> physical location of the block.
>>
>> Cc: Filipe Manana <fdmanana@suse.com>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>> Reviewed-by: Qu Wenruo <wqu@suse.com>
>> ---
>> fs/btrfs/volumes.c | 28 +++++++++++++++++-----------
>> 1 file changed, 17 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index fcedc43ef291..4209419244a1 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -6650,14 +6650,9 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>> max_len = btrfs_max_io_len(map, map_offset, &io_geom);
>> *length = min_t(u64, map->chunk_len - map_offset, max_len);
>>
>> +again:
>> down_read(&dev_replace->rwsem);
>> dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(dev_replace);
>> - /*
>> - * Hold the semaphore for read during the whole operation, write is
>> - * requested at commit time but must wait.
>> - */
>> - if (!dev_replace_is_ongoing)
>> - up_read(&dev_replace->rwsem);
>>
>> switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
>> case BTRFS_BLOCK_GROUP_RAID0:
>> @@ -6695,6 +6690,7 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>> "stripe index math went horribly wrong, got stripe_index=%u, num_stripes=%u",
>> io_geom.stripe_index, map->num_stripes);
>> ret = -EINVAL;
>> + up_read(&dev_replace->rwsem);
>> goto out;
>> }
>>
>> @@ -6710,6 +6706,8 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>> */
>> num_alloc_stripes += 2;
>>
>> + up_read(&dev_replace->rwsem);
>> +
>> /*
>> * If this I/O maps to a single device, try to return the device and
>> * physical block information on the stack instead of allocating an
>> @@ -6782,6 +6780,18 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>> goto out;
>> }
>>
>> + /*
>> + * Check if something changed the dev_replace state since
>> + * we've checked it for the last time and if redo the whole
>> + * mapping operation.
>> + */
>> + down_read(&dev_replace->rwsem);
>> + if (dev_replace_is_ongoing !=
>> + btrfs_dev_replace_is_ongoing(dev_replace)) {
>> + up_read(&dev_replace->rwsem);
>> + goto again;
>
> We previously allocated bioc, so before the goto we have to free it
> (call btrfs_put_bioc(bioc)), otherwise we'll leak it as after the goto
> we end up allocating a new one.
>
> Otherwise it looks fine, thanks.
>
Good catch, will update.
* Re: [PATCH v3 0/3] btrfs: more RAID stripe tree updates
2024-07-12 7:48 [PATCH v3 0/3] btrfs: more RAID stripe tree updates Johannes Thumshirn
` (2 preceding siblings ...)
2024-07-12 7:48 ` [PATCH v3 3/3] btrfs: update stripe_extent delete loop assumptions Johannes Thumshirn
@ 2024-07-15 12:41 ` David Sterba
3 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2024-07-15 12:41 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs, linux-kernel,
Qu Wenru, Filipe Manana, Johannes Thumshirn
On Fri, Jul 12, 2024 at 09:48:35AM +0200, Johannes Thumshirn wrote:
> Three further RST updates targeted for 6.11 (hopefully).
6.11 is doable, all are fixes.
* Re: [PATCH v3 2/3] btrfs: replace stripe extents
2024-07-12 7:48 ` [PATCH v3 2/3] btrfs: replace stripe extents Johannes Thumshirn
2024-07-12 10:19 ` Qu Wenruo
@ 2024-07-18 15:35 ` David Sterba
1 sibling, 0 replies; 9+ messages in thread
From: David Sterba @ 2024-07-18 15:35 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs, linux-kernel,
Qu Wenru, Filipe Manana, Johannes Thumshirn
On Fri, Jul 12, 2024 at 09:48:37AM +0200, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Update stripe extents in case a write to an already existing address
> incoming.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Please update the subject and changelog (in for-next), this does not explain
much and lacks context. Thanks.