* [PATCH 1/3] btrfs: check rw_devices, not num_devices for restriping
2020-01-17 14:07 [PATCH 0/3][v2] clean up how we mark block groups read only Josef Bacik
@ 2020-01-17 14:07 ` Josef Bacik
2020-01-17 14:07 ` [PATCH 2/3] btrfs: fix force usage in inc_block_group_ro Josef Bacik
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Josef Bacik @ 2020-01-17 14:07 UTC (permalink / raw)
To: linux-btrfs, kernel-team
While running xfstests with compression on I noticed I was panicing on
btrfs/154. I bisected this down to my inc_block_group_ro patches, which
was strange.
What was happening is with my patches we now use btrfs_can_overcommit()
to see if we can flip a block group read only. Before this would fail
because we weren't taking into account the usable un-allocated space for
allocating chunks. With my patches we were allowed to do the balance,
which is technically correct.
However this test is testing restriping with a degraded mount, something
that isn't working right because Anand's fix for the test was never
actually merged.
So now we're trying to allocate a chunk and cannot because we want to
allocate a RAID1 chunk, but there's only 1 device that's available for
usage. This results in an ENOSPC in one of the BUG_ON(ret) paths in
relocation (and a tricky path that is going to take many more patches to
fix.)
But we shouldn't even be making it this far, we don't have enough
devices to restripe. The problem is we're using btrfs_num_devices(),
which for some reason includes missing devices. That's not actually
what we want, we want the rw_devices.
Fix this by getting the rw_devices. With this patch we're no longer
panicing with my other patches applied, and we're in fact erroring out
at the correct spot instead of at inc_block_group_ro. The fact that
this was working before was just sheer dumb luck.
Fixes: e4d8ec0f65b9 ("Btrfs: implement online profile changing")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/volumes.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c72fd33a9ce1..035578e061fb 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3884,7 +3884,14 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
}
}
- num_devices = btrfs_num_devices(fs_info);
+ /*
+ * Balance is an exculsive operation, so no operation that's going to
+ * affect rw_devices can run concurrent with it, thus it is safe to
+ * simply grab the value. We want rw_devices because we do not want to
+ * allow restriping if we don't have enough devices we can actually
+ * allocate from.
+ */
+ num_devices = fs_info->fs_devices->rw_devices;
/*
* SINGLE profile on-disk has no profile bit, but in-memory we have a
--
2.24.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH 3/3] btrfs: use btrfs_can_overcommit in inc_block_group_ro
2020-01-17 14:07 [PATCH 0/3][v2] clean up how we mark block groups read only Josef Bacik
2020-01-17 14:07 ` [PATCH 1/3] btrfs: check rw_devices, not num_devices for restriping Josef Bacik
2020-01-17 14:07 ` [PATCH 2/3] btrfs: fix force usage in inc_block_group_ro Josef Bacik
@ 2020-01-17 14:07 ` Josef Bacik
2020-01-29 6:05 ` [PATCH 0/3][v2] clean up how we mark block groups read only Qu Wenruo
3 siblings, 0 replies; 6+ messages in thread
From: Josef Bacik @ 2020-01-17 14:07 UTC (permalink / raw)
To: linux-btrfs, kernel-team; +Cc: Qu Wenruo
inc_block_group_ro does a calculation to see if we have enough room left
over if we mark this block group as read only in order to see if it's ok
to mark the block group as read only.
The problem is this calculation _only_ works for data, where our used is
always less than our total. For metadata we will overcommit, so this
will almost always fail for metadata.
Fix this by exporting btrfs_can_overcommit, and then see if we have
enough space to remove the remaining free space in the block group we
are trying to mark read only. If we do then we can mark this block
group as read only.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/block-group.c | 37 ++++++++++++++++++++++++++-----------
fs/btrfs/space-info.c | 18 ++++++++++--------
fs/btrfs/space-info.h | 3 +++
3 files changed, 39 insertions(+), 19 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 71770d04b7a3..7e71ec9682d0 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1191,7 +1191,6 @@ static int inc_block_group_ro(struct btrfs_block_group *cache, int force)
{
struct btrfs_space_info *sinfo = cache->space_info;
u64 num_bytes;
- u64 sinfo_used;
int ret = -ENOSPC;
spin_lock(&sinfo->lock);
@@ -1205,19 +1204,38 @@ static int inc_block_group_ro(struct btrfs_block_group *cache, int force)
num_bytes = cache->length - cache->reserved - cache->pinned -
cache->bytes_super - cache->used;
- sinfo_used = btrfs_space_info_used(sinfo, true);
/*
- * sinfo_used + num_bytes should always <= sinfo->total_bytes.
- *
- * Here we make sure if we mark this bg RO, we still have enough
- * free space as buffer.
+ * Data never overcommits, even in mixed mode, so do just the straight
+ * check of left over space in how much we have allocated.
*/
- if (force || (sinfo_used + num_bytes <= sinfo->total_bytes)) {
+ if (force) {
+ ret = 0;
+ } else if (sinfo->flags & BTRFS_BLOCK_GROUP_DATA) {
+ u64 sinfo_used = btrfs_space_info_used(sinfo, true);
+
+ /*
+ * Here we make sure if we mark this bg RO, we still have enough
+ * free space as buffer.
+ */
+ if (sinfo_used + num_bytes <= sinfo->total_bytes)
+ ret = 0;
+ } else {
+ /*
+ * We overcommit metadata, so we need to do the
+ * btrfs_can_overcommit check here, and we need to pass in
+ * BTRFS_RESERVE_NO_FLUSH to give ourselves the most amount of
+ * leeway to allow us to mark this block group as read only.
+ */
+ if (btrfs_can_overcommit(cache->fs_info, sinfo, num_bytes,
+ BTRFS_RESERVE_NO_FLUSH))
+ ret = 0;
+ }
+
+ if (!ret) {
sinfo->bytes_readonly += num_bytes;
cache->ro++;
list_add_tail(&cache->ro_list, &sinfo->ro_bgs);
- ret = 0;
}
out:
spin_unlock(&cache->lock);
@@ -1225,9 +1243,6 @@ static int inc_block_group_ro(struct btrfs_block_group *cache, int force)
if (ret == -ENOSPC && btrfs_test_opt(cache->fs_info, ENOSPC_DEBUG)) {
btrfs_info(cache->fs_info,
"unable to make block group %llu ro", cache->start);
- btrfs_info(cache->fs_info,
- "sinfo_used=%llu bg_num_bytes=%llu",
- sinfo_used, num_bytes);
btrfs_dump_space_info(cache->fs_info, cache->space_info, 0, 0);
}
return ret;
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 537bc310a673..01297c5b2666 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -159,9 +159,9 @@ static inline u64 calc_global_rsv_need_space(struct btrfs_block_rsv *global)
return (global->size << 1);
}
-static int can_overcommit(struct btrfs_fs_info *fs_info,
- struct btrfs_space_info *space_info, u64 bytes,
- enum btrfs_reserve_flush_enum flush)
+int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
+ struct btrfs_space_info *space_info, u64 bytes,
+ enum btrfs_reserve_flush_enum flush)
{
u64 profile;
u64 avail;
@@ -226,7 +226,8 @@ void btrfs_try_granting_tickets(struct btrfs_fs_info *fs_info,
/* Check and see if our ticket can be satisified now. */
if ((used + ticket->bytes <= space_info->total_bytes) ||
- can_overcommit(fs_info, space_info, ticket->bytes, flush)) {
+ btrfs_can_overcommit(fs_info, space_info, ticket->bytes,
+ flush)) {
btrfs_space_info_update_bytes_may_use(fs_info,
space_info,
ticket->bytes);
@@ -639,13 +640,14 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
return to_reclaim;
to_reclaim = min_t(u64, num_online_cpus() * SZ_1M, SZ_16M);
- if (can_overcommit(fs_info, space_info, to_reclaim,
- BTRFS_RESERVE_FLUSH_ALL))
+ if (btrfs_can_overcommit(fs_info, space_info, to_reclaim,
+ BTRFS_RESERVE_FLUSH_ALL))
return 0;
used = btrfs_space_info_used(space_info, true);
- if (can_overcommit(fs_info, space_info, SZ_1M, BTRFS_RESERVE_FLUSH_ALL))
+ if (btrfs_can_overcommit(fs_info, space_info, SZ_1M,
+ BTRFS_RESERVE_FLUSH_ALL))
expected = div_factor_fine(space_info->total_bytes, 95);
else
expected = div_factor_fine(space_info->total_bytes, 90);
@@ -1004,7 +1006,7 @@ static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
*/
if (!pending_tickets &&
((used + orig_bytes <= space_info->total_bytes) ||
- can_overcommit(fs_info, space_info, orig_bytes, flush))) {
+ btrfs_can_overcommit(fs_info, space_info, orig_bytes, flush))) {
btrfs_space_info_update_bytes_may_use(fs_info, space_info,
orig_bytes);
ret = 0;
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 1a349e3f9cc1..24514cd2c6c1 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -127,6 +127,9 @@ int btrfs_reserve_metadata_bytes(struct btrfs_root *root,
enum btrfs_reserve_flush_enum flush);
void btrfs_try_granting_tickets(struct btrfs_fs_info *fs_info,
struct btrfs_space_info *space_info);
+int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
+ struct btrfs_space_info *space_info, u64 bytes,
+ enum btrfs_reserve_flush_enum flush);
static inline void btrfs_space_info_free_bytes_may_use(
struct btrfs_fs_info *fs_info,
--
2.24.1
^ permalink raw reply related [flat|nested] 6+ messages in thread