* [PATCH] btrfs: mark qgroup inconsistent if dropping a large subvolume
@ 2025-12-19 5:53 Qu Wenruo
2026-02-19 9:52 ` Qu Wenruo
2026-02-19 11:47 ` Filipe Manana
0 siblings, 2 replies; 4+ messages in thread
From: Qu Wenruo @ 2025-12-19 5:53 UTC (permalink / raw)
To: linux-btrfs
Commit 011b46c30476 ("btrfs: skip subtree scan if it's too high to avoid
low stall in btrfs_commit_transaction()") tries to solve the problem of
long transaction hangs caused by a large qgroup workload triggered by
dropping a large subtree.
But there is another situation where dropping a subvolume without any
snapshot can lead to the same problem.
The only difference is that non-shared subvolume dropping triggers a lot
of leaf rescans in one transaction. In theory btrfs_end_transaction()
should be able to commit a transaction due to various limits, but the
qgroup workload is never part of any such threshold.
So for now just follow the same drop_subtree_threshold for any subvolume
drop, so that we can avoid long transaction hang.
Unfortunately this means any slightly large subvolume deletion will mark
the qgroup inconsistent and require a rescan.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent-tree.c | 10 ++++++++++
fs/btrfs/qgroup.c | 14 ++++++++++++++
fs/btrfs/qgroup.h | 1 +
3 files changed, 25 insertions(+)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1dcd69fe97ed..59fe3d89e910 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6151,6 +6151,16 @@ int btrfs_drop_snapshot(struct btrfs_root *root, bool update_ref, bool for_reloc
}
}
+ /*
+ * Not only high subtree can cause long qgroup workload,
+ * a lot of level 0 drop in a single transaction can also lead
+ * to a lot of qgroup load and freeze a transaction.
+ *
+ * So check the level and if it's too high just mark qgroup
+ * inconsistent instead of a possible long transaction freeze.
+ */
+ btrfs_qgroup_check_subvol_drop(fs_info, level);
+
wc->restarted = test_bit(BTRFS_ROOT_DEAD_TREE, &root->state);
wc->level = level;
wc->shared_level = -1;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 14d393a5853d..4dfeed998c54 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -4953,3 +4953,17 @@ int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info,
spin_unlock(&fs_info->qgroup_lock);
return ret;
}
+
+void btrfs_qgroup_check_subvol_drop(struct btrfs_fs_info *fs_info, u8 level)
+{
+ u8 drop_subtree_thres;
+
+ if (!btrfs_qgroup_full_accounting(fs_info))
+ return;
+ spin_lock(&fs_info->qgroup_lock);
+ drop_subtree_thres = fs_info->qgroup_drop_subtree_thres;
+ spin_unlock(&fs_info->qgroup_lock);
+
+ if (level >= drop_subtree_thres)
+ qgroup_mark_inconsistent(fs_info, "subvolume level reached threshold");
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index a979fd59a4da..785ed16f5cc4 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -453,5 +453,6 @@ void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans);
bool btrfs_check_quota_leak(const struct btrfs_fs_info *fs_info);
int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info,
const struct btrfs_squota_delta *delta);
+void btrfs_qgroup_check_subvol_drop(struct btrfs_fs_info *fs_info, u8 level);
#endif
--
2.52.0
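For a sense of the scale involved here, the number of leaves a tree of a given level can reference grows geometrically with the level. The sketch below is a rough userspace model, not kernel code; the ~490 fan-out figure assumes a 16K nodesize and is approximate:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough model of btrfs tree fan-out: with a 16K nodesize an internal
 * node holds roughly 490 key pointers, so a tree whose root is at
 * `level` can reference about 490^level leaves. At the default
 * drop_subtree_threshold of 3 that is already on the order of 10^8
 * leaves whose extents would need qgroup accounting during one drop.
 */
static uint64_t approx_leaves(unsigned int level, uint64_t fanout)
{
	uint64_t leaves = 1;

	/* Each internal level multiplies the reachable leaf count. */
	while (level--)
		leaves *= fanout;
	return leaves;
}
```

A level 0 root is itself a single leaf; every additional level multiplies the reachable leaf count by the fan-out, which is why the check in the patch can key off the root level alone.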
* Re: [PATCH] btrfs: mark qgroup inconsistent if dropping a large subvolume
2025-12-19 5:53 [PATCH] btrfs: mark qgroup inconsistent if dropping a large subvolume Qu Wenruo
@ 2026-02-19 9:52 ` Qu Wenruo
2026-02-19 11:47 ` Filipe Manana
1 sibling, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2026-02-19 9:52 UTC (permalink / raw)
To: linux-btrfs
Gentle ping?
This should handle large subvolumes without any snapshots better, with
fewer hangs in qgroup accounting.
Thanks,
Qu
On 2025/12/19 16:23, Qu Wenruo wrote:
> Commit 011b46c30476 ("btrfs: skip subtree scan if it's too high to avoid
> low stall in btrfs_commit_transaction()") tries to solves the problem of
> long transaction hang caused by large qgroup workload triggered by
> dropping a large subtree.
>
> But there is another situation where dropping a subvolume without any
> snapshot can lead to the same problem.
>
> The only difference is that non-shared subvolume dropping triggers a lot
> of leaf rescan in one transaction. In theory btrfs_end_transaction()
> should be able to commit a transaction due to various limits, but qgroup
> workload is never part of the threshold.
>
> So for now just follow the same drop_subtree_threshold for any subvolume
> drop, so that we can avoid long transaction hang.
>
> Unfortunately this means any slightly large subvolume deletion will mark
> qgroup inconsistent and needs a rescan.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/extent-tree.c | 10 ++++++++++
> fs/btrfs/qgroup.c | 14 ++++++++++++++
> fs/btrfs/qgroup.h | 1 +
> 3 files changed, 25 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 1dcd69fe97ed..59fe3d89e910 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -6151,6 +6151,16 @@ int btrfs_drop_snapshot(struct btrfs_root *root, bool update_ref, bool for_reloc
> }
> }
>
> + /*
> + * Not only high subtree can cause long qgroup workload,
> + * a lot of level 0 drop in a single transaction can also lead
> + * to a lot of qgroup load and freeze a transaction.
> + *
> + * So check the level and if it's too high just mark qgroup
> + * inconsistent instead of a possible long transaction freeze.
> + */
> + btrfs_qgroup_check_subvol_drop(fs_info, level);
> +
> wc->restarted = test_bit(BTRFS_ROOT_DEAD_TREE, &root->state);
> wc->level = level;
> wc->shared_level = -1;
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 14d393a5853d..4dfeed998c54 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -4953,3 +4953,17 @@ int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info,
> spin_unlock(&fs_info->qgroup_lock);
> return ret;
> }
> +
> +void btrfs_qgroup_check_subvol_drop(struct btrfs_fs_info *fs_info, u8 level)
> +{
> + u8 drop_subtree_thres;
> +
> + if (!btrfs_qgroup_full_accounting(fs_info))
> + return;
> + spin_lock(&fs_info->qgroup_lock);
> + drop_subtree_thres = fs_info->qgroup_drop_subtree_thres;
> + spin_unlock(&fs_info->qgroup_lock);
> +
> + if (level >= drop_subtree_thres)
> + qgroup_mark_inconsistent(fs_info, "subvolume level reached threshold");
> +}
> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
> index a979fd59a4da..785ed16f5cc4 100644
> --- a/fs/btrfs/qgroup.h
> +++ b/fs/btrfs/qgroup.h
> @@ -453,5 +453,6 @@ void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans);
> bool btrfs_check_quota_leak(const struct btrfs_fs_info *fs_info);
> int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info,
> const struct btrfs_squota_delta *delta);
> +void btrfs_qgroup_check_subvol_drop(struct btrfs_fs_info *fs_info, u8 level);
>
> #endif
* Re: [PATCH] btrfs: mark qgroup inconsistent if dropping a large subvolume
2025-12-19 5:53 [PATCH] btrfs: mark qgroup inconsistent if dropping a large subvolume Qu Wenruo
2026-02-19 9:52 ` Qu Wenruo
@ 2026-02-19 11:47 ` Filipe Manana
2026-02-19 21:08 ` Qu Wenruo
1 sibling, 1 reply; 4+ messages in thread
From: Filipe Manana @ 2026-02-19 11:47 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Fri, Dec 19, 2025 at 5:54 AM Qu Wenruo <wqu@suse.com> wrote:
>
> Commit 011b46c30476 ("btrfs: skip subtree scan if it's too high to avoid
> low stall in btrfs_commit_transaction()") tries to solves the problem of
solves -> solve
> long transaction hang caused by large qgroup workload triggered by
> dropping a large subtree.
>
> But there is another situation where dropping a subvolume without any
> snapshot can lead to the same problem.
>
> The only difference is that non-shared subvolume dropping triggers a lot
> of leaf rescan in one transaction. In theory btrfs_end_transaction()
> should be able to commit a transaction due to various limits, but qgroup
> workload is never part of the threshold.
What do you mean by btrfs_end_transaction() should be able to commit a
transaction?
We never trigger transaction commits there.
>
> So for now just follow the same drop_subtree_threshold for any subvolume
> drop, so that we can avoid long transaction hang.
>
> Unfortunately this means any slightly large subvolume deletion will mark
> qgroup inconsistent and needs a rescan.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/extent-tree.c | 10 ++++++++++
> fs/btrfs/qgroup.c | 14 ++++++++++++++
> fs/btrfs/qgroup.h | 1 +
> 3 files changed, 25 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 1dcd69fe97ed..59fe3d89e910 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -6151,6 +6151,16 @@ int btrfs_drop_snapshot(struct btrfs_root *root, bool update_ref, bool for_reloc
> }
> }
>
> + /*
> + * Not only high subtree can cause long qgroup workload,
> + * a lot of level 0 drop in a single transaction can also lead
> + * to a lot of qgroup load and freeze a transaction.
I find this confusing. So it's saying that we can do a lot of heavy
qgroup work if we drop a lot of trees that have a single node (root is
a leaf, at level 0)?
But the code added in this patch doesn't do anything about that, it
only looks at a single root to drop by checking its level against
fs_info->qgroup_drop_subtree_thres, which has a default value of 3.
That paragraph gives the idea we will check if we have many roots to
drop below the threshold and do something about it too, not just for
the case of a root at or above the threshold.
The code seems ok, it's just that this comment and that part of the
changelog don't make sense to me.
Thanks.
> + *
> + * So check the level and if it's too high just mark qgroup
> + * inconsistent instead of a possible long transaction freeze.
> + */
> + btrfs_qgroup_check_subvol_drop(fs_info, level);
> +
> wc->restarted = test_bit(BTRFS_ROOT_DEAD_TREE, &root->state);
> wc->level = level;
> wc->shared_level = -1;
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 14d393a5853d..4dfeed998c54 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -4953,3 +4953,17 @@ int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info,
> spin_unlock(&fs_info->qgroup_lock);
> return ret;
> }
> +
> +void btrfs_qgroup_check_subvol_drop(struct btrfs_fs_info *fs_info, u8 level)
> +{
> + u8 drop_subtree_thres;
> +
> + if (!btrfs_qgroup_full_accounting(fs_info))
> + return;
> + spin_lock(&fs_info->qgroup_lock);
> + drop_subtree_thres = fs_info->qgroup_drop_subtree_thres;
> + spin_unlock(&fs_info->qgroup_lock);
> +
> + if (level >= drop_subtree_thres)
> + qgroup_mark_inconsistent(fs_info, "subvolume level reached threshold");
> +}
> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
> index a979fd59a4da..785ed16f5cc4 100644
> --- a/fs/btrfs/qgroup.h
> +++ b/fs/btrfs/qgroup.h
> @@ -453,5 +453,6 @@ void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans);
> bool btrfs_check_quota_leak(const struct btrfs_fs_info *fs_info);
> int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info,
> const struct btrfs_squota_delta *delta);
> +void btrfs_qgroup_check_subvol_drop(struct btrfs_fs_info *fs_info, u8 level);
>
> #endif
> --
> 2.52.0
>
>
* Re: [PATCH] btrfs: mark qgroup inconsistent if dropping a large subvolume
2026-02-19 11:47 ` Filipe Manana
@ 2026-02-19 21:08 ` Qu Wenruo
0 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2026-02-19 21:08 UTC (permalink / raw)
To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs
On 2026/2/19 22:17, Filipe Manana wrote:
> On Fri, Dec 19, 2025 at 5:54 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> Commit 011b46c30476 ("btrfs: skip subtree scan if it's too high to avoid
>> low stall in btrfs_commit_transaction()") tries to solves the problem of
>
> solves -> solve
>
>> long transaction hang caused by large qgroup workload triggered by
>> dropping a large subtree.
>>
>> But there is another situation where dropping a subvolume without any
>> snapshot can lead to the same problem.
>>
>> The only difference is that non-shared subvolume dropping triggers a lot
>> of leaf rescan in one transaction. In theory btrfs_end_transaction()
>> should be able to commit a transaction due to various limits, but qgroup
>> workload is never part of the threshold.
>
> What do you mean by btrfs_end_transaction() should be able to commit a
> transaction?
> We never trigger transaction commits there.
I mean we should never let a transaction accumulate too many dirty
qgroup records (and maybe other limits).
With that behavior we could spread the workload across multiple
transactions, and each transaction would have a much smaller set of
qgroup records to handle, thus smaller critical sections.
>
>
>>
>> So for now just follow the same drop_subtree_threshold for any subvolume
>> drop, so that we can avoid long transaction hang.
>>
>> Unfortunately this means any slightly large subvolume deletion will mark
>> qgroup inconsistent and needs a rescan.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> fs/btrfs/extent-tree.c | 10 ++++++++++
>> fs/btrfs/qgroup.c | 14 ++++++++++++++
>> fs/btrfs/qgroup.h | 1 +
>> 3 files changed, 25 insertions(+)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 1dcd69fe97ed..59fe3d89e910 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -6151,6 +6151,16 @@ int btrfs_drop_snapshot(struct btrfs_root *root, bool update_ref, bool for_reloc
>> }
>> }
>>
>> + /*
>> + * Not only high subtree can cause long qgroup workload,
>> + * a lot of level 0 drop in a single transaction can also lead
>> + * to a lot of qgroup load and freeze a transaction.
>
> I find this confusing. So it's saying that we can do a lot of heavy
> qgroup work if we drop a lot of trees that have a single node (root is
> a leaf, at level 0)?
No, I mean a subvolume that has a high level, but no tree blocks shared
with any snapshot.
In that case we generate a lot of qgroup records on the leaves.
The existing qgroup pause checks are based on the subtree level, but
since there are no shared tree blocks, the subtree level is always 0.
The total amount of work is the same, but when a tall subtree is dropped
we know it's going to freeze the transaction, thus we pause qgroup
accounting.
But a lot of level 0 subtrees won't trigger such a pause, and thus still
freeze the fs.
Thanks,
Qu
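The distinction above can be modeled minimally in userspace C. Names and the constant are illustrative (mirroring the default qgroup_drop_subtree_thres of 3), not the kernel's actual code: for a snapshotted subvolume the drop walk hits shared subtrees whose level can reach the threshold, while a never-snapshotted subvolume only ever yields level 0 subtrees, so only a check against the subvolume's own root level fires.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative only: default value of fs_info->qgroup_drop_subtree_thres. */
#define DROP_SUBTREE_THRES	3

/*
 * Existing check: fires on the level of a *shared* subtree met during
 * the drop. With no snapshots, nothing is shared, so this only ever
 * sees level 0 and never fires.
 */
static bool subtree_check_fires(unsigned int shared_subtree_level)
{
	return shared_subtree_level >= DROP_SUBTREE_THRES;
}

/*
 * Check added by this patch: fires on the subvolume root's own level,
 * catching the tall-but-unshared case the per-subtree check misses.
 */
static bool subvol_check_fires(unsigned int root_level)
{
	return root_level >= DROP_SUBTREE_THRES;
}
```

For a level 4 subvolume with no snapshots, subtree_check_fires(0) stays false for the entire walk, while subvol_check_fires(4) trips once up front, marking qgroups inconsistent instead of freezing the transaction.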
>
> But the code added in this patch doesn't do anything about that, it
> only looks at a single root to drop by checking its level against
> fs_info->qgroup_drop_subtree_thres, which has a default value of 3.
> That paragraph gives the idea we will check if we have many roots to
> drop below the threshold and do something about it too, not just for
> the case of a root at or above the threshold.
>
> The code seems ok, it's just this comment and that part in the
> changelog doesn't make sense to me.
> Thanks.
>
>> + *
>> + * So check the level and if it's too high just mark qgroup
>> + * inconsistent instead of a possible long transaction freeze.
>> + */
>> + btrfs_qgroup_check_subvol_drop(fs_info, level);
>> +
>> wc->restarted = test_bit(BTRFS_ROOT_DEAD_TREE, &root->state);
>> wc->level = level;
>> wc->shared_level = -1;
>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>> index 14d393a5853d..4dfeed998c54 100644
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -4953,3 +4953,17 @@ int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info,
>> spin_unlock(&fs_info->qgroup_lock);
>> return ret;
>> }
>> +
>> +void btrfs_qgroup_check_subvol_drop(struct btrfs_fs_info *fs_info, u8 level)
>> +{
>> + u8 drop_subtree_thres;
>> +
>> + if (!btrfs_qgroup_full_accounting(fs_info))
>> + return;
>> + spin_lock(&fs_info->qgroup_lock);
>> + drop_subtree_thres = fs_info->qgroup_drop_subtree_thres;
>> + spin_unlock(&fs_info->qgroup_lock);
>> +
>> + if (level >= drop_subtree_thres)
>> + qgroup_mark_inconsistent(fs_info, "subvolume level reached threshold");
>> +}
>> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
>> index a979fd59a4da..785ed16f5cc4 100644
>> --- a/fs/btrfs/qgroup.h
>> +++ b/fs/btrfs/qgroup.h
>> @@ -453,5 +453,6 @@ void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans);
>> bool btrfs_check_quota_leak(const struct btrfs_fs_info *fs_info);
>> int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info,
>> const struct btrfs_squota_delta *delta);
>> +void btrfs_qgroup_check_subvol_drop(struct btrfs_fs_info *fs_info, u8 level);
>>
>> #endif
>> --
>> 2.52.0
>>
>>
>