* [PATCH 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements
@ 2024-09-25 10:50 fdmanana
  2024-09-25 10:50 ` [PATCH 1/8] btrfs: fix missing error handling when adding delayed ref with qgroups enabled fdmanana
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: fdmanana @ 2024-09-25 10:50 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Some fixes around delayed refs and qgroups after the conversion of a
red-black tree to an xarray in this merge window, plus some improvements
and cleanups. Details in the changelogs.

Filipe Manana (8):
  btrfs: fix missing error handling when adding delayed ref with qgroups enabled
  btrfs: use sector numbers as keys for the dirty extents xarray
  btrfs: end assignment with semicolon at btrfs_qgroup_extent event class
  btrfs: qgroups: remove bytenr field from struct btrfs_qgroup_extent_record
  btrfs: store fs_info in a local variable at btrfs_qgroup_trace_extent_post()
  btrfs: remove unnecessary delayed refs locking at btrfs_qgroup_trace_extent()
  btrfs: always use delayed_refs local variable at btrfs_qgroup_trace_extent()
  btrfs: remove pointless initialization at btrfs_qgroup_trace_extent()

 fs/btrfs/delayed-ref.c       | 59 ++++++++++++++++++++++++++----------
 fs/btrfs/delayed-ref.h       | 10 +++++-
 fs/btrfs/qgroup.c            | 58 +++++++++++++++++------------------
 fs/btrfs/qgroup.h            | 13 ++++++--
 include/trace/events/btrfs.h | 17 ++++++-----
 5 files changed, 101 insertions(+), 56 deletions(-)

-- 
2.43.0



* [PATCH 1/8] btrfs: fix missing error handling when adding delayed ref with qgroups enabled
  2024-09-25 10:50 [PATCH 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements fdmanana
@ 2024-09-25 10:50 ` fdmanana
  2024-09-25 10:50 ` [PATCH 2/8] btrfs: use sector numbers as keys for the dirty extents xarray fdmanana
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: fdmanana @ 2024-09-25 10:50 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
if we fail to insert the qgroup record we don't error out, we ignore it.
In fact we treat it as if there was no error and a record already
existed - we don't distinguish between the case where
btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
existed and we can free the given record, and the case where it returns
a negative error value, meaning the insertion into the xarray that is
used to track records failed.

Effectively we end up ignoring that we are lacking a qgroup record in the
dirty extents xarray, resulting in incorrect qgroup accounting.

Fix this by checking for errors and returning them to the callers.
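
For reference, this is the return convention of
btrfs_qgroup_trace_extent_nolock() that the fix has to distinguish (a
paraphrase of the function's documented behavior, not verbatim kernel
source):

  /*
   * btrfs_qgroup_trace_extent_nolock() returns:
   *   0  - the record was inserted into the dirty extents xarray
   *   1  - a record for this extent already existed, the caller can
   *        safely free the given record
   *  <0  - insertion into the xarray failed, the caller must handle
   *        the error and free the record itself
   */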

Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/delayed-ref.c | 42 +++++++++++++++++++++++++++++++++---------
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ad9ef8312e41..32f719b9e661 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -840,6 +840,8 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref,
  * helper function to actually insert a head node into the rbtree.
  * this does all the dirty work in terms of maintaining the correct
  * overall modification count.
+ *
+ * Returns an error pointer in case of an error.
  */
 static noinline struct btrfs_delayed_ref_head *
 add_delayed_ref_head(struct btrfs_trans_handle *trans,
@@ -862,6 +864,9 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		if (ret) {
 			/* Clean up if insertion fails or item exists. */
 			xa_release(&delayed_refs->dirty_extents, qrecord->bytenr);
+			/* Caller responsible for freeing qrecord on error. */
+			if (ret < 0)
+				return ERR_PTR(ret);
 			kfree(qrecord);
 		} else {
 			qrecord_inserted = true;
@@ -1000,27 +1005,35 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_delayed_ref_node *node;
 	struct btrfs_delayed_ref_head *head_ref;
+	struct btrfs_delayed_ref_head *new_head_ref;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_qgroup_extent_record *record = NULL;
 	bool qrecord_inserted;
 	int action = generic_ref->action;
 	bool merged;
+	int ret;
 
 	node = kmem_cache_alloc(btrfs_delayed_ref_node_cachep, GFP_NOFS);
 	if (!node)
 		return -ENOMEM;
 
 	head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, GFP_NOFS);
-	if (!head_ref)
+	if (!head_ref) {
+		ret = -ENOMEM;
 		goto free_node;
+	}
 
 	if (btrfs_qgroup_full_accounting(fs_info) && !generic_ref->skip_qgroup) {
 		record = kzalloc(sizeof(*record), GFP_NOFS);
-		if (!record)
+		if (!record) {
+			ret = -ENOMEM;
 			goto free_head_ref;
+		}
 		if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents,
-			       generic_ref->bytenr, GFP_NOFS))
+			       generic_ref->bytenr, GFP_NOFS)) {
+			ret = -ENOMEM;
 			goto free_record;
+		}
 	}
 
 	init_delayed_ref_common(fs_info, node, generic_ref);
@@ -1034,8 +1047,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
 	 * insert both the head node and the new ref without dropping
 	 * the spin lock
 	 */
-	head_ref = add_delayed_ref_head(trans, head_ref, record,
-					action, &qrecord_inserted);
+	new_head_ref = add_delayed_ref_head(trans, head_ref, record,
+					    action, &qrecord_inserted);
+	if (IS_ERR(new_head_ref)) {
+		spin_unlock(&delayed_refs->lock);
+		ret = PTR_ERR(new_head_ref);
+		goto free_record;
+	}
+	head_ref = new_head_ref;
 
 	merged = insert_delayed_ref(trans, head_ref, node);
 	spin_unlock(&delayed_refs->lock);
@@ -1063,7 +1082,7 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
 	kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
 free_node:
 	kmem_cache_free(btrfs_delayed_ref_node_cachep, node);
-	return -ENOMEM;
+	return ret;
 }
 
 /*
@@ -1094,6 +1113,7 @@ int btrfs_add_delayed_extent_op(struct btrfs_trans_handle *trans,
 				struct btrfs_delayed_extent_op *extent_op)
 {
 	struct btrfs_delayed_ref_head *head_ref;
+	struct btrfs_delayed_ref_head *head_ref_ret;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_ref generic_ref = {
 		.type = BTRFS_REF_METADATA,
@@ -1113,11 +1133,15 @@ int btrfs_add_delayed_extent_op(struct btrfs_trans_handle *trans,
 	delayed_refs = &trans->transaction->delayed_refs;
 	spin_lock(&delayed_refs->lock);
 
-	add_delayed_ref_head(trans, head_ref, NULL, BTRFS_UPDATE_DELAYED_HEAD,
-			     NULL);
-
+	head_ref_ret = add_delayed_ref_head(trans, head_ref, NULL,
+					    BTRFS_UPDATE_DELAYED_HEAD, NULL);
 	spin_unlock(&delayed_refs->lock);
 
+	if (IS_ERR(head_ref_ret)) {
+		kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
+		return PTR_ERR(head_ref_ret);
+	}
+
 	/*
 	 * Need to update the delayed_refs_rsv with any changes we may have
 	 * made.
-- 
2.43.0



* [PATCH 2/8] btrfs: use sector numbers as keys for the dirty extents xarray
  2024-09-25 10:50 [PATCH 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements fdmanana
  2024-09-25 10:50 ` [PATCH 1/8] btrfs: fix missing error handling when adding delayed ref with qgroups enabled fdmanana
@ 2024-09-25 10:50 ` fdmanana
  2024-09-25 22:25   ` Qu Wenruo
  2024-09-25 10:50 ` [PATCH 3/8] btrfs: end assignment with semicolon at btrfs_qgroup_extent event class fdmanana
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: fdmanana @ 2024-09-25 10:50 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

We are using the logical address ("bytenr") of an extent as the key for
qgroup records in the dirty extents xarray. This is a problem because
xarrays use "unsigned long" for keys/indices, meaning that on a 32-bit
platform any extent starting at or beyond 4G has its key truncated, which
is far too low a limit as virtually everyone is using storage with more
than 4G of space. This means a "bytenr" of 4G gets truncated to 0, and so
do 8G and 16G for example, resulting in incorrect qgroup accounting.

Fix this by using sector numbers as keys instead, that is, using keys that
match the logical address right shifted by fs_info->sectorsize_bits, which
is what we do for the fs_info->buffer_radix that tracks extent buffers
(radix trees also use an "unsigned long" type for keys). This also makes
the index space denser, which helps optimize the xarray (as mentioned in
Documentation/core-api/xarray.rst).

Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/delayed-ref.c | 13 ++++++++-----
 fs/btrfs/delayed-ref.h | 10 +++++++++-
 fs/btrfs/qgroup.c      | 11 ++++++-----
 3 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 32f719b9e661..f075ac11e51c 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -849,6 +849,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		     struct btrfs_qgroup_extent_record *qrecord,
 		     int action, bool *qrecord_inserted_ret)
 {
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_delayed_ref_head *existing;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	bool qrecord_inserted = false;
@@ -859,11 +860,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 	if (qrecord) {
 		int ret;
 
-		ret = btrfs_qgroup_trace_extent_nolock(trans->fs_info,
+		ret = btrfs_qgroup_trace_extent_nolock(fs_info,
 						       delayed_refs, qrecord);
 		if (ret) {
 			/* Clean up if insertion fails or item exists. */
-			xa_release(&delayed_refs->dirty_extents, qrecord->bytenr);
+			xa_release(&delayed_refs->dirty_extents,
+				   qrecord->bytenr >> fs_info->sectorsize_bits);
 			/* Caller responsible for freeing qrecord on error. */
 			if (ret < 0)
 				return ERR_PTR(ret);
@@ -873,7 +875,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		}
 	}
 
-	trace_add_delayed_ref_head(trans->fs_info, head_ref, action);
+	trace_add_delayed_ref_head(fs_info, head_ref, action);
 
 	existing = htree_insert(&delayed_refs->href_root,
 				&head_ref->href_node);
@@ -895,7 +897,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		if (head_ref->is_data && head_ref->ref_mod < 0) {
 			delayed_refs->pending_csums += head_ref->num_bytes;
 			trans->delayed_ref_csum_deletions +=
-				btrfs_csum_bytes_to_leaves(trans->fs_info,
+				btrfs_csum_bytes_to_leaves(fs_info,
 							   head_ref->num_bytes);
 		}
 		delayed_refs->num_heads++;
@@ -1030,7 +1032,8 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
 			goto free_head_ref;
 		}
 		if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents,
-			       generic_ref->bytenr, GFP_NOFS)) {
+			       generic_ref->bytenr >> fs_info->sectorsize_bits,
+			       GFP_NOFS)) {
 			ret = -ENOMEM;
 			goto free_record;
 		}
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 085f30968aba..352921e76c74 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -202,7 +202,15 @@ struct btrfs_delayed_ref_root {
 	/* head ref rbtree */
 	struct rb_root_cached href_root;
 
-	/* Track dirty extent records. */
+	/*
+	 * Track dirty extent records.
+	 * The keys correspond to the logical address of the extent ("bytenr")
+	 * right shifted by fs_info->sectorsize_bits. This is both to get a more
+	 * dense index space (optimizes xarray structure) and because indexes in
+	 * xarrays are of "unsigned long" type, meaning they are 32 bits wide on
+	 * 32 bits platforms, limiting the extent range to 4G which is too low
+	 * and makes it unusable (truncated index values) on 32 bits platforms.
+	 */
 	struct xarray dirty_extents;
 
 	/* this spin lock protects the rbtree and the entries inside */
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index c297909f1506..a76e4610fe80 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2005,7 +2005,7 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
 				struct btrfs_qgroup_extent_record *record)
 {
 	struct btrfs_qgroup_extent_record *existing, *ret;
-	unsigned long bytenr = record->bytenr;
+	const unsigned long index = (record->bytenr >> fs_info->sectorsize_bits);
 
 	if (!btrfs_qgroup_full_accounting(fs_info))
 		return 1;
@@ -2014,7 +2014,7 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
 	trace_btrfs_qgroup_trace_extent(fs_info, record);
 
 	xa_lock(&delayed_refs->dirty_extents);
-	existing = xa_load(&delayed_refs->dirty_extents, bytenr);
+	existing = xa_load(&delayed_refs->dirty_extents, index);
 	if (existing) {
 		if (record->data_rsv && !existing->data_rsv) {
 			existing->data_rsv = record->data_rsv;
@@ -2024,7 +2024,7 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
 		return 1;
 	}
 
-	ret = __xa_store(&delayed_refs->dirty_extents, record->bytenr, record, GFP_ATOMIC);
+	ret = __xa_store(&delayed_refs->dirty_extents, index, record, GFP_ATOMIC);
 	xa_unlock(&delayed_refs->dirty_extents);
 	if (xa_is_err(ret)) {
 		qgroup_mark_inconsistent(fs_info);
@@ -2129,6 +2129,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_qgroup_extent_record *record;
 	struct btrfs_delayed_ref_root *delayed_refs;
+	const unsigned long index = (bytenr >> fs_info->sectorsize_bits);
 	int ret;
 
 	if (!btrfs_qgroup_full_accounting(fs_info) || bytenr == 0 || num_bytes == 0)
@@ -2137,7 +2138,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	if (!record)
 		return -ENOMEM;
 
-	if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents, bytenr, GFP_NOFS)) {
+	if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents, index, GFP_NOFS)) {
 		kfree(record);
 		return -ENOMEM;
 	}
@@ -2152,7 +2153,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	spin_unlock(&delayed_refs->lock);
 	if (ret) {
 		/* Clean up if insertion fails or item exists. */
-		xa_release(&delayed_refs->dirty_extents, record->bytenr);
+		xa_release(&delayed_refs->dirty_extents, index);
 		kfree(record);
 		return 0;
 	}
-- 
2.43.0



* [PATCH 3/8] btrfs: end assignment with semicolon at btrfs_qgroup_extent event class
  2024-09-25 10:50 [PATCH 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements fdmanana
  2024-09-25 10:50 ` [PATCH 1/8] btrfs: fix missing error handling when adding delayed ref with qgroups enabled fdmanana
  2024-09-25 10:50 ` [PATCH 2/8] btrfs: use sector numbers as keys for the dirty extents xarray fdmanana
@ 2024-09-25 10:50 ` fdmanana
  2024-09-25 10:50 ` [PATCH 4/8] btrfs: qgroups: remove bytenr field from struct btrfs_qgroup_extent_record fdmanana
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: fdmanana @ 2024-09-25 10:50 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

While running checkpatch against a patch that modifies the
btrfs_qgroup_extent event class, it complained about using a comma instead
of a semicolon:

  $ ./scripts/checkpatch.pl qgroups/0003-btrfs-qgroups-remove-bytenr-field-from-struct-btrfs_.patch
  WARNING: Possible comma where semicolon could be used
  #215: FILE: include/trace/events/btrfs.h:1720:
  +		__entry->bytenr		= bytenr,
		__entry->num_bytes	= rec->num_bytes;

  total: 0 errors, 1 warnings, 184 lines checked

So replace the comma with a semicolon to silence checkpatch and possibly
other tools. It also makes the code consistent with the rest.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 include/trace/events/btrfs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index bf60ad50011e..af6b3827fb1d 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1716,7 +1716,7 @@ DECLARE_EVENT_CLASS(btrfs_qgroup_extent,
 	),
 
 	TP_fast_assign_btrfs(fs_info,
-		__entry->bytenr		= rec->bytenr,
+		__entry->bytenr		= rec->bytenr;
 		__entry->num_bytes	= rec->num_bytes;
 	),
 
-- 
2.43.0



* [PATCH 4/8] btrfs: qgroups: remove bytenr field from struct btrfs_qgroup_extent_record
  2024-09-25 10:50 [PATCH 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements fdmanana
                   ` (2 preceding siblings ...)
  2024-09-25 10:50 ` [PATCH 3/8] btrfs: end assignment with semicolon at btrfs_qgroup_extent event class fdmanana
@ 2024-09-25 10:50 ` fdmanana
  2024-09-25 10:50 ` [PATCH 5/8] btrfs: store fs_info in a local variable at btrfs_qgroup_trace_extent_post() fdmanana
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: fdmanana @ 2024-09-25 10:50 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Now that we track qgroup extent records in an xarray we don't need to have
a "bytenr" field in struct btrfs_qgroup_extent_record, since we can get
it from the index of the record in the xarray.

So remove the field and grab the bytenr from either the index key or any
other place where it's available (delayed refs). This reduces the size of
struct btrfs_qgroup_extent_record from 40 bytes down to 32 bytes, meaning
that we can now store 128 instances of this structure per 4K page instead
of 102.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/delayed-ref.c       |  8 ++++----
 fs/btrfs/qgroup.c            | 27 +++++++++++++++------------
 fs/btrfs/qgroup.h            | 13 ++++++++++---
 include/trace/events/btrfs.h | 17 ++++++++++-------
 4 files changed, 39 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index f075ac11e51c..388d4ed1938e 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -830,7 +830,6 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref,
 			qrecord->data_rsv = reserved;
 			qrecord->data_rsv_refroot = generic_ref->ref_root;
 		}
-		qrecord->bytenr = generic_ref->bytenr;
 		qrecord->num_bytes = generic_ref->num_bytes;
 		qrecord->old_roots = NULL;
 	}
@@ -861,11 +860,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		int ret;
 
 		ret = btrfs_qgroup_trace_extent_nolock(fs_info,
-						       delayed_refs, qrecord);
+						       delayed_refs, qrecord,
+						       head_ref->bytenr);
 		if (ret) {
 			/* Clean up if insertion fails or item exists. */
 			xa_release(&delayed_refs->dirty_extents,
-				   qrecord->bytenr >> fs_info->sectorsize_bits);
+				   head_ref->bytenr >> fs_info->sectorsize_bits);
 			/* Caller responsible for freeing qrecord on error. */
 			if (ret < 0)
 				return ERR_PTR(ret);
@@ -1076,7 +1076,7 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
 		kmem_cache_free(btrfs_delayed_ref_node_cachep, node);
 
 	if (qrecord_inserted)
-		return btrfs_qgroup_trace_extent_post(trans, record);
+		return btrfs_qgroup_trace_extent_post(trans, record, head_ref->bytenr);
 	return 0;
 
 free_record:
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index a76e4610fe80..4eaeea5f2241 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2001,17 +2001,18 @@ int btrfs_limit_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid,
  * Return <0 for insertion failure, caller can free @record safely.
  */
 int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
-				struct btrfs_delayed_ref_root *delayed_refs,
-				struct btrfs_qgroup_extent_record *record)
+				     struct btrfs_delayed_ref_root *delayed_refs,
+				     struct btrfs_qgroup_extent_record *record,
+				     u64 bytenr)
 {
 	struct btrfs_qgroup_extent_record *existing, *ret;
-	const unsigned long index = (record->bytenr >> fs_info->sectorsize_bits);
+	const unsigned long index = (bytenr >> fs_info->sectorsize_bits);
 
 	if (!btrfs_qgroup_full_accounting(fs_info))
 		return 1;
 
 	lockdep_assert_held(&delayed_refs->lock);
-	trace_btrfs_qgroup_trace_extent(fs_info, record);
+	trace_btrfs_qgroup_trace_extent(fs_info, record, bytenr);
 
 	xa_lock(&delayed_refs->dirty_extents);
 	existing = xa_load(&delayed_refs->dirty_extents, index);
@@ -2056,7 +2057,8 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
  * transaction committing, but not now as qgroup accounting will be wrong again.
  */
 int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans,
-				   struct btrfs_qgroup_extent_record *qrecord)
+				   struct btrfs_qgroup_extent_record *qrecord,
+				   u64 bytenr)
 {
 	struct btrfs_backref_walk_ctx ctx = { 0 };
 	int ret;
@@ -2087,7 +2089,7 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans,
 	if (trans->fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING)
 		return 0;
 
-	ctx.bytenr = qrecord->bytenr;
+	ctx.bytenr = bytenr;
 	ctx.fs_info = trans->fs_info;
 
 	ret = btrfs_find_all_roots(&ctx, true);
@@ -2144,12 +2146,11 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	}
 
 	delayed_refs = &trans->transaction->delayed_refs;
-	record->bytenr = bytenr;
 	record->num_bytes = num_bytes;
 	record->old_roots = NULL;
 
 	spin_lock(&delayed_refs->lock);
-	ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, record);
+	ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, record, bytenr);
 	spin_unlock(&delayed_refs->lock);
 	if (ret) {
 		/* Clean up if insertion fails or item exists. */
@@ -2157,7 +2158,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 		kfree(record);
 		return 0;
 	}
-	return btrfs_qgroup_trace_extent_post(trans, record);
+	return btrfs_qgroup_trace_extent_post(trans, record, bytenr);
 }
 
 /*
@@ -3033,14 +3034,16 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans)
 	delayed_refs = &trans->transaction->delayed_refs;
 	qgroup_to_skip = delayed_refs->qgroup_to_skip;
 	xa_for_each(&delayed_refs->dirty_extents, index, record) {
+		const u64 bytenr = (index << fs_info->sectorsize_bits);
+
 		num_dirty_extents++;
-		trace_btrfs_qgroup_account_extents(fs_info, record);
+		trace_btrfs_qgroup_account_extents(fs_info, record, bytenr);
 
 		if (!ret && !(fs_info->qgroup_flags &
 			      BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING)) {
 			struct btrfs_backref_walk_ctx ctx = { 0 };
 
-			ctx.bytenr = record->bytenr;
+			ctx.bytenr = bytenr;
 			ctx.fs_info = fs_info;
 
 			/*
@@ -3082,7 +3085,7 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans)
 				ulist_del(record->old_roots, qgroup_to_skip,
 					  0);
 			}
-			ret = btrfs_qgroup_account_extent(trans, record->bytenr,
+			ret = btrfs_qgroup_account_extent(trans, bytenr,
 							  record->num_bytes,
 							  record->old_roots,
 							  new_roots);
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 98adf4ec7b01..836e9f59ec84 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -125,7 +125,12 @@ struct btrfs_inode;
  * Record a dirty extent, and info qgroup to update quota on it
  */
 struct btrfs_qgroup_extent_record {
-	u64 bytenr;
+	/*
+	 * The bytenr of the extent is given by its index in the dirty_extents
+	 * xarray of struct btrfs_delayed_ref_root left shifted by
+	 * fs_info->sectorsize_bits.
+	 */
+
 	u64 num_bytes;
 
 	/*
@@ -343,9 +348,11 @@ void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info);
 int btrfs_qgroup_trace_extent_nolock(
 		struct btrfs_fs_info *fs_info,
 		struct btrfs_delayed_ref_root *delayed_refs,
-		struct btrfs_qgroup_extent_record *record);
+		struct btrfs_qgroup_extent_record *record,
+		u64 bytenr);
 int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans,
-				   struct btrfs_qgroup_extent_record *qrecord);
+				   struct btrfs_qgroup_extent_record *qrecord,
+				   u64 bytenr);
 int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 			      u64 num_bytes);
 int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index af6b3827fb1d..8d2ff32fb3b0 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1706,9 +1706,10 @@ DEFINE_EVENT(btrfs__qgroup_rsv_data, btrfs_qgroup_release_data,
 
 DECLARE_EVENT_CLASS(btrfs_qgroup_extent,
 	TP_PROTO(const struct btrfs_fs_info *fs_info,
-		 const struct btrfs_qgroup_extent_record *rec),
+		 const struct btrfs_qgroup_extent_record *rec,
+		 u64 bytenr),
 
-	TP_ARGS(fs_info, rec),
+	TP_ARGS(fs_info, rec, bytenr),
 
 	TP_STRUCT__entry_btrfs(
 		__field(	u64,  bytenr		)
@@ -1716,7 +1717,7 @@ DECLARE_EVENT_CLASS(btrfs_qgroup_extent,
 	),
 
 	TP_fast_assign_btrfs(fs_info,
-		__entry->bytenr		= rec->bytenr;
+		__entry->bytenr		= bytenr;
 		__entry->num_bytes	= rec->num_bytes;
 	),
 
@@ -1727,17 +1728,19 @@ DECLARE_EVENT_CLASS(btrfs_qgroup_extent,
 DEFINE_EVENT(btrfs_qgroup_extent, btrfs_qgroup_account_extents,
 
 	TP_PROTO(const struct btrfs_fs_info *fs_info,
-		 const struct btrfs_qgroup_extent_record *rec),
+		 const struct btrfs_qgroup_extent_record *rec,
+		 u64 bytenr),
 
-	TP_ARGS(fs_info, rec)
+	TP_ARGS(fs_info, rec, bytenr)
 );
 
 DEFINE_EVENT(btrfs_qgroup_extent, btrfs_qgroup_trace_extent,
 
 	TP_PROTO(const struct btrfs_fs_info *fs_info,
-		 const struct btrfs_qgroup_extent_record *rec),
+		 const struct btrfs_qgroup_extent_record *rec,
+		 u64 bytenr),
 
-	TP_ARGS(fs_info, rec)
+	TP_ARGS(fs_info, rec, bytenr)
 );
 
 TRACE_EVENT(qgroup_num_dirty_extents,
-- 
2.43.0



* [PATCH 5/8] btrfs: store fs_info in a local variable at btrfs_qgroup_trace_extent_post()
  2024-09-25 10:50 [PATCH 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements fdmanana
                   ` (3 preceding siblings ...)
  2024-09-25 10:50 ` [PATCH 4/8] btrfs: qgroups: remove bytenr field from struct btrfs_qgroup_extent_record fdmanana
@ 2024-09-25 10:50 ` fdmanana
  2024-09-25 10:50 ` [PATCH 6/8] btrfs: remove unnecessary delayed refs locking at btrfs_qgroup_trace_extent() fdmanana
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: fdmanana @ 2024-09-25 10:50 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Instead of extracting fs_info from the transaction multiple times, store
it in a local variable and use it.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/qgroup.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 4eaeea5f2241..8f9f2828c46e 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2060,10 +2060,14 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans,
 				   struct btrfs_qgroup_extent_record *qrecord,
 				   u64 bytenr)
 {
-	struct btrfs_backref_walk_ctx ctx = { 0 };
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_backref_walk_ctx ctx = {
+		.bytenr = bytenr,
+		.fs_info = fs_info,
+	};
 	int ret;
 
-	if (!btrfs_qgroup_full_accounting(trans->fs_info))
+	if (!btrfs_qgroup_full_accounting(fs_info))
 		return 0;
 	/*
 	 * We are always called in a context where we are already holding a
@@ -2086,16 +2090,13 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans,
 	 */
 	ASSERT(trans != NULL);
 
-	if (trans->fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING)
+	if (fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING)
 		return 0;
 
-	ctx.bytenr = bytenr;
-	ctx.fs_info = trans->fs_info;
-
 	ret = btrfs_find_all_roots(&ctx, true);
 	if (ret < 0) {
-		qgroup_mark_inconsistent(trans->fs_info);
-		btrfs_warn(trans->fs_info,
+		qgroup_mark_inconsistent(fs_info);
+		btrfs_warn(fs_info,
 "error accounting new delayed refs extent (err code: %d), quota inconsistent",
 			ret);
 		return 0;
-- 
2.43.0



* [PATCH 6/8] btrfs: remove unnecessary delayed refs locking at btrfs_qgroup_trace_extent()
  2024-09-25 10:50 [PATCH 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements fdmanana
                   ` (4 preceding siblings ...)
  2024-09-25 10:50 ` [PATCH 5/8] btrfs: store fs_info in a local variable at btrfs_qgroup_trace_extent_post() fdmanana
@ 2024-09-25 10:50 ` fdmanana
  2024-09-25 10:50 ` [PATCH 7/8] btrfs: always use delayed_refs local variable " fdmanana
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: fdmanana @ 2024-09-25 10:50 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

There's no need to hold the delayed refs spinlock when calling
btrfs_qgroup_trace_extent_nolock() from btrfs_qgroup_trace_extent(), since
it doesn't change anything in delayed refs and it only changes the xarray
used to track qgroup extent records, which is protected by the xarray's
lock.

Holding the lock only adds unnecessary lock contention with other tasks
that actually need to take the lock to add/remove/change delayed
references. So remove the locking.
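
The xarray serializes its own updates - btrfs_qgroup_trace_extent_nolock()
already takes the xarray's internal lock around both the lookup and the
store, as in the existing code:

  xa_lock(&delayed_refs->dirty_extents);
  existing = xa_load(&delayed_refs->dirty_extents, index);
  /* ... merge into the existing record, if any ... */
  ret = __xa_store(&delayed_refs->dirty_extents, index, record, GFP_ATOMIC);
  xa_unlock(&delayed_refs->dirty_extents);

so also taking the outer delayed_refs->lock protects nothing extra here.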

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/qgroup.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 8f9f2828c46e..c847b3223e7f 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2011,7 +2011,6 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
 	if (!btrfs_qgroup_full_accounting(fs_info))
 		return 1;
 
-	lockdep_assert_held(&delayed_refs->lock);
 	trace_btrfs_qgroup_trace_extent(fs_info, record, bytenr);
 
 	xa_lock(&delayed_refs->dirty_extents);
@@ -2150,9 +2149,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	record->num_bytes = num_bytes;
 	record->old_roots = NULL;
 
-	spin_lock(&delayed_refs->lock);
 	ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, record, bytenr);
-	spin_unlock(&delayed_refs->lock);
 	if (ret) {
 		/* Clean up if insertion fails or item exists. */
 		xa_release(&delayed_refs->dirty_extents, index);
-- 
2.43.0



* [PATCH 7/8] btrfs: always use delayed_refs local variable at btrfs_qgroup_trace_extent()
  2024-09-25 10:50 [PATCH 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements fdmanana
                   ` (5 preceding siblings ...)
  2024-09-25 10:50 ` [PATCH 6/8] btrfs: remove unnecessary delayed refs locking at btrfs_qgroup_trace_extent() fdmanana
@ 2024-09-25 10:50 ` fdmanana
  2024-09-25 10:50 ` [PATCH 8/8] btrfs: remove pointless initialization " fdmanana
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: fdmanana @ 2024-09-25 10:50 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Instead of dereferencing the delayed refs root from the transaction
multiple times, store it early in a local variable and then always use
the variable.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/qgroup.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index c847b3223e7f..ce7641fdc900 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2130,7 +2130,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_qgroup_extent_record *record;
-	struct btrfs_delayed_ref_root *delayed_refs;
+	struct btrfs_delayed_ref_root *delayed_refs = &trans->transaction->delayed_refs;
 	const unsigned long index = (bytenr >> fs_info->sectorsize_bits);
 	int ret;
 
@@ -2140,12 +2140,11 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	if (!record)
 		return -ENOMEM;
 
-	if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents, index, GFP_NOFS)) {
+	if (xa_reserve(&delayed_refs->dirty_extents, index, GFP_NOFS)) {
 		kfree(record);
 		return -ENOMEM;
 	}
 
-	delayed_refs = &trans->transaction->delayed_refs;
 	record->num_bytes = num_bytes;
 	record->old_roots = NULL;
 
-- 
2.43.0



* [PATCH 8/8] btrfs: remove pointless initialization at btrfs_qgroup_trace_extent()
  2024-09-25 10:50 [PATCH 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements fdmanana
                   ` (6 preceding siblings ...)
  2024-09-25 10:50 ` [PATCH 7/8] btrfs: always use delayed_refs local variable " fdmanana
@ 2024-09-25 10:50 ` fdmanana
  2024-09-25 22:35 ` [PATCH 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements Qu Wenruo
  2024-09-26  9:33 ` [PATCH v2 " fdmanana
  9 siblings, 0 replies; 25+ messages in thread
From: fdmanana @ 2024-09-25 10:50 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

The qgroup record was allocated with kzalloc(), so it's pointless to set
its old_roots member to NULL. Remove the assignment.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/qgroup.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index ce7641fdc900..9e10c1190df9 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2146,7 +2146,6 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	}
 
 	record->num_bytes = num_bytes;
-	record->old_roots = NULL;
 
 	ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, record, bytenr);
 	if (ret) {
-- 
2.43.0



* Re: [PATCH 2/8] btrfs: use sector numbers as keys for the dirty extents xarray
  2024-09-25 10:50 ` [PATCH 2/8] btrfs: use sector numbers as keys for the dirty extents xarray fdmanana
@ 2024-09-25 22:25   ` Qu Wenruo
  0 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2024-09-25 22:25 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2024/9/25 20:20, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> We are using the logical address ("bytenr") of an extent as the key for
> qgroup records in the dirty extents xarray. This is a problem because the
> xarrays use "unsigned long" for keys/indices, meaning that on a 32 bits
> platform any extent starting at or beyond 4G is truncated, which is a too
> low limitation as virtually everyone is using storage with more than 4G of
> space. This means a "bytenr" of 4G gets truncated to 0, and so does 8G and
> 16G for example, resulting in incorrect qgroup accounting.
>
> Fix this by using sector numbers as keys instead, that is, using keys that
> match the logical address right shifted by fs_info->sectorsize_bits, which
> is what we do for the fs_info->buffer_radix that tracks extent buffers
> (radix trees also use an "unsigned long" type for keys). This also makes
> the index space more dense which helps optimize the xarray (as mentioned
> at Documentation/core-api/xarray.rst).
>
> Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   fs/btrfs/delayed-ref.c | 13 ++++++++-----
>   fs/btrfs/delayed-ref.h | 10 +++++++++-
>   fs/btrfs/qgroup.c      | 11 ++++++-----
>   3 files changed, 23 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index 32f719b9e661..f075ac11e51c 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -849,6 +849,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>   		     struct btrfs_qgroup_extent_record *qrecord,
>   		     int action, bool *qrecord_inserted_ret)
>   {
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
>   	struct btrfs_delayed_ref_head *existing;
>   	struct btrfs_delayed_ref_root *delayed_refs;
>   	bool qrecord_inserted = false;
> @@ -859,11 +860,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>   	if (qrecord) {
>   		int ret;
>
> -		ret = btrfs_qgroup_trace_extent_nolock(trans->fs_info,
> +		ret = btrfs_qgroup_trace_extent_nolock(fs_info,
>   						       delayed_refs, qrecord);
>   		if (ret) {
>   			/* Clean up if insertion fails or item exists. */
> -			xa_release(&delayed_refs->dirty_extents, qrecord->bytenr);
> +			xa_release(&delayed_refs->dirty_extents,
> +				   qrecord->bytenr >> fs_info->sectorsize_bits);
>   			/* Caller responsible for freeing qrecord on error. */
>   			if (ret < 0)
>   				return ERR_PTR(ret);
> @@ -873,7 +875,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>   		}
>   	}
>
> -	trace_add_delayed_ref_head(trans->fs_info, head_ref, action);
> +	trace_add_delayed_ref_head(fs_info, head_ref, action);
>
>   	existing = htree_insert(&delayed_refs->href_root,
>   				&head_ref->href_node);
> @@ -895,7 +897,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>   		if (head_ref->is_data && head_ref->ref_mod < 0) {
>   			delayed_refs->pending_csums += head_ref->num_bytes;
>   			trans->delayed_ref_csum_deletions +=
> -				btrfs_csum_bytes_to_leaves(trans->fs_info,
> +				btrfs_csum_bytes_to_leaves(fs_info,
>   							   head_ref->num_bytes);
>   		}
>   		delayed_refs->num_heads++;
> @@ -1030,7 +1032,8 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
>   			goto free_head_ref;
>   		}
>   		if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents,
> -			       generic_ref->bytenr, GFP_NOFS)) {
> +			       generic_ref->bytenr >> fs_info->sectorsize_bits,
> +			       GFP_NOFS)) {
>   			ret = -ENOMEM;
>   			goto free_record;
>   		}
> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
> index 085f30968aba..352921e76c74 100644
> --- a/fs/btrfs/delayed-ref.h
> +++ b/fs/btrfs/delayed-ref.h
> @@ -202,7 +202,15 @@ struct btrfs_delayed_ref_root {
>   	/* head ref rbtree */
>   	struct rb_root_cached href_root;
>
> -	/* Track dirty extent records. */
> +	/*
> +	 * Track dirty extent records.
> +	 * The keys correspond to the logical address of the extent ("bytenr")
> +	 * right shifted by fs_info->sectorsize_bits. This is both to get a more
> +	 * dense index space (optimizes xarray structure) and because indexes in
> +	 * xarrays are of "unsigned long" type, meaning they are 32 bits wide on
> +	 * 32 bits platforms, limiting the extent range to 4G which is too low
> +	 * and makes it unusable (truncated index values) on 32 bits platforms.
> +	 */
>   	struct xarray dirty_extents;
>
>   	/* this spin lock protects the rbtree and the entries inside */
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index c297909f1506..a76e4610fe80 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2005,7 +2005,7 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
>   				struct btrfs_qgroup_extent_record *record)
>   {
>   	struct btrfs_qgroup_extent_record *existing, *ret;
> -	unsigned long bytenr = record->bytenr;
> +	const unsigned long index = (record->bytenr >> fs_info->sectorsize_bits);

Could we check if record->bytenr >= MAX_LFS_FILESIZE and error out directly?
(With btrfs_err_32bit_limit() called to indicate the problem).

Just like what we did in alloc_extent_buffer().

Otherwise looks good.

Thanks,
Qu

>
>   	if (!btrfs_qgroup_full_accounting(fs_info))
>   		return 1;
> @@ -2014,7 +2014,7 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
>   	trace_btrfs_qgroup_trace_extent(fs_info, record);
>
>   	xa_lock(&delayed_refs->dirty_extents);
> -	existing = xa_load(&delayed_refs->dirty_extents, bytenr);
> +	existing = xa_load(&delayed_refs->dirty_extents, index);
>   	if (existing) {
>   		if (record->data_rsv && !existing->data_rsv) {
>   			existing->data_rsv = record->data_rsv;
> @@ -2024,7 +2024,7 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
>   		return 1;
>   	}
>
> -	ret = __xa_store(&delayed_refs->dirty_extents, record->bytenr, record, GFP_ATOMIC);
> +	ret = __xa_store(&delayed_refs->dirty_extents, index, record, GFP_ATOMIC);
>   	xa_unlock(&delayed_refs->dirty_extents);
>   	if (xa_is_err(ret)) {
>   		qgroup_mark_inconsistent(fs_info);
> @@ -2129,6 +2129,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
>   	struct btrfs_fs_info *fs_info = trans->fs_info;
>   	struct btrfs_qgroup_extent_record *record;
>   	struct btrfs_delayed_ref_root *delayed_refs;
> +	const unsigned long index = (bytenr >> fs_info->sectorsize_bits);
>   	int ret;
>
>   	if (!btrfs_qgroup_full_accounting(fs_info) || bytenr == 0 || num_bytes == 0)
> @@ -2137,7 +2138,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
>   	if (!record)
>   		return -ENOMEM;
>
> -	if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents, bytenr, GFP_NOFS)) {
> +	if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents, index, GFP_NOFS)) {
>   		kfree(record);
>   		return -ENOMEM;
>   	}
> @@ -2152,7 +2153,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
>   	spin_unlock(&delayed_refs->lock);
>   	if (ret) {
>   		/* Clean up if insertion fails or item exists. */
> -		xa_release(&delayed_refs->dirty_extents, record->bytenr);
> +		xa_release(&delayed_refs->dirty_extents, index);
>   		kfree(record);
>   		return 0;
>   	}



* Re: [PATCH 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements
  2024-09-25 10:50 [PATCH 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements fdmanana
                   ` (7 preceding siblings ...)
  2024-09-25 10:50 ` [PATCH 8/8] btrfs: remove pointless initialization " fdmanana
@ 2024-09-25 22:35 ` Qu Wenruo
  2024-09-26  9:33 ` [PATCH v2 " fdmanana
  9 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2024-09-25 22:35 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2024/9/25 20:20, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> Some fixes around delayed refs and qgroups after the conversion of a
> red black tree to xarray in this merge window, and some improvements
> and cleanups. Details in the changelogs.
>
> Filipe Manana (8):
>    btrfs: fix missing error handling when adding delayed ref with qgroups enabled
>    btrfs: use sector numbers as keys for the dirty extents xarray

I'd prefer erroring out rather than continuing if the record->bytenr goes
beyond MAX_LFS_FILESIZE, to be extra safe on 32-bit systems.

Otherwise the remaining patches look good to me.

Thanks for the cleanup, especially since the conversion to xarray
introduces an extra xa_lock for us, meaning we do not need to reuse the
delayed refs lock.

Thanks,
Qu

>    btrfs: end assignment with semicolon at btrfs_qgroup_extent event class
>    btrfs: qgroups: remove bytenr field from struct btrfs_qgroup_extent_record
>    btrfs: store fs_info in a local variable at btrfs_qgroup_trace_extent_post()
>    btrfs: remove unnecessary delayed refs locking at btrfs_qgroup_trace_extent()
>    btrfs: always use delayed_refs local variable at btrfs_qgroup_trace_extent()
>    btrfs: remove pointless initialization at btrfs_qgroup_trace_extent()
>
>   fs/btrfs/delayed-ref.c       | 59 ++++++++++++++++++++++++++----------
>   fs/btrfs/delayed-ref.h       | 10 +++++-
>   fs/btrfs/qgroup.c            | 58 +++++++++++++++++------------------
>   fs/btrfs/qgroup.h            | 13 ++++++--
>   include/trace/events/btrfs.h | 17 ++++++-----
>   5 files changed, 101 insertions(+), 56 deletions(-)
>



* [PATCH v2 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements
  2024-09-25 10:50 [PATCH 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements fdmanana
                   ` (8 preceding siblings ...)
  2024-09-25 22:35 ` [PATCH 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements Qu Wenruo
@ 2024-09-26  9:33 ` fdmanana
  2024-09-26  9:33   ` [PATCH v2 1/8] btrfs: fix missing error handling when adding delayed ref with qgroups enabled fdmanana
                     ` (8 more replies)
  9 siblings, 9 replies; 25+ messages in thread
From: fdmanana @ 2024-09-26  9:33 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Some fixes around delayed refs and qgroups after the conversion of a
red-black tree to an xarray in this merge window, plus some improvements
and cleanups. Details in the changelogs.

V2: Updated patch 2/8 to check for MAX_LFS_FILESIZE and error out.

Filipe Manana (8):
  btrfs: fix missing error handling when adding delayed ref with qgroups enabled
  btrfs: use sector numbers as keys for the dirty extents xarray
  btrfs: end assignment with semicolon at btrfs_qgroup_extent event class
  btrfs: qgroups: remove bytenr field from struct btrfs_qgroup_extent_record
  btrfs: store fs_info in a local variable at btrfs_qgroup_trace_extent_post()
  btrfs: remove unnecessary delayed refs locking at btrfs_qgroup_trace_extent()
  btrfs: always use delayed_refs local variable at btrfs_qgroup_trace_extent()
  btrfs: remove pointless initialization at btrfs_qgroup_trace_extent()

 fs/btrfs/delayed-ref.c       | 59 ++++++++++++++++++++++---------
 fs/btrfs/delayed-ref.h       | 10 +++++-
 fs/btrfs/qgroup.c            | 68 +++++++++++++++++++++---------------
 fs/btrfs/qgroup.h            | 13 +++++--
 include/trace/events/btrfs.h | 17 +++++----
 5 files changed, 111 insertions(+), 56 deletions(-)

-- 
2.43.0



* [PATCH v2 1/8] btrfs: fix missing error handling when adding delayed ref with qgroups enabled
  2024-09-26  9:33 ` [PATCH v2 " fdmanana
@ 2024-09-26  9:33   ` fdmanana
  2024-10-08 19:39     ` Kees Bakker
  2024-09-26  9:33   ` [PATCH v2 2/8] btrfs: use sector numbers as keys for the dirty extents xarray fdmanana
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: fdmanana @ 2024-09-26  9:33 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
if we fail to insert the qgroup record we don't error out, we ignore it.
In fact we treat it as if there was no error and a record already
existed - we don't distinguish between the case where
btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
existed and we can free the given record, and the case where it returns
a negative error value, meaning the insertion into the xarray that is
used to track records failed.

Effectively we end up ignoring that we are lacking a qgroup record in the
dirty extents xarray, resulting in incorrect qgroup accounting.

Fix this by checking for errors and returning them to the callers.

Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/delayed-ref.c | 42 +++++++++++++++++++++++++++++++++---------
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ad9ef8312e41..32f719b9e661 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -840,6 +840,8 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref,
  * helper function to actually insert a head node into the rbtree.
  * this does all the dirty work in terms of maintaining the correct
  * overall modification count.
+ *
+ * Returns an error pointer in case of an error.
  */
 static noinline struct btrfs_delayed_ref_head *
 add_delayed_ref_head(struct btrfs_trans_handle *trans,
@@ -862,6 +864,9 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		if (ret) {
 			/* Clean up if insertion fails or item exists. */
 			xa_release(&delayed_refs->dirty_extents, qrecord->bytenr);
+			/* Caller responsible for freeing qrecord on error. */
+			if (ret < 0)
+				return ERR_PTR(ret);
 			kfree(qrecord);
 		} else {
 			qrecord_inserted = true;
@@ -1000,27 +1005,35 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_delayed_ref_node *node;
 	struct btrfs_delayed_ref_head *head_ref;
+	struct btrfs_delayed_ref_head *new_head_ref;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_qgroup_extent_record *record = NULL;
 	bool qrecord_inserted;
 	int action = generic_ref->action;
 	bool merged;
+	int ret;
 
 	node = kmem_cache_alloc(btrfs_delayed_ref_node_cachep, GFP_NOFS);
 	if (!node)
 		return -ENOMEM;
 
 	head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, GFP_NOFS);
-	if (!head_ref)
+	if (!head_ref) {
+		ret = -ENOMEM;
 		goto free_node;
+	}
 
 	if (btrfs_qgroup_full_accounting(fs_info) && !generic_ref->skip_qgroup) {
 		record = kzalloc(sizeof(*record), GFP_NOFS);
-		if (!record)
+		if (!record) {
+			ret = -ENOMEM;
 			goto free_head_ref;
+		}
 		if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents,
-			       generic_ref->bytenr, GFP_NOFS))
+			       generic_ref->bytenr, GFP_NOFS)) {
+			ret = -ENOMEM;
 			goto free_record;
+		}
 	}
 
 	init_delayed_ref_common(fs_info, node, generic_ref);
@@ -1034,8 +1047,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
 	 * insert both the head node and the new ref without dropping
 	 * the spin lock
 	 */
-	head_ref = add_delayed_ref_head(trans, head_ref, record,
-					action, &qrecord_inserted);
+	new_head_ref = add_delayed_ref_head(trans, head_ref, record,
+					    action, &qrecord_inserted);
+	if (IS_ERR(new_head_ref)) {
+		spin_unlock(&delayed_refs->lock);
+		ret = PTR_ERR(new_head_ref);
+		goto free_record;
+	}
+	head_ref = new_head_ref;
 
 	merged = insert_delayed_ref(trans, head_ref, node);
 	spin_unlock(&delayed_refs->lock);
@@ -1063,7 +1082,7 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
 	kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
 free_node:
 	kmem_cache_free(btrfs_delayed_ref_node_cachep, node);
-	return -ENOMEM;
+	return ret;
 }
 
 /*
@@ -1094,6 +1113,7 @@ int btrfs_add_delayed_extent_op(struct btrfs_trans_handle *trans,
 				struct btrfs_delayed_extent_op *extent_op)
 {
 	struct btrfs_delayed_ref_head *head_ref;
+	struct btrfs_delayed_ref_head *head_ref_ret;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_ref generic_ref = {
 		.type = BTRFS_REF_METADATA,
@@ -1113,11 +1133,15 @@ int btrfs_add_delayed_extent_op(struct btrfs_trans_handle *trans,
 	delayed_refs = &trans->transaction->delayed_refs;
 	spin_lock(&delayed_refs->lock);
 
-	add_delayed_ref_head(trans, head_ref, NULL, BTRFS_UPDATE_DELAYED_HEAD,
-			     NULL);
-
+	head_ref_ret = add_delayed_ref_head(trans, head_ref, NULL,
+					    BTRFS_UPDATE_DELAYED_HEAD, NULL);
 	spin_unlock(&delayed_refs->lock);
 
+	if (IS_ERR(head_ref_ret)) {
+		kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
+		return PTR_ERR(head_ref_ret);
+	}
+
 	/*
 	 * Need to update the delayed_refs_rsv with any changes we may have
 	 * made.
-- 
2.43.0



* [PATCH v2 2/8] btrfs: use sector numbers as keys for the dirty extents xarray
  2024-09-26  9:33 ` [PATCH v2 " fdmanana
  2024-09-26  9:33   ` [PATCH v2 1/8] btrfs: fix missing error handling when adding delayed ref with qgroups enabled fdmanana
@ 2024-09-26  9:33   ` fdmanana
  2024-09-26  9:33   ` [PATCH v2 3/8] btrfs: end assignment with semicolon at btrfs_qgroup_extent event class fdmanana
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: fdmanana @ 2024-09-26  9:33 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

We are using the logical address ("bytenr") of an extent as the key for
qgroup records in the dirty extents xarray. This is a problem because
xarrays use "unsigned long" for keys/indices, meaning that on a 32-bit
platform any extent starting at or beyond 4G has its key truncated, which
is far too low a limit as virtually everyone is using storage with more
than 4G of space. This means a "bytenr" of 4G gets truncated to 0, and so
do 8G and 16G for example, resulting in incorrect qgroup accounting.

Fix this by using sector numbers as keys instead, that is, using keys that
match the logical address right shifted by fs_info->sectorsize_bits, which
is what we do for the fs_info->buffer_radix that tracks extent buffers
(radix trees also use an "unsigned long" type for keys). This also makes
the index space denser, which helps optimize the xarray (as mentioned in
Documentation/core-api/xarray.rst).

Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/delayed-ref.c | 13 ++++++++-----
 fs/btrfs/delayed-ref.h | 10 +++++++++-
 fs/btrfs/qgroup.c      | 21 ++++++++++++++++-----
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 32f719b9e661..f075ac11e51c 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -849,6 +849,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		     struct btrfs_qgroup_extent_record *qrecord,
 		     int action, bool *qrecord_inserted_ret)
 {
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_delayed_ref_head *existing;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	bool qrecord_inserted = false;
@@ -859,11 +860,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 	if (qrecord) {
 		int ret;
 
-		ret = btrfs_qgroup_trace_extent_nolock(trans->fs_info,
+		ret = btrfs_qgroup_trace_extent_nolock(fs_info,
 						       delayed_refs, qrecord);
 		if (ret) {
 			/* Clean up if insertion fails or item exists. */
-			xa_release(&delayed_refs->dirty_extents, qrecord->bytenr);
+			xa_release(&delayed_refs->dirty_extents,
+				   qrecord->bytenr >> fs_info->sectorsize_bits);
 			/* Caller responsible for freeing qrecord on error. */
 			if (ret < 0)
 				return ERR_PTR(ret);
@@ -873,7 +875,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		}
 	}
 
-	trace_add_delayed_ref_head(trans->fs_info, head_ref, action);
+	trace_add_delayed_ref_head(fs_info, head_ref, action);
 
 	existing = htree_insert(&delayed_refs->href_root,
 				&head_ref->href_node);
@@ -895,7 +897,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		if (head_ref->is_data && head_ref->ref_mod < 0) {
 			delayed_refs->pending_csums += head_ref->num_bytes;
 			trans->delayed_ref_csum_deletions +=
-				btrfs_csum_bytes_to_leaves(trans->fs_info,
+				btrfs_csum_bytes_to_leaves(fs_info,
 							   head_ref->num_bytes);
 		}
 		delayed_refs->num_heads++;
@@ -1030,7 +1032,8 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
 			goto free_head_ref;
 		}
 		if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents,
-			       generic_ref->bytenr, GFP_NOFS)) {
+			       generic_ref->bytenr >> fs_info->sectorsize_bits,
+			       GFP_NOFS)) {
 			ret = -ENOMEM;
 			goto free_record;
 		}
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 085f30968aba..352921e76c74 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -202,7 +202,15 @@ struct btrfs_delayed_ref_root {
 	/* head ref rbtree */
 	struct rb_root_cached href_root;
 
-	/* Track dirty extent records. */
+	/*
+	 * Track dirty extent records.
+	 * The keys correspond to the logical address of the extent ("bytenr")
+	 * right shifted by fs_info->sectorsize_bits. This is both to get a more
+	 * dense index space (optimizes xarray structure) and because indexes in
+	 * xarrays are of "unsigned long" type, meaning they are 32 bits wide on
+	 * 32 bits platforms, limiting the extent range to 4G which is too low
+	 * and makes it unusable (truncated index values) on 32 bits platforms.
+	 */
 	struct xarray dirty_extents;
 
 	/* this spin lock protects the rbtree and the entries inside */
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index c297909f1506..152a43ae62c4 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2005,16 +2005,26 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
 				struct btrfs_qgroup_extent_record *record)
 {
 	struct btrfs_qgroup_extent_record *existing, *ret;
-	unsigned long bytenr = record->bytenr;
+	const unsigned long index = (record->bytenr >> fs_info->sectorsize_bits);
 
 	if (!btrfs_qgroup_full_accounting(fs_info))
 		return 1;
 
+#if BITS_PER_LONG == 32
+	if (record->bytenr >= MAX_LFS_FILESIZE) {
+		btrfs_err_rl(fs_info,
+"qgroup record for extent at %llu is beyond 32bit page cache and xarray index limit",
+			     record->bytenr);
+		btrfs_err_32bit_limit(fs_info);
+		return -EOVERFLOW;
+	}
+#endif
+
 	lockdep_assert_held(&delayed_refs->lock);
 	trace_btrfs_qgroup_trace_extent(fs_info, record);
 
 	xa_lock(&delayed_refs->dirty_extents);
-	existing = xa_load(&delayed_refs->dirty_extents, bytenr);
+	existing = xa_load(&delayed_refs->dirty_extents, index);
 	if (existing) {
 		if (record->data_rsv && !existing->data_rsv) {
 			existing->data_rsv = record->data_rsv;
@@ -2024,7 +2034,7 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
 		return 1;
 	}
 
-	ret = __xa_store(&delayed_refs->dirty_extents, record->bytenr, record, GFP_ATOMIC);
+	ret = __xa_store(&delayed_refs->dirty_extents, index, record, GFP_ATOMIC);
 	xa_unlock(&delayed_refs->dirty_extents);
 	if (xa_is_err(ret)) {
 		qgroup_mark_inconsistent(fs_info);
@@ -2129,6 +2139,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_qgroup_extent_record *record;
 	struct btrfs_delayed_ref_root *delayed_refs;
+	const unsigned long index = (bytenr >> fs_info->sectorsize_bits);
 	int ret;
 
 	if (!btrfs_qgroup_full_accounting(fs_info) || bytenr == 0 || num_bytes == 0)
@@ -2137,7 +2148,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	if (!record)
 		return -ENOMEM;
 
-	if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents, bytenr, GFP_NOFS)) {
+	if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents, index, GFP_NOFS)) {
 		kfree(record);
 		return -ENOMEM;
 	}
@@ -2152,7 +2163,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	spin_unlock(&delayed_refs->lock);
 	if (ret) {
 		/* Clean up if insertion fails or item exists. */
-		xa_release(&delayed_refs->dirty_extents, record->bytenr);
+		xa_release(&delayed_refs->dirty_extents, index);
 		kfree(record);
 		return 0;
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread
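
A minimal sketch of the key mapping this patch introduces; the helper
name qgroup_dirty_extent_index() is hypothetical and does not exist in
the kernel, while sectorsize_bits and the MAX_LFS_FILESIZE limit come
from the patch itself:

  /* Map an extent's logical address to its dirty_extents xarray index. */
  static unsigned long qgroup_dirty_extent_index(const struct btrfs_fs_info *fs_info,
                                                 u64 bytenr)
  {
          /*
           * With a 4K sector size, sectorsize_bits is 12, so bytenr
           * 0x10000 maps to index 0x10, and the reverse mapping is
           * index << sectorsize_bits. On 32-bit platforms an unsigned
           * long index is only 32 bits wide, hence the MAX_LFS_FILESIZE
           * check before any insertion.
           */
          return bytenr >> fs_info->sectorsize_bits;
  }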

* [PATCH v2 3/8] btrfs: end assignment with semicolon at btrfs_qgroup_extent event class
  2024-09-26  9:33 ` [PATCH v2 " fdmanana
  2024-09-26  9:33   ` [PATCH v2 1/8] btrfs: fix missing error handling when adding delayed ref with qgroups enabled fdmanana
  2024-09-26  9:33   ` [PATCH v2 2/8] btrfs: use sector numbers as keys for the dirty extents xarray fdmanana
@ 2024-09-26  9:33   ` fdmanana
  2024-09-26  9:33   ` [PATCH v2 4/8] btrfs: qgroups: remove bytenr field from struct btrfs_qgroup_extent_record fdmanana
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: fdmanana @ 2024-09-26  9:33 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

While running checkpatch against a patch that modifies the
btrfs_qgroup_extent event class, checkpatch complained about a comma being
used where a semicolon was expected:

  $ ./scripts/checkpatch.pl qgroups/0003-btrfs-qgroups-remove-bytenr-field-from-struct-btrfs_.patch
  WARNING: Possible comma where semicolon could be used
  #215: FILE: include/trace/events/btrfs.h:1720:
  +		__entry->bytenr		= bytenr,
		__entry->num_bytes	= rec->num_bytes;

  total: 0 errors, 1 warnings, 184 lines checked

So replace the comma with a semicolon to silence checkpatch and possibly
other tools. It also makes the code consistent with the rest of the
assignments in the event class.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 include/trace/events/btrfs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index bf60ad50011e..af6b3827fb1d 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1716,7 +1716,7 @@ DECLARE_EVENT_CLASS(btrfs_qgroup_extent,
 	),
 
 	TP_fast_assign_btrfs(fs_info,
-		__entry->bytenr		= rec->bytenr,
+		__entry->bytenr		= rec->bytenr;
 		__entry->num_bytes	= rec->num_bytes;
 	),
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread
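
The warning is cosmetic: in C the comma operator turns the two
assignments into a single expression that still evaluates both sides, so
the original line compiled and behaved correctly. A standalone sketch
(plain userspace C, not kernel code) illustrating this:

  #include <stdio.h>

  int main(void)
  {
          int a, b;

          a = 1,  /* comma operator: both assignments still execute, */
          b = 2;  /* but they parse as one expression statement */
          printf("%d %d\n", a, b);  /* prints "1 2" */
          return 0;
  }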

* [PATCH v2 4/8] btrfs: qgroups: remove bytenr field from struct btrfs_qgroup_extent_record
  2024-09-26  9:33 ` [PATCH v2 " fdmanana
                     ` (2 preceding siblings ...)
  2024-09-26  9:33   ` [PATCH v2 3/8] btrfs: end assignment with semicolon at btrfs_qgroup_extent event class fdmanana
@ 2024-09-26  9:33   ` fdmanana
  2024-09-26  9:33   ` [PATCH v2 5/8] btrfs: store fs_info in a local variable at btrfs_qgroup_trace_extent_post() fdmanana
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: fdmanana @ 2024-09-26  9:33 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Now that we track qgroup extent records in an xarray we don't need to have
a "bytenr" field in struct btrfs_qgroup_extent_record, since we can get it
from the index of the record in the xarray.

So remove the field and grab the bytenr either from the record's index in
the xarray or from any other place where it's available (delayed refs).
This reduces the size of struct btrfs_qgroup_extent_record from 40 bytes
down to 32 bytes, meaning that we can now store 128 instances of this
structure instead of 102 per 4K page.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/delayed-ref.c       |  8 ++++----
 fs/btrfs/qgroup.c            | 27 +++++++++++++++------------
 fs/btrfs/qgroup.h            | 13 ++++++++++---
 include/trace/events/btrfs.h | 17 ++++++++++-------
 4 files changed, 39 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index f075ac11e51c..388d4ed1938e 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -830,7 +830,6 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref,
 			qrecord->data_rsv = reserved;
 			qrecord->data_rsv_refroot = generic_ref->ref_root;
 		}
-		qrecord->bytenr = generic_ref->bytenr;
 		qrecord->num_bytes = generic_ref->num_bytes;
 		qrecord->old_roots = NULL;
 	}
@@ -861,11 +860,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		int ret;
 
 		ret = btrfs_qgroup_trace_extent_nolock(fs_info,
-						       delayed_refs, qrecord);
+						       delayed_refs, qrecord,
+						       head_ref->bytenr);
 		if (ret) {
 			/* Clean up if insertion fails or item exists. */
 			xa_release(&delayed_refs->dirty_extents,
-				   qrecord->bytenr >> fs_info->sectorsize_bits);
+				   head_ref->bytenr >> fs_info->sectorsize_bits);
 			/* Caller responsible for freeing qrecord on error. */
 			if (ret < 0)
 				return ERR_PTR(ret);
@@ -1076,7 +1076,7 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
 		kmem_cache_free(btrfs_delayed_ref_node_cachep, node);
 
 	if (qrecord_inserted)
-		return btrfs_qgroup_trace_extent_post(trans, record);
+		return btrfs_qgroup_trace_extent_post(trans, record, head_ref->bytenr);
 	return 0;
 
 free_record:
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 152a43ae62c4..722edb04b78f 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2001,11 +2001,12 @@ int btrfs_limit_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid,
  * Return <0 for insertion failure, caller can free @record safely.
  */
 int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
-				struct btrfs_delayed_ref_root *delayed_refs,
-				struct btrfs_qgroup_extent_record *record)
+				     struct btrfs_delayed_ref_root *delayed_refs,
+				     struct btrfs_qgroup_extent_record *record,
+				     u64 bytenr)
 {
 	struct btrfs_qgroup_extent_record *existing, *ret;
-	const unsigned long index = (record->bytenr >> fs_info->sectorsize_bits);
+	const unsigned long index = (bytenr >> fs_info->sectorsize_bits);
 
 	if (!btrfs_qgroup_full_accounting(fs_info))
 		return 1;
@@ -2021,7 +2022,7 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
 #endif
 
 	lockdep_assert_held(&delayed_refs->lock);
-	trace_btrfs_qgroup_trace_extent(fs_info, record);
+	trace_btrfs_qgroup_trace_extent(fs_info, record, bytenr);
 
 	xa_lock(&delayed_refs->dirty_extents);
 	existing = xa_load(&delayed_refs->dirty_extents, index);
@@ -2066,7 +2067,8 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
  * transaction committing, but not now as qgroup accounting will be wrong again.
  */
 int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans,
-				   struct btrfs_qgroup_extent_record *qrecord)
+				   struct btrfs_qgroup_extent_record *qrecord,
+				   u64 bytenr)
 {
 	struct btrfs_backref_walk_ctx ctx = { 0 };
 	int ret;
@@ -2097,7 +2099,7 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans,
 	if (trans->fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING)
 		return 0;
 
-	ctx.bytenr = qrecord->bytenr;
+	ctx.bytenr = bytenr;
 	ctx.fs_info = trans->fs_info;
 
 	ret = btrfs_find_all_roots(&ctx, true);
@@ -2154,12 +2156,11 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	}
 
 	delayed_refs = &trans->transaction->delayed_refs;
-	record->bytenr = bytenr;
 	record->num_bytes = num_bytes;
 	record->old_roots = NULL;
 
 	spin_lock(&delayed_refs->lock);
-	ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, record);
+	ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, record, bytenr);
 	spin_unlock(&delayed_refs->lock);
 	if (ret) {
 		/* Clean up if insertion fails or item exists. */
@@ -2167,7 +2168,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 		kfree(record);
 		return 0;
 	}
-	return btrfs_qgroup_trace_extent_post(trans, record);
+	return btrfs_qgroup_trace_extent_post(trans, record, bytenr);
 }
 
 /*
@@ -3043,14 +3044,16 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans)
 	delayed_refs = &trans->transaction->delayed_refs;
 	qgroup_to_skip = delayed_refs->qgroup_to_skip;
 	xa_for_each(&delayed_refs->dirty_extents, index, record) {
+		const u64 bytenr = (index << fs_info->sectorsize_bits);
+
 		num_dirty_extents++;
-		trace_btrfs_qgroup_account_extents(fs_info, record);
+		trace_btrfs_qgroup_account_extents(fs_info, record, bytenr);
 
 		if (!ret && !(fs_info->qgroup_flags &
 			      BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING)) {
 			struct btrfs_backref_walk_ctx ctx = { 0 };
 
-			ctx.bytenr = record->bytenr;
+			ctx.bytenr = bytenr;
 			ctx.fs_info = fs_info;
 
 			/*
@@ -3092,7 +3095,7 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans)
 				ulist_del(record->old_roots, qgroup_to_skip,
 					  0);
 			}
-			ret = btrfs_qgroup_account_extent(trans, record->bytenr,
+			ret = btrfs_qgroup_account_extent(trans, bytenr,
 							  record->num_bytes,
 							  record->old_roots,
 							  new_roots);
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 98adf4ec7b01..836e9f59ec84 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -125,7 +125,12 @@ struct btrfs_inode;
  * Record a dirty extent, and info qgroup to update quota on it
  */
 struct btrfs_qgroup_extent_record {
-	u64 bytenr;
+	/*
+	 * The bytenr of the extent is given by its index in the dirty_extents
+	 * xarray of struct btrfs_delayed_ref_root left shifted by
+	 * fs_info->sectorsize_bits.
+	 */
+
 	u64 num_bytes;
 
 	/*
@@ -343,9 +348,11 @@ void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info);
 int btrfs_qgroup_trace_extent_nolock(
 		struct btrfs_fs_info *fs_info,
 		struct btrfs_delayed_ref_root *delayed_refs,
-		struct btrfs_qgroup_extent_record *record);
+		struct btrfs_qgroup_extent_record *record,
+		u64 bytenr);
 int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans,
-				   struct btrfs_qgroup_extent_record *qrecord);
+				   struct btrfs_qgroup_extent_record *qrecord,
+				   u64 bytenr);
 int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 			      u64 num_bytes);
 int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index af6b3827fb1d..8d2ff32fb3b0 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1706,9 +1706,10 @@ DEFINE_EVENT(btrfs__qgroup_rsv_data, btrfs_qgroup_release_data,
 
 DECLARE_EVENT_CLASS(btrfs_qgroup_extent,
 	TP_PROTO(const struct btrfs_fs_info *fs_info,
-		 const struct btrfs_qgroup_extent_record *rec),
+		 const struct btrfs_qgroup_extent_record *rec,
+		 u64 bytenr),
 
-	TP_ARGS(fs_info, rec),
+	TP_ARGS(fs_info, rec, bytenr),
 
 	TP_STRUCT__entry_btrfs(
 		__field(	u64,  bytenr		)
@@ -1716,7 +1717,7 @@ DECLARE_EVENT_CLASS(btrfs_qgroup_extent,
 	),
 
 	TP_fast_assign_btrfs(fs_info,
-		__entry->bytenr		= rec->bytenr;
+		__entry->bytenr		= bytenr;
 		__entry->num_bytes	= rec->num_bytes;
 	),
 
@@ -1727,17 +1728,19 @@ DECLARE_EVENT_CLASS(btrfs_qgroup_extent,
 DEFINE_EVENT(btrfs_qgroup_extent, btrfs_qgroup_account_extents,
 
 	TP_PROTO(const struct btrfs_fs_info *fs_info,
-		 const struct btrfs_qgroup_extent_record *rec),
+		 const struct btrfs_qgroup_extent_record *rec,
+		 u64 bytenr),
 
-	TP_ARGS(fs_info, rec)
+	TP_ARGS(fs_info, rec, bytenr)
 );
 
 DEFINE_EVENT(btrfs_qgroup_extent, btrfs_qgroup_trace_extent,
 
 	TP_PROTO(const struct btrfs_fs_info *fs_info,
-		 const struct btrfs_qgroup_extent_record *rec),
+		 const struct btrfs_qgroup_extent_record *rec,
+		 u64 bytenr),
 
-	TP_ARGS(fs_info, rec)
+	TP_ARGS(fs_info, rec, bytenr)
 );
 
 TRACE_EVENT(qgroup_num_dirty_extents,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread
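
A rough sketch of where the 40 -> 32 byte saving comes from, assuming the
remaining members are the ones visible in this series (num_bytes, the two
data reservation fields and old_roots); the struct names here are
illustrative only:

  struct qrecord_old {                    /* 5 members * 8 bytes = 40 bytes */
          u64 bytenr;
          u64 num_bytes;
          u64 data_rsv;
          u64 data_rsv_refroot;
          struct ulist *old_roots;        /* 8 bytes on 64-bit */
  };

  struct qrecord_new {                    /* 4 members * 8 bytes = 32 bytes */
          u64 num_bytes;
          u64 data_rsv;
          u64 data_rsv_refroot;
          struct ulist *old_roots;
  };

  /* Per 4K page: 4096 / 40 = 102 records before, 4096 / 32 = 128 after. */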

* [PATCH v2 5/8] btrfs: store fs_info in a local variable at btrfs_qgroup_trace_extent_post()
  2024-09-26  9:33 ` [PATCH v2 " fdmanana
                     ` (3 preceding siblings ...)
  2024-09-26  9:33   ` [PATCH v2 4/8] btrfs: qgroups: remove bytenr field from struct btrfs_qgroup_extent_record fdmanana
@ 2024-09-26  9:33   ` fdmanana
  2024-09-26  9:33   ` [PATCH v2 6/8] btrfs: remove unnecessary delayed refs locking at btrfs_qgroup_trace_extent() fdmanana
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: fdmanana @ 2024-09-26  9:33 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Instead of extracting fs_info from the transaction multiple times, store
it in a local variable and use it.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/qgroup.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 722edb04b78f..8d000a9d35af 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2070,10 +2070,14 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans,
 				   struct btrfs_qgroup_extent_record *qrecord,
 				   u64 bytenr)
 {
-	struct btrfs_backref_walk_ctx ctx = { 0 };
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_backref_walk_ctx ctx = {
+		.bytenr = bytenr,
+		.fs_info = fs_info,
+	};
 	int ret;
 
-	if (!btrfs_qgroup_full_accounting(trans->fs_info))
+	if (!btrfs_qgroup_full_accounting(fs_info))
 		return 0;
 	/*
 	 * We are always called in a context where we are already holding a
@@ -2096,16 +2100,13 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans,
 	 */
 	ASSERT(trans != NULL);
 
-	if (trans->fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING)
+	if (fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING)
 		return 0;
 
-	ctx.bytenr = bytenr;
-	ctx.fs_info = trans->fs_info;
-
 	ret = btrfs_find_all_roots(&ctx, true);
 	if (ret < 0) {
-		qgroup_mark_inconsistent(trans->fs_info);
-		btrfs_warn(trans->fs_info,
+		qgroup_mark_inconsistent(fs_info);
+		btrfs_warn(fs_info,
 "error accounting new delayed refs extent (err code: %d), quota inconsistent",
 			ret);
 		return 0;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 6/8] btrfs: remove unnecessary delayed refs locking at btrfs_qgroup_trace_extent()
  2024-09-26  9:33 ` [PATCH v2 " fdmanana
                     ` (4 preceding siblings ...)
  2024-09-26  9:33   ` [PATCH v2 5/8] btrfs: store fs_info in a local variable at btrfs_qgroup_trace_extent_post() fdmanana
@ 2024-09-26  9:33   ` fdmanana
  2024-09-26  9:33   ` [PATCH v2 7/8] btrfs: always use delayed_refs local variable " fdmanana
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: fdmanana @ 2024-09-26  9:33 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

There's no need to hold the delayed refs spinlock when calling
btrfs_qgroup_trace_extent_nolock() from btrfs_qgroup_trace_extent(), since
it doesn't change anything in delayed refs and it only changes the xarray
used to track qgroup extent records, which is protected by the xarray's
lock.

Holding the lock only adds unnecessary contention with other tasks that
actually need to take it to add, remove or change delayed references. So
remove the locking.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/qgroup.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 8d000a9d35af..3574af6d25ef 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2021,7 +2021,6 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
 	}
 #endif
 
-	lockdep_assert_held(&delayed_refs->lock);
 	trace_btrfs_qgroup_trace_extent(fs_info, record, bytenr);
 
 	xa_lock(&delayed_refs->dirty_extents);
@@ -2160,9 +2159,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	record->num_bytes = num_bytes;
 	record->old_roots = NULL;
 
-	spin_lock(&delayed_refs->lock);
 	ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, record, bytenr);
-	spin_unlock(&delayed_refs->lock);
 	if (ret) {
 		/* Clean up if insertion fails or item exists. */
 		xa_release(&delayed_refs->dirty_extents, index);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread
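
The xarray serializes its own modifications: xa_store() takes the
xarray's internal spinlock, and the open-coded sequence in
btrfs_qgroup_trace_extent_nolock() takes it explicitly so that the lookup
and the store are atomic with respect to each other. A minimal sketch of
the pattern the qgroup code relies on, with variable names taken from the
patch:

  xa_lock(&delayed_refs->dirty_extents);
  existing = xa_load(&delayed_refs->dirty_extents, index);
  if (!existing) {
          /* GFP_ATOMIC because the xarray's spinlock is held. */
          __xa_store(&delayed_refs->dirty_extents, index, record, GFP_ATOMIC);
  }
  xa_unlock(&delayed_refs->dirty_extents);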

* [PATCH v2 7/8] btrfs: always use delayed_refs local variable at btrfs_qgroup_trace_extent()
  2024-09-26  9:33 ` [PATCH v2 " fdmanana
                     ` (5 preceding siblings ...)
  2024-09-26  9:33   ` [PATCH v2 6/8] btrfs: remove unnecessary delayed refs locking at btrfs_qgroup_trace_extent() fdmanana
@ 2024-09-26  9:33   ` fdmanana
  2024-09-26  9:33   ` [PATCH v2 8/8] btrfs: remove pointless initialization " fdmanana
  2024-09-26  9:55   ` [PATCH v2 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements Qu Wenruo
  8 siblings, 0 replies; 25+ messages in thread
From: fdmanana @ 2024-09-26  9:33 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Instead of dereferencing the delayed refs from the transaction multiple
times, store it early in a local variable and then always use the
variable.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/qgroup.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 3574af6d25ef..6294f949417c 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2140,7 +2140,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_qgroup_extent_record *record;
-	struct btrfs_delayed_ref_root *delayed_refs;
+	struct btrfs_delayed_ref_root *delayed_refs = &trans->transaction->delayed_refs;
 	const unsigned long index = (bytenr >> fs_info->sectorsize_bits);
 	int ret;
 
@@ -2150,12 +2150,11 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	if (!record)
 		return -ENOMEM;
 
-	if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents, index, GFP_NOFS)) {
+	if (xa_reserve(&delayed_refs->dirty_extents, index, GFP_NOFS)) {
 		kfree(record);
 		return -ENOMEM;
 	}
 
-	delayed_refs = &trans->transaction->delayed_refs;
 	record->num_bytes = num_bytes;
 	record->old_roots = NULL;
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 8/8] btrfs: remove pointless initialization at btrfs_qgroup_trace_extent()
  2024-09-26  9:33 ` [PATCH v2 " fdmanana
                     ` (6 preceding siblings ...)
  2024-09-26  9:33   ` [PATCH v2 7/8] btrfs: always use delayed_refs local variable " fdmanana
@ 2024-09-26  9:33   ` fdmanana
  2024-09-26  9:55   ` [PATCH v2 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements Qu Wenruo
  8 siblings, 0 replies; 25+ messages in thread
From: fdmanana @ 2024-09-26  9:33 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

The qgroup record was allocated with kzalloc(), so it's pointless to set
its old_roots member to NULL. Remove the assignment.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/qgroup.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 6294f949417c..34bb72f32ec4 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2156,7 +2156,6 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	}
 
 	record->num_bytes = num_bytes;
-	record->old_roots = NULL;
 
 	ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, record, bytenr);
 	if (ret) {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread
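
kzalloc() is kmalloc() with __GFP_ZERO, so every byte of the returned
object, including pointer members, is already zero. A minimal sketch of
the allocation pattern in btrfs_qgroup_trace_extent() after this patch:

  record = kzalloc(sizeof(*record), GFP_NOFS);
  if (!record)
          return -ENOMEM;
  /* record->old_roots is already NULL; only non-zero fields need stores. */
  record->num_bytes = num_bytes;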

* Re: [PATCH v2 0/8] btrfs: delayed refs and qgroups, fixes, cleanups, improvements
  2024-09-26  9:33 ` [PATCH v2 " fdmanana
                     ` (7 preceding siblings ...)
  2024-09-26  9:33   ` [PATCH v2 8/8] btrfs: remove pointless initialization " fdmanana
@ 2024-09-26  9:55   ` Qu Wenruo
  8 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2024-09-26  9:55 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2024/9/26 19:03, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> Some fixes around delayed refs and qgroups after the conversion of a
> red black tree to xarray in this merge window, and some improvements
> and cleanups. Details in the changelogs.
>
> V2: Updated patch 2/8 to check for MAX_LFS_FILESIZE and error out.
>
> Filipe Manana (8):
>    btrfs: fix missing error handling when adding delayed ref with qgroups enabled
>    btrfs: use sector numbers as keys for the dirty extents xarray
>    btrfs: end assignment with semicolon at btrfs_qgroup_extent event class
>    btrfs: qgroups: remove bytenr field from struct btrfs_qgroup_extent_record
>    btrfs: store fs_info in a local variable at btrfs_qgroup_trace_extent_post()
>    btrfs: remove unnecessary delayed refs locking at btrfs_qgroup_trace_extent()
>    btrfs: always use delayed_refs local variable at btrfs_qgroup_trace_extent()
>    btrfs: remove pointless initialization at btrfs_qgroup_trace_extent()

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

>
>   fs/btrfs/delayed-ref.c       | 59 ++++++++++++++++++++++---------
>   fs/btrfs/delayed-ref.h       | 10 +++++-
>   fs/btrfs/qgroup.c            | 68 +++++++++++++++++++++---------------
>   fs/btrfs/qgroup.h            | 13 +++++--
>   include/trace/events/btrfs.h | 17 +++++----
>   5 files changed, 111 insertions(+), 56 deletions(-)
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/8] btrfs: fix missing error handling when adding delayed ref with qgroups enabled
  2024-09-26  9:33   ` [PATCH v2 1/8] btrfs: fix missing error handling when adding delayed ref with qgroups enabled fdmanana
@ 2024-10-08 19:39     ` Kees Bakker
  2024-10-08 20:03       ` Filipe Manana
  0 siblings, 1 reply; 25+ messages in thread
From: Kees Bakker @ 2024-10-08 19:39 UTC (permalink / raw)
  To: fdmanana, linux-btrfs

Op 26-09-2024 om 11:33 schreef fdmanana@kernel.org:
> From: Filipe Manana <fdmanana@suse.com>
>
> When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
> if we fail to insert the qgroup record we don't error out, we ignore it.
> In fact we treat it as if there was no error and there was already an
> existing record - we don't distinguish between the cases where
> btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
> existed and we can free the given record, and the case where it returns
> a negative error value, meaning the insertion into the xarray that is
> used to track records failed.
>
> Effectively we end up ignoring that we are lacking qgroup record in the
> dirty extents xarray, resulting in incorrect qgroup accounting.
>
> Fix this by checking for errors and return them to the callers.
>
> Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> [...]
> @@ -1034,8 +1047,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
>   	 * insert both the head node and the new ref without dropping
>   	 * the spin lock
>   	 */
> -	head_ref = add_delayed_ref_head(trans, head_ref, record,
> -					action, &qrecord_inserted);
> +	new_head_ref = add_delayed_ref_head(trans, head_ref, record,
> +					    action, &qrecord_inserted);
> +	if (IS_ERR(new_head_ref)) {
> +		spin_unlock(&delayed_refs->lock);
> +		ret = PTR_ERR(new_head_ref);
> +		goto free_record;
> +	}
> +	head_ref = new_head_ref;
>
There is a chance (not sure how big a chance) that head_ref is going to 
be freed twice.

In the IS_ERR true path, head_ref still has the old value from before
calling add_delayed_ref_head. However, in add_delayed_ref_head it is
possible that head_ref is freed and replaced. If IS_ERR(new_head_ref) is
true, the code jumps to the end of the function where (the old) head_ref
is freed again.

Is it perhaps possible to assign head_ref to the new value before doing 
the IS_ERR check?
In other words, do this:
         head_ref = new_head_ref;
         if (IS_ERR(new_head_ref)) {
                 spin_unlock(&delayed_refs->lock);
                 ret = PTR_ERR(new_head_ref);
                 goto free_record;
         }

-- 
Kees

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/8] btrfs: fix missing error handling when adding delayed ref with qgroups enabled
  2024-10-08 19:39     ` Kees Bakker
@ 2024-10-08 20:03       ` Filipe Manana
  2024-10-08 20:20         ` Kees Bakker
  0 siblings, 1 reply; 25+ messages in thread
From: Filipe Manana @ 2024-10-08 20:03 UTC (permalink / raw)
  To: Kees Bakker; +Cc: linux-btrfs

On Tue, Oct 8, 2024 at 8:39 PM Kees Bakker <kees@ijzerbout.nl> wrote:
>
> Op 26-09-2024 om 11:33 schreef fdmanana@kernel.org:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
> > if we fail to insert the qgroup record we don't error out, we ignore it.
> > In fact we treat it as if there was no error and there was already an
> > existing record - we don't distinguish between the cases where
> > btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
> > existed and we can free the given record, and the case where it returns
> > a negative error value, meaning the insertion into the xarray that is
> > used to track records failed.
> >
> > Effectively we end up ignoring that we are lacking qgroup record in the
> > dirty extents xarray, resulting in incorrect qgroup accounting.
> >
> > Fix this by checking for errors and return them to the callers.
> >
> > Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> > [...]
> > @@ -1034,8 +1047,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
> >        * insert both the head node and the new ref without dropping
> >        * the spin lock
> >        */
> > -     head_ref = add_delayed_ref_head(trans, head_ref, record,
> > -                                     action, &qrecord_inserted);
> > +     new_head_ref = add_delayed_ref_head(trans, head_ref, record,
> > +                                         action, &qrecord_inserted);
> > +     if (IS_ERR(new_head_ref)) {
> > +             spin_unlock(&delayed_refs->lock);
> > +             ret = PTR_ERR(new_head_ref);
> > +             goto free_record;
> > +     }
> > +     head_ref = new_head_ref;
> >
> There is a chance (not sure how big a chance) that head_ref is going to
> be freed twice.
>
> In the IS_ERR true path, head_ref still has the old value from before
> calling add_delayed_ref_head.
> However, in add_delayed_ref_head it is possible that head_ref is freed
> and replaced. If
> IS_ERR(new_head_ref) is true the code jumps to the end of the function
> where (the old) head_ref
> is freed again.

No, it's not possible to have a double free.
add_delayed_ref_head() never frees the given head_ref in case it
returns an error - it's the caller's responsibility to free it.

Thanks.

>
> Is it perhaps possible to assign head_ref to the new value before doing
> the IS_ERR check?
> In other words, do this:
>          head_ref = new_head_ref;
>          if (IS_ERR(new_head_ref)) {
>                  spin_unlock(&delayed_refs->lock);
>                  ret = PTR_ERR(new_head_ref);
>                  goto free_record;
>          }
>
> --
> Kees

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/8] btrfs: fix missing error handling when adding delayed ref with qgroups enabled
  2024-10-08 20:03       ` Filipe Manana
@ 2024-10-08 20:20         ` Kees Bakker
  2024-10-08 20:30           ` Filipe Manana
  0 siblings, 1 reply; 25+ messages in thread
From: Kees Bakker @ 2024-10-08 20:20 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs

Op 08-10-2024 om 22:03 schreef Filipe Manana:
> On Tue, Oct 8, 2024 at 8:39 PM Kees Bakker <kees@ijzerbout.nl> wrote:
>> Op 26-09-2024 om 11:33 schreef fdmanana@kernel.org:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
>>> if we fail to insert the qgroup record we don't error out, we ignore it.
>>> In fact we treat it as if there was no error and there was already an
>>> existing record - we don't distinguish between the cases where
>>> btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
>>> existed and we can free the given record, and the case where it returns
>>> a negative error value, meaning the insertion into the xarray that is
>>> used to track records failed.
>>>
>>> Effectively we end up ignoring that we are lacking qgroup record in the
>>> dirty extents xarray, resulting in incorrect qgroup accounting.
>>>
>>> Fix this by checking for errors and return them to the callers.
>>>
>>> Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>> ---
>>> [...]
>>> @@ -1034,8 +1047,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
>>>         * insert both the head node and the new ref without dropping
>>>         * the spin lock
>>>         */
>>> -     head_ref = add_delayed_ref_head(trans, head_ref, record,
>>> -                                     action, &qrecord_inserted);
>>> +     new_head_ref = add_delayed_ref_head(trans, head_ref, record,
>>> +                                         action, &qrecord_inserted);
>>> +     if (IS_ERR(new_head_ref)) {
>>> +             spin_unlock(&delayed_refs->lock);
>>> +             ret = PTR_ERR(new_head_ref);
>>> +             goto free_record;
>>> +     }
>>> +     head_ref = new_head_ref;
>>>
>> There is a chance (not sure how big a chance) that head_ref is going to
>> be freed twice.
>>
>> In the IS_ERR true path, head_ref still has the old value from before
>> calling add_delayed_ref_head.
> >> However, in add_delayed_ref_head it is possible that head_ref is freed
>> and replaced. If
>> IS_ERR(new_head_ref) is true the code jumps to the end of the function
>> where (the old) head_ref
>> is freed again.
> No, it's not possible to have a double free.
> add_delayed_ref_head() never frees the given head_ref in case it
> returns an error - it's the caller's responsibility to free it.
>
> Thanks.
OK, but ... in add_delayed_ref_head() I see the following on line 881

         existing = htree_insert(&delayed_refs->href_root,
                                 &head_ref->href_node);
         if (existing) {
                 update_existing_head_ref(trans, existing, head_ref);
                 /*
                  * we've updated the existing ref, free the newly
                  * allocated ref
                  */
                 kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
                 head_ref = existing;

It's calling kmem_cache_free() with (the old) head_ref, a free which can
be repeated in add_delayed_ref().
>
>> Is it perhaps possible to assign head_ref to the new value before doing
>> the IS_ERR check?
>> In other words, do this:
>>           head_ref = new_head_ref;
>>           if (IS_ERR(new_head_ref)) {
>>                   spin_unlock(&delayed_refs->lock);
>>                   ret = PTR_ERR(new_head_ref);
>>                   goto free_record;
>>           }
>>
>> --
>> Kees


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/8] btrfs: fix missing error handling when adding delayed ref with qgroups enabled
  2024-10-08 20:20         ` Kees Bakker
@ 2024-10-08 20:30           ` Filipe Manana
  0 siblings, 0 replies; 25+ messages in thread
From: Filipe Manana @ 2024-10-08 20:30 UTC (permalink / raw)
  To: Kees Bakker; +Cc: linux-btrfs

On Tue, Oct 8, 2024 at 9:21 PM Kees Bakker <kees@ijzerbout.nl> wrote:
>
> Op 08-10-2024 om 22:03 schreef Filipe Manana:
> > On Tue, Oct 8, 2024 at 8:39 PM Kees Bakker <kees@ijzerbout.nl> wrote:
> >> Op 26-09-2024 om 11:33 schreef fdmanana@kernel.org:
> >>> From: Filipe Manana <fdmanana@suse.com>
> >>>
> >>> When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
> >>> if we fail to insert the qgroup record we don't error out, we ignore it.
> >>> In fact we treat it as if there was no error and there was already an
> >>> existing record - we don't distinguish between the cases where
> >>> btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
> >>> existed and we can free the given record, and the case where it returns
> >>> a negative error value, meaning the insertion into the xarray that is
> >>> used to track records failed.
> >>>
> >>> Effectively we end up ignoring that we are lacking qgroup record in the
> >>> dirty extents xarray, resulting in incorrect qgroup accounting.
> >>>
> >>> Fix this by checking for errors and return them to the callers.
> >>>
> >>> Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
> >>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >>> ---
> >>> [...]
> >>> @@ -1034,8 +1047,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
> >>>         * insert both the head node and the new ref without dropping
> >>>         * the spin lock
> >>>         */
> >>> -     head_ref = add_delayed_ref_head(trans, head_ref, record,
> >>> -                                     action, &qrecord_inserted);
> >>> +     new_head_ref = add_delayed_ref_head(trans, head_ref, record,
> >>> +                                         action, &qrecord_inserted);
> >>> +     if (IS_ERR(new_head_ref)) {
> >>> +             spin_unlock(&delayed_refs->lock);
> >>> +             ret = PTR_ERR(new_head_ref);
> >>> +             goto free_record;
> >>> +     }
> >>> +     head_ref = new_head_ref;
> >>>
> >> There is a chance (not sure how big a chance) that head_ref is going to
> >> be freed twice.
> >>
> >> In the IS_ERR true path, head_ref still has the old value from before
> >> calling add_delayed_ref_head.
> >> However, in add_delayed_ref_head it is possible that head_ref is freed
> >> and replaced. If
> >> IS_ERR(new_head_ref) is true the code jumps to the end of the function
> >> where (the old) head_ref
> >> is freed again.
> > No, it's not possible to have a double free.
> > add_delayed_ref_head() never frees the given head_ref in case it
> > returns an error - it's the caller's responsibility to free it.
> >
> > Thanks.
> OK, but ... in add_delayed_ref_head() I see the following on line 881
>
>          existing = htree_insert(&delayed_refs->href_root,
>                                  &head_ref->href_node);
>          if (existing) {
>                  update_existing_head_ref(trans, existing, head_ref);
>                  /*
>                   * we've updated the existing ref, free the newly
>                   * allocated ref
>                   */
>                  kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
>                  head_ref = existing;
>
> It's calling kmem_cache_free() with (the old) head_ref, a free which can
> be repeated in add_delayed_ref().

No, it can't, because after line 881, the line that calls
kmem_cache_free(), we never return an error pointer.

Thanks.

> >
> >> Is it perhaps possible to assign head_ref to the new value before doing
> >> the IS_ERR check?
> >> In other words, do this:
> >>           head_ref = new_head_ref;
> >>           if (IS_ERR(new_head_ref)) {
> >>                   spin_unlock(&delayed_refs->lock);
> >>                   ret = PTR_ERR(new_head_ref);
> >>                   goto free_record;
> >>           }
> >>
> >> --
> >> Kees
>

^ permalink raw reply	[flat|nested] 25+ messages in thread
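
The ownership rule the thread converges on can be summarized with a
short, abridged sketch of the caller (the comments are added for
illustration and the function body is not complete):

  new_head_ref = add_delayed_ref_head(trans, head_ref, record,
                                      action, &qrecord_inserted);
  if (IS_ERR(new_head_ref)) {
          /*
           * add_delayed_ref_head() only returns an error pointer before
           * its htree_insert() call, which is the only place that can
           * free the given head_ref, so freeing head_ref on this error
           * path cannot be a double free.
           */
          spin_unlock(&delayed_refs->lock);
          ret = PTR_ERR(new_head_ref);
          goto free_record;  /* the cleanup labels free record, then head_ref */
  }
  /* On success the returned pointer may be an existing head instead. */
  head_ref = new_head_ref;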
