* [PATCH 01/12] btrfs: fix missing error handling when searching for inode refs during log replay
2025-06-24 14:42 [PATCH 00/12] btrfs: log tree fixes and cleanups fdmanana
@ 2025-06-24 14:42 ` fdmanana
2025-06-25 9:45 ` Johannes Thumshirn
2025-06-24 14:42 ` [PATCH 02/12] btrfs: fix iteration of extrefs " fdmanana
` (11 subsequent siblings)
12 siblings, 1 reply; 34+ messages in thread
From: fdmanana @ 2025-06-24 14:42 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
During log replay, at __add_inode_ref(), when we are searching for inode
ref keys we totally ignore if btrfs_search_slot() returns an error. This
may make a log replay succeed when there was an actual error and leave
some metadata inconsistency in a subvolume tree. Fix this by checking if
an error was returned from btrfs_search_slot() and if so, return it to
the caller.
Fixes: e02119d5a7b4 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 839252881138..08948803c857 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1073,7 +1073,9 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
search_key.type = BTRFS_INODE_REF_KEY;
search_key.offset = parent_objectid;
ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0);
- if (ret == 0) {
+ if (ret < 0) {
+ return ret;
+ } else if (ret == 0) {
struct btrfs_inode_ref *victim_ref;
unsigned long ptr;
unsigned long ptr_end;
--
2.47.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 01/12] btrfs: fix missing error handling when searching for inode refs during log replay
2025-06-24 14:42 ` [PATCH 01/12] btrfs: fix missing error handling when searching for inode refs during log replay fdmanana
@ 2025-06-25 9:45 ` Johannes Thumshirn
2025-06-25 10:58 ` Filipe Manana
0 siblings, 1 reply; 34+ messages in thread
From: Johannes Thumshirn @ 2025-06-25 9:45 UTC (permalink / raw)
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
On 24.06.25 16:56, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> During log replay, at __add_inode_ref(), when we are searching for inode
> ref keys we totally ignore if btrfs_search_slot() returns an error. This
> may make a log replay succeed when there was an actual error and leave
> some metadata inconsistency in a subvolume tree. Fix this by checking if
> an error was returned from btrfs_search_slot() and if so, return it to
> the caller.
>
> Fixes: e02119d5a7b4 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> fs/btrfs/tree-log.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 839252881138..08948803c857 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -1073,7 +1073,9 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
> search_key.type = BTRFS_INODE_REF_KEY;
> search_key.offset = parent_objectid;
> ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0);
> - if (ret == 0) {
> + if (ret < 0) {
> + return ret;
> + } else if (ret == 0) {
Technically the else is not needed after the return.
Otherwise:
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 01/12] btrfs: fix missing error handling when searching for inode refs during log replay
2025-06-25 9:45 ` Johannes Thumshirn
@ 2025-06-25 10:58 ` Filipe Manana
0 siblings, 0 replies; 34+ messages in thread
From: Filipe Manana @ 2025-06-25 10:58 UTC (permalink / raw)
To: Johannes Thumshirn; +Cc: linux-btrfs@vger.kernel.org
On Wed, Jun 25, 2025 at 10:45 AM Johannes Thumshirn
<Johannes.Thumshirn@wdc.com> wrote:
>
> On 24.06.25 16:56, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > During log replay, at __add_inode_ref(), when we are searching for inode
> > ref keys we totally ignore if btrfs_search_slot() returns an error. This
> > may make a log replay succeed when there was an actual error and leave
> > some metadata inconsistency in a subvolume tree. Fix this by checking if
> > an error was returned from btrfs_search_slot() and if so, return it to
> > the caller.
> >
> > Fixes: e02119d5a7b4 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> > fs/btrfs/tree-log.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > index 839252881138..08948803c857 100644
> > --- a/fs/btrfs/tree-log.c
> > +++ b/fs/btrfs/tree-log.c
> > @@ -1073,7 +1073,9 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
> > search_key.type = BTRFS_INODE_REF_KEY;
> > search_key.offset = parent_objectid;
> > ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0);
> > - if (ret == 0) {
> > + if (ret < 0) {
> > + return ret;
> > + } else if (ret == 0) {
>
> Technically the else is not needed after the return.
Yes, but I don't think we have any rule that dictates to not use the else.
Just a matter of personal preference.
Thanks.
>
> Otherwise:
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 02/12] btrfs: fix iteration of extrefs during log replay
2025-06-24 14:42 [PATCH 00/12] btrfs: log tree fixes and cleanups fdmanana
2025-06-24 14:42 ` [PATCH 01/12] btrfs: fix missing error handling when searching for inode refs during log replay fdmanana
@ 2025-06-24 14:42 ` fdmanana
2025-06-25 10:30 ` Johannes Thumshirn
2025-06-27 10:51 ` Qu Wenruo
2025-06-24 14:42 ` [PATCH 03/12] btrfs: fix inode lookup error handling " fdmanana
` (10 subsequent siblings)
12 siblings, 2 replies; 34+ messages in thread
From: fdmanana @ 2025-06-24 14:42 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
At __inode_add_ref() when processing extrefs, if we jump into the next
label he have an undefined value of victim_name.len, since we haven't
initialized it before we did the goto. This results in an invalid memory
access in the next iteration of the loop since victim_name.len was not
initialized to the length of the name of the current extref.
Fix this by initializing victim_name.len with the current extref's name
length.
Fixes: e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 08948803c857..e862deaf27da 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1146,13 +1146,13 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
struct fscrypt_str victim_name;
extref = (struct btrfs_inode_extref *)(base + cur_offset);
+ victim_name.len = btrfs_inode_extref_name_len(leaf, extref);
if (btrfs_inode_extref_parent(leaf, extref) != parent_objectid)
goto next;
ret = read_alloc_one_name(leaf, &extref->name,
- btrfs_inode_extref_name_len(leaf, extref),
- &victim_name);
+ victim_name.len, &victim_name);
if (ret)
return ret;
--
2.47.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 02/12] btrfs: fix iteration of extrefs during log replay
2025-06-24 14:42 ` [PATCH 02/12] btrfs: fix iteration of extrefs " fdmanana
@ 2025-06-25 10:30 ` Johannes Thumshirn
2025-06-27 10:51 ` Qu Wenruo
1 sibling, 0 replies; 34+ messages in thread
From: Johannes Thumshirn @ 2025-06-25 10:30 UTC (permalink / raw)
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 02/12] btrfs: fix iteration of extrefs during log replay
2025-06-24 14:42 ` [PATCH 02/12] btrfs: fix iteration of extrefs " fdmanana
2025-06-25 10:30 ` Johannes Thumshirn
@ 2025-06-27 10:51 ` Qu Wenruo
2025-06-27 12:09 ` Filipe Manana
1 sibling, 1 reply; 34+ messages in thread
From: Qu Wenruo @ 2025-06-27 10:51 UTC (permalink / raw)
To: fdmanana, linux-btrfs
在 2025/6/25 00:12, fdmanana@kernel.org 写道:
> From: Filipe Manana <fdmanana@suse.com>
>
> At __inode_add_ref() when processing extrefs, if we jump into the next
> label he have an undefined value of victim_name.len, since we haven't
^^ he?
Otherwise looks good to me.
I'll give a reviewed-by to the cover-letter to avoid bombarding the
mailing list.
Thanks,
Qu
> initialized it before we did the goto. This results in an invalid memory
> access in the next iteration of the loop since victim_name.len was not
> initialized to the length of the name of the current extref.
>
> Fix this by initializing victim_name.len with the current extref's name
> length.
>
> Fixes: e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> fs/btrfs/tree-log.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 08948803c857..e862deaf27da 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -1146,13 +1146,13 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
> struct fscrypt_str victim_name;
>
> extref = (struct btrfs_inode_extref *)(base + cur_offset);
> + victim_name.len = btrfs_inode_extref_name_len(leaf, extref);
>
> if (btrfs_inode_extref_parent(leaf, extref) != parent_objectid)
> goto next;
>
> ret = read_alloc_one_name(leaf, &extref->name,
> - btrfs_inode_extref_name_len(leaf, extref),
> - &victim_name);
> + victim_name.len, &victim_name);
> if (ret)
> return ret;
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 02/12] btrfs: fix iteration of extrefs during log replay
2025-06-27 10:51 ` Qu Wenruo
@ 2025-06-27 12:09 ` Filipe Manana
0 siblings, 0 replies; 34+ messages in thread
From: Filipe Manana @ 2025-06-27 12:09 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Fri, Jun 27, 2025 at 11:51 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> 在 2025/6/25 00:12, fdmanana@kernel.org 写道:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > At __inode_add_ref() when processing extrefs, if we jump into the next
> > label he have an undefined value of victim_name.len, since we haven't
>
> ^^ he?
Ah, it was supposed to be "we". Will fix that up, thanks.
>
> Otherwise looks good to me.
>
> I'll give a reviewed-by to the cover-letter to avoid bombarding the
> mailing list.
>
> Thanks,
> Qu
>
> > initialized it before we did the goto. This results in an invalid memory
> > access in the next iteration of the loop since victim_name.len was not
> > initialized to the length of the name of the current extref.
> >
> > Fix this by initializing victim_name.len with the current extref's name
> > length.
> >
> > Fixes: e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> > fs/btrfs/tree-log.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > index 08948803c857..e862deaf27da 100644
> > --- a/fs/btrfs/tree-log.c
> > +++ b/fs/btrfs/tree-log.c
> > @@ -1146,13 +1146,13 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
> > struct fscrypt_str victim_name;
> >
> > extref = (struct btrfs_inode_extref *)(base + cur_offset);
> > + victim_name.len = btrfs_inode_extref_name_len(leaf, extref);
> >
> > if (btrfs_inode_extref_parent(leaf, extref) != parent_objectid)
> > goto next;
> >
> > ret = read_alloc_one_name(leaf, &extref->name,
> > - btrfs_inode_extref_name_len(leaf, extref),
> > - &victim_name);
> > + victim_name.len, &victim_name);
> > if (ret)
> > return ret;
> >
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 03/12] btrfs: fix inode lookup error handling during log replay
2025-06-24 14:42 [PATCH 00/12] btrfs: log tree fixes and cleanups fdmanana
2025-06-24 14:42 ` [PATCH 01/12] btrfs: fix missing error handling when searching for inode refs during log replay fdmanana
2025-06-24 14:42 ` [PATCH 02/12] btrfs: fix iteration of extrefs " fdmanana
@ 2025-06-24 14:42 ` fdmanana
2025-06-25 10:34 ` Johannes Thumshirn
2025-06-24 14:42 ` [PATCH 04/12] btrfs: record new subvolume in parent dir earlier to avoid dir logging races fdmanana
` (9 subsequent siblings)
12 siblings, 1 reply; 34+ messages in thread
From: fdmanana @ 2025-06-24 14:42 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
When replay log trees we use read_one_inode() to get an inode, which is
just a wrapper around btrfs_iget_logging(), which in turn is wrapper for
btrfs_iget(). But read_one_inode() always returns NULL for any error
that btrfs_iget_logging() / btrfs_iget() may return and this is a problem
because:
1) In many callers of read_one_inode() we convert the NULL into -EIO,
which is not accurate since btrfs_iget() may return -ENOMEM and -ENOENT
for example, besides -EIO and other errors. So during log replay we
may end up reporting a false -EIO, which is confusing since we may
not have had any IO error at all;
2) When replaying directory deletes, at replay_dir_deletes(), we assume
the NULL returned from read_one_inode() means that the inode doesn't
exist and then proceed as if no error had happened. This is wrong
because unless btrfs_iget() returned ERR_PTR(-ENOENT), we had an
actual error and the target inode may exist in the target subvolume
root - this may later result in the log replay code failing at a
later stage (if we are "lucky") or succeed but leaving some
inconsistency in the filesystem.
So fix this by not ignoring errors from btrfs_iget_logging() and as
a consequence remove the read_one_inode() wrapper and just use
btrfs_iget_logging() directly. Also since btrfs_iget_logging() is
supposed to be called only against subvolume roots, just like
read_one_inode() which had a comment about it, add an assertion to
btrfs_iget_logging() to check that the target root corresponds to a
subvolume root.
Fixes: 5d4f98a28c7d ("Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE)")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 127 +++++++++++++++++++++-----------------------
1 file changed, 62 insertions(+), 65 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index e862deaf27da..95b89ed9fd6c 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -143,6 +143,9 @@ static struct btrfs_inode *btrfs_iget_logging(u64 objectid, struct btrfs_root *r
unsigned int nofs_flag;
struct btrfs_inode *inode;
+ /* Only meant to be called for subvolume roots and not for log roots. */
+ ASSERT(is_fstree(btrfs_root_id(root)));
+
/*
* We're holding a transaction handle whether we are logging or
* replaying a log tree, so we must make sure NOFS semantics apply
@@ -604,21 +607,6 @@ static int read_alloc_one_name(struct extent_buffer *eb, void *start, int len,
return 0;
}
-/*
- * simple helper to read an inode off the disk from a given root
- * This can only be called for subvolume roots and not for the log
- */
-static noinline struct btrfs_inode *read_one_inode(struct btrfs_root *root,
- u64 objectid)
-{
- struct btrfs_inode *inode;
-
- inode = btrfs_iget_logging(objectid, root);
- if (IS_ERR(inode))
- return NULL;
- return inode;
-}
-
/* replays a single extent in 'eb' at 'slot' with 'key' into the
* subvolume 'root'. path is released on entry and should be released
* on exit.
@@ -674,9 +662,9 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
return -EUCLEAN;
}
- inode = read_one_inode(root, key->objectid);
- if (!inode)
- return -EIO;
+ inode = btrfs_iget_logging(key->objectid, root);
+ if (IS_ERR(inode))
+ return PTR_ERR(inode);
/*
* first check to see if we already have this extent in the
@@ -948,9 +936,10 @@ static noinline int drop_one_dir_item(struct btrfs_trans_handle *trans,
btrfs_release_path(path);
- inode = read_one_inode(root, location.objectid);
- if (!inode) {
- ret = -EIO;
+ inode = btrfs_iget_logging(location.objectid, root);
+ if (IS_ERR(inode)) {
+ ret = PTR_ERR(inode);
+ inode = NULL;
goto out;
}
@@ -1167,10 +1156,10 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
kfree(victim_name.name);
return ret;
} else if (!ret) {
- ret = -ENOENT;
- victim_parent = read_one_inode(root,
- parent_objectid);
- if (victim_parent) {
+ victim_parent = btrfs_iget_logging(parent_objectid, root);
+ if (IS_ERR(victim_parent)) {
+ ret = PTR_ERR(victim_parent);
+ } else {
inc_nlink(&inode->vfs_inode);
btrfs_release_path(path);
@@ -1315,9 +1304,9 @@ static int unlink_old_inode_refs(struct btrfs_trans_handle *trans,
struct btrfs_inode *dir;
btrfs_release_path(path);
- dir = read_one_inode(root, parent_id);
- if (!dir) {
- ret = -ENOENT;
+ dir = btrfs_iget_logging(parent_id, root);
+ if (IS_ERR(dir)) {
+ ret = PTR_ERR(dir);
kfree(name.name);
goto out;
}
@@ -1389,15 +1378,17 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
* copy the back ref in. The link count fixup code will take
* care of the rest
*/
- dir = read_one_inode(root, parent_objectid);
- if (!dir) {
- ret = -ENOENT;
+ dir = btrfs_iget_logging(parent_objectid, root);
+ if (IS_ERR(dir)) {
+ ret = PTR_ERR(dir);
+ dir = NULL;
goto out;
}
- inode = read_one_inode(root, inode_objectid);
- if (!inode) {
- ret = -EIO;
+ inode = btrfs_iget_logging(inode_objectid, root);
+ if (IS_ERR(inode)) {
+ ret = PTR_ERR(inode);
+ inode = NULL;
goto out;
}
@@ -1409,11 +1400,13 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
* parent object can change from one array
* item to another.
*/
- if (!dir)
- dir = read_one_inode(root, parent_objectid);
if (!dir) {
- ret = -ENOENT;
- goto out;
+ dir = btrfs_iget_logging(parent_objectid, root);
+ if (IS_ERR(dir)) {
+ ret = PTR_ERR(dir);
+ dir = NULL;
+ goto out;
+ }
}
} else {
ret = ref_get_fields(eb, ref_ptr, &name, &ref_index);
@@ -1681,9 +1674,9 @@ static noinline int fixup_inode_link_counts(struct btrfs_trans_handle *trans,
break;
btrfs_release_path(path);
- inode = read_one_inode(root, key.offset);
- if (!inode) {
- ret = -EIO;
+ inode = btrfs_iget_logging(key.offset, root);
+ if (IS_ERR(inode)) {
+ ret = PTR_ERR(inode);
break;
}
@@ -1719,9 +1712,9 @@ static noinline int link_to_fixup_dir(struct btrfs_trans_handle *trans,
struct btrfs_inode *inode;
struct inode *vfs_inode;
- inode = read_one_inode(root, objectid);
- if (!inode)
- return -EIO;
+ inode = btrfs_iget_logging(objectid, root);
+ if (IS_ERR(inode))
+ return PTR_ERR(inode);
vfs_inode = &inode->vfs_inode;
key.objectid = BTRFS_TREE_LOG_FIXUP_OBJECTID;
@@ -1760,14 +1753,14 @@ static noinline int insert_one_name(struct btrfs_trans_handle *trans,
struct btrfs_inode *dir;
int ret;
- inode = read_one_inode(root, location->objectid);
- if (!inode)
- return -ENOENT;
+ inode = btrfs_iget_logging(location->objectid, root);
+ if (IS_ERR(inode))
+ return PTR_ERR(inode);
- dir = read_one_inode(root, dirid);
- if (!dir) {
+ dir = btrfs_iget_logging(dirid, root);
+ if (IS_ERR(dir)) {
iput(&inode->vfs_inode);
- return -EIO;
+ return PTR_ERR(dir);
}
ret = btrfs_add_link(trans, dir, inode, name, 1, index);
@@ -1844,9 +1837,9 @@ static noinline int replay_one_name(struct btrfs_trans_handle *trans,
bool update_size = true;
bool name_added = false;
- dir = read_one_inode(root, key->objectid);
- if (!dir)
- return -EIO;
+ dir = btrfs_iget_logging(key->objectid, root);
+ if (IS_ERR(dir))
+ return PTR_ERR(dir);
ret = read_alloc_one_name(eb, di + 1, btrfs_dir_name_len(eb, di), &name);
if (ret)
@@ -2146,9 +2139,10 @@ static noinline int check_item_in_log(struct btrfs_trans_handle *trans,
btrfs_dir_item_key_to_cpu(eb, di, &location);
btrfs_release_path(path);
btrfs_release_path(log_path);
- inode = read_one_inode(root, location.objectid);
- if (!inode) {
- ret = -EIO;
+ inode = btrfs_iget_logging(location.objectid, root);
+ if (IS_ERR(inode)) {
+ ret = PTR_ERR(inode);
+ inode = NULL;
goto out;
}
@@ -2300,14 +2294,17 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
if (!log_path)
return -ENOMEM;
- dir = read_one_inode(root, dirid);
- /* it isn't an error if the inode isn't there, that can happen
- * because we replay the deletes before we copy in the inode item
- * from the log
+ dir = btrfs_iget_logging(dirid, root);
+ /*
+ * It isn't an error if the inode isn't there, that can happen because
+ * we replay the deletes before we copy in the inode item from the log.
*/
- if (!dir) {
+ if (IS_ERR(dir)) {
btrfs_free_path(log_path);
- return 0;
+ ret = PTR_ERR(dir);
+ if (ret == -ENOENT)
+ ret = 0;
+ return ret;
}
range_start = 0;
@@ -2466,9 +2463,9 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb,
struct btrfs_inode *inode;
u64 from;
- inode = read_one_inode(root, key.objectid);
- if (!inode) {
- ret = -EIO;
+ inode = btrfs_iget_logging(key.objectid, root);
+ if (IS_ERR(inode)) {
+ ret = PTR_ERR(inode);
break;
}
from = ALIGN(i_size_read(&inode->vfs_inode),
--
2.47.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 04/12] btrfs: record new subvolume in parent dir earlier to avoid dir logging races
2025-06-24 14:42 [PATCH 00/12] btrfs: log tree fixes and cleanups fdmanana
` (2 preceding siblings ...)
2025-06-24 14:42 ` [PATCH 03/12] btrfs: fix inode lookup error handling " fdmanana
@ 2025-06-24 14:42 ` fdmanana
2025-06-25 10:39 ` Johannes Thumshirn
2025-06-24 14:42 ` [PATCH 05/12] btrfs: propagate last_unlink_trans earlier when doing a rmdir fdmanana
` (8 subsequent siblings)
12 siblings, 1 reply; 34+ messages in thread
From: fdmanana @ 2025-06-24 14:42 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
Instead of recording that a new subvolume was created in a directory after
we add the entry do the directory, record it before adding the entry. This
is to avoid races where after creating the entry and before recording the
new subvolume in the directory (the call to btrfs_record_new_subvolume()),
another task logs the directory, so we end up with a log tree where we
logged a directory that has an entry pointing to a root that was not yet
committed, resulting in an invalid entry if the log is persisted and
replayed later due to a power failure or crash.
Also state this requirement in the function comment for
btrfs_record_new_subvolume(), similar to what we do for the
btrfs_record_unlink_dir() and btrfs_record_snapshot_destroy().
Fixes: 45c4102f0d82 ("btrfs: avoid transaction commit on any fsync after subvolume creation")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/ioctl.c | 4 ++--
fs/btrfs/tree-log.c | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5068f1fa86f6..aa8cefadf423 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -666,14 +666,14 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
goto out;
}
+ btrfs_record_new_subvolume(trans, BTRFS_I(dir));
+
ret = btrfs_create_new_inode(trans, &new_inode_args);
if (ret) {
btrfs_abort_transaction(trans, ret);
goto out;
}
- btrfs_record_new_subvolume(trans, BTRFS_I(dir));
-
d_instantiate_new(dentry, new_inode_args.inode);
new_inode_args.inode = NULL;
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 95b89ed9fd6c..7ccf513d7ec8 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -7454,6 +7454,8 @@ void btrfs_record_snapshot_destroy(struct btrfs_trans_handle *trans,
* full log sync.
* Also we don't need to worry with renames, since btrfs_rename() marks the log
* for full commit when renaming a subvolume.
+ *
+ * Must be called before creating the subvolume entry in its parent directory.
*/
void btrfs_record_new_subvolume(const struct btrfs_trans_handle *trans,
struct btrfs_inode *dir)
--
2.47.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 05/12] btrfs: propagate last_unlink_trans earlier when doing a rmdir
2025-06-24 14:42 [PATCH 00/12] btrfs: log tree fixes and cleanups fdmanana
` (3 preceding siblings ...)
2025-06-24 14:42 ` [PATCH 04/12] btrfs: record new subvolume in parent dir earlier to avoid dir logging races fdmanana
@ 2025-06-24 14:42 ` fdmanana
2025-06-25 10:46 ` Johannes Thumshirn
2025-06-24 14:42 ` [PATCH 06/12] btrfs: use btrfs_record_snapshot_destroy() during rmdir fdmanana
` (7 subsequent siblings)
12 siblings, 1 reply; 34+ messages in thread
From: fdmanana @ 2025-06-24 14:42 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
In case the removed directory had a snapshot that was deleted, we are
propagating its inode's last_unlink_trans to the parent directory after
we removed the entry from the parent directory. This leaves a small race
window where someone can log the parent directory after we removed the
entry and before we updated last_unlink_trans, and as a result if we ever
try to replay such a log tree, we will fail since we will attempt to
remove a snapshot during log replay, which is currently not possible and
results in the log replay (and mount) to fail. This is the type of failure
described in commit 1ec9a1ae1e30 ("Btrfs: fix unreplayable log after
snapshot delete + parent dir fsync").
So fix this by propagating the last_unlink_trans to the parent directory
before we remove the entry from it.
Fixes: 44f714dae50a ("Btrfs: improve performance on fsync against new inode after rename/unlink")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/inode.c | 36 ++++++++++++++++++------------------
1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 80c72c594b19..252271cbde28 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4707,7 +4707,6 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
int ret = 0;
struct btrfs_trans_handle *trans;
- u64 last_unlink_trans;
struct fscrypt_name fname;
if (inode->i_size > BTRFS_EMPTY_DIR_SIZE)
@@ -4733,6 +4732,23 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
goto out_notrans;
}
+ /*
+ * Propagate the last_unlink_trans value of the deleted dir to its
+ * parent directory. This is to prevent an unrecoverable log tree in the
+ * case we do something like this:
+ * 1) create dir foo
+ * 2) create snapshot under dir foo
+ * 3) delete the snapshot
+ * 4) rmdir foo
+ * 5) mkdir foo
+ * 6) fsync foo or some file inside foo
+ *
+ * This is because we can't unlink other roots when replaying the dir
+ * deletes for directory foo.
+ */
+ if (BTRFS_I(inode)->last_unlink_trans >= trans->transid)
+ BTRFS_I(dir)->last_unlink_trans = BTRFS_I(inode)->last_unlink_trans;
+
if (unlikely(btrfs_ino(BTRFS_I(inode)) == BTRFS_EMPTY_SUBVOL_DIR_OBJECTID)) {
ret = btrfs_unlink_subvol(trans, BTRFS_I(dir), dentry);
goto out;
@@ -4742,27 +4758,11 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
if (ret)
goto out;
- last_unlink_trans = BTRFS_I(inode)->last_unlink_trans;
-
/* now the directory is empty */
ret = btrfs_unlink_inode(trans, BTRFS_I(dir), BTRFS_I(d_inode(dentry)),
&fname.disk_name);
- if (!ret) {
+ if (!ret)
btrfs_i_size_write(BTRFS_I(inode), 0);
- /*
- * Propagate the last_unlink_trans value of the deleted dir to
- * its parent directory. This is to prevent an unrecoverable
- * log tree in the case we do something like this:
- * 1) create dir foo
- * 2) create snapshot under dir foo
- * 3) delete the snapshot
- * 4) rmdir foo
- * 5) mkdir foo
- * 6) fsync foo or some file inside foo
- */
- if (last_unlink_trans >= trans->transid)
- BTRFS_I(dir)->last_unlink_trans = last_unlink_trans;
- }
out:
btrfs_end_transaction(trans);
out_notrans:
--
2.47.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 06/12] btrfs: use btrfs_record_snapshot_destroy() during rmdir
2025-06-24 14:42 [PATCH 00/12] btrfs: log tree fixes and cleanups fdmanana
` (4 preceding siblings ...)
2025-06-24 14:42 ` [PATCH 05/12] btrfs: propagate last_unlink_trans earlier when doing a rmdir fdmanana
@ 2025-06-24 14:42 ` fdmanana
2025-06-25 10:47 ` Johannes Thumshirn
2025-06-24 14:42 ` [PATCH 07/12] btrfs: use inode already stored in local variable at btrfs_rmdir() fdmanana
` (6 subsequent siblings)
12 siblings, 1 reply; 34+ messages in thread
From: fdmanana @ 2025-06-24 14:42 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
We are setting the parent directory's last_unlink_trans directly which
may result in a concurrent task starting to log the directory not see the
update and therefore can log the directory after we removed a child
directory which had a snapshot within instead of falling back to a
transaction commit. Replaying such a log tree would result in a mount
failure since we can't currently delete snapshots (and subvolumes) during
log replay. This is the type of failure described in commit 1ec9a1ae1e30
("Btrfs: fix unreplayable log after snapshot delete + parent dir fsync").
Fix this by using btrfs_record_snapshot_destroy() which updates the
last_unlink_trans field while holding the inode's log_mutex lock.
Fixes: 44f714dae50a ("Btrfs: improve performance on fsync against new inode after rename/unlink")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 252271cbde28..12141348236d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4747,7 +4747,7 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
* deletes for directory foo.
*/
if (BTRFS_I(inode)->last_unlink_trans >= trans->transid)
- BTRFS_I(dir)->last_unlink_trans = BTRFS_I(inode)->last_unlink_trans;
+ btrfs_record_snapshot_destroy(trans, BTRFS_I(dir));
if (unlikely(btrfs_ino(BTRFS_I(inode)) == BTRFS_EMPTY_SUBVOL_DIR_OBJECTID)) {
ret = btrfs_unlink_subvol(trans, BTRFS_I(dir), dentry);
--
2.47.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 06/12] btrfs: use btrfs_record_snapshot_destroy() during rmdir
2025-06-24 14:42 ` [PATCH 06/12] btrfs: use btrfs_record_snapshot_destroy() during rmdir fdmanana
@ 2025-06-25 10:47 ` Johannes Thumshirn
2025-06-25 10:55 ` Filipe Manana
0 siblings, 1 reply; 34+ messages in thread
From: Johannes Thumshirn @ 2025-06-25 10:47 UTC (permalink / raw)
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
Shouldn't this be merged into 5/12?
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 06/12] btrfs: use btrfs_record_snapshot_destroy() during rmdir
2025-06-25 10:47 ` Johannes Thumshirn
@ 2025-06-25 10:55 ` Filipe Manana
2025-06-27 10:11 ` Johannes Thumshirn
0 siblings, 1 reply; 34+ messages in thread
From: Filipe Manana @ 2025-06-25 10:55 UTC (permalink / raw)
To: Johannes Thumshirn; +Cc: linux-btrfs@vger.kernel.org
On Wed, Jun 25, 2025 at 11:48 AM Johannes Thumshirn
<Johannes.Thumshirn@wdc.com> wrote:
>
> Shouldn't this be merged into 5/12?
These are two different behavioural changes, so I prefer to keep them
separate and with dedicate change logs.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 06/12] btrfs: use btrfs_record_snapshot_destroy() during rmdir
2025-06-25 10:55 ` Filipe Manana
@ 2025-06-27 10:11 ` Johannes Thumshirn
0 siblings, 0 replies; 34+ messages in thread
From: Johannes Thumshirn @ 2025-06-27 10:11 UTC (permalink / raw)
To: Filipe Manana; +Cc: linux-btrfs@vger.kernel.org
On 25.06.25 12:56, Filipe Manana wrote:
> On Wed, Jun 25, 2025 at 11:48 AM Johannes Thumshirn
> <Johannes.Thumshirn@wdc.com> wrote:
>>
>> Shouldn't this be merged into 5/12?
>
> These are two different behavioural changes, so I prefer to keep them
> separate and with dedicate change logs.
>
OK fine with me then,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 07/12] btrfs: use inode already stored in local variable at btrfs_rmdir()
2025-06-24 14:42 [PATCH 00/12] btrfs: log tree fixes and cleanups fdmanana
` (5 preceding siblings ...)
2025-06-24 14:42 ` [PATCH 06/12] btrfs: use btrfs_record_snapshot_destroy() during rmdir fdmanana
@ 2025-06-24 14:42 ` fdmanana
2025-06-25 10:52 ` Johannes Thumshirn
2025-06-27 10:10 ` Johannes Thumshirn
2025-06-24 14:42 ` [PATCH 08/12] btrfs: use btrfs inodes in btrfs_rmdir() to avoid so much usage of BTRFS_I() fdmanana
` (5 subsequent siblings)
12 siblings, 2 replies; 34+ messages in thread
From: fdmanana @ 2025-06-24 14:42 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
There's no need to call d_inode(dentry) when calling btrfs_unlink_inode()
since we have already stored that in a local inode variable. So just use
the local variable to make the code less verbose.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/inode.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 12141348236d..f8e4651fe60b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4759,8 +4759,7 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
goto out;
/* now the directory is empty */
- ret = btrfs_unlink_inode(trans, BTRFS_I(dir), BTRFS_I(d_inode(dentry)),
- &fname.disk_name);
+ ret = btrfs_unlink_inode(trans, BTRFS_I(dir), BTRFS_I(inode), &fname.disk_name);
if (!ret)
btrfs_i_size_write(BTRFS_I(inode), 0);
out:
--
2.47.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 07/12] btrfs: use inode already stored in local variable at btrfs_rmdir()
2025-06-24 14:42 ` [PATCH 07/12] btrfs: use inode already stored in local variable at btrfs_rmdir() fdmanana
@ 2025-06-25 10:52 ` Johannes Thumshirn
2025-06-25 10:54 ` Filipe Manana
2025-06-27 10:10 ` Johannes Thumshirn
1 sibling, 1 reply; 34+ messages in thread
From: Johannes Thumshirn @ 2025-06-25 10:52 UTC (permalink / raw)
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
On 24.06.25 16:55, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> There's no need to call d_inode(dentry) when calling btrfs_unlink_inode()
> since we have already stored that in a local inode variable. So just use
> the local variable to make the code less verbose.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> fs/btrfs/inode.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 12141348236d..f8e4651fe60b 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4759,8 +4759,7 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
> goto out;
>
> /* now the directory is empty */
> - ret = btrfs_unlink_inode(trans, BTRFS_I(dir), BTRFS_I(d_inode(dentry)),
> - &fname.disk_name);
> + ret = btrfs_unlink_inode(trans, BTRFS_I(dir), BTRFS_I(inode), &fname.disk_name);
> if (!ret)
> btrfs_i_size_write(BTRFS_I(inode), 0);
> out:
I'd personally add a local
'struct btrfs_inode *binode = BTRFS_I(inode);'
to the function as it can be used multiple time, this sparing us
multiple calls to BTRFS_I() (which is cheap I know) and improves code
readability.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 07/12] btrfs: use inode already stored in local variable at btrfs_rmdir()
2025-06-25 10:52 ` Johannes Thumshirn
@ 2025-06-25 10:54 ` Filipe Manana
0 siblings, 0 replies; 34+ messages in thread
From: Filipe Manana @ 2025-06-25 10:54 UTC (permalink / raw)
To: Johannes Thumshirn; +Cc: linux-btrfs@vger.kernel.org
On Wed, Jun 25, 2025 at 11:52 AM Johannes Thumshirn
<Johannes.Thumshirn@wdc.com> wrote:
>
> On 24.06.25 16:55, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > There's no need to call d_inode(dentry) when calling btrfs_unlink_inode()
> > since we have already stored that in a local inode variable. So just use
> > the local variable to make the code less verbose.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> > fs/btrfs/inode.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 12141348236d..f8e4651fe60b 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -4759,8 +4759,7 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
> > goto out;
> >
> > /* now the directory is empty */
> > - ret = btrfs_unlink_inode(trans, BTRFS_I(dir), BTRFS_I(d_inode(dentry)),
> > - &fname.disk_name);
> > + ret = btrfs_unlink_inode(trans, BTRFS_I(dir), BTRFS_I(inode), &fname.disk_name);
> > if (!ret)
> > btrfs_i_size_write(BTRFS_I(inode), 0);
> > out:
>
>
> I'd personally add a local
> 'struct btrfs_inode *binode = BTRFS_I(inode);'
> to the function as it can be used multiple time, this sparing us
> multiple calls to BTRFS_I() (which is cheap I know) and improves code
> readability.
Which is what one of the next patches does...
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 07/12] btrfs: use inode already stored in local variable at btrfs_rmdir()
2025-06-24 14:42 ` [PATCH 07/12] btrfs: use inode already stored in local variable at btrfs_rmdir() fdmanana
2025-06-25 10:52 ` Johannes Thumshirn
@ 2025-06-27 10:10 ` Johannes Thumshirn
1 sibling, 0 replies; 34+ messages in thread
From: Johannes Thumshirn @ 2025-06-27 10:10 UTC (permalink / raw)
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 08/12] btrfs: use btrfs inodes in btrfs_rmdir() to avoid so much usage of BTRFS_I()
2025-06-24 14:42 [PATCH 00/12] btrfs: log tree fixes and cleanups fdmanana
` (6 preceding siblings ...)
2025-06-24 14:42 ` [PATCH 07/12] btrfs: use inode already stored in local variable at btrfs_rmdir() fdmanana
@ 2025-06-24 14:42 ` fdmanana
2025-06-25 10:54 ` Johannes Thumshirn
2025-06-24 14:42 ` [PATCH 09/12] btrfs: split inode ref processing from __add_inode_ref() into a helper fdmanana
` (4 subsequent siblings)
12 siblings, 1 reply; 34+ messages in thread
From: fdmanana @ 2025-06-24 14:42 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
Almost everywhere we want to use a btrfs inode and therefore we have a
lot of calls to BTRFS_I(), making the code more verbose. Instead use btrfs
inode local variables to avoid so much use of BTRFS_I().
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/inode.c | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f8e4651fe60b..6df7ef1b869b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4701,32 +4701,33 @@ int btrfs_delete_subvolume(struct btrfs_inode *dir, struct dentry *dentry)
return ret;
}
-static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
+static int btrfs_rmdir(struct inode *vfs_dir, struct dentry *dentry)
{
- struct inode *inode = d_inode(dentry);
- struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+ struct btrfs_inode *dir = BTRFS_I(vfs_dir);
+ struct btrfs_inode *inode = BTRFS_I(d_inode(dentry));
+ struct btrfs_fs_info *fs_info = inode->root->fs_info;
int ret = 0;
struct btrfs_trans_handle *trans;
struct fscrypt_name fname;
- if (inode->i_size > BTRFS_EMPTY_DIR_SIZE)
+ if (inode->vfs_inode.i_size > BTRFS_EMPTY_DIR_SIZE)
return -ENOTEMPTY;
- if (btrfs_ino(BTRFS_I(inode)) == BTRFS_FIRST_FREE_OBJECTID) {
+ if (btrfs_ino(inode) == BTRFS_FIRST_FREE_OBJECTID) {
if (unlikely(btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))) {
btrfs_err(fs_info,
"extent tree v2 doesn't support snapshot deletion yet");
return -EOPNOTSUPP;
}
- return btrfs_delete_subvolume(BTRFS_I(dir), dentry);
+ return btrfs_delete_subvolume(dir, dentry);
}
- ret = fscrypt_setup_filename(dir, &dentry->d_name, 1, &fname);
+ ret = fscrypt_setup_filename(vfs_dir, &dentry->d_name, 1, &fname);
if (ret)
return ret;
/* This needs to handle no-key deletions later on */
- trans = __unlink_start_trans(BTRFS_I(dir));
+ trans = __unlink_start_trans(dir);
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
goto out_notrans;
@@ -4746,22 +4747,22 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
* This is because we can't unlink other roots when replaying the dir
* deletes for directory foo.
*/
- if (BTRFS_I(inode)->last_unlink_trans >= trans->transid)
- btrfs_record_snapshot_destroy(trans, BTRFS_I(dir));
+ if (inode->last_unlink_trans >= trans->transid)
+ btrfs_record_snapshot_destroy(trans, dir);
- if (unlikely(btrfs_ino(BTRFS_I(inode)) == BTRFS_EMPTY_SUBVOL_DIR_OBJECTID)) {
- ret = btrfs_unlink_subvol(trans, BTRFS_I(dir), dentry);
+ if (unlikely(btrfs_ino(inode) == BTRFS_EMPTY_SUBVOL_DIR_OBJECTID)) {
+ ret = btrfs_unlink_subvol(trans, dir, dentry);
goto out;
}
- ret = btrfs_orphan_add(trans, BTRFS_I(inode));
+ ret = btrfs_orphan_add(trans, inode);
if (ret)
goto out;
/* now the directory is empty */
- ret = btrfs_unlink_inode(trans, BTRFS_I(dir), BTRFS_I(inode), &fname.disk_name);
+ ret = btrfs_unlink_inode(trans, dir, inode, &fname.disk_name);
if (!ret)
- btrfs_i_size_write(BTRFS_I(inode), 0);
+ btrfs_i_size_write(inode, 0);
out:
btrfs_end_transaction(trans);
out_notrans:
--
2.47.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 09/12] btrfs: split inode ref processing from __add_inode_ref() into a helper
2025-06-24 14:42 [PATCH 00/12] btrfs: log tree fixes and cleanups fdmanana
` (7 preceding siblings ...)
2025-06-24 14:42 ` [PATCH 08/12] btrfs: use btrfs inodes in btrfs_rmdir() to avoid so much usage of BTRFS_I() fdmanana
@ 2025-06-24 14:42 ` fdmanana
2025-06-25 11:07 ` Johannes Thumshirn
2025-06-24 14:42 ` [PATCH 10/12] btrfs: split inode rextef " fdmanana
` (3 subsequent siblings)
12 siblings, 1 reply; 34+ messages in thread
From: fdmanana @ 2025-06-24 14:42 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
The __add_inode_ref() function is quite big and with too much nesting, so
move the code that processes inode refs into a helper function, to make
the function easier to read and reduce the level of indentation too.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 106 ++++++++++++++++++++++++++------------------
1 file changed, 62 insertions(+), 44 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 7ccf513d7ec8..648e1705c1c4 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1041,6 +1041,59 @@ static noinline int backref_in_log(struct btrfs_root *log,
return ret;
}
+static int unlink_refs_not_in_log(struct btrfs_trans_handle *trans,
+ struct btrfs_path *path,
+ struct btrfs_root *log_root,
+ struct btrfs_key *search_key,
+ struct btrfs_inode *dir,
+ struct btrfs_inode *inode,
+ u64 parent_objectid)
+{
+ struct extent_buffer *leaf = path->nodes[0];
+ unsigned long ptr;
+ unsigned long ptr_end;
+
+ /*
+ * Check all the names in this back reference to see if thet are in the
+ * log. If so, we allow them to stay otherwise they must be unlinked as
+ * a conflict.
+ */
+ ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
+ ptr_end = ptr + btrfs_item_size(leaf, path->slots[0]);
+ while (ptr < ptr_end) {
+ struct fscrypt_str victim_name;
+ struct btrfs_inode_ref *victim_ref;
+ int ret;
+
+ victim_ref = (struct btrfs_inode_ref *)ptr;
+ ret = read_alloc_one_name(leaf, (victim_ref + 1),
+ btrfs_inode_ref_name_len(leaf, victim_ref),
+ &victim_name);
+ if (ret)
+ return ret;
+
+ ret = backref_in_log(log_root, search_key, parent_objectid, &victim_name);
+ if (ret) {
+ kfree(victim_name.name);
+ if (ret < 0)
+ return ret;
+ ptr = (unsigned long)(victim_ref + 1) + victim_name.len;
+ continue;
+ }
+
+ inc_nlink(&inode->vfs_inode);
+ btrfs_release_path(path);
+
+ ret = unlink_inode_for_log_replay(trans, dir, inode, &victim_name);
+ kfree(victim_name.name);
+ if (ret)
+ return ret;
+ return -EAGAIN;
+ }
+
+ return 0;
+}
+
static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct btrfs_path *path,
@@ -1065,54 +1118,19 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
if (ret < 0) {
return ret;
} else if (ret == 0) {
- struct btrfs_inode_ref *victim_ref;
- unsigned long ptr;
- unsigned long ptr_end;
-
- leaf = path->nodes[0];
-
- /* are we trying to overwrite a back ref for the root directory
- * if so, just jump out, we're done
+ /*
+ * Are we trying to overwrite a back ref for the root directory?
+ * If so, we're done.
*/
if (search_key.objectid == search_key.offset)
return 1;
- /* check all the names in this back reference to see
- * if they are in the log. if so, we allow them to stay
- * otherwise they must be unlinked as a conflict
- */
- ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
- ptr_end = ptr + btrfs_item_size(leaf, path->slots[0]);
- while (ptr < ptr_end) {
- struct fscrypt_str victim_name;
-
- victim_ref = (struct btrfs_inode_ref *)ptr;
- ret = read_alloc_one_name(leaf, (victim_ref + 1),
- btrfs_inode_ref_name_len(leaf, victim_ref),
- &victim_name);
- if (ret)
- return ret;
-
- ret = backref_in_log(log_root, &search_key,
- parent_objectid, &victim_name);
- if (ret < 0) {
- kfree(victim_name.name);
- return ret;
- } else if (!ret) {
- inc_nlink(&inode->vfs_inode);
- btrfs_release_path(path);
-
- ret = unlink_inode_for_log_replay(trans, dir, inode,
- &victim_name);
- kfree(victim_name.name);
- if (ret)
- return ret;
- goto again;
- }
- kfree(victim_name.name);
-
- ptr = (unsigned long)(victim_ref + 1) + victim_name.len;
- }
+ ret = unlink_refs_not_in_log(trans, path, log_root, &search_key,
+ dir, inode, parent_objectid);
+ if (ret == -EAGAIN)
+ goto again;
+ else if (ret)
+ return ret;
}
btrfs_release_path(path);
--
2.47.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 10/12] btrfs: split inode rextef processing from __add_inode_ref() into a helper
2025-06-24 14:42 [PATCH 00/12] btrfs: log tree fixes and cleanups fdmanana
` (8 preceding siblings ...)
2025-06-24 14:42 ` [PATCH 09/12] btrfs: split inode ref processing from __add_inode_ref() into a helper fdmanana
@ 2025-06-24 14:42 ` fdmanana
2025-06-25 11:35 ` Johannes Thumshirn
2025-06-24 14:42 ` [PATCH 11/12] btrfs: add btrfs prefix to is_fsstree() and make it return bool fdmanana
` (2 subsequent siblings)
12 siblings, 1 reply; 34+ messages in thread
From: fdmanana @ 2025-06-24 14:42 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
The __add_inode_ref() function is quite big and with too much nesting, so
move the code that processes inode extrefs into a helper function, to make
the function easier to read and reduce the level of indentation too.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 131 +++++++++++++++++++++++++-------------------
1 file changed, 74 insertions(+), 57 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 648e1705c1c4..82664bb79d36 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1094,6 +1094,73 @@ static int unlink_refs_not_in_log(struct btrfs_trans_handle *trans,
return 0;
}
+static int unlink_extrefs_not_in_log(struct btrfs_trans_handle *trans,
+ struct btrfs_path *path,
+ struct btrfs_root *root,
+ struct btrfs_root *log_root,
+ struct btrfs_key *search_key,
+ struct btrfs_inode *inode,
+ u64 inode_objectid,
+ u64 parent_objectid)
+{
+ struct extent_buffer *leaf = path->nodes[0];
+ const unsigned long base = btrfs_item_ptr_offset(leaf, path->slots[0]);
+ const u32 item_size = btrfs_item_size(leaf, path->slots[0]);
+ u32 cur_offset = 0;
+
+ while (cur_offset < item_size) {
+ struct btrfs_inode_extref *extref;
+ struct btrfs_inode *victim_parent;
+ struct fscrypt_str victim_name;
+ int ret;
+
+ extref = (struct btrfs_inode_extref *)(base + cur_offset);
+ victim_name.len = btrfs_inode_extref_name_len(leaf, extref);
+
+ if (btrfs_inode_extref_parent(leaf, extref) != parent_objectid)
+ goto next;
+
+ ret = read_alloc_one_name(leaf, &extref->name, victim_name.len,
+ &victim_name);
+ if (ret)
+ return ret;
+
+ search_key->objectid = inode_objectid;
+ search_key->type = BTRFS_INODE_EXTREF_KEY;
+ search_key->offset = btrfs_extref_hash(parent_objectid,
+ victim_name.name,
+ victim_name.len);
+ ret = backref_in_log(log_root, search_key, parent_objectid, &victim_name);
+ if (ret) {
+ kfree(victim_name.name);
+ if (ret < 0)
+ return ret;
+next:
+ cur_offset += victim_name.len + sizeof(*extref);
+ continue;
+ }
+
+ victim_parent = btrfs_iget_logging(parent_objectid, root);
+ if (IS_ERR(victim_parent)) {
+ kfree(victim_name.name);
+ return PTR_ERR(victim_parent);
+ }
+
+ inc_nlink(&inode->vfs_inode);
+ btrfs_release_path(path);
+
+ ret = unlink_inode_for_log_replay(trans, victim_parent, inode,
+ &victim_name);
+ iput(&victim_parent->vfs_inode);
+ kfree(victim_name.name);
+ if (ret)
+ return ret;
+ return -EAGAIN;
+ }
+
+ return 0;
+}
+
static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct btrfs_path *path,
@@ -1104,7 +1171,6 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
u64 ref_index, struct fscrypt_str *name)
{
int ret;
- struct extent_buffer *leaf;
struct btrfs_dir_item *di;
struct btrfs_key search_key;
struct btrfs_inode_extref *extref;
@@ -1139,62 +1205,13 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
if (IS_ERR(extref)) {
return PTR_ERR(extref);
} else if (extref) {
- u32 item_size;
- u32 cur_offset = 0;
- unsigned long base;
- struct btrfs_inode *victim_parent;
-
- leaf = path->nodes[0];
-
- item_size = btrfs_item_size(leaf, path->slots[0]);
- base = btrfs_item_ptr_offset(leaf, path->slots[0]);
-
- while (cur_offset < item_size) {
- struct fscrypt_str victim_name;
-
- extref = (struct btrfs_inode_extref *)(base + cur_offset);
- victim_name.len = btrfs_inode_extref_name_len(leaf, extref);
-
- if (btrfs_inode_extref_parent(leaf, extref) != parent_objectid)
- goto next;
-
- ret = read_alloc_one_name(leaf, &extref->name,
- victim_name.len, &victim_name);
- if (ret)
- return ret;
-
- search_key.objectid = inode_objectid;
- search_key.type = BTRFS_INODE_EXTREF_KEY;
- search_key.offset = btrfs_extref_hash(parent_objectid,
- victim_name.name,
- victim_name.len);
- ret = backref_in_log(log_root, &search_key,
- parent_objectid, &victim_name);
- if (ret < 0) {
- kfree(victim_name.name);
- return ret;
- } else if (!ret) {
- victim_parent = btrfs_iget_logging(parent_objectid, root);
- if (IS_ERR(victim_parent)) {
- ret = PTR_ERR(victim_parent);
- } else {
- inc_nlink(&inode->vfs_inode);
- btrfs_release_path(path);
-
- ret = unlink_inode_for_log_replay(trans,
- victim_parent,
- inode, &victim_name);
- iput(&victim_parent->vfs_inode);
- }
- kfree(victim_name.name);
- if (ret)
- return ret;
- goto again;
- }
- kfree(victim_name.name);
-next:
- cur_offset += victim_name.len + sizeof(*extref);
- }
+ ret = unlink_extrefs_not_in_log(trans, path, root, log_root,
+ &search_key, inode,
+ inode_objectid, parent_objectid);
+ if (ret == -EAGAIN)
+ goto again;
+ else if (ret)
+ return ret;
}
btrfs_release_path(path);
--
2.47.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 10/12] btrfs: split inode rextef processing from __add_inode_ref() into a helper
2025-06-24 14:42 ` [PATCH 10/12] btrfs: split inode rextef " fdmanana
@ 2025-06-25 11:35 ` Johannes Thumshirn
2025-06-27 10:09 ` David Sterba
0 siblings, 1 reply; 34+ messages in thread
From: Johannes Thumshirn @ 2025-06-25 11:35 UTC (permalink / raw)
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
On 24.06.25 16:56, fdmanana@kernel.org wrote:
> +static int unlink_extrefs_not_in_log(struct btrfs_trans_handle *trans,
> + struct btrfs_path *path,
> + struct btrfs_root *root,
> + struct btrfs_root *log_root,
> + struct btrfs_key *search_key,
> + struct btrfs_inode *inode,
> + u64 inode_objectid,
> + u64 parent_objectid)
> +{
Again personal preference I guess, but unlink_extrefs_not_in_log() has 8
arguments. The previous patch introduced unlink_refs_not_in_log() which
has 7.
Of them trans, path, log_root, search_key, inode and parent_objectid are
shared.
I'd suggest making a 'struct unlink_not_in_log_ctx' that can be passed
into unlink_extrefs_not_in_log() and unlink_refs_not_in_log() reducing
the number of parameters.
Otherwise the change itself looks fine,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 10/12] btrfs: split inode rextef processing from __add_inode_ref() into a helper
2025-06-25 11:35 ` Johannes Thumshirn
@ 2025-06-27 10:09 ` David Sterba
0 siblings, 0 replies; 34+ messages in thread
From: David Sterba @ 2025-06-27 10:09 UTC (permalink / raw)
To: Johannes Thumshirn; +Cc: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
On Wed, Jun 25, 2025 at 11:35:04AM +0000, Johannes Thumshirn wrote:
> On 24.06.25 16:56, fdmanana@kernel.org wrote:
> > +static int unlink_extrefs_not_in_log(struct btrfs_trans_handle *trans,
> > + struct btrfs_path *path,
> > + struct btrfs_root *root,
> > + struct btrfs_root *log_root,
> > + struct btrfs_key *search_key,
> > + struct btrfs_inode *inode,
> > + u64 inode_objectid,
> > + u64 parent_objectid)
> > +{
>
>
> Again personal preference I guess, but unlink_extrefs_not_in_log() has 8
> arguments. The previous patch introduced unlink_refs_not_in_log() which
> has 7.
>
> Of them trans, path, log_root, search_key, inode and parent_objectid are
> shared.
>
> I'd suggest making a 'struct unlink_not_in_log_ctx' that can be passed
> into unlink_extrefs_not_in_log() and unlink_refs_not_in_log() reducing
> the number of parameters.
This makes sense when the set of parameters is passed accross several
functions, it's not much of a difference when it's just one call,
arguably it's worse as passing it in a structre is one more indirection.
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 11/12] btrfs: add btrfs prefix to is_fsstree() and make it return bool
2025-06-24 14:42 [PATCH 00/12] btrfs: log tree fixes and cleanups fdmanana
` (9 preceding siblings ...)
2025-06-24 14:42 ` [PATCH 10/12] btrfs: split inode rextef " fdmanana
@ 2025-06-24 14:42 ` fdmanana
2025-06-25 11:35 ` Johannes Thumshirn
2025-06-24 14:42 ` [PATCH 12/12] btrfs: split btrfs_is_fsstree() into multiple if statements for readability fdmanana
2025-06-27 11:04 ` [PATCH 00/12] btrfs: log tree fixes and cleanups Qu Wenruo
12 siblings, 1 reply; 34+ messages in thread
From: fdmanana @ 2025-06-24 14:42 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
This is an exported function and therefore it should have a 'btrfs_'
prefix, to make it clear it's btrfs specific, avoid future name collisions
with code outside btrfs, and make its naming consistent with most other
btrfs exported functions.
So add a 'btrfs_' prefix to it and make it return bool instead of int,
since all we need is to return true or false.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/ctree.h | 6 +++---
fs/btrfs/delayed-ref.c | 10 +++++-----
fs/btrfs/disk-io.c | 8 ++++----
fs/btrfs/extent-tree.c | 6 +++---
fs/btrfs/extent_map.c | 6 +++---
fs/btrfs/ioctl.c | 4 ++--
fs/btrfs/qgroup.c | 25 +++++++++++++------------
fs/btrfs/relocation.c | 2 +-
fs/btrfs/tree-checker.c | 12 ++++++------
fs/btrfs/tree-log.c | 2 +-
10 files changed, 41 insertions(+), 40 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8a54a0b6e502..d51cc692f2c5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -721,13 +721,13 @@ static inline int btrfs_next_item(struct btrfs_root *root, struct btrfs_path *p)
}
int btrfs_leaf_free_space(const struct extent_buffer *leaf);
-static inline int is_fstree(u64 rootid)
+static inline bool btrfs_is_fstree(u64 rootid)
{
if (rootid == BTRFS_FS_TREE_OBJECTID ||
((s64)rootid >= (s64)BTRFS_FIRST_FREE_OBJECTID &&
!btrfs_qgroup_level(rootid)))
- return 1;
- return 0;
+ return true;
+ return false;
}
static inline bool btrfs_is_data_reloc_root(const struct btrfs_root *root)
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 739c9e29aaa3..ca382c5b186f 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -928,7 +928,7 @@ static void init_delayed_ref_common(struct btrfs_fs_info *fs_info,
if (action == BTRFS_ADD_DELAYED_EXTENT)
action = BTRFS_ADD_DELAYED_REF;
- if (is_fstree(generic_ref->ref_root))
+ if (btrfs_is_fstree(generic_ref->ref_root))
seq = atomic64_read(&fs_info->tree_mod_seq);
refcount_set(&ref->refs, 1);
@@ -958,8 +958,8 @@ void btrfs_init_tree_ref(struct btrfs_ref *generic_ref, int level, u64 mod_root,
#endif
generic_ref->tree_ref.level = level;
generic_ref->type = BTRFS_REF_METADATA;
- if (skip_qgroup || !(is_fstree(generic_ref->ref_root) &&
- (!mod_root || is_fstree(mod_root))))
+ if (skip_qgroup || !(btrfs_is_fstree(generic_ref->ref_root) &&
+ (!mod_root || btrfs_is_fstree(mod_root))))
generic_ref->skip_qgroup = true;
else
generic_ref->skip_qgroup = false;
@@ -976,8 +976,8 @@ void btrfs_init_data_ref(struct btrfs_ref *generic_ref, u64 ino, u64 offset,
generic_ref->data_ref.objectid = ino;
generic_ref->data_ref.offset = offset;
generic_ref->type = BTRFS_REF_DATA;
- if (skip_qgroup || !(is_fstree(generic_ref->ref_root) &&
- (!mod_root || is_fstree(mod_root))))
+ if (skip_qgroup || !(btrfs_is_fstree(generic_ref->ref_root) &&
+ (!mod_root || btrfs_is_fstree(mod_root))))
generic_ref->skip_qgroup = true;
else
generic_ref->skip_qgroup = false;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ee3cdd7035cc..859e49b57edf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -884,7 +884,7 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
btrfs_set_root_used(&root->root_item, leaf->len);
btrfs_set_root_last_snapshot(&root->root_item, 0);
btrfs_set_root_dirid(&root->root_item, 0);
- if (is_fstree(objectid))
+ if (btrfs_is_fstree(objectid))
generate_random_guid(root->root_item.uuid);
else
export_guid(root->root_item.uuid, &guid_null);
@@ -1104,7 +1104,7 @@ static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
if (btrfs_root_id(root) != BTRFS_TREE_LOG_OBJECTID &&
!btrfs_is_data_reloc_root(root) &&
- is_fstree(btrfs_root_id(root))) {
+ btrfs_is_fstree(btrfs_root_id(root))) {
set_bit(BTRFS_ROOT_SHAREABLE, &root->state);
btrfs_check_and_init_root_item(&root->root_item);
}
@@ -1113,7 +1113,7 @@ static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
* Don't assign anonymous block device to roots that are not exposed to
* userspace, the id pool is limited to 1M
*/
- if (is_fstree(btrfs_root_id(root)) &&
+ if (btrfs_is_fstree(btrfs_root_id(root)) &&
btrfs_root_refs(&root->root_item) > 0) {
if (!anon_dev) {
ret = get_anon_bdev(&root->anon_dev);
@@ -1315,7 +1315,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
* This is namely for free-space-tree and quota tree, which can change
* at runtime and should only be grabbed from fs_info.
*/
- if (!is_fstree(objectid) && objectid != BTRFS_DATA_RELOC_TREE_OBJECTID)
+ if (!btrfs_is_fstree(objectid) && objectid != BTRFS_DATA_RELOC_TREE_OBJECTID)
return ERR_PTR(-ENOENT);
again:
root = btrfs_lookup_fs_root(fs_info, objectid);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 10f50c725313..85833bf216de 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1545,7 +1545,7 @@ static void free_head_ref_squota_rsv(struct btrfs_fs_info *fs_info,
* where it has already been unset.
*/
if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_SIMPLE ||
- !href->is_data || !is_fstree(root))
+ !href->is_data || !btrfs_is_fstree(root))
return;
btrfs_qgroup_free_refroot(fs_info, root, href->reserved_bytes,
@@ -4963,7 +4963,7 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
ASSERT(generic_ref.ref_root != BTRFS_TREE_LOG_OBJECTID);
- if (btrfs_is_data_reloc_root(root) && is_fstree(root->relocation_src_root))
+ if (btrfs_is_data_reloc_root(root) && btrfs_is_fstree(root->relocation_src_root))
generic_ref.owning_root = root->relocation_src_root;
btrfs_init_data_ref(&generic_ref, owner, offset, 0, false);
@@ -5887,7 +5887,7 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans,
return ret;
}
}
- if (is_fstree(btrfs_root_id(root))) {
+ if (btrfs_is_fstree(btrfs_root_id(root))) {
ret = btrfs_qgroup_trace_leaf_items(trans, eb);
if (ret) {
btrfs_err_rl(fs_info,
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 02bfdb976e40..57f52585a6dd 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -84,7 +84,7 @@ static void remove_em(struct btrfs_inode *inode, struct extent_map *em)
rb_erase(&em->rb_node, &inode->extent_tree.root);
RB_CLEAR_NODE(&em->rb_node);
- if (!btrfs_is_testing(fs_info) && is_fstree(btrfs_root_id(inode->root)))
+ if (!btrfs_is_testing(fs_info) && btrfs_is_fstree(btrfs_root_id(inode->root)))
percpu_counter_dec(&fs_info->evictable_extent_maps);
}
@@ -502,7 +502,7 @@ static int add_extent_mapping(struct btrfs_inode *inode,
setup_extent_mapping(inode, em, modified);
- if (!btrfs_is_testing(fs_info) && is_fstree(btrfs_root_id(root)))
+ if (!btrfs_is_testing(fs_info) && btrfs_is_fstree(btrfs_root_id(root)))
percpu_counter_inc(&fs_info->evictable_extent_maps);
return 0;
@@ -1337,7 +1337,7 @@ static void btrfs_extent_map_shrinker_worker(struct work_struct *work)
if (!root)
continue;
- if (is_fstree(btrfs_root_id(root)))
+ if (btrfs_is_fstree(btrfs_root_id(root)))
nr_dropped += btrfs_scan_root(root, &ctx);
btrfs_put_root(root);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index aa8cefadf423..c28db44cb5c4 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2889,7 +2889,7 @@ static long btrfs_ioctl_default_subvol(struct file *file, void __user *argp)
ret = PTR_ERR(new_root);
goto out;
}
- if (!is_fstree(btrfs_root_id(new_root))) {
+ if (!btrfs_is_fstree(btrfs_root_id(new_root))) {
ret = -ENOENT;
goto out_free;
}
@@ -3832,7 +3832,7 @@ static long btrfs_ioctl_qgroup_create(struct file *file, void __user *arg)
goto out;
}
- if (sa->create && is_fstree(sa->qgroupid)) {
+ if (sa->create && btrfs_is_fstree(sa->qgroupid)) {
ret = -EINVAL;
goto out;
}
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 7791000dcefc..e38272ac808d 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -482,7 +482,7 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
* during mount before we start doing things like creating
* subvolumes.
*/
- if (is_fstree(qgroup->qgroupid) &&
+ if (btrfs_is_fstree(qgroup->qgroupid) &&
qgroup->qgroupid > tree_root->free_objectid)
/*
* Don't need to check against BTRFS_LAST_FREE_OBJECTID,
@@ -1878,7 +1878,8 @@ int btrfs_qgroup_cleanup_dropped_subvolume(struct btrfs_fs_info *fs_info, u64 su
struct btrfs_trans_handle *trans;
int ret;
- if (!is_fstree(subvolid) || !btrfs_qgroup_enabled(fs_info) || !fs_info->quota_root)
+ if (!btrfs_is_fstree(subvolid) || !btrfs_qgroup_enabled(fs_info) ||
+ !fs_info->quota_root)
return 0;
/*
@@ -2932,7 +2933,7 @@ static int maybe_fs_roots(struct ulist *roots)
* trees.
* If it contains a non-fs tree, it won't be shared with fs/subvol trees.
*/
- return is_fstree(unode->val);
+ return btrfs_is_fstree(unode->val);
}
int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr,
@@ -3591,7 +3592,7 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes, bool enforce,
int ret = 0;
LIST_HEAD(qgroup_list);
- if (!is_fstree(ref_root))
+ if (!btrfs_is_fstree(ref_root))
return 0;
if (num_bytes == 0)
@@ -3651,7 +3652,7 @@ void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
struct btrfs_qgroup *qgroup;
LIST_HEAD(qgroup_list);
- if (!is_fstree(ref_root))
+ if (!btrfs_is_fstree(ref_root))
return;
if (num_bytes == 0)
@@ -4219,7 +4220,7 @@ static int qgroup_reserve_data(struct btrfs_inode *inode,
int ret;
if (btrfs_qgroup_mode(root->fs_info) == BTRFS_QGROUP_MODE_DISABLED ||
- !is_fstree(btrfs_root_id(root)) || len == 0)
+ !btrfs_is_fstree(btrfs_root_id(root)) || len == 0)
return 0;
/* @reserved parameter is mandatory for qgroup */
@@ -4472,7 +4473,7 @@ int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
int ret;
if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED ||
- !is_fstree(btrfs_root_id(root)) || num_bytes == 0)
+ !btrfs_is_fstree(btrfs_root_id(root)) || num_bytes == 0)
return 0;
BUG_ON(num_bytes != round_down(num_bytes, fs_info->nodesize));
@@ -4517,7 +4518,7 @@ void btrfs_qgroup_free_meta_all_pertrans(struct btrfs_root *root)
struct btrfs_fs_info *fs_info = root->fs_info;
if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED ||
- !is_fstree(btrfs_root_id(root)))
+ !btrfs_is_fstree(btrfs_root_id(root)))
return;
/* TODO: Update trace point to handle such free */
@@ -4533,7 +4534,7 @@ void __btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes,
struct btrfs_fs_info *fs_info = root->fs_info;
if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED ||
- !is_fstree(btrfs_root_id(root)))
+ !btrfs_is_fstree(btrfs_root_id(root)))
return;
/*
@@ -4592,7 +4593,7 @@ void btrfs_qgroup_convert_reserved_meta(struct btrfs_root *root, int num_bytes)
struct btrfs_fs_info *fs_info = root->fs_info;
if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED ||
- !is_fstree(btrfs_root_id(root)))
+ !btrfs_is_fstree(btrfs_root_id(root)))
return;
/* Same as btrfs_qgroup_free_meta_prealloc() */
num_bytes = sub_root_meta_rsv(root, num_bytes,
@@ -4818,7 +4819,7 @@ int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
if (!btrfs_qgroup_full_accounting(fs_info))
return 0;
- if (!is_fstree(btrfs_root_id(root)) || !root->reloc_root)
+ if (!btrfs_is_fstree(btrfs_root_id(root)) || !root->reloc_root)
return 0;
spin_lock(&blocks->lock);
@@ -4902,7 +4903,7 @@ int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info,
if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_SIMPLE)
return 0;
- if (!is_fstree(root))
+ if (!btrfs_is_fstree(root))
return 0;
/* If the extent predates enabling quotas, don't count it. */
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index d7ec1d72821c..2670c0eb3cda 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2625,7 +2625,7 @@ int relocate_tree_blocks(struct btrfs_trans_handle *trans,
* tree.
*/
if (block->owner &&
- (!is_fstree(block->owner) ||
+ (!btrfs_is_fstree(block->owner) ||
block->owner == BTRFS_DATA_RELOC_TREE_OBJECTID)) {
ret = relocate_cowonly_block(trans, rc, block, path);
if (ret)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 8f4703b488b7..0f556f4de3f9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -191,7 +191,7 @@ static bool check_prev_ino(struct extent_buffer *leaf,
* Only subvolume trees along with their reloc trees need this check.
* Things like log tree doesn't follow this ino requirement.
*/
- if (!is_fstree(btrfs_header_owner(leaf)))
+ if (!btrfs_is_fstree(btrfs_header_owner(leaf)))
return true;
if (key->objectid == prev_key->objectid)
@@ -475,7 +475,7 @@ static int check_root_key(struct extent_buffer *leaf, struct btrfs_key *key,
* to be COWed to be relocated.
*/
if (unlikely(is_root_item && key->objectid == BTRFS_TREE_RELOC_OBJECTID &&
- !is_fstree(key->offset))) {
+ !btrfs_is_fstree(key->offset))) {
generic_err(leaf, slot,
"invalid reloc tree for root %lld, root id is not a subvolume tree",
key->offset);
@@ -493,7 +493,7 @@ static int check_root_key(struct extent_buffer *leaf, struct btrfs_key *key,
}
/* DIR_ITEM/INDEX/INODE_REF is not allowed to point to non-fs trees */
- if (unlikely(!is_fstree(key->objectid) && !is_root_item)) {
+ if (unlikely(!btrfs_is_fstree(key->objectid) && !is_root_item)) {
dir_item_err(leaf, slot,
"invalid location key objectid, have %llu expect [%llu, %llu]",
key->objectid, BTRFS_FIRST_FREE_OBJECTID,
@@ -1311,7 +1311,7 @@ static bool is_valid_dref_root(u64 rootid)
* - tree root
* For v1 space cache
*/
- return is_fstree(rootid) || rootid == BTRFS_DATA_RELOC_TREE_OBJECTID ||
+ return btrfs_is_fstree(rootid) || rootid == BTRFS_DATA_RELOC_TREE_OBJECTID ||
rootid == BTRFS_ROOT_TREE_OBJECTID;
}
@@ -2167,7 +2167,7 @@ ALLOW_ERROR_INJECTION(btrfs_check_node, ERRNO);
int btrfs_check_eb_owner(const struct extent_buffer *eb, u64 root_owner)
{
- const bool is_subvol = is_fstree(root_owner);
+ const bool is_subvol = btrfs_is_fstree(root_owner);
const u64 eb_owner = btrfs_header_owner(eb);
/*
@@ -2209,7 +2209,7 @@ int btrfs_check_eb_owner(const struct extent_buffer *eb, u64 root_owner)
* For subvolume trees, owners can mismatch, but they should all belong
* to subvolume trees.
*/
- if (unlikely(is_subvol != is_fstree(eb_owner))) {
+ if (unlikely(is_subvol != btrfs_is_fstree(eb_owner))) {
btrfs_crit(eb->fs_info,
"corrupted %s, root=%llu block=%llu owner mismatch, have %llu expect [%llu, %llu]",
btrfs_header_level(eb) == 0 ? "leaf" : "node",
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 82664bb79d36..97ed11788b47 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -144,7 +144,7 @@ static struct btrfs_inode *btrfs_iget_logging(u64 objectid, struct btrfs_root *r
struct btrfs_inode *inode;
/* Only meant to be called for subvolume roots and not for log roots. */
- ASSERT(is_fstree(btrfs_root_id(root)));
+ ASSERT(btrfs_is_fstree(btrfs_root_id(root)));
/*
* We're holding a transaction handle whether we are logging or
--
2.47.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 12/12] btrfs: split btrfs_is_fsstree() into multiple if statements for readability
2025-06-24 14:42 [PATCH 00/12] btrfs: log tree fixes and cleanups fdmanana
` (10 preceding siblings ...)
2025-06-24 14:42 ` [PATCH 11/12] btrfs: add btrfs prefix to is_fsstree() and make it return bool fdmanana
@ 2025-06-24 14:42 ` fdmanana
2025-06-25 11:37 ` Johannes Thumshirn
2025-06-27 11:04 ` [PATCH 00/12] btrfs: log tree fixes and cleanups Qu Wenruo
12 siblings, 1 reply; 34+ messages in thread
From: fdmanana @ 2025-06-24 14:42 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
Instead of a single if statement with multiple conditions, split it into
several if statements testing only one condition at a time and return true
or false immediately after. This makes it more immediate to reason.
The module's text size also slightly decreases, at least with gcc 14.2.0
on x86_64.
Before:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename
1897138 161583 16136 2074857 1fa8e9 fs/btrfs/btrfs.ko
After:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename
1896976 161583 16136 2074695 1fa847 fs/btrfs/btrfs.ko
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/ctree.h | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index d51cc692f2c5..fe70b593c7cd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -723,11 +723,16 @@ int btrfs_leaf_free_space(const struct extent_buffer *leaf);
static inline bool btrfs_is_fstree(u64 rootid)
{
- if (rootid == BTRFS_FS_TREE_OBJECTID ||
- ((s64)rootid >= (s64)BTRFS_FIRST_FREE_OBJECTID &&
- !btrfs_qgroup_level(rootid)))
+ if (rootid == BTRFS_FS_TREE_OBJECTID)
return true;
- return false;
+
+ if ((s64)rootid < (s64)BTRFS_FIRST_FREE_OBJECTID)
+ return false;
+
+ if (btrfs_qgroup_level(rootid) != 0)
+ return false;
+
+ return true;
}
static inline bool btrfs_is_data_reloc_root(const struct btrfs_root *root)
--
2.47.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 00/12] btrfs: log tree fixes and cleanups
2025-06-24 14:42 [PATCH 00/12] btrfs: log tree fixes and cleanups fdmanana
` (11 preceding siblings ...)
2025-06-24 14:42 ` [PATCH 12/12] btrfs: split btrfs_is_fsstree() into multiple if statements for readability fdmanana
@ 2025-06-27 11:04 ` Qu Wenruo
12 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2025-06-27 11:04 UTC (permalink / raw)
To: fdmanana, linux-btrfs
在 2025/6/25 00:12, fdmanana@kernel.org 写道:
> From: Filipe Manana <fdmanana@suse.com>
>
> Several bug fixes for logging and log replay, plus some cleanups.
> Details in the changelogs.
>
> Filipe Manana (12):
> btrfs: fix missing error handling when searching for inode refs during log replay
> btrfs: fix iteration of extrefs during log replay
> btrfs: fix inode lookup error handling during log replay
Patch 1~3 look good to me.
> btrfs: record new subvolume in parent dir earlier to avoid dir logging races
> btrfs: propagate last_unlink_trans earlier when doing a rmdir
> btrfs: use btrfs_record_snapshot_destroy() during rmdir
However I'm not confident enough for log-tree code, thus can not help much.
> btrfs: use inode already stored in local variable at btrfs_rmdir()
> btrfs: use btrfs inodes in btrfs_rmdir() to avoid so much usage of BTRFS_I()
> btrfs: split inode ref processing from __add_inode_ref() into a helper
> btrfs: split inode rextef processing from __add_inode_ref() into a helper
> btrfs: add btrfs prefix to is_fsstree() and make it return bool
> btrfs: split btrfs_is_fsstree() into multiple if statements for readability
The rest also look good to me.
So for patch 1~3 and 7~12:
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks,
Qu
>
> fs/btrfs/ctree.h | 17 +-
> fs/btrfs/delayed-ref.c | 10 +-
> fs/btrfs/disk-io.c | 8 +-
> fs/btrfs/extent-tree.c | 6 +-
> fs/btrfs/extent_map.c | 6 +-
> fs/btrfs/inode.c | 64 +++----
> fs/btrfs/ioctl.c | 8 +-
> fs/btrfs/qgroup.c | 25 +--
> fs/btrfs/relocation.c | 2 +-
> fs/btrfs/tree-checker.c | 12 +-
> fs/btrfs/tree-log.c | 362 ++++++++++++++++++++++------------------
> 11 files changed, 281 insertions(+), 239 deletions(-)
>
^ permalink raw reply [flat|nested] 34+ messages in thread