From: Leo Martins <loemra.dev@gmail.com>
To: David Sterba <dsterba@suse.cz>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v2] btrfs: implement ref_tracker for delayed_nodes
Date: Mon, 4 Aug 2025 12:00:12 -0700 [thread overview]
Message-ID: <20250804190026.484954-1-loemra.dev@gmail.com> (raw)
In-Reply-To: <20250804135750.GL6704@twin.jikos.cz>
On Mon, 4 Aug 2025 15:57:50 +0200 David Sterba <dsterba@suse.cz> wrote:
> On Tue, Jul 29, 2025 at 12:08:04AM -0700, Leo Martins wrote:
> > This patch adds ref_tracker infrastructure for btrfs_delayed_node.
> >
> > A ref_tracker object is allocated every time the reference count is
> > increased and freed when the reference count is decreased. The
> > ref_tracker object contains stack_trace information about where the
> > ref count is increased and decreased. The ref_tracker_dir embedded in
> > btrfs_delayed_node keeps track of all current references, and some
> > previous references. This information allows for detection of reference
> > leaks or double frees.
> >
> > Here is a common example of taking a reference to a delayed_node and
> > freeing it with ref_tracker.
> >
> > ```C
> > struct ref_tracker *tracker;
> > struct btrfs_delayed_node *node;
> >
> > node = btrfs_get_delayed_node(inode, tracker);
> > // use delayed_node...
> > btrfs_release_delayed_node(node, tracker);
> > ```
> >
> > There are two special cases where the delayed_node reference is long
> > term, meaning that the thread that takes the reference and the thread
> > that frees the reference are different. The inode_cache which is the
> > btrfs_delayed_node reference stored in the associated btrfs_inode, and
> > the node_list which is the btrfs_delayed_node reference stored in the
> > btrfs_delayed_root node_list/prepare_list. In these cases the
> > ref_tracker is stored in the delayed_node itself.
> >
> > This patch introduces some new wrappers (btrfs_delayed_node_ref_tracker_*)
> > to ensure that when BTRFS_DEBUG is disabled everything gets compiled out.
> >
> > Signed-off-by: Leo Martins <loemra.dev@gmail.com>
>
> I'm still missing why we need to add this whole infrastructure, there
> weren't any recent bugs in delayed inode tracking. I understand what the
> ref tracker does but it's still increasing structure size and adds a
> perf hit that is acceptable for debugging kernels but still.
Our leading btrfs crash is a soft lockup related to delayed_nodes.
[21017.354271] [ C96] Call Trace:
[21017.354273] [ C96] <IRQ>
[21017.354278] [ C96] ? rcu_dump_cpu_stacks+0xa1/0xe0
[21017.354282] [ C96] ? print_cpu_stall+0x113/0x200
[21017.354284] [ C96] ? rcu_sched_clock_irq+0x633/0x730
[21017.354287] [ C96] ? update_process_times+0x66/0xa0
[21017.354288] [ C96] ? dma_map_page_attrs+0x250/0x250
[21017.354289] [ C96] ? tick_nohz_handler+0x84/0x1d0
[21017.354291] [ C96] ? hrtimer_interrupt+0x1a1/0x5b0
[21017.354293] [ C96] ? __sysvec_apic_timer_interrupt+0x44/0xe0
[21017.354295] [ C96] ? sysvec_apic_timer_interrupt+0x6b/0x80
[21017.354296] [ C96] </IRQ>
[21017.354297] [ C96] <TASK>
[21017.354297] [ C96] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[21017.354302] [ C96] ? xa_find+0xa/0xb0
[21017.354305] [ C96] btrfs_kill_all_delayed_nodes+0x6b/0x150
[21017.354307] [ C96] ? __switch_to+0x133/0x530
[21017.354308] [ C96] ? schedule+0xff4/0x1380
[21017.354310] [ C96] btrfs_clean_one_deleted_snapshot+0x70/0xe0
[21017.354313] [ C96] cleaner_kthread+0x83/0x120
[21017.354315] [ C96] ? exportfs_encode_inode_fh+0x60/0x60
[21017.354317] [ C96] kthread+0xae/0xe0
[21017.354319] [ C96] ? __efi_queue_work+0x130/0x130
[21017.354320] [ C96] ret_from_fork+0x2f/0x40
[21017.354322] [ C96] ? __efi_queue_work+0x130/0x130
[21017.354323] [ C96] ret_from_fork_asm+0x11/0x20
[21017.354326] [ C96] </TASK>
[21042.628340] [ C96] watchdog: BUG: soft lockup - CPU#96 stuck for 45s! [btrfs-cleaner:1438]
btrfs_kill_all_delayed_nodes is getting stuck in an infinite loop because
the root->delayed_nodes xarray has a delayed_node that never gets erased.
Inspecting the crash dump reveals that the delayed_node has a reference count
of one, indicating there is a reference leak.
I believe this bug was introduced or at least magnified somewhere between
6.9 and 6.11.
Let me know if you want anymore details about the crash.
>
> We have the ref-verify code and mount option, from what I've heard it
> was used sporadically when touching related code. We can let the ref
> tracking work like that too, not turned on by default ie. requiring a
> mount option.
The only issue with a mount option is that we would not be able to conditionally
compile ref_tracker_dir and ref_tracker pointers in struct btrfs_delayed_node,
permanently increasing the size of the struct from 304 bytes to 400 bytes.
Alternatively, there could be a delayed_node ref_tracker helper struct that we
could store a pointer to.
>
> In case you'd want to enhance more structures with ref tracker the mount
> option can enable them selectively.
>
> > @@ -106,6 +113,13 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node(
> > */
> > if (refcount_inc_not_zero(&node->refs)) {
> > refcount_inc(&node->refs);
> > +#ifdef CONFIG_BTRFS_DEBUG
> > + inode_cache_tracker = &node->inode_cache_tracker;
> > +#endif
>
> We try to avoid #ifdefs if it's a repeating pattern, this should be
> abstracted away in a helper.
Sounds good.
For the record, there would be no need for it if we used the typedef
approach :-). There is a good example of how the original
author of ref_tracker implemented it in include/net/net_trackers.h.
>
> > + btrfs_delayed_node_ref_tracker_alloc(node, tracker,
> > + GFP_ATOMIC);
> > + btrfs_delayed_node_ref_tracker_alloc(
> > + node, inode_cache_tracker, GFP_ATOMIC);
> > btrfs_inode->delayed_node = node;
> > } else {
> > node = NULL;
> > @@ -125,17 +139,19 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node(
> > *
> > * Return the delayed node, or error pointer on failure.
> > */
> > -static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
> > - struct btrfs_inode *btrfs_inode)
> > +static struct btrfs_delayed_node *
> > +btrfs_get_or_create_delayed_node(struct btrfs_inode *btrfs_inode,
> > + struct ref_tracker **tracker)
>
> And another pattern, please don't change the style of the function
> definition line, the return value is on the same line as name, if
> parameters don't fit under reasonable column limit like 90 chars then
> it's on the next line. Most of the code has been fixed so you can just
> extend it.
Got it, I was just using the default kernel .clang-format.
Is there a btrfs specific .clang-format I could use?
>
> > {
> > struct btrfs_delayed_node *node;
> > + struct ref_tracker **inode_cache_tracker = NULL;
> > struct btrfs_root *root = btrfs_inode->root;
> > u64 ino = btrfs_ino(btrfs_inode);
> > int ret;
> > void *ptr;
> >
>
> > --- a/fs/btrfs/delayed-inode.h
> > +++ b/fs/btrfs/delayed-inode.h
> > @@ -7,6 +7,7 @@
> > #ifndef BTRFS_DELAYED_INODE_H
> > #define BTRFS_DELAYED_INODE_H
> >
> > +#include <linux/ref_tracker.h>
> > #include <linux/types.h>
> > #include <linux/rbtree.h>
> > #include <linux/spinlock.h>
> > @@ -63,8 +64,19 @@ struct btrfs_delayed_node {
> > struct rb_root_cached del_root;
> > struct mutex mutex;
> > struct btrfs_inode_item inode_item;
> > - refcount_t refs;
> > +
> > int count;
> > + refcount_t refs;
> > +
> > +#ifdef CONFIG_BTRFS_DEBUG
> > + /* Used to track all references to this delayed node. */
> > + struct ref_tracker_dir ref_dir;
> > + /* Used to track delayed node reference stored in node list. */
> > + struct ref_tracker *node_list_tracker;
> > + /* Used to track delayed node reference stored in inode cache. */
> > + struct ref_tracker *inode_cache_tracker;
> > +#endif
>
> Optional and debugging structure members are best placed near the end.
> In this case it's related to the refs so this could be mentioned in
> comments as 'refs' should not be moved.
Ok.
>
> > +
> > u64 index_cnt;
> > unsigned long flags;
> > /*
Sent using hkml (https://github.com/sjp38/hackermail)
next prev parent reply other threads:[~2025-08-04 19:00 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-29 7:08 [PATCH v2] btrfs: implement ref_tracker for delayed_nodes Leo Martins
2025-08-04 13:57 ` David Sterba
2025-08-04 19:00 ` Leo Martins [this message]
2025-08-05 15:14 ` David Sterba
2025-08-05 18:07 ` Boris Burkov
2025-08-05 19:48 ` Leo Martins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250804190026.484954-1-loemra.dev@gmail.com \
--to=loemra.dev@gmail.com \
--cc=dsterba@suse.cz \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.