From: Leo Martins <loemra.dev@gmail.com>
To: David Sterba <dsterba@suse.cz>
Cc: Filipe Manana <fdmanana@kernel.org>,
linux-btrfs@vger.kernel.org, kernel-team@fb.com,
jlayton@kernel.org
Subject: Re: [PATCH] btrfs: implement ref_tracker for delayed_nodes
Date: Wed, 23 Jul 2025 10:44:26 -0700 [thread overview]
Message-ID: <20250723174434.3484810-1-loemra.dev@gmail.com> (raw)
In-Reply-To: <20250711103853.GC22472@twin.jikos.cz>
On Fri, 11 Jul 2025 12:38:53 +0200 David Sterba <dsterba@suse.cz> wrote:
> On Thu, Jul 10, 2025 at 12:54:34PM +0100, Filipe Manana wrote:
> > > @@ -63,10 +109,18 @@ struct btrfs_delayed_node {
> > > struct rb_root_cached del_root;
> > > struct mutex mutex;
> > > struct btrfs_inode_item inode_item;
> > > +
> > > refcount_t refs;
> > > - int count;
> > > + /* Used to track all references to this delayed node. */
> > > + btrfs_delayed_node_ref_tracker_dir ref_dir;
> > > + /* Used to track delayed node reference stored in node list. */
> > > + btrfs_delayed_node_ref_tracker node_list_tracker;
> > > + /* Used to track delayed node reference stored in inode cache. */
> > > + btrfs_delayed_node_ref_tracker inode_cache_tracker;
> > > +
> > > u64 index_cnt;
> > > unsigned long flags;
> > > + int count;
> >
> > Why are you moving the count field?
> > This increases the size of the struct even without the new config
> > option enabled and as a result we get less objects per page.
> > You are essentially reverting this commit:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a20725e1e7014642fc8ba4c7dd2c4ef6d4ec56a9
> >
> > Without any explanation about why.
Good catch, did not realize that moving the count field increased
struct size.
> >
> > I'm also not a big fan of the typedefs and would prefer to have
> > struct ref_tracker used directly in the structure surrounded by an
> > #ifdef block.
>
> Agreed, we use typedefs only for function callbacks, for variable types
> not anymore.
When I try and not use typedefs I end up using more conditional compilation
which I think hinders readability. Here's a snippet of my best attempt, let
me know if you see any improvements.
```C
struct btrfs_delayed_node {
...
#ifdef CONFIG_BTRFS_DEBUG
struct ref_tracker_dir ref_dir;
struct ref_tracker *node_list_tracker;
struct ref_tracker *inode_cache_tracker;
#endif
...
}
#ifdef
static inline void
btrfs_release_delayed_node(struct btrfs_delayed_node *node,
struct ref_tracker **tracker)
{
__btrfs_release_delayed_node(node, 0, tracker);
}
void btrfs_remove_delayed_node(struct btrfs_inode *inode)
{
struct btrfs_delayed_node *delayed_node;
delayed_node = READ_ONCE(inode->delayed_node);
if (!delayed_node)
return;
inode->delayed_node = NULL;
#ifdef CONFIG_BTRFS_DEBUG
btrfs_release_delayed_node(delayed_node,
&delayed_node->inode_cache_tracker);
#else
btrfs_release_delayed_node(delayed_node, NULL);
#endif
}
```
With this approach I end up needing to add conditional compilation
anywhere I use node_list_tracker or inode_cache_tracker. This isn't
too big of an issue since their usage is relatively limited.
However, I believe this would be an appropriate place for typedefs.
Alternatively, I could have two versions of btrfs_release_delayed_node
one that takes a struct ref_tracker ** and one that takes a struct {}.
However, this would mean doing the same for all the functions that
take a tracker, btrfs_get_delayed_node, btrfs_get_or_create_delayed_node,
... But, I feel like that is the same as manually unwinding the typedef.
Any guidance would be greatly appreciated! Also curious about the aversion
to typedefs outside of function callbacks?
>
> > I also don't like adding a new Kconfig option just for this... How
> > much slower would a build with CONFIG_BTRFS_DEBUG=y be?
> >
> > If that's really a lot slower, then perhaps a more generic config
> > option name in case we add similar tracking to other data structures
> > one day.
I gathered some data here using fsperf and got some mixed results, most
tests performed slightly worse, and some got much worse
(dbench60, emptyfiles500k). I think it would be acceptible to keep in
CONFIG_BTRFS_DEBUG, but I think it could definitely benefit from
its own option, CONFIG_BTRFS_REF_TRACKER.
>
> Right, we want to keep the config options at minimum and for debugging
> features it should be fine under BTRFS_DEBUG. If this needs to be
> configured for performance reasons then it can be a mount option, same
> as we have ref-verify or fragment already.
If we wanted to allow turning ref_tracker on via mount option we would
not be able to compile out the ref_tracker objects in delayed_node which
would cause a change in size from 304 bytes to 400 bytes.
baseline represents CONFIG_BTRFS_DEBUG=y without ref_tracker
current represents CONFIG_BTRFS_DEBUG=y with ref_tracker
** means 2*stdev worse
```
[root@localhost fsperf]# ./fsperf -p "baseline" -n 5
...
<enable ref_tracker>
[root@localhost fsperf]# ./fsperf -p "tracker" -n 5
...
[root@localhost fsperf]# ./fsperf-compare "baseline" "tracker"
['btrfs']
btrfs test results
bufferedappenddatasync results
metric baseline current stdev diff
==============================================================================
avg_commit_ms 1.91 1.85 0.13 -3.23%
bg_count 4.60 4.40 0.55 -4.35%
commits 20 21.20 2.12 6.00%
dev_read_iops 262187.60 262190.40 8.32 0.00%
dev_read_kbytes 1020.80 1040 42.93 1.88%
dev_write_iops 3129696.80 3126160 2909.20 -0.11%
dev_write_kbytes 43783448 43726868.80 46497.53 -0.13%
elapsed 301.60 300.60 9.29 -0.33%
end_state_mount_ns 46733159 44610208 3503084.15 -4.54%
end_state_umount_ns 2.77e+09 2.58e+09 2.17e+08 -6.65%
max_commit_ms 2.80 2.40 0.84 -14.29%
sys_cpu 61.80 61.69 0.54 -0.17%
write_bw_bytes 3572197 3584941.20 106296.35 0.36%
write_clat_ns_mean 29694.09 29299.92 4510.28 -1.33%
write_clat_ns_p50 28262.40 27955.20 4380.52 -1.09%
write_clat_ns_p99 48793.60 48281.60 7226.28 -1.05%
write_io_kbytes 1048576 1048576 0 0.00%
write_iops 872.12 875.23 25.95 0.36%
write_lat_ns_max 764860.40 634311 354679.35 -17.07%
write_lat_ns_mean 29981.52 29539.17 4518.22 -1.48%
write_lat_ns_min 21404.80 21060.40 315.78 -1.61%
bufferedrandwrite16g results
metric baseline current stdev diff
===============================================================================
avg_commit_ms 794.59 773.06 27.11 -2.71%
commits 83.20 79.80 2.05 -4.09%
dev_read_iops 658.60 521.20 335.31 -20.86%
dev_read_kbytes 9478.40 7334.40 5341.58 -22.62%
dev_write_iops 5801247.80 5721033.20 53643.07 -1.38%
dev_write_kbytes 44303422.40 43038694.40 853851.59 -2.85%
elapsed 360 347 21.94 -3.61%
end_state_mount_ns 58480927.20 59131105.80 873330.39 1.11%
end_state_umount_ns 9.10e+09 8.71e+09 3.41e+08 -4.33%
max_commit_ms 1624 1630.60 53.77 0.41%
sys_cpu 35.52 35.88 1.25 1.00%
write_bw_bytes 47983181.20 49673467 2727170.05 3.52%
write_clat_ns_mean 83559.03 80805.49 5123.88 -3.30%
write_clat_ns_p50 27622.40 27417.60 879.39 -0.74%
write_clat_ns_p99 152166.40 171417.60 9864.47 12.65%
write_io_kbytes 16777216 16777216 0 0.00%
write_iops 11714.64 12127.31 665.81 3.52%
write_lat_ns_max 2.93e+09 2.21e+09 1.41e+09 -24.81%
write_lat_ns_mean 83674.85 80915.46 5128.53 -3.30%
write_lat_ns_min 16718.40 16552.40 455.01 -0.99%
dbench60 results
metric baseline current stdev diff
=============================================================================
avg_commit_ms 31.72 33.63 1.24 6.04%
bg_count 4 4 0 0.00%
close 4.32 6.06 0.54 40.36%**
commits 24 24 0 0.00%
deltree 526.48 610.32 31.73 15.92%**
dev_read_iops 75597.20 65111 1808.47 -13.87%**
dev_read_kbytes 1040 1040 0 0.00%
dev_write_iops 1545476.80 1347859 52326.80 -12.79%**
dev_write_kbytes 18026305.60 15533764 691195.60 -13.83%**
end_state_mount_ns 48171843.60 45794087 2671959.83 -4.94%
end_state_umount_ns 1.85e+09 1.85e+09 41447992.49 -0.14%
find 29.76 41.82 2.56 40.52%**
flush 122.33 111.94 19.45 -8.49%
lockx 0.95 1.21 0.45 27.55%
max_commit_ms 38.40 45.40 2.30 18.23%**
mkdir 0.08 0.18 0.04 129.35%**
ntcreatex 35.41 48.84 3.60 37.95%**
qfileinfo 3.52 3.31 1.12 -6.11%
qfsinfo 4.25 4.46 0.33 4.95%
qpathinfo 25.30 32.56 3.72 28.68%
readx 7.56 7.97 1.92 5.49%
rename 50.33 50.86 12.36 1.06%
sfileinfo 9.17 8.60 1.67 -6.22%
throughput 315.59 282.74 10.32 -10.41%**
unlink 44.87 54.50 6.19 21.44%
unlockx 0.75 2.19 0.37 191.39%**
writex 98.36 92.78 17.51 -5.68%
dio4kbs16threads results
metric baseline current stdev diff
================================================================================
avg_commit_ms 2339.20 1071.20 975.30 -54.21%
bg_count 20 20 0 0.00%
commits 1 1 0 0.00%
dev_read_iops 48 48 0 0.00%
dev_read_kbytes 1040 1040 0 0.00%
dev_write_iops 1105312.80 1070254 11754.77 -3.17%**
dev_write_kbytes 4919395.20 4815462.40 57354.39 -2.11%
elapsed 61 61 0 0.00%
end_state_mount_ns 50869016.60 50826535.40 1496475.65 -0.08%
end_state_umount_ns 3.02e+09 2.99e+09 43691601.05 -0.97%
frag_pct_max 98.38 98.43 0.55 0.05%
frag_pct_mean 51.91 51.61 0.27 -0.59%
frag_pct_min 5.45 4.79 0.29 -12.12%
frag_pct_p50 51.91 51.61 0.27 -0.59%
frag_pct_p95 98.38 98.43 0.55 0.05%
frag_pct_p99 98.38 98.43 0.55 0.05%
fragmented_bg_count 2 2 0 0.00%
max_commit_ms 2339.20 1071.20 975.30 -54.21%
sys_cpu 18.11 17.24 0.06 -4.78%
write_bw_bytes 72547808 69915896.40 851270.12 -3.63%**
write_clat_ns_mean 900096.95 933821.96 10417.71 3.75%**
write_clat_ns_p50 400179.20 394444.80 4486.94 -1.43%
write_clat_ns_p99 2657484.80 2867200 37361.27 7.89%**
write_io_kbytes 4255195.20 4102859.20 48123.74 -3.58%**
write_iops 17711.87 17069.31 207.83 -3.63%**
write_lat_ns_max 9.19e+08 9.59e+08 1.02e+08 4.37%
write_lat_ns_mean 900449.84 934178.51 10411.01 3.75%**
write_lat_ns_min 148662.20 149014.40 3485.37 0.24%
diorandread results
metric baseline current stdev diff
================================================================================
avg_commit_ms 5.31 5.49 0.08 3.35%**
bg_count 18 18 0 0.00%
commits 18 18 0 0.00%
dev_read_iops 2755803.60 2722414.60 23232.33 -1.21%
dev_read_kbytes 11022186.40 10888630.40 92929.32 -1.21%
dev_write_iops 41688.20 42010.80 312.31 0.77%
dev_write_kbytes 16897548 16897548 0 0.00%
elapsed 61 61 0 0.00%
end_state_mount_ns 47309004.80 49367506.20 1371322.67 4.35%
end_state_umount_ns 1.07e+09 1.07e+09 6066020.04 -0.10%
max_commit_ms 6.60 6.60 0.55 0.00%
read_bw_bytes 1.88e+08 1.86e+08 1587441.01 -1.21%
read_clat_ns_mean 345830.30 350152.25 2831.51 1.25%
read_clat_ns_p50 327270.40 331366.40 2243.47 1.25%
read_clat_ns_p99 647987.20 656179.20 9340.32 1.26%
read_io_bytes 1.13e+10 1.11e+10 95159625.71 -1.21%
read_io_kbytes 11022186.40 10888630.40 92929.32 -1.21%
read_iops 45924.71 45368.09 387.56 -1.21%
read_lat_ns_max 4530461 13372441.20 2563283.41 195.17%**
read_lat_ns_mean 346044.45 350359.97 2846.68 1.25%
read_lat_ns_min 138696.20 134640 4341.30 -2.92%
sys_cpu 24.22 24.16 0.21 -0.23%
emptyfiles500k results
metric baseline current stdev diff
===============================================================================
avg_commit_ms 26.60 24.20 5.13 -9.02%
bg_count 3 3 0 0.00%
commits 1 1 0 0.00%
dev_read_iops 48 48 0 0.00%
dev_read_kbytes 1040 1040 0 0.00%
dev_write_iops 842.80 842 15.27 -0.09%
dev_write_kbytes 13460.80 13448 244.33 -0.10%
elapsed 42.20 42.20 0.84 0.00%
end_state_mount_ns 48019710.20 47157285.80 2091327.38 -1.80%
end_state_umount_ns 1.46e+09 1.46e+09 26640049.48 0.04%
max_commit_ms 26.60 24.20 5.13 -9.02%
sys_cpu 5.47 7.76 0.18 41.97%**
write_bw_bytes 3077613.60 3073469.80 66469.32 -0.13%
write_clat_ns_mean 67202.18 98477.96 1091.86 46.54%**
write_clat_ns_p50 63027.20 93900.80 1061.71 48.98%**
write_clat_ns_p99 135270.40 169779.20 4165.81 25.51%**
write_io_kbytes 125000 125000 0 0.00%
write_iops 751.37 750.36 16.23 -0.13%
randwrite2xram results
metric baseline current stdev diff
==============================================================================
avg_commit_ms 872.44 847.24 13.74 -2.89%
commits 60 71.40 4.85 19.00%**
dev_read_iops 630.40 1092.60 240.86 73.32%
dev_read_kbytes 9414.40 16627.20 3800.10 76.61%
dev_write_iops 5119929.40 5187028.80 123210.86 1.31%
dev_write_kbytes 37392001.60 39529271.20 1840547.09 5.72%
elapsed 312.80 312.20 0.84 -0.19%
end_state_mount_ns 58923162.40 58027381.60 2021748.73 -1.52%
end_state_umount_ns 1.06e+10 1.07e+10 5.74e+08 0.87%
max_commit_ms 1608.60 1549.20 116.36 -3.69%
sys_cpu 9.83 9.63 0.10 -2.02%
write_bw_bytes 48972662.80 47887587.20 400536.11 -2.22%**
write_clat_ns_mean 331763.74 339356.86 2837.07 2.29%**
write_clat_ns_p50 29721.60 30028.80 642.55 1.03%
write_clat_ns_p99 297164.80 329728 34486.16 10.96%
write_io_kbytes 14378941.60 14054288.80 129599.67 -2.26%**
write_iops 11956.22 11691.31 97.79 -2.22%**
write_lat_ns_max 2.24e+09 2.23e+09 1.17e+08 -0.41%
write_lat_ns_mean 331873.49 339472.07 2838.23 2.29%**
write_lat_ns_min 18096.40 17637.40 772.99 -2.54%
smallfiles100k results
metric baseline current stdev diff
===============================================================================
avg_commit_ms 71.00 76.99 3.72 8.44%
bg_count 19 19 0 0.00%
commits 12.40 12.60 1.52 1.61%
dev_read_iops 111 111.80 9.82 0.72%
dev_read_kbytes 1865.60 1875.20 166.89 0.51%
dev_write_iops 2183867.80 2183585.60 12589.22 -0.01%
dev_write_kbytes 21161021.60 21168306.40 71548.04 0.03%
elapsed 390.40 391.40 47.79 0.26%
end_state_mount_ns 42532681.60 42288602.40 1686619.50 -0.57%
end_state_umount_ns 7.44e+09 8.06e+09 4.25e+08 8.30%
max_commit_ms 132.20 135 9.83 2.12%
sys_cpu 39.03 39.36 4.34 0.86%
write_bw_bytes 5.46e+08 5.45e+08 58484454.42 -0.16%
write_clat_ns_mean 12794.53 12769.69 3097.83 -0.19%
write_clat_ns_p50 11315.20 11264 3334.89 -0.45%
write_clat_ns_p99 27545.60 27494.40 5067.25 -0.19%
write_io_kbytes 2.04e+08 2.04e+08 0 0.00%
write_iops 133271.81 133057.37 14278.43 -0.16%
write_lat_ns_max 34433966 34705517.60 4302608.84 0.79%
write_lat_ns_mean 12881.08 12859.97 3106.37 -0.16%
write_lat_ns_min 5946 5814.40 204.95 -2.21%
```
Sent using hkml (https://github.com/sjp38/hackermail)
next prev parent reply other threads:[~2025-07-23 17:44 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-09 22:04 [PATCH] btrfs: implement ref_tracker for delayed_nodes Leo Martins
2025-07-10 11:54 ` Filipe Manana
2025-07-11 10:38 ` David Sterba
2025-07-23 17:44 ` Leo Martins [this message]
2025-07-23 18:04 ` Filipe Manana
2025-07-10 23:49 ` kernel test robot
2025-07-11 3:42 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250723174434.3484810-1-loemra.dev@gmail.com \
--to=loemra.dev@gmail.com \
--cc=dsterba@suse.cz \
--cc=fdmanana@kernel.org \
--cc=jlayton@kernel.org \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.