From: Leo Martins <loemra.dev@gmail.com>
To: David Sterba <dsterba@suse.cz>
Cc: Filipe Manana <fdmanana@kernel.org>,
linux-btrfs@vger.kernel.org, kernel-team@fb.com,
jlayton@kernel.org
Subject: Re: [PATCH] btrfs: implement ref_tracker for delayed_nodes
Date: Wed, 23 Jul 2025 10:44:26 -0700 [thread overview]
Message-ID: <20250723174434.3484810-1-loemra.dev@gmail.com> (raw)
In-Reply-To: <20250711103853.GC22472@twin.jikos.cz>
On Fri, 11 Jul 2025 12:38:53 +0200 David Sterba <dsterba@suse.cz> wrote:
> On Thu, Jul 10, 2025 at 12:54:34PM +0100, Filipe Manana wrote:
> > > @@ -63,10 +109,18 @@ struct btrfs_delayed_node {
> > > struct rb_root_cached del_root;
> > > struct mutex mutex;
> > > struct btrfs_inode_item inode_item;
> > > +
> > > refcount_t refs;
> > > - int count;
> > > + /* Used to track all references to this delayed node. */
> > > + btrfs_delayed_node_ref_tracker_dir ref_dir;
> > > + /* Used to track delayed node reference stored in node list. */
> > > + btrfs_delayed_node_ref_tracker node_list_tracker;
> > > + /* Used to track delayed node reference stored in inode cache. */
> > > + btrfs_delayed_node_ref_tracker inode_cache_tracker;
> > > +
> > > u64 index_cnt;
> > > unsigned long flags;
> > > + int count;
> >
> > Why are you moving the count field?
> > This increases the size of the struct even without the new config
> > option enabled and as a result we get less objects per page.
> > You are essentially reverting this commit:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a20725e1e7014642fc8ba4c7dd2c4ef6d4ec56a9
> >
> > Without any explanation about why.
Good catch, did not realize that moving the count field increased
struct size.
> >
> > I'm also not a big fan of the typedefs and would prefer to have
> > struct ref_tracker used directly in the structure surrounded by an
> > #ifdef block.
>
> Agreed, we use typedefs only for function callbacks, for variable types
> not anymore.
When I try and not use typedefs I end up using more conditional compilation
which I think hinders readability. Here's a snippet of my best attempt, let
me know if you see any improvements.
```C
struct btrfs_delayed_node {
...
#ifdef CONFIG_BTRFS_DEBUG
struct ref_tracker_dir ref_dir;
struct ref_tracker *node_list_tracker;
struct ref_tracker *inode_cache_tracker;
#endif
...
}
#ifdef
static inline void
btrfs_release_delayed_node(struct btrfs_delayed_node *node,
struct ref_tracker **tracker)
{
__btrfs_release_delayed_node(node, 0, tracker);
}
void btrfs_remove_delayed_node(struct btrfs_inode *inode)
{
struct btrfs_delayed_node *delayed_node;
delayed_node = READ_ONCE(inode->delayed_node);
if (!delayed_node)
return;
inode->delayed_node = NULL;
#ifdef CONFIG_BTRFS_DEBUG
btrfs_release_delayed_node(delayed_node,
&delayed_node->inode_cache_tracker);
#else
btrfs_release_delayed_node(delayed_node, NULL);
#endif
}
```
With this approach I end up needing to add conditional compilation
anywhere I use node_list_tracker or inode_cache_tracker. This isn't
too big of an issue since their usage is relatively limited.
However, I believe this would be an appropriate place for typedefs.
Alternatively, I could have two versions of btrfs_release_delayed_node
one that takes a struct ref_tracker ** and one that takes a struct {}.
However, this would mean doing the same for all the functions that
take a tracker, btrfs_get_delayed_node, btrfs_get_or_create_delayed_node,
... But, I feel like that is the same as manually unwinding the typedef.
Any guidance would be greatly appreciated! Also curious about the aversion
to typedefs outside of function callbacks?
>
> > I also don't like adding a new Kconfig option just for this... How
> > much slower would a build with CONFIG_BTRFS_DEBUG=y be?
> >
> > If that's really a lot slower, then perhaps a more generic config
> > option name in case we add similar tracking to other data structures
> > one day.
I gathered some data here using fsperf and got some mixed results, most
tests performed slightly worse, and some got much worse
(dbench60, emptyfiles500k). I think it would be acceptible to keep in
CONFIG_BTRFS_DEBUG, but I think it could definitely benefit from
its own option, CONFIG_BTRFS_REF_TRACKER.
>
> Right, we want to keep the config options at minimum and for debugging
> features it should be fine under BTRFS_DEBUG. If this needs to be
> configured for performance reasons then it can be a mount option, same
> as we have ref-verify or fragment already.
If we wanted to allow turning ref_tracker on via mount option we would
not be able to compile out the ref_tracker objects in delayed_node which
would cause a change in size from 304 bytes to 400 bytes.
baseline represents CONFIG_BTRFS_DEBUG=y without ref_tracker
current represents CONFIG_BTRFS_DEBUG=y with ref_tracker
** means 2*stdev worse
```
[root@localhost fsperf]# ./fsperf -p "baseline" -n 5
...
<enable ref_tracker>
[root@localhost fsperf]# ./fsperf -p "tracker" -n 5
...
[root@localhost fsperf]# ./fsperf-compare "baseline" "tracker"
['btrfs']
btrfs test results
bufferedappenddatasync results
metric baseline current stdev diff
==============================================================================
avg_commit_ms 1.91 1.85 0.13 -3.23%
bg_count 4.60 4.40 0.55 -4.35%
commits 20 21.20 2.12 6.00%
dev_read_iops 262187.60 262190.40 8.32 0.00%
dev_read_kbytes 1020.80 1040 42.93 1.88%
dev_write_iops 3129696.80 3126160 2909.20 -0.11%
dev_write_kbytes 43783448 43726868.80 46497.53 -0.13%
elapsed 301.60 300.60 9.29 -0.33%
end_state_mount_ns 46733159 44610208 3503084.15 -4.54%
end_state_umount_ns 2.77e+09 2.58e+09 2.17e+08 -6.65%
max_commit_ms 2.80 2.40 0.84 -14.29%
sys_cpu 61.80 61.69 0.54 -0.17%
write_bw_bytes 3572197 3584941.20 106296.35 0.36%
write_clat_ns_mean 29694.09 29299.92 4510.28 -1.33%
write_clat_ns_p50 28262.40 27955.20 4380.52 -1.09%
write_clat_ns_p99 48793.60 48281.60 7226.28 -1.05%
write_io_kbytes 1048576 1048576 0 0.00%
write_iops 872.12 875.23 25.95 0.36%
write_lat_ns_max 764860.40 634311 354679.35 -17.07%
write_lat_ns_mean 29981.52 29539.17 4518.22 -1.48%
write_lat_ns_min 21404.80 21060.40 315.78 -1.61%
bufferedrandwrite16g results
metric baseline current stdev diff
===============================================================================
avg_commit_ms 794.59 773.06 27.11 -2.71%
commits 83.20 79.80 2.05 -4.09%
dev_read_iops 658.60 521.20 335.31 -20.86%
dev_read_kbytes 9478.40 7334.40 5341.58 -22.62%
dev_write_iops 5801247.80 5721033.20 53643.07 -1.38%
dev_write_kbytes 44303422.40 43038694.40 853851.59 -2.85%
elapsed 360 347 21.94 -3.61%
end_state_mount_ns 58480927.20 59131105.80 873330.39 1.11%
end_state_umount_ns 9.10e+09 8.71e+09 3.41e+08 -4.33%
max_commit_ms 1624 1630.60 53.77 0.41%
sys_cpu 35.52 35.88 1.25 1.00%
write_bw_bytes 47983181.20 49673467 2727170.05 3.52%
write_clat_ns_mean 83559.03 80805.49 5123.88 -3.30%
write_clat_ns_p50 27622.40 27417.60 879.39 -0.74%
write_clat_ns_p99 152166.40 171417.60 9864.47 12.65%
write_io_kbytes 16777216 16777216 0 0.00%
write_iops 11714.64 12127.31 665.81 3.52%
write_lat_ns_max 2.93e+09 2.21e+09 1.41e+09 -24.81%
write_lat_ns_mean 83674.85 80915.46 5128.53 -3.30%
write_lat_ns_min 16718.40 16552.40 455.01 -0.99%
dbench60 results
metric baseline current stdev diff
=============================================================================
avg_commit_ms 31.72 33.63 1.24 6.04%
bg_count 4 4 0 0.00%
close 4.32 6.06 0.54 40.36%**
commits 24 24 0 0.00%
deltree 526.48 610.32 31.73 15.92%**
dev_read_iops 75597.20 65111 1808.47 -13.87%**
dev_read_kbytes 1040 1040 0 0.00%
dev_write_iops 1545476.80 1347859 52326.80 -12.79%**
dev_write_kbytes 18026305.60 15533764 691195.60 -13.83%**
end_state_mount_ns 48171843.60 45794087 2671959.83 -4.94%
end_state_umount_ns 1.85e+09 1.85e+09 41447992.49 -0.14%
find 29.76 41.82 2.56 40.52%**
flush 122.33 111.94 19.45 -8.49%
lockx 0.95 1.21 0.45 27.55%
max_commit_ms 38.40 45.40 2.30 18.23%**
mkdir 0.08 0.18 0.04 129.35%**
ntcreatex 35.41 48.84 3.60 37.95%**
qfileinfo 3.52 3.31 1.12 -6.11%
qfsinfo 4.25 4.46 0.33 4.95%
qpathinfo 25.30 32.56 3.72 28.68%
readx 7.56 7.97 1.92 5.49%
rename 50.33 50.86 12.36 1.06%
sfileinfo 9.17 8.60 1.67 -6.22%
throughput 315.59 282.74 10.32 -10.41%**
unlink 44.87 54.50 6.19 21.44%
unlockx 0.75 2.19 0.37 191.39%**
writex 98.36 92.78 17.51 -5.68%
dio4kbs16threads results
metric baseline current stdev diff
================================================================================
avg_commit_ms 2339.20 1071.20 975.30 -54.21%
bg_count 20 20 0 0.00%
commits 1 1 0 0.00%
dev_read_iops 48 48 0 0.00%
dev_read_kbytes 1040 1040 0 0.00%
dev_write_iops 1105312.80 1070254 11754.77 -3.17%**
dev_write_kbytes 4919395.20 4815462.40 57354.39 -2.11%
elapsed 61 61 0 0.00%
end_state_mount_ns 50869016.60 50826535.40 1496475.65 -0.08%
end_state_umount_ns 3.02e+09 2.99e+09 43691601.05 -0.97%
frag_pct_max 98.38 98.43 0.55 0.05%
frag_pct_mean 51.91 51.61 0.27 -0.59%
frag_pct_min 5.45 4.79 0.29 -12.12%
frag_pct_p50 51.91 51.61 0.27 -0.59%
frag_pct_p95 98.38 98.43 0.55 0.05%
frag_pct_p99 98.38 98.43 0.55 0.05%
fragmented_bg_count 2 2 0 0.00%
max_commit_ms 2339.20 1071.20 975.30 -54.21%
sys_cpu 18.11 17.24 0.06 -4.78%
write_bw_bytes 72547808 69915896.40 851270.12 -3.63%**
write_clat_ns_mean 900096.95 933821.96 10417.71 3.75%**
write_clat_ns_p50 400179.20 394444.80 4486.94 -1.43%
write_clat_ns_p99 2657484.80 2867200 37361.27 7.89%**
write_io_kbytes 4255195.20 4102859.20 48123.74 -3.58%**
write_iops 17711.87 17069.31 207.83 -3.63%**
write_lat_ns_max 9.19e+08 9.59e+08 1.02e+08 4.37%
write_lat_ns_mean 900449.84 934178.51 10411.01 3.75%**
write_lat_ns_min 148662.20 149014.40 3485.37 0.24%
diorandread results
metric baseline current stdev diff
================================================================================
avg_commit_ms 5.31 5.49 0.08 3.35%**
bg_count 18 18 0 0.00%
commits 18 18 0 0.00%
dev_read_iops 2755803.60 2722414.60 23232.33 -1.21%
dev_read_kbytes 11022186.40 10888630.40 92929.32 -1.21%
dev_write_iops 41688.20 42010.80 312.31 0.77%
dev_write_kbytes 16897548 16897548 0 0.00%
elapsed 61 61 0 0.00%
end_state_mount_ns 47309004.80 49367506.20 1371322.67 4.35%
end_state_umount_ns 1.07e+09 1.07e+09 6066020.04 -0.10%
max_commit_ms 6.60 6.60 0.55 0.00%
read_bw_bytes 1.88e+08 1.86e+08 1587441.01 -1.21%
read_clat_ns_mean 345830.30 350152.25 2831.51 1.25%
read_clat_ns_p50 327270.40 331366.40 2243.47 1.25%
read_clat_ns_p99 647987.20 656179.20 9340.32 1.26%
read_io_bytes 1.13e+10 1.11e+10 95159625.71 -1.21%
read_io_kbytes 11022186.40 10888630.40 92929.32 -1.21%
read_iops 45924.71 45368.09 387.56 -1.21%
read_lat_ns_max 4530461 13372441.20 2563283.41 195.17%**
read_lat_ns_mean 346044.45 350359.97 2846.68 1.25%
read_lat_ns_min 138696.20 134640 4341.30 -2.92%
sys_cpu 24.22 24.16 0.21 -0.23%
emptyfiles500k results
metric baseline current stdev diff
===============================================================================
avg_commit_ms 26.60 24.20 5.13 -9.02%
bg_count 3 3 0 0.00%
commits 1 1 0 0.00%
dev_read_iops 48 48 0 0.00%
dev_read_kbytes 1040 1040 0 0.00%
dev_write_iops 842.80 842 15.27 -0.09%
dev_write_kbytes 13460.80 13448 244.33 -0.10%
elapsed 42.20 42.20 0.84 0.00%
end_state_mount_ns 48019710.20 47157285.80 2091327.38 -1.80%
end_state_umount_ns 1.46e+09 1.46e+09 26640049.48 0.04%
max_commit_ms 26.60 24.20 5.13 -9.02%
sys_cpu 5.47 7.76 0.18 41.97%**
write_bw_bytes 3077613.60 3073469.80 66469.32 -0.13%
write_clat_ns_mean 67202.18 98477.96 1091.86 46.54%**
write_clat_ns_p50 63027.20 93900.80 1061.71 48.98%**
write_clat_ns_p99 135270.40 169779.20 4165.81 25.51%**
write_io_kbytes 125000 125000 0 0.00%
write_iops 751.37 750.36 16.23 -0.13%
randwrite2xram results
metric baseline current stdev diff
==============================================================================
avg_commit_ms 872.44 847.24 13.74 -2.89%
commits 60 71.40 4.85 19.00%**
dev_read_iops 630.40 1092.60 240.86 73.32%
dev_read_kbytes 9414.40 16627.20 3800.10 76.61%
dev_write_iops 5119929.40 5187028.80 123210.86 1.31%
dev_write_kbytes 37392001.60 39529271.20 1840547.09 5.72%
elapsed 312.80 312.20 0.84 -0.19%
end_state_mount_ns 58923162.40 58027381.60 2021748.73 -1.52%
end_state_umount_ns 1.06e+10 1.07e+10 5.74e+08 0.87%
max_commit_ms 1608.60 1549.20 116.36 -3.69%
sys_cpu 9.83 9.63 0.10 -2.02%
write_bw_bytes 48972662.80 47887587.20 400536.11 -2.22%**
write_clat_ns_mean 331763.74 339356.86 2837.07 2.29%**
write_clat_ns_p50 29721.60 30028.80 642.55 1.03%
write_clat_ns_p99 297164.80 329728 34486.16 10.96%
write_io_kbytes 14378941.60 14054288.80 129599.67 -2.26%**
write_iops 11956.22 11691.31 97.79 -2.22%**
write_lat_ns_max 2.24e+09 2.23e+09 1.17e+08 -0.41%
write_lat_ns_mean 331873.49 339472.07 2838.23 2.29%**
write_lat_ns_min 18096.40 17637.40 772.99 -2.54%
smallfiles100k results
metric baseline current stdev diff
===============================================================================
avg_commit_ms 71.00 76.99 3.72 8.44%
bg_count 19 19 0 0.00%
commits 12.40 12.60 1.52 1.61%
dev_read_iops 111 111.80 9.82 0.72%
dev_read_kbytes 1865.60 1875.20 166.89 0.51%
dev_write_iops 2183867.80 2183585.60 12589.22 -0.01%
dev_write_kbytes 21161021.60 21168306.40 71548.04 0.03%
elapsed 390.40 391.40 47.79 0.26%
end_state_mount_ns 42532681.60 42288602.40 1686619.50 -0.57%
end_state_umount_ns 7.44e+09 8.06e+09 4.25e+08 8.30%
max_commit_ms 132.20 135 9.83 2.12%
sys_cpu 39.03 39.36 4.34 0.86%
write_bw_bytes 5.46e+08 5.45e+08 58484454.42 -0.16%
write_clat_ns_mean 12794.53 12769.69 3097.83 -0.19%
write_clat_ns_p50 11315.20 11264 3334.89 -0.45%
write_clat_ns_p99 27545.60 27494.40 5067.25 -0.19%
write_io_kbytes 2.04e+08 2.04e+08 0 0.00%
write_iops 133271.81 133057.37 14278.43 -0.16%
write_lat_ns_max 34433966 34705517.60 4302608.84 0.79%
write_lat_ns_mean 12881.08 12859.97 3106.37 -0.16%
write_lat_ns_min 5946 5814.40 204.95 -2.21%
```
Sent using hkml (https://github.com/sjp38/hackermail)
next prev parent reply other threads:[~2025-07-23 17:44 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-09 22:04 [PATCH] btrfs: implement ref_tracker for delayed_nodes Leo Martins
2025-07-10 11:54 ` Filipe Manana
2025-07-11 10:38 ` David Sterba
2025-07-23 17:44 ` Leo Martins [this message]
2025-07-23 18:04 ` Filipe Manana
2025-07-10 23:49 ` kernel test robot
2025-07-11 3:42 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250723174434.3484810-1-loemra.dev@gmail.com \
--to=loemra.dev@gmail.com \
--cc=dsterba@suse.cz \
--cc=fdmanana@kernel.org \
--cc=jlayton@kernel.org \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox