public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Leo Martins <loemra.dev@gmail.com>
To: David Sterba <dsterba@suse.cz>
Cc: Filipe Manana <fdmanana@kernel.org>,
	linux-btrfs@vger.kernel.org, kernel-team@fb.com,
	jlayton@kernel.org
Subject: Re: [PATCH] btrfs: implement ref_tracker for delayed_nodes
Date: Wed, 23 Jul 2025 10:44:26 -0700	[thread overview]
Message-ID: <20250723174434.3484810-1-loemra.dev@gmail.com> (raw)
In-Reply-To: <20250711103853.GC22472@twin.jikos.cz>

On Fri, 11 Jul 2025 12:38:53 +0200 David Sterba <dsterba@suse.cz> wrote:

> On Thu, Jul 10, 2025 at 12:54:34PM +0100, Filipe Manana wrote:
> > > @@ -63,10 +109,18 @@ struct btrfs_delayed_node {
> > >         struct rb_root_cached del_root;
> > >         struct mutex mutex;
> > >         struct btrfs_inode_item inode_item;
> > > +
> > >         refcount_t refs;
> > > -       int count;
> > > +       /* Used to track all references to this delayed node. */
> > > +       btrfs_delayed_node_ref_tracker_dir ref_dir;
> > > +       /* Used to track delayed node reference stored in node list. */
> > > +       btrfs_delayed_node_ref_tracker node_list_tracker;
> > > +       /* Used to track delayed node reference stored in inode cache. */
> > > +       btrfs_delayed_node_ref_tracker inode_cache_tracker;
> > > +
> > >         u64 index_cnt;
> > >         unsigned long flags;
> > > +       int count;
> > 
> > Why are you moving the count field?
> > This increases the size of the struct even without the new config
> > option enabled and as a result we get less objects per page.
> > You are essentially reverting this commit:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a20725e1e7014642fc8ba4c7dd2c4ef6d4ec56a9
> > 
> > Without any explanation about why.

Good catch, did not realize that moving the count field increased
struct size.

> > 
> > I'm also not a big fan of the typedefs and would prefer to have
> > struct ref_tracker used directly in the structure surrounded by an
> > #ifdef block.
> 
> Agreed, we use typedefs only for function callbacks, for variable types
> not anymore.

When I try and not use typedefs I end up using more conditional compilation
which I think hinders readability. Here's a snippet of my best attempt, let
me know if you see any improvements.

```C
struct btrfs_delayed_node {
	...
#ifdef CONFIG_BTRFS_DEBUG
	struct ref_tracker_dir ref_dir;
	struct ref_tracker *node_list_tracker;
	struct ref_tracker *inode_cache_tracker;
#endif
	...
}

#ifdef
static inline void
btrfs_release_delayed_node(struct btrfs_delayed_node *node,
			   struct ref_tracker **tracker)
{
	__btrfs_release_delayed_node(node, 0, tracker);
}

void btrfs_remove_delayed_node(struct btrfs_inode *inode)
{
	struct btrfs_delayed_node *delayed_node;

	delayed_node = READ_ONCE(inode->delayed_node);
	if (!delayed_node)
		return;

	inode->delayed_node = NULL;
#ifdef CONFIG_BTRFS_DEBUG
	btrfs_release_delayed_node(delayed_node,
				   &delayed_node->inode_cache_tracker);
#else
	btrfs_release_delayed_node(delayed_node, NULL);
#endif
}
```

With this approach I end up needing to add conditional compilation
anywhere I use node_list_tracker or inode_cache_tracker. This isn't
too big of an issue since their usage is relatively limited.
However, I believe this would be an appropriate place for typedefs.

Alternatively, I could have two versions of btrfs_release_delayed_node
one that takes a struct ref_tracker ** and one that takes a struct {}.
However, this would mean doing the same for all the functions that
take a tracker, btrfs_get_delayed_node, btrfs_get_or_create_delayed_node,
... But, I feel like that is the same as manually unwinding the typedef.

Any guidance would be greatly appreciated! Also curious about the aversion
to typedefs outside of function callbacks?

> 
> > I also don't like adding a new Kconfig option just for this... How
> > much slower would a build with CONFIG_BTRFS_DEBUG=y be?
> > 
> > If that's really a lot slower, then perhaps a more generic config
> > option name in case we add similar tracking to other data structures
> > one day.

I gathered some data here using fsperf and got some mixed results, most
tests performed slightly worse, and some got much worse
(dbench60, emptyfiles500k). I think it would be acceptible to keep in
CONFIG_BTRFS_DEBUG, but I think it could definitely benefit from
its own option, CONFIG_BTRFS_REF_TRACKER.

> 
> Right, we want to keep the config options at minimum and for debugging
> features it should be fine under BTRFS_DEBUG. If this needs to be
> configured for performance reasons then it can be a mount option, same
> as we have ref-verify or fragment already.

If we wanted to allow turning ref_tracker on via mount option we would
not be able to compile out the ref_tracker objects in delayed_node which
would cause a change in size from 304 bytes to 400 bytes.

baseline represents CONFIG_BTRFS_DEBUG=y without ref_tracker
current represents CONFIG_BTRFS_DEBUG=y with ref_tracker
** means 2*stdev worse
```
[root@localhost fsperf]# ./fsperf -p "baseline" -n 5
...
<enable ref_tracker>
[root@localhost fsperf]# ./fsperf -p "tracker" -n 5
...
[root@localhost fsperf]# ./fsperf-compare "baseline" "tracker"
['btrfs']
btrfs test results
bufferedappenddatasync results
      metric           baseline      current       stdev            diff
==============================================================================
avg_commit_ms               1.91          1.85         0.13    -3.23%
bg_count                    4.60          4.40         0.55    -4.35%
commits                       20         21.20         2.12     6.00%
dev_read_iops          262187.60     262190.40         8.32     0.00%
dev_read_kbytes          1020.80          1040        42.93     1.88%
dev_write_iops        3129696.80       3126160      2909.20    -0.11%
dev_write_kbytes        43783448   43726868.80     46497.53    -0.13%
elapsed                   301.60        300.60         9.29    -0.33%
end_state_mount_ns      46733159      44610208   3503084.15    -4.54%
end_state_umount_ns     2.77e+09      2.58e+09     2.17e+08    -6.65%
max_commit_ms               2.80          2.40         0.84   -14.29%
sys_cpu                    61.80         61.69         0.54    -0.17%
write_bw_bytes           3572197    3584941.20    106296.35     0.36%
write_clat_ns_mean      29694.09      29299.92      4510.28    -1.33%
write_clat_ns_p50       28262.40      27955.20      4380.52    -1.09%
write_clat_ns_p99       48793.60      48281.60      7226.28    -1.05%
write_io_kbytes          1048576       1048576            0     0.00%
write_iops                872.12        875.23        25.95     0.36%
write_lat_ns_max       764860.40        634311    354679.35   -17.07%
write_lat_ns_mean       29981.52      29539.17      4518.22    -1.48%
write_lat_ns_min        21404.80      21060.40       315.78    -1.61%

bufferedrandwrite16g results
      metric           baseline       current       stdev            diff
===============================================================================
avg_commit_ms              794.59        773.06        27.11    -2.71%
commits                     83.20         79.80         2.05    -4.09%
dev_read_iops              658.60        521.20       335.31   -20.86%
dev_read_kbytes           9478.40       7334.40      5341.58   -22.62%
dev_write_iops         5801247.80    5721033.20     53643.07    -1.38%
dev_write_kbytes      44303422.40   43038694.40    853851.59    -2.85%
elapsed                       360           347        21.94    -3.61%
end_state_mount_ns    58480927.20   59131105.80    873330.39     1.11%
end_state_umount_ns      9.10e+09      8.71e+09     3.41e+08    -4.33%
max_commit_ms                1624       1630.60        53.77     0.41%
sys_cpu                     35.52         35.88         1.25     1.00%
write_bw_bytes        47983181.20      49673467   2727170.05     3.52%
write_clat_ns_mean       83559.03      80805.49      5123.88    -3.30%
write_clat_ns_p50        27622.40      27417.60       879.39    -0.74%
write_clat_ns_p99       152166.40     171417.60      9864.47    12.65%
write_io_kbytes          16777216      16777216            0     0.00%
write_iops               11714.64      12127.31       665.81     3.52%
write_lat_ns_max         2.93e+09      2.21e+09     1.41e+09   -24.81%
write_lat_ns_mean        83674.85      80915.46      5128.53    -3.30%
write_lat_ns_min         16718.40      16552.40       455.01    -0.99%

dbench60 results
      metric           baseline     current       stdev            diff
=============================================================================
avg_commit_ms               31.72      33.63          1.24     6.04%
bg_count                        4          4             0     0.00%
close                        4.32       6.06          0.54    40.36%**
commits                        24         24             0     0.00%
deltree                    526.48     610.32         31.73    15.92%**
dev_read_iops            75597.20      65111       1808.47   -13.87%**
dev_read_kbytes              1040       1040             0     0.00%
dev_write_iops         1545476.80    1347859      52326.80   -12.79%**
dev_write_kbytes      18026305.60   15533764     691195.60   -13.83%**
end_state_mount_ns    48171843.60   45794087    2671959.83    -4.94%
end_state_umount_ns      1.85e+09   1.85e+09   41447992.49    -0.14%
find                        29.76      41.82          2.56    40.52%**
flush                      122.33     111.94         19.45    -8.49%
lockx                        0.95       1.21          0.45    27.55%
max_commit_ms               38.40      45.40          2.30    18.23%**
mkdir                        0.08       0.18          0.04   129.35%**
ntcreatex                   35.41      48.84          3.60    37.95%**
qfileinfo                    3.52       3.31          1.12    -6.11%
qfsinfo                      4.25       4.46          0.33     4.95%
qpathinfo                   25.30      32.56          3.72    28.68%
readx                        7.56       7.97          1.92     5.49%
rename                      50.33      50.86         12.36     1.06%
sfileinfo                    9.17       8.60          1.67    -6.22%
throughput                 315.59     282.74         10.32   -10.41%**
unlink                      44.87      54.50          6.19    21.44%
unlockx                      0.75       2.19          0.37   191.39%**
writex                      98.36      92.78         17.51    -5.68%

dio4kbs16threads results
      metric           baseline       current        stdev            diff
================================================================================
avg_commit_ms             2339.20       1071.20        975.30   -54.21%
bg_count                       20            20             0     0.00%
commits                         1             1             0     0.00%
dev_read_iops                  48            48             0     0.00%
dev_read_kbytes              1040          1040             0     0.00%
dev_write_iops         1105312.80       1070254      11754.77    -3.17%**
dev_write_kbytes       4919395.20    4815462.40      57354.39    -2.11%
elapsed                        61            61             0     0.00%
end_state_mount_ns    50869016.60   50826535.40    1496475.65    -0.08%
end_state_umount_ns      3.02e+09      2.99e+09   43691601.05    -0.97%
frag_pct_max                98.38         98.43          0.55     0.05%
frag_pct_mean               51.91         51.61          0.27    -0.59%
frag_pct_min                 5.45          4.79          0.29   -12.12%
frag_pct_p50                51.91         51.61          0.27    -0.59%
frag_pct_p95                98.38         98.43          0.55     0.05%
frag_pct_p99                98.38         98.43          0.55     0.05%
fragmented_bg_count             2             2             0     0.00%
max_commit_ms             2339.20       1071.20        975.30   -54.21%
sys_cpu                     18.11         17.24          0.06    -4.78%
write_bw_bytes           72547808   69915896.40     851270.12    -3.63%**
write_clat_ns_mean      900096.95     933821.96      10417.71     3.75%**
write_clat_ns_p50       400179.20     394444.80       4486.94    -1.43%
write_clat_ns_p99      2657484.80       2867200      37361.27     7.89%**
write_io_kbytes        4255195.20    4102859.20      48123.74    -3.58%**
write_iops               17711.87      17069.31        207.83    -3.63%**
write_lat_ns_max         9.19e+08      9.59e+08      1.02e+08     4.37%
write_lat_ns_mean       900449.84     934178.51      10411.01     3.75%**
write_lat_ns_min        148662.20     149014.40       3485.37     0.24%

diorandread results
      metric           baseline       current        stdev            diff
================================================================================
avg_commit_ms                5.31          5.49          0.08     3.35%**
bg_count                       18            18             0     0.00%
commits                        18            18             0     0.00%
dev_read_iops          2755803.60    2722414.60      23232.33    -1.21%
dev_read_kbytes       11022186.40   10888630.40      92929.32    -1.21%
dev_write_iops           41688.20      42010.80        312.31     0.77%
dev_write_kbytes         16897548      16897548             0     0.00%
elapsed                        61            61             0     0.00%
end_state_mount_ns    47309004.80   49367506.20    1371322.67     4.35%
end_state_umount_ns      1.07e+09      1.07e+09    6066020.04    -0.10%
max_commit_ms                6.60          6.60          0.55     0.00%
read_bw_bytes            1.88e+08      1.86e+08    1587441.01    -1.21%
read_clat_ns_mean       345830.30     350152.25       2831.51     1.25%
read_clat_ns_p50        327270.40     331366.40       2243.47     1.25%
read_clat_ns_p99        647987.20     656179.20       9340.32     1.26%
read_io_bytes            1.13e+10      1.11e+10   95159625.71    -1.21%
read_io_kbytes        11022186.40   10888630.40      92929.32    -1.21%
read_iops                45924.71      45368.09        387.56    -1.21%
read_lat_ns_max           4530461   13372441.20    2563283.41   195.17%**
read_lat_ns_mean        346044.45     350359.97       2846.68     1.25%
read_lat_ns_min         138696.20        134640       4341.30    -2.92%
sys_cpu                     24.22         24.16          0.21    -0.23%

emptyfiles500k results
      metric           baseline       current        stdev           diff
===============================================================================
avg_commit_ms               26.60         24.20          5.13   -9.02%
bg_count                        3             3             0    0.00%
commits                         1             1             0    0.00%
dev_read_iops                  48            48             0    0.00%
dev_read_kbytes              1040          1040             0    0.00%
dev_write_iops             842.80           842         15.27   -0.09%
dev_write_kbytes         13460.80         13448        244.33   -0.10%
elapsed                     42.20         42.20          0.84    0.00%
end_state_mount_ns    48019710.20   47157285.80    2091327.38   -1.80%
end_state_umount_ns      1.46e+09      1.46e+09   26640049.48    0.04%
max_commit_ms               26.60         24.20          5.13   -9.02%
sys_cpu                      5.47          7.76          0.18   41.97%**
write_bw_bytes         3077613.60    3073469.80      66469.32   -0.13%
write_clat_ns_mean       67202.18      98477.96       1091.86   46.54%**
write_clat_ns_p50        63027.20      93900.80       1061.71   48.98%**
write_clat_ns_p99       135270.40     169779.20       4165.81   25.51%**
write_io_kbytes            125000        125000             0    0.00%
write_iops                 751.37        750.36         16.23   -0.13%

randwrite2xram results
      metric           baseline       current       stdev           diff
==============================================================================
avg_commit_ms              872.44        847.24        13.74   -2.89%
commits                        60         71.40         4.85   19.00%**
dev_read_iops              630.40       1092.60       240.86   73.32%
dev_read_kbytes           9414.40      16627.20      3800.10   76.61%
dev_write_iops         5119929.40    5187028.80    123210.86    1.31%
dev_write_kbytes      37392001.60   39529271.20   1840547.09    5.72%
elapsed                    312.80        312.20         0.84   -0.19%
end_state_mount_ns    58923162.40   58027381.60   2021748.73   -1.52%
end_state_umount_ns      1.06e+10      1.07e+10     5.74e+08    0.87%
max_commit_ms             1608.60       1549.20       116.36   -3.69%
sys_cpu                      9.83          9.63         0.10   -2.02%
write_bw_bytes        48972662.80   47887587.20    400536.11   -2.22%**
write_clat_ns_mean      331763.74     339356.86      2837.07    2.29%**
write_clat_ns_p50        29721.60      30028.80       642.55    1.03%
write_clat_ns_p99       297164.80        329728     34486.16   10.96%
write_io_kbytes       14378941.60   14054288.80    129599.67   -2.26%**
write_iops               11956.22      11691.31        97.79   -2.22%**
write_lat_ns_max         2.24e+09      2.23e+09     1.17e+08   -0.41%
write_lat_ns_mean       331873.49     339472.07      2838.23    2.29%**
write_lat_ns_min         18096.40      17637.40       772.99   -2.54%

smallfiles100k results
      metric           baseline       current        stdev           diff
===============================================================================
avg_commit_ms               71.00         76.99          3.72    8.44%
bg_count                       19            19             0    0.00%
commits                     12.40         12.60          1.52    1.61%
dev_read_iops                 111        111.80          9.82    0.72%
dev_read_kbytes           1865.60       1875.20        166.89    0.51%
dev_write_iops         2183867.80    2183585.60      12589.22   -0.01%
dev_write_kbytes      21161021.60   21168306.40      71548.04    0.03%
elapsed                    390.40        391.40         47.79    0.26%
end_state_mount_ns    42532681.60   42288602.40    1686619.50   -0.57%
end_state_umount_ns      7.44e+09      8.06e+09      4.25e+08    8.30%
max_commit_ms              132.20           135          9.83    2.12%
sys_cpu                     39.03         39.36          4.34    0.86%
write_bw_bytes           5.46e+08      5.45e+08   58484454.42   -0.16%
write_clat_ns_mean       12794.53      12769.69       3097.83   -0.19%
write_clat_ns_p50        11315.20         11264       3334.89   -0.45%
write_clat_ns_p99        27545.60      27494.40       5067.25   -0.19%
write_io_kbytes          2.04e+08      2.04e+08             0    0.00%
write_iops              133271.81     133057.37      14278.43   -0.16%
write_lat_ns_max         34433966   34705517.60    4302608.84    0.79%
write_lat_ns_mean        12881.08      12859.97       3106.37   -0.16%
write_lat_ns_min             5946       5814.40        204.95   -2.21%
```

Sent using hkml (https://github.com/sjp38/hackermail)

  reply	other threads:[~2025-07-23 17:44 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-09 22:04 [PATCH] btrfs: implement ref_tracker for delayed_nodes Leo Martins
2025-07-10 11:54 ` Filipe Manana
2025-07-11 10:38   ` David Sterba
2025-07-23 17:44     ` Leo Martins [this message]
2025-07-23 18:04       ` Filipe Manana
2025-07-10 23:49 ` kernel test robot
2025-07-11  3:42 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250723174434.3484810-1-loemra.dev@gmail.com \
    --to=loemra.dev@gmail.com \
    --cc=dsterba@suse.cz \
    --cc=fdmanana@kernel.org \
    --cc=jlayton@kernel.org \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox