From: Josef Bacik <josef@toxicpanda.com>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com,
linux-fsdevel@vger.kernel.org
Subject: [PATCH v3 0/9] ENOSPC delalloc flushing fixes
Date: Wed, 14 Jul 2021 14:47:16 -0400
Message-ID: <cover.1626288241.git.josef@toxicpanda.com>
v2->v3:
- Reordered the patches to have "btrfs: wake up async_delalloc_pages waiters
  after submit" come first, as it's a general fix that we need.
- Fixed the nit that Nikolay pointed out in "btrfs: handle shrink_delalloc
  pages calculation differently".
- Added "btrfs: include delalloc related info in dump space info tracepoint"
  to include the delalloc_bytes and ordered_bytes in the dump_space_info
  tracepoint.
v1->v2:
- Two extra patches to remove the last user of sync_inode() (9p) and then
  remove sync_inode() itself, as per hch's request
--- Original email ---
Hello,
I've been debugging and fixing a problem we hit in production where we got
ENOSPC when we still should have had space to use. A variety of things were
going wrong, but one of them was that sometimes we wouldn't wait for all of
the delalloc to be flushed. This series of patches fixes a few problems in
this area, starting with
btrfs: handle shrink_delalloc pages calculation differently
When we switched to writing pages instead of full inodes for flushing, we
didn't adjust the counters to give us a page count, and instead kept using
the "items" amount. This is incorrect, as we'd flush far less delalloc than
intended, leaving a lot of it lying around when we finally hit ENOSPC.
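To make the items-vs-pages distinction concrete, here's a minimal sketch of
the idea; the function name and the bytes-per-item conversion are
illustrative only, not the actual btrfs code:

/* Illustrative sketch only, not the actual btrfs code. */
static unsigned long flush_target_pages(unsigned long items,
                                        unsigned long bytes_per_item,
                                        unsigned long page_size)
{
        /*
         * The bug: the "items" count was handed to the flusher as if
         * it were a page count, so we'd write back far too few pages
         * of delalloc.
         *
         * The idea of the fix: convert the item reservation into
         * bytes, then round up to a page count, so we flush enough
         * delalloc to actually satisfy the reservation.
         */
        return (items * bytes_per_item + page_size - 1) / page_size;
}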
The next two patches are related to compression, as we run with compression
enabled everywhere in production.
btrfs: wait on async extents when flushing delalloc
btrfs: wake up async_delalloc_pages waiters after submit
I had ripped this code out because I added a second sync_inode() call to do
the proper waiting when we had async extents. However, sync_inode() could
skip writeout if writeout had already begun on the inode, so we still need
this explicit waiting to make sure we don't start waiting on ordered extents
until all of the ordered extents have actually been created.
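The waiting in question has roughly the following shape, using the existing
async_delalloc_pages counter and async_submit_wait waitqueue in
btrfs_fs_info; this is a sketch of the idea, not the exact patch:

/*
 * Sketch: drain the async compression workers before waiting on
 * ordered extents, so that every ordered extent they will create has
 * actually been created.  The companion "wake up waiters after
 * submit" fix makes sure this wakeup fires only after the pages have
 * been submitted.
 */
if (atomic_read(&fs_info->async_delalloc_pages))
        wait_event(fs_info->async_submit_wait,
                   atomic_read(&fs_info->async_delalloc_pages) == 0);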
And finally these two patches
fs: add a filemap_fdatawrite_wbc helper
btrfs: use the filemap_fdatawrite_wbc helper for delalloc shrinking
We need a writeback helper that takes a wbc and doesn't try to do anything
fancy beyond writing out the inode we want. sync_inode() has the drawback
that it will skip writeout if the inode is already under writeback, and thus
we won't wait properly.
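The new helper is essentially a thin wrapper around do_writepages() that
honors the caller's wbc; a sketch of its shape, assuming the usual writeback
plumbing (mapping_can_writeback(), wbc_attach_fdatawrite_inode(), etc.):

int filemap_fdatawrite_wbc(struct address_space *mapping,
                           struct writeback_control *wbc)
{
        int ret;

        /* Nothing to do if the mapping can't, or needn't, write back. */
        if (!mapping_can_writeback(mapping) ||
            !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
                return 0;

        /* Attach to the cgroup writeback context, write, detach. */
        wbc_attach_fdatawrite_inode(wbc, mapping->host);
        ret = do_writepages(mapping, wbc);
        wbc_detach_inode(wbc);
        return ret;
}

The delalloc shrinker can then build its own wbc and call this per inode
instead of going through sync_inode(). For example (nr_pages here is an
illustrative page budget, such as the one computed above):

struct writeback_control wbc = {
        .sync_mode   = WB_SYNC_NONE,
        .nr_to_write = nr_pages,
        .range_start = 0,
        .range_end   = LLONG_MAX,
};

filemap_fdatawrite_wbc(inode->i_mapping, &wbc);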
I ran this series through fsperf; the results are posted below. I'm still
printing the %diff, but I'm also printing the stdev so you can see the
variance in the values we expect. Generally there is no change, or it's
within the normal range. These patches really only affect anything when
we're very full on space. I also ran this through the ENOSPC stress test and
saw no early ENOSPC. Thanks,

Josef
dbench60 results
metric              baseline     current       stdev       diff
===============================================================
qpathinfo              11.26       11.87        0.64      5.35%
throughput            652.90      628.99       65.68     -3.66%
flush               22292.41    35005.49    15197.94     57.03%
qfileinfo               1.04        1.24        0.17     19.02%
ntcreatex            3965.31     6167.38     7790.74     55.53%
qfsinfo                 1.79        1.41        0.40    -20.88%
close                   1.81        1.95        0.39      7.67%
sfileinfo               4.96        5.50        1.13     10.70%
rename               2640.98     5844.86     5873.86    121.31%
find                   12.29       12.83        1.00      4.36%
unlink               3310.42     4809.88     7179.77     45.30%
writex              12521.22    37992.79     7567.72    203.43%
deltree               409.86      363.46      224.63    -11.32%
readx                   2.37        4.18        0.66     76.01%
mkdir                   0.03        0.11        0.01    271.06%
lockx                   0.44        0.17        0.27    -61.83%
unlockx                 0.16        0.64        0.11    291.70%
emptyfiles500k results
metric                 baseline      current        stdev       diff
====================================================================
write_io_kbytes          125000       125000            0      0.00%
read_clat_ns_p99              0            0            0      0.00%
write_bw_bytes         1.83e+08     1.77e+08   5516543.33     -3.19%
read_iops                     0            0            0      0.00%
write_clat_ns_p50         17536        17792       273.68      1.46%
read_io_kbytes                0            0            0      0.00%
read_io_bytes                 0            0            0      0.00%
write_clat_ns_p99         72576        74240      2677.97      2.29%
read_bw_bytes                 0            0            0      0.00%
elapsed                       1            1            0      0.00%
write_lat_ns_min              0            0            0      0.00%
sys_cpu                   91.86        91.29         0.66     -0.63%
write_lat_ns_max              0            0            0      0.00%
read_lat_ns_min               0            0            0      0.00%
write_iops             44583.10     43162.98      1346.81     -3.19%
read_lat_ns_max               0            0            0      0.00%
read_clat_ns_p50              0            0            0      0.00%
smallfiles100k results
metric                 baseline      current         stdev       diff
=====================================================================
write_io_kbytes        2.04e+08     2.04e+08             0      0.00%
read_clat_ns_p99              0            0             0      0.00%
write_bw_bytes         1.33e+08     1.40e+08   15033768.39      5.20%
read_iops                     0            0             0      0.00%
write_clat_ns_p50          6424         6688         79.77      4.11%
read_io_kbytes                0            0             0      0.00%
read_io_bytes                 0            0             0      0.00%
write_clat_ns_p99         15960        16320        251.24      2.26%
read_bw_bytes                 0            0             0      0.00%
elapsed                 1592.50         1492        234.98     -6.31%
write_lat_ns_min        2730.75         2768         45.73      1.36%
sys_cpu                    5.75         6.25          0.65      8.68%
write_lat_ns_max       5.17e+08     1.52e+08      1.05e+09    -70.54%
read_lat_ns_min               0            0             0      0.00%
write_iops             32545.54     34239.34       3670.35      5.20%
read_lat_ns_max               0            0             0      0.00%
read_clat_ns_p50              0            0             0      0.00%
bufferedrandwrite16g results
metric                   baseline      current          stdev       diff
========================================================================
write_io_kbytes          16777216     16777216              0      0.00%
read_clat_ns_p99                0            0              0      0.00%
write_bw_bytes        91729211.62     94071836    13859731.84      2.55%
read_iops                       0            0              0      0.00%
write_clat_ns_p50           12704        11840        1227.26     -6.80%
read_io_kbytes                  0            0              0      0.00%
read_io_bytes                   0            0              0      0.00%
write_clat_ns_p99           31136        32384        2925.86      4.01%
read_bw_bytes                   0            0              0      0.00%
elapsed                    191.38          183          25.98     -4.38%
write_lat_ns_min          3961.25         4068          95.29      2.69%
sys_cpu                     31.83        29.77           6.85     -6.47%
write_lat_ns_max         3.05e+10     1.70e+10       1.95e+10    -44.34%
read_lat_ns_min                 0            0              0      0.00%
write_iops               22394.83     22966.76        3383.72      2.55%
read_lat_ns_max                 0            0              0      0.00%
read_clat_ns_p50                0            0              0      0.00%
dio4kbs16threads results
metric                   baseline      current          stdev       diff
========================================================================
write_io_kbytes           4187092      4949612      599817.04     18.21%
read_clat_ns_p99                0            0              0      0.00%
write_bw_bytes        71448568.62     84467746    10233293.68     18.22%
read_iops                       0            0              0      0.00%
write_clat_ns_p50          245504       240640        4370.25     -1.98%
read_io_kbytes                  0            0              0      0.00%
read_io_bytes                   0            0              0      0.00%
write_clat_ns_p99        22249472     20054016     1120427.41     -9.87%
read_bw_bytes                   0            0              0      0.00%
elapsed                        61           61              0      0.00%
write_lat_ns_min            38440        38571         225.10      0.34%
sys_cpu                      3.89         4.57           0.46     17.47%
write_lat_ns_max         1.23e+09     9.48e+08       7.00e+08    -23.22%
read_lat_ns_min                 0            0              0      0.00%
write_iops               17443.50     20622.01        2498.36     18.22%
read_lat_ns_max                 0            0              0      0.00%
read_clat_ns_p50                0            0              0      0.00%
randwrite2xram results
metric                 baseline      current         stdev       diff
=====================================================================
write_io_kbytes        33948247     34359528    4793805.63      1.21%
read_clat_ns_p99              0            0             0      0.00%
write_bw_bytes         1.15e+08     1.17e+08   17455355.14      1.47%
read_iops                     0            0             0      0.00%
write_clat_ns_p50         15232        14400       1043.81     -5.46%
read_io_kbytes                0            0             0      0.00%
read_io_bytes                 0            0             0      0.00%
write_clat_ns_p99         95264        67072      25379.05    -29.59%
read_bw_bytes                 0            0             0      0.00%
elapsed                  313.62          314          5.10      0.12%
write_lat_ns_min        5399.38         5658        126.19      4.79%
sys_cpu                   11.61        11.06          2.04     -4.69%
write_lat_ns_max       3.16e+10     2.91e+10      1.46e+10     -8.05%
read_lat_ns_min               0            0             0      0.00%
write_iops             28099.86     28511.67       4261.56      1.47%
read_lat_ns_max               0            0             0      0.00%
read_clat_ns_p50              0            0             0      0.00%
untarfirefox results
metric      baseline    current    stdev      diff
==================================================
elapsed        46.89      46.75     0.18    -0.29%
Josef Bacik (9):
btrfs: wake up async_delalloc_pages waiters after submit
btrfs: include delalloc related info in dump space info tracepoint
btrfs: enable a tracepoint when we fail tickets
btrfs: handle shrink_delalloc pages calculation differently
btrfs: wait on async extents when flushing delalloc
fs: add a filemap_fdatawrite_wbc helper
btrfs: use the filemap_fdatawrite_wbc helper for delalloc shrinking
9p: migrate from sync_inode to filemap_fdatawrite_wbc
fs: kill sync_inode
 fs/9p/vfs_file.c             |  7 +--
 fs/btrfs/ctree.h             |  9 ++--
 fs/btrfs/inode.c             | 16 +++----
 fs/btrfs/space-info.c        | 82 ++++++++++++++++++++++++++++++------
 fs/fs-writeback.c            | 19 +--------
 include/linux/fs.h           |  3 +-
 include/trace/events/btrfs.h | 21 +++++++--
 mm/filemap.c                 | 35 +++++++++++----
 8 files changed, 128 insertions(+), 64 deletions(-)
--
2.26.3