From: Josef Bacik <josef@toxicpanda.com>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 0/6] Deadlock fix and cleanups
Date: Fri, 30 Sep 2022 16:45:07 -0400 [thread overview]
Message-ID: <cover.1664570261.git.josef@toxicpanda.com> (raw)
Hello,
There's a problem with how we do our extent locking, namely that if we encounter
any contention in the range we're trying to lock we'll leave any area we were
able to lock locked. This creates dependency issues when the contended area is
blocked on some other thing that depends on an operation that would need the
range lock for the area that was successfully locked. The exact scenario we
encountered is described in patch 1. We need to change this behavior to simply
wait on the contended area and then attempt to re-lock our entire range,
clearing anything that was locked before we encountered contention.
The followup patches are optimizations and cleanups around caching the extent
state in more places. Additionally I've added caching of the extent state of
the contended area to hopefully reduce the pain of needing to unlock and then
wait on the contended area.
The first patch needs to be backported to stable because of the extent-io-tree.c
move. Once it lands in Linus I'll manually backport the patch and send it back
to the stable branches.
I've run this through a sanity test on xfstests, and I ran it through 2 tests on
fsperf that I felt would hammer on the extent locking code the most and be most
likely to run into lock contention. I used the new function profiling stuff to
grab latency numbers for lock_extent(), but of course there's a lot of variance
here. The only thing that fell outside of the standard deviation were some of
the maximum latency numbers, but generally everything was within the standard
deviation. Again these are mostly just for information, deadlock fixing comes
before performance. Thanks,
Josef
bufferedrandwrite16g results
metric baseline current stdev diff
================================================================================
avg_commit_ms 10090.03 8453.23 3505.07 -16.22%
bg_count 20.80 20.80 0.45 0.00%
commits 6.20 6 1.10 -3.23%
elapsed 140.60 139.60 18.06 -0.71%
end_state_mount_ns 42066858.80 45104291.80 6317588.83 7.22%
end_state_umount_ns 6.26e+09 6.23e+09 1.76e+08 -0.39%
lock_extent_calls 10795318.60 10713477.60 233933.51 -0.76%
lock_extent_ns_max 3945976.80 7027641 1676910.27 78.10%
lock_extent_ns_mean 2258.36 2187.89 212.76 -3.12%
lock_extent_ns_min 503.60 500.80 7.44 -0.56%
lock_extent_ns_p50 1964.80 1953.60 190.31 -0.57%
lock_extent_ns_p95 4257.40 4153.20 409.06 -2.45%
lock_extent_ns_p99 6967.20 6429.20 1166.93 -7.72%
max_commit_ms 24686.20 25927 5930.26 5.03%
sys_cpu 46.68 45.52 6.83 -2.48%
write_bw_bytes 1.25e+08 1.24e+08 15352552.51 -0.61%
write_clat_ns_mean 23568.91 21840.81 4061.83 -7.33%
write_clat_ns_p50 13734.40 13683.20 1268.43 -0.37%
write_clat_ns_p99 33152 30873.60 4236.59 -6.87%
write_io_kbytes 16777216 16777216 0 0.00%
write_iops 30413.83 30228.55 3748.18 -0.61%
write_lat_ns_max 1.72e+10 1.77e+10 9.25e+09 2.64%
write_lat_ns_mean 23645.69 21934.65 4057.93 -7.24%
write_lat_ns_min 6049.40 5877.60 80.29 -2.84%
randwrite2xram results
metric baseline current stdev diff
================================================================================
avg_commit_ms 21196.15 32607.37 2286.20 53.84%
bg_count 43.80 39.80 6.46 -9.13%
commits 11.20 9.80 1.10 -12.50%
elapsed 329.20 350 4.97 6.32%
end_state_mount_ns 61846504.40 57392390.60 7914609.45 -7.20%
end_state_umount_ns 1.74e+10 1.80e+10 2.35e+09 3.44%
lock_extent_calls 26193512.60 24046630 4169768.34 -8.20%
lock_extent_ns_max 23699711.60 17524496.80 13253697.58 -26.06%
lock_extent_ns_mean 1871.04 1866.95 26.60 -0.22%
lock_extent_ns_min 495.60 501 5.41 1.09%
lock_extent_ns_p50 1681.80 1675.40 22.13 -0.38%
lock_extent_ns_p95 3487.60 3492 45.35 0.13%
lock_extent_ns_p99 4416.60 4431.80 44.77 0.34%
max_commit_ms 53482 116910.40 8707.34 118.60%
sys_cpu 9.88 8.32 1.75 -15.85%
write_bw_bytes 1.05e+08 89841897.40 20713472.34 -14.84%
write_clat_ns_mean 158731.16 234418.93 30030.49 47.68%
write_clat_ns_p50 14732.80 14873.60 448.91 0.96%
write_clat_ns_p99 75622.40 82892.80 12865.88 9.61%
write_io_kbytes 31930975.20 28774377.60 5985362.29 -9.89%
write_iops 25756.55 21934.06 5057.00 -14.84%
write_lat_ns_max 3.49e+10 5.87e+10 6.68e+09 68.41%
write_lat_ns_mean 158828.62 234533.46 30028.84 47.66%
write_lat_ns_min 7809 7923.40 371.87 1.46%
Josef Bacik (6):
btrfs: unlock locked extent area if we have contention
btrfs: add a cached_state to try_lock_extent
btrfs: use cached_state for btrfs_check_nocow_lock
btrfs: use a cached_state everywhere in relocation
btrfs: cache the failed state when locking extents
btrfs: add cached_state to read_extent_buffer_subpage
fs/btrfs/extent-io-tree.c | 68 +++++++++++++++++++++++++++------------
fs/btrfs/extent-io-tree.h | 6 ++--
fs/btrfs/extent_io.c | 17 +++++++---
fs/btrfs/file.c | 12 ++++---
fs/btrfs/inode.c | 3 +-
fs/btrfs/ordered-data.c | 7 ++--
fs/btrfs/ordered-data.h | 3 +-
fs/btrfs/relocation.c | 40 +++++++++++++++--------
8 files changed, 106 insertions(+), 50 deletions(-)
--
2.26.3
next reply other threads:[~2022-09-30 20:45 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-30 20:45 Josef Bacik [this message]
2022-09-30 20:45 ` [PATCH 1/6] btrfs: unlock locked extent area if we have contention Josef Bacik
2022-09-30 20:45 ` [PATCH 2/6] btrfs: add a cached_state to try_lock_extent Josef Bacik
2022-09-30 20:45 ` [PATCH 3/6] btrfs: use cached_state for btrfs_check_nocow_lock Josef Bacik
2022-09-30 20:45 ` [PATCH 4/6] btrfs: use a cached_state everywhere in relocation Josef Bacik
2022-09-30 20:45 ` [PATCH 5/6] btrfs: cache the failed state when locking extents Josef Bacik
2022-09-30 20:45 ` [PATCH 6/6] btrfs: add cached_state to read_extent_buffer_subpage Josef Bacik
2022-10-03 11:03 ` [PATCH 0/6] Deadlock fix and cleanups Filipe Manana
2022-10-07 16:21 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1664570261.git.josef@toxicpanda.com \
--to=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).