From: Filipe Manana <fdmanana@gmail.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 0/6] Deadlock fix and cleanups
Date: Mon, 3 Oct 2022 12:03:23 +0100 [thread overview]
Message-ID: <CAL3q7H4uDS9kVad9USMN-D-qryy+RksPyX79LMKDW7fT1WWMJg@mail.gmail.com> (raw)
In-Reply-To: <cover.1664570261.git.josef@toxicpanda.com>
On Fri, Sep 30, 2022 at 10:08 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Hello,
>
> There's a problem with how we do our extent locking, namely that if we encounter
> any contention in the range we're trying to lock we'll leave any area we were
> able to lock locked. This creates dependency issues when the contended area is
> blocked on some other thing that depends on an operation that would need the
> range lock for the area that was successfully locked. The exact scenario we
> encountered is described in patch 1. We need to change this behavior to simply
> wait on the contended area and then attempt to re-lock our entire range,
> clearing anything that was locked before we encountered contention.
>
> The followup patches are optimizations and cleanups around caching the extent
> state in more places. Additionally I've added caching of the extent state of
> the contended area to hopefully reduce the pain of needing to unlock and then
> wait on the contended area.
>
> The first patch needs to be backported to stable because of the extent-io-tree.c
> move. Once it lands in Linus I'll manually backport the patch and send it back
> to the stable branches.
>
> I've run this through a sanity test on xfstests, and I ran it through 2 tests on
> fsperf that I felt would hammer on the extent locking code the most and be most
> likely to run into lock contention. I used the new function profiling stuff to
> grab latency numbers for lock_extent(), but of course there's a lot of variance
> here. The only thing that fell outside of the standard deviation were some of
> the maximum latency numbers, but generally everything was within the standard
> deviation. Again these are mostly just for information, deadlock fixing comes
> before performance. Thanks,
>
> Josef
>
> bufferedrandwrite16g results
> metric baseline current stdev diff
> ================================================================================
> avg_commit_ms 10090.03 8453.23 3505.07 -16.22%
> bg_count 20.80 20.80 0.45 0.00%
> commits 6.20 6 1.10 -3.23%
> elapsed 140.60 139.60 18.06 -0.71%
> end_state_mount_ns 42066858.80 45104291.80 6317588.83 7.22%
> end_state_umount_ns 6.26e+09 6.23e+09 1.76e+08 -0.39%
> lock_extent_calls 10795318.60 10713477.60 233933.51 -0.76%
> lock_extent_ns_max 3945976.80 7027641 1676910.27 78.10%
> lock_extent_ns_mean 2258.36 2187.89 212.76 -3.12%
> lock_extent_ns_min 503.60 500.80 7.44 -0.56%
> lock_extent_ns_p50 1964.80 1953.60 190.31 -0.57%
> lock_extent_ns_p95 4257.40 4153.20 409.06 -2.45%
> lock_extent_ns_p99 6967.20 6429.20 1166.93 -7.72%
> max_commit_ms 24686.20 25927 5930.26 5.03%
> sys_cpu 46.68 45.52 6.83 -2.48%
> write_bw_bytes 1.25e+08 1.24e+08 15352552.51 -0.61%
> write_clat_ns_mean 23568.91 21840.81 4061.83 -7.33%
> write_clat_ns_p50 13734.40 13683.20 1268.43 -0.37%
> write_clat_ns_p99 33152 30873.60 4236.59 -6.87%
> write_io_kbytes 16777216 16777216 0 0.00%
> write_iops 30413.83 30228.55 3748.18 -0.61%
> write_lat_ns_max 1.72e+10 1.77e+10 9.25e+09 2.64%
> write_lat_ns_mean 23645.69 21934.65 4057.93 -7.24%
> write_lat_ns_min 6049.40 5877.60 80.29 -2.84%
>
> randwrite2xram results
> metric baseline current stdev diff
> ================================================================================
> avg_commit_ms 21196.15 32607.37 2286.20 53.84%
> bg_count 43.80 39.80 6.46 -9.13%
> commits 11.20 9.80 1.10 -12.50%
> elapsed 329.20 350 4.97 6.32%
> end_state_mount_ns 61846504.40 57392390.60 7914609.45 -7.20%
> end_state_umount_ns 1.74e+10 1.80e+10 2.35e+09 3.44%
> lock_extent_calls 26193512.60 24046630 4169768.34 -8.20%
> lock_extent_ns_max 23699711.60 17524496.80 13253697.58 -26.06%
> lock_extent_ns_mean 1871.04 1866.95 26.60 -0.22%
> lock_extent_ns_min 495.60 501 5.41 1.09%
> lock_extent_ns_p50 1681.80 1675.40 22.13 -0.38%
> lock_extent_ns_p95 3487.60 3492 45.35 0.13%
> lock_extent_ns_p99 4416.60 4431.80 44.77 0.34%
> max_commit_ms 53482 116910.40 8707.34 118.60%
> sys_cpu 9.88 8.32 1.75 -15.85%
> write_bw_bytes 1.05e+08 89841897.40 20713472.34 -14.84%
> write_clat_ns_mean 158731.16 234418.93 30030.49 47.68%
> write_clat_ns_p50 14732.80 14873.60 448.91 0.96%
> write_clat_ns_p99 75622.40 82892.80 12865.88 9.61%
> write_io_kbytes 31930975.20 28774377.60 5985362.29 -9.89%
> write_iops 25756.55 21934.06 5057.00 -14.84%
> write_lat_ns_max 3.49e+10 5.87e+10 6.68e+09 68.41%
> write_lat_ns_mean 158828.62 234533.46 30028.84 47.66%
> write_lat_ns_min 7809 7923.40 371.87 1.46%
>
> Josef Bacik (6):
> btrfs: unlock locked extent area if we have contention
> btrfs: add a cached_state to try_lock_extent
> btrfs: use cached_state for btrfs_check_nocow_lock
> btrfs: use a cached_state everywhere in relocation
> btrfs: cache the failed state when locking extents
> btrfs: add cached_state to read_extent_buffer_subpage
The whole patchset looks good to me, so
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Thanks.
>
> fs/btrfs/extent-io-tree.c | 68 +++++++++++++++++++++++++++------------
> fs/btrfs/extent-io-tree.h | 6 ++--
> fs/btrfs/extent_io.c | 17 +++++++---
> fs/btrfs/file.c | 12 ++++---
> fs/btrfs/inode.c | 3 +-
> fs/btrfs/ordered-data.c | 7 ++--
> fs/btrfs/ordered-data.h | 3 +-
> fs/btrfs/relocation.c | 40 +++++++++++++++--------
> 8 files changed, 106 insertions(+), 50 deletions(-)
>
> --
> 2.26.3
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
next prev parent reply other threads:[~2022-10-03 11:04 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-30 20:45 [PATCH 0/6] Deadlock fix and cleanups Josef Bacik
2022-09-30 20:45 ` [PATCH 1/6] btrfs: unlock locked extent area if we have contention Josef Bacik
2022-09-30 20:45 ` [PATCH 2/6] btrfs: add a cached_state to try_lock_extent Josef Bacik
2022-09-30 20:45 ` [PATCH 3/6] btrfs: use cached_state for btrfs_check_nocow_lock Josef Bacik
2022-09-30 20:45 ` [PATCH 4/6] btrfs: use a cached_state everywhere in relocation Josef Bacik
2022-09-30 20:45 ` [PATCH 5/6] btrfs: cache the failed state when locking extents Josef Bacik
2022-09-30 20:45 ` [PATCH 6/6] btrfs: add cached_state to read_extent_buffer_subpage Josef Bacik
2022-10-03 11:03 ` Filipe Manana [this message]
2022-10-07 16:21 ` [PATCH 0/6] Deadlock fix and cleanups David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAL3q7H4uDS9kVad9USMN-D-qryy+RksPyX79LMKDW7fT1WWMJg@mail.gmail.com \
--to=fdmanana@gmail.com \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).