linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 0/6] Deadlock fix and cleanups
Date: Fri, 30 Sep 2022 16:45:07 -0400	[thread overview]
Message-ID: <cover.1664570261.git.josef@toxicpanda.com> (raw)

Hello,

There's a problem with how we do our extent locking, namely that if we encounter
any contention in the range we're trying to lock we'll leave any area we were
able to lock locked.  This creates dependency issues when the contended area is
blocked on some other thing that depends on an operation that would need the
range lock for the area that was successfully locked.  The exact scenario we
encountered is described in patch 1.  We need to change this behavior to simply
wait on the contended area and then attempt to re-lock our entire range,
clearing anything that was locked before we encountered contention.

The followup patches are optimizations and cleanups around caching the extent
state in more places.  Additionally I've added caching of the extent state of
the contended area to hopefully reduce the pain of needing to unlock and then
wait on the contended area.

The first patch needs to be backported to stable because of the extent-io-tree.c
move.  Once it lands in Linus I'll manually backport the patch and send it back
to the stable branches.

I've run this through a sanity test on xfstests, and I ran it through 2 tests on
fsperf that I felt would hammer on the extent locking code the most and be most
likely to run into lock contention.  I used the new function profiling stuff to
grab latency numbers for lock_extent(), but of course there's a lot of variance
here.  The only thing that fell outside of the standard deviation were some of
the maximum latency numbers, but generally everything was within the standard
deviation.  Again these are mostly just for information, deadlock fixing comes
before performance.  Thanks,

Josef

bufferedrandwrite16g results
      metric           baseline       current        stdev            diff
================================================================================
avg_commit_ms            10090.03       8453.23       3505.07   -16.22%
bg_count                    20.80         20.80          0.45     0.00%
commits                      6.20             6          1.10    -3.23%
elapsed                    140.60        139.60         18.06    -0.71%
end_state_mount_ns    42066858.80   45104291.80    6317588.83     7.22%
end_state_umount_ns      6.26e+09      6.23e+09      1.76e+08    -0.39%
lock_extent_calls     10795318.60   10713477.60     233933.51    -0.76%
lock_extent_ns_max     3945976.80       7027641    1676910.27    78.10%
lock_extent_ns_mean       2258.36       2187.89        212.76    -3.12%
lock_extent_ns_min         503.60        500.80          7.44    -0.56%
lock_extent_ns_p50        1964.80       1953.60        190.31    -0.57%
lock_extent_ns_p95        4257.40       4153.20        409.06    -2.45%
lock_extent_ns_p99        6967.20       6429.20       1166.93    -7.72%
max_commit_ms            24686.20         25927       5930.26     5.03%
sys_cpu                     46.68         45.52          6.83    -2.48%
write_bw_bytes           1.25e+08      1.24e+08   15352552.51    -0.61%
write_clat_ns_mean       23568.91      21840.81       4061.83    -7.33%
write_clat_ns_p50        13734.40      13683.20       1268.43    -0.37%
write_clat_ns_p99           33152      30873.60       4236.59    -6.87%
write_io_kbytes          16777216      16777216             0     0.00%
write_iops               30413.83      30228.55       3748.18    -0.61%
write_lat_ns_max         1.72e+10      1.77e+10      9.25e+09     2.64%
write_lat_ns_mean        23645.69      21934.65       4057.93    -7.24%
write_lat_ns_min          6049.40       5877.60         80.29    -2.84%

randwrite2xram results
      metric           baseline       current        stdev            diff
================================================================================
avg_commit_ms            21196.15      32607.37       2286.20    53.84%
bg_count                    43.80         39.80          6.46    -9.13%
commits                     11.20          9.80          1.10   -12.50%
elapsed                    329.20           350          4.97     6.32%
end_state_mount_ns    61846504.40   57392390.60    7914609.45    -7.20%
end_state_umount_ns      1.74e+10      1.80e+10      2.35e+09     3.44%
lock_extent_calls     26193512.60      24046630    4169768.34    -8.20%
lock_extent_ns_max    23699711.60   17524496.80   13253697.58   -26.06%
lock_extent_ns_mean       1871.04       1866.95         26.60    -0.22%
lock_extent_ns_min         495.60           501          5.41     1.09%
lock_extent_ns_p50        1681.80       1675.40         22.13    -0.38%
lock_extent_ns_p95        3487.60          3492         45.35     0.13%
lock_extent_ns_p99        4416.60       4431.80         44.77     0.34%
max_commit_ms               53482     116910.40       8707.34   118.60%
sys_cpu                      9.88          8.32          1.75   -15.85%
write_bw_bytes           1.05e+08   89841897.40   20713472.34   -14.84%
write_clat_ns_mean      158731.16     234418.93      30030.49    47.68%
write_clat_ns_p50        14732.80      14873.60        448.91     0.96%
write_clat_ns_p99        75622.40      82892.80      12865.88     9.61%
write_io_kbytes       31930975.20   28774377.60    5985362.29    -9.89%
write_iops               25756.55      21934.06       5057.00   -14.84%
write_lat_ns_max         3.49e+10      5.87e+10      6.68e+09    68.41%
write_lat_ns_mean       158828.62     234533.46      30028.84    47.66%
write_lat_ns_min             7809       7923.40        371.87     1.46%

Josef Bacik (6):
  btrfs: unlock locked extent area if we have contention
  btrfs: add a cached_state to try_lock_extent
  btrfs: use cached_state for btrfs_check_nocow_lock
  btrfs: use a cached_state everywhere in relocation
  btrfs: cache the failed state when locking extents
  btrfs: add cached_state to read_extent_buffer_subpage

 fs/btrfs/extent-io-tree.c | 68 +++++++++++++++++++++++++++------------
 fs/btrfs/extent-io-tree.h |  6 ++--
 fs/btrfs/extent_io.c      | 17 +++++++---
 fs/btrfs/file.c           | 12 ++++---
 fs/btrfs/inode.c          |  3 +-
 fs/btrfs/ordered-data.c   |  7 ++--
 fs/btrfs/ordered-data.h   |  3 +-
 fs/btrfs/relocation.c     | 40 +++++++++++++++--------
 8 files changed, 106 insertions(+), 50 deletions(-)

-- 
2.26.3


             reply	other threads:[~2022-09-30 20:45 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-30 20:45 Josef Bacik [this message]
2022-09-30 20:45 ` [PATCH 1/6] btrfs: unlock locked extent area if we have contention Josef Bacik
2022-09-30 20:45 ` [PATCH 2/6] btrfs: add a cached_state to try_lock_extent Josef Bacik
2022-09-30 20:45 ` [PATCH 3/6] btrfs: use cached_state for btrfs_check_nocow_lock Josef Bacik
2022-09-30 20:45 ` [PATCH 4/6] btrfs: use a cached_state everywhere in relocation Josef Bacik
2022-09-30 20:45 ` [PATCH 5/6] btrfs: cache the failed state when locking extents Josef Bacik
2022-09-30 20:45 ` [PATCH 6/6] btrfs: add cached_state to read_extent_buffer_subpage Josef Bacik
2022-10-03 11:03 ` [PATCH 0/6] Deadlock fix and cleanups Filipe Manana
2022-10-07 16:21 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1664570261.git.josef@toxicpanda.com \
    --to=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).