linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@gmail.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 0/6] Deadlock fix and cleanups
Date: Mon, 3 Oct 2022 12:03:23 +0100	[thread overview]
Message-ID: <CAL3q7H4uDS9kVad9USMN-D-qryy+RksPyX79LMKDW7fT1WWMJg@mail.gmail.com> (raw)
In-Reply-To: <cover.1664570261.git.josef@toxicpanda.com>

On Fri, Sep 30, 2022 at 10:08 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Hello,
>
> There's a problem with how we do our extent locking, namely that if we encounter
> any contention in the range we're trying to lock we'll leave any area we were
> able to lock locked.  This creates dependency issues when the contended area is
> blocked on some other thing that depends on an operation that would need the
> range lock for the area that was successfully locked.  The exact scenario we
> encountered is described in patch 1.  We need to change this behavior to simply
> wait on the contended area and then attempt to re-lock our entire range,
> clearing anything that was locked before we encountered contention.
>
> The followup patches are optimizations and cleanups around caching the extent
> state in more places.  Additionally I've added caching of the extent state of
> the contended area to hopefully reduce the pain of needing to unlock and then
> wait on the contended area.
>
> The first patch needs to be backported to stable because of the extent-io-tree.c
> move.  Once it lands in Linus I'll manually backport the patch and send it back
> to the stable branches.
>
> I've run this through a sanity test on xfstests, and I ran it through 2 tests on
> fsperf that I felt would hammer on the extent locking code the most and be most
> likely to run into lock contention.  I used the new function profiling stuff to
> grab latency numbers for lock_extent(), but of course there's a lot of variance
> here.  The only thing that fell outside of the standard deviation were some of
> the maximum latency numbers, but generally everything was within the standard
> deviation.  Again these are mostly just for information, deadlock fixing comes
> before performance.  Thanks,
>
> Josef
>
> bufferedrandwrite16g results
>       metric           baseline       current        stdev            diff
> ================================================================================
> avg_commit_ms            10090.03       8453.23       3505.07   -16.22%
> bg_count                    20.80         20.80          0.45     0.00%
> commits                      6.20             6          1.10    -3.23%
> elapsed                    140.60        139.60         18.06    -0.71%
> end_state_mount_ns    42066858.80   45104291.80    6317588.83     7.22%
> end_state_umount_ns      6.26e+09      6.23e+09      1.76e+08    -0.39%
> lock_extent_calls     10795318.60   10713477.60     233933.51    -0.76%
> lock_extent_ns_max     3945976.80       7027641    1676910.27    78.10%
> lock_extent_ns_mean       2258.36       2187.89        212.76    -3.12%
> lock_extent_ns_min         503.60        500.80          7.44    -0.56%
> lock_extent_ns_p50        1964.80       1953.60        190.31    -0.57%
> lock_extent_ns_p95        4257.40       4153.20        409.06    -2.45%
> lock_extent_ns_p99        6967.20       6429.20       1166.93    -7.72%
> max_commit_ms            24686.20         25927       5930.26     5.03%
> sys_cpu                     46.68         45.52          6.83    -2.48%
> write_bw_bytes           1.25e+08      1.24e+08   15352552.51    -0.61%
> write_clat_ns_mean       23568.91      21840.81       4061.83    -7.33%
> write_clat_ns_p50        13734.40      13683.20       1268.43    -0.37%
> write_clat_ns_p99           33152      30873.60       4236.59    -6.87%
> write_io_kbytes          16777216      16777216             0     0.00%
> write_iops               30413.83      30228.55       3748.18    -0.61%
> write_lat_ns_max         1.72e+10      1.77e+10      9.25e+09     2.64%
> write_lat_ns_mean        23645.69      21934.65       4057.93    -7.24%
> write_lat_ns_min          6049.40       5877.60         80.29    -2.84%
>
> randwrite2xram results
>       metric           baseline       current        stdev            diff
> ================================================================================
> avg_commit_ms            21196.15      32607.37       2286.20    53.84%
> bg_count                    43.80         39.80          6.46    -9.13%
> commits                     11.20          9.80          1.10   -12.50%
> elapsed                    329.20           350          4.97     6.32%
> end_state_mount_ns    61846504.40   57392390.60    7914609.45    -7.20%
> end_state_umount_ns      1.74e+10      1.80e+10      2.35e+09     3.44%
> lock_extent_calls     26193512.60      24046630    4169768.34    -8.20%
> lock_extent_ns_max    23699711.60   17524496.80   13253697.58   -26.06%
> lock_extent_ns_mean       1871.04       1866.95         26.60    -0.22%
> lock_extent_ns_min         495.60           501          5.41     1.09%
> lock_extent_ns_p50        1681.80       1675.40         22.13    -0.38%
> lock_extent_ns_p95        3487.60          3492         45.35     0.13%
> lock_extent_ns_p99        4416.60       4431.80         44.77     0.34%
> max_commit_ms               53482     116910.40       8707.34   118.60%
> sys_cpu                      9.88          8.32          1.75   -15.85%
> write_bw_bytes           1.05e+08   89841897.40   20713472.34   -14.84%
> write_clat_ns_mean      158731.16     234418.93      30030.49    47.68%
> write_clat_ns_p50        14732.80      14873.60        448.91     0.96%
> write_clat_ns_p99        75622.40      82892.80      12865.88     9.61%
> write_io_kbytes       31930975.20   28774377.60    5985362.29    -9.89%
> write_iops               25756.55      21934.06       5057.00   -14.84%
> write_lat_ns_max         3.49e+10      5.87e+10      6.68e+09    68.41%
> write_lat_ns_mean       158828.62     234533.46      30028.84    47.66%
> write_lat_ns_min             7809       7923.40        371.87     1.46%
>
> Josef Bacik (6):
>   btrfs: unlock locked extent area if we have contention
>   btrfs: add a cached_state to try_lock_extent
>   btrfs: use cached_state for btrfs_check_nocow_lock
>   btrfs: use a cached_state everywhere in relocation
>   btrfs: cache the failed state when locking extents
>   btrfs: add cached_state to read_extent_buffer_subpage

The whole patchset looks good to me, so

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

>
>  fs/btrfs/extent-io-tree.c | 68 +++++++++++++++++++++++++++------------
>  fs/btrfs/extent-io-tree.h |  6 ++--
>  fs/btrfs/extent_io.c      | 17 +++++++---
>  fs/btrfs/file.c           | 12 ++++---
>  fs/btrfs/inode.c          |  3 +-
>  fs/btrfs/ordered-data.c   |  7 ++--
>  fs/btrfs/ordered-data.h   |  3 +-
>  fs/btrfs/relocation.c     | 40 +++++++++++++++--------
>  8 files changed, 106 insertions(+), 50 deletions(-)
>
> --
> 2.26.3
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

  parent reply	other threads:[~2022-10-03 11:04 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-30 20:45 [PATCH 0/6] Deadlock fix and cleanups Josef Bacik
2022-09-30 20:45 ` [PATCH 1/6] btrfs: unlock locked extent area if we have contention Josef Bacik
2022-09-30 20:45 ` [PATCH 2/6] btrfs: add a cached_state to try_lock_extent Josef Bacik
2022-09-30 20:45 ` [PATCH 3/6] btrfs: use cached_state for btrfs_check_nocow_lock Josef Bacik
2022-09-30 20:45 ` [PATCH 4/6] btrfs: use a cached_state everywhere in relocation Josef Bacik
2022-09-30 20:45 ` [PATCH 5/6] btrfs: cache the failed state when locking extents Josef Bacik
2022-09-30 20:45 ` [PATCH 6/6] btrfs: add cached_state to read_extent_buffer_subpage Josef Bacik
2022-10-03 11:03 ` Filipe Manana [this message]
2022-10-07 16:21 ` [PATCH 0/6] Deadlock fix and cleanups David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAL3q7H4uDS9kVad9USMN-D-qryy+RksPyX79LMKDW7fT1WWMJg@mail.gmail.com \
    --to=fdmanana@gmail.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).