Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 7/7] btrfs: make concurrent fsyncs wait less when waiting for a transaction commit
Date: Wed, 27 Jan 2021 10:26:37 -0500	[thread overview]
Message-ID: <0abbf39a-d65b-1f54-d654-e09533a82e47@toxicpanda.com> (raw)
In-Reply-To: <8c391b7c0a7a6a7e827644d424cd4c343f39588e.1611742865.git.fdmanana@suse.com>

On 1/27/21 5:35 AM, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Often an fsync needs to fallback to a transaction commit for several
> reasons (to ensure consistency after a power failure, a new block group
> was allocated or a temporary error such as ENOMEM or ENOSPC happened).
> 
> In that case the log is marked as needing a full commit and any concurrent
> tasks attempting to log inodes or commit the log will also fallback to the
> transaction commit. When this happens they all wait for the task that first
> started the transaction commit to finish the transaction commit - however
> they wait until the full transaction commit happens, which is not needed,
> as they only need to wait for the superblocks to be persisted and not for
> unpinning all the extents pinned during the transaction's lifetime, which
> even for short lived transactions can be a few thousand and take some
> significant amount of time to complete - for dbench workloads I have
> observed up to 4~5 milliseconds of time spent unpinning extents in the
> worst cases, and the number of pinned extents was between 2 to 3 thousand.
> 
> So allow fsync tasks to skip waiting for the unpinning of extents when
> they call btrfs_commit_transaction() and they were not the task that
> started the transaction commit (that one has to do it, the alternative
> would be to offload the transaction commit to another task so that it
> could avoid waiting for the extent unpinning or offload the extent
> unpinning to another task).
> 
> This patch is part of a patchset comprised of the following patches:
> 
>    btrfs: remove unnecessary directory inode item update when deleting dir entry
>    btrfs: stop setting nbytes when filling inode item for logging
>    btrfs: avoid logging new ancestor inodes when logging new inode
>    btrfs: skip logging directories already logged when logging all parents
>    btrfs: skip logging inodes already logged when logging new entries
>    btrfs: remove unnecessary check_parent_dirs_for_sync()
>    btrfs: make concurrent fsyncs wait less when waiting for a transaction commit
> 
> After applying the entire patchset, dbench shows improvements in respect
> to throughput and latency. The script used to measure it is the following:
> 
>    $ cat dbench-test.sh
>    #!/bin/bash
> 
>    DEV=/dev/sdk
>    MNT=/mnt/sdk
>    MOUNT_OPTIONS="-o ssd"
>    MKFS_OPTIONS="-m single -d single"
> 
>    echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> 
>    umount $DEV &> /dev/null
>    mkfs.btrfs -f $MKFS_OPTIONS $DEV
>    mount $MOUNT_OPTIONS $DEV $MNT
> 
>    dbench -D $MNT -t 300 64
> 
>    umount $MNT
> 
> The test was run on a physical machine with 12 cores (Intel corei7), 64G
> of ram, using a NVMe device and a non-debug kernel configuration (Debian's
> default configuration).
> 
> Before applying patchset, 32 clients:
> 
>   Operation      Count    AvgLat    MaxLat
>   ----------------------------------------
>   NTCreateX    9627107     0.153    61.938
>   Close        7072076     0.001     3.175
>   Rename        407633     1.222    44.439
>   Unlink       1943895     0.658    44.440
>   Deltree          256    17.339   110.891
>   Mkdir            128     0.003     0.009
>   Qpathinfo    8725406     0.064    17.850
>   Qfileinfo    1529516     0.001     2.188
>   Qfsinfo      1599884     0.002     1.457
>   Sfileinfo     784200     0.005     3.562
>   Find         3373513     0.411    30.312
>   WriteX       4802132     0.053    29.054
>   ReadX        15089959     0.002     5.801
>   LockX          31344     0.002     0.425
>   UnlockX        31344     0.001     0.173
>   Flush         674724     5.952   341.830
> 
> Throughput 1008.02 MB/sec  32 clients  32 procs  max_latency=341.833 ms
> 
> After applying patchset, 32 clients:
> 
> After patchset, with 32 clients:
> 
>   Operation      Count    AvgLat    MaxLat
>   ----------------------------------------
>   NTCreateX    9931568     0.111    25.597
>   Close        7295730     0.001     2.171
>   Rename        420549     0.982    49.714
>   Unlink       2005366     0.497    39.015
>   Deltree          256    11.149    89.242
>   Mkdir            128     0.002     0.014
>   Qpathinfo    9001863     0.049    20.761
>   Qfileinfo    1577730     0.001     2.546
>   Qfsinfo      1650508     0.002     3.531
>   Sfileinfo     809031     0.005     5.846
>   Find         3480259     0.309    23.977
>   WriteX       4952505     0.043    41.283
>   ReadX        15568127     0.002     5.476
>   LockX          32338     0.002     0.978
>   UnlockX        32338     0.001     2.032
>   Flush         696017     7.485   228.835
> 
> Throughput 1049.91 MB/sec  32 clients  32 procs  max_latency=228.847 ms
> 
>   --> +4.1% throughput, -39.6% max latency
> 
> Before applying patchset, 64 clients:
> 
>   Operation      Count    AvgLat    MaxLat
>   ----------------------------------------
>   NTCreateX    8956748     0.342   108.312
>   Close        6579660     0.001     3.823
>   Rename        379209     2.396    81.897
>   Unlink       1808625     1.108   131.148
>   Deltree          256    25.632   172.176
>   Mkdir            128     0.003     0.018
>   Qpathinfo    8117615     0.131    55.916
>   Qfileinfo    1423495     0.001     2.635
>   Qfsinfo      1488496     0.002     5.412
>   Sfileinfo     729472     0.007     8.643
>   Find         3138598     0.855    78.321
>   WriteX       4470783     0.102    79.442
>   ReadX        14038139     0.002     7.578
>   LockX          29158     0.002     0.844
>   UnlockX        29158     0.001     0.567
>   Flush         627746    14.168   506.151
> 
> Throughput 924.738 MB/sec  64 clients  64 procs  max_latency=506.154 ms
> 
> After applying patchset, 64 clients:
> 
>   Operation      Count    AvgLat    MaxLat
>   ----------------------------------------
>   NTCreateX    9069003     0.303    43.193
>   Close        6662328     0.001     3.888
>   Rename        383976     2.194    46.418
>   Unlink       1831080     1.022    43.873
>   Deltree          256    24.037   155.763
>   Mkdir            128     0.002     0.005
>   Qpathinfo    8219173     0.137    30.233
>   Qfileinfo    1441203     0.001     3.204
>   Qfsinfo      1507092     0.002     4.055
>   Sfileinfo     738775     0.006     5.431
>   Find         3177874     0.936    38.170
>   WriteX       4526152     0.084    39.518
>   ReadX        14213562     0.002    24.760
>   LockX          29522     0.002     1.221
>   UnlockX        29522     0.001     0.694
>   Flush         635652    14.358   422.039
> 
> Throughput 990.13 MB/sec  64 clients  64 procs  max_latency=422.043 ms
> 
>   --> +6.8% throughput, -18.1% max latency
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Funnily enough I've got a patch to async off unpinning as it drastically affects 
our WhatsApp workload.  Once that lands maybe we can undo this bit.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

  reply	other threads:[~2021-01-27 15:32 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-27 10:34 [PATCH 0/7] btrfs: more performance improvements for dbench workloads fdmanana
2021-01-27 10:34 ` [PATCH 1/7] btrfs: remove unnecessary directory inode item update when deleting dir entry fdmanana
2021-01-27 10:34 ` [PATCH 2/7] btrfs: stop setting nbytes when filling inode item for logging fdmanana
2021-01-27 10:34 ` [PATCH 3/7] btrfs: avoid logging new ancestor inodes when logging new inode fdmanana
2021-01-27 10:34 ` [PATCH 4/7] btrfs: skip logging directories already logged when logging all parents fdmanana
2021-01-27 10:34 ` [PATCH 5/7] btrfs: skip logging inodes already logged when logging new entries fdmanana
2021-01-27 10:34 ` [PATCH 6/7] btrfs: remove unnecessary check_parent_dirs_for_sync() fdmanana
2021-01-27 15:23   ` Josef Bacik
2021-01-27 15:36     ` Filipe Manana
2021-01-27 15:42       ` Josef Bacik
2021-01-27 10:35 ` [PATCH 7/7] btrfs: make concurrent fsyncs wait less when waiting for a transaction commit fdmanana
2021-01-27 15:26   ` Josef Bacik [this message]
2021-01-27 15:42 ` [PATCH 0/7] btrfs: more performance improvements for dbench workloads Josef Bacik
2021-02-01 21:56 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0abbf39a-d65b-1f54-d654-e09533a82e47@toxicpanda.com \
    --to=josef@toxicpanda.com \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox