From: Josef Bacik <josef@toxicpanda.com>
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 1/3] btrfs: do not take the log_mutex of the subvolume when pinning the log
Date: Tue, 11 Aug 2020 10:46:47 -0400 [thread overview]
Message-ID: <f06846d9-7d89-560a-5ea6-bb78619086bf@toxicpanda.com> (raw)
In-Reply-To: <20200811114337.689881-1-fdmanana@kernel.org>
On 8/11/20 7:43 AM, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> During a rename we pin the log to make sure no one commits a log that
> reflects an ongoing rename operation, as it might result in a committed
> log where it recorded the unlink of the old name without having recorded
> the new name. However we are taking the subvolume's log_mutex before
> incrementing the log_writers counter, which is not necessary since that
> counter is atomic and we only remove the old name from the log and add
> the new name to the log after we have incremented log_writers, ensuring
> that no one can commit the log after we have removed the old name from
> the log and before we added the new name to the log.
>
> By taking the log_mutex lock we are just adding unnecessary contention on
> the lock, which can become visible for workloads that mix renames with
> fsyncs, writes for files opened with O_SYNC and unlink operations (if the
> inode or its parent were fsynced before in the current transaction).
>
> So just remove the lock and unlock of the subvolume's log_mutex at
> btrfs_pin_log_trans().
>
> Using dbench, which mixes different types of operations that end up taking
> that mutex (fsyncs, renames, unlinks and writes into files opened with
> O_SYNC) revealed some small gains. The following script that calls dbench
> was used:
>
> #!/bin/bash
>
> DEV=/dev/nvme0n1
> MNT=/mnt/btrfs
> MOUNT_OPTIONS="-o ssd -o space_cache=v2"
> MKFS_OPTIONS="-m single -d single"
> THREADS=32
>
> echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> mkfs.btrfs -f $MKFS_OPTIONS $DEV
> mount $MOUNT_OPTIONS $DEV $MNT
>
> dbench -s -t 600 -D $MNT $THREADS
>
> umount $MNT
>
> The test was run on bare metal, no virtualization, on a box with 12 cores
> (Intel i7-8700), 64Gb of RAM and using a NVMe device, with a kernel
> configuration that is the default of typical distributions (debian in this
> case), without debug options enabled (kasan, kmemleak, slub debug, debug
> of page allocations, lock debugging, etc).
>
> Results before this patch:
>
> Operation Count AvgLat MaxLat
> ----------------------------------------
> NTCreateX 4410848 0.017 738.640
> Close 3240222 0.001 0.834
> Rename 186850 7.478 1272.476
> Unlink 890875 0.128 785.018
> Deltree 128 2.846 12.081
> Mkdir 64 0.002 0.003
> Qpathinfo 3997659 0.009 11.171
> Qfileinfo 701307 0.001 0.478
> Qfsinfo 733494 0.002 1.103
> Sfileinfo 359362 0.004 3.266
> Find 1546226 0.041 4.128
> WriteX 2202803 7.905 1376.989
> ReadX 6917775 0.003 3.887
> LockX 14392 0.002 0.043
> UnlockX 14392 0.001 0.085
> Flush 309225 0.128 1033.936
>
> Throughput 231.555 MB/sec (sync open) 32 clients 32 procs max_latency=1376.993 ms
>
> Results after this patch:
>
> Operation Count AvgLat MaxLat
> ----------------------------------------
> NTCreateX 4603244 0.017 232.776
> Close 3381299 0.001 1.041
> Rename 194871 7.251 1073.165
> Unlink 929730 0.133 119.233
> Deltree 128 2.871 10.199
> Mkdir 64 0.002 0.004
> Qpathinfo 4171343 0.009 11.317
> Qfileinfo 731227 0.001 1.635
> Qfsinfo 765079 0.002 3.568
> Sfileinfo 374881 0.004 1.220
> Find 1612964 0.041 4.675
> WriteX 2296720 7.569 1178.204
> ReadX 7213633 0.003 3.075
> LockX 14976 0.002 0.076
> UnlockX 14976 0.001 0.061
> Flush 322635 0.102 579.505
>
> Throughput 241.4 MB/sec (sync open) 32 clients 32 procs max_latency=1178.207 ms
> (+4.3% throughput, -14.4% max latency)
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
prev parent reply other threads:[~2020-08-11 14:46 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-11 11:43 [PATCH 1/3] btrfs: do not take the log_mutex of the subvolume when pinning the log fdmanana
2020-08-11 14:46 ` Josef Bacik [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f06846d9-7d89-560a-5ea6-bb78619086bf@toxicpanda.com \
--to=josef@toxicpanda.com \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox