From: Anand Jain <anand.jain@oracle.com>
To: linux-btrfs@vger.kernel.org
Cc: dsterba@suse.com, wqu@suse.com, hrx@bupt.moe, waxhead@dirtcellar.net
Subject: Re: [PATCH v2 0/3] raid1 balancing methods
Date: Fri, 11 Oct 2024 11:35:07 +0800
Message-ID: <662f66fe-8016-49f3-9c2e-6624a8fe3679@oracle.com>
In-Reply-To: <cover.1728608421.git.anand.jain@oracle.com>
On 11/10/24 8:19 am, Anand Jain wrote:
> v2:
> 1. Move new features to CONFIG_BTRFS_EXPERIMENTAL instead of CONFIG_BTRFS_DEBUG.
> 2. Correct the typo from %est_wait to %best_wait.
> 3. Initialize %best_wait to U64_MAX and remove the check for 0.
> 4. Implement rotation with a minimum contiguous read threshold before
> switching to the next stripe. Configure it with:
>
> echo rotation:[min_contiguous_read] > /sys/fs/btrfs/<uuid>/read_policy
>
> The default value is the sector size, and the min_contiguous_read
> value must be a multiple of the sector size.
>
> 5. Tested FIO random read/write and defrag compression workloads with
> min_contiguous_read set to sector size, 192k, and 256k.
>
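A minimal sketch of the rotation rule from item 4, written in Python rather than kernel C. It assumes a filesystem-wide counter of sector-sized reads from which the mirror is derived; the function names, the counter, and the 4096-byte sector size are illustrative assumptions and are not taken from the patches.

```python
SECTOR_SIZE = 4096  # assumed sector size in bytes


def parse_rotation_policy(value: str, sector_size: int = SECTOR_SIZE) -> int:
    """Parse 'rotation' or 'rotation:<bytes>' and return min_contiguous_read."""
    name, _, arg = value.strip().partition(":")
    if name != "rotation":
        raise ValueError(f"not a rotation policy: {value!r}")
    if not arg:
        # No explicit value: default to the sector size.
        return sector_size
    min_contiguous_read = int(arg)
    if min_contiguous_read <= 0 or min_contiguous_read % sector_size != 0:
        raise ValueError("min_contiguous_read must be a multiple of the sector size")
    return min_contiguous_read


def pick_mirror(total_reads: int, min_contiguous_read: int, num_mirrors: int,
                sector_size: int = SECTOR_SIZE) -> int:
    """Pick a mirror index; switch only after min_contiguous_read bytes of reads.

    total_reads is a hypothetical running count of sector-sized reads.
    """
    reads_per_mirror = min_contiguous_read // sector_size
    return (total_reads // reads_per_mirror) % num_mirrors
```

With min_contiguous_read set to 8192 and two mirrors, reads 0 to 5 map to mirrors 0, 0, 1, 1, 0, 0 in this sketch.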
> The rotation RAID1 balancing method is better for multi-process workloads
> such as fio and also for single-process workloads such as defragmentation.
>
> $ fio --filename=/btrfs/foo --size=5Gi --direct=1 --rw=randrw --bs=4k \
> --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 \
> --time_based --group_reporting --name=iops-test-job --eta-newline=1
>
>
> |          |           |           |  Read I/O count |
> |          | Read      | Write     | devid1 | devid2 |
> |----------|-----------|-----------|--------|--------|
> | pid      | 20.3MiB/s | 20.5MiB/s | 313895 | 313895 |
> | rotation |           |           |        |        |
> |     4096 | 20.4MiB/s | 20.5MiB/s | 313895 | 313895 |
> |   196608 | 20.2MiB/s | 20.2MiB/s | 310152 | 310175 |
> |   262144 | 20.3MiB/s | 20.4MiB/s | 312180 | 312191 |
> | latency  | 18.4MiB/s | 18.4MiB/s | 272980 | 291683 |
> | devid:1  | 14.8MiB/s | 14.9MiB/s | 456376 |      0 |
>
> The rotation RAID1 balancing technique performs more than 2x better than
> pid for single-process defrag.
>
> $ time -p btrfs filesystem defrag -r -f -c /btrfs
>
>
> |          | Time   |  Read I/O count |
> |          | Real   | devid1 | devid2 |
> |----------|--------|--------|--------|
> | pid      | 18.00s |   3800 |      0 |
> | rotation |        |        |        |
> |     4096 |  8.95s |   1900 |   1901 |
> |   196608 |  8.50s |   1881 |   1919 |
> |   262144 |  8.80s |   1881 |   1919 |
> | latency  | 17.18s |   3800 |      0 |
> | devid:2  | 17.48s |      0 |   3800 |
>
Copy and paste error. Please ignore the below paragraph. Thx.
---vvv--- ignore ---vvv----
> Rotation keeps all devices active, and for now, the Rotation RAID1
> balancing method is preferable as default. More workload testing is
> needed while the code is EXPERIMENTAL.
> Latency is better when the block layer transport is failing or unstable.
> For now, these two techniques need further independent testing with
> different workloads, and in the long term we should merge them into a
> unified heuristic.
---^^^------------^^^------
> Rotation keeps all devices active, and for now, the Rotation RAID1
> balancing method should be the default. More workload testing is needed
> while the code is EXPERIMENTAL.
>
> Latency copes better with an unstable block layer transport.
>
> Both techniques need independent testing across workloads, with the
> long-term goal of merging them into a unified approach.
>
> Devid is a hands-on approach that provides manual or user-space script
> control.
>
> These RAID1 balancing methods are tunable via the sysfs knob.
> The mount -o option and btrfs properties are under consideration.
>
> Thx.
>
> --------- original v1 ------------
>
> The RAID1 balancing methods help distribute read I/O across devices, and
> this patch set introduces three of them: rotation, latency, and
> devid. These methods are enabled under the `CONFIG_BTRFS_DEBUG` config
> option and build on the previously added
> `/sys/fs/btrfs/<UUID>/read_policy` interface, which selects the desired
> RAID1 read balancing method.
>
> I've tested these patches using fio and filesystem defragmentation
> workloads on a two-device RAID1 setup (with both data and metadata
> mirrored across identical devices). I tracked device read counts by
> extracting stats from `/sys/devices/<..>/stat` for each device. Below is
> a summary of the results, with each result the average of three
> iterations.
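
The per-device read counts described above can be collected along these lines. This is only a sketch: it relies on the first field of a block device `stat` file being the number of completed read I/Os, and it takes the file path as a parameter because the full sysfs path is elided in the text. The helper names are hypothetical.

```python
from pathlib import Path


def parse_read_ios(stat_text: str) -> int:
    """Return the first field of a block-device stat line: read I/Os completed."""
    fields = stat_text.split()
    if not fields:
        raise ValueError("empty stat content")
    return int(fields[0])


def read_ios(stat_path: str) -> int:
    """Read a stat file (path supplied by the caller) and return its read I/O count."""
    return parse_read_ios(Path(stat_path).read_text())


def read_io_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Per-device read I/Os issued between two snapshots taken around a test run."""
    return {dev: after[dev] - before[dev] for dev in before}
```

Taking one snapshot before the workload and one after, then calling `read_io_delta`, gives the kind of per-devid counts shown in the tables.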
>
> A typical generic random rw workload:
>
> $ fio --filename=/btrfs/foo --size=10Gi --direct=1 --rw=randrw --bs=4k \
> --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based \
> --group_reporting --name=iops-test-job --eta-newline=1
>
> |          |           |           |  Read I/O count |
> |          | Read      | Write     | devid1 | devid2 |
> |----------|-----------|-----------|--------|--------|
> | pid      | 29.4MiB/s | 29.5MiB/s | 456548 | 447975 |
> | rotation | 29.3MiB/s | 29.3MiB/s | 450105 | 450055 |
> | latency  | 21.9MiB/s | 21.9MiB/s | 672387 |      0 |
> | devid:1  | 22.0MiB/s | 22.0MiB/s | 674788 |      0 |
>
> Defragmentation with compression workload:
>
> $ xfs_io -f -d -c 'pwrite -S 0xab 0 1G' /btrfs/foo
> $ sync
> $ echo 3 > /proc/sys/vm/drop_caches
> $ btrfs filesystem defrag -f -c /btrfs/foo
>
> |          | Time   |  Read I/O count |
> |          | Real   | devid1 | devid2 |
> |----------|--------|--------|--------|
> | pid      | 21.61s |   3810 |      0 |
> | rotation | 11.55s |   1905 |   1905 |
> | latency  | 20.99s |      0 |   3810 |
> | devid:2  | 21.41s |      0 |   3810 |
>
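The pid row reading from a single device is what one would expect if the pid policy derives the mirror from the reading process's PID modulo the number of mirrors: a lone defrag process then always lands on the same device, while several fio jobs spread out. A small sketch under that assumption, with made-up PIDs:

```python
def pid_mirror(pid: int, num_mirrors: int) -> int:
    """Mirror index chosen for a reader with the given PID (assumed rule)."""
    return pid % num_mirrors


def count_reads_by_mirror(reader_pids: list[int], num_mirrors: int) -> list[int]:
    """Count how many reads land on each mirror, one list entry per read."""
    counts = [0] * num_mirrors
    for pid in reader_pids:
        counts[pid_mirror(pid, num_mirrors)] += 1
    return counts


# One process issuing every read: all reads hit the same mirror.
single_process = count_reads_by_mirror([1234] * 3810, 2)

# Four jobs with consecutive (hypothetical) PIDs: reads split evenly.
four_jobs = count_reads_by_mirror([100, 101, 102, 103] * 10, 2)
```
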
> . The PID-based balancing method works well for the generic random rw fio
> workload.
> . The rotation method is ideal when you want to keep both devices active,
> and it boosts performance in sequential defragmentation scenarios.
> . The latency-based method works well with mixed device types, or when
> one device experiences intermittent I/O failures: its latency increases
> and the method automatically picks the other device for further read
> I/Os.
> . The devid method is a more hands-on approach, useful for diagnosing and
> testing RAID1 mirror synchronizations.
>
> Anand Jain (3):
>   btrfs: introduce RAID1 round-robin read balancing
>   btrfs: use the path with the lowest latency for RAID1 reads
>   btrfs: add RAID1 preferred read device
>
>  fs/btrfs/disk-io.c |   4 ++
>  fs/btrfs/sysfs.c   | 116 +++++++++++++++++++++++++++++++++++++++------
>  fs/btrfs/volumes.c | 109 ++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/volumes.h |  16 +++++++
>  4 files changed, 230 insertions(+), 15 deletions(-)
>