From: Anand Jain <anand.jain@oracle.com>
To: linux-btrfs@vger.kernel.org
Cc: dsterba@suse.com, wqu@suse.com, hrx@bupt.moe, waxhead@dirtcellar.net
Subject: [PATCH v4 00/10] raid1 balancing methods
Date: Tue, 17 Dec 2024 02:13:08 +0800
Message-ID: <cover.1734370092.git.anand.jain@oracle.com>
v4:
Fixes based on review comments:
 3/10: Use == 0 to check strlen(str); drop dynamic alloc for %param.
 4/10: Add __maybe_unused for %value_str in btrfs_read_policy_to_enum().
       Return int instead of enum btrfs_read_policy.
 5/10: Fix change-log (: is part of optional [ ]).
       Wrap btrfs_read_policy_name[] with ifdef for new methods.
       Use IS_ALIGNED() for sector-size alignment check.
       Round up misaligned %value.
       Use named constants: BTRFS_DEFAULT_RR_MIN_CONTIGUOUS_READ,
       BTRFS_RAID1_MAX_MIRRORS.
       Mark %s1 and %s2 in btrfs_cmp_devid() as const.
       Add comments to btrfs_read_rr().
       Use loop-local %i. Add space around /.
       Use >> for sector-size division.
       Prefix %min_contiguous_read with rr.
 7/10: Move experimental to the top of the feature list.
       Use experiment=on. Skip printing when off.
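
The alignment-related items in the v4 list (IS_ALIGNED() check, rounding up
a misaligned value, and using >> for the sector-size division) can be
illustrated with a small userspace sketch. This is not the patch code: the
function names below are hypothetical stand-ins for the kernel helpers, and
the sector size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for IS_ALIGNED(); align must be a power of two. */
bool is_aligned(uint64_t value, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (value & (align - 1)) == 0;
}

/* Round value up to the next multiple of align (power of two). */
uint64_t round_up_pow2(uint64_t value, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (value + align - 1) & ~(align - 1);
}

/* Convert bytes to sectors with a shift instead of a division. */
uint64_t bytes_to_sectors(uint64_t bytes, unsigned int sectorsize_bits)
{
	return bytes >> sectorsize_bits;
}
```

With a 4096-byte sector, is_aligned(196608, 4096) holds, while a value such
as 5000 would be rounded up to 8192 by round_up_pow2().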
v3:
1. Removed the latency-based RAID1 balancing patch. (Per David's review)
2. Renamed "rotation" to "round-robin" and set the per-set
min_contiguous_read to 256k. (Per David's review)
3. Added raid1-balancing module configuration for fstests testing.
raid1-balancing can now be configured through both module load
parameters and sysfs.
The logic of individual methods remains unchanged, and performance metrics
are consistent with v2.
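
Both the sysfs knob and the module parameter take a policy string of the
form name[:value]. A minimal userspace sketch of splitting such a string is
shown here. The helper name and its error handling are assumptions for
illustration, not the interface added by the patches, and the caller is
assumed to have stripped any trailing newline.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split "name" or "name:value".
 * Returns 0 on success and -EINVAL on malformed input.
 * *value is set to 0 when no ":value" suffix is present.
 */
int parse_read_policy(const char *str, char *name, size_t name_len,
		      uint64_t *value)
{
	const char *colon;
	size_t len;
	char *end;

	assert(str && name && value && name_len > 0);

	colon = strchr(str, ':');
	len = colon ? (size_t)(colon - str) : strlen(str);
	if (len == 0 || len >= name_len)
		return -EINVAL;

	memcpy(name, str, len);
	name[len] = '\0';
	*value = 0;

	if (!colon)
		return 0;

	/* A ':' must be followed by a decimal number and nothing else. */
	if (!isdigit((unsigned char)colon[1]))
		return -EINVAL;
	errno = 0;
	*value = strtoull(colon + 1, &end, 10);
	if (errno || *end != '\0')
		return -EINVAL;
	return 0;
}
```

For example, "round-robin:262144" yields the name "round-robin" and the
value 262144, while "devid:" is rejected.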
-----
v2:
1. Move new features to CONFIG_BTRFS_EXPERIMENTAL instead of CONFIG_BTRFS_DEBUG.
2. Correct the typo from %est_wait to %best_wait.
3. Initialize %best_wait to U64_MAX and remove the check for 0.
4. Implement rotation with a minimum contiguous read threshold before
   switching to the next stripe. Configure this using:
     echo rotation:[min_contiguous_read] > /sys/fs/btrfs/<uuid>/read_policy
   The default value is the sector size, and the min_contiguous_read
   value must be a multiple of the sector size.
5. Tested fio random read/write and defrag compression workloads with
   min_contiguous_read set to sector size, 192k, and 256k.
   The rotation RAID1 balancing method is better for multi-process
   workloads such as fio and also for single-process workloads such as
   defragmentation.
$ fio --filename=/btrfs/foo --size=5Gi --direct=1 --rw=randrw --bs=4k \
--ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 \
--time_based --group_reporting --name=iops-test-job --eta-newline=1
| | | | Read I/O count |
| | Read | Write | devid1 | devid2 |
|---------|------------|------------|--------|--------|
| pid | 20.3MiB/s | 20.5MiB/s | 313895 | 313895 |
| rotation| | | | |
| 4096| 20.4MiB/s | 20.5MiB/s | 313895 | 313895 |
| 196608| 20.2MiB/s | 20.2MiB/s | 310152 | 310175 |
| 262144| 20.3MiB/s | 20.4MiB/s | 312180 | 312191 |
| latency| 18.4MiB/s | 18.4MiB/s | 272980 | 291683 |
| devid:1 | 14.8MiB/s | 14.9MiB/s | 456376 | 0 |
The rotation RAID1 balancing technique completes the single-process defrag
more than 2x faster than pid, latency, or devid.
$ time -p btrfs filesystem defrag -r -f -c /btrfs
| | Time | Read I/O Count |
| | Real | devid1 | devid2 |
|---------|-------|--------|--------|
| pid | 18.00s| 3800 | 0 |
| rotation| | | |
| 4096| 8.95s| 1900 | 1901 |
| 196608| 8.50s| 1881 | 1919 |
| 262144| 8.80s| 1881 | 1919 |
| latency | 17.18s| 3800 | 0 |
| devid:2 | 17.48s| 0 | 3800 |
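
The rotation method with a min_contiguous_read threshold, as described in
item 4 above, can be sketched roughly as follows. This is a userspace
illustration under stated assumptions, not the code from the series: the
struct and function names are hypothetical, and each read is assumed to
cover one sector.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-filesystem round-robin state. */
struct rr_state {
	uint64_t total_reads;     /* reads issued so far */
	uint32_t min_contig_read; /* bytes to read from one mirror before switching */
	uint32_t sectorsize_bits; /* log2(sector size) */
};

/* Pick the mirror index for the next read and advance the counter. */
unsigned int rr_pick_mirror(struct rr_state *st, unsigned int num_mirrors)
{
	uint64_t reads_per_dev;
	uint64_t read_nr;

	assert(st && num_mirrors > 0);

	/* Number of consecutive reads sent to one mirror before rotating. */
	reads_per_dev = st->min_contig_read >> st->sectorsize_bits;
	if (reads_per_dev == 0)
		reads_per_dev = 1;

	read_nr = st->total_reads++;
	return (unsigned int)((read_nr / reads_per_dev) % num_mirrors);
}
```

With two mirrors, a 4096-byte sector, and min_contig_read set to 8192, the
picks go 0, 0, 1, 1, 0, 0, and so on. Setting min_contig_read to the sector
size alternates mirrors on every read.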
Rotation keeps all devices active, and for now, the Rotation RAID1
balancing method should be the default. More workload testing is needed
while the code is EXPERIMENTAL.
Latency is the better choice when the block layer transport is failing or
unstable.
Both techniques need independent testing across different workloads, with
the long-term goal of merging them into a unified heuristic.
Devid is a hands-on approach that provides manual or user-space script
control.
These RAID1 balancing methods are tunable via the sysfs knob.
The mount -o option and btrfs properties are under consideration.
Thx.
--------- original v1 ------------
The RAID1-balancing methods help distribute read I/O across devices, and
this patch set introduces three balancing methods: rotation, latency, and
devid. These methods are enabled under the `CONFIG_BTRFS_DEBUG` config
option and build on the previously added
`/sys/fs/btrfs/<UUID>/read_policy` interface, which is used to configure
the desired RAID1 read balancing method.
I've tested these patches using fio and filesystem defragmentation
workloads on a two-device RAID1 setup (with both data and metadata
mirrored across identical devices). I tracked device read counts by
extracting stats from `/sys/devices/<..>/stat` for each device. Below is
a summary of the results, with each result the average of three
iterations.
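
For reference, the per-device read count can be pulled from a block device
stat file along these lines. The first field of that file is the number of
completed read I/Os. The helper names are hypothetical, and this is only a
sketch of the kind of extraction described above, not the script used for
these measurements.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse the first field (completed read I/Os) of a stat line.
 * Returns 0 on success, -1 if the line cannot be parsed.
 */
int parse_read_ios(const char *stat_line, unsigned long long *read_ios)
{
	assert(stat_line && read_ios);
	return sscanf(stat_line, "%llu", read_ios) == 1 ? 0 : -1;
}

/* Read the stat file at path and extract the read I/O count. */
int read_ios_from_file(const char *path, unsigned long long *read_ios)
{
	char line[512];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = parse_read_ios(line, read_ios);
	fclose(f);
	return ret;
}
```

Sampling the count before and after a run and taking the difference gives
the number of reads served by that device during the workload.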
A typical generic random rw workload:
$ fio --filename=/btrfs/foo --size=10Gi --direct=1 --rw=randrw --bs=4k \
--ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based \
--group_reporting --name=iops-test-job --eta-newline=1
| | | | Read I/O count |
| | Read | Write | devid1 | devid2 |
|---------|------------|------------|--------|--------|
| pid | 29.4MiB/s | 29.5MiB/s | 456548 | 447975 |
| rotation| 29.3MiB/s | 29.3MiB/s | 450105 | 450055 |
| latency | 21.9MiB/s | 21.9MiB/s | 672387 | 0 |
| devid:1 | 22.0MiB/s | 22.0MiB/s | 674788 | 0 |
Defragmentation with compression workload:
$ xfs_io -f -d -c 'pwrite -S 0xab 0 1G' /btrfs/foo
$ sync
$ echo 3 > /proc/sys/vm/drop_caches
$ btrfs filesystem defrag -f -c /btrfs/foo
| | Time | Read I/O Count |
| | Real | devid1 | devid2 |
|---------|-------|--------|--------|
| pid | 21.61s| 3810 | 0 |
| rotation| 11.55s| 1905 | 1905 |
| latency | 20.99s| 0 | 3810 |
| devid:2 | 21.41s| 0 | 3810 |
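
The pid row above sends every read to one device because a single defrag
process has a single PID. A minimal sketch of PID-based selection, assuming
the mirror is chosen as the PID modulo the number of mirrors (the function
name is hypothetical):

```c
#include <assert.h>

/* Choose a mirror from the caller's PID; the same PID always maps to the same mirror. */
unsigned int pid_pick_mirror(int pid, unsigned int num_mirrors)
{
	assert(pid >= 0 && num_mirrors > 0);
	return (unsigned int)pid % num_mirrors;
}
```

A multi-process workload such as the fio run with numjobs=4 spreads across
mirrors only as far as its PIDs happen to differ modulo the mirror count.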
. The PID-based balancing method works well for the generic random rw fio
workload.
. The rotation method is ideal when you want to keep both devices active,
and it boosts performance in sequential defragmentation scenarios.
. The latency-based method works well when we have mixed device types or
when one device experiences intermittent I/O failures: as its latency
increases, the method automatically picks the other device for further
read I/Os.
. The devid method is a more hands-on approach, useful for diagnosing and
testing RAID1 mirror synchronization.
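
The latency-based selection described above can be sketched as picking the
mirror with the lowest average read wait. The struct, the field names, and
the way the average is computed are assumptions for illustration. Only the
idea of starting best_wait at the maximum value, so that no special case
for 0 is needed, comes from the v2 changelog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mirror read statistics. */
struct mirror_stats {
	uint64_t read_ios;     /* completed reads */
	uint64_t read_wait_ns; /* cumulative time spent on those reads */
};

/* Return the index of the mirror with the lowest average read wait. */
size_t latency_pick_mirror(const struct mirror_stats *m, size_t num_mirrors)
{
	uint64_t best_wait = UINT64_MAX;
	size_t best = 0;

	assert(m && num_mirrors > 0);

	for (size_t i = 0; i < num_mirrors; i++) {
		/* A mirror with no completed reads is treated as having zero wait. */
		uint64_t avg = m[i].read_ios ?
			       m[i].read_wait_ns / m[i].read_ios : 0;

		if (avg < best_wait) {
			best_wait = avg;
			best = i;
		}
	}
	return best;
}
```

When one device starts stalling, its average wait grows and subsequent reads
shift to the other mirror. On a tie the lower index wins.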
Anand Jain (9):
btrfs: initialize fs_devices->fs_info earlier
btrfs: simplify output formatting in btrfs_read_policy_show
btrfs: add btrfs_read_policy_to_enum helper and refactor read policy
store
btrfs: handle value associated with raid1 balancing parameter
btrfs: introduce RAID1 round-robin read balancing
btrfs: add RAID1 preferred read device
btrfs: expose experimental mode in module information
btrfs: enable RAID1 balancing configuration via modprobe parameter
btrfs: modload to print RAID1 balancing status
fs/btrfs/disk-io.c | 1 +
fs/btrfs/super.c | 20 +++++-
fs/btrfs/sysfs.c | 173 ++++++++++++++++++++++++++++++++++++++++-----
fs/btrfs/sysfs.h | 5 ++
fs/btrfs/volumes.c | 99 +++++++++++++++++++++++++-
fs/btrfs/volumes.h | 16 +++++
6 files changed, 292 insertions(+), 22 deletions(-)
--
2.47.0