* [PATCH v4 00/10] raid1 balancing methods
@ 2024-12-16 18:13 Anand Jain
2024-12-16 18:13 ` [PATCH v4 1/9] btrfs: initialize fs_devices->fs_info earlier Anand Jain
` (8 more replies)
0 siblings, 9 replies; 17+ messages in thread
From: Anand Jain @ 2024-12-16 18:13 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, wqu, hrx, waxhead
v4:
Fixes based on review comments:
3/10: Use == 0 to check strlen(str); drop dynamic alloc for %param.
4/10: Add __maybe_unused for %value_str in btrfs_read_policy_to_enum().
Return int instead of enum btrfs_read_policy.
5/10: Fix change-log (: is part of optional [ ]).
Wrap btrfs_read_policy_name[] with ifdef for new methods.
Use IS_ALIGNED() for sector-size alignment check.
Roundup misaligned %value.
Use named constants: BTRFS_DEFAULT_RR_MIN_CONTIGUOUS_READ, BTRFS_RAID1_MAX_MIRRORS.
Mark %s1 and %s2 in btrfs_cmp_devid() as const.
Add comments to btrfs_read_rr();
Use loop-local %i. Add space around /.
Use >> for sector-size division.
Prefix %min_contiguous_read with rr.
7/10: Move experimental to the top of the feature list.
Use experiment=on. Skip printing when off.
v3:
1. Removed the latency-based RAID1 balancing patch. (Per David's review)
2. Renamed "rotation" to "round-robin" and set the per-set
min_contiguous_read to 256k. (Per David's review)
3. Added raid1-balancing module configuration for fstests testing.
raid1-balancing can now be configured through both module load
parameters and sysfs.
The logic of individual methods remains unchanged, and performance metrics
are consistent with v2.
-----
v2:
1. Move new features to CONFIG_BTRFS_EXPERIMENTAL instead of CONFIG_BTRFS_DEBUG.
2. Correct the typo from %est_wait to %best_wait.
3. Initialize %best_wait to U64_MAX and remove the check for 0.
4. Implement rotation with a minimum contiguous read threshold before
switching to the next stripe. Configure this, using:
echo rotation:[min_contiguous_read] > /sys/fs/btrfs/<uuid>/read_policy
The default value is the sector size, and the min_contiguous_read
value must be a multiple of the sector size.
5. Tested FIO random read/write and defrag compression workloads with
min_contiguous_read set to sector size, 192k, and 256k.
RAID1 balancing method rotation is better for multi-process workloads
such as fio and also single-process workload such as defragmentation.
$ fio --filename=/btrfs/foo --size=5Gi --direct=1 --rw=randrw --bs=4k \
--ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 \
--time_based --group_reporting --name=iops-test-job --eta-newline=1
| | | | Read I/O count |
| | Read | Write | devid1 | devid2 |
|---------|------------|------------|--------|--------|
| pid | 20.3MiB/s | 20.5MiB/s | 313895 | 313895 |
| rotation| | | | |
| 4096| 20.4MiB/s | 20.5MiB/s | 313895 | 313895 |
| 196608| 20.2MiB/s | 20.2MiB/s | 310152 | 310175 |
| 262144| 20.3MiB/s | 20.4MiB/s | 312180 | 312191 |
| latency| 18.4MiB/s | 18.4MiB/s | 272980 | 291683 |
| devid:1 | 14.8MiB/s | 14.9MiB/s | 456376 | 0 |
rotation RAID1 balancing technique performs more than 2x better for
single-process defrag.
$ time -p btrfs filesystem defrag -r -f -c /btrfs
| | Time | Read I/O Count |
| | Real | devid1 | devid2 |
|---------|-------|--------|--------|
| pid | 18.00s| 3800 | 0 |
| rotation| | | |
| 4096| 8.95s| 1900 | 1901 |
| 196608| 8.50s| 1881 | 1919 |
| 262144| 8.80s| 1881 | 1919 |
| latency | 17.18s| 3800 | 0 |
| devid:2 | 17.48s| 0 | 3800 |
Rotation keeps all devices active, and for now, the Rotation RAID1
balancing method is preferable as default. More workload testing is
needed while the code is EXPERIMENTAL.
While Latency is better during the failing/unstable block layer transport.
As of now these two techniques, are needed to be further independently
tested with different worloads, and in the long term we should be merge
these technique to a unified heuristic.
Rotation keeps all devices active, and for now, the Rotation RAID1
balancing method should be the default. More workload testing is needed
while the code is EXPERIMENTAL.
Latency is smarter with unstable block layer transport.
Both techniques need independent testing across workloads, with the goal of
eventually merging them into a unified approach? for the long term.
Devid is a hands-on approach, provides manual or user-space script control.
These RAID1 balancing methods are tunable via the sysfs knob.
The mount -o option and btrfs properties are under consideration.
Thx.
--------- original v1 ------------
The RAID1-balancing methods helps distribute read I/O across devices, and
this patch introduces three balancing methods: rotation, latency, and
devid. These methods are enabled under the `CONFIG_BTRFS_DEBUG` config
option and are on top of the previously added
`/sys/fs/btrfs/<UUID>/read_policy` interface to configure the desired
RAID1 read balancing method.
I've tested these patches using fio and filesystem defragmentation
workloads on a two-device RAID1 setup (with both data and metadata
mirrored across identical devices). I tracked device read counts by
extracting stats from `/sys/devices/<..>/stat` for each device. Below is
a summary of the results, with each result the average of three
iterations.
A typical generic random rw workload:
$ fio --filename=/btrfs/foo --size=10Gi --direct=1 --rw=randrw --bs=4k \
--ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based \
--group_reporting --name=iops-test-job --eta-newline=1
| | | | Read I/O count |
| | Read | Write | devid1 | devid2 |
|---------|------------|------------|--------|--------|
| pid | 29.4MiB/s | 29.5MiB/s | 456548 | 447975 |
| rotation| 29.3MiB/s | 29.3MiB/s | 450105 | 450055 |
| latency | 21.9MiB/s | 21.9MiB/s | 672387 | 0 |
| devid:1 | 22.0MiB/s | 22.0MiB/s | 674788 | 0 |
Defragmentation with compression workload:
$ xfs_io -f -d -c 'pwrite -S 0xab 0 1G' /btrfs/foo
$ sync
$ echo 3 > /proc/sys/vm/drop_caches
$ btrfs filesystem defrag -f -c /btrfs/foo
| | Time | Read I/O Count |
| | Real | devid1 | devid2 |
|---------|-------|--------|--------|
| pid | 21.61s| 3810 | 0 |
| rotation| 11.55s| 1905 | 1905 |
| latency | 20.99s| 0 | 3810 |
| devid:2 | 21.41s| 0 | 3810 |
. The PID-based balancing method works well for the generic random rw fio
workload.
. The rotation method is ideal when you want to keep both devices active,
and it boosts performance in sequential defragmentation scenarios.
. The latency-based method work well when we have mixed device types or
when one device experiences intermittent I/O failures the latency
increases and it automatically picks the other device for further Read
IOs.
. The devid method is a more hands-on approach, useful for diagnosing and
testing RAID1 mirror synchronizations.
Subject: [PATCH v4 0/9] *** SUBJECT HERE ***
*** BLURB HERE ***
Anand Jain (9):
btrfs: initialize fs_devices->fs_info earlier
btrfs: simplify output formatting in btrfs_read_policy_show
btrfs: add btrfs_read_policy_to_enum helper and refactor read policy
store
btrfs: handle value associated with raid1 balancing parameter
btrfs: introduce RAID1 round-robin read balancing
btrfs: add RAID1 preferred read device
btrfs: expose experimental mode in module information
btrfs: enable RAID1 balancing configuration via modprobe parameter
btrfs: modload to print RAID1 balancing status
fs/btrfs/disk-io.c | 1 +
fs/btrfs/super.c | 20 +++++-
fs/btrfs/sysfs.c | 173 ++++++++++++++++++++++++++++++++++++++++-----
fs/btrfs/sysfs.h | 5 ++
fs/btrfs/volumes.c | 99 +++++++++++++++++++++++++-
fs/btrfs/volumes.h | 16 +++++
6 files changed, 292 insertions(+), 22 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v4 1/9] btrfs: initialize fs_devices->fs_info earlier
2024-12-16 18:13 [PATCH v4 00/10] raid1 balancing methods Anand Jain
@ 2024-12-16 18:13 ` Anand Jain
2024-12-18 3:08 ` Naohiro Aota
2024-12-16 18:13 ` [PATCH v4 2/9] btrfs: simplify output formatting in btrfs_read_policy_show Anand Jain
` (7 subsequent siblings)
8 siblings, 1 reply; 17+ messages in thread
From: Anand Jain @ 2024-12-16 18:13 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, wqu, hrx, waxhead
Currently, fs_devices->fs_info is initialized in btrfs_init_devices_late(),
but this occurs too late for find_live_mirror(), which is invoked by
load_super_root() much earlier than btrfs_init_devices_late().
Fix this by moving the initialization to open_ctree(), before load_super_root().
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/disk-io.c | 1 +
fs/btrfs/volumes.c | 2 --
2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 814320948645..ab45b02df957 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3321,6 +3321,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
fs_info->sectors_per_page = (PAGE_SIZE >> fs_info->sectorsize_bits);
fs_info->csums_per_leaf = BTRFS_MAX_ITEM_SIZE(fs_info) / fs_info->csum_size;
fs_info->stripesize = stripesize;
+ fs_info->fs_devices->fs_info = fs_info;
/*
* Handle the space caching options appropriately now that we have the
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1cccaf9c2b0d..fe5ceea2ba0b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7568,8 +7568,6 @@ int btrfs_init_devices_late(struct btrfs_fs_info *fs_info)
struct btrfs_device *device;
int ret = 0;
- fs_devices->fs_info = fs_info;
-
mutex_lock(&fs_devices->device_list_mutex);
list_for_each_entry(device, &fs_devices->devices, dev_list)
device->fs_info = fs_info;
--
2.47.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 2/9] btrfs: simplify output formatting in btrfs_read_policy_show
2024-12-16 18:13 [PATCH v4 00/10] raid1 balancing methods Anand Jain
2024-12-16 18:13 ` [PATCH v4 1/9] btrfs: initialize fs_devices->fs_info earlier Anand Jain
@ 2024-12-16 18:13 ` Anand Jain
2024-12-16 18:13 ` [PATCH v4 3/9] btrfs: add btrfs_read_policy_to_enum helper and refactor read policy store Anand Jain
` (6 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: Anand Jain @ 2024-12-16 18:13 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, wqu, hrx, waxhead
Refactor the logic in btrfs_read_policy_show() to streamline the
formatting of read policies output. Streamline the space and bracket
handling around the active policy without altering the functional output.
This is in preparation to add more methods.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/sysfs.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index b843308e2bc6..fd3c49c6c3c5 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1316,14 +1316,16 @@ static ssize_t btrfs_read_policy_show(struct kobject *kobj,
int i;
for (i = 0; i < BTRFS_NR_READ_POLICY; i++) {
- if (policy == i)
- ret += sysfs_emit_at(buf, ret, "%s[%s]",
- (ret == 0 ? "" : " "),
- btrfs_read_policy_name[i]);
- else
- ret += sysfs_emit_at(buf, ret, "%s%s",
- (ret == 0 ? "" : " "),
- btrfs_read_policy_name[i]);
+ if (ret != 0)
+ ret += sysfs_emit_at(buf, ret, " ");
+
+ if (i == policy)
+ ret += sysfs_emit_at(buf, ret, "[");
+
+ ret += sysfs_emit_at(buf, ret, "%s", btrfs_read_policy_name[i]);
+
+ if (i == policy)
+ ret += sysfs_emit_at(buf, ret, "]");
}
ret += sysfs_emit_at(buf, ret, "\n");
--
2.47.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 3/9] btrfs: add btrfs_read_policy_to_enum helper and refactor read policy store
2024-12-16 18:13 [PATCH v4 00/10] raid1 balancing methods Anand Jain
2024-12-16 18:13 ` [PATCH v4 1/9] btrfs: initialize fs_devices->fs_info earlier Anand Jain
2024-12-16 18:13 ` [PATCH v4 2/9] btrfs: simplify output formatting in btrfs_read_policy_show Anand Jain
@ 2024-12-16 18:13 ` Anand Jain
2024-12-18 3:17 ` Naohiro Aota
2024-12-16 18:13 ` [PATCH v4 4/9] btrfs: handle value associated with raid1 balancing parameter Anand Jain
` (5 subsequent siblings)
8 siblings, 1 reply; 17+ messages in thread
From: Anand Jain @ 2024-12-16 18:13 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, wqu, hrx, waxhead
Introduce the `btrfs_read_policy_to_enum` helper function to simplify the
conversion of a string read policy to its corresponding enum value. This
reduces duplication and improves code clarity in `btrfs_read_policy_store`.
The `btrfs_read_policy_store` function has been refactored to use the new
helper.
The parameter is copied locally to allow modification, enabling the
separation of the method and its value. This prepares for the addition of
more functionality in subsequent patches.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/sysfs.c | 46 ++++++++++++++++++++++++++++++++++------------
1 file changed, 34 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index fd3c49c6c3c5..34903e5bf8d0 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1307,6 +1307,30 @@ BTRFS_ATTR(, temp_fsid, btrfs_temp_fsid_show);
static const char * const btrfs_read_policy_name[] = { "pid" };
+static int btrfs_read_policy_to_enum(const char *str)
+{
+ char param[32] = {'\0'};
+ int index;
+ bool found = false;
+
+ if (!str || strlen(str) == 0)
+ return 0;
+
+ strcpy(param, str);
+
+ for (index = 0; index < BTRFS_NR_READ_POLICY; index++) {
+ if (sysfs_streq(param, btrfs_read_policy_name[index])) {
+ found = true;
+ break;
+ }
+ }
+
+ if (found)
+ return index;
+
+ return -EINVAL;
+}
+
static ssize_t btrfs_read_policy_show(struct kobject *kobj,
struct kobj_attribute *a, char *buf)
{
@@ -1338,21 +1362,19 @@ static ssize_t btrfs_read_policy_store(struct kobject *kobj,
const char *buf, size_t len)
{
struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
- int i;
+ int index;
- for (i = 0; i < BTRFS_NR_READ_POLICY; i++) {
- if (sysfs_streq(buf, btrfs_read_policy_name[i])) {
- if (i != READ_ONCE(fs_devices->read_policy)) {
- WRITE_ONCE(fs_devices->read_policy, i);
- btrfs_info(fs_devices->fs_info,
- "read policy set to '%s'",
- btrfs_read_policy_name[i]);
- }
- return len;
- }
+ index = btrfs_read_policy_to_enum(buf);
+ if (index == -EINVAL)
+ return -EINVAL;
+
+ if (index != READ_ONCE(fs_devices->read_policy)) {
+ WRITE_ONCE(fs_devices->read_policy, index);
+ btrfs_info(fs_devices->fs_info, "read policy set to '%s'",
+ btrfs_read_policy_name[index]);
}
- return -EINVAL;
+ return len;
}
BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, btrfs_read_policy_store);
--
2.47.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 4/9] btrfs: handle value associated with raid1 balancing parameter
2024-12-16 18:13 [PATCH v4 00/10] raid1 balancing methods Anand Jain
` (2 preceding siblings ...)
2024-12-16 18:13 ` [PATCH v4 3/9] btrfs: add btrfs_read_policy_to_enum helper and refactor read policy store Anand Jain
@ 2024-12-16 18:13 ` Anand Jain
2024-12-16 18:13 ` [PATCH v4 5/9] btrfs: introduce RAID1 round-robin read balancing Anand Jain
` (4 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: Anand Jain @ 2024-12-16 18:13 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, wqu, hrx, waxhead
This change enables specifying additional configuration values alongside
the raid1 balancing / read policy in a single input string.
Updated btrfs_read_policy_to_enum() to parse and handle a value associated
with the policy in the format `policy:value`, the value part if present is
converted 64-bit integer. Update btrfs_read_policy_store() to accommodate
the new parameter.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/sysfs.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 34903e5bf8d0..9c7bedf974d2 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1307,9 +1307,10 @@ BTRFS_ATTR(, temp_fsid, btrfs_temp_fsid_show);
static const char * const btrfs_read_policy_name[] = { "pid" };
-static int btrfs_read_policy_to_enum(const char *str)
+static int btrfs_read_policy_to_enum(const char *str, s64 *value)
{
char param[32] = {'\0'};
+ char *__maybe_unused value_str;
int index;
bool found = false;
@@ -1318,6 +1319,16 @@ static int btrfs_read_policy_to_enum(const char *str)
strcpy(param, str);
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+ /* Separate value from input in policy:value format. */
+ if ((value_str = strchr(param, ':'))) {
+ *value_str = '\0';
+ value_str++;
+ if (value && kstrtou64(value_str, 10, value) != 0)
+ return -EINVAL;
+ }
+#endif
+
for (index = 0; index < BTRFS_NR_READ_POLICY; index++) {
if (sysfs_streq(param, btrfs_read_policy_name[index])) {
found = true;
@@ -1363,8 +1374,9 @@ static ssize_t btrfs_read_policy_store(struct kobject *kobj,
{
struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
int index;
+ s64 value = -1;
- index = btrfs_read_policy_to_enum(buf);
+ index = btrfs_read_policy_to_enum(buf, &value);
if (index == -EINVAL)
return -EINVAL;
--
2.47.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 5/9] btrfs: introduce RAID1 round-robin read balancing
2024-12-16 18:13 [PATCH v4 00/10] raid1 balancing methods Anand Jain
` (3 preceding siblings ...)
2024-12-16 18:13 ` [PATCH v4 4/9] btrfs: handle value associated with raid1 balancing parameter Anand Jain
@ 2024-12-16 18:13 ` Anand Jain
2024-12-18 5:53 ` Naohiro Aota
2024-12-16 18:13 ` [PATCH v4 6/9] btrfs: add RAID1 preferred read device Anand Jain
` (3 subsequent siblings)
8 siblings, 1 reply; 17+ messages in thread
From: Anand Jain @ 2024-12-16 18:13 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, wqu, hrx, waxhead
This feature balances I/O across the striped devices when reading from
RAID1 blocks.
echo round-robin[:min_contiguous_read] > /sys/fs/btrfs/<uuid>/read_policy
The min_contiguous_read parameter defines the minimum read size before
switching to the next mirrored device. This setting is optional, with a
default value of 256 KiB.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/sysfs.c | 44 +++++++++++++++++++++++++++-
fs/btrfs/volumes.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/volumes.h | 11 +++++++
3 files changed, 127 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 9c7bedf974d2..b0e1fb787ce6 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1305,7 +1305,12 @@ static ssize_t btrfs_temp_fsid_show(struct kobject *kobj,
}
BTRFS_ATTR(, temp_fsid, btrfs_temp_fsid_show);
-static const char * const btrfs_read_policy_name[] = { "pid" };
+static const char *btrfs_read_policy_name[] = {
+ "pid",
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+ "round-robin",
+#endif
+};
static int btrfs_read_policy_to_enum(const char *str, s64 *value)
{
@@ -1359,6 +1364,12 @@ static ssize_t btrfs_read_policy_show(struct kobject *kobj,
ret += sysfs_emit_at(buf, ret, "%s", btrfs_read_policy_name[i]);
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+ if (i == BTRFS_READ_POLICY_RR)
+ ret += sysfs_emit_at(buf, ret, ":%d",
+ fs_devices->rr_min_contiguous_read);
+#endif
+
if (i == policy)
ret += sysfs_emit_at(buf, ret, "]");
}
@@ -1380,6 +1391,37 @@ static ssize_t btrfs_read_policy_store(struct kobject *kobj,
if (index == -EINVAL)
return -EINVAL;
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+ if (index == BTRFS_READ_POLICY_RR) {
+ if (value != -1) {
+ u32 sectorsize = fs_devices->fs_info->sectorsize;
+
+ if (!IS_ALIGNED(value, sectorsize)) {
+ u64 temp_value = round_up(value, sectorsize);
+
+ btrfs_warn(fs_devices->fs_info,
+"read_policy: min contiguous read %lld should be multiples of the sectorsize %u, rounded to %llu",
+ value, sectorsize, temp_value);
+ value = temp_value;
+ }
+ } else {
+ value = BTRFS_DEFAULT_RR_MIN_CONTIGUOUS_READ;
+ }
+
+ if (index != READ_ONCE(fs_devices->read_policy) ||
+ value != READ_ONCE(fs_devices->rr_min_contiguous_read)) {
+ WRITE_ONCE(fs_devices->read_policy, index);
+ WRITE_ONCE(fs_devices->rr_min_contiguous_read, value);
+ atomic_set(&fs_devices->total_reads, 0);
+
+ btrfs_info(fs_devices->fs_info, "read policy set to '%s:%lld'",
+ btrfs_read_policy_name[index], value);
+
+ }
+
+ return len;
+ }
+#endif
if (index != READ_ONCE(fs_devices->read_policy)) {
WRITE_ONCE(fs_devices->read_policy, index);
btrfs_info(fs_devices->fs_info, "read policy set to '%s'",
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fe5ceea2ba0b..77c3b66d56a0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1328,6 +1328,9 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
fs_devices->total_rw_bytes = 0;
fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_REGULAR;
fs_devices->read_policy = BTRFS_READ_POLICY_PID;
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+ fs_devices->rr_min_contiguous_read = BTRFS_DEFAULT_RR_MIN_CONTIGUOUS_READ;
+#endif
return 0;
}
@@ -5959,6 +5962,71 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
return len;
}
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+struct stripe_mirror {
+ u64 devid;
+ int num;
+};
+
+static int btrfs_cmp_devid(const void *a, const void *b)
+{
+ const struct stripe_mirror *s1 = (struct stripe_mirror *)a;
+ const struct stripe_mirror *s2 = (struct stripe_mirror *)b;
+
+ if (s1->devid < s2->devid)
+ return -1;
+ if (s1->devid > s2->devid)
+ return 1;
+ return 0;
+}
+
+/*
+ * btrfs_read_rr.
+ *
+ * Select a stripe for reading using a round-robin algorithm:
+ *
+ * 1. Compute the read cycle as the total sectors read divided by the minimum
+ * sectors per device.
+ * 2. Determine the stripe number for the current read by taking the modulus
+ * of the read cycle with the total number of stripes:
+ *
+ * stripe index = (total sectors / min sectors per dev) % num stripes
+ *
+ * The calculated stripe index is then used to select the corresponding device
+ * from the list of devices, which is ordered by devid.
+ */
+static int btrfs_read_rr(struct btrfs_chunk_map *map, int first, int num_stripe)
+{
+ struct stripe_mirror stripes[BTRFS_RAID1_MAX_MIRRORS] = {0};
+ struct btrfs_fs_devices *fs_devices;
+ struct btrfs_device *device;
+ int read_cycle;
+ int index;
+ int ret_stripe;
+ int total_reads;
+ int reads_per_dev = 0;
+
+ device = map->stripes[first].dev;
+
+ fs_devices = device->fs_devices;
+ reads_per_dev = fs_devices->rr_min_contiguous_read >> SECTOR_SHIFT;
+ index = 0;
+ for (int i = first; i < first + num_stripe; i++) {
+ stripes[index].devid = map->stripes[i].dev->devid;
+ stripes[index].num = i;
+ index++;
+ }
+ sort(stripes, num_stripe, sizeof(struct stripe_mirror),
+ btrfs_cmp_devid, NULL);
+
+ total_reads = atomic_inc_return(&fs_devices->total_reads);
+ read_cycle = total_reads / reads_per_dev;
+ ret_stripe = stripes[read_cycle % num_stripe].num;
+
+ return ret_stripe;
+}
+#endif
+
static int find_live_mirror(struct btrfs_fs_info *fs_info,
struct btrfs_chunk_map *map, int first,
int dev_replace_is_ongoing)
@@ -5988,6 +6056,11 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
case BTRFS_READ_POLICY_PID:
preferred_mirror = first + (current->pid % num_stripes);
break;
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+ case BTRFS_READ_POLICY_RR:
+ preferred_mirror = btrfs_read_rr(map, first, num_stripes);
+ break;
+#endif
}
if (dev_replace_is_ongoing &&
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 3a416b1bc24c..b7b130ce0b10 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -296,6 +296,8 @@ enum btrfs_chunk_allocation_policy {
BTRFS_CHUNK_ALLOC_ZONED,
};
+#define BTRFS_DEFAULT_RR_MIN_CONTIGUOUS_READ (SZ_256K)
+#define BTRFS_RAID1_MAX_MIRRORS (4)
/*
* Read policies for mirrored block group profiles, read picks the stripe based
* on these policies.
@@ -303,6 +305,10 @@ enum btrfs_chunk_allocation_policy {
enum btrfs_read_policy {
/* Use process PID to choose the stripe */
BTRFS_READ_POLICY_PID,
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+ /* Balancing raid1 reads across all striped devices (round-robin) */
+ BTRFS_READ_POLICY_RR,
+#endif
BTRFS_NR_READ_POLICY,
};
@@ -431,6 +437,11 @@ struct btrfs_fs_devices {
enum btrfs_read_policy read_policy;
#ifdef CONFIG_BTRFS_EXPERIMENTAL
+ /* IO stat, read counter. */
+ atomic_t total_reads;
+ /* Min contiguous reads before switching to next device. */
+ int rr_min_contiguous_read;
+
/* Checksum mode - offload it or do it synchronously. */
enum btrfs_offload_csum_mode offload_csum_mode;
#endif
--
2.47.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 6/9] btrfs: add RAID1 preferred read device
2024-12-16 18:13 [PATCH v4 00/10] raid1 balancing methods Anand Jain
` (4 preceding siblings ...)
2024-12-16 18:13 ` [PATCH v4 5/9] btrfs: introduce RAID1 round-robin read balancing Anand Jain
@ 2024-12-16 18:13 ` Anand Jain
2024-12-16 18:13 ` [PATCH v4 7/9] btrfs: expose experimental mode in module information Anand Jain
` (2 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: Anand Jain @ 2024-12-16 18:13 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, wqu, hrx, waxhead
When there's stale data on a mirrored device, this feature lets you choose
which device to read from. Mainly used for testing.
echo "devid:<devid-value>" > /sys/fs/btrfs/<UUID>/read_policy
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/sysfs.c | 33 ++++++++++++++++++++++++++++++++-
fs/btrfs/volumes.c | 21 +++++++++++++++++++++
fs/btrfs/volumes.h | 5 +++++
3 files changed, 58 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index b0e1fb787ce6..b4b24a16a50c 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1309,6 +1309,7 @@ static const char *btrfs_read_policy_name[] = {
"pid",
#ifdef CONFIG_BTRFS_EXPERIMENTAL
"round-robin",
+ "devid",
#endif
};
@@ -1368,8 +1369,11 @@ static ssize_t btrfs_read_policy_show(struct kobject *kobj,
if (i == BTRFS_READ_POLICY_RR)
ret += sysfs_emit_at(buf, ret, ":%d",
fs_devices->rr_min_contiguous_read);
-#endif
+ if (i == BTRFS_READ_POLICY_DEVID)
+ ret += sysfs_emit_at(buf, ret, ":%llu",
+ fs_devices->read_devid);
+#endif
if (i == policy)
ret += sysfs_emit_at(buf, ret, "]");
}
@@ -1421,6 +1425,33 @@ static ssize_t btrfs_read_policy_store(struct kobject *kobj,
return len;
}
+
+ if (index == BTRFS_READ_POLICY_DEVID) {
+
+ if (value != -1) {
+ BTRFS_DEV_LOOKUP_ARGS(args);
+
+ /* Validate input devid */
+ args.devid = value;
+ if (btrfs_find_device(fs_devices, &args) == NULL)
+ return -EINVAL;
+ } else {
+ /* Set default devid to the devid of the latest device */
+ value = fs_devices->latest_dev->devid;
+ }
+
+ if (index != READ_ONCE(fs_devices->read_policy) ||
+ (value != READ_ONCE(fs_devices->read_devid))) {
+ WRITE_ONCE(fs_devices->read_policy, index);
+ WRITE_ONCE(fs_devices->read_devid, value);
+
+ btrfs_info(fs_devices->fs_info, "read policy set to '%s:%llu'",
+ btrfs_read_policy_name[index], value);
+
+ }
+
+ return len;
+ }
#endif
if (index != READ_ONCE(fs_devices->read_policy)) {
WRITE_ONCE(fs_devices->read_policy, index);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 77c3b66d56a0..ee2dd7e461b3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1330,6 +1330,7 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
fs_devices->read_policy = BTRFS_READ_POLICY_PID;
#ifdef CONFIG_BTRFS_EXPERIMENTAL
fs_devices->rr_min_contiguous_read = BTRFS_DEFAULT_RR_MIN_CONTIGUOUS_READ;
+ fs_devices->read_devid = latest_dev->devid;
#endif
return 0;
@@ -5963,6 +5964,23 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
}
#ifdef CONFIG_BTRFS_EXPERIMENTAL
+static int btrfs_read_preferred(struct btrfs_chunk_map *map, int first,
+ int num_stripe)
+{
+ int last = first + num_stripe;
+ int stripe_index;
+
+ for (stripe_index = first; stripe_index < last; stripe_index++) {
+ struct btrfs_device *device = map->stripes[stripe_index].dev;
+
+ if (device->devid == READ_ONCE(device->fs_devices->read_devid))
+ return stripe_index;
+ }
+
+ /* If no read-preferred device, use first stripe */
+ return first;
+}
+
struct stripe_mirror {
u64 devid;
int num;
@@ -6060,6 +6078,9 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
case BTRFS_READ_POLICY_RR:
preferred_mirror = btrfs_read_rr(map, first, num_stripes);
break;
+ case BTRFS_READ_POLICY_DEVID:
+ preferred_mirror = btrfs_read_preferred(map, first, num_stripes);
+ break;
#endif
}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index b7b130ce0b10..62812f4231dd 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -308,6 +308,8 @@ enum btrfs_read_policy {
#ifdef CONFIG_BTRFS_EXPERIMENTAL
/* Balancing raid1 reads across all striped devices (round-robin) */
BTRFS_READ_POLICY_RR,
+ /* Read from the specific device */
+ BTRFS_READ_POLICY_DEVID,
#endif
BTRFS_NR_READ_POLICY,
};
@@ -442,6 +444,9 @@ struct btrfs_fs_devices {
/* Min contiguous reads before switching to next device. */
int rr_min_contiguous_read;
+ /* Device to be used for reading in case of RAID1. */
+ u64 read_devid;
+
/* Checksum mode - offload it or do it synchronously. */
enum btrfs_offload_csum_mode offload_csum_mode;
#endif
--
2.47.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 7/9] btrfs: expose experimental mode in module information
2024-12-16 18:13 [PATCH v4 00/10] raid1 balancing methods Anand Jain
` (5 preceding siblings ...)
2024-12-16 18:13 ` [PATCH v4 6/9] btrfs: add RAID1 preferred read device Anand Jain
@ 2024-12-16 18:13 ` Anand Jain
2024-12-16 18:13 ` [PATCH v4 8/9] btrfs: enable RAID1 balancing configuration via modprobe parameter Anand Jain
2024-12-16 18:13 ` [PATCH v4 9/9] btrfs: modload to print RAID1 balancing status Anand Jain
8 siblings, 0 replies; 17+ messages in thread
From: Anand Jain @ 2024-12-16 18:13 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, wqu, hrx, waxhead
Commit c9c49e8f157e ("btrfs: split out CONFIG_BTRFS_EXPERIMENTAL from
CONFIG_BTRFS_DEBUG") introduces a way to enable or disable experimental
features, print its status during module load, like so:
Btrfs loaded, experimental=on, debug=on, assert=on, zoned=yes, fsverity=yes
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/super.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3381389ab93a..fb6a009c72ae 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2457,6 +2457,9 @@ static __cold void btrfs_interface_exit(void)
static int __init btrfs_print_mod_info(void)
{
static const char options[] = ""
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+ ", experimental=on"
+#endif
#ifdef CONFIG_BTRFS_DEBUG
", debug=on"
#endif
--
2.47.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 8/9] btrfs: enable RAID1 balancing configuration via modprobe parameter
2024-12-16 18:13 [PATCH v4 00/10] raid1 balancing methods Anand Jain
` (6 preceding siblings ...)
2024-12-16 18:13 ` [PATCH v4 7/9] btrfs: expose experimental mode in module information Anand Jain
@ 2024-12-16 18:13 ` Anand Jain
2024-12-16 18:13 ` [PATCH v4 9/9] btrfs: modload to print RAID1 balancing status Anand Jain
8 siblings, 0 replies; 17+ messages in thread
From: Anand Jain @ 2024-12-16 18:13 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, wqu, hrx, waxhead
This update allows configuring the `raid1-balancing` methods using a
modprobe parameter when experimental mode CONFIG_BTRFS_EXPERIMENTAL
is enabled.
Examples:
- Set the RAID1 balancing method to round-robin with a custom
`min_contiguous_read` of 192k:
$ modprobe btrfs raid1-balancing=round-robin:196608
- Set the round-robin balancing method with the default
`min_contiguous_read` of 256k:
$ modprobe btrfs raid1-balancing=round-robin
- Set the `devid` balancing method, defaulting to the latest
device:
$ modprobe btrfs raid1-balancing=devid
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/super.c | 5 +++++
fs/btrfs/sysfs.c | 28 +++++++++++++++++++++++++++-
fs/btrfs/sysfs.h | 5 +++++
fs/btrfs/volumes.c | 5 ++++-
4 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index fb6a009c72ae..58190989a29d 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2538,6 +2538,11 @@ static const struct init_sequence mod_init_seq[] = {
}, {
.init_func = extent_map_init,
.exit_func = extent_map_exit,
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+ }, {
+ .init_func = btrfs_raid1_balancing_init,
+ .exit_func = NULL,
+#endif
}, {
.init_func = ordered_data_init,
.exit_func = ordered_data_exit,
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index b4b24a16a50c..dc10908556f8 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1313,7 +1313,21 @@ static const char *btrfs_read_policy_name[] = {
#endif
};
-static int btrfs_read_policy_to_enum(const char *str, s64 *value)
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+/* Global module configuration parameters */
+static char *raid1_balancing;
+char *btrfs_get_raid1_balancing(void)
+{
+ return raid1_balancing;
+}
+
+/* Set perm 0, disable sys/module/btrfs/parameter/raid1_balancing interface */
+module_param(raid1_balancing, charp, 0);
+MODULE_PARM_DESC(raid1_balancing,
+"Global read policy; pid (default), round-robin:[min_contiguous_read], devid:[[devid]|[latest-gen]|[oldest-gen]]");
+#endif
+
+int btrfs_read_policy_to_enum(const char *str, s64 *value)
{
char param[32] = {'\0'};
char *__maybe_unused value_str;
@@ -1348,6 +1362,18 @@ static int btrfs_read_policy_to_enum(const char *str, s64 *value)
return -EINVAL;
}
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+int __init btrfs_raid1_balancing_init(void)
+{
+ if (btrfs_read_policy_to_enum(raid1_balancing, NULL) == -EINVAL) {
+ btrfs_err(NULL, "Invalid raid1_balancing %s", raid1_balancing);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+#endif
+
static ssize_t btrfs_read_policy_show(struct kobject *kobj,
struct kobj_attribute *a, char *buf)
{
diff --git a/fs/btrfs/sysfs.h b/fs/btrfs/sysfs.h
index e6a284c59809..e97d383b9ffc 100644
--- a/fs/btrfs/sysfs.h
+++ b/fs/btrfs/sysfs.h
@@ -47,5 +47,10 @@ void btrfs_sysfs_del_qgroups(struct btrfs_fs_info *fs_info);
int btrfs_sysfs_add_qgroups(struct btrfs_fs_info *fs_info);
void btrfs_sysfs_del_one_qgroup(struct btrfs_fs_info *fs_info,
struct btrfs_qgroup *qgroup);
+int btrfs_read_policy_to_enum(const char *str, s64 *value);
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+int __init btrfs_raid1_balancing_init(void);
+char *btrfs_get_raid1_balancing(void);
+#endif
#endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ee2dd7e461b3..3cf4fc06d261 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1327,10 +1327,13 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
fs_devices->latest_dev = latest_dev;
fs_devices->total_rw_bytes = 0;
fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_REGULAR;
- fs_devices->read_policy = BTRFS_READ_POLICY_PID;
#ifdef CONFIG_BTRFS_EXPERIMENTAL
fs_devices->rr_min_contiguous_read = BTRFS_DEFAULT_RR_MIN_CONTIGUOUS_READ;
fs_devices->read_devid = latest_dev->devid;
+ fs_devices->read_policy =
+ btrfs_read_policy_to_enum(btrfs_get_raid1_balancing(), NULL);
+#else
+ fs_devices->read_policy = BTRFS_READ_POLICY_PID;
#endif
return 0;
--
2.47.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 9/9] btrfs: modload to print RAID1 balancing status
2024-12-16 18:13 [PATCH v4 00/10] raid1 balancing methods Anand Jain
` (7 preceding siblings ...)
2024-12-16 18:13 ` [PATCH v4 8/9] btrfs: enable RAID1 balancing configuration via modprobe parameter Anand Jain
@ 2024-12-16 18:13 ` Anand Jain
2024-12-17 3:53 ` [PATCH] btrfs: fix smatch warning inconsistent indenting Anand Jain
8 siblings, 1 reply; 17+ messages in thread
From: Anand Jain @ 2024-12-16 18:13 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, wqu, hrx, waxhead
Modified the Btrfs loading message to include the RAID1 balancing status
if the experimental feature is enabled.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/super.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 58190989a29d..573e8e1a2b24 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2480,7 +2480,17 @@ static int __init btrfs_print_mod_info(void)
", fsverity=no"
#endif
;
- pr_info("Btrfs loaded%s\n", options);
+
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+ if (btrfs_get_raid1_balancing() == NULL)
+ pr_info("Btrfs loaded%s\n", options);
+ else
+ pr_info("Btrfs loaded%s, raid1_balancing=%s\n",
+ options, btrfs_get_raid1_balancing());
+#else
+ pr_info("Btrfs loaded%s\n", options);
+#endif
+
return 0;
}
--
2.47.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH] btrfs: fix smatch warning inconsistent indenting
2024-12-16 18:13 ` [PATCH v4 9/9] btrfs: modload to print RAID1 balancing status Anand Jain
@ 2024-12-17 3:53 ` Anand Jain
0 siblings, 0 replies; 17+ messages in thread
From: Anand Jain @ 2024-12-17 3:53 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, wqu, hrx, waxhead, kernel test robot
Corrects indentation warning reported by smatch.
Fixes:
[PATCH v4 9/9] btrfs: modload to print RAID1 balancing status
in the ML, for the fixup in it.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202412170215.uLhijGQD-lkp@intel.com/
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
This patch should be rolled into the mentioned patch.
fs/btrfs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 573e8e1a2b24..236eec7c19cf 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2488,7 +2488,7 @@ static int __init btrfs_print_mod_info(void)
pr_info("Btrfs loaded%s, raid1_balancing=%s\n",
options, btrfs_get_raid1_balancing());
#else
- pr_info("Btrfs loaded%s\n", options);
+ pr_info("Btrfs loaded%s\n", options);
#endif
return 0;
--
2.47.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/9] btrfs: initialize fs_devices->fs_info earlier
2024-12-16 18:13 ` [PATCH v4 1/9] btrfs: initialize fs_devices->fs_info earlier Anand Jain
@ 2024-12-18 3:08 ` Naohiro Aota
0 siblings, 0 replies; 17+ messages in thread
From: Naohiro Aota @ 2024-12-18 3:08 UTC (permalink / raw)
To: Anand Jain
Cc: linux-btrfs@vger.kernel.org, dsterba@suse.com, WenRuo Qu,
hrx@bupt.moe, waxhead@dirtcellar.net
On Tue, Dec 17, 2024 at 02:13:09AM +0800, Anand Jain wrote:
> Currently, fs_devices->fs_info is initialized in btrfs_init_devices_late(),
> but this occurs too late for find_live_mirror(), which is invoked by
> load_super_root() much earlier than btrfs_init_devices_late().
>
> Fix this by moving the initialization to open_ctree(), before load_super_root().
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
Looks good to me.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 3/9] btrfs: add btrfs_read_policy_to_enum helper and refactor read policy store
2024-12-16 18:13 ` [PATCH v4 3/9] btrfs: add btrfs_read_policy_to_enum helper and refactor read policy store Anand Jain
@ 2024-12-18 3:17 ` Naohiro Aota
2024-12-18 5:48 ` Anand Jain
0 siblings, 1 reply; 17+ messages in thread
From: Naohiro Aota @ 2024-12-18 3:17 UTC (permalink / raw)
To: Anand Jain
Cc: linux-btrfs@vger.kernel.org, dsterba@suse.com, WenRuo Qu,
hrx@bupt.moe, waxhead@dirtcellar.net
On Tue, Dec 17, 2024 at 02:13:11AM +0800, Anand Jain wrote:
> Introduce the `btrfs_read_policy_to_enum` helper function to simplify the
> conversion of a string read policy to its corresponding enum value. This
> reduces duplication and improves code clarity in `btrfs_read_policy_store`.
> The `btrfs_read_policy_store` function has been refactored to use the new
> helper.
>
> The parameter is copied locally to allow modification, enabling the
> separation of the method and its value. This prepares for the addition of
> more functionality in subsequent patches.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> fs/btrfs/sysfs.c | 46 ++++++++++++++++++++++++++++++++++------------
> 1 file changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index fd3c49c6c3c5..34903e5bf8d0 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -1307,6 +1307,30 @@ BTRFS_ATTR(, temp_fsid, btrfs_temp_fsid_show);
>
> static const char * const btrfs_read_policy_name[] = { "pid" };
>
> +static int btrfs_read_policy_to_enum(const char *str)
> +{
> + char param[32] = {'\0'};
> + int index;
> + bool found = false;
> +
> + if (!str || strlen(str) == 0)
> + return 0;
> +
> + strcpy(param, str);
I think "str" is originated from user input. So, using strcpy can cause a
buffer overflow.
> +
> + for (index = 0; index < BTRFS_NR_READ_POLICY; index++) {
> + if (sysfs_streq(param, btrfs_read_policy_name[index])) {
> + found = true;
> + break;
> + }
> + }
> +
> + if (found)
> + return index;
> +
> + return -EINVAL;
We can replace the logic with sysfs_match_string().
> +}
> +
> static ssize_t btrfs_read_policy_show(struct kobject *kobj,
> struct kobj_attribute *a, char *buf)
> {
> @@ -1338,21 +1362,19 @@ static ssize_t btrfs_read_policy_store(struct kobject *kobj,
> const char *buf, size_t len)
> {
> struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
> - int i;
> + int index;
>
> - for (i = 0; i < BTRFS_NR_READ_POLICY; i++) {
> - if (sysfs_streq(buf, btrfs_read_policy_name[i])) {
> - if (i != READ_ONCE(fs_devices->read_policy)) {
> - WRITE_ONCE(fs_devices->read_policy, i);
> - btrfs_info(fs_devices->fs_info,
> - "read policy set to '%s'",
> - btrfs_read_policy_name[i]);
> - }
> - return len;
> - }
> + index = btrfs_read_policy_to_enum(buf);
> + if (index == -EINVAL)
> + return -EINVAL;
> +
> + if (index != READ_ONCE(fs_devices->read_policy)) {
> + WRITE_ONCE(fs_devices->read_policy, index);
> + btrfs_info(fs_devices->fs_info, "read policy set to '%s'",
> + btrfs_read_policy_name[index]);
> }
>
> - return -EINVAL;
> + return len;
> }
> BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, btrfs_read_policy_store);
>
> --
> 2.47.0
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 3/9] btrfs: add btrfs_read_policy_to_enum helper and refactor read policy store
2024-12-18 3:17 ` Naohiro Aota
@ 2024-12-18 5:48 ` Anand Jain
2025-01-01 9:49 ` Anand Jain
0 siblings, 1 reply; 17+ messages in thread
From: Anand Jain @ 2024-12-18 5:48 UTC (permalink / raw)
To: Naohiro Aota
Cc: linux-btrfs@vger.kernel.org, dsterba@suse.com, WenRuo Qu,
hrx@bupt.moe, waxhead@dirtcellar.net
On 18/12/24 08:47, Naohiro Aota wrote:
> On Tue, Dec 17, 2024 at 02:13:11AM +0800, Anand Jain wrote:
>> Introduce the `btrfs_read_policy_to_enum` helper function to simplify the
>> conversion of a string read policy to its corresponding enum value. This
>> reduces duplication and improves code clarity in `btrfs_read_policy_store`.
>> The `btrfs_read_policy_store` function has been refactored to use the new
>> helper.
>>
>> The parameter is copied locally to allow modification, enabling the
>> separation of the method and its value. This prepares for the addition of
>> more functionality in subsequent patches.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> fs/btrfs/sysfs.c | 46 ++++++++++++++++++++++++++++++++++------------
>> 1 file changed, 34 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
>> index fd3c49c6c3c5..34903e5bf8d0 100644
>> --- a/fs/btrfs/sysfs.c
>> +++ b/fs/btrfs/sysfs.c
>> @@ -1307,6 +1307,30 @@ BTRFS_ATTR(, temp_fsid, btrfs_temp_fsid_show);
>>
>> static const char * const btrfs_read_policy_name[] = { "pid" };
>>
>> +static int btrfs_read_policy_to_enum(const char *str)
>> +{
>> + char param[32] = {'\0'};
>> + int index;
>> + bool found = false;
>> +
>> + if (!str || strlen(str) == 0)
>> + return 0;
>> +
>> + strcpy(param, str);
>
> I think "str" is originated from user input. So, using strcpy can cause a
> buffer overflow.
Oh no. I missed that to check later. Made local changed and updated
sysfs input validity test-script.
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 165371a9702c..8faa014c475a 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1337,7 +1337,7 @@ int btrfs_read_policy_to_enum(const char *str, s64
*value)
if (!str || strlen(str) == 0)
return 0;
- strcpy(param, str);
+ strncpy(param, str, 31);
#ifdef CONFIG_BTRFS_EXPERIMENTAL
/* Separate value from input in policy:value format. */
Thanks!!, Anand
>
>> +
>> + for (index = 0; index < BTRFS_NR_READ_POLICY; index++) {
>> + if (sysfs_streq(param, btrfs_read_policy_name[index])) {
>> + found = true;
>> + break;
>> + }
>> + }
>> +
>> + if (found)
>> + return index;
>> +
>> + return -EINVAL;
>
> We can replace the logic with sysfs_match_string().
>
>> +}
>> +
>> static ssize_t btrfs_read_policy_show(struct kobject *kobj,
>> struct kobj_attribute *a, char *buf)
>> {
>> @@ -1338,21 +1362,19 @@ static ssize_t btrfs_read_policy_store(struct kobject *kobj,
>> const char *buf, size_t len)
>> {
>> struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
>> - int i;
>> + int index;
>>
>> - for (i = 0; i < BTRFS_NR_READ_POLICY; i++) {
>> - if (sysfs_streq(buf, btrfs_read_policy_name[i])) {
>> - if (i != READ_ONCE(fs_devices->read_policy)) {
>> - WRITE_ONCE(fs_devices->read_policy, i);
>> - btrfs_info(fs_devices->fs_info,
>> - "read policy set to '%s'",
>> - btrfs_read_policy_name[i]);
>> - }
>> - return len;
>> - }
>> + index = btrfs_read_policy_to_enum(buf);
>> + if (index == -EINVAL)
>> + return -EINVAL;
>> +
>> + if (index != READ_ONCE(fs_devices->read_policy)) {
>> + WRITE_ONCE(fs_devices->read_policy, index);
>> + btrfs_info(fs_devices->fs_info, "read policy set to '%s'",
>> + btrfs_read_policy_name[index]);
>> }
>>
>> - return -EINVAL;
>> + return len;
>> }
>> BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, btrfs_read_policy_store);
>>
>> --
>> 2.47.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v4 5/9] btrfs: introduce RAID1 round-robin read balancing
2024-12-16 18:13 ` [PATCH v4 5/9] btrfs: introduce RAID1 round-robin read balancing Anand Jain
@ 2024-12-18 5:53 ` Naohiro Aota
2024-12-18 15:20 ` Anand Jain
0 siblings, 1 reply; 17+ messages in thread
From: Naohiro Aota @ 2024-12-18 5:53 UTC (permalink / raw)
To: Anand Jain
Cc: linux-btrfs@vger.kernel.org, dsterba@suse.com, WenRuo Qu,
hrx@bupt.moe, waxhead@dirtcellar.net
On Tue, Dec 17, 2024 at 02:13:13AM +0800, Anand Jain wrote:
> This feature balances I/O across the striped devices when reading from
> RAID1 blocks.
>
> echo round-robin[:min_contiguous_read] > /sys/fs/btrfs/<uuid>/read_policy
>
> The min_contiguous_read parameter defines the minimum read size before
> switching to the next mirrored device. This setting is optional, with a
> default value of 256 KiB.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> fs/btrfs/sysfs.c | 44 +++++++++++++++++++++++++++-
> fs/btrfs/volumes.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++
> fs/btrfs/volumes.h | 11 +++++++
> 3 files changed, 127 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index 9c7bedf974d2..b0e1fb787ce6 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -1305,7 +1305,12 @@ static ssize_t btrfs_temp_fsid_show(struct kobject *kobj,
> }
> BTRFS_ATTR(, temp_fsid, btrfs_temp_fsid_show);
>
> -static const char * const btrfs_read_policy_name[] = { "pid" };
> +static const char *btrfs_read_policy_name[] = {
> + "pid",
> +#ifdef CONFIG_BTRFS_EXPERIMENTAL
> + "round-robin",
> +#endif
> +};
>
> static int btrfs_read_policy_to_enum(const char *str, s64 *value)
> {
> @@ -1359,6 +1364,12 @@ static ssize_t btrfs_read_policy_show(struct kobject *kobj,
>
> ret += sysfs_emit_at(buf, ret, "%s", btrfs_read_policy_name[i]);
>
> +#ifdef CONFIG_BTRFS_EXPERIMENTAL
> + if (i == BTRFS_READ_POLICY_RR)
> + ret += sysfs_emit_at(buf, ret, ":%d",
> + fs_devices->rr_min_contiguous_read);
I guess we want READ_ONCE() here as well.
> +#endif
> +
> if (i == policy)
> ret += sysfs_emit_at(buf, ret, "]");
> }
> @@ -1380,6 +1391,37 @@ static ssize_t btrfs_read_policy_store(struct kobject *kobj,
> if (index == -EINVAL)
> return -EINVAL;
>
> +#ifdef CONFIG_BTRFS_EXPERIMENTAL
> + if (index == BTRFS_READ_POLICY_RR) {
> + if (value != -1) {
> + u32 sectorsize = fs_devices->fs_info->sectorsize;
> +
> + if (!IS_ALIGNED(value, sectorsize)) {
> + u64 temp_value = round_up(value, sectorsize);
> +
> + btrfs_warn(fs_devices->fs_info,
> +"read_policy: min contiguous read %lld should be multiples of the sectorsize %u, rounded to %llu",
> + value, sectorsize, temp_value);
> + value = temp_value;
> + }
> + } else {
> + value = BTRFS_DEFAULT_RR_MIN_CONTIGUOUS_READ;
> + }
> +
> + if (index != READ_ONCE(fs_devices->read_policy) ||
> + value != READ_ONCE(fs_devices->rr_min_contiguous_read)) {
> + WRITE_ONCE(fs_devices->read_policy, index);
> + WRITE_ONCE(fs_devices->rr_min_contiguous_read, value);
> + atomic_set(&fs_devices->total_reads, 0);
> +
> + btrfs_info(fs_devices->fs_info, "read policy set to '%s:%lld'",
> + btrfs_read_policy_name[index], value);
> +
> + }
> +
> + return len;
> + }
> +#endif
> if (index != READ_ONCE(fs_devices->read_policy)) {
> WRITE_ONCE(fs_devices->read_policy, index);
> btrfs_info(fs_devices->fs_info, "read policy set to '%s'",
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index fe5ceea2ba0b..77c3b66d56a0 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1328,6 +1328,9 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
> fs_devices->total_rw_bytes = 0;
> fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_REGULAR;
> fs_devices->read_policy = BTRFS_READ_POLICY_PID;
> +#ifdef CONFIG_BTRFS_EXPERIMENTAL
> + fs_devices->rr_min_contiguous_read = BTRFS_DEFAULT_RR_MIN_CONTIGUOUS_READ;
> +#endif
>
> return 0;
> }
> @@ -5959,6 +5962,71 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
> return len;
> }
>
> +#ifdef CONFIG_BTRFS_EXPERIMENTAL
> +struct stripe_mirror {
> + u64 devid;
> + int num;
> +};
> +
> +static int btrfs_cmp_devid(const void *a, const void *b)
> +{
> + const struct stripe_mirror *s1 = (struct stripe_mirror *)a;
> + const struct stripe_mirror *s2 = (struct stripe_mirror *)b;
> +
> + if (s1->devid < s2->devid)
> + return -1;
> + if (s1->devid > s2->devid)
> + return 1;
> + return 0;
> +}
> +
> +/*
> + * btrfs_read_rr.
> + *
> + * Select a stripe for reading using a round-robin algorithm:
> + *
> + * 1. Compute the read cycle as the total sectors read divided by the minimum
> + * sectors per device.
> + * 2. Determine the stripe number for the current read by taking the modulus
> + * of the read cycle with the total number of stripes:
> + *
> + * stripe index = (total sectors / min sectors per dev) % num stripes
> + *
> + * The calculated stripe index is then used to select the corresponding device
> + * from the list of devices, which is ordered by devid.
> + */
> +static int btrfs_read_rr(struct btrfs_chunk_map *map, int first, int num_stripe)
> +{
> + struct stripe_mirror stripes[BTRFS_RAID1_MAX_MIRRORS] = {0};
> + struct btrfs_fs_devices *fs_devices;
> + struct btrfs_device *device;
> + int read_cycle;
> + int index;
> + int ret_stripe;
> + int total_reads;
> + int reads_per_dev = 0;
> +
> + device = map->stripes[first].dev;
> +
> + fs_devices = device->fs_devices;
> + reads_per_dev = fs_devices->rr_min_contiguous_read >> SECTOR_SHIFT;
Want READ_ONCE() as well. Also, is it OK to divide it with (1 <<
SECTOR_SHIFT), which is not necessary equal to fs_info->sectorsize?
> + index = 0;
> + for (int i = first; i < first + num_stripe; i++) {
> + stripes[index].devid = map->stripes[i].dev->devid;
> + stripes[index].num = i;
> + index++;
> + }
> + sort(stripes, num_stripe, sizeof(struct stripe_mirror),
> + btrfs_cmp_devid, NULL);
> +
> + total_reads = atomic_inc_return(&fs_devices->total_reads);
> + read_cycle = total_reads / reads_per_dev;
> + ret_stripe = stripes[read_cycle % num_stripe].num;
I'm not sure the logic here. Since the code increments the total_reads
counter by 1, can we assume this function is invoked per
fs_info->sectorsize?
> +
> + return ret_stripe;
> +}
> +#endif
> +
> static int find_live_mirror(struct btrfs_fs_info *fs_info,
> struct btrfs_chunk_map *map, int first,
> int dev_replace_is_ongoing)
> @@ -5988,6 +6056,11 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
> case BTRFS_READ_POLICY_PID:
> preferred_mirror = first + (current->pid % num_stripes);
> break;
> +#ifdef CONFIG_BTRFS_EXPERIMENTAL
> + case BTRFS_READ_POLICY_RR:
> + preferred_mirror = btrfs_read_rr(map, first, num_stripes);
> + break;
> +#endif
> }
>
> if (dev_replace_is_ongoing &&
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 3a416b1bc24c..b7b130ce0b10 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -296,6 +296,8 @@ enum btrfs_chunk_allocation_policy {
> BTRFS_CHUNK_ALLOC_ZONED,
> };
>
> +#define BTRFS_DEFAULT_RR_MIN_CONTIGUOUS_READ (SZ_256K)
> +#define BTRFS_RAID1_MAX_MIRRORS (4)
> /*
> * Read policies for mirrored block group profiles, read picks the stripe based
> * on these policies.
> @@ -303,6 +305,10 @@ enum btrfs_chunk_allocation_policy {
> enum btrfs_read_policy {
> /* Use process PID to choose the stripe */
> BTRFS_READ_POLICY_PID,
> +#ifdef CONFIG_BTRFS_EXPERIMENTAL
> + /* Balancing raid1 reads across all striped devices (round-robin) */
> + BTRFS_READ_POLICY_RR,
> +#endif
> BTRFS_NR_READ_POLICY,
> };
>
> @@ -431,6 +437,11 @@ struct btrfs_fs_devices {
> enum btrfs_read_policy read_policy;
>
> #ifdef CONFIG_BTRFS_EXPERIMENTAL
> + /* IO stat, read counter. */
> + atomic_t total_reads;
> + /* Min contiguous reads before switching to next device. */
> + int rr_min_contiguous_read;
> +
> /* Checksum mode - offload it or do it synchronously. */
> enum btrfs_offload_csum_mode offload_csum_mode;
> #endif
> --
> 2.47.0
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 5/9] btrfs: introduce RAID1 round-robin read balancing
2024-12-18 5:53 ` Naohiro Aota
@ 2024-12-18 15:20 ` Anand Jain
0 siblings, 0 replies; 17+ messages in thread
From: Anand Jain @ 2024-12-18 15:20 UTC (permalink / raw)
To: Naohiro Aota
Cc: linux-btrfs@vger.kernel.org, dsterba@suse.com, WenRuo Qu,
hrx@bupt.moe, waxhead@dirtcellar.net
On 18/12/24 11:23, Naohiro Aota wrote:
> On Tue, Dec 17, 2024 at 02:13:13AM +0800, Anand Jain wrote:
>> This feature balances I/O across the striped devices when reading from
>> RAID1 blocks.
>>
>> echo round-robin[:min_contiguous_read] > /sys/fs/btrfs/<uuid>/read_policy
>>
>> The min_contiguous_read parameter defines the minimum read size before
>> switching to the next mirrored device. This setting is optional, with a
>> default value of 256 KiB.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> fs/btrfs/sysfs.c | 44 +++++++++++++++++++++++++++-
>> fs/btrfs/volumes.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++
>> fs/btrfs/volumes.h | 11 +++++++
>> 3 files changed, 127 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
>> index 9c7bedf974d2..b0e1fb787ce6 100644
>> --- a/fs/btrfs/sysfs.c
>> +++ b/fs/btrfs/sysfs.c
>> @@ -1305,7 +1305,12 @@ static ssize_t btrfs_temp_fsid_show(struct kobject *kobj,
>> }
>> BTRFS_ATTR(, temp_fsid, btrfs_temp_fsid_show);
>>
>> -static const char * const btrfs_read_policy_name[] = { "pid" };
>> +static const char *btrfs_read_policy_name[] = {
>> + "pid",
>> +#ifdef CONFIG_BTRFS_EXPERIMENTAL
>> + "round-robin",
>> +#endif
>> +};
>>
>> static int btrfs_read_policy_to_enum(const char *str, s64 *value)
>> {
>> @@ -1359,6 +1364,12 @@ static ssize_t btrfs_read_policy_show(struct kobject *kobj,
>>
>> ret += sysfs_emit_at(buf, ret, "%s", btrfs_read_policy_name[i]);
>>
>> +#ifdef CONFIG_BTRFS_EXPERIMENTAL
>> + if (i == BTRFS_READ_POLICY_RR)
>> + ret += sysfs_emit_at(buf, ret, ":%d",
>> + fs_devices->rr_min_contiguous_read);
>
> I guess we want READ_ONCE() here as well.
>
>> +#endif
>> +
>> if (i == policy)
>> ret += sysfs_emit_at(buf, ret, "]");
>> }
>> @@ -1380,6 +1391,37 @@ static ssize_t btrfs_read_policy_store(struct kobject *kobj,
>> if (index == -EINVAL)
>> return -EINVAL;
>>
>> +#ifdef CONFIG_BTRFS_EXPERIMENTAL
>> + if (index == BTRFS_READ_POLICY_RR) {
>> + if (value != -1) {
>> + u32 sectorsize = fs_devices->fs_info->sectorsize;
>> +
>> + if (!IS_ALIGNED(value, sectorsize)) {
>> + u64 temp_value = round_up(value, sectorsize);
>> +
>> + btrfs_warn(fs_devices->fs_info,
>> +"read_policy: min contiguous read %lld should be multiples of the sectorsize %u, rounded to %llu",
>> + value, sectorsize, temp_value);
>> + value = temp_value;
>> + }
>> + } else {
>> + value = BTRFS_DEFAULT_RR_MIN_CONTIGUOUS_READ;
>> + }
>> +
>> + if (index != READ_ONCE(fs_devices->read_policy) ||
>> + value != READ_ONCE(fs_devices->rr_min_contiguous_read)) {
>> + WRITE_ONCE(fs_devices->read_policy, index);
>> + WRITE_ONCE(fs_devices->rr_min_contiguous_read, value);
>> + atomic_set(&fs_devices->total_reads, 0);
>> +
>> + btrfs_info(fs_devices->fs_info, "read policy set to '%s:%lld'",
>> + btrfs_read_policy_name[index], value);
>> +
>> + }
>> +
>> + return len;
>> + }
>> +#endif
>> if (index != READ_ONCE(fs_devices->read_policy)) {
>> WRITE_ONCE(fs_devices->read_policy, index);
>> btrfs_info(fs_devices->fs_info, "read policy set to '%s'",
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index fe5ceea2ba0b..77c3b66d56a0 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1328,6 +1328,9 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
>> fs_devices->total_rw_bytes = 0;
>> fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_REGULAR;
>> fs_devices->read_policy = BTRFS_READ_POLICY_PID;
>> +#ifdef CONFIG_BTRFS_EXPERIMENTAL
>> + fs_devices->rr_min_contiguous_read = BTRFS_DEFAULT_RR_MIN_CONTIGUOUS_READ;
>> +#endif
>>
>> return 0;
>> }
>> @@ -5959,6 +5962,71 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
>> return len;
>> }
>>
>> +#ifdef CONFIG_BTRFS_EXPERIMENTAL
>> +struct stripe_mirror {
>> + u64 devid;
>> + int num;
>> +};
>> +
>> +static int btrfs_cmp_devid(const void *a, const void *b)
>> +{
>> + const struct stripe_mirror *s1 = (struct stripe_mirror *)a;
>> + const struct stripe_mirror *s2 = (struct stripe_mirror *)b;
>> +
>> + if (s1->devid < s2->devid)
>> + return -1;
>> + if (s1->devid > s2->devid)
>> + return 1;
>> + return 0;
>> +}
>> +
>> +/*
>> + * btrfs_read_rr.
>> + *
>> + * Select a stripe for reading using a round-robin algorithm:
>> + *
>> + * 1. Compute the read cycle as the total sectors read divided by the minimum
>> + * sectors per device.
>> + * 2. Determine the stripe number for the current read by taking the modulus
>> + * of the read cycle with the total number of stripes:
>> + *
>> + * stripe index = (total sectors / min sectors per dev) % num stripes
>> + *
>> + * The calculated stripe index is then used to select the corresponding device
>> + * from the list of devices, which is ordered by devid.
>> + */
>> +static int btrfs_read_rr(struct btrfs_chunk_map *map, int first, int num_stripe)
>> +{
>> + struct stripe_mirror stripes[BTRFS_RAID1_MAX_MIRRORS] = {0};
>> + struct btrfs_fs_devices *fs_devices;
>> + struct btrfs_device *device;
>> + int read_cycle;
>> + int index;
>> + int ret_stripe;
>> + int total_reads;
>> + int reads_per_dev = 0;
>> +
>> + device = map->stripes[first].dev;
>> +
>> + fs_devices = device->fs_devices;
>> + reads_per_dev = fs_devices->rr_min_contiguous_read >> SECTOR_SHIFT;
>
> Want READ_ONCE() as well. Also, is it OK to divide it with (1 <<
> SECTOR_SHIFT), which is not necessary equal to fs_info->sectorsize?
>
>> + index = 0;
>> + for (int i = first; i < first + num_stripe; i++) {
>> + stripes[index].devid = map->stripes[i].dev->devid;
>> + stripes[index].num = i;
>> + index++;
>> + }
>> + sort(stripes, num_stripe, sizeof(struct stripe_mirror),
>> + btrfs_cmp_devid, NULL);
>> +
>> + total_reads = atomic_inc_return(&fs_devices->total_reads);
>> + read_cycle = total_reads / reads_per_dev;
>> + ret_stripe = stripes[read_cycle % num_stripe].num;
>
> I'm not sure the logic here. Since the code increments the total_reads
> counter by 1, can we assume this function is invoked per
> fs_info->sectorsize?
>
You're right. To fix this, we need to track read I/O stat in
`struct device` on our own. I avoided this earlier as the
block layer already provides I/O stats (though they might
be stale). Unless there is a better way. I'm trying.
Thanks for your review.
-Anand
>> +
>> + return ret_stripe;
>> +}
>> +#endif
>> +
>> static int find_live_mirror(struct btrfs_fs_info *fs_info,
>> struct btrfs_chunk_map *map, int first,
>> int dev_replace_is_ongoing)
>> @@ -5988,6 +6056,11 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
>> case BTRFS_READ_POLICY_PID:
>> preferred_mirror = first + (current->pid % num_stripes);
>> break;
>> +#ifdef CONFIG_BTRFS_EXPERIMENTAL
>> + case BTRFS_READ_POLICY_RR:
>> + preferred_mirror = btrfs_read_rr(map, first, num_stripes);
>> + break;
>> +#endif
>> }
>>
>> if (dev_replace_is_ongoing &&
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 3a416b1bc24c..b7b130ce0b10 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -296,6 +296,8 @@ enum btrfs_chunk_allocation_policy {
>> BTRFS_CHUNK_ALLOC_ZONED,
>> };
>>
>> +#define BTRFS_DEFAULT_RR_MIN_CONTIGUOUS_READ (SZ_256K)
>> +#define BTRFS_RAID1_MAX_MIRRORS (4)
>> /*
>> * Read policies for mirrored block group profiles, read picks the stripe based
>> * on these policies.
>> @@ -303,6 +305,10 @@ enum btrfs_chunk_allocation_policy {
>> enum btrfs_read_policy {
>> /* Use process PID to choose the stripe */
>> BTRFS_READ_POLICY_PID,
>> +#ifdef CONFIG_BTRFS_EXPERIMENTAL
>> + /* Balancing raid1 reads across all striped devices (round-robin) */
>> + BTRFS_READ_POLICY_RR,
>> +#endif
>> BTRFS_NR_READ_POLICY,
>> };
>>
>> @@ -431,6 +437,11 @@ struct btrfs_fs_devices {
>> enum btrfs_read_policy read_policy;
>>
>> #ifdef CONFIG_BTRFS_EXPERIMENTAL
>> + /* IO stat, read counter. */
>> + atomic_t total_reads;
>> + /* Min contiguous reads before switching to next device. */
>> + int rr_min_contiguous_read;
>> +
>> /* Checksum mode - offload it or do it synchronously. */
>> enum btrfs_offload_csum_mode offload_csum_mode;
>> #endif
>> --
>> 2.47.0
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 3/9] btrfs: add btrfs_read_policy_to_enum helper and refactor read policy store
2024-12-18 5:48 ` Anand Jain
@ 2025-01-01 9:49 ` Anand Jain
0 siblings, 0 replies; 17+ messages in thread
From: Anand Jain @ 2025-01-01 9:49 UTC (permalink / raw)
To: Naohiro Aota
Cc: linux-btrfs@vger.kernel.org, dsterba@suse.com, WenRuo Qu,
hrx@bupt.moe, waxhead@dirtcellar.net
>>> + for (index = 0; index < BTRFS_NR_READ_POLICY; index++) {
>>> + if (sysfs_streq(param, btrfs_read_policy_name[index])) {
>>> + found = true;
>>> + break;
>>> + }
>>> + }
>>> +
>>> + if (found)
>>> + return index;
>>> +
>>> + return -EINVAL;
>>
>> We can replace the logic with sysfs_match_string().
This has been fixed as well, in the local ws.
Thanks, Anand
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2025-01-01 9:49 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-16 18:13 [PATCH v4 00/10] raid1 balancing methods Anand Jain
2024-12-16 18:13 ` [PATCH v4 1/9] btrfs: initialize fs_devices->fs_info earlier Anand Jain
2024-12-18 3:08 ` Naohiro Aota
2024-12-16 18:13 ` [PATCH v4 2/9] btrfs: simplify output formatting in btrfs_read_policy_show Anand Jain
2024-12-16 18:13 ` [PATCH v4 3/9] btrfs: add btrfs_read_policy_to_enum helper and refactor read policy store Anand Jain
2024-12-18 3:17 ` Naohiro Aota
2024-12-18 5:48 ` Anand Jain
2025-01-01 9:49 ` Anand Jain
2024-12-16 18:13 ` [PATCH v4 4/9] btrfs: handle value associated with raid1 balancing parameter Anand Jain
2024-12-16 18:13 ` [PATCH v4 5/9] btrfs: introduce RAID1 round-robin read balancing Anand Jain
2024-12-18 5:53 ` Naohiro Aota
2024-12-18 15:20 ` Anand Jain
2024-12-16 18:13 ` [PATCH v4 6/9] btrfs: add RAID1 preferred read device Anand Jain
2024-12-16 18:13 ` [PATCH v4 7/9] btrfs: expose experimental mode in module information Anand Jain
2024-12-16 18:13 ` [PATCH v4 8/9] btrfs: enable RAID1 balancing configuration via modprobe parameter Anand Jain
2024-12-16 18:13 ` [PATCH v4 9/9] btrfs: modload to print RAID1 balancing status Anand Jain
2024-12-17 3:53 ` [PATCH] btrfs: fix smatch warning inconsistent indenting Anand Jain
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox