* [PATCH v6 1/5] btrfs: add btrfs_strmatch helper
2020-02-19 11:29 [PATCH v6 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
@ 2020-02-19 11:29 ` Anand Jain
2020-02-19 11:29 ` [PATCH v6 2/5] btrfs: create read policy framework Anand Jain
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Anand Jain @ 2020-02-19 11:29 UTC
To: linux-btrfs; +Cc: josef, dsterba
Add a generic helper that matches the golden string against the given
string, ignoring any leading and trailing whitespace.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Suggested-by: David Sterba <dsterba@suse.com>
---
v5: born
fs/btrfs/sysfs.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 93cf76118a04..7bb68cef98ab 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -809,6 +809,29 @@ static ssize_t btrfs_checksum_show(struct kobject *kobj,
BTRFS_ATTR(, checksum, btrfs_checksum_show);
+/*
+ * Match the %golden in the %given. Ignore the leading and trailing whitespaces
+ * if any.
+ */
+static int btrfs_strmatch(const char *given, const char *golden)
+{
+ size_t len = strlen(golden);
+ char *stripped;
+
+ /* strip leading whitespace */
+ stripped = skip_spaces(given);
+
+ if (strncmp(stripped, golden, len) == 0) {
+ /* strip trailing whitespace */
+ if (strlen(skip_spaces(stripped + len)))
+ return -EINVAL;
+
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
static const struct attribute *btrfs_attrs[] = {
BTRFS_ATTR_PTR(, label),
BTRFS_ATTR_PTR(, nodesize),
--
1.8.3.1
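For readers who want to try the matching rule of btrfs_strmatch() outside the kernel, here is a minimal userspace sketch. It is not part of the patch: skip_spaces_user() and strmatch_user() are hypothetical names, skip_spaces_user() stands in for the kernel's skip_spaces(), and the error value is simplified to -1 instead of -EINVAL.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Userspace stand-in for the kernel's skip_spaces(): skip leading whitespace. */
static const char *skip_spaces_user(const char *s)
{
	while (isspace((unsigned char)*s))
		s++;
	return s;
}

/*
 * Return 0 when `given` is exactly `golden` surrounded by optional
 * whitespace, and -1 otherwise.
 */
static int strmatch_user(const char *given, const char *golden)
{
	size_t len;
	const char *stripped;

	assert(given != NULL && golden != NULL);

	len = strlen(golden);
	stripped = skip_spaces_user(given);

	/* The text after the leading whitespace must start with golden. */
	if (strncmp(stripped, golden, len) != 0)
		return -1;

	/* Anything other than whitespace after the match is a mismatch. */
	if (*skip_spaces_user(stripped + len) != '\0')
		return -1;

	return 0;
}
```

With this sketch, "  pid\n" matches "pid", while "pidx" and "pid device" do not, which is the behaviour a sysfs store handler wants for input written with echo.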
* [PATCH v6 2/5] btrfs: create read policy framework
2020-02-19 11:29 [PATCH v6 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
2020-02-19 11:29 ` [PATCH v6 1/5] btrfs: add btrfs_strmatch helper Anand Jain
@ 2020-02-19 11:29 ` Anand Jain
2020-02-19 11:29 ` [PATCH v6 3/5] btrfs: create read policy sysfs attribute, pid Anand Jain
` (2 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Anand Jain @ 2020-02-19 11:29 UTC
To: linux-btrfs; +Cc: josef, dsterba
As of now we use the %pid method to read striped mirrored data, which
means the process id determines the stripe id to be read. This type of
routing typically helps in a system with many small independent processes
trying to read random data. On the other hand, the %pid based read IO
policy is inefficient when a single process reads a large amount of data,
because the overall disk bandwidth remains under-utilized.
So this patch introduces a read policy framework so that we can add more
read policies, such as IO routing based on the device's wait-queue, a
manual policy for when we have a read-preferred device, or a policy based
on the target storage caching.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
v5: Title renamed from:- btrfs: add read_policy framework
Change log updated.
Unnecessary comment dropped, added more where necessary.
Optimize code in the switch, remove duplicate code.
Define BTRFS_READ_POLICY_DEFAULT dropped.
Rename enum btrfs_read_policy_type to enum btrfs_read_policy.
Rename BTRFS_READ_BY_PID to BTRFS_READ_POLICY_PID.
(As it's mainly renames, Reviewed-by retained).
v4: -
v3: Declare fs_devices::readmirror as enum btrfs_readmirror_policy_type
v2: Declare fs_devices::readmirror as u8 instead of atomic_t
A small change in comment and change log wordings.
fs/btrfs/volumes.c | 13 ++++++++++++-
fs/btrfs/volumes.h | 14 ++++++++++++++
2 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 387f80656476..b6efb87bb0ae 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1209,6 +1209,7 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
fs_devices->opened = 1;
fs_devices->latest_bdev = latest_dev->bdev;
fs_devices->total_rw_bytes = 0;
+ fs_devices->read_policy = BTRFS_READ_POLICY_PID;
out:
return ret;
}
@@ -5358,7 +5359,17 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
else
num_stripes = map->num_stripes;
- preferred_mirror = first + current->pid % num_stripes;
+ switch (fs_info->fs_devices->read_policy) {
+ default:
+ /*
+ * Shouldn't happen, just warn and use pid instead of failing.
+ */
+ btrfs_warn_rl(fs_info,
+ "unknown read_policy type %u, fallback to pid",
+ fs_info->fs_devices->read_policy);
+ case BTRFS_READ_POLICY_PID:
+ preferred_mirror = first + current->pid % num_stripes;
+ }
if (dev_replace_is_ongoing &&
fs_info->dev_replace.cont_reading_from_srcdev_mode ==
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f01552a0785e..ed2bba741b6e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -209,6 +209,15 @@ struct btrfs_device {
BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
+/*
+ * Read policies for the mirrored block groups, read picks the stripe based
+ * on these policies.
+ */
+enum btrfs_read_policy {
+ BTRFS_READ_POLICY_PID,
+ BTRFS_NR_READ_POLICY,
+};
+
struct btrfs_fs_devices {
u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
u8 metadata_uuid[BTRFS_FSID_SIZE];
@@ -260,6 +269,11 @@ struct btrfs_fs_devices {
struct kobject *devices_kobj;
struct kobject *devinfo_kobj;
struct completion kobj_unregister;
+
+ /*
+ * policy used to read the mirrored stripes
+ */
+ enum btrfs_read_policy read_policy;
};
#define BTRFS_BIO_INLINE_CSUM_SIZE 64
--
1.8.3.1
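The pid based selection in find_live_mirror() can be tried in userspace with the sketch below. It is only an illustration under assumptions: pick_mirror() and the enum names are hypothetical, the pid is passed in as a plain int instead of current->pid, and the warning printed by the kernel code for an unknown policy is left out.

```c
#include <assert.h>

/* Hypothetical userspace copy of the policy enum from the patch. */
enum read_policy {
	READ_POLICY_PID,
	NR_READ_POLICY,
};

/*
 * Pick the mirror to read from. `first` is the index of the first stripe
 * of the mirror set and `num_stripes` the number of mirrors in it.
 */
static int pick_mirror(enum read_policy policy, int first, int num_stripes,
		       int pid)
{
	assert(num_stripes > 0 && pid >= 0);

	switch (policy) {
	default:
		/* Unknown policy: use pid instead of failing. */
		/* fall through */
	case READ_POLICY_PID:
		return first + pid % num_stripes;
	}
}
```

Because the result depends only on the pid, a single process always lands on the same mirror, which is the under-utilization described in the change log above.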
* [PATCH v6 3/5] btrfs: create read policy sysfs attribute, pid
2020-02-19 11:29 [PATCH v6 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
2020-02-19 11:29 ` [PATCH v6 1/5] btrfs: add btrfs_strmatch helper Anand Jain
2020-02-19 11:29 ` [PATCH v6 2/5] btrfs: create read policy framework Anand Jain
@ 2020-02-19 11:29 ` Anand Jain
2020-02-19 11:29 ` [PATCH v6 4/5] btrfs: introduce new device-state read_preferred Anand Jain
2020-02-19 11:29 ` [PATCH v6 5/5] btrfs: introduce new read_policy device Anand Jain
4 siblings, 0 replies; 8+ messages in thread
From: Anand Jain @ 2020-02-19 11:29 UTC
To: linux-btrfs; +Cc: josef, dsterba
Add
/sys/fs/btrfs/UUID/read_policy
attribute so that the read policy for the raid1 and raid10 chunks can be
tuned.
When this attribute is read, it shows all available policies, with the
active policy enclosed in [ ]. The read_policy attribute can be written
using one of the items listed there.
For example:
$ cat /sys/fs/btrfs/UUID/read_policy
[pid]
$ echo pid > /sys/fs/btrfs/UUID/read_policy
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v5:
Title rename: old: btrfs: sysfs, add read_policy attribute
Uses the btrfs_strmatch() helper (BTRFS_READ_POLICY_NAME_MAX dropped).
Use the table for the policy names.
Rename len to ret.
Use a simple logic to prefix space in btrfs_read_policy_show()
Reviewed-by: Josef Bacik <josef@toxicpanda.com> dropped.
v4:-
v3: rename [by_pid] to [pid]
v2: check input len before strip and kstrdup
fs/btrfs/sysfs.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 7bb68cef98ab..c9a8850b186a 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -832,6 +832,54 @@ static int btrfs_strmatch(const char *given, const char *golden)
return -EINVAL;
}
+static const char* const btrfs_read_policy_name[] = { "pid" };
+
+static ssize_t btrfs_read_policy_show(struct kobject *kobj,
+ struct kobj_attribute *a, char *buf)
+{
+ int i;
+ ssize_t ret = 0;
+ struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
+
+ for (i = 0; i < BTRFS_NR_READ_POLICY; i++) {
+ if (fs_devices->read_policy == i)
+ ret += snprintf(buf + ret, PAGE_SIZE - ret, "%s[%s]",
+ (ret == 0 ? "" : " "),
+ btrfs_read_policy_name[i]);
+ else
+ ret += snprintf(buf + ret, PAGE_SIZE - ret, "%s%s",
+ (ret == 0 ? "" : " "),
+ btrfs_read_policy_name[i]);
+ }
+
+ ret += snprintf(buf + ret, PAGE_SIZE - ret, "\n");
+
+ return ret;
+}
+
+static ssize_t btrfs_read_policy_store(struct kobject *kobj,
+ struct kobj_attribute *a,
+ const char *buf, size_t len)
+{
+ int i;
+ struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
+
+ for (i = 0; i < BTRFS_NR_READ_POLICY; i++) {
+ if (btrfs_strmatch(buf, btrfs_read_policy_name[i]) == 0) {
+ if (i != fs_devices->read_policy) {
+ fs_devices->read_policy = i;
+ btrfs_info(fs_devices->fs_info,
+ "read policy set to '%s'",
+ btrfs_read_policy_name[i]);
+ }
+ return len;
+ }
+ }
+
+ return -EINVAL;
+}
+BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, btrfs_read_policy_store);
+
static const struct attribute *btrfs_attrs[] = {
BTRFS_ATTR_PTR(, label),
BTRFS_ATTR_PTR(, nodesize),
@@ -840,6 +888,7 @@ static int btrfs_strmatch(const char *given, const char *golden)
BTRFS_ATTR_PTR(, quota_override),
BTRFS_ATTR_PTR(, metadata_uuid),
BTRFS_ATTR_PTR(, checksum),
+ BTRFS_ATTR_PTR(, read_policy),
NULL,
};
--
1.8.3.1
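The output format of btrfs_read_policy_show() is easier to see with more than one policy in the table. The sketch below is a userspace approximation, not the patch itself: format_policies() is a hypothetical name, the table already contains the "device" entry that patch 5/5 adds, and the buffer is assumed to be large enough for the whole line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Policy names; "device" is borrowed from patch 5/5 to show the spacing. */
static const char *const policy_name[] = { "pid", "device" };
#define NR_POLICY (sizeof(policy_name) / sizeof(policy_name[0]))

/*
 * Write all policy names into buf, separated by single spaces, with the
 * active one enclosed in [ ]. Returns the number of characters written.
 * Assumes `size` is large enough to hold the complete line.
 */
static int format_policies(char *buf, size_t size, size_t active)
{
	size_t i;
	int ret = 0;

	assert(buf != NULL && size > 0);

	for (i = 0; i < NR_POLICY; i++) {
		/* No separator before the first entry. */
		const char *sep = (ret == 0) ? "" : " ";

		if (i == active)
			ret += snprintf(buf + ret, size - (size_t)ret,
					"%s[%s]", sep, policy_name[i]);
		else
			ret += snprintf(buf + ret, size - (size_t)ret,
					"%s%s", sep, policy_name[i]);
	}

	ret += snprintf(buf + ret, size - (size_t)ret, "\n");

	return ret;
}
```

Calling format_policies() with active set to 0 produces "[pid] device" followed by a newline, and with active set to 1 it produces "pid [device]".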
* [PATCH v6 4/5] btrfs: introduce new device-state read_preferred
2020-02-19 11:29 [PATCH v6 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
` (2 preceding siblings ...)
2020-02-19 11:29 ` [PATCH v6 3/5] btrfs: create read policy sysfs attribute, pid Anand Jain
@ 2020-02-19 11:29 ` Anand Jain
2020-02-19 11:29 ` [PATCH v6 5/5] btrfs: introduce new read_policy device Anand Jain
4 siblings, 0 replies; 8+ messages in thread
From: Anand Jain @ 2020-02-19 11:29 UTC
To: linux-btrfs; +Cc: josef, dsterba
Provides a sysfs interface to set the device state as read_preferred.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v6: If there is no change in device's read prefer then don't log.
Add pid to the logs.
v5: born
fs/btrfs/sysfs.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/volumes.h | 1 +
2 files changed, 56 insertions(+)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index c9a8850b186a..72daaedb7b04 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1317,11 +1317,66 @@ static ssize_t btrfs_devinfo_writeable_show(struct kobject *kobj,
}
BTRFS_ATTR(devid, writeable, btrfs_devinfo_writeable_show);
+static ssize_t btrfs_devinfo_read_pref_show(struct kobject *kobj,
+ struct kobj_attribute *a, char *buf)
+{
+ int val;
+ struct btrfs_device *device = container_of(kobj, struct btrfs_device,
+ devid_kobj);
+
+ val = !!test_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state);
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", val);
+}
+
+static ssize_t btrfs_devinfo_read_pref_store(struct kobject *kobj,
+ struct kobj_attribute *a,
+ const char *buf, size_t len)
+{
+ int ret;
+ unsigned long val;
+ struct btrfs_device *device;
+
+ ret = kstrtoul(skip_spaces(buf), 0, &val);
+ if (ret)
+ return ret;
+
+ if (val != 0 && val != 1)
+ return -EINVAL;
+
+ /*
+ * lock is not required, the btrfs_device struct can't be freed while
+ * its kobject btrfs_device::devid_kobj is still open.
+ */
+ device = container_of(kobj, struct btrfs_device, devid_kobj);
+
+ if (val &&
+ ! test_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state)) {
+
+ set_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state);
+ btrfs_info(device->fs_devices->fs_info,
+ "set read preferred on devid %llu (%d)",
+ device->devid, task_pid_nr(current));
+ } else if (!val &&
+ test_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state)) {
+
+ clear_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state);
+ btrfs_info(device->fs_devices->fs_info,
+ "reset read preferred on devid %llu (%d)",
+ device->devid, task_pid_nr(current));
+ }
+
+ return len;
+}
+BTRFS_ATTR_RW(devid, read_preferred, btrfs_devinfo_read_pref_show,
+ btrfs_devinfo_read_pref_store);
+
static struct attribute *devid_attrs[] = {
BTRFS_ATTR_PTR(devid, in_fs_metadata),
BTRFS_ATTR_PTR(devid, missing),
BTRFS_ATTR_PTR(devid, replace_target),
BTRFS_ATTR_PTR(devid, writeable),
+ BTRFS_ATTR_PTR(devid, read_preferred),
NULL
};
ATTRIBUTE_GROUPS(devid);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index ed2bba741b6e..07962a0ce898 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -52,6 +52,7 @@ struct btrfs_io_geometry {
#define BTRFS_DEV_STATE_MISSING (2)
#define BTRFS_DEV_STATE_REPLACE_TGT (3)
#define BTRFS_DEV_STATE_FLUSH_SENT (4)
+#define BTRFS_DEV_STATE_READ_PREFERRED (5)
struct btrfs_device {
struct list_head dev_list; /* device_list_mutex */
--
1.8.3.1
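The store handler above only changes the flag, and only logs, when the requested value differs from the current state. A rough userspace sketch of that logic follows. The names are hypothetical, strtoul() only approximates skip_spaces() plus kstrtoul(), and the plain bit operations are not atomic like the kernel's set_bit() and clear_bit().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Bit number of the flag, mirroring BTRFS_DEV_STATE_READ_PREFERRED. */
#define DEV_STATE_READ_PREFERRED 5

/*
 * Parse "0" or "1" from buf and update the flag in *dev_state.
 * Returns 1 if the state changed (where the kernel code would log),
 * 0 if it already had the requested value, -EINVAL on bad input.
 */
static int set_read_preferred(unsigned long *dev_state, const char *buf)
{
	const unsigned long mask = 1UL << DEV_STATE_READ_PREFERRED;
	unsigned long val;
	char *end;

	assert(dev_state != NULL && buf != NULL);

	/* strtoul() skips leading whitespace on its own. */
	val = strtoul(buf, &end, 0);
	if (end == buf)
		return -EINVAL;

	/* Allow one trailing newline, as written by echo. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	if (val != 0 && val != 1)
		return -EINVAL;

	if (val && !(*dev_state & mask)) {
		*dev_state |= mask;	/* not atomic, unlike set_bit() */
		return 1;
	}
	if (!val && (*dev_state & mask)) {
		*dev_state &= ~mask;	/* not atomic, unlike clear_bit() */
		return 1;
	}

	return 0;
}
```

Writing "1" twice in a row changes the state only the first time, so the second write would not produce a log line.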
* [PATCH v6 5/5] btrfs: introduce new read_policy device
2020-02-19 11:29 [PATCH v6 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
` (3 preceding siblings ...)
2020-02-19 11:29 ` [PATCH v6 4/5] btrfs: introduce new device-state read_preferred Anand Jain
@ 2020-02-19 11:29 ` Anand Jain
2020-02-19 12:18 ` Steven Davies
4 siblings, 1 reply; 8+ messages in thread
From: Anand Jain @ 2020-02-19 11:29 UTC
To: linux-btrfs; +Cc: josef, dsterba
A new read policy 'device' is introduced with this patch. When set, it
reads only from the device flagged as read_preferred. This tunable is for
advanced users and testers who want to make sure that reads come from the
device they prefer for chunks of type raid1, raid10, raid1c3 and raid1c4.
The default read policy is pid, which can be changed to device as below.
$ pwd
/sys/fs/btrfs/12345678-1234-1234-1234-123456789abc
$ cat read_policy; echo device > ./read_policy; cat read_policy
[pid] device
pid [device]
One or more devices that are favored for reading should have the
read_preferred flag set. In the example below, a typical two-disk raid1,
devid 1 is configured as read preferred.
$ echo 1 > devinfo/1/read_preferred
$ cat devinfo/1/read_preferred; cat devinfo/2/read_preferred
1
0
So now when the file is read, the read IO would prefer device(s) with
read_preferred flags for reading.
$ echo 3 > /proc/sys/vm/drop_caches; md5sum /btrfs/YkZI
Since devid 1 (sdb) is our read preferred device, the reads are sent to
sdb only.
$ iostat -zy 1 | egrep 'sdb|sdc' (from another terminal)
sdb 50.00 40048.00 0.00 40048 0
$ echo 0 > ./devinfo/1/read_preferred; echo 1 > ./devinfo/2/read_preferred;
[ 3343.918658] BTRFS info (device sdb): reset read preferred on devid 1 (1334)
[ 3343.919876] BTRFS info (device sdb): set read preferred on devid 2 (1334)
$ echo 3 > /proc/sys/vm/drop_caches; md5sum /btrfs/YkZI
Since we changed the read preferred device from devid 1 (sdb) to devid 2
(sdc), all the read IO now goes to sdc.
$ iostat -zy 1 | egrep 'sdb|sdc' (from another terminal)
sdc 49.00 40048.00 0.00 40048 0
If none of a chunk's stripes is on a read preferred device, this read
policy uses stripe 0 for reading. If more than one stripe is on a read
preferred device, the first such stripe found is used.
The command
$ echo pid > ./read_policy
goes back to the pid read policy type.
As of now this is an in-memory-only feature, which means that after an
unmount/mount cycle the configuration is lost and has to be set again.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v6:
. If there isn't a read preferred device in the chunk, don't reset the
read policy to default, instead just use stripe 0. As this is in
the read path, it avoids going through the device list to find a
read preferred device. In line with this, drop the check for a
read preferred device before setting the read policy to device.
. Commit log updated. Adds more info about this new feature.
v5: born
fs/btrfs/sysfs.c | 3 ++-
fs/btrfs/volumes.c | 24 ++++++++++++++++++++++++
fs/btrfs/volumes.h | 1 +
3 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 72daaedb7b04..af53ed879dd6 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -832,7 +832,8 @@ static int btrfs_strmatch(const char *given, const char *golden)
return -EINVAL;
}
-static const char* const btrfs_read_policy_name[] = { "pid" };
+/* Must follow the order as in enum btrfs_read_policy */
+static const char* const btrfs_read_policy_name[] = { "pid", "device" };
static ssize_t btrfs_read_policy_show(struct kobject *kobj,
struct kobj_attribute *a, char *buf)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b6efb87bb0ae..43c09ec0bf86 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5341,6 +5341,26 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
return ret;
}
+static int btrfs_find_read_preferred(struct map_lookup *map, int num_stripe)
+{
+ int i;
+
+ /*
+ * If there is more than one read preferred device, then just pick the
+ * first found read preferred device as of now. Once we have the Qdepth
+ * based device selection, we could pick the least busy device among the
+ * read preferred devices.
+ */
+ for (i = 0; i < num_stripe; i++) {
+ if (test_bit(BTRFS_DEV_STATE_READ_PREFERRED,
+ &map->stripes[i].dev->dev_state))
+ return i;
+ }
+
+ /* If there is no read preferred device then just use stripe 0 */
+ return 0;
+}
+
static int find_live_mirror(struct btrfs_fs_info *fs_info,
struct map_lookup *map, int first,
int dev_replace_is_ongoing)
@@ -5360,6 +5380,10 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
num_stripes = map->num_stripes;
switch (fs_info->fs_devices->read_policy) {
+ case BTRFS_READ_POLICY_DEVICE:
+ preferred_mirror = btrfs_find_read_preferred(map, num_stripes);
+ preferred_mirror = first + preferred_mirror;
+ break;
default:
/*
* Shouldn't happen, just warn and use pid instead of failing.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 07962a0ce898..9c3c6ba7aad5 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -216,6 +216,7 @@ struct btrfs_device {
*/
enum btrfs_read_policy {
BTRFS_READ_POLICY_PID,
+ BTRFS_READ_POLICY_DEVICE,
BTRFS_NR_READ_POLICY,
};
--
1.8.3.1
* Re: [PATCH v6 5/5] btrfs: introduce new read_policy device
2020-02-19 11:29 ` [PATCH v6 5/5] btrfs: introduce new read_policy device Anand Jain
@ 2020-02-19 12:18 ` Steven Davies
2020-02-20 3:54 ` Anand Jain
0 siblings, 1 reply; 8+ messages in thread
From: Steven Davies @ 2020-02-19 12:18 UTC
To: Anand Jain; +Cc: josef, dsterba, linux-btrfs
On 2020-02-19 11:29, Anand Jain wrote:
> A new read policy 'device' is introduced with this patch, which when
> set
> can pick only the device flagged as read_preferred for reading. This
> tunable is for the advance users and the testers, which can make sure
> that
> reads are read from the device they prefer for chunks of type raid1,
> raid10, raid1c3 and raid1c4.
>
> The default read policy is pid which can be changed to device as below.
>
> $ pwd
> /sys/fs/btrfs/12345678-1234-1234-1234-123456789abc
>
> $ cat read_policy; echo device > ./read_policy; cat read_policy
> [pid] device
> pid [device]
>
> One or more devices which are favored for reading should set the flag
> read-preferred. In an example below a typical two disk raid1, devid1 is
> configured as read preferred.
>
> $ echo 1 > devinfo/1/read_preferred
> $ cat devinfo/1/read_preferred; cat devinfo/2/read_preffered
Typo: should be read_preferred
> 1
> 0
>
> So now when the file is read, the read IO would prefer device(s) with
> read_preferred flags for reading.
>
> $ echo 3 > /proc/sys/vm/drop_caches; md5sum /btrfs/YkZI
>
> Since the devid 1 (sdb) is our read preferred device, the reads are set
> to sdb only.
> $ iostat -zy 1 | egrep 'sdb|sdc' (from another terminal)
> sdb 50.00 40048.00 0.00 40048 0
>
> $ echo 0 > ./devinfo/1/read_preferred; echo 1 >
> ./devinfo/2/read_preferred;
>
> [ 3343.918658] BTRFS info (device sdb): reset read preferred on devid 1
> (1334)
> [ 3343.919876] BTRFS info (device sdb): set read preferred on devid 2
> (1334)
>
> $ echo 3 > /proc/sys/vm/drop_caches; md5sum /btrfs/YkZI
>
> Since now we changed the read preferred from devid 1 (sdb) to 2 (sdc),
> now all the read IO goes to sdc.
>
> $ iostat -zy 1 | egrep 'sdb|sdc' (from another terminal)
> sdc 49.00 40048.00 0.00 40048 0
>
> Whenever there isn't any read preferred device(s) or if more than one
> stripe is marked as read preferred device then this read policy shall
> use the stripe 0 for reading.
Should we consider the situation where more than one device is preferred
(perhaps for a future patch) - e.g. devid1 is HDD, devid2 is SSD, devid3
is SSD and data is RAID1C3?
Will there be a warning when this fallback to stripe 0 happens? Although
I imagine that would either always display on mount before
read_preferred is set or flood dmesg for every read.
Perhaps fallback to the %pid policy to give some form of balancing would
be a better default?
--
Steven Davies
* Re: [PATCH v6 5/5] btrfs: introduce new read_policy device
2020-02-19 12:18 ` Steven Davies
@ 2020-02-20 3:54 ` Anand Jain
0 siblings, 0 replies; 8+ messages in thread
From: Anand Jain @ 2020-02-20 3:54 UTC
To: Steven Davies; +Cc: josef, dsterba, linux-btrfs
>> Whenever there isn't any read preferred device(s) or if more than one
>> stripe is marked as read preferred device then this read policy shall
>> use the stripe 0 for reading.
>
> Should we consider the situation where more than one device is preferred
> (perhaps for a future patch) - e.g. devid1 is HDD, devid2 is SSD, devid3
> is SSD and data is RAID1C3?
Once we have the read policy type qdepth, we will use the read preferred
device with the larger qdepth. This is mentioned in the code comment. Oops,
I should have added it here also.
> Will there be a warning when this fallback to stripe 0 happens? Although
> I imagine that would either always display on mount before
> read_preferred is set or flood dmesg for every read.
In a 3-disk raid1, if there is only one disk marked as read preferred,
and if stripes 0 and 1 are on non-read-preferred disks, it will pick
stripe 0, and a warning is unnecessary.
In a 3-disk raid1, if there are 2 disks marked as read preferred, and
stripes 0 and 1 are on those two read preferred disks, we will be
using the Qdepth to find the suitable read preferred device.
> Perhaps fallback to the %pid policy to give some form of balancing would
> be a better default?
>
Let's say read_policy is set to 'device' but there isn't any
read_preferred device, then it makes sense to fall back to the default
read_policy. But determining, on every read, whether there is any read
preferred device outside of the striped chunk is not a good idea.
Thanks, Anand