* [PATCH 1/3] btrfs: add readmirror type framework
2019-12-20 5:06 [PATCH 0/3] readmirror feature (sysfs and in-memory only approach) Anand Jain
@ 2019-12-20 5:06 ` Anand Jain
2019-12-20 14:44 ` Josef Bacik
2019-12-20 5:06 ` [PATCH 2/3] btrfs: sysfs, add readmirror kobject Anand Jain
2019-12-20 5:06 ` [PATCH 3/3] btrfs: sysfs, create by_pid readmirror attribute Anand Jain
2 siblings, 1 reply; 6+ messages in thread
From: Anand Jain @ 2019-12-20 5:06 UTC (permalink / raw)
To: linux-btrfs
As of now we use %pid method to read stripped mirrored data. So
application's process id determines the stripe id to be read. This type
of routing typically helps in a system with many small independent
applications tying to read random data. On the other hand the %pid
based read IO distribution policy is inefficient if there is a single
application trying to read large data and the overall disk bandwidth
remains under utilized.
So this patch introduces a framework where we could add more readmirror
policies, such as routing the IO based on device's waitqueue or manual
when we have a read-preferred device or a policy based on the target
storage caching.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/volumes.c | 16 +++++++++++++++-
fs/btrfs/volumes.h | 8 ++++++++
2 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c95e47aa84f8..0c6caae29248 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1162,6 +1162,8 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
fs_devices->opened = 1;
fs_devices->latest_bdev = latest_dev->bdev;
fs_devices->total_rw_bytes = 0;
+ /* Set the default readmirror policy */
+ atomic_set(&fs_devices->readmirror, BTRFS_READMIRROR_DEFAULT);
out:
return ret;
}
@@ -5300,7 +5302,19 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
else
num_stripes = map->num_stripes;
- preferred_mirror = first + current->pid % num_stripes;
+ switch (atomic_read(&fs_info->fs_devices->readmirror)) {
+ case BTRFS_READMIRROR_BY_PID:
+ preferred_mirror = first + current->pid % num_stripes;
+ break;
+ default:
+ /*
+ * Shouln't happen, just warn and use default instead of failing.
+ */
+ btrfs_warn_rl(fs_info,
+ "unknown readmirror type %u, fallback to by_pid",
+ atomic_read(&fs_info->fs_devices->readmirror));
+ preferred_mirror = first + current->pid % num_stripes;
+ }
if (dev_replace_is_ongoing &&
fs_info->dev_replace.cont_reading_from_srcdev_mode ==
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 68021d1ee216..d9c4c4e1dbc2 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -209,6 +209,12 @@ struct btrfs_device {
BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
+/* readmirror_policy types */
+#define BTRFS_READMIRROR_DEFAULT BTRFS_READMIRROR_BY_PID
+enum btrfs_readmirror_policy_type {
+ BTRFS_READMIRROR_BY_PID,
+};
+
struct btrfs_fs_devices {
u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
u8 metadata_uuid[BTRFS_FSID_SIZE];
@@ -260,6 +266,8 @@ struct btrfs_fs_devices {
struct kobject *devices_kobj;
struct kobject *devinfo_kobj;
struct completion kobj_unregister;
+
+ atomic_t readmirror;
};
#define BTRFS_BIO_INLINE_CSUM_SIZE 64
--
1.8.3.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH 1/3] btrfs: add readmirror type framework
2019-12-20 5:06 ` [PATCH 1/3] btrfs: add readmirror type framework Anand Jain
@ 2019-12-20 14:44 ` Josef Bacik
2020-01-02 10:12 ` Anand Jain
0 siblings, 1 reply; 6+ messages in thread
From: Josef Bacik @ 2019-12-20 14:44 UTC (permalink / raw)
To: Anand Jain, linux-btrfs
On 12/20/19 12:06 AM, Anand Jain wrote:
> As of now we use %pid method to read stripped mirrored data. So
> application's process id determines the stripe id to be read. This type
> of routing typically helps in a system with many small independent
> applications tying to read random data. On the other hand the %pid
> based read IO distribution policy is inefficient if there is a single
> application trying to read large data and the overall disk bandwidth
> remains under utilized.
>
> So this patch introduces a framework where we could add more readmirror
> policies, such as routing the IO based on device's waitqueue or manual
> when we have a read-preferred device or a policy based on the target
> storage caching.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> fs/btrfs/volumes.c | 16 +++++++++++++++-
> fs/btrfs/volumes.h | 8 ++++++++
> 2 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index c95e47aa84f8..0c6caae29248 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1162,6 +1162,8 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
> fs_devices->opened = 1;
> fs_devices->latest_bdev = latest_dev->bdev;
> fs_devices->total_rw_bytes = 0;
> + /* Set the default readmirror policy */
> + atomic_set(&fs_devices->readmirror, BTRFS_READMIRROR_DEFAULT);
There's no reason for this to be atomic, it's just a behavior change, if you
really want to be super safe use READ_ONCE/WRITE_ONCE and have readmirror be
your enum. Thanks,
Josef
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/3] btrfs: add readmirror type framework
2019-12-20 14:44 ` Josef Bacik
@ 2020-01-02 10:12 ` Anand Jain
0 siblings, 0 replies; 6+ messages in thread
From: Anand Jain @ 2020-01-02 10:12 UTC (permalink / raw)
To: Josef Bacik, linux-btrfs
On 12/20/19 10:44 PM, Josef Bacik wrote:
> On 12/20/19 12:06 AM, Anand Jain wrote:
>> As of now we use %pid method to read stripped mirrored data. So
>> application's process id determines the stripe id to be read. This type
>> of routing typically helps in a system with many small independent
>> applications tying to read random data. On the other hand the %pid
>> based read IO distribution policy is inefficient if there is a single
>> application trying to read large data and the overall disk bandwidth
>> remains under utilized.
>>
>> So this patch introduces a framework where we could add more readmirror
>> policies, such as routing the IO based on device's waitqueue or manual
>> when we have a read-preferred device or a policy based on the target
>> storage caching.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> fs/btrfs/volumes.c | 16 +++++++++++++++-
>> fs/btrfs/volumes.h | 8 ++++++++
>> 2 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index c95e47aa84f8..0c6caae29248 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1162,6 +1162,8 @@ static int open_fs_devices(struct
>> btrfs_fs_devices *fs_devices,
>> fs_devices->opened = 1;
>> fs_devices->latest_bdev = latest_dev->bdev;
>> fs_devices->total_rw_bytes = 0;
>> + /* Set the default readmirror policy */
>> + atomic_set(&fs_devices->readmirror, BTRFS_READMIRROR_DEFAULT);
> There's no reason for this to be atomic, it's just a behavior change, if
> you really want to be super safe use READ_ONCE/WRITE_ONCE and have
> readmirror be your enum. Thanks,
Agreed fs_devices::readmirror doesn't have to be atmoic_t. Fixed this
to declare it as u8 in v2.
Thanks, Anand
> Josef
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/3] btrfs: sysfs, add readmirror kobject
2019-12-20 5:06 [PATCH 0/3] readmirror feature (sysfs and in-memory only approach) Anand Jain
2019-12-20 5:06 ` [PATCH 1/3] btrfs: add readmirror type framework Anand Jain
@ 2019-12-20 5:06 ` Anand Jain
2019-12-20 5:06 ` [PATCH 3/3] btrfs: sysfs, create by_pid readmirror attribute Anand Jain
2 siblings, 0 replies; 6+ messages in thread
From: Anand Jain @ 2019-12-20 5:06 UTC (permalink / raw)
To: linux-btrfs
Add
/sys/fs/btrfs/UUID/readmirror
kobject so that we can be the readmirror policies as attributes under it.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/sysfs.c | 22 ++++++++++++++++++++++
fs/btrfs/volumes.h | 1 +
2 files changed, 23 insertions(+)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index d414b98fb27f..da5e1938e9b9 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -355,6 +355,10 @@ static ssize_t supported_checksums_show(struct kobject *kobj,
#endif
+static const struct attribute *btrfs_readmirror_attrs[] = {
+ NULL,
+};
+
static ssize_t btrfs_show_u64(u64 *value_ptr, spinlock_t *lock, char *buf)
{
u64 val;
@@ -772,6 +776,13 @@ void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info)
{
btrfs_reset_fs_info_ptr(fs_info);
+ if (fs_info->fs_devices->readmirror_kobj) {
+ sysfs_remove_files(fs_info->fs_devices->readmirror_kobj,
+ btrfs_readmirror_attrs);
+ kobject_del(fs_info->fs_devices->readmirror_kobj);
+ kobject_put(fs_info->fs_devices->readmirror_kobj);
+ }
+
if (fs_info->space_info_kobj) {
sysfs_remove_files(fs_info->space_info_kobj, allocation_attrs);
kobject_del(fs_info->space_info_kobj);
@@ -1224,6 +1235,17 @@ int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info)
if (error)
goto failure;
+ fs_devs->readmirror_kobj = kobject_create_and_add("readmirror",
+ &fs_devs->fsid_kobj);
+ if (!fs_devs->readmirror_kobj) {
+ error = -ENOMEM;
+ goto failure;
+ }
+ error = sysfs_create_files(fs_info->readmirror_kobj,
+ btrfs_readmirror_attrs);
+ if (error)
+ goto failure;
+
return 0;
failure:
btrfs_sysfs_remove_mounted(fs_info);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index d9c4c4e1dbc2..5a9fca16a8a6 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -268,6 +268,7 @@ struct btrfs_fs_devices {
struct completion kobj_unregister;
atomic_t readmirror;
+ struct kobject *readmirror_kobj;
};
#define BTRFS_BIO_INLINE_CSUM_SIZE 64
--
1.8.3.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH 3/3] btrfs: sysfs, create by_pid readmirror attribute
2019-12-20 5:06 [PATCH 0/3] readmirror feature (sysfs and in-memory only approach) Anand Jain
2019-12-20 5:06 ` [PATCH 1/3] btrfs: add readmirror type framework Anand Jain
2019-12-20 5:06 ` [PATCH 2/3] btrfs: sysfs, add readmirror kobject Anand Jain
@ 2019-12-20 5:06 ` Anand Jain
2 siblings, 0 replies; 6+ messages in thread
From: Anand Jain @ 2019-12-20 5:06 UTC (permalink / raw)
To: linux-btrfs
Add the exisiting %pid based readmirror policy as an attribute
/sys/fs/btrfs/UUID/readmirror/by_pid
When read, this returns 0 or 1. 1 indicates the currently active policy.
When written with 1, it sets pid as the current active policy and when
written 0 it resets to the default readmirror policy which as of now is
pid as well. For any other value it returns EINVAL.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/sysfs.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 53 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index da5e1938e9b9..5a09cf868328 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -355,7 +355,59 @@ static ssize_t supported_checksums_show(struct kobject *kobj,
#endif
+#define readmirror_to_fs_devices(_kobj) to_fs_devs((_kobj)->parent)
+/*
+ * Set the readmirror type to BTRFS_READMIRROR_BY_PID
+ */
+static ssize_t btrfs_sysfs_readmirror_by_pid_store(struct kobject *kobj,
+ struct kobj_attribute *a,
+ const char *buf, size_t count)
+{
+ int ret;
+ unsigned long val;
+ struct btrfs_fs_devices *fs_devices;
+
+ fs_devices = readmirror_to_fs_devices(kobj);
+
+ ret = kstrtoul(skip_spaces(buf), 0, &val);
+ if (ret)
+ return ret;
+
+ switch (val) {
+ case 0:
+ atomic_set(&fs_devices->readmirror, BTRFS_READMIRROR_DEFAULT);
+ break;
+ case 1:
+ atomic_set(&fs_devices->readmirror, BTRFS_READMIRROR_BY_PID);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return count;
+}
+
+static ssize_t btrfs_sysfs_readmirror_by_pid_show(struct kobject *kobj,
+ struct kobj_attribute *a,
+ char *buf)
+{
+ int val;
+ struct btrfs_fs_devices *fs_devices;
+
+ fs_devices = readmirror_to_fs_devices(kobj);
+
+ if (atomic_read(&fs_devices->readmirror) == BTRFS_READMIRROR_BY_PID)
+ val = 1;
+ else
+ val = 0;
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", val);
+}
+BTRFS_ATTR_RW(readmirror, by_pid, btrfs_sysfs_readmirror_by_pid_show,
+ btrfs_sysfs_readmirror_by_pid_store);
+
static const struct attribute *btrfs_readmirror_attrs[] = {
+ BTRFS_ATTR_PTR(readmirror, by_pid),
NULL,
};
@@ -1241,7 +1293,7 @@ int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info)
error = -ENOMEM;
goto failure;
}
- error = sysfs_create_files(fs_info->readmirror_kobj,
+ error = sysfs_create_files(fs_devs->readmirror_kobj,
btrfs_readmirror_attrs);
if (error)
goto failure;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 6+ messages in thread