From: Anand Jain <anand.jain@oracle.com>
To: linux-btrfs@vger.kernel.org
Cc: dsterba@suse.com, josef@toxicpanda.com
Subject: [PATCH v4 1/3] btrfs: add read_policy latency
Date: Tue, 19 Jan 2021 23:52:05 -0800 [thread overview]
Message-ID: <63f6f00e2ecc741efd2200c3c87b5db52c6be2fd.1611114341.git.anand.jain@oracle.com> (raw)
In-Reply-To: <cover.1611114341.git.anand.jain@oracle.com>
The read policy type latency routes the read IO based on the historical
average wait-time experienced by the read IOs through the individual
device. This patch obtains the historical read IO stats from the kernel
block layer and calculates its average.
Example usage:
echo "latency" > /sys/fs/btrfs/$uuid/read_policy
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
v4: For btrfs_debug_rl() use fs_info instead of
device->fs_devices->fs_info.
v3: The block layer commit 0d02129e76ed (block: merge struct block_device and
struct hd_struct) has changed the first argument in the function
part_stat_read_all() in 5.11-rc1. So the compilation will fail. This patch
fixes it.
Commit log updated.
v2: Use btrfs_debug_rl() instead of btrfs_info_rl()
It is better we have this debug until we test this on at least few
hardwares.
Drop the unrelated changes.
Update change log.
rfc->v1: Drop part_stat_read_all instead use part_stat_read
Drop inflight
fs/btrfs/sysfs.c | 3 ++-
fs/btrfs/volumes.c | 38 ++++++++++++++++++++++++++++++++++++++
fs/btrfs/volumes.h | 2 ++
3 files changed, 42 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 4522a1c4cd08..7c0324fe97b2 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -915,7 +915,8 @@ static bool strmatch(const char *buffer, const char *string)
return false;
}
-static const char * const btrfs_read_policy_name[] = { "pid" };
+/* Must follow the order as in enum btrfs_read_policy */
+static const char * const btrfs_read_policy_name[] = { "pid", "latency" };
static ssize_t btrfs_read_policy_show(struct kobject *kobj,
struct kobj_attribute *a, char *buf)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 62d6a890fc50..f361f1c87eb6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -14,6 +14,7 @@
#include <linux/semaphore.h>
#include <linux/uuid.h>
#include <linux/list_sort.h>
+#include <linux/part_stat.h>
#include "misc.h"
#include "ctree.h"
#include "extent_map.h"
@@ -5490,6 +5491,39 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
return ret;
}
+static int btrfs_find_best_stripe(struct btrfs_fs_info *fs_info,
+ struct map_lookup *map, int first,
+ int num_stripe)
+{
+ u64 est_wait = 0;
+ int best_stripe = 0;
+ int index;
+
+ for (index = first; index < first + num_stripe; index++) {
+ u64 read_wait;
+ u64 avg_wait = 0;
+ unsigned long read_ios;
+ struct btrfs_device *device = map->stripes[index].dev;
+
+ read_wait = part_stat_read(device->bdev, nsecs[READ]);
+ read_ios = part_stat_read(device->bdev, ios[READ]);
+
+ if (read_wait && read_ios && read_wait >= read_ios)
+ avg_wait = div_u64(read_wait, read_ios);
+ else
+ btrfs_debug_rl(fs_info,
+ "devid: %llu avg_wait ZERO read_wait %llu read_ios %lu",
+ device->devid, read_wait, read_ios);
+
+ if (est_wait == 0 || est_wait > avg_wait) {
+ est_wait = avg_wait;
+ best_stripe = index;
+ }
+ }
+
+ return best_stripe;
+}
+
static int find_live_mirror(struct btrfs_fs_info *fs_info,
struct map_lookup *map, int first,
int dev_replace_is_ongoing)
@@ -5519,6 +5553,10 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
case BTRFS_READ_POLICY_PID:
preferred_mirror = first + (current->pid % num_stripes);
break;
+ case BTRFS_READ_POLICY_LATENCY:
+ preferred_mirror = btrfs_find_best_stripe(fs_info, map, first,
+ num_stripes);
+ break;
}
if (dev_replace_is_ongoing &&
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1997a4649a66..71ba1f0e93f4 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -222,6 +222,8 @@ enum btrfs_chunk_allocation_policy {
enum btrfs_read_policy {
/* Use process PID to choose the stripe */
BTRFS_READ_POLICY_PID,
+ /* Find and use device with the lowest latency */
+ BTRFS_READ_POLICY_LATENCY,
BTRFS_NR_READ_POLICY,
};
--
2.28.0
next prev parent reply other threads:[~2021-01-20 7:53 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-20 7:52 [PATCH v4 0/3] btrfs: read_policy types latency, device and round-robin Anand Jain
2021-01-20 7:52 ` Anand Jain [this message]
2021-01-20 12:14 ` [PATCH v4 1/3] btrfs: add read_policy latency David Sterba
2021-01-21 10:10 ` Anand Jain
2021-01-21 17:52 ` David Sterba
2021-01-22 8:10 ` Anand Jain
2021-01-30 1:08 ` Anand Jain
2021-02-04 12:30 ` Anand Jain
2021-02-09 21:12 ` Michal Rostecki
2021-02-10 6:14 ` Anand Jain
2021-01-20 7:52 ` [PATCH v4 2/3] btrfs: introduce new device-state read_preferred Anand Jain
2021-01-21 10:19 ` Anand Jain
2021-01-20 7:52 ` [PATCH v4 3/3] btrfs: introduce new read_policy device Anand Jain
2021-01-20 12:34 ` [PATCH v4 0/3, full-cover-letter] btrfs: read_policy types latency, device and round-robin Anand Jain
2021-01-22 5:52 ` Anand Jain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=63f6f00e2ecc741efd2200c3c87b5db52c6be2fd.1611114341.git.anand.jain@oracle.com \
--to=anand.jain@oracle.com \
--cc=dsterba@suse.com \
--cc=josef@toxicpanda.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).