public inbox for linux-btrfs@vger.kernel.org
From: Wang Yugui <wangyugui@e16-tech.com>
To: Anand Jain <anand.jain@oracle.com>
Cc: linux-btrfs@vger.kernel.org, louis@waffle.tech
Subject: Re: [PATCH 6/7] btrfs: introduce new read_policy device
Date: Tue, 27 Oct 2020 15:11:59 +0800
Message-ID: <20201027151158.C1D4.409509F4@e16-tech.com>
In-Reply-To: <eacd759d436260ccd586d52c9d2500e63b4aa614.1603751876.git.anand.jain@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 1341 bytes --]

Hi, Anand Jain
Cc: Louis Jencka

> Read-policy type 'device' and device flag 'read-preferred':
> 
> The read-policy type device picks the device(s) flagged as
> read-preferred for reading chunks of type raid1, raid10,
> raid1c3 and raid1c4.
> 
> A system might contain SSD, nvme, iscsi or san lun, and which are all
> a non-rotational device, so it is not a good idea to set the read-preferred
> automatically. Instead device read-policy along with the read-preferred
> flag provides an ability to do it manually. This advance tuning is
> useful in more than one situation, for example,
>  - In heterogeneous-disk volume, it provides an ability to manually choose
>     the low latency disks for reading.
>  - Useful for more accurate testing.
>  - Avoid known problematic device from reading the chunk until it is
>    replaced (by marking the other good devices as read-preferred).

Is it still OK to set this automatically for the most common case, a mix
of SSD and HDD?

I am trying a 'fall back to manual if auto-detection fails' approach, using
a 'u8' variable rather than a 'bool' variable.
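
To illustrate why a 'u8' leaves room that a 'bool' does not, here is a
minimal userspace sketch, not the kernel code. The struct dev_props, the
enum names and the user_preferred field are hypothetical. The values
0/10/50 follow the attached patch, and 127 follows its FIXME note about a
user-defined maximum. Letting the manual setting win over detection is
only one possible reading of the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Score values: 0/10/50 as in the attached patch, 127 from its FIXME. */
enum {
	TIER_SCORE_HDD = 0,
	TIER_SCORE_SSD = 10,
	TIER_SCORE_DAX = 50,
	TIER_SCORE_USER_MAX = 127,
};

/* Hypothetical stand-in for what the block layer reports per device. */
struct dev_props {
	bool dax;		/* device supports DAX */
	bool nonrot;		/* device is non-rotational */
	bool user_preferred;	/* assumed manual override set by the admin */
};

/* Return the tier score; higher is faster. Manual override wins. */
static uint8_t tier_score(const struct dev_props *p)
{
	assert(p);

	if (p->user_preferred)
		return TIER_SCORE_USER_MAX;
	if (p->dax)
		return TIER_SCORE_DAX;
	if (p->nonrot)
		return TIER_SCORE_SSD;
	return TIER_SCORE_HDD;
}
```

With a 'bool' only two classes can be told apart. The 'u8' keeps the
automatic ordering (HDD < SSD < DAX) and still leaves a value above all of
them for a device the user marks by hand.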

There are 2 patches I am working on, but they are not yet complete.

One of them is based on 'btrfs: balance RAID1/RAID10 mirror
selection' from Louis Jencka <louis@waffle.tech>.

Please feel free to merge them into your series as a new patch.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2020/10/27


[-- Attachment #2: 0001-btrfs-add-tier-score-to-device.patch --]
[-- Type: application/octet-stream, Size: 2530 bytes --]

From 8a8f6405073f835531664aafa333570fba28c31f Mon Sep 17 00:00:00 2001
From: wangyugui <wangyugui@e16-tech.com>
Date: Tue, 27 Oct 2020 08:14:46 +0800
Subject: [PATCH 1/3] btrfs: add tier score to device

We use a single score value to define the tier level of a device.
A different score means a different tier, and a bigger score means a faster device.
    DAX device(dax=1)
    SSD device(rotational=0)
    HDD device(rotational=1)
TODO/FIXME: Different bus(DIMM/NVMe/SAS/SATA/VirtIO/...) support.
TODO/FIXME: Different media detail(SSD MLC/TLC/..; HDD PMR/SMR/...) support.
TODO/FIXME: User-assigned property to mark as the top tier score.

In most cases, only 1 or 2 tiers are in use at the same time, so we group them
into the top tier and the other tier(s).
---
 fs/btrfs/volumes.c | 18 ++++++++++++++++++
 fs/btrfs/volumes.h |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1997a7d..d767c99 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -608,6 +608,22 @@ static int btrfs_free_stale_devices(const char *path,
 	return ret;
 }
 
+/*
+ * Set the tier score of the device; higher is faster.
+ * FIXME: detect bus (DIMM/NVMe(40)/SCSI(30)/SATA(20)/..)
+ * FIXME: media detail (SSD SLC/MLC/..)
+ * FIXME: user-defined property to set the max score of 127
+ */
+static void dev_get_tier_score(struct btrfs_device *device, struct request_queue *q)
+{
+	if (blk_queue_dax(q))
+		device->tier_score = 50;
+	else if (blk_queue_nonrot(q))
+		device->tier_score = 10;
+	else
+		device->tier_score = 0;
+}
+
 /*
  * This is only used on mount, and we are protected from competing things
  * messing with our fs_devices by the uuid_mutex, thus we do not need the
@@ -660,6 +676,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	}
 
 	q = bdev_get_queue(bdev);
+	dev_get_tier_score(device, q);
 	if (!blk_queue_nonrot(q))
 		fs_devices->rotating = true;
 
@@ -2598,6 +2615,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 
 	atomic64_add(device->total_bytes, &fs_info->free_chunk_space);
 
+	dev_get_tier_score(device, q);
 	if (!blk_queue_nonrot(q))
 		fs_devices->rotating = true;
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 302c923..5ffa429 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -138,6 +138,8 @@ struct btrfs_device {
 	struct completion kobj_unregister;
 	/* For sysfs/FSID/devinfo/devid/ */
 	struct kobject devid_kobj;
+
+	u8 tier_score; /* storage tier_score; higher is faster */
 };
 
 /*
-- 
2.29.1


[-- Attachment #3: 0003-btrfs-tier-aware-mirror-path-select.patch --]
[-- Type: application/octet-stream, Size: 2182 bytes --]

From 4ac2fc0a3be670e0960928012f5f156b50f2c69d Mon Sep 17 00:00:00 2001
From: wangyugui <wangyugui@e16-tech.com>
Date: Tue, 27 Oct 2020 09:33:21 +0800
Subject: [PATCH 3/3] btrfs: tier-aware mirror path select

This extends the patch 'btrfs: balance RAID1/RAID10 mirror selection' from louis@waffle.tech
- add the tier-aware feature: round-robin only among the top-tier mirrors
---
 fs/btrfs/volumes.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)
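
The selection this patch aims for can be tried outside the kernel. Below is
a minimal userspace sketch under stated assumptions, not the patch itself:
pick_mirror(), MAX_MIRRORS and the plain array of scores are hypothetical
stand-ins for find_live_mirror() and the per-device tier_score, and C11
atomics stand in for the kernel's atomic_t. It returns an index relative to
the first mirror of the group and ignores the dev-replace handling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_MIRRORS 4	/* raid1c4 has the most copies */

/* Shared round-robin counter, standing in for the patch's rr_counter. */
static atomic_uint rr_counter;

/*
 * Pick the mirror to read from: collect the mirrors that share the
 * highest tier score, then rotate among them round-robin.
 */
static int pick_mirror(const uint8_t *scores, int num_mirrors)
{
	int top_idxs[MAX_MIRRORS];
	int top_count = 0;	/* must start at 0, see the all-zero case */
	uint8_t top_score = 0;
	int i;

	assert(num_mirrors > 0 && num_mirrors <= MAX_MIRRORS);

	for (i = 0; i < num_mirrors; i++) {
		if (scores[i] > top_score) {
			/* New best tier: restart the candidate list. */
			top_score = scores[i];
			top_idxs[0] = i;
			top_count = 1;
		} else if (scores[i] == top_score) {
			top_idxs[top_count++] = i;
		}
	}

	if (top_count == 1)
		return top_idxs[0];
	return top_idxs[atomic_fetch_add(&rr_counter, 1) % top_count];
}
```

With scores {0, 10} every read goes to the second mirror. With {10, 10, 0}
reads alternate between the first two mirrors and never reach the third.
When all scores are 0, the candidate count has to start at 0 for the list
to be built correctly, which is why top_count is initialized.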

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d380b20..cc4a791 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5562,6 +5562,9 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	return ret;
 }
 
+/* Used for round-robin balancing RAID1/RAID10 reads. */
+static atomic_t rr_counter = ATOMIC_INIT(0);
+
 static int find_live_mirror(struct btrfs_fs_info *fs_info,
 			    struct map_lookup *map, int first,
 			    int dev_replace_is_ongoing)
@@ -5572,6 +5575,11 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 	int tolerance;
 	struct btrfs_device *srcdev;
 
+	/* tier-aware */
+	int top_tier_num_stripes = 0;
+	int top_tier_stripe_idxs[4];
+	u8 top_tier_score = 0;
+
 	ASSERT((map->type &
 		 (BTRFS_BLOCK_GROUP_RAID1_MASK | BTRFS_BLOCK_GROUP_RAID10)));
 
@@ -5580,7 +5588,29 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 	else
 		num_stripes = map->num_stripes;
 
-	preferred_mirror = first + current->pid % num_stripes;
+	for (i = 0; i < num_stripes; ++i)
+	{
+		if (map->stripes[first + i].dev->tier_score > top_tier_score)
+		{
+			top_tier_score = map->stripes[first + i].dev->tier_score;
+			top_tier_stripe_idxs[0] = i;
+			top_tier_num_stripes = 1;
+		}
+		else if (map->stripes[first + i].dev->tier_score == top_tier_score)
+		{
+			top_tier_stripe_idxs[top_tier_num_stripes] = i;
+			top_tier_num_stripes++;
+		}
+	}
+	preferred_mirror = first;
+	if (top_tier_num_stripes > 1)
+	{
+		preferred_mirror += top_tier_stripe_idxs[((unsigned)atomic_inc_return(&rr_counter)) % top_tier_num_stripes];
+	}
+	else
+	{
+		preferred_mirror += top_tier_stripe_idxs[0];
+	}
 
 	if (dev_replace_is_ongoing &&
 	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==
-- 
2.29.1


Thread overview: 17+ messages
2020-10-26 23:55 [PATCH RFC 0/7] btrfs: read_policy types latency, device and Anand Jain
2020-10-26 23:55 ` [PATCH RFC 1/7] block: export part_stat_read_all Anand Jain
2020-10-27 18:09   ` Josef Bacik
2020-10-28  8:26     ` Anand Jain
2020-10-26 23:55 ` [PATCH RFC 2/7] block: export part_stat_read_inflight Anand Jain
2020-10-27 18:10   ` Josef Bacik
2020-10-28  8:32     ` Anand Jain
2020-10-26 23:55 ` [PATCH RFC 3/7] btrfs: add read_policy latency Anand Jain
2020-10-27 18:20   ` Josef Bacik
2020-10-26 23:55 ` [PATCH RFC 4/7] btrfs: trace, add event btrfs_read_policy Anand Jain
2020-10-27 18:22   ` Josef Bacik
2020-10-28  8:59     ` Anand Jain
2020-10-28 12:41       ` Josef Bacik
2020-10-26 23:55 ` [PATCH 5/7] btrfs: introduce new device-state read_preferred Anand Jain
2020-10-26 23:55 ` [PATCH 6/7] btrfs: introduce new read_policy device Anand Jain
2020-10-27  7:11   ` Wang Yugui [this message]
2020-10-26 23:55 ` [PATCH RFC 7/7] btrfs: introduce new read_policy round-robin Anand Jain
