* Re: An old "write-mostly" read balance issue
2015-02-08 15:40 An old "write-mostly" read balance issue Dark Penguin
@ 2015-02-23 0:03 ` NeilBrown
2015-02-24 8:20 ` Tomáš Hodek
From: NeilBrown @ 2015-02-23 0:03 UTC
To: Dark Penguin; +Cc: linux-raid, tomas.hodek
On Sun, 08 Feb 2015 18:40:07 +0300 Dark Penguin <darkpenguin@yandex.ru> wrote:
> There is an old issue about RAID1 read-balancing when "write-mostly"
> disks are present.
>
> The problem is, according to the manual, "md driver will avoid reading
> from these devices if at all possible".
>
> One way to understand this statement is that these drives will never be
> read from, except when the main drive cannot be read from. There are A
> LOT of situations when this is the expected and desired behaviour:
> - People mirroring an SSD with an HDD and suffering a performance loss;
> - People mirroring a fast HDD with a slow HDD for reliability, for
> example, mirroring a 300 GB WD Raptor to a 300 GB partition on a 3 TB 5900 RPM
> "green" drive for backup; since the larger drive may be used for
> something other than this RAID, many would prefer it to be spared the
> workload.
> - In my case, I have a home RAID1 storage, which is idle 95% of the
> time, and 95% of the remaining 5% I only read from it. So I want one of
> the drives to spin down and never turn on, in order to avoid wearing
> down the mechanics. They say, "The best way to keep a device from
> breaking is to turn it off and not use it". :) But even if I simply
> retrieve the contents of my volume, that request is apparently enough to
> load the first drive to 100% for a split second, which causes the second
> drive to spin up, which is extremely undesirable.
>
> I've spent a lot of time looking for the answer "why does it spin up",
> and "normal forum users" could not even help me, but then I found out
> that there is another way to read that statement: apparently, there are
> other people who would like to see whatever little benefit reading from
> the second drive could give them. I cannot say which side is in the
> majority, but I respect their wishes as well, and personally I'm fine
> with any default behaviour as long as I have what I need.
>
> I've found a patch for that:
> http://marc.info/?l=linux-raid&m=135982797322422
> Apparently, it can be used with any kernel, but I'm not good enough to
> make sure nothing's broken every time I upgrade the kernel, and frankly,
> I think there are A LOT of people who wish to see the behaviour I would
> expect. So my plea is for the developers to accept this patch and make
> this behaviour optional, if not default. At least give us a compile
> option to build the kernel this way! There are people out there who use
> RAID1 at home and not in production, and therefore care less about
> performance than home storage idling, and who understand the words "if
> at all possible" in the more obvious way! I think that's the whole
> reason why the "write-mostly" option is there in the first place, but if
> there are people who don't agree with me - I'm not going to argue, they
> can have it their way, just give us the option to do what we want, too!
>
>
Hi,
thanks for reporting this. It is definitely a bug. It was introduced by
commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
md/raid1: read balance chooses idlest disk for SSD
I don't recall seeing the patch from Tomas Hodek which you provided a link
for - sorry Tomas.
I prefer the second of the two patches. I will submit the following to Linus
some time this week.
Thanks for pursuing this Dark Penguin.
NeilBrown
From: Tomas Hodek <tomas.hodek@volny.cz>
Date: Mon, 23 Feb 2015 11:00:38 +1100
Subject: [PATCH] md/raid1: fix read balance when a drive is write-mostly.
When a drive is marked write-mostly it should only be the
target of reads if there is no other option.
This behaviour was broken by
commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
md/raid1: read balance chooses idlest disk for SSD
which causes a write-mostly device to be *preferred* in some cases.
Restore correct behaviour by checking and setting
best_dist_disk and best_pending_disk rather than best_disk.
We only need to test one of these as they are both changed
from -1 to >=0 at the same time.
As we leave min_pending and best_dist unchanged, any non-write-mostly
device will appear better than the write-mostly device.
Reported-by: tomas.hodek@volny.cz
Reported-by: Dark Penguin <darkpenguin@yandex.ru>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: http://marc.info/?l=linux-raid&m=135982797322422
Fixes: 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
Cc: stable@vger.kernel.org (3.6+)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0b6349f9c5c5..7742e0999bf2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -560,7 +560,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 		if (test_bit(WriteMostly, &rdev->flags)) {
 			/* Don't balance among write-mostly, just
 			 * use the first as a last resort */
-			if (best_disk < 0) {
+			if (best_dist_disk < 0) {
 				if (is_badblock(rdev, this_sector, sectors,
 						&first_bad, &bad_sectors)) {
 					if (first_bad < this_sector)
@@ -569,7 +569,8 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 					best_good_sectors = first_bad - this_sector;
 				} else
 					best_good_sectors = sectors;
-				best_disk = disk;
+				best_dist_disk = disk;
+				best_pending_disk = disk;
 			}
 			continue;
 		}