* An old "write-mostly" read balance issue
From: Dark Penguin @ 2015-02-08 15:40 UTC (permalink / raw)
To: linux-raid
There is an old issue about RAID1 read-balancing when "write-mostly"
disks are present.
The problem is, according to the manual, "md driver will avoid reading
from these devices if at all possible".
One way to understand this statement is that these drives will never be
read from, except when the main drive can not be read from. There are A
LOT of situations when this is the expected and desired behaviour:
- People mirroring an SSD with an HDD and suffering a performance loss;
- People mirroring a fast HDD with a slow HDD for reliability, for
example, mirroring a 300GB WD Raptor to a 300GB partition on a 3TB 5900 RPM
"green" drive for backup; since the larger drive may be used for
something other than this RAID, many would prefer it to be spared the
workload.
- In my case, I have a home RAID1 storage, which is idle 95% of the
time, and 95% of the remaining 5% I only read from it. So I want one of
the drives to spin down and never turn on, in order to avoid wearing
down the mechanics. They say, "The best way to keep a device from
breaking is to turn it off and not use it". :) But even if I simply
retrieve the contents of my volume, that request is apparently enough to
load the first drive to 100% for a split second, which causes the second
drive to spin up, which is extremely undesirable.
I've spent a lot of time looking for the answer "why does it spin up",
and "normal forum users" could not even help me, but then I found out
that there is another way to read that statement: apparently, there are
other people who would like to see whatever little benefit reading from
the second drive could give them. I can not say which side is a
majority, but I respect their wishes as well, and personally I'm fine
with any default behaviour as long as I have what I need.
I've found a patch for that:
http://marc.info/?l=linux-raid&m=135982797322422
Apparently, it can be used with any kernel, but I'm not good enough to
make sure nothing's broken every time I upgrade the kernel, and frankly,
I think there are A LOT of people who wish to see the behaviour I would
expect. So my plea is for the developers to accept this patch and make
this behaviour optional, if not default. At least give us a compile
option to build the kernel this way! There are people out there who use
RAID1 at home and not in production, and therefore care less about
performance than home storage idling, and who understand the words "if
at all possible" in the more obvious way! I think that's the whole
reason why the "write-mostly" option is there in the first place, but if
there are people who don't agree with me - I'm not going to argue, they
can have it their way, just give us the option to do what we want, too!
--
darkpenguin
* Re: An old "write-mostly" read balance issue
From: NeilBrown @ 2015-02-23 0:03 UTC (permalink / raw)
To: Dark Penguin; +Cc: linux-raid, tomas.hodek
On Sun, 08 Feb 2015 18:40:07 +0300 Dark Penguin <darkpenguin@yandex.ru> wrote:
> There is an old issue about RAID1 read-balancing when "write-mostly"
> disks are present.
>
> The problem is, according to the manual, "md driver will avoid reading
> from these devices if at all possible".
>
> One way to understand this statement is that these drives will never be
> read from, except when the main drive can not be read from. There are A
> LOT of situations when this is the expected and desired behaviour:
> - People mirroring an SSD with an HDD and suffering a performance loss;
> - People mirroring a fast HDD with a slow HDD for reliability, for
> example, mirroring a 300GB WD Raptor to a 300GB partition on a 3TB 5900 RPM
> "green" drive for backup; since the larger drive may be used for
> something other than this RAID, many would prefer it to be spared the
> workload.
> - In my case, I have a home RAID1 storage, which is idle 95% of the
> time, and 95% of the remaining 5% I only read from it. So I want one of
> the drives to spin down and never turn on, in order to avoid wearing
> down the mechanics. They say, "The best way to keep a device from
> breaking is to turn it off and not use it". :) But even if I simply
> retrieve the contents of my volume, that request is apparently enough to
> load the first drive to 100% for a split second, which causes the second
> drive to spin up, which is extremely undesirable.
>
> I've spent a lot of time looking for the answer "why does it spin up",
> and "normal forum users" could not even help me, but then I found out
> that there is another way to read that statement: apparently, there are
> other people who would like to see whatever little benefit reading from
> the second drive could give them. I can not say which side is a
> majority, but I respect their wishes as well, and personally I'm fine
> with any default behaviour as long as I have what I need.
>
> I've found a patch for that:
> http://marc.info/?l=linux-raid&m=135982797322422
> Apparently, it can be used with any kernel, but I'm not good enough to
> make sure nothing's broken every time I upgrade the kernel, and frankly,
> I think there are A LOT of people who wish to see the behaviour I would
> expect. So my plea is for the developers to accept this patch and make
> this behaviour optional, if not default. At least give us a compile
> option to build the kernel this way! There are people out there who use
> RAID1 at home and not in production, and therefore care less about
> performance than home storage idling, and who understand the words "if
> at all possible" in the more obvious way! I think that's the whole
> reason why the "write-mostly" option is there in the first place, but if
> there are people who don't agree with me - I'm not going to argue, they
> can have it their way, just give us the option to do what we want, too!
>
>
Hi,
thanks for reporting this. It is definitely a bug. It was introduced by
commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
md/raid1: read balance chooses idlest disk for SSD
I don't recall seeing the patch from Tomas Hodek which you provided a link
for - sorry Tomas.
I prefer the second of the two patches. I will submit the following to Linus
some time this week.
Thanks for pursuing this Dark Penguin.
NeilBrown
From: Tomas Hodek <tomas.hodek@volny.cz>
Date: Mon, 23 Feb 2015 11:00:38 +1100
Subject: [PATCH] md/raid1: fix read balance when a drive is write-mostly.
When a drive is marked write-mostly it should only be the
target of reads if there is no other option.
This behaviour was broken by
commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
md/raid1: read balance chooses idlest disk for SSD
which causes a write-mostly device to be *preferred* in some cases.
Restore correct behaviour by checking and setting
best_dist_disk and best_pending_disk rather than best_disk.
We only need to test one of these as they are both changed
from -1 to >=0 at the same time.
As we leave min_pending and best_dist unchanged, any non-write-mostly
device will appear better than the write-mostly device.
Reported-by: tomas.hodek@volny.cz
Reported-by: Dark Penguin <darkpenguin@yandex.ru>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: http://marc.info/?l=linux-raid&m=135982797322422
Fixes: 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
Cc: stable@vger.kernel.org (3.6+)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0b6349f9c5c5..7742e0999bf2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -560,7 +560,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 		if (test_bit(WriteMostly, &rdev->flags)) {
 			/* Don't balance among write-mostly, just
 			 * use the first as a last resort */
-			if (best_disk < 0) {
+			if (best_dist_disk < 0) {
 				if (is_badblock(rdev, this_sector, sectors,
 						&first_bad, &bad_sectors)) {
 					if (first_bad < this_sector)
@@ -569,7 +569,8 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 					best_good_sectors = first_bad - this_sector;
 				} else
 					best_good_sectors = sectors;
-				best_disk = disk;
+				best_dist_disk = disk;
+				best_pending_disk = disk;
 			}
 			continue;
 		}
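
For readers who want to see why the one-word change matters, here is a
minimal user-space sketch of the selection loop. It is not the kernel code:
struct mirror, pick_read_disk() and demo_choice() are hypothetical names
made up for this illustration, and bad-block handling, sequential-read
detection and the other shortcuts in read_balance() are left out. Only the
variable names best_disk, best_dist_disk, best_pending_disk, best_dist and
min_pending come from the patch; the surrounding loop is a simplified
assumption about how they are used.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for one RAID1 member device. */
struct mirror {
	bool write_mostly;     /* models the WriteMostly flag */
	bool nonrot;           /* true for a non-rotational device (SSD) */
	unsigned int pending;  /* requests currently in flight */
	long head_position;    /* last known head position, in sectors */
};

/*
 * Simplified model of the disk selection loop.
 * fixed == false: write-mostly branch tests and sets best_disk (pre-patch).
 * fixed == true:  it tests best_dist_disk and sets both fallback candidates.
 */
static int pick_read_disk(const struct mirror *m, int n, long sector, bool fixed)
{
	int best_disk = -1, best_dist_disk = -1, best_pending_disk = -1;
	long best_dist = LONG_MAX;
	unsigned int min_pending = UINT_MAX;
	bool has_nonrot = false;

	assert(m != NULL && n > 0);

	for (int disk = 0; disk < n; disk++) {
		if (m[disk].write_mostly) {
			if (!fixed) {
				/* Pre-patch: a set best_disk skips the fallback below. */
				if (best_disk < 0)
					best_disk = disk;
			} else if (best_dist_disk < 0) {
				/* Patched: record a last-resort candidate only;
				 * best_dist and min_pending keep their maxima. */
				best_dist_disk = disk;
				best_pending_disk = disk;
			}
			continue;
		}

		has_nonrot |= m[disk].nonrot;

		/* An idle regular device is used immediately. */
		if (m[disk].pending == 0) {
			best_disk = disk;
			break;
		}

		long dist = labs(sector - m[disk].head_position);

		if (m[disk].pending < min_pending) {
			min_pending = m[disk].pending;
			best_pending_disk = disk;
		}
		if (dist < best_dist) {
			best_dist = dist;
			best_dist_disk = disk;
		}
	}

	/* Fallback: least pending if any SSD was seen, else closest head. */
	if (best_disk == -1)
		best_disk = has_nonrot ? best_pending_disk : best_dist_disk;
	return best_disk;
}

/* Disk 0 is a busy regular drive, disk 1 an idle write-mostly mirror. */
static int demo_choice(bool fixed)
{
	const struct mirror m[] = {
		{ .write_mostly = false, .nonrot = false, .pending = 3, .head_position = 1000 },
		{ .write_mostly = true,  .nonrot = false, .pending = 0, .head_position = 0 },
	};
	return pick_read_disk(m, 2, 5000, fixed);
}
```

In this model demo_choice(false) returns 1: the busy primary only registers
as a fallback candidate, the write-mostly disk then claims best_disk, and
the fallback is never consulted. demo_choice(true) returns 0, because the
write-mostly disk can no longer displace a regular device that has already
been seen, and a regular device seen later always beats the untouched
maxima.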
* Re: An old "write-mostly" read balance issue
From: Tomáš Hodek @ 2015-02-24 8:20 UTC (permalink / raw)
To: NeilBrown, Dark Penguin; +Cc: linux-raid
On 23.2.2015 at 01:03, NeilBrown wrote:
> Hi,
> thanks for reporting this. It is definitely a bug. It was introduced by
>
>
Hello,
Thank you for classifying the current write-mostly behaviour as a bug.
If there is anything I can help with, please let me know.
Best regards,
Tomas