Linux RAID subsystem development
* An old "write-mostly" read balance issue
From: Dark Penguin @ 2015-02-08 15:40 UTC (permalink / raw)
  To: linux-raid

There is an old issue about RAID1 read-balancing when "write-mostly" 
disks are present.

The problem is, according to the manual, "md driver will avoid reading 
from these devices if at all possible".

One way to understand this statement is that these drives will never be 
read from, except when the main drive cannot be read from. There are A 
LOT of situations when this is the expected and desired behaviour:
- People mirroring an SSD with an HDD and suffering a performance loss;
- People mirroring a fast HDD with a slow HDD for reliability, for 
example, mirroring a 300GB WD Raptor to a 300GB partition on a 3TB 5900 RPM 
"green" drive for backup; since the larger drive may be used for 
something other than this RAID, many would prefer it to be spared the 
workload.
- In my case, I have a home RAID1 storage, which is idle 95% of the 
time, and 95% of the remaining 5% I only read from it. So I want one of 
the drives to spin down and never turn on, in order to avoid wearing 
down the mechanics. They say, "The best way to keep a device from 
breaking is to turn it off and not use it". :) But even if I simply 
retrieve the contents of my volume, that request is apparently enough to 
load the first drive to 100% for a split second, which causes the second 
drive to spin up, which is extremely undesirable.

I've spent a lot of time looking for the answer to "why does it spin up", 
and "normal forum users" could not even help me, but then I found out 
that there is another way to read that statement: apparently, there are 
other people who would like to see whatever little benefit reading from 
the second drive could give them. I cannot say which side is the 
majority, but I respect their wishes as well, and personally I'm fine 
with any default behaviour as long as I have what I need.

I've found a patch for that:
http://marc.info/?l=linux-raid&m=135982797322422
Apparently, it can be used with any kernel, but I'm not good enough to 
make sure nothing's broken every time I upgrade the kernel, and frankly, 
I think there are A LOT of people who wish to see the behaviour I would 
expect. So my plea is for the developers to accept this patch and make 
this behaviour optional, if not default. At least give us a compile 
option to build the kernel this way! There are people out there who use 
RAID1 at home and not in production, and therefore care less about 
performance than home storage idling, and who understand the words "if 
at all possible" in the more obvious way! I think that's the whole 
reason why the "write-mostly" option is there in the first place, but if 
there are people who don't agree with me - I'm not going to argue, they 
can have it their way, just give us the option to do what we want, too!


-- 
darkpenguin

* Re: An old "write-mostly" read balance issue
From: NeilBrown @ 2015-02-23  0:03 UTC (permalink / raw)
  To: Dark Penguin; +Cc: linux-raid, tomas.hodek

On Sun, 08 Feb 2015 18:40:07 +0300 Dark Penguin <darkpenguin@yandex.ru> wrote:

> There is an old issue about RAID1 read-balancing when "write-mostly" 
> disks are present.
> 

Hi,
 thanks for reporting this.  It is definitely a bug.  It was introduced by 

commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
    md/raid1: read balance chooses idlest disk for SSD


I don't recall seeing the patch from Tomas Hodek which you provided a link
for - sorry Tomas.

I prefer the second of the two patches.  I will submit the following to Linus
some time this week.

Thanks for pursuing this, Dark Penguin.

NeilBrown

From: Tomas Hodek <tomas.hodek@volny.cz>
Date: Mon, 23 Feb 2015 11:00:38 +1100
Subject: [PATCH] md/raid1: fix read balance when a drive is
 write-mostly.

When a drive is marked write-mostly it should only be the
target of reads if there is no other option.

This behaviour was broken by

commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
    md/raid1: read balance chooses idlest disk for SSD

which causes a write-mostly device to be *preferred* in some cases.

Restore correct behaviour by checking and setting
best_dist_disk and best_pending_disk rather than best_disk.

We only need to test one of these as they are both changed
from -1 to >=0 at the same time.

As we leave min_pending and best_dist unchanged, any non-write-mostly
device will appear better than the write-mostly device.

Reported-by: tomas.hodek@volny.cz
Reported-by: Dark Penguin <darkpenguin@yandex.ru>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: http://marc.info/?l=linux-raid&m=135982797322422
Fixes: 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
Cc: stable@vger.kernel.org (3.6+)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0b6349f9c5c5..7742e0999bf2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -560,7 +560,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 		if (test_bit(WriteMostly, &rdev->flags)) {
 			/* Don't balance among write-mostly, just
 			 * use the first as a last resort */
-			if (best_disk < 0) {
+			if (best_dist_disk < 0) {
 				if (is_badblock(rdev, this_sector, sectors,
 						&first_bad, &bad_sectors)) {
 					if (first_bad < this_sector)
@@ -569,7 +569,8 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 					best_good_sectors = first_bad - this_sector;
 				} else
 					best_good_sectors = sectors;
-				best_disk = disk;
+				best_dist_disk = disk;
+				best_pending_disk = disk;
 			}
 			continue;
 		}

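For readers following the commit message above, the effect of the change can be sketched with a toy model. This is only a simplified illustration under stated assumptions, not the kernel code: struct toy_disk, toy_read_balance() and the "fixed" switch are hypothetical names made up for this sketch, and bad-block handling, sequential-read detection and resync checks are deliberately left out.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-mirror state. */
struct toy_disk {
	bool write_mostly;    /* models the WriteMostly flag */
	bool nonrot;          /* models a non-rotational (SSD) device */
	unsigned int pending; /* models the number of requests in flight */
	long dist;            /* models head distance to the target sector */
};

/*
 * Toy version of the selection loop.  With fixed == false the
 * write-mostly branch records its fallback in best_disk (behaviour
 * before the patch); with fixed == true it records the fallback in
 * best_dist_disk and best_pending_disk (behaviour after the patch).
 * Returns the index of the chosen disk, or -1 if there is none.
 */
static int toy_read_balance(const struct toy_disk *disks, int n, bool fixed)
{
	int best_disk = -1, best_dist_disk = -1, best_pending_disk = -1;
	unsigned int min_pending = UINT_MAX;
	long best_dist = LONG_MAX;
	bool has_nonrot_disk = false;

	for (int i = 0; i < n; i++) {
		const struct toy_disk *d = &disks[i];

		if (d->write_mostly) {
			/* Last resort only: remember the first one seen. */
			if (!fixed) {
				if (best_disk < 0)
					best_disk = i;
			} else if (best_dist_disk < 0) {
				best_dist_disk = i;
				best_pending_disk = i;
			}
			continue;
		}

		has_nonrot_disk |= d->nonrot;

		/* An idle regular disk wins immediately. */
		if (d->pending == 0) {
			best_disk = i;
			break;
		}
		/* min_pending and best_dist are only touched by regular
		 * disks, so any regular disk beats the fallback. */
		if (d->pending < min_pending) {
			min_pending = d->pending;
			best_pending_disk = i;
		}
		if (d->dist < best_dist) {
			best_dist = d->dist;
			best_dist_disk = i;
		}
	}

	if (best_disk == -1)
		best_disk = has_nonrot_disk ? best_pending_disk : best_dist_disk;
	return best_disk;
}
```

In this model, with one busy regular disk and one write-mostly disk, the fixed == false variant returns the write-mostly disk, while the fixed == true variant returns the regular one. If only write-mostly disks are present, both variants still fall back to the first of them.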

* Re: An old "write-mostly" read balance issue
From: Tomáš Hodek @ 2015-02-24  8:20 UTC (permalink / raw)
  To: NeilBrown, Dark Penguin; +Cc: linux-raid


On 23.2.2015 at 01:03, NeilBrown wrote:
> Hi,
>   thanks for reporting this.  It is definitely a bug.  It was introduced by
>
>

Hello,

Thank you for recognizing the current write-mostly behaviour as a bug.

If there is anything I can help with, please let me know.


Best regards,
Tomas
