Linux RAID subsystem development
From: NeilBrown <neilb@suse.de>
To: Dark Penguin <darkpenguin@yandex.ru>
Cc: linux-raid@vger.kernel.org, tomas.hodek@volny.cz
Subject: Re: An old "write-mostly" read balance issue
Date: Mon, 23 Feb 2015 11:03:32 +1100
Message-ID: <20150223110332.4c135de9@notabene.brown>
In-Reply-To: <54D78357.7020808@yandex.ru>

On Sun, 08 Feb 2015 18:40:07 +0300 Dark Penguin <darkpenguin@yandex.ru> wrote:

> There is an old issue about RAID1 read-balancing when "write-mostly" 
> disks are present.
> 
> The problem is, according to the manual, "md driver will avoid reading 
> from these devices if at all possible".
> 
> One way to understand this statement is that these drives will never be 
> read from, except when the main drive can not be read from. There are A 
> LOT of situations when this is the expected and desired behaviour:
> - People mirroring an SSD with an HDD and suffering a performance loss;
> - People mirroring a fast HDD with a slow HDD for reliability, for 
> example, mirroring a 300GB WD Raptor to a 300GB partition on a 3TB 5900 
> "green" drive for backup; since the larger drive may be used for 
> something other than this RAID, many would prefer it to be spared the 
> workload.
> - In my case, I have a home RAID1 storage, which is idle 95% of the 
> time, and 95% of the remaining 5% I only read from it. So I want one of 
> the drives to spin down and never turn on, in order to avoid wearing 
> down the mechanics. They say, "The best way to keep a device from 
> breaking is to turn it off and not use it". :) But even if I simply 
> retrieve the contents of my volume, that request is apparently enough to 
> load the first drive to 100% for a split second, which causes the second 
> drive to spin up, which is extremely undesirable.
> 
> I've spent a lot of time looking for the answer "why does it spin up", 
> and "normal forum users" could not even help me, but then I found out 
> that there is another way to read that statement: apparently, there are 
> other people who would like to see whatever little benefit reading from 
> the second drive could give them. I can not say which side is a 
> majority, but I respect their wishes as well, and personally I'm fine 
> with any default behaviour as long as I have what I need.
> 
> I've found a patch for that:
> http://marc.info/?l=linux-raid&m=135982797322422
> Apparently, it can be used with any kernel, but I'm not good enough to 
> make sure nothing's broken every time I upgrade the kernel, and frankly, 
> I think there are A LOT of people who wish to see the behaviour I would 
> expect. So my plea is for the developers to accept this patch and make 
> this behaviour optional, if not default. At least give us a compile 
> option to build the kernel this way! There are people out there who use 
> RAID1 at home and not in production, and therefore care less about 
> performance than home storage idling, and who understand the words "if 
> at all possible" in the more obvious way! I think that's the whole 
> reason why the "write-mostly" option is there in the first place, but if 
> there are people who don't agree with me - I'm not going to argue, they 
> can have it their way, just give us the option to do what we want, too!
> 
> 

Hi,
 thanks for reporting this.  It is definitely a bug.  It was introduced by 

commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
    md/raid1: read balance chooses idlest disk for SSD


 I don't recall seeing the patch from Tomas Hodek that you linked to;
sorry, Tomas.

I prefer the second of the two patches.  I will submit the following to Linus
some time this week.

Thanks for pursuing this Dark Penguin.

NeilBrown

From: Tomas Hodek <tomas.hodek@volny.cz>
Date: Mon, 23 Feb 2015 11:00:38 +1100
Subject: [PATCH] md/raid1: fix read balance when a drive is
 write-mostly.

When a drive is marked write-mostly it should only be the
target of reads if there is no other option.

This behaviour was broken by

commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
    md/raid1: read balance chooses idlest disk for SSD

which causes a write-mostly device to be *preferred* in some cases.

Restore correct behaviour by checking and setting
best_dist_disk and best_pending_disk rather than best_disk.

We only need to test one of these as they are both changed
from -1 to >=0 at the same time.

As we leave min_pending and best_dist unchanged, any non-write-mostly
device will appear better than the write-mostly device.

Reported-by: tomas.hodek@volny.cz
Reported-by: Dark Penguin <darkpenguin@yandex.ru>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: http://marc.info/?l=linux-raid&m=135982797322422
Fixes: 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
Cc: stable@vger.kernel.org (3.6+)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0b6349f9c5c5..7742e0999bf2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -560,7 +560,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 		if (test_bit(WriteMostly, &rdev->flags)) {
 			/* Don't balance among write-mostly, just
 			 * use the first as a last resort */
-			if (best_disk < 0) {
+			if (best_dist_disk < 0) {
 				if (is_badblock(rdev, this_sector, sectors,
 						&first_bad, &bad_sectors)) {
 					if (first_bad < this_sector)
@@ -569,7 +569,8 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 					best_good_sectors = first_bad - this_sector;
 				} else
 					best_good_sectors = sectors;
-				best_disk = disk;
+				best_dist_disk = disk;
+				best_pending_disk = disk;
 			}
 			continue;
 		}
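
The effect of the change above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the kernel code:
struct model_disk, pick_read_disk() and the "fixed" switch are hypothetical
names invented for illustration, and bad-block handling, sequential-read
detection and resync are left out. It also assumes that the final fallback
picks the least-pending disk when any non-write-mostly disk is
non-rotational, and the nearest disk otherwise.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-mirror state the real loop inspects. */
struct model_disk {
	bool write_mostly;	/* models the WriteMostly flag */
	bool nonrot;		/* models a non-rotational (SSD) device */
	unsigned int pending;	/* models the number of in-flight requests */
	unsigned long dist;	/* models head distance to the target sector */
};

/*
 * Simplified model of the selection loop (an assumption-laden sketch).
 * fixed == false: a write-mostly disk sets best_disk directly (old code).
 * fixed == true:  it only seeds the fallback candidates (patched code).
 * Returns the index of the chosen disk, or -1 if there is none.
 */
static int pick_read_disk(const struct model_disk *disks, size_t n, bool fixed)
{
	int best_disk = -1;
	int best_dist_disk = -1;
	int best_pending_disk = -1;
	unsigned long best_dist = ULONG_MAX;
	unsigned int min_pending = UINT_MAX;
	bool has_nonrot_disk = false;

	for (size_t i = 0; i < n; i++) {
		const struct model_disk *d = &disks[i];

		if (d->write_mostly) {
			if (fixed) {
				/* Seed fallbacks only; best_dist and
				 * min_pending stay at their maximum. */
				if (best_dist_disk < 0) {
					best_dist_disk = (int)i;
					best_pending_disk = (int)i;
				}
			} else if (best_disk < 0) {
				best_disk = (int)i;
			}
			continue;
		}

		has_nonrot_disk |= d->nonrot;

		/* An idle normal disk is used immediately. */
		if (d->pending == 0) {
			best_disk = (int)i;
			break;
		}
		if (d->pending < min_pending) {
			min_pending = d->pending;
			best_pending_disk = (int)i;
		}
		if (d->dist < best_dist) {
			best_dist = d->dist;
			best_dist_disk = (int)i;
		}
	}

	/* Nothing chosen outright: fall back to the recorded candidates. */
	if (best_disk == -1)
		best_disk = has_nonrot_disk ? best_pending_disk : best_dist_disk;
	return best_disk;
}
```

For example, with a busy normal disk at index 0 and a write-mostly disk at
index 1, this model returns 1 when fixed is false and 0 when fixed is true.
A lone write-mostly disk is still returned as the last resort in both modes.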
