Raid1 doesn't balance under high load [patch]

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Raid1 doesn't balance under high load [patch]
@ 2004-06-10 11:30 Miquel van Smoorenburg
       [not found] ` <000c01c44ef0$195820a0$0201a8c0@windows>
  2004-06-11  1:26 ` Neil Brown
  0 siblings, 2 replies; 6+ messages in thread
From: Miquel van Smoorenburg @ 2004-06-10 11:30 UTC (permalink / raw)
  To: linux-raid

I have several servers installed with a bootable raid1 array. I noticed
that under high load, the load wasn't balanced over the 2 disks in
the array anymore - all reads went to just one of the disks.

The problem is in raid1.c:read_balance().

There's a check to see if the array is in sync:

        /*
         * Check if it if we can balance. We can balance on the whole
         * device if no resync is going on, or below the resync window.
         * We take the first readable disk when above the resync window.
         */
        if (!conf->mddev->in_sync && (this_sector + sectors >= conf->next_resync)) {

Now if you write to the array, the array is marked not in sync by
md.c:md_write_start(). conf->next_resync is initialized to zero, so
that means read balancing doesn't work anymore.

Now I think there should be a separate flag called 'resync_in_progress'
that flags whether, well, a resync is in progress ;) and the whole
read_balance() function should be simplified as well since it tests
the same things about three times, but for now here's a simple one-liner
that fixes it:

--- linux-2.6.6/drivers/md/raid1.c.orig	2004-05-10 04:32:37.000000000 +0200
+++ linux-2.6.6/drivers/md/raid1.c	2004-06-10 01:16:00.000000000 +0200
@@ -1172,6 +1172,7 @@
 		mddev->recovery_cp = MaxSector;
 
 	conf->resync_lock = SPIN_LOCK_UNLOCKED;
+	conf->next_resync = mddev->size << 1;
 	init_waitqueue_head(&conf->wait_idle);
 	init_waitqueue_head(&conf->wait_resume);
 

Signed-Off-By: Miquel van Smoorenburg <miquels@cistron.nl>

(please keep me cc'ed, I'm not on the list)

Mike.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Raid1 doesn't balance under high load [patch]
@ 2004-06-10 13:38 TJ Harrell
  0 siblings, 0 replies; 6+ messages in thread
From: TJ Harrell @ 2004-06-10 13:38 UTC (permalink / raw)
  To: linux-raid

I don't know the details of how writing works in the code. I do know that
consecutive writes must be issued to both disks, though. This means that
there is a concievable time when one disk is written to and one is not,
making the array unclean. I'm assuming that this is the reason that the
array is marked unclean during a write. If you disable this, I would bet
that it will create a race condition where you may read from a disk before
it is written to and get old data. This would probably cause data
corruption, no?

----- Original Message ----- 
From: Miquel van Smoorenburg
To: linux-raid@vger.kernel.org
Sent: Thursday, June 10, 2004 7:30 AM
Subject: Raid1 doesn't balance under high load [patch]

I have several servers installed with a bootable raid1 array. I noticed
that under high load, the load wasn't balanced over the 2 disks in
the array anymore - all reads went to just one of the disks.

The problem is in raid1.c:read_balance().

There's a check to see if the array is in sync:

        /*
         * Check if it if we can balance. We can balance on the whole
         * device if no resync is going on, or below the resync window.
         * We take the first readable disk when above the resync window.
         */
        if (!conf->mddev->in_sync && (this_sector + sectors >=
conf->next_resync)) {

Now if you write to the array, the array is marked not in sync by
md.c:md_write_start(). conf->next_resync is initialized to zero, so
that means read balancing doesn't work anymore.

Now I think there should be a separate flag called 'resync_in_progress'
that flags whether, well, a resync is in progress ;) and the whole
read_balance() function should be simplified as well since it tests
the same things about three times, but for now here's a simple one-liner
that fixes it:

--- linux-2.6.6/drivers/md/raid1.c.orig 2004-05-10 04:32:37.000000000 +0200
+++ linux-2.6.6/drivers/md/raid1.c 2004-06-10 01:16:00.000000000 +0200
@@ -1172,6 +1172,7 @@
  mddev->recovery_cp = MaxSector;

  conf->resync_lock = SPIN_LOCK_UNLOCKED;
+ conf->next_resync = mddev->size << 1;
  init_waitqueue_head(&conf->wait_idle);
  init_waitqueue_head(&conf->wait_resume);

Signed-Off-By: Miquel van Smoorenburg <miquels@cistron.nl>

(please keep me cc'ed, I'm not on the list)

Mike.
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Raid1 doesn't balance under high load [patch]
       [not found] ` <000c01c44ef0$195820a0$0201a8c0@windows>
@ 2004-06-10 14:42   ` Miquel van Smoorenburg
  0 siblings, 0 replies; 6+ messages in thread
From: Miquel van Smoorenburg @ 2004-06-10 14:42 UTC (permalink / raw)
  To: TJ Harrell; +Cc: linux-raid

According to TJ Harrell:
> I don't know the details of how writing works in the code. I do know that
> consecutive writes must be issued to both disks, though. This means that
> there is a concievable time when one disk is written to and one is not,
> making the array unclean. I'm assuming that this is the reason that the
> array is marked unclean during a write. If you disable this, I would bet
> that it will create a race condition where you may read from a disk before
> it is written to and get old data. This would probably cause data
> corruption, no?

Not when going through the buffer/page cache, which acts as a syncpoint.
If you write a page to disk, it is first put in the pagecache as
"dirty" page. Once the write is complete (meaning both disks have
written it) the page is marked as clean.

If some other process wants to read the same data from disk, it
will go through the buffer/page cache and get the data from the
(dirty or in the mean time clean) page that is there already.
It won't go behind the buffer/pagecache's back to read the data
straight from disk ...

Think about it - all writes to disk-like devices are queued, so if
this didn't happen you'd have that problem all the time.

So the code in there isn't about preventing that at all. It is
really checking for 'is there a rebuild in progress' but it simply
gets that wrong ..

Mike.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Raid1 doesn't balance under high load [patch]
  2004-06-10 11:30 Miquel van Smoorenburg
       [not found] ` <000c01c44ef0$195820a0$0201a8c0@windows>
@ 2004-06-11  1:26 ` Neil Brown
  2004-06-14 11:30   ` Miquel van Smoorenburg
  1 sibling, 1 reply; 6+ messages in thread
From: Neil Brown @ 2004-06-11  1:26 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-raid

On Thursday June 10, miquels@cistron.nl wrote:
> I have several servers installed with a bootable raid1 array. I noticed
> that under high load, the load wasn't balanced over the 2 disks in
> the array anymore - all reads went to just one of the disks.
> 
> The problem is in raid1.c:read_balance().
> 
> There's a check to see if the array is in sync:
> 
>         /*
>          * Check if it if we can balance. We can balance on the whole
>          * device if no resync is going on, or below the resync window.
>          * We take the first readable disk when above the resync window.
>          */
>         if (!conf->mddev->in_sync && (this_sector + sectors >= conf->next_resync)) {
> 
> Now if you write to the array, the array is marked not in sync by
> md.c:md_write_start(). conf->next_resync is initialized to zero, so
> that means read balancing doesn't work anymore.

Yes, you are right.  Thanks.
Your patch is close, but not quite right (though it would be hard to
cause it to actually fail).
I think it patch is more correct.  I would appreciate it if you could
confirm that it works for you.

Thanks again,
NeilBrown

==========================================
Fix raid1 read_balancing code.

The meaning of mddev->in_sync changed subtly a while ago, and raid1
wasn't changed to match.  This results in raid1 read_balancing
not working properly.
This patch corrects the relevant test.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/raid1.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~	2004-06-11 11:20:34.000000000 +1000
+++ ./drivers/md/raid1.c	2004-06-11 11:21:57.000000000 +1000
@@ -375,7 +375,8 @@ static int read_balance(conf_t *conf, st
 	 * device if no resync is going on, or below the resync window.
 	 * We take the first readable disk when above the resync window.
 	 */
-	if (!conf->mddev->in_sync && (this_sector + sectors >= conf->next_resync)) {
+	if (!conf->mddev->recovery_cp < MaxSector &&
+	    (this_sector + sectors >= conf->next_resync)) {
 		/* make sure that disk is operational */
 		new_disk = 0;
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Raid1 doesn't balance under high load [patch]
  2004-06-11  1:26 ` Neil Brown
@ 2004-06-14 11:30   ` Miquel van Smoorenburg
  2004-06-14 23:48     ` Neil Brown
  0 siblings, 1 reply; 6+ messages in thread
From: Miquel van Smoorenburg @ 2004-06-14 11:30 UTC (permalink / raw)
  To: Neil Brown; +Cc: Miquel van Smoorenburg, linux-raid

On 2004.06.11 03:26, Neil Brown wrote:
> On Thursday June 10, miquels@cistron.nl wrote:
> > The problem is in raid1.c:read_balance().
> > 
> > Now if you write to the array, the array is marked not in sync by
> > md.c:md_write_start(). conf->next_resync is initialized to zero, so
> > that means read balancing doesn't work anymore.
> 
> Yes, you are right.  Thanks.
> Your patch is close, but not quite right (though it would be hard to
> cause it to actually fail).
> I think it patch is more correct.  I would appreciate it if you could
> confirm that it works for you.

It doesn't because of a small typo:

> diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
> --- ./drivers/md/raid1.c~current~	2004-06-11 11:20:34.000000000 +1000
> +++ ./drivers/md/raid1.c	2004-06-11 11:21:57.000000000 +1000
> @@ -375,7 +375,8 @@ static int read_balance(conf_t *conf, st
>  	 * device if no resync is going on, or below the resync window.
>  	 * We take the first readable disk when above the resync window.
>  	 */
> -	if (!conf->mddev->in_sync && (this_sector + sectors >= conf->next_resync)) {
> +	if (!conf->mddev->recovery_cp < MaxSector &&
> +	    (this_sector + sectors >= conf->next_resync)) {
>  		/* make sure that disk is operational */
>  		new_disk = 0;

I think that should read:


-	if (!conf->mddev->in_sync && (this_sector + sectors >= conf->next_resync)) {
+	if (conf->mddev->recovery_cp < MaxSector &&
+	    this_sector + sectors >= conf->next_resync) {

and then it works.

Mike.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Raid1 doesn't balance under high load [patch]
  2004-06-14 11:30   ` Miquel van Smoorenburg
@ 2004-06-14 23:48     ` Neil Brown
  0 siblings, 0 replies; 6+ messages in thread
From: Neil Brown @ 2004-06-14 23:48 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-raid

On Monday June 14, miquels@cistron.nl wrote:
> On 2004.06.11 03:26, Neil Brown wrote:
> > On Thursday June 10, miquels@cistron.nl wrote:
> > > The problem is in raid1.c:read_balance().
> > > 
> > > Now if you write to the array, the array is marked not in sync by
> > > md.c:md_write_start(). conf->next_resync is initialized to zero, so
> > > that means read balancing doesn't work anymore.
> > 
> > Yes, you are right.  Thanks.
> > Your patch is close, but not quite right (though it would be hard to
> > cause it to actually fail).
> > I think it patch is more correct.  I would appreciate it if you could
> > confirm that it works for you.
> 
> It doesn't because of a small typo:
> 
> > diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
> > --- ./drivers/md/raid1.c~current~	2004-06-11 11:20:34.000000000 +1000
> > +++ ./drivers/md/raid1.c	2004-06-11 11:21:57.000000000 +1000
> > @@ -375,7 +375,8 @@ static int read_balance(conf_t *conf, st
> >  	 * device if no resync is going on, or below the resync window.
> >  	 * We take the first readable disk when above the resync window.
> >  	 */
> > -	if (!conf->mddev->in_sync && (this_sector + sectors >= conf->next_resync)) {
> > +	if (!conf->mddev->recovery_cp < MaxSector &&
> > +	    (this_sector + sectors >= conf->next_resync)) {
> >  		/* make sure that disk is operational */
> >  		new_disk = 0;
> 
> I think that should read:
> 
> 
> -	if (!conf->mddev->in_sync && (this_sector + sectors >= conf->next_resync)) {
> +	if (conf->mddev->recovery_cp < MaxSector &&
> +	    this_sector + sectors >= conf->next_resync) {
> 
> and then it works.

Yes.... thanks for the fix and thanks for testing it.  I appreciate
it.
I'll send the correct patch off to Andrew Morton.

NeilBrown

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2004-06-14 23:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-06-10 13:38 Raid1 doesn't balance under high load [patch] TJ Harrell
  -- strict thread matches above, loose matches on Subject: below --
2004-06-10 11:30 Miquel van Smoorenburg
     [not found] ` <000c01c44ef0$195820a0$0201a8c0@windows>
2004-06-10 14:42   ` Miquel van Smoorenburg
2004-06-11  1:26 ` Neil Brown
2004-06-14 11:30   ` Miquel van Smoorenburg
2004-06-14 23:48     ` Neil Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).