linux-raid.vger.kernel.org archive mirror
* [patch 1/3]raid5: adjust order of some operations in handle_stripe
@ 2014-05-22 11:24 Shaohua Li
  2014-05-28  2:59 ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Shaohua Li @ 2014-05-22 11:24 UTC (permalink / raw)
  To: linux-raid, neilb


This reverts ef5b7c69b7a1b8b8744a6168b6f. handle_stripe_clean_event()
handles finished stripes, which really should be the first thing to do. The
original changelog says that checking reconstruct_state should come first
because handle_stripe_clean_event() can clear some dev->flags and affect the
code path that checks reconstruct_state. It's unclear to me why that can
happen, because I thought a finished write and a reconstruct_state equal to
*_result cannot occur at the same time.

I also moved the code path that checks reconstruct_state to after
handle_stripe_dirtying(). If that code sets reconstruct_state to
reconstruct_state_idle, the reordering makes us miss one call to
handle_stripe_dirtying(), but the stripe will eventually be handled again
when the write finishes.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 drivers/md/raid5.c |   74 ++++++++++++++++++++++++++---------------------------
 1 file changed, 37 insertions(+), 37 deletions(-)

Index: linux/drivers/md/raid5.c
===================================================================
--- linux.orig/drivers/md/raid5.c	2014-05-22 13:27:21.000000000 +0800
+++ linux/drivers/md/raid5.c	2014-05-22 14:41:05.603276379 +0800
@@ -3747,43 +3747,6 @@ static void handle_stripe(struct stripe_
 			handle_failed_sync(conf, sh, &s);
 	}
 
-	/* Now we check to see if any write operations have recently
-	 * completed
-	 */
-	prexor = 0;
-	if (sh->reconstruct_state == reconstruct_state_prexor_drain_result)
-		prexor = 1;
-	if (sh->reconstruct_state == reconstruct_state_drain_result ||
-	    sh->reconstruct_state == reconstruct_state_prexor_drain_result) {
-		sh->reconstruct_state = reconstruct_state_idle;
-
-		/* All the 'written' buffers and the parity block are ready to
-		 * be written back to disk
-		 */
-		BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags) &&
-		       !test_bit(R5_Discard, &sh->dev[sh->pd_idx].flags));
-		BUG_ON(sh->qd_idx >= 0 &&
-		       !test_bit(R5_UPTODATE, &sh->dev[sh->qd_idx].flags) &&
-		       !test_bit(R5_Discard, &sh->dev[sh->qd_idx].flags));
-		for (i = disks; i--; ) {
-			struct r5dev *dev = &sh->dev[i];
-			if (test_bit(R5_LOCKED, &dev->flags) &&
-				(i == sh->pd_idx || i == sh->qd_idx ||
-				 dev->written)) {
-				pr_debug("Writing block %d\n", i);
-				set_bit(R5_Wantwrite, &dev->flags);
-				if (prexor)
-					continue;
-				if (!test_bit(R5_Insync, &dev->flags) ||
-				    ((i == sh->pd_idx || i == sh->qd_idx)  &&
-				     s.failed == 0))
-					set_bit(STRIPE_INSYNC, &sh->state);
-			}
-		}
-		if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
-			s.dec_preread_active = 1;
-	}
-
 	/*
 	 * might be able to return some write requests if the parity blocks
 	 * are safe, or on a failed drive
@@ -3827,6 +3790,43 @@ static void handle_stripe(struct stripe_
 	if (s.to_write && !sh->reconstruct_state && !sh->check_state)
 		handle_stripe_dirtying(conf, sh, &s, disks);
 
+	/* Now we check to see if any write operations have recently
+	 * completed
+	 */
+	prexor = 0;
+	if (sh->reconstruct_state == reconstruct_state_prexor_drain_result)
+		prexor = 1;
+	if (sh->reconstruct_state == reconstruct_state_drain_result ||
+	    sh->reconstruct_state == reconstruct_state_prexor_drain_result) {
+		sh->reconstruct_state = reconstruct_state_idle;
+
+		/* All the 'written' buffers and the parity block are ready to
+		 * be written back to disk
+		 */
+		BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags) &&
+		       !test_bit(R5_Discard, &sh->dev[sh->pd_idx].flags));
+		BUG_ON(sh->qd_idx >= 0 &&
+		       !test_bit(R5_UPTODATE, &sh->dev[sh->qd_idx].flags) &&
+		       !test_bit(R5_Discard, &sh->dev[sh->qd_idx].flags));
+		for (i = disks; i--; ) {
+			struct r5dev *dev = &sh->dev[i];
+			if (test_bit(R5_LOCKED, &dev->flags) &&
+				(i == sh->pd_idx || i == sh->qd_idx ||
+				 dev->written)) {
+				pr_debug("Writing block %d\n", i);
+				set_bit(R5_Wantwrite, &dev->flags);
+				if (prexor)
+					continue;
+				if (!test_bit(R5_Insync, &dev->flags) ||
+				    ((i == sh->pd_idx || i == sh->qd_idx)  &&
+				     s.failed == 0))
+					set_bit(STRIPE_INSYNC, &sh->state);
+			}
+		}
+		if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
+			s.dec_preread_active = 1;
+	}
+
 	/* maybe we need to check and possibly fix the parity for this stripe
 	 * Any reads will already have been scheduled, so we just see if enough
 	 * data is available.  The parity check is held off while parity


* Re: [patch 1/3]raid5: adjust order of some operations in handle_stripe
  2014-05-22 11:24 [patch 1/3]raid5: adjust order of some operations in handle_stripe Shaohua Li
@ 2014-05-28  2:59 ` NeilBrown
  2014-05-28  3:45   ` Shaohua Li
  0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2014-05-28  2:59 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid


On Thu, 22 May 2014 19:24:31 +0800 Shaohua Li <shli@kernel.org> wrote:

> 
> This is to revert ef5b7c69b7a1b8b8744a6168b6f. handle_stripe_clean_event()
> handles finished stripes, which really should be the first thing to do. The
> original changelog says checking reconstruct_state should be the first as
> handle_stripe_clean_event can clear some dev->flags and impact checking
> reconstruct_state code path. It's unclear to me why this happens, because I
> thought written finish and reconstruct_state equals to *_result can't happen in
> the same time.

"Unclear to me" and "I thought" are not sufficient to justify a change, though
they are certainly sufficient to ask a question.

Are you asking a question or submitting a change?

You may well be correct that if reconstruct_state is not
reconstruct_state_idle, then handle_stripe_clean_event cannot possibly be
called.  In that case, maybe we should change the code flow to make that more
obvious, but certainly the changelog comment should be clear about exactly
why.


> 
> I also moved checking reconstruct_state code path after handle_stripe_dirtying.
> If that code sets reconstruct_state to reconstruct_state_idle, the order change
> will make us miss one handle_stripe_dirtying. But the stripe will be eventually
> handled again when written is finished.

You haven't said here why this patch is a good thing, only why it isn't
obviously bad.  I really need some justification to make a change and you
haven't provided any, at least not in this changelog comment.

Maybe we need a completely different approach.
Instead of repeatedly shuffling code inside handle_stripe(), how about we put
all of handle_stripe inside a loop which runs as long as STRIPE_HANDLE is set
and sh->count == 1.
ie.

	if (test_and_set_bit_lock(STRIPE_ACTIVE, &sh->state)) {
		/* already being handled, ensure it gets handled
		 * again when current action finishes */
		set_bit(STRIPE_HANDLE, &sh->state);
		return;
	}

        do {
	        clear_bit(STRIPE_HANDLE, &sh->state);
                __handle_stripe(sh);
        } while (test_bit(STRIPE_HANDLE, &sh->state)
                 && atomic_read(&sh->count) == 1);
	clear_bit_unlock(STRIPE_ACTIVE, &sh->state);


where the rest of the current handle_stripe() goes in to __handle_stripe().

Would that address your performance concerns, or is there still too much
overhead?

It's not that I think your code is wrong, but I want the result to be
"obviously correct" as much as possible, and currently I don't think it is.

Thanks,
NeilBrown


> 
> Signed-off-by: Shaohua Li <shli@fusionio.com>
> ---
>  drivers/md/raid5.c |   74 ++++++++++++++++++++++++++---------------------------
>  1 file changed, 37 insertions(+), 37 deletions(-)
> 
> Index: linux/drivers/md/raid5.c
> ===================================================================
> --- linux.orig/drivers/md/raid5.c	2014-05-22 13:27:21.000000000 +0800
> +++ linux/drivers/md/raid5.c	2014-05-22 14:41:05.603276379 +0800
> @@ -3747,43 +3747,6 @@ static void handle_stripe(struct stripe_
>  			handle_failed_sync(conf, sh, &s);
>  	}
>  
> -	/* Now we check to see if any write operations have recently
> -	 * completed
> -	 */
> -	prexor = 0;
> -	if (sh->reconstruct_state == reconstruct_state_prexor_drain_result)
> -		prexor = 1;
> -	if (sh->reconstruct_state == reconstruct_state_drain_result ||
> -	    sh->reconstruct_state == reconstruct_state_prexor_drain_result) {
> -		sh->reconstruct_state = reconstruct_state_idle;
> -
> -		/* All the 'written' buffers and the parity block are ready to
> -		 * be written back to disk
> -		 */
> -		BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags) &&
> -		       !test_bit(R5_Discard, &sh->dev[sh->pd_idx].flags));
> -		BUG_ON(sh->qd_idx >= 0 &&
> -		       !test_bit(R5_UPTODATE, &sh->dev[sh->qd_idx].flags) &&
> -		       !test_bit(R5_Discard, &sh->dev[sh->qd_idx].flags));
> -		for (i = disks; i--; ) {
> -			struct r5dev *dev = &sh->dev[i];
> -			if (test_bit(R5_LOCKED, &dev->flags) &&
> -				(i == sh->pd_idx || i == sh->qd_idx ||
> -				 dev->written)) {
> -				pr_debug("Writing block %d\n", i);
> -				set_bit(R5_Wantwrite, &dev->flags);
> -				if (prexor)
> -					continue;
> -				if (!test_bit(R5_Insync, &dev->flags) ||
> -				    ((i == sh->pd_idx || i == sh->qd_idx)  &&
> -				     s.failed == 0))
> -					set_bit(STRIPE_INSYNC, &sh->state);
> -			}
> -		}
> -		if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
> -			s.dec_preread_active = 1;
> -	}
> -
>  	/*
>  	 * might be able to return some write requests if the parity blocks
>  	 * are safe, or on a failed drive
> @@ -3827,6 +3790,43 @@ static void handle_stripe(struct stripe_
>  	if (s.to_write && !sh->reconstruct_state && !sh->check_state)
>  		handle_stripe_dirtying(conf, sh, &s, disks);
>  
> +	/* Now we check to see if any write operations have recently
> +	 * completed
> +	 */
> +	prexor = 0;
> +	if (sh->reconstruct_state == reconstruct_state_prexor_drain_result)
> +		prexor = 1;
> +	if (sh->reconstruct_state == reconstruct_state_drain_result ||
> +	    sh->reconstruct_state == reconstruct_state_prexor_drain_result) {
> +		sh->reconstruct_state = reconstruct_state_idle;
> +
> +		/* All the 'written' buffers and the parity block are ready to
> +		 * be written back to disk
> +		 */
> +		BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags) &&
> +		       !test_bit(R5_Discard, &sh->dev[sh->pd_idx].flags));
> +		BUG_ON(sh->qd_idx >= 0 &&
> +		       !test_bit(R5_UPTODATE, &sh->dev[sh->qd_idx].flags) &&
> +		       !test_bit(R5_Discard, &sh->dev[sh->qd_idx].flags));
> +		for (i = disks; i--; ) {
> +			struct r5dev *dev = &sh->dev[i];
> +			if (test_bit(R5_LOCKED, &dev->flags) &&
> +				(i == sh->pd_idx || i == sh->qd_idx ||
> +				 dev->written)) {
> +				pr_debug("Writing block %d\n", i);
> +				set_bit(R5_Wantwrite, &dev->flags);
> +				if (prexor)
> +					continue;
> +				if (!test_bit(R5_Insync, &dev->flags) ||
> +				    ((i == sh->pd_idx || i == sh->qd_idx)  &&
> +				     s.failed == 0))
> +					set_bit(STRIPE_INSYNC, &sh->state);
> +			}
> +		}
> +		if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
> +			s.dec_preread_active = 1;
> +	}
> +
>  	/* maybe we need to check and possibly fix the parity for this stripe
>  	 * Any reads will already have been scheduled, so we just see if enough
>  	 * data is available.  The parity check is held off while parity



* Re: [patch 1/3]raid5: adjust order of some operations in handle_stripe
  2014-05-28  2:59 ` NeilBrown
@ 2014-05-28  3:45   ` Shaohua Li
  2014-05-28  4:54     ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Shaohua Li @ 2014-05-28  3:45 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, May 28, 2014 at 12:59:37PM +1000, NeilBrown wrote:
> On Thu, 22 May 2014 19:24:31 +0800 Shaohua Li <shli@kernel.org> wrote:
> 
> > 
> > This is to revert ef5b7c69b7a1b8b8744a6168b6f. handle_stripe_clean_event()
> > handles finished stripes, which really should be the first thing to do. The
> > original changelog says checking reconstruct_state should be the first as
> > handle_stripe_clean_event can clear some dev->flags and impact checking
> > reconstruct_state code path. It's unclear to me why this happens, because I
> > thought written finish and reconstruct_state equals to *_result can't happen in
> > the same time.
> 
> "Unclear to me" and "I thought" are not sufficient to justify a change, though
> they are certainly sufficient to ask a question.
> 
> Are you asking a question or submitting a change?
> 
> You may well be correct that if reconstruct_state is not
> reconstruct_state_idle, then handle_stripe_clean_event cannot possibly be
> called.  In that case, maybe we should change the code flow to make that more
> obvious, but certainly the changelog comment should be clear about exactly
> why.

I'm sorry, it's more like a question. I really didn't understand why we have
ef5b7c69b7a1b8b8744a6168b6f, so I'm not 100% sure about it. It would be great
if you could share a hint.
 
> > 
> > I also moved checking reconstruct_state code path after handle_stripe_dirtying.
> > If that code sets reconstruct_state to reconstruct_state_idle, the order change
> > will make us miss one handle_stripe_dirtying. But the stripe will be eventually
> > handled again when written is finished.
> 
> You haven't said here why this patch is a good thing, only why it isn't
> obviously bad.  I really need some justification to make a change and you
> haven't provided any, at least not in this changelog comment.

ok, I'll add more about this.
 
> Maybe we need a completely different approach.
> Instead of repeatedly shuffling code inside handle_stripe(), how about we put
> all of handle_stripe inside a loop which runs as long as STRIPE_HANDLE is set
> and sh->count == 1.
> ie.
> 
> 	if (test_and_set_bit_lock(STRIPE_ACTIVE, &sh->state)) {
> 		/* already being handled, ensure it gets handled
> 		 * again when current action finishes */
> 		set_bit(STRIPE_HANDLE, &sh->state);
> 		return;
> 	}
> 
>         do {
> 	        clear_bit(STRIPE_HANDLE, &sh->state);
>                 __handle_stripe(sh);
>         } while (test_bit(STRIPE_HANDLE, &sh->state)
>                  && atomic_read(&sh->count) == 1);
> 	clear_bit_unlock(STRIPE_ACTIVE, &sh->state);
> 
> 
> where the rest of the current handle_stripe() goes in to __handle_stripe().
> 
> Would that address your performance concerns, or is there still too much
> overhead?

Let me try. One issue here is that we still have massive cache misses when
checking stripe/dev state. I suspect this won't help, but the data will tell.

Thanks,
Shaohua


* Re: [patch 1/3]raid5: adjust order of some operations in handle_stripe
  2014-05-28  3:45   ` Shaohua Li
@ 2014-05-28  4:54     ` NeilBrown
  2014-05-28 10:09       ` Shaohua Li
  0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2014-05-28  4:54 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid


On Wed, 28 May 2014 11:45:07 +0800 Shaohua Li <shli@kernel.org> wrote:

> On Wed, May 28, 2014 at 12:59:37PM +1000, NeilBrown wrote:
> > On Thu, 22 May 2014 19:24:31 +0800 Shaohua Li <shli@kernel.org> wrote:
> > 
> > > 
> > > This is to revert ef5b7c69b7a1b8b8744a6168b6f. handle_stripe_clean_event()
> > > handles finished stripes, which really should be the first thing to do. The
> > > original changelog says checking reconstruct_state should be the first as
> > > handle_stripe_clean_event can clear some dev->flags and impact checking
> > > reconstruct_state code path. It's unclear to me why this happens, because I
> > > thought written finish and reconstruct_state equals to *_result can't happen in
> > > the same time.
> > 
> > "Unclear to me" and "I thought" are not sufficient to justify a change, though
> > they are certainly sufficient to ask a question.
> > 
> > Are you asking a question or submitting a change?
> > 
> > You may well be correct that if reconstruct_state is not
> > reconstruct_state_idle, then handle_stripe_clean_event cannot possibly be
> > called.  In that case, maybe we should change the code flow to make that more
> > obvious, but certainly the changelog comment should be clear about exactly
> > why.
> 
> I'm sorry, it's more like a question. I really didn't understand why we have
> ef5b7c69b7a1b8b8744a6168b6f, so I'm not 100% sure about. It would be great you
> can help share a hint.

It was a while ago and I don't remember, but I suspect I added that patch
because handle_stripe_clean_event was about to be changed to clear R5_UPTODATE,
and this code, which was previously *after* handle_stripe_clean_event, tested
R5_UPTODATE (and could BUG if it wasn't set).

You may well be right that the two pieces of code cannot both run in the one
invocation of handle_stripe().  I haven't analysed the code closely to be
sure, but on casual reflection it seems likely.  However we always need to be
careful of races in unusual situations.

If that is correct, and if there are two (or more) different situations in
which handle_stripe runs, maybe one after IO has completed and one after
reconstruction has completed, and one when new devices have been added,
then there might be value in clearly delineating these so we don't bother
testing for cases that cannot happen.

If it is not correct, then your proposed change might be dangerous.


>  
> > > 
> > > I also moved checking reconstruct_state code path after handle_stripe_dirtying.
> > > If that code sets reconstruct_state to reconstruct_state_idle, the order change
> > > will make us miss one handle_stripe_dirtying. But the stripe will be eventually
> > > handled again when written is finished.
> > 
> > You haven't said here why this patch is a good thing, only why it isn't
> > obviously bad.  I really need some justification to make a change and you
> > haven't provided any, at least not in this changelog comment.
> 
> ok, I'll add more about this.
>  
> > Maybe we need a completely different approach.
> > Instead of repeatedly shuffling code inside handle_stripe(), how about we put
> > all of handle_stripe inside a loop which runs as long as STRIPE_HANDLE is set
> > and sh->count == 1.
> > ie.
> > 
> > 	if (test_and_set_bit_lock(STRIPE_ACTIVE, &sh->state)) {
> > 		/* already being handled, ensure it gets handled
> > 		 * again when current action finishes */
> > 		set_bit(STRIPE_HANDLE, &sh->state);
> > 		return;
> > 	}
> > 
> >         do {
> > 	        clear_bit(STRIPE_HANDLE, &sh->state);
> >                 __handle_stripe(sh);
> >         } while (test_bit(STRIPE_HANDLE, &sh->state)
> >                  && atomic_read(&sh->count) == 1);
> > 	clear_bit_unlock(STRIPE_ACTIVE, &sh->state);
> > 
> > 
> > where the rest of the current handle_stripe() goes in to __handle_stripe().
> > 
> > Would that address your performance concerns, or is there still too much
> > overhead?
> 
> Let me try. One issue here is we still have massive cache miss when checking
> stripe/dev state. I suppose this doesn't help but data should prove.

That would be great - thanks.
If you can identify exactly where the cache misses are causing a problem, we
might be able to optimise around that.

NeilBrown


* Re: [patch 1/3]raid5: adjust order of some operations in handle_stripe
  2014-05-28  4:54     ` NeilBrown
@ 2014-05-28 10:09       ` Shaohua Li
  0 siblings, 0 replies; 5+ messages in thread
From: Shaohua Li @ 2014-05-28 10:09 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, May 28, 2014 at 02:54:35PM +1000, NeilBrown wrote:
> On Wed, 28 May 2014 11:45:07 +0800 Shaohua Li <shli@kernel.org> wrote:
> 
> > On Wed, May 28, 2014 at 12:59:37PM +1000, NeilBrown wrote:
> > > On Thu, 22 May 2014 19:24:31 +0800 Shaohua Li <shli@kernel.org> wrote:
> > > 
> > > > 
> > > > This is to revert ef5b7c69b7a1b8b8744a6168b6f. handle_stripe_clean_event()
> > > > handles finished stripes, which really should be the first thing to do. The
> > > > original changelog says checking reconstruct_state should be the first as
> > > > handle_stripe_clean_event can clear some dev->flags and impact checking
> > > > reconstruct_state code path. It's unclear to me why this happens, because I
> > > > thought written finish and reconstruct_state equals to *_result can't happen in
> > > > the same time.
> > > 
> > > "Unclear to me" and "I thought" are not sufficient to justify a change, though
> > > they are certainly sufficient to ask a question.
> > > 
> > > Are you asking a question or submitting a change?
> > > 
> > > You may well be correct that if reconstruct_state is not
> > > reconstruct_state_idle, then handle_stripe_clean_event cannot possibly be
> > > called.  In that case, maybe we should change the code flow to make that more
> > > obvious, but certainly the changelog comment should be clear about exactly
> > > why.
> > 
> > I'm sorry, it's more like a question. I really didn't understand why we have
> > ef5b7c69b7a1b8b8744a6168b6f, so I'm not 100% sure about. It would be great you
> > can help share a hint.
> 
> It's a while ago and I don't remember, but I suspect that I added that patch
> because handle_stripe_clean_event was about to change to clear R5_UPTODATE,
> and this code which was previously *after* handle_stripe_clean_event tested
> R5_UPTODATE (and could BUG if it wasn't set).
> 
> You may well be right that the two pieces of code cannot both run in the one
> invocation of handle_stripe().  I haven't analysed the code closely to be
> sure, but on casual reflection it seems likely.  However we always need to be
> careful of races in unusual situations.

Sure. The whole body of handle_stripe() is hard to understand.

> If that is correct, and if there are two (or more) different situations in
> which handle_stripe runs, maybe one after IO has completed and one after
> reconstruction has completed, and one when new devices have been added,
> then there might be value in clearly delineating these so we don't bother
> testing for cases that cannot happen.

Since run_io() increases the stripe counter, the stripe can't be handled
again while its I/O is running. So when the I/O finishes, handle_stripe
handles the finished I/O (handle_stripe_clean_event) and then handles new
requests in the stripe; doing handle_stripe_clean_event first is ok. Could
another reconstruction be scheduled before the I/O completes? It seems not:
either the stripe has towrite set, in which case there is an overlap and a
new bio to the stripe will wait, or the stripe has LOCKED set, so a
reconstruction can't be rescheduled.

> 
> If it is not correct, then your proposed change might be dangerous.
> 
> 
> >  
> > > > 
> > > > I also moved checking reconstruct_state code path after handle_stripe_dirtying.
> > > > If that code sets reconstruct_state to reconstruct_state_idle, the order change
> > > > will make us miss one handle_stripe_dirtying. But the stripe will be eventually
> > > > handled again when written is finished.
> > > 
> > > You haven't said here why this patch is a good thing, only why it isn't
> > > obviously bad.  I really need some justification to make a change and you
> > > haven't provided any, at least not in this changelog comment.
> > 
> > ok, I'll add more about this.
> >  
> > > Maybe we need a completely different approach.
> > > Instead of repeatedly shuffling code inside handle_stripe(), how about we put
> > > all of handle_stripe inside a loop which runs as long as STRIPE_HANDLE is set
> > > and sh->count == 1.
> > > ie.
> > > 
> > > 	if (test_and_set_bit_lock(STRIPE_ACTIVE, &sh->state)) {
> > > 		/* already being handled, ensure it gets handled
> > > 		 * again when current action finishes */
> > > 		set_bit(STRIPE_HANDLE, &sh->state);
> > > 		return;
> > > 	}
> > > 
> > >         do {
> > > 	        clear_bit(STRIPE_HANDLE, &sh->state);
> > >                 __handle_stripe(sh);
> > >         } while (test_bit(STRIPE_HANDLE, &sh->state)
> > >                  && atomic_read(&sh->count) == 1);
> > > 	clear_bit_unlock(STRIPE_ACTIVE, &sh->state);
> > > 
> > > 
> > > where the rest of the current handle_stripe() goes in to __handle_stripe().
> > > 
> > > Would that address your performance concerns, or is there still too much
> > > overhead?
> > 
> > Let me try. One issue here is we still have massive cache miss when checking
> > stripe/dev state. I suppose this doesn't help but data should prove.
> 
> That would be great - thanks.
> If you can identify exactly where the cache misses are causing a problem, we
> might be able to optimise around that.

I tried. It's slightly better than with no change, but still not as good as
adjusting the order (which cuts the run time of handle_stripe). But I then
hit a quick hang with that change, so I can't run a long test.

Looking at perf annotate, some obvious cache misses come from checking
dev->flags in analyse_stripe() and from accessing conf->disks[]. But it's not
a single hotspot using > 50% of the CPU; other code in handle_stripe
contributes overhead as well.

Thanks,
Shaohua
