From: NeilBrown <neilb@suse.de>
To: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: [patch]raid5: make_request does less prepare wait
Date: Wed, 9 Apr 2014 12:08:08 +1000
Message-ID: <20140409120808.40034647@notabene.brown>
In-Reply-To: <20140408040507.GA20886@kernel.org>

On Tue, 8 Apr 2014 12:05:07 +0800 Shaohua Li <shli@kernel.org> wrote:

> 
> On a NUMA machine, prepare_to_wait/finish_wait in make_request exposes a lot of
> contention for sequential workloads (or workloads with large requests). In such
> workloads, each bio spans several stripes, so we can do
> prepare_to_wait/finish_wait once for the whole bio instead of once per stripe.
> This removes the lock contention completely for such workloads. Random workloads
> might have similar lock contention too, but I haven't seen it yet, maybe
> because my storage is still not fast enough.
> 
> Signed-off-by: Shaohua Li <shli@fusionio.com>

Thanks,
this looks very sensible, except ...


> ---
>  drivers/md/raid5.c |   18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> Index: linux/drivers/md/raid5.c
> ===================================================================
> --- linux.orig/drivers/md/raid5.c	2014-04-08 09:04:20.000000000 +0800
> +++ linux/drivers/md/raid5.c	2014-04-08 09:11:08.201533487 +0800
> @@ -4552,6 +4552,8 @@ static void make_request(struct mddev *m
>  	struct stripe_head *sh;
>  	const int rw = bio_data_dir(bi);
>  	int remaining;
> +	DEFINE_WAIT(w);
> +	bool do_prepare;
>  
>  	if (unlikely(bi->bi_rw & REQ_FLUSH)) {
>  		md_flush_request(mddev, bi);
> @@ -4575,15 +4577,19 @@ static void make_request(struct mddev *m
>  	bi->bi_next = NULL;
>  	bi->bi_phys_segments = 1;	/* over-loaded to count active stripes */
>  
> +	prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
>  	for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
>  		DEFINE_WAIT(w);
                ^^^^^^^^^^^^^^^

Shouldn't this be removed?  If so, please resubmit with that line deleted and
I'll apply the patch.

Thanks,
NeilBrown

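A note on why the second DEFINE_WAIT(w) matters: a declaration inside the
loop body shadows the function-scope w, so any prepare call made inside the
loop would act on a different wait entry than the one passed to the final
finish_wait() after the loop. The user-space sketch below only illustrates
that scoping effect. struct wait_entry, toy_prepare() and toy_finish() are
hypothetical stand-ins invented for this example, not the kernel API, and the
real consequences in raid5.c may differ.

```c
#include <assert.h>

/* Hypothetical stand-in for a wait-queue entry; not the kernel's type. */
struct wait_entry {
	int queued;	/* 1 while the entry counts as "on the wait queue" */
};

/* Toy versions of prepare_to_wait()/finish_wait() that only flip a flag. */
static void toy_prepare(struct wait_entry *w)
{
	w->queued = 1;
}

static void toy_finish(struct wait_entry *w)
{
	assert(w->queued);	/* we only finish an entry that was prepared */
	w->queued = 0;
}

/*
 * Returns how many inner entries were still marked queued when they
 * went out of scope, i.e. entries that no finish call ever saw.
 */
static int run_with_shadowing(int stripes)
{
	struct wait_entry w = { 0 };	/* outer entry, prepared once */
	int leaked = 0;

	toy_prepare(&w);
	for (int i = 0; i < stripes; i++) {
		struct wait_entry w = { 0 };	/* shadows the outer w */

		toy_prepare(&w);	/* a re-prepare lands on the inner w */
		leaked += w.queued;	/* nothing in the loop finishes it */
	}
	toy_finish(&w);			/* only the outer entry is finished */
	return leaked;
}
```

Calling run_with_shadowing(3) from a small main() reports three entries that
were prepared but never finished. Deleting the inner declaration makes every
call refer to the same w.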

>  		int previous;
>  		int seq;
>  
> +		do_prepare = false;
>  	retry:
>  		seq = read_seqcount_begin(&conf->gen_lock);
>  		previous = 0;
> -		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
> +		if (do_prepare)
> +			prepare_to_wait(&conf->wait_for_overlap, &w,
> +				TASK_UNINTERRUPTIBLE);
>  		if (unlikely(conf->reshape_progress != MaxSector)) {
>  			/* spinlock is needed as reshape_progress may be
>  			 * 64bit on a 32bit platform, and so it might be
> @@ -4604,6 +4610,7 @@ static void make_request(struct mddev *m
>  				    : logical_sector >= conf->reshape_safe) {
>  					spin_unlock_irq(&conf->device_lock);
>  					schedule();
> +					do_prepare = true;
>  					goto retry;
>  				}
>  			}
> @@ -4640,6 +4647,7 @@ static void make_request(struct mddev *m
>  				if (must_retry) {
>  					release_stripe(sh);
>  					schedule();
> +					do_prepare = true;
>  					goto retry;
>  				}
>  			}
> @@ -4663,8 +4671,10 @@ static void make_request(struct mddev *m
>  				prepare_to_wait(&conf->wait_for_overlap,
>  						&w, TASK_INTERRUPTIBLE);
>  				if (logical_sector >= mddev->suspend_lo &&
> -				    logical_sector < mddev->suspend_hi)
> +				    logical_sector < mddev->suspend_hi) {
>  					schedule();
> +					do_prepare = true;
> +				}
>  				goto retry;
>  			}
>  
> @@ -4677,9 +4687,9 @@ static void make_request(struct mddev *m
>  				md_wakeup_thread(mddev->thread);
>  				release_stripe(sh);
>  				schedule();
> +				do_prepare = true;
>  				goto retry;
>  			}
> -			finish_wait(&conf->wait_for_overlap, &w);
>  			set_bit(STRIPE_HANDLE, &sh->state);
>  			clear_bit(STRIPE_DELAYED, &sh->state);
>  			if ((bi->bi_rw & REQ_SYNC) &&
> @@ -4689,10 +4699,10 @@ static void make_request(struct mddev *m
>  		} else {
>  			/* cannot get stripe for read-ahead, just give-up */
>  			clear_bit(BIO_UPTODATE, &bi->bi_flags);
> -			finish_wait(&conf->wait_for_overlap, &w);
>  			break;
>  		}
>  	}
> +	finish_wait(&conf->wait_for_overlap, &w);
>  
>  	remaining = raid5_dec_bi_active_stripes(bi);
>  	if (remaining == 0) {

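The effect of the patch above can be sketched by counting calls. This is a
minimal user-space model, not the raid5 code: struct wait_calls and both
functions are hypothetical names, and a stripe that "sleeps once" stands in
for a path that calls schedule() and jumps back to retry. The re-prepare
after a sleep mirrors the do_prepare flag in the patch; as far as I
understand, it is needed because an entry set up with DEFINE_WAIT is removed
from the queue when the task is woken.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tally of wait-queue operations for one bio. */
struct wait_calls {
	int prepares;
	int finishes;
};

/* Old scheme: prepare on every pass through retry, finish per stripe. */
static struct wait_calls per_stripe_scheme(const bool *sleeps_once, size_t n)
{
	struct wait_calls c = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		bool slept = false;
	retry:
		c.prepares++;
		if (sleeps_once[i] && !slept) {
			slept = true;	/* stands in for schedule() */
			goto retry;
		}
		c.finishes++;
	}
	return c;
}

/* Patched scheme: prepare once per bio, again only after a sleep. */
static struct wait_calls per_bio_scheme(const bool *sleeps_once, size_t n)
{
	struct wait_calls c = { 0, 0 };

	c.prepares++;			/* single prepare before the loop */
	for (size_t i = 0; i < n; i++) {
		bool slept = false;
		bool do_prepare = false;
	retry:
		if (do_prepare)
			c.prepares++;
		if (sleeps_once[i] && !slept) {
			slept = true;	/* stands in for schedule() */
			do_prepare = true;
			goto retry;
		}
	}
	c.finishes++;			/* single finish after the loop */
	return c;
}
```

For a four-stripe bio in which only the third stripe has to sleep, the old
scheme makes five prepare calls and four finish calls, while the patched
scheme makes two and one. In the kernel each of those calls takes the wait
queue's lock, which is where the contention described in the commit message
comes from.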