* [patch] raid5: make_request does less prepare wait
From: Shaohua Li @ 2014-04-08  4:05 UTC
To: linux-raid, neilb

On a NUMA machine, the prepare_to_wait/finish_wait pair in make_request
exposes a lot of contention under a sequential workload (or any workload
with a big request size). For such a workload, each bio covers several
stripes, so we can do prepare_to_wait/finish_wait once for the whole bio
instead of once per stripe. This eliminates the lock contention for such
workloads. A random workload might show similar lock contention, but I
haven't seen it yet, maybe because my storage is still not fast enough.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 drivers/md/raid5.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

Index: linux/drivers/md/raid5.c
===================================================================
--- linux.orig/drivers/md/raid5.c	2014-04-08 09:04:20.000000000 +0800
+++ linux/drivers/md/raid5.c	2014-04-08 09:11:08.201533487 +0800
@@ -4552,6 +4552,8 @@ static void make_request(struct mddev *m
 	struct stripe_head *sh;
 	const int rw = bio_data_dir(bi);
 	int remaining;
+	DEFINE_WAIT(w);
+	bool do_prepare;
 
 	if (unlikely(bi->bi_rw & REQ_FLUSH)) {
 		md_flush_request(mddev, bi);
@@ -4575,15 +4577,19 @@ static void make_request(struct mddev *m
 	bi->bi_next = NULL;
 	bi->bi_phys_segments = 1;	/* over-loaded to count active stripes */
 
+	prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
 	for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
 		DEFINE_WAIT(w);
 		int previous;
 		int seq;
 
+		do_prepare = false;
 	retry:
 		seq = read_seqcount_begin(&conf->gen_lock);
 		previous = 0;
-		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
+		if (do_prepare)
+			prepare_to_wait(&conf->wait_for_overlap, &w,
+				TASK_UNINTERRUPTIBLE);
 		if (unlikely(conf->reshape_progress != MaxSector)) {
 			/* spinlock is needed as reshape_progress may be
 			 * 64bit on a 32bit platform, and so it might be
@@ -4604,6 +4610,7 @@ static void make_request(struct mddev *m
 				    : logical_sector >= conf->reshape_safe) {
 					spin_unlock_irq(&conf->device_lock);
 					schedule();
+					do_prepare = true;
 					goto retry;
 				}
 			}
@@ -4640,6 +4647,7 @@ static void make_request(struct mddev *m
 				if (must_retry) {
 					release_stripe(sh);
 					schedule();
+					do_prepare = true;
 					goto retry;
 				}
 			}
@@ -4663,8 +4671,10 @@ static void make_request(struct mddev *m
 				prepare_to_wait(&conf->wait_for_overlap,
 						&w, TASK_INTERRUPTIBLE);
 				if (logical_sector >= mddev->suspend_lo &&
-				    logical_sector < mddev->suspend_hi)
+				    logical_sector < mddev->suspend_hi) {
 					schedule();
+					do_prepare = true;
+				}
 				goto retry;
 			}
 
@@ -4677,9 +4687,9 @@ static void make_request(struct mddev *m
 				md_wakeup_thread(mddev->thread);
 				release_stripe(sh);
 				schedule();
+				do_prepare = true;
 				goto retry;
 			}
-			finish_wait(&conf->wait_for_overlap, &w);
 			set_bit(STRIPE_HANDLE, &sh->state);
 			clear_bit(STRIPE_DELAYED, &sh->state);
 			if ((bi->bi_rw & REQ_SYNC) &&
@@ -4689,10 +4699,10 @@ static void make_request(struct mddev *m
 		} else {
 			/* cannot get stripe for read-ahead, just give-up */
 			clear_bit(BIO_UPTODATE, &bi->bi_flags);
-			finish_wait(&conf->wait_for_overlap, &w);
 			break;
 		}
 	}
+	finish_wait(&conf->wait_for_overlap, &w);
 
 	remaining = raid5_dec_bi_active_stripes(bi);
 	if (remaining == 0) {
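
The pattern the patch applies, reduced to a standalone sketch: arm the wait
entry once per bio, re-arm only after a schedule() has consumed it, and tear
it down once at the end. Everything named process_bio, stripe_busy and
handle_stripe_nr below is hypothetical scaffolding standing in for the
reshape/suspend/overlap checks of the real make_request(); only the
wait-queue calls are the actual API.

#include <linux/wait.h>
#include <linux/sched.h>

static bool stripe_busy(int i) { return false; }	/* hypothetical predicate */
static void handle_stripe_nr(int i) { }			/* hypothetical per-stripe work */

static void process_bio(wait_queue_head_t *wq, int nr_stripes)
{
	DEFINE_WAIT(w);
	int i;

	/* Arm the wait entry once per bio: one trip through the
	 * wait-queue head's spinlock instead of one per stripe. */
	prepare_to_wait(wq, &w, TASK_UNINTERRUPTIBLE);
	for (i = 0; i < nr_stripes; i++) {
		bool do_prepare = false;
retry:
		/* Re-arm only after schedule(): DEFINE_WAIT() uses the
		 * autoremove wake function, so a wake-up takes w off
		 * the queue. */
		if (do_prepare)
			prepare_to_wait(wq, &w, TASK_UNINTERRUPTIBLE);
		if (stripe_busy(i)) {
			/* A waker is assumed to call wake_up(wq) when
			 * the stripe frees up. */
			schedule();
			do_prepare = true;
			goto retry;
		}
		handle_stripe_nr(i);
	}
	finish_wait(wq, &w);	/* one teardown for the whole bio */
}

The saving is in the locking: prepare_to_wait() takes the wait-queue head's
spinlock (and finish_wait() may, too), so on a sequential workload whose
bios each span many stripes, the per-stripe arm/tear-down shows up as
cross-node cache-line contention on a NUMA machine; per-bio arming removes
almost all of it.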
* Re: [patch] raid5: make_request does less prepare wait
From: NeilBrown @ 2014-04-09  2:08 UTC
To: Shaohua Li; +Cc: linux-raid

On Tue, 8 Apr 2014 12:05:07 +0800 Shaohua Li <shli@kernel.org> wrote:

> On a NUMA machine, the prepare_to_wait/finish_wait pair in make_request
> exposes a lot of contention under a sequential workload (or any workload
> with a big request size). For such a workload, each bio covers several
> stripes, so we can do prepare_to_wait/finish_wait once for the whole bio
> instead of once per stripe. This eliminates the lock contention for such
> workloads. A random workload might show similar lock contention, but I
> haven't seen it yet, maybe because my storage is still not fast enough.
>
> Signed-off-by: Shaohua Li <shli@fusionio.com>

Thanks,
this looks very sensible, except .....

> @@ -4575,15 +4577,19 @@ static void make_request(struct mddev *m
>  	bi->bi_next = NULL;
>  	bi->bi_phys_segments = 1;	/* over-loaded to count active stripes */
> 
> +	prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
>  	for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
>  		DEFINE_WAIT(w);
 		^^^^^^^^^^^^^^^

Shouldn't this be removed? If so, please resubmit with that line deleted and
I'll apply the patch.

Thanks,
NeilBrown

> [remainder of the patch quoted unchanged]
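
The line Neil flags is not just redundant: the inner DEFINE_WAIT(w) shadows
the function-scope one, so the prepare_to_wait() calls inside the loop would
arm an entry that the final finish_wait() never cleans up. A reduced sketch
of the hazard; the function and its arguments are hypothetical, only the
wait-queue API is real:

#include <linux/wait.h>
#include <linux/sched.h>

static void shadowed_wait(wait_queue_head_t *wq, int n)
{
	DEFINE_WAIT(w);		/* the entry finish_wait() tears down */
	int i;

	prepare_to_wait(wq, &w, TASK_UNINTERRUPTIBLE);
	for (i = 0; i < n; i++) {
		DEFINE_WAIT(w);	/* shadows the outer w */

		/* This arms the *inner* w.  If the iteration ends
		 * without a wake-up, that entry is still linked on *wq
		 * when it goes out of scope, and the next iteration's
		 * declaration re-initializes a node that is still on
		 * the queue: a dangling stack pointer in the list. */
		prepare_to_wait(wq, &w, TASK_UNINTERRUPTIBLE);
	}
	finish_wait(wq, &w);	/* unhooks only the outer w */
}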
* Re: [patch] raid5: make_request does less prepare wait
From: Shaohua Li @ 2014-04-09  3:25 UTC
To: NeilBrown; +Cc: linux-raid

On Wed, Apr 09, 2014 at 12:08:08PM +1000, NeilBrown wrote:
> Thanks,
> this looks very sensible, except .....
>
> > +	prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
> >  	for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
> >  		DEFINE_WAIT(w);
> 		^^^^^^^^^^^^^^^
>
> Shouldn't this be removed? If so, please resubmit with that line deleted
> and I'll apply the patch.

Ah, that's silly; it looks like I sent the wrong patch, sorry! Below is the
correct one, and I double-checked it's the one that works for me.


Subject: raid5: make_request does less prepare wait

On a NUMA machine, the prepare_to_wait/finish_wait pair in make_request
exposes a lot of contention under a sequential workload (or any workload
with a big request size). For such a workload, each bio covers several
stripes, so we can do prepare_to_wait/finish_wait once for the whole bio
instead of once per stripe. This eliminates the lock contention for such
workloads. A random workload might show similar lock contention, but I
haven't seen it yet, maybe because my storage is still not fast enough.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 drivers/md/raid5.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: linux/drivers/md/raid5.c
===================================================================
--- linux.orig/drivers/md/raid5.c	2014-04-08 12:02:54.485630590 +0800
+++ linux/drivers/md/raid5.c	2014-04-09 11:03:04.276210597 +0800
@@ -4552,6 +4552,8 @@ static void make_request(struct mddev *m
 	struct stripe_head *sh;
 	const int rw = bio_data_dir(bi);
 	int remaining;
+	DEFINE_WAIT(w);
+	bool do_prepare;
 
 	if (unlikely(bi->bi_rw & REQ_FLUSH)) {
 		md_flush_request(mddev, bi);
@@ -4575,15 +4577,18 @@ static void make_request(struct mddev *m
 	bi->bi_next = NULL;
 	bi->bi_phys_segments = 1;	/* over-loaded to count active stripes */
 
+	prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
 	for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
-		DEFINE_WAIT(w);
 		int previous;
 		int seq;
 
+		do_prepare = false;
 	retry:
 		seq = read_seqcount_begin(&conf->gen_lock);
 		previous = 0;
-		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
+		if (do_prepare)
+			prepare_to_wait(&conf->wait_for_overlap, &w,
+				TASK_UNINTERRUPTIBLE);
 		if (unlikely(conf->reshape_progress != MaxSector)) {
 			/* spinlock is needed as reshape_progress may be
 			 * 64bit on a 32bit platform, and so it might be
@@ -4604,6 +4609,7 @@ static void make_request(struct mddev *m
 				    : logical_sector >= conf->reshape_safe) {
 					spin_unlock_irq(&conf->device_lock);
 					schedule();
+					do_prepare = true;
 					goto retry;
 				}
 			}
@@ -4640,6 +4646,7 @@ static void make_request(struct mddev *m
 				if (must_retry) {
 					release_stripe(sh);
 					schedule();
+					do_prepare = true;
 					goto retry;
 				}
 			}
@@ -4663,8 +4670,10 @@ static void make_request(struct mddev *m
 				prepare_to_wait(&conf->wait_for_overlap,
 						&w, TASK_INTERRUPTIBLE);
 				if (logical_sector >= mddev->suspend_lo &&
-				    logical_sector < mddev->suspend_hi)
+				    logical_sector < mddev->suspend_hi) {
 					schedule();
+					do_prepare = true;
+				}
 				goto retry;
 			}
 
@@ -4677,9 +4686,9 @@ static void make_request(struct mddev *m
 				md_wakeup_thread(mddev->thread);
 				release_stripe(sh);
 				schedule();
+				do_prepare = true;
 				goto retry;
 			}
-			finish_wait(&conf->wait_for_overlap, &w);
 			set_bit(STRIPE_HANDLE, &sh->state);
 			clear_bit(STRIPE_DELAYED, &sh->state);
 			if ((bi->bi_rw & REQ_SYNC) &&
@@ -4689,10 +4698,10 @@ static void make_request(struct mddev *m
 		} else {
 			/* cannot get stripe for read-ahead, just give-up */
 			clear_bit(BIO_UPTODATE, &bi->bi_flags);
-			finish_wait(&conf->wait_for_overlap, &w);
 			break;
 		}
 	}
+	finish_wait(&conf->wait_for_overlap, &w);
 
 	remaining = raid5_dec_bi_active_stripes(bi);
 	if (remaining == 0) {
* Re: [patch] raid5: make_request does less prepare wait
From: NeilBrown @ 2014-04-09  5:23 UTC
To: Shaohua Li; +Cc: linux-raid

On Wed, 9 Apr 2014 11:25:47 +0800 Shaohua Li <shli@kernel.org> wrote:

> Ah, that's silly; it looks like I sent the wrong patch, sorry! Below is
> the correct one, and I double-checked it's the one that works for me.
>
> [corrected patch quoted in full]

Applied, thanks.

NeilBrown