* [patch]raid5: make_request does less prepare wait
From: Shaohua Li @ 2014-04-08 4:05 UTC
To: linux-raid, neilb
In a NUMA machine, prepare_to_wait/finish_wait in make_request exposes a lot of
contention for sequential workloads (or big request size workloads). For such
workloads, each bio includes several stripes, so we can just do
prepare_to_wait/finish_wait once for the whole bio instead of for every stripe.
This removes the lock contention completely for such workloads. Random workloads
might have similar lock contention too, but I haven't seen it yet, maybe
because my storage is still not fast enough.
Signed-off-by: Shaohua Li <shli@fusionio.com>
---
drivers/md/raid5.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
Index: linux/drivers/md/raid5.c
===================================================================
--- linux.orig/drivers/md/raid5.c 2014-04-08 09:04:20.000000000 +0800
+++ linux/drivers/md/raid5.c 2014-04-08 09:11:08.201533487 +0800
@@ -4552,6 +4552,8 @@ static void make_request(struct mddev *m
struct stripe_head *sh;
const int rw = bio_data_dir(bi);
int remaining;
+ DEFINE_WAIT(w);
+ bool do_prepare;
if (unlikely(bi->bi_rw & REQ_FLUSH)) {
md_flush_request(mddev, bi);
@@ -4575,15 +4577,19 @@ static void make_request(struct mddev *m
bi->bi_next = NULL;
bi->bi_phys_segments = 1; /* over-loaded to count active stripes */
+ prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
DEFINE_WAIT(w);
int previous;
int seq;
+ do_prepare = false;
retry:
seq = read_seqcount_begin(&conf->gen_lock);
previous = 0;
- prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
+ if (do_prepare)
+ prepare_to_wait(&conf->wait_for_overlap, &w,
+ TASK_UNINTERRUPTIBLE);
if (unlikely(conf->reshape_progress != MaxSector)) {
/* spinlock is needed as reshape_progress may be
* 64bit on a 32bit platform, and so it might be
@@ -4604,6 +4610,7 @@ static void make_request(struct mddev *m
: logical_sector >= conf->reshape_safe) {
spin_unlock_irq(&conf->device_lock);
schedule();
+ do_prepare = true;
goto retry;
}
}
@@ -4640,6 +4647,7 @@ static void make_request(struct mddev *m
if (must_retry) {
release_stripe(sh);
schedule();
+ do_prepare = true;
goto retry;
}
}
@@ -4663,8 +4671,10 @@ static void make_request(struct mddev *m
prepare_to_wait(&conf->wait_for_overlap,
&w, TASK_INTERRUPTIBLE);
if (logical_sector >= mddev->suspend_lo &&
- logical_sector < mddev->suspend_hi)
+ logical_sector < mddev->suspend_hi) {
schedule();
+ do_prepare = true;
+ }
goto retry;
}
@@ -4677,9 +4687,9 @@ static void make_request(struct mddev *m
md_wakeup_thread(mddev->thread);
release_stripe(sh);
schedule();
+ do_prepare = true;
goto retry;
}
- finish_wait(&conf->wait_for_overlap, &w);
set_bit(STRIPE_HANDLE, &sh->state);
clear_bit(STRIPE_DELAYED, &sh->state);
if ((bi->bi_rw & REQ_SYNC) &&
@@ -4689,10 +4699,10 @@ static void make_request(struct mddev *m
} else {
/* cannot get stripe for read-ahead, just give-up */
clear_bit(BIO_UPTODATE, &bi->bi_flags);
- finish_wait(&conf->wait_for_overlap, &w);
break;
}
}
+ finish_wait(&conf->wait_for_overlap, &w);
remaining = raid5_dec_bi_active_stripes(bi);
if (remaining == 0) {
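For readers outside md/raid5, the shape of the change is easier to see in
isolation. Below is a minimal sketch of the same waiting pattern; struct
req_ctx, unit_ready() and handle_unit() are hypothetical stand-ins, not
raid5.c names:

#include <linux/sched.h>
#include <linux/wait.h>

/* Hypothetical request context -- a stand-in, not a raid5 structure. */
struct req_ctx {
	wait_queue_head_t waitq;
	int nr_units;
};

bool unit_ready(struct req_ctx *ctx, int unit);
void handle_unit(struct req_ctx *ctx, int unit);

static void handle_request(struct req_ctx *ctx)
{
	DEFINE_WAIT(w);
	bool do_prepare;
	int unit;

	/* Arm the wait entry once for the whole request... */
	prepare_to_wait(&ctx->waitq, &w, TASK_UNINTERRUPTIBLE);
	for (unit = 0; unit < ctx->nr_units; unit++) {
		do_prepare = false;
retry:
		/*
		 * DEFINE_WAIT() uses autoremove_wake_function(), so a
		 * wakeup takes the entry off the queue; after any
		 * schedule() the entry must be re-armed before the
		 * condition is re-checked.
		 */
		if (do_prepare)
			prepare_to_wait(&ctx->waitq, &w,
					TASK_UNINTERRUPTIBLE);
		if (!unit_ready(ctx, unit)) {
			schedule();
			do_prepare = true;
			goto retry;
		}
		handle_unit(ctx, unit);
	}
	/* ...and tear it down once, after the last unit. */
	finish_wait(&ctx->waitq, &w);
}

On the fast path (every unit ready) the wait-queue spinlock is now taken
once per request rather than once per unit, which is what removes the
contention on conf->wait_for_overlap for sequential workloads.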
* Re: [patch]raid5: make_request does less prepare wait
From: NeilBrown @ 2014-04-09 2:08 UTC
To: Shaohua Li; +Cc: linux-raid
On Tue, 8 Apr 2014 12:05:07 +0800 Shaohua Li <shli@kernel.org> wrote:
>
> In a NUMA machine, prepare_to_wait/finish_wait in make_request exposes a lot of
> contention for sequential workloads (or big request size workloads). For such
> workloads, each bio includes several stripes, so we can just do
> prepare_to_wait/finish_wait once for the whole bio instead of for every stripe.
> This removes the lock contention completely for such workloads. Random workloads
> might have similar lock contention too, but I haven't seen it yet, maybe
> because my storage is still not fast enough.
>
> Signed-off-by: Shaohua Li <shli@fusionio.com>
Thanks,
this looks very sensible, except .....
> ---
> drivers/md/raid5.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> Index: linux/drivers/md/raid5.c
> ===================================================================
> --- linux.orig/drivers/md/raid5.c 2014-04-08 09:04:20.000000000 +0800
> +++ linux/drivers/md/raid5.c 2014-04-08 09:11:08.201533487 +0800
> @@ -4552,6 +4552,8 @@ static void make_request(struct mddev *m
> struct stripe_head *sh;
> const int rw = bio_data_dir(bi);
> int remaining;
> + DEFINE_WAIT(w);
> + bool do_prepare;
>
> if (unlikely(bi->bi_rw & REQ_FLUSH)) {
> md_flush_request(mddev, bi);
> @@ -4575,15 +4577,19 @@ static void make_request(struct mddev *m
> bi->bi_next = NULL;
> bi->bi_phys_segments = 1; /* over-loaded to count active stripes */
>
> + prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
> for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
> DEFINE_WAIT(w);
^^^^^^^^^^^^^^^
Shouldn't this be removed? If so, please resubmit with that line deleted and
I'll apply the patch.
Thanks,
NeilBrown
> int previous;
> int seq;
>
> + do_prepare = false;
> retry:
> seq = read_seqcount_begin(&conf->gen_lock);
> previous = 0;
> - prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
> + if (do_prepare)
> + prepare_to_wait(&conf->wait_for_overlap, &w,
> + TASK_UNINTERRUPTIBLE);
> if (unlikely(conf->reshape_progress != MaxSector)) {
> /* spinlock is needed as reshape_progress may be
> * 64bit on a 32bit platform, and so it might be
> @@ -4604,6 +4610,7 @@ static void make_request(struct mddev *m
> : logical_sector >= conf->reshape_safe) {
> spin_unlock_irq(&conf->device_lock);
> schedule();
> + do_prepare = true;
> goto retry;
> }
> }
> @@ -4640,6 +4647,7 @@ static void make_request(struct mddev *m
> if (must_retry) {
> release_stripe(sh);
> schedule();
> + do_prepare = true;
> goto retry;
> }
> }
> @@ -4663,8 +4671,10 @@ static void make_request(struct mddev *m
> prepare_to_wait(&conf->wait_for_overlap,
> &w, TASK_INTERRUPTIBLE);
> if (logical_sector >= mddev->suspend_lo &&
> - logical_sector < mddev->suspend_hi)
> + logical_sector < mddev->suspend_hi) {
> schedule();
> + do_prepare = true;
> + }
> goto retry;
> }
>
> @@ -4677,9 +4687,9 @@ static void make_request(struct mddev *m
> md_wakeup_thread(mddev->thread);
> release_stripe(sh);
> schedule();
> + do_prepare = true;
> goto retry;
> }
> - finish_wait(&conf->wait_for_overlap, &w);
> set_bit(STRIPE_HANDLE, &sh->state);
> clear_bit(STRIPE_DELAYED, &sh->state);
> if ((bi->bi_rw & REQ_SYNC) &&
> @@ -4689,10 +4699,10 @@ static void make_request(struct mddev *m
> } else {
> /* cannot get stripe for read-ahead, just give-up */
> clear_bit(BIO_UPTODATE, &bi->bi_flags);
> - finish_wait(&conf->wait_for_overlap, &w);
> break;
> }
> }
> + finish_wait(&conf->wait_for_overlap, &w);
>
> remaining = raid5_dec_bi_active_stripes(bi);
> if (remaining == 0) {
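To make the hazard concrete: with the leftover inner DEFINE_WAIT(w), the
loop arms one wait entry while the final finish_wait() tears down a
different one. A reduced illustration, reusing the hypothetical req_ctx
sketch from the first message (have_work(), must_wait() and do_work() are
invented names, not raid5.c):

bool have_work(struct req_ctx *ctx);
bool must_wait(struct req_ctx *ctx);
void do_work(struct req_ctx *ctx);

static void shadowed_wait(struct req_ctx *ctx)
{
	DEFINE_WAIT(w);				/* outer entry */

	prepare_to_wait(&ctx->waitq, &w, TASK_UNINTERRUPTIBLE);
	while (have_work(ctx)) {
		DEFINE_WAIT(w);			/* shadows the outer 'w' */

		if (must_wait(ctx)) {
			/* arms the INNER entry, whose storage is this
			 * loop iteration's stack frame */
			prepare_to_wait(&ctx->waitq, &w,
					TASK_UNINTERRUPTIBLE);
			if (must_wait(ctx))	/* re-check after arming */
				schedule();
			/* if we did not sleep, the inner entry can
			 * still be queued here */
		}
		do_work(ctx);
	}
	/*
	 * Tears down only the OUTER entry. An inner entry still on the
	 * queue now outlives its stack slot, and the next wake_up()
	 * touches stack memory that may since have been reused.
	 */
	finish_wait(&ctx->waitq, &w);
}

Dropping the inner DEFINE_WAIT(w), as the resend below does, makes every
prepare_to_wait() and the single finish_wait() operate on the same entry.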
* Re: [patch]raid5: make_request does less prepare wait
From: Shaohua Li @ 2014-04-09 3:25 UTC
To: NeilBrown; +Cc: linux-raid
On Wed, Apr 09, 2014 at 12:08:08PM +1000, NeilBrown wrote:
> On Tue, 8 Apr 2014 12:05:07 +0800 Shaohua Li <shli@kernel.org> wrote:
>
> >
> > In a NUMA machine, prepare_to_wait/finish_wait in make_request exposes a lot of
> > contention for sequential workloads (or big request size workloads). For such
> > workloads, each bio includes several stripes, so we can just do
> > prepare_to_wait/finish_wait once for the whole bio instead of for every stripe.
> > This removes the lock contention completely for such workloads. Random workloads
> > might have similar lock contention too, but I haven't seen it yet, maybe
> > because my storage is still not fast enough.
> >
> > Signed-off-by: Shaohua Li <shli@fusionio.com>
>
> Thanks,
> this looks very sensible, except .....
>
>
> > ---
> > drivers/md/raid5.c | 18 ++++++++++++++----
> > 1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > Index: linux/drivers/md/raid5.c
> > ===================================================================
> > --- linux.orig/drivers/md/raid5.c 2014-04-08 09:04:20.000000000 +0800
> > +++ linux/drivers/md/raid5.c 2014-04-08 09:11:08.201533487 +0800
> > @@ -4552,6 +4552,8 @@ static void make_request(struct mddev *m
> > struct stripe_head *sh;
> > const int rw = bio_data_dir(bi);
> > int remaining;
> > + DEFINE_WAIT(w);
> > + bool do_prepare;
> >
> > if (unlikely(bi->bi_rw & REQ_FLUSH)) {
> > md_flush_request(mddev, bi);
> > @@ -4575,15 +4577,19 @@ static void make_request(struct mddev *m
> > bi->bi_next = NULL;
> > bi->bi_phys_segments = 1; /* over-loaded to count active stripes */
> >
> > + prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
> > for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
> > DEFINE_WAIT(w);
> ^^^^^^^^^^^^^^^
>
> Shouldn't this be removed? If so, please resubmit with that line deleted and
> I'll apply the patch.
Ah, that's silly, looks like I sent the wrong patch, sorry! Below is the correct
one and I double-checked it's the one working for me.
Subject: raid5: make_request does less prepare wait
In a NUMA machine, prepare_to_wait/finish_wait in make_request exposes a lot of
contention for sequential workloads (or big request size workloads). For such
workloads, each bio includes several stripes, so we can just do
prepare_to_wait/finish_wait once for the whole bio instead of for every stripe.
This removes the lock contention completely for such workloads. Random workloads
might have similar lock contention too, but I haven't seen it yet, maybe
because my storage is still not fast enough.
Signed-off-by: Shaohua Li <shli@fusionio.com>
---
drivers/md/raid5.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
Index: linux/drivers/md/raid5.c
===================================================================
--- linux.orig/drivers/md/raid5.c 2014-04-08 12:02:54.485630590 +0800
+++ linux/drivers/md/raid5.c 2014-04-09 11:03:04.276210597 +0800
@@ -4552,6 +4552,8 @@ static void make_request(struct mddev *m
struct stripe_head *sh;
const int rw = bio_data_dir(bi);
int remaining;
+ DEFINE_WAIT(w);
+ bool do_prepare;
if (unlikely(bi->bi_rw & REQ_FLUSH)) {
md_flush_request(mddev, bi);
@@ -4575,15 +4577,18 @@ static void make_request(struct mddev *m
bi->bi_next = NULL;
bi->bi_phys_segments = 1; /* over-loaded to count active stripes */
+ prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
- DEFINE_WAIT(w);
int previous;
int seq;
+ do_prepare = false;
retry:
seq = read_seqcount_begin(&conf->gen_lock);
previous = 0;
- prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
+ if (do_prepare)
+ prepare_to_wait(&conf->wait_for_overlap, &w,
+ TASK_UNINTERRUPTIBLE);
if (unlikely(conf->reshape_progress != MaxSector)) {
/* spinlock is needed as reshape_progress may be
* 64bit on a 32bit platform, and so it might be
@@ -4604,6 +4609,7 @@ static void make_request(struct mddev *m
: logical_sector >= conf->reshape_safe) {
spin_unlock_irq(&conf->device_lock);
schedule();
+ do_prepare = true;
goto retry;
}
}
@@ -4640,6 +4646,7 @@ static void make_request(struct mddev *m
if (must_retry) {
release_stripe(sh);
schedule();
+ do_prepare = true;
goto retry;
}
}
@@ -4663,8 +4670,10 @@ static void make_request(struct mddev *m
prepare_to_wait(&conf->wait_for_overlap,
&w, TASK_INTERRUPTIBLE);
if (logical_sector >= mddev->suspend_lo &&
- logical_sector < mddev->suspend_hi)
+ logical_sector < mddev->suspend_hi) {
schedule();
+ do_prepare = true;
+ }
goto retry;
}
@@ -4677,9 +4686,9 @@ static void make_request(struct mddev *m
md_wakeup_thread(mddev->thread);
release_stripe(sh);
schedule();
+ do_prepare = true;
goto retry;
}
- finish_wait(&conf->wait_for_overlap, &w);
set_bit(STRIPE_HANDLE, &sh->state);
clear_bit(STRIPE_DELAYED, &sh->state);
if ((bi->bi_rw & REQ_SYNC) &&
@@ -4689,10 +4698,10 @@ static void make_request(struct mddev *m
} else {
/* cannot get stripe for read-ahead, just give-up */
clear_bit(BIO_UPTODATE, &bi->bi_flags);
- finish_wait(&conf->wait_for_overlap, &w);
break;
}
}
+ finish_wait(&conf->wait_for_overlap, &w);
remaining = raid5_dec_bi_active_stripes(bi);
if (remaining == 0) {
* Re: [patch]raid5: make_request does less prepare wait
From: NeilBrown @ 2014-04-09 5:23 UTC
To: Shaohua Li; +Cc: linux-raid
On Wed, 9 Apr 2014 11:25:47 +0800 Shaohua Li <shli@kernel.org> wrote:
> On Wed, Apr 09, 2014 at 12:08:08PM +1000, NeilBrown wrote:
> > On Tue, 8 Apr 2014 12:05:07 +0800 Shaohua Li <shli@kernel.org> wrote:
> >
> > >
> > > In a NUMA machine, prepare_to_wait/finish_wait in make_request exposes a lot of
> > > contention for sequential workloads (or big request size workloads). For such
> > > workloads, each bio includes several stripes, so we can just do
> > > prepare_to_wait/finish_wait once for the whole bio instead of for every stripe.
> > > This removes the lock contention completely for such workloads. Random workloads
> > > might have similar lock contention too, but I haven't seen it yet, maybe
> > > because my storage is still not fast enough.
> > >
> > > Signed-off-by: Shaohua Li <shli@fusionio.com>
> >
> > Thanks,
> > this looks very sensible, except .....
> >
> >
> > > ---
> > > drivers/md/raid5.c | 18 ++++++++++++++----
> > > 1 file changed, 14 insertions(+), 4 deletions(-)
> > >
> > > Index: linux/drivers/md/raid5.c
> > > ===================================================================
> > > --- linux.orig/drivers/md/raid5.c 2014-04-08 09:04:20.000000000 +0800
> > > +++ linux/drivers/md/raid5.c 2014-04-08 09:11:08.201533487 +0800
> > > @@ -4552,6 +4552,8 @@ static void make_request(struct mddev *m
> > > struct stripe_head *sh;
> > > const int rw = bio_data_dir(bi);
> > > int remaining;
> > > + DEFINE_WAIT(w);
> > > + bool do_prepare;
> > >
> > > if (unlikely(bi->bi_rw & REQ_FLUSH)) {
> > > md_flush_request(mddev, bi);
> > > @@ -4575,15 +4577,19 @@ static void make_request(struct mddev *m
> > > bi->bi_next = NULL;
> > > bi->bi_phys_segments = 1; /* over-loaded to count active stripes */
> > >
> > > + prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
> > > for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
> > > DEFINE_WAIT(w);
> > ^^^^^^^^^^^^^^^
> >
> > Shouldn't this be removed? If so, please resubmit with that line deleted and
> > I'll apply the patch.
>
> Ah, that's silly, looks like I sent the wrong patch, sorry! Below is the correct
> one and I double-checked it's the one working for me.
>
>
> Subject: raid5: make_request does less prepare wait
>
> In a NUMA machine, prepare_to_wait/finish_wait in make_request exposes a lot of
> contention for sequential workloads (or big request size workloads). For such
> workloads, each bio includes several stripes, so we can just do
> prepare_to_wait/finish_wait once for the whole bio instead of for every stripe.
> This removes the lock contention completely for such workloads. Random workloads
> might have similar lock contention too, but I haven't seen it yet, maybe
> because my storage is still not fast enough.
>
> Signed-off-by: Shaohua Li <shli@fusionio.com>
> ---
> drivers/md/raid5.c | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> Index: linux/drivers/md/raid5.c
> ===================================================================
> --- linux.orig/drivers/md/raid5.c 2014-04-08 12:02:54.485630590 +0800
> +++ linux/drivers/md/raid5.c 2014-04-09 11:03:04.276210597 +0800
> @@ -4552,6 +4552,8 @@ static void make_request(struct mddev *m
> struct stripe_head *sh;
> const int rw = bio_data_dir(bi);
> int remaining;
> + DEFINE_WAIT(w);
> + bool do_prepare;
>
> if (unlikely(bi->bi_rw & REQ_FLUSH)) {
> md_flush_request(mddev, bi);
> @@ -4575,15 +4577,18 @@ static void make_request(struct mddev *m
> bi->bi_next = NULL;
> bi->bi_phys_segments = 1; /* over-loaded to count active stripes */
>
> + prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
> for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
> - DEFINE_WAIT(w);
> int previous;
> int seq;
>
> + do_prepare = false;
> retry:
> seq = read_seqcount_begin(&conf->gen_lock);
> previous = 0;
> - prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
> + if (do_prepare)
> + prepare_to_wait(&conf->wait_for_overlap, &w,
> + TASK_UNINTERRUPTIBLE);
> if (unlikely(conf->reshape_progress != MaxSector)) {
> /* spinlock is needed as reshape_progress may be
> * 64bit on a 32bit platform, and so it might be
> @@ -4604,6 +4609,7 @@ static void make_request(struct mddev *m
> : logical_sector >= conf->reshape_safe) {
> spin_unlock_irq(&conf->device_lock);
> schedule();
> + do_prepare = true;
> goto retry;
> }
> }
> @@ -4640,6 +4646,7 @@ static void make_request(struct mddev *m
> if (must_retry) {
> release_stripe(sh);
> schedule();
> + do_prepare = true;
> goto retry;
> }
> }
> @@ -4663,8 +4670,10 @@ static void make_request(struct mddev *m
> prepare_to_wait(&conf->wait_for_overlap,
> &w, TASK_INTERRUPTIBLE);
> if (logical_sector >= mddev->suspend_lo &&
> - logical_sector < mddev->suspend_hi)
> + logical_sector < mddev->suspend_hi) {
> schedule();
> + do_prepare = true;
> + }
> goto retry;
> }
>
> @@ -4677,9 +4686,9 @@ static void make_request(struct mddev *m
> md_wakeup_thread(mddev->thread);
> release_stripe(sh);
> schedule();
> + do_prepare = true;
> goto retry;
> }
> - finish_wait(&conf->wait_for_overlap, &w);
> set_bit(STRIPE_HANDLE, &sh->state);
> clear_bit(STRIPE_DELAYED, &sh->state);
> if ((bi->bi_rw & REQ_SYNC) &&
> @@ -4689,10 +4698,10 @@ static void make_request(struct mddev *m
> } else {
> /* cannot get stripe for read-ahead, just give-up */
> clear_bit(BIO_UPTODATE, &bi->bi_flags);
> - finish_wait(&conf->wait_for_overlap, &w);
> break;
> }
> }
> + finish_wait(&conf->wait_for_overlap, &w);
>
> remaining = raid5_dec_bi_active_stripes(bi);
> if (remaining == 0) {
Applied, thanks.
NeilBrown