Re: [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: NeilBrown <neilb@suse.de>
To: majianpeng <majianpeng@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.
Date: Thu, 14 Nov 2013 17:44:38 +1100	[thread overview]
Message-ID: <20131114174438.7f2db666@notabene.brown> (raw)
In-Reply-To: <201310311114307462606@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 16056 bytes --]

On Thu, 31 Oct 2013 11:20:55 +0800 majianpeng <majianpeng@gmail.com> wrote:

Sorry for the delayed reply.


> >On Tue, 29 Oct 2013 09:30:14 +0800 majianpeng <majianpeng@gmail.com> wrote:
> >
> >
> >Nearly there!!  Just a few more details.  See below.
> >
> >
> >> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
> >> ---
> >>  drivers/md/raid1.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++------
> >>  drivers/md/raid1.h |  14 ++++++
> >>  2 files changed, 124 insertions(+), 12 deletions(-)
> >> 
> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >> index b4a6dcd..5b311c0 100644
> >> --- a/drivers/md/raid1.c
> >> +++ b/drivers/md/raid1.c
> >> @@ -66,7 +66,8 @@
> >>   */
> >>  static int max_queued_requests = 1024;
> >>  
> >> -static void allow_barrier(struct r1conf *conf);
> >> +static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
> >> +				sector_t bi_sector);
> >>  static void lower_barrier(struct r1conf *conf);
> >>  
> >>  static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
> >> @@ -227,6 +228,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
> >>  	struct bio *bio = r1_bio->master_bio;
> >>  	int done;
> >>  	struct r1conf *conf = r1_bio->mddev->private;
> >> +	sector_t start_next_window = r1_bio->start_next_window;
> >> +	sector_t bi_sector = bio->bi_sector;
> >
> >This should  be r1_bio->sector, not bio->bi_sector.
> >They are often the same but if multiple r1_bios are needed for some reason
> >(e.g. bad blocks) they may not be.
> >
> No, allow_barrier() only for bio not for r1bio.It only do when bio->bi_phys_segments == 0.

Yes, I see that now, thanks.


> If this value is r1_bio->sector, the value may has different value as you said.
> So in allow_barrier():
>        if (start_next_window) {
>                 if (start_next_window == conf->start_next_window) {
>                         if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
>                                 <= bi_sector)
>                                 conf->next_window_requests--;
>                         else 
>                                 conf->current_window_requests--;
>                 } else 
> It will has differnt result.
> 
> >>  
> >>  	if (bio->bi_phys_segments) {
> >>  		unsigned long flags;
> >> @@ -234,6 +237,11 @@ static void call_bio_endio(struct r1bio *r1_bio)
> >>  		bio->bi_phys_segments--;
> >>  		done = (bio->bi_phys_segments == 0);
> >>  		spin_unlock_irqrestore(&conf->device_lock, flags);
> >> +		/*
> >> +		 * make_request() might be waiting for
> >> +		 * bi_phys_segments to decrease
> >> +		 */
> >> +		wake_up(&conf->wait_barrier);
> >>  	} else
> >>  		done = 1;
> >>  
> >> @@ -245,7 +253,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
> >>  		 * Wake up any possible resync thread that waits for the device
> >>  		 * to go idle.
> >>  		 */
> >> -		allow_barrier(conf);
> >> +		allow_barrier(conf, start_next_window, bi_sector);
> >>  	}
> >>  }
> >>  
> >> @@ -827,10 +835,18 @@ static void raise_barrier(struct r1conf *conf)
> >>  	/* block any new IO from starting */
> >>  	conf->barrier++;
> >>  
> >> -	/* Now wait for all pending IO to complete */
> >> +	/* For those conditions we must wait:
> >> +	 * A:while the array is in frozen state
> >> +	 * B:while barrier >= RESYNC_DEPTH, meaning resync reach
> >> +	 *   the max count which allowed.
> >> +	 * C:next_resync + RESYNC_SECTORS > start_next_window, meaning
> >> +	 *   next resync will reach to window which normal bios are handling.
> >> +	 */
> >>  	wait_event_lock_irq(conf->wait_barrier,
> >>  			    !conf->array_frozen &&
> >> -			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
> >> +			    conf->barrier < RESYNC_DEPTH &&
> >> +			    (conf->start_next_window >=
> >> +			     conf->next_resync + RESYNC_SECTORS),
> >>  			    conf->resync_lock);
> >
> >You've removed the test on conf->nr_pending here, which I think is correct.
> >It counts 'read' requests as well.  Testing start_next_window serves the same
> >purpose as it is increased whenever current_window_requests reaches zero.
> >
> In func wait_barrier():
>         if (need_to_wait_for_sync(conf, bio)) {
>               [snip]
>         } else if (bio_data_dir(bio) == WRITE) {
>                 if (conf->next_resync + NEXT_NORMALIO_DISTANCE
>                                 <= bio->bi_sector) {
>                         if (conf->start_next_window == MaxSector)
>                                 conf->start_next_window =
>                                         conf->next_resync +
>                                                 NEXT_NORMALIO_DISTANCE;
> 
>                         if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
>                                 <= bio->bi_sector)
>                                 conf->next_window_requests++;
>                         else
>                                 conf->current_window_requests++;
>                 }
>                 if (bio->bi_sector >= conf->start_next_window)
>                         sector = conf->start_next_window;
>         }
> start_next_window only for bio_data_dir(bio) == WRITE. So resync can't wait read io to complete.
> 
> >However you having modified as similar test on nr_pending in wait_barrier().
I meant:       haven't modified a similar test ....
> >That worries me a bit.  Should that be changed to a test on start_next_window
> >to match the above change?
> For this condition, i only copy it.For this condition, i'm not know clearly.
> Can you give me more details about this?


Before your patch we know if there are pending requests (submitted but not
completed) which stop use from doing resync if conf->nr_pending > 0.
After your patch we cannot use that that test as it counts READ requests, and
requests that are outside the range being resynced.

The test is now a bit more complicated.  I think it is:

		 conf->start_next_window <
                 conf->next_resync + RESYNC_SECTORS


i.e. while start_next_window is less than next_resync+RESYNC_SECTORS there
cold be outstanding writes that are blocking resync.  As soon as those writes
complete, start_next_window will increase and the resync can complete.
So if start_next_window < next_resync+RESYNC_SECTORS and there are pending
request in current->bio_list, then then there is no point waiting for resync
(it cannot progress anyway) so we may as well submit another write.

So change:

		wait_event_lock_irq(conf->wait_barrier,
				    !conf->array_frozen &&
				    (!conf->barrier ||
				    (conf->nr_pending &&
				     current->bio_list &&
				     !bio_list_empty(current->bio_list))),
				    conf->resync_lock);

in wait barrier to

		wait_event_lock_irq(conf->wait_barrier,
				    !conf->array_frozen &&
				    (!conf->barrier ||
				    (conf->start_next_window < conf->next_resync + RESYNC_SECTORS &&
				     current->bio_list &&
				     !bio_list_empty(current->bio_list))),
				    conf->resync_lock);


> >
> >
> >>  
> >>  	spin_unlock_irq(&conf->resync_lock);
> >> @@ -846,10 +862,33 @@ static void lower_barrier(struct r1conf *conf)
> >>  	wake_up(&conf->wait_barrier);
> >>  }
> >>  
> >> -static void wait_barrier(struct r1conf *conf)
> >> +static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
> >> +{
> >> +	bool wait = false;
> >> +
> >> +	if (conf->array_frozen || !bio)
> >> +		wait = true;
> >> +	else if (conf->barrier && bio_data_dir(bio) == WRITE) {
> >> +		if (conf->next_resync < RESYNC_WINDOW_SECTORS)
> >> +			wait = true;
> >> +		else if ((conf->next_resync - RESYNC_WINDOW_SECTORS
> >> +				>= bio_end_sector(bio)) ||
> >> +			 (conf->next_resync + NEXT_NORMALIO_DISTANCE
> >> +				<= bio->bi_sector))
> >> +			wait = false;
> >> +		else
> >> +			wait = true;
> >> +	}
> >> +
> >> +	return wait;
> >> +}
> >> +
> >> +static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
> >>  {
> >> +	sector_t sector = 0;
> >> +
> >>  	spin_lock_irq(&conf->resync_lock);
> >> -	if (conf->barrier) {
> >> +	if (need_to_wait_for_sync(conf, bio)) {
> >>  		conf->nr_waiting++;
> >>  		/* Wait for the barrier to drop.
> >>  		 * However if there are already pending
> >> @@ -868,16 +907,57 @@ static void wait_barrier(struct r1conf *conf)
> >>  				     !bio_list_empty(current->bio_list))),
> >>  				    conf->resync_lock);
> >>  		conf->nr_waiting--;
> >> +	} else if (bio_data_dir(bio) == WRITE) {
> >> +		if (conf->next_resync + NEXT_NORMALIO_DISTANCE
> >> +				<= bio->bi_sector) {
> >> +			if (conf->start_next_window == MaxSector)
> >> +				conf->start_next_window =
> >> +					conf->next_resync +
> >> +						NEXT_NORMALIO_DISTANCE;
> >> +
> >> +			if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
> >> +				<= bio->bi_sector)
> >> +				conf->next_window_requests++;
> >> +			else
> >> +				conf->current_window_requests++;
> >> +		}
> >> +		if (bio->bi_sector >= conf->start_next_window)
> >> +			sector = conf->start_next_window;
> >
> >You aren't setting 'sector' if we needed to wait.  I don't think that is
> >correct, is it?
> >
> Yes.How about those code:
> spin_lock_irq(&conf->resync_lock);
> retry_check:
>         if (need_to_wait_for_sync(conf, bio)) {
>                 conf->nr_waiting++;
>                 /* Wait for the barrier to drop.
>                  * However if there are already pending
>                  * requests (preventing the barrier from
>                  * rising completely), and the
>                  * pre-process bio queue isn't empty,
>                  * then don't wait, as we need to empty
>                  * that queue to get the nr_pending
>                  * count down.
>                  */
>                 wait_event_lock_irq(conf->wait_barrier,
>                                     !conf->array_frozen &&
>                                     (!conf->barrier ||
>                                     (conf->nr_pending &&
>                                      current->bio_list &&
>                                      !bio_list_empty(current->bio_list))),
>                                     conf->resync_lock);
>                 conf->nr_waiting--;
> 				goto retry_check;
> >

It would look a lot better if you made it a while loop:

 while (need_to_wait....) {
     nr_waiting++
     wait_event.....
     nr_waiting--
 if (bio_data_dir.....


Thanks,
NeilBrown



> >>  	}
> >> +
> >>  	conf->nr_pending++;
> >>  	spin_unlock_irq(&conf->resync_lock);
> >> +	return sector;
> >>  }
> >>  
> >> -static void allow_barrier(struct r1conf *conf)
> >> +static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
> >> +				sector_t bi_sector)
> >>  {
> >>  	unsigned long flags;
> >> +
> >>  	spin_lock_irqsave(&conf->resync_lock, flags);
> >>  	conf->nr_pending--;
> >> +	if (start_next_window) {
> >> +		if (start_next_window == conf->start_next_window) {
> >> +			if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
> >> +				<= bi_sector)
> >> +				conf->next_window_requests--;
> >> +			else
> >> +				conf->current_window_requests--;
> >> +		} else
> >> +			conf->current_window_requests--;
> >> +
> >> +		if (!conf->current_window_requests) {
> >> +			if (conf->next_window_requests) {
> >> +				conf->current_window_requests =
> >> +					conf->next_window_requests;
> >> +				conf->next_window_requests = 0;
> >> +				conf->start_next_window +=
> >> +					NEXT_NORMALIO_DISTANCE;
> >> +			} else
> >> +				conf->start_next_window = MaxSector;
> >> +		}
> >> +	}
> >>  	spin_unlock_irqrestore(&conf->resync_lock, flags);
> >>  	wake_up(&conf->wait_barrier);
> >>  }
> >> @@ -1012,6 +1092,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
> >>  	int first_clone;
> >>  	int sectors_handled;
> >>  	int max_sectors;
> >> +	sector_t start_next_window;
> >>  
> >>  	/*
> >>  	 * Register the new request and wait if the reconstruction
> >> @@ -1041,7 +1122,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
> >>  		finish_wait(&conf->wait_barrier, &w);
> >>  	}
> >>  
> >> -	wait_barrier(conf);
> >> +	start_next_window = wait_barrier(conf, bio);
> >>  
> >>  	bitmap = mddev->bitmap;
> >>  
> >> @@ -1162,6 +1243,7 @@ read_again:
> >>  
> >>  	disks = conf->raid_disks * 2;
> >>   retry_write:
> >> +	r1_bio->start_next_window = start_next_window;
> >>  	blocked_rdev = NULL;
> >>  	rcu_read_lock();
> >>  	max_sectors = r1_bio->sectors;
> >> @@ -1230,14 +1312,24 @@ read_again:
> >>  	if (unlikely(blocked_rdev)) {
> >>  		/* Wait for this device to become unblocked */
> >>  		int j;
> >> +		sector_t old = start_next_window;
> >>  
> >>  		for (j = 0; j < i; j++)
> >>  			if (r1_bio->bios[j])
> >>  				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
> >>  		r1_bio->state = 0;
> >> -		allow_barrier(conf);
> >> +		allow_barrier(conf, start_next_window, bio->bi_sector);
> >
> >I think this should be r1_bio->sector, not bio->bi_sector, for the same
> >reason as earlier.
> >
> The exaplaination is the same at above.
> 
> Thanks!
> Jianpeng Ma
> >
> >>  		md_wait_for_blocked_rdev(blocked_rdev, mddev);
> >> -		wait_barrier(conf);
> >> +		start_next_window = wait_barrier(conf, bio);
> >> +		/*
> >> +		 * We must make sure the multi r1bios of bio have
> >> +		 * the same value of bi_phys_segments
> >> +		 */
> >> +		if (bio->bi_phys_segments && old &&
> >> +			old != start_next_window)
> >> +			/*wait the former r1bio(s) completed*/
> >> +			wait_event(conf->wait_barrier,
> >> +					bio->bi_phys_segments == 1);
> >>  		goto retry_write;
> >>  	}
> >>  
> >> @@ -1437,11 +1529,14 @@ static void print_conf(struct r1conf *conf)
> >>  
> >>  static void close_sync(struct r1conf *conf)
> >>  {
> >> -	wait_barrier(conf);
> >> -	allow_barrier(conf);
> >> +	wait_barrier(conf, NULL);
> >> +	allow_barrier(conf, 0, 0);
> >>  
> >>  	mempool_destroy(conf->r1buf_pool);
> >>  	conf->r1buf_pool = NULL;
> >> +
> >> +	conf->next_resync = 0;
> >> +	conf->start_next_window = MaxSector;
> >>  }
> >>  
> >>  static int raid1_spare_active(struct mddev *mddev)
> >> @@ -2713,6 +2808,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
> >>  	conf->pending_count = 0;
> >>  	conf->recovery_disabled = mddev->recovery_disabled - 1;
> >>  
> >> +	conf->start_next_window = MaxSector;
> >> +	conf->current_window_requests = conf->next_window_requests = 0;
> >> +
> >>  	err = -EIO;
> >>  	for (i = 0; i < conf->raid_disks * 2; i++) {
> >>  
> >> diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
> >> index 331a98a..07425a1 100644
> >> --- a/drivers/md/raid1.h
> >> +++ b/drivers/md/raid1.h
> >> @@ -41,6 +41,19 @@ struct r1conf {
> >>  	 */
> >>  	sector_t		next_resync;
> >>  
> >> +	/*when raid1 start resync,we divide raid into four partitions
> >> +	 * |---------|--------------|---------------------|-------------|
> >> +	 *        next_resync   start_next_window        Pc
> >> +	 * Now start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
> >> +	 * Pc = start_next_window + NEXT_NORMALIO_DISTANCE
> >> +	 * current_window_requests means the count of normalIO between
> >> +	 * start_next_window and Pc.
> >> +	 * next_window_requests means the count of nornalIO after Pc.
> >> +	 * */
> >> +	sector_t		start_next_window;
> >> +	int			current_window_requests;
> >> +	int			next_window_requests;
> >> +
> >>  	spinlock_t		device_lock;
> >>  
> >>  	/* list of 'struct r1bio' that need to be processed by raid1d,
> >> @@ -112,6 +125,7 @@ struct r1bio {
> >>  						 * in this BehindIO request
> >>  						 */
> >>  	sector_t		sector;
> >> +	sector_t		start_next_window;
> >>  	int			sectors;
> >>  	unsigned long		state;
> >>  	struct mddev		*mddev;
> >
> >
> >Thanks,
> >NeilBrown
> >

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

next prev parent reply	other threads:[~2013-11-14  6:44 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-28 11:40 [PATCH 4/4] raid1: Rewrite the implementation of iobarrier majianpeng
2013-10-24  1:50 ` NeilBrown
2013-10-29  1:30   ` majianpeng
2013-10-31  2:33     ` NeilBrown
2013-10-31  3:20       ` majianpeng
2013-11-14  6:44         ` NeilBrown [this message]
2013-11-15  2:29           ` majianpeng
2013-11-15  3:42             ` NeilBrown
2013-11-15  6:55               ` majianpeng
2013-11-19  4:25                 ` NeilBrown
2013-11-19  7:53                   ` majianpeng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131114174438.7f2db666@notabene.brown \
    --to=neilb@suse.de \
    --cc=linux-raid@vger.kernel.org \
    --cc=majianpeng@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.