[PATCH 4/4] raid1: Rewrite the implementation of iobarrier.

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.
@ 2013-08-28 11:40 majianpeng
  2013-10-24  1:50 ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: majianpeng @ 2013-08-28 11:40 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

There is a iobarrier in raid1 because the contend between normalIO and
resyncIO.It suspend all normal IO when reysync.
But if normalIO is outrange the resync windwos,there is no contend.
So I rewrite the iobarrier.

I patition the whole space into five parts.
|---------|-----------|------------|----------------|-------|
         Pa    next_resync   start_next_window      Pb

Firstly i introduce some concepts:
1:RESYNC_WINDOW: For resync, there are 32 sync requests at most.A sync
request is RESYNC_BLOCK_SIZE(64*1024).So the RESYNC_WINDOW is 32 *
RESYNC_BLOCK_SIZE, that is 2MB.
2:NEXT_NORMALIO_DISTANCE: it indicate the distance between next_resync
and start_next_window.It also indicate the distance between
start_next_window and Pb.Now it is 3 * RESYNC_WINDOW_SIZE.It can tune.
3:next_resync mean the next start sector which will being do sync.
4:Pa means a position which before RESYNC_WINDOW distance from
next_resync.
5:start_next_window means a position which after NEXT_NORMALIO_DISTANCE distance from
next_resync.For normalio after this position,it can't wait resyncio
6:Pb means a position which after 2 * NEXT_NORMALIO_DISTANCE distance from
next_resync.
7:current_window_requests means the count of normalIO between Pb and
start_next_window.
8:next_window_requests means the count of nonmalIO after Pb.

NormalIO will be partitioned into four types:
NoIO1: it means the end sector of bio is smaller or equal the Pa.
NoIO2: it means the start sector of bio larger or equal Pb
NoIO3: it means the start sector of bio larger or equal	start_next_window.
NoIO4: it means the location between Pa and start_next_window

For NoIO1,it don't use iobarrier.
For NoIO4,it used the original iobarrier mechanism.The nornalIO and
syncIO must be contend.
For NoIO2/3, i add two filed in struct r1conf "current_window_requests,
next_window_requests".They indicate the count of two types.
For those, they don't wait.They only add the relevant parameter.

For resync action, if there are NoIO4s, it must be wait.If not,it
forward.But if current_window_requests > 0 and sync action reached to the
start_next_window,it must wait until the current_window_requests became
zero.If current_window_requests became zero,the start_next_window also
forward. The current_window_requests will replaced by
next_window_requests.

There is a problem which when and how to change from NoIO2 to NoIO3.Only
so, the sync action can forward.

I add a filed in struct r1conf "start_next_window"..What condition will change?
A: if start_next_window == MaxSector, it means there are no NoIO2/3.
   So start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
B: if current_window_requests == 0 && next_window_requests != 0, it
   means start_next_window move to Pb

There is anthor problem which how to differentiate when NoIO2 became
NoIO3.
For example, there are many bios which are NoIO2 and a bio which is
NoIO3. NoIO3 firstly completed,so the bios of NoIO2 became NoIO3.
At generil,we should use flags and list_head. The codes should:
>list_splice_init(NoIO2, NoIO3);
>current_window_requests = next_window_requests;
>next_window_requests = 0;
>list_for_each_entry(NoIO3){
>       clear_bit(NoIO2_FLAG),
>       set_bit(NoIO3_FLAG,
>}
If there are many NoIO2, it will take too long in list_for_each_entry.
Avoid this, i add a filed in struct r1bio "start_next_window".
I used this to record the position conf->start_next_window when call wait_barrier in func
make_request.
For NoIO1/4, the value is zero.
In func allow_barrier, it check the conf->start_next_window
If r1bio->stat_next_window == conf->start_next_window,it mean
there is no transition which NoIO2 to NoIO3.In this condtion
 if bio->bi_sector > conf->start_next_window + NEXT_NORMALIO_DISTANCE
    it means the bio is NoIO2;
 else
    it means the bio si NoIO3.

If r1bio->start_next_window != conf->start_next_window,it mean
there is a trnasiton which NoIO2 to NoIO3.Because there is at most one
transtions.So it only means the bio is NoIO2.

For one bio,there are many r1bio.So we make sure the
r1bio->start_next_window is the same value.
If we met blocked_dev, it must call allow_barrier and wait_barrier.
So the first and the second value of conf->start_next_window will
change.If there are many r1bio's with differnet start_next_window.
For the relevant bio, it depent on the last value of r1bio.
It will cause error.To avoid this, we must wait previous r1bios completes.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
---
 drivers/md/raid1.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++------
 drivers/md/raid1.h |  14 +++++++
 2 files changed, 121 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 08f076a..4499870 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -66,7 +66,8 @@
  */
 static int max_queued_requests = 1024;
 
-static void allow_barrier(struct r1conf *conf);
+static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
+				sector_t bi_sector);
 static void lower_barrier(struct r1conf *conf);
 
 static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
@@ -227,6 +228,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
 	struct bio *bio = r1_bio->master_bio;
 	int done;
 	struct r1conf *conf = r1_bio->mddev->private;
+	sector_t start_next_window = r1_bio->start_next_window;
+	sector_t bi_sector = bio->bi_sector;
 
 	if (bio->bi_phys_segments) {
 		unsigned long flags;
@@ -234,6 +237,11 @@ static void call_bio_endio(struct r1bio *r1_bio)
 		bio->bi_phys_segments--;
 		done = (bio->bi_phys_segments == 0);
 		spin_unlock_irqrestore(&conf->device_lock, flags);
+		/*
+		 * make_request might be waiting for
+		 * bi_phys_sectors to decrease
+		 */
+		wake_up(&conf->wait_barrier);
 	} else
 		done = 1;
 
@@ -245,7 +253,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
 		 * Wake up any possible resync thread that waits for the device
 		 * to go idle.
 		 */
-		allow_barrier(conf);
+		allow_barrier(conf, start_next_window, bi_sector);
 	}
 }
 
@@ -827,10 +835,19 @@ static void raise_barrier(struct r1conf *conf)
 	/* block any new IO from starting */
 	conf->barrier++;
 
-	/* Now wait for all pending IO to complete */
+	/* For those conditions, we must wait
+	* A:array is in frozen state.
+	* B:conf->nr_pending > 0 mean in sync window there are normal bios.
+	* C:total barrier are more than RESYNC_DEPTH
+	* D:conf->next_resync + RESYNC_SECTORS > conf->start_next_window
+	*   mean the next nornalIO window not clear and next syncIO will be into
+	*   this window.
+	*/
 	wait_event_lock_irq(conf->wait_barrier,
 			    !conf->array_frozen && 
-			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
+			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH &&
+			    (conf->next_resync + RESYNC_SECTORS
+				    <= conf->start_next_window),
 			    conf->resync_lock);
 
 	spin_unlock_irq(&conf->resync_lock);
@@ -846,10 +863,43 @@ static void lower_barrier(struct r1conf *conf)
 	wake_up(&conf->wait_barrier);
 }
 
-static void wait_barrier(struct r1conf *conf)
+static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
 {
+	bool wait = false;
+
+	if (conf->array_frozen || !bio)
+		wait = true;
+	else if (conf->barrier && bio_data_dir(bio) == WRITE) {
+		if (conf->next_resync < RESYNC_WINDOW_SECTORS)
+			wait = true;
+		else if (conf->next_resync - RESYNC_WINDOW_SECTORS >=
+				bio_end_sector(bio))
+			wait = false;
+		else if (conf->next_resync + NEXT_NORMALIO_DISTANCE
+				<= bio->bi_sector) {
+			if (conf->start_next_window == MaxSector)
+				conf->start_next_window =
+					conf->next_resync +
+						NEXT_NORMALIO_DISTANCE;
+
+			if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
+				<= bio->bi_sector)
+				conf->next_window_requests++;
+			else
+				conf->current_window_requests++;
+		} else
+			wait = true;
+	}
+
+	return wait;
+}
+
+static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
+{
+	sector_t sector = 0;
+
 	spin_lock_irq(&conf->resync_lock);
-	if (conf->barrier) {
+	if (need_to_wait_for_sync(conf, bio)) {
 		conf->nr_waiting++;
 		/* Wait for the barrier to drop.
 		 * However if there are already pending
@@ -868,16 +918,42 @@ static void wait_barrier(struct r1conf *conf)
 				     !bio_list_empty(current->bio_list))),
 				    conf->resync_lock);
 		conf->nr_waiting--;
-	}
+	} else if (bio_data_dir(bio) == WRITE &&
+			bio->bi_sector >= conf->start_next_window)
+		sector = conf->start_next_window;
 	conf->nr_pending++;
 	spin_unlock_irq(&conf->resync_lock);
+	return sector;
 }
 
-static void allow_barrier(struct r1conf *conf)
+static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
+				sector_t bi_sector)
 {
 	unsigned long flags;
+
 	spin_lock_irqsave(&conf->resync_lock, flags);
 	conf->nr_pending--;
+	if (start_next_window) {
+		if (start_next_window == conf->start_next_window) {
+			if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
+				<= bi_sector)
+				conf->next_window_requests--;
+			else
+				conf->current_window_requests--;
+		} else
+			conf->current_window_requests--;
+
+		if (!conf->current_window_requests) {
+			if (conf->next_window_requests) {
+				conf->current_window_requests =
+					conf->next_window_requests;
+				conf->next_window_requests = 0;
+				conf->start_next_window +=
+					NEXT_NORMALIO_DISTANCE;
+			} else
+				conf->start_next_window = MaxSector;
+		}
+	}
 	spin_unlock_irqrestore(&conf->resync_lock, flags);
 	wake_up(&conf->wait_barrier);
 }
@@ -1012,6 +1088,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 	int first_clone;
 	int sectors_handled;
 	int max_sectors;
+	sector_t start_next_window;
 
 	/*
 	 * Register the new request and wait if the reconstruction
@@ -1041,7 +1118,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 		finish_wait(&conf->wait_barrier, &w);
 	}
 
-	wait_barrier(conf);
+	start_next_window = wait_barrier(conf, bio);
 
 	bitmap = mddev->bitmap;
 
@@ -1162,6 +1239,7 @@ read_again:
 
 	disks = conf->raid_disks * 2;
  retry_write:
+	r1_bio->start_next_window = start_next_window;
 	blocked_rdev = NULL;
 	rcu_read_lock();
 	max_sectors = r1_bio->sectors;
@@ -1230,14 +1308,24 @@ read_again:
 	if (unlikely(blocked_rdev)) {
 		/* Wait for this device to become unblocked */
 		int j;
+		sector_t old = start_next_window;
 
 		for (j = 0; j < i; j++)
 			if (r1_bio->bios[j])
 				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
 		r1_bio->state = 0;
-		allow_barrier(conf);
+		allow_barrier(conf, start_next_window, bio->bi_sector);
 		md_wait_for_blocked_rdev(blocked_rdev, mddev);
-		wait_barrier(conf);
+		start_next_window = wait_barrier(conf, bio);
+		/*
+		 * We must make sure the multi r1bios of bio have
+		 * the same value of after_threetimes_resync_sector
+		 */
+		if (bio->bi_phys_segments && old &&
+			old != start_next_window)
+			/*wait the former r1bio(s) completed*/
+			wait_event(conf->wait_barrier,
+					bio->bi_phys_segments == 1);
 		goto retry_write;
 	}
 
@@ -1437,11 +1525,14 @@ static void print_conf(struct r1conf *conf)
 
 static void close_sync(struct r1conf *conf)
 {
-	wait_barrier(conf);
-	allow_barrier(conf);
+	wait_barrier(conf, NULL);
+	allow_barrier(conf, 0, 0);
 
 	mempool_destroy(conf->r1buf_pool);
 	conf->r1buf_pool = NULL;
+
+	conf->next_resync = 0;
+	conf->start_next_window = MaxSector;
 }
 
 static int raid1_spare_active(struct mddev *mddev)
@@ -2712,6 +2803,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 	conf->pending_count = 0;
 	conf->recovery_disabled = mddev->recovery_disabled - 1;
 
+	conf->start_next_window = MaxSector;
+	conf->current_window_requests = conf->next_window_requests = 0;
+
 	err = -EIO;
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index 331a98a..07425a1 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -41,6 +41,19 @@ struct r1conf {
 	 */
 	sector_t		next_resync;
 
+	/*when raid1 start resync,we divide raid into four partitions
+	 * |---------|--------------|---------------------|-------------|
+	 *        next_resync   start_next_window        Pc
+	 * Now start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
+	 * Pc = start_next_window + NEXT_NORMALIO_DISTANCE
+	 * current_window_requests means the count of normalIO between
+	 * start_next_window and Pc.
+	 * next_window_requests means the count of nornalIO after Pc.
+	 * */
+	sector_t		start_next_window;
+	int			current_window_requests;
+	int			next_window_requests;
+
 	spinlock_t		device_lock;
 
 	/* list of 'struct r1bio' that need to be processed by raid1d,
@@ -112,6 +125,7 @@ struct r1bio {
 						 * in this BehindIO request
 						 */
 	sector_t		sector;
+	sector_t		start_next_window;
 	int			sectors;
 	unsigned long		state;
 	struct mddev		*mddev;
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.
  2013-08-28 11:40 [PATCH 4/4] raid1: Rewrite the implementation of iobarrier majianpeng
@ 2013-10-24  1:50 ` NeilBrown
  2013-10-29  1:30   ` majianpeng
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2013-10-24  1:50 UTC (permalink / raw)
  To: majianpeng; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 15935 bytes --]


Hi,
 sorry for the excessive delay in reviewing these.

 I've applied the first three, though I fixed some careless spelling errors in
 the patch descriptions and fixed up some of the grammar.

 This one is fairly good - just a couple of fixes needed.
 It is good that you have a nice details patch description at the top, but it
 is a bit hard to read.  Fixing the spelling and improving the formatting
 would help a lot....

 See below for comments in the code....

On Wed, 28 Aug 2013 19:40:54 +0800 majianpeng <majianpeng@gmail.com> wrote:

> There is a iobarrier in raid1 because the contend between normalIO and
> resyncIO.It suspend all normal IO when reysync.
> But if normalIO is outrange the resync windwos,there is no contend.
> So I rewrite the iobarrier.
> 
> I patition the whole space into five parts.
> |---------|-----------|------------|----------------|-------|
>          Pa    next_resync   start_next_window      Pb
> 
> Firstly i introduce some concepts:
> 1:RESYNC_WINDOW: For resync, there are 32 sync requests at most.A sync
> request is RESYNC_BLOCK_SIZE(64*1024).So the RESYNC_WINDOW is 32 *
> RESYNC_BLOCK_SIZE, that is 2MB.
> 2:NEXT_NORMALIO_DISTANCE: it indicate the distance between next_resync
> and start_next_window.It also indicate the distance between
> start_next_window and Pb.Now it is 3 * RESYNC_WINDOW_SIZE.It can tune.
> 3:next_resync mean the next start sector which will being do sync.
> 4:Pa means a position which before RESYNC_WINDOW distance from
> next_resync.
> 5:start_next_window means a position which after NEXT_NORMALIO_DISTANCE distance from
> next_resync.For normalio after this position,it can't wait resyncio
> 6:Pb means a position which after 2 * NEXT_NORMALIO_DISTANCE distance from
> next_resync.
> 7:current_window_requests means the count of normalIO between Pb and
> start_next_window.
> 8:next_window_requests means the count of nonmalIO after Pb.
> 
> NormalIO will be partitioned into four types:
> NoIO1: it means the end sector of bio is smaller or equal the Pa.
> NoIO2: it means the start sector of bio larger or equal Pb
> NoIO3: it means the start sector of bio larger or equal	start_next_window.
> NoIO4: it means the location between Pa and start_next_window
> 
> For NoIO1,it don't use iobarrier.
> For NoIO4,it used the original iobarrier mechanism.The nornalIO and
> syncIO must be contend.
> For NoIO2/3, i add two filed in struct r1conf "current_window_requests,
> next_window_requests".They indicate the count of two types.
> For those, they don't wait.They only add the relevant parameter.
> 
> For resync action, if there are NoIO4s, it must be wait.If not,it
> forward.But if current_window_requests > 0 and sync action reached to the
> start_next_window,it must wait until the current_window_requests became
> zero.If current_window_requests became zero,the start_next_window also
> forward. The current_window_requests will replaced by
> next_window_requests.
> 
> There is a problem which when and how to change from NoIO2 to NoIO3.Only
> so, the sync action can forward.
> 
> I add a filed in struct r1conf "start_next_window"..What condition will change?
> A: if start_next_window == MaxSector, it means there are no NoIO2/3.
>    So start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
> B: if current_window_requests == 0 && next_window_requests != 0, it
>    means start_next_window move to Pb
> 
> There is anthor problem which how to differentiate when NoIO2 became
> NoIO3.
> For example, there are many bios which are NoIO2 and a bio which is
> NoIO3. NoIO3 firstly completed,so the bios of NoIO2 became NoIO3.
> At generil,we should use flags and list_head. The codes should:
> >list_splice_init(NoIO2, NoIO3);
> >current_window_requests = next_window_requests;
> >next_window_requests = 0;
> >list_for_each_entry(NoIO3){
> >       clear_bit(NoIO2_FLAG),
> >       set_bit(NoIO3_FLAG,
> >}
> If there are many NoIO2, it will take too long in list_for_each_entry.
> Avoid this, i add a filed in struct r1bio "start_next_window".
> I used this to record the position conf->start_next_window when call wait_barrier in func
> make_request.
> For NoIO1/4, the value is zero.
> In func allow_barrier, it check the conf->start_next_window
> If r1bio->stat_next_window == conf->start_next_window,it mean
> there is no transition which NoIO2 to NoIO3.In this condtion
>  if bio->bi_sector > conf->start_next_window + NEXT_NORMALIO_DISTANCE
>     it means the bio is NoIO2;
>  else
>     it means the bio si NoIO3.
> 
> If r1bio->start_next_window != conf->start_next_window,it mean
> there is a trnasiton which NoIO2 to NoIO3.Because there is at most one
> transtions.So it only means the bio is NoIO2.
> 
> For one bio,there are many r1bio.So we make sure the
> r1bio->start_next_window is the same value.
> If we met blocked_dev, it must call allow_barrier and wait_barrier.
> So the first and the second value of conf->start_next_window will
> change.If there are many r1bio's with differnet start_next_window.
> For the relevant bio, it depent on the last value of r1bio.
> It will cause error.To avoid this, we must wait previous r1bios completes.
> 
> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
> ---
>  drivers/md/raid1.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++------
>  drivers/md/raid1.h |  14 +++++++
>  2 files changed, 121 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 08f076a..4499870 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -66,7 +66,8 @@
>   */
>  static int max_queued_requests = 1024;
>  
> -static void allow_barrier(struct r1conf *conf);
> +static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
> +				sector_t bi_sector);
>  static void lower_barrier(struct r1conf *conf);
>  
>  static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
> @@ -227,6 +228,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
>  	struct bio *bio = r1_bio->master_bio;
>  	int done;
>  	struct r1conf *conf = r1_bio->mddev->private;
> +	sector_t start_next_window = r1_bio->start_next_window;
> +	sector_t bi_sector = bio->bi_sector;
>  
>  	if (bio->bi_phys_segments) {
>  		unsigned long flags;
> @@ -234,6 +237,11 @@ static void call_bio_endio(struct r1bio *r1_bio)
>  		bio->bi_phys_segments--;
>  		done = (bio->bi_phys_segments == 0);
>  		spin_unlock_irqrestore(&conf->device_lock, flags);
> +		/*
> +		 * make_request might be waiting for
> +		 * bi_phys_sectors to decrease
> +		 */
> +		wake_up(&conf->wait_barrier);

bi_phys_sectors???  I think you mean bi_phys_segments.  This sort of
attention to detail is really important.



>  	} else
>  		done = 1;
>  
> @@ -245,7 +253,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
>  		 * Wake up any possible resync thread that waits for the device
>  		 * to go idle.
>  		 */
> -		allow_barrier(conf);
> +		allow_barrier(conf, start_next_window, bi_sector);
>  	}
>  }
>  
> @@ -827,10 +835,19 @@ static void raise_barrier(struct r1conf *conf)
>  	/* block any new IO from starting */
>  	conf->barrier++;
>  
> -	/* Now wait for all pending IO to complete */
> +	/* For those conditions, we must wait
> +	* A:array is in frozen state.
> +	* B:conf->nr_pending > 0 mean in sync window there are normal bios.
> +	* C:total barrier are more than RESYNC_DEPTH
> +	* D:conf->next_resync + RESYNC_SECTORS > conf->start_next_window
> +	*   mean the next nornalIO window not clear and next syncIO will be into
> +	*   this window.
> +	*/

This comment is indented badly.

  /* For these conditions we must wait:
   * A: while the array is in the frozen state
   * B: while nr_pending > 0, meaning that there are normal bios in the
   *    resync window
   ...


>  	wait_event_lock_irq(conf->wait_barrier,
>  			    !conf->array_frozen && 
> -			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
> +			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH &&
> +			    (conf->next_resync + RESYNC_SECTORS
> +				    <= conf->start_next_window),

I think conditions are much more readable if the thing that you expect to
change comes first.
In this case we might need to wait until start_next_window increases, so
    conf->start_next_window >= conf->next_resync + RESYNC_SECTORS

as it is, it looks like you are waiting for next_resync to decrease.


>  			    conf->resync_lock);
>  
>  	spin_unlock_irq(&conf->resync_lock);
> @@ -846,10 +863,43 @@ static void lower_barrier(struct r1conf *conf)
>  	wake_up(&conf->wait_barrier);
>  }
>  
> -static void wait_barrier(struct r1conf *conf)
> +static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
>  {
> +	bool wait = false;
> +
> +	if (conf->array_frozen || !bio)
> +		wait = true;
> +	else if (conf->barrier && bio_data_dir(bio) == WRITE) {
> +		if (conf->next_resync < RESYNC_WINDOW_SECTORS)
> +			wait = true;
> +		else if (conf->next_resync - RESYNC_WINDOW_SECTORS >=
> +				bio_end_sector(bio))
> +			wait = false;
> +		else if (conf->next_resync + NEXT_NORMALIO_DISTANCE
> +				<= bio->bi_sector) {
> +			if (conf->start_next_window == MaxSector)
> +				conf->start_next_window =
> +					conf->next_resync +
> +						NEXT_NORMALIO_DISTANCE;
> +
> +			if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
> +				<= bio->bi_sector)
> +				conf->next_window_requests++;
> +			else
> +				conf->current_window_requests++;
> +		} else
> +			wait = true;
> +	}
> +
> +	return wait;
> +}

I really don't like it that this function changes start_next_window and
*_window_requests().
It looks like it is a simple function that tests a condition.  But it
actually has a side-effect as well.  And it has the side-effect when it
returns false which is extra confusing.

Also, you are incrementing *_window_requests for READ requests, which you
don't need to do.

I would change it so the below looks something like:
 if (need_to_wait_for_sync(conf, bio)) {
     ....
 } else if (bio_data_dir(bio) == WRITE) {
    possibly update start_next_window and *_window_requests here
 }




> +
> +static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
> +{
> +	sector_t sector = 0;
> +
>  	spin_lock_irq(&conf->resync_lock);
> -	if (conf->barrier) {
> +	if (need_to_wait_for_sync(conf, bio)) {
>  		conf->nr_waiting++;
>  		/* Wait for the barrier to drop.
>  		 * However if there are already pending
> @@ -868,16 +918,42 @@ static void wait_barrier(struct r1conf *conf)
>  				     !bio_list_empty(current->bio_list))),
>  				    conf->resync_lock);
>  		conf->nr_waiting--;
> -	}
> +	} else if (bio_data_dir(bio) == WRITE &&
> +			bio->bi_sector >= conf->start_next_window)
> +		sector = conf->start_next_window;
>  	conf->nr_pending++;
>  	spin_unlock_irq(&conf->resync_lock);
> +	return sector;
>  }
>  
> -static void allow_barrier(struct r1conf *conf)
> +static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
> +				sector_t bi_sector)
>  {
>  	unsigned long flags;
> +
>  	spin_lock_irqsave(&conf->resync_lock, flags);
>  	conf->nr_pending--;
> +	if (start_next_window) {
> +		if (start_next_window == conf->start_next_window) {
> +			if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
> +				<= bi_sector)
> +				conf->next_window_requests--;
> +			else
> +				conf->current_window_requests--;
> +		} else
> +			conf->current_window_requests--;
> +
> +		if (!conf->current_window_requests) {
> +			if (conf->next_window_requests) {
> +				conf->current_window_requests =
> +					conf->next_window_requests;
> +				conf->next_window_requests = 0;
> +				conf->start_next_window +=
> +					NEXT_NORMALIO_DISTANCE;
> +			} else
> +				conf->start_next_window = MaxSector;
> +		}
> +	}
>  	spin_unlock_irqrestore(&conf->resync_lock, flags);
>  	wake_up(&conf->wait_barrier);
>  }
> @@ -1012,6 +1088,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
>  	int first_clone;
>  	int sectors_handled;
>  	int max_sectors;
> +	sector_t start_next_window;
>  
>  	/*
>  	 * Register the new request and wait if the reconstruction
> @@ -1041,7 +1118,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
>  		finish_wait(&conf->wait_barrier, &w);
>  	}
>  
> -	wait_barrier(conf);
> +	start_next_window = wait_barrier(conf, bio);
>  
>  	bitmap = mddev->bitmap;
>  
> @@ -1162,6 +1239,7 @@ read_again:
>  
>  	disks = conf->raid_disks * 2;
>   retry_write:
> +	r1_bio->start_next_window = start_next_window;
>  	blocked_rdev = NULL;
>  	rcu_read_lock();
>  	max_sectors = r1_bio->sectors;
> @@ -1230,14 +1308,24 @@ read_again:
>  	if (unlikely(blocked_rdev)) {
>  		/* Wait for this device to become unblocked */
>  		int j;
> +		sector_t old = start_next_window;
>  
>  		for (j = 0; j < i; j++)
>  			if (r1_bio->bios[j])
>  				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
>  		r1_bio->state = 0;
> -		allow_barrier(conf);
> +		allow_barrier(conf, start_next_window, bio->bi_sector);
>  		md_wait_for_blocked_rdev(blocked_rdev, mddev);
> -		wait_barrier(conf);
> +		start_next_window = wait_barrier(conf, bio);
> +		/*
> +		 * We must make sure the multi r1bios of bio have
> +		 * the same value of after_threetimes_resync_sector
> +		 */

after_threetimes_resync_sector????  Attention to detail please!


If you resubmit with these things fixed up I'll try to get it reviewed quite
a bit quicker than this time :-)

Thanks,
NeilBrown



> +		if (bio->bi_phys_segments && old &&
> +			old != start_next_window)
> +			/*wait the former r1bio(s) completed*/
> +			wait_event(conf->wait_barrier,
> +					bio->bi_phys_segments == 1);
>  		goto retry_write;
>  	}
>  
> @@ -1437,11 +1525,14 @@ static void print_conf(struct r1conf *conf)
>  
>  static void close_sync(struct r1conf *conf)
>  {
> -	wait_barrier(conf);
> -	allow_barrier(conf);
> +	wait_barrier(conf, NULL);
> +	allow_barrier(conf, 0, 0);
>  
>  	mempool_destroy(conf->r1buf_pool);
>  	conf->r1buf_pool = NULL;
> +
> +	conf->next_resync = 0;
> +	conf->start_next_window = MaxSector;
>  }
>  
>  static int raid1_spare_active(struct mddev *mddev)
> @@ -2712,6 +2803,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
>  	conf->pending_count = 0;
>  	conf->recovery_disabled = mddev->recovery_disabled - 1;
>  
> +	conf->start_next_window = MaxSector;
> +	conf->current_window_requests = conf->next_window_requests = 0;
> +
>  	err = -EIO;
>  	for (i = 0; i < conf->raid_disks * 2; i++) {
>  
> diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
> index 331a98a..07425a1 100644
> --- a/drivers/md/raid1.h
> +++ b/drivers/md/raid1.h
> @@ -41,6 +41,19 @@ struct r1conf {
>  	 */
>  	sector_t		next_resync;
>  
> +	/*when raid1 start resync,we divide raid into four partitions
> +	 * |---------|--------------|---------------------|-------------|
> +	 *        next_resync   start_next_window        Pc
> +	 * Now start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
> +	 * Pc = start_next_window + NEXT_NORMALIO_DISTANCE
> +	 * current_window_requests means the count of normalIO between
> +	 * start_next_window and Pc.
> +	 * next_window_requests means the count of nornalIO after Pc.
> +	 * */
> +	sector_t		start_next_window;
> +	int			current_window_requests;
> +	int			next_window_requests;
> +
>  	spinlock_t		device_lock;
>  
>  	/* list of 'struct r1bio' that need to be processed by raid1d,
> @@ -112,6 +125,7 @@ struct r1bio {
>  						 * in this BehindIO request
>  						 */
>  	sector_t		sector;
> +	sector_t		start_next_window;
>  	int			sectors;
>  	unsigned long		state;
>  	struct mddev		*mddev;


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.
  2013-10-24  1:50 ` NeilBrown
@ 2013-10-29  1:30   ` majianpeng
  2013-10-31  2:33     ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: majianpeng @ 2013-10-29  1:30 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

>
>Hi,
> sorry for the excessive delay in reviewing these.
>
> I've applied the first three, though I fixed some careless spelling errors in
> the patch descriptions and fixed up some of the grammar.
>
> This one is fairly good - just a couple of fixes needed.
> It is good that you have a nice details patch description at the top, but it
> is a bit hard to read.  Fixing the spelling and improving the formatting
> would help a lot....
>
> See below for comments in the code....
>
[snip]
>   ...
>
>
>>  	wait_event_lock_irq(conf->wait_barrier,
>>  			    !conf->array_frozen && 
>> -			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
>> +			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH &&
>> +			    (conf->next_resync + RESYNC_SECTORS
>> +				    <= conf->start_next_window),
>
>I think conditions are much more readable if the thing that you expect to
>change comes first.
>In this case we might need to wait until start_next_window increases, so
>    conf->start_next_window >= conf->next_resync + RESYNC_SECTORS
>
>as it is, it looks like you are waiting for next_resync to decrease.
>
>
>>  			    conf->resync_lock);
>>  
>>  	spin_unlock_irq(&conf->resync_lock);
>> @@ -846,10 +863,43 @@ static void lower_barrier(struct r1conf *conf)
>>  	wake_up(&conf->wait_barrier);
>>  }
>>  
>> -static void wait_barrier(struct r1conf *conf)
>> +static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
>>  {
>> +	bool wait = false;
>> +
>> +	if (conf->array_frozen || !bio)
>> +		wait = true;
>> +	else if (conf->barrier && bio_data_dir(bio) == WRITE) {
>> +		if (conf->next_resync < RESYNC_WINDOW_SECTORS)
>> +			wait = true;
>> +		else if (conf->next_resync - RESYNC_WINDOW_SECTORS >=
>> +				bio_end_sector(bio))
>> +			wait = false;
>> +		else if (conf->next_resync + NEXT_NORMALIO_DISTANCE
>> +				<= bio->bi_sector) {
>> +			if (conf->start_next_window == MaxSector)
>> +				conf->start_next_window =
>> +					conf->next_resync +
>> +						NEXT_NORMALIO_DISTANCE;
>> +
>> +			if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
>> +				<= bio->bi_sector)
>> +				conf->next_window_requests++;
>> +			else
>> +				conf->current_window_requests++;
>> +		} else
>> +			wait = true;
>> +	}
>> +
>> +	return wait;
>> +}
>
>I really don't like it that this function changes start_next_window and
>*_window_requests().
>It looks like it is a simple function that tests a condition.  But it
>actually has a side-effect as well.  And it has the side-effect when it
>returns false which is extra confusing.
>
>Also, you are incrementing *_window_requests for READ requests, which you
>don't need to do.
>
For this, i think the patch had a error.
In func raise_barrier()
  	wait_event_lock_irq(conf->wait_barrier,
  			    !conf->array_frozen && 
 -			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
 +			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH &&
 +			    (conf->next_resync + RESYNC_SECTORS
 +				    <= conf->start_next_window),
Actualy, we should remove the conditon !conf->nr_pending. Resync don't need !conf->nr_pending.
And for read, i don't increase *_window_request.

[snip]


Below is my latest patch.

Subject:raid1: Rewrite the implementation of iobarrier.

From c24142fa460596ccf461765dc77660edabd7e0a5 Mon Sep 17 00:00:00 2001
From: Jianpeng Ma <majianpeng@gmail.com>
Date: Mon, 28 Oct 2013 19:34:34 +0800
Subject: [PATCH] raid1: Rewrite the implementation of iobarrier

There is a iobarrier in raid1 because the contend between normal IO and
resync IO.It suspend all normal IO when reysync.
But if normalIO is outrange the resync windwos,there is no contend.
So I rewrite the iobarrier.

I patition the whole space into five parts.
|---------|-----------|------------|----------------|-------|
         Pa    next_resync   start_next_window      Pb

Pa + SYNC_WINDOW = next_resync
next_resync + NEXT_NORMALIO_DISTANCE = start_next_window
start_next_window + NEXT_NORMAILIO_DISTANCE = Pb

Firstly i introduce some concepts:
1:RESYNC_WINDOW: For resync, there are 32 resync requests at most at the
same time.A sync request is RESYNC_BLOCK_SIZE(64*1024).
So the RESYNC_WINDOW is 32 * RESYNC_BLOCK_SIZE, that is 2MB.
2:NEXT_NORMALIO_DISTANCE: it indicate the distance between next_resync
and start_next_window.It also indicate the distance between
start_next_window and Pb.Now it is 3 * RESYNC_WINDOW_SIZE.It can tune.
3:next_resync: mean the next start sector which will being do sync.
4:Pa means: a position which before RESYNC_WINDOW distance from
next_resync.
5:start_next_window: means a position which after NEXT_NORMALIO_DISTANCE
distance from next_resync.For normalio after this position,
it can't wait resyncio to complete.
6:Pb: means a position which after 2 * NEXT_NORMALIO_DISTANCE distance
from next_resync.
7:current_window_requests: means the count of normalIO between Pb and
start_next_window.
8:next_window_requests: means the count of nonmalIO after Pb.

NormalIO will be partitioned into four types:
NoIO1: it means the end sector of bio is smaller or equal the Pa.
NoIO2: it means the start sector of bio larger or equal Pb
NoIO3: it means the start sector of bio larger or equal
start_next_window.
NoIO4: it means the location between Pa and start_next_window

|--------|-----------|--------------------|----------------|---------|
    |    Pa   |   next_resync   |  start_next_window   |   Pb   |
  NoIO1     NoIO4             NoIO4		     NoIO3     NoIO2

For NoIO1,it don't use iobarrier.
For NoIO4,it used the original iobarrier mechanism.The nornalIO and
resyncIO must be contend.
For NoIO2/3, i add two fields in struct r1conf "current_window_requests,
next_window_requests".They indicate the count of two types.
For those, they don't wait resync io to complete.

For resync action, if there are NoIO4s, it must be waited.If not,it
forward.But if sync action reached to the start_next_window and
current_window_requestes > 0 (that is there are NoIO3s),it must
wait until the current_window_requests became zero.
If current_window_requests became zero,the start_next_window also
forward. The current_window_requests will replaced by next_window_requests.

There is a problem which when and how to change from NoIO2 to NoIO3.Only
so, the sync action can forward.

I add a field in struct r1conf "start_next_window".What condition will
change?
A: if start_next_window == MaxSector, it means there are no NoIO2/3.
   So start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
B: if current_window_requests == 0 && next_window_requests != 0, it
   means start_next_window move to Pb

There is anthor problem which how to differentiate between
old NoIO2(now it is NoIO3) and NoIO2.
For example, there are many bios which are NoIO2 and a bio which is
NoIO3. NoIO3 firstly completed,so the bios of NoIO2 became NoIO3.
At general,we use flags and list_head. The codes should:
>list_splice_init(NoIO2, NoIO3);
>current_window_requests = next_window_requests;
>next_window_requests = 0;
>list_for_each_entry(NoIO3){
>       clear_bit(NoIO2_FLAG),
>       set_bit(NoIO3_FLAG,
>}
If there are many NoIO2, it will take too much time in list_for_each_entry.
Avoid this, i add a field in struct r1bio "start_next_window".
I used this to record the position conf->start_next_window when call
wait_barrier in func make_request.

In func allow_barrier, it check the conf->start_next_window.
If r1bio->stat_next_window == conf->start_next_window,it mean
there is no transition between NoIO2 and NoIO3.
If r1bio->start_next_window != conf->start_next_window,it mean
there is a trnasiton between NoIO2 and NoIO3.Because there is at most one
transtion.So it only means the bio is old NoIO2.

For one bio,there are many r1bio's.So we make sure the
all r1bio->start_next_window are the same value.
If we met blocked_dev in make_request(), it must call allow_barrier
and wait_barrier.So the former and the later value of
conf->start_next_window will be change.
If there are many r1bio's with differnet start_next_window,
for the relevant bio, it depent on the last value of r1bio.
It will cause error.To avoid this, we must wait previous r1bios
to complete.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
---
 drivers/md/raid1.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++------
 drivers/md/raid1.h |  14 ++++++
 2 files changed, 124 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index b4a6dcd..5b311c0 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -66,7 +66,8 @@
  */
 static int max_queued_requests = 1024;
 
-static void allow_barrier(struct r1conf *conf);
+static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
+				sector_t bi_sector);
 static void lower_barrier(struct r1conf *conf);
 
 static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
@@ -227,6 +228,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
 	struct bio *bio = r1_bio->master_bio;
 	int done;
 	struct r1conf *conf = r1_bio->mddev->private;
+	sector_t start_next_window = r1_bio->start_next_window;
+	sector_t bi_sector = bio->bi_sector;
 
 	if (bio->bi_phys_segments) {
 		unsigned long flags;
@@ -234,6 +237,11 @@ static void call_bio_endio(struct r1bio *r1_bio)
 		bio->bi_phys_segments--;
 		done = (bio->bi_phys_segments == 0);
 		spin_unlock_irqrestore(&conf->device_lock, flags);
+		/*
+		 * make_request() might be waiting for
+		 * bi_phys_segments to decrease
+		 */
+		wake_up(&conf->wait_barrier);
 	} else
 		done = 1;
 
@@ -245,7 +253,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
 		 * Wake up any possible resync thread that waits for the device
 		 * to go idle.
 		 */
-		allow_barrier(conf);
+		allow_barrier(conf, start_next_window, bi_sector);
 	}
 }
 
@@ -827,10 +835,18 @@ static void raise_barrier(struct r1conf *conf)
 	/* block any new IO from starting */
 	conf->barrier++;
 
-	/* Now wait for all pending IO to complete */
+	/* For those conditions we must wait:
+	 * A:while the array is in frozen state
+	 * B:while barrier >= RESYNC_DEPTH, meaning resync reach
+	 *   the max count which allowed.
+	 * C:next_resync + RESYNC_SECTORS > start_next_window, meaning
+	 *   next resync will reach to window which normal bios are handling.
+	 */
 	wait_event_lock_irq(conf->wait_barrier,
 			    !conf->array_frozen &&
-			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
+			    conf->barrier < RESYNC_DEPTH &&
+			    (conf->start_next_window >=
+			     conf->next_resync + RESYNC_SECTORS),
 			    conf->resync_lock);
 
 	spin_unlock_irq(&conf->resync_lock);
@@ -846,10 +862,33 @@ static void lower_barrier(struct r1conf *conf)
 	wake_up(&conf->wait_barrier);
 }
 
-static void wait_barrier(struct r1conf *conf)
+static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
+{
+	bool wait = false;
+
+	if (conf->array_frozen || !bio)
+		wait = true;
+	else if (conf->barrier && bio_data_dir(bio) == WRITE) {
+		if (conf->next_resync < RESYNC_WINDOW_SECTORS)
+			wait = true;
+		else if ((conf->next_resync - RESYNC_WINDOW_SECTORS
+				>= bio_end_sector(bio)) ||
+			 (conf->next_resync + NEXT_NORMALIO_DISTANCE
+				<= bio->bi_sector))
+			wait = false;
+		else
+			wait = true;
+	}
+
+	return wait;
+}
+
+static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
 {
+	sector_t sector = 0;
+
 	spin_lock_irq(&conf->resync_lock);
-	if (conf->barrier) {
+	if (need_to_wait_for_sync(conf, bio)) {
 		conf->nr_waiting++;
 		/* Wait for the barrier to drop.
 		 * However if there are already pending
@@ -868,16 +907,57 @@ static void wait_barrier(struct r1conf *conf)
 				     !bio_list_empty(current->bio_list))),
 				    conf->resync_lock);
 		conf->nr_waiting--;
+	} else if (bio_data_dir(bio) == WRITE) {
+		if (conf->next_resync + NEXT_NORMALIO_DISTANCE
+				<= bio->bi_sector) {
+			if (conf->start_next_window == MaxSector)
+				conf->start_next_window =
+					conf->next_resync +
+						NEXT_NORMALIO_DISTANCE;
+
+			if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
+				<= bio->bi_sector)
+				conf->next_window_requests++;
+			else
+				conf->current_window_requests++;
+		}
+		if (bio->bi_sector >= conf->start_next_window)
+			sector = conf->start_next_window;
 	}
+
 	conf->nr_pending++;
 	spin_unlock_irq(&conf->resync_lock);
+	return sector;
 }
 
-static void allow_barrier(struct r1conf *conf)
+static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
+				sector_t bi_sector)
 {
 	unsigned long flags;
+
 	spin_lock_irqsave(&conf->resync_lock, flags);
 	conf->nr_pending--;
+	if (start_next_window) {
+		if (start_next_window == conf->start_next_window) {
+			if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
+				<= bi_sector)
+				conf->next_window_requests--;
+			else
+				conf->current_window_requests--;
+		} else
+			conf->current_window_requests--;
+
+		if (!conf->current_window_requests) {
+			if (conf->next_window_requests) {
+				conf->current_window_requests =
+					conf->next_window_requests;
+				conf->next_window_requests = 0;
+				conf->start_next_window +=
+					NEXT_NORMALIO_DISTANCE;
+			} else
+				conf->start_next_window = MaxSector;
+		}
+	}
 	spin_unlock_irqrestore(&conf->resync_lock, flags);
 	wake_up(&conf->wait_barrier);
 }
@@ -1012,6 +1092,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 	int first_clone;
 	int sectors_handled;
 	int max_sectors;
+	sector_t start_next_window;
 
 	/*
 	 * Register the new request and wait if the reconstruction
@@ -1041,7 +1122,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 		finish_wait(&conf->wait_barrier, &w);
 	}
 
-	wait_barrier(conf);
+	start_next_window = wait_barrier(conf, bio);
 
 	bitmap = mddev->bitmap;
 
@@ -1162,6 +1243,7 @@ read_again:
 
 	disks = conf->raid_disks * 2;
  retry_write:
+	r1_bio->start_next_window = start_next_window;
 	blocked_rdev = NULL;
 	rcu_read_lock();
 	max_sectors = r1_bio->sectors;
@@ -1230,14 +1312,24 @@ read_again:
 	if (unlikely(blocked_rdev)) {
 		/* Wait for this device to become unblocked */
 		int j;
+		sector_t old = start_next_window;
 
 		for (j = 0; j < i; j++)
 			if (r1_bio->bios[j])
 				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
 		r1_bio->state = 0;
-		allow_barrier(conf);
+		allow_barrier(conf, start_next_window, bio->bi_sector);
 		md_wait_for_blocked_rdev(blocked_rdev, mddev);
-		wait_barrier(conf);
+		start_next_window = wait_barrier(conf, bio);
+		/*
+		 * We must make sure the multi r1bios of bio have
+		 * the same value of bi_phys_segments
+		 */
+		if (bio->bi_phys_segments && old &&
+			old != start_next_window)
+			/*wait the former r1bio(s) completed*/
+			wait_event(conf->wait_barrier,
+					bio->bi_phys_segments == 1);
 		goto retry_write;
 	}
 
@@ -1437,11 +1529,14 @@ static void print_conf(struct r1conf *conf)
 
 static void close_sync(struct r1conf *conf)
 {
-	wait_barrier(conf);
-	allow_barrier(conf);
+	wait_barrier(conf, NULL);
+	allow_barrier(conf, 0, 0);
 
 	mempool_destroy(conf->r1buf_pool);
 	conf->r1buf_pool = NULL;
+
+	conf->next_resync = 0;
+	conf->start_next_window = MaxSector;
 }
 
 static int raid1_spare_active(struct mddev *mddev)
@@ -2713,6 +2808,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 	conf->pending_count = 0;
 	conf->recovery_disabled = mddev->recovery_disabled - 1;
 
+	conf->start_next_window = MaxSector;
+	conf->current_window_requests = conf->next_window_requests = 0;
+
 	err = -EIO;
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index 331a98a..07425a1 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -41,6 +41,19 @@ struct r1conf {
 	 */
 	sector_t		next_resync;
 
+	/*when raid1 start resync,we divide raid into four partitions
+	 * |---------|--------------|---------------------|-------------|
+	 *        next_resync   start_next_window        Pc
+	 * Now start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
+	 * Pc = start_next_window + NEXT_NORMALIO_DISTANCE
+	 * current_window_requests means the count of normalIO between
+	 * start_next_window and Pc.
+	 * next_window_requests means the count of nornalIO after Pc.
+	 * */
+	sector_t		start_next_window;
+	int			current_window_requests;
+	int			next_window_requests;
+
 	spinlock_t		device_lock;
 
 	/* list of 'struct r1bio' that need to be processed by raid1d,
@@ -112,6 +125,7 @@ struct r1bio {
 						 * in this BehindIO request
 						 */
 	sector_t		sector;
+	sector_t		start_next_window;
 	int			sectors;
 	unsigned long		state;
 	struct mddev		*mddev;
-- 
1.8.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.
  2013-10-29  1:30   ` majianpeng
@ 2013-10-31  2:33     ` NeilBrown
  2013-10-31  3:20       ` majianpeng
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2013-10-31  2:33 UTC (permalink / raw)
  To: majianpeng; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 10158 bytes --]

On Tue, 29 Oct 2013 09:30:14 +0800 majianpeng <majianpeng@gmail.com> wrote:


Nearly there!!  Just a few more details.  See below.


> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
> ---
>  drivers/md/raid1.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++------
>  drivers/md/raid1.h |  14 ++++++
>  2 files changed, 124 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index b4a6dcd..5b311c0 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -66,7 +66,8 @@
>   */
>  static int max_queued_requests = 1024;
>  
> -static void allow_barrier(struct r1conf *conf);
> +static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
> +				sector_t bi_sector);
>  static void lower_barrier(struct r1conf *conf);
>  
>  static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
> @@ -227,6 +228,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
>  	struct bio *bio = r1_bio->master_bio;
>  	int done;
>  	struct r1conf *conf = r1_bio->mddev->private;
> +	sector_t start_next_window = r1_bio->start_next_window;
> +	sector_t bi_sector = bio->bi_sector;

This should  be r1_bio->sector, not bio->bi_sector.
They are often the same but if multiple r1_bios are needed for some reason
(e.g. bad blocks) they may not be.

>  
>  	if (bio->bi_phys_segments) {
>  		unsigned long flags;
> @@ -234,6 +237,11 @@ static void call_bio_endio(struct r1bio *r1_bio)
>  		bio->bi_phys_segments--;
>  		done = (bio->bi_phys_segments == 0);
>  		spin_unlock_irqrestore(&conf->device_lock, flags);
> +		/*
> +		 * make_request() might be waiting for
> +		 * bi_phys_segments to decrease
> +		 */
> +		wake_up(&conf->wait_barrier);
>  	} else
>  		done = 1;
>  
> @@ -245,7 +253,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
>  		 * Wake up any possible resync thread that waits for the device
>  		 * to go idle.
>  		 */
> -		allow_barrier(conf);
> +		allow_barrier(conf, start_next_window, bi_sector);
>  	}
>  }
>  
> @@ -827,10 +835,18 @@ static void raise_barrier(struct r1conf *conf)
>  	/* block any new IO from starting */
>  	conf->barrier++;
>  
> -	/* Now wait for all pending IO to complete */
> +	/* For those conditions we must wait:
> +	 * A:while the array is in frozen state
> +	 * B:while barrier >= RESYNC_DEPTH, meaning resync reach
> +	 *   the max count which allowed.
> +	 * C:next_resync + RESYNC_SECTORS > start_next_window, meaning
> +	 *   next resync will reach to window which normal bios are handling.
> +	 */
>  	wait_event_lock_irq(conf->wait_barrier,
>  			    !conf->array_frozen &&
> -			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
> +			    conf->barrier < RESYNC_DEPTH &&
> +			    (conf->start_next_window >=
> +			     conf->next_resync + RESYNC_SECTORS),
>  			    conf->resync_lock);

You've removed the test on conf->nr_pending here, which I think is correct.
It counts 'read' requests as well.  Testing start_next_window serves the same
purpose as it is increased whenever current_window_requests reaches zero.

However you having modified as similar test on nr_pending in wait_barrier().
That worries me a bit.  Should that be changed to a test on start_next_window
to match the above change?


>  
>  	spin_unlock_irq(&conf->resync_lock);
> @@ -846,10 +862,33 @@ static void lower_barrier(struct r1conf *conf)
>  	wake_up(&conf->wait_barrier);
>  }
>  
> -static void wait_barrier(struct r1conf *conf)
> +static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
> +{
> +	bool wait = false;
> +
> +	if (conf->array_frozen || !bio)
> +		wait = true;
> +	else if (conf->barrier && bio_data_dir(bio) == WRITE) {
> +		if (conf->next_resync < RESYNC_WINDOW_SECTORS)
> +			wait = true;
> +		else if ((conf->next_resync - RESYNC_WINDOW_SECTORS
> +				>= bio_end_sector(bio)) ||
> +			 (conf->next_resync + NEXT_NORMALIO_DISTANCE
> +				<= bio->bi_sector))
> +			wait = false;
> +		else
> +			wait = true;
> +	}
> +
> +	return wait;
> +}
> +
> +static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
>  {
> +	sector_t sector = 0;
> +
>  	spin_lock_irq(&conf->resync_lock);
> -	if (conf->barrier) {
> +	if (need_to_wait_for_sync(conf, bio)) {
>  		conf->nr_waiting++;
>  		/* Wait for the barrier to drop.
>  		 * However if there are already pending
> @@ -868,16 +907,57 @@ static void wait_barrier(struct r1conf *conf)
>  				     !bio_list_empty(current->bio_list))),
>  				    conf->resync_lock);
>  		conf->nr_waiting--;
> +	} else if (bio_data_dir(bio) == WRITE) {
> +		if (conf->next_resync + NEXT_NORMALIO_DISTANCE
> +				<= bio->bi_sector) {
> +			if (conf->start_next_window == MaxSector)
> +				conf->start_next_window =
> +					conf->next_resync +
> +						NEXT_NORMALIO_DISTANCE;
> +
> +			if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
> +				<= bio->bi_sector)
> +				conf->next_window_requests++;
> +			else
> +				conf->current_window_requests++;
> +		}
> +		if (bio->bi_sector >= conf->start_next_window)
> +			sector = conf->start_next_window;

You aren't setting 'sector' if we needed to wait.  I don't think that is
correct, is it?


>  	}
> +
>  	conf->nr_pending++;
>  	spin_unlock_irq(&conf->resync_lock);
> +	return sector;
>  }
>  
> -static void allow_barrier(struct r1conf *conf)
> +static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
> +				sector_t bi_sector)
>  {
>  	unsigned long flags;
> +
>  	spin_lock_irqsave(&conf->resync_lock, flags);
>  	conf->nr_pending--;
> +	if (start_next_window) {
> +		if (start_next_window == conf->start_next_window) {
> +			if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
> +				<= bi_sector)
> +				conf->next_window_requests--;
> +			else
> +				conf->current_window_requests--;
> +		} else
> +			conf->current_window_requests--;
> +
> +		if (!conf->current_window_requests) {
> +			if (conf->next_window_requests) {
> +				conf->current_window_requests =
> +					conf->next_window_requests;
> +				conf->next_window_requests = 0;
> +				conf->start_next_window +=
> +					NEXT_NORMALIO_DISTANCE;
> +			} else
> +				conf->start_next_window = MaxSector;
> +		}
> +	}
>  	spin_unlock_irqrestore(&conf->resync_lock, flags);
>  	wake_up(&conf->wait_barrier);
>  }
> @@ -1012,6 +1092,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
>  	int first_clone;
>  	int sectors_handled;
>  	int max_sectors;
> +	sector_t start_next_window;
>  
>  	/*
>  	 * Register the new request and wait if the reconstruction
> @@ -1041,7 +1122,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
>  		finish_wait(&conf->wait_barrier, &w);
>  	}
>  
> -	wait_barrier(conf);
> +	start_next_window = wait_barrier(conf, bio);
>  
>  	bitmap = mddev->bitmap;
>  
> @@ -1162,6 +1243,7 @@ read_again:
>  
>  	disks = conf->raid_disks * 2;
>   retry_write:
> +	r1_bio->start_next_window = start_next_window;
>  	blocked_rdev = NULL;
>  	rcu_read_lock();
>  	max_sectors = r1_bio->sectors;
> @@ -1230,14 +1312,24 @@ read_again:
>  	if (unlikely(blocked_rdev)) {
>  		/* Wait for this device to become unblocked */
>  		int j;
> +		sector_t old = start_next_window;
>  
>  		for (j = 0; j < i; j++)
>  			if (r1_bio->bios[j])
>  				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
>  		r1_bio->state = 0;
> -		allow_barrier(conf);
> +		allow_barrier(conf, start_next_window, bio->bi_sector);

I think this should be r1_bio->sector, not bio->bi_sector, for the same
reason as earlier.


>  		md_wait_for_blocked_rdev(blocked_rdev, mddev);
> -		wait_barrier(conf);
> +		start_next_window = wait_barrier(conf, bio);
> +		/*
> +		 * We must make sure the multi r1bios of bio have
> +		 * the same value of bi_phys_segments
> +		 */
> +		if (bio->bi_phys_segments && old &&
> +			old != start_next_window)
> +			/*wait the former r1bio(s) completed*/
> +			wait_event(conf->wait_barrier,
> +					bio->bi_phys_segments == 1);
>  		goto retry_write;
>  	}
>  
> @@ -1437,11 +1529,14 @@ static void print_conf(struct r1conf *conf)
>  
>  static void close_sync(struct r1conf *conf)
>  {
> -	wait_barrier(conf);
> -	allow_barrier(conf);
> +	wait_barrier(conf, NULL);
> +	allow_barrier(conf, 0, 0);
>  
>  	mempool_destroy(conf->r1buf_pool);
>  	conf->r1buf_pool = NULL;
> +
> +	conf->next_resync = 0;
> +	conf->start_next_window = MaxSector;
>  }
>  
>  static int raid1_spare_active(struct mddev *mddev)
> @@ -2713,6 +2808,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
>  	conf->pending_count = 0;
>  	conf->recovery_disabled = mddev->recovery_disabled - 1;
>  
> +	conf->start_next_window = MaxSector;
> +	conf->current_window_requests = conf->next_window_requests = 0;
> +
>  	err = -EIO;
>  	for (i = 0; i < conf->raid_disks * 2; i++) {
>  
> diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
> index 331a98a..07425a1 100644
> --- a/drivers/md/raid1.h
> +++ b/drivers/md/raid1.h
> @@ -41,6 +41,19 @@ struct r1conf {
>  	 */
>  	sector_t		next_resync;
>  
> +	/*when raid1 start resync,we divide raid into four partitions
> +	 * |---------|--------------|---------------------|-------------|
> +	 *        next_resync   start_next_window        Pc
> +	 * Now start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
> +	 * Pc = start_next_window + NEXT_NORMALIO_DISTANCE
> +	 * current_window_requests means the count of normalIO between
> +	 * start_next_window and Pc.
> +	 * next_window_requests means the count of nornalIO after Pc.
> +	 * */
> +	sector_t		start_next_window;
> +	int			current_window_requests;
> +	int			next_window_requests;
> +
>  	spinlock_t		device_lock;
>  
>  	/* list of 'struct r1bio' that need to be processed by raid1d,
> @@ -112,6 +125,7 @@ struct r1bio {
>  						 * in this BehindIO request
>  						 */
>  	sector_t		sector;
> +	sector_t		start_next_window;
>  	int			sectors;
>  	unsigned long		state;
>  	struct mddev		*mddev;


Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.
  2013-10-31  2:33     ` NeilBrown
@ 2013-10-31  3:20       ` majianpeng
  2013-11-14  6:44         ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: majianpeng @ 2013-10-31  3:20 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

>On Tue, 29 Oct 2013 09:30:14 +0800 majianpeng <majianpeng@gmail.com> wrote:
>
>
>Nearly there!!  Just a few more details.  See below.
>
>
>> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
>> ---
>>  drivers/md/raid1.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++------
>>  drivers/md/raid1.h |  14 ++++++
>>  2 files changed, 124 insertions(+), 12 deletions(-)
>> 
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index b4a6dcd..5b311c0 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -66,7 +66,8 @@
>>   */
>>  static int max_queued_requests = 1024;
>>  
>> -static void allow_barrier(struct r1conf *conf);
>> +static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
>> +				sector_t bi_sector);
>>  static void lower_barrier(struct r1conf *conf);
>>  
>>  static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
>> @@ -227,6 +228,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
>>  	struct bio *bio = r1_bio->master_bio;
>>  	int done;
>>  	struct r1conf *conf = r1_bio->mddev->private;
>> +	sector_t start_next_window = r1_bio->start_next_window;
>> +	sector_t bi_sector = bio->bi_sector;
>
>This should  be r1_bio->sector, not bio->bi_sector.
>They are often the same but if multiple r1_bios are needed for some reason
>(e.g. bad blocks) they may not be.
>
No, allow_barrier() only for bio not for r1bio.It only do when bio->bi_phys_segments == 0.
If this value is r1_bio->sector, the value may has different value as you said.
So in allow_barrier():
       if (start_next_window) {
                if (start_next_window == conf->start_next_window) {
                        if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
                                <= bi_sector)
                                conf->next_window_requests--;
                        else 
                                conf->current_window_requests--;
                } else 
It will has differnt result.

>>  
>>  	if (bio->bi_phys_segments) {
>>  		unsigned long flags;
>> @@ -234,6 +237,11 @@ static void call_bio_endio(struct r1bio *r1_bio)
>>  		bio->bi_phys_segments--;
>>  		done = (bio->bi_phys_segments == 0);
>>  		spin_unlock_irqrestore(&conf->device_lock, flags);
>> +		/*
>> +		 * make_request() might be waiting for
>> +		 * bi_phys_segments to decrease
>> +		 */
>> +		wake_up(&conf->wait_barrier);
>>  	} else
>>  		done = 1;
>>  
>> @@ -245,7 +253,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
>>  		 * Wake up any possible resync thread that waits for the device
>>  		 * to go idle.
>>  		 */
>> -		allow_barrier(conf);
>> +		allow_barrier(conf, start_next_window, bi_sector);
>>  	}
>>  }
>>  
>> @@ -827,10 +835,18 @@ static void raise_barrier(struct r1conf *conf)
>>  	/* block any new IO from starting */
>>  	conf->barrier++;
>>  
>> -	/* Now wait for all pending IO to complete */
>> +	/* For those conditions we must wait:
>> +	 * A:while the array is in frozen state
>> +	 * B:while barrier >= RESYNC_DEPTH, meaning resync reach
>> +	 *   the max count which allowed.
>> +	 * C:next_resync + RESYNC_SECTORS > start_next_window, meaning
>> +	 *   next resync will reach to window which normal bios are handling.
>> +	 */
>>  	wait_event_lock_irq(conf->wait_barrier,
>>  			    !conf->array_frozen &&
>> -			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
>> +			    conf->barrier < RESYNC_DEPTH &&
>> +			    (conf->start_next_window >=
>> +			     conf->next_resync + RESYNC_SECTORS),
>>  			    conf->resync_lock);
>
>You've removed the test on conf->nr_pending here, which I think is correct.
>It counts 'read' requests as well.  Testing start_next_window serves the same
>purpose as it is increased whenever current_window_requests reaches zero.
>
In func wait_barrier():
        if (need_to_wait_for_sync(conf, bio)) {
              [snip]
        } else if (bio_data_dir(bio) == WRITE) {
                if (conf->next_resync + NEXT_NORMALIO_DISTANCE
                                <= bio->bi_sector) {
                        if (conf->start_next_window == MaxSector)
                                conf->start_next_window =
                                        conf->next_resync +
                                                NEXT_NORMALIO_DISTANCE;

                        if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
                                <= bio->bi_sector)
                                conf->next_window_requests++;
                        else
                                conf->current_window_requests++;
                }
                if (bio->bi_sector >= conf->start_next_window)
                        sector = conf->start_next_window;
        }
start_next_window only for bio_data_dir(bio) == WRITE. So resync can't wait read io to complete.

>However you having modified as similar test on nr_pending in wait_barrier().
>That worries me a bit.  Should that be changed to a test on start_next_window
>to match the above change?
For this condition, i only copy it.For this condition, i'm not know clearly.
Can you give me more details about this?
>
>
>>  
>>  	spin_unlock_irq(&conf->resync_lock);
>> @@ -846,10 +862,33 @@ static void lower_barrier(struct r1conf *conf)
>>  	wake_up(&conf->wait_barrier);
>>  }
>>  
>> -static void wait_barrier(struct r1conf *conf)
>> +static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
>> +{
>> +	bool wait = false;
>> +
>> +	if (conf->array_frozen || !bio)
>> +		wait = true;
>> +	else if (conf->barrier && bio_data_dir(bio) == WRITE) {
>> +		if (conf->next_resync < RESYNC_WINDOW_SECTORS)
>> +			wait = true;
>> +		else if ((conf->next_resync - RESYNC_WINDOW_SECTORS
>> +				>= bio_end_sector(bio)) ||
>> +			 (conf->next_resync + NEXT_NORMALIO_DISTANCE
>> +				<= bio->bi_sector))
>> +			wait = false;
>> +		else
>> +			wait = true;
>> +	}
>> +
>> +	return wait;
>> +}
>> +
>> +static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
>>  {
>> +	sector_t sector = 0;
>> +
>>  	spin_lock_irq(&conf->resync_lock);
>> -	if (conf->barrier) {
>> +	if (need_to_wait_for_sync(conf, bio)) {
>>  		conf->nr_waiting++;
>>  		/* Wait for the barrier to drop.
>>  		 * However if there are already pending
>> @@ -868,16 +907,57 @@ static void wait_barrier(struct r1conf *conf)
>>  				     !bio_list_empty(current->bio_list))),
>>  				    conf->resync_lock);
>>  		conf->nr_waiting--;
>> +	} else if (bio_data_dir(bio) == WRITE) {
>> +		if (conf->next_resync + NEXT_NORMALIO_DISTANCE
>> +				<= bio->bi_sector) {
>> +			if (conf->start_next_window == MaxSector)
>> +				conf->start_next_window =
>> +					conf->next_resync +
>> +						NEXT_NORMALIO_DISTANCE;
>> +
>> +			if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
>> +				<= bio->bi_sector)
>> +				conf->next_window_requests++;
>> +			else
>> +				conf->current_window_requests++;
>> +		}
>> +		if (bio->bi_sector >= conf->start_next_window)
>> +			sector = conf->start_next_window;
>
>You aren't setting 'sector' if we needed to wait.  I don't think that is
>correct, is it?
>
Yes.How about those code:
spin_lock_irq(&conf->resync_lock);
retry_check:
        if (need_to_wait_for_sync(conf, bio)) {
                conf->nr_waiting++;
                /* Wait for the barrier to drop.
                 * However if there are already pending
                 * requests (preventing the barrier from
                 * rising completely), and the
                 * pre-process bio queue isn't empty,
                 * then don't wait, as we need to empty
                 * that queue to get the nr_pending
                 * count down.
                 */
                wait_event_lock_irq(conf->wait_barrier,
                                    !conf->array_frozen &&
                                    (!conf->barrier ||
                                    (conf->nr_pending &&
                                     current->bio_list &&
                                     !bio_list_empty(current->bio_list))),
                                    conf->resync_lock);
                conf->nr_waiting--;
				goto retry_check;
>
>>  	}
>> +
>>  	conf->nr_pending++;
>>  	spin_unlock_irq(&conf->resync_lock);
>> +	return sector;
>>  }
>>  
>> -static void allow_barrier(struct r1conf *conf)
>> +static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
>> +				sector_t bi_sector)
>>  {
>>  	unsigned long flags;
>> +
>>  	spin_lock_irqsave(&conf->resync_lock, flags);
>>  	conf->nr_pending--;
>> +	if (start_next_window) {
>> +		if (start_next_window == conf->start_next_window) {
>> +			if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
>> +				<= bi_sector)
>> +				conf->next_window_requests--;
>> +			else
>> +				conf->current_window_requests--;
>> +		} else
>> +			conf->current_window_requests--;
>> +
>> +		if (!conf->current_window_requests) {
>> +			if (conf->next_window_requests) {
>> +				conf->current_window_requests =
>> +					conf->next_window_requests;
>> +				conf->next_window_requests = 0;
>> +				conf->start_next_window +=
>> +					NEXT_NORMALIO_DISTANCE;
>> +			} else
>> +				conf->start_next_window = MaxSector;
>> +		}
>> +	}
>>  	spin_unlock_irqrestore(&conf->resync_lock, flags);
>>  	wake_up(&conf->wait_barrier);
>>  }
>> @@ -1012,6 +1092,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
>>  	int first_clone;
>>  	int sectors_handled;
>>  	int max_sectors;
>> +	sector_t start_next_window;
>>  
>>  	/*
>>  	 * Register the new request and wait if the reconstruction
>> @@ -1041,7 +1122,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
>>  		finish_wait(&conf->wait_barrier, &w);
>>  	}
>>  
>> -	wait_barrier(conf);
>> +	start_next_window = wait_barrier(conf, bio);
>>  
>>  	bitmap = mddev->bitmap;
>>  
>> @@ -1162,6 +1243,7 @@ read_again:
>>  
>>  	disks = conf->raid_disks * 2;
>>   retry_write:
>> +	r1_bio->start_next_window = start_next_window;
>>  	blocked_rdev = NULL;
>>  	rcu_read_lock();
>>  	max_sectors = r1_bio->sectors;
>> @@ -1230,14 +1312,24 @@ read_again:
>>  	if (unlikely(blocked_rdev)) {
>>  		/* Wait for this device to become unblocked */
>>  		int j;
>> +		sector_t old = start_next_window;
>>  
>>  		for (j = 0; j < i; j++)
>>  			if (r1_bio->bios[j])
>>  				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
>>  		r1_bio->state = 0;
>> -		allow_barrier(conf);
>> +		allow_barrier(conf, start_next_window, bio->bi_sector);
>
>I think this should be r1_bio->sector, not bio->bi_sector, for the same
>reason as earlier.
>
The exaplaination is the same at above.

Thanks!
Jianpeng Ma
>
>>  		md_wait_for_blocked_rdev(blocked_rdev, mddev);
>> -		wait_barrier(conf);
>> +		start_next_window = wait_barrier(conf, bio);
>> +		/*
>> +		 * We must make sure the multi r1bios of bio have
>> +		 * the same value of bi_phys_segments
>> +		 */
>> +		if (bio->bi_phys_segments && old &&
>> +			old != start_next_window)
>> +			/*wait the former r1bio(s) completed*/
>> +			wait_event(conf->wait_barrier,
>> +					bio->bi_phys_segments == 1);
>>  		goto retry_write;
>>  	}
>>  
>> @@ -1437,11 +1529,14 @@ static void print_conf(struct r1conf *conf)
>>  
>>  static void close_sync(struct r1conf *conf)
>>  {
>> -	wait_barrier(conf);
>> -	allow_barrier(conf);
>> +	wait_barrier(conf, NULL);
>> +	allow_barrier(conf, 0, 0);
>>  
>>  	mempool_destroy(conf->r1buf_pool);
>>  	conf->r1buf_pool = NULL;
>> +
>> +	conf->next_resync = 0;
>> +	conf->start_next_window = MaxSector;
>>  }
>>  
>>  static int raid1_spare_active(struct mddev *mddev)
>> @@ -2713,6 +2808,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
>>  	conf->pending_count = 0;
>>  	conf->recovery_disabled = mddev->recovery_disabled - 1;
>>  
>> +	conf->start_next_window = MaxSector;
>> +	conf->current_window_requests = conf->next_window_requests = 0;
>> +
>>  	err = -EIO;
>>  	for (i = 0; i < conf->raid_disks * 2; i++) {
>>  
>> diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
>> index 331a98a..07425a1 100644
>> --- a/drivers/md/raid1.h
>> +++ b/drivers/md/raid1.h
>> @@ -41,6 +41,19 @@ struct r1conf {
>>  	 */
>>  	sector_t		next_resync;
>>  
>> +	/*when raid1 start resync,we divide raid into four partitions
>> +	 * |---------|--------------|---------------------|-------------|
>> +	 *        next_resync   start_next_window        Pc
>> +	 * Now start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
>> +	 * Pc = start_next_window + NEXT_NORMALIO_DISTANCE
>> +	 * current_window_requests means the count of normalIO between
>> +	 * start_next_window and Pc.
>> +	 * next_window_requests means the count of nornalIO after Pc.
>> +	 * */
>> +	sector_t		start_next_window;
>> +	int			current_window_requests;
>> +	int			next_window_requests;
>> +
>>  	spinlock_t		device_lock;
>>  
>>  	/* list of 'struct r1bio' that need to be processed by raid1d,
>> @@ -112,6 +125,7 @@ struct r1bio {
>>  						 * in this BehindIO request
>>  						 */
>>  	sector_t		sector;
>> +	sector_t		start_next_window;
>>  	int			sectors;
>>  	unsigned long		state;
>>  	struct mddev		*mddev;
>
>
>Thanks,
>NeilBrown
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.
  2013-10-31  3:20       ` majianpeng
@ 2013-11-14  6:44         ` NeilBrown
  2013-11-15  2:29           ` majianpeng
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2013-11-14  6:44 UTC (permalink / raw)
  To: majianpeng; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 16056 bytes --]

On Thu, 31 Oct 2013 11:20:55 +0800 majianpeng <majianpeng@gmail.com> wrote:

Sorry for the delayed reply.


> >On Tue, 29 Oct 2013 09:30:14 +0800 majianpeng <majianpeng@gmail.com> wrote:
> >
> >
> >Nearly there!!  Just a few more details.  See below.
> >
> >
> >> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
> >> ---
> >>  drivers/md/raid1.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++------
> >>  drivers/md/raid1.h |  14 ++++++
> >>  2 files changed, 124 insertions(+), 12 deletions(-)
> >> 
> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >> index b4a6dcd..5b311c0 100644
> >> --- a/drivers/md/raid1.c
> >> +++ b/drivers/md/raid1.c
> >> @@ -66,7 +66,8 @@
> >>   */
> >>  static int max_queued_requests = 1024;
> >>  
> >> -static void allow_barrier(struct r1conf *conf);
> >> +static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
> >> +				sector_t bi_sector);
> >>  static void lower_barrier(struct r1conf *conf);
> >>  
> >>  static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
> >> @@ -227,6 +228,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
> >>  	struct bio *bio = r1_bio->master_bio;
> >>  	int done;
> >>  	struct r1conf *conf = r1_bio->mddev->private;
> >> +	sector_t start_next_window = r1_bio->start_next_window;
> >> +	sector_t bi_sector = bio->bi_sector;
> >
> >This should  be r1_bio->sector, not bio->bi_sector.
> >They are often the same but if multiple r1_bios are needed for some reason
> >(e.g. bad blocks) they may not be.
> >
> No, allow_barrier() only for bio not for r1bio.It only do when bio->bi_phys_segments == 0.

Yes, I see that now, thanks.


> If this value is r1_bio->sector, the value may has different value as you said.
> So in allow_barrier():
>        if (start_next_window) {
>                 if (start_next_window == conf->start_next_window) {
>                         if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
>                                 <= bi_sector)
>                                 conf->next_window_requests--;
>                         else 
>                                 conf->current_window_requests--;
>                 } else 
> It will has differnt result.
> 
> >>  
> >>  	if (bio->bi_phys_segments) {
> >>  		unsigned long flags;
> >> @@ -234,6 +237,11 @@ static void call_bio_endio(struct r1bio *r1_bio)
> >>  		bio->bi_phys_segments--;
> >>  		done = (bio->bi_phys_segments == 0);
> >>  		spin_unlock_irqrestore(&conf->device_lock, flags);
> >> +		/*
> >> +		 * make_request() might be waiting for
> >> +		 * bi_phys_segments to decrease
> >> +		 */
> >> +		wake_up(&conf->wait_barrier);
> >>  	} else
> >>  		done = 1;
> >>  
> >> @@ -245,7 +253,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
> >>  		 * Wake up any possible resync thread that waits for the device
> >>  		 * to go idle.
> >>  		 */
> >> -		allow_barrier(conf);
> >> +		allow_barrier(conf, start_next_window, bi_sector);
> >>  	}
> >>  }
> >>  
> >> @@ -827,10 +835,18 @@ static void raise_barrier(struct r1conf *conf)
> >>  	/* block any new IO from starting */
> >>  	conf->barrier++;
> >>  
> >> -	/* Now wait for all pending IO to complete */
> >> +	/* For those conditions we must wait:
> >> +	 * A:while the array is in frozen state
> >> +	 * B:while barrier >= RESYNC_DEPTH, meaning resync reach
> >> +	 *   the max count which allowed.
> >> +	 * C:next_resync + RESYNC_SECTORS > start_next_window, meaning
> >> +	 *   next resync will reach to window which normal bios are handling.
> >> +	 */
> >>  	wait_event_lock_irq(conf->wait_barrier,
> >>  			    !conf->array_frozen &&
> >> -			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
> >> +			    conf->barrier < RESYNC_DEPTH &&
> >> +			    (conf->start_next_window >=
> >> +			     conf->next_resync + RESYNC_SECTORS),
> >>  			    conf->resync_lock);
> >
> >You've removed the test on conf->nr_pending here, which I think is correct.
> >It counts 'read' requests as well.  Testing start_next_window serves the same
> >purpose as it is increased whenever current_window_requests reaches zero.
> >
> In func wait_barrier():
>         if (need_to_wait_for_sync(conf, bio)) {
>               [snip]
>         } else if (bio_data_dir(bio) == WRITE) {
>                 if (conf->next_resync + NEXT_NORMALIO_DISTANCE
>                                 <= bio->bi_sector) {
>                         if (conf->start_next_window == MaxSector)
>                                 conf->start_next_window =
>                                         conf->next_resync +
>                                                 NEXT_NORMALIO_DISTANCE;
> 
>                         if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
>                                 <= bio->bi_sector)
>                                 conf->next_window_requests++;
>                         else
>                                 conf->current_window_requests++;
>                 }
>                 if (bio->bi_sector >= conf->start_next_window)
>                         sector = conf->start_next_window;
>         }
> start_next_window only for bio_data_dir(bio) == WRITE. So resync can't wait read io to complete.
> 
> >However you having modified as similar test on nr_pending in wait_barrier().
I meant:       haven't modified a similar test ....
> >That worries me a bit.  Should that be changed to a test on start_next_window
> >to match the above change?
> For this condition, i only copy it.For this condition, i'm not know clearly.
> Can you give me more details about this?


Before your patch we know if there are pending requests (submitted but not
completed) which stop use from doing resync if conf->nr_pending > 0.
After your patch we cannot use that that test as it counts READ requests, and
requests that are outside the range being resynced.

The test is now a bit more complicated.  I think it is:

		 conf->start_next_window <
                 conf->next_resync + RESYNC_SECTORS


i.e. while start_next_window is less than next_resync+RESYNC_SECTORS there
cold be outstanding writes that are blocking resync.  As soon as those writes
complete, start_next_window will increase and the resync can complete.
So if start_next_window < next_resync+RESYNC_SECTORS and there are pending
request in current->bio_list, then then there is no point waiting for resync
(it cannot progress anyway) so we may as well submit another write.

So change:

		wait_event_lock_irq(conf->wait_barrier,
				    !conf->array_frozen &&
				    (!conf->barrier ||
				    (conf->nr_pending &&
				     current->bio_list &&
				     !bio_list_empty(current->bio_list))),
				    conf->resync_lock);

in wait barrier to

		wait_event_lock_irq(conf->wait_barrier,
				    !conf->array_frozen &&
				    (!conf->barrier ||
				    (conf->start_next_window < conf->next_resync + RESYNC_SECTORS &&
				     current->bio_list &&
				     !bio_list_empty(current->bio_list))),
				    conf->resync_lock);


> >
> >
> >>  
> >>  	spin_unlock_irq(&conf->resync_lock);
> >> @@ -846,10 +862,33 @@ static void lower_barrier(struct r1conf *conf)
> >>  	wake_up(&conf->wait_barrier);
> >>  }
> >>  
> >> -static void wait_barrier(struct r1conf *conf)
> >> +static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
> >> +{
> >> +	bool wait = false;
> >> +
> >> +	if (conf->array_frozen || !bio)
> >> +		wait = true;
> >> +	else if (conf->barrier && bio_data_dir(bio) == WRITE) {
> >> +		if (conf->next_resync < RESYNC_WINDOW_SECTORS)
> >> +			wait = true;
> >> +		else if ((conf->next_resync - RESYNC_WINDOW_SECTORS
> >> +				>= bio_end_sector(bio)) ||
> >> +			 (conf->next_resync + NEXT_NORMALIO_DISTANCE
> >> +				<= bio->bi_sector))
> >> +			wait = false;
> >> +		else
> >> +			wait = true;
> >> +	}
> >> +
> >> +	return wait;
> >> +}
> >> +
> >> +static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
> >>  {
> >> +	sector_t sector = 0;
> >> +
> >>  	spin_lock_irq(&conf->resync_lock);
> >> -	if (conf->barrier) {
> >> +	if (need_to_wait_for_sync(conf, bio)) {
> >>  		conf->nr_waiting++;
> >>  		/* Wait for the barrier to drop.
> >>  		 * However if there are already pending
> >> @@ -868,16 +907,57 @@ static void wait_barrier(struct r1conf *conf)
> >>  				     !bio_list_empty(current->bio_list))),
> >>  				    conf->resync_lock);
> >>  		conf->nr_waiting--;
> >> +	} else if (bio_data_dir(bio) == WRITE) {
> >> +		if (conf->next_resync + NEXT_NORMALIO_DISTANCE
> >> +				<= bio->bi_sector) {
> >> +			if (conf->start_next_window == MaxSector)
> >> +				conf->start_next_window =
> >> +					conf->next_resync +
> >> +						NEXT_NORMALIO_DISTANCE;
> >> +
> >> +			if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
> >> +				<= bio->bi_sector)
> >> +				conf->next_window_requests++;
> >> +			else
> >> +				conf->current_window_requests++;
> >> +		}
> >> +		if (bio->bi_sector >= conf->start_next_window)
> >> +			sector = conf->start_next_window;
> >
> >You aren't setting 'sector' if we needed to wait.  I don't think that is
> >correct, is it?
> >
> Yes.How about those code:
> spin_lock_irq(&conf->resync_lock);
> retry_check:
>         if (need_to_wait_for_sync(conf, bio)) {
>                 conf->nr_waiting++;
>                 /* Wait for the barrier to drop.
>                  * However if there are already pending
>                  * requests (preventing the barrier from
>                  * rising completely), and the
>                  * pre-process bio queue isn't empty,
>                  * then don't wait, as we need to empty
>                  * that queue to get the nr_pending
>                  * count down.
>                  */
>                 wait_event_lock_irq(conf->wait_barrier,
>                                     !conf->array_frozen &&
>                                     (!conf->barrier ||
>                                     (conf->nr_pending &&
>                                      current->bio_list &&
>                                      !bio_list_empty(current->bio_list))),
>                                     conf->resync_lock);
>                 conf->nr_waiting--;
> 				goto retry_check;
> >

It would look a lot better if you made it a while loop:

 while (need_to_wait....) {
     nr_waiting++
     wait_event.....
     nr_waiting--
 if (bio_data_dir.....


Thanks,
NeilBrown



> >>  	}
> >> +
> >>  	conf->nr_pending++;
> >>  	spin_unlock_irq(&conf->resync_lock);
> >> +	return sector;
> >>  }
> >>  
> >> -static void allow_barrier(struct r1conf *conf)
> >> +static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
> >> +				sector_t bi_sector)
> >>  {
> >>  	unsigned long flags;
> >> +
> >>  	spin_lock_irqsave(&conf->resync_lock, flags);
> >>  	conf->nr_pending--;
> >> +	if (start_next_window) {
> >> +		if (start_next_window == conf->start_next_window) {
> >> +			if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
> >> +				<= bi_sector)
> >> +				conf->next_window_requests--;
> >> +			else
> >> +				conf->current_window_requests--;
> >> +		} else
> >> +			conf->current_window_requests--;
> >> +
> >> +		if (!conf->current_window_requests) {
> >> +			if (conf->next_window_requests) {
> >> +				conf->current_window_requests =
> >> +					conf->next_window_requests;
> >> +				conf->next_window_requests = 0;
> >> +				conf->start_next_window +=
> >> +					NEXT_NORMALIO_DISTANCE;
> >> +			} else
> >> +				conf->start_next_window = MaxSector;
> >> +		}
> >> +	}
> >>  	spin_unlock_irqrestore(&conf->resync_lock, flags);
> >>  	wake_up(&conf->wait_barrier);
> >>  }
> >> @@ -1012,6 +1092,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
> >>  	int first_clone;
> >>  	int sectors_handled;
> >>  	int max_sectors;
> >> +	sector_t start_next_window;
> >>  
> >>  	/*
> >>  	 * Register the new request and wait if the reconstruction
> >> @@ -1041,7 +1122,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
> >>  		finish_wait(&conf->wait_barrier, &w);
> >>  	}
> >>  
> >> -	wait_barrier(conf);
> >> +	start_next_window = wait_barrier(conf, bio);
> >>  
> >>  	bitmap = mddev->bitmap;
> >>  
> >> @@ -1162,6 +1243,7 @@ read_again:
> >>  
> >>  	disks = conf->raid_disks * 2;
> >>   retry_write:
> >> +	r1_bio->start_next_window = start_next_window;
> >>  	blocked_rdev = NULL;
> >>  	rcu_read_lock();
> >>  	max_sectors = r1_bio->sectors;
> >> @@ -1230,14 +1312,24 @@ read_again:
> >>  	if (unlikely(blocked_rdev)) {
> >>  		/* Wait for this device to become unblocked */
> >>  		int j;
> >> +		sector_t old = start_next_window;
> >>  
> >>  		for (j = 0; j < i; j++)
> >>  			if (r1_bio->bios[j])
> >>  				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
> >>  		r1_bio->state = 0;
> >> -		allow_barrier(conf);
> >> +		allow_barrier(conf, start_next_window, bio->bi_sector);
> >
> >I think this should be r1_bio->sector, not bio->bi_sector, for the same
> >reason as earlier.
> >
> The exaplaination is the same at above.
> 
> Thanks!
> Jianpeng Ma
> >
> >>  		md_wait_for_blocked_rdev(blocked_rdev, mddev);
> >> -		wait_barrier(conf);
> >> +		start_next_window = wait_barrier(conf, bio);
> >> +		/*
> >> +		 * We must make sure the multi r1bios of bio have
> >> +		 * the same value of bi_phys_segments
> >> +		 */
> >> +		if (bio->bi_phys_segments && old &&
> >> +			old != start_next_window)
> >> +			/*wait the former r1bio(s) completed*/
> >> +			wait_event(conf->wait_barrier,
> >> +					bio->bi_phys_segments == 1);
> >>  		goto retry_write;
> >>  	}
> >>  
> >> @@ -1437,11 +1529,14 @@ static void print_conf(struct r1conf *conf)
> >>  
> >>  static void close_sync(struct r1conf *conf)
> >>  {
> >> -	wait_barrier(conf);
> >> -	allow_barrier(conf);
> >> +	wait_barrier(conf, NULL);
> >> +	allow_barrier(conf, 0, 0);
> >>  
> >>  	mempool_destroy(conf->r1buf_pool);
> >>  	conf->r1buf_pool = NULL;
> >> +
> >> +	conf->next_resync = 0;
> >> +	conf->start_next_window = MaxSector;
> >>  }
> >>  
> >>  static int raid1_spare_active(struct mddev *mddev)
> >> @@ -2713,6 +2808,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
> >>  	conf->pending_count = 0;
> >>  	conf->recovery_disabled = mddev->recovery_disabled - 1;
> >>  
> >> +	conf->start_next_window = MaxSector;
> >> +	conf->current_window_requests = conf->next_window_requests = 0;
> >> +
> >>  	err = -EIO;
> >>  	for (i = 0; i < conf->raid_disks * 2; i++) {
> >>  
> >> diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
> >> index 331a98a..07425a1 100644
> >> --- a/drivers/md/raid1.h
> >> +++ b/drivers/md/raid1.h
> >> @@ -41,6 +41,19 @@ struct r1conf {
> >>  	 */
> >>  	sector_t		next_resync;
> >>  
> >> +	/*when raid1 start resync,we divide raid into four partitions
> >> +	 * |---------|--------------|---------------------|-------------|
> >> +	 *        next_resync   start_next_window        Pc
> >> +	 * Now start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
> >> +	 * Pc = start_next_window + NEXT_NORMALIO_DISTANCE
> >> +	 * current_window_requests means the count of normalIO between
> >> +	 * start_next_window and Pc.
> >> +	 * next_window_requests means the count of nornalIO after Pc.
> >> +	 * */
> >> +	sector_t		start_next_window;
> >> +	int			current_window_requests;
> >> +	int			next_window_requests;
> >> +
> >>  	spinlock_t		device_lock;
> >>  
> >>  	/* list of 'struct r1bio' that need to be processed by raid1d,
> >> @@ -112,6 +125,7 @@ struct r1bio {
> >>  						 * in this BehindIO request
> >>  						 */
> >>  	sector_t		sector;
> >> +	sector_t		start_next_window;
> >>  	int			sectors;
> >>  	unsigned long		state;
> >>  	struct mddev		*mddev;
> >
> >
> >Thanks,
> >NeilBrown
> >

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.
  2013-11-14  6:44         ` NeilBrown
@ 2013-11-15  2:29           ` majianpeng
  2013-11-15  3:42             ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: majianpeng @ 2013-11-15  2:29 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

>On Thu, 31 Oct 2013 11:20:55 +0800 majianpeng <majianpeng@gmail.com> wrote:
>
>Sorry for the delayed reply.
>
>
>> >On Tue, 29 Oct 2013 09:30:14 +0800 majianpeng <majianpeng@gmail.com> wrote:
>> >
>> >
>> >Nearly there!!  Just a few more details.  See below.
>> >
>> >
>> >> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
>> >> ---
>> >>  drivers/md/raid1.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++------
>> >>  drivers/md/raid1.h |  14 ++++++
>> >>  2 files changed, 124 insertions(+), 12 deletions(-)
>> >> 
>> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> >> index b4a6dcd..5b311c0 100644
>> >> --- a/drivers/md/raid1.c
>> >> +++ b/drivers/md/raid1.c
>> >> @@ -66,7 +66,8 @@
>> >>   */
>> >>  static int max_queued_requests = 1024;
>> >>  
>> >> -static void allow_barrier(struct r1conf *conf);
>> >> +static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
>> >> +				sector_t bi_sector);
>> >>  static void lower_barrier(struct r1conf *conf);
>> >>  
>> >>  static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
>> >> @@ -227,6 +228,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
>> >>  	struct bio *bio = r1_bio->master_bio;
>> >>  	int done;
>> >>  	struct r1conf *conf = r1_bio->mddev->private;
>> >> +	sector_t start_next_window = r1_bio->start_next_window;
>> >> +	sector_t bi_sector = bio->bi_sector;
>> >
>> >This should  be r1_bio->sector, not bio->bi_sector.
>> >They are often the same but if multiple r1_bios are needed for some reason
>> >(e.g. bad blocks) they may not be.
>> >
>> No, allow_barrier() only for bio not for r1bio.It only do when bio->bi_phys_segments == 0.
>
>Yes, I see that now, thanks.
>
>
>> If this value is r1_bio->sector, the value may has different value as you said.
>> So in allow_barrier():
>>        if (start_next_window) {
>>                 if (start_next_window == conf->start_next_window) {
>>                         if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
>>                                 <= bi_sector)
>>                                 conf->next_window_requests--;
>>                         else 
>>                                 conf->current_window_requests--;
>>                 } else 
>> It will has differnt result.
>> 
>> >>  
>> >>  	if (bio->bi_phys_segments) {
>> >>  		unsigned long flags;
>> >> @@ -234,6 +237,11 @@ static void call_bio_endio(struct r1bio *r1_bio)
>> >>  		bio->bi_phys_segments--;
>> >>  		done = (bio->bi_phys_segments == 0);
>> >>  		spin_unlock_irqrestore(&conf->device_lock, flags);
>> >> +		/*
>> >> +		 * make_request() might be waiting for
>> >> +		 * bi_phys_segments to decrease
>> >> +		 */
>> >> +		wake_up(&conf->wait_barrier);
>> >>  	} else
>> >>  		done = 1;
>> >>  
>> >> @@ -245,7 +253,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
>> >>  		 * Wake up any possible resync thread that waits for the device
>> >>  		 * to go idle.
>> >>  		 */
>> >> -		allow_barrier(conf);
>> >> +		allow_barrier(conf, start_next_window, bi_sector);
>> >>  	}
>> >>  }
>> >>  
>> >> @@ -827,10 +835,18 @@ static void raise_barrier(struct r1conf *conf)
>> >>  	/* block any new IO from starting */
>> >>  	conf->barrier++;
>> >>  
>> >> -	/* Now wait for all pending IO to complete */
>> >> +	/* For those conditions we must wait:
>> >> +	 * A:while the array is in frozen state
>> >> +	 * B:while barrier >= RESYNC_DEPTH, meaning resync reach
>> >> +	 *   the max count which allowed.
>> >> +	 * C:next_resync + RESYNC_SECTORS > start_next_window, meaning
>> >> +	 *   next resync will reach to window which normal bios are handling.
>> >> +	 */
>> >>  	wait_event_lock_irq(conf->wait_barrier,
>> >>  			    !conf->array_frozen &&
>> >> -			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
>> >> +			    conf->barrier < RESYNC_DEPTH &&
>> >> +			    (conf->start_next_window >=
>> >> +			     conf->next_resync + RESYNC_SECTORS),
>> >>  			    conf->resync_lock);
>> >
>> >You've removed the test on conf->nr_pending here, which I think is correct.
>> >It counts 'read' requests as well.  Testing start_next_window serves the same
>> >purpose as it is increased whenever current_window_requests reaches zero.
>> >
>> In func wait_barrier():
>>         if (need_to_wait_for_sync(conf, bio)) {
>>               [snip]
>>         } else if (bio_data_dir(bio) == WRITE) {
>>                 if (conf->next_resync + NEXT_NORMALIO_DISTANCE
>>                                 <= bio->bi_sector) {
>>                         if (conf->start_next_window == MaxSector)
>>                                 conf->start_next_window =
>>                                         conf->next_resync +
>>                                                 NEXT_NORMALIO_DISTANCE;
>> 
>>                         if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
>>                                 <= bio->bi_sector)
>>                                 conf->next_window_requests++;
>>                         else
>>                                 conf->current_window_requests++;
>>                 }
>>                 if (bio->bi_sector >= conf->start_next_window)
>>                         sector = conf->start_next_window;
>>         }
>> start_next_window only for bio_data_dir(bio) == WRITE. So resync can't wait read io to complete.
>> 
>> >However you having modified as similar test on nr_pending in wait_barrier().
>I meant:       haven't modified a similar test ....
>> >That worries me a bit.  Should that be changed to a test on start_next_window
>> >to match the above change?
>> For this condition, i only copy it.For this condition, i'm not know clearly.
>> Can you give me more details about this?
>
>
>Before your patch we know if there are pending requests (submitted but not
>completed) which stop use from doing resync if conf->nr_pending > 0.
>After your patch we cannot use that that test as it counts READ requests, and
>requests that are outside the range being resynced.
>
>The test is now a bit more complicated.  I think it is:
>
>		 conf->start_next_window <
>                 conf->next_resync + RESYNC_SECTORS
>
>
>i.e. while start_next_window is less than next_resync+RESYNC_SECTORS there
>cold be outstanding writes that are blocking resync.  As soon as those writes
>complete, start_next_window will increase and the resync can complete.
>So if start_next_window < next_resync+RESYNC_SECTORS and there are pending
>request in current->bio_list, then then there is no point waiting for resync
>(it cannot progress anyway) so we may as well submit another write.
>
>So change:
>
>		wait_event_lock_irq(conf->wait_barrier,
>				    !conf->array_frozen &&
>				    (!conf->barrier ||
>				    (conf->nr_pending &&
>				     current->bio_list &&
>				     !bio_list_empty(current->bio_list))),
>				    conf->resync_lock);
>
>in wait barrier to
>
>		wait_event_lock_irq(conf->wait_barrier,
>				    !conf->array_frozen &&
>				    (!conf->barrier ||
>				    (conf->start_next_window < conf->next_resync + RESYNC_SECTORS &&
>				     current->bio_list &&
>				     !bio_list_empty(current->bio_list))),
>				    conf->resync_lock);
>
>
Ok, apply this.
>> >
>> >
>> >>  
>> >>  	spin_unlock_irq(&conf->resync_lock);
>> >> @@ -846,10 +862,33 @@ static void lower_barrier(struct r1conf *conf)
>> >>  	wake_up(&conf->wait_barrier);
>> >>  }
>> >>  
>> >> -static void wait_barrier(struct r1conf *conf)
>> >> +static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
>> >> +{
>> >> +	bool wait = false;
>> >> +
>> >> +	if (conf->array_frozen || !bio)
>> >> +		wait = true;
>> >> +	else if (conf->barrier && bio_data_dir(bio) == WRITE) {
>> >> +		if (conf->next_resync < RESYNC_WINDOW_SECTORS)
>> >> +			wait = true;
>> >> +		else if ((conf->next_resync - RESYNC_WINDOW_SECTORS
>> >> +				>= bio_end_sector(bio)) ||
>> >> +			 (conf->next_resync + NEXT_NORMALIO_DISTANCE
>> >> +				<= bio->bi_sector))
>> >> +			wait = false;
>> >> +		else
>> >> +			wait = true;
>> >> +	}
>> >> +
>> >> +	return wait;
>> >> +}
>> >> +
>> >> +static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
>> >>  {
>> >> +	sector_t sector = 0;
>> >> +
>> >>  	spin_lock_irq(&conf->resync_lock);
>> >> -	if (conf->barrier) {
>> >> +	if (need_to_wait_for_sync(conf, bio)) {
>> >>  		conf->nr_waiting++;
>> >>  		/* Wait for the barrier to drop.
>> >>  		 * However if there are already pending
>> >> @@ -868,16 +907,57 @@ static void wait_barrier(struct r1conf *conf)
>> >>  				     !bio_list_empty(current->bio_list))),
>> >>  				    conf->resync_lock);
>> >>  		conf->nr_waiting--;
>> >> +	} else if (bio_data_dir(bio) == WRITE) {
>> >> +		if (conf->next_resync + NEXT_NORMALIO_DISTANCE
>> >> +				<= bio->bi_sector) {
>> >> +			if (conf->start_next_window == MaxSector)
>> >> +				conf->start_next_window =
>> >> +					conf->next_resync +
>> >> +						NEXT_NORMALIO_DISTANCE;
>> >> +
>> >> +			if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
>> >> +				<= bio->bi_sector)
>> >> +				conf->next_window_requests++;
>> >> +			else
>> >> +				conf->current_window_requests++;
>> >> +		}
>> >> +		if (bio->bi_sector >= conf->start_next_window)
>> >> +			sector = conf->start_next_window;
>> >
>> >You aren't setting 'sector' if we needed to wait.  I don't think that is
>> >correct, is it?
>> >
>> Yes.How about those code:
>> spin_lock_irq(&conf->resync_lock);
>> retry_check:
>>         if (need_to_wait_for_sync(conf, bio)) {
>>                 conf->nr_waiting++;
>>                 /* Wait for the barrier to drop.
>>                  * However if there are already pending
>>                  * requests (preventing the barrier from
>>                  * rising completely), and the
>>                  * pre-process bio queue isn't empty,
>>                  * then don't wait, as we need to empty
>>                  * that queue to get the nr_pending
>>                  * count down.
>>                  */
>>                 wait_event_lock_irq(conf->wait_barrier,
>>                                     !conf->array_frozen &&
>>                                     (!conf->barrier ||
>>                                     (conf->nr_pending &&
>>                                      current->bio_list &&
>>                                      !bio_list_empty(current->bio_list))),
>>                                     conf->resync_lock);
>>                 conf->nr_waiting--;
>> 				goto retry_check;
>> >
>
>It would look a lot better if you made it a while loop:
>
> while (need_to_wait....) {
>     nr_waiting++
>     wait_event.....
>     nr_waiting--
> if (bio_data_dir.....
>
For this, i think it don;t need while() or recheck.
The code should:

if (nedd_to_wait...) {
	nr_waiting++
	wait_event.....
	nr_waiting--
}
if (bio && bio_data_dir(bio) == WRITE) {
	 do;
}

I think again need_to_wait...(),it must return false. So we don't recheck this.
Or am I missing something?

Thanks!
Jianpeng Ma




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.
  2013-11-15  2:29           ` majianpeng
@ 2013-11-15  3:42             ` NeilBrown
  2013-11-15  6:55               ` majianpeng
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2013-11-15  3:42 UTC (permalink / raw)
  To: majianpeng; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2008 bytes --]

On Fri, 15 Nov 2013 10:29:56 +0800 majianpeng <majianpeng@gmail.com> wrote:

> >> Yes.How about those code:
> >> spin_lock_irq(&conf->resync_lock);
> >> retry_check:
> >>         if (need_to_wait_for_sync(conf, bio)) {
> >>                 conf->nr_waiting++;
> >>                 /* Wait for the barrier to drop.
> >>                  * However if there are already pending
> >>                  * requests (preventing the barrier from
> >>                  * rising completely), and the
> >>                  * pre-process bio queue isn't empty,
> >>                  * then don't wait, as we need to empty
> >>                  * that queue to get the nr_pending
> >>                  * count down.
> >>                  */
> >>                 wait_event_lock_irq(conf->wait_barrier,
> >>                                     !conf->array_frozen &&
> >>                                     (!conf->barrier ||
> >>                                     (conf->nr_pending &&
> >>                                      current->bio_list &&
> >>                                      !bio_list_empty(current->bio_list))),
> >>                                     conf->resync_lock);
> >>                 conf->nr_waiting--;
> >> 				goto retry_check;
> >> >
> >
> >It would look a lot better if you made it a while loop:
> >
> > while (need_to_wait....) {
> >     nr_waiting++
> >     wait_event.....
> >     nr_waiting--
> > if (bio_data_dir.....
> >
> For this, i think it don;t need while() or recheck.
> The code should:
> 
> if (nedd_to_wait...) {
> 	nr_waiting++
> 	wait_event.....
> 	nr_waiting--
> }
> if (bio && bio_data_dir(bio) == WRITE) {
> 	 do;
> }
> 
> I think again need_to_wait...(),it must return false. So we don't recheck this.
> Or am I missing something?
> 

An 'if' is fine.  I just didn't like the goto.
If you can send me the updated patch in a day or 2 I should be able to get it
into the current merge window.

thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.
  2013-11-15  3:42             ` NeilBrown
@ 2013-11-15  6:55               ` majianpeng
  2013-11-19  4:25                 ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: majianpeng @ 2013-11-15  6:55 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

>On Fri, 15 Nov 2013 10:29:56 +0800 majianpeng <majianpeng@gmail.com> wrote:
>
>> >> Yes.How about those code:
>> >> spin_lock_irq(&conf->resync_lock);
>> >> retry_check:
>> >>         if (need_to_wait_for_sync(conf, bio)) {
>> >>                 conf->nr_waiting++;
>> >>                 /* Wait for the barrier to drop.
>> >>                  * However if there are already pending
>> >>                  * requests (preventing the barrier from
>> >>                  * rising completely), and the
>> >>                  * pre-process bio queue isn't empty,
>> >>                  * then don't wait, as we need to empty
>> >>                  * that queue to get the nr_pending
>> >>                  * count down.
>> >>                  */
>> >>                 wait_event_lock_irq(conf->wait_barrier,
>> >>                                     !conf->array_frozen &&
>> >>                                     (!conf->barrier ||
>> >>                                     (conf->nr_pending &&
>> >>                                      current->bio_list &&
>> >>                                      !bio_list_empty(current->bio_list))),
>> >>                                     conf->resync_lock);
>> >>                 conf->nr_waiting--;
>> >> 				goto retry_check;
>> >> >
>> >
>> >It would look a lot better if you made it a while loop:
>> >
>> > while (need_to_wait....) {
>> >     nr_waiting++
>> >     wait_event.....
>> >     nr_waiting--
>> > if (bio_data_dir.....
>> >
>> For this, i think it don;t need while() or recheck.
>> The code should:
>> 
>> if (nedd_to_wait...) {
>> 	nr_waiting++
>> 	wait_event.....
>> 	nr_waiting--
>> }
>> if (bio && bio_data_dir(bio) == WRITE) {
>> 	 do;
>> }
>> 
>> I think again need_to_wait...(),it must return false. So we don't recheck this.
>> Or am I missing something?
>> 
>
>An 'if' is fine.  I just didn't like the goto.
>If you can send me the updated patch in a day or 2 I should be able to get it
>into the current merge window.
>
>thanks,
>NeilBrown
>
Thanks very much neil! Below the latest patch.

Subject:raid1: Rewrite the implementation of iobarrier.

There is a iobarrier in raid1 because the contend between normal IO and
resync IO.It suspend all normal IO when reysync.
But if normalIO is outrange the resync windwos,there is no contend.
So I rewrite the iobarrier.

I patition the whole space into five parts.
|---------|-----------|------------|----------------|-------|
         Pa    next_resync   start_next_window      Pb

Pa + SYNC_WINDOW = next_resync
next_resync + NEXT_NORMALIO_DISTANCE = start_next_window
start_next_window + NEXT_NORMAILIO_DISTANCE = Pb

Firstly i introduce some concepts:
1:RESYNC_WINDOW: For resync, there are 32 resync requests at most at the
same time.A sync request is RESYNC_BLOCK_SIZE(64*1024).
So the RESYNC_WINDOW is 32 * RESYNC_BLOCK_SIZE, that is 2MB.
2:NEXT_NORMALIO_DISTANCE: it indicate the distance between next_resync
and start_next_window.It also indicate the distance between
start_next_window and Pb.Now it is 3 * RESYNC_WINDOW_SIZE.It can tune.
3:next_resync: mean the next start sector which will being do sync.
4:Pa means: a position which before RESYNC_WINDOW distance from
next_resync.
5:start_next_window: means a position which after NEXT_NORMALIO_DISTANCE
distance from next_resync.For normalio after this position,
it can't wait resyncio to complete.
6:Pb: means a position which after 2 * NEXT_NORMALIO_DISTANCE distance
from next_resync.
7:current_window_requests: means the count of normalIO between Pb and
start_next_window.
8:next_window_requests: means the count of nonmalIO after Pb.

NormalIO will be partitioned into four types:
NoIO1: it means the end sector of bio is smaller or equal the Pa.
NoIO2: it means the start sector of bio larger or equal Pb
NoIO3: it means the start sector of bio larger or equal
start_next_window.
NoIO4: it means the location between Pa and start_next_window

|--------|-----------|--------------------|----------------|---------|
    |    Pa   |   next_resync   |  start_next_window   |   Pb   |
  NoIO1     NoIO4             NoIO4                  NoIO3     NoIO2

For NoIO1,it don't use iobarrier.
For NoIO4,it used the original iobarrier mechanism.The nornalIO and
resyncIO must be contend.
For NoIO2/3, i add two fields in struct r1conf "current_window_requests,
next_window_requests".They indicate the count of two types.
For those, they don't wait resync io to complete.

For resync action, if there are NoIO4s, it must be waited.If not,it
forward.But if sync action reached to the start_next_window and
current_window_requestes > 0 (that is there are NoIO3s),it must
wait until the current_window_requests became zero.
If current_window_requests became zero,the start_next_window also
forward. The current_window_requests will replaced by
next_window_requests.

There is a problem which when and how to change from NoIO2 to NoIO3.Only
so, the sync action can forward.

I add a field in struct r1conf "start_next_window".What condition will
change?
A: if start_next_window == MaxSector, it means there are no NoIO2/3.
   So start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
B: if current_window_requests == 0 && next_window_requests != 0, it
   means start_next_window move to Pb

There is anthor problem which how to differentiate between
old NoIO2(now it is NoIO3) and NoIO2.
For example, there are many bios which are NoIO2 and a bio which is
NoIO3. NoIO3 firstly completed,so the bios of NoIO2 became NoIO3.
At general,we use flags and list_head. The codes should:
>list_splice_init(NoIO2, NoIO3);
>current_window_requests = next_window_requests;
>next_window_requests = 0;
>list_for_each_entry(NoIO3){
>       clear_bit(NoIO2_FLAG),
>       set_bit(NoIO3_FLAG,
>}
If there are many NoIO2, it will take too much time in
list_for_each_entry.
Avoid this, i add a field in struct r1bio "start_next_window".
I used this to record the position conf->start_next_window when call
wait_barrier in func make_request.

In func allow_barrier, it check the conf->start_next_window.
If r1bio->stat_next_window == conf->start_next_window,it mean
there is no transition between NoIO2 and NoIO3.
If r1bio->start_next_window != conf->start_next_window,it mean
there is a trnasiton between NoIO2 and NoIO3.Because there is at most
one
transtion.So it only means the bio is old NoIO2.

For one bio,there are many r1bio's.So we make sure the
all r1bio->start_next_window are the same value.
If we met blocked_dev in make_request(), it must call allow_barrier
and wait_barrier.So the former and the later value of
conf->start_next_window will be change.
If there are many r1bio's with differnet start_next_window,
for the relevant bio, it depent on the last value of r1bio.
It will cause error.To avoid this, we must wait previous r1bios
to complete.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
---
 drivers/md/raid1.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++------
 drivers/md/raid1.h |  14 ++++++
 2 files changed, 128 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index b4a6dcd..8296609 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -66,7 +66,8 @@
  */
 static int max_queued_requests = 1024;
 
-static void allow_barrier(struct r1conf *conf);
+static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
+				sector_t bi_sector);
 static void lower_barrier(struct r1conf *conf);
 
 static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
@@ -227,6 +228,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
 	struct bio *bio = r1_bio->master_bio;
 	int done;
 	struct r1conf *conf = r1_bio->mddev->private;
+	sector_t start_next_window = r1_bio->start_next_window;
+	sector_t bi_sector = bio->bi_sector;
 
 	if (bio->bi_phys_segments) {
 		unsigned long flags;
@@ -234,6 +237,11 @@ static void call_bio_endio(struct r1bio *r1_bio)
 		bio->bi_phys_segments--;
 		done = (bio->bi_phys_segments == 0);
 		spin_unlock_irqrestore(&conf->device_lock, flags);
+		/*
+		 * make_request() might be waiting for
+		 * bi_phys_segments to decrease
+		 */
+		wake_up(&conf->wait_barrier);
 	} else
 		done = 1;
 
@@ -245,7 +253,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
 		 * Wake up any possible resync thread that waits for the device
 		 * to go idle.
 		 */
-		allow_barrier(conf);
+		allow_barrier(conf, start_next_window, bi_sector);
 	}
 }
 
@@ -827,10 +835,18 @@ static void raise_barrier(struct r1conf *conf)
 	/* block any new IO from starting */
 	conf->barrier++;
 
-	/* Now wait for all pending IO to complete */
+	/* For those conditions we must wait:
+	 * A:while the array is in frozen state
+	 * B:while barrier >= RESYNC_DEPTH, meaning resync reach
+	 *   the max count which allowed.
+	 * C:next_resync + RESYNC_SECTORS > start_next_window, meaning
+	 *   next resync will reach to window which normal bios are handling.
+	 */
 	wait_event_lock_irq(conf->wait_barrier,
 			    !conf->array_frozen &&
-			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
+			    conf->barrier < RESYNC_DEPTH &&
+			    (conf->start_next_window >=
+			     conf->next_resync + RESYNC_SECTORS),
 			    conf->resync_lock);
 
 	spin_unlock_irq(&conf->resync_lock);
@@ -846,10 +862,33 @@ static void lower_barrier(struct r1conf *conf)
 	wake_up(&conf->wait_barrier);
 }
 
-static void wait_barrier(struct r1conf *conf)
+static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
+{
+	bool wait = false;
+
+	if (conf->array_frozen || !bio)
+		wait = true;
+	else if (conf->barrier && bio_data_dir(bio) == WRITE) {
+		if (conf->next_resync < RESYNC_WINDOW_SECTORS)
+			wait = true;
+		else if ((conf->next_resync - RESYNC_WINDOW_SECTORS
+				>= bio_end_sector(bio)) ||
+			 (conf->next_resync + NEXT_NORMALIO_DISTANCE
+				<= bio->bi_sector))
+			wait = false;
+		else
+			wait = true;
+	}
+
+	return wait;
+}
+
+static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
 {
+	sector_t sector = 0;
+
 	spin_lock_irq(&conf->resync_lock);
-	if (conf->barrier) {
+	if (need_to_wait_for_sync(conf, bio)) {
 		conf->nr_waiting++;
 		/* Wait for the barrier to drop.
 		 * However if there are already pending
@@ -863,21 +902,65 @@ static void wait_barrier(struct r1conf *conf)
 		wait_event_lock_irq(conf->wait_barrier,
 				    !conf->array_frozen &&
 				    (!conf->barrier ||
-				    (conf->nr_pending &&
+				    ((conf->start_next_window <
+				      conf->next_resync + RESYNC_SECTORS) &&
 				     current->bio_list &&
 				     !bio_list_empty(current->bio_list))),
 				    conf->resync_lock);
 		conf->nr_waiting--;
 	}
+
+	if (bio && bio_data_dir(bio) == WRITE) {
+		if (conf->next_resync + NEXT_NORMALIO_DISTANCE
+				<= bio->bi_sector) {
+			if (conf->start_next_window == MaxSector)
+				conf->start_next_window =
+					conf->next_resync +
+						NEXT_NORMALIO_DISTANCE;
+
+			if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
+				<= bio->bi_sector)
+				conf->next_window_requests++;
+			else
+				conf->current_window_requests++;
+		}
+		if (bio->bi_sector >= conf->start_next_window)
+			sector = conf->start_next_window;
+	}
+
 	conf->nr_pending++;
 	spin_unlock_irq(&conf->resync_lock);
+	return sector;
 }
 
-static void allow_barrier(struct r1conf *conf)
+static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
+				sector_t bi_sector)
 {
 	unsigned long flags;
+
 	spin_lock_irqsave(&conf->resync_lock, flags);
 	conf->nr_pending--;
+	if (start_next_window) {
+		if (start_next_window == conf->start_next_window) {
+			if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
+				<= bi_sector)
+				conf->next_window_requests--;
+			else
+				conf->current_window_requests--;
+		} else
+			conf->current_window_requests--;
+
+		if (!conf->current_window_requests) {
+			if (conf->next_window_requests) {
+				conf->current_window_requests =
+					conf->next_window_requests;
+				conf->next_window_requests = 0;
+				conf->start_next_window +=
+					NEXT_NORMALIO_DISTANCE;
+			} else
+				conf->start_next_window = MaxSector;
+		}
+	}
 	spin_unlock_irqrestore(&conf->resync_lock, flags);
 	wake_up(&conf->wait_barrier);
 }
@@ -1012,6 +1095,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 	int first_clone;
 	int sectors_handled;
 	int max_sectors;
+	sector_t start_next_window;
 
 	/*
 	 * Register the new request and wait if the reconstruction
@@ -1041,7 +1125,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 		finish_wait(&conf->wait_barrier, &w);
 	}
 
-	wait_barrier(conf);
+	start_next_window = wait_barrier(conf, bio);
 
 	bitmap = mddev->bitmap;
 
@@ -1162,6 +1246,7 @@ read_again:
 
 	disks = conf->raid_disks * 2;
  retry_write:
+	r1_bio->start_next_window = start_next_window;
 	blocked_rdev = NULL;
 	rcu_read_lock();
 	max_sectors = r1_bio->sectors;
@@ -1230,14 +1315,24 @@ read_again:
 	if (unlikely(blocked_rdev)) {
 		/* Wait for this device to become unblocked */
 		int j;
+		sector_t old = start_next_window;
 
 		for (j = 0; j < i; j++)
 			if (r1_bio->bios[j])
 				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
 		r1_bio->state = 0;
-		allow_barrier(conf);
+		allow_barrier(conf, start_next_window, bio->bi_sector);
 		md_wait_for_blocked_rdev(blocked_rdev, mddev);
-		wait_barrier(conf);
+		start_next_window = wait_barrier(conf, bio);
+		/*
+		 * We must make sure the multi r1bios of bio have
+		 * the same value of bi_phys_segments
+		 */
+		if (bio->bi_phys_segments && old &&
+			old != start_next_window)
+			/*wait the former r1bio(s) completed*/
+			wait_event(conf->wait_barrier,
+					bio->bi_phys_segments == 1);
 		goto retry_write;
 	}
 
@@ -1437,11 +1532,14 @@ static void print_conf(struct r1conf *conf)
 
 static void close_sync(struct r1conf *conf)
 {
-	wait_barrier(conf);
-	allow_barrier(conf);
+	wait_barrier(conf, NULL);
+	allow_barrier(conf, 0, 0);
 
 	mempool_destroy(conf->r1buf_pool);
 	conf->r1buf_pool = NULL;
+
+	conf->next_resync = 0;
+	conf->start_next_window = MaxSector;
 }
 
 static int raid1_spare_active(struct mddev *mddev)
@@ -2713,6 +2811,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 	conf->pending_count = 0;
 	conf->recovery_disabled = mddev->recovery_disabled - 1;
 
+	conf->start_next_window = MaxSector;
+	conf->current_window_requests = conf->next_window_requests = 0;
+
 	err = -EIO;
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index 331a98a..07425a1 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -41,6 +41,19 @@ struct r1conf {
 	 */
 	sector_t		next_resync;
 
+	/*when raid1 start resync,we divide raid into four partitions
+	 * |---------|--------------|---------------------|-------------|
+	 *        next_resync   start_next_window        Pc
+	 * Now start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
+	 * Pc = start_next_window + NEXT_NORMALIO_DISTANCE
+	 * current_window_requests means the count of normalIO between
+	 * start_next_window and Pc.
+	 * next_window_requests means the count of nornalIO after Pc.
+	 * */
+	sector_t		start_next_window;
+	int			current_window_requests;
+	int			next_window_requests;
+
 	spinlock_t		device_lock;
 
 	/* list of 'struct r1bio' that need to be processed by raid1d,
@@ -112,6 +125,7 @@ struct r1bio {
 						 * in this BehindIO request
 						 */
 	sector_t		sector;
+	sector_t		start_next_window;
 	int			sectors;
 	unsigned long		state;
 	struct mddev		*mddev;
-- 
1.8.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.
  2013-11-15  6:55               ` majianpeng
@ 2013-11-19  4:25                 ` NeilBrown
  2013-11-19  7:53                   ` majianpeng
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2013-11-19  4:25 UTC (permalink / raw)
  To: majianpeng; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 485 bytes --]

On Fri, 15 Nov 2013 14:55:02 +0800 majianpeng <majianpeng@gmail.com> wrote:

> >If you can send me the updated patch in a day or 2 I should be able to get it
> >into the current merge window.
> >
> >thanks,
> >NeilBrown
> >
> Thanks very much neil! Below the latest patch.
> 
> Subject:raid1: Rewrite the implementation of iobarrier.

Thanks. I've applied that patch. I also cleaned up some of the formatting and
clarified the patch description quite a bit.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.
  2013-11-19  4:25                 ` NeilBrown
@ 2013-11-19  7:53                   ` majianpeng
  0 siblings, 0 replies; 11+ messages in thread
From: majianpeng @ 2013-11-19  7:53 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

>On Fri, 15 Nov 2013 14:55:02 +0800 majianpeng <majianpeng@gmail.com> wrote:
>
>> >If you can send me the updated patch in a day or 2 I should be able to get it
>> >into the current merge window.
>> >
>> >thanks,
>> >NeilBrown
>> >
>> Thanks very much neil! Below the latest patch.
>> 
>> Subject:raid1: Rewrite the implementation of iobarrier.
>
>Thanks. I've applied that patch. I also cleaned up some of the formatting and
>clarified the patch description quite a bit.
>
>NeilBrown
>
Thanks very much!

Jianpeng Ma

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-11-19  7:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-08-28 11:40 [PATCH 4/4] raid1: Rewrite the implementation of iobarrier majianpeng
2013-10-24  1:50 ` NeilBrown
2013-10-29  1:30   ` majianpeng
2013-10-31  2:33     ` NeilBrown
2013-10-31  3:20       ` majianpeng
2013-11-14  6:44         ` NeilBrown
2013-11-15  2:29           ` majianpeng
2013-11-15  3:42             ` NeilBrown
2013-11-15  6:55               ` majianpeng
2013-11-19  4:25                 ` NeilBrown
2013-11-19  7:53                   ` majianpeng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).