linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 3/3] raid1: Rewrite the implementation of iobarrier.
@ 2013-08-19 11:58 majianpeng
  2013-08-26  7:46 ` NeilBrown
  2013-08-26  7:49 ` NeilBrown
  0 siblings, 2 replies; 4+ messages in thread
From: majianpeng @ 2013-08-19 11:58 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

There is a iobarrier in raid1 because the contend between nornalIO and
resyncIO.It suspend all nornal IO when reysync.
But if nornal IO is outrange the resync windwos,there is no contend.
So I rewrite the iobarrier.

I patition the whole space into three parts.

|---------|-----------|------------|-----------|----------
	     Pa 	   next_resync	   Pb	       Pc

Firtly i introduce some concept:
1:RESYNC_WINDOW: For resync, there are 32 sync requests at most.A sync
request is RESYNC_BLOCK_SIZE(64*1024).So the RESYNC_WINDOW is 32 *
RESYNC_BLOCK_SIZE, that is 2MB.
2:next_resync mean the next start sector which will being do sync.
3:Pa means a position which before RESYNC_WINDOW distance from
next_resync.
4:Pb means a position which after 3 * RESYNC_WINDOW distance from
next_resync.
5:Pc means a position which after 6 * RESYNC_WINDOWS disttance from
next_resync.

NornalIO will be partitioned into four types.

NoIO1: it means the end sector of bio is smaller or equal the Pa.
NoIO2: it means the start sector of bio larger or equal Pc.
NoIO3: it means the start sector of bio larger or equal Pb.For those
	tyeps, i don't care the location of end sector.
NoIO4: it means the location between Pa and Pb.

For NoIO1,it don't use iobarrier.
For NoIO4,it used the original iobarrier mechanism.The nornalIO and
syncIO must be contend.
For NoIO2/3, i add two filed in struct r1conf "after_resync_three_count,
after_resync_six_count".They indicate the count of
two types.For those, they don't wait.They only add the relevant the
counter.

For resync action, if there are NoIO4s, it must be wait.If not,it
forward.But if there are NoIO3,it must wait until the counter of NoIO3
became zero.If there are NoIO2, it must wait until the counter of NoIO2
became zero.

There is a problem which when and how to change from NoIO2 to NoIO3.Only
so, the sync action can forward.

I add a filed in struct r1conf "after_resync_sector".It means the
position Pb.What condition will change?
A: if after_resync_sector == MaxSector, it means there are no NoIO2/3.
   So after_resync_sector = Pb.
B: if after_resync_six_count == 0 && after_resync_six_count != 0, it
   means after_resync_sector forward 3 * RESYNC_WINDOW.

There is anthor problem which how to differentiate when NoIO2 became
NoIO3.For example: firstly r1bioA is NoIO2.Before r1bioA completed,
there is no NoIO3.So r1bioA will be became to NoIO3.
At generil,we should use flags and list_head. The codes should:
>list_splice_init(six_head, three_head);
>after_reysnc_three_count = after_resync_six_count;
>after_resync_six_count = 0;
>list_for_each_entry(three_head){
>	clear_bit(RESYNC_SIX_FLAG),
>	set_bit(RESYNC_THREE_FLAG),
>}
If there are many NoIO2, it will take too long in list_for_each_entry.
Avoid this, i add a filed in struct r1bio
"after_threetimes_resync_sector".
I used this to record the position Pb when call wait_barrier in func
make_request.
For NoIO1/4, the value is zero.
In func allow_barrier, it check the conf->after_resync_sector.
If after_threetimes_resync_sector == conf->after_resync_sector,it mean
there is no transition which NoIO2 to NoIO3.In this condtion
 if bio->bi_sector > conf->after_resync_sector + 3 * RESYNC_WINWOD
    it means the bio is NoIO2;
 else
    it means the bio si NoIO3.

If after_threetimes_resync_sector != conf->after_resync_sector,it mean
there is a trnasiton which NoIO2 to NoIO3.Because there is at most one
transtions.So it only means the bio is NoIO2.

For one bio,there are many r1bio.So we make sure the
r1bio->after_threetimes_resync_sector is the same value.
If we met blocked_dev, it must call allow_barrier and wait_barrier.
So the first and the second value of conf->after_resync_sector will
change.If there are many r1bio's with differnet
after_threetimes_resync_sector.For the relevant bio, it depent on the
last value of r1bio.It will cause error.To avoid this, we must wait
former r1bios completes.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
---
 drivers/md/raid1.c | 153 +++++++++++++++++++++++++++++++++++++++++++++--------
 drivers/md/raid1.h |   4 ++
 2 files changed, 134 insertions(+), 23 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 2fd74ec..92909ea 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -66,7 +66,9 @@
  */
 static int max_queued_requests = 1024;
 
-static void allow_barrier(struct r1conf *conf);
+static void allow_barrier(struct r1conf *conf,
+				sector_t after_threetimes_resync_sector,
+				sector_t bi_start_sector);
 static void lower_barrier(struct r1conf *conf);
 
 static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
@@ -84,10 +86,11 @@ static void r1bio_pool_free(void *r1_bio, void *data)
 }
 
 #define RESYNC_BLOCK_SIZE (64*1024)
-//#define RESYNC_BLOCK_SIZE PAGE_SIZE
+#define RESYNC_DEPTH 32
 #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
 #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
-#define RESYNC_WINDOW (2048*1024)
+#define RESYNC_WINDOW (RESYNC_BLOCK_SIZE * RESYNC_DEPTH)
+#define RESYNC_WINDOW_SECTORS (RESYNC_WINDOW >> 9)
 
 static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 {
@@ -225,13 +228,17 @@ static void call_bio_endio(struct r1bio *r1_bio)
 	struct bio *bio = r1_bio->master_bio;
 	int done;
 	struct r1conf *conf = r1_bio->mddev->private;
-
+	unsigned long flags;
+	sector_t after_threetimes_resync_sector =
+		r1_bio->after_threetimes_resync_sector;
+	sector_t bi_start_sector = bio->bi_sector;
+
 	if (bio->bi_phys_segments) {
-		unsigned long flags;
 		spin_lock_irqsave(&conf->device_lock, flags);
 		bio->bi_phys_segments--;
 		done = (bio->bi_phys_segments == 0);
 		spin_unlock_irqrestore(&conf->device_lock, flags);
+		wake_up(&conf->wait_barrier);
 	} else
 		done = 1;
 
@@ -243,7 +250,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
 		 * Wake up any possible resync thread that waits for the device
 		 * to go idle.
 		 */
-		allow_barrier(conf);
+		allow_barrier(conf, after_threetimes_resync_sector,
+				bi_start_sector);
 	}
 }
 
@@ -814,8 +822,6 @@ static void flush_pending_writes(struct r1conf *conf)
  *    there is no normal IO happeing.  It must arrange to call
  *    lower_barrier when the particular background IO completes.
  */
-#define RESYNC_DEPTH 32
-
 static void raise_barrier(struct r1conf *conf)
 {
 	spin_lock_irq(&conf->resync_lock);
@@ -826,11 +832,18 @@ static void raise_barrier(struct r1conf *conf)
 
 	/* block any new IO from starting */
 	conf->barrier++;
-
-	/* Now wait for all pending IO to complete */
+	/* For those conditions, we must wait
+	 * A:array is in freeze state.
+	 * B:When starting resync and there are normal IO
+	 * C:There is no nornal io after three times in resync window
+	 * D: total barrier are less than RESYNC_DEPTH*/
 	wait_event_lock_irq(conf->wait_barrier,
-			    !conf->freeze_array && 
-			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
+			    !conf->freeze_array &&
+			    ((conf->next_resync < RESYNC_WINDOW_SECTORS
+				&& !conf->nr_pending)
+			    || (conf->next_resync + RESYNC_SECTORS
+				    <= conf->after_resync_sector))
+			    && conf->barrier < RESYNC_DEPTH,
 			    conf->resync_lock);
 
 	spin_unlock_irq(&conf->resync_lock);
@@ -846,10 +859,57 @@ static void lower_barrier(struct r1conf *conf)
 	wake_up(&conf->wait_barrier);
 }
 
-static void wait_barrier(struct r1conf *conf)
+static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
 {
+	sector_t sector = 0;
 	spin_lock_irq(&conf->resync_lock);
-	if (conf->barrier) {
+	if (conf->barrier || unlikely(conf->freeze_array)) {
+		if (unlikely(!conf->freeze_array && bio) &&
+			bio_data_dir(bio) == WRITE &&
+			conf->next_resync > RESYNC_WINDOW_SECTORS) {
+			if (bio_end_sector(bio) <= conf->next_resync -
+				RESYNC_WINDOW_SECTORS)
+				goto no_wait;
+			if (conf->after_resync_sector == MaxSector) {
+				if (bio->bi_sector >= conf->next_resync +
+					6 * RESYNC_WINDOW_SECTORS) {
+					if (conf->after_resync_six_count +
+						conf->after_resync_three_count == 0)
+						conf->after_resync_sector =
+							conf->next_resync + 3 *
+							RESYNC_WINDOW_SECTORS;
+					conf->after_resync_six_count++;
+					sector = conf->after_resync_sector;
+					goto no_wait;
+				}
+				if (bio->bi_sector >= conf->next_resync +
+					3 * RESYNC_WINDOW_SECTORS) {
+					if (conf->after_resync_six_count +
+						conf->after_resync_three_count == 0)
+						conf->after_resync_sector =
+							conf->next_resync + 3 *
+							RESYNC_WINDOW_SECTORS;
+					conf->after_resync_three_count++;
+					sector = conf->after_resync_sector;
+					goto no_wait;
+				}
+
+			} else {
+				if (bio->bi_sector >= conf->after_resync_sector + 3 *
+							RESYNC_WINDOW_SECTORS) {
+					conf->after_resync_six_count++;
+					sector = conf->after_resync_sector;
+					goto no_wait;
+				}
+				if (bio->bi_sector >= conf->after_resync_sector) {
+					conf->after_resync_three_count++;
+					sector = conf->after_resync_sector;
+					goto no_wait;
+				}
+			}
+		} else if (unlikely(!conf->freeze_array && bio) &&
+				bio_data_dir(bio) != WRITE)
+			goto no_wait;
 		conf->nr_waiting++;
 		/* Wait for the barrier to drop.
 		 * However if there are already pending
@@ -869,15 +929,46 @@ static void wait_barrier(struct r1conf *conf)
 				    conf->resync_lock);
 		conf->nr_waiting--;
 	}
+no_wait:
 	conf->nr_pending++;
 	spin_unlock_irq(&conf->resync_lock);
+	return sector;
 }
 
-static void allow_barrier(struct r1conf *conf)
+static void
+allow_barrier(struct r1conf *conf,
+		sector_t after_threetimes_resync_sector,
+		sector_t bi_start_sector)
 {
 	unsigned long flags;
 	spin_lock_irqsave(&conf->resync_lock, flags);
 	conf->nr_pending--;
+
+	if (after_threetimes_resync_sector) {
+		if (after_threetimes_resync_sector ==
+				conf->after_resync_sector) {
+			if (bi_start_sector >= (after_threetimes_resync_sector
+						+ 3 * RESYNC_WINDOW_SECTORS))
+				conf->after_resync_six_count--;
+			else
+				conf->after_resync_three_count--;
+		} else
+			conf->after_resync_three_count--;
+
+		if (!conf->after_resync_three_count) {
+			if (conf->after_resync_six_count) {
+				conf->after_resync_three_count =
+					conf->after_resync_six_count;
+				conf->after_resync_six_count = 0;
+				conf->after_resync_sector +=
+					3 * RESYNC_WINDOW_SECTORS;
+			} else
+				conf->after_resync_sector = MaxSector;
+		}
+		BUG_ON(conf->after_resync_three_count < 0 ||
+			conf->after_resync_six_count < 0);
+	}
+
 	spin_unlock_irqrestore(&conf->resync_lock, flags);
 	wake_up(&conf->wait_barrier);
 }
@@ -908,8 +999,8 @@ static void unfreeze_array(struct r1conf *conf)
 	/* reverse the effect of the freeze */
 	spin_lock_irq(&conf->resync_lock);
 	conf->freeze_array = 0;
-	wake_up(&conf->wait_barrier);
 	spin_unlock_irq(&conf->resync_lock);
+	wake_up(&conf->wait_barrier);
 }
 
 
@@ -1012,6 +1103,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 	int first_clone;
 	int sectors_handled;
 	int max_sectors;
+	sector_t after_threetimes_resync_sector = 0;
 
 	/*
 	 * Register the new request and wait if the reconstruction
@@ -1041,7 +1133,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 		finish_wait(&conf->wait_barrier, &w);
 	}
 
-	wait_barrier(conf);
+	after_threetimes_resync_sector = wait_barrier(conf, bio);
 
 	bitmap = mddev->bitmap;
 
@@ -1057,7 +1149,6 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 	r1_bio->state = 0;
 	r1_bio->mddev = mddev;
 	r1_bio->sector = bio->bi_sector;
-
 	/* We might need to issue multiple reads to different
 	 * devices if there are bad blocks around, so we keep
 	 * track of the number of reads in bio->bi_phys_segments.
@@ -1162,6 +1253,8 @@ read_again:
 
 	disks = conf->raid_disks * 2;
  retry_write:
+	r1_bio->after_threetimes_resync_sector = after_threetimes_resync_sector;
+
 	blocked_rdev = NULL;
 	rcu_read_lock();
 	max_sectors = r1_bio->sectors;
@@ -1230,14 +1323,23 @@ read_again:
 	if (unlikely(blocked_rdev)) {
 		/* Wait for this device to become unblocked */
 		int j;
-
+		sector_t old = after_threetimes_resync_sector;
 		for (j = 0; j < i; j++)
 			if (r1_bio->bios[j])
 				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
 		r1_bio->state = 0;
-		allow_barrier(conf);
+		allow_barrier(conf, old, bio->bi_sector);
 		md_wait_for_blocked_rdev(blocked_rdev, mddev);
-		wait_barrier(conf);
+		after_threetimes_resync_sector = wait_barrier(conf, bio);
+		/*
+		 * We must make sure the multi r1bios of bio have
+		 * the same value of after_threetimes_resync_sector
+		 */
+		if (bio->bi_phys_segments && old &&
+			old != after_threetimes_resync_sector)
+			/*wait the former r1bio(s) completed*/
+			wait_event(conf->wait_barrier,
+					bio->bi_phys_segments == 1);
 		goto retry_write;
 	}
 
@@ -1437,11 +1539,13 @@ static void print_conf(struct r1conf *conf)
 
 static void close_sync(struct r1conf *conf)
 {
-	wait_barrier(conf);
-	allow_barrier(conf);
+	wait_barrier(conf, NULL);
+	allow_barrier(conf, 0, 0);
 
 	mempool_destroy(conf->r1buf_pool);
 	conf->r1buf_pool = NULL;
+	conf->next_resync = 0;
+	conf->after_resync_sector = conf->mddev->dev_sectors;
 }
 
 static int raid1_spare_active(struct mddev *mddev)
@@ -2712,6 +2816,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 	conf->pending_count = 0;
 	conf->recovery_disabled = mddev->recovery_disabled - 1;
 
+	conf->after_resync_sector = MaxSector;
+	conf->after_resync_six_count = conf->after_resync_three_count = 0;
+
 	err = -EIO;
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index 8afd300..3ab1405 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -40,6 +40,9 @@ struct r1conf {
 	 * where that is.
 	 */
 	sector_t		next_resync;
+	sector_t		after_resync_sector;
+	int			after_resync_three_count;
+	int			after_resync_six_count;
 
 	spinlock_t		device_lock;
 
@@ -112,6 +115,7 @@ struct r1bio {
 						 * in this BehindIO request
 						 */
 	sector_t		sector;
+	sector_t		after_threetimes_resync_sector;
 	int			sectors;
 	unsigned long		state;
 	struct mddev		*mddev;
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 3/3] raid1: Rewrite the implementation of iobarrier.
  2013-08-19 11:58 [PATCH 3/3] raid1: Rewrite the implementation of iobarrier majianpeng
@ 2013-08-26  7:46 ` NeilBrown
  2013-08-26 10:45   ` majianpeng
  2013-08-26  7:49 ` NeilBrown
  1 sibling, 1 reply; 4+ messages in thread
From: NeilBrown @ 2013-08-26  7:46 UTC (permalink / raw)
  To: majianpeng; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 17734 bytes --]

On Mon, 19 Aug 2013 19:58:07 +0800 majianpeng <majianpeng@gmail.com> wrote:

> There is a iobarrier in raid1 because the contend between nornalIO and
> resyncIO.It suspend all nornal IO when reysync.
> But if nornal IO is outrange the resync windwos,there is no contend.
> So I rewrite the iobarrier.
> 
> I patition the whole space into three parts.
> 
> |---------|-----------|------------|-----------|----------
> 	     Pa 	   next_resync	   Pb	       Pc
> 
> Firtly i introduce some concept:
> 1:RESYNC_WINDOW: For resync, there are 32 sync requests at most.A sync
> request is RESYNC_BLOCK_SIZE(64*1024).So the RESYNC_WINDOW is 32 *
> RESYNC_BLOCK_SIZE, that is 2MB.
> 2:next_resync mean the next start sector which will being do sync.
> 3:Pa means a position which before RESYNC_WINDOW distance from
> next_resync.
> 4:Pb means a position which after 3 * RESYNC_WINDOW distance from
> next_resync.
> 5:Pc means a position which after 6 * RESYNC_WINDOWS disttance from
> next_resync.
> 
> NornalIO will be partitioned into four types.
> 
> NoIO1: it means the end sector of bio is smaller or equal the Pa.
> NoIO2: it means the start sector of bio larger or equal Pc.
> NoIO3: it means the start sector of bio larger or equal Pb.For those
> 	tyeps, i don't care the location of end sector.
> NoIO4: it means the location between Pa and Pb.
> 
> For NoIO1,it don't use iobarrier.
> For NoIO4,it used the original iobarrier mechanism.The nornalIO and
> syncIO must be contend.
> For NoIO2/3, i add two filed in struct r1conf "after_resync_three_count,
> after_resync_six_count".They indicate the count of
> two types.For those, they don't wait.They only add the relevant the
> counter.
> 
> For resync action, if there are NoIO4s, it must be wait.If not,it
> forward.But if there are NoIO3,it must wait until the counter of NoIO3
> became zero.If there are NoIO2, it must wait until the counter of NoIO2
> became zero.
> 
> There is a problem which when and how to change from NoIO2 to NoIO3.Only
> so, the sync action can forward.
> 
> I add a filed in struct r1conf "after_resync_sector".It means the
> position Pb.What condition will change?
> A: if after_resync_sector == MaxSector, it means there are no NoIO2/3.
>    So after_resync_sector = Pb.
> B: if after_resync_six_count == 0 && after_resync_six_count != 0, it
>    means after_resync_sector forward 3 * RESYNC_WINDOW.
> 
> There is anthor problem which how to differentiate when NoIO2 became
> NoIO3.For example: firstly r1bioA is NoIO2.Before r1bioA completed,
> there is no NoIO3.So r1bioA will be became to NoIO3.
> At generil,we should use flags and list_head. The codes should:
> >list_splice_init(six_head, three_head);
> >after_reysnc_three_count = after_resync_six_count;
> >after_resync_six_count = 0;
> >list_for_each_entry(three_head){
> >	clear_bit(RESYNC_SIX_FLAG),
> >	set_bit(RESYNC_THREE_FLAG),
> >}
> If there are many NoIO2, it will take too long in list_for_each_entry.
> Avoid this, i add a filed in struct r1bio
> "after_threetimes_resync_sector".
> I used this to record the position Pb when call wait_barrier in func
> make_request.
> For NoIO1/4, the value is zero.
> In func allow_barrier, it check the conf->after_resync_sector.
> If after_threetimes_resync_sector == conf->after_resync_sector,it mean
> there is no transition which NoIO2 to NoIO3.In this condtion
>  if bio->bi_sector > conf->after_resync_sector + 3 * RESYNC_WINWOD
>     it means the bio is NoIO2;
>  else
>     it means the bio si NoIO3.
> 
> If after_threetimes_resync_sector != conf->after_resync_sector,it mean
> there is a trnasiton which NoIO2 to NoIO3.Because there is at most one
> transtions.So it only means the bio is NoIO2.
> 
> For one bio,there are many r1bio.So we make sure the
> r1bio->after_threetimes_resync_sector is the same value.
> If we met blocked_dev, it must call allow_barrier and wait_barrier.
> So the first and the second value of conf->after_resync_sector will
> change.If there are many r1bio's with differnet
> after_threetimes_resync_sector.For the relevant bio, it depent on the
> last value of r1bio.It will cause error.To avoid this, we must wait
> former r1bios completes.
> 
> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>

Hi,
 this is certainly looking better.  I've made a number of comments below...



> ---
>  drivers/md/raid1.c | 153 +++++++++++++++++++++++++++++++++++++++++++++--------
>  drivers/md/raid1.h |   4 ++
>  2 files changed, 134 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 2fd74ec..92909ea 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -66,7 +66,9 @@
>   */
>  static int max_queued_requests = 1024;
>  
> -static void allow_barrier(struct r1conf *conf);
> +static void allow_barrier(struct r1conf *conf,
> +				sector_t after_threetimes_resync_sector,
> +				sector_t bi_start_sector);
>  static void lower_barrier(struct r1conf *conf);
>  
>  static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
> @@ -84,10 +86,11 @@ static void r1bio_pool_free(void *r1_bio, void *data)
>  }
>  
>  #define RESYNC_BLOCK_SIZE (64*1024)
> -//#define RESYNC_BLOCK_SIZE PAGE_SIZE
> +#define RESYNC_DEPTH 32
>  #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
>  #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
> -#define RESYNC_WINDOW (2048*1024)
> +#define RESYNC_WINDOW (RESYNC_BLOCK_SIZE * RESYNC_DEPTH)
> +#define RESYNC_WINDOW_SECTORS (RESYNC_WINDOW >> 9)

A patch which just tidies up these #defines might be nice.  This patch is
rather large and anything that can be split into a separate patch should be I
think.


>  
>  static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
>  {
> @@ -225,13 +228,17 @@ static void call_bio_endio(struct r1bio *r1_bio)
>  	struct bio *bio = r1_bio->master_bio;
>  	int done;
>  	struct r1conf *conf = r1_bio->mddev->private;
> -
> +	unsigned long flags;

Why have you moved 'flags' from where it was to here?  There is no need so
best to leave it alone.

> +	sector_t after_threetimes_resync_sector =
> +		r1_bio->after_threetimes_resync_sector;
> +	sector_t bi_start_sector = bio->bi_sector;
> +

"after_threetimes_resync_sector" is much too long for a variable name.
And if one day I decide that 4 times resync_sectors is better, I would need
to change that name everywhere.

There are (as you describe above) two windows, each 3*resync_sectors in size.
A "current" window in which no normal writes are permitted, and a "next"
window.  "after_threetimes_resync_sector" is the send of the current window,
and the start of the next window.  So
  start_next_sync_window
would be a much better name, and is noticeably shorter.  I would probably
prefer
  start_next_window
which is quite a manageable size.
  end_current_window
would also be OK.


>  	if (bio->bi_phys_segments) {
> -		unsigned long flags;
>  		spin_lock_irqsave(&conf->device_lock, flags);
>  		bio->bi_phys_segments--;
>  		done = (bio->bi_phys_segments == 0);
>  		spin_unlock_irqrestore(&conf->device_lock, flags);
> +		wake_up(&conf->wait_barrier);

A brief comment explaining why this wake_up is needed would help.
   /* make_request might be waiting for bi_phys_sectors to decrease */


>  	} else
>  		done = 1;
>  
> @@ -243,7 +250,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
>  		 * Wake up any possible resync thread that waits for the device
>  		 * to go idle.
>  		 */
> -		allow_barrier(conf);
> +		allow_barrier(conf, after_threetimes_resync_sector,
> +				bi_start_sector);
>  	}
>  }
>  
> @@ -814,8 +822,6 @@ static void flush_pending_writes(struct r1conf *conf)
>   *    there is no normal IO happeing.  It must arrange to call
>   *    lower_barrier when the particular background IO completes.
>   */
> -#define RESYNC_DEPTH 32
> -
>  static void raise_barrier(struct r1conf *conf)
>  {
>  	spin_lock_irq(&conf->resync_lock);
> @@ -826,11 +832,18 @@ static void raise_barrier(struct r1conf *conf)
>  
>  	/* block any new IO from starting */
>  	conf->barrier++;
> -
> -	/* Now wait for all pending IO to complete */
> +	/* For those conditions, we must wait
> +	 * A:array is in freeze state.
> +	 * B:When starting resync and there are normal IO
> +	 * C:There is no nornal io after three times in resync window
> +	 * D: total barrier are less than RESYNC_DEPTH*/
>  	wait_event_lock_irq(conf->wait_barrier,
> -			    !conf->freeze_array && 
> -			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
> +			    !conf->freeze_array &&
> +			    ((conf->next_resync < RESYNC_WINDOW_SECTORS
> +				&& !conf->nr_pending)
> +			    || (conf->next_resync + RESYNC_SECTORS
> +				    <= conf->after_resync_sector))
> +			    && conf->barrier < RESYNC_DEPTH,
>  			    conf->resync_lock);

For "C:There is no nornal io after three times in resync window" I expected
to see
          conf->after_resync_three_count == 0

but I don't see that.  Why not?


>  
>  	spin_unlock_irq(&conf->resync_lock);
> @@ -846,10 +859,57 @@ static void lower_barrier(struct r1conf *conf)
>  	wake_up(&conf->wait_barrier);
>  }
>  
> -static void wait_barrier(struct r1conf *conf)
> +static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
>  {
> +	sector_t sector = 0;
>  	spin_lock_irq(&conf->resync_lock);
> -	if (conf->barrier) {
> +	if (conf->barrier || unlikely(conf->freeze_array)) {
> +		if (unlikely(!conf->freeze_array && bio) &&
> +			bio_data_dir(bio) == WRITE &&
> +			conf->next_resync > RESYNC_WINDOW_SECTORS) {
> +			if (bio_end_sector(bio) <= conf->next_resync -
> +				RESYNC_WINDOW_SECTORS)
> +				goto no_wait;
> +			if (conf->after_resync_sector == MaxSector) {
> +				if (bio->bi_sector >= conf->next_resync +
> +					6 * RESYNC_WINDOW_SECTORS) {
> +					if (conf->after_resync_six_count +
> +						conf->after_resync_three_count == 0)
> +						conf->after_resync_sector =
> +							conf->next_resync + 3 *
> +							RESYNC_WINDOW_SECTORS;
> +					conf->after_resync_six_count++;
> +					sector = conf->after_resync_sector;
> +					goto no_wait;
> +				}
> +				if (bio->bi_sector >= conf->next_resync +
> +					3 * RESYNC_WINDOW_SECTORS) {
> +					if (conf->after_resync_six_count +
> +						conf->after_resync_three_count == 0)
> +						conf->after_resync_sector =
> +							conf->next_resync + 3 *
> +							RESYNC_WINDOW_SECTORS;
> +					conf->after_resync_three_count++;
> +					sector = conf->after_resync_sector;
> +					goto no_wait;
> +				}
> +
> +			} else {
> +				if (bio->bi_sector >= conf->after_resync_sector + 3 *
> +							RESYNC_WINDOW_SECTORS) {
> +					conf->after_resync_six_count++;
> +					sector = conf->after_resync_sector;
> +					goto no_wait;
> +				}
> +				if (bio->bi_sector >= conf->after_resync_sector) {
> +					conf->after_resync_three_count++;
> +					sector = conf->after_resync_sector;
> +					goto no_wait;
> +				}
> +			}
> +		} else if (unlikely(!conf->freeze_array && bio) &&
> +				bio_data_dir(bio) != WRITE)
> +			goto no_wait;

This is really long set of nested conditions that is probably more complex
than needed, and certainly should be in a separate function.
Then you could have something like:
   if (need_to_wait_for_sync(...) {
        conf->nr_waiting++
        wait.....
        conf->nr_waiting--;
   }

and not need the "no_wait" label.

I suspect there should only be one place in the function which does
after_resync_three_count++ and one for after_resync_six_count++;

And they should be something like
   current_window_requests
and
   next_window_requests.

>  		conf->nr_waiting++;
>  		/* Wait for the barrier to drop.
>  		 * However if there are already pending
> @@ -869,15 +929,46 @@ static void wait_barrier(struct r1conf *conf)
>  				    conf->resync_lock);
>  		conf->nr_waiting--;
>  	}
> +no_wait:
>  	conf->nr_pending++;
>  	spin_unlock_irq(&conf->resync_lock);
> +	return sector;
>  }
>  
> -static void allow_barrier(struct r1conf *conf)
> +static void

Please avoid this sort of unnecessary change.  Leave "allow_barrier" on the
same line as "static void".

> +allow_barrier(struct r1conf *conf,
> +		sector_t after_threetimes_resync_sector,
> +		sector_t bi_start_sector)
>  {
>  	unsigned long flags;
>  	spin_lock_irqsave(&conf->resync_lock, flags);
>  	conf->nr_pending--;
> +
> +	if (after_threetimes_resync_sector) {
> +		if (after_threetimes_resync_sector ==
> +				conf->after_resync_sector) {
> +			if (bi_start_sector >= (after_threetimes_resync_sector
> +						+ 3 * RESYNC_WINDOW_SECTORS))
> +				conf->after_resync_six_count--;
> +			else
> +				conf->after_resync_three_count--;
> +		} else
> +			conf->after_resync_three_count--;
> +
> +		if (!conf->after_resync_three_count) {
> +			if (conf->after_resync_six_count) {
> +				conf->after_resync_three_count =
> +					conf->after_resync_six_count;
> +				conf->after_resync_six_count = 0;
> +				conf->after_resync_sector +=
> +					3 * RESYNC_WINDOW_SECTORS;
> +			} else
> +				conf->after_resync_sector = MaxSector;
> +		}
> +		BUG_ON(conf->after_resync_three_count < 0 ||
> +			conf->after_resync_six_count < 0);
> +	}
> +
>  	spin_unlock_irqrestore(&conf->resync_lock, flags);
>  	wake_up(&conf->wait_barrier);
>  }
> @@ -908,8 +999,8 @@ static void unfreeze_array(struct r1conf *conf)
>  	/* reverse the effect of the freeze */
>  	spin_lock_irq(&conf->resync_lock);
>  	conf->freeze_array = 0;
> -	wake_up(&conf->wait_barrier);
>  	spin_unlock_irq(&conf->resync_lock);
> +	wake_up(&conf->wait_barrier);

This seems unnecessary, so should not be here.  Is there a reason you made
this change?

>  }
>  
>  
> @@ -1012,6 +1103,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
>  	int first_clone;
>  	int sectors_handled;
>  	int max_sectors;
> +	sector_t after_threetimes_resync_sector = 0;
>  
>  	/*
>  	 * Register the new request and wait if the reconstruction
> @@ -1041,7 +1133,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
>  		finish_wait(&conf->wait_barrier, &w);
>  	}
>  
> -	wait_barrier(conf);
> +	after_threetimes_resync_sector = wait_barrier(conf, bio);
>  
>  	bitmap = mddev->bitmap;
>  
> @@ -1057,7 +1149,6 @@ static void make_request(struct mddev *mddev, struct bio * bio)
>  	r1_bio->state = 0;
>  	r1_bio->mddev = mddev;
>  	r1_bio->sector = bio->bi_sector;
> -

Don't delete blanks lines unnecessarily.

>  	/* We might need to issue multiple reads to different
>  	 * devices if there are bad blocks around, so we keep
>  	 * track of the number of reads in bio->bi_phys_segments.
> @@ -1162,6 +1253,8 @@ read_again:
>  
>  	disks = conf->raid_disks * 2;
>   retry_write:
> +	r1_bio->after_threetimes_resync_sector = after_threetimes_resync_sector;
> +
>  	blocked_rdev = NULL;
>  	rcu_read_lock();
>  	max_sectors = r1_bio->sectors;
> @@ -1230,14 +1323,23 @@ read_again:
>  	if (unlikely(blocked_rdev)) {
>  		/* Wait for this device to become unblocked */
>  		int j;
> -
> +		sector_t old = after_threetimes_resync_sector;

Please keep a blank line between variable declarations and code.


>  		for (j = 0; j < i; j++)
>  			if (r1_bio->bios[j])
>  				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
>  		r1_bio->state = 0;
> -		allow_barrier(conf);
> +		allow_barrier(conf, old, bio->bi_sector);
>  		md_wait_for_blocked_rdev(blocked_rdev, mddev);
> -		wait_barrier(conf);
> +		after_threetimes_resync_sector = wait_barrier(conf, bio);
> +		/*
> +		 * We must make sure the multi r1bios of bio have
> +		 * the same value of after_threetimes_resync_sector
> +		 */
> +		if (bio->bi_phys_segments && old &&
> +			old != after_threetimes_resync_sector)
> +			/*wait the former r1bio(s) completed*/
> +			wait_event(conf->wait_barrier,
> +					bio->bi_phys_segments == 1);

I think it is quite possible that bi_phys_segments==0 at this point.
So you probably want "<= 1".

>  		goto retry_write;
>  	}
>  
> @@ -1437,11 +1539,13 @@ static void print_conf(struct r1conf *conf)
>  
>  static void close_sync(struct r1conf *conf)
>  {
> -	wait_barrier(conf);
> -	allow_barrier(conf);
> +	wait_barrier(conf, NULL);
> +	allow_barrier(conf, 0, 0);
>  
>  	mempool_destroy(conf->r1buf_pool);
>  	conf->r1buf_pool = NULL;
> +	conf->next_resync = 0;
> +	conf->after_resync_sector = conf->mddev->dev_sectors;
>  }
>  
>  static int raid1_spare_active(struct mddev *mddev)
> @@ -2712,6 +2816,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
>  	conf->pending_count = 0;
>  	conf->recovery_disabled = mddev->recovery_disabled - 1;
>  
> +	conf->after_resync_sector = MaxSector;
> +	conf->after_resync_six_count = conf->after_resync_three_count = 0;
> +
>  	err = -EIO;
>  	for (i = 0; i < conf->raid_disks * 2; i++) {
>  
> diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
> index 8afd300..3ab1405 100644
> --- a/drivers/md/raid1.h
> +++ b/drivers/md/raid1.h
> @@ -40,6 +40,9 @@ struct r1conf {
>  	 * where that is.
>  	 */
>  	sector_t		next_resync;
> +	sector_t		after_resync_sector;
> +	int			after_resync_three_count;
> +	int			after_resync_six_count;
>  
>  	spinlock_t		device_lock;
>  
> @@ -112,6 +115,7 @@ struct r1bio {
>  						 * in this BehindIO request
>  						 */
>  	sector_t		sector;
> +	sector_t		after_threetimes_resync_sector;
>  	int			sectors;
>  	unsigned long		state;
>  	struct mddev		*mddev;

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 3/3] raid1: Rewrite the implementation of iobarrier.
  2013-08-19 11:58 [PATCH 3/3] raid1: Rewrite the implementation of iobarrier majianpeng
  2013-08-26  7:46 ` NeilBrown
@ 2013-08-26  7:49 ` NeilBrown
  1 sibling, 0 replies; 4+ messages in thread
From: NeilBrown @ 2013-08-26  7:49 UTC (permalink / raw)
  To: majianpeng; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1055 bytes --]

On Mon, 19 Aug 2013 19:58:07 +0800 majianpeng <majianpeng@gmail.com> wrote:

> @@ -846,10 +859,57 @@ static void lower_barrier(struct r1conf *conf)
>  	wake_up(&conf->wait_barrier);
>  }
>  
> -static void wait_barrier(struct r1conf *conf)
> +static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
>  {
> +	sector_t sector = 0;
>  	spin_lock_irq(&conf->resync_lock);
> -	if (conf->barrier) {
> +	if (conf->barrier || unlikely(conf->freeze_array)) {
> +		if (unlikely(!conf->freeze_array && bio) &&

I forgot to mention this bit.
If we need to test freeze_array here, then maybe that test should have been
added in the previous patch which introduced "freeze_array"?

And I think the field should be "array_frozen".  "freeze_array" sounds like
an action so it is a suitable name for a function, but not for a variable.

Also I think you probably mean "likely" rather than "unlikely" the second
time???
I rather avoid "likely" and "unlikely" calls unless they will really
make an important difference.


NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Re: [PATCH 3/3] raid1: Rewrite the implementation of iobarrier.
  2013-08-26  7:46 ` NeilBrown
@ 2013-08-26 10:45   ` majianpeng
  0 siblings, 0 replies; 4+ messages in thread
From: majianpeng @ 2013-08-26 10:45 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

>On Mon, 19 Aug 2013 19:58:07 +0800 majianpeng <majianpeng@gmail.com> wrote:
>
[snip]
>> -
>> -	/* Now wait for all pending IO to complete */
>> +	/* For those conditions, we must wait
>> +	 * A:array is in freeze state.
>> +	 * B:When starting resync and there are normal IO
>> +	 * C:There is no nornal io after three times in resync window
>> +	 * D: total barrier are less than RESYNC_DEPTH*/
>>  	wait_event_lock_irq(conf->wait_barrier,
>> -			    !conf->freeze_array && 
>> -			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
>> +			    !conf->freeze_array &&
>> +			    ((conf->next_resync < RESYNC_WINDOW_SECTORS
>> +				&& !conf->nr_pending)
>> +			    || (conf->next_resync + RESYNC_SECTORS
>> +				    <= conf->after_resync_sector))
>> +			    && conf->barrier < RESYNC_DEPTH,
>>  			    conf->resync_lock);
>
>For "C:There is no nornal io after three times in resync window" I expected
>to see
>          conf->after_resync_three_count == 0
>
>but I don't see that.  Why not?
>
I think there are two mistakes.
For those conditions, we must wait:
A:Array is in freeze state.
B:conf->nr_pending > 0 [For this, firtly, i think we should only consider the starting resync.If ->nr_pending > 0,we must wait
whether or not starting resync.]
C:conf->next_resync + RESYNC_SECTORS > conf->after_resync_sector.
D:Total barrier are more than RESYNC_DEPTH.

[snip]

>
>>  		for (j = 0; j < i; j++)
>>  			if (r1_bio->bios[j])
>>  				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
>>  		r1_bio->state = 0;
>> -		allow_barrier(conf);
>> +		allow_barrier(conf, old, bio->bi_sector);
>>  		md_wait_for_blocked_rdev(blocked_rdev, mddev);
>> -		wait_barrier(conf);
>> +		after_threetimes_resync_sector = wait_barrier(conf, bio);
>> +		/*
>> +		 * We must make sure the multi r1bios of bio have
>> +		 * the same value of after_threetimes_resync_sector
>> +		 */
>> +		if (bio->bi_phys_segments && old &&
>> +			old != after_threetimes_resync_sector)
>> +			/*wait the former r1bio(s) completed*/
>> +			wait_event(conf->wait_barrier,
>> +					bio->bi_phys_segments == 1);
>
>I think it is quite possible that bi_phys_segments==0 at this point.
>So you probably want "<= 1".
>
No, only there are multi r1bios of bio, it must wait previous r1bios completed.The left r1bio made'bi_phys_segments = 1'.
If there is one r1bio of bio, it can't wait.It only need allow_barrier and retry wait_barrier.



For other comments, i'll apply.
Thanks!
Jianpeng Ma

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-08-26 10:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-08-19 11:58 [PATCH 3/3] raid1: Rewrite the implementation of iobarrier majianpeng
2013-08-26  7:46 ` NeilBrown
2013-08-26 10:45   ` majianpeng
2013-08-26  7:49 ` NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).