linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes()
@ 2016-12-07 17:42 Song Liu
  2016-12-07 17:42 ` [PATCH 2/3] md/r5cache: after recovery, increase journal seq by 1000 Song Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Song Liu @ 2016-12-07 17:42 UTC (permalink / raw)
  To: linux-raid
  Cc: neilb, shli, kernel-team, dan.j.williams, hch, liuzhengyuan,
	liuyun01, Song Liu

r5l_recovery_create_empty_meta_block() creates crc for the empty
metablock. After the metablock is updated, we need clear the
checksum before recalculate it.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 drivers/md/raid5-cache.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index c3b3124..875f963 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2117,7 +2117,9 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
 			}
 		}
 		mb->meta_size = cpu_to_le32(offset);
-		mb->checksum = crc32c_le(log->uuid_checksum, mb, PAGE_SIZE);
+		mb->checksum = 0;
+		mb->checksum = cpu_to_le32(crc32c_le(log->uuid_checksum,
+						     mb, PAGE_SIZE));
 		sync_page_io(log->rdev, ctx->pos, PAGE_SIZE, page,
 			     REQ_OP_WRITE, WRITE_FUA, false);
 		sh->log_start = ctx->pos;
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/3] md/r5cache: after recovery, increase journal seq by 1000
  2016-12-07 17:42 [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes() Song Liu
@ 2016-12-07 17:42 ` Song Liu
  2016-12-07 17:42 ` [PATCH 3/3] md/r5cache: sh->log_start in recovery Song Liu
  2016-12-08 18:28 ` [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes() Shaohua Li
  2 siblings, 0 replies; 5+ messages in thread
From: Song Liu @ 2016-12-07 17:42 UTC (permalink / raw)
  To: linux-raid
  Cc: neilb, shli, kernel-team, dan.j.williams, hch, liuzhengyuan,
	liuyun01, Song Liu

Currently, we increase journal entry seq by 10 after recovery.
However, this is not sufficient in the following case.

After crash the journal looks like

| seq+0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 | ... | +11 | +12 |

If +1 is not valid, we dropped all entries from +1 to +12; and
write seq+10:

| seq+0 | +10 | +2 | +3 | +4 | +5 | +6 | +7 | ... | +11 | +12 |

However, if we write a big journal entry with seq+11, it will
connect with some stale journal entry:

| seq+0 | +10 |                     +11                 | +12 |

To reduce the risk of this issue, we increase seq by 1000 instead.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 drivers/md/raid5-cache.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 875f963..5301081 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2003,8 +2003,8 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
  * happens again, new recovery will start from meta 1. Since meta 2n is
  * valid now, recovery will think meta 3 is valid, which is wrong.
  * The solution is we create a new meta in meta2 with its seq == meta
- * 1's seq + 10 and let superblock points to meta2. The same recovery will
- * not think meta 3 is a valid meta, because its seq doesn't match
+ * 1's seq + 1000 and let superblock points to meta2. The same recovery
+ * will not think meta 3 is a valid meta, because its seq doesn't match
  */
 
 /*
@@ -2034,7 +2034,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
  *   ---------------------------------------------
  *   ^                              ^
  *   |- log->last_checkpoint        |- ctx->pos+1
- *   |- log->last_cp_seq            |- ctx->seq+11
+ *   |- log->last_cp_seq            |- ctx->seq+1001
  *
  * However, it is not safe to start the state machine yet, because data only
  * parities are not yet secured in RAID. To save these data only parities, we
@@ -2045,7 +2045,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
  *   -----------------------------------------------------------------
  *   ^                                                ^
  *   |- log->last_checkpoint                          |- ctx->pos+n
- *   |- log->last_cp_seq                              |- ctx->seq+10+n
+ *   |- log->last_cp_seq                              |- ctx->seq+1000+n
  *
  * If failure happens again during this process, the recovery can safe start
  * again from log->last_checkpoint.
@@ -2057,7 +2057,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
  *   -----------------------------------------------------------------
  *                        ^                         ^
  *                        |- log->last_checkpoint   |- ctx->pos+n
- *                        |- log->last_cp_seq       |- ctx->seq+10+n
+ *                        |- log->last_cp_seq       |- ctx->seq+1000+n
  *
  * Then we can safely start the state machine. If failure happens from this
  * point on, the recovery will start from new log->last_checkpoint.
@@ -2157,8 +2157,8 @@ static int r5l_recovery_log(struct r5l_log *log)
 	if (ret)
 		return ret;
 
-       pos = ctx.pos;
-       ctx.seq += 10;
+	pos = ctx.pos;
+	ctx.seq += 1000;
 
 	if (ctx.data_only_stripes == 0) {
 		log->next_checkpoint = ctx.pos;
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/3] md/r5cache: sh->log_start in recovery
  2016-12-07 17:42 [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes() Song Liu
  2016-12-07 17:42 ` [PATCH 2/3] md/r5cache: after recovery, increase journal seq by 1000 Song Liu
@ 2016-12-07 17:42 ` Song Liu
  2016-12-08 18:41   ` Shaohua Li
  2016-12-08 18:28 ` [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes() Shaohua Li
  2 siblings, 1 reply; 5+ messages in thread
From: Song Liu @ 2016-12-07 17:42 UTC (permalink / raw)
  To: linux-raid
  Cc: neilb, shli, kernel-team, dan.j.williams, hch, liuzhengyuan,
	liuyun01, Song Liu

We only need to update sh->log_start at the end of recovery,
which is r5c_recovery_rewrite_data_only_stripes().

In when there is data-only stripes, log->next_checkpoint is
also set in r5c_recovery_rewrite_data_only_stripes().

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 drivers/md/raid5-cache.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 5301081..ae2684a 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1681,8 +1681,7 @@ r5l_recovery_replay_one_stripe(struct r5conf *conf,
 
 static struct stripe_head *
 r5c_recovery_alloc_stripe(struct r5conf *conf,
-			  sector_t stripe_sect,
-			  sector_t log_start)
+			  sector_t stripe_sect)
 {
 	struct stripe_head *sh;
 
@@ -1691,7 +1690,6 @@ r5c_recovery_alloc_stripe(struct r5conf *conf,
 		return NULL;  /* no more stripe available */
 
 	r5l_recovery_reset_stripe(sh);
-	sh->log_start = log_start;
 
 	return sh;
 }
@@ -1861,7 +1859,7 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
 						stripe_sect);
 
 		if (!sh) {
-			sh = r5c_recovery_alloc_stripe(conf, stripe_sect, ctx->pos);
+			sh = r5c_recovery_alloc_stripe(conf, stripe_sect);
 			/*
 			 * cannot get stripe from raid5_get_active_stripe
 			 * try replay some stripes
@@ -1870,7 +1868,7 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
 				r5c_recovery_replay_stripes(
 					cached_stripe_list, ctx);
 				sh = r5c_recovery_alloc_stripe(
-					conf, stripe_sect, ctx->pos);
+					conf, stripe_sect);
 			}
 			if (!sh) {
 				pr_debug("md/raid:%s: Increasing stripe cache size to %d to recovery data on journal.\n",
@@ -1878,8 +1876,8 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
 					conf->min_nr_stripes * 2);
 				raid5_set_cache_size(mddev,
 						     conf->min_nr_stripes * 2);
-				sh = r5c_recovery_alloc_stripe(
-					conf, stripe_sect, ctx->pos);
+				sh = r5c_recovery_alloc_stripe(conf,
+							       stripe_sect);
 			}
 			if (!sh) {
 				pr_err("md/raid:%s: Cannot get enough stripes due to memory pressure. Recovery failed.\n",
@@ -1893,7 +1891,6 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
 			if (!test_bit(STRIPE_R5C_CACHING, &sh->state) &&
 			    test_bit(R5_Wantwrite, &sh->dev[sh->pd_idx].flags)) {
 				r5l_recovery_replay_one_stripe(conf, sh, ctx);
-				sh->log_start = ctx->pos;
 				list_move_tail(&sh->lru, cached_stripe_list);
 			}
 			r5l_recovery_load_data(log, sh, ctx, payload,
@@ -1932,8 +1929,6 @@ static void r5c_recovery_load_one_stripe(struct r5l_log *log,
 			set_bit(R5_UPTODATE, &dev->flags);
 		}
 	}
-	list_add_tail(&sh->r5c, &log->stripe_in_journal_list);
-	atomic_inc(&log->stripe_in_journal_count);
 }
 
 /*
@@ -2123,9 +2118,11 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
 		sync_page_io(log->rdev, ctx->pos, PAGE_SIZE, page,
 			     REQ_OP_WRITE, WRITE_FUA, false);
 		sh->log_start = ctx->pos;
+		list_add_tail(&sh->r5c, &log->stripe_in_journal_list);
+		atomic_inc(&log->stripe_in_journal_count);
 		ctx->pos = write_pos;
 		ctx->seq += 1;
-
+		log->next_checkpoint = sh->log_start;
 		list_del_init(&sh->lru);
 		raid5_release_stripe(sh);
 	}
@@ -2139,7 +2136,6 @@ static int r5l_recovery_log(struct r5l_log *log)
 	struct r5l_recovery_ctx ctx;
 	int ret;
 	sector_t pos;
-	struct stripe_head *sh;
 
 	ctx.pos = log->last_checkpoint;
 	ctx.seq = log->last_cp_seq;
@@ -2164,9 +2160,6 @@ static int r5l_recovery_log(struct r5l_log *log)
 		log->next_checkpoint = ctx.pos;
 		r5l_log_write_empty_meta_block(log, ctx.pos, ctx.seq++);
 		ctx.pos = r5l_ring_add(log, ctx.pos, BLOCK_SECTORS);
-	} else {
-		sh = list_last_entry(&ctx.cached_list, struct stripe_head, lru);
-		log->next_checkpoint = sh->log_start;
 	}
 
 	if ((ctx.data_only_stripes == 0) && (ctx.data_parity_stripes == 0))
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes()
  2016-12-07 17:42 [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes() Song Liu
  2016-12-07 17:42 ` [PATCH 2/3] md/r5cache: after recovery, increase journal seq by 1000 Song Liu
  2016-12-07 17:42 ` [PATCH 3/3] md/r5cache: sh->log_start in recovery Song Liu
@ 2016-12-08 18:28 ` Shaohua Li
  2 siblings, 0 replies; 5+ messages in thread
From: Shaohua Li @ 2016-12-08 18:28 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-raid, neilb, shli, kernel-team, dan.j.williams, hch,
	liuzhengyuan, liuyun01

On Wed, Dec 07, 2016 at 09:42:05AM -0800, Song Liu wrote:
> r5l_recovery_create_empty_meta_block() creates crc for the empty
> metablock. After the metablock is updated, we need clear the
> checksum before recalculate it.

applied this one. However, I moved out checksum calculation from
r5l_recovery_create_empty_meta_block. We should calculate checksum after all
fields in meta block is updated.


Thanks,
Shaohua 
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  drivers/md/raid5-cache.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> index c3b3124..875f963 100644
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -2117,7 +2117,9 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
>  			}
>  		}
>  		mb->meta_size = cpu_to_le32(offset);
> -		mb->checksum = crc32c_le(log->uuid_checksum, mb, PAGE_SIZE);
> +		mb->checksum = 0;
> +		mb->checksum = cpu_to_le32(crc32c_le(log->uuid_checksum,
> +						     mb, PAGE_SIZE));
>  		sync_page_io(log->rdev, ctx->pos, PAGE_SIZE, page,
>  			     REQ_OP_WRITE, WRITE_FUA, false);
>  		sh->log_start = ctx->pos;
> -- 
> 2.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 3/3] md/r5cache: sh->log_start in recovery
  2016-12-07 17:42 ` [PATCH 3/3] md/r5cache: sh->log_start in recovery Song Liu
@ 2016-12-08 18:41   ` Shaohua Li
  0 siblings, 0 replies; 5+ messages in thread
From: Shaohua Li @ 2016-12-08 18:41 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-raid, neilb, shli, kernel-team, dan.j.williams, hch,
	liuzhengyuan, liuyun01

On Wed, Dec 07, 2016 at 09:42:07AM -0800, Song Liu wrote:
> We only need to update sh->log_start at the end of recovery,
> which is r5c_recovery_rewrite_data_only_stripes().
> 
> In when there is data-only stripes, log->next_checkpoint is
> also set in r5c_recovery_rewrite_data_only_stripes().

I didn't get this. please describe why this patch is required instead of what
this patch does.


> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  drivers/md/raid5-cache.c | 23 ++++++++---------------
>  1 file changed, 8 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> index 5301081..ae2684a 100644
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -1681,8 +1681,7 @@ r5l_recovery_replay_one_stripe(struct r5conf *conf,
>  
>  static struct stripe_head *
>  r5c_recovery_alloc_stripe(struct r5conf *conf,
> -			  sector_t stripe_sect,
> -			  sector_t log_start)
> +			  sector_t stripe_sect)
>  {
>  	struct stripe_head *sh;
>  
> @@ -1691,7 +1690,6 @@ r5c_recovery_alloc_stripe(struct r5conf *conf,
>  		return NULL;  /* no more stripe available */
>  
>  	r5l_recovery_reset_stripe(sh);
> -	sh->log_start = log_start;
>  
>  	return sh;
>  }
> @@ -1861,7 +1859,7 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
>  						stripe_sect);
>  
>  		if (!sh) {
> -			sh = r5c_recovery_alloc_stripe(conf, stripe_sect, ctx->pos);
> +			sh = r5c_recovery_alloc_stripe(conf, stripe_sect);
>  			/*
>  			 * cannot get stripe from raid5_get_active_stripe
>  			 * try replay some stripes
> @@ -1870,7 +1868,7 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
>  				r5c_recovery_replay_stripes(
>  					cached_stripe_list, ctx);
>  				sh = r5c_recovery_alloc_stripe(
> -					conf, stripe_sect, ctx->pos);
> +					conf, stripe_sect);
>  			}
>  			if (!sh) {
>  				pr_debug("md/raid:%s: Increasing stripe cache size to %d to recovery data on journal.\n",
> @@ -1878,8 +1876,8 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
>  					conf->min_nr_stripes * 2);
>  				raid5_set_cache_size(mddev,
>  						     conf->min_nr_stripes * 2);
> -				sh = r5c_recovery_alloc_stripe(
> -					conf, stripe_sect, ctx->pos);
> +				sh = r5c_recovery_alloc_stripe(conf,
> +							       stripe_sect);
>  			}
>  			if (!sh) {
>  				pr_err("md/raid:%s: Cannot get enough stripes due to memory pressure. Recovery failed.\n",
> @@ -1893,7 +1891,6 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
>  			if (!test_bit(STRIPE_R5C_CACHING, &sh->state) &&
>  			    test_bit(R5_Wantwrite, &sh->dev[sh->pd_idx].flags)) {
>  				r5l_recovery_replay_one_stripe(conf, sh, ctx);
> -				sh->log_start = ctx->pos;
>  				list_move_tail(&sh->lru, cached_stripe_list);
>  			}
>  			r5l_recovery_load_data(log, sh, ctx, payload,
> @@ -1932,8 +1929,6 @@ static void r5c_recovery_load_one_stripe(struct r5l_log *log,
>  			set_bit(R5_UPTODATE, &dev->flags);
>  		}
>  	}
> -	list_add_tail(&sh->r5c, &log->stripe_in_journal_list);
> -	atomic_inc(&log->stripe_in_journal_count);
>  }
>  
>  /*
> @@ -2123,9 +2118,11 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
>  		sync_page_io(log->rdev, ctx->pos, PAGE_SIZE, page,
>  			     REQ_OP_WRITE, WRITE_FUA, false);
>  		sh->log_start = ctx->pos;
> +		list_add_tail(&sh->r5c, &log->stripe_in_journal_list);
> +		atomic_inc(&log->stripe_in_journal_count);
>  		ctx->pos = write_pos;
>  		ctx->seq += 1;
> -
> +		log->next_checkpoint = sh->log_start;
>  		list_del_init(&sh->lru);
>  		raid5_release_stripe(sh);
>  	}
> @@ -2139,7 +2136,6 @@ static int r5l_recovery_log(struct r5l_log *log)
>  	struct r5l_recovery_ctx ctx;
>  	int ret;
>  	sector_t pos;
> -	struct stripe_head *sh;
>  
>  	ctx.pos = log->last_checkpoint;
>  	ctx.seq = log->last_cp_seq;
> @@ -2164,9 +2160,6 @@ static int r5l_recovery_log(struct r5l_log *log)
>  		log->next_checkpoint = ctx.pos;
>  		r5l_log_write_empty_meta_block(log, ctx.pos, ctx.seq++);
>  		ctx.pos = r5l_ring_add(log, ctx.pos, BLOCK_SECTORS);
> -	} else {
> -		sh = list_last_entry(&ctx.cached_list, struct stripe_head, lru);
> -		log->next_checkpoint = sh->log_start;
>  	}
>  
>  	if ((ctx.data_only_stripes == 0) && (ctx.data_parity_stripes == 0))
> -- 
> 2.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-12-08 18:41 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-07 17:42 [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes() Song Liu
2016-12-07 17:42 ` [PATCH 2/3] md/r5cache: after recovery, increase journal seq by 1000 Song Liu
2016-12-07 17:42 ` [PATCH 3/3] md/r5cache: sh->log_start in recovery Song Liu
2016-12-08 18:41   ` Shaohua Li
2016-12-08 18:28 ` [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes() Shaohua Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).