* [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes()

From: Song Liu @ 2016-12-07 17:42 UTC (permalink / raw)
To: linux-raid
Cc: neilb, shli, kernel-team, dan.j.williams, hch, liuzhengyuan, liuyun01, Song Liu

r5l_recovery_create_empty_meta_block() creates the crc for the empty meta
block. After the meta block is updated, we need to clear the checksum
before recalculating it.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 drivers/md/raid5-cache.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index c3b3124..875f963 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2117,7 +2117,9 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
 			}
 		}
 		mb->meta_size = cpu_to_le32(offset);
-		mb->checksum = crc32c_le(log->uuid_checksum, mb, PAGE_SIZE);
+		mb->checksum = 0;
+		mb->checksum = cpu_to_le32(crc32c_le(log->uuid_checksum,
+						     mb, PAGE_SIZE));
 		sync_page_io(log->rdev, ctx->pos, PAGE_SIZE, page,
 			     REQ_OP_WRITE, WRITE_FUA, false);
 		sh->log_start = ctx->pos;
--
2.9.3
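As background for the fix above, here is a minimal sketch of the pattern it
relies on; it is not taken from the patch, and the demo_* names are
illustrative. When the checksum field lives inside the region being summed,
it must be zeroed before the crc is recomputed, otherwise the stale value
from the previous computation gets folded into the new one.

/* Illustrative sketch, not from the thread; demo_* names are made up. */
#include <linux/types.h>
#include <linux/crc32c.h>
#include <asm/byteorder.h>

struct demo_block {
        __le32  magic;
        __le32  checksum;       /* covered by the crc over the whole block */
        __le64  seq;
        __le32  meta_size;
};

/* Recompute the crc after fields of the block have been modified. */
static void demo_update_checksum(struct demo_block *blk, u32 seed, size_t len)
{
        blk->checksum = 0;      /* drop the stale crc before summing */
        blk->checksum = cpu_to_le32(crc32c_le(seed, blk, len));
}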
* [PATCH 2/3] md/r5cache: after recovery, increase journal seq by 1000

From: Song Liu @ 2016-12-07 17:42 UTC (permalink / raw)
To: linux-raid
Cc: neilb, shli, kernel-team, dan.j.williams, hch, liuzhengyuan, liuyun01, Song Liu

Currently, we increase the journal entry seq by 10 after recovery.
However, this is not sufficient in the following case. After a crash,
the journal looks like

| seq+0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 | ... | +11 | +12 |

If +1 is not valid, we drop all entries from +1 to +12 and write a new
entry with seq+10 in its place:

| seq+0 | +10 | +2 | +3 | +4 | +5 | +6 | +7 | ... | +11 | +12 |

However, if we then write a big journal entry with seq+11, it can connect
with the stale journal entries that are still on disk:

| seq+0 | +10 | +11 | +12 |

To reduce the risk of this issue, we increase seq by 1000 instead.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 drivers/md/raid5-cache.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 875f963..5301081 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2003,8 +2003,8 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
  * happens again, new recovery will start from meta 1. Since meta 2n is
  * valid now, recovery will think meta 3 is valid, which is wrong.
  * The solution is we create a new meta in meta2 with its seq == meta
- * 1's seq + 10 and let superblock points to meta2. The same recovery will
- * not think meta 3 is a valid meta, because its seq doesn't match
+ * 1's seq + 1000 and let superblock points to meta2. The same recovery
+ * will not think meta 3 is a valid meta, because its seq doesn't match
  */
 
 /*
@@ -2034,7 +2034,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
  * ---------------------------------------------
  *   ^                            ^
  *   |- log->last_checkpoint      |- ctx->pos+1
- *   |- log->last_cp_seq          |- ctx->seq+11
+ *   |- log->last_cp_seq          |- ctx->seq+1001
  *
  * However, it is not safe to start the state machine yet, because data only
  * parities are not yet secured in RAID. To save these data only parities, we
@@ -2045,7 +2045,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
  * -----------------------------------------------------------------
  *   ^                                                ^
  *   |- log->last_checkpoint                          |- ctx->pos+n
- *   |- log->last_cp_seq                              |- ctx->seq+10+n
+ *   |- log->last_cp_seq                              |- ctx->seq+1000+n
  *
  * If failure happens again during this process, the recovery can safe start
  * again from log->last_checkpoint.
@@ -2057,7 +2057,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
  * -----------------------------------------------------------------
  *                      ^                        ^
  *                      |- log->last_checkpoint  |- ctx->pos+n
- *                      |- log->last_cp_seq      |- ctx->seq+10+n
+ *                      |- log->last_cp_seq      |- ctx->seq+1000+n
  *
  * Then we can safely start the state machine. If failure happens from this
  * point on, the recovery will start from new log->last_checkpoint.
@@ -2157,8 +2157,8 @@ static int r5l_recovery_log(struct r5l_log *log)
 	if (ret)
 		return ret;
 
-	pos = ctx.pos;
-	ctx.seq += 10;
+	pos = ctx.pos;
+	ctx.seq += 1000;
 
 	if (ctx.data_only_stripes == 0) {
 		log->next_checkpoint = ctx.pos;
--
2.9.3
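For context, recovery only accepts a meta block whose seq (and position) is
exactly the value it expects next; the in-tree check,
r5l_recovery_read_meta_block(), also looks at the version and checksum
fields. Stale blocks left on disk after a rewrite still carry their old seq
values, so a large gap after the rewrite makes it very unlikely that one of
them happens to match. The sketch below is not from the patch and the
demo_* name is made up; it only illustrates the acceptance test.

/* Illustrative sketch, not from the patch; demo_* is a made-up helper. */
#include <linux/types.h>
#include <linux/raid/md_p.h>
#include <asm/byteorder.h>

static bool demo_meta_block_matches(struct r5l_meta_block *mb,
                                    u64 expected_seq, sector_t expected_pos)
{
        if (le32_to_cpu(mb->magic) != R5LOG_MAGIC)
                return false;
        if (le64_to_cpu(mb->seq) != expected_seq)
                return false;   /* stale or out-of-order entry */
        if (le64_to_cpu(mb->position) != expected_pos)
                return false;
        return true;
}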
* [PATCH 3/3] md/r5cache: sh->log_start in recovery

From: Song Liu @ 2016-12-07 17:42 UTC (permalink / raw)
To: linux-raid
Cc: neilb, shli, kernel-team, dan.j.williams, hch, liuzhengyuan, liuyun01, Song Liu

We only need to update sh->log_start at the end of recovery, which is
r5c_recovery_rewrite_data_only_stripes().

When there are data-only stripes, log->next_checkpoint is also set in
r5c_recovery_rewrite_data_only_stripes().

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 drivers/md/raid5-cache.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 5301081..ae2684a 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1681,8 +1681,7 @@ r5l_recovery_replay_one_stripe(struct r5conf *conf,
 
 static struct stripe_head *
 r5c_recovery_alloc_stripe(struct r5conf *conf,
-			  sector_t stripe_sect,
-			  sector_t log_start)
+			  sector_t stripe_sect)
 {
 	struct stripe_head *sh;
 
@@ -1691,7 +1690,6 @@ r5c_recovery_alloc_stripe(struct r5conf *conf,
 		return NULL; /* no more stripe available */
 
 	r5l_recovery_reset_stripe(sh);
-	sh->log_start = log_start;
 
 	return sh;
 }
@@ -1861,7 +1859,7 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
 						stripe_sect);
 
 		if (!sh) {
-			sh = r5c_recovery_alloc_stripe(conf, stripe_sect, ctx->pos);
+			sh = r5c_recovery_alloc_stripe(conf, stripe_sect);
 			/*
 			 * cannot get stripe from raid5_get_active_stripe
 			 * try replay some stripes
@@ -1870,7 +1868,7 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
 				r5c_recovery_replay_stripes(
 					cached_stripe_list, ctx);
 				sh = r5c_recovery_alloc_stripe(
-					conf, stripe_sect, ctx->pos);
+					conf, stripe_sect);
 			}
 			if (!sh) {
 				pr_debug("md/raid:%s: Increasing stripe cache size to %d to recovery data on journal.\n",
@@ -1878,8 +1876,8 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
 					 conf->min_nr_stripes * 2);
 				raid5_set_cache_size(mddev,
 						     conf->min_nr_stripes * 2);
-				sh = r5c_recovery_alloc_stripe(
-					conf, stripe_sect, ctx->pos);
+				sh = r5c_recovery_alloc_stripe(conf,
+							       stripe_sect);
 			}
 			if (!sh) {
 				pr_err("md/raid:%s: Cannot get enough stripes due to memory pressure. Recovery failed.\n",
@@ -1893,7 +1891,6 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
 		if (!test_bit(STRIPE_R5C_CACHING, &sh->state) &&
 		    test_bit(R5_Wantwrite, &sh->dev[sh->pd_idx].flags)) {
 			r5l_recovery_replay_one_stripe(conf, sh, ctx);
-			sh->log_start = ctx->pos;
 			list_move_tail(&sh->lru, cached_stripe_list);
 		}
 		r5l_recovery_load_data(log, sh, ctx, payload,
@@ -1932,8 +1929,6 @@ static void r5c_recovery_load_one_stripe(struct r5l_log *log,
 			set_bit(R5_UPTODATE, &dev->flags);
 		}
 	}
-	list_add_tail(&sh->r5c, &log->stripe_in_journal_list);
-	atomic_inc(&log->stripe_in_journal_count);
 }
 
 /*
@@ -2123,9 +2118,11 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
 		sync_page_io(log->rdev, ctx->pos, PAGE_SIZE, page,
 			     REQ_OP_WRITE, WRITE_FUA, false);
 		sh->log_start = ctx->pos;
+		list_add_tail(&sh->r5c, &log->stripe_in_journal_list);
+		atomic_inc(&log->stripe_in_journal_count);
 		ctx->pos = write_pos;
 		ctx->seq += 1;
-
+		log->next_checkpoint = sh->log_start;
 		list_del_init(&sh->lru);
 		raid5_release_stripe(sh);
 	}
@@ -2139,7 +2136,6 @@ static int r5l_recovery_log(struct r5l_log *log)
 	struct r5l_recovery_ctx ctx;
 	int ret;
 	sector_t pos;
-	struct stripe_head *sh;
 
 	ctx.pos = log->last_checkpoint;
 	ctx.seq = log->last_cp_seq;
@@ -2164,9 +2160,6 @@ static int r5l_recovery_log(struct r5l_log *log)
 		log->next_checkpoint = ctx.pos;
 		r5l_log_write_empty_meta_block(log, ctx.pos, ctx.seq++);
 		ctx.pos = r5l_ring_add(log, ctx.pos, BLOCK_SECTORS);
-	} else {
-		sh = list_last_entry(&ctx.cached_list, struct stripe_head, lru);
-		log->next_checkpoint = sh->log_start;
 	}
 
 	if ((ctx.data_only_stripes == 0) && (ctx.data_parity_stripes == 0))
--
2.9.3
* Re: [PATCH 3/3] md/r5cache: sh->log_start in recovery

From: Shaohua Li @ 2016-12-08 18:41 UTC (permalink / raw)
To: Song Liu
Cc: linux-raid, neilb, shli, kernel-team, dan.j.williams, hch, liuzhengyuan, liuyun01

On Wed, Dec 07, 2016 at 09:42:07AM -0800, Song Liu wrote:
> We only need to update sh->log_start at the end of recovery, which is
> r5c_recovery_rewrite_data_only_stripes().
>
> When there are data-only stripes, log->next_checkpoint is also set in
> r5c_recovery_rewrite_data_only_stripes().

I didn't get this. Please describe why this patch is required instead of
what this patch does.

> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  drivers/md/raid5-cache.c | 23 ++++++++---------------
>  1 file changed, 8 insertions(+), 15 deletions(-)
[...]
* Re: [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes()

From: Shaohua Li @ 2016-12-08 18:28 UTC (permalink / raw)
To: Song Liu
Cc: linux-raid, neilb, shli, kernel-team, dan.j.williams, hch, liuzhengyuan, liuyun01

On Wed, Dec 07, 2016 at 09:42:05AM -0800, Song Liu wrote:
> r5l_recovery_create_empty_meta_block() creates the crc for the empty meta
> block. After the meta block is updated, we need to clear the checksum
> before recalculating it.

Applied this one. However, I moved the checksum calculation out of
r5l_recovery_create_empty_meta_block(). We should calculate the checksum
after all fields in the meta block are updated.

Thanks,
Shaohua

> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  drivers/md/raid5-cache.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
[...]
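The ordering Shaohua describes looks roughly like the sketch below. This is
an illustration only, not his actual change: demo_fill_meta_block() is a
made-up name, and it assumes mb points at a page-sized buffer, as in the
driver. Every field is filled in first and the checksum is computed once at
the very end, so there is no stale value to clear.

/*
 * Illustrative sketch, not Shaohua's actual commit; demo_* is made up.
 * Assumes mb points at a page-sized buffer.
 */
#include <linux/types.h>
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/crc32c.h>
#include <linux/raid/md_p.h>
#include <asm/byteorder.h>

static void demo_fill_meta_block(struct r5l_meta_block *mb, u64 seq,
                                 sector_t pos, u32 meta_size, u32 uuid_checksum)
{
        memset(mb, 0, PAGE_SIZE);
        mb->magic = cpu_to_le32(R5LOG_MAGIC);
        mb->version = R5LOG_VERSION;
        mb->meta_size = cpu_to_le32(meta_size);
        mb->seq = cpu_to_le64(seq);
        mb->position = cpu_to_le64(pos);

        /* checksum last: every field above is already in its final state */
        mb->checksum = cpu_to_le32(crc32c_le(uuid_checksum, mb, PAGE_SIZE));
}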