* [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes()
@ 2016-12-07 17:42 Song Liu
2016-12-07 17:42 ` [PATCH 2/3] md/r5cache: after recovery, increase journal seq by 1000 Song Liu
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Song Liu @ 2016-12-07 17:42 UTC (permalink / raw)
To: linux-raid
Cc: neilb, shli, kernel-team, dan.j.williams, hch, liuzhengyuan,
liuyun01, Song Liu
r5l_recovery_create_empty_meta_block() computes a crc for the empty
meta block. After the meta block is updated, we need to clear the stale
checksum before recalculating it; otherwise the old value is folded into
the new crc and the block fails verification during the next recovery.
Signed-off-by: Song Liu <songliubraving@fb.com>
---
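The fix follows the same convention as the read side: zero the checksum
field, then run crc32c over the whole block. Below is a standalone sketch
of that pattern; the struct and the crc32c() helper are simplified
stand-ins for struct r5l_meta_block and the kernel's crc32c_le()
(byte-order conversion omitted), not the actual raid5-cache code.

/*
 * Standalone sketch of the "clear before recompute" pattern; toy types only.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct meta_block {
	uint32_t magic;
	uint32_t checksum;	/* covered by the crc itself, so must be 0 while hashing */
	uint64_t seq;
	uint32_t meta_size;
	uint8_t  payload[4096 - 20];
};

/* bitwise CRC-32C as a stand-in for crc32c_le(seed, buf, len) */
static uint32_t crc32c(uint32_t seed, const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = ~seed;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0x82F63B78u & (uint32_t)-(int)(crc & 1));
	}
	return ~crc;
}

static void meta_block_finalize(struct meta_block *mb, uint32_t uuid_checksum)
{
	/*
	 * The block was already checksummed when it was created empty.  After
	 * any field changes, the stale value must be zeroed first so it does
	 * not feed into the new checksum; the read side zeroes the field the
	 * same way before verifying.
	 */
	mb->checksum = 0;
	mb->checksum = crc32c(uuid_checksum, mb, sizeof(*mb));
}

int main(void)
{
	struct meta_block mb;

	memset(&mb, 0, sizeof(mb));
	mb.magic = 0x6433c509;	/* R5LOG_MAGIC-style constant, for illustration */
	mb.seq = 42;
	mb.meta_size = 64;
	meta_block_finalize(&mb, 0x12345678);
	printf("checksum = 0x%08x\n", (unsigned)mb.checksum);
	return 0;
}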
drivers/md/raid5-cache.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index c3b3124..875f963 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2117,7 +2117,9 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
}
}
mb->meta_size = cpu_to_le32(offset);
- mb->checksum = crc32c_le(log->uuid_checksum, mb, PAGE_SIZE);
+ mb->checksum = 0;
+ mb->checksum = cpu_to_le32(crc32c_le(log->uuid_checksum,
+ mb, PAGE_SIZE));
sync_page_io(log->rdev, ctx->pos, PAGE_SIZE, page,
REQ_OP_WRITE, WRITE_FUA, false);
sh->log_start = ctx->pos;
--
2.9.3
* [PATCH 2/3] md/r5cache: after recovery, increase journal seq by 1000
2016-12-07 17:42 [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes() Song Liu
@ 2016-12-07 17:42 ` Song Liu
2016-12-07 17:42 ` [PATCH 3/3] md/r5cache: sh->log_start in recovery Song Liu
2016-12-08 18:28 ` [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes() Shaohua Li
2 siblings, 0 replies; 5+ messages in thread
From: Song Liu @ 2016-12-07 17:42 UTC (permalink / raw)
To: linux-raid
Cc: neilb, shli, kernel-team, dan.j.williams, hch, liuzhengyuan,
liuyun01, Song Liu
Currently, we increase the journal entry seq by 10 after recovery.
However, this is not sufficient in the following case.
After a crash the journal looks like:
| seq+0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 | ... | +11 | +12 |
If +1 is not valid, we drop all entries from +1 to +12 and write a new
entry with seq+10 right after seq+0:
| seq+0 | +10 | +2 | +3 | +4 | +5 | +6 | +7 | ... | +11 | +12 |
However, if we then write a big journal entry with seq+11, it can end
exactly where the stale +11 entry sits and thus connect with the stale
journal entries:
| seq+0 | +10 | +11 | +12 |
To reduce the risk of this issue, increase the seq by 1000 instead.
Signed-off-by: Song Liu <songliubraving@fb.com>
---
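To illustrate the hazard, the sketch below mimics the kind of validity
test recovery applies to the next meta block (magic, checksum, expected
seq, expected position). The struct, the field names, and the magic
constant are illustrative stand-ins rather than the kernel's own
definitions.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for the on-disk meta block header. */
struct meta_hdr {
	uint32_t magic;
	uint32_t checksum;
	uint64_t seq;		/* must equal the seq recovery expects next */
	uint64_t position;	/* must equal the log sector it was read from */
};

static bool meta_looks_valid(const struct meta_hdr *mb, uint32_t good_crc,
			     uint64_t expected_seq, uint64_t expected_pos)
{
	return mb->magic == 0x6433c509 &&	/* R5LOG_MAGIC-style constant */
	       mb->checksum == good_crc &&
	       mb->seq == expected_seq &&
	       mb->position == expected_pos;
}

int main(void)
{
	/* A stale pre-crash block still sitting in the log at sector 2048. */
	struct meta_hdr stale = { 0x6433c509, 0xfeedbeef, 11, 2048 };

	/*
	 * With a +10 bump, a post-recovery writer can legitimately reach
	 * seq 11 at exactly that sector, so the stale block (and everything
	 * chained after it) passes the test on the next recovery.  With a
	 * +1000 bump the live seq is far past the stale seqs by the time the
	 * writer reaches that region, so an accidental match is unlikely.
	 */
	printf("stale block accepted at seq 11:   %d\n",
	       meta_looks_valid(&stale, 0xfeedbeef, 11, 2048));
	printf("stale block accepted at seq 1011: %d\n",
	       meta_looks_valid(&stale, 0xfeedbeef, 1011, 2048));
	return 0;
}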
drivers/md/raid5-cache.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 875f963..5301081 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2003,8 +2003,8 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
* happens again, new recovery will start from meta 1. Since meta 2n is
* valid now, recovery will think meta 3 is valid, which is wrong.
* The solution is we create a new meta in meta2 with its seq == meta
- * 1's seq + 10 and let superblock points to meta2. The same recovery will
- * not think meta 3 is a valid meta, because its seq doesn't match
+ * 1's seq + 1000 and let superblock points to meta2. The same recovery
+ * will not think meta 3 is a valid meta, because its seq doesn't match
*/
/*
@@ -2034,7 +2034,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
* ---------------------------------------------
* ^ ^
* |- log->last_checkpoint |- ctx->pos+1
- * |- log->last_cp_seq |- ctx->seq+11
+ * |- log->last_cp_seq |- ctx->seq+1001
*
* However, it is not safe to start the state machine yet, because data only
* parities are not yet secured in RAID. To save these data only parities, we
@@ -2045,7 +2045,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
* -----------------------------------------------------------------
* ^ ^
* |- log->last_checkpoint |- ctx->pos+n
- * |- log->last_cp_seq |- ctx->seq+10+n
+ * |- log->last_cp_seq |- ctx->seq+1000+n
*
* If failure happens again during this process, the recovery can safe start
* again from log->last_checkpoint.
@@ -2057,7 +2057,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
* -----------------------------------------------------------------
* ^ ^
* |- log->last_checkpoint |- ctx->pos+n
- * |- log->last_cp_seq |- ctx->seq+10+n
+ * |- log->last_cp_seq |- ctx->seq+1000+n
*
* Then we can safely start the state machine. If failure happens from this
* point on, the recovery will start from new log->last_checkpoint.
@@ -2157,8 +2157,8 @@ static int r5l_recovery_log(struct r5l_log *log)
if (ret)
return ret;
- pos = ctx.pos;
- ctx.seq += 10;
+ pos = ctx.pos;
+ ctx.seq += 1000;
if (ctx.data_only_stripes == 0) {
log->next_checkpoint = ctx.pos;
--
2.9.3
* [PATCH 3/3] md/r5cache: sh->log_start in recovery
2016-12-07 17:42 [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes() Song Liu
2016-12-07 17:42 ` [PATCH 2/3] md/r5cache: after recovery, increase journal seq by 1000 Song Liu
@ 2016-12-07 17:42 ` Song Liu
2016-12-08 18:41 ` Shaohua Li
2016-12-08 18:28 ` [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes() Shaohua Li
2 siblings, 1 reply; 5+ messages in thread
From: Song Liu @ 2016-12-07 17:42 UTC (permalink / raw)
To: linux-raid
Cc: neilb, shli, kernel-team, dan.j.williams, hch, liuzhengyuan,
liuyun01, Song Liu
We only need to update sh->log_start at the end of recovery, which is
in r5c_recovery_rewrite_data_only_stripes(), so drop the earlier
assignments made while the log is still being analyzed.
When there are data-only stripes, log->next_checkpoint is also set in
r5c_recovery_rewrite_data_only_stripes(), so r5l_recovery_log() no
longer needs to derive it from the last cached stripe.
Signed-off-by: Song Liu <songliubraving@fb.com>
---
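As a rough sketch of the resulting structure, all per-stripe journal
bookkeeping now happens in one place, at rewrite time. The types and the
helper below are illustrative stand-ins for struct stripe_head,
struct r5l_log and r5c_recovery_rewrite_data_only_stripes(), not the
kernel code itself.

#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-ins for the relevant stripe_head / r5l_log fields. */
struct stripe {
	uint64_t log_start;		/* first journal sector holding this stripe's data */
};

struct journal {
	uint64_t next_checkpoint;	/* where the next recovery attempt would start */
};

/*
 * Sketch of the consolidated end-of-recovery loop: each data-only stripe gets
 * log_start set to the position it was just rewritten to, and next_checkpoint
 * follows the last rewritten entry, instead of being assigned earlier while
 * the log is still being analyzed.
 */
static void rewrite_data_only_stripes(struct journal *log,
				      struct stripe *stripes, int n,
				      uint64_t write_pos)
{
	for (int i = 0; i < n; i++) {
		uint64_t entry_pos = write_pos;

		/* ... write a meta block plus data for stripes[i] at entry_pos ... */
		stripes[i].log_start = entry_pos;
		log->next_checkpoint = entry_pos;
		write_pos += 8;		/* pretend each rewritten entry spans 8 sectors */
	}
}

int main(void)
{
	struct stripe s[2] = { { 0 }, { 0 } };
	struct journal log = { 0 };

	rewrite_data_only_stripes(&log, s, 2, 1024);
	printf("next_checkpoint = %llu\n", (unsigned long long)log.next_checkpoint);
	return 0;
}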
drivers/md/raid5-cache.c | 23 ++++++++---------------
1 file changed, 8 insertions(+), 15 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 5301081..ae2684a 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1681,8 +1681,7 @@ r5l_recovery_replay_one_stripe(struct r5conf *conf,
static struct stripe_head *
r5c_recovery_alloc_stripe(struct r5conf *conf,
- sector_t stripe_sect,
- sector_t log_start)
+ sector_t stripe_sect)
{
struct stripe_head *sh;
@@ -1691,7 +1690,6 @@ r5c_recovery_alloc_stripe(struct r5conf *conf,
return NULL; /* no more stripe available */
r5l_recovery_reset_stripe(sh);
- sh->log_start = log_start;
return sh;
}
@@ -1861,7 +1859,7 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
stripe_sect);
if (!sh) {
- sh = r5c_recovery_alloc_stripe(conf, stripe_sect, ctx->pos);
+ sh = r5c_recovery_alloc_stripe(conf, stripe_sect);
/*
* cannot get stripe from raid5_get_active_stripe
* try replay some stripes
@@ -1870,7 +1868,7 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
r5c_recovery_replay_stripes(
cached_stripe_list, ctx);
sh = r5c_recovery_alloc_stripe(
- conf, stripe_sect, ctx->pos);
+ conf, stripe_sect);
}
if (!sh) {
pr_debug("md/raid:%s: Increasing stripe cache size to %d to recovery data on journal.\n",
@@ -1878,8 +1876,8 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
conf->min_nr_stripes * 2);
raid5_set_cache_size(mddev,
conf->min_nr_stripes * 2);
- sh = r5c_recovery_alloc_stripe(
- conf, stripe_sect, ctx->pos);
+ sh = r5c_recovery_alloc_stripe(conf,
+ stripe_sect);
}
if (!sh) {
pr_err("md/raid:%s: Cannot get enough stripes due to memory pressure. Recovery failed.\n",
@@ -1893,7 +1891,6 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
if (!test_bit(STRIPE_R5C_CACHING, &sh->state) &&
test_bit(R5_Wantwrite, &sh->dev[sh->pd_idx].flags)) {
r5l_recovery_replay_one_stripe(conf, sh, ctx);
- sh->log_start = ctx->pos;
list_move_tail(&sh->lru, cached_stripe_list);
}
r5l_recovery_load_data(log, sh, ctx, payload,
@@ -1932,8 +1929,6 @@ static void r5c_recovery_load_one_stripe(struct r5l_log *log,
set_bit(R5_UPTODATE, &dev->flags);
}
}
- list_add_tail(&sh->r5c, &log->stripe_in_journal_list);
- atomic_inc(&log->stripe_in_journal_count);
}
/*
@@ -2123,9 +2118,11 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
sync_page_io(log->rdev, ctx->pos, PAGE_SIZE, page,
REQ_OP_WRITE, WRITE_FUA, false);
sh->log_start = ctx->pos;
+ list_add_tail(&sh->r5c, &log->stripe_in_journal_list);
+ atomic_inc(&log->stripe_in_journal_count);
ctx->pos = write_pos;
ctx->seq += 1;
-
+ log->next_checkpoint = sh->log_start;
list_del_init(&sh->lru);
raid5_release_stripe(sh);
}
@@ -2139,7 +2136,6 @@ static int r5l_recovery_log(struct r5l_log *log)
struct r5l_recovery_ctx ctx;
int ret;
sector_t pos;
- struct stripe_head *sh;
ctx.pos = log->last_checkpoint;
ctx.seq = log->last_cp_seq;
@@ -2164,9 +2160,6 @@ static int r5l_recovery_log(struct r5l_log *log)
log->next_checkpoint = ctx.pos;
r5l_log_write_empty_meta_block(log, ctx.pos, ctx.seq++);
ctx.pos = r5l_ring_add(log, ctx.pos, BLOCK_SECTORS);
- } else {
- sh = list_last_entry(&ctx.cached_list, struct stripe_head, lru);
- log->next_checkpoint = sh->log_start;
}
if ((ctx.data_only_stripes == 0) && (ctx.data_parity_stripes == 0))
--
2.9.3
* Re: [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes()
2016-12-07 17:42 [PATCH 1/3] md/raid5-cache: fix crc in rewrite_data_only_stripes() Song Liu
2016-12-07 17:42 ` [PATCH 2/3] md/r5cache: after recovery, increase journal seq by 1000 Song Liu
2016-12-07 17:42 ` [PATCH 3/3] md/r5cache: sh->log_start in recovery Song Liu
@ 2016-12-08 18:28 ` Shaohua Li
2 siblings, 0 replies; 5+ messages in thread
From: Shaohua Li @ 2016-12-08 18:28 UTC (permalink / raw)
To: Song Liu
Cc: linux-raid, neilb, shli, kernel-team, dan.j.williams, hch,
liuzhengyuan, liuyun01
On Wed, Dec 07, 2016 at 09:42:05AM -0800, Song Liu wrote:
> r5l_recovery_create_empty_meta_block() computes a crc for the empty
> meta block. After the meta block is updated, we need to clear the stale
> checksum before recalculating it.
Applied this one. However, I moved the checksum calculation out of
r5l_recovery_create_empty_meta_block(). We should calculate the checksum
after all fields in the meta block are updated.
Thanks,
Shaohua
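For illustration, the reorganization described above might look roughly
like the sketch below: the creation helper only fills in fields, and the
checksum is computed once, after every field is final. The names, the toy
meta block layout, and the trivial crc stand-in are assumptions for the
sketch, not the change that was actually applied.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct r5l_meta_block. */
struct meta_block {
	uint32_t magic;
	uint32_t checksum;
	uint64_t seq;
	uint64_t position;
	uint32_t meta_size;
};

/* Trivial stand-in for the kernel's crc32c_le(); not a real CRC. */
static uint32_t crc32c(uint32_t seed, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--)
		seed = seed * 31 + *p++;
	return seed;
}

/* Fill in the fields only; no checksum yet, callers may still change things. */
static void create_empty_meta_block(struct meta_block *mb,
				    uint64_t pos, uint64_t seq)
{
	memset(mb, 0, sizeof(*mb));
	mb->magic = 0x6433c509;		/* R5LOG_MAGIC-style constant */
	mb->seq = seq;
	mb->position = pos;
	mb->meta_size = sizeof(*mb);
}

/* Compute the checksum in one place, after every field is final. */
static uint32_t finalize_meta_block(struct meta_block *mb, uint32_t uuid_checksum)
{
	mb->checksum = 0;
	mb->checksum = crc32c(uuid_checksum, mb, sizeof(*mb));
	return mb->checksum;	/* caller then writes the block to the journal */
}

int main(void)
{
	struct meta_block mb;

	create_empty_meta_block(&mb, 1024, 42);
	mb.meta_size += 64;		/* caller appends a payload descriptor... */
	finalize_meta_block(&mb, 0x12345678);
	return 0;
}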
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
> drivers/md/raid5-cache.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> index c3b3124..875f963 100644
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -2117,7 +2117,9 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
> }
> }
> mb->meta_size = cpu_to_le32(offset);
> - mb->checksum = crc32c_le(log->uuid_checksum, mb, PAGE_SIZE);
> + mb->checksum = 0;
> + mb->checksum = cpu_to_le32(crc32c_le(log->uuid_checksum,
> + mb, PAGE_SIZE));
> sync_page_io(log->rdev, ctx->pos, PAGE_SIZE, page,
> REQ_OP_WRITE, WRITE_FUA, false);
> sh->log_start = ctx->pos;
> --
> 2.9.3
>
* Re: [PATCH 3/3] md/r5cache: sh->log_start in recovery
2016-12-07 17:42 ` [PATCH 3/3] md/r5cache: sh->log_start in recovery Song Liu
@ 2016-12-08 18:41 ` Shaohua Li
0 siblings, 0 replies; 5+ messages in thread
From: Shaohua Li @ 2016-12-08 18:41 UTC (permalink / raw)
To: Song Liu
Cc: linux-raid, neilb, shli, kernel-team, dan.j.williams, hch,
liuzhengyuan, liuyun01
On Wed, Dec 07, 2016 at 09:42:07AM -0800, Song Liu wrote:
> We only need to update sh->log_start at the end of recovery, which is
> in r5c_recovery_rewrite_data_only_stripes(), so drop the earlier
> assignments made while the log is still being analyzed.
>
> When there are data-only stripes, log->next_checkpoint is also set in
> r5c_recovery_rewrite_data_only_stripes(), so r5l_recovery_log() no
> longer needs to derive it from the last cached stripe.
I didn't get this. Please describe why this patch is required rather than
what this patch does.
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
> drivers/md/raid5-cache.c | 23 ++++++++---------------
> 1 file changed, 8 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> index 5301081..ae2684a 100644
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -1681,8 +1681,7 @@ r5l_recovery_replay_one_stripe(struct r5conf *conf,
>
> static struct stripe_head *
> r5c_recovery_alloc_stripe(struct r5conf *conf,
> - sector_t stripe_sect,
> - sector_t log_start)
> + sector_t stripe_sect)
> {
> struct stripe_head *sh;
>
> @@ -1691,7 +1690,6 @@ r5c_recovery_alloc_stripe(struct r5conf *conf,
> return NULL; /* no more stripe available */
>
> r5l_recovery_reset_stripe(sh);
> - sh->log_start = log_start;
>
> return sh;
> }
> @@ -1861,7 +1859,7 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
> stripe_sect);
>
> if (!sh) {
> - sh = r5c_recovery_alloc_stripe(conf, stripe_sect, ctx->pos);
> + sh = r5c_recovery_alloc_stripe(conf, stripe_sect);
> /*
> * cannot get stripe from raid5_get_active_stripe
> * try replay some stripes
> @@ -1870,7 +1868,7 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
> r5c_recovery_replay_stripes(
> cached_stripe_list, ctx);
> sh = r5c_recovery_alloc_stripe(
> - conf, stripe_sect, ctx->pos);
> + conf, stripe_sect);
> }
> if (!sh) {
> pr_debug("md/raid:%s: Increasing stripe cache size to %d to recovery data on journal.\n",
> @@ -1878,8 +1876,8 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
> conf->min_nr_stripes * 2);
> raid5_set_cache_size(mddev,
> conf->min_nr_stripes * 2);
> - sh = r5c_recovery_alloc_stripe(
> - conf, stripe_sect, ctx->pos);
> + sh = r5c_recovery_alloc_stripe(conf,
> + stripe_sect);
> }
> if (!sh) {
> pr_err("md/raid:%s: Cannot get enough stripes due to memory pressure. Recovery failed.\n",
> @@ -1893,7 +1891,6 @@ r5c_recovery_analyze_meta_block(struct r5l_log *log,
> if (!test_bit(STRIPE_R5C_CACHING, &sh->state) &&
> test_bit(R5_Wantwrite, &sh->dev[sh->pd_idx].flags)) {
> r5l_recovery_replay_one_stripe(conf, sh, ctx);
> - sh->log_start = ctx->pos;
> list_move_tail(&sh->lru, cached_stripe_list);
> }
> r5l_recovery_load_data(log, sh, ctx, payload,
> @@ -1932,8 +1929,6 @@ static void r5c_recovery_load_one_stripe(struct r5l_log *log,
> set_bit(R5_UPTODATE, &dev->flags);
> }
> }
> - list_add_tail(&sh->r5c, &log->stripe_in_journal_list);
> - atomic_inc(&log->stripe_in_journal_count);
> }
>
> /*
> @@ -2123,9 +2118,11 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
> sync_page_io(log->rdev, ctx->pos, PAGE_SIZE, page,
> REQ_OP_WRITE, WRITE_FUA, false);
> sh->log_start = ctx->pos;
> + list_add_tail(&sh->r5c, &log->stripe_in_journal_list);
> + atomic_inc(&log->stripe_in_journal_count);
> ctx->pos = write_pos;
> ctx->seq += 1;
> -
> + log->next_checkpoint = sh->log_start;
> list_del_init(&sh->lru);
> raid5_release_stripe(sh);
> }
> @@ -2139,7 +2136,6 @@ static int r5l_recovery_log(struct r5l_log *log)
> struct r5l_recovery_ctx ctx;
> int ret;
> sector_t pos;
> - struct stripe_head *sh;
>
> ctx.pos = log->last_checkpoint;
> ctx.seq = log->last_cp_seq;
> @@ -2164,9 +2160,6 @@ static int r5l_recovery_log(struct r5l_log *log)
> log->next_checkpoint = ctx.pos;
> r5l_log_write_empty_meta_block(log, ctx.pos, ctx.seq++);
> ctx.pos = r5l_ring_add(log, ctx.pos, BLOCK_SECTORS);
> - } else {
> - sh = list_last_entry(&ctx.cached_list, struct stripe_head, lru);
> - log->next_checkpoint = sh->log_start;
> }
>
> if ((ctx.data_only_stripes == 0) && (ctx.data_parity_stripes == 0))
> --
> 2.9.3
>