From: Shaohua Li <shli@kernel.org>
To: Song Liu <songliubraving@fb.com>
Cc: linux-raid@vger.kernel.org, neilb@suse.com, shli@fb.com,
	kernel-team@fb.com, dan.j.williams@intel.com, hch@infradead.org,
	liuzhengyuang521@gmail.com, liuzhengyuan@kylinos.cn
Subject: Re: [PATCH v2 4/6] r5cache: r5c recovery
Date: Tue, 27 Sep 2016 18:08:06 -0700
Message-ID: <20160928010806.GD98100@kernel.org>
In-Reply-To: <20160926233050.3351081-5-songliubraving@fb.com>

On Mon, Sep 26, 2016 at 04:30:48PM -0700, Song Liu wrote:
> For Data-Only strips, we need to finish complete calculate parity and
> finish the full reconstruct write or RMW write. For simplicity, in
> the recovery, we load the stripe to stripe cache. Once the array is
> started, the stripe cache state machine will handle these stripes
> through normal write path.

Please make sure not to change the behavior of writethrough mode. In writethrough,
we discard data-only stripes.
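
To illustrate the distinction, here is a minimal user-space sketch, not the
actual raid5-cache code. The enums and the function name are hypothetical, and
"write-back" is only my label for the caching mode this series adds:

```c
/* Hypothetical stand-ins; these are not identifiers from raid5-cache.c. */
enum journal_mode { MODE_WRITE_THROUGH, MODE_WRITE_BACK };
enum stripe_kind { STRIPE_DATA_PARITY, STRIPE_DATA_ONLY };
enum recovery_action { ACTION_REPLAY, ACTION_DISCARD, ACTION_KEEP_IN_CACHE };

/* Decide what recovery does with one stripe found in the journal. */
static enum recovery_action recovery_action_for(enum journal_mode mode,
						enum stripe_kind kind)
{
	/* Data-Parity stripes are complete, so they are replayed in any mode. */
	if (kind == STRIPE_DATA_PARITY)
		return ACTION_REPLAY;

	/* Data-only stripes: writethrough must keep discarding them. */
	if (mode == MODE_WRITE_THROUGH)
		return ACTION_DISCARD;

	/* Only the new caching mode loads them into the stripe cache. */
	return ACTION_KEEP_IN_CACHE;
}
```

The point is that the data-only branch depends on the mode, so the existing
writethrough result (discard) stays exactly as it is today.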

Is it safe to run the state machine in the recovery stage? For example, the md
personality ->run is called before the bitmap is initialized.

> r5c_recovery_flush_log contains the main procedure of recovery. The
> recovery code first scans through the journal and loads data to
> stripe cache. The code keeps tracks of all these stripes in a list
> (use sh->lru and ctx->cached_list), stripes in the list are
> organized in the order of its first appearance on the journal.
> During the scan, the recovery code assesses each stripe as
> Data-Parity or Data-Only.
> 
> During scan, the array may run out of stripe cache. In these cases,
> the recovery code tries to release some stripe head by replaying
> existing Data-Parity stripes. Once these replays are done, these
> stripes can be released. When releasing Data-Parity stripes is not
> enough, the recovery code will also call raid5_set_cache_size to
> increase stripe cache size.
> 
> At the end of scan, the recovery code replays all Data-Parity
> stripes, and sets proper states for Data-Only stripes. The recovery
> code also increases seq number by 10 and rewrites all Data-Only
> stripes to journal. This is to avoid confusion after repeated
> crashes. More details is explained in raid5-cache.c before
> r5c_recovery_rewrite_data_only_stripes().
...
> +r5c_recovery_analyze_meta_block(struct r5l_log *log,
> +				struct r5l_recovery_ctx *ctx,
> +				struct list_head *cached_stripe_list)
> +{
> +	struct mddev *mddev = log->rdev->mddev;
> +	struct r5conf *conf = mddev->private;
>  	struct r5l_meta_block *mb;
> -	int offset;
> +	struct r5l_payload_data_parity *payload;
> +	int mb_offset;
>  	sector_t log_offset;
> -	sector_t stripe_sector;
> +	sector_t stripe_sect;
> +	struct stripe_head *sh;
> +	int ret;
> +
> +	/* for mismatch in data blocks, we will drop all data in this mb, but
> +	 * we will still read next mb for other data with FLUSH flag, as
> +	 * io_unit could finish out of order.
> +	 */
Please correct the comment format.
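
For reference, the preferred multi-line comment layout in the kernel coding
style puts the opening /* on a line of its own. A sketch with the comment text
from the hunk above kept as-is; the wrapper function is hypothetical and exists
only so the fragment compiles outside the kernel:

```c
#include <errno.h>

/* Hypothetical helper, only here so the comment has code to sit above. */
static int map_checksum_result(int ret)
{
	/*
	 * for mismatch in data blocks, we will drop all data in this mb, but
	 * we will still read next mb for other data with FLUSH flag, as
	 * io_unit could finish out of order.
	 */
	if (ret == -EINVAL)
		return -EAGAIN;

	/* Any other value (0 or another error) is passed through unchanged. */
	return ret;
}
```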

> +	ret = r5l_recovery_verify_data_checksum_for_mb(log, ctx);
> +	if (ret == -EINVAL)
> +		return -EAGAIN;
> +	else if (ret)
> +		return ret;
>  
>  	mb = page_address(ctx->meta_page);
> -	offset = sizeof(struct r5l_meta_block);
> +	mb_offset = sizeof(struct r5l_meta_block);
>  	log_offset = r5l_ring_add(log, ctx->pos, BLOCK_SECTORS);
>  
> -	while (offset < le32_to_cpu(mb->meta_size)) {
> +	while (mb_offset < le32_to_cpu(mb->meta_size)) {
>  		int dd;
>  
> -		payload = (void *)mb + offset;
> -		stripe_sector = raid5_compute_sector(conf,
> -						     le64_to_cpu(payload->location), 0, &dd, NULL);
> -		if (r5l_recovery_flush_one_stripe(log, ctx, stripe_sector,
> -						  &offset, &log_offset))
> +		payload = (void *)mb + mb_offset;
> +		stripe_sect = (payload->header.type == R5LOG_PAYLOAD_DATA) ?
> +			raid5_compute_sector(
> +				conf, le64_to_cpu(payload->location), 0, &dd,
> +				NULL)
> +			: le64_to_cpu(payload->location);
> +
> +		sh = r5c_recovery_lookup_stripe(cached_stripe_list,
> +						stripe_sect);
> +
> +		if (!sh) {
> +			sh = r5c_recovery_alloc_stripe(conf, cached_stripe_list,
> +						       stripe_sect, ctx->pos);
> +			/* cannot get stripe from raid5_get_active_stripe
> +			 * try replay some stripes
> +			 */
Ditto, please correct the comment format here as well.

Thanks,
Shaohua
