Re: [PATCH v6 05/11] md/r5cache: write-out mode and reclaim support

From: NeilBrown <neilb@suse.com>
To: linux-raid@vger.kernel.org
Cc: shli@fb.com, kernel-team@fb.com, dan.j.williams@intel.com,
	hch@infradead.org, liuzhengyuang521@gmail.com,
	liuzhengyuan@kylinos.cn, Song Liu <songliubraving@fb.com>
Subject: Re: [PATCH v6 05/11] md/r5cache: write-out mode and reclaim support
Date: Thu, 17 Nov 2016 11:28:57 +1100
Message-ID: <87zikz7xva.fsf@notabene.neil.brown.name>
In-Reply-To: <20161110204623.3484694-6-songliubraving@fb.com>

On Fri, Nov 11 2016, Song Liu wrote:

> +/*
> + * evaluate log space usage and update R5C_LOG_TIGHT and R5C_LOG_CRITICAL
> + *
> + * R5C_LOG_TIGHT is set when free space on the log device is less than 3x of
> + * reclaim_required_space. R5C_LOG_CRITICAL is set when free space on the log
> + * device is less than 2x of reclaim_required_space.
> + */
> +static inline void r5c_update_log_state(struct r5l_log *log)
> +{
> +	struct r5conf *conf = log->rdev->mddev->private;
> +	sector_t free_space;
> +	sector_t reclaim_space;
> +
> +	if (!r5c_is_writeback(log))
> +		return;
> +
> +	free_space = r5l_ring_distance(log, log->log_start,
> +				       log->last_checkpoint);
> +	reclaim_space = r5c_log_required_to_flush_cache(conf);
> +	if (free_space < 2 * reclaim_space)
> +		set_bit(R5C_LOG_CRITICAL, &conf->cache_state);
> +	else
> +		clear_bit(R5C_LOG_CRITICAL, &conf->cache_state);
> +	if (free_space < 3 * reclaim_space)
> +		set_bit(R5C_LOG_TIGHT, &conf->cache_state);
> +	else
> +		clear_bit(R5C_LOG_TIGHT, &conf->cache_state);
> +}

This code, which you rewrote as I requested (thanks), behaves slightly
differently from the previous version.
Maybe that is intentional, but I thought I would mention it anyway.
The previous version would set TIGHT when free_space dropped below
3*reclaim_space, and would only clear it when free_space went above
4*reclaim_space.  This provided some hysteresis.
Now it is cleared as soon as free_space reaches 3*reclaim_space.

Maybe this is what you want, but as the hysteresis seemed like it might
be sensible, it is worth asking.
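
For illustration only, the hysteresis described above could look roughly
like the userspace sketch below.  The struct and function names are
hypothetical stand-ins (not the kernel's r5conf or cache_state bits), and
the 3x/4x thresholds are taken from the description above rather than
from the earlier patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the R5C_LOG_TIGHT / R5C_LOG_CRITICAL bits. */
struct log_state {
	bool tight;
	bool critical;
};

/*
 * Update both flags.  CRITICAL follows the 2x threshold directly.
 * TIGHT is set below 3x reclaim_space and cleared only above 4x;
 * inside the 3x..4x band the previous value is kept (hysteresis).
 */
static void update_log_state(struct log_state *st, uint64_t free_space,
			     uint64_t reclaim_space)
{
	st->critical = free_space < 2 * reclaim_space;

	if (free_space < 3 * reclaim_space)
		st->tight = true;
	else if (free_space > 4 * reclaim_space)
		st->tight = false;
	/* else: between 3x and 4x, leave st->tight unchanged */
}
```

With this shape, a log that hovers around 3*reclaim_space does not flip
TIGHT on every update; it stays set until free space has clearly
recovered.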

>  
> +/*
> + * calculate new last_checkpoint
> + * for write through mode, returns log->next_checkpoint
> + * for write back, returns log_start of first sh in stripe_in_cache_list
> + */
> +static sector_t r5c_calculate_new_cp(struct r5conf *conf)
> +{
> +	struct stripe_head *sh;
> +	struct r5l_log *log = conf->log;
> +	sector_t end = MaxSector;

The value assigned here is never used.

> +
> +	if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH)
> +		return log->next_checkpoint;
> +
> +	spin_lock(&log->stripe_in_cache_lock);
> +	if (list_empty(&conf->log->stripe_in_cache_list)) {
> +		/* all stripes flushed */
> +		spin_unlock(&log->stripe_in_cache_lock);
> +		return log->next_checkpoint;
> +	}
> +	sh = list_first_entry(&conf->log->stripe_in_cache_list,
> +			      struct stripe_head, r5c);
> +	end = sh->log_start;
> +	spin_unlock(&log->stripe_in_cache_lock);
> +	return end;

Given that we only ever assign "log_start" to the variable "end", it is
strange that it is called "end".
"new_cp" would make sense, or "log_start", but why "end"?
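
As a rough sketch of both suggestions (no unused initial value, and a name
that matches what is stored), here is a simplified userspace model.  The
struct model_log and its fields are hypothetical, an array stands in for
stripe_in_cache_list, and the locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical, simplified model of the log; not the kernel structures. */
struct model_log {
	int write_through;		/* nonzero: write-through mode */
	sector_t next_checkpoint;
	const sector_t *cached_log_start; /* log_start per cached stripe,
					   * oldest first */
	size_t nr_cached;
};

/*
 * Write-through mode, or nothing left in the cache: the new checkpoint
 * is next_checkpoint.  Otherwise it is the log_start of the first
 * cached stripe.
 */
static sector_t calculate_new_cp(const struct model_log *log)
{
	sector_t new_cp;	/* no initializer: assigned before use */

	if (log->write_through || log->nr_cached == 0)
		return log->next_checkpoint;

	new_cp = log->cached_log_start[0];
	return new_cp;
}
```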


> +}
> +
>  static sector_t r5l_reclaimable_space(struct r5l_log *log)
>  {
> +	struct r5conf *conf = log->rdev->mddev->private;
> +
>  	return r5l_ring_distance(log, log->last_checkpoint,
> -				 log->next_checkpoint);
> +				 r5c_calculate_new_cp(conf));
>  }
>  
>  static void r5l_run_no_mem_stripe(struct r5l_log *log)
> @@ -776,6 +966,7 @@ static bool r5l_complete_finished_ios(struct r5l_log *log)
>  static void __r5l_stripe_write_finished(struct r5l_io_unit *io)
>  {
>  	struct r5l_log *log = io->log;
> +	struct r5conf *conf = log->rdev->mddev->private;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&log->io_list_lock, flags);
> @@ -786,7 +977,8 @@ static void __r5l_stripe_write_finished(struct r5l_io_unit *io)
>  		return;
>  	}
>  
> -	if (r5l_reclaimable_space(log) > log->max_free_space)
> +	if (r5l_reclaimable_space(log) > log->max_free_space ||
> +	    test_bit(R5C_LOG_TIGHT, &conf->cache_state))
>  		r5l_wake_reclaim(log, 0);
>  
>  	spin_unlock_irqrestore(&log->io_list_lock, flags);
> @@ -907,14 +1099,140 @@ static void r5l_write_super_and_discard_space(struct r5l_log *log,
>  	}
>  }
>  
> +/*
> + * r5c_flush_stripe moves stripe from cached list to handle_list. When called,
> + * the stripe must be on r5c_cached_full_stripes or r5c_cached_partial_stripes.
> + *
> + * must hold conf->device_lock
> + */
> +static void r5c_flush_stripe(struct r5conf *conf, struct stripe_head *sh)
> +{
> +	BUG_ON(list_empty(&sh->lru));
> +	BUG_ON(test_bit(STRIPE_R5C_WRITE_OUT, &sh->state));
> +	BUG_ON(test_bit(STRIPE_HANDLE, &sh->state));
> +	assert_spin_locked(&conf->device_lock);
> +
> +	list_del_init(&sh->lru);
> +	atomic_inc(&sh->count);
> +
> +	set_bit(STRIPE_HANDLE, &sh->state);
> +	atomic_inc(&conf->active_stripes);
> +	r5c_make_stripe_write_out(sh);
> +
> +	if (!test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
> +		atomic_inc(&conf->preread_active_stripes);
> +	raid5_release_stripe(sh);

This looks wrong.  raid5_release_stripe() can try to take
conf->device_lock but this function is called with ->device_lock
held. This would cause a deadlock.

It presumably doesn't deadlock because you just incremented sh->count,
so raid5_release_stripe() will probably just decrement sh->count and
that count will remain > 0.
So why are you incrementing ->count for a few instructions and then
releasing the stripe?  Either that isn't necessary, or it could
deadlock.

I guess that if we are certain that STRIPE_ON_RELEASE_LIST is clear,
then it won't deadlock as it will do a lock-less add to
conf->released_stripes.
But if that is the case, it needs to be documented, and probably there
needs to be a WARN_ON(test_bit(STRIPE_ON_RELEASE_LIST.....));
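
To make the three cases above concrete, here is a small userspace model
using C11 atomics.  It only mirrors the reasoning in this mail: the
function names and the enum are hypothetical, no real lock or list is
involved, and other checks in the real raid5_release_stripe() are
omitted.  model_add_unless() imitates the semantics of the kernel's
atomic_add_unless().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Model of atomic_add_unless(v, a, u): add a to *v unless *v == u.
 * Returns true if the add happened.
 */
static bool model_add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* on failure, cur is refreshed with the current value */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

enum release_path {
	RELEASE_FAST_DEC,	/* count was > 1: just decrement */
	RELEASE_LOCKLESS_LIST,	/* last ref, flag clear: lock-less add */
	RELEASE_SLOW_LOCKED,	/* last ref, flag set: would take the lock */
};

/* Which path would a release take?  Nothing is actually locked here. */
static enum release_path model_release(atomic_int *count,
				       bool *on_release_list)
{
	if (model_add_unless(count, -1, 1))
		return RELEASE_FAST_DEC;
	if (!*on_release_list) {
		*on_release_list = true;
		return RELEASE_LOCKLESS_LIST;
	}
	return RELEASE_SLOW_LOCKED;
}
```

Only the last case is the one that would be a problem when the caller
already holds device_lock, which is why an explicit check on
STRIPE_ON_RELEASE_LIST in the caller would document the assumption.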


Thanks,
NeilBrown
