linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: linux-raid@vger.kernel.org
Cc: shli@fb.com, kernel-team@fb.com, dan.j.williams@intel.com,
	hch@infradead.org, liuzhengyuang521@gmail.com,
	liuzhengyuan@kylinos.cn, Song Liu <songliubraving@fb.com>
Subject: Re: [PATCH v6 03/11] md/r5cache: State machine for raid5-cache write back mode
Date: Wed, 16 Nov 2016 11:17:13 +1100	[thread overview]
Message-ID: <87bmxgi8hi.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20161110204623.3484694-4-songliubraving@fb.com>

[-- Attachment #1: Type: text/plain, Size: 3505 bytes --]

On Fri, Nov 11 2016, Song Liu wrote:

> This patch adds state machine for raid5-cache. With log device, the
> raid456 array could operate in two different modes (r5c_journal_mode):
>   - write-back (R5C_MODE_WRITE_BACK)
>   - write-through (R5C_MODE_WRITE_THROUGH)
>
> Existing code of raid5-cache only has write-through mode. For write-back
> cache, it is necessary to extend the state machine.
>
> With write-back cache, every stripe could operate in two different
> modes:
>   - caching
>   - writing-out
>
> In caching mode, the stripe handles writes as:
>   - write to journal
>   - return IO
>
> In writing-out mode, the stripe behaviors as a stripe in write through
> mode R5C_MODE_WRITE_THROUGH.
>
> STRIPE_R5C_WRITE_OUT is added to sh->state to differentiate caching and
> writing-out mode.
>
> When the array is write-through, stripes also go between caching mode
> and writing-out mode.

This bothers me.  Why would a stripe *ever* be in "caching mode" (or
"caching phase") when the array is in write-through?  It doesn't seem to
make sense.

A stripe_head goes through several states/stages/phases/things.

 1/ write data blocks to journal
 2/ reading missing data blocks and compute parity
 3/ write data and parity to journal
 4/ write data and parity to RAID

When there is no log, we only perform 2 and 4
When there is a log in WRITE_THOUGH we perform 2,3,4
When there is a log in WRITE_BACK we perform 1 (maybe several times) 2 3 4.

Step 2 is initiated by calling handle_stripe_dirtying().
A new function is introduced "r5c_handle_stripe_dirtying()" which
determines if handle_stripe_dirtying() should do anything.
(It returns '0' if it shouldn't proceed, and -EAGAIN if it should,
 which seems a little strange).

If there is no log, or if STRIPE_R5C_WRITE_OUT is set, -EAGAIN is
returned.
Otherwise if in MODE_WRITE_THROUGH, STRIPE_R5C_WRITE_OUT is set and
-EAGAIN is returned.
else (in next patch) are more complex determination is made, but
-EAGAIN is only ever returns with STRIPE_R5C_WRITE_OUT set, or if log == NULL.

So STRIPE_R5C_WRITE_OUT is (sometimes) set to enter step 2, and cleared
when step 4 completes.

STRIPE_R5C_WRITE_OUT means that the 2(3)4 writeout sequence has
commenced and not yet completed.  I would like to see it *always* set
when that happens, including when log==NULL.
I would probably even rename r5c_handle_stripe_dirtying() to something
like  r5c_check_need_writeout() which would check to see if writeout was
needed, and would set STRIPE_R5C_WRITE_OUT if it was.
Then handle_stripe_dirtying() would be called iff STRIPE_R5C_WRITE_OUT
were set. (it could even be named to handle_stripe_writeout()??)

There are two actions that can be taken when where are ->towrite blocks
on a stripe.  We can enter WRITE_OUT, or they can be cached in the
journal.  Also we can enter WRITE_OUT when a stripe needs to be removed
From memory or from the journal.
This makes "writeout" and "cache" seem more like "actions" than states,
modes, or phases.  Naming is hard.

Trying to understand R5_InJournal:
 It is set (on pd_idx) when step 3 completes.
 It is only used (in this patch) in r5c_finish_stripe_write_out()
   to make sure WRITE_OUT isn't cleared until after InJournal is
 cleared.

So setting InJournal[pd_idx] marks the end of step 3 and clearing WRITE_OUT
marks the end of step 4.

I think that observation helps me understand the code a bit better.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]

  parent reply	other threads:[~2016-11-16  0:17 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-10 20:46 [PATCH v6 00/11] raid5-cache: enabling cache features Song Liu
2016-11-10 20:46 ` [PATCH v6 01/11] md/r5cache: Check array size in r5l_init_log Song Liu
2016-11-10 20:46 ` [PATCH v6 02/11] md/r5cache: move some code to raid5.h Song Liu
2016-11-10 20:46 ` [PATCH v6 03/11] md/r5cache: State machine for raid5-cache write back mode Song Liu
2016-11-15  1:22   ` Shaohua Li
2016-11-15  1:36     ` Song Liu
2016-11-15  1:38       ` Shaohua Li
2016-11-16  0:17   ` NeilBrown [this message]
2016-11-16  5:18     ` Song Liu
2016-11-17  0:28       ` NeilBrown
2016-11-10 20:46 ` [PATCH v6 04/11] md/r5cache: caching mode of r5cache Song Liu
2016-11-15 17:03   ` Shaohua Li
2016-11-15 19:08     ` Song Liu
2016-11-15 21:49       ` Shaohua Li
2016-11-16 19:55         ` Song Liu
2016-11-17 17:25           ` Song Liu
2016-11-16  1:08   ` NeilBrown
2016-11-16  5:23     ` Song Liu
2016-11-10 20:46 ` [PATCH v6 05/11] md/r5cache: write-out mode and reclaim support Song Liu
2016-11-17  0:28   ` NeilBrown
2016-11-17  0:57     ` Song Liu
2016-11-10 20:46 ` [PATCH v6 06/11] md/r5cache: sysfs entry r5c_journal_mode Song Liu
2016-11-15 23:35   ` Shaohua Li
2016-11-17  0:29   ` NeilBrown
2016-11-10 20:46 ` [PATCH v6 07/11] md/r5cache: refactoring journal recovery code Song Liu
2016-11-10 20:46 ` [PATCH v6 08/11] md/r5cache: r5cache recovery: part 1 Song Liu
2016-11-16  0:33   ` Shaohua Li
2016-11-10 20:46 ` [PATCH v6 09/11] md/r5cache: r5cache recovery: part 2 Song Liu
2016-11-16  0:37   ` Shaohua Li
2016-11-10 20:46 ` [PATCH v6 10/11] md/r5cache: handle SYNC and FUA Song Liu
2016-11-10 20:46 ` [PATCH v6 11/11] md/r5cache: handle alloc_page failure Song Liu
2016-11-16  6:54   ` Shaohua Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bmxgi8hi.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=dan.j.williams@intel.com \
    --cc=hch@infradead.org \
    --cc=kernel-team@fb.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=liuzhengyuan@kylinos.cn \
    --cc=liuzhengyuang521@gmail.com \
    --cc=shli@fb.com \
    --cc=songliubraving@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).