From: Shaohua Li <shli@kernel.org>
To: Song Liu <songliubraving@fb.com>
Cc: linux-raid <linux-raid@vger.kernel.org>, Shaohua Li <shli@fb.com>,
NeilBrown <neilb@suse.com>, Kernel Team <Kernel-team@fb.com>,
"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
"hch@infradead.org" <hch@infradead.org>
Subject: Re: [PATCH] md/r5cache: flush data in memory during journal device failure
Date: Wed, 15 Mar 2017 15:48:31 -0700
Message-ID: <20170315224831.plinspr2liew4mp7@kernel.org>
In-Reply-To: <5FFE3F62-D87A-46C5-B9D7-7A7501A32B90@fb.com>

On Tue, Mar 14, 2017 at 10:40:14PM +0000, Song Liu wrote:
>
> > On Mar 14, 2017, at 10:50 AM, Shaohua Li <shli@kernel.org> wrote:
> >
> > On Mon, Mar 13, 2017 at 04:36:26PM -0700, Song Liu wrote:
> >> For raid456 with writeback cache, when the journal device fails during
> >> normal operation, it is still possible to persist all data, as all
> >> pending data is still in the stripe cache. However, the stripe will be
> >> marked as failed with s.log_failed. Thus, the write-out from the stripe
> >> cache cannot make progress.
> >>
> >> To unblock the write out in journal failures, this patch allows stripes
> >> with data injournal to make progress.
> >
> > what about the parity part? if log failed, we should skip journaling the parity.
> >
> > Thanks,
> > Shaohua
> >
>
> For stripes with data in journal (not flushed yet), the state machine
> can flush them out. The behavior is just like when there is no journal
> at all.

Can you explain this more? I didn't find any place where we check the failure
bit and, based on it, skip journaling the parity. Please also include this
description in the changelog.

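To make the question concrete, here is a minimal user-space sketch of the kind of check being asked about. It is not kernel code and does not claim to match any existing code path in raid5.c or raid5-cache.c. Only the field names log_failed and injournal come from the patch; struct stripe_state, enum parity_path and choose_parity_path() are hypothetical names used for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-stripe state.
 * Only the field names mirror the patch; this is not a kernel struct. */
struct stripe_state {
	bool log_failed;	/* journal device has failed */
	int injournal;		/* data blocks of this stripe cached in the journal */
};

/* Hypothetical result type: where parity is written before write-out. */
enum parity_path {
	PARITY_VIA_JOURNAL,	/* normal write-back path: journal parity first */
	PARITY_DIRECT_TO_RAID,	/* journal unusable: write straight to the raid disks */
};

/* Assumed decision point: once the log has failed, journaling the parity
 * can no longer succeed, so the stripe has to bypass the journal. */
static enum parity_path choose_parity_path(const struct stripe_state *s)
{
	if (s->log_failed)
		return PARITY_DIRECT_TO_RAID;
	return PARITY_VIA_JOURNAL;
}
```

In this model a stripe with log_failed set and injournal > 0 takes PARITY_DIRECT_TO_RAID. The open question is which place in the real code makes the equivalent decision.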
> On the other hand, other writes will be gated by the log_failed flags,
> so the array appears to be read-only to upper layers.
>
> Thanks,
> Song
>
> >> The array should be read-only in journal failures. Therefore, pending
> >> writes (in dev->towrite) are excluded in this write (in delay_towrite).
> >>
> >> Signed-off-by: Song Liu <songliubraving@fb.com>
> >> ---
> >> drivers/md/raid5.c | 10 +++++++++-
> >> 1 file changed, 9 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >> index 3233975..447d9dd 100644
> >> --- a/drivers/md/raid5.c
> >> +++ b/drivers/md/raid5.c
> >> @@ -3069,6 +3069,10 @@ sector_t raid5_compute_blocknr(struct stripe_head *sh, int i, int previous)
> >> * When LOG_CRITICAL, stripes with injournal == 0 will be sent to
> >> * no_space_stripes list.
> >> *
> >> + * 3. during journal failure
> >> + * In journal failure, we try to flush all cached data to raid disks
> >> + * based on data in stripe cache. The array is read-only to upper
> >> + * layers, so we would skip all pending writes.
> >> */
> >> static inline bool delay_towrite(struct r5conf *conf,
> >> struct r5dev *dev,
> >> @@ -3082,6 +3086,9 @@ static inline bool delay_towrite(struct r5conf *conf,
> >> if (test_bit(R5C_LOG_CRITICAL, &conf->cache_state) &&
> >> s->injournal > 0)
> >> return true;
> >> + /* case 3 above */
> >> + if (s->log_failed && s->injournal)
> >> + return true;
> >> return false;
> >> }
> >>
> >> @@ -4721,7 +4728,8 @@ static void handle_stripe(struct stripe_head *sh)
> >> /* check if the array has lost more than max_degraded devices and,
> >> * if so, some requests might need to be failed.
> >> */
> >> - if (s.failed > conf->max_degraded || s.log_failed) {
> >> + if (s.failed > conf->max_degraded ||
> >> + (s.log_failed && s.injournal == 0)) {
> >> sh->check_state = 0;
> >> sh->reconstruct_state = 0;
> >> break_stripe_batch_list(sh, 0);
> >> --
> >> 2.9.3
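
The two conditions touched by the patch can be modeled outside the kernel to see which stripes are failed and which are allowed to flush. This is a minimal sketch under stated assumptions: struct stripe_model, stripe_must_fail() and delay_towrite_case3() are hypothetical names, and only the boolean expressions are taken from the diff above. The other cases of delay_towrite() are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical model of the few stripe_head_state fields the patch reads. */
struct stripe_model {
	int failed;		/* failed member disks seen by this stripe */
	bool log_failed;	/* journal device has failed */
	int injournal;		/* data blocks of this stripe still in the journal */
};

/* Mirrors the patched condition in handle_stripe(): a stripe is failed when
 * too many disks are lost, or when the log failed and the stripe holds no
 * journaled data that still needs flushing. */
static bool stripe_must_fail(const struct stripe_model *s, int max_degraded)
{
	return s->failed > max_degraded ||
	       (s->log_failed && s->injournal == 0);
}

/* Mirrors only "case 3" added to delay_towrite(): during journal failure,
 * pending writes on a stripe with journaled data are held back so that
 * only the cached data is flushed. */
static bool delay_towrite_case3(const struct stripe_model *s)
{
	return s->log_failed && s->injournal > 0;
}
```

With max_degraded = 1, a stripe with log_failed set and injournal = 2 is not failed and has its pending writes delayed, while a stripe with log_failed set and injournal = 0 is failed. That matches the intent stated in the changelog, but the model says nothing about how parity is handled for the flushing stripes.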