* [PATCH] md/r5cache: flush data in memory during journal device failure
@ 2017-03-13 23:36 Song Liu
  2017-03-14 17:50 ` Shaohua Li
  0 siblings, 1 reply; 5+ messages in thread
From: Song Liu @ 2017-03-13 23:36 UTC
  To: linux-raid; +Cc: shli, neilb, kernel-team, dan.j.williams, hch, Song Liu

For the raid456 with writeback cache, when journal device failed during
normal operation, it is still possible to persist all data, as all
pending data is still in stripe cache. However, the stripe will be
marked as fail with s.log_failed. Thus, the write out from stripe cache
cannot make progress.

To unblock the write out in journal failures, this patch allows stripes
with data injournal to make progress.

The array should be read-only in journal failures. Therefore, pending
writes (in dev->towrite) are excluded in this write (in delay_towrite).

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 drivers/md/raid5.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 3233975..447d9dd 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3069,6 +3069,10 @@ sector_t raid5_compute_blocknr(struct stripe_head *sh, int i, int previous)
  *      When LOG_CRITICAL, stripes with injournal == 0 will be sent to
  *      no_space_stripes list.
  *
+ * 3. during journal failure
+ *      In journal failure, we try to flush all cached data to raid disks
+ *      based on data in stripe cache. The array is read-only to upper
+ *      layers, so we would skip all pending writes.
  */
 static inline bool delay_towrite(struct r5conf *conf,
 				 struct r5dev *dev,
@@ -3082,6 +3086,9 @@ static inline bool delay_towrite(struct r5conf *conf,
 	if (test_bit(R5C_LOG_CRITICAL, &conf->cache_state) &&
 	    s->injournal > 0)
 		return true;
+	/* case 3 above */
+	if (s->log_failed && s->injournal)
+		return true;
 	return false;
 }
 
@@ -4721,7 +4728,8 @@ static void handle_stripe(struct stripe_head *sh)
 	/* check if the array has lost more than max_degraded devices and,
 	 * if so, some requests might need to be failed.
 	 */
-	if (s.failed > conf->max_degraded || s.log_failed) {
+	if (s.failed > conf->max_degraded ||
+	    (s.log_failed && s.injournal == 0)) {
 		sh->check_state = 0;
 		sh->reconstruct_state = 0;
 		break_stripe_batch_list(sh, 0);
--
2.9.3

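The two conditions touched by the patch can be read as plain predicates. What follows is a minimal user-space sketch, not kernel code: `struct stripe_state`, `model_delay_towrite()` and `model_stripe_must_fail()` are illustrative names, the `log_critical` parameter stands in for the `R5C_LOG_CRITICAL` bit in `conf->cache_state`, and any cases of `delay_towrite()` that are not visible in the diff are left out.

```c
#include <stdbool.h>

/* Simplified stand-in (an assumption) for the stripe state fields the patch reads. */
struct stripe_state {
	int failed;      /* number of failed member disks */
	int injournal;   /* devices whose data is cached but not yet on the raid disks */
	bool log_failed; /* the journal device has failed */
};

/* Models only the cases of delay_towrite() that appear in the diff. */
static inline bool model_delay_towrite(bool log_critical,
				       const struct stripe_state *s)
{
	/* case 2: log space is critical and the stripe holds cached data */
	if (log_critical && s->injournal > 0)
		return true;
	/* case 3: journal failed; flush cached data, hold back pending writes */
	if (s->log_failed && s->injournal)
		return true;
	return false;
}

/* Models the patched failure check in handle_stripe(). */
static inline bool model_stripe_must_fail(const struct stripe_state *s,
					  int max_degraded)
{
	return s->failed > max_degraded ||
	       (s->log_failed && s->injournal == 0);
}
```

Under this model, once the journal has failed, a stripe with `injournal > 0` is no longer sent down the failure path and has its pending writes delayed, while a stripe with `injournal == 0` is still failed as before the patch.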
* Re: [PATCH] md/r5cache: flush data in memory during journal device failure
  2017-03-13 23:36 [PATCH] md/r5cache: flush data in memory during journal device failure Song Liu
@ 2017-03-14 17:50 ` Shaohua Li
  2017-03-14 22:40   ` Song Liu
  2017-03-15 23:45   ` Song Liu
  0 siblings, 2 replies; 5+ messages in thread
From: Shaohua Li @ 2017-03-14 17:50 UTC
  To: Song Liu; +Cc: linux-raid, shli, neilb, kernel-team, dan.j.williams, hch

On Mon, Mar 13, 2017 at 04:36:26PM -0700, Song Liu wrote:
> For the raid456 with writeback cache, when journal device failed during
> normal operation, it is still possible to persist all data, as all
> pending data is still in stripe cache. However, the stripe will be
> marked as fail with s.log_failed. Thus, the write out from stripe cache
> cannot make progress.
>
> To unblock the write out in journal failures, this patch allows stripes
> with data injournal to make progress.

what about the parity part? if log failed, we should skip journaling the parity.

Thanks,
Shaohua

> The array should be read-only in journal failures. Therefore, pending
> writes (in dev->towrite) are excluded in this write (in delay_towrite).
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  drivers/md/raid5.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 3233975..447d9dd 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3069,6 +3069,10 @@ sector_t raid5_compute_blocknr(struct stripe_head *sh, int i, int previous)
>   *      When LOG_CRITICAL, stripes with injournal == 0 will be sent to
>   *      no_space_stripes list.
>   *
> + * 3. during journal failure
> + *      In journal failure, we try to flush all cached data to raid disks
> + *      based on data in stripe cache. The array is read-only to upper
> + *      layers, so we would skip all pending writes.
>   */
>  static inline bool delay_towrite(struct r5conf *conf,
>  				 struct r5dev *dev,
> @@ -3082,6 +3086,9 @@ static inline bool delay_towrite(struct r5conf *conf,
>  	if (test_bit(R5C_LOG_CRITICAL, &conf->cache_state) &&
>  	    s->injournal > 0)
>  		return true;
> +	/* case 3 above */
> +	if (s->log_failed && s->injournal)
> +		return true;
>  	return false;
>  }
>
> @@ -4721,7 +4728,8 @@ static void handle_stripe(struct stripe_head *sh)
>  	/* check if the array has lost more than max_degraded devices and,
>  	 * if so, some requests might need to be failed.
>  	 */
> -	if (s.failed > conf->max_degraded || s.log_failed) {
> +	if (s.failed > conf->max_degraded ||
> +	    (s.log_failed && s.injournal == 0)) {
>  		sh->check_state = 0;
>  		sh->reconstruct_state = 0;
>  		break_stripe_batch_list(sh, 0);
> --
> 2.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

* Re: [PATCH] md/r5cache: flush data in memory during journal device failure
  2017-03-14 17:50 ` Shaohua Li
@ 2017-03-14 22:40   ` Song Liu
  2017-03-15 22:48     ` Shaohua Li
  2017-03-15 23:45   ` Song Liu
  1 sibling, 1 reply; 5+ messages in thread
From: Song Liu @ 2017-03-14 22:40 UTC
  To: Shaohua Li
  Cc: linux-raid, Shaohua Li, NeilBrown, Kernel Team, dan.j.williams@intel.com, hch@infradead.org

> On Mar 14, 2017, at 10:50 AM, Shaohua Li <shli@kernel.org> wrote:
>
> On Mon, Mar 13, 2017 at 04:36:26PM -0700, Song Liu wrote:
>> For the raid456 with writeback cache, when journal device failed during
>> normal operation, it is still possible to persist all data, as all
>> pending data is still in stripe cache. However, the stripe will be
>> marked as fail with s.log_failed. Thus, the write out from stripe cache
>> cannot make progress.
>>
>> To unblock the write out in journal failures, this patch allows stripes
>> with data injournal to make progress.
>
> what about the parity part? if log failed, we should skip journaling the parity.
>
> Thanks,
> Shaohua
>

For stripes with data in journal (not flushed yet), the state machine
can flush them out. The behavior is just like when there are no journal
at all.

On the other hand, other writes will be gated by the log_failed flags,
so the array appears to be read-only to upper layers.

Thanks,
Song

>> The array should be read-only in journal failures. Therefore, pending
>> writes (in dev->towrite) are excluded in this write (in delay_towrite).
>>
>> Signed-off-by: Song Liu <songliubraving@fb.com>
>> ---
>>  drivers/md/raid5.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 3233975..447d9dd 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -3069,6 +3069,10 @@ sector_t raid5_compute_blocknr(struct stripe_head *sh, int i, int previous)
>>   *      When LOG_CRITICAL, stripes with injournal == 0 will be sent to
>>   *      no_space_stripes list.
>>   *
>> + * 3. during journal failure
>> + *      In journal failure, we try to flush all cached data to raid disks
>> + *      based on data in stripe cache. The array is read-only to upper
>> + *      layers, so we would skip all pending writes.
>>   */
>>  static inline bool delay_towrite(struct r5conf *conf,
>>  				 struct r5dev *dev,
>> @@ -3082,6 +3086,9 @@ static inline bool delay_towrite(struct r5conf *conf,
>>  	if (test_bit(R5C_LOG_CRITICAL, &conf->cache_state) &&
>>  	    s->injournal > 0)
>>  		return true;
>> +	/* case 3 above */
>> +	if (s->log_failed && s->injournal)
>> +		return true;
>>  	return false;
>>  }
>>
>> @@ -4721,7 +4728,8 @@ static void handle_stripe(struct stripe_head *sh)
>>  	/* check if the array has lost more than max_degraded devices and,
>>  	 * if so, some requests might need to be failed.
>>  	 */
>> -	if (s.failed > conf->max_degraded || s.log_failed) {
>> +	if (s.failed > conf->max_degraded ||
>> +	    (s.log_failed && s.injournal == 0)) {
>>  		sh->check_state = 0;
>>  		sh->reconstruct_state = 0;
>>  		break_stripe_batch_list(sh, 0);
>> --
>> 2.9.3

* Re: [PATCH] md/r5cache: flush data in memory during journal device failure
  2017-03-14 22:40   ` Song Liu
@ 2017-03-15 22:48     ` Shaohua Li
  0 siblings, 0 replies; 5+ messages in thread
From: Shaohua Li @ 2017-03-15 22:48 UTC
  To: Song Liu
  Cc: linux-raid, Shaohua Li, NeilBrown, Kernel Team, dan.j.williams@intel.com, hch@infradead.org

On Tue, Mar 14, 2017 at 10:40:14PM +0000, Song Liu wrote:
>
> > On Mar 14, 2017, at 10:50 AM, Shaohua Li <shli@kernel.org> wrote:
> >
> > On Mon, Mar 13, 2017 at 04:36:26PM -0700, Song Liu wrote:
> >> For the raid456 with writeback cache, when journal device failed during
> >> normal operation, it is still possible to persist all data, as all
> >> pending data is still in stripe cache. However, the stripe will be
> >> marked as fail with s.log_failed. Thus, the write out from stripe cache
> >> cannot make progress.
> >>
> >> To unblock the write out in journal failures, this patch allows stripes
> >> with data injournal to make progress.
> >
> > what about the parity part? if log failed, we should skip journaling the parity.
> >
> > Thanks,
> > Shaohua
> >
>
> For stripes with data in journal (not flushed yet), the state machine
> can flush them out. The behavior is just like when there are no journal
> at all.

can you explain this more? I didn't find any place we check the failure bit and
so skip journaling the parity. Also include the description in the changelog.

> On the other hand, other writes will be gated by the log_failed flags,
> so the array appears to be read-only to upper layers.
>
> Thanks,
> Song
>
> >> The array should be read-only in journal failures. Therefore, pending
> >> writes (in dev->towrite) are excluded in this write (in delay_towrite).
> >>
> >> Signed-off-by: Song Liu <songliubraving@fb.com>
> >> ---
> >>  drivers/md/raid5.c | 10 +++++++++-
> >>  1 file changed, 9 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >> index 3233975..447d9dd 100644
> >> --- a/drivers/md/raid5.c
> >> +++ b/drivers/md/raid5.c
> >> @@ -3069,6 +3069,10 @@ sector_t raid5_compute_blocknr(struct stripe_head *sh, int i, int previous)
> >>   *      When LOG_CRITICAL, stripes with injournal == 0 will be sent to
> >>   *      no_space_stripes list.
> >>   *
> >> + * 3. during journal failure
> >> + *      In journal failure, we try to flush all cached data to raid disks
> >> + *      based on data in stripe cache. The array is read-only to upper
> >> + *      layers, so we would skip all pending writes.
> >>   */
> >>  static inline bool delay_towrite(struct r5conf *conf,
> >>  				 struct r5dev *dev,
> >> @@ -3082,6 +3086,9 @@ static inline bool delay_towrite(struct r5conf *conf,
> >>  	if (test_bit(R5C_LOG_CRITICAL, &conf->cache_state) &&
> >>  	    s->injournal > 0)
> >>  		return true;
> >> +	/* case 3 above */
> >> +	if (s->log_failed && s->injournal)
> >> +		return true;
> >>  	return false;
> >>  }
> >>
> >> @@ -4721,7 +4728,8 @@ static void handle_stripe(struct stripe_head *sh)
> >>  	/* check if the array has lost more than max_degraded devices and,
> >>  	 * if so, some requests might need to be failed.
> >>  	 */
> >> -	if (s.failed > conf->max_degraded || s.log_failed) {
> >> +	if (s.failed > conf->max_degraded ||
> >> +	    (s.log_failed && s.injournal == 0)) {
> >>  		sh->check_state = 0;
> >>  		sh->reconstruct_state = 0;
> >>  		break_stripe_batch_list(sh, 0);
> >> --
> >> 2.9.3

* Re: [PATCH] md/r5cache: flush data in memory during journal device failure
  2017-03-14 17:50 ` Shaohua Li
  2017-03-14 22:40   ` Song Liu
@ 2017-03-15 23:45   ` Song Liu
  1 sibling, 0 replies; 5+ messages in thread
From: Song Liu @ 2017-03-15 23:45 UTC
  To: Shaohua Li
  Cc: linux-raid, Shaohua Li, NeilBrown, Kernel Team, Dan Williams, hch@infradead.org

> On Mar 14, 2017, at 10:50 AM, Shaohua Li <shli@kernel.org> wrote:
>
> On Mon, Mar 13, 2017 at 04:36:26PM -0700, Song Liu wrote:
>> For the raid456 with writeback cache, when journal device failed during
>> normal operation, it is still possible to persist all data, as all
>> pending data is still in stripe cache. However, the stripe will be
>> marked as fail with s.log_failed. Thus, the write out from stripe cache
>> cannot make progress.
>>
>> To unblock the write out in journal failures, this patch allows stripes
>> with data injournal to make progress.
>
> what about the parity part? if log failed, we should skip journaling the parity.
>

Hmm.. I guess we need to check Faulty bit in some functions. Simply checking
log is not NULL is not enough. I will update the patch.

Thanks,
Song

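One way to read the last remark: a non-NULL log pointer only says that a journal was configured, not that its device is still working. The sketch below is a hypothetical user-space model of that distinction. `struct journal`, its `faulty` flag, `journal_present()` and `journal_usable()` are illustrative names only; they are not the kernel's structures or helpers, and the thread does not say which functions the updated patch would change.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a journal whose device can be marked failed. */
struct journal {
	bool faulty; /* set once the journal device has failed */
};

/* Pointer-only check: still true after the journal device has failed. */
static inline bool journal_present(const struct journal *log)
{
	return log != NULL;
}

/* Also consults the failure flag, so a failed journal is not written to. */
static inline bool journal_usable(const struct journal *log)
{
	return log != NULL && !log->faulty;
}
```

In this model a caller that decides whether to journal parity based on `journal_present()` would keep writing to a failed journal, whereas one based on `journal_usable()` would skip it, which is the behavior Shaohua asks about.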