From mboxrd@z Thu Jan 1 00:00:00 1970 From: Shaohua Li Subject: Re: [PATCH v4 2/2] md/r5cache: gracefully handle journal device errors for writeback mode Date: Wed, 10 May 2017 15:51:03 -0700 Message-ID: <20170510225103.gc34fpaqmesvadgi@kernel.org> References: <20170509003925.3480693-1-songliubraving@fb.com> <20170509003925.3480693-2-songliubraving@fb.com> <20170510170043.4v4ijoxmfty6hndf@kernel.org> <72F7A8AB-A6BD-4C52-B8AF-11C5E71CFEF4@fb.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Content-Disposition: inline In-Reply-To: <72F7A8AB-A6BD-4C52-B8AF-11C5E71CFEF4@fb.com> Sender: linux-raid-owner@vger.kernel.org To: Song Liu Cc: "linux-raid@vger.kernel.org" , Shaohua Li , "neilb@suse.com" , Kernel Team , "dan.j.williams@intel.com" , "hch@infradead.org" , "jes.sorensen@gmail.com" List-Id: linux-raid.ids On Wed, May 10, 2017 at 09:45:05PM +0000, Song Liu wrote: > > > On May 10, 2017, at 10:01 AM, Shaohua Li wrote: > > > > On Mon, May 08, 2017 at 05:39:25PM -0700, Song Liu wrote: > >> For the raid456 with writeback cache, when journal device failed during > >> normal operation, it is still possible to persist all data, as all > >> pending data is still in stripe cache. However, it is necessary to handle > >> journal failure gracefully. > >> > >> During journal failures, this patch makes the follow changes to land data > >> in cache to raid disks gracefully: > >> > >> 1. In handle_stripe(), allow stripes with data in journal (s.injournal > 0) > >> to make progress; > >> 2. In delay_towrite(), only process data in the cache (skip dev->towrite); > >> 3. In __get_priority_stripe(), set try_loprio to true, so no stripe stuck > >> in loprio_list > > > > Applied the first patch. For this patch, I don't have a clear picture about > > what you are trying to do. Please describe the steps we are doing to do after > > journal failure. > > I will add more description to the next version. > > > > >> Signed-off-by: Song Liu > >> --- > >> drivers/md/raid5-cache.c | 13 ++++++++++--- > >> drivers/md/raid5-log.h | 3 ++- > >> drivers/md/raid5.c | 29 +++++++++++++++++++++++------ > >> 3 files changed, 35 insertions(+), 10 deletions(-) > >> > >> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c > >> index dc1dba6..e6032f6 100644 > >> --- a/drivers/md/raid5-cache.c > >> +++ b/drivers/md/raid5-cache.c > >> @@ -24,6 +24,7 @@ > >> #include "md.h" > >> #include "raid5.h" > >> #include "bitmap.h" > >> +#include "raid5-log.h" > >> > >> /* > >> * metadata/data stored in disk with 4k size unit (a block) regardless > >> @@ -679,6 +680,7 @@ static void r5c_disable_writeback_async(struct work_struct *work) > >> return; > >> pr_info("md/raid:%s: Disabling writeback cache for degraded array.\n", > >> mdname(mddev)); > >> + md_update_sb(mddev, 1); > > > > Why this? And md_update_sb must be called within mddev->reconfig_mutex locked. > > This is to avoid skipping in handle_stripe(): > > if (s.handle_bad_blocks || > test_bit(MD_SB_CHANGE_PENDING, &conf->mddev->sb_flags)) { > set_bit(STRIPE_HANDLE, &sh->state); > goto finish; > } > > I haven't got a better idea than calling md_update_sb() somewhere. It is also tricky > to lock mddev->reconfigured_mutex here, due to potential deadlocking with > mddev->open_mutex. > > Do you have suggestions on this? This sounds not necessary then. The mddev->thread will call md_update_sb and clear the MD_SB_CHANGE_PENDING bit. After that, the stripes will be handled. This is running in a workqueue, I'm wondering what kind of deadlock issue there is. > > >> mddev_suspend(mddev); > >> log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH; > >> mddev_resume(mddev); > >> @@ -1557,6 +1559,8 @@ void r5l_wake_reclaim(struct r5l_log *log, sector_t space) > >> void r5l_quiesce(struct r5l_log *log, int state) > >> { > >> struct mddev *mddev; > >> + struct r5conf *conf; > >> + > >> if (!log || state == 2) > >> return; > >> if (state == 0) > >> @@ -1564,10 +1568,12 @@ void r5l_quiesce(struct r5l_log *log, int state) > >> else if (state == 1) { > >> /* make sure r5l_write_super_and_discard_space exits */ > >> mddev = log->rdev->mddev; > >> + conf = mddev->private; > >> wake_up(&mddev->sb_wait); > >> kthread_park(log->reclaim_thread->tsk); > >> r5l_wake_reclaim(log, MaxSector); > >> - r5l_do_reclaim(log); > >> + if (!r5l_log_disk_error(conf)) > >> + r5l_do_reclaim(log); > > > > I think r5c_disable_writeback_async() will call into this, so we flush all > > stripe cache out to raid disks, why skip the reclaim? > > > > r5l_do_reclaim() reclaims log space with 2 steps: > 1. clear all io_unit lists (flushing_ios, etc.) by waking up mddev->thread. > 2. update log_tail in the journal device, and issue discard to journal device. > > When we are handling log failures in r5c_disable_writeback_async(), we are > flushing the cache, so it is not necessary to wake up mddev->thread. Also, > with log device error, it is not necessary update log_tail or issue discard. > > Therefore, r5l_do_reclaim is not necessary in log disk errors. Ok, there is no side effect, right? let's call it anyway. Thanks, Shaohua