From mboxrd@z Thu Jan 1 00:00:00 1970 From: Junxiao Bi Subject: Re: [PATCH V2] writeback: fix race that cause writeback hung Date: Wed, 21 Aug 2013 09:32:03 +0800 Message-ID: <52141893.3070500@oracle.com> References: <1376710563-17145-1-git-send-email-junxiao.bi@oracle.com> <20130819080655.GA28780@quack.suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: linux-fsdevel@vger.kernel.org, fengguang.wu@intel.com, viro@ZenIV.linux.org.uk To: Jan Kara Return-path: Received: from userp1040.oracle.com ([156.151.31.81]:41195 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751448Ab3HUBe3 (ORCPT ); Tue, 20 Aug 2013 21:34:29 -0400 In-Reply-To: <20130819080655.GA28780@quack.suse.cz> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Cc: Al Viro On 08/19/2013 04:06 PM, Jan Kara wrote: > On Sat 17-08-13 11:36:03, Junxiao Bi wrote: >> There is a race between mark inode dirty and writeback thread, >> see the following scenario. In this case, writeback thread will >> not run though there is dirty_io. >> >> __mark_inode_dirty() bdi_writeback_workfn() >> ... ... >> spin_lock(&inode->i_lock); >> ... >> if (bdi_cap_writeback_dirty(bdi)) { >> <<< assume wb has dirty_io, so wakeup_bdi is false. >> <<< the following inode_dirty also have wakeup_bdi false. >> if (!wb_has_dirty_io(&bdi->wb)) >> wakeup_bdi = true; >> } >> spin_unlock(&inode->i_lock); >> <<< assume last dirty_io is removed here. >> pages_written = wb_do_writeback(wb); >> ... >> <<< work_list empty and wb has no dirty_io, >> <<< delayed_work will not be queued. >> if (!list_empty(&bdi->work_list) || >> (wb_has_dirty_io(wb) && dirty_writeback_interval)) >> queue_delayed_work(bdi_wq, &wb->dwork, >> msecs_to_jiffies(dirty_writeback_interval * 10)); >> spin_lock(&bdi->wb.list_lock); >> inode->dirtied_when = jiffies; >> <<< new dirty_io is added. >> list_move(&inode->i_wb_list, &bdi->wb.b_dirty); >> spin_unlock(&bdi->wb.list_lock); >> >> <<< though there is dirty_io, but wakeup_bdi is false, >> <<< so writeback thread will not be waked up and >> <<< the new dirty_io will not be flushed. >> if (wakeup_bdi) >> bdi_wakeup_thread_delayed(bdi); >> >> Writeback will run until there is a new flush work queued. >> This may cause a lot of dirty pages stay in memory for a long time. >> >> Cc: Jan Kara >> Cc: Fengguang Wu >> Signed-off-by: Junxiao Bi > Looks good. You can add: > Reviewed-by: Jan Kara > > Honza > >> --- >> fs/fs-writeback.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c >> index 68851ff..053ec91 100644 >> --- a/fs/fs-writeback.c >> +++ b/fs/fs-writeback.c >> @@ -1173,6 +1173,8 @@ void __mark_inode_dirty(struct inode *inode, int flags) >> bool wakeup_bdi = false; >> bdi = inode_to_bdi(inode); >> >> + spin_unlock(&inode->i_lock); >> + spin_lock(&bdi->wb.list_lock); >> if (bdi_cap_writeback_dirty(bdi)) { >> WARN(!test_bit(BDI_registered, &bdi->state), >> "bdi-%s not registered\n", bdi->name); >> @@ -1187,8 +1189,6 @@ void __mark_inode_dirty(struct inode *inode, int flags) >> wakeup_bdi = true; >> } >> >> - spin_unlock(&inode->i_lock); >> - spin_lock(&bdi->wb.list_lock); >> inode->dirtied_when = jiffies; >> list_move(&inode->i_wb_list, &bdi->wb.b_dirty); >> spin_unlock(&bdi->wb.list_lock); >> -- >> 1.7.9.5 >>