From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jan Kara Subject: Re: [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io() Date: Wed, 5 Oct 2011 19:39:08 +0200 Message-ID: <20111005173908.GF23467@quack.suse.cz> References: <1315442684-26754-1-git-send-email-jack@suse.cz> <1315442684-26754-2-git-send-email-jack@suse.cz> <20110908012236.GB12712@localhost> <20110908150340.GB28149@quack.suse.cz> <20110918140737.GA15366@localhost> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Jan Kara , "linux-fsdevel@vger.kernel.org" , Dave Chinner , Christoph Hellwig To: Wu Fengguang Return-path: Received: from cantor2.suse.de ([195.135.220.15]:35658 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755293Ab1JER7F (ORCPT ); Wed, 5 Oct 2011 13:59:05 -0400 Content-Disposition: inline In-Reply-To: <20110918140737.GA15366@localhost> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Sun 18-09-11 22:07:37, Wu Fengguang wrote: > On Thu, Sep 08, 2011 at 11:03:40PM +0800, Jan Kara wrote: > > On Thu 08-09-11 09:22:36, Wu Fengguang wrote: > > > Jan, > > > > > > > @@ -420,6 +421,8 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb, > > > > /* Don't write the inode if only I_DIRTY_PAGES was set */ > > > > if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) { > > > > int err = write_inode(inode, wbc); > > > > + if (!err) > > > > + inode_written = true; > > > > if (ret == 0) > > > > ret = err; > > > > } > > > > > > write_inode() typically return error after redirtying the inode. > > > So the conditions inode_written=false and (inode->i_state & I_DIRTY) > > > are mostly on/off together. For the cases they disagree, it's probably > > > a filesystem bug -- at least I don't think some FS will deliberately > > > return success while redirtying the inode, or the reverse. > > There is a possibility someone else redirties the inode between the moment > > I_DIRTY bits are cleared in writeback_single_inode() and the check for > > I_DIRTY is done after ->write_inode() is called. Especially when > > write_inode() blocks waiting for some IO this isn't that hard to happen. So > > there are valid (although relatively rare) cases when inode_written is > > different from the result of I_DIRTY check. > > Ah yes, that's good point. > > > > > } else if (inode->i_state & I_DIRTY) { > > > > /* > > > > * Filesystems can dirty the inode during writeback > > > > * operations, such as delayed allocation during > > > > * submission or metadata updates after data IO > > > > - * completion. > > > > + * completion. Also inode could have been dirtied by > > > > + * some process aggressively touching metadata. > > > > + * Finally, filesystem could just fail to write the > > > > + * inode for some reason. We have to distinguish the > > > > + * last case from the previous ones - in the last case > > > > + * we want to give the inode quick retry, in the > > > > + * other cases we want to put it back to the dirty list > > > > + * to avoid livelocking of writeback. > > > > */ > > > > - redirty_tail(inode, wb); > > > > + if (inode_written) > > > > + redirty_tail(inode, wb); > > > > > > Can you elaborate the livelock in the below inode_written=true case? > > > Why the sleep in the wb_writeback() loop is not enough? > > In case someone would be able to consistently trigger the race window and > > redirty the inode before we check here, we would loop for a long time > > always writing just this inode and thus effectivelly stalling other > > writeback. That's why I push redirtied inode behind other inodes in the > > dirty list. > > Agreed. All the left to do is to confirm whether this addresses > Christoph's original problem. > > Acked-by: Wu Fengguang Great, thanks for review! I'll resend the two patches to Christoph so that he can try them. Honza -- Jan Kara SUSE Labs, CR