From mboxrd@z Thu Jan 1 00:00:00 1970 From: Nick Piggin Subject: Re: [patch 6/7] fs: fsync optimisations Date: Tue, 30 Nov 2010 11:11:03 +1100 Message-ID: <20101130001103.GE3255@amd> References: <20101123140610.292941494@kernel.dk> <20101123140708.058372884@kernel.dk> <20101129150325.GD26076@infradead.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: npiggin@kernel.dk, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org To: Christoph Hellwig Return-path: Received: from ipmail05.adl6.internode.on.net ([150.101.137.143]:26996 "EHLO ipmail05.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755804Ab0K3ALI (ORCPT ); Mon, 29 Nov 2010 19:11:08 -0500 Content-Disposition: inline In-Reply-To: <20101129150325.GD26076@infradead.org> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Mon, Nov 29, 2010 at 10:03:25AM -0500, Christoph Hellwig wrote: > On Wed, Nov 24, 2010 at 01:06:16AM +1100, npiggin@kernel.dk wrote: > > Optimise fsync by adding a datasync parameter to sync_inode_metadata to > > DTRT with writing back the inode (->write_inode in theory should have a > > datasync parameter too perhaps, but that's for another time). > > > > Also, implement the metadata sync optimally rather than reusing the > > normal data writeback path. This means less useless moving the inode around the > > writeback lists, and less dropping and retaking of inode_lock, and avoiding > > the data writeback call with nr_pages == 0. > > This patch looks good to me, but a few minor comments below: (BTW. it actually had a bug with writeback_end not being called in some cases, in case you test it) > > @@ -1315,13 +1315,49 @@ EXPORT_SYMBOL(sync_inode); > > * > > * Note: only writes the actual inode, no associated data or other metadata. > > */ > > -int sync_inode_metadata(struct inode *inode, int wait) > > +int sync_inode_metadata(struct inode *inode, int datasync, int wait) > > { > > + struct address_space *mapping = inode->i_mapping; > > struct writeback_control wbc = { > > .sync_mode = wait ? WB_SYNC_ALL : WB_SYNC_NONE, > > .nr_to_write = 0, /* metadata-only */ > > }; > > + unsigned dirty, mask; > > + int ret = 0; > > > > - return sync_inode(inode, &wbc); > > + /* > > + * This is a similar implementation to writeback_single_inode. > > + * Keep them in sync. > > + */ > > I'd move this comment to above the function. OK > > > + /* > > + * Generic write_inode doesn't distinguish between sync and datasync, > > + * so even a datasync can clear the sync state. Filesystems which > > + * distiguish these cases must only clear 'mask' in their metadata > > + * sync code. > > + */ > > I don't understand this comment. Currenly filesystems never clear > i_state bits by themselves. No, but with the new inode writeback routines they could. Basically they might be expected to copy this function, and put their own improvements in.