From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wu Fengguang Subject: Re: [PATCH] improve the performance of large sequential write NFS workloads Date: Thu, 31 Dec 2009 13:04:41 +0800 Message-ID: <20091231050441.GB19627@localhost> References: <20091219122033.GA11360@localhost> <1261232747.1947.194.camel@serenity> <20091222015907.GA6223@localhost> <1261578107.2606.11.camel@localhost> <20091223180551.GD3159@quack.suse.cz> <1261595574.6775.2.camel@localhost> <20091224025228.GA12477@localhost> <1261656281.3596.1.camel@localhost> <20091225055617.GA8595@localhost> <1262190168.7332.6.camel@localhost> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Jan Kara , Steve Rago , Peter Zijlstra , "linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , "linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , "jens.axboe" , Peter Staubach , Arjan van de Ven , Ingo Molnar , "linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" To: Trond Myklebust Return-path: Content-Disposition: inline In-Reply-To: <1262190168.7332.6.camel@localhost> Sender: linux-nfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-Id: linux-fsdevel.vger.kernel.org Trond, On Thu, Dec 31, 2009 at 12:22:48AM +0800, Trond Myklebust wrote: > it ignores the commit request if the caller is just doing a > WB_SYNC_NONE background flush, waiting instead for the ensuing > WB_SYNC_ALL request... I'm afraid this will block balance_dirty_pages() until explicit sync/fsync calls: COMMITs are bad, however if we don't send them regularly, NR_UNSTABLE_NFS will grow large and block balance_dirty_pages() as well as throttle_vm_writeout().. > +int nfs_commit_unstable_pages(struct address_space *mapping, > + struct writeback_control *wbc) > +{ > + struct inode *inode = mapping->host; > + int flags = FLUSH_SYNC; > + int ret; > + ==> > + /* Don't commit if this is just a non-blocking flush */ ==> > + if (wbc->sync_mode != WB_SYNC_ALL) { ==> > + mark_inode_unstable_pages(inode); ==> > + return 0; ==> > + } > + if (wbc->nonblocking) > + flags = 0; > + ret = nfs_commit_inode(inode, flags); > + if (ret > 0) > + return 0; > + return ret; > +} The NFS protocol provides no painless way to reclaim unstable pages other than the COMMIT (or sync write).. This leaves us in a dilemma. We may reasonably reduce the number of COMMITs, and possibly even delay them for a while (and hope the server have writeback the pages before the COMMIT, somehow fragile). What we can obviously do is to avoid sending a COMMIT - if there are already an ongoing COMMIT for the same inode - or when there are ongoing WRITE for the inode (are there easy way to detect this?) What do you think? Thanks, Fengguang --- fs/nfs/inode.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) --- linux.orig/fs/nfs/inode.c 2009-12-25 09:25:38.000000000 +0800 +++ linux/fs/nfs/inode.c 2009-12-25 10:13:06.000000000 +0800 @@ -105,8 +105,11 @@ int nfs_write_inode(struct inode *inode, ret = filemap_fdatawait(inode->i_mapping); if (ret == 0) ret = nfs_commit_inode(inode, FLUSH_SYNC); - } else + } else if (!radix_tree_tagged(&NFS_I(inode)->nfs_page_tree, + NFS_PAGE_TAG_LOCKED)) ret = nfs_commit_inode(inode, 0); + else + ret = -EAGAIN; if (ret >= 0) return 0; __mark_inode_dirty(inode, I_DIRTY_DATASYNC); -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html