linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>,
	Steve Rago <sar@nec-labs.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"jens.axboe" <jens.axboe@oracle.com>,
	Peter Staubach <staubach@redhat.com>, Jan Kara <jack@suse.cz>,
	Arjan van de Ven <arjan@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] improve the performance of large sequential write NFS workloads
Date: Wed, 23 Dec 2009 19:05:51 +0100	[thread overview]
Message-ID: <20091223180551.GD3159@quack.suse.cz> (raw)
In-Reply-To: <1261578107.2606.11.camel@localhost>

On Wed 23-12-09 15:21:47, Trond Myklebust wrote:
> On Tue, 2009-12-22 at 09:59 +0800, Wu Fengguang wrote: 
> > 1) normal fs sets I_DIRTY_DATASYNC when extending i_size, however NFS
> >    will set the flag for any pages written -- why this trick? To
> >    guarantee the call of nfs_commit_inode()? Which unfortunately turns
> >    almost every server side NFS write into sync writes..
> > 
> >  writeback_single_inode:
> >     do_writepages
> >       nfs_writepages
> >         nfs_writepage ----[short time later]---> nfs_writeback_release*
> >                                                    nfs_mark_request_commit
> >                                                      __mark_inode_dirty(I_DIRTY_DATASYNC);
> >                                     
> >     if (I_DIRTY_SYNC || I_DIRTY_DATASYNC)  <---- so this will be true for most time
> >       write_inode
> >         nfs_write_inode
> >           nfs_commit_inode
> 
> 
> I have been working on a fix for this. We basically do want to ensure
> that NFS calls commit (otherwise we're not finished cleaning the dirty
> pages), but we want to do it _after_ we've waited for all the writes to
> complete. See below...
> 
> Trond
> 
> ------------------------------------------------------------------------------------------------------ 
> VFS: Add a new inode state: I_UNSTABLE_PAGES
> 
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> Add a new inode state to enable the vfs to commit the nfs unstable pages to
> stable storage once the write back of dirty pages is done.
  Hmm, does your patch really help?

> @@ -474,6 +482,18 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>  	}
>  
>  	spin_lock(&inode_lock);
> +	/*
> +	 * Special state for cleaning NFS unstable pages
> +	 */
> +	if (inode->i_state & I_UNSTABLE_PAGES) {
> +		int err;
> +		inode->i_state &= ~I_UNSTABLE_PAGES;
> +		spin_unlock(&inode_lock);
> +		err = commit_unstable_pages(inode, wait);
> +		if (ret == 0)
> +			ret = err;
> +		spin_lock(&inode_lock);
> +	}
  I don't quite understand this chunk: We've called writeback_single_inode
because it had some dirty pages. Thus it has I_DIRTY_DATASYNC set and a few
lines above your chunk, we've called nfs_write_inode which sent commit to
the server. Now here you sometimes send the commit again? What's the
purpose?

> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index faa0918..4f129b3 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -99,17 +99,14 @@ u64 nfs_compat_user_ino64(u64 fileid)
>  
>  int nfs_write_inode(struct inode *inode, int sync)
>  {
> +	int flags = 0;
>  	int ret;
>  
> -	if (sync) {
> -		ret = filemap_fdatawait(inode->i_mapping);
> -		if (ret == 0)
> -			ret = nfs_commit_inode(inode, FLUSH_SYNC);
> -	} else
> -		ret = nfs_commit_inode(inode, 0);
> -	if (ret >= 0)
> +	if (sync)
> +		flags = FLUSH_SYNC;
> +	ret = nfs_commit_inode(inode, flags);
> +	if (ret > 0)
>  		return 0;
> -	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
>  	return ret;
>  }
>  

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

  reply	other threads:[~2009-12-23 18:05 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1261015420.1947.54.camel@serenity>
     [not found] ` <1261037877.27920.36.camel@laptop>
     [not found]   ` <20091219122033.GA11360@localhost>
     [not found]     ` <1261232747.1947.194.camel@serenity>
2009-12-22  1:59       ` [PATCH] improve the performance of large sequential write NFS workloads Wu Fengguang
2009-12-22 12:35         ` Jan Kara
     [not found]           ` <20091222123538.GB604-jyMamyUUXNJG4ohzP4jBZS1Fcj925eT/@public.gmane.org>
2009-12-23  8:43             ` Christoph Hellwig
     [not found]               ` <20091223084302.GA14912-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2009-12-23 13:32                 ` Jan Kara
     [not found]                   ` <20091223133244.GB3159-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2009-12-24  5:25                     ` Wu Fengguang
2009-12-24  1:26           ` Wu Fengguang
2009-12-22 16:41         ` Steve Rago
2009-12-24  1:21           ` Wu Fengguang
2009-12-24 14:49             ` Steve Rago
2009-12-25  7:37               ` Wu Fengguang
2009-12-23 14:21         ` Trond Myklebust
2009-12-23 18:05           ` Jan Kara [this message]
     [not found]             ` <20091223180551.GD3159-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2009-12-23 19:12               ` Trond Myklebust
2009-12-24  2:52                 ` Wu Fengguang
2009-12-24 12:04                   ` Trond Myklebust
2009-12-25  5:56                     ` Wu Fengguang
2009-12-30 16:22                       ` Trond Myklebust
2009-12-31  5:04                         ` Wu Fengguang
2009-12-31 19:13                           ` Trond Myklebust
2010-01-06  3:03                             ` Wu Fengguang
2010-01-06 16:56                               ` Trond Myklebust
2010-01-06 18:26                                 ` Trond Myklebust
2010-01-06 18:37                                   ` Peter Zijlstra
2010-01-06 18:52                                     ` Trond Myklebust
2010-01-06 19:07                                       ` Peter Zijlstra
2010-01-06 19:21                                         ` Trond Myklebust
2010-01-06 19:53                                           ` Trond Myklebust
2010-01-06 20:09                                             ` Jan Kara
     [not found]                                               ` <20100106200928.GB22781-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2010-01-06 20:51                                                 ` [PATCH 0/6] " Trond Myklebust
     [not found]                                                   ` <20100106205110.22547.85345.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-06 20:51                                                     ` [PATCH 1/6] VFS: Ensure that writeback_single_inode() commits unstable writes Trond Myklebust
     [not found]                                                       ` <20100106205110.22547.17971.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-06 21:38                                                         ` Jan Kara
     [not found]                                                           ` <20100106213843.GD22781-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2010-01-06 21:48                                                             ` Trond Myklebust
2010-01-07  2:18                                                         ` Wu Fengguang
     [not found]                                                           ` <1262839082.2185.15.camel@localhost>
2010-01-07  4:48                                                             ` Wu Fengguang
2010-01-07  4:53                                                               ` [PATCH 0/5] Re: [PATCH] improve the performance of large sequential write NFS workloads Trond Myklebust
     [not found]                                                                 ` <20100107045330.5986.55090.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  4:53                                                                   ` [PATCH 5/5] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
2010-01-07  4:53                                                                   ` [PATCH 2/5] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
2010-01-07  4:53                                                                   ` [PATCH 4/5] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
2010-01-07  4:53                                                                   ` [PATCH 1/5] VFS: Ensure that writeback_single_inode() commits unstable writes Trond Myklebust
2010-01-07  4:53                                                                   ` [PATCH 3/5] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
2010-01-07 14:56                                                             ` [PATCH 1/6] VFS: Ensure that writeback_single_inode() commits unstable writes Wu Fengguang
2010-01-07 15:10                                                               ` Trond Myklebust
2010-01-08  1:17                                                                 ` Wu Fengguang
2010-01-08  1:37                                                                   ` Trond Myklebust
2010-01-08  1:53                                                                     ` Wu Fengguang
2010-01-08  9:25                                                                 ` Christoph Hellwig
2010-01-08 13:46                                                                   ` Trond Myklebust
2010-01-08 13:54                                                                     ` Christoph Hellwig
2010-01-08 14:15                                                                       ` Trond Myklebust
2010-01-06 20:51                                                     ` [PATCH 4/6] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
2010-01-07  1:56                                                       ` Wu Fengguang
2010-01-06 20:51                                                     ` [PATCH 6/6] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
     [not found]                                                       ` <20100106205110.22547.31434.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  2:32                                                         ` Wu Fengguang
2010-01-06 20:51                                                     ` [PATCH 2/6] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
2010-01-07  2:29                                                       ` Wu Fengguang
2010-01-07  4:49                                                         ` Trond Myklebust
2010-01-07  5:03                                                           ` Wu Fengguang
2010-01-07  5:30                                                             ` Trond Myklebust
2010-01-07 14:37                                                               ` Wu Fengguang
2010-01-06 20:51                                                     ` [PATCH 5/6] VM: Use per-bdi unstable accounting to improve use of wbc->force_commit Trond Myklebust
     [not found]                                                       ` <20100106205110.22547.32584.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  2:34                                                         ` Wu Fengguang
2010-01-06 20:51                                                     ` [PATCH 3/6] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
     [not found]                                                       ` <20100106205110.22547.93554.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  1:48                                                         ` Wu Fengguang
2010-01-06 21:44                                                     ` [PATCH 0/6] Re: [PATCH] improve the performance of large sequential write NFS workloads Jan Kara
2010-01-06 22:03                                                       ` Trond Myklebust
2010-01-07  8:16                                                   ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091223180551.GD3159@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=Trond.Myklebust@netapp.com \
    --cc=arjan@infradead.org \
    --cc=fengguang.wu@intel.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=sar@nec-labs.com \
    --cc=staubach@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).