From: "J. Bruce Fields" <bfields@fieldses.org>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steve Dickson <SteveD@redhat.com>,
	Tom Talpey <tmtalpey@gmail.com>,
	Linux NFS Mailing list <linux-nfs@vger.kernel.org>
Subject: Re: Link performance over NFS degraded in RHEL5. -- was : Read/Write NFS I/O performance degraded by FLUSH_STABLE page flushing
Date: Mon, 15 Jun 2009 22:02:30 -0400
Message-ID: <20090616020230.GB20223@fieldses.org>
In-Reply-To: <1245112324.7470.7.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>

On Mon, Jun 15, 2009 at 05:32:04PM -0700, Trond Myklebust wrote:
> On Mon, 2009-06-15 at 19:08 -0400, J. Bruce Fields wrote:
> > On Fri, Jun 05, 2009 at 12:35:15PM -0400, Trond Myklebust wrote:
> > > On Fri, 2009-06-05 at 12:05 -0400, J. Bruce Fields wrote:
> > > > On Fri, Jun 05, 2009 at 09:57:19AM -0400, Steve Dickson wrote:
> > > > > 
> > > > > 
> > > > > Trond Myklebust wrote:
> > > > > > On Fri, 2009-06-05 at 09:30 -0400, Steve Dickson wrote:
> > > > > >> Tom Talpey wrote:
> > > > > >>> On 6/5/2009 7:35 AM, Steve Dickson wrote:
> > > > > >>>> Brian R Cowan wrote:
> > > > > > >>>>> Trond Myklebust <trond.myklebust@fys.uio.no> wrote on 06/04/2009 02:04:58 PM:
> > > > > >>>>>
> > > > > >>>>>> Did you try turning off write gathering on the server (i.e. add the
> > > > > >>>>>> 'no_wdelay' export option)? As I said earlier, that forces a delay of
> > > > > >>>>>> 10ms per RPC call, which might explain the FILE_SYNC slowness.
> > > > > >>>>> Just tried it, this seems to be a very useful workaround as well. The
> > > > > >>>>> FILE_SYNC write calls come back in about the same amount of time as the
> > > > > >>>>> write+commit pairs... Speeds up building regardless of the network
> > > > > >>>>> filesystem (ClearCase MVFS or straight NFS).
> > > > > > >>>> Does anybody have the history as to why 'wdelay' is an
> > > > > > >>>> export default?
> > > > > >>> Because "wdelay" is a complete crock?
> > > > > >>>
> > > > > >>> Adding 10ms to every write RPC only helps if there's a steady
> > > > > >>> single-file stream arriving at the server. In most other workloads
> > > > > >>> it only slows things down.
> > > > > >>>
> > > > > >>> The better solution is to continue tuning the clients to issue
> > > > > >>> writes in a more sequential and less all-or-nothing fashion.
> > > > > >>> There are plenty of other less crock-ful things to do in the
> > > > > >>> server, too.
> > > > > >> Ok... So do you think removing it as a default would cause
> > > > > >> any regressions?
> > > > > > 
> > > > > > It might for NFSv2 clients, since they don't have the option of using
> > > > > > unstable writes. I'd therefore prefer a kernel solution that makes write
> > > > > > gathering an NFSv2 only feature.
> > > > > Sounds good to me! ;-)
> > > > 
> > > > Patch welcomed.--b.
> > > 
> > > Something like this ought to suffice...
> > 
> > Thanks, applied.
> > 
> > I'd also like to apply a cleanup something like the following--there's
> > probably some cleaner way, but it just bothers me to have this
> > write-gathering special case take up the bulk of nfsd_vfs_write....
> > 
> > --b.
> > 
> > commit bfe7680d68afaf3f0b1195c8976db1fd1f03229d
> > Author: J. Bruce Fields <bfields@citi.umich.edu>
> > Date:   Mon Jun 15 16:03:53 2009 -0700
> > 
> >     nfsd: Pull write-gathering code out of nfsd_vfs_write
> >     
> >     This is a relatively self-contained piece of code that handles a special
> >     case--move it to its own function.
> >     
> >     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
> > 
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index a8aac7f..de68557 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -963,6 +963,44 @@ static void kill_suid(struct dentry *dentry)
> >  	mutex_unlock(&dentry->d_inode->i_mutex);
> >  }
> >  
> > +/*
> > + * Gathered writes: If another process is currently writing to the file,
> > + * there's a high chance this is another nfsd (triggered by a bulk write
> > + * from a client's biod). Rather than syncing the file with each write
> > + * request, we sleep for 10 msec.
> > + *
> > + * I don't know if this roughly approximates C. Juszak's idea of
> > + * gathered writes, but it's a nice and simple solution (IMHO), and it
> > + * seems to work:-)
> > + *
> > + * Note: we do this only in the NFSv2 case, since v3 and higher have a
> > + * better tool (separate unstable writes and commits) for solving this
> > + * problem.
> > + */
> > +static void wait_for_concurrent_writes(struct file *file, int use_wgather, int *host_err)
> > +{
> > +	struct inode *inode = file->f_path.dentry->d_inode;
> > +	static ino_t last_ino;
> > +	static dev_t last_dev;
> > +
> > +	if (!use_wgather)
> > +		goto out;
> > +	if (atomic_read(&inode->i_writecount) > 1
> > +	    || (last_ino == inode->i_ino && last_dev == inode->i_sb->s_dev)) {
> > +		dprintk("nfsd: write defer %d\n", task_pid_nr(current));
> > +		msleep(10);
> > +		dprintk("nfsd: write resume %d\n", task_pid_nr(current));
> > +	}
> > +
> > +	if (inode->i_state & I_DIRTY) {
> > +		dprintk("nfsd: write sync %d\n", task_pid_nr(current));
> > +		*host_err = nfsd_sync(file);
> > +	}
> > +out:
> > +	last_ino = inode->i_ino;
> > +	last_dev = inode->i_sb->s_dev;
> > +}
> 
> Shouldn't you also timestamp the last_ino/last_dev? Currently you can
> end up waiting even if the last time you referenced this file was 10
> minutes ago...

Maybe, but I don't know that avoiding the delay in the case where
use_wgather writes are coming rarely is particularly important.

(Note this is just a single static last_ino/last_dev, so the timestamp
would just tell us how long ago the last use_wgather write was.)
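
For illustration only, here is a user-space sketch of what timestamping
that single slot might look like. It is not part of the patch: the
struct, the should_defer() helper, and the 10 ms GATHER_WINDOW_MS
window are all hypothetical names and values chosen for the example,
and the caller is assumed to pass a monotonic millisecond clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of the single static last_ino/last_dev slot. */
struct last_write {
	uint64_t ino;     /* inode number of the last gathered write */
	uint64_t dev;     /* device of the last gathered write */
	uint64_t when_ms; /* when that write was seen, in milliseconds */
	bool valid;       /* false until the first write is recorded */
};

/* Assumed window; chosen here to match the msleep(10) in the patch. */
#define GATHER_WINDOW_MS 10

/*
 * Return true if a write to (ino, dev) at time now_ms should be delayed
 * because the previously recorded write hit the same file within the
 * window. The slot is always updated to describe the current write.
 * Assumes now_ms never goes backwards.
 */
static bool should_defer(struct last_write *last, uint64_t ino,
			 uint64_t dev, uint64_t now_ms)
{
	bool same_file = last->valid && last->ino == ino && last->dev == dev;
	bool recent = same_file &&
		      (now_ms - last->when_ms) <= GATHER_WINDOW_MS;

	last->ino = ino;
	last->dev = dev;
	last->when_ms = now_ms;
	last->valid = true;
	return recent;
}
```

With one slot, a write to any other file overwrites the record, so the
timestamp only ever describes the most recent gathered write, whichever
file it touched.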

I'm not as interested in making wdelay work better--someone who uses v2
and wants to benchmark it can do that--as I am interested in just
getting it out of the way so I don't have to look at it again....

--b.

> 
> > +
> >  static __be32
> >  nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
> >  				loff_t offset, struct kvec *vec, int vlen,
> > @@ -1025,41 +1063,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
> >  	if (host_err >= 0 && (inode->i_mode & (S_ISUID | S_ISGID)))
> >  		kill_suid(dentry);
> >  
> > -	if (host_err >= 0 && stable) {
> > -		static ino_t	last_ino;
> > -		static dev_t	last_dev;
> > -
> > -		/*
> > -		 * Gathered writes: If another process is currently
> > -		 * writing to the file, there's a high chance
> > -		 * this is another nfsd (triggered by a bulk write
> > -		 * from a client's biod). Rather than syncing the
> > -		 * file with each write request, we sleep for 10 msec.
> > -		 *
> > -		 * I don't know if this roughly approximates
> > -		 * C. Juszak's idea of gathered writes, but it's a
> > -		 * nice and simple solution (IMHO), and it seems to
> > -		 * work:-)
> > -		 */
> > -		if (use_wgather) {
> > -			if (atomic_read(&inode->i_writecount) > 1
> > -			    || (last_ino == inode->i_ino && last_dev == inode->i_sb->s_dev)) {
> > -				dprintk("nfsd: write defer %d\n", task_pid_nr(current));
> > -				msleep(10);
> > -				dprintk("nfsd: write resume %d\n", task_pid_nr(current));
> > -			}
> > -
> > -			if (inode->i_state & I_DIRTY) {
> > -				dprintk("nfsd: write sync %d\n", task_pid_nr(current));
> > -				host_err=nfsd_sync(file);
> > -			}
> > -#if 0
> > -			wake_up(&inode->i_wait);
> > -#endif
> > -		}
> > -		last_ino = inode->i_ino;
> > -		last_dev = inode->i_sb->s_dev;
> > -	}
> > +	if (host_err >= 0 && stable)
> > +		wait_for_concurrent_writes(file, use_wgather, &host_err);
> >  
> >  	dprintk("nfsd: write complete host_err=%d\n", host_err);
> >  	if (host_err >= 0) {