linux-nfs.vger.kernel.org archive mirror
From: Harshula <harshula@redhat.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: "Myklebust, Trond" <Trond.Myklebust@netapp.com>,
	Derek McEachern <derekm@ti.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: NFS Mount Option 'nofsc'
Date: Thu, 09 Feb 2012 14:56:16 +1100
Message-ID: <1328759776.8981.75.camel@serendib>
In-Reply-To: <386479B9-C285-44C9-896B-A254091272FD@oracle.com>

Hi Chuck,

On Wed, 2012-02-08 at 10:40 -0500, Chuck Lever wrote:
> On Feb 8, 2012, at 2:43 AM, Harshula wrote:

> > Could you please expand on the subtleties involved that require an
> > application to be rewritten if forcedirectio mount option was available?
> > 
> > A scenario where forcedirectio would be useful is when an application
> > reads nearly a TB of data from local disks, processes that data and then
> > dumps it to an NFS mount. All that happens while other processes are
> > reading/writing to the local disks. The application does not have an
> > O_DIRECT option nor is the source code available.
> > 
> > With paged I/O the problem we see is that the NFS client system reaches
> > dirty_bytes/dirty_ratio threshold and then blocks/forces all the
> > processes to flush dirty pages. This effectively 'locks' up the NFS
> > client system while the NFS dirty pages are pushed slowly over the wire
> > to the NFS server. Some of the processes that have nothing to do with
> > writing to the NFS mount are badly impacted. A forcedirectio mount
> > option would be very helpful in this scenario. Do you have any advice on
> > alleviating such problems on the NFS client by only using existing
> > tunables?
> 
> Using direct I/O would be a work-around.  The fundamental problem is
> the architecture of the VM system, and over time we have been making
> improvements there.
> 
> Instead of a mount option, you can fix your application to use direct
> I/O.  Or you can change it to provide the kernel with (better) hints
> about the disposition of the data it is generating (madvise and
> fadvise system calls).  (On Linux we assume you have source code and
> can make such changes.  I realize this is not true for proprietary
> applications).
> 
> You could try using the "sync" mount option to cause the NFS client to
> push writes to the server immediately rather than delaying them.  This
> would also slow down applications that aggressively dirty pages on
> the client.
> 
> Meanwhile, you can dial down the dirty_ratio and especially the
> dirty_background_ratio settings to trigger earlier writeback.  We've
> also found increasing min_free_kbytes has positive effects.  The exact
> settings depend on how much memory your client has.  Experimenting
> yourself is pretty harmless, so I won't give exact settings here.

Thanks for the reply. Unfortunately, not all vendors provide the source
code, so using O_DIRECT or fsync is not always an option. 
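
For cases where the source is available, the fsync/fadvise approach
mentioned above can be sketched roughly as follows. This is only an
illustration: the function name, chunk size and flush interval are
hypothetical, and whether POSIX_FADV_DONTNEED actually helps depends on
the kernel and the filesystem.

```python
import os

CHUNK = 1 << 20           # 1 MiB per write (illustrative value)
FLUSH_EVERY = 64 << 20    # flush after 64 MiB has been written (illustrative)

def bounded_dirty_copy(src_path, dst_path, chunk=CHUNK, flush_every=FLUSH_EVERY):
    """Copy src_path to dst_path, flushing periodically to bound dirty pages.

    Returns the number of bytes copied.
    """
    total = 0
    pending = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        fd = dst.fileno()
        while True:
            buf = src.read(chunk)
            if not buf:
                break
            dst.write(buf)
            total += len(buf)
            pending += len(buf)
            if pending >= flush_every:
                dst.flush()      # drain Python's userspace buffer
                os.fsync(fd)     # push dirty pages to the server now
                if hasattr(os, "posix_fadvise"):
                    # Hint that the cached pages will not be reused.
                    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
                pending = 0
        dst.flush()
        os.fsync(fd)             # final flush before close
    return total
```

The idea is simply that the writer never accumulates more than roughly
flush_every bytes of dirty data before waiting for writeback.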

Lowering dirty_bytes/dirty_ratio and
dirty_background_bytes/dirty_background_ratio did help: it smoothed out
the data transfer over the wire by pushing data to the NFS server
sooner. Without that change, the wire sat idle while the processes
dirtied >10GiB of pages, then became congested once dirty_ratio was
reached and the dirty pages were frantically flushed to the NFS server.
However, modifying the dirty_* tunables has a system-wide impact, so
that change was not accepted.
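
For reference, a minimal sketch of inspecting the tunables discussed
here. It assumes the usual /proc/sys/vm location; the function names
are hypothetical. The byte estimate is only an approximation, since
the kernel applies the ratio to available (free plus reclaimable)
memory rather than to total RAM, and a non-zero *_bytes value takes
precedence over the matching *_ratio.

```python
import os

VM_DIR = "/proc/sys/vm"
TUNABLES = (
    "dirty_bytes",
    "dirty_ratio",
    "dirty_background_bytes",
    "dirty_background_ratio",
)

def read_dirty_tunables(vm_dir=VM_DIR):
    """Return {name: int} for each dirty_* tunable readable under vm_dir."""
    values = {}
    for name in TUNABLES:
        try:
            with open(os.path.join(vm_dir, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # tunable missing or unreadable on this system
    return values

def effective_limit_bytes(values, mem_bytes, kind="dirty"):
    """Approximate the threshold in bytes for 'dirty' or 'dirty_background'.

    mem_bytes stands in for the memory the ratio is applied to.
    """
    nbytes = values.get(kind + "_bytes", 0)
    if nbytes:
        return nbytes  # an explicit byte limit overrides the ratio
    return mem_bytes * values.get(kind + "_ratio", 0) // 100
```

Something like effective_limit_bytes(read_dirty_tunables(), mem_bytes)
gives a rough idea of how much dirty data can pile up before writers
are throttled.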

The "sync" option, depending on the NFS server, may impact the NFS
server's performance when serving many NFS clients. But still worth a
try.

The other hack that seems to work is periodically triggering an
nfs_getattr(), via ls -l, to force the dirty pages to be flushed to the
NFS server. Not exactly elegant ...
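
A rough sketch of that hack as a small helper, for illustration only.
It calls stat(2) on a file instead of running ls -l; the assumption is
that this reaches nfs_getattr() in the same way. The function name,
path and interval are hypothetical, and the flushing behaviour may
differ between kernel versions and mount options.

```python
import os
import time

def poke_attrs(path, interval_s=5.0, rounds=None, sleep=time.sleep):
    """Call stat(2) on path every interval_s seconds.

    rounds=None loops forever; otherwise stop after that many attempts.
    Returns the number of successful stat calls.
    """
    ok = 0
    attempts = 0
    while rounds is None or attempts < rounds:
        try:
            os.stat(path)   # attribute lookup on the file being written
            ok += 1
        except OSError:
            pass            # file not there (yet); try again next round
        attempts += 1
        if rounds is None or attempts < rounds:
            sleep(interval_s)
    return ok
```

Run alongside the writer, for example as poke_attrs("/mnt/nfs/output.dat"),
where the path is a placeholder for the file being dumped to the NFS
mount.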

Thanks,
#

