From: Harshula <harshula@redhat.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: "Myklebust, Trond" <Trond.Myklebust@netapp.com>,
Derek McEachern <derekm@ti.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: NFS Mount Option 'nofsc'
Date: Thu, 09 Feb 2012 14:56:16 +1100
Message-ID: <1328759776.8981.75.camel@serendib>
In-Reply-To: <386479B9-C285-44C9-896B-A254091272FD@oracle.com>

Hi Chuck,

On Wed, 2012-02-08 at 10:40 -0500, Chuck Lever wrote:
> On Feb 8, 2012, at 2:43 AM, Harshula wrote:
> > Could you please expand on the subtleties involved that require an
> > application to be rewritten if forcedirectio mount option was available?
> >
> > A scenario where forcedirectio would be useful is when an application
> > reads nearly a TB of data from local disks, processes that data and then
> > dumps it to an NFS mount. All that happens while other processes are
> > reading/writing to the local disks. The application does not have an
> > O_DIRECT option nor is the source code available.
> >
> > With paged I/O the problem we see is that the NFS client system reaches
> > the dirty_bytes/dirty_ratio threshold and then blocks/forces all the
> > processes to flush dirty pages. This effectively 'locks' up the NFS
> > client system while the NFS dirty pages are pushed slowly over the wire
> > to the NFS server. Some of the processes that have nothing to do with
> > writing to the NFS mount are badly impacted. A forcedirectio mount
> > option would be very helpful in this scenario. Do you have any advice on
> > alleviating such problems on the NFS client by only using existing
> > tunables?
>
> Using direct I/O would be a work-around. The fundamental problem is
> the architecture of the VM system, and over time we have been making
> improvements there.
>
> Instead of a mount option, you can fix your application to use direct
> I/O. Or you can change it to provide the kernel with (better) hints
> about the disposition of the data it is generating (madvise and
> fadvise system calls). (On Linux we assume you have source code and
> can make such changes. I realize this is not true for proprietary
> applications).
>
> You could try using the "sync" mount option to cause the NFS client to
> push writes to the server immediately rather than delaying them. This
> would also slow down applications that aggressively dirty pages on
> the client.
>
> Meanwhile, you can dial down the dirty_ratio and especially the
> dirty_background_ratio settings to trigger earlier writeback. We've
> also found increasing min_free_kbytes has positive effects. The exact
> settings depend on how much memory your client has. Experimenting
> yourself is pretty harmless, so I won't give exact settings here.

Thanks for the reply. Unfortunately, not all vendors provide the source
code, so using O_DIRECT or fsync is not always an option.
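
For the cases where the source is available, the fsync approach could
look roughly like the sketch below. It is only an illustration: the
function name, the chunk size and the flush interval are assumptions
made for this example, not anything taken from a real application. The
idea is to call fsync() at regular intervals so that this writer's
dirty pages go to the server in bounded batches instead of piling up
until the global threshold is hit.

```python
import os


def copy_with_periodic_flush(src_path, dst_path,
                             chunk_size=1 << 20, flush_every=64 << 20):
    """Copy src_path to dst_path, calling fsync() on the destination
    after roughly every flush_every bytes written.

    Returns the total number of bytes copied.
    """
    total = 0
    since_flush = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            total += len(chunk)
            since_flush += len(chunk)
            if since_flush >= flush_every:
                # Push Python's buffer to the kernel, then ask the
                # kernel to write this file's dirty pages out.
                dst.flush()
                os.fsync(dst.fileno())
                since_flush = 0
        # Final flush so nothing is left dirty when we return.
        dst.flush()
        os.fsync(dst.fileno())
    return total
```

A suitable flush interval depends on the client's memory and the link
to the server, so it would need experimenting with.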

Lowering dirty_bytes/dirty_ratio and
dirty_background_bytes/dirty_background_ratio did help: it smoothed
out the data transfer over the wire by pushing data out to the NFS
server sooner. Without that change, the wire sat idle while the
processes dirtied >10GiB of pages; then, as soon as dirty_ratio was
reached, there was congestion and frantic flushing of dirty pages to
the NFS server. However, modifying the dirty_* tunables has a
system-wide impact, hence that change was not accepted.
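
To see which of these thresholds is actually in effect on a client, a
small sketch like the following can help. It assumes the usual
/proc/sys/vm layout; the helper names are made up for this example.
It relies on the kernel's documented behaviour that the *_bytes and
*_ratio variants are counterparts, with a non-zero *_bytes value being
the one in use.

```python
import os

DIRTY_TUNABLES = (
    "dirty_bytes",
    "dirty_ratio",
    "dirty_background_bytes",
    "dirty_background_ratio",
)


def read_dirty_tunables(vm_dir="/proc/sys/vm"):
    """Return {tunable name: int value}, or None for any tunable that
    cannot be read or parsed."""
    values = {}
    for name in DIRTY_TUNABLES:
        try:
            with open(os.path.join(vm_dir, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def describe_threshold(values, prefix):
    """Describe the limit in effect for prefix ("dirty" or
    "dirty_background"): the byte limit if non-zero, else the ratio."""
    byte_limit = values.get(prefix + "_bytes")
    ratio = values.get(prefix + "_ratio")
    if byte_limit:
        return f"{byte_limit} bytes"
    if ratio is not None:
        return f"{ratio}% of dirtyable memory"
    return "unknown"


if __name__ == "__main__":
    current = read_dirty_tunables()
    print("foreground limit:", describe_threshold(current, "dirty"))
    print("background limit:", describe_threshold(current, "dirty_background"))
```

This only reads the values; changing them still affects every
filesystem on the client, which is the objection described above.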

Depending on the NFS server, the "sync" option may hurt the server's
performance when it is serving many NFS clients, but it is still worth
a try.

The other hack that seems to work is periodically triggering an
nfs_getattr(), via ls -l, to force the dirty pages to be flushed to the
NFS server. Not exactly elegant ...
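
As a rough sketch of that hack without shelling out to ls, something
like the following could run alongside the application. The function
name, the interval and the injectable sleep are assumptions for this
example. Whether a stat() really makes the NFS client flush a file's
dirty pages depends on the kernel version and client behaviour, so
treat it as an experiment rather than a guarantee.

```python
import os
import time


def poke_files(paths, interval_s=5.0, rounds=1, sleep=time.sleep):
    """stat() every path once per round, sleeping interval_s seconds
    between rounds. Returns the number of successful stat() calls.

    On an NFS mount each stat() results in a getattr on the client,
    which is the same effect the periodic 'ls -l' relies on.
    """
    ok = 0
    for i in range(rounds):
        for path in paths:
            try:
                os.stat(path)
                ok += 1
            except OSError:
                # The file may not exist yet or may have been removed.
                pass
        if i < rounds - 1:
            sleep(interval_s)
    return ok
```

For example, poke_files(["/mnt/nfs/output.dat"], interval_s=10,
rounds=360) would poke a hypothetical output file every ten seconds for
about an hour.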
Thanks,
#