From: Harshula <harshula@redhat.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: "Myklebust, Trond" <Trond.Myklebust@netapp.com>,
Derek McEachern <derekm@ti.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: NFS Mount Option 'nofsc'
Date: Thu, 09 Feb 2012 14:56:16 +1100
Message-ID: <1328759776.8981.75.camel@serendib>
In-Reply-To: <386479B9-C285-44C9-896B-A254091272FD@oracle.com>
Hi Chuck,
On Wed, 2012-02-08 at 10:40 -0500, Chuck Lever wrote:
> On Feb 8, 2012, at 2:43 AM, Harshula wrote:
> > Could you please expand on the subtleties involved that require an
> > application to be rewritten if forcedirectio mount option was available?
> >
> > A scenario where forcedirectio would be useful is when an application
> > reads nearly a TB of data from local disks, processes that data and then
> > dumps it to an NFS mount. All that happens while other processes are
> > reading/writing to the local disks. The application does not have an
> > O_DIRECT option nor is the source code available.
> >
> > With paged I/O the problem we see is that the NFS client system reaches
> > dirty_bytes/dirty_ratio threshold and then blocks/forces all the
> > processes to flush dirty pages. This effectively 'locks' up the NFS
> > client system while the NFS dirty pages are pushed slowly over the wire
> > to the NFS server. Some of the processes that have nothing to do with
> > writing to the NFS mount are badly impacted. A forcedirectio mount
> > option would be very helpful in this scenario. Do you have any advice on
> > alleviating such problems on the NFS client by only using existing
> > tunables?
>
> Using direct I/O would be a work-around. The fundamental problem is
> the architecture of the VM system, and over time we have been making
> improvements there.
>
> Instead of a mount option, you can fix your application to use direct
> I/O. Or you can change it to provide the kernel with (better) hints
> about the disposition of the data it is generating (madvise and
> fadvise system calls). (On Linux we assume you have source code and
> can make such changes. I realize this is not true for proprietary
> applications).
>
> You could try using the "sync" mount option to cause the NFS client to
> push writes to the server immediately rather than delaying them. This
> would also slow down applications that aggressively dirty pages on
> the client.
>
> Meanwhile, you can dial down the dirty_ratio and especially the
> dirty_background_ratio settings to trigger earlier writeback. We've
> also found increasing min_free_kbytes has positive effects. The exact
> settings depend on how much memory your client has. Experimenting
> yourself is pretty harmless, so I won't give exact settings here.
Thanks for the reply. Unfortunately, not all vendors provide the source
code, so using O_DIRECT or fsync is not always an option.
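For reference, where the source is available, the kind of change Chuck
describes might look roughly like the sketch below. It is only an
illustration in Python: the function name and the 64 MiB flush interval
are assumptions, and it uses fsync plus a posix_fadvise hint rather than
O_DIRECT. The idea is simply to keep the amount of dirty page cache
bounded instead of letting it grow until dirty_ratio is hit.

```python
import os


def write_with_periodic_flush(path, chunks, flush_every=64 * 1024 * 1024):
    """Write chunks to path, forcing writeback every flush_every bytes.

    Hypothetical helper: the name and default interval are illustrative.
    Returns the total number of bytes written.
    """
    total = 0
    pending = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            # os.write may write fewer bytes than requested, so loop.
            while len(view) > 0:
                written = os.write(fd, view)
                view = view[written:]
                total += written
                pending += written
            if pending >= flush_every:
                # Push the dirty pages to the server now ...
                os.fsync(fd)
                # ... and hint that the cached pages are no longer needed.
                if hasattr(os, "posix_fadvise"):
                    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
                pending = 0
        os.fsync(fd)
    finally:
        os.close(fd)
    return total
```

None of this helps with a closed-source binary, which is the case I am
stuck with.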
Lowering dirty_bytes/dirty_ratio and
dirty_background_bytes/dirty_background_ratio did help: it smoothed out
the data transfer over the wire by pushing data to the NFS server
sooner. Without that change, the wire sat idle while the processes
dirtied more than 10 GiB of pages, then became congested once
dirty_ratio was reached and the dirty pages were frantically flushed to
the NFS server. However, the dirty_* tunables have a system-wide impact,
so changing them was not accepted.
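To see which limits are actually in effect on a client before touching
anything, a small sketch like the following can help. It reads the four
dirty_* files under /proc/sys/vm and relies on the documented rule that
a non-zero *_bytes value takes precedence over the matching *_ratio. The
function name is made up for illustration, and dirtyable_bytes is an
assumption supplied by the caller: the kernel applies the ratios to its
own estimate of dirtyable memory, not to total RAM, so the result is
only an approximation.

```python
from pathlib import Path


def _read_int(vm_dir, name):
    """Read one integer tunable, e.g. /proc/sys/vm/dirty_ratio."""
    return int((Path(vm_dir) / name).read_text().strip())


def effective_dirty_limits(dirtyable_bytes, vm_dir="/proc/sys/vm"):
    """Return approximate background and throttle thresholds in bytes.

    Hypothetical helper. dirtyable_bytes is the caller's estimate of the
    memory the kernel considers dirtyable (an assumption, see above).
    """

    def limit(bytes_name, ratio_name):
        absolute = _read_int(vm_dir, bytes_name)
        if absolute > 0:
            # A non-zero *_bytes setting overrides the *_ratio setting.
            return absolute
        return dirtyable_bytes * _read_int(vm_dir, ratio_name) // 100

    return {
        # Above this, background writeback starts.
        "background": limit("dirty_background_bytes", "dirty_background_ratio"),
        # Above this, writing processes are throttled.
        "throttle": limit("dirty_bytes", "dirty_ratio"),
    }
```

For example, effective_dirty_limits(48 * 2**30) gives a rough picture
for a client with about 48 GiB of dirtyable memory.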
Depending on the NFS server, the "sync" option may hurt the server's
performance when it is serving many NFS clients, but it is still worth
a try.
The other hack that seems to work is periodically triggering an
nfs_getattr(), via ls -l, to force the dirty pages to be flushed to the
NFS server. Not exactly elegant ...
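A rough sketch of that hack, in Python rather than a shell loop around
ls -l: it calls lstat() on every entry of a directory at a fixed
interval, which is what ls -l does per file. The function names, the
interval and the number of rounds are all illustrative assumptions, and
whether a getattr actually flushes dirty pages depends on the NFS client
behaviour of the running kernel.

```python
import os
import time


def stat_directory_entries(directory):
    """lstat() every entry in directory; return how many were stat'ed."""
    count = 0
    for name in os.listdir(directory):
        try:
            os.lstat(os.path.join(directory, name))
            count += 1
        except FileNotFoundError:
            # The entry vanished between listdir() and lstat(); ignore it.
            pass
    return count


def poke_periodically(directory, interval_seconds, rounds, sleep=time.sleep):
    """Stat the directory's entries `rounds` times, sleeping in between.

    Hypothetical helper; `sleep` is injectable so it can be tested
    without waiting. Returns the total number of lstat() calls made.
    """
    total = 0
    for i in range(rounds):
        total += stat_directory_entries(directory)
        if i + 1 < rounds:
            sleep(interval_seconds)
    return total
```

Something like poke_periodically("/mnt/nfs/output", 5, 720) would poke
the files every 5 seconds for about an hour; the path is a placeholder.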
Thanks,
#