From: Harshula <harshula@redhat.com>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>,
Derek McEachern <derekm@ti.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: NFS Mount Option 'nofsc'
Date: Fri, 10 Feb 2012 19:07:24 +1100
Message-ID: <1328861244.8981.139.camel@serendib>
In-Reply-To: <1328801489.13180.41.camel@lade.trondhjem.org>

Hi Trond,

On Thu, 2012-02-09 at 15:31 +0000, Myklebust, Trond wrote:
> On Thu, 2012-02-09 at 16:51 +1100, Harshula wrote:
> > Hi Trond,
> >
> > Thanks for the reply. Could you please elaborate on the subtleties
> > involved that require an application to be rewritten if forcedirectio
> > mount option was available?
>
> Firstly, we don't support O_DIRECT+O_APPEND (since the NFS protocol
> itself doesn't support atomic appends), so that would break a bunch of
> applications.
>
> Secondly, uncached I/O means that read() and write() requests need to be
> serialised by the application itself, since there are no atomicity or
> ordering guarantees at the VFS, NFS or RPC call level. Normally, the
> page cache services read() requests if there are outstanding writes, and
> so provides the atomicity guarantees that POSIX requires.
> IOW: if a write() occurs while you are reading, the application may end
> up retrieving part of the old data, and part of the new data instead of
> either one or the other.
>
> IOW: your application still needs to be aware of the fact that it is
> using O_DIRECT, and you are better off adding explicit support for it
> rather than hacky kludges such as a forcedirectio option.

Thanks. Would it be accurate to say that if any given file on the NFS
mount saw either streaming writes or streaming reads, but never both,
the application would not need to be rewritten?

Do you see forcedirectio as a sharp object that someone could stab
themselves with?

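As an illustration of the serialisation point quoted above, here is a minimal user-space sketch of what "explicit support" might look like. It is not taken from the thread. The names `BLK`, `alloc_block()`, `locked_block_write()` and `locked_block_read()` are hypothetical, the 4096-byte alignment is an assumption (the real O_DIRECT requirement depends on the kernel and filesystem), and flock() is just one possible way for cooperating processes to order their reads and writes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment and transfer size for uncached I/O (hypothetical). */
#define BLK 4096

/* Allocate one zeroed, BLK-aligned bounce buffer; NULL on failure. */
static void *alloc_block(void)
{
	void *buf = NULL;

	if (posix_memalign(&buf, BLK, BLK) != 0)
		return NULL;
	assert((uintptr_t)buf % BLK == 0);
	memset(buf, 0, BLK);
	return buf;
}

/* Write up to BLK bytes as one whole block while holding an exclusive lock. */
int locked_block_write(int fd, off_t blkno, const void *src, size_t len)
{
	void *buf;
	ssize_t n = -1;

	if (blkno < 0 || len > BLK) {
		errno = EINVAL;
		return -1;
	}
	buf = alloc_block();
	if (!buf)
		return -1;
	memcpy(buf, src, len);
	if (flock(fd, LOCK_EX) == 0) {
		/* Explicit offset: no reliance on O_APPEND semantics. */
		n = pwrite(fd, buf, BLK, blkno * BLK);
		flock(fd, LOCK_UN);
	}
	free(buf);
	return n == BLK ? 0 : -1;
}

/* Read one whole block under a shared lock and copy out the first len bytes. */
int locked_block_read(int fd, off_t blkno, void *dst, size_t len)
{
	void *buf;
	ssize_t n = -1;

	if (blkno < 0 || len > BLK) {
		errno = EINVAL;
		return -1;
	}
	buf = alloc_block();
	if (!buf)
		return -1;
	if (flock(fd, LOCK_SH) == 0) {
		n = pread(fd, buf, BLK, blkno * BLK);
		flock(fd, LOCK_UN);
	}
	if (n == BLK)
		memcpy(dst, buf, len);
	free(buf);
	return n == BLK ? 0 : -1;
}
```

A caller would open the file itself, for example with open(path, O_RDWR | O_DIRECT) (glibc needs _GNU_SOURCE defined for O_DIRECT), and each cooperating process would use its own open of the file, since flock() locks belong to the open file description. A file that only ever sees streaming writes or only streaming reads would not need the locking at all, which is the case asked about above.
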
> > There's another scenario, which we talked about a while back, where the
> > cached async reads of a slowly growing file (tail) were returning
> > non-existent NULs to user space. The forcedirectio mount option should
> > prevent that. Furthermore, the "sync" mount option will not help anymore
> > because you removed nfs_readpage_sync().
>
> No. See the points about O_APPEND and serialisation of read() and
> write() above. You may still end up seeing NUL characters (and indeed
> worse forms of corruption).
If the NFS client only does cached async reads of a slowly growing file
(tail), what's the problem? Is nfs_readpage_sync() gone forever, or
could it be revived?
> > > > The other hack that seems to work is periodically triggering an
> > > > nfs_getattr(), via ls -l, to force the dirty pages to be flushed to the
> > > > NFS server. Not exactly elegant ...
> > >
> > > ????????????????????????????????
> >
> > int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
> > {
> > 	struct inode *inode = dentry->d_inode;
> > 	int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME;
> > 	int err;
> >
> > 	/* Flush out writes to the server in order to update c/mtime. */
> > 	if (S_ISREG(inode->i_mode)) {
> > 		err = filemap_write_and_wait(inode->i_mapping);
> > 		if (err)
> > 			goto out;
> > 	}
>
> I'm aware of that code. The point is that '-osync' does that for free.

-osync also impacts the performance of the entire NFS mount. With the
aforementioned hack, you can isolate the specific file(s) whose dirty
pages need to be flushed frequently, to avoid hitting the global dirty
page limit.

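For illustration only, the per-file hack can be sketched from user space as follows. The function names `flush_via_getattr()` and `flush_via_fsync()` are hypothetical. The first relies on the behaviour shown in the quoted nfs_getattr() excerpt (a getattr on a regular file writes back its dirty pages first), which may differ on other kernel versions; the second is the explicit alternative for a writer that already holds a descriptor for the file.

```c
#include <sys/stat.h>
#include <unistd.h>

/*
 * Roughly what "ls -l" does for one file: a stat() call. On an NFS client
 * whose nfs_getattr() matches the quoted excerpt, this flushes the file's
 * dirty pages as a side effect. Returns 0 on success, -1 on error.
 */
int flush_via_getattr(const char *path)
{
	struct stat st;

	return stat(path, &st);
}

/*
 * Explicit alternative: ask for this one file's dirty data to be written
 * back, without mounting the whole filesystem with -osync.
 */
int flush_via_fsync(int fd)
{
	return fsync(fd);
}
```

Either call could be issued periodically for just the files that accumulate dirty pages quickly, leaving the rest of the mount on the default asynchronous write path.
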
cya,