From: Harshula <harshula@redhat.com>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>,
Derek McEachern <derekm@ti.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: NFS Mount Option 'nofsc'
Date: Mon, 20 Feb 2012 16:35:58 +1100
Message-ID: <1329716158.2703.46.camel@serendib>
In-Reply-To: <1328892525.13180.102.camel@lade.trondhjem.org>
Hi Trond,
On Fri, 2012-02-10 at 16:48 +0000, Myklebust, Trond wrote:
> On Fri, 2012-02-10 at 19:07 +1100, Harshula wrote:
> > Do you see forcedirectio as a sharp object that someone could stab
> > themselves with?
>
> Yes. It does lead to some very subtle POSIX violations.
I'm trying out the alternatives. Your list of reasons was convincing. Thanks.
> > If the NFS client only does cached async reads of a slowly growing file
> > (tail), what's the problem? Is nfs_readpage_sync() gone forever, or
> > could it be revived?
>
> It wouldn't help at all. The problem is the VM's handling of pages vs
> the NFS handling of file size.
>
> The VM basically uses the file size in order to determine how much data
> a page contains. If that file size changed between the instance we
> finished the READ RPC call, and the instance the VM gets round to
> locking the page again, reading the data and then checking the file
> size, then the VM may end up copying data beyond the end of that
> retrieved by the RPC call.
nfs_readpage_sync() keeps doing rsize reads (or PAGE_SIZE reads if rsize is
larger than PAGE_SIZE) until the entire page has been filled or EOF is hit.
Since these are synchronous reads, the next READ RPC call is not sent until
the previous READ RPC reply arrives. Hence, the client has the latest file
metadata from the NFS server, carried in that reply, before it decides
whether or not to do more READ RPC calls. That is not the case with the
asynchronous READ RPC calls, which are queued to be sent before the replies
are received. As a result, not enough data is READ from the NFS server even
when a READ RPC reply explicitly states that the file has grown. This
mismatch of data and file size is then presented to the VM.
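To make the ordering difference concrete, here is a toy Python model. It is
not kernel code, just a sketch under stated assumptions: FakeServer,
fill_page_sync(), fill_page_async() and grow_once() are hypothetical names,
and a READ reply is modelled as (data, eof flag, current file size). The sync
path sends one READ at a time, so a reply can trigger another READ. The async
path sizes every READ from the cached file size before any reply arrives.

```python
PAGE_SIZE = 4096


class FakeServer:
    """Hypothetical stand-in for an NFS server exporting one growing file."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)

    def append(self, more: bytes) -> None:
        self.data.extend(more)

    def read(self, offset: int, count: int):
        """Model of a READ reply: (data, eof flag, current file size)."""
        chunk = bytes(self.data[offset:offset + count])
        eof = offset + len(chunk) >= len(self.data)
        return chunk, eof, len(self.data)


def fill_page_sync(server, rsize, on_reply=None):
    """One READ at a time; the next READ is only sent after the last reply."""
    chunk_size = min(rsize, PAGE_SIZE)
    page = bytearray()
    size = 0
    while len(page) < PAGE_SIZE:
        count = min(chunk_size, PAGE_SIZE - len(page))
        chunk, eof, size = server.read(len(page), count)
        page.extend(chunk)
        if on_reply:
            on_reply()  # the file may grow between two READs
        if eof:
            break
    return bytes(page), size


def fill_page_async(server, rsize, cached_size, before_replies=None):
    """All READs are sized from the cached file size and queued up front."""
    chunk_size = min(rsize, PAGE_SIZE)
    end = min(cached_size, PAGE_SIZE)
    requests = [(off, min(chunk_size, end - off))
                for off in range(0, end, chunk_size)]
    if before_replies:
        before_replies()  # the file grows after queuing, before any reply
    page = bytearray()
    size = cached_size
    for off, count in requests:
        chunk, _eof, size = server.read(off, count)
        page.extend(chunk)  # no follow-up READ, even though size has changed
    return bytes(page), size


def grow_once(server, extra):
    """Return a callback that appends `extra` the first time it is called."""
    done = []

    def hook():
        if not done:
            server.append(extra)
            done.append(True)

    return hook


s1 = FakeServer(b"a" * 1000)
sync_page, sync_size = fill_page_sync(
    s1, rsize=512, on_reply=grow_once(s1, b"b" * 500))

s2 = FakeServer(b"a" * 1000)
async_page, async_size = fill_page_async(
    s2, rsize=512, cached_size=1000, before_replies=grow_once(s2, b"b" * 500))

print(len(sync_page), sync_size)    # 1500 1500: data matches reported size
print(len(async_page), async_size)  # 1000 1500: size ran ahead of the data
```

In this model the sync path ends up with as many bytes as the last reply says
the file holds, while the async path is left with 1000 bytes of data and a
reported size of 1500.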
If you look at the nfs_readpage_sync() code, it does not adjust the number
of bytes to read when the read would go past the *current* EOF. Only the
async code makes that adjustment. Furthermore, testing showed that using
-osync (while nfs_readpage_sync() existed) avoided the NULLs being presented
to userspace.
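For anyone following along, this is roughly how the NULLs come about, as a
toy Python sketch rather than the real VM code. The assumptions are mine:
bytes_seen_by_reader() is a hypothetical helper, the page is modelled as
starting zero-filled, and the reader is handed as many bytes as the file size
says the page holds.

```python
PAGE_SIZE = 4096


def bytes_seen_by_reader(received: bytes, file_size: int) -> bytes:
    """Toy model: READ data lands at the front of a zero-filled page and the
    reader gets min(file_size, PAGE_SIZE) bytes of that page.
    Assumes len(received) <= PAGE_SIZE."""
    page = bytearray(PAGE_SIZE)      # zero-filled page
    page[:len(received)] = received  # data actually returned by the READs
    return bytes(page[:min(file_size, PAGE_SIZE)])


received = b"x" * 1000                       # READs sized for the old EOF
seen = bytes_seen_by_reader(received, 1500)  # file size has since grown
nul_tail = seen[len(received):]

print(len(seen), nul_tail.count(0))  # 1500 500
```

When the file size matches the data that was read (1000 in this example), the
same helper returns only the real data and no NUL bytes.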
cya,
#