From: Trond Myklebust <Trond.Myklebust@netapp.com>
To: Benny Halevy <bhalevy@panasas.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	linux-nfs@vger.kernel.org, nfsv4@ietf.org
Subject: Re: [nfsv4] layoutcommits and file layout
Date: Wed, 05 Jan 2011 14:14:34 -0500
Message-ID: <1294254874.3574.32.camel@heimdal.trondhjem.org>
In-Reply-To: <1294254263.3574.24.camel@heimdal.trondhjem.org>

On Wed, 2011-01-05 at 14:04 -0500, Trond Myklebust wrote: 
> On Wed, 2011-01-05 at 21:01 +0200, Benny Halevy wrote: 
> > On 2011-01-03 16:40, Trond Myklebust wrote:
> > > On Mon, 2011-01-03 at 16:21 +0200, Benny Halevy wrote: 
> > >> On 2010-12-17 01:07, Christoph Hellwig wrote:
> > >>> On Thu, Dec 16, 2010 at 11:21:21AM -0500, Matt W. Benjamin wrote:
> > >>>> Hi,
> > >>>>
> > >>>> We have a files implementation which wants to receive LAYOUTCOMMIT when a client is finished with a layout.  It was my clear understanding from rfc5661 that we could expect this behavior.
> > >>>
> > >>> Care to post it to the list?
> > >>>
> > >>
> > >> I don't know what Matt's server is doing but the fundamental problem is
> > >> manifested with extending a file with parallel DS writes.
> > >> Assuming that the DS writes are executed in arbitrary order,
> > >> exposing the file length before LAYOUTCOMMIT can cause
> > >> a concurrent reader to read a hole.  Although locking can
> > >> solve this case, day-to-day applications that work well over
> > >> local filesystem and legacy NFS may break because of this.
> > > 
> > > ...and this differs from ordinary NFS writes exactly how?
> > > 
> > > Both cached and uncached (i.e. O_DIRECT) writes can and will be flushed
> > > to disk in entirely random order when writing to the MDS. If you have a
> > > parallel reader on another client (or even on the same client in the
> > > case of O_DIRECT), and want it to see accurate data, then use locking.
> > > If not, you will see holes and other strangeness.
> > > 
> > > IOW: There are no 'day-to-day applications that work well over legacy
> > > NFS' that rely on this behaviour.
> > > 
> > 
> > Assuming the client writes sequentially (over tcp) the writes will
> > practically be processed in order into the server's cache so with
> > no crashes in the mix a parallel reader will see no holes.
> > I'd really like the following scenario to work over pNFS with
> > no hassles:
> > 	"some app >> foo" on one client, and
> > 	"tail -f foo" on another
> 
> No, that doesn't work today! Believe me, I get the "bug reports"...
> 
> There is no point in trying to add properties to pNFS that don't exist
> with ordinary NFS.

...and for the record: use of TCP does _not_ suffice to ensure writes
are processed in order.

In the Linux kernel, we have all sorts of parallelism going on before
the writes even hit the socket on the client. Everything from background
flushing to queuing in the sunrpc layer (e.g. for a session slot)
conspires to destroy any hope of ever achieving what you propose above.

That's not even counting what goes on with the server side. Think, for
instance, of the case where the server crashes before a COMMIT has been
successfully sent. Not only will your reader see holes, it will think
the file has been truncated...

Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
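
The reordering described in the message above can be sketched outside NFS
entirely. The following Python example is an illustration only, not code
from this thread: it applies three fixed-size chunks to a local temporary
file with os.pwrite(), once in offset order and once out of order, and
records what a reader would see after each write. CHUNK, the chunks list,
and the apply_writes() helper are hypothetical names chosen for the sketch,
and it assumes a POSIX system where writing past end-of-file leaves a
zero-filled hole.

```python
import os
import tempfile

CHUNK = 4  # bytes per write; deliberately tiny for illustration
chunks = [b"AAAA", b"BBBB", b"CCCC"]  # hypothetical data, one chunk per write


def apply_writes(order):
    """Apply the chunk writes in the given order.

    Returns a list with the full file contents as a reader would see
    them after each individual write has landed.
    """
    snapshots = []
    fd, path = tempfile.mkstemp()
    try:
        for i in order:
            # Each chunk goes to its own fixed offset, as it would for an
            # appending writer whose WRITEs are issued independently.
            os.pwrite(fd, chunks[i], i * CHUNK)
            size = os.fstat(fd).st_size
            snapshots.append(os.pread(fd, size, 0))
    finally:
        os.close(fd)
        os.unlink(path)
    return snapshots


in_order = apply_writes([0, 1, 2])   # writes land in offset order
reordered = apply_writes([2, 0, 1])  # last chunk lands first

for label, snaps in (("in order", in_order), ("reordered", reordered)):
    print(label)
    for snap in snaps:
        print("  size=%d data=%r" % (len(snap), snap))
```

In the reordered run the file is already 12 bytes long after the first
write, but its first 8 bytes read back as zeros. That is the hole a
concurrent reader such as "tail -f" would observe until the remaining
writes arrive.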

