From: Trond Myklebust <Trond.Myklebust@netapp.com>
To: Daniel.Muntz@emc.com
Cc: linux-nfs@vger.kernel.org, garth@panasas.com, welch@panasas.com,
nfsv4@ietf.org, andros@netapp.com, bhalevy@panasas.com
Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
Date: Wed, 07 Jul 2010 17:01:24 -0400
Message-ID: <1278536484.12889.4.camel@heimdal.trondhjem.org>
In-Reply-To: <B9A709F368FAAF4DB4B33870F72A141D0106B6B0@CORPUSMX30A.corp.emc.com>
On Wed, 2010-07-07 at 16:39 -0400, Daniel.Muntz@emc.com wrote:
> To bring this discussion full circle, since we agree that a compliant
> server can implement a scheme where written data does not become visible
> until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
> "MUST" from a compliant client (independent of layout type)?
Yes. I would agree that the client cannot rely on the updates being made
visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
compliant server MUST also have a valid strategy for dealing with the
case where the client doesn't send it.
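
As a rough illustration of one such strategy, here is a toy Python model of
the scenario quoted further down. Every name in it (ToyMDS, CachingClient,
lease_expired, the discard flag) is hypothetical and comes from neither
RFC 5661 nor any real server. It assumes only that the MDS notices the dead
client through lease expiry, then either publishes or drops the uncommitted
data, and bumps the change attribute whenever the visible data changes.

```python
from dataclasses import dataclass, field


@dataclass
class FileState:
    data: bytes = b""                             # data visible to all clients
    change: int = 0                               # change attribute
    pending: dict = field(default_factory=dict)   # client id -> uncommitted data


class ToyMDS:
    """Hypothetical metadata server: DS writes stay hidden until published."""

    def __init__(self):
        self.files = {}

    def _file(self, name):
        return self.files.setdefault(name, FileState())

    def getattr_change(self, name):
        return self._file(name).change

    def read(self, name):
        return self._file(name).data

    def write_via_ds(self, client, name, data):
        # The data reaches the data server but is not yet visible to others.
        self._file(name).pending[client] = data

    def layoutcommit(self, client, name):
        self._publish(client, name)

    def lease_expired(self, client, name, discard=False):
        # Assumed strategy for a client that died before LAYOUTCOMMIT:
        # either drop its uncommitted data or publish it.
        if discard:
            self._file(name).pending.pop(client, None)
        else:
            self._publish(client, name)

    def _publish(self, client, name):
        f = self._file(name)
        data = f.pending.pop(client, None)
        if data is not None and data != f.data:
            f.data = data
            f.change += 1   # visible data and change attribute move together


class CachingClient:
    """Hypothetical client that revalidates its cache on every open."""

    def __init__(self, mds):
        self.mds = mds
        self.cache = {}     # file name -> (change attribute, data)

    def open_and_read(self, name):
        change = self.mds.getattr_change(name)
        cached = self.cache.get(name)
        if cached is not None and cached[0] == change:
            return cached[1]            # cache is still valid
        data = self.mds.read(name)
        self.cache[name] = (change, data)
        return data


mds = ToyMDS()
client1 = CachingClient(mds)
first = client1.open_and_read("foo")         # OPEN, READ, CLOSE
mds.write_via_ds("client2", "foo", b"new")   # client 2 writes via DS, then dies
hidden = client1.open_and_read("foo")        # no visible change, cache reused
mds.lease_expired("client2", "foo")          # server resolves the dead client
after = client1.open_and_read("foo")         # change attribute moved, re-read
```

Whichever branch lease_expired takes, the reader never sees new data under
an old change attribute, which is all that close-to-open caching needs.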
Cheers
Trond
> -Dan
>
> > -----Original Message-----
> > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org]
> > On Behalf Of Trond Myklebust
> > Sent: Wednesday, July 07, 2010 7:04 AM
> > To: Benny Halevy
> > Cc: andros@netapp.com; linux-nfs@vger.kernel.org; Garth
> > Gibson; Brent Welch; NFSv4
> > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> >
> > On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> > > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > > >>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com wrote:
> > > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > > >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > > >>>>> point, so even if the non-clustered server does not want to update
> > > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > > >>>>> execute whatever synchronization mechanism the implementer wishes to put
> > > >>>>> in the control protocol.
> > > >>>>
> > > >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > > >>>> pNFS servers to break the rule that any visible change to the data must
> > > >>>> be atomically accompanied with a change attribute update.
> > > >>>>
> > > >>>
> > > >>> Trond, I'm not sure how this rule you mentioned is specified.
> > > >>>
> > > >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> > > >>> in particular:
> > > >>>
> > > >>>    For some layout protocols, the storage device is able to notify the
> > > >>>    metadata server of the occurrence of an I/O; as a result, the change
> > > >>>    and time_modify attributes may be updated at the metadata server.
> > > >>>    For a metadata server that is capable of monitoring updates to the
> > > >>>    change and time_modify attributes, LAYOUTCOMMIT processing is not
> > > >>>    required to update the change attribute. In this case, the metadata
> > > >>>    server must ensure that no further update to the data has occurred
> > > >>>    since the last update of the attributes; file-based protocols may
> > > >>>    have enough information to make this determination or may update the
> > > >>>    change attribute upon each file modification. This also applies for
> > > >>>    the time_modify attribute. If the server implementation is able to
> > > >>>    determine that the file has not been modified since the last
> > > >>>    time_modify update, the server need not update time_modify at
> > > >>>    LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> > > >>>    should be visible if that file was modified since the latest previous
> > > >>>    LAYOUTCOMMIT or LAYOUTGET
> > > >>
> > > >> I know. However the above paragraph does not state that the server
> > > >> should make those changes visible to clients other than the one that is
> > > >> writing.
> > > >>
> > > >> Section 18.32.4 states that writes will cause the time_modified and
> > > >> change attributes to be updated (if and only if the file data is
> > > >> modified). Several other sections rely on this behaviour, including
> > > >> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > > >>
> > > >> The only 'special behaviour' that I see allowed for pNFS is in section
> > > >> 13.10, which states that clients can't expect to see changes
> > > >> immediately, but that they must be able to expect close-to-open
> > > >> semantics to work. Again, if this is to be the case, then the server
> > > >> _must_ be able to deal with the case where client 1 dies before it can
> > > >> issue the LAYOUTCOMMIT.
> > >
> > > Agreed.
> > >
> > > >>
> > > >>
> > > >>>> As I see it, if your server allows one client to read data that may have
> > > >>>> been modified by another client that holds a WRITE layout for that range
> > > >>>> then (since that is a visible data change) it should provide a change
> > > >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > > >>>> sent.
> > > >>>
> > > >>> the requirement for the server in WRITE's implementation section
> > > >>> is quite weak: "It is assumed that the act of writing data to a file will
> > > >>> cause the time_modified and change attributes of the file to be updated."
> > > >>>
> > > >>> The difference here is that for pNFS the written data is not guaranteed
> > > >>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> > > >>> are caching dirty data and use a write-behind cache, application-written data
> > > >>> may be visible to other processes on the same host but not to others until
> > > >>> fsync() or close() - open-to-close semantics are the only thing the client
> > > >>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the
> > > >>> data is committed to stable storage and is visible to all other clients in
> > > >>> the cluster.
> > > >>
> > > >> See above. I'm not disputing your statement that 'the written data is
> > > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > > >> assumption that 'the written data may be visible without an accompanying
> > > >> change attribute update'.
> > > >
> > > >
> > > > In other words, I'd expect the following scenario to give the same
> > > > results in NFSv4.1 w/pNFS as it does in NFSv4:
> > >
> > > That's a strong requirement that may limit the scalability of the server.
> > >
> > > The spirit of the pNFS operations, at least from Panasas perspective was that
> > > the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> > > to clients other than the one who wrote it, and its associated metadata MUST
> > > be updated and describe the new data only on LAYOUTCOMMIT and until then it's
> > > undefined, i.e. it's up to the server implementation whether to update it or not.
> > >
> > > Without locking, what do the stronger semantics buy you?
> > > Even if a client verified the change_attribute new data may become visible
> > > at any time after the GETATTR if the file/byte range aren't locked.
> >
> > There is no locking needed in the scenario below: it is ordinary
> > close-to-open semantics.
> >
> > The point is that if you remove the one and only way that clients have
> > to determine whether or not their data caches are valid, then they can
> > no longer cache data at all, and server scalability will be shot to
> > smithereens anyway.
> >
> > Trond
> >
> > > Benny
> > >
> > > >
> > > > > Client 1                          Client 2
> > > > > ========                          ========
> > > > >
> > > > > OPEN foo
> > > > > READ
> > > > > CLOSE
> > > > >                                   OPEN
> > > > >                                   LAYOUTGET ...
> > > > >                                   WRITE via DS
> > > > >                                   <dies>...
> > > > > OPEN foo
> > > > > verify change_attr
> > > > > READ if above WRITE is visible
> > > > > CLOSE
> > > >
> > > > Trond
> > > > _______________________________________________
> > > > nfsv4 mailing list
> > > > nfsv4@ietf.org
> > > > https://www.ietf.org/mailman/listinfo/nfsv4