* file size and getattr [not found] ` <CADaq8je2Ap4oAZAPOguaWctUs8dQ=Q9g=TO39EOiJW5EJSFTGg@mail.gmail.com> @ 2019-02-22 23:25 ` Marc Eshel [not found] ` <CADaq8jfT=nxmVPz4mDRFAaOx36uxWqKbxitqS-YH3GCOs4CsrQ@mail.gmail.com> 0 siblings, 1 reply; 9+ messages in thread From: Marc Eshel @ 2019-02-22 23:25 UTC (permalink / raw) To: David Noveck; +Cc: NFSv4, linux-nfs, linux-nfs-owner, 'Ganesha-devel' What is the file size returned by the NFS server for getattr after an unstable write to the NFS client that did the write or to other NFS clients. As far as I know most file systems will always return the file size that includes the unstable writes. Does the NFSv4 spec allows the server to return file size that doesn't include the unstable write to the writer or any other NFS client? Marc. ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <CADaq8jfT=nxmVPz4mDRFAaOx36uxWqKbxitqS-YH3GCOs4CsrQ@mail.gmail.com>]
* Re: [nfsv4] file size and getattr [not found] ` <CADaq8jfT=nxmVPz4mDRFAaOx36uxWqKbxitqS-YH3GCOs4CsrQ@mail.gmail.com> @ 2019-02-25 8:49 ` Mkrtchyan, Tigran 2019-02-26 2:54 ` Rick Macklem [not found] ` <CADaq8jdp7qVOE9aKui-tgrk00hVmNVXP=yjQ_G-RtGm+G4VrYw@mail.gmail.com> 0 siblings, 2 replies; 9+ messages in thread From: Mkrtchyan, Tigran @ 2019-02-25 8:49 UTC (permalink / raw) To: Dave Noveck; +Cc: Marc Eshel, linux-nfs, Ganesha-devel, NFSv4, linux-nfs-owner ----- Original Message ----- > From: "Dave Noveck" <davenoveck@gmail.com> > To: "Marc Eshel" <eshel@us.ibm.com> > Cc: "linux-nfs" <linux-nfs@vger.kernel.org>, "Ganesha-devel" <devel@lists.nfs-ganesha.org>, "NFSv4" <nfsv4@ietf.org>, > "linux-nfs-owner" <linux-nfs-owner@vger.kernel.org> > Sent: Saturday, February 23, 2019 12:38:56 AM > Subject: Re: [nfsv4] file size and getattr >> Does the NFSv4 spec allows the server to return file size that doesn't >> include the unstable write to the writer or any other NFS client? > > I would say "no". Consider the followiing sentence in the description of > COMMIT. > I would say "yes". Let's have a look at LAYOUTCOMMIT. Spec says: "The metadata server may use this information to determine whether the file's size needs to be updated. If the metadata server updates the file's size as the result of the LAYOUTCOMMIT operation, it must return the new size (locr_newsize.ns_size) as part of the results." I read this as: Metadata server may not know real file size until client sends the file size information. Our DSes return DATA_SYNC and relay on client to issue LAYOUTCOMMIT to update file size (well, on close we force data servers up update file anyway). Tigran. >> If the count is zero, a flush from the offset to the end of the file is > done. > > Note that he size returned by GETATTR gives you the end of the file so > that, if it did not > reflect the unstable writes, COMMIT wouldn't work right. > > On Fri, Feb 22, 2019 at 6:25 PM Marc Eshel <eshel@us.ibm.com> wrote: > >> What is the file size returned by the NFS server for getattr after an >> unstable write to the NFS client that did the write or to other NFS >> clients. >> >> As far as I know most file systems will always return the file size that >> includes the unstable writes. >> >> Does the NFSv4 spec allows the server to return file size that doesn't >> include the unstable write to the writer or any other NFS client? >> >> Marc. >> >> > > _______________________________________________ > nfsv4 mailing list > nfsv4@ietf.org > https://www.ietf.org/mailman/listinfo/nfsv4 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [nfsv4] file size and getattr 2019-02-25 8:49 ` [nfsv4] " Mkrtchyan, Tigran @ 2019-02-26 2:54 ` Rick Macklem 2019-02-26 3:23 ` Rick Macklem [not found] ` <CADaq8jdp7qVOE9aKui-tgrk00hVmNVXP=yjQ_G-RtGm+G4VrYw@mail.gmail.com> 1 sibling, 1 reply; 9+ messages in thread From: Rick Macklem @ 2019-02-26 2:54 UTC (permalink / raw) To: Mkrtchyan, Tigran, Dave Noveck Cc: Marc Eshel, linux-nfs, Ganesha-devel, NFSv4, linux-nfs-owner Mkrtchyan, Tigran wrote: >> From: "Dave Noveck" <davenoveck@gmail.com> >> To: "Marc Eshel" <eshel@us.ibm.com> >> Cc: "linux-nfs" <linux-nfs@vger.kernel.org>, "Ganesha-devel" <devel@lists.nfs-ganesha.org>, "NFSv4" <nfsv4@ietf.org>, >> "linux-nfs-owner" <linux-nfs-owner@vger.kernel.org> >> Sent: Saturday, February 23, 2019 12:38:56 AM >> Subject: Re: [nfsv4] file size and getattr >> >>> Does the NFSv4 spec allows the server to return file size that doesn't >>> include the unstable write to the writer or any other NFS client? >> >> I would say "no". Consider the followiing sentence in the description of >> COMMIT. >> > >I would say "yes". Let's have a look at LAYOUTCOMMIT. Spec says: > >"The metadata server may > use this information to determine whether the file's size needs to be > updated. If the metadata server updates the file's size as the > result of the LAYOUTCOMMIT operation, it must return the new size > (locr_newsize.ns_size) as part of the results." > >I read this as: Metadata server may not know real file size until client sends >the file size information. Our DSes return DATA_SYNC and relay on client to >issue LAYOUTCOMMIT to update file size (well, on close we force data servers >up update file anyway). For the pNFS case, when I implemented a pNFS server for FreeBSD, I assumed that the Metadata server only needed to return a correct size for a file to a client doing a Getattr with a read/write layout after it did a LayoutCommit. This implementation worked ok for the FreeBSD client, but did not work correctly for the Linux-4.17-rc2 kernel. To fix the server implementation to interoperate with the above Linux kernel, I had to add a tunable that makes the server get an up to date size for the Getattr operation for a client when a read/write layout is issued to the client. Just to be clear, I am referring to the case where the Getattr was done by the client that holds the read/write layout and not another client. (This does result in additional overheads whenever a client holds a read/write layout for the file.) I can't remember exactly how the Linux client failed, but I suspect it would see a premature EOF when the size returned by the MDS was smaller than the actual size of the file on the DS. I honestly don't know if this is a bug in the Linux client or a misinterpretation of RFC5661? rick ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [nfsv4] file size and getattr 2019-02-26 2:54 ` Rick Macklem @ 2019-02-26 3:23 ` Rick Macklem 2019-02-26 12:48 ` Trond Myklebust 0 siblings, 1 reply; 9+ messages in thread From: Rick Macklem @ 2019-02-26 3:23 UTC (permalink / raw) To: Mkrtchyan, Tigran, Dave Noveck Cc: linux-nfs-owner, linux-nfs, NFSv4, Ganesha-devel I wrote: [stuff snipped] >For the pNFS case, when I implemented a pNFS server for FreeBSD, I assumed >that the Metadata server only needed to return a correct size for a file >to a client doing a Getattr with a read/write layout after it did a LayoutCommit. >This implementation worked ok for the FreeBSD client, but did not work >correctly for the Linux-4.17-rc2 kernel. >To fix the server implementation to interoperate with the above Linux kernel, >I had to add a tunable that makes the server get an up to date size for the >Getattr operation for a client when a read/write layout is issued to the client. >Just to be clear, I am referring to the case where the Getattr was done by the >client that holds the read/write layout and not another client. >(This does result in additional overheads whenever a client holds a read/write > layout for the file.) >I can't remember exactly how the Linux client failed, but I suspect it would >see a premature EOF when the size returned by the MDS was smaller than >the actual size of the file on the DS. > >I honestly don't know if this is a bug in the Linux client or a misinterpretation >of RFC5661? A little more info on this. If I recall correctly, the Linux client only did LayoutCommits after Commits. As such, no LayoutCommit were done after writes to a DS that were FILE_SYNC (or stable, if you prefer). Without the LayoutCommit, the pNFS server didn't know when it needed to get an up to date size. The FreeBSD client does a LayoutCommit after writing to a DS whenever it is done writing, such as an fsync() or close() syscall on the file or an unlock of a write lock on the file. rick ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [nfsv4] file size and getattr 2019-02-26 3:23 ` Rick Macklem @ 2019-02-26 12:48 ` Trond Myklebust 2019-02-27 0:13 ` Rick Macklem 0 siblings, 1 reply; 9+ messages in thread From: Trond Myklebust @ 2019-02-26 12:48 UTC (permalink / raw) To: davenoveck@gmail.com, tigran.mkrtchyan@desy.de, rmacklem@uoguelph.ca Cc: devel@lists.nfs-ganesha.org, linux-nfs-owner@vger.kernel.org, linux-nfs@vger.kernel.org, nfsv4@ietf.org Hi Rick, On Tue, 2019-02-26 at 03:23 +0000, Rick Macklem wrote: > > If I recall correctly, the Linux client only did LayoutCommits after > Commits. > As such, no LayoutCommit were done after writes to a DS that were > FILE_SYNC > (or stable, if you prefer). Without the LayoutCommit, the pNFS server > didn't > know when it needed to get an up to date size. > The FreeBSD client does a LayoutCommit after writing to a DS whenever > it is done > writing, such as an fsync() or close() syscall on the file or an > unlock of a write lock > on the file. > Please see the Errata ID 2751 http://www.rfc-editor.org/errata/eid2751 Cheers Trond -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [nfsv4] file size and getattr 2019-02-26 12:48 ` Trond Myklebust @ 2019-02-27 0:13 ` Rick Macklem 2019-02-27 1:12 ` Trond Myklebust 0 siblings, 1 reply; 9+ messages in thread From: Rick Macklem @ 2019-02-27 0:13 UTC (permalink / raw) To: Trond Myklebust, davenoveck@gmail.com, tigran.mkrtchyan@desy.de Cc: devel@lists.nfs-ganesha.org, linux-nfs-owner@vger.kernel.org, linux-nfs@vger.kernel.org, nfsv4@ietf.org Trond Myklebust wrote: [stuff snipped] > Please see the Errata ID 2751 http://www.rfc-editor.org/errata/eid2751 I'll admit I hadn't seen this errata before. However, it seems to be specific to the File Layout. For the Flexible File Layout... When I look in RFC-8435, I cannot find anything that states that a LayoutCommit is only required for case(s) where a Commit to the Storage Server is required. Sec. 2.1 Clearly states that a Commit to the Storage Server is required before the client does a LayoutCommit when the write(s) were not done FILE_SYNC. However, I do not see any indication that the LayoutCommit is not to be done for the case where the write(s) are done FILE_SYNC. FF_FLAGS_NO_LAYOUTCOMMIT can be used to indicate to a client that LayoutCommits are not required, but this does not be dependent on how the write(s) to the Storage Server were done. The only way a Flexible File layout Metadata server can know what the current file size is (when a read/write layout is issued to a client) is to do a Getattr to the Storage Server. If a client is not required to do a LayoutCommit when the write(s) to the Storage Server are done FILE_SYNC, then the Metadata server must do Getattr RPCs to the Storage Server whenever it needs an up to date file size if a read/write layout is issued to a client. This can result in a lot of overhead that can be avoided by requiring the LayoutCommit to be done by a client after writing to a Storage Server, irrespective of the need for a Commit to the Storage Server. As such, I would rather not have this errata applied to RFC-8435. rick ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [nfsv4] file size and getattr 2019-02-27 0:13 ` Rick Macklem @ 2019-02-27 1:12 ` Trond Myklebust [not found] ` <CADaq8jfcFCuM5u30bXqTvg+0Hc-V+1VO9aUdSJLA090j48LCvA@mail.gmail.com> 0 siblings, 1 reply; 9+ messages in thread From: Trond Myklebust @ 2019-02-27 1:12 UTC (permalink / raw) To: davenoveck@gmail.com, tigran.mkrtchyan@desy.de, rmacklem@uoguelph.ca Cc: devel@lists.nfs-ganesha.org, linux-nfs-owner@vger.kernel.org, linux-nfs@vger.kernel.org, nfsv4@ietf.org On Wed, 2019-02-27 at 00:13 +0000, Rick Macklem wrote: > Trond Myklebust wrote: > [stuff snipped] > > Please see the Errata ID 2751 > > http://www.rfc-editor.org/errata/eid2751 > > I'll admit I hadn't seen this errata before. However, it seems to be > specific to > the File Layout. For the Flexible File Layout... > > When I look in RFC-8435, I cannot find anything that states that a > LayoutCommit > is only required for case(s) where a Commit to the Storage Server is > required. > Sec. 2.1 > Clearly states that a Commit to the Storage Server is required > before the client > does a LayoutCommit when the write(s) were not done FILE_SYNC. > However, I do not see any indication that the LayoutCommit is not > to be done > for the case where the write(s) are done FILE_SYNC. > > FF_FLAGS_NO_LAYOUTCOMMIT can be used to indicate to a client that > LayoutCommits are not required, but this does not be dependent on how > the write(s) to the Storage Server were done. > > The only way a Flexible File layout Metadata server can know what the > current file size is (when a read/write layout is issued to a client) > is to do a > Getattr to the Storage Server. > If a client is not required to do a LayoutCommit when the write(s) to > the > Storage Server are done FILE_SYNC, then the Metadata server must do > Getattr RPCs to the Storage Server whenever it needs an up to date > file size > if a read/write layout is issued to a client. > > This can result in a lot of overhead that can be avoided by requiring > the > LayoutCommit to be done by a client after writing to a Storage > Server, > irrespective of the need for a Commit to the Storage Server. > As such, I would rather not have this errata applied to RFC-8435. > Fair enough. I agree that the errata in question only applies to the pNFS files layout, however you were talking about RFC5661 and whether or not we were interpreting that correctly. Since RFC5661 only refers to about the behaviour of the pNFS files layout, then I assumed that was what you were referring to. For flexfiles we may have a bug in the layoutcommit case. However note that the counter argument to what you state above is that _if_ the server requires a layoutcommit before it will acknowledge a file size change, then pNFS is likely to underperform for applications such as databases or VMs where each record is required to be written in stable mode. IOW: If all writes that need to be stable are also required to be acknowledged with a layoutcommit (to the MDS), then your ability to scale out your server will be in doubt. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <CADaq8jfcFCuM5u30bXqTvg+0Hc-V+1VO9aUdSJLA090j48LCvA@mail.gmail.com>]
* Re: [nfsv4] file size and getattr [not found] ` <CADaq8jfcFCuM5u30bXqTvg+0Hc-V+1VO9aUdSJLA090j48LCvA@mail.gmail.com> @ 2019-02-27 18:05 ` Trond Myklebust 0 siblings, 0 replies; 9+ messages in thread From: Trond Myklebust @ 2019-02-27 18:05 UTC (permalink / raw) To: davenoveck@gmail.com Cc: nfsv4@ietf.org, devel@lists.nfs-ganesha.org, linux-nfs-owner@vger.kernel.org, tigran.mkrtchyan@desy.de, linux-nfs@vger.kernel.org, rmacklem@uoguelph.ca On Wed, 2019-02-27 at 12:52 -0500, David Noveck wrote: > > However note > > that the counter argument to what you state above is that _if_ the > > server requires a layoutcommit before it will acknowledge a file > size > > change, then pNFS is likely to underperform for applications such > as > > databases or VMs where each record is required to be written in > stable > > mode. > > IOW: If all writes that need to be stable are also required to be > > acknowledged with a layoutcommit (to the MDS), > > But it is not true that *all* writes that need to be stable are also > required > to be acknowledged with a layoutcommit (to the MDS. Only those that > potentially change the file size require this. That's true for POSIX O_DSYNC writes, but it is not true for O_SYNC. In the latter case, the timestamps are required to be updated synchronously as well, which implies a layoutcommit. > > > then your ability to > > scale out your server will be in doubt > > For many applications, particularly databases, it will easy to make > sure > that the writes that potentially change the file size are few and far > between. If the database uses O_DSYNC, yes. > > On Tue, Feb 26, 2019 at 8:12 PM Trond Myklebust < > trondmy@hammerspace.com> wrote: > > On Wed, 2019-02-27 at 00:13 +0000, Rick Macklem wrote: > > > Trond Myklebust wrote: > > > [stuff snipped] > > > > Please see the Errata ID 2751 > > > > http://www.rfc-editor.org/errata/eid2751 > > > > > > I'll admit I hadn't seen this errata before. However, it seems to > > be > > > specific to > > > the File Layout. For the Flexible File Layout... > > > > > > When I look in RFC-8435, I cannot find anything that states that > > a > > > LayoutCommit > > > is only required for case(s) where a Commit to the Storage Server > > is > > > required. > > > Sec. 2.1 > > > Clearly states that a Commit to the Storage Server is required > > > before the client > > > does a LayoutCommit when the write(s) were not done FILE_SYNC. > > > However, I do not see any indication that the LayoutCommit is > > not > > > to be done > > > for the case where the write(s) are done FILE_SYNC. > > > > > > FF_FLAGS_NO_LAYOUTCOMMIT can be used to indicate to a client that > > > LayoutCommits are not required, but this does not be dependent on > > how > > > the write(s) to the Storage Server were done. > > > > > > The only way a Flexible File layout Metadata server can know what > > the > > > current file size is (when a read/write layout is issued to a > > client) > > > is to do a > > > Getattr to the Storage Server. > > > If a client is not required to do a LayoutCommit when the > > write(s) to > > > the > > > Storage Server are done FILE_SYNC, then the Metadata server must > > do > > > Getattr RPCs to the Storage Server whenever it needs an up to > > date > > > file size > > > if a read/write layout is issued to a client. > > > > > > This can result in a lot of overhead that can be avoided by > > requiring > > > the > > > LayoutCommit to be done by a client after writing to a Storage > > > Server, > > > irrespective of the need for a Commit to the Storage Server. > > > As such, I would rather not have this errata applied to RFC-8435. > > > > > > > Fair enough. I agree that the errata in question only applies to > > the > > pNFS files layout, however you were talking about RFC5661 and > > whether > > or not we were interpreting that correctly. Since RFC5661 only > > refers > > to about the behaviour of the pNFS files layout, then I assumed > > that > > was what you were referring to. > > > > For flexfiles we may have a bug in the layoutcommit case. However > > note > > that the counter argument to what you state above is that _if_ the > > server requires a layoutcommit before it will acknowledge a file > > size > > change, then pNFS is likely to underperform for applications such > > as > > databases or VMs where each record is required to be written in > > stable > > mode. > > IOW: If all writes that need to be stable are also required to be > > acknowledged with a layoutcommit (to the MDS), then your ability to > > scale out your server will be in doubt. > > -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <CADaq8jdp7qVOE9aKui-tgrk00hVmNVXP=yjQ_G-RtGm+G4VrYw@mail.gmail.com>]
* Re: [nfsv4] file size and getattr [not found] ` <CADaq8jdp7qVOE9aKui-tgrk00hVmNVXP=yjQ_G-RtGm+G4VrYw@mail.gmail.com> @ 2019-02-26 19:46 ` J. Bruce Fields 0 siblings, 0 replies; 9+ messages in thread From: J. Bruce Fields @ 2019-02-26 19:46 UTC (permalink / raw) To: David Noveck Cc: Mkrtchyan, Tigran, linux-nfs-owner, linux-nfs, NFSv4, Ganesha-devel On Mon, Feb 25, 2019 at 10:50:21AM -0500, David Noveck wrote: > For example, if you (or another client) read data written unstably, > and there was been no restart you will see bytes past the eof as known > by the metadata server. Possibly worth noting that if there *is* a restart, all bets are off. And how do you know there hasn't been a restart? Or if you can figure that out (e.g. if you're using sessions), how do you know when clients have finished retransmitting unstable write data? I'd be curious what prompted the original question. --b. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2019-02-27 18:05 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <155049372736.14318.3390584694682770373.idtracker@ietfa.amsl.com>
[not found] ` <CADaq8je2Ap4oAZAPOguaWctUs8dQ=Q9g=TO39EOiJW5EJSFTGg@mail.gmail.com>
2019-02-22 23:25 ` file size and getattr Marc Eshel
[not found] ` <CADaq8jfT=nxmVPz4mDRFAaOx36uxWqKbxitqS-YH3GCOs4CsrQ@mail.gmail.com>
2019-02-25 8:49 ` [nfsv4] " Mkrtchyan, Tigran
2019-02-26 2:54 ` Rick Macklem
2019-02-26 3:23 ` Rick Macklem
2019-02-26 12:48 ` Trond Myklebust
2019-02-27 0:13 ` Rick Macklem
2019-02-27 1:12 ` Trond Myklebust
[not found] ` <CADaq8jfcFCuM5u30bXqTvg+0Hc-V+1VO9aUdSJLA090j48LCvA@mail.gmail.com>
2019-02-27 18:05 ` Trond Myklebust
[not found] ` <CADaq8jdp7qVOE9aKui-tgrk00hVmNVXP=yjQ_G-RtGm+G4VrYw@mail.gmail.com>
2019-02-26 19:46 ` J. Bruce Fields
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox