From: Rob Ross
Subject: Re: NFSv4/pNFS possible POSIX I/O API standards
Date: Tue, 05 Dec 2006 16:11:16 -0600
Message-ID: <4575EE84.6070604@mcs.anl.gov>
In-Reply-To: <1165330516.5742.24.camel@lade.trondhjem.org>
To: Trond Myklebust
Cc: Christoph Hellwig, Andreas Dilger, Sage Weil, Brad Boyer, Anton Altaparmakov, Gary Grider, linux-fsdevel@vger.kernel.org

Trond Myklebust wrote:
> On Tue, 2006-12-05 at 10:07 +0000, Christoph Hellwig wrote:
>>> ...and we have pointed out how nicely this ignores the realities of
>>> current caching models. There is no need for a readdirplus() system
>>> call. There may be a need for a caching barrier, but AFAICS that is all.
>> I think Andreas mentioned that it is useful for clustered filesystems
>> that can avoid additional roundtrips this way. That alone might not
>> be enough reason for API additions, though. Then again, statlite and
>> readdirplus really are the most sane bits of these proposals as they
>> fit nicely into the existing set of APIs. The filehandle idiocy on
>> the other hand is way off into crackpipe land.
>
> They provide no benefits whatsoever for the two most commonly used
> networked filesystems, NFS and CIFS. As far as they are concerned, the
> only new thing added by readdirplus() is the caching barrier semantics.
> I don't see why you would want to add that into a generic syscall like
> readdir() though: it is
>
> a) networked filesystem specific. The mask stuff etc adds no
> value whatsoever to actual "posix" filesystems. In fact it is
> telling the kernel that it can violate posix semantics.

It isn't violating POSIX semantics if we get the calls passed as an
extension to POSIX :).

> b) quite unnatural to impose caching semantics on all the
> directory _entries_ using a syscall that refers to the directory
> itself (see the explanations by both myself and Peter Staubach
> of the synchronisation difficulties). Consider in particular
> that it is quite possible for directory contents to change in
> between readdirplus calls.

I want to make sure that I understand this correctly. NFS semantics
dictate that if someone stat()s a file, all changes from that client
need to be propagated to the server? And this call complicates those
semantics because now there's an operation on a different object (the
directory) that would cause this flush on the files?

Of course directory contents can change in between readdirplus() calls,
just as they can between readdir() calls. That's expected, and we do not
attempt to create consistency between calls.

> i.e. the "strict posix caching model" is pretty much impossible
> to implement on something like NFS or CIFS using these
> semantics. Why then even bother to have "masks" to tell you when
> it is OK to violate said strict model.

We're trying to obtain improved performance for distributed file systems
that offer stronger consistency guarantees than these two.

> c) Says nothing about what should happen to non-stat() metadata
> such as ACL information and other extended attributes (for
> example future selinux context info). You would think that the
> 'ls -l' application would care about this.

Honestly, we hadn't thought about other non-stat() metadata because we
didn't think it was part of the use case, and we were trying to stay
close to the flavor of POSIX. If you have ideas here, we'd like to hear
them.

Thanks for the comments,

Rob
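The round-trip argument is easier to see against the calls an 'ls -l'-style listing makes today. A minimal sketch, assuming a POSIX.1-2008 system (for dirfd() and fstatat()): the helper name list_with_stat is hypothetical, and the proposed readdirplus()/statlite() calls are deliberately not shown, since their exact signatures are not given in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: lists a directory the way an `ls -l`-style tool
 * does with existing POSIX calls, i.e. one readdir() pass for the names
 * plus one separate fstatat() per entry for the attributes.
 * Returns the number of entries successfully stat'ed, or -1 if the
 * directory cannot be opened. */
long list_with_stat(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int dfd = dirfd(dir);   /* lets fstatat() resolve names relative to this directory */
    long stat_calls = 0;
    struct dirent *ent;

    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        struct stat st;
        /* One metadata request per entry; AT_SYMLINK_NOFOLLOW gives lstat() semantics.
         * If the entry vanished after readdir() returned it, fstatat() fails
         * and the entry is skipped. */
        if (fstatat(dfd, ent->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0) {
            printf("%10lld %s\n", (long long)st.st_size, ent->d_name);
            stat_calls++;
        }
    }

    closedir(dir);
    return stat_calls;
}
```

Each loop iteration issues its own fstatat(); on NFS or a clustered filesystem that can mean one server request per entry, which is the extra round-trip cost mentioned above. Entries removed between readdir() and fstatat() are simply skipped, consistent with the point that directory contents can change between calls.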