From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sage Weil
Subject: Re: NFSv4/pNFS possible POSIX I/O API standards
Date: Sun, 3 Dec 2006 08:10:23 -0800 (PST)
Message-ID:
References: <20061128055428.GA29891@infradead.org>
 <20061129090450.GA16296@infradead.org>
 <20061129094815.GE6429@schatzie.adilger.int>
 <1164795522.7557.45.camel@imp.csi.cam.ac.uk>
 <20061129082622.GA20285@cynthia.pants.nu>
 <20061130092548.GA1534@infradead.org>
 <1164950795.5761.25.camel@lade.trondhjem.org>
 <1164984094.5761.86.camel@lade.trondhjem.org>
 <20061203015203.GA5656@schatzie.adilger.int>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Trond Myklebust, Christoph Hellwig, Brad Boyer, Anton Altaparmakov,
 Gary Grider, linux-fsdevel@vger.kernel.org
Return-path:
Received: from pochacco.sd.dreamhost.com ([66.33.201.150]:20362
 "EHLO pochacco.sd.dreamhost.com") by vger.kernel.org with ESMTP
 id S1754505AbWLCQKc (ORCPT ); Sun, 3 Dec 2006 11:10:32 -0500
To: Andreas Dilger
In-Reply-To: <20061203015203.GA5656@schatzie.adilger.int>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Sat, 2 Dec 2006, Andreas Dilger wrote:
> Just to be clear, I have no desire to include any kind of
> "synchronization" semantics to readdirplus() that is also being discussed
> in this thread. Just the ability to bundle select stat info along with
> the readdir information, and to allow stat to not return any unnecessary
> info (in particular size, blocks, mtime) that may be harder to gather
> on a clustered filesystem.

I'm not suggesting any "synchronization" beyond what opendir()/readdir()
already require for the directory entries themselves.
If I'm not mistaken, readdir() is required only to return directory
entries as recent as the opendir() (i.e., you shouldn't see entries that
were unlink()ed before you called opendir(), and intervening changes to
the directory may or may not be reflected in the result, depending on how
your implementation is buffering things). I would think the stat()
portion of readdirplus() would be similarly (in)consistent (i.e., return
a value at least as recent as the opendir()) to make life easy for the
implementation and to align with existing readdir() semantics.

My only concern is the "at least as recent as the opendir()" part, in
contrast to statlite(), which has undefined "recentness" of its result
for fields not specified in the mask. Ideally, I'd like to see
readdirplus() also take a statlite()-style mask, so that you can choose
between "vaguely recent" and "at least as recent as opendir()" for each
field.

As you mentioned, by the time you look at the result of any call (in the
absence of locking) it may be out of date. But simply establishing an
ordering is useful, especially in a clustered environment where some
nodes are waiting for other nodes (via barriers or whatever) and then
want to see the effects of previously completed fs operations.

Anyway, "synchronization" semantics aside (since I appear to be somewhat
alone on this :)...

I'm wondering if a corresponding opendirplus() (or similar) would also be
appropriate to inform the kernel/filesystem that readdirplus() will
follow, and that stat information should be gathered/buffered. Or do most
implementations wait for the first readdir() before doing any actual work
anyway?

sage
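
For illustration only, here is a minimal user-space sketch of the semantics discussed above: a readdirplus()-like call that takes a statlite()-style mask. The names readdirplus_emul, struct rdp_entry, RDP_SIZE and RDP_MTIME are hypothetical and are not taken from any proposal. The sketch is built purely on the existing opendir()/readdir()/fstatat() calls, so it saves no round trips; it only shows the shape of the interface. Because each fstatat() is issued after the opendir(), the returned attributes are (at least on a local filesystem) no older than the opendir().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical mask bits, loosely modeled on the statlite() idea:
 * only the fields named in the mask are guaranteed to be filled in. */
enum {
    RDP_SIZE  = 1 << 0,
    RDP_MTIME = 1 << 1
};

/* Hypothetical result record: a name plus the requested stat fields. */
struct rdp_entry {
    char     name[256];  /* NAME_MAX (255) plus terminating NUL */
    unsigned valid;      /* which optional fields below are meaningful */
    mode_t   mode;       /* always returned in this sketch */
    off_t    size;       /* meaningful only if (valid & RDP_SIZE) */
    time_t   mtime;      /* meaningful only if (valid & RDP_MTIME) */
};

/* Emulated readdirplus(): returns 1 when *out holds an entry,
 * 0 at end of directory, -1 on error (errno set by the failing call). */
int readdirplus_emul(DIR *dir, unsigned mask, struct rdp_entry *out)
{
    struct dirent *de;
    struct stat st;

    assert(dir != NULL && out != NULL);

    for (;;) {
        errno = 0;
        de = readdir(dir);
        if (de == NULL)
            return errno ? -1 : 0;  /* NULL with errno == 0 means end */

        /* Stat relative to the open directory; do not follow symlinks. */
        if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0)
            break;
        if (errno != ENOENT)
            return -1;
        /* Entry was unlinked between readdir() and fstatat(): skip it. */
    }

    snprintf(out->name, sizeof out->name, "%s", de->d_name);
    out->mode  = st.st_mode;
    out->valid = 0;
    out->size  = 0;
    out->mtime = 0;

    /* A real implementation would avoid gathering unrequested fields;
     * this emulation can only avoid reporting them. */
    if (mask & RDP_SIZE) {
        out->size = st.st_size;
        out->valid |= RDP_SIZE;
    }
    if (mask & RDP_MTIME) {
        out->mtime = st.st_mtime;
        out->valid |= RDP_MTIME;
    }
    return 1;
}
```

A caller would loop until the function returns 0, passing a mask of 0 when only names and file types are needed, which is the case where a clustered filesystem could skip the expensive size and mtime lookups.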