From mboxrd@z Thu Jan 1 00:00:00 1970 From: Matthew Wilcox Subject: Re: NFSv4/pNFS possible POSIX I/O API standards Date: Tue, 5 Dec 2006 07:20:02 -0700 Message-ID: <20061205142002.GN3013@parisc-linux.org> References: <1164950795.5761.25.camel@lade.trondhjem.org> <1164984094.5761.86.camel@lade.trondhjem.org> <20061203015203.GA5656@schatzie.adilger.int> <20061204073200.GB5637@schatzie.adilger.int> <1165245336.711.176.camel@lade.trondhjem.org> <4574C48A.8030007@mcs.anl.gov> <1165298200.5776.26.camel@lade.trondhjem.org> <20061205100748.GC5871@infradead.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Trond Myklebust , Rob Ross , Andreas Dilger , Sage Weil , Brad Boyer , Anton Altaparmakov , Gary Grider , linux-fsdevel@vger.kernel.org Return-path: Received: from palinux.external.hp.com ([192.25.206.14]:49374 "EHLO mail.parisc-linux.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760158AbWLEOUE (ORCPT ); Tue, 5 Dec 2006 09:20:04 -0500 To: Christoph Hellwig Content-Disposition: inline In-Reply-To: <20061205100748.GC5871@infradead.org> Sender: linux-fsdevel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org On Tue, Dec 05, 2006 at 10:07:48AM +0000, Christoph Hellwig wrote: > The filehandle idiocy on the other hand is way of into crackpipe land. Right, and it needs to be discarded. Of course, there was a real problem that it addressed, so we need to come up with an acceptable alternative. The scenario is a cluster-wide application doing simultaneous opens of the same file. So thousands of nodes all hitting the same DLM locks (for read) all at once. The openg() non-solution implies that all nodes in the cluster share the same filehandle space, so I think a reasonable solution can be implemented entirely within the clusterfs, with an extra flag to open(), say O_CLUSTER_WIDE. When the clusterfs sees this flag set (in ->lookup), it can treat it as a hint that this pathname component is likely to be opened again on other nodes and broadcast that fact to the other nodes within the cluster. Other nodes on seeing that hint (which could be structured as "The child "bin" of filehandle e62438630ca37539c8cc1553710bbfaa3cf960a7 has filehandle ff51a98799931256b555446b2f5675db08de6229") can keep a record of that fact. When they see their own open, they can populate the path to that file without asking the server for extra metadata. There's obviously security issues there (why I say 'hint' rather than 'command'), but there's also security problems with open-by-filehandle. Note that this solution requires no syscall changes, no application changes, and also helps a scenario where each node opens a different file in the same directory. I've never worked on a clusterfs, so there may be some gotchas (eg, how do you invalidate the caches of nodes when you do a rename). But this has to be preferable to open-by-fh.