From: Eric Barton
Date: Wed, 20 Oct 2010 15:30:37 +0200
Subject: [Lustre-devel] Queries regarding LDLM_ENQUEUE
In-Reply-To: <4CBEA8A9.9080802@gmail.com>
References: <4CBEA415.80307@gmail.com> <9C26CBA7-8DBD-4875-8E14-FB663B749096@oracle.com> <4CBEA8A9.9080802@gmail.com>
Message-ID: <00d001cb705a$fd64cb80$f82e6280$@com>
To: lustre-devel@lists.lustre.org

I do like the idea of a collective open, but I'm wondering if it can be implemented simply enough to be worth the effort. True, it avoids the O(n) load on the server of all the clients (re)populating their namespace caches, but it's only useful for parallel jobs - a scale-out NAS-style workload can't benefit. Ultimately the O(n) will have to be replaced with something that scales as O(log n) (e.g. a fat tree of caching proxy servers).

> On 10/20/10 12:24 PM, Andreas Dilger wrote:
> > I'm reluctant to expose the whole FID namespace to applications,

It can just be opaque bytes to the app.

> > since this completely bypasses all directory permissions and allows
> > opening files based only on their inode permissions. If we require a
> > name_to_handle() syscall to succeed first, before allowing
> > open_by_handle() to work, then at least we know that one of the
> > involved processes was able to do a full path traversal.

I think this defeats the scalability objective - we're trying to avoid having to pull the namespace into every client, aren't we?

> Yes, this is a good point. Could it be solved by using FID +
> capability/signature?

Yes, I think capabilities are the only way collective open can be made secure "properly". And given the way we believe capabilities have to be implemented for scalability (i.e. to keep the capability cache on the server down to a reasonable size), any open by one node in a given client cluster may well have to confer the right to use the FID on all of its peers.

> >> Another idea was to do the whole path traversal on the MDS within a
> >> single RPC, but that'd require a number of changes to llite and/or the
> >> VFS and would keep the MDS a bottleneck.

That's an optimization rather than a scalability feature. How much does it complicate the code? I'd hate to see something new, tricky and fragile complicate further development.

Cheers,
Eric
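The FID + capability idea discussed in this thread can be sketched as a toy model, written in Python for brevity. Everything in it is hypothetical: `ToyMds`, `collective_open`, the capability layout (packed FID, client-cluster id, expiry, then an HMAC-SHA256 tag) and the use of an HMAC as the "signature" are illustrative assumptions, not Lustre's design or API. It shows one node paying for the path traversal once, the server handing back opaque bytes, and every peer in the same client cluster opening by capability without touching the namespace.

```python
import hashlib
import hmac
import os
import struct
from dataclasses import dataclass

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


@dataclass(frozen=True)
class Fid:
    # Modeled on Lustre's (sequence, object id, version) FID triple.
    seq: int
    oid: int
    ver: int

    def pack(self) -> bytes:
        return struct.pack(">QII", self.seq, self.oid, self.ver)  # 16 bytes

    @classmethod
    def unpack(cls, raw: bytes) -> "Fid":
        return cls(*struct.unpack(">QII", raw))


class ToyMds:
    """Hypothetical metadata server: resolves paths and mints/checks capabilities."""

    def __init__(self, namespace: dict[str, Fid]):
        self._key = os.urandom(32)  # MAC secret known only to the server
        self._namespace = namespace
        self.path_lookups = 0       # number of full path traversals served

    def _tag(self, payload: bytes) -> bytes:
        return hmac.new(self._key, payload, hashlib.sha256).digest()

    def open_by_path(self, path: str, cluster_id: int, expiry: int) -> bytes:
        # The expensive step: full traversal (directory permission checks go here).
        self.path_lookups += 1
        fid = self._namespace[path]
        payload = fid.pack() + struct.pack(">QQ", cluster_id, expiry)
        return payload + self._tag(payload)  # opaque bytes to the client

    def open_by_capability(self, cap: bytes, cluster_id: int, now: int) -> Fid:
        # No traversal: accept the FID only if the server's own MAC matches.
        payload, tag = cap[:-TAG_LEN], cap[-TAG_LEN:]
        if len(payload) != 32 or not hmac.compare_digest(tag, self._tag(payload)):
            raise PermissionError("bad capability")
        cap_cluster, expiry = struct.unpack(">QQ", payload[16:])
        if cap_cluster != cluster_id:
            raise PermissionError("capability issued to another client cluster")
        if now > expiry:
            raise PermissionError("capability expired")
        return Fid.unpack(payload[:16])


def collective_open(mds, path, cluster_id, nodes, now, lifetime=60):
    # Node 0 opens by path; the capability is then "broadcast" to its peers,
    # each of which opens by FID + capability instead of walking the path.
    cap = mds.open_by_path(path, cluster_id, now + lifetime)
    return [mds.open_by_capability(cap, cluster_id, now) for _ in range(nodes)]


if __name__ == "__main__":
    mds = ToyMds({"/scratch/job/input": Fid(0x200000400, 1, 0)})
    fids = collective_open(mds, "/scratch/job/input", cluster_id=7, nodes=1000, now=100)
    print(mds.path_lookups, len(fids))
```

Because the capability is bound to a client-cluster id rather than to a single node, one traversal covers every peer, and the server here keeps no per-capability state beyond its MAC key. Each capability open still reaches the server in this model, so what it removes is the per-client namespace traversal, not every O(n) cost.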