From: Paul Nowoczynski <pauln@psc.edu>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] Queries regarding LDLM_ENQUEUE
Date: Wed, 20 Oct 2010 10:51:06 -0400
Message-ID: <4CBF01DA.3090505@psc.edu>
In-Reply-To: <00d001cb705a$fd64cb80$f82e6280$@com>
Eric Barton wrote:
> I do like the idea of a collective open, but I'm wondering if it can be
> implemented simply enough to be worth the effort. True, it avoids the O(n)
> load on the server of all the clients (re)populating their namespace
> caches, but it's only useful for parallel jobs - a scale-out NAS style
> workload can't benefit. Ultimately the O(n) will have to be replaced with
> something that scales O(log n) (e.g. with a fat tree of caching proxy
> servers).
Eric makes a good point that only parallel jobs really need this
feature. Unfortunately, at scale the system (both clients and servers)
*really does* need something like this, especially if we continue pushing
users to perform N-1 file I/O instead of 'file per process'. I also
agree that some sort of capability mechanism is the best approach. I
wonder if this is something that could be done outside of POSIX and
supported through a parallel I/O library? Perhaps a single application
thread could make a special open call (/proc magic perhaps?) and obtain
a blob of opaque bytes which is then broadcast to the rest of the
clients via MPI. Traversing the namespace would be avoided on all but one
client. In such a scenario I don't feel that enforcing Unix permissions
at every level of the path is needed or sensible; the operation should
be treated as a simple logical open. The question to the Lustre experts:
can enough state be packed into an opaque object such that the
receiving client can construct the necessary cache state?
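
To make the idea concrete, here is a rough sketch in Python of the flow I
have in mind, combined with the "FID + signature" suggestion quoted below.
Everything in it is hypothetical: the handle layout, SERVER_KEY, and the
function names are made up for illustration and are not Lustre or MPI
APIs, and the "broadcast" is just a list copy standing in for MPI_Bcast.
It only models the permission side (one path traversal, then a signed
opaque handle); it says nothing about whether lock or cache state could
ride along in the same blob.

```python
import hashlib
import hmac
import struct

# Hypothetical server-side secret; it would never leave the metadata server.
SERVER_KEY = b"example-mds-secret"

# Hypothetical handle layout: 16-byte FID, 8-byte expiry, then an HMAC tag.
_BODY = struct.Struct(">16sQ")
_TAG_LEN = hashlib.sha256().digest_size


def _sign(body: bytes) -> bytes:
    return hmac.new(SERVER_KEY, body, hashlib.sha256).digest()


def server_open_by_path(namespace: dict, path: str, expiry: int) -> bytes:
    """Do the full path lookup once and return an opaque, signed handle."""
    fid = namespace[path]
    body = _BODY.pack(fid, expiry)
    return body + _sign(body)


def server_open_by_handle(handle: bytes, now: int) -> bytes:
    """Validate a handle without touching the namespace; return the FID."""
    if len(handle) != _BODY.size + _TAG_LEN:
        raise PermissionError("malformed handle")
    body, tag = handle[:_BODY.size], handle[_BODY.size:]
    if not hmac.compare_digest(tag, _sign(body)):
        raise PermissionError("bad signature")
    fid, expiry = _BODY.unpack(body)
    if now > expiry:
        raise PermissionError("handle expired")
    return fid


def collective_open(namespace: dict, path: str, nranks: int,
                    now: int = 0, ttl: int = 60) -> list:
    """Rank 0 opens by path; the other ranks open by the broadcast handle."""
    handle = server_open_by_path(namespace, path, now + ttl)
    # Stand-in for MPI_Bcast: each remaining rank gets the same bytes.
    received = [handle for _ in range(1, nranks)]
    return [server_open_by_handle(h, now) for h in received]
```

With nranks clients this costs one path traversal plus nranks - 1 handle
checks, so the server still sees O(n) requests, but none of them walk the
namespace. A tampered or expired handle is rejected by the signature and
expiry checks.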
>
>> On 10/20/10 12:24 PM, Andreas Dilger wrote:
>>> I'm reluctant to expose the whole FID namespace to applications,
>
> ??? It can just be opaque bytes to the app.
>
>>> since this completely bypasses all directory permissions and allows
>>> opening files only based on their inode permissions. If we require a
>>> name_to_handle() syscall to succeed first, before allowing
>>> open_by_handle() to work, then at least we know that one of the
>>> involved processes was able to do a full path traversal.
>
> I think this defeats the scalability objective - we're trying to avoid having
> to pull the namespace into every client, aren't we?
>
>> yes, this is a good point. can be solved if you use FID +
>> capability/signature ?
>
> Yes, I think capabilities are the only way collective open can be made
> secure "properly". And given the way we believe capabilities have to be
> implemented for scalability (i.e. to keep the capability cache down to a
> reasonable size on the server) any open by one node in a given client
> cluster may well have to confer the right to use the FID by any of its
> peers.
>
>>>> another idea was to do whole path traversal on MDS within a single
>>>> RPC. but that'd require a number of changes to llite and/or VFS and
>>>> keep MDS a bottleneck.
>
> That's an optimization rather than a scalability feature. How much does
> it complicate the code? I'd hate to see something new, tricky and fragile
> complicate further development.
>
> Cheers,
> Eric