From: Paul Nowoczynski <pauln@psc.edu>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] Queries regarding LDLM_ENQUEUE
Date: Wed, 20 Oct 2010 12:43:46 -0400
Message-ID: <4CBF1C42.1090109@psc.edu>
In-Reply-To: <4CBF094A.9020302@gmail.com>
bzzz.tomas at gmail.com wrote:
> On 10/20/10 6:51 PM, Paul Nowoczynski wrote:
>
>> Eric makes a good point in that only parallel jobs really need this
>> feature. Unfortunately, at scale the system (both clients and servers)
>> *really do* need something like this, especially if we continue pushing
>> users to perform N-1 file I/O instead of 'file per process'. I too am in
>> agreement that some sort of capability mechanism is the best approach. I
>> wonder if this is something that could be done outside of POSIX and
>> supported through a parallel I/O library? Perhaps a single application
>> thread could make a special open call (/proc magic perhaps?) and obtain
>> the glob of opaque bytes which are then broadcast to the rest of the
>> clients via MPI. Traversing the namespace would be avoided on all but one
>> client. In such a scenario I don't feel that enforcing unix permissions
>> at every level of the path is needed or sensible, the operation should
>> be treated as a simple logical open. The question for the Lustre experts
>> - can enough state be packed into an opaque object such that the
>> receiving client can construct the necessary cache state?
>>
>
> could you explain why it is so important to skip intermediate lookups?
> those are to be done once, then the clients will do them locally.
> is it because your nodes are getting new paths all the time or the nodes
> are rebooted very often and lose cache?
>
It's for scalability reasons. When N clients traverse the namespace
in order to open the same file, the result is a storm of RPC
requests which bear down on the metadata server. This type of activity
becomes prohibitive, especially once client counts exceed 10^4. An
operation such as this is ripe for optimization because every client
in the network is trying to build the same state. If a single client
can 'learn' the final state, i.e. the pathname -> FID translation, and
broadcast it to its cohorts, it's a huge win: the O(N) lookup load that
the clients place on the metadata server is replaced by a single lookup.
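To make the scaling argument concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified, hypothetical cost model that is not a measurement of Lustre: every client starts with a cold cache, and each path component (including the final open) costs exactly one metadata-server RPC. The helper names and the path depth of 5 are made up for illustration, and the binomial-tree broadcast is only one pattern an MPI broadcast might use. The sketch does not model what the opaque handle would need to contain, which is the open question in this thread.

```python
# Hypothetical cost model, not a measurement of Lustre: every client starts
# with a cold cache and each path component (including the final open)
# costs exactly one metadata-server (MDS) RPC.


def mds_rpcs_all_clients(nclients: int, depth: int) -> int:
    """Every client walks the same path itself: the MDS sees N * depth RPCs."""
    return nclients * depth


def mds_rpcs_single_lookup(depth: int) -> int:
    """One client walks the path and shares the result: depth RPCs in total."""
    return depth


def bcast_rounds(nclients: int) -> int:
    """Rounds of a binomial-tree broadcast, in which the number of clients
    holding the result doubles every round (client-to-client, no MDS)."""
    rounds, informed = 0, 1
    while informed < nclients:
        informed *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    depth = 5  # hypothetical: five path components below the mount point
    print(f"{'clients':>8} {'MDS RPCs (all)':>15} {'MDS RPCs (one)':>15} {'rounds':>7}")
    for n in (16, 1024, 16384, 131072):
        print(f"{n:>8} {mds_rpcs_all_clients(n, depth):>15} "
              f"{mds_rpcs_single_lookup(depth):>15} {bcast_rounds(n):>7}")
```

Under these assumptions, 16384 clients would send 81920 lookup RPCs to the MDS if each resolves the path itself, versus 5 when one client resolves it, with the result reaching every client in 14 broadcast rounds. The broadcast cost is spread across the clients and the interconnect instead of landing on a single server. Real clients cache lookups and may combine steps, so these numbers only illustrate the shape of the argument.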
paul
> thanks, z
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel
>