* [Lustre-devel] Queries regarding LDLM_ENQUEUE
@ 2010-10-18 23:33 Vilobh Meshram
2010-10-19 15:46 ` Fan Yong
2010-10-19 20:28 ` Vilobh Meshram
0 siblings, 2 replies; 35+ messages in thread
From: Vilobh Meshram @ 2010-10-18 23:33 UTC (permalink / raw)
To: lustre-devel
Hi,
Of the many RPCs used in Lustre, LDLM_ENQUEUE seems to be the one most
frequently used for communication between the client and the MDS. I have a
few questions about it:
1) Is LDLM_ENQUEUE the only interface (RPC) through which the client can
send CREATE/OPEN requests to the MDS?
I ran a couple of experiments and found that LDLM_ENQUEUE is used when
mounting the FS as well as when we look up, create, or open a file. I was
expecting the MDS_REINT RPC to be invoked for a CREATE/OPEN request via
mdc_create(), but it seems that Lustre uses LDLM_ENQUEUE even for
CREATE/OPEN (by packing the intent-related data).
Please correct me if I am wrong.
2) In which cases (for which system calls) is the MDS_REINT RPC invoked?
Thanks,
Vilobh
*Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio*
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-18 23:33 [Lustre-devel] Queries regarding LDLM_ENQUEUE Vilobh Meshram
@ 2010-10-19 15:46 ` Fan Yong
2010-10-19 20:28 ` Vilobh Meshram
1 sibling, 0 replies; 35+ messages in thread
From: Fan Yong @ 2010-10-19 15:46 UTC (permalink / raw)
To: lustre-devel
On 10/19/10 7:33 AM, Vilobh Meshram wrote:
> Hi,
>
> Of the many RPCs used in Lustre, LDLM_ENQUEUE seems to be the one
> most frequently used for communication between the client and the
> MDS. I have a few questions about it:
>
> 1) Is LDLM_ENQUEUE the only interface (RPC) through which the client
> can send CREATE/OPEN requests to the MDS?
>
> I ran a couple of experiments and found that LDLM_ENQUEUE is used
> when mounting the FS as well as when we look up, create, or open a
> file. I was expecting the MDS_REINT RPC to be invoked for a
> CREATE/OPEN request via mdc_create(), but it seems that Lustre uses
> LDLM_ENQUEUE even for CREATE/OPEN (by packing the intent-related
> data).
> Please correct me if I am wrong.
For the OPEN_CREATE case, the client communicates with the MDS through the
LDLM_ENQUEUE interface.
>
> 2) In which cases (for which system calls) is the MDS_REINT RPC
> invoked?
You can try mkdir, symlink, or mknod to trigger MDS_REINT.
Cheers,
--
Nasf
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-18 23:33 [Lustre-devel] Queries regarding LDLM_ENQUEUE Vilobh Meshram
2010-10-19 15:46 ` Fan Yong
@ 2010-10-19 20:28 ` Vilobh Meshram
2010-10-19 22:53 ` Andreas Dilger
1 sibling, 1 reply; 35+ messages in thread
From: Vilobh Meshram @ 2010-10-19 20:28 UTC (permalink / raw)
To: lustre-devel
Hi,
From my exploration it seems that, for create/open requests, LDLM_ENQUEUE
is the RPC through which the client talks to the MDS. Please confirm this.
Since, as far as I can tell, LDLM_ENQUEUE is the only RPC that interfaces
with the MDS, I am planning to send the LDLM_ENQUEUE RPC *with some
additional buffer* from the client to the MDS, so that under some specific
condition the MDS can fill in information in the buffer sent by the client.
I have made some modifications to the LDLM_ENQUEUE RPC code, but I am
getting kernel panics. Can someone please suggest a good way to tackle this
problem? I am using Lustre 1.8.1.1 and cannot upgrade to Lustre 2.0.
Thanks,
Vilobh
*Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio*
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-19 20:28 ` Vilobh Meshram
@ 2010-10-19 22:53 ` Andreas Dilger
2010-10-20 2:04 ` Vilobh Meshram
0 siblings, 1 reply; 35+ messages in thread
From: Andreas Dilger @ 2010-10-19 22:53 UTC (permalink / raw)
To: lustre-devel
On 2010-10-19, at 14:28, Vilobh Meshram wrote:
> From my exploration it seems that, for create/open requests, LDLM_ENQUEUE is the RPC through which the client talks to the MDS. Please confirm this.
>
> Since, as far as I can tell, LDLM_ENQUEUE is the only RPC that interfaces with the MDS, I am planning to send the LDLM_ENQUEUE RPC with some additional buffer from the client to the MDS, so that under some specific condition the MDS can fill in information in the buffer sent by the client.
This isn't correct. LDLM_ENQUEUE is used for enqueueing locks. It just happens that when Lustre wants to create a new file it enqueues a lock on the parent directory with the "intent" to create a new file. The MDS currently always replies "you cannot have the lock for the directory, I created the requested file for you". Similarly, when the client is getting attributes on a file, it needs a lock on that file in order to cache the attributes, and to save RPCs the attributes are returned with the lock.
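The intent mechanism described above can be illustrated with a toy model. This is only a sketch, not Lustre code: the names ToyMDS, Intent, Reply, and enqueue() are invented for illustration, and real LDLM locks, lock modes, and reply buffers are far richer than this.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class Intent(Enum):
    CREATE = auto()   # lock on the parent directory, intending to create a file
    GETATTR = auto()  # lock on a file, intending to cache its attributes


@dataclass
class Reply:
    lock_granted: bool            # did the client get the lock it asked for?
    attrs: Optional[dict] = None  # attributes piggybacked on the reply
    note: str = ""


@dataclass
class ToyMDS:
    """Hypothetical stand-in for the MDS; not Lustre's data structures."""
    files: dict = field(default_factory=dict)  # name -> attributes
    next_ino: int = 1

    def enqueue(self, intent: Intent, name: str) -> Reply:
        if intent is Intent.CREATE:
            # The server executes the intent itself instead of handing the
            # parent-directory lock to the client.
            attrs = {"ino": self.next_ino, "size": 0}
            self.files[name] = attrs
            self.next_ino += 1
            return Reply(False, dict(attrs), "no directory lock; file created for you")
        if intent is Intent.GETATTR:
            # The lock is granted and the attributes ride along in the same
            # reply, which saves a separate attribute-fetch round trip.
            return Reply(True, dict(self.files[name]), "lock granted with attributes")
        raise ValueError(intent)


mds = ToyMDS()
created = mds.enqueue(Intent.CREATE, "foo")   # one enqueue, file gets created
cached = mds.enqueue(Intent.GETATTR, "foo")   # one enqueue, lock plus attributes
```

In both cases the client issues a single lock-enqueue request; what differs is whether the reply carries a lock, the result of the intended operation, or both.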
> I have made some modifications to the LDLM_ENQUEUE RPC code, but I am getting kernel panics. Can someone please suggest a good way to tackle this problem? I am using Lustre 1.8.1.1 and cannot upgrade to Lustre 2.0.
It would REALLY be a lot easier to have this discussion with you if you actually told us what it is you are working on. Not only could we focus on the higher-level issue that you are trying to solve (instead of possibly wasting a lot of time focusing on a small issue that may in fact be completely irrelevant), but many ideas related to Lustre have probably already been discussed at length by the Lustre developers sometime over the past 8 years that we've been working on it. I suspect that the readership of this list could give you a lot of assistance with whatever you are working on, if you will only tell us what it actually is you are trying to do.
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-19 22:53 ` Andreas Dilger
@ 2010-10-20 2:04 ` Vilobh Meshram
2010-10-20 7:55 ` Andreas Dilger
0 siblings, 1 reply; 35+ messages in thread
From: Vilobh Meshram @ 2010-10-20 2:04 UTC (permalink / raw)
To: lustre-devel
Hi Andreas,
Thanks for your e-mail.
We are trying to do the following. Please let me know if anything is
unclear:
Say we have two clients, C1 and C2, and an MDS, and C1 and C2 share a file.
1) When client C1 sends an open/create request to the MDS, we want to
follow the normal Lustre path.
2) Now say C2 tries to open the same file that C1 has opened.
3) On the MDS we maintain a data structure that we scan to see whether the
file has already been opened by some client (in this case C1).
4) If the MDS finds that some client (C1 here) has already opened the file,
it sends the new client (C2 here) some information about the client that
opened the file first.
5) Once C2 gets this information, it is up to C2 to take further action.
6) In this way we can save the time C2 spends in the locking mechanism.
Basically, we aim to bypass Lustre's locking scheme for files already
opened by some client by maintaining this data structure.
Please let us know your thoughts on the above approach. Is this a feasible
design, and do you see any complications going forward?
Given this problem statement, I need a way for C2 to extract the
information from the data structure maintained on the MDS. To do that, C2
will send a request with intent = create|open, which will be an
LDLM_ENQUEUE RPC. I need to modify this RPC so that:
1) I can enclose an additional buffer whose size is known to me.
2) When the reply is packed on the MDS side, this buffer can be included in
the reply message.
3) On the client side, the buffer contents can be extracted from the reply
message.
As of now, I need help with the above three steps.
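The three steps can be sketched with a toy multi-buffer message. Everything here is an assumption made for illustration: the layout (buffer count, then lengths, then bodies), the fixed EXTRA_BUF_SIZE, and the function names are invented and are not the Lustre 1.8 wire format or API.

```python
import struct

EXTRA_BUF_SIZE = 16  # assumed fixed size of the additional buffer


def pack_msg(buffers):
    """Pack buffers as: count, then each length, then the bodies (little-endian)."""
    header = struct.pack("<I", len(buffers))
    lengths = b"".join(struct.pack("<I", len(b)) for b in buffers)
    return header + lengths + b"".join(buffers)


def unpack_msg(msg):
    """Inverse of pack_msg; raises ValueError if bodies are shorter than declared."""
    (count,) = struct.unpack_from("<I", msg, 0)
    lengths = struct.unpack_from(f"<{count}I", msg, 4)
    offset = 4 + 4 * count
    if offset + sum(lengths) > len(msg):
        raise ValueError("truncated message")
    buffers = []
    for n in lengths:
        buffers.append(msg[offset:offset + n])
        offset += n
    return buffers


def client_request(intent):
    # Step 1: reserve a zero-filled extra buffer of known size in the request.
    return pack_msg([intent, bytes(EXTRA_BUF_SIZE)])


def server_reply(request, first_opener):
    # Step 2: the server fills the extra buffer and returns it in the reply.
    intent, extra = unpack_msg(request)
    if first_opener is not None:
        extra = first_opener.ljust(len(extra), b"\0")[:len(extra)]
    return pack_msg([intent, extra])


def client_extract(reply):
    # Step 3: the client pulls the extra buffer out of the reply by index.
    return unpack_msg(reply)[1].rstrip(b"\0")


reply = server_reply(client_request(b"open|create"), b"C1")
info = client_extract(reply)
```

The sketch checks the declared lengths before slicing; in any such scheme both sides have to agree on the number and size of the buffers.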
Thanks,
Vilobh
*Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio*
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 2:04 ` Vilobh Meshram
@ 2010-10-20 7:55 ` Andreas Dilger
2010-10-20 8:11 ` bzzz.tomas at gmail.com
2010-10-22 2:33 ` Vilobh Meshram
0 siblings, 2 replies; 35+ messages in thread
From: Andreas Dilger @ 2010-10-20 7:55 UTC (permalink / raw)
To: lustre-devel
On 2010-10-19, at 20:04, Vilobh Meshram wrote:
> We are trying to do the following. Please let me know if anything is unclear:
>
> Say we have two clients, C1 and C2, and an MDS, and C1 and C2 share a file.
> 1) When client C1 sends an open/create request to the MDS, we want to follow the normal Lustre path.
> 2) Now say C2 tries to open the same file that C1 has opened.
> 3) On the MDS we maintain a data structure that we scan to see whether the file has already been opened by some client (in this case C1).
> 4) If the MDS finds that some client (C1 here) has already opened the file, it sends the new client (C2 here) some information about the client that opened the file first.
While I understand the basic concept, I don't really see how your proposal will actually improve performance. If C2 already has to contact the MDS and get a reply from it, then wouldn't it be about the same to simply perform the open as is done today? The number of MDS RPCs is the same, and in fact this would avoid further message overhead between C1 and C2.
> 5) Once C2 gets this information, it is up to C2 to take further action.
> 6) In this way we can save the time C2 spends in the locking mechanism. Basically, we aim to bypass Lustre's locking scheme for files already opened by some client by maintaining this data structure.
>
> Please let us know your thoughts on the above approach. Is this a feasible design, and do you see any complications going forward?
There is a separate proposal that has been underway in the Linux community for some time, to allow a user process to get a file handle (i.e. binary blob returned from a new name_to_handle() syscall) from the kernel for a given pathname, and then later use that file handle in another process to open a file descriptor without re-traversing the path.
I've been thinking this would be very useful for Lustre (and MPI in general), and have tried to steer the Linux development in a direction that would allow this to happen. Is this in line with what you are investigating?
While this wouldn't eliminate the actual MDS open RPC (i.e. the LDLM_ENQUEUE you have been discussing), it could avoid the path traversal from each client, possibly saving {path_elements * num_clients} additional RPCs.
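As rough arithmetic for the {path_elements * num_clients} estimate, here is a small sketch. It assumes one lookup RPC per path element per client and no client-side caching; the path and the client count are hypothetical.

```python
def path_lookup_rpcs(path, num_clients):
    """Lookup RPCs if every client resolves every path element itself."""
    path_elements = len([p for p in path.split("/") if p])
    return path_elements * num_clients


# Hypothetical job: 1000 clients opening the same file via a five-element path.
saved = path_lookup_rpcs("/scratch/project/run42/output/data.h5", 1000)
```

With these numbers the handle-based approach would avoid up to 5000 lookup RPCs, while each of the 1000 clients would still send its own open RPC.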
> Given this problem statement, I need a way for C2 to extract the information from the data structure maintained on the MDS. To do that, C2 will send a request with intent = create|open, which will be an LDLM_ENQUEUE RPC. I need to modify this RPC so that:
> 1) I can enclose an additional buffer whose size is known to me.
> 2) When the reply is packed on the MDS side, this buffer can be included in the reply message.
> 3) On the client side, the buffer contents can be extracted from the reply message.
>
> As of now, I need help with the above three steps.
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 7:55 ` Andreas Dilger
@ 2010-10-20 8:11 ` bzzz.tomas at gmail.com
2010-10-20 8:24 ` Andreas Dilger
2010-10-22 2:33 ` Vilobh Meshram
1 sibling, 1 reply; 35+ messages in thread
From: bzzz.tomas at gmail.com @ 2010-10-20 8:11 UTC (permalink / raw)
To: lustre-devel
On 10/20/10 11:55 AM, Andreas Dilger wrote:
> There is a separate proposal that has been underway in the Linux community for some time, to allow a user process to get a file handle (i.e. binary blob returned from a new name_to_handle() syscall) from the kernel for a given pathname, and then later use that file handle in another process to open a file descriptor without re-traversing the path.
>
> I've been thinking this would be very useful for Lustre (and MPI in general), and have tried to steer the Linux development in a direction that would allow this to happen. Is this in line with what you are investigating?
with FIDs this is quite possible and even safe, if the application can
learn the FID (using xattr_get or ioctl). then it should be trivial to
export the FID namespace on the MDS via a special .lustre-fids directory?
another idea was to do the whole path traversal on the MDS within a single
RPC, but that'd require a number of changes to llite and/or the VFS and
would keep the MDS a bottleneck.
thanks, z
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 8:11 ` bzzz.tomas at gmail.com
@ 2010-10-20 8:24 ` Andreas Dilger
2010-10-20 8:30 ` bzzz.tomas at gmail.com
0 siblings, 1 reply; 35+ messages in thread
From: Andreas Dilger @ 2010-10-20 8:24 UTC (permalink / raw)
To: lustre-devel
On 2010-10-20, at 02:11, bzzz.tomas at gmail.com wrote:
> On 10/20/10 11:55 AM, Andreas Dilger wrote:
>> There is a separate proposal that has been underway in the Linux community for some time, to allow a user process to get a file handle (i.e. binary blob returned from a new name_to_handle() syscall) from the kernel for a given pathname, and then later use that file handle in another process to open a file descriptor without re-traversing the path.
>>
>> I've been thinking this would be very useful for Lustre (and MPI in general), and have tried to steer the Linux development in a direction that would allow this to happen. Is this in line with what you are investigating?
>
> with FIDs this is quite possible and even safe, if the application can
> learn the FID (using xattr_get or ioctl). then it should be trivial to
> export the FID namespace on the MDS via a special .lustre-fids directory?
I'm reluctant to expose the whole FID namespace to applications, since this completely bypasses all directory permissions and allows opening files only based on their inode permissions. If we require a name_to_handle() syscall to succeed first, before allowing open_by_handle() to work, then at least we know that one of the involved processes was able to do a full path traversal.
> another idea was to do the whole path traversal on the MDS within a single
> RPC, but that'd require a number of changes to llite and/or the VFS and
> would keep the MDS a bottleneck.
This was discussed a long time ago, and has the potential drawback that if one of the path components is over-mounted on the client (e.g. local RAM-based tmpfs on a Lustre root filesystem) then the MDS-side path traversal would be incorrect. It could return an entry underneath the mountpoint, instead of inside it.
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 8:24 ` Andreas Dilger
@ 2010-10-20 8:30 ` bzzz.tomas at gmail.com
2010-10-20 8:38 ` Nikita Danilov
2010-10-20 13:30 ` Eric Barton
0 siblings, 2 replies; 35+ messages in thread
From: bzzz.tomas at gmail.com @ 2010-10-20 8:30 UTC (permalink / raw)
To: lustre-devel
On 10/20/10 12:24 PM, Andreas Dilger wrote:
> I'm reluctant to expose the whole FID namespace to applications, since this completely bypasses all directory permissions and allows opening files only based on their inode permissions. If we require a name_to_handle() syscall to succeed first, before allowing open_by_handle() to work, then at least we know that one of the involved processes was able to do a full path traversal.
yes, this is a good point. can be solved if you use FID +
capability/signature ?
>> another idea was to do the whole path traversal on the MDS within a single
>> RPC, but that'd require a number of changes to llite and/or the VFS and
>> would keep the MDS a bottleneck.
>
> This was discussed a long time ago, and has the potential drawback that if one of the path components is over-mounted on the client (e.g. local RAM-based tmpfs on a Lustre root filesystem) then the MDS-side path traversal would be incorrect. It could return an entry underneath the mountpoint, instead of inside it.
yes, and that could be solved if server returns a series of FIDs,
then client could check whether any of those is over-mounted?
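The client-side check suggested here could look roughly like the sketch below. It assumes the server returns one (name, FID) pair per path component and that the client keeps a set of FIDs it has another filesystem mounted on; the function name and the FID strings are placeholders, not Lustre interfaces.

```python
def first_overmounted(components, local_mountpoints):
    """Return the index of the first component that is over-mounted on this
    client, or None if the server-side traversal can be used as-is.

    components: list of (name, fid) pairs returned by the server, in path order.
    local_mountpoints: set of FIDs this client has another filesystem mounted on.
    """
    for index, (_name, fid) in enumerate(components):
        if fid in local_mountpoints:
            return index
    return None


# Hypothetical reply for tmp/job/out, with a local tmpfs mounted on tmp.
reply = [("tmp", "0x200000400:0x1:0x0"),
         ("job", "0x200000400:0x2:0x0"),
         ("out", "0x200000400:0x3:0x0")]
hit = first_overmounted(reply, {"0x200000400:0x1:0x0"})
```

When a component is found to be over-mounted, everything the server resolved below it has to be discarded and the rest of the path resolved locally.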
thanks, z
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 8:30 ` bzzz.tomas at gmail.com
@ 2010-10-20 8:38 ` Nikita Danilov
2010-10-20 14:45 ` Nicolas Williams
2010-10-20 13:30 ` Eric Barton
1 sibling, 1 reply; 35+ messages in thread
From: Nikita Danilov @ 2010-10-20 8:38 UTC (permalink / raw)
To: lustre-devel
On 20 October 2010 12:30, <bzzz.tomas@gmail.com> wrote:
> On 10/20/10 12:24 PM, Andreas Dilger wrote:
> > I'm reluctant to expose the whole FID namespace to applications, since
> this completely bypasses all directory permissions and allows opening files
> only based on their inode permissions. If we require a name_to_handle()
> syscall to succeed first, before allowing open_by_handle() to work, then at
> least we know that one of the involved processes was able to do a full path
> traversal.
>
> yes, this is a good point. can be solved if you use FID +
> capability/signature ?
>
> >> another idea was to do the whole path traversal on the MDS within a single
> >> RPC, but that'd require a number of changes to llite and/or the VFS and
> >> would keep the MDS a bottleneck.
> >
> > This was discussed a long time ago, and has the potential drawback that
> if one of the path components is over-mounted on the client (e.g. local
> RAM-based tmpfs on a Lustre root filesystem) then the MDS-side path
> traversal would be incorrect. It could return an entry underneath the
> mountpoint, instead of inside it.
>
> yes, and that could be solved if server returns a series of FIDs,
> then client could check whether any of those is over-mounted?
>
This is what sufficiently smart NFSv4 clients are supposed to do, by the
way, I believe: issue a compound RPC with a sequence of LOOKUP requests and
traverse the returned sequence of file IDs locally, checking for mount points.
Nikita.
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 8:30 ` bzzz.tomas at gmail.com
2010-10-20 8:38 ` Nikita Danilov
@ 2010-10-20 13:30 ` Eric Barton
2010-10-20 13:40 ` bzzz.tomas at gmail.com
` (2 more replies)
1 sibling, 3 replies; 35+ messages in thread
From: Eric Barton @ 2010-10-20 13:30 UTC (permalink / raw)
To: lustre-devel
I do like the idea of a collective open, but I'm wondering if it can be
implemented simply enough to be worth the effort. True, it avoids the O(n)
load on the server of all the clients (re)populating their namespace
caches, but it's only useful for parallel jobs - a scale-out NAS style
workload can't benefit. Ultimately the O(n) will have to be replaced with
something that scales O(log n) (e.g. with a fat tree of caching proxy
servers).
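To make the O(n) versus O(log n) contrast concrete, here is a small sketch. It assumes an idealized tree in which every node (the server or a proxy) serves at most `fanout` children; the client count and fanout are hypothetical.

```python
def tree_depth(num_clients, fanout):
    """Hops between server and client when no node serves more than `fanout` children."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    depth = 0
    capacity = 1
    while capacity < num_clients:
        capacity *= fanout
        depth += 1
    return depth


direct_load = 100_000                 # every client hits the server itself: O(n)
depth = tree_depth(100_000, 32)       # hops through a proxy tree: O(log n)
```

In this model the server answers at most 32 proxies instead of 100,000 clients, at the cost of a few extra hops per request.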
> On 10/20/10 12:24 PM, Andreas Dilger wrote:
> > I'm reluctant to expose the whole FID namespace to applications,
??? It can just be opaque bytes to the app.
> > since this completely bypasses all directory permissions and allows
> > opening files only based on their inode permissions. If we require a
> > name_to_handle() syscall to succeed first, before allowing
> > open_by_handle() to work, then at least we know that one of the
> > involved processes was able to do a full path traversal.
I think this defeats the scalability objective - we're trying to avoid
having to pull the namespace into every client, aren't we?
> yes, this is a good point. can be solved if you use FID +
> capability/signature ?
Yes, I think capabilities are the only way collective open can be made
secure "properly". And given the way we believe capabilities have to be
implemented for scalability (i.e. to keep the capability cache down to a
reasonable size on the server) any open by one node in a given client
cluster may well have to confer the right to use the FID by any of its
peers.
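The "FID + capability/signature" idea can be sketched with a keyed hash. This is an illustration under stated assumptions, not Lustre's capability design: the secret, the function names, and the placeholder FID bytes are invented, and a real capability would also need expiry, permissions, and key management.

```python
import hashlib
import hmac

MDS_SECRET = b"example-only-secret"  # hypothetical key known only to the server


def issue_capability(fid: bytes) -> bytes:
    """Server side: sign a FID after a successful, permission-checked open."""
    return hmac.new(MDS_SECRET, fid, hashlib.sha256).digest()


def verify_capability(fid: bytes, capability: bytes) -> bool:
    """Server side: accept an open-by-FID only if the signature matches."""
    expected = hmac.new(MDS_SECRET, fid, hashlib.sha256).digest()
    return hmac.compare_digest(expected, capability)


fid = b"0x200000400:0x3:0x0"   # placeholder FID bytes
cap = issue_capability(fid)     # obtained by the one client that walked the path
ok_peer = verify_capability(fid, cap)                       # peer presents FID + capability
ok_guess = verify_capability(b"0x200000400:0x4:0x0", cap)   # guessed FID is rejected
```

Note that nothing in this sketch binds the capability to a particular node, so any peer holding it can use the FID, which matches the trade-off described above.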
> >> another idea was to do the whole path traversal on the MDS within a
> >> single RPC, but that'd require a number of changes to llite and/or
> >> the VFS and would keep the MDS a bottleneck.
That's an optimization rather than a scalability feature. How much does
it complicate the code? I'd hate to see something new, tricky, and fragile
complicate further development.
Cheers,
Eric
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 13:30 ` Eric Barton
@ 2010-10-20 13:40 ` bzzz.tomas at gmail.com
2010-10-20 14:51 ` Paul Nowoczynski
2010-10-20 16:35 ` Andreas Dilger
2 siblings, 0 replies; 35+ messages in thread
From: bzzz.tomas at gmail.com @ 2010-10-20 13:40 UTC (permalink / raw)
To: lustre-devel
On 10/20/10 5:30 PM, Eric Barton wrote:
> I do like the idea of a collective open, but I'm wondering if it can be
> implemented simply enough to be worth the effort. True, it avoids the O(n)
> load on the server of all the clients (re)populating their namespace
> caches, but it's only useful for parallel jobs - a scale-out NAS style
> workload can't benefit. Ultimately the O(n) will have to be replaced with
> something that scales O(log n) (e.g. with a fat tree of caching proxy
> servers).
in the long term I'd prefer the proxy approach, because this way we could
improve a number of cases, including existing POSIX apps doing open, stat, etc.
>>>> another idea was to do the whole path traversal on the MDS within a
>>>> single RPC, but that'd require a number of changes to llite and/or
>>>> the VFS and would keep the MDS a bottleneck.
>
> That's an optimization rather than a scalability feature. How much does
> it complicate the code? I'd hate to see something new tricky and fragile
> complicate further development.
yes, this is an optimization. the good thing here is that a single client
can benefit a lot from this (replacing a few RPCs with a single one). the
bad thing is that it can be quite complicated on the client side (the
server-side part looks OK).
thanks, z
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 8:38 ` Nikita Danilov
@ 2010-10-20 14:45 ` Nicolas Williams
0 siblings, 0 replies; 35+ messages in thread
From: Nicolas Williams @ 2010-10-20 14:45 UTC (permalink / raw)
To: lustre-devel
On Wed, Oct 20, 2010 at 12:38:44PM +0400, Nikita Danilov wrote:
> On 20 October 2010 12:30, <bzzz.tomas@gmail.com> wrote:
> > yes, and that could be solved if server returns a series of FIDs,
> > then client could check whether any of those is over-mounted?
>
> This is what sufficiently smart nfsv4 clients are supposed to do, by the
> way, I believe: issue a compound RPC with a sequence of LOOKUP requests and
> traverse returned sequence of file-id-s locally, checking for mount points.
Yes. The detection and replication of server-side mountpoints on the
client-side is called "mirror mounts" in Solaris, and it's quite handy.
For clients the main issue is going to be whether the VFS allows plugins
to resolve more than one path component at a time.
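
The client-side check described above can be sketched roughly as follows. This is only an illustration under assumptions: the server reply is modelled as a list of (name, FID) pairs, FIDs are opaque strings, and `mounted_over` is a hypothetical set of FIDs that the client knows are covered by a local mount.

```python
from typing import List, Optional, Set, Tuple

Fid = str  # opaque identifier for the purposes of this sketch
Reply = List[Tuple[str, Fid]]


def first_mount_crossing(reply: Reply, mounted_over: Set[Fid]) -> Optional[int]:
    """Index of the first component whose FID is over-mounted locally, or None."""
    for index, (_name, fid) in enumerate(reply):
        if fid in mounted_over:
            return index
    return None


def usable_prefix(reply: Reply, mounted_over: Set[Fid]) -> Reply:
    """Part of the server's answer the client may keep.

    Everything after an over-mounted component was resolved in the wrong
    filesystem and has to be looked up again below the local mount.
    """
    stop = first_mount_crossing(reply, mounted_over)
    return reply if stop is None else reply[: stop + 1]
```

If no returned FID is over-mounted, the whole multi-component answer can be used as is; otherwise the client falls back to component-by-component resolution from the mount point onwards.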
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 13:30 ` Eric Barton
2010-10-20 13:40 ` bzzz.tomas at gmail.com
@ 2010-10-20 14:51 ` Paul Nowoczynski
2010-10-20 14:55 ` Nicolas Williams
2010-10-20 15:22 ` bzzz.tomas at gmail.com
2010-10-20 16:35 ` Andreas Dilger
2 siblings, 2 replies; 35+ messages in thread
From: Paul Nowoczynski @ 2010-10-20 14:51 UTC (permalink / raw)
To: lustre-devel
Eric Barton wrote:
> I do like the idea of a collective open, but I'm wondering if it can be
> implemented simply enough to be worth the effort. True, it avoids the O(n)
> load on the server of all the clients (re)populating their namespace
> caches, but it's only useful for parallel jobs - a scale-out NAS style
> workload can't benefit. Ultimately the O(n) will have to be replaced with
> something that scales O(log n) (e.g. with a fat tree of caching proxy
> servers).
Eric makes a good point in that only parallel jobs really need this
feature. Unfortunately, at scale the system (both clients and servers)
*really do* need something like this, especially if we continue pushing
users to perform N-1 file I/O instead of 'file per process'. I too am in
agreement that some sort of capability mechanism is the best approach. I
wonder if this is something that could be done outside of POSIX and
supported through a parallel I/O library? Perhaps a single application
thread could make a special open call (/proc magic perhaps?) and obtain
a blob of opaque bytes which is then broadcast to the rest of the
clients via MPI. Traversing the namespace would be avoided on all but one
client. In such a scenario I don't feel that enforcing unix permissions
at every level of the path is needed or sensible, the operation should
be treated as a simple logical open. The question to the lustre experts
- can enough state be packed into an opaque object such that the
recv'ing client can construct the necessary cache state?
>
>> On 10/20/10 12:24 PM, Andreas Dilger wrote:
>>> I'm reluctant to expose the whole FID namespace to applications,
>
> ??? It can just be opaque bytes to the app.
>
>>> since this completely bypasses all directory permissions and allows
>>> opening files only based on their inode permissions. If we require a
>>> name_to_handle() syscall to succeed first, before allowing
>>> open_by_handle() to work, then at least we know that one of the
>>> involved processes was able to do a full path traversal.
>
> I think this defeats the scalability objective - we trying to avoid having
> to pull the namespace into every client aren't we?
>
>> yes, this is a good point. can be solved if you use FID +
>> capability/signature ?
>
> Yes, I think capabilities are the only way collective open can be made
> secure "properly". And given the way we believe capabilities have to be
> implemented for scalability (i.e. to keep the capability cache down to a
> reasonable size on the server) any open by one node in a given client
> cluster may well have to confer the right to use the FID by any of its
> peers.
>
>>>> another idea was to do whole path traversal on MDS within a single
>>>> RPC. but that'd require a number of changes to llite and/or VFS and
>>>> keep MDS a bottleneck.
>
> That's an optimization rather than a scalability feature. How much does
> it complicate the code? I'd hate to see something new tricky and fragile
> complicate further development.
>
> Cheers,
> Eric
>
>
>
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 14:51 ` Paul Nowoczynski
@ 2010-10-20 14:55 ` Nicolas Williams
2010-10-20 15:16 ` Paul Nowoczynski
2010-10-20 15:22 ` bzzz.tomas at gmail.com
1 sibling, 1 reply; 35+ messages in thread
From: Nicolas Williams @ 2010-10-20 14:55 UTC (permalink / raw)
To: lustre-devel
On Wed, Oct 20, 2010 at 10:51:06AM -0400, Paul Nowoczynski wrote:
> Eric makes a good point in that only parallel jobs really need this
> feature. Unfortunately, at scale the system (both clients and servers)
> *really do* need something like this, especially if we continue pushing
> users to perform N-1 file I/O instead of 'file per process'. I too am in
> agreement that some sort of capability mechanism is the best approach. I
> wonder if this is something that could be done outside of POSIX and
> supported through a parallel I/O library? Perhaps a single application
> threads could make a special open call (/proc magic perhaps?) and obtain
> the glob of opaque bytes which are then broadcast to the rest of the
> client via mpi. Traversing the namespace would be avoided on all but one
> client. In such a scenario I don't feel that enforcing unix permissions
> at every level of the path is needed or sensible, the operation should
> be treated as a simple logical open. The question to the lustre experts
> - can enough state be packed into an opaque object such that the
> recv'ing client can construct the necessary cache state?
POSIX already has what you're asking for, and it's called openg() ;)
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 14:55 ` Nicolas Williams
@ 2010-10-20 15:16 ` Paul Nowoczynski
2010-10-20 16:07 ` Andreas Dilger
0 siblings, 1 reply; 35+ messages in thread
From: Paul Nowoczynski @ 2010-10-20 15:16 UTC (permalink / raw)
To: lustre-devel
Yes! I think I was at this HEC meeting a few years ago?? :)
Here are the pointers to the manpages if anyone else is interested.
http://www.opengroup.org/platform/hecewg/
So my question wasn't so much about the interface, which is why I posed a
scenario based on MPI. Rather: how feasible is it to import the
necessary state from the client issuing openg() to the rest?
paul
Nicolas Williams wrote:
> On Wed, Oct 20, 2010 at 10:51:06AM -0400, Paul Nowoczynski wrote:
>
>> Eric makes a good point in that only parallel jobs really need this
>> feature. Unfortunately, at scale the system (both clients and servers)
>> *really do* need something like this, especially if we continue pushing
>> users to perform N-1 file I/O instead of 'file per process'. I too am in
>> agreement that some sort of capability mechanism is the best approach. I
>> wonder if this is something that could be done outside of POSIX and
>> supported through a parallel I/O library? Perhaps a single application
>> threads could make a special open call (/proc magic perhaps?) and obtain
>> the glob of opaque bytes which are then broadcast to the rest of the
>> client via mpi. Traversing the namespace would be avoided on all but one
>> client. In such a scenario I don't feel that enforcing unix permissions
>> at every level of the path is needed or sensible, the operation should
>> be treated as a simple logical open. The question to the lustre experts
>> - can enough state be packed into an opaque object such that the
>> recv'ing client can construct the necessary cache state?
>>
>
> POSIX already has what you're asking for, and it's called openg() ;)
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 14:51 ` Paul Nowoczynski
2010-10-20 14:55 ` Nicolas Williams
@ 2010-10-20 15:22 ` bzzz.tomas at gmail.com
2010-10-20 16:43 ` Paul Nowoczynski
1 sibling, 1 reply; 35+ messages in thread
From: bzzz.tomas at gmail.com @ 2010-10-20 15:22 UTC (permalink / raw)
To: lustre-devel
On 10/20/10 6:51 PM, Paul Nowoczynski wrote:
> Eric makes a good point in that only parallel jobs really need this
> feature. Unfortunately, at scale the system (both clients and servers)
> *really do* need something like this, especially if we continue pushing
> users to perform N-1 file I/O instead of 'file per process'. I too am in
> agreement that some sort of capability mechanism is the best approach. I
> wonder if this is something that could be done outside of POSIX and
> supported through a parallel I/O library? Perhaps a single application
> threads could make a special open call (/proc magic perhaps?) and obtain
> the glob of opaque bytes which are then broadcast to the rest of the
> client via mpi. Traversing the namespace would be avoided on all but one
> client. In such a scenario I don't feel that enforcing unix permissions
> at every level of the path is needed or sensible, the operation should
> be treated as a simple logical open. The question to the lustre experts
> - can enough state be packed into an opaque object such that the
> recv'ing client can construct the necessary cache state?
could you explain why it is so important to skip intermediate lookups?
those are to be done once, then the clients will do them locally.
is it because your nodes are getting new paths all the time, or because
the nodes are rebooted very often and lose their cache?
thanks, z
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 15:16 ` Paul Nowoczynski
@ 2010-10-20 16:07 ` Andreas Dilger
0 siblings, 0 replies; 35+ messages in thread
From: Andreas Dilger @ 2010-10-20 16:07 UTC (permalink / raw)
To: lustre-devel
Note that I was in contact with the HECEWG also, and openg() was proposed to be renamed to be more understandable.
The current Linux name_to_handle() proposal could be used to export an opaque blob identifier (a file handle that holds a FID, fs UUID, and a cookie/capability in the Lustre case) to userspace. MPI-IO or some other mechanism can then be used to distribute this to other client processes, which call open_by_handle() to convert it back into an open file.
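
As a rough illustration of such a blob, here is a sketch that packs a filesystem UUID and a FID and appends a keyed signature as the capability. Everything about the layout is an assumption made for the example: the field order, the use of HMAC-SHA256, and the idea that the server holds a secret `key`. It is not the Lustre or Linux handle format.

```python
import hashlib
import hmac
import struct
import uuid

# Assumed body layout: 16-byte fs UUID, then FID as (seq: u64, oid: u32, ver: u32).
_BODY = struct.Struct("<16sQII")
_TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def make_handle(key: bytes, fs_uuid: uuid.UUID, fid: tuple) -> bytes:
    """Server side: build an opaque handle whose tag proves the server issued it."""
    body = _BODY.pack(fs_uuid.bytes, *fid)
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return body + tag


def check_handle(key: bytes, handle: bytes):
    """Server side: verify a handle presented by any client, then decode it."""
    if len(handle) != _BODY.size + _TAG_LEN:
        raise ValueError("malformed handle")
    body, tag = handle[:_BODY.size], handle[_BODY.size:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise PermissionError("capability check failed")
    raw_uuid, seq, oid, ver = _BODY.unpack(body)
    return uuid.UUID(bytes=raw_uuid), (seq, oid, ver)
```

The application only ever sees the bytes; a client that guesses a FID without having received a handle cannot produce a valid tag, which is the property the capability is meant to provide.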
One question is whether mpi_open() could be used for a collective operation (allowing this to be handled inside the Lustre ADIO layer) or if it would need specific application support?
Cheers, Andreas
On 2010-10-20, at 9:16, Paul Nowoczynski <pauln@psc.edu> wrote:
> Yes! I think I was at this HEC meeting a few years ago?? :)
> Here are the pointers to the manpages if anyone else is interested.
> http://www.opengroup.org/platform/hecewg/
>
> So my question wasn't so much about the interface which is why I posed a
> scenario based on MPI. But rather, how feasible is it to import the
> necessary state the from the client issuing openg() to the rest?
> paul
>
>
> Nicolas Williams wrote:
>> On Wed, Oct 20, 2010 at 10:51:06AM -0400, Paul Nowoczynski wrote:
>>
>>> Eric makes a good point in that only parallel jobs really need this
>>> feature. Unfortunately, at scale the system (both clients and servers)
>>> *really do* need something like this, especially if we continue pushing
>>> users to perform N-1 file I/O instead of 'file per process'. I too am in
>>> agreement that some sort of capability mechanism is the best approach. I
>>> wonder if this is something that could be done outside of POSIX and
>>> supported through a parallel I/O library? Perhaps a single application
>>> threads could make a special open call (/proc magic perhaps?) and obtain
>>> the glob of opaque bytes which are then broadcast to the rest of the
>>> client via mpi. Traversing the namespace would be avoided on all but one
>>> client. In such a scenario I don't feel that enforcing unix permissions
>>> at every level of the path is needed or sensible, the operation should
>>> be treated as a simple logical open. The question to the lustre experts
>>> - can enough state be packed into an opaque object such that the
>>> recv'ing client can construct the necessary cache state?
>>>
>>
>> POSIX already has what you're asking for, and it's called openg() ;)
>>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 13:30 ` Eric Barton
2010-10-20 13:40 ` bzzz.tomas at gmail.com
2010-10-20 14:51 ` Paul Nowoczynski
@ 2010-10-20 16:35 ` Andreas Dilger
2010-10-20 16:46 ` Paul Nowoczynski
2 siblings, 1 reply; 35+ messages in thread
From: Andreas Dilger @ 2010-10-20 16:35 UTC (permalink / raw)
To: lustre-devel
On 2010-10-20, at 7:30, Eric Barton <eeb@whamcloud.com> wrote:
>> On 10/20/10 12:24 PM, Andreas Dilger wrote:
>>> I'm reluctant to expose the whole FID namespace to applications,
>
> ??? It can just be opaque bytes to the app.
This was in reply to Alex Z's comments that we can just do open-by-FID from userspace.
>>> since this completely bypasses all directory permissions and allows
>>> opening files only based on their inode permissions. If we require a
>>> name_to_handle() syscall to succeed first, before allowing
>>> open_by_handle() to work, then at least we know that one of the
>>> involved processes was able to do a full path traversal.
>
> I think this defeats the scalability objective - we trying to avoid having
> to pull the namespace into every client aren't we?
The name_to_handle() only needs to be called on a single node, and open_by_handle() is called on the other nodes. I agree that this doesn't avoid the full O(n) RPCs for the open itself but at least it does avoid the full path traversal from every client and on the MDS (replacing it with an MPI broadcast of the handle).
>
Cheers, Andreas
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 15:22 ` bzzz.tomas at gmail.com
@ 2010-10-20 16:43 ` Paul Nowoczynski
2010-10-20 16:49 ` bzzz.tomas at gmail.com
0 siblings, 1 reply; 35+ messages in thread
From: Paul Nowoczynski @ 2010-10-20 16:43 UTC (permalink / raw)
To: lustre-devel
bzzz.tomas at gmail.com wrote:
> On 10/20/10 6:51 PM, Paul Nowoczynski wrote:
>
>> Eric makes a good point in that only parallel jobs really need this
>> feature. Unfortunately, at scale the system (both clients and servers)
>> *really do* need something like this, especially if we continue pushing
>> users to perform N-1 file I/O instead of 'file per process'. I too am in
>> agreement that some sort of capability mechanism is the best approach. I
>> wonder if this is something that could be done outside of POSIX and
>> supported through a parallel I/O library? Perhaps a single application
>> threads could make a special open call (/proc magic perhaps?) and obtain
>> the glob of opaque bytes which are then broadcast to the rest of the
>> client via mpi. Traversing the namespace would be avoided on all but one
>> client. In such a scenario I don't feel that enforcing unix permissions
>> at every level of the path is needed or sensible, the operation should
>> be treated as a simple logical open. The question to the lustre experts
>> - can enough state be packed into an opaque object such that the
>> recv'ing client can construct the necessary cache state?
>>
>
> could you explain why is it so important to skip intermediate lookups?
> those are to be done once, then the clients will do them locally.
> is it because your nodes are getting new paths all the time or the nodes
> are rebooted very often and lose cache?
>
It's for scalability reasons. When N clients traverse the namespace
with the purpose of opening the same file the result is a storm of RPC
requests which bear down on the metadata server. This type of activity
becomes prohibitive, especially when you start considering client
counts > 10^4. An operation such as this is ripe for optimization because
every client in the network is trying to build the same state. If you
have a method for a single client to 'learn' the final state, i.e. the
pathname -> fid translation, and broadcast it to its cohorts, it's a
huge win because it eliminates an O(N) operation.
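
A back-of-the-envelope sketch of the lookup traffic involved, assuming cold client caches and one lookup RPC per path component (both simplifications), and counting only the path traversal, not the open itself:

```python
def mds_lookup_rpcs(n_clients: int, path_depth: int, collective: bool) -> int:
    """Lookup RPCs reaching the MDS when every client opens the same path.

    Assumes cold caches and one lookup RPC per path component. With a
    collective open only one client walks the path; the others receive
    the result over the application's own broadcast (e.g. MPI).
    """
    walkers = 1 if collective else n_clients
    return walkers * path_depth


if __name__ == "__main__":
    for collective in (False, True):
        print("collective" if collective else "independent",
              mds_lookup_rpcs(10_000, 5, collective))
```

With 10^4 clients and a five-component path, the independent case sends 50,000 lookups to the MDS and the collective case sends 5. Whether each client still needs its own open RPC afterwards is a separate question.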
paul
> thanks, z
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 16:35 ` Andreas Dilger
@ 2010-10-20 16:46 ` Paul Nowoczynski
2010-10-20 17:00 ` Andreas Dilger
2010-10-20 17:01 ` Nicolas Williams
0 siblings, 2 replies; 35+ messages in thread
From: Paul Nowoczynski @ 2010-10-20 16:46 UTC (permalink / raw)
To: lustre-devel
> The name_to_handle() only needs to be called on a single node, and
> open_by_handle() is called on the other nodes. I agree that this
> doesn't avoid the full O(n) RPCs for the open itself but at least it
> does avoid the full path traversal from every client and on the
> MDS (replacing it with an MPI broadcast of the handle).
Andreas,
excuse my ignorance, but why does open_by_handle() need to issue an
RPC? If it's to obtain the layout, couldn't the layout be encoded into
the 'handle'?
p
>
> Cheers, Andreas
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 16:43 ` Paul Nowoczynski
@ 2010-10-20 16:49 ` bzzz.tomas at gmail.com
2010-10-20 17:11 ` Paul Nowoczynski
0 siblings, 1 reply; 35+ messages in thread
From: bzzz.tomas at gmail.com @ 2010-10-20 16:49 UTC (permalink / raw)
To: lustre-devel
On 10/20/10 8:43 PM, Paul Nowoczynski wrote:
> It's for scalability reasons. When N clients traverse the namespace with
> the purpose of opening the same file the result is a storm of RPC
> requests which bear down on the metadata server. This type of activity
> becomes prohibitive especially when you start considering client counts
> > 10^4. An operation such as this is ripe for optimization because
> every client in the network is trying to build the same state. If you
> have a method for a single client to 'learn' the final state, i.e. the
> pathname -> fid translation, and broadcast it to its cohorts, it's a
> huge win because it eliminates an O(N) operation.
> paul
clear enough, but what is the bottleneck here: the MDS handling lots of
RPCs, or the network passing them?
thanks, z
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 16:46 ` Paul Nowoczynski
@ 2010-10-20 17:00 ` Andreas Dilger
2010-10-20 17:13 ` Nicolas Williams
2010-10-20 17:01 ` Nicolas Williams
1 sibling, 1 reply; 35+ messages in thread
From: Andreas Dilger @ 2010-10-20 17:00 UTC (permalink / raw)
To: lustre-devel
On 2010-10-20, at 10:46, Paul Nowoczynski <pauln@psc.edu> wrote:
>
>> The name_to_handle() only needs to be called on a single node, and open_by_handle() is called on the other nodes. I agree that this doesn't avoid the full O(n) RPCs for the open itself but at least it does avoid the full path traversal from every client and on the MDS (replacing it with an MPI broadcast of the handle).
>
> excuse my ignorance, but why does open_by_handle() need to issue an RPC? If it's to obtain the layout, couldn't the layout be encoded into the 'handle'?
In theory, yes. Practically, there is a size limit on the handle, and in large filesystems the layout is larger than this limit.
Also, it depends on whether we want the MDS to have consistent behavior with the resulting open file descriptor or not.
I suppose in many cases it would be possible to fake out an open file on the client without telling the MDS, but then there will be strange problems in some cases (e.g. stat() of the file, errors on close, etc.) that would result since the MDS won't know anything about the other openers. Maybe that is acceptable, I don't know.
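
On the size limit, a rough sketch of the arithmetic. All three constants are assumptions for illustration: a 128-byte cap on the handle, a 32-byte layout header, and 24 bytes per stripe entry. The real limits depend on the kernel interface and the on-disk layout format.

```python
HANDLE_LIMIT = 128   # assumed maximum handle size, in bytes
LAYOUT_HEADER = 32   # assumed fixed part of a striping layout
PER_STRIPE = 24      # assumed size of one per-stripe entry


def layout_bytes(stripe_count: int) -> int:
    """Size of a layout describing `stripe_count` stripes."""
    return LAYOUT_HEADER + PER_STRIPE * stripe_count


def max_stripes_in_handle(reserved: int = 0) -> int:
    """Stripes that fit once `reserved` bytes go to the FID, capability, etc."""
    room = HANDLE_LIMIT - reserved - LAYOUT_HEADER
    return max(0, room // PER_STRIPE)


if __name__ == "__main__":
    print("stripes that fit in an otherwise empty handle:", max_stripes_in_handle())
    print("bytes needed for a 160-stripe layout:", layout_bytes(160))
```

Under these assumptions only a handful of stripes fit, while a widely striped file needs kilobytes, so embedding the layout can only work for narrow files or with a much larger handle.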
Cheers, Andreas
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 16:46 ` Paul Nowoczynski
2010-10-20 17:00 ` Andreas Dilger
@ 2010-10-20 17:01 ` Nicolas Williams
1 sibling, 0 replies; 35+ messages in thread
From: Nicolas Williams @ 2010-10-20 17:01 UTC (permalink / raw)
To: lustre-devel
On Wed, Oct 20, 2010 at 12:46:56PM -0400, Paul Nowoczynski wrote:
>
> > The name_to_handle() only needs to be called on a single node, and
> > open_by_handle() is called on the other nodes. I agree that this
> > doesn't avoid the full O(n) RPCs for the open itself but at least it
> > does avoid the full path traversal from every client and on the
> > MDS (replacing it with an MPI broadcast of the handle).
> Andreas,
> excuse my ignorance, but why does open_by_handle() need to issue an
> RPC? If it's to obtain the layout, couldn't the layout be encoded into
> the 'handle'?
If you don't mind having a huge handle, then yes, we could skip
additional RPCs.
A handle would have to consist of a {MGS address, FID, layout, access
type, capability}, or so.
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 16:49 ` bzzz.tomas at gmail.com
@ 2010-10-20 17:11 ` Paul Nowoczynski
2010-10-20 17:18 ` bzzz.tomas at gmail.com
0 siblings, 1 reply; 35+ messages in thread
From: Paul Nowoczynski @ 2010-10-20 17:11 UTC (permalink / raw)
To: lustre-devel
bzzz.tomas at gmail.com wrote:
> On 10/20/10 8:43 PM, Paul Nowoczynski wrote:
>
>> It's for scalability reasons. When N clients traverse the namespace with
>> the purpose of opening the same file the result is a storm of RPC
>> requests which bear down on the metadata server. This type of activity
>> becomes prohibitive especially when you start considering client counts
>> > 10^4. An operation such as this is ripe for optimization because
>> every client in the network is trying to build the same state. If you
>> have a method for a single client to 'learn' the final state, i.e. the
>> pathname -> fid translation, and broadcast it to its cohorts, it's a
>> huge win because it eliminates an O(N) operation.
>> paul
>>
>
> clear enough, but what is the bottleneck here: MDS to handle lots of
> RPCs or network to pass RPCs ?
I could be wrong but my guess is that the network congestion caused by
this communication pattern is a more serious problem. The mds should be
able to easily service lookup rpc's since only the first few necessitate
a read I/O from the disk.
> thanks, z
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 17:00 ` Andreas Dilger
@ 2010-10-20 17:13 ` Nicolas Williams
2010-10-20 17:30 ` Andreas Dilger
0 siblings, 1 reply; 35+ messages in thread
From: Nicolas Williams @ 2010-10-20 17:13 UTC (permalink / raw)
To: lustre-devel
On Wed, Oct 20, 2010 at 11:00:53AM -0600, Andreas Dilger wrote:
> On 2010-10-20, at 10:46, Paul Nowoczynski <pauln@psc.edu> wrote:
> >> The name_to_handle() only needs to be called on a single node, and
> >> open_by_handle() is called on the other nodes. I agree that this
> >> doesn't avoid the full O(n) RPCs for the open itself but at least
> >> it does avoid the full path traversal from every client and on the
> >> MDS (replacing it with an MPI broadcast of the handle).
> >
> > excuse my ignorance, but why does open_by_handle() need to issue an
> > RPC? If it's to obtain the layout, couldn't the layout be encoded
> > into the 'handle'?
>
> In theory, yes. Practically, there is a size limit on the handle, and
> in large filesystems the layout is larger than this limit.
>
> Also, it depends on whether we want the MDS to have consistent
> behavior with the resulting open file descriptor or not.
>
> I suppose in many cases it would be possible to fake out an open file
> on the client without telling the MDS, but then there will be strange
> problems in some cases (e.g. stat() of the file, errors on close,
> etc.) that would result since the MDS won't know anything about the
> other openers. Maybe that is acceptable, I don't know.
Well, if we're going to add openg() (or whatever its name ends up being),
we might as well add variants of stat() that don't require getting the
size when the app doesn't need it, and forget about SOM, or forget about
SOM when we know that a file might be open by unknown clients (recovery
issues here).
Another possibility is that the handle encodes the current size, and
that to write past that size requires an RPC to establish open state,
but this ignores truncation.
Another possibility is to say that a handle is only good as long as the
original file descriptor remains open (recovery issues here), and that
client can tell the MDS that it will be sharing its handle with other
clients. Or that client could tell the MDS what all the clients are
that will share that handle (recovery issues here too).
Some sort of additional RPC seems hard to avoid here, but maybe it could
be async for clients opening by handle.
Nico
--
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 17:11 ` Paul Nowoczynski
@ 2010-10-20 17:18 ` bzzz.tomas at gmail.com
2010-10-20 17:25 ` Paul Nowoczynski
` (2 more replies)
0 siblings, 3 replies; 35+ messages in thread
From: bzzz.tomas at gmail.com @ 2010-10-20 17:18 UTC (permalink / raw)
To: lustre-devel
On 10/20/10 9:11 PM, Paul Nowoczynski wrote:
> I could be wrong but my guess is that the network congestion caused by
> this communication pattern is a more serious problem. The mds should be
> able to easily service lookup rpc's since only the first few necessitate
> a read I/O from the disk.
but then shouldn't the network be able to deal with a storm of
<max RPCs in flight> * <# clients> to read/write data?
or is it a specific switch being the bottleneck to a specific node?
because if it isn't the network, but the MDS being the real bottleneck,
then a proxy might be a solution, like Eric said above. not sure
whether this is important in your case, but this would allow using
existing apps.
of course, a distribution tree for a handle may scale better.
thanks, z
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 17:18 ` bzzz.tomas at gmail.com
@ 2010-10-20 17:25 ` Paul Nowoczynski
2010-10-20 17:27 ` Andreas Dilger
2010-10-20 17:29 ` Nicolas Williams
2 siblings, 0 replies; 35+ messages in thread
From: Paul Nowoczynski @ 2010-10-20 17:25 UTC (permalink / raw)
To: lustre-devel
have a look at this, it explains the type of problem networks have in
dealing with these communication patterns.
http://www.pdl.cmu.edu/Incast/
and yes, a proxy is a workable solution, and probably the most
well-rounded. The disadvantage is that it would presumably require more
engineering to deploy.
p
bzzz.tomas at gmail.com wrote:
> On 10/20/10 9:11 PM, Paul Nowoczynski wrote:
>
>> I could be wrong but my guess is that the network congestion caused by
>> this communication pattern is a more serious problem. The mds should be
>> able to easily service lookup rpc's since only the first few necessitate
>> a read I/O from the disk.
>>
>
> but then the network should be able to deal with storm of
> <max RPC in-flight> * <# clients> to read/write data?
>
> or it's a specific switch being the bottleneck to specific node?
>
> because if it isn't network, but MDS being a real bottleneck,
> then proxy might be a solution like Eric said above. not sure
> is this important in your case, but this would allow to use
> existing apps.
>
> of course, distribution tree for a handle may scale better.
>
> thanks, z
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 17:18 ` bzzz.tomas at gmail.com
2010-10-20 17:25 ` Paul Nowoczynski
@ 2010-10-20 17:27 ` Andreas Dilger
2010-10-20 17:29 ` Nicolas Williams
2 siblings, 0 replies; 35+ messages in thread
From: Andreas Dilger @ 2010-10-20 17:27 UTC (permalink / raw)
To: lustre-devel
On 2010-10-20, at 11:18, bzzz.tomas at gmail.com wrote:
> On 10/20/10 9:11 PM, Paul Nowoczynski wrote:
>> I could be wrong but my guess is that the network congestion caused by
>> this communication pattern is a more serious problem. The mds should be
>> able to easily service lookup rpc's since only the first few necessitate
>> a read I/O from the disk.
>
> but then the network should be able to deal with storm of
> <max RPC in-flight> * <# clients> to read/write data?
>
> or it's a specific switch being the bottleneck to specific node?
I think there is definitely non-trivial overhead from the MDS threads descending into the filesystem to do path lookup and permission checking that would be avoided.
> because if it isn't network, but MDS being a real bottleneck,
> then proxy might be a solution like Eric said above. not sure
> is this important in your case, but this would allow to use
> existing apps.
>
> of course, distribution tree for a handle may scale better.
I don't think the actual distribution of the handle is a significant factor (this can be done via efficient broadcast in MPI layer). If we want to keep the MDS state consistent with N openers of the file then that may take more effort. However, I also just thought of a partial solution to the MDS state issue - if the original client doing name_to_handle() also gets the MDS open lock, then it can somewhat act as "proxy" for the remaining clients that are opening via handle.
The MDS will know that the client with the MDS open lock may be doing other opens, and if the handle also contains the layout as Paul proposed, then it seems possible to get at least a reasonable representation of the file on each client w/o having an additional MDS RPC from each one. Those clients may still have issues if contacting the MDS for that file, but maybe not.
Actually implementing this is left as an exercise for the reader...
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 17:18 ` bzzz.tomas at gmail.com
2010-10-20 17:25 ` Paul Nowoczynski
2010-10-20 17:27 ` Andreas Dilger
@ 2010-10-20 17:29 ` Nicolas Williams
2010-10-20 17:40 ` bzzz.tomas at gmail.com
2 siblings, 1 reply; 35+ messages in thread
From: Nicolas Williams @ 2010-10-20 17:29 UTC (permalink / raw)
To: lustre-devel
On Wed, Oct 20, 2010 at 09:18:59PM +0400, bzzz.tomas at gmail.com wrote:
> On 10/20/10 9:11 PM, Paul Nowoczynski wrote:
> > I could be wrong but my guess is that the network congestion caused by
> > this communication pattern is a more serious problem. The mds should be
> > able to easily service lookup rpc's since only the first few necessitate
> > a read I/O from the disk.
>
> but then the network should be able to deal with storm of
> <max RPC in-flight> * <# clients> to read/write data?
>
> or it's a specific switch being the bottleneck to specific node?
>
> because if it isn't network, but MDS being a real bottleneck,
> then proxy might be a solution like Eric said above. not sure
> is this important in your case, but this would allow to use
> existing apps.
MDSes are typically CPU bound, so that's likely the issue. The problem
though is that the MDS does need to track open file state for SOM and
for dealing with unlinks. The semantics of open-by-handle might be such
that unlinks of files opened by handle can cause the file to disappear
and syscalls on FDs opened by handle could then return EBADF or EIO or
some new error code. But if open-by-handle semantics don't allow for that,
then the MDS needs to track open file state, and it's hard to see how to
avoid RPCs to the MDS to establish that state (the original client could
tell the MDS about all the clients that will open-by-handle, but this
seems unlikely to perform so much better than N smaller RPCs as to
justify it, and the open-by-handle API suddenly gets much more complex).
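The need for the MDS to know about every opener can be illustrated with a toy model of unlink-while-open. The types and functions below are hypothetical and only mimic the usual POSIX rule that an unlinked file's object survives until its last opener closes it; they are not Lustre code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-file state that an MDS-like server would keep. */
struct toy_file_state {
    unsigned int open_count; /* openers the server knows about */
    bool unlinked;           /* the last name has been removed */
};

/* Record an open. Fails once the name is gone. Returns 0 or -1. */
int toy_open(struct toy_file_state *f)
{
    if (f->unlinked)
        return -1;
    f->open_count++;
    return 0;
}

/* Remove the name. Returns true if the object can be destroyed now. */
bool toy_unlink(struct toy_file_state *f)
{
    f->unlinked = true;
    return f->open_count == 0;
}

/* Drop one opener. Returns true if this close released the last
 * reference to an already unlinked file. */
bool toy_close(struct toy_file_state *f)
{
    assert(f->open_count > 0);
    f->open_count--;
    return f->unlinked && f->open_count == 0;
}
```

If a client opened by handle without the server ever counting it, toy_unlink() would report that the object can be destroyed while that client still holds a descriptor, which is the EBADF/EIO scenario described above.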
Nico
--
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 17:13 ` Nicolas Williams
@ 2010-10-20 17:30 ` Andreas Dilger
0 siblings, 0 replies; 35+ messages in thread
From: Andreas Dilger @ 2010-10-20 17:30 UTC (permalink / raw)
To: lustre-devel
On 2010-10-20, at 11:13, Nicolas Williams wrote:
> Well, if we're going to add openg() (or whatever its name), we might as
> well add variants of stat() that don't require getting the size when the
> app doesn't need it
That is "stat_lite" (or various different names), and was also under discussion for adding to the Linux kernel, until it turned from being a sensible API to a Linux-designed-by-committee API from hell (IMHO, of course) and has stopped dead in its tracks.
> Another possibility is to say that a handle is only good as long as the
> original file descriptor remains open (recovery issues here), and that
> client can tell the MDS that it will be sharing its handle with other
> clients.
That is partly what the MDS open lock does. It was intended for NFS servers to allow them to open and close a file locally for its clients w/o MDS RPCs.
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 17:29 ` Nicolas Williams
@ 2010-10-20 17:40 ` bzzz.tomas at gmail.com
2010-10-20 18:01 ` Andreas Dilger
0 siblings, 1 reply; 35+ messages in thread
From: bzzz.tomas at gmail.com @ 2010-10-20 17:40 UTC (permalink / raw)
To: lustre-devel
On 10/20/10 9:29 PM, Nicolas Williams wrote:
> MDSes are typically CPU bound, so that's likely the issue. The problem
> though is that the MDS does need to track open file state for SOM and
> for dealing with unlinks. The semantics of open-by-handle might be such
> that unlinks of files opened by handle can cause the file to disappear
> and syscalls on FDs opened by handle could then return EBADF or EIO or
> some new error code. But open-by-handle semantics don't allow for that,
> then the MDS needs to track open file state, and it's hard to see how to
> avoid RPCs to the MDS to establish that state (the original client could
> tell the MDS about all the clients that will open-by-handle, but this
> seems unlikely to perform so much better than N smaller RPCs as to
> justify it, and the open-by-handle API suddenly gets much more complex).
I guess for this purpose they may just disable SOM and take a few steps away
from POSIX. Probably inter-client data consistency isn't that important
any more ;) Then get rid of the MDS and namespace completely using some sort
of FID.
thanks, z
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 17:40 ` bzzz.tomas at gmail.com
@ 2010-10-20 18:01 ` Andreas Dilger
2010-10-20 18:09 ` bzzz.tomas at gmail.com
0 siblings, 1 reply; 35+ messages in thread
From: Andreas Dilger @ 2010-10-20 18:01 UTC (permalink / raw)
To: lustre-devel
On 2010-10-20, at 11:40, bzzz.tomas at gmail.com wrote:
> On 10/20/10 9:29 PM, Nicolas Williams wrote:
>> MDSes are typically CPU bound, so that's likely the issue. The problem
>> though is that the MDS does need to track open file state for SOM and
>> for dealing with unlinks. The semantics of open-by-handle might be such
>> that unlinks of files opened by handle can cause the file to disappear
>> and syscalls on FDs opened by handle could then return EBADF or EIO or
>> some new error code. But open-by-handle semantics don't allow for that,
>> then the MDS needs to track open file state, and it's hard to see how to
>> avoid RPCs to the MDS to establish that state (the original client could
>> tell the MDS about all the clients that will open-by-handle, but this
>> seems unlikely to perform so much better than N smaller RPCs as to
>> justify it, and the open-by-handle API suddenly gets much more complex).
>
> I guess for this purpose they may just disable SOM and do few steps away
> from POSIX. probably inter-client data consistency isn't that important
> any more ;) then get rid of MDS and namespace completely using some sort
> of FID.
I don't think that most customers want to drop POSIX and namespaces completely, because of the huge numbers of tools/apps that depend on this, but rather to have an API that can improve the performance of select applications that have a need for it.
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 18:01 ` Andreas Dilger
@ 2010-10-20 18:09 ` bzzz.tomas at gmail.com
0 siblings, 0 replies; 35+ messages in thread
From: bzzz.tomas at gmail.com @ 2010-10-20 18:09 UTC (permalink / raw)
To: lustre-devel
On 10/20/10 10:01 PM, Andreas Dilger wrote:
>> I guess for this purpose they may just disable SOM and do few steps away
>> from POSIX. probably inter-client data consistency isn't that important
>> any more ;) then get rid of MDS and namespace completely using some sort
>> of FID.
>
> I don't think that most customers want to drop POSIX and namespaces completely, because of the huge numbers of tools/apps that depend on this, but rather to have an API that can improve the performance of select applications that have a need for it.
Oh, sorry for this sort of joke. What I mean is that
we could probably provide another user-visible
API that allows bypassing the regular namespace, for example.
thanks, z
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Lustre-devel] Queries regarding LDLM_ENQUEUE
2010-10-20 7:55 ` Andreas Dilger
2010-10-20 8:11 ` bzzz.tomas at gmail.com
@ 2010-10-22 2:33 ` Vilobh Meshram
1 sibling, 0 replies; 35+ messages in thread
From: Vilobh Meshram @ 2010-10-22 2:33 UTC (permalink / raw)
To: lustre-devel
Thanks Andreas for the e-mail.
I am trying to modify the LDLM_ENQUEUE RPC so that the reply carries
a buffer (say the string "Hello World") filled in by the MDS. I
explained the use case in my last e-mail; please refer to my e-mail sent on
10/19.
I have attached the diff files. I am getting a kernel panic at the MDS end
when I make the attached changes. Can someone please suggest where
I might be going wrong?
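One thing worth checking in the attached ldlm_lockd.c change: the pointer returned by lustre_msg_buf() for the extra buffer is handed to memcpy() without a NULL check, whereas the neighbouring code tests the results of lustre_swab_reqbuf() and lustre_swab_repbuf() before using them. If the reply was packed with fewer buffers than the handler expects (for example by a different code path), such a lookup could come back empty. This is a guess, not a diagnosis. The sketch below uses a made-up toy_msg type and toy_* helpers, not the Lustre API, to show the defensive pattern.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_BUFS 8

/* Hypothetical stand-in for a multi-buffer reply message. */
struct toy_msg {
    unsigned int bufcount;              /* buffers actually packed */
    unsigned int buflens[TOY_MAX_BUFS]; /* length of each buffer */
    char *bufs[TOY_MAX_BUFS];           /* start of each buffer */
};

/* Return buffer n if it exists and holds at least min_size bytes,
 * otherwise NULL. */
char *toy_msg_buf(struct toy_msg *m, unsigned int n, unsigned int min_size)
{
    if (m == NULL || n >= m->bufcount || n >= TOY_MAX_BUFS)
        return NULL;
    if (m->buflens[n] < min_size)
        return NULL;
    return m->bufs[n];
}

/* Copy len bytes into buffer n. Returns 0 on success, -1 if the reply
 * was packed without a large enough buffer at that index. */
int toy_fill_reply(struct toy_msg *m, unsigned int n,
                   const char *src, unsigned int len)
{
    char *dst = toy_msg_buf(m, n, len);

    if (dst == NULL)
        return -1; /* bail out instead of copying to a NULL pointer */
    memcpy(dst, src, len);
    return 0;
}
```

In this model, asking for buffer index 3 in a reply that was packed with only three buffers returns -1 rather than dereferencing a bad pointer. The same kind of check around the new buffer would at least turn a crash into an error that can be logged.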
Thanks,
Vilobh
*Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio*
On Wed, Oct 20, 2010 at 3:55 AM, Andreas Dilger
<andreas.dilger@oracle.com> wrote:
> On 2010-10-19, at 20:04, Vilobh Meshram wrote:
> > We are trying to do following things.Please let me know if things are not
> clear :-
> >
> > Say we have 2 client C1 and C2 and a MDS .Say C1 and C2 share a file.
> > 1) When a client C1 performs a open/create kind of request to the MDS we
> want to follow the normal path which Lustre performs.
> > 2) Now say C2 tries to open the same file which was opened by C1.
> > 3) At the MDS end we maintain some data structure to scan and see if the
> file was already opened by some Client(in this case C1 has opened this
> file).
> > 4) If MDS finds that some client(C1 here) has already opened the file
> then it send the new client(C2 here) with some information about the client
> which has initially opened the file.
>
> While I understand the basic concept, I don't really see how your proposal
> will actually improve performance. If C2 already has to contact the MDS and
> get a reply from it, then wouldn't it be about the same to simply perform
> the open as is done today? The number of MDS RPCs is the same, and in fact
> this would avoid further message overhead between C1 and C2.
>
> > 5) Once C2 gets the information its upto C2 to take further actions.
> > 6) By this process we can save the time spent in the locking mechanism
> for C2.Basically we aim to by-pass the locking scheme of Lustre for the
> files already opened by some client by maintaining some kind of data
> structure.
> >
> > Please let us know your thoughts on the above approach.Is this a feasible
> design moving ahead can we see any complications ?
>
> There is a separate proposal that has been underway in the Linux community
> for some time, to allow a user process to get a file handle (i.e. binary
> blob returned from a new name_to_handle() syscall) from the kernel for a
> given pathname, and then later use that file handle in another process to
> open a file descriptor without re-traversing the path.
>
> I've been thinking this would be very useful for Lustre (and MPI in
> general), and have tried to steer the Linux development in a direction that
> would allow this to happen. Is this in line with what you are
> investigating?
>
> While this wouldn't eliminate the actual MDS open RPC (i.e. the
> LDLM_ENQUEUE you have been discussing), it could avoid the path traversal
> from each client, possibly saving {path_elements * num_clients} additional
> RPCs,
>
> > So considering the problem statement I need a way for C2 to extract the
> information from the data structure maintained at MDS.In order to do that ,
> C2 will send a request with intent = create|open which will be a
> LDLM_ENQUEUE RPC.I need to modify this RPC such that :-
> > 1) I can enclose some additional buffer whose size is known to me .
> > 2) When we pack the reply at the MDS side we should be able to include
> this buffer in the reply message .
> > 3) At the client side we should be able to extract the information from
> the reply message about the buffer.
> >
> > As of now , I need help in above three steps.
> >
> > Thanks,
> > Vilobh
> > Graduate Research Associate
> > Department of Computer Science
> > The Ohio State University Columbus Ohio
> >
> >
> > On Tue, Oct 19, 2010 at 6:53 PM, Andreas Dilger <
> andreas.dilger at oracle.com> wrote:
> > On 2010-10-19, at 14:28, Vilobh Meshram wrote:
> > > From my exploration it seems like for create/open kind of request
> LDLM_ENQUEUE is the RPC through which the client talks to MDS.Please confirm
> on this.
> > >
> > > Since I could figure out that LDLM_ENQUEUE is the only RPC to interface
> with MDS I am planning to send the LDLM_ENQUEUE RPC with some additonal
> buffer from the client to the MDS so that based on some specific condition
> the MDS can fill the information in the buffer sent from the client.
> >
> > This isn't correct. LDLM_ENQUEUE is used for enqueueing locks. It just
> happens that when Lustre wants to create a new file it enqueues a lock on
> the parent directory with the "intent" to create a new file. The MDS
> currently always replies "you cannot have the lock for the directory, I
> created the requested file for you". Similarly, when the client is getting
> attributes on a file, it needs a lock on that file in order to cache the
> attributes, and to save RPCs the attributes are returned with the lock.
> >
> > > I have made some modifications to the code for the LDLM_ENQUEUE RPC but
> I am getting kernel panics.Can someone please help me and suggest me what is
> a good way to tackle this problem.I am using Lustre 1.8.1.1 and I cannot
> upgrade to Lustre 2.0.
> >
> > It would REALLY be a lot easier to have this discussion with you if you
> actually told us what it is you are working on. Not only could we focus on
> the higher-level issue that you are trying to solve (instead of possibly
> wasting a lot of time focussing in a small issue that may in fact be
> completely irrelevant), but with many ideas related to Lustre it has
> probably already been discussed at length by the Lustre developers sometime
> over the past 8 years that we've been working on it. I suspect that the
> readership of this list could probably give you a lot of assistance with
> whatever you are working on, if you will only tell us what it actually is
> you are trying to do.
> >
> > > On Mon, Oct 18, 2010 at 7:33 PM, Vilobh Meshram <
> vilobh.meshram at gmail.com> wrote:
> > >> Out of the many RPC's used in Lustre seems like LDLM_ENQUEUE is the
> most frequently used RPC to communicate between the client and the MDS.I
> have few queries regarding the same :-
> > >>
> > >> 1) Is LDLM_ENQUEUE the only interface(RPC here) for CREATE/OPEN kind
> of request ; through which the client can interact with the MDS ?
> > >>
> > >> I tried couple of experiments and found out that LDLM_ENQUEUE comes
> into picture while mounting the FS as well as when we do a lookup,create or
> open a file.I was expecting the MDS_REINT RPC to get invoked in case of a
> CREATE/OPEN request via mdc_create() but it seems like Lustre invokes
> LDLM_ENQEUE even for CREATE/OPEN( by packing the intent related data).
> > >> Please correct me if I am wrong.
> > >>
> > >> 2) In which cases (which system calls) does the MDS_REINT RPC will get
> invoked ?
> >
> >
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Lustre Technical Lead
> > Oracle Corporation Canada Inc.
> >
> >
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.
>
>
-------------- next part --------------
*** ./lustre/ldlm/ldlm_lockd.c 2010-10-21 17:49:05.000000000 -0400
--- ../fresh/lustre/ldlm/ldlm_lockd.c 2010-10-15 15:37:02.000000000 -0400
*************** int ldlm_handle_enqueue(struct ptlrpc_re
*** 997,1017 ****
struct obd_device *obddev = req->rq_export->exp_obd;
struct ldlm_reply *dlm_rep;
struct ldlm_request *dlm_req;
! __u32 size[4] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
[DLM_LOCKREPLY_OFF] = sizeof(*dlm_rep) };
int rc = 0;
__u32 flags;
ldlm_error_t err = ELDLM_OK;
struct ldlm_lock *lock = NULL;
void *cookie = NULL;
- int i;
- char *str = "Hello World Sun";
- char *str_target;
ENTRY;
LDLM_DEBUG_NOLOCK("server-side enqueue handler START");
! printk("\n Inside function %s server-side enqueue handler START",__func__);
! for(i=0;i<3;i++) printk("\n Inside function %s size[%d]:%d",__func__,i,size[i]);
dlm_req = lustre_swab_reqbuf(req, DLM_LOCKREQ_OFF, sizeof(*dlm_req),
lustre_swab_ldlm_request);
if (dlm_req == NULL) {
--- 997,1013 ----
struct obd_device *obddev = req->rq_export->exp_obd;
struct ldlm_reply *dlm_rep;
struct ldlm_request *dlm_req;
! __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
[DLM_LOCKREPLY_OFF] = sizeof(*dlm_rep) };
int rc = 0;
__u32 flags;
ldlm_error_t err = ELDLM_OK;
struct ldlm_lock *lock = NULL;
void *cookie = NULL;
ENTRY;
LDLM_DEBUG_NOLOCK("server-side enqueue handler START");
!
dlm_req = lustre_swab_reqbuf(req, DLM_LOCKREQ_OFF, sizeof(*dlm_req),
lustre_swab_ldlm_request);
if (dlm_req == NULL) {
*************** existing_lock:
*** 1126,1148 ****
int buffers = 2;
lock_res_and_lock(lock);
- printk("\n Exsisting lock lock->l_resource->lr_lvb_len:%u",lock->l_resource->lr_lvb_len);
if (lock->l_resource->lr_lvb_len) {
- printk("\n Inside function %s , inside condition lock->l_resource->lr_lvb_len so buffers=3",__func__);
size[DLM_REPLY_REC_OFF] = lock->l_resource->lr_lvb_len;
buffers = 3;
}
- //size[DLM_REPLY_REC_OFF] = 16;
- //buffer = buffer + 1;
- if(lock->l_resource->lr_lvb_len == 0)
- {
- buffers++;
- size[DLM_REPLY_REC_OFF] = 0;
- }
- buffers++;
- size[DLM_REPLY_REC_OFF+1] = 16;
unlock_res_and_lock(lock);
! printk("\n Inside function %s , outside condition lock->l_resource->lr_lvb_len so buffers=2",__func__);
if (OBD_FAIL_CHECK_ONCE(OBD_FAIL_LDLM_ENQUEUE_EXTENT_ERR))
GOTO(out, rc = -ENOMEM);
--- 1122,1133 ----
int buffers = 2;
lock_res_and_lock(lock);
if (lock->l_resource->lr_lvb_len) {
size[DLM_REPLY_REC_OFF] = lock->l_resource->lr_lvb_len;
buffers = 3;
}
unlock_res_and_lock(lock);
!
if (OBD_FAIL_CHECK_ONCE(OBD_FAIL_LDLM_ENQUEUE_EXTENT_ERR))
GOTO(out, rc = -ENOMEM);
*************** existing_lock:
*** 1156,1164 ****
if (dlm_req->lock_desc.l_resource.lr_type == LDLM_EXTENT)
lock->l_req_extent = lock->l_policy_data.l_extent;
- printk("%s: \twill do lock-enq...\n", __func__);
err = ldlm_lock_enqueue(obddev->obd_namespace, &lock, cookie, (int *)&flags);
- printk("%s: \tafter lock-enq...\n", __func__);
if (err)
GOTO(out, err);
--- 1141,1147 ----
*************** existing_lock:
*** 1178,1185 ****
dlm_rep->lock_flags |= dlm_req->lock_flags & LDLM_INHERIT_FLAGS;
lock->l_flags |= dlm_req->lock_flags & LDLM_INHERIT_FLAGS;
- str_target = lustre_msg_buf(req->rq_repmsg, DLM_REPLY_REC_OFF+1,16);
- memcpy(str_target,str,16);
/* Don't move a pending lock onto the export if it has already
* been evicted. Cancel it now instead. (bug 5683) */
if (req->rq_export->exp_failed ||
--- 1161,1166 ----
*************** existing_lock:
*** 1232,1238 ****
EXIT;
out:
- printk("\n [VM] Inside function %s got a hit at out",__func__);
req->rq_status = rc ?: err; /* return either error - bug 11190 */
if (!req->rq_packed_final) {
err = lustre_pack_reply(req, 1, NULL, NULL);
--- 1213,1218 ----
*************** existing_lock:
*** 1248,1257 ****
if (rc == 0) {
lock_res_and_lock(lock);
- printk("\n Inside function %s , inside if condition rc=0 the place where we do a memcpy for offset = DLM_REPLY_REC_OFF",__func__);
size[DLM_REPLY_REC_OFF] = lock->l_resource->lr_lvb_len;
- printk("\n Inside function %s , size[DLM_REPLY_REC_OFF] : %u , lock->l_resource->lr_lvb_len :%u",__func__,size[DLM_REPLY_REC_OFF],lock->l_resource->lr_lvb_len);
- size[DLM_REPLY_REC_OFF+1]= 16;
if (size[DLM_REPLY_REC_OFF] > 0) {
void *lvb = lustre_msg_buf(req->rq_repmsg,
DLM_REPLY_REC_OFF,
--- 1228,1234 ----
*************** existing_lock:
*** 1264,1270 ****
}
unlock_res_and_lock(lock);
} else {
- printk("\n Inside function %s , inside else condition rc=0 the place where we do a memcpy for offset = DLM_REPLY_REC_OFF",__func__);
lock_res_and_lock(lock);
ldlm_resource_unlink_lock(lock);
ldlm_lock_destroy_nolock(lock);
--- 1241,1246 ----
-------------- next part --------------
*** ./lustre/ldlm/ldlm_request.c 2010-10-21 22:26:28.000000000 -0400
--- ../fresh/lustre/ldlm/ldlm_request.c 2010-10-15 15:37:02.000000000 -0400
*************** int ldlm_cli_enqueue_fini(struct obd_exp
*** 389,395 ****
int cleanup_phase = 1;
ENTRY;
- printk("\n Inside function %s",__func__);
lock = ldlm_handle2lock(lockh);
/* ldlm_cli_enqueue is holding a reference on this lock. */
if (!lock) {
--- 389,394 ----
*************** int ldlm_cli_enqueue_fini(struct obd_exp
*** 401,407 ****
LASSERT(!is_replay);
LDLM_DEBUG(lock, "client-side enqueue END (%s)",
rc == ELDLM_LOCK_ABORTED ? "ABORTED" : "FAILED");
- printk("\n Inside %s if client lock aborted or failed",__func__);
if (rc == ELDLM_LOCK_ABORTED) {
/* Before we return, swab the reply */
reply = lustre_swab_repbuf(req, DLM_LOCKREPLY_OFF,
--- 400,405 ----
*************** int ldlm_cli_enqueue_fini(struct obd_exp
*** 433,440 ****
GOTO(cleanup, rc = -EPROTO);
}
- printk("\n Inside function %s we have received a reply",__func__);
-
/* lock enqueued on the server */
cleanup_phase = 0;
--- 431,436 ----
*************** int ldlm_cli_enqueue_fini(struct obd_exp
*** 463,469 ****
* again. */
if ((*flags) & LDLM_FL_LOCK_CHANGED) {
int newmode = reply->lock_desc.l_req_mode;
- printk("\n Inside function %s in condition (*flags) & LDLM_FL_LOCK_CHANGED)",__func__);
LASSERT(!is_replay);
if (newmode && newmode != lock->l_req_mode) {
LDLM_DEBUG(lock, "server returned different mode %s",
--- 459,464 ----
*************** int ldlm_cli_enqueue_fini(struct obd_exp
*** 504,510 ****
* because it cannot handle asynchronous ASTs robustly (see
* bug 7311). */
(LIBLUSTRE_CLIENT && type == LDLM_EXTENT)) {
- printk("\n Inside function %s in condition ((*flags) & LDLM_FL_AST_SENT ||(LIBLUSTRE_CLIENT && type == LDLM_EXTENT))",__func__);
lock_res_and_lock(lock);
lock->l_flags |= LDLM_FL_CBPENDING | LDLM_FL_BL_AST;
unlock_res_and_lock(lock);
--- 499,504 ----
*************** int ldlm_cli_enqueue_fini(struct obd_exp
*** 515,521 ****
* clobber the LVB with an older one. */
if (lvb_len && (lock->l_req_mode != lock->l_granted_mode)) {
void *tmplvb;
- printk("\n Inside function %s in condition lvb_len && (lock->l_req_mode != lock->l_granted_mode) , lvb_len:%d",__func__,lvb_len);
tmplvb = lustre_swab_repbuf(req, DLM_REPLY_REC_OFF, lvb_len,
lvb_swabber);
if (tmplvb == NULL)
--- 509,514 ----
*************** int ldlm_cli_enqueue_fini(struct obd_exp
*** 524,530 ****
}
if (!is_replay) {
- printk("\n Inside function %s in condition !is_replay",__func__);
rc = ldlm_lock_enqueue(ns, &lock, NULL, flags);
if (lock->l_completion_ast != NULL) {
int err = lock->l_completion_ast(lock, *flags, NULL);
--- 517,522 ----
*************** int ldlm_cli_enqueue_fini(struct obd_exp
*** 536,542 ****
}
if (lvb_len && lvb != NULL) {
- printk("\n Inside function %s in condition lvb_len && lvb != NULL",__func__);
/* Copy the LVB here, and not earlier, because the completion
* AST (if any) can override what we got in the reply */
memcpy(lvb, lock->l_lvb_data, lvb_len);
--- 528,533 ----
*************** static inline int ldlm_req_handles_avail
*** 560,578 ****
__u32 *size, int bufcount, int off)
{
int avail = min_t(int, LDLM_MAXREQSIZE, CFS_PAGE_SIZE - 512);
! printk("\n Inside function %s",__func__);
! printk("\n avail--before = %d",avail);
avail -= lustre_msg_size(class_exp2cliimp(exp)->imp_msg_magic,
bufcount, size);
! printk("\n avail--after = %d",avail);
! if (likely(avail >= 0)){
avail /= (int)sizeof(struct lustre_handle);
- printk("\n avail--likely = %d",avail);
- }
else
avail = 0;
avail += LDLM_LOCKREQ_HANDLES - off;
! printk("\n avail--lats = %d",avail);
return avail;
}
--- 551,565 ----
__u32 *size, int bufcount, int off)
{
int avail = min_t(int, LDLM_MAXREQSIZE, CFS_PAGE_SIZE - 512);
!
avail -= lustre_msg_size(class_exp2cliimp(exp)->imp_msg_magic,
bufcount, size);
! if (likely(avail >= 0))
avail /= (int)sizeof(struct lustre_handle);
else
avail = 0;
avail += LDLM_LOCKREQ_HANDLES - off;
!
return avail;
}
*************** struct ptlrpc_request *ldlm_prep_elc_req
*** 597,622 ****
CFS_LIST_HEAD(head);
ENTRY;
- printk("\n Inside function %s, opc=%d",__func__, opc);
if (cancels == NULL)
cancels = &head;
if (exp_connect_cancelset(exp)) {
/* Estimate the amount of free space in the request. */
- printk("\n Inside exp_connect_cancelset(exp) in func %s",__func__);
LASSERT(bufoff < bufcount);
avail = ldlm_req_handles_avail(exp, size, bufcount, canceloff);
- printk("\n In function %s avail = %d",__func__,avail);
flags = ns_connect_lru_resize(ns) ?
LDLM_CANCEL_LRUR : LDLM_CANCEL_AGED;
- printk("\n In function %s ns_connect_lru_resize(ns) :%d",__func__,ns_connect_lru_resize(ns));
to_free = !ns_connect_lru_resize(ns) &&
opc == LDLM_ENQUEUE ? 1 : 0;
/* Cancel lru locks here _only_ if the server supports
* EARLY_CANCEL. Otherwise we have to send extra CANCEL
* rpc, what will make us slower. */
- printk("\n In function %s count = %d",__func__,count);
if (avail > count)
count += ldlm_cancel_lru_local(ns, cancels, to_free,
avail - count, 0, flags);
--- 584,604 ----
*************** struct ptlrpc_request *ldlm_prep_elc_req
*** 624,632 ****
pack = count;
else
pack = avail;
- printk("\n In function %s pack = %d",__func__,pack);
size[bufoff] = ldlm_request_bufsize(pack, opc);
- printk("\n In function %s , bufoff : %d , size[bufoff]= %u",__func__,bufoff,size[bufoff]);
}
req = ptlrpc_prep_req(class_exp2cliimp(exp), version,
--- 606,612 ----
*************** struct ptlrpc_request *ldlm_prep_enqueue
*** 657,663 ****
struct list_head *cancels,
int count)
{
- printk("\n Inside function %s \n",__func__);
return ldlm_prep_elc_req(exp, LUSTRE_DLM_VERSION, LDLM_ENQUEUE,
bufcount, size, DLM_LOCKREQ_OFF,
LDLM_ENQUEUE_CANCEL_OFF, cancels, count);
--- 637,642 ----
*************** int ldlm_cli_enqueue(struct obd_export *
*** 679,697 ****
struct ldlm_lock *lock;
struct ldlm_request *body;
struct ldlm_reply *reply;
! __u32 size[4] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
[DLM_LOCKREQ_OFF] = sizeof(*body),
[DLM_REPLY_REC_OFF] = lvb_len ? lvb_len :
! sizeof(struct ost_lvb),
! [DLM_REPLY_REC_OFF+1] = 16};
int is_replay = *flags & LDLM_FL_REPLAY;
int req_passed_in = 1, rc, err;
struct ptlrpc_request *req;
- int i;
ENTRY;
- printk("\n Inside function %s \n",__func__);
- for(i=0;i<4;i++) printk("\n size[%d] : %d",i,size[i]);
LASSERT(exp != NULL);
/* If we're replaying this lock, just check some invariants.
--- 658,672 ----
struct ldlm_lock *lock;
struct ldlm_request *body;
struct ldlm_reply *reply;
! __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
[DLM_LOCKREQ_OFF] = sizeof(*body),
[DLM_REPLY_REC_OFF] = lvb_len ? lvb_len :
! sizeof(struct ost_lvb) };
int is_replay = *flags & LDLM_FL_REPLAY;
int req_passed_in = 1, rc, err;
struct ptlrpc_request *req;
ENTRY;
LASSERT(exp != NULL);
/* If we're replaying this lock, just check some invariants.
*************** int ldlm_cli_enqueue(struct obd_export *
*** 700,706 ****
lock = ldlm_handle2lock(lockh);
LASSERT(lock != NULL);
LDLM_DEBUG(lock, "client-side enqueue START");
- printk("\n Client-side enqueue START in %s",__func__);
LASSERT(exp == lock->l_conn_export);
} else {
lock = ldlm_lock_create(ns, res_id, einfo->ei_type,
--- 675,680 ----
*************** int ldlm_cli_enqueue(struct obd_export *
*** 736,742 ****
/* lock not sent to server yet */
if (reqp == NULL || *reqp == NULL) {
! req = ldlm_prep_enqueue_req(exp,3, size, NULL, 0);
if (req == NULL) {
failed_lock_cleanup(ns, lock, lockh, einfo->ei_mode);
LDLM_LOCK_PUT(lock);
--- 710,716 ----
/* lock not sent to server yet */
if (reqp == NULL || *reqp == NULL) {
! req = ldlm_prep_enqueue_req(exp, 2, size, NULL, 0);
if (req == NULL) {
failed_lock_cleanup(ns, lock, lockh, einfo->ei_mode);
LDLM_LOCK_PUT(lock);
*************** int ldlm_cli_enqueue(struct obd_export *
*** 746,752 ****
if (reqp)
*reqp = req;
} else {
- printk("\n [VM]got a hit at case where reqp is not NULL in %s",__func__);
req = *reqp;
LASSERTF(lustre_msg_buflen(req->rq_reqmsg, DLM_LOCKREQ_OFF) >=
sizeof(*body), "buflen[%d] = %d, not %d\n",
--- 720,725 ----
*************** int ldlm_cli_enqueue(struct obd_export *
*** 768,774 ****
/* Continue as normal. */
if (!req_passed_in) {
size[DLM_LOCKREPLY_OFF] = sizeof(*reply);
! ptlrpc_req_set_repsize(req, 4, size);
}
/*
--- 741,747 ----
/* Continue as normal. */
if (!req_passed_in) {
size[DLM_LOCKREPLY_OFF] = sizeof(*reply);
! ptlrpc_req_set_repsize(req, 3, size);
}
/*
*************** int ldlm_cli_enqueue(struct obd_export *
*** 784,793 ****
RETURN(0);
}
- printk("\n in --func-- %s SENDING REQUEST",__func__);
LDLM_DEBUG(lock, "sending request");
rc = ptlrpc_queue_wait(req);
- printk("\n in --func-- %s REQUEST SENT after ptlrpc_queue_wait",__func__);
err = ldlm_cli_enqueue_fini(exp, req, einfo->ei_type, policy ? 1 : 0,
einfo->ei_mode, flags, lvb, lvb_len,
lvb_swabber, lockh, rc);
--- 757,764 ----
-------------- next part --------------
*** ./lustre/mdc/mdc_locks.c 2010-10-20 20:58:51.000000000 -0400
--- ../fresh/lustre/mdc/mdc_locks.c 2010-10-15 15:37:15.000000000 -0400
*************** static struct ptlrpc_request *mdc_intent
*** 252,264 ****
int repbufcount = 5;
int mode;
int rc;
- int i;
ENTRY;
- printk("\n Inside function %s",__func__);
- for(i=0;i<6;i++) printk("\n size[%d] : %d",i,size[i]);
- for(i=0;i<5;i++) printk("\n repsize[%d] : %d",i,repsize[i]);
-
it->it_create_mode = (it->it_create_mode & ~S_IFMT) | S_IFREG;
if (mdc_exp_is_2_0_server(exp)) {
size[DLM_INTENT_REC_OFF] = sizeof(struct mdt_rec_create);
--- 252,259 ----
*************** static struct ptlrpc_request *mdc_intent
*** 381,405 ****
struct ptlrpc_request *req;
struct ldlm_intent *lit;
struct obd_device *obddev = class_exp2obd(exp);
! __u32 size[] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
[DLM_LOCKREQ_OFF] = sizeof(struct ldlm_request),
[DLM_INTENT_IT_OFF] = sizeof(*lit),
[DLM_INTENT_REC_OFF] = sizeof(struct mdt_body),
[DLM_INTENT_REC_OFF+1]= data->namelen + 1,
! [DLM_INTENT_REC_OFF+2]= 0,
! [DLM_INTENT_REC_OFF+3]= 16 };
! __u32 repsize[] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
[DLM_LOCKREPLY_OFF] = sizeof(struct ldlm_reply),
[DLM_REPLY_REC_OFF] = sizeof(struct mdt_body),
[DLM_REPLY_REC_OFF+1] = obddev->u.cli.
cl_max_mds_easize,
[DLM_REPLY_REC_OFF+2] = LUSTRE_POSIX_ACL_MAX_SIZE,
! [DLM_REPLY_REC_OFF+3] = 0,
! [DLM_REPLY_REC_OFF+4] = 16 };
obd_valid valid = OBD_MD_FLGETATTR | OBD_MD_FLEASIZE | OBD_MD_FLACL |
OBD_MD_FLMODEASIZE | OBD_MD_FLDIREA;
! int bufcount = 6;
! int i=0;
ENTRY;
if (mdc_exp_is_2_0_server(exp)) {
--- 376,397 ----
struct ptlrpc_request *req;
struct ldlm_intent *lit;
struct obd_device *obddev = class_exp2obd(exp);
! __u32 size[6] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
[DLM_LOCKREQ_OFF] = sizeof(struct ldlm_request),
[DLM_INTENT_IT_OFF] = sizeof(*lit),
[DLM_INTENT_REC_OFF] = sizeof(struct mdt_body),
[DLM_INTENT_REC_OFF+1]= data->namelen + 1,
! [DLM_INTENT_REC_OFF+2]= 0 };
! __u32 repsize[6] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
[DLM_LOCKREPLY_OFF] = sizeof(struct ldlm_reply),
[DLM_REPLY_REC_OFF] = sizeof(struct mdt_body),
[DLM_REPLY_REC_OFF+1] = obddev->u.cli.
cl_max_mds_easize,
[DLM_REPLY_REC_OFF+2] = LUSTRE_POSIX_ACL_MAX_SIZE,
! [DLM_REPLY_REC_OFF+3] = 0 };
obd_valid valid = OBD_MD_FLGETATTR | OBD_MD_FLEASIZE | OBD_MD_FLACL |
OBD_MD_FLMODEASIZE | OBD_MD_FLDIREA;
! int bufcount = 5;
ENTRY;
if (mdc_exp_is_2_0_server(exp)) {
*************** static struct ptlrpc_request *mdc_intent
*** 407,418 ****
size[DLM_INTENT_REC_OFF+2] = data->namelen + 1;
bufcount = 6;
}
-
- printk("%s: prep-enq-req: bufcnt=%d\n", __func__, bufcount);
- for(i=0; i<bufcount; i++) {
- printk("\tsize[%d]=%u\n", i,size[i] );
- printk("\trepsize[%d]=%u\n", i,repsize[i] );
- }
req = ldlm_prep_enqueue_req(exp, bufcount, size, NULL, 0);
if (req) {
/* pack the intent */
--- 399,404 ----
*************** static int mdc_finish_enqueue(struct obd
*** 455,461 ****
struct ldlm_reply *lockrep;
ENTRY;
- printk("\n Inside function %s",__func__);
LASSERT(rc >= 0);
/* Similarly, if we're going to replay this request, we don't want to
* actually get a lock, just perform the intent. */
--- 441,446 ----
*************** static int mdc_finish_enqueue(struct obd
*** 517,523 ****
/* We know what to expect, so we do any byte flipping required here */
if (it->it_op & (IT_OPEN | IT_UNLINK | IT_LOOKUP | IT_GETATTR)) {
struct mds_body *body;
! printk("\n Inside function %s inside condition IT_OPEN , IT_LOOKUP , IT_GETATTR",__func__);
body = lustre_swab_repbuf(req, DLM_REPLY_REC_OFF, sizeof(*body),
lustre_swab_mds_body);
if (body == NULL) {
--- 502,508 ----
/* We know what to expect, so we do any byte flipping required here */
if (it->it_op & (IT_OPEN | IT_UNLINK | IT_LOOKUP | IT_GETATTR)) {
struct mds_body *body;
!
body = lustre_swab_repbuf(req, DLM_REPLY_REC_OFF, sizeof(*body),
lustre_swab_mds_body);
if (body == NULL) {
*************** int mdc_enqueue(struct obd_export *exp,
*** 587,593 ****
int rc;
ENTRY;
- printk("\n Inside function %s \n",__func__);
fid_build_reg_res_name((void *)&data->fid1, &res_id);
LASSERTF(einfo->ei_type == LDLM_IBITS,"lock type %d\n", einfo->ei_type);
if (it->it_op & (IT_UNLINK | IT_GETATTR | IT_READDIR))
--- 572,577 ----
*************** int mdc_intent_getattr_async(struct obd_
*** 924,933 ****
int flags = LDLM_FL_HAS_INTENT;
ENTRY;
- printk("%s: name: %.*s in inode "LPU64", intent: %s flags %#o\n",__func__,
- op_data->namelen, op_data->name, op_data->fid1.id,
- ldlm_it2str(it->it_op), it->it_flags);
-
CDEBUG(D_DLMTRACE,"name: %.*s in inode "LPU64", intent: %s flags %#o\n",
op_data->namelen, op_data->name, op_data->fid1.id,
ldlm_it2str(it->it_op), it->it_flags);
--- 908,913 ----
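
For anyone following the size[] / bufcount handling in the hunks above, here is a minimal standalone sketch of the same pattern outside the kernel: a designated-initializer size table, an optional sixth buffer, and a debug dump that is compiled out by default instead of being left in as bare printk() calls. Everything prefixed with sketch_ or SKETCH_ is hypothetical and exists only for illustration. The offsets and byte sizes are placeholders, not the real Lustre values, and want_sixth merely stands in for whatever condition the real code checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the buffer offsets named in the patch.
 * The numeric values are assumptions, not taken from the Lustre headers. */
enum {
        SKETCH_BODY_OFF       = 0,
        SKETCH_LOCKREQ_OFF    = 1,
        SKETCH_INTENT_IT_OFF  = 2,
        SKETCH_INTENT_REC_OFF = 3,
        SKETCH_MAX_BUFS       = 6
};

/* Build with -DSKETCH_DEBUG=1 to enable the dump below. */
#ifndef SKETCH_DEBUG
#define SKETCH_DEBUG 0
#endif

/* Fill the size table: five buffers by default, six when want_sixth is set.
 * Returns the buffer count the caller should pass along with the table. */
static int sketch_prep_sizes(unsigned int size[SKETCH_MAX_BUFS],
                             unsigned int namelen, int want_sixth)
{
        /* Placeholder byte counts; the patch uses sizeof() of real structs. */
        const unsigned int init[SKETCH_MAX_BUFS] = {
                [SKETCH_BODY_OFF]           = 8,
                [SKETCH_LOCKREQ_OFF]        = 16,
                [SKETCH_INTENT_IT_OFF]      = 8,
                [SKETCH_INTENT_REC_OFF]     = 32,
                [SKETCH_INTENT_REC_OFF + 1] = namelen + 1, /* name plus NUL */
                [SKETCH_INTENT_REC_OFF + 2] = 0            /* unused by default */
        };
        int bufcount = 5;
        int i;

        assert(size != NULL);
        for (i = 0; i < SKETCH_MAX_BUFS; i++)
                size[i] = init[i];

        if (want_sixth) {
                size[SKETCH_INTENT_REC_OFF + 2] = namelen + 1;
                bufcount = 6;
        }
        return bufcount;
}

/* Dump the first bufcount sizes to stderr, but only in debug builds.
 * Returns the number of entries printed (0 when the dump is compiled out). */
static int sketch_dump_sizes(const char *who, const unsigned int *size,
                             int bufcount)
{
        int i;

        if (!SKETCH_DEBUG)
                return 0;
        if (bufcount > SKETCH_MAX_BUFS)
                bufcount = SKETCH_MAX_BUFS; /* never read past the table */
        if (bufcount < 0)
                bufcount = 0;

        fprintf(stderr, "%s: bufcnt=%d\n", who, bufcount);
        for (i = 0; i < bufcount; i++)
                fprintf(stderr, "\tsize[%d]=%u\n", i, size[i]);
        return bufcount;
}
```

A caller would declare `unsigned int size[SKETCH_MAX_BUFS];`, call `sketch_prep_sizes(size, namelen, want_sixth)` and hand the returned count to `sketch_dump_sizes()`. In the kernel itself the existing CDEBUG() call kept by the last hunk plays the role of the gated dump, which is presumably why the extra printk() lines could be dropped.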
Thread overview: 35+ messages
2010-10-18 23:33 [Lustre-devel] Queries regarding LDLM_ENQUEUE Vilobh Meshram
2010-10-19 15:46 ` Fan Yong
2010-10-19 20:28 ` Vilobh Meshram
2010-10-19 22:53 ` Andreas Dilger
2010-10-20 2:04 ` Vilobh Meshram
2010-10-20 7:55 ` Andreas Dilger
2010-10-20 8:11 ` bzzz.tomas at gmail.com
2010-10-20 8:24 ` Andreas Dilger
2010-10-20 8:30 ` bzzz.tomas at gmail.com
2010-10-20 8:38 ` Nikita Danilov
2010-10-20 14:45 ` Nicolas Williams
2010-10-20 13:30 ` Eric Barton
2010-10-20 13:40 ` bzzz.tomas at gmail.com
2010-10-20 14:51 ` Paul Nowoczynski
2010-10-20 14:55 ` Nicolas Williams
2010-10-20 15:16 ` Paul Nowoczynski
2010-10-20 16:07 ` Andreas Dilger
2010-10-20 15:22 ` bzzz.tomas at gmail.com
2010-10-20 16:43 ` Paul Nowoczynski
2010-10-20 16:49 ` bzzz.tomas at gmail.com
2010-10-20 17:11 ` Paul Nowoczynski
2010-10-20 17:18 ` bzzz.tomas at gmail.com
2010-10-20 17:25 ` Paul Nowoczynski
2010-10-20 17:27 ` Andreas Dilger
2010-10-20 17:29 ` Nicolas Williams
2010-10-20 17:40 ` bzzz.tomas at gmail.com
2010-10-20 18:01 ` Andreas Dilger
2010-10-20 18:09 ` bzzz.tomas at gmail.com
2010-10-20 16:35 ` Andreas Dilger
2010-10-20 16:46 ` Paul Nowoczynski
2010-10-20 17:00 ` Andreas Dilger
2010-10-20 17:13 ` Nicolas Williams
2010-10-20 17:30 ` Andreas Dilger
2010-10-20 17:01 ` Nicolas Williams
2010-10-22 2:33 ` Vilobh Meshram