* Re: libcephfs: Open-By-Handle API - our patch and it's description [not found] <138894132.40.1372343148764.JavaMail.root@thunderbeast.private.linuxbox.com> @ 2013-06-27 14:37 ` Matt W. Benjamin 2013-06-27 19:39 ` Sage Weil 0 siblings, 1 reply; 4+ messages in thread From: Matt W. Benjamin @ 2013-06-27 14:37 UTC (permalink / raw) To: Sage Weil; +Cc: aemerson, David Zafman, ceph-devel, Ilya Storozhilov Hi Sage, Adam and I haven't looked in detail yet, but based on the prior email discussion, and not counting nits, Ilya's changes seem fine. If you agree, I and Adam take on the task of rebasing all our changes currently on wip-libcephfs, into a small number of changes, following your lead in the "rebase" branch where possible. (I had some questions about how the rebase branch was assembled initially, before bugging you, we'll have a fresh look at it.) We may need some interaction on IRC, but with luck can have a prototype branch somewhere today or tomorrow. (David, Sage, wrt IRC availability, does that sound likely to work?) Our work in progress changes logically follow all of this, so I'm not worried about those yet. Matt ----- "Sage Weil" <sage@inktank.com> wrote: > Hi Ilya! > > On Thu, 27 Jun 2013, Ilya Storozhilov wrote: > > Hi Matt (CC Sage, David, Adam), > > > > > > we have completed with our prototype of NFS/CIFS frontends to CEPH > > filesystem. In order to do it we had to fork ?wip-libcephfs? branch > (see > > https://github.com/ferustigris/ceph and it?s ?open_by_handle_api? > branch) to fix > > some issues with libcephfs open-by-handle API - see below. I?ve > attached a > > respective patch, which: > > I posted some comments on github on the changes. I think we're > close! > > > Does it possible to merge our changes to ?wip-libcephfs? branch of > the official > > CEPH repository? If some action is needed from our side (further > amendments, > > merge-requests, etc.) - let us know, please. > > Matt, what is your plan for this branch? Is it/are you rebasing? Are > > there other changes in progress? > > I'd very much like to get the basic changes (the open by handle stuff) > > into the tree during the next week or two so that we make the dumpling > > release cutoff. Getting some of the additional bits that you guys > specifically need ironed out (like the flushing and cap ref stuff) is > less > important for that timeframe, if that helps. > > Ideally, the patch series would look more like the rebased version I > pushed (but with updated changelogs for each patch and correct > authorship/sign-off. If it's based on current master, and this patch > goes > on top, I'll pull it in. > > It would also be great to see some tests in test/libcephfs/ for the > new > calls as well... :) > > > There are still some unresolved issues in libcephfs, which are have > to be > > reported: > > > > * > > > > A file could be manipulated by it?s resource handle (inode) even > if it has > > been already removed by another client and syncing all data in > both > > cephfs connections is not helpful in this case; > > If I'm understanding you, this is normal unix behavior: open, unlink, > do > i/o for as long as you want, close. That is by design. > > > * > > > > Invalid hardlinks count for directories: always 1 instead of 2 > plus one > > per each subdirectory as it?s dictated by POSIX. > > This is a very common divergence from POSIX, and one you'll find with > > btrfs and others as well. IIRC the de facto convention is that if the > > link count is 1 it does not reflect the sub directory count, but if it > is > >= 2 it does. You'll see code to this effect in GNU find (which tries > to > use the link count to optimize its search when -type d). > > Thanks! > sage > > > > > > Should we report this bugs on github? > > > > > > Best regards! > > > > -- > > Ilya Storozhilov > > Lead Software Engineer > > > > EPAM Systems > > Tver office, Russia > > GMT+4 > > > > ________________________________________ > > ??: Matt W. Benjamin [matt@linuxbox.com] > > ??????????: 6 ???? 2013 ?.21:47 > > To: Ilya Storozhilov > > Cc: Sage Weil; aemerson; David Zafman > > ????: DRAFT: Re: HA: libcephfs: Open-By-Handle API question > > > > Hi Ilya (CC Sage, David, Adam), > > > > Thanks for looking at the branch. > > > > We'll try to help you get the most out of Ganesha+Ceph, if that's > your goal. > > > > I dont overly like "open-by-handle" as a descriptor for all the > changes we > > and David Zafman have made so far, but obviously, that's a core > > functionality it's providing to, e.g., an NFS re-exporter. > > > > In terms of your document, > > > > #1 The "resource handle" vs. pointer to opaque object distinction > seems like > > a merely cosmetic change. What the API does now is typical of many C > APIs > > (see cred_t in NetBSD kernel, many others), as well as C-over-C++ > APIs. > > (Perhaps this branch has actual C++ leakage, if so, it's > inadvertant, and > > we'll fix.) > > > > #2 The "recoverable handle" concept needs to be subordinate to the > semantics > > of the objects and capability model of Ceph. It sounds like you're > generally > > speaking about objects at the NFS protocol level. With respect to > NFS > > filehandles, everything you're asserting is already true. With > respect to > > open state and other cluster coherence concepts, you've gone past > the NFS > > protocol to details of a clustered implementation. Our team has > specific > > requirements here that may differ from yours, just fyi. > > > > #3, #4 I (and Adam here) aren't fully persuaded of this. While > embedding > > objects in Ganesha handle memory would save small allocations, that > > may not be a critical consideration. From a Ganesha perspective, > it's > > definitely just a hidden property of the driver implementation. I > think > > intuitively we're not really keen on having Inode objects (say) be > expanded > > in caller handles, although yes, Ganesha itself can support it. > > > > #5 No objection here. > > > > #6 I've been thinking of the "which level of API clients should use, > and > > which a slight modification" in more historical, and less > prescriptive > > terms. I do think it makes sense to continue refactoring to reduce > code > > duplication, in addition to getting thread-level concurrency in the > Client > > cache (that's what my api-concurrent topic branch is focused on). > Obviously, > > we want to do this without inconveniencing other users. > > > > Other points from your email: > > > > The lack of a convenience function for getting the root inode is > just > > artifact of Fuse, Ganesha+Ceph, and others being able to construct > the inode > > descriptively. I don't think it's really a C++ encapsulation issue. > We can > > add an API (inline in Client) to get the root inode, however, and > will add > > it, if there's no objection. > > > > Getattr fixes. May not be API related. I planned to push a revision > of > > libcephfs-wip this week, so perhaps fixes can follow this. > > > > We haven't spent time with extended attributes so far, because > Ganesha's > > current xattr implementation is a prototype likely to change soon. > We may > > have missed something obvious here, though, I'll look at it. > > > > Regards, > > > > Matt > > > > ----- "Ilya Storozhilov" <Ilya_Storozhilov@epam.com> wrote: > > > > > Hi Matt, > > > > > > we have organized some testing of the open-by-handle API staff > from > > > the ?libcephfs-wip? Ceph repo branch and I?m sending to you our > > > thoughts about it. I will mention inode structure as ?resource > handle? > > > in this letter - this is how NFS-Ganesha considers it. > > > > > > First of all there is a small bug, when while trying to make file > > > operations on the resource which has been already removed in > another > > > CephFS connection - you can even open this file for writing and > append > > > some data to it. Actually there should be a ENOENT error, but > there is > > > not. An ceph_ll_getattr() operation returns no error on this > resource > > > handle but the amount of hard-links equals to zero in returned > stat > > > structure. > > > > > > There is no method for listing extended attributes (e.g. > > > ceph_ll_listxattr()) and to lookup a root resource handle (e.g. > > > ceph_ll_lookup_root() - it was mentioned in my previous letter). > > > > > > If you try to fetch an extended attribute value by calling > > > ceph_ll_getxattr() after setting in using ceph_ll_setxattr() you > will > > > receive a ENODATA error. ceph_ll_removexattr() does not return > ENODATA > > > when trying to remove non-existent extended attribute. > > > > > > So at the moment we have made a draft list of key requirements to > the > > > open-by-handle CephFS API and created an API header prototype - > look > > > at the attached file, please. What do you think about it? > > > > > > P.S. Generally speaking I think we can provide you with the > patch, > > > which fixes these issues, if you need. But mostly there will be > only > > > workarounds, cause, from my point of view a little reengineering > is > > > needed - see attached document. > > > > > > Best regards, > > > Ilya > > > > > > ________________________________________ > > > ??: Matt W. Benjamin [matt@linuxbox.com] > > > ??????????: 4 ???? 2013 ?.17:10 > > > To: Ilya Storozhilov > > > Cc: ceph-devel@vger.kernel.org > > > ????: Re: libcephfs: Open-By-Handle API question > > > > > > Hi Ilya, > > > > > > The changes on this branch originated in our Ganesha NFS driver > for > > > Ceph, so I'm not sure where the gap is, if any. > > > > > > I'll send an update to the list when we've finish re-integrating > > > against > > > the libcephfs-wip merge branch. > > > > > > Matt > > > > > > ----- "Ilya Storozhilov" <Ilya_Storozhilov@epam.com> wrote: > > > > > > > Hi Ceph developers, > > > > > > > > in order to represent NFS-frontend to CephFS data storage we > are > > > > trying to use innovative Open-By-Handle API from > > > > 'src/include/cephfs/libcephfs.h' file, which is of > 'wip-libcephfs' > > > > branch at the moment. API looks quite consistent and useful but > we > > > > couldn't find a method to get a pointer to root inode of the > mounted > > > > Ceph filesystem. > > > > > > > > At the moment we have found only one place, where it could be > > > fetched > > > > from: an 'Inode* root' member from the 'Client' class > > > > ('src/client/Client.h') but it is in 'protected' section, so > some > > > hack > > > > is needed (e.g. to introduce a Client's descendant, which is > > > providing > > > > a method to acces this protected member). Do you know, how to > fetch > > > a > > > > pointer to the root inode of the mounted Ceph filesystem without > any > > > > hacking (using just an official CephFS API only)? > > > > > > > > Thank you and best wishes, > > > > Ilya V. Storozhilov > > > > EPAM Systems > > > > Lead Software Engineer > > > > > > > > P.S. What do you think about to make 'Open-By-Handle' API to be > a > > > > primary and not low-level API to CephFS and to make POSIX-like > API > > > to > > > > be just a helper addendum to it?-- > > > > To unsubscribe from this list: send the line "unsubscribe > > > ceph-devel" > > > > in > > > > the body of a message to majordomo@vger.kernel.org > > > > More majordomo info at > http://vger.kernel.org/majordomo-info.html > > > > > > -- > > > Matt Benjamin > > > The Linux Box > > > 206 South Fifth Ave. Suite 150 > > > Ann Arbor, MI 48104 > > > > > > http://linuxbox.com > > > > > > tel. 734-761-4689 > > > fax. 734-769-8938 > > > cel. 734-216-5309 > > > > -- > > Matt Benjamin > > The Linux Box > > 206 South Fifth Ave. Suite 150 > > Ann Arbor, MI 48104 > > > > http://linuxbox.com > > > > tel. 734-761-4689 > > fax. 734-769-8938 > > cel. 734-216-5309 > > > > -- Matt Benjamin The Linux Box 206 South Fifth Ave. Suite 150 Ann Arbor, MI 48104 http://linuxbox.com tel. 734-761-4689 fax. 734-769-8938 cel. 734-216-5309 -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: libcephfs: Open-By-Handle API - our patch and it's description 2013-06-27 14:37 ` libcephfs: Open-By-Handle API - our patch and it's description Matt W. Benjamin @ 2013-06-27 19:39 ` Sage Weil 2013-06-27 19:46 ` Matt W. Benjamin 0 siblings, 1 reply; 4+ messages in thread From: Sage Weil @ 2013-06-27 19:39 UTC (permalink / raw) To: Matt W. Benjamin; +Cc: aemerson, David Zafman, ceph-devel, Ilya Storozhilov [-- Attachment #1: Type: TEXT/PLAIN, Size: 13290 bytes --] On Thu, 27 Jun 2013, Matt W. Benjamin wrote: > Hi Sage, > > Adam and I haven't looked in detail yet, but based on the prior email > discussion, and not counting nits, Ilya's changes seem fine. > > If you agree, I and Adam take on the task of rebasing all our changes > currently on wip-libcephfs, into a small number of changes, following > your lead in the "rebase" branch where possible. (I had some questions > about how the rebase branch was assembled initially, before bugging you, > we'll have a fresh look at it.) We may need some interaction on IRC, but > with luck can have a prototype branch somewhere today or tomorrow. > (David, Sage, wrt IRC availability, does that sound likely to work?) I am traveling today and tomorrow, so my IRC availability will be very limited. But the rebase process I followed before is pretty straightforward. Basically, git rebase master git reset master git gui -- interactively stage various bits and pieces and commit them in logical units It might be simpler to just use the branch I already started and fix up the commit messages (git rebase -i master, 'e' for each commit, and git commit --amend on each to adjust the changelog). > Our work in progress changes logically follow all of this, so I'm not > worried about those yet. Great! sage > > Matt > > ----- "Sage Weil" <sage@inktank.com> wrote: > > > Hi Ilya! > > > > On Thu, 27 Jun 2013, Ilya Storozhilov wrote: > > > Hi Matt (CC Sage, David, Adam), > > > > > > > > > we have completed with our prototype of NFS/CIFS frontends to CEPH > > > filesystem. In order to do it we had to fork ?wip-libcephfs? branch > > (see > > > https://github.com/ferustigris/ceph and it?s ?open_by_handle_api? > > branch) to fix > > > some issues with libcephfs open-by-handle API - see below. I?ve > > attached a > > > respective patch, which: > > > > I posted some comments on github on the changes. I think we're > > close! > > > > > Does it possible to merge our changes to ?wip-libcephfs? branch of > > the official > > > CEPH repository? If some action is needed from our side (further > > amendments, > > > merge-requests, etc.) - let us know, please. > > > > Matt, what is your plan for this branch? Is it/are you rebasing? Are > > > > there other changes in progress? > > > > I'd very much like to get the basic changes (the open by handle stuff) > > > > into the tree during the next week or two so that we make the dumpling > > > > release cutoff. Getting some of the additional bits that you guys > > specifically need ironed out (like the flushing and cap ref stuff) is > > less > > important for that timeframe, if that helps. > > > > Ideally, the patch series would look more like the rebased version I > > pushed (but with updated changelogs for each patch and correct > > authorship/sign-off. If it's based on current master, and this patch > > goes > > on top, I'll pull it in. > > > > It would also be great to see some tests in test/libcephfs/ for the > > new > > calls as well... :) > > > > > There are still some unresolved issues in libcephfs, which are have > > to be > > > reported: > > > > > > * > > > > > > A file could be manipulated by it?s resource handle (inode) even > > if it has > > > been already removed by another client and syncing all data in > > both > > > cephfs connections is not helpful in this case; > > > > If I'm understanding you, this is normal unix behavior: open, unlink, > > do > > i/o for as long as you want, close. That is by design. > > > > > * > > > > > > Invalid hardlinks count for directories: always 1 instead of 2 > > plus one > > > per each subdirectory as it?s dictated by POSIX. > > > > This is a very common divergence from POSIX, and one you'll find with > > > > btrfs and others as well. IIRC the de facto convention is that if the > > > > link count is 1 it does not reflect the sub directory count, but if it > > is > > >= 2 it does. You'll see code to this effect in GNU find (which tries > > to > > use the link count to optimize its search when -type d). > > > > Thanks! > > sage > > > > > > > > > > Should we report this bugs on github? > > > > > > > > > Best regards! > > > > > > -- > > > Ilya Storozhilov > > > Lead Software Engineer > > > > > > EPAM Systems > > > Tver office, Russia > > > GMT+4 > > > > > > ________________________________________ > > > ??: Matt W. Benjamin [matt@linuxbox.com] > > > ??????????: 6 ???? 2013 ?.21:47 > > > To: Ilya Storozhilov > > > Cc: Sage Weil; aemerson; David Zafman > > > ????: DRAFT: Re: HA: libcephfs: Open-By-Handle API question > > > > > > Hi Ilya (CC Sage, David, Adam), > > > > > > Thanks for looking at the branch. > > > > > > We'll try to help you get the most out of Ganesha+Ceph, if that's > > your goal. > > > > > > I dont overly like "open-by-handle" as a descriptor for all the > > changes we > > > and David Zafman have made so far, but obviously, that's a core > > > functionality it's providing to, e.g., an NFS re-exporter. > > > > > > In terms of your document, > > > > > > #1 The "resource handle" vs. pointer to opaque object distinction > > seems like > > > a merely cosmetic change. What the API does now is typical of many C > > APIs > > > (see cred_t in NetBSD kernel, many others), as well as C-over-C++ > > APIs. > > > (Perhaps this branch has actual C++ leakage, if so, it's > > inadvertant, and > > > we'll fix.) > > > > > > #2 The "recoverable handle" concept needs to be subordinate to the > > semantics > > > of the objects and capability model of Ceph. It sounds like you're > > generally > > > speaking about objects at the NFS protocol level. With respect to > > NFS > > > filehandles, everything you're asserting is already true. With > > respect to > > > open state and other cluster coherence concepts, you've gone past > > the NFS > > > protocol to details of a clustered implementation. Our team has > > specific > > > requirements here that may differ from yours, just fyi. > > > > > > #3, #4 I (and Adam here) aren't fully persuaded of this. While > > embedding > > > objects in Ganesha handle memory would save small allocations, that > > > may not be a critical consideration. From a Ganesha perspective, > > it's > > > definitely just a hidden property of the driver implementation. I > > think > > > intuitively we're not really keen on having Inode objects (say) be > > expanded > > > in caller handles, although yes, Ganesha itself can support it. > > > > > > #5 No objection here. > > > > > > #6 I've been thinking of the "which level of API clients should use, > > and > > > which a slight modification" in more historical, and less > > prescriptive > > > terms. I do think it makes sense to continue refactoring to reduce > > code > > > duplication, in addition to getting thread-level concurrency in the > > Client > > > cache (that's what my api-concurrent topic branch is focused on). > > Obviously, > > > we want to do this without inconveniencing other users. > > > > > > Other points from your email: > > > > > > The lack of a convenience function for getting the root inode is > > just > > > artifact of Fuse, Ganesha+Ceph, and others being able to construct > > the inode > > > descriptively. I don't think it's really a C++ encapsulation issue. > > We can > > > add an API (inline in Client) to get the root inode, however, and > > will add > > > it, if there's no objection. > > > > > > Getattr fixes. May not be API related. I planned to push a revision > > of > > > libcephfs-wip this week, so perhaps fixes can follow this. > > > > > > We haven't spent time with extended attributes so far, because > > Ganesha's > > > current xattr implementation is a prototype likely to change soon. > > We may > > > have missed something obvious here, though, I'll look at it. > > > > > > Regards, > > > > > > Matt > > > > > > ----- "Ilya Storozhilov" <Ilya_Storozhilov@epam.com> wrote: > > > > > > > Hi Matt, > > > > > > > > we have organized some testing of the open-by-handle API staff > > from > > > > the ?libcephfs-wip? Ceph repo branch and I?m sending to you our > > > > thoughts about it. I will mention inode structure as ?resource > > handle? > > > > in this letter - this is how NFS-Ganesha considers it. > > > > > > > > First of all there is a small bug, when while trying to make file > > > > operations on the resource which has been already removed in > > another > > > > CephFS connection - you can even open this file for writing and > > append > > > > some data to it. Actually there should be a ENOENT error, but > > there is > > > > not. An ceph_ll_getattr() operation returns no error on this > > resource > > > > handle but the amount of hard-links equals to zero in returned > > stat > > > > structure. > > > > > > > > There is no method for listing extended attributes (e.g. > > > > ceph_ll_listxattr()) and to lookup a root resource handle (e.g. > > > > ceph_ll_lookup_root() - it was mentioned in my previous letter). > > > > > > > > If you try to fetch an extended attribute value by calling > > > > ceph_ll_getxattr() after setting in using ceph_ll_setxattr() you > > will > > > > receive a ENODATA error. ceph_ll_removexattr() does not return > > ENODATA > > > > when trying to remove non-existent extended attribute. > > > > > > > > So at the moment we have made a draft list of key requirements to > > the > > > > open-by-handle CephFS API and created an API header prototype - > > look > > > > at the attached file, please. What do you think about it? > > > > > > > > P.S. Generally speaking I think we can provide you with the > > patch, > > > > which fixes these issues, if you need. But mostly there will be > > only > > > > workarounds, cause, from my point of view a little reengineering > > is > > > > needed - see attached document. > > > > > > > > Best regards, > > > > Ilya > > > > > > > > ________________________________________ > > > > ??: Matt W. Benjamin [matt@linuxbox.com] > > > > ??????????: 4 ???? 2013 ?.17:10 > > > > To: Ilya Storozhilov > > > > Cc: ceph-devel@vger.kernel.org > > > > ????: Re: libcephfs: Open-By-Handle API question > > > > > > > > Hi Ilya, > > > > > > > > The changes on this branch originated in our Ganesha NFS driver > > for > > > > Ceph, so I'm not sure where the gap is, if any. > > > > > > > > I'll send an update to the list when we've finish re-integrating > > > > against > > > > the libcephfs-wip merge branch. > > > > > > > > Matt > > > > > > > > ----- "Ilya Storozhilov" <Ilya_Storozhilov@epam.com> wrote: > > > > > > > > > Hi Ceph developers, > > > > > > > > > > in order to represent NFS-frontend to CephFS data storage we > > are > > > > > trying to use innovative Open-By-Handle API from > > > > > 'src/include/cephfs/libcephfs.h' file, which is of > > 'wip-libcephfs' > > > > > branch at the moment. API looks quite consistent and useful but > > we > > > > > couldn't find a method to get a pointer to root inode of the > > mounted > > > > > Ceph filesystem. > > > > > > > > > > At the moment we have found only one place, where it could be > > > > fetched > > > > > from: an 'Inode* root' member from the 'Client' class > > > > > ('src/client/Client.h') but it is in 'protected' section, so > > some > > > > hack > > > > > is needed (e.g. to introduce a Client's descendant, which is > > > > providing > > > > > a method to acces this protected member). Do you know, how to > > fetch > > > > a > > > > > pointer to the root inode of the mounted Ceph filesystem without > > any > > > > > hacking (using just an official CephFS API only)? > > > > > > > > > > Thank you and best wishes, > > > > > Ilya V. Storozhilov > > > > > EPAM Systems > > > > > Lead Software Engineer > > > > > > > > > > P.S. What do you think about to make 'Open-By-Handle' API to be > > a > > > > > primary and not low-level API to CephFS and to make POSIX-like > > API > > > > to > > > > > be just a helper addendum to it?-- > > > > > To unsubscribe from this list: send the line "unsubscribe > > > > ceph-devel" > > > > > in > > > > > the body of a message to majordomo@vger.kernel.org > > > > > More majordomo info at > > http://vger.kernel.org/majordomo-info.html > > > > > > > > -- > > > > Matt Benjamin > > > > The Linux Box > > > > 206 South Fifth Ave. Suite 150 > > > > Ann Arbor, MI 48104 > > > > > > > > http://linuxbox.com > > > > > > > > tel. 734-761-4689 > > > > fax. 734-769-8938 > > > > cel. 734-216-5309 > > > > > > -- > > > Matt Benjamin > > > The Linux Box > > > 206 South Fifth Ave. Suite 150 > > > Ann Arbor, MI 48104 > > > > > > http://linuxbox.com > > > > > > tel. 734-761-4689 > > > fax. 734-769-8938 > > > cel. 734-216-5309 > > > > > > > > -- > Matt Benjamin > The Linux Box > 206 South Fifth Ave. Suite 150 > Ann Arbor, MI 48104 > > http://linuxbox.com > > tel. 734-761-4689 > fax. 734-769-8938 > cel. 734-216-5309 > > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: libcephfs: Open-By-Handle API - our patch and it's description 2013-06-27 19:39 ` Sage Weil @ 2013-06-27 19:46 ` Matt W. Benjamin 0 siblings, 0 replies; 4+ messages in thread From: Matt W. Benjamin @ 2013-06-27 19:46 UTC (permalink / raw) To: Sage Weil; +Cc: aemerson, David Zafman, ceph-devel, Ilya Storozhilov Hi Sage, ----- "Sage Weil" <sage@inktank.com> wrote: > > It might be simpler to just use the branch I already started and fix > up > the commit messages (git rebase -i master, 'e' for each commit, and > git > commit --amend on each to adjust the changelog). Yes, doing exactly that. > > > Our work in progress changes logically follow all of this, so I'm > not > > worried about those yet. > > Great! > sage -- Matt Benjamin The Linux Box 206 South Fifth Ave. Suite 150 Ann Arbor, MI 48104 http://linuxbox.com tel. 734-761-4689 fax. 734-769-8938 cel. 734-216-5309 ^ permalink raw reply [flat|nested] 4+ messages in thread
[parent not found: <941FC8BE87ADE64D90E4EB6CD554517710021442@EPBYMINSA0032.epam.com>]
* Re: libcephfs: Open-By-Handle API - our patch and it's description [not found] <941FC8BE87ADE64D90E4EB6CD554517710021442@EPBYMINSA0032.epam.com> @ 2013-06-27 12:50 ` Sage Weil 0 siblings, 0 replies; 4+ messages in thread From: Sage Weil @ 2013-06-27 12:50 UTC (permalink / raw) To: Ilya Storozhilov; +Cc: Matt W. Benjamin, aemerson, David Zafman, ceph-devel [-- Attachment #1: Type: TEXT/PLAIN, Size: 10228 bytes --] Hi Ilya! On Thu, 27 Jun 2013, Ilya Storozhilov wrote: > Hi Matt (CC Sage, David, Adam), > > > we have completed with our prototype of NFS/CIFS frontends to CEPH > filesystem. In order to do it we had to fork ?wip-libcephfs? branch (see > https://github.com/ferustigris/ceph and it?s ?open_by_handle_api? branch) to fix > some issues with libcephfs open-by-handle API - see below. I?ve attached a > respective patch, which: I posted some comments on github on the changes. I think we're close! > Does it possible to merge our changes to ?wip-libcephfs? branch of the official > CEPH repository? If some action is needed from our side (further amendments, > merge-requests, etc.) - let us know, please. Matt, what is your plan for this branch? Is it/are you rebasing? Are there other changes in progress? I'd very much like to get the basic changes (the open by handle stuff) into the tree during the next week or two so that we make the dumpling release cutoff. Getting some of the additional bits that you guys specifically need ironed out (like the flushing and cap ref stuff) is less important for that timeframe, if that helps. Ideally, the patch series would look more like the rebased version I pushed (but with updated changelogs for each patch and correct authorship/sign-off. If it's based on current master, and this patch goes on top, I'll pull it in. It would also be great to see some tests in test/libcephfs/ for the new calls as well... :) > There are still some unresolved issues in libcephfs, which are have to be > reported: > > * > > A file could be manipulated by it?s resource handle (inode) even if it has > been already removed by another client and syncing all data in both > cephfs connections is not helpful in this case; If I'm understanding you, this is normal unix behavior: open, unlink, do i/o for as long as you want, close. That is by design. > * > > Invalid hardlinks count for directories: always 1 instead of 2 plus one > per each subdirectory as it?s dictated by POSIX. This is a very common divergence from POSIX, and one you'll find with btrfs and others as well. IIRC the de facto convention is that if the link count is 1 it does not reflect the sub directory count, but if it is >= 2 it does. You'll see code to this effect in GNU find (which tries to use the link count to optimize its search when -type d). Thanks! sage > > Should we report this bugs on github? > > > Best regards! > > -- > Ilya Storozhilov > Lead Software Engineer > > EPAM Systems > Tver office, Russia > GMT+4 > > ________________________________________ > ??: Matt W. Benjamin [matt@linuxbox.com] > ??????????: 6 ???? 2013 ?.21:47 > To: Ilya Storozhilov > Cc: Sage Weil; aemerson; David Zafman > ????: DRAFT: Re: HA: libcephfs: Open-By-Handle API question > > Hi Ilya (CC Sage, David, Adam), > > Thanks for looking at the branch. > > We'll try to help you get the most out of Ganesha+Ceph, if that's your goal. > > I dont overly like "open-by-handle" as a descriptor for all the changes we > and David Zafman have made so far, but obviously, that's a core > functionality it's providing to, e.g., an NFS re-exporter. > > In terms of your document, > > #1 The "resource handle" vs. pointer to opaque object distinction seems like > a merely cosmetic change. What the API does now is typical of many C APIs > (see cred_t in NetBSD kernel, many others), as well as C-over-C++ APIs. > (Perhaps this branch has actual C++ leakage, if so, it's inadvertant, and > we'll fix.) > > #2 The "recoverable handle" concept needs to be subordinate to the semantics > of the objects and capability model of Ceph. It sounds like you're generally > speaking about objects at the NFS protocol level. With respect to NFS > filehandles, everything you're asserting is already true. With respect to > open state and other cluster coherence concepts, you've gone past the NFS > protocol to details of a clustered implementation. Our team has specific > requirements here that may differ from yours, just fyi. > > #3, #4 I (and Adam here) aren't fully persuaded of this. While embedding > objects in Ganesha handle memory would save small allocations, that > may not be a critical consideration. From a Ganesha perspective, it's > definitely just a hidden property of the driver implementation. I think > intuitively we're not really keen on having Inode objects (say) be expanded > in caller handles, although yes, Ganesha itself can support it. > > #5 No objection here. > > #6 I've been thinking of the "which level of API clients should use, and > which a slight modification" in more historical, and less prescriptive > terms. I do think it makes sense to continue refactoring to reduce code > duplication, in addition to getting thread-level concurrency in the Client > cache (that's what my api-concurrent topic branch is focused on). Obviously, > we want to do this without inconveniencing other users. > > Other points from your email: > > The lack of a convenience function for getting the root inode is just > artifact of Fuse, Ganesha+Ceph, and others being able to construct the inode > descriptively. I don't think it's really a C++ encapsulation issue. We can > add an API (inline in Client) to get the root inode, however, and will add > it, if there's no objection. > > Getattr fixes. May not be API related. I planned to push a revision of > libcephfs-wip this week, so perhaps fixes can follow this. > > We haven't spent time with extended attributes so far, because Ganesha's > current xattr implementation is a prototype likely to change soon. We may > have missed something obvious here, though, I'll look at it. > > Regards, > > Matt > > ----- "Ilya Storozhilov" <Ilya_Storozhilov@epam.com> wrote: > > > Hi Matt, > > > > we have organized some testing of the open-by-handle API staff from > > the ?libcephfs-wip? Ceph repo branch and I?m sending to you our > > thoughts about it. I will mention inode structure as ?resource handle? > > in this letter - this is how NFS-Ganesha considers it. > > > > First of all there is a small bug, when while trying to make file > > operations on the resource which has been already removed in another > > CephFS connection - you can even open this file for writing and append > > some data to it. Actually there should be a ENOENT error, but there is > > not. An ceph_ll_getattr() operation returns no error on this resource > > handle but the amount of hard-links equals to zero in returned stat > > structure. > > > > There is no method for listing extended attributes (e.g. > > ceph_ll_listxattr()) and to lookup a root resource handle (e.g. > > ceph_ll_lookup_root() - it was mentioned in my previous letter). > > > > If you try to fetch an extended attribute value by calling > > ceph_ll_getxattr() after setting in using ceph_ll_setxattr() you will > > receive a ENODATA error. ceph_ll_removexattr() does not return ENODATA > > when trying to remove non-existent extended attribute. > > > > So at the moment we have made a draft list of key requirements to the > > open-by-handle CephFS API and created an API header prototype - look > > at the attached file, please. What do you think about it? > > > > P.S. Generally speaking I think we can provide you with the patch, > > which fixes these issues, if you need. But mostly there will be only > > workarounds, cause, from my point of view a little reengineering is > > needed - see attached document. > > > > Best regards, > > Ilya > > > > ________________________________________ > > ??: Matt W. Benjamin [matt@linuxbox.com] > > ??????????: 4 ???? 2013 ?.17:10 > > To: Ilya Storozhilov > > Cc: ceph-devel@vger.kernel.org > > ????: Re: libcephfs: Open-By-Handle API question > > > > Hi Ilya, > > > > The changes on this branch originated in our Ganesha NFS driver for > > Ceph, so I'm not sure where the gap is, if any. > > > > I'll send an update to the list when we've finish re-integrating > > against > > the libcephfs-wip merge branch. > > > > Matt > > > > ----- "Ilya Storozhilov" <Ilya_Storozhilov@epam.com> wrote: > > > > > Hi Ceph developers, > > > > > > in order to represent NFS-frontend to CephFS data storage we are > > > trying to use innovative Open-By-Handle API from > > > 'src/include/cephfs/libcephfs.h' file, which is of 'wip-libcephfs' > > > branch at the moment. API looks quite consistent and useful but we > > > couldn't find a method to get a pointer to root inode of the mounted > > > Ceph filesystem. > > > > > > At the moment we have found only one place, where it could be > > fetched > > > from: an 'Inode* root' member from the 'Client' class > > > ('src/client/Client.h') but it is in 'protected' section, so some > > hack > > > is needed (e.g. to introduce a Client's descendant, which is > > providing > > > a method to acces this protected member). Do you know, how to fetch > > a > > > pointer to the root inode of the mounted Ceph filesystem without any > > > hacking (using just an official CephFS API only)? > > > > > > Thank you and best wishes, > > > Ilya V. Storozhilov > > > EPAM Systems > > > Lead Software Engineer > > > > > > P.S. What do you think about to make 'Open-By-Handle' API to be a > > > primary and not low-level API to CephFS and to make POSIX-like API > > to > > > be just a helper addendum to it?-- > > > To unsubscribe from this list: send the line "unsubscribe > > ceph-devel" > > > in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > > Matt Benjamin > > The Linux Box > > 206 South Fifth Ave. Suite 150 > > Ann Arbor, MI 48104 > > > > http://linuxbox.com > > > > tel. 734-761-4689 > > fax. 734-769-8938 > > cel. 734-216-5309 > > -- > Matt Benjamin > The Linux Box > 206 South Fifth Ave. Suite 150 > Ann Arbor, MI 48104 > > http://linuxbox.com > > tel. 734-761-4689 > fax. 734-769-8938 > cel. 734-216-5309 > > ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2013-06-27 19:47 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <138894132.40.1372343148764.JavaMail.root@thunderbeast.private.linuxbox.com>
2013-06-27 14:37 ` libcephfs: Open-By-Handle API - our patch and it's description Matt W. Benjamin
2013-06-27 19:39 ` Sage Weil
2013-06-27 19:46 ` Matt W. Benjamin
[not found] <941FC8BE87ADE64D90E4EB6CD554517710021442@EPBYMINSA0032.epam.com>
2013-06-27 12:50 ` Sage Weil
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.