* No lock on RBD allow several mount on different servers...
@ 2012-08-11 23:50 Sébastien Han
[not found] ` <CALFpzo49Urnf8rnFCQ=wQ8eFMR0-8FWh2=9nKrCAxb+0Xm0rVQ@mail.gmail.com>
0 siblings, 1 reply; 10+ messages in thread
From: Sébastien Han @ 2012-08-11 23:50 UTC (permalink / raw)
To: ceph-devel
Hi guys,
With RBD images, it is theoretically possible to mount them multiple
times on different servers. Of course **no one** wants that if they
care about the consistency of their data :D
I was wondering if Ceph has any locking ability on the RBD device, like
DRBD does with a secondary resource. Apparently not: I was able to mount
an image on multiple servers and write data on both.
Is this an upcoming feature?
I don't really know how difficult this kind of feature is to
implement, but it would be nice to have it.
Cheers!
* Re: No lock on RBD allow several mount on different servers...
[not found] ` <CALFpzo4X7iL6aEUtqyEBp4AMDxKkK9wtwPx35WQVauYQbe8Hng@mail.gmail.com>
@ 2012-08-12 0:35 ` Marcus Sorensen
2012-08-12 0:53 ` Marcus Sorensen
0 siblings, 1 reply; 10+ messages in thread
From: Marcus Sorensen @ 2012-08-12 0:35 UTC (permalink / raw)
To: Sébastien Han, ceph-devel
What I mean is that my understanding of RBD is that it is designed to
do no more than present a block device. In that context, what
you're asking is more like whether they will support persistent
reservations. RBD, being just a block device, is at a lower level, and
you'd have to add something on top of it that is aware of
sharing/locking.

On Sat, Aug 11, 2012 at 6:26 PM, Marcus Sorensen <shadowsor@gmail.com> wrote:
> But you put a /dev/drbd atop a block device.
>
> On Aug 11, 2012 6:09 PM, "Sébastien Han" <han.sebastien@gmail.com> wrote:
>>
>> Hi Marcus,
>>
>> I didn't really get your first sentence, but I don't think so. For
>> instance, DRBD manages its own /dev/drbd device and voluntarily puts a
>> lock on a resource in the 'secondary' state, as in single-primary
>> mode. So I would say that it's higher level rather than lower...
>>
>> Cheers!
>>
>> On Sun, Aug 12, 2012 at 1:58 AM, Marcus Sorensen <shadowsor@gmail.com>
>> wrote:
>> > Isn't it supposed to be lower level than that? More like just a block
>> > device such as a SAN or iSCSI device? DRBD (GFS, OCFS, CLVM) goes on
>> > top of that.
* Re: No lock on RBD allow several mount on different servers...
2012-08-12 0:35 ` Marcus Sorensen
@ 2012-08-12 0:53 ` Marcus Sorensen
2012-08-12 8:40 ` Sebastien HAN
0 siblings, 1 reply; 10+ messages in thread
From: Marcus Sorensen @ 2012-08-12 0:53 UTC (permalink / raw)
To: Sébastien Han, ceph-devel
Oh, and note that an RBD caching mode was added recently. It would
mess up any multi-mount or sharing you'd attempt to do with an RBD
device. It's optional, though.

http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/6402

Still, you'd probably need something on top of it to manage sharing,
just like with any other block device. I'm interested to see what the
devs say though.
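To make the caching concern above concrete, here is a minimal toy sketch of why a client-side write-back cache and multi-mounting do not mix. Everything in it (`SharedImage`, `CachingClient`, the block layout) is hypothetical and invented for illustration; it is not how the RBD cache is implemented, only a model of the general hazard.

```python
class SharedImage:
    """Stands in for the shared image: block number -> bytes."""

    def __init__(self):
        self.blocks = {}


class CachingClient:
    """Hypothetical client with a private write-back cache."""

    def __init__(self, image):
        self.image = image
        self.cache = {}     # blocks this client has read or written
        self.dirty = set()  # blocks written locally but not yet flushed

    def write(self, block, data):
        # The write lands in the local cache only.
        self.cache[block] = data
        self.dirty.add(block)

    def read(self, block):
        # A cached copy is served without asking the shared image again.
        if block not in self.cache:
            self.cache[block] = self.image.blocks.get(block, b"")
        return self.cache[block]

    def flush(self):
        # Push all dirty blocks to the shared image.
        for block in self.dirty:
            self.image.blocks[block] = self.cache[block]
        self.dirty.clear()


image = SharedImage()
a, b = CachingClient(image), CachingClient(image)

a.write(0, b"from A")
seen_before_flush = b.read(0)  # A's write is still only in A's cache
a.flush()
seen_after_flush = b.read(0)   # B keeps serving its own stale cached copy
```

In this model client B never observes A's write, even after A flushes, because nothing invalidates B's cache. That is the kind of coordination a cluster-aware layer on top of the block device would have to provide.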
* Re: No lock on RBD allow several mount on different servers...
2012-08-12 0:53 ` Marcus Sorensen
@ 2012-08-12 8:40 ` Sebastien HAN
2012-08-12 9:37 ` Smart Weblications GmbH - Florian Wiessner
2012-08-12 15:44 ` Sage Weil
0 siblings, 2 replies; 10+ messages in thread
From: Sebastien HAN @ 2012-08-12 8:40 UTC (permalink / raw)
To: Marcus Sorensen; +Cc: ceph-devel
Hi Marcus,

I completely understand your point and of course I agree. Even if this
kind of locking is done by software on top of the block device, it
doesn't look impossible to me. And it's more than persistent
reservations, because it also largely depends on the filesystem on top
of it. For example, you might want to allow a dual-primary mode...

After our brief exchange, I have to admit that this looks harder than I
expected. Maybe this would require a little daemon running on the
server where the device is mapped... A little bit overkill and outside
the design...

I'm also curious to hear the feedback from the devs!

PS: I knew about the rbd caching but I never tried it; I will.

Thank you again :-)

Cheers!
* Re: No lock on RBD allow several mount on different servers...
2012-08-12 8:40 ` Sebastien HAN
@ 2012-08-12 9:37 ` Smart Weblications GmbH - Florian Wiessner
2012-08-12 15:44 ` Sage Weil
1 sibling, 0 replies; 10+ messages in thread
From: Smart Weblications GmbH - Florian Wiessner @ 2012-08-12 9:37 UTC (permalink / raw)
To: Sebastien HAN; +Cc: Marcus Sorensen, ceph-devel
I use ocfs2 on top of mapped rbd images - works great.

--
Kind regards,
Florian Wiessner
Smart Weblications GmbH
Martinsberger Str. 1
D-95119 Naila
fon.: +49 9282 9638 200
fax.: +49 9282 9638 205
24/7: +49 900 144 000 00 - 0,99 EUR/Min*
http://www.smart-weblications.de
--
Registered office: Naila
Managing director: Florian Wiessner
Commercial register no.: HRB 3840, Amtsgericht Hof
*from German landlines; prices from mobile networks may differ
* Re: No lock on RBD allow several mount on different servers...
2012-08-12 8:40 ` Sebastien HAN
2012-08-12 9:37 ` Smart Weblications GmbH - Florian Wiessner
@ 2012-08-12 15:44 ` Sage Weil
2012-08-13 16:55 ` Gregory Farnum
1 sibling, 1 reply; 10+ messages in thread
From: Sage Weil @ 2012-08-12 15:44 UTC (permalink / raw)
To: Sebastien HAN; +Cc: Marcus Sorensen, ceph-devel
RBD image locking is on the roadmap, but it's tricky. Almost all of the
pieces are in place for exclusive locking of the image header, which will
let the user know when other nodes have the image mapped, and give them
the option to break their lock and take over ownership.

The real challenge is fencing. Unlike more conventional options like
SCSI, the RBD image is distributed across the entire cluster, so ensuring
that the old guy doesn't still have IOs in flight that will stomp on the
new owner means that potentially everyone needs to be informed that the
bad guy should be locked out.

I think there are a few options:

1- The user has their own fencing or STONITH on top of rbd, informed by
   the rbd locking. Pull the plug, update your iptables, whatever. Not
   very friendly.
2- Extend the rados 'blacklist' functionality to let you ensure that every
   node in the cluster has received the updated osdmap+blacklist
   information, so that you can be sure no further IO from the old guy is
   possible.
3- Use the same approach that ceph-mds fencing uses, in which the old
   owner isn't known to be fenced away from a particular object until the
   new owner reads/touches that object.

My hope is that we can get away with #3, in which case all of the basic
pieces are in place and the real remaining work is integration and
testing. The logic goes something like this:

  File systems write to blocks on disk in a somewhat ordered fashion.
  After writing a bunch of data, they approach a 'consistency point' where
  their journal and/or superblocks must be flushed and things 'commit' to
  disk. At that point, if the IO fails or blocks, it won't continue to
  clobber other parts of the disk.

  When an fs is mounted, those same critical areas are read (superblock,
  journal, etc.). The existing client/osd interaction ensures that if
  the new guy knows that the old guy is fenced, the act of reading
  ensures that the relevant ceph-osds will find out too and that
  particular object will be fenced.

The resulting conclusion is that if a file system (or an application on
top of it doing direct io) is sufficiently well-behaved that it will not
corrupt itself when the disk reorders IOs (they do), and it issues
barrier/flush operations at the appropriate time (in modern kernels, they
do), then it will work.

I suppose it's roughly analogous to Schroedinger's cat: until the new
owner reads a block, it may or may not still be modified/modifiable by the
old guy, but as soon as it is observed, its state is known.

What do you guys think? If that doesn't work, I think we're stuck with
#2, which is expensive but doable.

sage
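Option #3 above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class names (`OsdMap`, `Osd`), the client ids, and the rule "an OSD adopts the newer map carried by any request before serving it" are simplifications invented here to mirror the description in the message, not the actual Ceph code paths.

```python
class OsdMap:
    """Hypothetical, simplified cluster map: an epoch plus a blacklist."""

    def __init__(self, epoch, blacklist):
        self.epoch = epoch
        self.blacklist = frozenset(blacklist)


class Osd:
    """Toy OSD that learns newer maps lazily from the clients talking to it."""

    def __init__(self, osdmap):
        self.osdmap = osdmap
        self.objects = {}

    def _catch_up(self, client_map):
        # Assumption: a request carrying a newer map makes the OSD adopt it
        # before the request is processed.
        if client_map.epoch > self.osdmap.epoch:
            self.osdmap = client_map

    def write(self, client_id, client_map, name, data):
        self._catch_up(client_map)
        if client_id in self.osdmap.blacklist:
            return False  # fenced: the write is rejected
        self.objects[name] = data
        return True

    def read(self, client_id, client_map, name):
        self._catch_up(client_map)
        if client_id in self.osdmap.blacklist:
            return None
        return self.objects.get(name, b"")


old_map = OsdMap(epoch=1, blacklist=[])
new_map = OsdMap(epoch=2, blacklist=["old-owner"])  # old owner fenced in epoch 2
osd = Osd(old_map)                                  # this OSD has not heard yet

# Until the new owner touches this OSD, the old owner can still write.
before = osd.write("old-owner", old_map, "superblock", b"old")

# The new owner reads the superblock; the OSD learns epoch 2 as a side effect.
seen = osd.read("new-owner", new_map, "superblock")

# From now on the old owner's writes to this OSD are refused.
after = osd.write("old-owner", old_map, "superblock", b"stale")
```

The point of the model is the ordering: what the new owner reads is, from that moment on, safe from the old owner, which is why the scheme relies on file systems re-reading their critical areas at mount time.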
* Re: No lock on RBD allow several mount on different servers...
2012-08-12 15:44 ` Sage Weil
@ 2012-08-13 16:55 ` Gregory Farnum
2012-08-13 17:22 ` Josh Durgin
0 siblings, 1 reply; 10+ messages in thread
From: Gregory Farnum @ 2012-08-13 16:55 UTC (permalink / raw)
To: ceph-devel, Josh Durgin; +Cc: Sebastien HAN, Marcus Sorensen
We've discussed some of the issues here a little bit before. See
http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/7094 if
you're interested.

Josh, can you discuss the current status of the advisory locking?
-Greg
* Re: No lock on RBD allow several mount on different servers...
2012-08-13 16:55 ` Gregory Farnum
@ 2012-08-13 17:22 ` Josh Durgin
2012-08-13 17:49 ` Yehuda Sadeh
0 siblings, 1 reply; 10+ messages in thread
From: Josh Durgin @ 2012-08-13 17:22 UTC (permalink / raw)
To: Gregory Farnum; +Cc: ceph-devel, Sebastien HAN, Marcus Sorensen
Yehuda reworked it into a generic rados class so it can be used outside
of rbd. It hasn't been re-integrated with rbd yet, and I haven't looked
at it closely since the generalization. Yehuda could describe it in
more detail.

Josh
* Re: No lock on RBD allow several mount on different servers...
2012-08-13 17:22 ` Josh Durgin
@ 2012-08-13 17:49 ` Yehuda Sadeh
2012-08-15 8:18 ` Sébastien Han
0 siblings, 1 reply; 10+ messages in thread
From: Yehuda Sadeh @ 2012-08-13 17:49 UTC (permalink / raw)
To: Josh Durgin; +Cc: Gregory Farnum, ceph-devel, Sebastien HAN, Marcus Sorensen
The lock objclass provides a generic way to set locks on objects. It
is a cooperative scheme. The following operations are available:
- lock (exclusive or shared)
- unlock -- remove a lock that was set by the same client instance
- break -- remove a lock that was set by a different client instance

A lock can be set indefinitely, or can be timed out after a specified
period. A lock can be renewed.

For the use of rbd, a client will have to set an exclusive lock on the
rbd header, with a specified (and relatively short) timeout. It
shouldn't do any I/O operations without holding that lock. It'll have
to renew that lock, and failing to do so (meaning the lock was broken
by another client) will require it to stop any I/O operations. The
overriding client should wait for the original client's lock period
to time out before initiating any new I/O. As I said, this is a
cooperative scheme; it doesn't prevent a buggy/bad client from sending
unwanted I/Os to the osds. A client cannot know that its lock was
broken without explicitly checking it, or trying to renew it (when
it's an exclusive lock). We can achieve that using watch/notify,
though I'm not sure it's worth the extra trouble (watch/notify doesn't
solve everything either).

Yehuda
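The cooperative scheme described above can be sketched as a small in-memory model. This is a hedged illustration only: `ObjectLock`, its method names, and the explicit `now` argument are hypothetical and do not correspond to the real lock objclass interface. Shared locks are left out, and `break` is spelled `break_lock` because `break` is a Python keyword.

```python
class ObjectLock:
    """Toy exclusive advisory lock with optional expiry (hypothetical API)."""

    def __init__(self):
        self.owner = None
        self.expires = None  # None means the lock never times out

    def _expired(self, now):
        return self.expires is not None and now >= self.expires

    def lock(self, client, now, duration=None):
        """Take or renew the lock; returns True on success."""
        held_by_other = self.owner is not None and self.owner != client
        if held_by_other and not self._expired(now):
            return False
        self.owner = client
        self.expires = None if duration is None else now + duration
        return True

    def unlock(self, client):
        """Remove a lock that was set by the same client."""
        if self.owner != client:
            return False
        self.owner, self.expires = None, None
        return True

    def break_lock(self, client):
        """Remove a lock that was set by a different client."""
        if self.owner is None or self.owner == client:
            return False
        self.owner, self.expires = None, None
        return True


header = ObjectLock()  # stands in for the lock on the rbd header object

a_got = header.lock("client-a", now=0, duration=30)       # A takes the lock
b_blocked = header.lock("client-b", now=10, duration=30)  # refused, A holds it
broke = header.break_lock("client-b")                     # B overrides A
b_got = header.lock("client-b", now=10, duration=30)      # B is now the owner
a_renew = header.lock("client-a", now=20, duration=30)    # A's renewal fails

# A only discovers the break when its renewal fails, so B should not start
# new I/O before A's original lock period (0 + 30) has elapsed.
b_may_write_from = 0 + 30
```

Nothing in this model stops client A from issuing I/O after its renewal fails; as the message says, the scheme only works if every client checks or renews its lock and stops when it can no longer do so.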
* Re: No lock on RBD allow several mount on different servers...
2012-08-13 17:49 ` Yehuda Sadeh
@ 2012-08-15 8:18 ` Sébastien Han
0 siblings, 0 replies; 10+ messages in thread
From: Sébastien Han @ 2012-08-15 8:18 UTC (permalink / raw)
To: Yehuda Sadeh; +Cc: Josh Durgin, Gregory Farnum, ceph-devel, Marcus Sorensen
Hi guys,

Thank you for the tremendous answers :D

How far are we from seeing this feature in the stable branch? Will it
be part of 0.48.x, or is it further away than that?

Cheers!
end of thread, other threads:[~2012-08-15 8:19 UTC | newest]
Thread overview: 10+ messages
2012-08-11 23:50 No lock on RBD allow several mount on different servers Sébastien Han
[not found] ` <CALFpzo49Urnf8rnFCQ=wQ8eFMR0-8FWh2=9nKrCAxb+0Xm0rVQ@mail.gmail.com>
[not found] ` <CAOLwVUnSUpAC69W48gbz-+7L7+p9z5tioODh_hPwVEt39GDvHw@mail.gmail.com>
[not found] ` <CALFpzo4X7iL6aEUtqyEBp4AMDxKkK9wtwPx35WQVauYQbe8Hng@mail.gmail.com>
2012-08-12 0:35 ` Marcus Sorensen
2012-08-12 0:53 ` Marcus Sorensen
2012-08-12 8:40 ` Sebastien HAN
2012-08-12 9:37 ` Smart Weblications GmbH - Florian Wiessner
2012-08-12 15:44 ` Sage Weil
2012-08-13 16:55 ` Gregory Farnum
2012-08-13 17:22 ` Josh Durgin
2012-08-13 17:49 ` Yehuda Sadeh
2012-08-15 8:18 ` Sébastien Han