No lock on RBD allow several mount on different servers...

* No lock on RBD allow several mount on different servers...
@ 2012-08-11 23:50 Sébastien Han
       [not found] ` <CALFpzo49Urnf8rnFCQ=wQ8eFMR0-8FWh2=9nKrCAxb+0Xm0rVQ@mail.gmail.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Sébastien Han @ 2012-08-11 23:50 UTC (permalink / raw)
  To: ceph-devel

Hi guys,

In theory, RBD images can be mounted multiple times on different
servers. Of course, **no one** wants that if they care about the
consistency of their data :D
I was wondering whether Ceph has any locking ability on the RBD device,
like DRBD does with a secondary resource. Apparently not: I was able to
mount an image on multiple servers and write data on both.

Is this an upcoming feature?

I don't really know how difficult this kind of feature would be to
implement, but it would be nice to have.

Cheers!


[parent not found: <CALFpzo49Urnf8rnFCQ=wQ8eFMR0-8FWh2=9nKrCAxb+0Xm0rVQ@mail.gmail.com>]

[parent not found: <CAOLwVUnSUpAC69W48gbz-+7L7+p9z5tioODh_hPwVEt39GDvHw@mail.gmail.com>]

[parent not found: <CALFpzo4X7iL6aEUtqyEBp4AMDxKkK9wtwPx35WQVauYQbe8Hng@mail.gmail.com>]

* Re: No lock on RBD allow several mount on different servers...
       [not found]     ` <CALFpzo4X7iL6aEUtqyEBp4AMDxKkK9wtwPx35WQVauYQbe8Hng@mail.gmail.com>
@ 2012-08-12  0:35       ` Marcus Sorensen
  2012-08-12  0:53         ` Marcus Sorensen
  0 siblings, 1 reply; 10+ messages in thread
From: Marcus Sorensen @ 2012-08-12  0:35 UTC (permalink / raw)
  To: Sébastien Han, ceph-devel

What I mean is that, as I understand it, RBD is designed to do no more
than present a block device. In that context, what you're asking is
more like whether it will support persistent reservations.  RBD, being
just a block device, sits at a lower level, and you'd have to add
something on top of it that is aware of sharing/locking.

On Sat, Aug 11, 2012 at 6:26 PM, Marcus Sorensen <shadowsor@gmail.com> wrote:
> But you put a /dev/drbd atop a block device.
>
> On Aug 11, 2012 6:09 PM, "Sébastien Han" <han.sebastien@gmail.com> wrote:
>>
>> Hi Marcus,
>>
>> I didn't really get your first sentence, but I don't think so. For
>> instance, DRBD manages its own /dev/drbd device and deliberately puts
>> a lock on a resource in the 'secondary' state, as in single-primary
>> mode. So I would say that it's higher level rather than lower...
>>
>> Cheers!
>>
>> On Sun, Aug 12, 2012 at 1:58 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>> > Isn't it supposed to be lower level than that? More like just a block
>> > device such as a SAN or iSCSI device? DRBD (GFS, OCFS, CLVM) goes on
>> > top of that.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: No lock on RBD allow several mount on different servers...
  2012-08-12  0:35       ` Marcus Sorensen
@ 2012-08-12  0:53         ` Marcus Sorensen
  2012-08-12  8:40           ` Sebastien HAN
  0 siblings, 1 reply; 10+ messages in thread
From: Marcus Sorensen @ 2012-08-12  0:53 UTC (permalink / raw)
  To: Sébastien Han, ceph-devel

Oh, and note that an RBD caching mode was added recently; it would
mess up any multi-mount or sharing you'd attempt to do with an RBD
device. It's optional, though.

http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/6402

Still, you'd probably need something on top of it to manage sharing,
just like with any other block device.  I'm interested to see what the
devs say though.
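
To make the caching concern concrete, here is a minimal Python sketch of
two hosts that each keep a private cache in front of the same shared
image. It is a toy model, not Ceph code: `SharedImage` and
`CachingClient` are hypothetical names, and the cache is write-through
only to keep the example short. The point is simply that nothing tells
host B to drop its cached copy when host A writes.

```python
class SharedImage:
    """Stands in for the shared block device: block number -> bytes."""

    def __init__(self):
        self.blocks = {}

    def read(self, block):
        return self.blocks.get(block, b"")

    def write(self, block, data):
        self.blocks[block] = data


class CachingClient:
    """A host with a private cache and no coordination with other hosts."""

    def __init__(self, image):
        self.image = image
        self.cache = {}

    def read(self, block):
        # Serve from the local cache if present; nobody ever invalidates it.
        if block not in self.cache:
            self.cache[block] = self.image.read(block)
        return self.cache[block]

    def write(self, block, data):
        # Write-through (an assumption made to keep the sketch small).
        self.cache[block] = data
        self.image.write(block, data)


image = SharedImage()
host_a = CachingClient(image)
host_b = CachingClient(image)

host_a.write(0, b"v1")
first_read_b = host_b.read(0)   # host B now caches block 0
host_a.write(0, b"v2")
stale_read_b = host_b.read(0)   # served from B's cache, not from the image
fresh = image.read(0)           # what the image really holds
```

Any sharing layer placed on top of the device would have to either
disable such a cache or invalidate it on every remote write.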


* Re: No lock on RBD allow several mount on different servers...
  2012-08-12  0:53         ` Marcus Sorensen
@ 2012-08-12  8:40           ` Sebastien HAN
  2012-08-12  9:37             ` Smart Weblications GmbH - Florian Wiessner
  2012-08-12 15:44             ` Sage Weil
  0 siblings, 2 replies; 10+ messages in thread
From: Sebastien HAN @ 2012-08-12  8:40 UTC (permalink / raw)
  To: Marcus Sorensen; +Cc: ceph-devel

Hi Marcus,

I completely understand your point and of course I agree. Even if this kind of locking has to happen in software on top of the block device, it doesn't look impossible to me. And it's more than persistent reservations, because it also depends fairly heavily on the filesystem on top of it. For example, you might want to allow a dual-primary mode...

After our brief exchange, I have to admit that this looks harder than I expected. Maybe it would require a small daemon running on the server where the device is mapped... A little bit overkill, and outside the design...

I'm also curious to hear what the devs have to say!

PS: I knew about the RBD caching but I have never tried it; I will.

Thank you again :-)

Cheers!


* Re: No lock on RBD allow several mount on different servers...
  2012-08-12  8:40           ` Sebastien HAN
@ 2012-08-12  9:37             ` Smart Weblications GmbH - Florian Wiessner
  2012-08-12 15:44             ` Sage Weil
  1 sibling, 0 replies; 10+ messages in thread
From: Smart Weblications GmbH - Florian Wiessner @ 2012-08-12  9:37 UTC (permalink / raw)
  To: Sebastien HAN; +Cc: Marcus Sorensen, ceph-devel

On 12.08.2012 10:40, Sebastien HAN wrote:

>>>>>>>
>>>>>>> With RBD images, the theory makes possible to mount them multiple
>>>>>>> times on different servers, of course **no one** wants that. If you
>>>>>>> care about the consistency of your data :D
>>>>>>> I was wondering if ceph has any lock ability on the RBD device like
>>>>>>> DRBD does with secondary resource. Apparently not, I was able to mount
>>>>>>> an image on multiple server and wrote data on both.
>>>>>>>
>>>>>>> Is this an incoming feature?
>>>>>>>
>>>>>>> I don't really know the difficulty level that this kind of feature
>>>>>>> implies, but it would be nice to have it.
>>>>>>>

I use OCFS2 on top of mapped RBD images; it works great.


-- 

Kind regards,

Florian Wiessner

Smart Weblications GmbH

* Re: No lock on RBD allow several mount on different servers...
  2012-08-12  8:40           ` Sebastien HAN
  2012-08-12  9:37             ` Smart Weblications GmbH - Florian Wiessner
@ 2012-08-12 15:44             ` Sage Weil
  2012-08-13 16:55               ` Gregory Farnum
  1 sibling, 1 reply; 10+ messages in thread
From: Sage Weil @ 2012-08-12 15:44 UTC (permalink / raw)
  To: Sebastien HAN; +Cc: Marcus Sorensen, ceph-devel

RBD image locking is on the roadmap, but it's tricky.  Almost all of the
pieces are in place for exclusive locking of the image header, which will 
let the user know when other nodes have the image mapped, and give them 
the option to break their lock and take over ownership.

The real challenge is fencing.  Unlike more conventional options like
SCSI, the RBD image is distributed across the entire cluster, so ensuring 
that the old guy doesn't still have IOs in flight that will stomp on the 
new owner means that potentially everyone needs to be informed that the 
bad guy should be locked out.

I think there are a few options:

1- The user has their own fencing or STONITH on top of rbd, informed by
   the rbd locking.  Pull the plug, update your iptables, whatever.  Not 
   very friendly.
2- Extend the rados 'blacklist' functionality to let you ensure that every 
   node in the cluster has received the updated osdmap+blacklist 
   information, so that you can be sure no further IO from the old guy is 
   possible.
3- Use the same approach that ceph-mds fencing uses, in which the old 
   owner isn't known to be fenced away from a particular object until the 
   new owner reads/touches that object.

My hope is that we can get away with #3, in which case all of the basic 
pieces are in place and the real remaining work is integration and 
testing.  The logic goes something like this:

File systems write to blocks on disk in a somewhat ordered fashion.  
After writing a bunch of data, they approach a 'consistency point' where 
their journal and/or superblocks must be flushed and things 'commit' to 
disk.  At that point, if the IO fails or blocks, it won't continue to 
clobber other parts of the disk.

When an fs is mounted, those same critical areas are read (superblock,
journal, etc.).  The existing client/osd interaction ensures that if
the new guy knows that the old guy is fenced, the act of reading
ensures that the relevant ceph-osds will find out too and that
particular object will be fenced.

The resulting conclusion is that if a file system (or an application on
top of it doing direct io) is sufficiently well-behaved that it will not
corrupt itself when the disk reorders IOs (they do) and it issues
barrier/flush operations at the appropriate time (in modern kernels, they
do), then it will work.

I suppose it's roughly analogous to Schroedinger's cat: until the new 
owner reads a block, it may or may not still be modified/modifiable by the 
old guy, but as soon as it is observed, its state is known.
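
The "fenced once observed" idea in option #3 can be illustrated with a
small Python toy model. This is only a sketch under stated assumptions,
not Ceph code: `ToyMonitor` and `ToyOSD` are hypothetical names, a "map"
is reduced to a blacklist tagged with an epoch number, and every request
is assumed to carry the sender's map epoch so that the OSD catches up to
that epoch before serving it.

```python
class ToyMonitor:
    """Keeps every map version; blacklists[epoch] is a frozenset of clients."""

    def __init__(self):
        self.blacklists = [frozenset()]

    @property
    def latest_epoch(self):
        return len(self.blacklists) - 1

    def blacklist(self, client):
        # Publishing a new map does NOT push it to the OSDs in this model.
        self.blacklists.append(self.blacklists[-1] | {client})
        return self.latest_epoch


class ToyOSD:
    """Stores objects and only learns about new maps when a request forces it."""

    def __init__(self, monitor):
        self.monitor = monitor
        self.epoch = monitor.latest_epoch
        self.objects = {}

    def _check(self, client, client_epoch):
        # A request stamped with a newer epoch makes the OSD catch up first.
        if client_epoch > self.epoch:
            self.epoch = client_epoch
        if client in self.monitor.blacklists[self.epoch]:
            raise PermissionError(f"{client} is fenced as of epoch {self.epoch}")

    def write(self, client, client_epoch, name, data):
        self._check(client, client_epoch)
        self.objects[name] = data

    def read(self, client, client_epoch, name):
        self._check(client, client_epoch)
        return self.objects.get(name)


mon = ToyMonitor()
osd = ToyOSD(mon)

old_epoch = mon.latest_epoch
osd.write("old", old_epoch, "superblock", b"old-1")

new_epoch = mon.blacklist("old")                      # the OSD has not heard yet
osd.write("old", old_epoch, "superblock", b"old-2")   # so this is still accepted
accepted_before_touch = osd.objects["superblock"]

# The new owner reads the object; its newer epoch drags the OSD forward.
seen_by_new = osd.read("new", new_epoch, "superblock")

try:
    osd.write("old", old_epoch, "superblock", b"old-3")
    fenced = False
except PermissionError:
    fenced = True
```

In this model the old owner can still modify an object until the new
owner touches it, and never afterwards, which is the property the file
system argument above relies on.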

What do you guys think?  If that doesn't work, I think we're stuck with 
#2, which is expensive but doable.

sage

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: No lock on RBD allow several mount on different servers...
  2012-08-12 15:44             ` Sage Weil
@ 2012-08-13 16:55               ` Gregory Farnum
  2012-08-13 17:22                 ` Josh Durgin
  0 siblings, 1 reply; 10+ messages in thread
From: Gregory Farnum @ 2012-08-13 16:55 UTC (permalink / raw)
  To: ceph-devel, Josh Durgin; +Cc: Sebastien HAN, Marcus Sorensen

We've discussed some of the issues here a little bit before. See
http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/7094 if
you're interested.

Josh, can you discuss the current status of the advisory locking?
-Greg


* Re: No lock on RBD allow several mount on different servers...
  2012-08-13 16:55               ` Gregory Farnum
@ 2012-08-13 17:22                 ` Josh Durgin
  2012-08-13 17:49                   ` Yehuda Sadeh
  0 siblings, 1 reply; 10+ messages in thread
From: Josh Durgin @ 2012-08-13 17:22 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel, Sebastien HAN, Marcus Sorensen

On 08/13/2012 09:55 AM, Gregory Farnum wrote:
> We've discussed some of the issues here a little bit before. See
> http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/7094 if
> you're interested.
>
> Josh, can you discuss the current status of the advisory locking?
> -Greg

Yehuda reworked it into a generic rados class so it can be used outside
of rbd. It hasn't been re-integrated with rbd yet, and I haven't looked
at it closely since the generalization. Yehuda could describe it in
more detail.

Josh


* Re: No lock on RBD allow several mount on different servers...
  2012-08-13 17:22                 ` Josh Durgin
@ 2012-08-13 17:49                   ` Yehuda Sadeh
  2012-08-15  8:18                     ` Sébastien Han
  0 siblings, 1 reply; 10+ messages in thread
From: Yehuda Sadeh @ 2012-08-13 17:49 UTC (permalink / raw)
  To: Josh Durgin; +Cc: Gregory Farnum, ceph-devel, Sebastien HAN, Marcus Sorensen

On Mon, Aug 13, 2012 at 10:22 AM, Josh Durgin <josh.durgin@inktank.com> wrote:
> On 08/13/2012 09:55 AM, Gregory Farnum wrote:
>>
>> We've discussed some of the issues here a little bit before. See
>> http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/7094 if
>> you're interested.
>>
>> Josh, can you discuss the current status of the advisory locking?
>> -Greg
>
>
> Yehuda reworked it into a generic rados class so it can be used outside
> of rbd. It hasn't been re-integrated with rbd yet, and I haven't looked
> at it closely since the generalization. Yehuda could describe it in
> more detail.
>

The lock objclass provides a generic way to set locks on objects. It
is a cooperative scheme. The following operations are available:
 - lock (exclusive or shared)
 - unlock -- remove a lock that was set by the same client instance
 - break -- remove a lock that was set by a different client instance

A lock can be set indefinitely, or can be timed out after a specified
period. A lock can be renewed.
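
As a rough illustration of the operations listed above, here is an
in-memory Python sketch of a cooperative lock on a single object. It is
not the lock objclass API: `ToyLock` and its method names are
hypothetical, time is passed in explicitly as `now` so the behaviour is
deterministic, and a `duration` of `None` is assumed to mean "no
timeout".

```python
class ToyLock:
    """Cooperative lock state for one object."""

    def __init__(self):
        self.exclusive = False
        self.holders = {}   # client -> expiry time, or None for no timeout

    def _expire(self, now):
        # Drop holders whose timed lock has run out.
        self.holders = {c: t for c, t in self.holders.items()
                        if t is None or t > now}

    def lock(self, client, now, exclusive=True, duration=None):
        self._expire(now)
        others = [c for c in self.holders if c != client]
        if others and (exclusive or self.exclusive):
            return False            # conflicts with an existing holder
        self.exclusive = exclusive
        self.holders[client] = None if duration is None else now + duration
        return True

    def renew(self, client, now, duration):
        self._expire(now)
        if client not in self.holders:
            return False            # expired, or broken by someone else
        self.holders[client] = now + duration
        return True

    def unlock(self, client, now):
        # Remove a lock set by the same client.
        self._expire(now)
        return self.holders.pop(client, "absent") != "absent"

    def break_lock(self, breaker, holder, now):
        # Remove a lock set by a different client.
        self._expire(now)
        if breaker == holder or holder not in self.holders:
            return False
        del self.holders[holder]
        return True


lock = ToyLock()
a_got = lock.lock("a", now=0, duration=10)          # a holds until t=10
b_got = lock.lock("b", now=5, duration=10)          # refused: a holds it
a_renewed = lock.renew("a", now=8, duration=10)     # a now holds until t=18
b_got_late = lock.lock("b", now=12, duration=10)    # still refused
b_broke = lock.break_lock("b", "a", now=13)         # b removes a's lock
b_got_after_break = lock.lock("b", now=13, duration=10)
a_renewed_after_break = lock.renew("a", now=14, duration=10)
```

Nothing in this structure stops a client that ignores the return values
from doing I/O anyway, which is what "cooperative" means here.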

For rbd, a client will have to set an exclusive lock on the rbd
header, with a specified (and relatively short) timeout. It shouldn't
do any I/O operations without holding that lock. It'll have to renew
that lock, and failing to do so (meaning the lock was broken by
another client) will require it to stop any I/O operations. The
overriding client should wait for the original client's lock period
to time out before initiating any new I/O. As I said, this is a
cooperative scheme; it doesn't prevent a buggy/bad client from sending
unwanted I/Os to the osds. A client cannot know that its lock was
broken without explicitly checking it, or trying to renew it (when
it's an exclusive lock). We could provide that notification using
watch/notify, though I'm not sure it's worth the extra trouble
(watch/notify doesn't solve everything either).
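
The timing rule in the paragraph above can be worked through with a few
lines of Python. This is a sketch under assumptions that are not in the
original message: `LeaseHolder` and `earliest_takeover_io` are
hypothetical names, all parties are assumed to share a clock, and I/O
already in flight is ignored. The old client counts its lease from the
moment it sent the renewal, and the new client waits one full lock
period after breaking the lock.

```python
class LeaseHolder:
    """Client-side view of a timed exclusive lock."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.valid_until = None

    def renewed(self, requested_at):
        # Counting from when the renewal was sent is the cautious choice:
        # the lock cannot have been granted any earlier than that.
        self.valid_until = requested_at + self.lease_seconds

    def may_do_io(self, now):
        return self.valid_until is not None and now < self.valid_until


def earliest_takeover_io(break_time, lease_seconds):
    """Time at which a client that broke the lock may start its own I/O."""
    return break_time + lease_seconds


LEASE = 30
old = LeaseHolder(LEASE)
old.renewed(requested_at=100)       # old client believes it is safe until 130

break_time = 110                    # new client breaks the lock here
safe_at = earliest_takeover_io(break_time, LEASE)

old_ok_at_120 = old.may_do_io(120)  # old client has not noticed yet
old_ok_at_135 = old.may_do_io(135)  # its renewals failed, so the lease ran out
old_ok_at_takeover = old.may_do_io(safe_at)
```

Because the old client's last successful renewal must have been sent
before the break, its lease always ends before `safe_at`, provided it
actually honours the lease.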

Yehuda


* Re: No lock on RBD allow several mount on different servers...
  2012-08-13 17:49                   ` Yehuda Sadeh
@ 2012-08-15  8:18                     ` Sébastien Han
  0 siblings, 0 replies; 10+ messages in thread
From: Sébastien Han @ 2012-08-15  8:18 UTC (permalink / raw)
  To: Yehuda Sadeh; +Cc: Josh Durgin, Gregory Farnum, ceph-devel, Marcus Sorensen

Hi guys,

Thank you for the tremendous answers :D
How far are we from seeing this feature in the stable branch? Will it
be part of 0.48.x, or is it further away than that?

Cheers!
