Removing disks / OSDs

* Removing disks / OSDs
@ 2013-10-21 16:15 Loic Dachary
  2013-10-21 16:49 ` Gregory Farnum
  0 siblings, 1 reply; 12+ messages in thread
From: Loic Dachary @ 2013-10-21 16:15 UTC (permalink / raw)
  To: Ceph Development

Hi Ceph,

In the context of the Ceph puppet module ( https://wiki.openstack.org/wiki/Puppet-openstack/ceph-blueprint ), I have been thinking about what should be provided to handle disks / OSDs when they are removed or moved around.

Here is a possible scenario:

* Machine A, which hosts OSD 42, dies
* ceph osd rm 42 is run to get rid of the OSD
* ceph-prepare is called on a new disk and gets OSD id 42

ceph/src/mon/OSDMonitor.cc

    // allocate a new id
    for (i=0; i < osdmap.get_max_osd(); i++) {
      if (!osdmap.exists(i) &&
          pending_inc.new_up_client.count(i) == 0 &&
          (pending_inc.new_state.count(i) == 0 ||
           (pending_inc.new_state[i] & CEPH_OSD_EXISTS) == 0))
        goto done;
    }

* The disk of machine A is still good and is plugged into machine C
* The udev logic sees that the disk has the Ceph magic uuid and contains a well-formed OSD file system, and runs the OSD daemon on it. The OSD daemon fails and dies because its key does not match. It will try again and fail in the same way when the machine reboots.
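
The id reuse in the scenario above follows from the allocation loop picking the lowest free id. A minimal sketch in Python, assuming a simplified model: the function name and the set-based bookkeeping are illustrative, the pending_inc checks are left out, and the value returned when no gap exists is a placeholder rather than what the monitor does.

```python
def allocate_osd_id(existing_ids, max_osd):
    """Return the lowest id in [0, max_osd) that is not in existing_ids.

    Simplified model of the loop quoted above: the pending_inc checks
    are left out, and when no gap exists the sketch just returns max_osd.
    """
    for candidate in range(max_osd):
        if candidate not in existing_ids:
            return candidate
    return max_osd


osds = set(range(50))   # ids 0..49 are in use
osds.discard(42)        # models `ceph osd rm 42`
print(allocate_osd_id(osds, max_osd=50))   # the freed id is the first gap
```

With ids 0 to 49 in use and 42 removed, the first gap is 42, so the new disk is handed the same id the old disk still carries.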

If the OSD id were not reused, the disk would find its way back into the cluster and be reused without manual intervention. Since ceph-disk uses the OSD uuid to create the disk, it does not matter that it has been removed: https://github.com/ceph/ceph/blob/master/src/ceph-disk#L458 . I'm not sure I understand how the key that was previously registered is re-imported. If I understand correctly, it is created with ceph-osd --mkkey https://github.com/ceph/ceph/blob/master/src/ceph-disk#L1301 and stored in the osd tree at the location specified by --keyring https://github.com/ceph/ceph/blob/master/src/ceph-disk#L1307 .

At this point my understanding is that, as long as OSD ids are not reused, removing a disk or moving it from machine to machine, even over a long period of time, does not require any action. The OSD id is an int, which is probably large enough for any kind of cluster in the near future. The OSD ids that are no longer used but have not been removed could be cleaned up from time to time to reclaim the space they use.

Please let me know if I've missed something :-)

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.


* Re: Removing disks / OSDs
  2013-10-21 16:15 Removing disks / OSDs Loic Dachary
@ 2013-10-21 16:49 ` Gregory Farnum
  2013-10-21 16:57   ` Loic Dachary
  2013-10-22  6:13   ` Loic Dachary
  0 siblings, 2 replies; 12+ messages in thread
From: Gregory Farnum @ 2013-10-21 16:49 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

I'm not quite sure what questions you're actually asking here...
In general, the OSD is not removed from the system without explicit
admin intervention. When it is removed, all traces of it should be
zapped (including its key), so it can't reconnect.
If it hasn't been removed, then indeed it will continue working
properly even if moved to a different box.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Removing disks / OSDs
  2013-10-21 16:49 ` Gregory Farnum
@ 2013-10-21 16:57   ` Loic Dachary
  2013-10-21 17:17     ` Gregory Farnum
  2013-10-22  6:13   ` Loic Dachary
  1 sibling, 1 reply; 12+ messages in thread
From: Loic Dachary @ 2013-10-21 16:57 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Ceph Development

On 21/10/2013 18:49, Gregory Farnum wrote:
> I'm not quite sure what questions you're actually asking here...

I guess I was asking if my understanding was correct.

> In general, the OSD is not removed from the system without explicit
> admin intervention. When it is removed, all traces of it should be
> zapped (including its key), so it can't reconnect.

OK, so reusing OSD ids is not an issue. If you reconnect a disk after running osd rm on the id it had, you get what you deserve, and you can zap the disk so that it is formatted again. Is that what you mean?

> If it hasn't been removed, then indeed it will continue working
> properly even if moved to a different box.

Cool.

That makes it really simple from the point of view of tools like puppet :-)

Cheers


-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.



* Re: Removing disks / OSDs
  2013-10-21 16:57   ` Loic Dachary
@ 2013-10-21 17:17     ` Gregory Farnum
  2013-10-21 21:15       ` Mark Kirkwood
  0 siblings, 1 reply; 12+ messages in thread
From: Gregory Farnum @ 2013-10-21 17:17 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

On Mon, Oct 21, 2013 at 9:57 AM, Loic Dachary <loic@dachary.org> wrote:
>
>
> On 21/10/2013 18:49, Gregory Farnum wrote:
>> I'm not quite sure what questions you're actually asking here...
>
> I guess I was asking if my understanding was correct.
>
>> In general, the OSD is not removed from the system without explicit
>> admin intervention. When it is removed, all traces of it should be
>> zapped (including its key), so it can't reconnect.
>
> OK, so reusing OSD ids is not an issue. If you reconnect a disk after running osd rm on the id it had, you get what you deserve, and you can zap the disk so that it is formatted again. Is that what you mean?

I was actually under the impression that "osd rm" would, itself, clear
out the keys as well as the osd map state, but it does not do so. That
looks like a serious bug! http://tracker.ceph.com/issues/6605
-Greg
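
The behaviour described here can be pictured as two independent stores, one for the OSD map and one for keys. The toy model below is only a sketch of that description: the class and method names are hypothetical and it does not reflect the monitor's actual data structures.

```python
class MonitorModel:
    """Toy model: the OSD map and the key store are kept separately."""

    def __init__(self):
        self.osdmap = set()   # ids that exist in the OSD map
        self.auth = {}        # entity name -> key

    def add_osd(self, osd_id, key):
        self.osdmap.add(osd_id)
        self.auth[f"osd.{osd_id}"] = key

    def osd_rm(self, osd_id):
        # Models the behaviour reported in this message:
        # only the map entry goes away, the key stays registered.
        self.osdmap.discard(osd_id)

    def auth_del(self, osd_id):
        # The key has to be dropped by a separate step.
        self.auth.pop(f"osd.{osd_id}", None)


mon = MonitorModel()
mon.add_osd(42, "old-key")
mon.osd_rm(42)
print(42 in mon.osdmap, "osd.42" in mon.auth)
```

In this model the id is free for reuse after osd_rm while the old key is still present, which is the state the bug report is about.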


* Re: Removing disks / OSDs
  2013-10-21 17:17     ` Gregory Farnum
@ 2013-10-21 21:15       ` Mark Kirkwood
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Kirkwood @ 2013-10-21 21:15 UTC (permalink / raw)
  To: Gregory Farnum, Loic Dachary; +Cc: Ceph Development

On 22/10/13 06:17, Gregory Farnum wrote:
> On Mon, Oct 21, 2013 at 9:57 AM, Loic Dachary <loic@dachary.org> wrote:
>>
>>
>> On 21/10/2013 18:49, Gregory Farnum wrote:
>>> I'm not quite sure what questions you're actually asking here...
>>
>> I guess I was asking if my understanding was correct.
>>
>>> In general, the OSD is not removed from the system without explicit
>>> admin intervention. When it is removed, all traces of it should be
>>> zapped (including its key), so it can't reconnect.
>>
>> OK, so reusing OSD ids is not an issue. If you reconnect a disk after running osd rm on the id it had, you get what you deserve, and you can zap the disk so that it is formatted again. Is that what you mean?
>
> I was actually under the impression that "osd rm" would, itself, clear
> out the keys as well as the osd map state, but it does not do so. That
> looks like a serious bug! http://tracker.ceph.com/issues/6605
> -Greg
>

FWIW the docs here 
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/ say that 
you need to do:

$ ceph osd crush remove osd.{osd-num}
$ ceph auth del osd.{osd-num}
$ ceph osd rm {osd-num}
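
For a tool such as a puppet provider, the three documented steps could be generated in order for a given id. This is a hedged sketch: the helper name is made up for illustration, and it only builds the argument lists without running anything.

```python
def osd_removal_commands(osd_id):
    """Return the three removal commands listed above, in order, as argv lists."""
    if osd_id < 0:
        raise ValueError("osd id must be non-negative")
    name = f"osd.{osd_id}"
    return [
        ["ceph", "osd", "crush", "remove", name],  # take it out of the CRUSH map
        ["ceph", "auth", "del", name],             # drop its key
        ["ceph", "osd", "rm", str(osd_id)],        # remove it from the OSD map
    ]


for argv in osd_removal_commands(42):
    print(" ".join(argv))
```

Returning argv lists rather than shell strings leaves it to the caller to decide how, and whether, to execute them.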

Regards


* Re: Removing disks / OSDs
  2013-10-21 16:49 ` Gregory Farnum
  2013-10-21 16:57   ` Loic Dachary
@ 2013-10-22  6:13   ` Loic Dachary
  2013-10-22 17:26     ` Gregory Farnum
  1 sibling, 1 reply; 12+ messages in thread
From: Loic Dachary @ 2013-10-22  6:13 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Ceph Development

On 21/10/2013 18:49, Gregory Farnum wrote:
> I'm not quite sure what questions you're actually asking here...
> In general, the OSD is not removed from the system without explicit
> admin intervention. When it is removed, all traces of it should be
> zapped (including its key), so it can't reconnect.
> If it hasn't been removed, then indeed it will continue working
> properly even if moved to a different box.

If there is an external journal, the device containing the journal needs to be moved with the device containing the data. If I read ceph/src/upstart/ceph-osd.conf correctly, when the data device is plugged into the new machine the OSD will fail to start because the journal is not there yet. When the journal device is plugged in, ceph-osd.conf is invoked because the udev rule in ceph/udev/95-ceph-osd.rules calls ceph-disk activate-journal.

Is my understanding correct?
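
The ordering described above can be modelled as a start attempt that is retried on every device event and succeeds only once both devices are present. The sketch below is an assumption-laden toy: the class, its methods and the uuid strings are hypothetical and are not taken from ceph-disk or the udev rules.

```python
class OsdHost:
    """Toy model of a host reacting to hot-plugged OSD devices."""

    def __init__(self):
        self.data_present = set()
        self.journal_present = set()
        self.running = set()

    def _try_start(self, osd_uuid):
        # The OSD can only start when both of its devices are plugged in.
        if osd_uuid in self.data_present and osd_uuid in self.journal_present:
            self.running.add(osd_uuid)
            return True
        return False

    def plug_data(self, osd_uuid):
        self.data_present.add(osd_uuid)
        return self._try_start(osd_uuid)

    def plug_journal(self, osd_uuid):
        # Models the activate-journal path: a journal event retries the start.
        self.journal_present.add(osd_uuid)
        return self._try_start(osd_uuid)


host = OsdHost()
print(host.plug_data("osd-uuid-1"))      # journal missing, start fails
print(host.plug_journal("osd-uuid-1"))   # both present, start succeeds
```

Because each event retries the start, the order in which the two devices are plugged in does not matter in this model.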

Cheers


-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.





* Re: Removing disks / OSDs
  2013-10-22  6:13   ` Loic Dachary
@ 2013-10-22 17:26     ` Gregory Farnum
  2013-10-22 17:31       ` Sage Weil
  2013-10-23  6:29       ` Loic Dachary
  0 siblings, 2 replies; 12+ messages in thread
From: Gregory Farnum @ 2013-10-22 17:26 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

On Mon, Oct 21, 2013 at 11:13 PM, Loic Dachary <loic@dachary.org> wrote:
>
>
> On 21/10/2013 18:49, Gregory Farnum wrote:
>> I'm not quite sure what questions you're actually asking here...
>> In general, the OSD is not removed from the system without explicit
>> admin intervention. When it is removed, all traces of it should be
>> zapped (including its key), so it can't reconnect.
>> If it hasn't been removed, then indeed it will continue working
>> properly even if moved to a different box.
>
> If there is an external journal, the device containing the journal needs to be moved with the device containing the data. If I read ceph/src/upstart/ceph-osd.conf correctly, when the data device is plugged into the new machine the OSD will fail to start because the journal is not there yet. When the journal device is plugged in, ceph-osd.conf is invoked because the udev rule in ceph/udev/95-ceph-osd.rules calls ceph-disk activate-journal.
>
> Is my understanding correct?

Well, after being wrong last time I'm a little reluctant to make
pronouncements from memory, but that definitely sounds correct to me.
:) If I were doing an audit I'd want to look at what happens if there
is a wrong journal in the correct location, etc.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


* Re: Removing disks / OSDs
  2013-10-22 17:26     ` Gregory Farnum
@ 2013-10-22 17:31       ` Sage Weil
  2013-10-23 13:45         ` Loic Dachary
  2013-10-23  6:29       ` Loic Dachary
  1 sibling, 1 reply; 12+ messages in thread
From: Sage Weil @ 2013-10-22 17:31 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Loic Dachary, Ceph Development

On Tue, 22 Oct 2013, Gregory Farnum wrote:
> On Mon, Oct 21, 2013 at 11:13 PM, Loic Dachary <loic@dachary.org> wrote:
> >
> >
> > On 21/10/2013 18:49, Gregory Farnum wrote:
> >> I'm not quite sure what questions you're actually asking here...
> >> In general, the OSD is not removed from the system without explicit
> >> admin intervention. When it is removed, all traces of it should be
> >> zapped (including its key), so it can't reconnect.
> >> If it hasn't been removed, then indeed it will continue working
> >> properly even if moved to a different box.
> >
> > If there is an external journal, the device containing the journal needs to be moved with the device containing the data. If I read ceph/src/upstart/ceph-osd.conf correctly, when the data device is plugged into the new machine the OSD will fail to start because the journal is not there yet. When the journal device is plugged in, ceph-osd.conf is invoked because the udev rule in ceph/udev/95-ceph-osd.rules calls ceph-disk activate-journal.
> >
> > Is my understanding correct?
> 
> Well, after being wrong last time I'm a little reluctant to make
> pronouncements from memory, but that definitely sounds correct to me.

Yep, that's how it's supposed to work.  The activate-journal piece is 
somewhat recent though (I think maybe it wasn't in place for cuttlefish?).

> :) If I were doing an audit I'd want to look at what happens if there
> is a wrong journal in the correct location, etc.

The ceph-osd will fail on start because the uuid/fsid doesn't match.
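
The check described above amounts to comparing an identifier stored with the journal against the one stored with the data. A minimal sketch, assuming each device simply carries an fsid string; the Device type and start_osd function are hypothetical and do not mirror ceph-osd's on-disk format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    role: str   # "data" or "journal" (labels chosen for this sketch)
    fsid: str   # identifier expected to be shared by an OSD's data and journal


def start_osd(data, journal):
    """Refuse to start when the journal does not belong to the data device."""
    if data.fsid != journal.fsid:
        raise RuntimeError(
            f"journal fsid {journal.fsid} does not match data fsid {data.fsid}"
        )
    return "running"
```

A wrong journal in the right location therefore produces a failed start rather than an OSD running against someone else's journal.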

sage


* Re: Removing disks / OSDs
  2013-10-22 17:26     ` Gregory Farnum
  2013-10-22 17:31       ` Sage Weil
@ 2013-10-23  6:29       ` Loic Dachary
  1 sibling, 0 replies; 12+ messages in thread
From: Loic Dachary @ 2013-10-23  6:29 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Ceph Development

On 22/10/2013 19:26, Gregory Farnum wrote:
> On Mon, Oct 21, 2013 at 11:13 PM, Loic Dachary <loic@dachary.org> wrote:
>>
>>
>> On 21/10/2013 18:49, Gregory Farnum wrote:
>>> I'm not quite sure what questions you're actually asking here...
>>> In general, the OSD is not removed from the system without explicit
>>> admin intervention. When it is removed, all traces of it should be
>>> zapped (including its key), so it can't reconnect.
>>> If it hasn't been removed, then indeed it will continue working
>>> properly even if moved to a different box.
>>
>> If there is an external journal, the device containing the journal needs to be moved with the device containing the data. If I read ceph/src/upstart/ceph-osd.conf correctly, when the data device is plugged into the new machine the OSD will fail to start because the journal is not there yet. When the journal device is plugged in, ceph-osd.conf is invoked because the udev rule in ceph/udev/95-ceph-osd.rules calls ceph-disk activate-journal.
>>
>> Is my understanding correct?
> 
> Well, after being wrong last time I'm a little reluctant to make
> pronouncements from memory, but that definitely sounds correct to me.
> :) If I were doing an audit I'd want to look at what happens if there
> is a wrong journal in the correct location, etc.

Thanks. It makes things really simple from the configuration point of view :-)

Cheers

> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.



* Re: Removing disks / OSDs
  2013-10-22 17:31       ` Sage Weil
@ 2013-10-23 13:45         ` Loic Dachary
  2013-10-25 13:11           ` Loic Dachary
  0 siblings, 1 reply; 12+ messages in thread
From: Loic Dachary @ 2013-10-23 13:45 UTC (permalink / raw)
  Cc: Ceph Development

Hi,

As a conclusion to this thread, and because I realized that most people are unaware of the level of automation provided by ceph / udev, I just posted a quick summary: http://dachary.org/?p=2428

Cheers


-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.



* Re: Removing disks / OSDs
  2013-10-23 13:45         ` Loic Dachary
@ 2013-10-25 13:11           ` Loic Dachary
  2013-10-25 15:58             ` Sage Weil
  0 siblings, 1 reply; 12+ messages in thread
From: Loic Dachary @ 2013-10-25 13:11 UTC (permalink / raw)
  To: Ceph Development

Hi,

If a disk containing an OSD is moved to another host, it is entirely possible that the objects it contains now violate a CRUSH rule requiring that no two replicas of an object reside on the same host. Will this be taken care of as a side effect of updating the location of the OSD in the map ( https://github.com/ceph/ceph/blob/v0.61.9/src/upstart/ceph-osd.conf#L29 )?
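
To make the concern concrete, the condition can be detected from two mappings: which OSDs hold each placement group, and which host each OSD sits on. The sketch below uses made-up placements and host names, and it only detects the condition; whether and how Ceph corrects it is the question being asked.

```python
from collections import Counter


def pgs_with_colocated_replicas(pg_to_osds, osd_to_host):
    """Return the placement groups that have two or more replicas on one host."""
    violating = []
    for pg, osds in sorted(pg_to_osds.items()):
        per_host = Counter(osd_to_host[osd] for osd in osds)
        if any(count > 1 for count in per_host.values()):
            violating.append(pg)
    return violating


pg_to_osds = {"1.0": [1, 2], "1.1": [2, 3]}      # hypothetical placements
osd_to_host = {1: "A", 2: "B", 3: "C"}
before = pgs_with_colocated_replicas(pg_to_osds, osd_to_host)

osd_to_host[1] = "B"                              # the disk of OSD 1 moves to host B
after = pgs_with_colocated_replicas(pg_to_osds, osd_to_host)
```

Before the move no placement group is affected; after it, the group stored on OSDs 1 and 2 has both replicas on host B.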

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.


* Re: Removing disks / OSDs
  2013-10-25 13:11           ` Loic Dachary
@ 2013-10-25 15:58             ` Sage Weil
  0 siblings, 0 replies; 12+ messages in thread
From: Sage Weil @ 2013-10-25 15:58 UTC
  To: Loic Dachary; +Cc: Ceph Development

On Fri, 25 Oct 2013, Loic Dachary wrote:
> Hi,
> 
> If a disk containing an OSD is moved to another host, it is entirely 
> possible that the objects it contains are violating a crush rule 
> imposing that no object is replicated on the same host. Will this be 
> taken care of as a side effect of updating the place of the OSD in the 
> map ( 
> https://github.com/ceph/ceph/blob/v0.61.9/src/upstart/ceph-osd.conf#L29 
> ) ?

Exactly.  In general, in the simplest case, plugging an OSD into a 
different part of the CRUSH map will mean that the data will move. The 
benefit is that we retain that copy so there is no spike in the 
probability of losing additional replicas and all your data.

There are some tricks we could play if a host is being replaced wholesale 
(all drives moved together) by reusing the internal CRUSH ids and 
preventing movement, but that isn't done yet.

sage
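
The situation raised in the question can be illustrated with a small sketch. It does not implement CRUSH and does not move any data; it only detects when the replicas of one placement end up on the same host after an OSD changes location, which is the condition that would cause data to move. The function name, the osd_to_host mapping and the host names are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List


def hosts_with_multiple_replicas(replica_osds: List[int],
                                 osd_to_host: Dict[int, str]) -> List[str]:
    """Return hosts that hold more than one replica of the same object."""
    counts = Counter(osd_to_host[osd] for osd in replica_osds)
    return sorted(host for host, n in counts.items() if n > 1)


# Illustrative layout: OSD 0 on host A, OSD 1 on host B, OSD 42 on host C.
osd_to_host = {0: "A", 1: "B", 42: "C"}
replicas = [0, 42]  # an object replicated on OSD 0 and OSD 42

print(hosts_with_multiple_replicas(replicas, osd_to_host))

# The disk holding OSD 42 is plugged into host A and the map is updated.
osd_to_host[42] = "A"
print(hosts_with_multiple_replicas(replicas, osd_to_host))
```

Once the map records the new location, host A holds both replicas, so a rule that separates replicas by host is no longer satisfied for this placement and one of the copies has to be placed elsewhere.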


Thread overview: 12+ messages
2013-10-21 16:15 Removing disks / OSDs Loic Dachary
2013-10-21 16:49 ` Gregory Farnum
2013-10-21 16:57   ` Loic Dachary
2013-10-21 17:17     ` Gregory Farnum
2013-10-21 21:15       ` Mark Kirkwood
2013-10-22  6:13   ` Loic Dachary
2013-10-22 17:26     ` Gregory Farnum
2013-10-22 17:31       ` Sage Weil
2013-10-23 13:45         ` Loic Dachary
2013-10-25 13:11           ` Loic Dachary
2013-10-25 15:58             ` Sage Weil
2013-10-23  6:29       ` Loic Dachary
