* OSD replacement feature
@ 2015-11-19 17:20 Loic Dachary
2015-11-20 7:55 ` Wei-Chung Cheng
0 siblings, 1 reply; 7+ messages in thread
From: Loic Dachary @ 2015-11-19 17:20 UTC (permalink / raw)
To: Vicente Cheng; +Cc: Ceph Development
Hi Vicente,
Now that your ceph-disk deactivate/destroy feature is merged (and documented ;-), I wonder if you have time to comment on http://tracker.ceph.com/issues/13732, which is about replacing a disk? Your input would be much appreciated.
Cheers
--
Loïc Dachary, Artisan Logiciel Libre
* Re: OSD replacement feature
2015-11-19 17:20 OSD replacement feature Loic Dachary
@ 2015-11-20 7:55 ` Wei-Chung Cheng
2015-11-20 11:38 ` Sage Weil
0 siblings, 1 reply; 7+ messages in thread
From: Wei-Chung Cheng @ 2015-11-20 7:55 UTC (permalink / raw)
To: Ceph Development
Hi Loic and cephers,
Sure, I have time to help with (and comment on) this disk replacement feature.
It would be a useful feature for handling disk failures :p
A simple procedure is described at http://tracker.ceph.com/issues/13732 :
1. Set the noout flag. If the broken osd is a primary osd, can we handle that well?
2. Stop the osd daemon and wait until the osd is actually down (or
maybe use the deactivate option of ceph-disk).
These two steps seem OK.
As for the crush map, should we remove the broken osd from it?
If we do that, why set the noout flag? Removing the osd from the crushmap
still triggers a re-balance.
Could we just remove the auth key and re-create the osd with the new disk
(then add the auth key back)?
I will try this and test it myself.
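A minimal sketch of steps 1 and 2, assuming a systemd host and assuming that `ceph osd dump --format json` lists each osd with an `osd` id and an `up` field. The helper names are hypothetical, and the `--deactivate-by-id` flag should be checked against the installed ceph-disk. The plan is only built, not executed:

```python
from typing import Dict, List


def osd_stop_plan(osd_id: int, use_ceph_disk: bool = False) -> List[List[str]]:
    """Return the commands for steps 1 and 2, in order, without running them."""
    # Step 1: keep osds that go down from being marked out automatically.
    plan = [["ceph", "osd", "set", "noout"]]
    if use_ceph_disk:
        # Alternative for step 2 (flag name is an assumption, verify locally).
        plan.append(["ceph-disk", "deactivate", "--deactivate-by-id", str(osd_id)])
    else:
        # Step 2 on a systemd host; other init systems need a different command.
        plan.append(["systemctl", "stop", f"ceph-osd@{osd_id}"])
    return plan


def osd_is_down(osd_dump: Dict, osd_id: int) -> bool:
    """Inspect parsed `ceph osd dump --format json` output for one osd."""
    for osd in osd_dump.get("osds", []):
        if osd.get("osd") == osd_id:
            return osd.get("up") == 0
    raise KeyError(f"osd.{osd_id} not found in osd dump")
```

A caller would poll `osd_is_down` until it returns True before touching the disk, which is the "wait until the osd is actually down" part of step 2.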
Feel free to let me know if you have any suggestions!
thanks!!!
vicente
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: OSD replacement feature
2015-11-20 7:55 ` Wei-Chung Cheng
@ 2015-11-20 11:38 ` Sage Weil
2015-11-20 16:40 ` Wei-Chung Cheng
2015-11-20 17:54 ` David Zafman
0 siblings, 2 replies; 7+ messages in thread
From: Sage Weil @ 2015-11-20 11:38 UTC (permalink / raw)
To: Wei-Chung Cheng; +Cc: Ceph Development
On Fri, 20 Nov 2015, Wei-Chung Cheng wrote:
> Hi Loic and cephers,
>
> Sure, I have time to help with (and comment on) this disk replacement feature.
> It would be a useful feature for handling disk failures :p
>
> A simple procedure is described at http://tracker.ceph.com/issues/13732 :
> 1. Set the noout flag. If the broken osd is a primary osd, can we handle that well?
> 2. Stop the osd daemon and wait until the osd is actually down (or
> maybe use the deactivate option of ceph-disk).
>
> These two steps seem OK.
> As for the crush map, should we remove the broken osd from it?
> If we do that, why set the noout flag? Removing the osd from the crushmap
> still triggers a re-balance.
Right--I think you generally want to do either one or the other:
1) mark osd out, leave failed disk in place. or, replace with new disk
that re-uses the same osd id.
or,
2) remove osd from crush map. replace with new disk (which gets new osd
id).
I think re-using the osd id is awkward currently, so doing 1 and replacing
the disk ends up moving data twice.
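As a rough illustration of the "moving data twice" point, here is a toy sketch. It uses rendezvous (highest-random-weight) hashing as a stand-in for CRUSH, which is an assumption made only to keep the example small; real CRUSH placement is more involved. The object names and the four-osd cluster are made up:

```python
import hashlib
from typing import Dict, List


def place(obj: str, osds: List[int]) -> int:
    """Pick the osd with the highest hash score for this object."""
    def score(osd: int) -> int:
        digest = hashlib.sha256(f"{obj}:{osd}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(osds, key=score)


def placement(objects: List[str], osds: List[int]) -> Dict[str, int]:
    """Map every object to its osd for a given set of in osds."""
    return {obj: place(obj, osds) for obj in objects}


def moved(before: Dict[str, int], after: Dict[str, int]) -> int:
    """Count objects whose osd changed between two placements."""
    return sum(1 for obj in before if before[obj] != after[obj])


objects = [f"obj{i}" for i in range(1000)]
healthy = placement(objects, [0, 1, 2, 3])
marked_out = placement(objects, [0, 1, 3])        # osd.2 marked out
same_id_back = placement(objects, [0, 1, 2, 3])   # new disk re-uses id 2

first_move = moved(healthy, marked_out)        # data leaves osd.2
second_move = moved(marked_out, same_id_back)  # the same data comes back
print(first_move, second_move)
```

In this toy model the two counts are equal: everything that left osd.2 when it was marked out returns once the replacement comes in under the same id.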
sage
* Re: OSD replacement feature
2015-11-20 11:38 ` Sage Weil
@ 2015-11-20 16:40 ` Wei-Chung Cheng
2015-11-20 17:54 ` David Zafman
1 sibling, 0 replies; 7+ messages in thread
From: Wei-Chung Cheng @ 2015-11-20 16:40 UTC (permalink / raw)
To: Sage Weil; +Cc: Ceph Development
2015-11-20 19:38 GMT+08:00 Sage Weil <sage@newdream.net>:
> Right--I think you generally want to do either one or the other:
>
> 1) mark osd out, leave failed disk in place. or, replace with new disk
> that re-uses the same osd id.
>
> or,
>
> 2) remove osd from crush map. replace with new disk (which gets new osd
> id).
>
> I think re-using the osd id is awkward currently, so doing 1 and replacing
> the disk ends up moving data twice.
>
Hi sage,
If the osd is in "DNE" status, must its weight be zero, and does that trigger
moving object data?
In my test cases, I only removed the auth key and the osd-id (so the osd shows "DNE" status),
then replaced it with a new disk that re-uses the same osd-id.
The osd spends only a little time in the out state.
I think this operation could avoid some redundant data movement.
What do you think of this operation? Or should we do as you say:
mark the osd out (deactivate/destroy, etc.) and replace it with a new disk
that re-uses the same osd id?
btw, with just ceph-deploy/ceph-disk we cannot create an osd
with a specific osd-id.
Should we implement that?
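A minimal sketch of the test described above, assuming "remove the auth key and osd-id" corresponds to `ceph auth del` and `ceph osd rm`. The helper name is hypothetical, and the commands are only listed, not executed. The crush entry is deliberately left alone, which is why the osd shows up as "DNE" afterwards:

```python
from typing import List


def forget_osd_commands(osd_id: int) -> List[List[str]]:
    """Commands that drop the auth key and the osd-id but keep the crush entry."""
    return [
        # Remove the cephx key of the failed osd.
        ["ceph", "auth", "del", f"osd.{osd_id}"],
        # Remove the id from the osd map (the osd must already be down).
        ["ceph", "osd", "rm", str(osd_id)],
        # No `ceph osd crush remove` here: the crush item stays in place.
    ]
```

How to re-create the osd under the same id afterwards is the open question in this message, so the sketch stops before that step.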
thanks!!!
vicente
* Re: OSD replacement feature
2015-11-20 11:38 ` Sage Weil
2015-11-20 16:40 ` Wei-Chung Cheng
@ 2015-11-20 17:54 ` David Zafman
2015-11-24 4:45 ` Wei-Chung Cheng
1 sibling, 1 reply; 7+ messages in thread
From: David Zafman @ 2015-11-20 17:54 UTC (permalink / raw)
To: Sage Weil, Wei-Chung Cheng; +Cc: Ceph Development
There are two reasons for having a ceph-disk replace feature.
1. To simplify the steps required to replace a disk
2. To allow a disk to be replaced proactively without causing any data
movement.
So keeping the osd id the same is required and is what motivated the
feature for me.
David
* Re: OSD replacement feature
2015-11-20 17:54 ` David Zafman
@ 2015-11-24 4:45 ` Wei-Chung Cheng
2015-11-24 7:26 ` David Zafman
0 siblings, 1 reply; 7+ messages in thread
From: Wei-Chung Cheng @ 2015-11-24 4:45 UTC (permalink / raw)
To: David Zafman; +Cc: Sage Weil, Ceph Development
2015-11-21 1:54 GMT+08:00 David Zafman <dzafman@redhat.com>:
>
> There are two reasons for having a ceph-disk replace feature.
>
> 1. To simplify the steps required to replace a disk
> 2. To allow a disk to be replaced proactively without causing any data
> movement.
Hi David,
It would be good to avoid any data movement when we replace a
failed osd.
But I don't have any idea how to achieve that; could you share your thoughts?
I thought that to replace a failed osd we must move the object data from the
failed osd to the new (replacement) osd?
Or have I misunderstood something?
thanks!!!
vicente
* Re: OSD replacement feature
2015-11-24 4:45 ` Wei-Chung Cheng
@ 2015-11-24 7:26 ` David Zafman
0 siblings, 0 replies; 7+ messages in thread
From: David Zafman @ 2015-11-24 7:26 UTC (permalink / raw)
To: Wei-Chung Cheng; +Cc: Sage Weil, Ceph Development
That is correct. The goal is to only refill the replacement OSD disk.
Otherwise, if the OSD is only down for less than
mon_osd_down_out_interval (5 min default) or noout is set, no other data
movement would occur.
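The rule above can be sketched as a small predicate. This is a simplified model, not the monitor's actual logic: the function name is hypothetical, and it considers only the noout flag and the time the osd has been down, using the 5 min (300 s) default mentioned here:

```python
MON_OSD_DOWN_OUT_INTERVAL = 300.0  # seconds, the default quoted above


def will_be_marked_out(down_seconds: float, noout: bool,
                       interval: float = MON_OSD_DOWN_OUT_INTERVAL) -> bool:
    """Return True if a down osd would be marked out, which starts re-balancing."""
    if noout:
        # With noout set, a down osd stays in regardless of how long it is down.
        return False
    return down_seconds >= interval
```

For example, `will_be_marked_out(120, noout=False)` is False because the osd came back within the interval, and `will_be_marked_out(3600, noout=True)` is False because the flag suppresses marking out. In both cases only the replacement osd needs to be refilled.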
David