From: David Zafman
Subject: Re: OSD replacement feature
Date: Mon, 23 Nov 2015 23:26:40 -0800
Message-ID: <56541130.6090005@redhat.com>
References: <564E04F9.20607@dachary.org> <564F5E6A.5050407@redhat.com>
To: Wei-Chung Cheng
Cc: Sage Weil, Ceph Development

That is correct. The goal is to refill only the replacement OSD disk.
Apart from that, as long as the OSD is down for less than
mon_osd_down_out_interval (5 minutes by default) or noout is set, no other
data movement occurs.

David

On 11/23/15 8:45 PM, Wei-Chung Cheng wrote:
> 2015-11-21 1:54 GMT+08:00 David Zafman:
>> There are two reasons for having a ceph-disk replace feature.
>>
>> 1. To simplify the steps required to replace a disk
>> 2. To allow a disk to be replaced proactively without causing any data
>> movement.
> Hi David,
>
> It would be good to replace a failed osd without causing any data
> movement.
>
> But I don't have any idea how to accomplish that. Could you give some
> suggestions?
>
> I thought that if we want to replace a failed osd, we must move the object
> data on the failed osd to the new (replacement) osd?
>
> Or have I misunderstood something?
>
> thanks!!!
> vicente
>
>> So keeping the osd id the same is required and is what motivated the feature
>> for me.
>>
>> David
>>
>>
>> On 11/20/15 3:38 AM, Sage Weil wrote:
>>> On Fri, 20 Nov 2015, Wei-Chung Cheng wrote:
>>>> Hi Loic and cephers,
>>>>
>>>> Sure, I have time to help (comment) on this replace-a-disk feature.
>>>> This is a useful feature for handling disk failures :p
>>>>
>>>> A simple procedure is described at http://tracker.ceph.com/issues/13732 :
>>>> 1. Set the noout flag. If the broken osd is a primary osd, can we
>>>> handle that well?
>>>> 2. Stop the osd daemon and wait until the osd is actually down (or
>>>> maybe use the deactivate option of ceph-disk).
>>>>
>>>> These two steps seem OK.
>>>> As for the crush map, should we remove the broken osd from it?
>>>> If we do that, why set the noout flag? Removing the osd from the
>>>> crush map still triggers a rebalance.
>>> Right--I think you generally want to do either one or the other:
>>>
>>> 1) mark osd out, leave failed disk in place. or, replace with new disk
>>> that re-uses the same osd id.
>>>
>>> or,
>>>
>>> 2) remove osd from crush map. replace with new disk (which gets new osd
>>> id).
>>>
>>> I think re-using the osd id is awkward currently, so doing 1 and replacing
>>> the disk ends up moving data twice.
>>>
>>> sage
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
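The trade-off discussed in this thread can be sketched as a toy Python model. It is not Ceph code. The `Strategy` names and the helper functions are hypothetical. The model assumes the default mon_osd_down_out_interval of 300 seconds and treats reaching the interval as the point where the monitors mark the OSD out (the exact boundary comparison is an assumption). It only counts rough "rounds" of data movement, following Sage's and David's descriptions.

```python
from enum import Enum

# Assumed default of the monitor option mon_osd_down_out_interval, in seconds.
MON_OSD_DOWN_OUT_INTERVAL = 300


class Strategy(Enum):
    """Hypothetical labels for the replacement approaches in this thread."""
    KEEP_IN_REUSE_ID = "keep the osd in (noout or short outage), reuse its id"
    MARK_OUT_REUSE_ID = "mark the osd out, replace the disk under the same id"
    REMOVE_FROM_CRUSH = "remove the osd from crush, new disk gets a new id"


def auto_marked_out(down_seconds, noout, interval=MON_OSD_DOWN_OUT_INTERVAL):
    """Toy rule: the monitors mark a down osd out once the interval expires,
    unless the noout flag is set."""
    return not noout and down_seconds >= interval


def effective_strategy(requested, down_seconds, noout):
    """A planned keep-in replacement degrades to mark-out if the timer fires."""
    if requested is Strategy.KEEP_IN_REUSE_ID and auto_marked_out(down_seconds, noout):
        return Strategy.MARK_OUT_REUSE_ID
    return requested


def data_movements(strategy):
    """Rough number of rounds of data movement a replacement triggers."""
    if strategy is Strategy.KEEP_IN_REUSE_ID:
        # Only the replacement disk is backfilled; other osds keep their data.
        return 1
    # Both other approaches move data twice: first away from the failed osd,
    # then again onto the replacement disk or the newly added osd.
    return 2


if __name__ == "__main__":
    for s in Strategy:
        print(f"{data_movements(s)} round(s): {s.value}")
    # Forgetting noout during a 10-minute swap turns the plan into mark-out.
    print(effective_strategy(Strategy.KEEP_IN_REUSE_ID, down_seconds=600, noout=False))
```

The model makes David's point explicit. Reusing the osd id only avoids the second round of movement if the osd never goes out, either because noout is set or because the swap finishes within the interval.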