From: "Łukasz Chrustek" <skidoo@tlen.pl>
To: Sage Weil <sage@newdream.net>
Cc: ceph-devel@vger.kernel.org
Subject: Re: problem with removing osd
Date: Thu, 29 Dec 2016 21:20:30 +0100
Message-ID: <109878557.20161229212030@tlen.pl>
In-Reply-To: <alpine.DEB.2.11.1612291908472.10615@piezo.novalocal>

Hi,


>> 
>> # ceph osd tree
>> ID        WEIGHT    TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>        -7  16.89590 root ssd-disks
>>       -11         0     host ssd1
>> 598798032         0         osd.598798032     DNE        0

> Yikes!

Yes, indeed, I don't like this number either...

>>     21940         0         osd.21940         DNE        0
>>        71         0         osd.71            DNE        0
>> 
>> My question is how to delete these OSDs without directly editing the
>> crushmap? It is a production system, I can't afford any service
>> interruption :(, and when I try ceph osd crush remove, ceph-mon crashes....
>> 
>> I dumped the crushmap, but it took 19G (!!) after decompiling (the compiled
>> file is very small). So I cleaned this file with perl (it took a very
>> long time), and I now have a small txt crushmap, which I edited. But is
>> there any chance that ceph will still remember these huge OSD numbers
>> somewhere? Is it safe to apply this cleaned crushmap to the
>> cluster?

> It sounds like the problem is the OSDMap, not CRUSH per se.  Can you 
> attach the output from 'ceph osd dump -f json-pretty'?

It's quite big so I put it on pastebin:

http://pastebin.com/Unkk2Pa7
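
For a dump this large, scanning for out-of-range ids with a script can be easier than reading it by eye. Below is a minimal sketch, assuming the JSON has a top-level `max_osd` integer and an `osds` list whose entries carry an integer `osd` field (please check this against your own dump). The helper name `find_suspicious_osds`, the cutoff of 10000 and the sample data are all hypothetical, and the sample is not the content of the pastebin above.

```python
import json

def find_suspicious_osds(dump_text, limit=10000):
    """Return (max_osd, ids), where ids are OSD ids that are negative or >= limit.

    Assumes a top-level "max_osd" and an "osds" list of objects with an
    integer "osd" field. `limit` is an arbitrary cutoff, not a Ceph constant.
    """
    dump = json.loads(dump_text)
    max_osd = dump.get("max_osd")
    ids = [entry["osd"] for entry in dump.get("osds", [])]
    suspicious = sorted(i for i in ids if i < 0 or i >= limit)
    return max_osd, suspicious

# Made-up sample for illustration only; a real dump has many more fields.
SAMPLE_DUMP = json.dumps({
    "max_osd": 72,
    "osds": [{"osd": 69}, {"osd": 70}, {"osd": 71}, {"osd": 21940}],
})

max_osd, bad_ids = find_suspicious_osds(SAMPLE_DUMP)
print("max_osd:", max_osd)
print("suspicious ids:", bad_ids)
```

To run it against a live cluster, the output of 'ceph osd dump -f json-pretty' could be saved to a file and its contents passed to `find_suspicious_osds` in place of `SAMPLE_DUMP`.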

> Do you know how osd.598798032 got created?  Or osd.21940 for that matter.
> OSD ids should be small since they are stored internally by OSDMap as a
> vector.  This is probably why your mon is crashing.
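
To illustrate the point about vector storage: a table indexed directly by id needs one slot for every id from 0 up to the largest one, whether or not those ids are in use. A minimal sketch of that arithmetic follows; the figure of 8 bytes per slot is a hypothetical number chosen for illustration, not a measured size of any OSDMap structure.

```python
def slots_needed(max_id):
    """A table indexed directly by id needs slots 0..max_id inclusive."""
    return max_id + 1

def estimated_bytes(max_id, bytes_per_slot):
    # bytes_per_slot is a hypothetical figure, not a real Ceph value.
    return slots_needed(max_id) * bytes_per_slot

# Ids taken from the 'ceph osd tree' output in this thread.
for osd_id in (71, 21940, 598798032):
    print(osd_id, slots_needed(osd_id), estimated_bytes(osd_id, 8))
```

Under that assumption, a single per-OSD vector sized for osd.598798032 would take roughly 4.8 GB, compared with a few hundred bytes for osd.71.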

[root@cc1 /etc/ceph]# ceph osd tree
ID  WEIGHT    TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -7  16.89590 root ssd-intel-s3700
-11         0     host ssd-stor1
 69         0         osd.69          down        0          1.00000
 70         0         osd.70          down        0          1.00000
 71         0         osd.71          down        0          1.00000


This is the moment when it happened:
]# for i in `seq 69 71`;do ceph osd crush remove osd.$i;done
removed item id 69 name 'osd.69' from crush map


removed item id 70 name 'osd.70' from crush map

Here I pressed Ctrl+C:

2016-12-28 17:38:10.055239 7f4576d7a700  0 monclient: hunting for new mon
2016-12-28 17:38:10.055582 7f4574233700  0 -- 192.168.128.1:0/1201679761 >> 192.168.128.2:6789/0 pipe(0x7f456c023190 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f456c024470).fault
2016-12-28 17:38:30.550622 7f4574233700  0 -- 192.168.128.1:0/1201679761 >> 192.168.128.1:6789/0 pipe(0x7f45600008c0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4560001df0).fault
2016-12-28 17:38:54.551031 7f4574474700  0 -- 192.168.128.1:0/1201679761 >> 192.168.128.2:6789/0 pipe(0x7f45600046c0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f45600042b0).fault

after restart of ceph-mon:

]# ceph osd tree
ID         WEIGHT    TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
        -7  16.89590 root ssd-intel-s3700
       -11         0     host ssd-stor1
-231707408         0
     22100         0         osd.22100        DNE        0
        71         0         osd.71           DNE        0

and later:

[root@cc1 ~]# ceph osd crush remove osd.22100
device 'osd.22100' does not appear in the crush map
[root@cc1 ~]# ceph osd crush remove osd.71
2016-12-28 17:52:34.459668 7f426a862700  0 monclient: hunting for new mon
2016-12-28 17:52:55.238418 7f426a862700  0 monclient: hunting for new mon
2016-12-28 17:52:55.238680 7f4262ebc700  0 -- 192.168.128.1:0/692048545 >> 192.168.128.2:6789/0 pipe(0x7f4254028300 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4254026800).fault

and after another restart of ceph-mon:

]# ceph osd tree
ID        WEIGHT    TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
       -7  16.89590 root ssd-intel-s3700
      -11         0     host ssd-stor1
598798032         0         osd.598798032     DNE        0
    21940         0         osd.21940         DNE        0
       71         0         osd.71            DNE        0
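
The listings above differ mainly in the ids of the phantom entries, so it can help to extract just the DNE rows and compare them between mon restarts. A minimal sketch, assuming the plain-text column layout of 'ceph osd tree' shown in this message; `parse_osd_tree` and `dne_osds` are hypothetical helper names, and the sample text is copied from the last listing above.

```python
def parse_osd_tree(text):
    """Parse plain 'ceph osd tree' output into (id, name, status) tuples.

    Lines whose first field is not an integer (header, shell prompt) are
    skipped. name and status are None for rows without an 'osd.N' field.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            item_id = int(fields[0])
        except ValueError:
            continue  # header or prompt line
        name = next((f for f in fields if f.startswith("osd.")), None)
        status = None
        if name is not None:
            idx = fields.index(name)
            if idx + 1 < len(fields):
                status = fields[idx + 1]  # e.g. up, down, DNE
        rows.append((item_id, name, status))
    return rows

def dne_osds(rows):
    """Return the ids of rows whose status column is DNE."""
    return [item_id for item_id, _name, status in rows if status == "DNE"]

SAMPLE_TREE = """\
ID        WEIGHT    TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
       -7  16.89590 root ssd-intel-s3700
      -11         0     host ssd-stor1
598798032         0         osd.598798032     DNE        0
    21940         0         osd.21940         DNE        0
       71         0         osd.71            DNE        0
"""

print(dne_osds(parse_osd_tree(SAMPLE_TREE)))
```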




-- 
Regards
Luk


