From: Stefan Kooman <stefan-68+x73Hep80@public.gmane.org>
To: by morphin <morphinwithyou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org,
ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: Mimic cluster is offline and not healing
Date: Fri, 28 Sep 2018 09:09:23 +0200
Message-ID: <20180928070923.GC17567@shell.dmz.bit.nl>
In-Reply-To: <CAE-AtHo2UVSFcMHMXszSPJXs=BRKb0PELzryMyu4LVEv910pQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Quoting by morphin (morphinwithyou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
> Good news... :)
>
> After trying everything, I decided to re-create my MONs from the OSDs,
> using this script:
> https://paste.ubuntu.com/p/rNMPdMPhT5/
>
> And it worked!!!
Congrats!
> I think that when the 2 servers crashed and came back at the same time,
> the MONs somehow got confused and the maps were corrupted.
> After re-creation all the MONs had the same map, so it worked.
> But I still don't know how the hell the MONs can cause endless 95% I/O???
> This is a bug anyway, and if you don't want to run into this problem, do
> not "enable" your MONs. Just start them manually! Another tough lesson.
The only time we needed to manually start the mons was at "bootstrap"
time. After a reboot they are brought up by systemd ... and it keeps on
working. Have you rebooted your mon(s) after the manual start?
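To see whether systemd will bring a mon back after a reboot, something
like the sketch below may help. It assumes the mon ID equals the short
hostname (override MON_ID if yours differs), and mon_unit is a helper
name made up for this example. It only reads state and changes nothing.

```shell
#!/bin/sh
# Sketch only. Assumption: the mon ID equals the short hostname; set
# MON_ID yourself if your cluster names its monitors differently.

# mon_unit is a hypothetical helper: it builds the systemd unit name
# for a given mon ID (Ceph ships a ceph-mon@.service template unit).
mon_unit() {
    printf 'ceph-mon@%s.service\n' "$1"
}

host=$(uname -n)
MON_ID="${MON_ID:-${host%%.*}}"
UNIT=$(mon_unit "$MON_ID")

if command -v systemctl >/dev/null 2>&1; then
    # "enabled" means systemd starts this mon at boot.
    echo "is-enabled: $(systemctl is-enabled "$UNIT" 2>/dev/null || true)"
    # "active" means the mon is running right now.
    echo "is-active:  $(systemctl is-active "$UNIT" 2>/dev/null || true)"
else
    echo "systemctl not found, nothing checked for $UNIT"
fi
```

A mon that reports active but not enabled was most likely started by
hand and should not be expected to come back on its own after a reboot.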
>
> ceph -s: https://paste.ubuntu.com/p/m3hFF22jM9/
>
> As you can see, some of the OSDs are still down, and they fail to
> come up when I start them.
> Check start log: https://paste.ubuntu.com/p/ZJQG4khdbx/
> Debug log: https://paste.ubuntu.com/p/J3JyGShHym/
>
> What can we do about the problem?
Apply PR https://github.com/ceph/ceph/pull/24064
I see that you are running Mimic 13.2.1 ... 13.2.2 was released a few
days ago. Not sure if this fix has made it into 13.2.2.
> What is the cause of the problem?
Somehow it looks like you hit this issue:
https://tracker.ceph.com/issues/24866
Gr. Stefan
--
| BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688 / info-68+x73Hep80@public.gmane.org