public inbox for ceph-devel@vger.kernel.org
From: Stefan Kooman <stefan-68+x73Hep80@public.gmane.org>
To: by morphin <morphinwithyou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: Mimic cluster is offline and not healing
Date: Thu, 27 Sep 2018 15:10:43 +0200
Message-ID: <20180927131043.GB17567@shell.dmz.bit.nl>
In-Reply-To: <CAE-AtHqSpX09gnAfgXt1=nmyLKuvjgMMn+qKaiZ0nOUKwEARrA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Quoting by morphin (morphinwithyou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
> After 72 hours I believe we may have hit a bug. Any help would be greatly
> appreciated.

Is it feasible for you to stop all client IO to the Ceph cluster, at
least until it stabilizes again? "ceph osd pause" would do the trick
("ceph osd unpause" clears it again).
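A minimal sketch of pausing and resuming client IO, assuming a POSIX shell on a node with admin access to the cluster. The CEPH variable and the two function names are hypothetical conveniences (CEPH lets you swap in a stub for a dry run); only "ceph osd pause" and "ceph osd unpause" are the actual Ceph commands:

```shell
#!/bin/sh
# Hypothetical helpers around "ceph osd pause" / "ceph osd unpause".
# CEPH can be overridden (e.g. with a stub for a dry run); defaults to the ceph CLI.
CEPH="${CEPH:-ceph}"

# Sets the pauserd and pausewr OSD map flags, blocking client reads and writes.
pause_client_io() {
    "$CEPH" osd pause
}

# Clears both flags again once the cluster has stabilized.
resume_client_io() {
    "$CEPH" osd unpause
}
```

While the pause is in effect, "ceph osd dump | grep flags" should list pauserd,pausewr among the cluster flags.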

What kind of workload are you running on the cluster? What does your
CRUSH map look like (ceph osd getcrushmap -o /tmp/crush_raw;
crushtool -d /tmp/crush_raw -o /tmp/crush_edit)?
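The two commands above can be wrapped so the decompile step only runs if the export succeeded. This is a sketch assuming a POSIX shell; dump_crushmap, CEPH and CRUSHTOOL are hypothetical names, and the default paths are the ones from the commands above:

```shell
#!/bin/sh
# Hypothetical wrapper: export the binary CRUSH map, then decompile it to text.
# CEPH and CRUSHTOOL can be overridden (e.g. with stubs); default to the real CLIs.
CEPH="${CEPH:-ceph}"
CRUSHTOOL="${CRUSHTOOL:-crushtool}"

# Usage: dump_crushmap [raw_path] [text_path]
dump_crushmap() {
    raw="${1:-/tmp/crush_raw}"    # binary map as returned by the monitors
    txt="${2:-/tmp/crush_edit}"   # human-readable, editable version
    # Only decompile if the export actually succeeded.
    "$CEPH" osd getcrushmap -o "$raw" &&
        "$CRUSHTOOL" -d "$raw" -o "$txt"
}
```

The resulting text file is what is useful to paste into a reply to the list.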

I have seen a (test) Ceph cluster "healing" itself to the point where
there was nothing left to recover on. In *that* case the disks were
overbooked (multiple OSDs per physical disk). The flags you set (noout,
nodown, nobackfill, norecover, noscrub, etc.) helped to get it to
recover again. I would try to get all OSDs online again, and manually
keep them up / restart them: with nodown set, the monitors will not mark
a crashed OSD down, so you have to notice and restart it yourself.
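Setting and later clearing those flags can be sketched as below, again assuming a POSIX shell with admin access. FLAGS holds only the flags named above (the "etc." is left to you), and set_flags / unset_flags / CEPH are hypothetical helpers, not part of Ceph:

```shell
#!/bin/sh
# Hypothetical helpers to set, and later clear, the OSD map flags named above:
#   noout      - down OSDs are not marked out (no data rebalancing away from them)
#   nodown     - monitors do not mark OSDs down
#   nobackfill - backfill is paused
#   norecover  - recovery is paused
#   noscrub    - scrubbing is paused
CEPH="${CEPH:-ceph}"
FLAGS="noout nodown nobackfill norecover noscrub"

# Set every flag in FLAGS; stop at the first failing "ceph osd set".
set_flags() {
    for flag in $FLAGS; do
        "$CEPH" osd set "$flag" || return 1
    done
}

# Clear every flag in FLAGS; stop at the first failing "ceph osd unset".
unset_flags() {
    for flag in $FLAGS; do
        "$CEPH" osd unset "$flag" || return 1
    done
}
```

For an overloaded cluster it is probably gentler to clear them one at a time (e.g. "ceph osd unset norecover") while watching "ceph -s", rather than calling unset_flags in one go.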

Does the cluster recover at all?

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info-68+x73Hep80@public.gmane.org
