From: Wido den Hollander <wido@42on.com>
To: "Piotr.Dalek@ts.fujitsu.com" <Piotr.Dalek@ts.fujitsu.com>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: HEALTH_ERR when (re)starting ceph-osd's
Date: Thu, 28 Jan 2016 13:37:39 +0100
Message-ID: <56AA0B93.8050105@42on.com>
In-Reply-To: <0ec5364ea10c41a687edca2bfe7818e6@R01UKEXCASM115.r01.fujitsu.local>



On 28-01-16 11:48, Piotr.Dalek@ts.fujitsu.com wrote:
> Hello,
> 
> I haven't noticed it before, but since merging https://github.com/ceph/ceph/pull/7253 I see that, when restarting daemons on a healthy Ceph cluster, the cluster goes to HEALTH_ERR state with "$(random_number) pgs are stuck inactive for more than 300 seconds".
> I looked at the commit and it turns out this will always occur on restart/boot, as booting pgs are inactive "by default" (since the mons never received any sign of life from them) - not because they're actually stuck inactive.

Well, in that case, isn't the PR correct? But I see what you mean.

> One solution to this would be to set the pg_stat.last_* fields to the time the pg was first seen, so that pgs become stuck (mon_pg_stuck_threshold) seconds after first registering, and not right away.

That sounds like a good solution, you might want to take a look at:
http://tracker.ceph.com/issues/14028
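
To make the proposal concrete, here is a minimal Python sketch of the
"first seen" idea. It is not the monitor's actual code: PgStat,
first_seen and is_stuck_inactive are hypothetical names, and the 300
second threshold is simply the value quoted in the health message above.

```python
from dataclasses import dataclass
from typing import Optional

# Seconds; taken from the "stuck inactive for more than 300 seconds" message.
MON_PG_STUCK_THRESHOLD = 300.0


@dataclass
class PgStat:
    """Hypothetical stand-in for the per-PG stats kept by the mon."""
    first_seen: float                    # when the mon first registered the PG
    last_active: Optional[float] = None  # None until the PG reports being active


def is_stuck_inactive(pg: PgStat, now: float,
                      threshold: float = MON_PG_STUCK_THRESHOLD) -> bool:
    # If the PG has never reported, count from first_seen rather than
    # treating it as having been inactive forever.
    reference = pg.last_active if pg.last_active is not None else pg.first_seen
    return now - reference > threshold
```

With this, a PG registered at t=1000 that has not reported yet is not
stuck at t=1010, but is stuck at t=1301.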

> Another, less invasive one, is to just let the user disable this warning.
> 

As you can see in the discussion on GitHub, we decided to set
'mon_pg_min_inactive' to 1 by default.

You can disable these warnings by setting it to zero, or make them less
sensitive by setting it to something like 10.

The option is just there so that people are informed when multiple PGs are
inactive.

Being in WARN state while I/O is not being performed is a bad thing. WARN
should be where you take a look, but aren't worried.

If I/O stops, ERR is the right state to go into.
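
As a rough illustration of how 'mon_pg_min_inactive' affects the reported
state, here is a small Python sketch. It assumes that ERR is raised once
the number of stuck inactive PGs reaches the configured value and that
zero disables the escalation; the function name and the exact comparison
are assumptions for illustration, not copied from the monitor code.

```python
def inactive_pg_health(num_stuck_inactive: int,
                       mon_pg_min_inactive: int = 1) -> str:
    """Map a count of stuck inactive PGs to a health state (illustrative)."""
    if num_stuck_inactive <= 0:
        return "HEALTH_OK"
    # Assumption: a value of zero (or less) disables escalation to ERR.
    if mon_pg_min_inactive > 0 and num_stuck_inactive >= mon_pg_min_inactive:
        return "HEALTH_ERR"
    return "HEALTH_WARN"


if __name__ == "__main__":
    for limit in (1, 10, 0):
        print(limit, inactive_pg_health(3, mon_pg_min_inactive=limit))
```

Under these assumptions, three stuck PGs give ERR with the default of 1,
but only WARN with the option set to 10 or to zero.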

Wido

> What do you think?
> 
> With best regards / Pozdrawiam
> Piotr Dałek
> 

Thread overview: 4+ messages
2016-01-28 10:48 HEALTH_ERR when (re)starting ceph-osd's Piotr.Dalek
2016-01-28 12:37 ` Wido den Hollander [this message]
2016-01-28 13:05   ` Piotr.Dalek
2016-01-28 15:25   ` Sage Weil
