* HEALTH_ERR when (re)starting ceph-osd's
@ 2016-01-28 10:48 Piotr.Dalek
  2016-01-28 12:37 ` Wido den Hollander
  0 siblings, 1 reply; 4+ messages in thread
From: Piotr.Dalek @ 2016-01-28 10:48 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org

Hello,

I hadn't noticed this before, but since https://github.com/ceph/ceph/pull/7253 was merged, restarting daemons on a healthy Ceph cluster puts it into the HEALTH_ERR state with "$(random_number) pgs are stuck inactive for more than 300 seconds".
I looked at the commit, and it turns out this will always happen on restart/boot, because booting pgs are inactive "by default" (the mons have not yet received any sign of life from them), not because they are actually stuck inactive.
One solution would be to set the pg_stat.last_* fields to the time the pg was first seen, so that it becomes stuck (mon_pg_stuck_threshold) seconds after first registering, and not right away.
Another, less invasive one is simply to let the user disable this warning.
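
To make the first option concrete, here is a rough sketch of the idea in Python rather than the actual monitor code. The PgStat class, its first_seen and last_active fields, and the is_stuck_inactive helper are all hypothetical names made up for illustration; only mon_pg_stuck_threshold and the 300 second value come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# Stands in for mon_pg_stuck_threshold; 300 s is the value quoted in the warning.
MON_PG_STUCK_THRESHOLD = 300.0


@dataclass
class PgStat:
    """Hypothetical stand-in for the monitor's per-pg stats record."""
    first_seen: float                    # time the mon first registered this pg
    last_active: Optional[float] = None  # None until the pg reports itself active


def is_stuck_inactive(pg: PgStat, now: float,
                      threshold: float = MON_PG_STUCK_THRESHOLD,
                      seed_from_first_seen: bool = True) -> bool:
    """Return True if the pg has been inactive for longer than threshold."""
    if pg.last_active is not None:
        reference = pg.last_active
    elif seed_from_first_seen:
        # Proposed behaviour: start the clock when the pg was first seen.
        reference = pg.first_seen
    else:
        # Behaviour described above: a never-active pg has no timestamp,
        # so it looks stuck right away.
        reference = 0.0
    return now - reference > threshold
```

With seed_from_first_seen enabled, a pg registered at t=1000 is not reported at t=1010 and only becomes stuck once more than 300 seconds have passed without it going active.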

What do you think?

With best regards,
Piotr Dałek




Thread overview: 4+ messages
2016-01-28 10:48 HEALTH_ERR when (re)starting ceph-osd's Piotr.Dalek
2016-01-28 12:37 ` Wido den Hollander
2016-01-28 13:05   ` Piotr.Dalek
2016-01-28 15:25   ` Sage Weil
