* Querying since when a PG is inactive
From: Wido den Hollander @ 2015-12-09 7:54 UTC
To: ceph-devel
Hi,
I'm working on a patch in PGMonitor.cc that sets the health state to
HEALTH_ERR if >= X PGs are stuck non-active.
This works for me now, but I would also like to add a time threshold: a PG
has to be inactive for more than Y seconds before it counts.
The PGMap contains "last_active" and "last_clean", but these timestamps
are never updated, so I can't query for last_active <= (now() - 300), for
example.
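A minimal sketch of the check described above, written in Python for illustration rather than as the monitor's C++. It assumes each PG's stats expose a "+"-separated state string and a `last_active` timestamp, as in the PGMap dump; `PGStat`, `stuck_inactive`, `inactive_health`, `min_stuck` (X) and `stuck_secs` (Y) are hypothetical names, not Ceph identifiers. The `last_active <= now - Y` comparison only measures time spent inactive if `last_active` is kept fresh while the PG is active, which is exactly what this thread questions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PGStat:
    """Hypothetical per-PG record: state string plus the last time it was active."""
    pgid: str
    state: str              # e.g. "active+clean", "down+peering"
    last_active: datetime


def stuck_inactive(pgs, now, stuck_secs):
    """PGs that are not active and whose last_active is at least stuck_secs old."""
    cutoff = now - timedelta(seconds=stuck_secs)
    return [pg for pg in pgs
            if "active" not in pg.state.split("+") and pg.last_active <= cutoff]


def inactive_health(pgs, now, min_stuck, stuck_secs):
    """HEALTH_ERR if at least min_stuck (X) PGs have been inactive for stuck_secs (Y)."""
    stuck = stuck_inactive(pgs, now, stuck_secs)
    return "HEALTH_ERR" if len(stuck) >= min_stuck else "HEALTH_OK"


now = datetime(2015, 12, 9, 8, 53, 56)
pgs = [
    PGStat("1.0", "active+clean", now),
    PGStat("1.1", "peering", now - timedelta(seconds=600)),      # inactive for 10 min
    PGStat("1.2", "down+peering", now - timedelta(seconds=60)),  # inactive for 1 min
]
print([pg.pgid for pg in stuck_inactive(pgs, now, stuck_secs=300)])  # ['1.1']
print(inactive_health(pgs, now, min_stuck=1, stuck_secs=300))        # HEALTH_ERR
```

With `min_stuck=2` the same PGs give HEALTH_OK, because only 1.1 has been inactive for longer than 300 seconds.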
On an idle test cluster I have, for example, a PG with:
"last_active": "2015-12-09 02:32:31.540712",
It's currently 08:53:56 here, hours later, so I can't check against last_active.
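To make the symptom concrete, the arithmetic on the two times quoted above is shown below. It assumes both times are in the same time zone, which the message does not state. The reported `last_active` is over six hours old, so if this PG went inactive right now, a 300-second check against `last_active` would treat it as already stuck.

```python
from datetime import datetime

# Timestamps from the message above; assumes both are in the same time zone.
last_active = datetime.strptime("2015-12-09 02:32:31.540712", "%Y-%m-%d %H:%M:%S.%f")
now = datetime.strptime("2015-12-09 08:53:56", "%Y-%m-%d %H:%M:%S")

age = (now - last_active).total_seconds()
print(f"last_active age: {age:.0f} s")        # last_active age: 22884 s
print("exceeds 300 s threshold:", age > 300)  # exceeds 300 s threshold: True
```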
What would be a good way to see how long a PG has been inactive?
--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on
* Re: Querying since when a PG is inactive
From: Sage Weil @ 2015-12-09 13:50 UTC
To: Wido den Hollander; +Cc: ceph-devel
Hi Wido!
On Wed, 9 Dec 2015, Wido den Hollander wrote:
> The PGMap contains "last_active" and "last_clean", but these timestamps
> are never updated, so I can't query for last_active <= (now() - 300), for
> example.
>
> What would be a good way to see how long a PG has been inactive?
It sounds like maybe the current code is subtly broken:
https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L2566
The last_active/last_clean etc. timestamps should be fresh to within
osd_pg_stat_report_interval_max seconds...
sage
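A hypothetical model of the freshness rule described above, as a sketch only: it is not Ceph's PG.cc, and `PGStatsModel`, `tick` and `looks_stale` are made-up names. `report_interval_max` stands in for osd_pg_stat_report_interval_max. The assumption is that stats are republished, and last_active/last_clean refreshed, once that interval has elapsed; the model says nothing about where the real code deviates, only that the bound holds when something calls `tick()` periodically and breaks when nothing does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PGStatsModel:
    """Hypothetical model of the freshness rule; not Ceph code.

    Times are plain floats in seconds. report_interval_max stands in for
    osd_pg_stat_report_interval_max.
    """
    report_interval_max: float
    state: str = "active+clean"
    last_active: float = 0.0
    last_clean: float = 0.0
    last_published: Optional[float] = None

    def tick(self, now: float) -> None:
        """Re-publish stats (refreshing timestamps) once the interval has elapsed."""
        due = (self.last_published is None
               or now - self.last_published >= self.report_interval_max)
        if not due:
            return
        parts = self.state.split("+")
        if "active" in parts:
            self.last_active = now
        if "clean" in parts:
            self.last_clean = now
        self.last_published = now


def looks_stale(last_active: float, now: float, report_interval_max: float,
                slack: float = 0.0) -> bool:
    """Monitor-side check: an active PG's last_active should not lag by more
    than report_interval_max (plus optional slack for reporting delay)."""
    return now - last_active > report_interval_max + slack


# Ticking every second keeps an active PG within the bound...
pg = PGStatsModel(report_interval_max=500.0)
for t in range(3600):
    pg.tick(float(t))
    assert not looks_stale(pg.last_active, float(t), pg.report_interval_max)

# ...but if nothing calls tick() for another hour, the timestamp goes stale.
print(looks_stale(pg.last_active, 3599.0 + 3600.0, pg.report_interval_max))  # True
```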
* Re: Querying since when a PG is inactive
From: Wido den Hollander @ 2015-12-09 16:14 UTC
To: Sage Weil; +Cc: ceph-devel
On 12/09/2015 02:50 PM, Sage Weil wrote:
> It sounds like maybe the current code is subtly broken:
>
> https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L2566
>
> The last_active/last_clean etc. timestamps should be fresh to within
> osd_pg_stat_report_interval_max seconds...
>
Indeed, that seems broken. I created an issue for it:
http://tracker.ceph.com/issues/14028
I'm not sure where to start (yet).
> sage
--
Wido den Hollander