* Querying since when a PG is inactive
From: Wido den Hollander @ 2015-12-09  7:54 UTC
  To: ceph-devel

Hi,

I'm working on a patch in PGMonitor.cc that sets the state to HEALTH_ERR
if >= X PGs are stuck non-active.

This works for me now, but I would like to add a timer that a PG has to
be inactive for more than Y seconds.

The PGMap contains "last_active" and "last_clean", but these timestamps
are never updated. So I can't query for last_active <= (now() - 300) for
example.

On an idle test cluster I have a PG for example:

"last_active": "2015-12-09 02:32:31.540712",

It's currently 08:53:56 here, so I can't check against last_active.

What would a good way be to see for how long a PG has been inactive?
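
For illustration, the check described above could be sketched roughly as
follows. This is Python rather than the monitor's C++, and the function
names, the dict layout, and the two thresholds (standing in for X and Y)
are hypothetical; none of it is taken from PGMonitor.cc. It also assumes
that last_active is refreshed while a PG is active and that both
timestamps are in the same timezone.

```python
from datetime import datetime

# Timestamp layout as it appears in the PG dump quoted above.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def inactive_seconds(last_active: str, now: datetime) -> float:
    """Seconds elapsed between a PG's last_active stamp and `now`."""
    return (now - datetime.strptime(last_active, TS_FORMAT)).total_seconds()


def health_for(pgs, now, max_stuck_pgs, min_inactive_secs):
    """Return HEALTH_ERR if >= max_stuck_pgs PGs have been non-active
    for more than min_inactive_secs seconds (hypothetical X and Y)."""
    stuck = [
        pg for pg in pgs
        # A state string such as "active+clean" is split on "+".
        if "active" not in pg["state"].split("+")
        and inactive_seconds(pg["last_active"], now) > min_inactive_secs
    ]
    return "HEALTH_ERR" if len(stuck) >= max_stuck_pgs else "HEALTH_OK"
```

With the values above (last_active 02:32:31.540712, now 08:53:56 on the
same day), inactive_seconds() gives about 22884 seconds, a little over
six hours.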

-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on


* Re: Querying since when a PG is inactive
From: Sage Weil @ 2015-12-09 13:50 UTC
  To: Wido den Hollander; +Cc: ceph-devel

Hi Wido!

On Wed, 9 Dec 2015, Wido den Hollander wrote:
> Hi,
> 
> I'm working on a patch in PGMonitor.cc that sets the state to HEALTH_ERR
> if >= X PGs are stuck non-active.
> 
> This works for me now, but I would like to add a timer that a PG has to
> be inactive for more than Y seconds.
> 
> The PGMap contains "last_active" and "last_clean", but these timestamps
> are never updated. So I can't query for last_active <= (now() - 300) for
> example.
> 
> On an idle test cluster I have a PG for example:
> 
> "last_active": "2015-12-09 02:32:31.540712",
> 
> It's currently 08:53:56 here, so I can't check against last_active.
> 
> What would a good way be to see for how long a PG has been inactive?

It sounds like maybe the current code is subtly broken:

	https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L2566

The last_active/clean etc should be fresh within 
osd_pg_stat_report_interval_max seconds...
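
As a rough sketch of that expectation (Python, with a hypothetical helper
name; the interval passed in below is a placeholder, not a claim about
the option's default): for a PG that is currently active, the reported
last_active should lag the current time by no more than the report
interval.

```python
from datetime import datetime, timedelta


def stat_is_fresh(last_active: datetime, now: datetime,
                  report_interval_max_secs: float) -> bool:
    """True if last_active is no older than the maximum report interval."""
    return now - last_active <= timedelta(seconds=report_interval_max_secs)


# Timestamps from the report quoted above; 500 is a placeholder interval.
last_active = datetime(2015, 12, 9, 2, 32, 31, 540712)
now = datetime(2015, 12, 9, 8, 53, 56)
print(stat_is_fresh(last_active, now, 500))
```

For the PG in the report this prints False, which is what points at the
stats not being refreshed.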

sage


* Re: Querying since when a PG is inactive
From: Wido den Hollander @ 2015-12-09 16:14 UTC
  To: Sage Weil; +Cc: ceph-devel

On 12/09/2015 02:50 PM, Sage Weil wrote:
> Hi Wido!
> 
> On Wed, 9 Dec 2015, Wido den Hollander wrote:
>> Hi,
>>
>> I'm working on a patch in PGMonitor.cc that sets the state to HEALTH_ERR
>> if >= X PGs are stuck non-active.
>>
>> This works for me now, but I would like to add a timer that a PG has to
>> be inactive for more than Y seconds.
>>
>> The PGMap contains "last_active" and "last_clean", but these timestamps
>> are never updated. So I can't query for last_active <= (now() - 300) for
>> example.
>>
>> On an idle test cluster I have a PG for example:
>>
>> "last_active": "2015-12-09 02:32:31.540712",
>>
>> It's currently 08:53:56 here, so I can't check against last_active.
>>
>> What would a good way be to see for how long a PG has been inactive?
> 
> It sounds like maybe the current code is subtly broken:
> 
> 	https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L2566
> 
> The last_active/clean etc should be fresh within 
> osd_pg_stat_report_interval_max seconds...
> 

Indeed, that seems broken. I created an issue for it:
http://tracker.ceph.com/issues/14028

I'm not sure where to start (yet).

> sage


