* Would it make sense to require ntp
@ 2015-11-06 10:06 Nathan Cutler
2015-11-06 11:36 ` Wido den Hollander
2015-11-06 12:26 ` John Spray
0 siblings, 2 replies; 7+ messages in thread
From: Nathan Cutler @ 2015-11-06 10:06 UTC
To: ceph-devel
Hi Ceph:
Recently I encountered a "clock skew" issue with 0.94.3. I have
some small demo clusters in AWS. When I boot them up, in most cases the
cluster will start in HEALTH_WARN due to clock skew on some of the MONs.
I surmise that this is due to a race condition between the ceph-mon and
ntpd systemd services. Sometimes ntpd.service starts *after* ceph-mon -
in this case the MON sees a wrong/unsynchronized time value.
Now, even though ntpd.service starts (and fixes the time value) very
soon afterwards, the cluster remains in clock skew for a long time - but
that is a separate issue. What I would like to ask is this:
Is there any reasonable Ceph cluster node configuration that does not
include running the NTP daemon?
If the answer is "no", would it make sense to make NTP a runtime
dependency and tell the ceph-mon systemd service to wait for
ntpd.service before it starts?
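
The ordering proposed above could be expressed as a systemd drop-in. The sketch below only renders the text of such a drop-in; the unit name `ceph-mon@.service` and the path `/etc/systemd/system/ceph-mon@.service.d/10-time.conf` are assumptions for illustration, not something confirmed in this thread. `Wants=` and `After=` are standard systemd directives.

```python
def render_dropin(time_unit="ntpd.service"):
    """Return the text of a systemd drop-in that orders a unit after time_unit.

    time_unit is a parameter because the time daemon differs between
    distributions (ntpd.service here is only the default from the question).
    """
    lines = [
        "[Unit]",
        # Wants= pulls the time service in as a weak dependency: the mon
        # still starts if the time service is missing or fails.
        f"Wants={time_unit}",
        # After= only controls start ordering. It does not wait for the
        # clock to actually be synchronized.
        f"After={time_unit}",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to the (assumed) drop-in path and running `systemctl daemon-reload` would apply it. Note that this only guarantees the daemon has been started before ceph-mon, not that the time has been corrected.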
Thanks and regards
--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037
* Re: Would it make sense to require ntp
2015-11-06 10:06 Would it make sense to require ntp Nathan Cutler
@ 2015-11-06 11:36 ` Wido den Hollander
2015-11-06 12:26 ` John Spray
1 sibling, 0 replies; 7+ messages in thread
From: Wido den Hollander @ 2015-11-06 11:36 UTC
To: Nathan Cutler, ceph-devel
On 11/06/2015 11:06 AM, Nathan Cutler wrote:
> Hi Ceph:
>
> Recently I encountered a "clock skew" issue with 0.94.3. I have
> some small demo clusters in AWS. When I boot them up, in most cases the
> cluster will start in HEALTH_WARN due to clock skew on some of the MONs.
>
> I surmise that this is due to a race condition between the ceph-mon and
> ntpd systemd services. Sometimes ntpd.service starts *after* ceph-mon -
> in this case the MON sees a wrong/unsynchronized time value.
>
> Now, even though ntpd.service starts (and fixes the time value) very
> soon afterwards, the cluster remains in clock skew for a long time - but
> that is a separate issue. What I would like to ask is this:
>
> Is there any reasonable Ceph cluster node configuration that does not
> include running the NTP daemon?
>
Well, the MONs are very, very time sensitive. OSDs somewhat less, but if
they drift too far they run into trouble authenticating.
> If the answer is "no", would it make sense to make NTP a runtime
> dependency and tell the ceph-mon systemd service to wait for
> ntpd.service before it starts?
>
I think it makes sense; correct time is essential, imho.
> Thanks and regards
>
--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on
* Re: Would it make sense to require ntp
2015-11-06 10:06 Would it make sense to require ntp Nathan Cutler
2015-11-06 11:36 ` Wido den Hollander
@ 2015-11-06 12:26 ` John Spray
2015-11-06 15:08 ` Gregory Farnum
` (2 more replies)
1 sibling, 3 replies; 7+ messages in thread
From: John Spray @ 2015-11-06 12:26 UTC
To: Nathan Cutler; +Cc: ceph-devel
On Fri, Nov 6, 2015 at 10:06 AM, Nathan Cutler <ncutler@suse.cz> wrote:
> Hi Ceph:
>
> Recently I encountered a "clock skew" issue with 0.94.3. I have
> some small demo clusters in AWS. When I boot them up, in most cases the
> cluster will start in HEALTH_WARN due to clock skew on some of the MONs.
>
> I surmise that this is due to a race condition between the ceph-mon and
> ntpd systemd services. Sometimes ntpd.service starts *after* ceph-mon -
> in this case the MON sees a wrong/unsynchronized time value.
>
> Now, even though ntpd.service starts (and fixes the time value) very
> soon afterwards, the cluster remains in clock skew for a long time - but
> that is a separate issue. What I would like to ask is this:
>
> Is there any reasonable Ceph cluster node configuration that does not
> include running the NTP daemon?
Only if there is some other time service replacing it. I don't really
know of anyone using alternative ntp daemons, but it's a possibility
to consider before introducing a hard dependency on ntpd.
> If the answer is "no", would it make sense to make NTP a runtime
> dependency and tell the ceph-mon systemd service to wait for
> ntpd.service before it starts?
Just waiting for the service is quick, but it doesn't achieve any
effect on the clock other than promising that it will be synced at
some point in the future. Wouldn't we have to wait for time sync
rather than just waiting for the service? That could take a while.
My hunch is that users wouldn't appreciate the mon blocking until
times were in sync, they'd probably prefer to go ahead and start up,
but raise a warning (like we currently do).
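
The trade-off described above (wait for actual sync, but do not block forever) can be sketched as a bounded wait. This is a minimal illustration, not anything Ceph ships: `wait_for_sync` and `ntpstat_synced` are hypothetical names, and the sketch assumes the `ntpstat` tool is installed, which exits with status 0 when the clock is synchronised.

```python
import subprocess
import time


def ntpstat_synced():
    """Return True if `ntpstat` exits 0, i.e. reports a synchronised clock."""
    try:
        result = subprocess.run(
            ["ntpstat"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError:
        # ntpstat is not installed; treat the sync state as unknown/unsynced.
        return False
    return result.returncode == 0


def wait_for_sync(is_synced=ntpstat_synced, timeout=60.0, interval=2.0,
                  sleep=time.sleep, now=time.monotonic):
    """Poll is_synced() until it returns True or timeout seconds have passed.

    Returns True if the clock synced in time, False otherwise. The caller
    decides what to do on False (for example: start anyway and warn).
    """
    deadline = now() + timeout
    while True:
        if is_synced():
            return True
        if now() >= deadline:
            return False
        sleep(interval)
```

A wrapper could call `wait_for_sync(timeout=60)` before launching the mon and log a warning, rather than refusing to start, when it returns False. The 60 second budget is an arbitrary choice for the sketch.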
Given all that, maybe the question is actually: why do the mons stay
in the skew state for so long after the clocks are corrected?
John
* Re: Would it make sense to require ntp
2015-11-06 12:26 ` John Spray
@ 2015-11-06 15:08 ` Gregory Farnum
2015-11-06 15:19 ` Vasiliy Angapov
2015-11-06 17:44 ` Jeremy Hanmer
2016-01-21 2:45 ` Dan Mick
2 siblings, 1 reply; 7+ messages in thread
From: Gregory Farnum @ 2015-11-06 15:08 UTC
To: John Spray, Joao Eduardo Luis; +Cc: Nathan Cutler, ceph-devel
On Fri, Nov 6, 2015 at 4:26 AM, John Spray <jspray@redhat.com> wrote:
> On Fri, Nov 6, 2015 at 10:06 AM, Nathan Cutler <ncutler@suse.cz> wrote:
>> Hi Ceph:
>>
>> Recently I encountered a "clock skew" issue with 0.94.3. I have
>> some small demo clusters in AWS. When I boot them up, in most cases the
>> cluster will start in HEALTH_WARN due to clock skew on some of the MONs.
>>
>> I surmise that this is due to a race condition between the ceph-mon and
>> ntpd systemd services. Sometimes ntpd.service starts *after* ceph-mon -
>> in this case the MON sees a wrong/unsynchronized time value.
>>
>> Now, even though ntpd.service starts (and fixes the time value) very
>> soon afterwards, the cluster remains in clock skew for a long time - but
>> that is a separate issue. What I would like to ask is this:
>>
>> Is there any reasonable Ceph cluster node configuration that does not
>> include running the NTP daemon?
>
> Only if there is some other time service replacing it. I don't really
> know of anyone using alternative ntp daemons, but it's a possibility
> to consider before introducing a hard dependency on ntpd.
>
>> If the answer is "no", would it make sense to make NTP a runtime
>> dependency and tell the ceph-mon systemd service to wait for
>> ntpd.service before it starts?
>
> Just waiting for the service is quick, but it doesn't achieve any
> effect on the clock other than promising that it will be synced at
> some point in the future. Wouldn't we have to wait for time sync
> rather than just waiting for the service? That could take a while.
>
> My hunch is that users wouldn't appreciate the mon blocking until
> times were in sync, they'd probably prefer to go ahead and start up,
> but raise a warning (like we currently do).
>
> Given all that, maybe the question is actually: why do the mons stay
> in the skew state for so long after the clocks are corrected?
Perhaps they're just keeping the warning log up until the next
regularly-scheduled clock sync test? I don't know that we want to
start higher-frequency testing when in an error state (how expensive
are the clock sync tests?) but we could at least let admins trigger
one directly. (Maybe we do, but I didn't find anything about clocks in
MonCommands.)
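
To make the hypothesis above concrete: if the warning is only re-evaluated at fixed-interval checks, it stays up until the first check after the clock is fixed. The sketch below is arithmetic only and does not reflect the actual monitor code. It assumes checks run at multiples of the interval counted from mon start, and the 300 second default is an assumption chosen for illustration (it is believed to match the default of `mon timecheck interval`, but that is not confirmed in this thread).

```python
import math


def warning_clears_at(corrected_at, check_interval=300.0):
    """Time (seconds since mon start) of the first periodic check that runs
    at or after the moment the clock was corrected.

    Assumes checks happen at 0, check_interval, 2 * check_interval, ...
    """
    return math.ceil(corrected_at / check_interval) * check_interval


def stale_seconds(corrected_at, check_interval=300.0):
    """How long the skew warning outlives the actual clock correction."""
    return warning_clears_at(corrected_at, check_interval) - corrected_at
```

Under these assumptions, a clock fixed 10 seconds after the mon starts would still show the warning for another 290 seconds, which would be consistent with the "long time" reported at the top of the thread.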
-Greg
* Re: Would it make sense to require ntp
2015-11-06 15:08 ` Gregory Farnum
@ 2015-11-06 15:19 ` Vasiliy Angapov
0 siblings, 0 replies; 7+ messages in thread
From: Vasiliy Angapov @ 2015-11-06 15:19 UTC
To: Gregory Farnum; +Cc: John Spray, Joao Eduardo Luis, Nathan Cutler, ceph-devel
Btw, in RHEL 7 based distros there is a choice between ntpd and
chronyd, with the latter being preferred.
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/ch-Configuring_NTP_Using_the_chrony_Suite.html
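
Since either daemon may be present, anything that waits for time sync would need to cope with both. A minimal sketch, assuming the `chronyc` and `ntp-wait` tools are on the PATH when their daemons are installed; `pick_wait_command` is a hypothetical helper, and finding a binary does not prove the matching daemon is actually running.

```python
import shutil


def pick_wait_command(which=shutil.which, tries=30, interval=2):
    """Return an argv list that blocks until the clock is synced, or None.

    which is injectable so the choice can be exercised without the tools.
    """
    if which("chronyc"):
        # chronyc waitsync [max-tries [max-correction [max-skew [interval]]]]
        # Zero for max-correction and max-skew means those are not checked.
        return ["chronyc", "waitsync", str(tries), "0", "0", str(interval)]
    if which("ntp-wait"):
        # ntp-wait -n <tries> -s <seconds to sleep between tries>
        return ["ntp-wait", "-n", str(tries), "-s", str(interval)]
    # No known time daemon tooling found (e.g. a guest whose clock is
    # managed some other way); the caller should not block in this case.
    return None
```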
2015-11-06 23:08 GMT+08:00 Gregory Farnum <gfarnum@redhat.com>:
> On Fri, Nov 6, 2015 at 4:26 AM, John Spray <jspray@redhat.com> wrote:
>> On Fri, Nov 6, 2015 at 10:06 AM, Nathan Cutler <ncutler@suse.cz> wrote:
>>> Hi Ceph:
>>>
>>> Recently I encountered a "clock skew" issue with 0.94.3. I have
>>> some small demo clusters in AWS. When I boot them up, in most cases the
>>> cluster will start in HEALTH_WARN due to clock skew on some of the MONs.
>>>
>>> I surmise that this is due to a race condition between the ceph-mon and
>>> ntpd systemd services. Sometimes ntpd.service starts *after* ceph-mon -
>>> in this case the MON sees a wrong/unsynchronized time value.
>>>
>>> Now, even though ntpd.service starts (and fixes the time value) very
>>> soon afterwards, the cluster remains in clock skew for a long time - but
>>> that is a separate issue. What I would like to ask is this:
>>>
>>> Is there any reasonable Ceph cluster node configuration that does not
>>> include running the NTP daemon?
>>
>> Only if there is some other time service replacing it. I don't really
>> know of anyone using alternative ntp daemons, but it's a possibility
>> to consider before introducing a hard dependency on ntpd.
>>
>>> If the answer is "no", would it make sense to make NTP a runtime
>>> dependency and tell the ceph-mon systemd service to wait for
>>> ntpd.service before it starts?
>>
>> Just waiting for the service is quick, but it doesn't achieve any
>> effect on the clock other than promising that it will be synced at
>> some point in the future. Wouldn't we have to wait for time sync
>> rather than just waiting for the service? That could take a while.
>>
>> My hunch is that users wouldn't appreciate the mon blocking until
>> times were in sync, they'd probably prefer to go ahead and start up,
>> but raise a warning (like we currently do).
>>
>> Given all that, maybe the question is actually: why do the mons stay
>> in the skew state for so long after the clocks are corrected?
>
> Perhaps they're just keeping the warning log up until the next
> regularly-scheduled clock sync test? I don't know that we want to
> start higher-frequency testing when in an error state (how expensive
> are the clock sync tests?) but we could at least let admins trigger
> one directly. (Maybe we do, but I didn't find anything about clocks in
> MonCommands.)
> -Greg
* Re: Would it make sense to require ntp
2015-11-06 12:26 ` John Spray
2015-11-06 15:08 ` Gregory Farnum
@ 2015-11-06 17:44 ` Jeremy Hanmer
2016-01-21 2:45 ` Dan Mick
2 siblings, 0 replies; 7+ messages in thread
From: Jeremy Hanmer @ 2015-11-06 17:44 UTC
To: ceph-devel
On Fri, Nov 6, 2015 at 4:26 AM, John Spray <jspray@redhat.com> wrote:
> On Fri, Nov 6, 2015 at 10:06 AM, Nathan Cutler <ncutler@suse.cz> wrote:
>> Hi Ceph:
>>
>>
>> Is there any reasonable Ceph cluster node configuration that does not
>> include running the NTP daemon?
>
> Only if there is some other time service replacing it. I don't really
> know of anyone using alternative ntp daemons, but it's a possibility
> to consider before introducing a hard dependency on ntpd.
>
There's also the case of virtualization, where the clock is managed by
the host OS rather than the guest where the Ceph services are running.
Anyone running a virtualized test/dev cluster or simply using VMs to
provide isolation between services won't want to have a requirement on NTP.
* Re: Would it make sense to require ntp
2015-11-06 12:26 ` John Spray
2015-11-06 15:08 ` Gregory Farnum
2015-11-06 17:44 ` Jeremy Hanmer
@ 2016-01-21 2:45 ` Dan Mick
2 siblings, 0 replies; 7+ messages in thread
From: Dan Mick @ 2016-01-21 2:45 UTC
To: John Spray, Nathan Cutler; +Cc: ceph-devel
On 11/06/2015 04:26 AM, John Spray wrote:
>> If the answer is "no", would it make sense to make NTP a runtime
>> dependency and tell the ceph-mon systemd service to wait for
>> ntpd.service before it starts?
> Just waiting for the service is quick, but it doesn't achieve any
> effect on the clock other than promising that it will be synced at
> some point in the future. Wouldn't we have to wait for time sync
> rather than just waiting for the service? That could take a while.
For teuthology we have an optional task to stop ntpd, run ntpdate to
slam-sync the clocks, and restart ntpd; by default ntpd can take a
minimum of five minutes to adjust the clock, assuming it's got good sync
data from its servers. I've wondered whether we ought to try to
distribute that technique for users, but it's pretty half-baked in
teuthology right now (the code exists but we're not running it as a
matter of course).
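
The stop / ntpdate / restart sequence described above can be sketched as follows. This is not the teuthology task itself, only an illustration of the same three steps. The service name `ntpd`, the server `pool.ntp.org`, and the helper names are assumptions; `ntpdate -b` forces the clock to be stepped rather than slewed.

```python
import subprocess


def slam_sync_commands(server="pool.ntp.org", service="ntpd"):
    """Return the three commands, in order, as argv lists."""
    return [
        # ntpd must be stopped first so ntpdate can use the NTP port.
        ["systemctl", "stop", service],
        # -b steps the clock immediately instead of slewing it gradually.
        ["ntpdate", "-b", server],
        # Bring the daemon back so it keeps the clock disciplined afterwards.
        ["systemctl", "start", service],
    ]


def slam_sync(server="pool.ntp.org", service="ntpd", run=subprocess.check_call):
    """Run the sequence, raising on the first command that fails."""
    for argv in slam_sync_commands(server, service):
        run(argv)
```

Running `slam_sync()` needs root, and stepping the clock on a node with a running mon is itself disruptive, so this would fit better before the Ceph daemons start.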
One wonders if there is a service that could be a sync point for "actual
time update from ntpd/chronyd".
end of thread, other threads:[~2016-01-21 2:45 UTC | newest]
Thread overview: 7+ messages
2015-11-06 10:06 Would it make sense to require ntp Nathan Cutler
2015-11-06 11:36 ` Wido den Hollander
2015-11-06 12:26 ` John Spray
2015-11-06 15:08 ` Gregory Farnum
2015-11-06 15:19 ` Vasiliy Angapov
2015-11-06 17:44 ` Jeremy Hanmer
2016-01-21 2:45 ` Dan Mick