* Would it make sense to require ntp
@ 2015-11-06 10:06 Nathan Cutler
  2015-11-06 11:36 ` Wido den Hollander
  2015-11-06 12:26 ` John Spray
  0 siblings, 2 replies; 7+ messages in thread
From: Nathan Cutler @ 2015-11-06 10:06 UTC
  To: ceph-devel

Hi Ceph:

Recently I encountered a "clock skew" issue with 0.94.3. I have
some small demo clusters in AWS. When I boot them up, in most cases the
cluster will start in HEALTH_WARN due to clock skew on some of the MONs.

I surmise that this is due to a race condition between the ceph-mon and
ntpd systemd services. Sometimes ntpd.service starts *after* ceph-mon -
in this case the MON sees a wrong/unsynchronized time value.

Now, even though ntpd.service starts (and fixes the time value) very
soon afterwards, the cluster remains in clock skew for a long time - but
that is a separate issue. What I would like to ask is this:

Is there any reasonable Ceph cluster node configuration that does not
include running the NTP daemon?

If the answer is "no", would it make sense to make NTP a runtime
dependency and tell the ceph-mon systemd service to wait for
ntpd.service before it starts?

Thanks and regards

--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037
* Re: Would it make sense to require ntp
  2015-11-06 10:06 Would it make sense to require ntp Nathan Cutler
@ 2015-11-06 11:36 ` Wido den Hollander
  2015-11-06 12:26 ` John Spray
  1 sibling, 0 replies; 7+ messages in thread
From: Wido den Hollander @ 2015-11-06 11:36 UTC
  To: Nathan Cutler, ceph-devel

On 11/06/2015 11:06 AM, Nathan Cutler wrote:
> Hi Ceph:
>
> Recently I encountered a "clock skew" issue with 0.94.3. I have
> some small demo clusters in AWS. When I boot them up, in most cases the
> cluster will start in HEALTH_WARN due to clock skew on some of the MONs.
>
> I surmise that this is due to a race condition between the ceph-mon and
> ntpd systemd services. Sometimes ntpd.service starts *after* ceph-mon -
> in this case the MON sees a wrong/unsynchronized time value.
>
> Now, even though ntpd.service starts (and fixes the time value) very
> soon afterwards, the cluster remains in clock skew for a long time - but
> that is a separate issue. What I would like to ask is this:
>
> Is there any reasonable Ceph cluster node configuration that does not
> include running the NTP daemon?
>

Well, the MONs are very, very time sensitive. OSDs somewhat less, but if
they drift too far they run into trouble authenticating.

> If the answer is "no", would it make sense to make NTP a runtime
> dependency and tell the ceph-mon systemd service to wait for
> ntpd.service before it starts?
>

I think it makes sense, correct time is essential imho.

> Thanks and regards
>

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
* Re: Would it make sense to require ntp
  2015-11-06 10:06 Would it make sense to require ntp Nathan Cutler
  2015-11-06 11:36 ` Wido den Hollander
@ 2015-11-06 12:26 ` John Spray
  2015-11-06 15:08   ` Gregory Farnum
                     ` (2 more replies)
  1 sibling, 3 replies; 7+ messages in thread
From: John Spray @ 2015-11-06 12:26 UTC
  To: Nathan Cutler; +Cc: ceph-devel

On Fri, Nov 6, 2015 at 10:06 AM, Nathan Cutler <ncutler@suse.cz> wrote:
> Hi Ceph:
>
> Recently I encountered a "clock skew" issue with 0.94.3. I have
> some small demo clusters in AWS. When I boot them up, in most cases the
> cluster will start in HEALTH_WARN due to clock skew on some of the MONs.
>
> I surmise that this is due to a race condition between the ceph-mon and
> ntpd systemd services. Sometimes ntpd.service starts *after* ceph-mon -
> in this case the MON sees a wrong/unsynchronized time value.
>
> Now, even though ntpd.service starts (and fixes the time value) very
> soon afterwards, the cluster remains in clock skew for a long time - but
> that is a separate issue. What I would like to ask is this:
>
> Is there any reasonable Ceph cluster node configuration that does not
> include running the NTP daemon?

Only if there is some other time service replacing it. I don't really
know of anyone using alternative ntp daemons, but it's a possibility
to consider before introducing a hard dependency on ntpd.

> If the answer is "no", would it make sense to make NTP a runtime
> dependency and tell the ceph-mon systemd service to wait for
> ntpd.service before it starts?

Just waiting for the service is quick, but it doesn't achieve any
effect on the clock other than promising that it will be synced at
some point in the future. Wouldn't we have to wait for time sync
rather than just waiting for the service? That could take a while.

My hunch is that users wouldn't appreciate the mon blocking until
times were in sync, they'd probably prefer to go ahead and start up,
but raise a warning (like we currently do).

Given all that, maybe the question is actually: why do the mons stay
in the skew state for so long after the clocks are corrected?

John
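The difference between "the time service has started" and "the clock is actually synchronized" can be sketched as a small polling loop. This is only an illustration, not anything Ceph ships: the names `wait_for_sync` and `timedatectl_synced` are hypothetical, and the default check assumes a host whose systemd is recent enough to support `timedatectl show`. Any other check (for example one based on ntpd or chronyd tooling) could be passed in instead.

```python
import subprocess
import time
from typing import Callable


def timedatectl_synced() -> bool:
    """Return True if systemd reports the system clock as synchronized.

    Assumption: `timedatectl show --property=NTPSynchronized --value`
    is available on this host and prints "yes" or "no".
    """
    result = subprocess.run(
        ["timedatectl", "show", "--property=NTPSynchronized", "--value"],
        capture_output=True, text=True, check=False,
    )
    return result.returncode == 0 and result.stdout.strip() == "yes"


def wait_for_sync(is_synced: Callable[[], bool] = timedatectl_synced,
                  timeout: float = 300.0,
                  interval: float = 5.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll is_synced() until it returns True or `timeout` seconds pass.

    Returns True if the clock was reported in sync, False on timeout.
    The sleep and clock callables are injectable to keep this testable.
    """
    deadline = clock() + timeout
    while True:
        if is_synced():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller could block on `wait_for_sync()` before starting a daemon, or, closer to the preference voiced above, start anyway and only log a warning when it returns False.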
* Re: Would it make sense to require ntp
  2015-11-06 12:26 ` John Spray
@ 2015-11-06 15:08   ` Gregory Farnum
  2015-11-06 15:19     ` Vasiliy Angapov
  2015-11-06 17:44   ` Jeremy Hanmer
  2016-01-21  2:45   ` Dan Mick
  2 siblings, 1 reply; 7+ messages in thread
From: Gregory Farnum @ 2015-11-06 15:08 UTC
  To: John Spray, Joao Eduardo Luis; +Cc: Nathan Cutler, ceph-devel

On Fri, Nov 6, 2015 at 4:26 AM, John Spray <jspray@redhat.com> wrote:
> On Fri, Nov 6, 2015 at 10:06 AM, Nathan Cutler <ncutler@suse.cz> wrote:
>> Hi Ceph:
>>
>> Recently I encountered a "clock skew" issue with 0.94.3. I have
>> some small demo clusters in AWS. When I boot them up, in most cases the
>> cluster will start in HEALTH_WARN due to clock skew on some of the MONs.
>>
>> I surmise that this is due to a race condition between the ceph-mon and
>> ntpd systemd services. Sometimes ntpd.service starts *after* ceph-mon -
>> in this case the MON sees a wrong/unsynchronized time value.
>>
>> Now, even though ntpd.service starts (and fixes the time value) very
>> soon afterwards, the cluster remains in clock skew for a long time - but
>> that is a separate issue. What I would like to ask is this:
>>
>> Is there any reasonable Ceph cluster node configuration that does not
>> include running the NTP daemon?
>
> Only if there is some other time service replacing it. I don't really
> know of anyone using alternative ntp daemons, but it's a possibility
> to consider before introducing a hard dependency on ntpd.
>
>> If the answer is "no", would it make sense to make NTP a runtime
>> dependency and tell the ceph-mon systemd service to wait for
>> ntpd.service before it starts?
>
> Just waiting for the service is quick, but it doesn't achieve any
> effect on the clock other than promising that it will be synced at
> some point in the future. Wouldn't we have to wait for time sync
> rather than just waiting for the service? That could take a while.
>
> My hunch is that users wouldn't appreciate the mon blocking until
> times were in sync, they'd probably prefer to go ahead and start up,
> but raise a warning (like we currently do).
>
> Given all that, maybe the question is actually: why do the mons stay
> in the skew state for so long after the clocks are corrected?

Perhaps they're just keeping the warning log up until the next
regularly-scheduled clock sync test? I don't know that we want to
start higher-frequency testing when in an error state (how expensive
are the clock sync tests?) but we could at least let admins trigger
one directly. (Maybe we do, but I didn't find anything about clocks in
MonCommands.)
-Greg
* Re: Would it make sense to require ntp
  2015-11-06 15:08   ` Gregory Farnum
@ 2015-11-06 15:19     ` Vasiliy Angapov
  0 siblings, 0 replies; 7+ messages in thread
From: Vasiliy Angapov @ 2015-11-06 15:19 UTC
  To: Gregory Farnum; +Cc: John Spray, Joao Eduardo Luis, Nathan Cutler, ceph-devel

Btw, in RHEL 7 based distros there is a choice between ntpd and
chronyd, with the latter being preferred.
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/ch-Configuring_NTP_Using_the_chrony_Suite.html

2015-11-06 23:08 GMT+08:00 Gregory Farnum <gfarnum@redhat.com>:
> On Fri, Nov 6, 2015 at 4:26 AM, John Spray <jspray@redhat.com> wrote:
>> On Fri, Nov 6, 2015 at 10:06 AM, Nathan Cutler <ncutler@suse.cz> wrote:
>>> Hi Ceph:
>>>
>>> Recently I encountered a "clock skew" issue with 0.94.3. I have
>>> some small demo clusters in AWS. When I boot them up, in most cases the
>>> cluster will start in HEALTH_WARN due to clock skew on some of the MONs.
>>>
>>> I surmise that this is due to a race condition between the ceph-mon and
>>> ntpd systemd services. Sometimes ntpd.service starts *after* ceph-mon -
>>> in this case the MON sees a wrong/unsynchronized time value.
>>>
>>> Now, even though ntpd.service starts (and fixes the time value) very
>>> soon afterwards, the cluster remains in clock skew for a long time - but
>>> that is a separate issue. What I would like to ask is this:
>>>
>>> Is there any reasonable Ceph cluster node configuration that does not
>>> include running the NTP daemon?
>>
>> Only if there is some other time service replacing it. I don't really
>> know of anyone using alternative ntp daemons, but it's a possibility
>> to consider before introducing a hard dependency on ntpd.
>>
>>> If the answer is "no", would it make sense to make NTP a runtime
>>> dependency and tell the ceph-mon systemd service to wait for
>>> ntpd.service before it starts?
>>
>> Just waiting for the service is quick, but it doesn't achieve any
>> effect on the clock other than promising that it will be synced at
>> some point in the future. Wouldn't we have to wait for time sync
>> rather than just waiting for the service? That could take a while.
>>
>> My hunch is that users wouldn't appreciate the mon blocking until
>> times were in sync, they'd probably prefer to go ahead and start up,
>> but raise a warning (like we currently do).
>>
>> Given all that, maybe the question is actually: why do the mons stay
>> in the skew state for so long after the clocks are corrected?
>
> Perhaps they're just keeping the warning log up until the next
> regularly-scheduled clock sync test? I don't know that we want to
> start higher-frequency testing when in an error state (how expensive
> are the clock sync tests?) but we could at least let admins trigger
> one directly. (Maybe we do, but I didn't find anything about clocks in
> MonCommands.)
> -Greg
* Re: Would it make sense to require ntp
  2015-11-06 12:26 ` John Spray
  2015-11-06 15:08   ` Gregory Farnum
@ 2015-11-06 17:44   ` Jeremy Hanmer
  2016-01-21  2:45   ` Dan Mick
  2 siblings, 0 replies; 7+ messages in thread
From: Jeremy Hanmer @ 2015-11-06 17:44 UTC
  To: ceph-devel

On Fri, Nov 6, 2015 at 4:26 AM, John Spray <jspray@redhat.com> wrote:
> On Fri, Nov 6, 2015 at 10:06 AM, Nathan Cutler <ncutler@suse.cz> wrote:
>> Hi Ceph:
>>
>>
>> Is there any reasonable Ceph cluster node configuration that does not
>> include running the NTP daemon?
>
> Only if there is some other time service replacing it. I don't really
> know of anyone using alternative ntp daemons, but it's a possibility
> to consider before introducing a hard dependency on ntpd.
>

There's also the case of virtualization where the clock is managed by
the host OS rather than the guest where the Ceph services are running.
Anyone running a virtualized test/dev cluster or simply using VMs to
provide isolation between services won't want to have a requirement on
NTP.
* Re: Would it make sense to require ntp
  2015-11-06 12:26 ` John Spray
  2015-11-06 15:08   ` Gregory Farnum
  2015-11-06 17:44   ` Jeremy Hanmer
@ 2016-01-21  2:45   ` Dan Mick
  2 siblings, 0 replies; 7+ messages in thread
From: Dan Mick @ 2016-01-21 2:45 UTC
  To: John Spray, Nathan Cutler; +Cc: ceph-devel

On 11/06/2015 04:26 AM, John Spray wrote:
>> If the answer is "no", would it make sense to make NTP a runtime
>> dependency and tell the ceph-mon systemd service to wait for
>> ntpd.service before it starts?
> Just waiting for the service is quick, but it doesn't achieve any
> effect on the clock other than promising that it will be synced at
> some point in the future. Wouldn't we have to wait for time sync
> rather than just waiting for the service? That could take a while.

For teuthology we have an optional task to stop ntpd, run ntpdate to
slam-sync the clocks, and restart ntpd; by default ntpd can take a
minimum of five minutes to adjust the clock, assuming it's got good sync
data from its servers.

I've wondered whether we ought to try to distribute that technique for
users, but it's pretty half-baked in teuthology right now (the code
exists but we're not running it as a matter of course).

One wonders if there is a service that could be a sync point for "actual
time update from ntpd/chronyd".
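The stop / ntpdate / restart sequence described above could look roughly like the following. This is a hedged sketch and not the teuthology task itself: the function name `slam_sync`, the use of `systemctl` with a unit called `ntpd`, and the server name passed in are all assumptions made for illustration.

```python
import subprocess
from typing import Callable, List


def slam_sync(server: str,
              run: Callable[[List[str]], object] = subprocess.check_call) -> None:
    """Step the clock once with ntpdate while ntpd is stopped.

    Assumptions (not taken from teuthology): ntpd is managed as the
    systemd unit "ntpd", and `ntpdate` is installed. `run` executes one
    command given as an argument list; it is injectable for testing.
    """
    # ntpd has to be stopped first so ntpdate can use the NTP port.
    run(["systemctl", "stop", "ntpd"])
    try:
        # One-shot clock step against the given server.
        run(["ntpdate", server])
    finally:
        # Always bring ntpd back, even if the one-shot sync failed.
        run(["systemctl", "start", "ntpd"])
```

For example, `slam_sync("pool.ntp.org")` would need root privileges and a reachable server. The `try`/`finally` is a design choice of this sketch so that a failed `ntpdate` does not leave the node without a running time daemon.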
end of thread

Thread overview: 7+ messages
2015-11-06 10:06 Would it make sense to require ntp Nathan Cutler
2015-11-06 11:36 ` Wido den Hollander
2015-11-06 12:26 ` John Spray
2015-11-06 15:08   ` Gregory Farnum
2015-11-06 15:19     ` Vasiliy Angapov
2015-11-06 17:44   ` Jeremy Hanmer
2016-01-21  2:45   ` Dan Mick