* Re: [PATCH] rtc: adapt allowed RTC update error [not found] ` <874kl3eu8p.fsf@nanos.tec.linutronix.de> @ 2020-12-03 1:14 ` Thomas Gleixner 2020-12-03 2:04 ` Jason Gunthorpe 2020-12-03 2:10 ` Alexandre Belloni 0 siblings, 2 replies; 24+ messages in thread From: Thomas Gleixner @ 2020-12-03 1:14 UTC (permalink / raw) To: Jason Gunthorpe Cc: Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, Alexandre Belloni, linux-rtc, Peter Zijlstra Cc+ RTC folks. On Wed, Dec 02 2020 at 23:08, Thomas Gleixner wrote: > On Wed, Dec 02 2020 at 16:54, Jason Gunthorpe wrote: >>> I don't think the timer should be canceled if the ntp_synced() state did >>> not change. Otherwise every do_adtimex() call will cancel/restart >>> it, which does not make sense. Lemme stare at it some more. >> >> That makes sense, being conditional on the STA_UNSYNC prior to doing >> any hrtimer_start seems OK? > > Yeah. And I was staring at it some more. TBH, rtc_tv_nsec_ok() is pretty much voodoo and wishful thinking. The point is that the original MC146818 has an update cycle once per second. That's what mc146818_is_updating() is checking. Lets start right here. mc146818_get_time() is bonkers to begin with. It does: if (mc146818_is_updating()) mdelay(20); lock(&rtc_lock); read_stuff(); unlock(&rtc_lock); That's crap, really. The MC146818 does a periodic update of the time data once per second and it advertises it via the UIP bit in Register A. The UIP bit is set 244us _before_ the update starts and depending on the time base is takes either 248us or 1984us (32kHz) until the bit goes low again. So let's look at the time line assuming a 32kHz time base: Time 0.000s ~0.998s 1.0s | | _____ UIP ____________________| |_______ Let's look at the code again and annotate it with time stamps: 0.997 read() if (mc146818_is_updating()) lock(rtc); read(regA); unlock(rtc); return regA & UIP; <- Lets assume FALSE 0.998 -> whatever happens (SMI, NMI, interrupt, preemption ...) 0.999 lock(rtc) read_data() <- Result is undefined because RTC is in the middle of the update unlock(rtc); Seriously... The comment above the if(...updating()) is full of wishful thinking: /* * read RTC once any update in progress is done. The update * can take just over 2ms. We wait 20ms. There is no need to * to poll-wait (up to 1s - eeccch) for the falling edge of RTC_UIP. * If you need to know *exactly* when a second has started, enable * periodic update complete interrupts, (via ioctl) and then * immediately read /dev/rtc which will block until you get the IRQ. * Once the read clears, read the RTC time (again via ioctl). Easy. */ - The update can take exactly up to 1984us, which is not just over 2ms. Also the update starts 248us _after_ UIP goes high. - Wait 20ms? What for? To make sure that the RTC has advanced by 2ms? - Poll wait for the falling edge of UIP takes up to 1s? I must have missed something in basic math. If UIP is high then waiting for the falling edge takes at max 2ms. I know what the comment tries to say, but on the first read I had one of those forbidden-by-CoC moments. - The need to know *exactly* part of the comment is even more amazing. Q: Which system guarantees that the interrupt: - will not happen _before_ read(/dev/rtc) after the ioctl() enabled it? - will be delivered without delay ? - will make sure that the task in the blocking read will be scheduled immediately and then do the ioctl based readout ? A: None Q: What is *exact* about this? A: Nothing at all. So the right thing to do here is: read() do { lock(rtc) start = ktime_get(); if (read(regA) & UIP) { unlock(rtc); wait(2ms); continue; } read_data(); end = ktime_get(); unlock(rtc); while (end - start > 2ms); and return _all_ three values (start, end, rtcdata) so clueful userspace can make sense of it. Returning nonsense as pointed off above is a non option. Hmm? Now to the write side. That's really wishful thinking. Let's look at the code: write() lock(rtc); write(regB, read(reagB) | SET); write_data(); write(regB, read(reagB) & ~SET); unlock(rtc); lock/unlock are irrelevant as they just serialize concurrent access to the RTC. The magic comment in ntp.c says that RTC will update the written value 0.5 seconds after writing it. That's pure fiction... I have no idea where this comes from, but any spec out there says about this: SET - When the SET bit is a "0", the update cycle functions normally by advancing the counts once-per-second. When the SET bit is written to a "1", any update cycle in progress is aborted and the program may initialize the time and calendar bytes without an update occuring in the midst of initializing. ..... So yes, the comment is partially correct _if_ and only _if_ the time base which schedules the update is exactly the same time base as the RTC time base and the time on which the update is scheduled is perfectly in sync with the RTC time base. Plus the update must be completed _before_ then next update cycle of the RTC starts which is what the 0.5 sec offset is trying to solve. Emphasis on _trying_. But with NTP that's not the case at all. The clocksource which is adjusted by NTP (usually TSC on x86, but that does not matter) is not at all guaranteed to run from the same crystal as the RTC. On most systems the RTC has a seperate crystal which feeds it across poweroff. Even if it _is_ using the same crystal, then the NTP adjustments are going to skew the clocksource frequency against the underlying crystal which feeds the RTC and the clocksource. Q: So how can any assumption about update cycle correlatation between NTP synced CLOCK_REALTIME and the RTC notion of clock realtime be correct? A: Not at all. Aside of that the magic correction of the time which is written to the RTC is completely bogus. Lets start with the interface and the two callers of it: static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec, struct timespec64 *to_set, const struct timespec64 *now) The callers are: sync_cmos_clock() /* The legacy RTC cruft */ struct timespec64 now; struct timespec64 adjust; long target_nsec = NSEC_PER_SEC / 2; ktime_get_real_ts64(&now); if (rtc_tv_nsec_ok(-1 * target_nsec, &adjust, &now)) { if (persistent_clock_is_local) adjust.tv_sec -= (sys_tz.tz_minuteswest * 60); rc = update_persistent_clock64(adjust); } sync_rtc_clock() unsigned long target_nsec; <- Signed unsigned .... struct timespec64 adjust, now; ktime_get_real_ts64(&now); adjust = now; <- Why the difference to the above? if (persistent_clock_is_local) <- Again, why is the ordering different? adjust.tv_sec -= (sys_tz.tz_minuteswest * 60); rc = rtc_set_ntp_time(adjust, &target_nsec) // int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec) struct timespec64 to_set; set_normalized_timespec64(&to_set, 0, -rtc->set_offset_nsec); *target_nsec = to_set.tv_nsec; <- target_nsec = rtc->set_offset_nsec because the timespec is normalized ergo == rtc->set_offset_nsec unless the set_offset_nsec would be negative which makes at all. if (rtc_tv_nsec_ok(rtc->set_offset_nsec, &to_set, &now)) update_rtc(...); So sync_cmos_clock hands in -(NSEC_PER_SEC/2) and the rtc cruft hands in NSEC_PER_SEC/2 by default. The comment in drivers/rtc/class.c says: drivers/rtc/class.c- /* Drivers can revise this default after allocating the device. */ drivers/rtc/class.c: rtc->set_offset_nsec = NSEC_PER_SEC / 2; but no driver ever bothered to change that value. Also the idea of having this offset as type s64 is beyond my understanding. Why the heck would any RTC require to set an offset which is _after_ the second transition. That aside. Looking at the above two variants let's go into rtc_tv_nsec_ok() static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec, struct timespec64 *to_set, const struct timespec64 *now) /* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */ const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5; struct timespec64 delay = {.tv_sec = 0, .tv_nsec = set_offset_nsec}; *to_set = timespec64_add(*now, delay); The scheduled point for both legacy CMOS and RTC modern is at the point of the second minus 0.5 seconds (lets ignore that set_offset_nsec might be different for this). So let's assume the update was scheduled at 659s 500ms independent of legacy or modern. Now legacy does the following: struct timespec64 delay = { .tv_sec = 0, tv_nsec = -5e8 } which is an not normalized timespec to begin with but *to_set = timespec64_add(*now , delay); can deal with that. So the result of this computation is: now - delay IOW, 0.5 seconds before now: 659s 0ms Now the same magic for the 'modern' RTC will do: struct timespec64 delay = { .tv_sec = 0, tv_nsec = 5e8 } so the result of the add is: now + delay IOW, 0.5 seconds _after_ now: 700s 0ms Can you spot the subtle difference? That said, can somebody answer the one million dollar question which problem is solved by all of this magic nonsense? If anyone involved seriously believes that any of this solves a real world problem, then please come forth an make your case. If not, all of this illusionary attempts to be "correct" can be removed for good and the whole thing reduced to a update_rtc_plus_minus_a_second() mechanism, which is exactly what we have today just without the code which pretends to be *exact* or whatever. Thanks, tglx ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 1:14 ` [PATCH] rtc: adapt allowed RTC update error Thomas Gleixner @ 2020-12-03 2:04 ` Jason Gunthorpe 2020-12-03 2:10 ` Alexandre Belloni 1 sibling, 0 replies; 24+ messages in thread From: Jason Gunthorpe @ 2020-12-03 2:04 UTC (permalink / raw) To: Thomas Gleixner Cc: Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, Alexandre Belloni, linux-rtc, Peter Zijlstra On Thu, Dec 03, 2020 at 02:14:12AM +0100, Thomas Gleixner wrote: > If anyone involved seriously believes that any of this solves a real > world problem, then please come forth an make your case. The original commit 0f295b0650c9 ("rtc: Allow rtc drivers to specify the tv_nsec value for ntp") was tested by myself and RMK on various ARM systems and did work as advertised. Here is the giant thread, RMK's post explains the problem and gives his measurements of several different RTCs: https://lore.kernel.org/linux-arm-kernel/20170920112152.GL20805@n2100.armlinux.org.uk/ And the patch that resulted: https://lore.kernel.org/linux-arm-kernel/20171013175433.GA22062@obsidianresearch.com/ There is a lot of detail in there.. Keep in mind none of this was for the mc146818 style RTCs. I can't recall any more why no drivers use the set_offset_nsec. I'm surprised, maybe I forgot to send the patch for the RTCs I tested or maybe it got dropped someplace.. It certainly was needed for some maxim I2C chips. The thread shows rmk had even written a hrtimer patch to go with this, but it also got lost for some reason. Maybe all the arguing killed further effort? Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 1:14 ` [PATCH] rtc: adapt allowed RTC update error Thomas Gleixner 2020-12-03 2:04 ` Jason Gunthorpe @ 2020-12-03 2:10 ` Alexandre Belloni 2020-12-03 15:39 ` Thomas Gleixner 2020-12-03 15:52 ` Jason Gunthorpe 1 sibling, 2 replies; 24+ messages in thread From: Alexandre Belloni @ 2020-12-03 2:10 UTC (permalink / raw) To: Thomas Gleixner Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra Hello Thomas, I'll take more time to reply more in depth to the whole email but... On 03/12/2020 02:14:12+0100, Thomas Gleixner wrote: > Aside of that the magic correction of the time which is written to the > RTC is completely bogus. Lets start with the interface and the two > callers of it: > > static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec, > struct timespec64 *to_set, > const struct timespec64 *now) > > The callers are: > > sync_cmos_clock() /* The legacy RTC cruft */ > struct timespec64 now; > struct timespec64 adjust; > long target_nsec = NSEC_PER_SEC / 2; > > ktime_get_real_ts64(&now); > if (rtc_tv_nsec_ok(-1 * target_nsec, &adjust, &now)) { > if (persistent_clock_is_local) > adjust.tv_sec -= (sys_tz.tz_minuteswest * 60); > rc = update_persistent_clock64(adjust); > } > > sync_rtc_clock() > unsigned long target_nsec; <- Signed unsigned .... > struct timespec64 adjust, now; > > ktime_get_real_ts64(&now); > > adjust = now; <- Why the difference to the above? > > if (persistent_clock_is_local) <- Again, why is the ordering different? > adjust.tv_sec -= (sys_tz.tz_minuteswest * 60); > > rc = rtc_set_ntp_time(adjust, &target_nsec) > // int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec) > > struct timespec64 to_set; > > set_normalized_timespec64(&to_set, 0, -rtc->set_offset_nsec); > *target_nsec = to_set.tv_nsec; <- target_nsec = rtc->set_offset_nsec > because the timespec is normalized > ergo == rtc->set_offset_nsec > unless the set_offset_nsec would > be negative which makes at all. > > if (rtc_tv_nsec_ok(rtc->set_offset_nsec, &to_set, &now)) > update_rtc(...); > > So sync_cmos_clock hands in -(NSEC_PER_SEC/2) and the rtc cruft hands in > NSEC_PER_SEC/2 by default. The comment in drivers/rtc/class.c says: > > drivers/rtc/class.c- /* Drivers can revise this default after allocating the device. */ > drivers/rtc/class.c: rtc->set_offset_nsec = NSEC_PER_SEC / 2; > > but no driver ever bothered to change that value. Also the idea of > having this offset as type s64 is beyond my understanding. Why the heck > would any RTC require to set an offset which is _after_ the second > transition. > This (no driver making use of set_offset_nsec) happened because it got applied without me agreeing to the change. I did complain at the time and RMK walked away. [...] > That said, can somebody answer the one million dollar question which > problem is solved by all of this magic nonsense? > The goal was to remove the 500ms offset for all the RTCs but the MC146818 because there are RTC that will reset properly their counter when setting the time, meaning they can be set very precisely. IIRC, used in conjunction with rtc_hctosys which also adds inconditionnaly 500ms this can ends up with the system time being one second away from the wall clock time which NTP will take quite some time to remove. > If anyone involved seriously believes that any of this solves a real > world problem, then please come forth an make your case. > > If not, all of this illusionary attempts to be "correct" can be removed > for good and the whole thing reduced to a > > update_rtc_plus_minus_a_second() > > mechanism, which is exactly what we have today just without the code > which pretends to be *exact* or whatever. > Coincidentally, I was going to revert those patches for v5.11. Also, honestly, I still don't understand why the kernel needs to set the RTC while userspace is very well equipped to do that. chrony is able to set the RTC time and it can do so precisely. It can even compute how that RTC is time drifting and that value can already be used to adjust the RTC crystal. From my tests, with some efforts, userspace can set the RTC time with a 20µs precision, even on low end systems. To do so, it doesn't need set_offset_nsec. I also don't like hctosys, it is currently wrong but I can see use cases and now systemd relies on its presence so my plan is to fix it. -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 2:10 ` Alexandre Belloni @ 2020-12-03 15:39 ` Thomas Gleixner 2020-12-03 16:16 ` Jason Gunthorpe 2020-12-03 17:29 ` Alexandre Belloni 2020-12-03 15:52 ` Jason Gunthorpe 1 sibling, 2 replies; 24+ messages in thread From: Thomas Gleixner @ 2020-12-03 15:39 UTC (permalink / raw) To: Alexandre Belloni Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra Alexandre, On Thu, Dec 03 2020 at 03:10, Alexandre Belloni wrote: > On 03/12/2020 02:14:12+0100, Thomas Gleixner wrote: >> That said, can somebody answer the one million dollar question which >> problem is solved by all of this magic nonsense? >> > The goal was to remove the 500ms offset for all the RTCs but the > MC146818 because there are RTC that will reset properly their counter > when setting the time, meaning they can be set very precisely. The MC setting is halfways precise. The write resets the divider chain and when the reset is removed then the next UIP will happen after the magic 0.5 seconds. So yes, writing it 500ms _before_ the next second is halfways correct assumed that there is no interference between ktime_get_real() and the actual write which is a silly assumption as the code is fully preemptible. > IIRC, used in conjunction with rtc_hctosys which also adds > inconditionnaly 500ms this can ends up with the system time > being one second away from the wall clock time which NTP will take quite > some time to remove. The rtc_cmos() driver has a fun comment in cmos_set_time().... The logic in sync_cmos_clock() and rtc_set_ntp_time() is different as I pointed out: sync_cmos_clock() hands -500ms to rtc_tv_nsec_ok() and rtc_set_ntp_time() uses +500ms, IOW exactly ONE second difference in behaviour. > Coincidentally, I was going to revert those patches for v5.11. Which will break the rtc_cmos() driver in a different way. We should really fix that proper and just have the 500ms offset for rtc_cmos, aka. MC146818. When other drivers want a different offset, then they still can do so. The direct /dev/rtc settime ioctl is not using that logic anyway. Thats the business of the user space application to get that straight which is scheduling lottery as well. Thanks, tglx ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 15:39 ` Thomas Gleixner @ 2020-12-03 16:16 ` Jason Gunthorpe 2020-12-03 21:05 ` Thomas Gleixner 2020-12-03 17:29 ` Alexandre Belloni 1 sibling, 1 reply; 24+ messages in thread From: Jason Gunthorpe @ 2020-12-03 16:16 UTC (permalink / raw) To: Thomas Gleixner Cc: Alexandre Belloni, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03, 2020 at 04:39:21PM +0100, Thomas Gleixner wrote: > The logic in sync_cmos_clock() and rtc_set_ntp_time() is different as I > pointed out: sync_cmos_clock() hands -500ms to rtc_tv_nsec_ok() and > rtc_set_ntp_time() uses +500ms, IOW exactly ONE second difference in > behaviour. I understood this is because the two APIs work differently, rmk explained this as: > 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the > time at around 500ms into the second. > > 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms, > then we want to set the _next_ second. ie one path is supposed to round down and one path is supposed to round up, so you get to that 1s difference.. IIRC this is also connected to why the offset is signed.. Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 16:16 ` Jason Gunthorpe @ 2020-12-03 21:05 ` Thomas Gleixner 2020-12-03 21:31 ` Thomas Gleixner 2020-12-03 22:00 ` Alexandre Belloni 0 siblings, 2 replies; 24+ messages in thread From: Thomas Gleixner @ 2020-12-03 21:05 UTC (permalink / raw) To: Jason Gunthorpe Cc: Alexandre Belloni, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03 2020 at 12:16, Jason Gunthorpe wrote: > On Thu, Dec 03, 2020 at 04:39:21PM +0100, Thomas Gleixner wrote: > >> The logic in sync_cmos_clock() and rtc_set_ntp_time() is different as I >> pointed out: sync_cmos_clock() hands -500ms to rtc_tv_nsec_ok() and >> rtc_set_ntp_time() uses +500ms, IOW exactly ONE second difference in >> behaviour. > > I understood this is because the two APIs work differently, rmk > explained this as: > >> 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the >> time at around 500ms into the second. >> >> 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms, >> then we want to set the _next_ second. > > ie one path is supposed to round down and one path is supposed to > round up, so you get to that 1s difference.. > > IIRC this is also connected to why the offset is signed.. The problem is that it is device specific and therefore having the offset parameter is a good starting point. Lets look at the two scenarios: 1) Direct accessible RTC: tsched t1 t2 write(newsec) RTC increments seconds For rtc_cmos/MC1... tinc = t2 - t1 = 500ms There are RTCs which reset the thing on write so tinc = t2 - t1 = 1000ms No idea what other variants are out there, but the principle is the same for all of them. Lets assume that the event is accurate for now and ignore the fuzz logic, i.e. tsched == t1 tsched must be scheduled to happen tinc before wallclock increments seconds so that the RTC increments seconds at the same time. That means newsec = t1.tv_sec. So now the fuzz logic for the legacy cmos path does: newtime = t1 - tinc; if (newtime.tv_nsec < FUZZ) newsec = newtime.tv_sec; else if (newtime.tv_nsec > NSEC_PER_SEC - FUZZ) newsec = newtime.tv_sec + 1; else goto fail; The first condition handles the case where t1 >= tsched and the second one where t1 < tsched. We need the same logic for rtc_cmos() when the update goes through the RTC path, which is broken today. See below. 2) I2C/SPI ... tsched t0 t1 t2 transfer(newsec) RTC update (newsec) RTC increments seconds Lets assume that ttransfer = t1 - t0 is known. tinc is the same as above = t2 - t1 Again, lets assume that the event is accurate for now and ignore the fuzz logic, i.e. tsched == t0 So tsched has to be ttot = t2 - t0 _before_ wallclock reaches t2 and increments seconds. In this case newsec = t1.tv_sec = (t0 + ttransfer).tv_sec So now the fuzz logic for this is: newtime = t0 + ttransfer; if (newtime.tv_nsec < FUZZ) newsec = newtime.tv_sec; else if (newtime.tv_nsec > NSEC_PER_SEC - FUZZ) newsec = newtime.tv_sec + 1; else goto fail; Again the first condition handles the case where t1 >= tsched and the second one where t1 < tsched. So now we have two options to fix this: 1) Use a negative sync_offset for devices which need #1 above (rtc_cmos & similar) That requires setting tsched to t2 - abs(sync_offset) 2) Use always a positive sync_offset and a flag which tells rtc_tv_nsec_ok() whether it needs to add or subtract. #1 is good enough. All it takes is a comment at the timer start code why abs() is required. Let me hack that up along with the hrtimer muck. Thanks, tglx ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 21:05 ` Thomas Gleixner @ 2020-12-03 21:31 ` Thomas Gleixner 2020-12-03 22:36 ` Jason Gunthorpe 2020-12-03 22:00 ` Alexandre Belloni 1 sibling, 1 reply; 24+ messages in thread From: Thomas Gleixner @ 2020-12-03 21:31 UTC (permalink / raw) To: Jason Gunthorpe Cc: Alexandre Belloni, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03 2020 at 22:05, Thomas Gleixner wrote: > On Thu, Dec 03 2020 at 12:16, Jason Gunthorpe wrote: > So now we have two options to fix this: > > 1) Use a negative sync_offset for devices which need #1 above > (rtc_cmos & similar) > > That requires setting tsched to t2 - abs(sync_offset) > > 2) Use always a positive sync_offset and a flag which tells > rtc_tv_nsec_ok() whether it needs to add or subtract. > > #1 is good enough. All it takes is a comment at the timer start code why > abs() is required. > > Let me hack that up along with the hrtimer muck. That comment in rtc.h makes me cry: /* Number of nsec it takes to set the RTC clock. This influences when * the set ops are called. An offset: * - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s * - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s * - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s */ Setting the wall clock time 10.0 at 10.5 is only possible for time traveling RTCs. It magically works, but come on ... Thanks, tglx ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 21:31 ` Thomas Gleixner @ 2020-12-03 22:36 ` Jason Gunthorpe 2020-12-04 13:02 ` Thomas Gleixner 0 siblings, 1 reply; 24+ messages in thread From: Jason Gunthorpe @ 2020-12-03 22:36 UTC (permalink / raw) To: Thomas Gleixner Cc: Alexandre Belloni, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03, 2020 at 10:31:02PM +0100, Thomas Gleixner wrote: > On Thu, Dec 03 2020 at 22:05, Thomas Gleixner wrote: > > On Thu, Dec 03 2020 at 12:16, Jason Gunthorpe wrote: > > So now we have two options to fix this: > > > > 1) Use a negative sync_offset for devices which need #1 above > > (rtc_cmos & similar) > > > > That requires setting tsched to t2 - abs(sync_offset) > > > > 2) Use always a positive sync_offset and a flag which tells > > rtc_tv_nsec_ok() whether it needs to add or subtract. > > > > #1 is good enough. All it takes is a comment at the timer start code why > > abs() is required. > > > > Let me hack that up along with the hrtimer muck. > > That comment in rtc.h makes me cry: > > /* Number of nsec it takes to set the RTC clock. This influences when > * the set ops are called. An offset: > * - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s > * - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s > * - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s > */ > > Setting the wall clock time 10.0 at 10.5 is only possible for time > traveling RTCs. It magically works, but come on ... No tardis required. You can think of storing to a RTC as including a millisecond component, so the three examples are: 10.0 stores 9.5, 10.0 stores 8.5, 10.0 stores 10.5. It was probably included due to cmos, either as a misunderstanding what it does, or it does actually store 10.5 when you store 10.0.. Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 22:36 ` Jason Gunthorpe @ 2020-12-04 13:02 ` Thomas Gleixner 2020-12-04 14:08 ` Jason Gunthorpe 0 siblings, 1 reply; 24+ messages in thread From: Thomas Gleixner @ 2020-12-04 13:02 UTC (permalink / raw) To: Jason Gunthorpe Cc: Alexandre Belloni, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03 2020 at 18:36, Jason Gunthorpe wrote: > On Thu, Dec 03, 2020 at 10:31:02PM +0100, Thomas Gleixner wrote: >> On Thu, Dec 03 2020 at 22:05, Thomas Gleixner wrote: >> > On Thu, Dec 03 2020 at 12:16, Jason Gunthorpe wrote: >> > So now we have two options to fix this: >> > >> > 1) Use a negative sync_offset for devices which need #1 above >> > (rtc_cmos & similar) >> > >> > That requires setting tsched to t2 - abs(sync_offset) >> > >> > 2) Use always a positive sync_offset and a flag which tells >> > rtc_tv_nsec_ok() whether it needs to add or subtract. >> > >> > #1 is good enough. All it takes is a comment at the timer start code why >> > abs() is required. >> > >> > Let me hack that up along with the hrtimer muck. >> >> That comment in rtc.h makes me cry: >> >> /* Number of nsec it takes to set the RTC clock. This influences when >> * the set ops are called. An offset: >> * - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s >> * - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s >> * - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s >> */ >> >> Setting the wall clock time 10.0 at 10.5 is only possible for time >> traveling RTCs. It magically works, but come on ... > > No tardis required. You can think of storing to a RTC as including a > millisecond component, so the three examples are: 10.0 stores 9.5, > 10.0 stores 8.5, 10.0 stores 10.5. > > It was probably included due to cmos, either as a misunderstanding > what it does, or it does actually store 10.5 when you store 10.0.. Yes, it kinda stores 10.5 because after the write the next seconds increment happens 500ms later. But none of this magic is actually required because the behaviour of most RTCs is that the next seconds increment happens exactly 1000ms _after_ the write. Which means _all_ of these offsets are positive: tsched twrite tnextsec For CMOS tsched == twrite and tnextsec - twrite = 500ms For I2C tsched = tnextsec - 1000ms - ttransport which means the formula is the same for all of them tRTCinc = tnextsec - twrite tsched = tnextsec - tRTCinc - ttransport And this covers also the (probably unlikely) case where the I2C RTC has a tRTCinc != 1000ms. Imagine a i2c based MC14xxx which would have: offset = 500ms + ttransport No magic sign calculation required if you look at it from the actual timeline and account the time between write and next second increment correctly. Thanks, tglx ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 13:02 ` Thomas Gleixner @ 2020-12-04 14:08 ` Jason Gunthorpe 2020-12-04 14:37 ` Alexandre Belloni 0 siblings, 1 reply; 24+ messages in thread From: Jason Gunthorpe @ 2020-12-04 14:08 UTC (permalink / raw) To: Thomas Gleixner Cc: Alexandre Belloni, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Fri, Dec 04, 2020 at 02:02:57PM +0100, Thomas Gleixner wrote: > No magic sign calculation required if you look at it from the actual > timeline and account the time between write and next second increment > correctly. Yes, it is equivalent to break things into two values, and does look to be more understandable as one can read at least one of the values from a datasheet and the other could be estimated by timing a read clock Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 14:08 ` Jason Gunthorpe @ 2020-12-04 14:37 ` Alexandre Belloni 2020-12-04 14:46 ` Jason Gunthorpe 0 siblings, 1 reply; 24+ messages in thread From: Alexandre Belloni @ 2020-12-04 14:37 UTC (permalink / raw) To: Jason Gunthorpe Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On 04/12/2020 10:08:19-0400, Jason Gunthorpe wrote: > On Fri, Dec 04, 2020 at 02:02:57PM +0100, Thomas Gleixner wrote: > > > No magic sign calculation required if you look at it from the actual > > timeline and account the time between write and next second increment > > correctly. > > Yes, it is equivalent to break things into two values, and does look > to be more understandable as one can read at least one of the values > from a datasheet and the other could be estimated by timing a read > clock > If you want to read an RTC accurately, you don't want to time a read, what you want is to time an alarm. This is a common misconception and is, again, why hctosys in its current state is not useful. And because people using systohc are definitively using hctosys, this will still result in an up to 500ms error in the current time. As said, the price to pay for a proper solution will be an up to one second delay when booting which is not acceptable for most users. Is "fixing" systohc worth the effort and the maintenance cost? -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 14:37 ` Alexandre Belloni @ 2020-12-04 14:46 ` Jason Gunthorpe 2020-12-04 15:08 ` Alexandre Belloni 0 siblings, 1 reply; 24+ messages in thread From: Jason Gunthorpe @ 2020-12-04 14:46 UTC (permalink / raw) To: Alexandre Belloni Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Fri, Dec 04, 2020 at 03:37:35PM +0100, Alexandre Belloni wrote: > On 04/12/2020 10:08:19-0400, Jason Gunthorpe wrote: > > On Fri, Dec 04, 2020 at 02:02:57PM +0100, Thomas Gleixner wrote: > > > > > No magic sign calculation required if you look at it from the actual > > > timeline and account the time between write and next second increment > > > correctly. > > > > Yes, it is equivalent to break things into two values, and does look > > to be more understandable as one can read at least one of the values > > from a datasheet and the other could be estimated by timing a read > > clock > > > > If you want to read an RTC accurately, you don't want to time a read, > what you want is to time an alarm. This is a common misconception and > is, again, why hctosys in its current state is not useful. I mean literatally time the excution of something like a straight read. This will give some estimate of the bus latency and it should linearly relate to the bus latency for a write. The driver could time configuring an alarm as well, if it likes. > And because people using systohc are definitively using hctosys, this > will still result in an up to 500ms error in the current time. > As said, the price to pay for a proper solution will be an up to one > second delay when booting which is not acceptable for most users. IIRC I had fixed this in some embedded system long ago by having hctosys reading seconds time during boot, then in parallel using the 'up to one second' method to get the sub-second resolution. This means there was a sub second non-monotonicity in the realtime clock, but the system was designed to tolerate this as it also ran a modified NTP which would unconditionally step the clock on first sync if it was over .1s out of alignment. The goal was to bring the system to correct time as quickly as possible in as many cases as possible, not to maintain the monotonicity of the realtime clock. > Is "fixing" systohc worth the effort and the maintenance cost? As I said before, if there is no desire to address the read side then the whole thing should be abandoned. Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 14:46 ` Jason Gunthorpe @ 2020-12-04 15:08 ` Alexandre Belloni 2020-12-04 15:57 ` Jason Gunthorpe 0 siblings, 1 reply; 24+ messages in thread From: Alexandre Belloni @ 2020-12-04 15:08 UTC (permalink / raw) To: Jason Gunthorpe Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On 04/12/2020 10:46:59-0400, Jason Gunthorpe wrote: > > If you want to read an RTC accurately, you don't want to time a read, > > what you want is to time an alarm. This is a common misconception and > > is, again, why hctosys in its current state is not useful. > > I mean literatally time the excution of something like a straight > read. This will give some estimate of the bus latency and it should > linearly relate to the bus latency for a write. > It doesn't, some rtc will require writing dozen registers to set the time and reading only 3 to get the time, the only accurate way is to really time setting the time. You set the RTC time once, set up an alarm for the next second, once you get the alarm, you get system time and you now how far you are off. Notice that systohc could do that if you wanted to be accurate and then the whole issue with mc146818 is gone and this nicely solves it for all the RTCs at once. > The driver could time configuring an alarm as well, if it likes. The driver should definitively not have to do the timing. the core, maybe but I won't go over the 165 drivers to add timing. > > > And because people using systohc are definitively using hctosys, this > > will still result in an up to 500ms error in the current time. > > As said, the price to pay for a proper solution will be an up to one > > second delay when booting which is not acceptable for most users. > > IIRC I had fixed this in some embedded system long ago by having > hctosys reading seconds time during boot, then in parallel using the > 'up to one second' method to get the sub-second resolution. > > This means there was a sub second non-monotonicity in the realtime > clock, but the system was designed to tolerate this as it also ran a > modified NTP which would unconditionally step the clock on first sync > if it was over .1s out of alignment. > > The goal was to bring the system to correct time as quickly as > possible in as many cases as possible, not to maintain the > monotonicity of the realtime clock. > I'm really curious, in your use case, couldn't you have read the RTC from userspace and set the system time from there, right before starting NTP and other applications? Doing so, you would have probably been able to ensure monotonicity. > > Is "fixing" systohc worth the effort and the maintenance cost? > > As I said before, if there is no desire to address the read side then > the whole thing should be abandoned. > What was your plan back in 2017? -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 15:08 ` Alexandre Belloni @ 2020-12-04 15:57 ` Jason Gunthorpe 2020-12-04 16:35 ` Alexandre Belloni 0 siblings, 1 reply; 24+ messages in thread From: Jason Gunthorpe @ 2020-12-04 15:57 UTC (permalink / raw) To: Alexandre Belloni Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Fri, Dec 04, 2020 at 04:08:57PM +0100, Alexandre Belloni wrote: > > I mean literatally time the excution of something like a straight > > read. This will give some estimate of the bus latency and it should > > linearly relate to the bus latency for a write. > > > It doesn't, some rtc will require writing dozen registers to set the > time and reading only 3 to get the time, the only accurate way is to > really time setting the time. You set the RTC time once, set up an alarm for > the next second, once you get the alarm, you get system time and you now > how far you are off. This is why I said linearly related. Yes the relation is per-driver, based on the ops required, but in principle it can get close. > Notice that systohc could do that if you wanted to be accurate and then > the whole issue with mc146818 is gone and this nicely solves it for all > the RTCs at once. Another good solution is to have systohc progam the time and then immediately read it back (eg with an alarm). The difference between the read back and the system RTC is the single value to plug into the adjustment offset and naturally includes all the time factors Thomas identified, including the additional factor of 'latency to read the time' > > The goal was to bring the system to correct time as quickly as > > possible in as many cases as possible, not to maintain the > > monotonicity of the realtime clock. > > I'm really curious, in your use case, couldn't you have read the RTC > from userspace and set the system time from there, right before starting > NTP and other applications? This was RAM constrainted embedded, a few hundred bytes of kernel code wins over 100k of userspace > Doing so, you would have probably been able to ensure monotonicity. No, this case also wasn't willing to wait the 1s to load the time. It had to go parallel with the rest of the boot up. This was enterprise gear, boot time to operational counts against the achievable availability rating. From an upstream perspective this was hacky because - We historically try not to create non-monotinicity in CLOCK_REALTIME because too much stuff wrongly works on CLOCK_REALTIME when they really need CLOCK_MONOTONIC - Having the kernel async set the clock is racy with NTP also trying to set the clock But in a closed system the two above were delt with reliably. > > As I said before, if there is no desire to address the read side then > > the whole thing should be abandoned. > > What was your plan back in 2017? Well I was helping RMK because we had similar issues, but I saw some path to achive the low offset round trip, and view this as laudable for the embedded world. I see I also changed jobs right around then which probably explains why things stopped at this one patch. The fact nobody else picked it up probably says people really just don't care about realtime accuracy. Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 15:57 ` Jason Gunthorpe @ 2020-12-04 16:35 ` Alexandre Belloni 0 siblings, 0 replies; 24+ messages in thread From: Alexandre Belloni @ 2020-12-04 16:35 UTC (permalink / raw) To: Jason Gunthorpe Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On 04/12/2020 11:57:08-0400, Jason Gunthorpe wrote: > On Fri, Dec 04, 2020 at 04:08:57PM +0100, Alexandre Belloni wrote: > > > > I mean literatally time the excution of something like a straight > > > read. This will give some estimate of the bus latency and it should > > > linearly relate to the bus latency for a write. > > > > > > It doesn't, some rtc will require writing dozen registers to set the > > time and reading only 3 to get the time, the only accurate way is to > > really time setting the time. You set the RTC time once, set up an alarm for > > the next second, once you get the alarm, you get system time and you now > > how far you are off. > > This is why I said linearly related. Yes the relation is per-driver, > based on the ops required, but in principle it can get close. > > > Notice that systohc could do that if you wanted to be accurate and then > > the whole issue with mc146818 is gone and this nicely solves it for all > > the RTCs at once. > > Another good solution is to have systohc progam the time and then > immediately read it back (eg with an alarm). This is what I was suggesting, with an alarm. > The difference between > the read back and the system RTC is the single value to plug into the > adjustment offset and naturally includes all the time factors Thomas > identified, including the additional factor of 'latency to read the > time' There is no 'latency to read the time' because you don't have to read the time. You already know what the time will be when the alarm fires. That is when you read the system time and you can figure out what the offset is. But you never need to read the time. [...] > I see I also changed jobs right around then which probably explains > why things stopped at this one patch. The fact nobody else picked it > up probably says people really just don't care about realtime > accuracy. Or those that do care do it from userspace. -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 21:05 ` Thomas Gleixner 2020-12-03 21:31 ` Thomas Gleixner @ 2020-12-03 22:00 ` Alexandre Belloni 2020-12-04 9:34 ` Thomas Gleixner 1 sibling, 1 reply; 24+ messages in thread From: Alexandre Belloni @ 2020-12-03 22:00 UTC (permalink / raw) To: Thomas Gleixner Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On 03/12/2020 22:05:09+0100, Thomas Gleixner wrote: > 2) I2C/SPI ... > > tsched t0 t1 t2 > transfer(newsec) RTC update (newsec) RTC increments seconds > > Lets assume that ttransfer = t1 - t0 is known. Note that ttransfer is one of the reason why setting set_offset_nsec from the RTC driver is not a good idea. The same RTC may be on busses with different rates and there is no way to know that. I think that was one of my objections at the time. ttransfer is not a function of the RTC model but rather of how it is integrated in the system. > > tinc is the same as above = t2 - t1 > > Again, lets assume that the event is accurate for now and ignore the fuzz > logic, i.e. tsched == t0 > > So tsched has to be ttot = t2 - t0 _before_ wallclock reaches t2 and > increments seconds. > I had a rough week and I'm probably not awake enough to follow completely but is that thinking correct? For the mc146818, you have t1 - t0 which is probably negligible and t2 - T& == 500 ms For most of the other RTCs, you have t1 - t0 is somewhat important, probably around 100 to 150µs and t2 - t1 is 1s. I would think that what is needed is tsched has to be t1-t0 before wallclock reaches t1. In that case t2 doesn't matter, it will always be 1s after t1. > In this case newsec = t1.tv_sec = (t0 + ttransfer).tv_sec > > So now the fuzz logic for this is: > > newtime = t0 + ttransfer; > > if (newtime.tv_nsec < FUZZ) > newsec = newtime.tv_sec; > else if (newtime.tv_nsec > NSEC_PER_SEC - FUZZ) > newsec = newtime.tv_sec + 1; > else > goto fail; > > Again the first condition handles the case where t1 >= tsched and the > second one where t1 < tsched. > > So now we have two options to fix this: > > 1) Use a negative sync_offset for devices which need #1 above > (rtc_cmos & similar) > > That requires setting tsched to t2 - abs(sync_offset) > > 2) Use always a positive sync_offset and a flag which tells > rtc_tv_nsec_ok() whether it needs to add or subtract. > > #1 is good enough. All it takes is a comment at the timer start code why > abs() is required. > > Let me hack that up along with the hrtimer muck. > > Thanks, > > tglx -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 22:00 ` Alexandre Belloni @ 2020-12-04 9:34 ` Thomas Gleixner 2020-12-04 9:51 ` Alexandre Belloni 0 siblings, 1 reply; 24+ messages in thread From: Thomas Gleixner @ 2020-12-04 9:34 UTC (permalink / raw) To: Alexandre Belloni Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03 2020 at 23:00, Alexandre Belloni wrote: > On 03/12/2020 22:05:09+0100, Thomas Gleixner wrote: >> 2) I2C/SPI ... >> >> tsched t0 t1 t2 >> transfer(newsec) RTC update (newsec) RTC increments seconds >> >> Lets assume that ttransfer = t1 - t0 is known. > > Note that ttransfer is one of the reason why setting set_offset_nsec > from the RTC driver is not a good idea. The same RTC may be on busses > with different rates and there is no way to know that. I think that was > one of my objections at the time. > > ttransfer is not a function of the RTC model but rather of how it is > integrated in the system. Yes, but it's the right place to store that information. It's a fundamental problem of the RTC driver because that's the one which has to be able to tell the caller about it. The caller has absolutely no way to figure it out because it does not even know what type of RTC is there. So either the RTC knows the requirements for tsched, e.g. the MC14xxx datasheet, or it can retrieve that information from DT or by querying the underlying bus mechanics for the xfer time estimate or just by timing an xfer for reference. Thanks, tglx ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 9:34 ` Thomas Gleixner @ 2020-12-04 9:51 ` Alexandre Belloni 2020-12-04 10:44 ` Thomas Gleixner 0 siblings, 1 reply; 24+ messages in thread From: Alexandre Belloni @ 2020-12-04 9:51 UTC (permalink / raw) To: Thomas Gleixner Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On 04/12/2020 10:34:13+0100, Thomas Gleixner wrote: > On Thu, Dec 03 2020 at 23:00, Alexandre Belloni wrote: > > On 03/12/2020 22:05:09+0100, Thomas Gleixner wrote: > >> 2) I2C/SPI ... > >> > >> tsched t0 t1 t2 > >> transfer(newsec) RTC update (newsec) RTC increments seconds > >> > >> Lets assume that ttransfer = t1 - t0 is known. > > > > Note that ttransfer is one of the reason why setting set_offset_nsec > > from the RTC driver is not a good idea. The same RTC may be on busses > > with different rates and there is no way to know that. I think that was > > one of my objections at the time. > > > > ttransfer is not a function of the RTC model but rather of how it is > > integrated in the system. > > Yes, but it's the right place to store that information. > > It's a fundamental problem of the RTC driver because that's the one > which has to be able to tell the caller about it. The caller has > absolutely no way to figure it out because it does not even know what > type of RTC is there. > > So either the RTC knows the requirements for tsched, e.g. the MC14xxx > datasheet, or it can retrieve that information from DT or by querying > the underlying bus mechanics for the xfer time estimate or just by > timing an xfer for reference. > What I do from userspace is that the caller is the one estimating the transfer time and this works very well. I really think that the ntp code could do just the same. -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 9:51 ` Alexandre Belloni @ 2020-12-04 10:44 ` Thomas Gleixner 0 siblings, 0 replies; 24+ messages in thread From: Thomas Gleixner @ 2020-12-04 10:44 UTC (permalink / raw) To: Alexandre Belloni Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Fri, Dec 04 2020 at 10:51, Alexandre Belloni wrote: > On 04/12/2020 10:34:13+0100, Thomas Gleixner wrote: >> So either the RTC knows the requirements for tsched, e.g. the MC14xxx >> datasheet, or it can retrieve that information from DT or by querying >> the underlying bus mechanics for the xfer time estimate or just by >> timing an xfer for reference. >> > > What I do from userspace is that the caller is the one estimating the > transfer time and this works very well. I really think that the ntp code > could do just the same. For MC14xxx type RTCs tsched is defined by a constant, so heuristics are really horrible because you have to poll the RTC to get it correct. What's the point if the driver can just provide the value from the data sheet? For RTC's behind a bus the driver its pretty simple to let the driver tell at RTC registration time that the transfer time is unknown. So you don't have to add the estimation procedure to each driver. You simply can add it to the core in one place and expose that info to user space as well as a starting point. Sticking that into the NTP code is really the wrong place. That's like asking the users of a timer device to calibrate it before usage. The requirements for writing a RTC are not a problem of the caller, they are fundamental properties of the RTC itself. So why on earth are you asking every user to implement heuristics to figure these out themself? Having it as property of the RTC device gives at least a halfways correct value for the periodic kernel side update and if user space want's to do better then it still can do so. Thanks, tglx ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 15:39 ` Thomas Gleixner 2020-12-03 16:16 ` Jason Gunthorpe @ 2020-12-03 17:29 ` Alexandre Belloni 2020-12-03 19:52 ` Thomas Gleixner 1 sibling, 1 reply; 24+ messages in thread From: Alexandre Belloni @ 2020-12-03 17:29 UTC (permalink / raw) To: Thomas Gleixner Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On 03/12/2020 16:39:21+0100, Thomas Gleixner wrote: > Alexandre, > > On Thu, Dec 03 2020 at 03:10, Alexandre Belloni wrote: > > On 03/12/2020 02:14:12+0100, Thomas Gleixner wrote: > >> That said, can somebody answer the one million dollar question which > >> problem is solved by all of this magic nonsense? > >> > > The goal was to remove the 500ms offset for all the RTCs but the > > MC146818 because there are RTC that will reset properly their counter > > when setting the time, meaning they can be set very precisely. > > The MC setting is halfways precise. The write resets the divider chain > and when the reset is removed then the next UIP will happen after the > magic 0.5 seconds. So yes, writing it 500ms _before_ the next second is > halfways correct assumed that there is no interference between > ktime_get_real() and the actual write which is a silly assumption as the > code is fully preemptible. > > > IIRC, used in conjunction with rtc_hctosys which also adds > > inconditionnaly 500ms this can ends up with the system time > > being one second away from the wall clock time which NTP will take quite > > some time to remove. > > The rtc_cmos() driver has a fun comment in cmos_set_time().... > > The logic in sync_cmos_clock() and rtc_set_ntp_time() is different as I > pointed out: sync_cmos_clock() hands -500ms to rtc_tv_nsec_ok() and > rtc_set_ntp_time() uses +500ms, IOW exactly ONE second difference in > behaviour. > > > Coincidentally, I was going to revert those patches for v5.11. > > Which will break the rtc_cmos() driver in a different way. We should > really fix that proper and just have the 500ms offset for rtc_cmos, > aka. MC146818. When other drivers want a different offset, then they > still can do so. > My point was to get back to the previous situation where only rtc_cmos was supposed to work properly by removing rtc_tv_nsec_ok and set_offset_nsec. > The direct /dev/rtc settime ioctl is not using that logic anyway. Thats > the business of the user space application to get that straight which is > scheduling lottery as well. I still don't see how userspace is worse than systohc in that regard and why we need to do that in the kernel. Especially since hctosys is doing a very bad job trying to read the time from the RTC. You may as well not bother with the 500ms and just set the time to the current or next second. And what about the non configurable 659 period, isn't that policy that should be left to userspace configuration? I'm still convinced that set_offset_nsec is not needed to set the time accurately and I still want to remove it. Also, this may be a good time to move systohc.c to kernel/time/ntp.c as this is definitively some NTP specific code being an RTC consumer, very much like alarmtimer.c -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 17:29 ` Alexandre Belloni @ 2020-12-03 19:52 ` Thomas Gleixner 0 siblings, 0 replies; 24+ messages in thread From: Thomas Gleixner @ 2020-12-03 19:52 UTC (permalink / raw) To: Alexandre Belloni Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra Alexandre, On Thu, Dec 03 2020 at 18:29, Alexandre Belloni wrote: > On 03/12/2020 16:39:21+0100, Thomas Gleixner wrote: >> On Thu, Dec 03 2020 at 03:10, Alexandre Belloni wrote: >> > On 03/12/2020 02:14:12+0100, Thomas Gleixner wrote: >> > Coincidentally, I was going to revert those patches for v5.11. >> >> Which will break the rtc_cmos() driver in a different way. We should >> really fix that proper and just have the 500ms offset for rtc_cmos, >> aka. MC146818. When other drivers want a different offset, then they >> still can do so. >> > > My point was to get back to the previous situation where only > rtc_cmos was supposed to work properly by removing rtc_tv_nsec_ok and > set_offset_nsec. If you remove the offset, then rtc_cmos() is off by ~500ms. > I'm still convinced that set_offset_nsec is not needed to set the time > accurately and I still want to remove it. Also, this may be a good time > to move systohc.c to kernel/time/ntp.c as this is definitively some NTP > specific code being an RTC consumer, very much like alarmtimer.c No objections from my side. The main pain point is that the periodic update is seen as ABI by some folks. The fact that you can disable it in Kconfig does not matter. You can disable other stuff like posix timers which is ABI as well. So removing it is not really an option. Keeping it broken or break it differently is neither... Thanks, tglx ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 2:10 ` Alexandre Belloni 2020-12-03 15:39 ` Thomas Gleixner @ 2020-12-03 15:52 ` Jason Gunthorpe 2020-12-03 16:07 ` Alexandre Belloni 1 sibling, 1 reply; 24+ messages in thread From: Jason Gunthorpe @ 2020-12-03 15:52 UTC (permalink / raw) To: Alexandre Belloni Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03, 2020 at 03:10:47AM +0100, Alexandre Belloni wrote: > IIRC, used in conjunction with rtc_hctosys which also adds > inconditionnaly 500ms this can ends up with the system time > being one second away from the wall clock time which NTP will take quite > some time to remove. I can't remember the details, but this was not the intention. As long as systohc and hctosys exist then the design goal of rtclib should be to provide sub-second accuracy on the round trip of time through the RTC. Otherwise what is the point? Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 15:52 ` Jason Gunthorpe @ 2020-12-03 16:07 ` Alexandre Belloni 2020-12-03 20:10 ` Jason Gunthorpe 0 siblings, 1 reply; 24+ messages in thread From: Alexandre Belloni @ 2020-12-03 16:07 UTC (permalink / raw) To: Jason Gunthorpe Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On 03/12/2020 11:52:49-0400, Jason Gunthorpe wrote: > On Thu, Dec 03, 2020 at 03:10:47AM +0100, Alexandre Belloni wrote: > > > IIRC, used in conjunction with rtc_hctosys which also adds > > inconditionnaly 500ms this can ends up with the system time > > being one second away from the wall clock time which NTP will take quite > > some time to remove. > > I can't remember the details, but this was not the intention. > > As long as systohc and hctosys exist then the design goal of rtclib > should be to provide sub-second accuracy on the round trip of time > through the RTC. > > Otherwise what is the point? > hctosys never had a sub second accuracy because there is no way to accurately read the rtc time without being ready to wait up to a second. I have patches doing exactly that but I'm pretty sure nobody wants to pay the price and have a kernel that boots significantly slower. Especially since that would basically affect everyone since systemd requires that hctosys is enabled. -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 16:07 ` Alexandre Belloni @ 2020-12-03 20:10 ` Jason Gunthorpe 0 siblings, 0 replies; 24+ messages in thread From: Jason Gunthorpe @ 2020-12-03 20:10 UTC (permalink / raw) To: Alexandre Belloni Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03, 2020 at 05:07:53PM +0100, Alexandre Belloni wrote: > On 03/12/2020 11:52:49-0400, Jason Gunthorpe wrote: > > On Thu, Dec 03, 2020 at 03:10:47AM +0100, Alexandre Belloni wrote: > > > > > IIRC, used in conjunction with rtc_hctosys which also adds > > > inconditionnaly 500ms this can ends up with the system time > > > being one second away from the wall clock time which NTP will take quite > > > some time to remove. > > > > I can't remember the details, but this was not the intention. > > > > As long as systohc and hctosys exist then the design goal of rtclib > > should be to provide sub-second accuracy on the round trip of time > > through the RTC. > > > > Otherwise what is the point? > > hctosys never had a sub second accuracy because there is no way to > accurately read the rtc time without being ready to wait up to a > second. Yes, I know. I was talking about a goal.. If that is not the goal then just delete it all and update the docs that userspace needs to deal with all of this and the kernel stuff isn't supposed to be useful. Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2020-12-04 16:36 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20201201143835.2054508-1-mlichvar@redhat.com>
[not found] ` <20201201161224.GF5487@ziepe.ca>
[not found] ` <20201201171420.GN1900232@localhost>
[not found] ` <20201201173540.GH5487@ziepe.ca>
[not found] ` <87mtywe2zu.fsf@nanos.tec.linutronix.de>
[not found] ` <20201202162723.GJ5487@ziepe.ca>
[not found] ` <87a6uwdnfn.fsf@nanos.tec.linutronix.de>
[not found] ` <20201202205418.GN5487@ziepe.ca>
[not found] ` <874kl3eu8p.fsf@nanos.tec.linutronix.de>
2020-12-03 1:14 ` [PATCH] rtc: adapt allowed RTC update error Thomas Gleixner
2020-12-03 2:04 ` Jason Gunthorpe
2020-12-03 2:10 ` Alexandre Belloni
2020-12-03 15:39 ` Thomas Gleixner
2020-12-03 16:16 ` Jason Gunthorpe
2020-12-03 21:05 ` Thomas Gleixner
2020-12-03 21:31 ` Thomas Gleixner
2020-12-03 22:36 ` Jason Gunthorpe
2020-12-04 13:02 ` Thomas Gleixner
2020-12-04 14:08 ` Jason Gunthorpe
2020-12-04 14:37 ` Alexandre Belloni
2020-12-04 14:46 ` Jason Gunthorpe
2020-12-04 15:08 ` Alexandre Belloni
2020-12-04 15:57 ` Jason Gunthorpe
2020-12-04 16:35 ` Alexandre Belloni
2020-12-03 22:00 ` Alexandre Belloni
2020-12-04 9:34 ` Thomas Gleixner
2020-12-04 9:51 ` Alexandre Belloni
2020-12-04 10:44 ` Thomas Gleixner
2020-12-03 17:29 ` Alexandre Belloni
2020-12-03 19:52 ` Thomas Gleixner
2020-12-03 15:52 ` Jason Gunthorpe
2020-12-03 16:07 ` Alexandre Belloni
2020-12-03 20:10 ` Jason Gunthorpe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox