From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751363AbcGMNAr (ORCPT );
	Wed, 13 Jul 2016 09:00:47 -0400
Received: from mail-wm0-f65.google.com ([74.125.82.65]:34603 "EHLO
	mail-wm0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750890AbcGMNAl (ORCPT );
	Wed, 13 Jul 2016 09:00:41 -0400
From: Nicolai Stange
To: Thomas Gleixner
Cc: John Stultz, linux-kernel@vger.kernel.org, Nicolai Stange
Subject: [RFC v3 1/3] kernel/time/clockevents: initial support for mono to raw time conversion
Date: Wed, 13 Jul 2016 15:00:15 +0200
Message-Id: <20160713130017.8202-2-nicstange@gmail.com>
X-Mailer: git-send-email 2.9.0
In-Reply-To: <20160713130017.8202-1-nicstange@gmail.com>
References: <20160713130017.8202-1-nicstange@gmail.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

With NOHZ_FULL and a single, well-isolated, CPU-consuming task, one would
expect approximately one clockevent interrupt per second. However, on my
Intel Haswell, where the monotonic clock is driven by the TSC and the
clockevent device is the TSC deadline timer, it turns out that every
second there are two such interrupts: the first one always arrives
approximately 50us before the deadline programmed by
tick_nohz_stop_sched_tick() through the hrtimer API. The
__hrtimer_run_queues() called from this interrupt detects that the queued
tick_sched_timer hasn't expired yet and does nothing except reprogram the
clock event device to fire again shortly afterwards.

These too early programmed deadlines are explained as follows:
clockevents_program_event() programs the clockevent device to fire after

  f_event * delta_t_progr

clockevent device cycles, where f_event is the clockevent device's
hardware frequency and delta_t_progr is the requested time interval.
After that many clockevent device cycles have elapsed, the device
underlying the monotonic clock, that is the monotonic raw clock, has seen
f_raw / f_event times as many cycles. The ktime_get() called from
__hrtimer_run_queues() interprets those cycles as running at the
monotonic clock's frequency. Summarizing:

  delta_t_perc = 1/f_mono * f_raw/f_event * f_event * delta_t_progr
               = f_raw / f_mono * delta_t_progr

with f_mono being the monotonic clock's frequency and delta_t_perc being
the elapsed time interval as perceived by __hrtimer_run_queues().

Now, f_mono is not a constant, but is dynamically adjusted in
timekeeping_adjust() in order to compensate for the NTP error. With the
large delta_t_progr values of 10^9 ns seen with NOHZ_FULL, the error made
becomes significant and results in the double timer interrupts described
above.

Compensate for this error by multiplying the clockevent device's f_event
by f_mono/f_raw. Namely:
- Introduce a ->mult_mono member to struct clock_event_device. Its value
  is supposed to equal ->mult * f_mono/f_raw.
- Introduce the timekeeping_get_mono_mult() helper which provides the
  clockevent core with access to the timekeeping's current f_mono and
  f_raw.
- Introduce the helper __clockevents_adjust_freq() which sets a clockevent
  device's ->mult_mono member as appropriate. It is implemented with the
  help of the new __clockevents_calc_adjust_freq().
- Call __clockevents_adjust_freq() at clockevent device registration time
  as well as on frequency updates through clockevents_update_freq().
- Finally, use ->mult_mono rather than ->mult in the ns-to-cycle
  conversion made in clockevents_program_event().

Note that future adjustments of the monotonic clock are not taken into
account yet. Furthermore, this patch assumes that after a clockevent
device's registration, its ->mult changes only through calls to
clockevents_update_freq().
Signed-off-by: Nicolai Stange
---
 include/linux/clockchips.h  |  1 +
 kernel/time/clockevents.c   | 49 ++++++++++++++++++++++++++++++++++++++++++++-
 kernel/time/tick-internal.h |  1 +
 kernel/time/timekeeping.c   |  8 ++++++++
 4 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 0d442e3..2863742 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -104,6 +104,7 @@ struct clock_event_device {
 	u64			max_delta_ns;
 	u64			min_delta_ns;
 	u32			mult;
+	u32			mult_mono;
 	u32			shift;
 	enum clock_event_state	state_use_accessors;
 	unsigned int		features;
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index a9b76a4..ba7fea4 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -33,6 +33,8 @@ struct ce_unbind {
 	int res;
 };
 
+static void __clockevents_adjust_freq(struct clock_event_device *dev);
+
 static u64 cev_delta2ns(unsigned long latch, struct clock_event_device *evt,
 			bool ismax)
 {
@@ -166,6 +168,7 @@ void clockevents_switch_state(struct clock_event_device *dev,
 		if (clockevent_state_oneshot(dev)) {
 			if (unlikely(!dev->mult)) {
 				dev->mult = 1;
+				dev->mult_mono = 1;
 				WARN_ON(1);
 			}
 		}
@@ -335,7 +338,7 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
 	delta = min(delta, (int64_t) dev->max_delta_ns);
 	delta = max(delta, (int64_t) dev->min_delta_ns);
 
-	clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
+	clc = ((unsigned long long) delta * dev->mult_mono) >> dev->shift;
 	rc = dev->set_next_event((unsigned long) clc, dev);
 
 	return (rc && force) ? clockevents_program_min_delta(dev) : rc;
@@ -458,6 +461,8 @@ void clockevents_register_device(struct clock_event_device *dev)
 		dev->cpumask = cpumask_of(smp_processor_id());
 	}
 
+	__clockevents_adjust_freq(dev);
+
 	raw_spin_lock_irqsave(&clockevents_lock, flags);
 
 	list_add(&dev->list, &clockevent_devices);
@@ -512,9 +517,51 @@ void clockevents_config_and_register(struct clock_event_device *dev,
 }
 EXPORT_SYMBOL_GPL(clockevents_config_and_register);
 
+static u32 __clockevents_calc_adjust_freq(u32 mult_ce_raw, u32 mult_cs_mono,
+					  u32 mult_cs_raw)
+{
+	u64 adj;
+	int sign;
+
+	if (mult_cs_raw >= mult_cs_mono) {
+		sign = 0;
+		adj = mult_cs_raw - mult_cs_mono;
+	} else {
+		sign = 1;
+		adj = mult_cs_mono - mult_cs_raw;
+	}
+
+	adj *= mult_ce_raw;
+	adj += mult_cs_mono / 2;
+	do_div(adj, mult_cs_mono);
+
+	if (!sign) {
+		if (U32_MAX - mult_ce_raw < adj)
+			return U32_MAX;
+		return mult_ce_raw + (u32)adj;
+	}
+	if (adj >= mult_ce_raw)
+		return 1;
+	return mult_ce_raw - (u32)adj;
+}
+
+void __clockevents_adjust_freq(struct clock_event_device *dev)
+{
+	u32 mult_cs_mono, mult_cs_raw;
+
+	if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT))
+		return;
+
+	timekeeping_get_mono_mult(&mult_cs_mono, &mult_cs_raw);
+	dev->mult_mono = __clockevents_calc_adjust_freq(dev->mult,
+							mult_cs_mono,
+							mult_cs_raw);
+}
+
 int __clockevents_update_freq(struct clock_event_device *dev, u32 freq)
 {
 	clockevents_config(dev, freq);
+	__clockevents_adjust_freq(dev);
 
 	if (clockevent_state_oneshot(dev))
 		return clockevents_program_event(dev, dev->next_event, false);
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index f738251..0b29d23 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -56,6 +56,7 @@ extern int clockevents_program_event(struct clock_event_device *dev,
 				     ktime_t expires, bool force);
 extern void clockevents_handle_noop(struct clock_event_device *dev);
 extern int __clockevents_update_freq(struct clock_event_device *dev, u32 freq);
+extern void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw);
 extern ssize_t sysfs_get_uname(const char *buf, char *dst, size_t cnt);
 
 /* Broadcasting support */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index dcd5ce6..a011ae1 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -330,6 +330,14 @@ static inline s64 timekeeping_cycles_to_ns(struct tk_read_base *tkr,
 	return timekeeping_delta_to_ns(tkr, delta);
 }
 
+void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw)
+{
+	struct tk_read_base *tkr_mono = &tk_core.timekeeper.tkr_mono;
+
+	*mult_cs_mono = tkr_mono->mult;
+	*mult_cs_raw = tkr_mono->clock->mult;
+}
+
 /**
  * update_fast_timekeeper - Update the fast and NMI safe monotonic timekeeper.
  * @tkr:	Timekeeping readout base from which we take the update
-- 
2.9.0