From: Joel Fernandes <joel@joelfernandes.org>
To: linux-kernel@vger.kernel.org,
Anna-Maria Behnsen <anna-maria@linutronix.de>,
Frederic Weisbecker <frederic@kernel.org>,
Ingo Molnar <mingo@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC 3/3] tick-sched: Replace jiffie readout with idle_entrytime
Date: Sun, 10 Nov 2024 22:55:45 +0000
Message-ID: <20241110225545.GA1579217@google.com>
In-Reply-To: <20241108174839.1016424-4-joel@joelfernandes.org>
On Fri, Nov 08, 2024 at 05:48:36PM +0000, Joel Fernandes (Google) wrote:
> This solves the issue where jiffies can be stale and inaccurate.
>
> Putting some prints, I see that basemono can be quite stale:
> tick_nohz_next_event: basemono=18692000000 basemono_from_idle_entrytime=18695000000
>
> Since we have 'now' in ts->idle_entrytime, we can just use that. It is
> more accurate, cleaner, reduces lines of code and reduces any lock
> contention with the seq locks.
>
> I was also concerned about the case where jiffies is not updated for a
> long time and we then receive a non-tick interrupt in the future. Using
> a stale jiffies value as the base makes it inaccurate to determine
> whether the next event occurs within the next tick. Fix that.
>
> XXX: Need to fix issue in idle accounting which does 'jiffies -
> idle_entrytime'. If idle_entrytime is more current than jiffies, it
> could cause negative values. I could replace jiffies with idle_exittime
> in this computation potentially to fix that.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
> kernel/time/tick-sched.c | 27 +++++++--------------------
> 1 file changed, 7 insertions(+), 20 deletions(-)
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 4aa64266f2b0..22a4f96d9585 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -860,24 +860,6 @@ static inline bool local_timer_softirq_pending(void)
> return local_softirq_pending() & BIT(TIMER_SOFTIRQ);
> }
>
> -/*
> - * Read jiffies and the time when jiffies were updated last
> - */
> -u64 get_jiffies_update(unsigned long *basej)
> -{
> - unsigned long basejiff;
> - unsigned int seq;
> - u64 basemono;
> -
> - do {
> - seq = read_seqcount_begin(&jiffies_seq);
> - basemono = last_jiffies_update;
> - basejiff = jiffies;
> - } while (read_seqcount_retry(&jiffies_seq, seq));
> - *basej = basejiff;
> - return basemono;
> -}
> -
> /**
> * tick_nohz_next_event() - return the clock monotonic based next event
> * @ts: pointer to tick_sched struct
> @@ -887,14 +869,19 @@ u64 get_jiffies_update(unsigned long *basej)
> * *%0 - When the next event is a maximum of TICK_NSEC in the future
> * and the tick is not stopped yet
> * *%next_event - Next event based on clock monotonic
> + *
> + * Note: ts->idle_entrytime is updated with 'now' via tick_nohz_idle_enter().
> */
> static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
> {
> - u64 basemono, next_tick, delta, expires, delta_hr, next_hr_wo;
> + u64 basemono, next_tick, delta, expires, delta_hr, next_hr_wo, boot_ticks;
> unsigned long basejiff;
> int tick_cpu;
>
> - basemono = get_jiffies_update(&basejiff);
> + boot_ticks = DIV_ROUND_DOWN_ULL(ts->idle_entrytime, TICK_NSEC);
> + basejiff = boot_ticks + INITIAL_JIFFIES;
> + basemono = boot_ticks * TICK_NSEC;
> +
There is a bug here: I end up overcounting basejiff. With something like the
change below, basejiff becomes equivalent to what the previous code computed,
so it should be good. I'll work more on it this week...
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index d88b13076b79..5387c67eea7a 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -34,6 +34,8 @@ DEFINE_PER_CPU(struct tick_device, tick_cpu_device);
*/
ktime_t tick_next_period;
+ktime_t tick_first_period;
+
/*
* tick_do_timer_cpu is a timer core internal variable which holds the CPU NR
* which is responsible for calling do_timer(), i.e. the timekeeping stuff. This
@@ -219,6 +221,7 @@ static void tick_setup_device(struct tick_device *td,
if (READ_ONCE(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {
WRITE_ONCE(tick_do_timer_cpu, cpu);
tick_next_period = ktime_get();
+ tick_first_period = tick_next_period;
#ifdef CONFIG_NO_HZ_FULL
/*
* The boot CPU may be nohz_full, in which case set
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 5f2105e637bd..a15721516a85 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -20,6 +20,7 @@ struct timer_events {
DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
extern ktime_t tick_next_period;
+extern ktime_t tick_first_period;
extern int tick_do_timer_cpu __read_mostly;
extern void tick_setup_periodic(struct clock_event_device *dev, int broadcast);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 8a245f8ceb56..8fdfda4b8af3 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -895,11 +896,23 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
 	u64 basemono, next_tick, delta, expires, delta_hr, next_hr_wo, boot_ticks;
 	unsigned long basejiff;
 	int tick_cpu;
 
 	boot_ticks = DIV_ROUND_DOWN_ULL(ts->idle_entrytime, TICK_NSEC);
 	basejiff = boot_ticks + INITIAL_JIFFIES;
 	basemono = boot_ticks * TICK_NSEC;
 
+	/*
+	 * There is some time that passes between when clocksource starts and the
+	 * first time tick device is setup. Offset basejiff by that.
+	 */
+	basejiff -= DIV_ROUND_DOWN_ULL(tick_first_period, TICK_NSEC);
+
 	ts->last_jiffies = basejiff;
 	ts->timer_expires_base = basemono;
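
To make the arithmetic in the hunk above easier to check, here is a rough
userspace sketch of the same computation. It is not kernel code. HZ=1000, an
INITIAL_JIFFIES of 0, the struct tick_base type, the helper name
base_from_idle_entry() and the sample inputs are all assumptions made up for
illustration; only the division, multiplication and offset mirror the diff.

```c
#include <stdint.h>

/* Assumed for illustration: HZ=1000, so one tick is 1 ms. */
#define NSEC_PER_SEC    1000000000ULL
#define HZ              1000ULL
#define TICK_NSEC       (NSEC_PER_SEC / HZ)
/* Placeholder only; the kernel's INITIAL_JIFFIES is a different value. */
#define INITIAL_JIFFIES 0ULL

/* Hypothetical container for the two values the hunk computes. */
struct tick_base {
	uint64_t basejiff;	/* jiffies-style tick count */
	uint64_t basemono;	/* monotonic time rounded down to a tick boundary */
};

/*
 * Derive basejiff/basemono from the idle entry timestamp instead of reading
 * jiffies. tick_first_period_ns stands in for the time at which the tick
 * device was first set up; ticks before that point were never counted in
 * jiffies, so they are subtracted from basejiff.
 */
static struct tick_base base_from_idle_entry(uint64_t idle_entrytime_ns,
					     uint64_t tick_first_period_ns)
{
	struct tick_base b;
	uint64_t boot_ticks = idle_entrytime_ns / TICK_NSEC;

	b.basemono = boot_ticks * TICK_NSEC;
	b.basejiff = boot_ticks + INITIAL_JIFFIES;
	b.basejiff -= tick_first_period_ns / TICK_NSEC;
	return b;
}
```

Under these assumptions, an idle entry time of 18695400000 ns with a
hypothetical first tick period of 50000000 ns gives basemono=18695000000 and
basejiff=18645. For comparison, the two basemono values in the print quoted
above differ by 3000000 ns, which would be 3 ticks at HZ=1000.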