From: Joel Fernandes <joel@joelfernandes.org>
To: linux-kernel@vger.kernel.org,
Anna-Maria Behnsen <anna-maria@linutronix.de>,
Frederic Weisbecker <frederic@kernel.org>,
Ingo Molnar <mingo@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC 3/3] tick-sched: Replace jiffie readout with idle_entrytime
Date: Sun, 10 Nov 2024 22:55:45 +0000
Message-ID: <20241110225545.GA1579217@google.com>
In-Reply-To: <20241108174839.1016424-4-joel@joelfernandes.org>
On Fri, Nov 08, 2024 at 05:48:36PM +0000, Joel Fernandes (Google) wrote:
> This solves the issue where jiffies can be stale and inaccurate.
>
> Putting some prints, I see that basemono can be quite stale:
> tick_nohz_next_event: basemono=18692000000 basemono_from_idle_entrytime=18695000000
>
> Since we have 'now' in ts->idle_entrytime, we can just use that. It is
> more accurate, cleaner, reduces lines of code and reduces any lock
> contention with the seq locks.
>
> I was also concerned about issue where jiffies is not updated for a long
> time, and then we receive a non-tick interrupt in the future. Relying on
> stale jiffies value and using that as base can be inaccurate to determine
> whether next event occurs within next tick. Fix that.
>
> XXX: Need to fix issue in idle accounting which does 'jiffies -
> idle_entrytime'. If idle_entrytime is more current than jiffies, it
> could cause negative values. I could replace jiffies with idle_exittime
> in this computation potentially to fix that.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
> kernel/time/tick-sched.c | 27 +++++++--------------------
> 1 file changed, 7 insertions(+), 20 deletions(-)
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 4aa64266f2b0..22a4f96d9585 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -860,24 +860,6 @@ static inline bool local_timer_softirq_pending(void)
> return local_softirq_pending() & BIT(TIMER_SOFTIRQ);
> }
>
> -/*
> - * Read jiffies and the time when jiffies were updated last
> - */
> -u64 get_jiffies_update(unsigned long *basej)
> -{
> - unsigned long basejiff;
> - unsigned int seq;
> - u64 basemono;
> -
> - do {
> - seq = read_seqcount_begin(&jiffies_seq);
> - basemono = last_jiffies_update;
> - basejiff = jiffies;
> - } while (read_seqcount_retry(&jiffies_seq, seq));
> - *basej = basejiff;
> - return basemono;
> -}
> -
> /**
> * tick_nohz_next_event() - return the clock monotonic based next event
> * @ts: pointer to tick_sched struct
> @@ -887,14 +869,19 @@ u64 get_jiffies_update(unsigned long *basej)
> * *%0 - When the next event is a maximum of TICK_NSEC in the future
> * and the tick is not stopped yet
> * *%next_event - Next event based on clock monotonic
> + *
> + * Note: ts->idle_entrytime is updated with 'now' via tick_nohz_idle_enter().
> */
> static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
> {
> - u64 basemono, next_tick, delta, expires, delta_hr, next_hr_wo;
> + u64 basemono, next_tick, delta, expires, delta_hr, next_hr_wo, boot_ticks;
> unsigned long basejiff;
> int tick_cpu;
>
> - basemono = get_jiffies_update(&basejiff);
> + boot_ticks = DIV_ROUND_DOWN_ULL(ts->idle_entrytime, TICK_NSEC);
> + basejiff = boot_ticks + INITIAL_JIFFIES;
> + basemono = boot_ticks * TICK_NSEC;
> +
There is a bug here: I end up overcounting basejiff. I did something like
the following, which makes basejiff equivalent to what the previous code
computed, so it should be good. I'll work more on it this week...
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index d88b13076b79..5387c67eea7a 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -34,6 +34,8 @@ DEFINE_PER_CPU(struct tick_device, tick_cpu_device);
*/
ktime_t tick_next_period;
+ktime_t tick_first_period;
+
/*
* tick_do_timer_cpu is a timer core internal variable which holds the CPU NR
* which is responsible for calling do_timer(), i.e. the timekeeping stuff. This
@@ -219,6 +221,7 @@ static void tick_setup_device(struct tick_device *td,
if (READ_ONCE(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {
WRITE_ONCE(tick_do_timer_cpu, cpu);
tick_next_period = ktime_get();
+ tick_first_period = tick_next_period;
#ifdef CONFIG_NO_HZ_FULL
/*
* The boot CPU may be nohz_full, in which case set
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 5f2105e637bd..a15721516a85 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -20,6 +20,7 @@ struct timer_events {
DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
extern ktime_t tick_next_period;
+extern ktime_t tick_first_period;
extern int tick_do_timer_cpu __read_mostly;
extern void tick_setup_periodic(struct clock_event_device *dev, int broadcast);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 8a245f8ceb56..8fdfda4b8af3 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -895,11 +896,23 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
u64 basemono, next_tick, delta, expires, delta_hr, next_hr_wo, boot_ticks;
unsigned long basejiff;
int tick_cpu;
boot_ticks = DIV_ROUND_DOWN_ULL(ts->idle_entrytime, TICK_NSEC);
basejiff = boot_ticks + INITIAL_JIFFIES;
basemono = boot_ticks * TICK_NSEC;
+ /*
+ * There is some time that passes between when clocksource starts and the
+ * first time tick device is setup. Offset basejiff by that.
+ */
+ basejiff -= DIV_ROUND_DOWN_ULL(tick_first_period, TICK_NSEC);
+
ts->last_jiffies = basejiff;
ts->timer_expires_base = basemono;
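
To make the arithmetic in the hunk above easier to check, here is a minimal
userspace sketch of it. It is only a model under stated assumptions, not the
kernel code: HZ=1000 is assumed, model_tick_base() and struct tick_base are
hypothetical names, and the first-period value used in the usage note is made
up for illustration. As in the snippet above, only basejiff is offset by the
first tick period; basemono is left as boot_ticks * TICK_NSEC.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: HZ=1000. The macros mirror the kernel definitions. */
#define HZ              1000
#define NSEC_PER_SEC    1000000000ULL
#define TICK_NSEC       ((NSEC_PER_SEC + HZ / 2) / HZ)
#define INITIAL_JIFFIES ((unsigned long)(unsigned int)(-300 * HZ))

/* Hypothetical container for the two values the hunk computes. */
struct tick_base {
	unsigned long basejiff;	/* jiffies-like tick count */
	uint64_t basemono;	/* tick-aligned monotonic time in ns */
};

/*
 * Model of the computation in tick_nohz_next_event() from the diff above.
 * idle_entrytime_ns stands in for ts->idle_entrytime and first_period_ns
 * for tick_first_period.
 */
static inline struct tick_base model_tick_base(uint64_t idle_entrytime_ns,
					       uint64_t first_period_ns)
{
	struct tick_base b;
	uint64_t boot_ticks;

	/* Sanity check for this sketch only: idle entry follows tick setup. */
	assert(idle_entrytime_ns >= first_period_ns);

	/* Round down to whole ticks, as DIV_ROUND_DOWN_ULL() does. */
	boot_ticks = idle_entrytime_ns / TICK_NSEC;

	b.basejiff = (unsigned long)(boot_ticks + INITIAL_JIFFIES);
	b.basemono = boot_ticks * TICK_NSEC;

	/* Drop the ticks that elapsed before the tick device was set up. */
	b.basejiff -= (unsigned long)(first_period_ns / TICK_NSEC);

	return b;
}
```

With the idle_entrytime-derived value from the prints in the commit message
(18695000000 ns) and a made-up first period of 25400000 ns, this gives
basemono = 18695000000 and basejiff = INITIAL_JIFFIES + 18670, i.e. 18695
whole ticks minus the 25 that passed before tick setup.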