public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <frederic@kernel.org>
To: Valentin Schneider <vschneid@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Arjan van de Ven <arjan@infradead.org>, Chris Mason <clm@fb.com>,
	Eric Dumazet <edumazet@google.com>,
	George Spelvin <linux@sciencehorizons.net>,
	Josh Triplett <josh@joshtriplett.org>,
	Len Brown <lenb@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Rik van Riel <riel@surriel.com>,
	Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [Question] timers: trigger_dyntick_cpu() vs TIMER_DEFERRABLE
Date: Mon, 25 Jul 2022 12:43:56 +0200	[thread overview]
Message-ID: <20220725104356.GA2950296@lothringen> (raw)
In-Reply-To: <xhsmhedy9fsg5.mognet@vschneid.remote.csb>

On Mon, Jul 25, 2022 at 10:32:42AM +0100, Valentin Schneider wrote:
> Hi,
> 
> I've been incidentally staring at some NOHZ bits related to the timer
> wheels, and trigger_dyntick_cpu() confuses me:
> 
>   static void
>   trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
>   {
>           [...]
>           /*
>            * TODO: This wants some optimizing similar to the code below, but we
>            * will do that when we switch from push to pull for deferrable timers.
>            */
>           if ((timer->flags & TIMER_DEFERRABLE)) {
>                   if (tick_nohz_full_cpu(base->cpu))
>                           wake_up_nohz_cpu(base->cpu);
>                   return;
>           }
>           [...]
>   }
> 
> From what I grok out of get_nohz_timer_target(), under
> timers_migration_enabled we should migrate the timer to an non-idle CPU
> (or at the very least a non-isolated CPU) *before* enqueuing the
> timer.

That's not always the case. For example TIMER_PINNED timers might have
to run on a buzy or isolated CPU.

And note that even when (base->cpu == smp_processor_id()) we want to kick
the current CPU with a self-IPI. This way we force, from IRQ-tail, the
tick to recalculate the next deadline to fire, considering the new enqueued
timer callback.

> Without timers_migration_enabled (or if TIMER_PINNED), I don't see
> anything that could migrate the timer elsewhere, so:
> 
> Why bother kicking a NOHZ CPU for a deferrable timer if it is the next
> expiring one? Per the definition:
> 
>  * @TIMER_DEFERRABLE: A deferrable timer will work normally when the
>  * system is busy, but will not cause a CPU to come out of idle just
>  * to service it; instead, the timer will be serviced when the CPU
>  * eventually wakes up with a subsequent non-deferrable timer.
> 
> I tried to find some discussion over this in LKML, but found nothing.
> v3 of the patch did *not* kick a CPU for a deferrable timer, but v4 (the
> one that ended up merged) did (see below). Patch in question is:
> 
>   a683f390b93f ("timers: Forward the wheel clock whenever possible")

Because TIMER_DEFERRABLE timers should only be deferred when the CPU is
in "nohz-idle". If the CPU runs an actual task with the tick shutdown
("nohz-full"), we should execute those deferrable timers.

Now that's the theory. In practice the deferrable timers are ignored by
both nohz-idle and nohz-full when it comes to compute the next nohz delta.
This is a mistake that is there since the introduction of nohz-full but I've
always been scared to break some user setup while fixing it. Anyway things
should look like this (untested):

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index de192dcff828..5f8ef777a785 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -819,7 +819,7 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
 		 * disabled this also looks at the next expiring
 		 * hrtimer.
 		 */
-		next_tick = get_next_timer_interrupt(basejiff, basemono);
+		next_tick = get_next_timer_interrupt(basejiff, basemono, ts->inidle);
 		ts->next_timer = next_tick;
 	}
 
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 717fcb9fb14a..8279d4e9b7a0 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -574,16 +574,6 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
 	if (!is_timers_nohz_active())
 		return;
 
-	/*
-	 * TODO: This wants some optimizing similar to the code below, but we
-	 * will do that when we switch from push to pull for deferrable timers.
-	 */
-	if (timer->flags & TIMER_DEFERRABLE) {
-		if (tick_nohz_full_cpu(base->cpu))
-			wake_up_nohz_cpu(base->cpu);
-		return;
-	}
-
 	/*
 	 * We might have to IPI the remote CPU if the base is idle and the
 	 * timer is not deferrable. If the other CPU is on the way to idle
@@ -1678,17 +1668,9 @@ static u64 cmp_next_hrtimer_event(u64 basem, u64 expires)
 	return DIV_ROUND_UP_ULL(nextevt, TICK_NSEC) * TICK_NSEC;
 }
 
-/**
- * get_next_timer_interrupt - return the time (clock mono) of the next timer
- * @basej:	base time jiffies
- * @basem:	base time clock monotonic
- *
- * Returns the tick aligned clock monotonic time of the next pending
- * timer or KTIME_MAX if no timer is pending.
- */
-u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
+static u64 get_next_base_interrupt(struct timer_base *base,
+				   unsigned long basej, u64 basem)
 {
-	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
 	u64 expires = KTIME_MAX;
 	unsigned long nextevt;
 
@@ -1734,6 +1716,32 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
 	}
 	raw_spin_unlock(&base->lock);
 
+	return expires;
+}
+
+/**
+ * get_next_timer_interrupt - return the time (clock mono) of the next timer
+ * @basej:	base time jiffies
+ * @basem:	base time clock monotonic
+ * @idle:	is the CPU idle?
+ *
+ * Returns the tick aligned clock monotonic time of the next pending
+ * timer or KTIME_MAX if no timer is pending.
+ */
+u64 get_next_timer_interrupt(unsigned long basej, u64 basem, bool idle)
+{
+	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
+	u64 expires;
+
+	expires = get_next_base_interrupt(base, basej, basem);
+	if (!idle) {
+		u64 expires_def;
+
+		base = this_cpu_ptr(&timer_bases[BASE_DEF]);
+		expires_def = get_next_base_interrupt(base, basej, basem);
+		expires = min(expires, expires_def);
+	}
+
 	return cmp_next_hrtimer_event(basem, expires);
 }
 
@@ -1744,15 +1752,15 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
  */
 void timer_clear_idle(void)
 {
-	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
-
 	/*
 	 * We do this unlocked. The worst outcome is a remote enqueue sending
 	 * a pointless IPI, but taking the lock would just make the window for
 	 * sending the IPI a few instructions smaller for the cost of taking
 	 * the lock in the exit from idle path.
 	 */
-	base->is_idle = false;
+	__this_cpu_write(timer_bases[BASE_STD].is_idle, false);
+	if (tick_nohz_full_cpu(smp_processor_id()))
+		__this_cpu_write(timer_bases[BASE_DEF].is_idle, false);
 }
 #endif
 

  reply	other threads:[~2022-07-25 10:44 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-25  9:32 [Question] timers: trigger_dyntick_cpu() vs TIMER_DEFERRABLE Valentin Schneider
2022-07-25 10:43 ` Frederic Weisbecker [this message]
2022-07-25 15:00   ` Valentin Schneider
2022-07-25 15:18     ` Arjan van de Ven

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220725104356.GA2950296@lothringen \
    --to=frederic@kernel.org \
    --cc=arjan@infradead.org \
    --cc=clm@fb.com \
    --cc=edumazet@google.com \
    --cc=josh@joshtriplett.org \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@sciencehorizons.net \
    --cc=mtosatti@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=tglx@linutronix.de \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox