From: Thomas Gleixner <tglx@linutronix.de>
To: Oleg Nesterov <oleg@redhat.com>,
Frederic Weisbecker <frederic@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>,
Nicholas Piggin <npiggin@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Phil Auld <pauld@redhat.com>,
Chris von Recklinghausen <crecklin@redhat.com>,
linux-kernel@vger.kernel.org
Subject: Re: sched/isolation: tick_take_do_timer_from_boot() calls smp_call_function_single() with irqs disabled
Date: Sat, 25 May 2024 00:06:06 +0200
Message-ID: <87v832dfw1.ffs@tglx>
In-Reply-To: <20240524183700.GA17065@redhat.com>
On Fri, May 24 2024 at 20:37, Oleg Nesterov wrote:
> I've already had a few beers today, I know I'll regret about this
> email tomorrow, but I can't resist ;)
You won't regret it. :)
> On 05/24, Frederic Weisbecker wrote:
> But again, again. tick_sched_do_timer() says
>
> * If nohz_full is enabled, this should not happen because the
> * 'tick_do_timer_cpu' CPU never relinquishes.
>
> so I guess it is not supposed to happen?
Right. It does not happen because the kernel starts with jiffies as
clocksource except on S390. The jiffies clocksource is not qualified to
switch over to NOHZ mode for obvious reasons. But even on S390 which has
a truly usable and useful clocksource the tick stays periodic to begin
with. Why?
The NOHZ ready notification happens late in the boot process via:

	fs_initcall(clocksource_done_booting)

So by the time that happens, the secondary CPUs are up and have taken
over the do timer duty.

[ 0.600381] smp: Bringing up secondary CPUs ...
....
[ 1.917842] clocksource: Switched to clocksource kvm-clock
[ 1.918548] clocksource_done_booting: Switched to NOHZ // debug printk

This is the point where tick_nohz_activate() is called for the first
time, and that does:

	tick_sched_flag_set(ts, TS_FLAG_NOHZ);

So up to this point the tick is never stopped, neither on housekeeping
nor on NOHZ FULL CPUs:

	tick_nohz_full_update_tick()
		if (!tick_sched_flag_test(ts, TS_FLAG_NOHZ))
			return;

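The gating described above can be modelled in a few lines of userspace C. This is only a sketch to illustrate the ordering argument, not kernel code: the struct, the model_* helpers, the can_stop parameter and the bit value chosen for TS_FLAG_NOHZ are all assumptions made for the illustration. It shows that any attempt to stop the tick is a no-op until the flag has been set by the activation step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; names mirror the mail, values are assumed. */
#define TS_FLAG_NOHZ	(1UL << 0)	/* assumed bit, for illustration only */

struct tick_sched_model {
	unsigned long flags;
	bool tick_stopped;
};

static bool model_flag_test(const struct tick_sched_model *ts, unsigned long flag)
{
	return (ts->flags & flag) != 0;
}

/* Models tick_nohz_activate(): from here on the tick may be stopped. */
static void model_nohz_activate(struct tick_sched_model *ts)
{
	ts->flags |= TS_FLAG_NOHZ;
}

/*
 * Models the early return in tick_nohz_full_update_tick().
 * 'can_stop' stands in for all the other conditions the kernel checks.
 * Returns true if the tick is stopped after the call.
 */
static bool model_full_update_tick(struct tick_sched_model *ts, bool can_stop)
{
	if (!model_flag_test(ts, TS_FLAG_NOHZ))
		return false;	/* still periodic: nothing to do */

	if (can_stop)
		ts->tick_stopped = true;
	return ts->tick_stopped;
}
```

Calling model_full_update_tick() on a zero-initialized struct leaves the tick running regardless of can_stop. Only after model_nohz_activate() has run can the same call stop it, which is the property the argument relies on.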
> And. My main question was: how can smp_call_function_single() help???
It's useless.
> Why do we actually need it?
We do not.
As explained above there is also nothing extra to fix, contrary to
Frederic's fears.
That holds even in the case that a command line limitation restricts the
number of CPUs such that there is no housekeeping CPU onlined during
smp_init(). That is checked in the isolation init code, which clears
nohz_full_running in that case. Nothing to see there either.
So all this needs is the simple:
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index d88b13076b79..dab17d756fd8 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -229,11 +209,9 @@ static void tick_setup_device(struct tick_device *td,
 			if (tick_nohz_full_cpu(cpu))
 				tick_do_timer_boot_cpu = cpu;
-		} else if (tick_do_timer_boot_cpu != -1 &&
-						!tick_nohz_full_cpu(cpu)) {
-			tick_take_do_timer_from_boot();
+		} else if (tick_do_timer_boot_cpu != -1 && !tick_nohz_full_cpu(cpu)) {
+			WRITE_ONCE(tick_do_timer_cpu, cpu);
 			tick_do_timer_boot_cpu = -1;
-			WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
 #endif
 		}
along with the removal of the SMP function call voodoo programming gunk,
a lengthy changelog and a bunch of useful comments.
Changing the horribly lazy and incomprehensible '-1' to an actual
meaningful define, e.g. TICK_DO_TIMER_NONE, would definitely help along
with renaming the variable to tick_do_timer_nohz_full_boot_cpu.
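To make the handover with the suggested names concrete, here is a minimal userspace model in C. It is a sketch under stated assumptions, not the actual patch: the first branch (the boot CPU taking the duty) is not shown in the hunk above and is assumed here, as are TICK_DO_TIMER_BOOT, the numeric values of both defines, MODEL_NR_CPUS, and the model_* helpers. The plain assignment stands in for the WRITE_ONCE() in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the handover; not the kernel's code. */
#define TICK_DO_TIMER_NONE	(-1)	/* name suggested above; value assumed */
#define TICK_DO_TIMER_BOOT	(-2)	/* assumed: nobody owns the duty yet */
#define MODEL_NR_CPUS		4

static int tick_do_timer_cpu = TICK_DO_TIMER_BOOT;
static int tick_do_timer_nohz_full_boot_cpu = TICK_DO_TIMER_NONE;
static bool model_nohz_full[MODEL_NR_CPUS];	/* true: CPU is NOHZ FULL */

static bool model_nohz_full_cpu(int cpu)
{
	return model_nohz_full[cpu];
}

/* Models only the do-timer ownership part of tick_setup_device(). */
static void model_tick_setup_device(int cpu)
{
	if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
		/* First CPU up takes the duty, even if it is NOHZ FULL. */
		tick_do_timer_cpu = cpu;
		if (model_nohz_full_cpu(cpu))
			tick_do_timer_nohz_full_boot_cpu = cpu;
	} else if (tick_do_timer_nohz_full_boot_cpu != TICK_DO_TIMER_NONE &&
		   !model_nohz_full_cpu(cpu)) {
		/* First housekeeping CPU simply takes over; no SMP call. */
		tick_do_timer_cpu = cpu;
		tick_do_timer_nohz_full_boot_cpu = TICK_DO_TIMER_NONE;
	}
}
```

With CPUs 0 and 1 marked NOHZ FULL, calling model_tick_setup_device() for CPUs 0, 1, 2, 3 in order leaves the duty on CPU 0 until CPU 2 comes up, after which it stays on CPU 2 and the boot marker is back at TICK_DO_TIMER_NONE.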
There is no race other than the boot CPU reading tick_do_timer_cpu
concurrently with the update, but that's completely harmless whatever it
sees there. Whether it sees the boot CPU, i.e. 0, or the secondary does
not matter. The secondary immediately schedules the tick unconditionally
so timekeeping and jiffies will just work.
If the secondary CPU fails to come up after it installed the clock event
device then the missing tick is the least of the problems.
That has absolutely nothing to do with the issue at hand. If the CPU
which owns tick_do_timer_cpu dies or gets stuck then all bets are off
independent of NOHZ FULL. See the changes which went in during the merge
window to handle the case where the hypervisor fails to inject the timer
interrupts or keeps the time keeper duty CPU scheduled out for a long
period of time....
Thanks,
tglx