All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: Nick Piggin <npiggin@gmail.com>, Tejun Heo <tj@kernel.org>,
	Leonardo Bras <leobras@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	linux-kernel@vger.kernel.org, Junyao Zhao <junzhao@redhat.com>,
	Chris von Recklinghausen <crecklin@redhat.com>
Subject: Re: Nohz_full on boot CPU is broken (was: Re: [PATCH v2 1/1] wq: Avoid using isolated cpus' timers on queue_delayed_work)
Date: Sun, 7 Apr 2024 15:52:48 +0200	[thread overview]
Message-ID: <20240407135248.GB10796@redhat.com> (raw)
In-Reply-To: <20240407130914.GA10796@redhat.com>

On 04/07, Oleg Nesterov wrote:
>
> On 04/05, Frederic Weisbecker wrote:
> >
> > +Cc Nick
> >
> > Le Fri, Apr 05, 2024 at 04:04:49PM +0200, Oleg Nesterov a écrit :
> > > On 04/03, Oleg Nesterov wrote:
> > > >
> > > > > > OTOH, Documentation/timers/no_hz.rst says
> > > > > >
> > > > > > 	Therefore, the
> > > > > > 	boot CPU is prohibited from entering adaptive-ticks mode.  Specifying a
> > > > > > 	"nohz_full=" mask that includes the boot CPU will result in a boot-time
> > > > > > 	error message, and the boot CPU will be removed from the mask.
> > > > > >
> > > > > > and this doesn't match the reality.
> > > > >
> > > > > Don't some archs allow the boot CPU to go down too tho? If so, this doesn't
> > > > > really solve the problem, right?
> > > >
> > > > I do not know. But I thought about this too.
> > > >
> > > > In the context of this discussion we do not care if the boot CPU goes down.
> > > > But we need at least one housekeeping CPU after cpu_down(). The comment in
> > > > cpu_down_maps_locked() says
> > > >
> > > > 	Also keep at least one housekeeping cpu onlined
> > > >
> > > > but it checks HK_TYPE_DOMAIN, and I do not know (and it is too late for me
> > > > to try to read the code ;) if housekeeping.cpumasks[HK_TYPE_TIMER] can get
> > > > empty or not.
> > >
> > > This nearly killed me, but I managed to convince myself we shouldn't worry
> > > about cpu_down().
> > >
> > > HK_FLAG_TIMER implies HK_FLAG_TICK.
> > >
> > > HK_FLAG_TICK implies tick_nohz_full_setup() which sets
> > > tick_nohz_full_mask = non_housekeeping_mask.
> > >
> > > When tick_setup_device() is called on a housekeeping CPU it does
> > >
> > > 	else if (tick_do_timer_boot_cpu != -1 &&
> > > 					!tick_nohz_full_cpu(cpu)) {
> > > 		tick_take_do_timer_from_boot();
> > > 		tick_do_timer_boot_cpu = -1;
> > >
> > >
> > > 	and this sets tick_do_timer_cpu = first-housekeeping-cpu.
> > >
> > > cpu_down(tick_do_timer_cpu) will fail, tick_nohz_cpu_down() will nack it.
> > >
> > > So cpu_down() can't make housekeeping.cpumasks[HK_FLAG_TIMER] empty and I
> > > still think that the change below is the right approach.
> > >
> > > But probably WARN_ON() in housekeeping_any_cpu() makes sense anyway.
> > >
> > > What do you think?
> >
> > Good analysis on this nasty housekeeping VS tick code. I promised so many
> > times to cleanup this mess but things keep piling up.
> >
> > It is indeed possible for the boot CPU to be a nohz_full CPU and as
> > you can see, it's only half-working. This is so ever since:
> >
> >     08ae95f4fd3b (nohz_full: Allow the boot CPU to be nohz_full)
>
> Thanks... So this is intentional. I was confused by
>
> 	Therefore, the
> 	boot CPU is prohibited from entering adaptive-ticks mode.  Specifying a
> 	"nohz_full=" mask that includes the boot CPU will result in a boot-time
> 	error message, and the boot CPU will be removed from the mask.
>
> from Documentation/timers/no_hz.rst
>
> > I would love
> > to revert that now but I don't know if anyone uses this and have it working
> > by chance somewhere... Should we continue to support a broken feature? Can we
> > break user ABI if it's already half-broken?
>
> Well, the changelog says
>
>     nohz_full has been trialed at a large supercomputer site and found to
>     significantly reduce jitter. In order to deploy it in production, they
>     need CPU0 to be nohz_full
>
> so I guess this feature has users.
>
> But after the commit aae17ebb53cd3da ("workqueue: Avoid using isolated cpus'
> timers on queue_delayed_work") the kernel will crash at boot time if the boot
> CPU is nohz_full.
>
> So we need a workaround at least. I am starting to think I will send a trivial
> patch which changes __queue_delayed_work() to validate the cpu returned by
> housekeeping_any_cpu(HK_TYPE_TIMER).
>
> But perhaps something like below makes more sense as a (stupid) workaround?

Or simply


diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 373d42c707bc..e912555c6fc8 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -46,7 +46,11 @@ int housekeeping_any_cpu(enum hk_type type)
 			if (cpu < nr_cpu_ids)
 				return cpu;
 
-			return cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
+			cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
+			if (cpu < nr_cpu_ids)
+				return cpu;
+
+			WARN_ON_ONCE(system_state == SYSTEM_RUNNING);
 		}
 	}
 	return smp_processor_id();


  reply	other threads:[~2024-04-07 13:54 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-30  1:00 [PATCH v2 1/1] wq: Avoid using isolated cpus' timers on queue_delayed_work Leonardo Bras
2024-01-30  1:22 ` Tejun Heo
2024-01-30  2:58   ` Leonardo Bras
2024-04-02 10:58 ` Oleg Nesterov
2024-04-03 19:12   ` Tejun Heo
2024-04-03 20:38     ` Oleg Nesterov
2024-04-05 14:04       ` Oleg Nesterov
2024-04-05 15:38         ` Tejun Heo
2024-04-05 22:03           ` Frederic Weisbecker
2024-04-05 21:52         ` Nohz_full on boot CPU is broken (was: Re: [PATCH v2 1/1] wq: Avoid using isolated cpus' timers on queue_delayed_work) Frederic Weisbecker
2024-04-07 13:09           ` Oleg Nesterov
2024-04-07 13:52             ` Oleg Nesterov [this message]
2024-04-09 12:05               ` Frederic Weisbecker
2024-04-09 12:04             ` Frederic Weisbecker
2024-04-09 13:07               ` Oleg Nesterov
2024-04-09 13:59                 ` Frederic Weisbecker
2024-04-10  4:26                 ` Nicholas Piggin
2024-04-10 13:55                   ` Oleg Nesterov
2024-04-11 13:41                     ` Oleg Nesterov
2024-04-11 14:39   ` [PATCH] sched/isolation: fix boot crash when the boot CPU is nohz_full Oleg Nesterov
2024-04-11 16:59     ` Oleg Nesterov
2024-04-13 14:17     ` [PATCH] sched/isolation: fix boot crash when maxcpus < first-housekeeping-cpu Oleg Nesterov
2024-04-18 14:54       ` Phil Auld
2024-04-18 15:40       ` Frederic Weisbecker
2024-04-24 20:05       ` [tip: sched/urgent] sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU tip-bot2 for Oleg Nesterov
2024-04-28  8:13         ` Ingo Molnar
2024-04-28 13:16           ` Oleg Nesterov
2024-04-28  8:24       ` tip-bot2 for Oleg Nesterov
2024-04-15 21:37     ` [PATCH] sched/isolation: fix boot crash when the boot CPU is nohz_full Frederic Weisbecker
2024-04-18 14:50     ` Phil Auld
2024-04-22 18:50       ` Oleg Nesterov
2024-04-24 14:42         ` Phil Auld
2024-04-24 20:05     ` [tip: sched/urgent] sched/isolation: {revent " tip-bot2 for Oleg Nesterov
2024-04-24 20:41       ` Phil Auld
2024-04-28  8:14         ` Ingo Molnar
2024-04-29 11:50           ` Phil Auld
2024-04-28  8:24     ` [tip: sched/urgent] sched/isolation: Prevent " tip-bot2 for Oleg Nesterov
2024-10-04 12:37 ` [PATCH v2 1/1] wq: Avoid using isolated cpus' timers on queue_delayed_work Frederic Weisbecker
2024-10-04 22:43   ` Leonardo Bras Soares Passos
2024-10-05 11:15     ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240407135248.GB10796@redhat.com \
    --to=oleg@redhat.com \
    --cc=crecklin@redhat.com \
    --cc=frederic@kernel.org \
    --cc=jiangshanlai@gmail.com \
    --cc=junzhao@redhat.com \
    --cc=leobras@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.