From: Oleg Nesterov <oleg@redhat.com>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>,
	Tejun Heo <tj@kernel.org>, Leonardo Bras <leobras@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	linux-kernel@vger.kernel.org, Junyao Zhao <junzhao@redhat.com>,
	Chris von Recklinghausen <crecklin@redhat.com>
Subject: Re: Nohz_full on boot CPU is broken (was: Re: [PATCH v2 1/1] wq: Avoid using isolated cpus' timers on queue_delayed_work)
Date: Wed, 10 Apr 2024 15:55:19 +0200
Message-ID: <20240410135518.GA25421@redhat.com>
In-Reply-To: <D0G5OX8W9NH9.1HE33RVAROAJK@gmail.com>

Hi Nicholas,

On 04/10, Nicholas Piggin wrote:
>
> Thanks for this. Taking a while to page this back in, the intention is
> for housekeeping to be done by boot CPU until house keeper is awake, so
> returning smp_processor_id() seems like the right thing to do here for
> ephemeral jobs like timers and work, provided that CPU / mask is not
> stored somewhere long term by the caller.
>
> For things that set an affinity like kthread, sched, maybe managed
> irqs, and such.
>
> There are not many callers of housekeeping_any_cpu() so that's easy
> enough to verify. But similar like housekeeping_cpumask() and others
> could be an issue or at least a foot-gun, I'm not sure how well I
> convinced myself of those.
>
> Could you test like this?
>
>   WARN_ON_ONCE(system_state == SYSTEM_RUNNING ||
>                type != HK_TYPE_TIMER);
>
> With a comment to say other ephemeral mask types could be exempted if
> needed.

Sorry, I don't understand... Let me repeat, I know absolutely nothing
about nohz/etc. I didn't even try to really fix the problem(s); I am
only trying to find a minimal/simple workaround for the problem we
hit at Red Hat.

This is what I was going to send:

	--- a/kernel/sched/isolation.c
	+++ b/kernel/sched/isolation.c
	@@ -46,7 +46,15 @@ int housekeeping_any_cpu(enum hk_type type)
				if (cpu < nr_cpu_ids)
					return cpu;
	 
	-			return cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
	+			cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
	+			if (likely(cpu < nr_cpu_ids))
	+				return cpu;
	+			/*
	+			 * Unless we have another problem this can only happen
	+			 * at boot time before start_secondary() brings the 1st
	+			 * housekeeping CPU up.
	+			 */
	+			WARN_ON_ONCE(system_state == SYSTEM_RUNNING);
			}
		}
		return smp_processor_id();

Yes, this fixes the symptom, not the problem. And yes, the "another problem"
mentioned in the comment is very possible: say, the "maxcpus" kernel parameter
can be less than the first housekeeping CPU. But in this case the user
should blame himself (and I am not sure the kernel will boot).
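
A minimal userspace sketch of the fallback in the diff above, not kernel
code. It assumes a cpumask fits in one 32-bit word, models
cpumask_any_and() as "lowest CPU set in both masks", skips the earlier
lookup visible in the context lines, and records the WARN_ON_ONCE() in a
flag. All model_* names and MODEL_NR_CPUS are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8	/* hypothetical CPU count, stands in for nr_cpu_ids */

enum model_state { MODEL_BOOTING, MODEL_RUNNING };

/* Set when the model would have hit WARN_ON_ONCE(). */
static bool model_warned;

/* Lowest CPU set in both masks, or MODEL_NR_CPUS if they do not intersect. */
static unsigned int model_first_and(uint32_t a, uint32_t b)
{
	uint32_t both = a & b;

	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (both & (UINT32_C(1) << cpu))
			return cpu;
	return MODEL_NR_CPUS;
}

/*
 * Prefer an online housekeeping CPU. If none is online yet, fall back to
 * the calling CPU, and flag a warning if that happens after boot.
 */
static unsigned int model_housekeeping_any_cpu(uint32_t hk_mask,
					       uint32_t online_mask,
					       unsigned int this_cpu,
					       enum model_state state)
{
	unsigned int cpu;

	assert(this_cpu < MODEL_NR_CPUS);

	cpu = model_first_and(hk_mask, online_mask);
	if (cpu < MODEL_NR_CPUS)
		return cpu;

	if (state == MODEL_RUNNING)
		model_warned = true;
	return this_cpu;
}
```

With housekeeping CPUs 4-7 and only CPU 0 online, the model returns CPU 0
silently while booting and sets model_warned once the state is
MODEL_RUNNING.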

I don't understand why you suggest adding "|| type != HK_TYPE_TIMER";
currently all the callers of housekeeping_any_cpu() use type == HK_TYPE_TIMER.
But OK, I can add this check. I guess it is meant to catch another user
with type != HK_TYPE_TIMER which can't use smp_processor_id() even at
boot time, or which stores the returned CPU for the long term.
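
To spell out what the combined condition would flag, here is a small
userspace sketch of the predicate only. The enums are hypothetical
stand-ins for system_state and enum hk_type, and MODEL_HK_OTHER
represents any type other than the timer one.

```c
#include <assert.h>
#include <stdbool.h>

enum model_sys_state { MODEL_SYS_BOOTING, MODEL_SYS_RUNNING };
enum model_hk_type { MODEL_HK_TIMER, MODEL_HK_OTHER, MODEL_HK_TYPE_MAX };

/*
 * True when falling back to the current CPU should warn: either boot is
 * already over, or the caller asked for a type that is not known to be
 * safe with a short-lived fallback.
 */
static bool model_fallback_should_warn(enum model_sys_state state,
				       enum model_hk_type type)
{
	assert(type < MODEL_HK_TYPE_MAX);

	return state == MODEL_SYS_RUNNING || type != MODEL_HK_TIMER;
}
```

Only the (booting, timer) combination stays silent; every other
combination warns.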

Do you agree with the change above, or what do you suggest instead as
a simple workaround?


> It would also be nice to warn for cases that would be bugs if the boot
> CPU was not in the HK mask. Could that be done by having a
> housekeepers_online() call after smp_init() (maybe at the start of
> sched_init_smp()) that could verify there is at least one online, and
> set a flag that could be used to create warnings.

Again, I am not sure I understand, but I too thought that something like

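	static void housekeeping_check(void)
	{
		enum hk_type type;

		/* Each enabled housekeeping type needs at least one online CPU. */
		for_each_set_bit(type, &housekeeping.flags, HK_TYPE_MAX) {
			if (!cpumask_intersects(cpu_online_mask, housekeeping.cpumasks[type]))
				panic("no online housekeeping CPU for type %d\n", type);
		}
	}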

after bringup_nonboot_cpus(setup_max_cpus).

But I am not sure this is correct and this is another (although related) issue.
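
The idea of that check can be sketched in userspace as follows. This is
only a model under stated assumptions: masks are 32-bit words, the struct
and all check_* names are hypothetical, and the function reports the
first offending type instead of calling panic().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHECK_HK_TYPE_MAX 4	/* hypothetical number of housekeeping types */

struct check_housekeeping {
	unsigned long flags;			/* bit N set: type N is enabled */
	uint32_t cpumasks[CHECK_HK_TYPE_MAX];	/* housekeeping CPUs per type */
};

/*
 * Return the first enabled type whose housekeeping mask has no online
 * CPU, or -1 if every enabled type has at least one.
 */
static int check_housekeeping_online(const struct check_housekeeping *hk,
				     uint32_t online_mask)
{
	assert(hk != NULL);

	for (int type = 0; type < CHECK_HK_TYPE_MAX; type++) {
		if (!(hk->flags & (1UL << type)))
			continue;	/* type not enabled, nothing to verify */
		if (!(hk->cpumasks[type] & online_mask))
			return type;
	}
	return -1;
}
```

A caller running after all secondary CPUs have been brought up could
treat any non-negative result as the condition the pseudocode above
panics on.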

Oleg.

