All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: Gabriele Monaco <gmonaco@redhat.com>,
	linux-kernel@vger.kernel.org,
	Anna-Maria Behnsen <anna-maria@linutronix.de>,
	Waiman Long <longman@redhat.com>,
	"John B. Wyatt IV" <jwyatt@redhat.com>,
	"John B. Wyatt IV" <sageofredondo@gmail.com>
Subject: Re: [PATCH v15 7/7] timers: Exclude isolated cpus from timer migration
Date: Wed, 19 Nov 2025 22:23:11 +0100	[thread overview]
Message-ID: <87fra9lnsw.ffs@tglx> (raw)
In-Reply-To: <aR4k36oh5mGYdH7e@pavilion.home>

On Wed, Nov 19 2025 at 21:13, Frederic Weisbecker wrote:
> Le Wed, Nov 19, 2025 at 07:15:42PM +0100, Thomas Gleixner a écrit :
>> On Wed, Nov 19 2025 at 18:14, Frederic Weisbecker wrote:
>> > Le Wed, Nov 19, 2025 at 05:50:15PM +0100, Thomas Gleixner a écrit :
>> >> But thinking more about it. What's the actual point of moving this 'clear'
>> >> out instead of just moving it further down?
>> >> 
>> >> It does not matter at all whether the isol/unisol muck clears an already
>> >> cleared bit or not. But it would keep the function name comprehensible
>> >> and avoid all this online/offline wrapper nonsense.
>> >
>> > That was my suggestion.
>> >
>> > It's because tmigr_clear_cpu_available() and tmigr_set_cpu_available()
>> > can now all be called concurrently through the workqueues and race and
>> > mess up the cpumask if they all try to clear/set at the same time...
>> 
>> Huch?
>> 
>> cpumask_set_cpu() uses set_bit() and cpumask_clear_cpu() uses
>> clear_bit(). Both are atomic and nothing gets messed up.
>
> Urgh, right...
>
>> 
>> The only undefined case would be if you end up setting/clearing the same
>> bit, which would require that the unisol and isol maps overlap. But that
>> would be a bug on it's own, no?
>
> Ok.
>
> But then the "unisolate" works must be flushed before this line:
>
> +       /* Set up the mask earlier to avoid races with the migrator CPU */
> +       cpumask_andnot(tmigr_available_cpumask, tmigr_available_cpumask, cpumask_isol);
>
> Because that is non-atomic and can race with the cpumask_set_cpu() from the
> works, right?

Correct, but I still have to understand why this has to happen
upfront. As I said before this comment is useless:

> +       /* Set up the mask earlier to avoid races with the migrator CPU */

which decodes to:

   Set up the mask earlier than something unspecified to avoid
   unspecified races with the migrator CPU.

Seriously?

But I finally understood what this is about after staring at it with
less disgust again for five times in a row:

   tmigr_clear_cpu_available() requires a stable cpumask to find a
   migrator.

So what this is about is to avoid touching the cpumask in those worker
functions so that tmigr_isolated_exclude_cpumask() can fiddle with them
asynchronously.

Right?

That's voodoo programming which nobody - even those involved - will
understand anymore three months down the road.

The idea that updating the masks upfront will provide stable state is
flawed to begin with. Let's assume the following scenario:

  1) The isol/unisol mask is just flipped around

  2) The newly available CPUs are marked in the mask

  3) The work is scheduled on those CPUs

  4) The newly unavailable CPUs are cleared in the mask

  5) The work is scheduled on those CPUs

  6) The newly available CPU workers are delayed so that the newly
     unavailable workers get there first. They find a stable CPU mask,
     but none of these now "available" CPUs are actually brought into
     active state yet.

That's just a perfect recipe for some completely undecodable bug to
happen anytime soon even it it supposed to "work" by chance.

The way how this was designed in the first place is that changing the
"availability" is fully serialized by the CPU hotplug machinery. Which
means zero surprises.

Now you create a side channel, which lacks these serialization
guarantees for absolutely no reason. Changing this isolation muck at
run-time is not a hotpath operation and trying to optimize it for no
good reason is just stupid.

The most trivial solution is to schedule the works one by one walking
through the newly available and then through the unavailable maps.

If you want to be a bit smarter, then you can just use a global mutex,
which is taken inside the set/clear_available() functions which
serializes the related functionality and bulk schedule/flush the unisol
(make available) first and then proceed to the isol (make unavailable).

Then nothing has to change vs. the set/clear operations and everything
just works.

That mutex does not do any harm in the CPU hotplug case and the
serialization vs. the workers is not going to be the end of the world.

I'm willing to bet that no real-world use-case will ever notice the
existance of this mutex. The microbenchmark which shows off the "I'm so
smart" metric is completely irrelevant especially when the result is
fragile, incomprehensible and therefore unmaintainable.

Keep it correct and simple is still the most important engineering
principle. Premature optimization is a guaranteed path to failure.

If there is a compelling use case which justifies the resulting
complexity, then it can be built on top. I'm not holding my breath. See
above...

Thanks,

        tglx

  reply	other threads:[~2025-11-19 21:23 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-13  8:33 [PATCH v15 0/7] timers: Exclude isolated cpus from timer migration Gabriele Monaco
2025-11-13  8:33 ` [PATCH v15 1/7] timers: Rename tmigr 'online' bit to 'available' Gabriele Monaco
2025-11-19 15:49   ` Thomas Gleixner
2025-11-13  8:33 ` [PATCH v15 2/7] timers: Add the available mask in timer migration Gabriele Monaco
2025-11-19 15:56   ` Thomas Gleixner
2025-11-13  8:33 ` [PATCH v15 3/7] timers: Use scoped_guard when setting/clearing the tmigr available flag Gabriele Monaco
2025-11-19 15:57   ` Thomas Gleixner
2025-11-13  8:33 ` [PATCH v15 4/7] cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks() Gabriele Monaco
2025-11-13  8:33 ` [PATCH v15 5/7] sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any Gabriele Monaco
2025-11-13  8:33 ` [PATCH v15 6/7] cpumask: Add initialiser to use cleanup helpers Gabriele Monaco
2025-11-13  8:33 ` [PATCH v15 7/7] timers: Exclude isolated cpus from timer migration Gabriele Monaco
2025-11-19 16:50   ` Thomas Gleixner
2025-11-19 17:14     ` Frederic Weisbecker
2025-11-19 18:15       ` Thomas Gleixner
2025-11-19 20:13         ` Frederic Weisbecker
2025-11-19 21:23           ` Thomas Gleixner [this message]
2025-11-19 22:02             ` Frederic Weisbecker
2025-11-19 22:10               ` Thomas Gleixner
2025-11-19 22:31                 ` Frederic Weisbecker
2025-11-19 20:43   ` Waiman Long
2025-11-20 10:48     ` Gabriele Monaco
2025-11-20 21:04       ` Waiman Long
2025-11-13 13:12 ` [PATCH v15 0/7] " Frederic Weisbecker
2025-11-18 10:01   ` Gabriele Monaco

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fra9lnsw.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=anna-maria@linutronix.de \
    --cc=frederic@kernel.org \
    --cc=gmonaco@redhat.com \
    --cc=jwyatt@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=sageofredondo@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.