Linux cgroups development
From: Thomas Gleixner <tglx@kernel.org>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: Waiman Long <longman@redhat.com>, Jing Wu <realwujing@gmail.com>,
	linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
	cgroups@vger.kernel.org, Qiliang Yuan <yuanql9@chinatelecom.cn>
Subject: Re: [PATCH-next 00/23] cgroup/cpuset: Enable runtime update of nohz_full and managed_irq CPUs
Date: Fri, 03 Jul 2026 22:50:53 +0200
Message-ID: <87h5mflrv6.ffs@fw13>
In-Reply-To: <ake9D20lxx2Sncqm@localhost.localdomain>

On Fri, Jul 03 2026 at 15:45, Frederic Weisbecker wrote:
> Le Thu, Jul 02, 2026 at 05:00:03PM +0200, Thomas Gleixner a écrit :
>> At #4 the half-unplugged CPU is not in NOHZ full mode and the tick keeps
>> running, so all GP processing works as before, except that the CPU itself
>> is not handling any callbacks because all queued ones are drained and no
>> new ones can be queued. When it comes back up it turns into a fully
>> offloaded one.
>
> But interrupts can still fire and queue callbacks, right?

Sure, but because of

>>   2) rcutree_offline_cpu() removes the CPU from the fully functional CPU
>>      mask _AND_ marks the CPU as "lightweight offloaded", which means:
>> 
>>         - no new callbacks can be queued on it anymore, either from the
>>           CPU itself or from truly offloaded CPUs
>> 
>>         - the CPU is still processing already queued callbacks and
>>           participates in the GP magic

the queuing code sees "offloaded", so callbacks won't end up on the
outgoing CPU. No?
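A toy userspace model of the queuing rule described above, assuming a per-CPU "lightweight offloaded" flag that the enqueue path checks before queuing. All sim_* names are hypothetical, and redirecting to the first CPU that still accepts callbacks is just the simplest policy for illustration; it is not how the kernel's NOCB code actually decides where callbacks go.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 4

/*
 * Hypothetical per-CPU RCU state for this sketch; NOT the kernel's
 * struct rcu_data. 'lw_offloaded' models the "lightweight offloaded"
 * mark that rcutree_offline_cpu() would set on the outgoing CPU.
 */
struct sim_rcu_cpu {
	bool lw_offloaded;	/* true: no new callbacks may be queued here */
	int nr_cbs;		/* callbacks currently queued on this CPU */
};

static struct sim_rcu_cpu sim_cpus[SIM_NR_CPUS];

/* Mark the outgoing CPU; callbacks already queued on it stay put. */
static void sim_mark_lw_offloaded(int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	sim_cpus[cpu].lw_offloaded = true;
}

/*
 * Hypothetical stand-in for call_rcu() invoked on 'cpu' (task or
 * interrupt context). The enqueue path checks the mark first, so a
 * callback raised on the outgoing CPU is redirected to the first CPU
 * that still accepts callbacks. Returns the CPU the callback landed
 * on, or -1 if no CPU accepts it.
 */
static int sim_call_rcu(int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	int target = cpu;

	if (sim_cpus[cpu].lw_offloaded) {
		target = -1;
		for (int i = 0; i < SIM_NR_CPUS; i++) {
			if (!sim_cpus[i].lw_offloaded) {
				target = i;
				break;
			}
		}
	}
	if (target >= 0)
		sim_cpus[target].nr_cbs++;
	return target;
}
```

In this model, once CPU 2 is marked with sim_mark_lw_offloaded(2), an interrupt on CPU 2 that calls sim_call_rcu(2) gets its callback queued on CPU 0, and CPU 2's own count never grows.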

>> There are obviously a gazillion details and corner cases to handle,
>> but I don't see why this can't be made to work in principle.
>
> If we need to do something tricky anyway, how about this, which would
> solve the initial problem of hotplug:stop_machine vs. latency-sensitive
> workloads in general?

I'm all for that, but there is way more to it than RCU and the places
which consult cpu_online_mask.

Before you get to the point where you can remove stomp_machine() from
the CPU down machinery, you have to go through:

 - All architecture specific code in __cpu_disable()

 - All existing (~60) AP callbacks in that section (formerly the DYING
   notifications)

and validate that none of that makes assumptions about stomp_machine()
magically protecting them.

Back then, when I was sanitizing CPU hotplug, I looked into that deeply
and looked away pretty fast, and not only because of RCU. If it had been
only RCU, I surely would have pestered Paul enough to get it fixed. :)

Let me give you some of the major pain points from my notes back then,
in order of complexity:

   - All topology masks

     It's not only cpu_online_mask. There is numa_mask and all the sibling,
     core, die, llc, l2c and whatever other fancy masks we have. Most of
     them are accessed in hot paths all over the place, and many of them
     implicitly rely on the stomp_machine() serialization (due to disabled
     preemption/interrupts) unless they use an explicit
     cpuhp_read_lock() section.

   - RCU

     Plus the SMPCFD part, which has ordering constraints vs. RCU.

   - Interrupt migration

     Sounds trivial, but with the nastiness of the x86 APIC (without
     interrupt remapping) this becomes a nightmare pretty fast.

   - Tick

     I never dove deeply into it, but judging by the patches in flight,
     that's a solv[able|ed] problem.

   - Perf

     There were some truly nasty things in various perf implementations,
     but those have been sorted out by now (at least on x86) thanks to
     RT. They still need to be looked at.

That's x86 only. I've never looked at any other architecture or its
callbacks in the stomp_machine() section.

Just looking at your proposal from back then:

    set_cpu_online(cpu, 0)
    synchronize_rcu()
    migrate things // call CPUHP_TEARDOWN_CPU -> CPUHP_AP_IDLE_DEAD

There is a chicken-and-egg problem right there. synchronize_rcu()
running on the outgoing CPU requires a functional scheduler, as
synchronize_rcu() can sleep on the completion. But you just pulled the
rug out from under the scheduler because you set the CPU offline. So how
exactly is the wakeup, which might be coming from a different CPU, going
to work?
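The ordering problem can be reduced to a toy model, assuming a wakeup succeeds only if the sleeping task's affinity mask intersects the online mask. All names are hypothetical, and the real scheduler has fallback logic that this model deliberately leaves out; the point is only that the wakeup path consults the CPU's online state, which the proposal changes before sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of the ordering problem; every name here is hypothetical.
 * A task sleeping in a synchronize_rcu()-like wait can only be woken if
 * the scheduler finds an online CPU in the task's affinity mask.
 */
struct sim_task {
	uint64_t allowed;	/* CPUs the task may run on */
	bool sleeping;
};

static uint64_t sim_online = 0xF;	/* CPUs 0-3 online */

/* Model of set_cpu_online(): flip the CPU's bit in the online mask. */
static void sim_set_cpu_online(int cpu, bool online)
{
	assert(cpu >= 0 && cpu < 64);
	if (online)
		sim_online |= UINT64_C(1) << cpu;
	else
		sim_online &= ~(UINT64_C(1) << cpu);
}

/*
 * complete()-like wakeup issued from another CPU, e.g. by the
 * grace-period machinery. Returns true if the sleeper could be placed
 * on an eligible online CPU.
 */
static bool sim_wake_up(struct sim_task *t)
{
	if (!(t->allowed & sim_online))
		return false;	/* no eligible CPU: the waiter stays asleep */
	t->sleeping = false;
	return true;
}
```

A waiter bound to CPU 3 that goes to sleep after sim_set_cpu_online(3, false) cannot be woken in this model; the same wakeup succeeds while CPU 3 is still marked online, which is the ordering the proposal breaks.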

I totally agree with the long-term goal of removing stomp_machine() from
the hotplug machinery completely, but the various subsystems which
depend on it today need to be solved one by one upfront with that goal
in mind. Once they are out of the way, removing stomp_machine() becomes
trivial. But starting with it is a guaranteed recipe for disaster.

Thanks,

        tglx



Thread overview: 57+ messages
2026-04-21  3:03 [PATCH-next 00/23] cgroup/cpuset: Enable runtime update of nohz_full and managed_irq CPUs Waiman Long
2026-04-21  3:03 ` [PATCH 01/23] sched/isolation: Add HK_TYPE_KERNEL_NOISE_BOOT & HK_TYPE_MANAGED_IRQ_BOOT Waiman Long
2026-04-21  3:03 ` [PATCH 02/23] sched/isolation: Enhance housekeeping_update() to support updating more than one HK cpumask Waiman Long
2026-04-22  6:39   ` Chen Ridong
2026-04-21  3:03 ` [PATCH 03/23] tick/nohz: Make nohz_full parameter optional Waiman Long
2026-04-21  8:32   ` Thomas Gleixner
2026-04-21 14:14     ` Waiman Long
2026-04-24 15:57       ` Frederic Weisbecker
2026-04-21  3:03 ` [PATCH 04/23] tick/nohz: Allow runtime changes in full dynticks CPUs Waiman Long
2026-04-21  8:50   ` Thomas Gleixner
2026-04-21 14:24     ` Waiman Long
2026-05-13 13:04     ` Frederic Weisbecker
2026-04-21  3:03 ` [PATCH 05/23] tick: Pass timer tick job to an online HK CPU in tick_cpu_dying() Waiman Long
2026-04-21  8:55   ` Thomas Gleixner
2026-04-21 14:22     ` Waiman Long
2026-04-21  3:03 ` [PATCH 06/23] rcu/nocbs: Allow runtime changes in RCU NOCBS cpumask Waiman Long
2026-04-21  3:03 ` [PATCH 07/23] watchdog: Sync up with runtime change of isolated CPUs Waiman Long
2026-04-21  3:03 ` [PATCH 08/23] arm64: topology: Use RCU to protect access to HK_TYPE_TICK cpumask Waiman Long
2026-04-22  9:34   ` Chen Ridong
2026-05-13 16:19   ` Frederic Weisbecker
2026-04-21  3:03 ` [PATCH 09/23] workqueue: Use RCU to protect access of HK_TYPE_TIMER cpumask Waiman Long
2026-04-21  3:03 ` [PATCH 10/23] cpu: " Waiman Long
2026-04-21  8:57   ` Thomas Gleixner
2026-04-21 14:25     ` Waiman Long
2026-04-21  3:03 ` [PATCH 11/23] hrtimer: " Waiman Long
2026-04-21  8:59   ` Thomas Gleixner
2026-04-21  3:03 ` [PATCH 12/23] net: Use boot time housekeeping cpumask settings for now Waiman Long
2026-04-21  3:03 ` [PATCH 13/23] sched/core: Use RCU to protect access of HK_TYPE_KERNEL_NOISE cpumask Waiman Long
2026-04-21  3:03 ` [PATCH 14/23] hwmon/coretemp: Use RCU to protect access of HK_TYPE_MISC cpumask Waiman Long
2026-04-21  3:03 ` [PATCH 15/23] Drivers: hv: Use RCU to protect access of HK_TYPE_MANAGED_IRQ cpumask Waiman Long
2026-04-21  3:03 ` [PATCH 16/23] genirq/cpuhotplug: " Waiman Long
2026-04-21  9:02   ` Thomas Gleixner
2026-04-21 14:29     ` Waiman Long
2026-04-21  3:03 ` [PATCH 17/23] sched/isolation: Extend housekeeping_dereference_check() to cover changes in nohz_full or manged_irqs cpumasks Waiman Long
2026-04-21  3:03 ` [PATCH 18/23] cpu/hotplug: Add a new cpuhp_offline_cb() API Waiman Long
2026-04-21 16:17   ` Thomas Gleixner
2026-04-21 17:29     ` Waiman Long
2026-04-21 18:43       ` Thomas Gleixner
2026-04-21  3:03 ` [PATCH 19/23] cgroup/cpuset: Improve check for calling housekeeping_update() Waiman Long
2026-04-23  1:10   ` Chen Ridong
2026-04-24 18:32     ` Waiman Long
2026-04-21  3:03 ` [PATCH 20/23] cgroup/cpuset: Enable runtime update of HK_TYPE_{KERNEL_NOISE,MANAGED_IRQ} cpumasks Waiman Long
2026-04-21  3:03 ` [PATCH 21/23] cgroup/cpuset: Limit the side effect of using CPU hotplug on isolated partition Waiman Long
2026-04-21  3:03 ` [PATCH 22/23] cgroup/cpuset: Prevent offline_disabled CPUs from being used in " Waiman Long
2026-04-21  3:03 ` [PATCH 23/23] cgroup/cpuset: Documentation and kselftest updates Waiman Long
2026-06-24  6:34 ` [PATCH-next 00/23] cgroup/cpuset: Enable runtime update of nohz_full and managed_irq CPUs Jing Wu
2026-06-25  5:27   ` Waiman Long
2026-07-01 14:22     ` Frederic Weisbecker
2026-07-01 18:56       ` Waiman Long
2026-07-02  3:39         ` Jing Wu
2026-07-03 13:19         ` Frederic Weisbecker
2026-07-02 15:00       ` Thomas Gleixner
2026-07-02 23:07         ` Paul E. McKenney
2026-07-03  6:11           ` Jing Wu
2026-07-03 17:25             ` Paul E. McKenney
2026-07-03 13:45         ` Frederic Weisbecker
2026-07-03 20:50           ` Thomas Gleixner [this message]
