From: Frederic Weisbecker <frederic@kernel.org>
To: Waiman Long <llong@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
linux-kernel@vger.kernel.org,
Anna-Maria Behnsen <anna-maria@linutronix.de>
Subject: Re: [RESEND PATCH v13 0/9] timers: Exclude isolated cpus from timer migration
Date: Fri, 31 Oct 2025 14:48:35 +0100
Message-ID: <aQS-M_6_97ZLk0yH@localhost.localdomain>
In-Reply-To: <4421ec3d-e4df-4645-9b68-261080bd4760@redhat.com>
On Thu, Oct 30, 2025 at 01:57:50PM -0400, Waiman Long wrote:
> On 10/30/25 1:10 PM, Frederic Weisbecker wrote:
> > On Thu, Oct 30, 2025 at 12:37:08PM -0400, Waiman Long wrote:
> > > On 10/30/25 12:09 PM, Gabriele Monaco wrote:
> > > > On Thu, 2025-10-30 at 11:37 -0400, Waiman Long wrote:
> > > > > On 10/30/25 10:12 AM, Frederic Weisbecker wrote:
> > > > > > Hi Waiman,
> > > > > >
> > > > > > On Wed, Oct 29, 2025 at 10:56:06PM -0400, Waiman Long wrote:
> > > > > > > On 10/20/25 7:27 AM, Gabriele Monaco wrote:
> > > > > > > > The timer migration mechanism allows active CPUs to pull timers from
> > > > > > > > idle ones to improve the overall idle time. This is however undesired
> > > > > > > > when CPU intensive workloads run on isolated cores, as the algorithm
> > > > > > > > would move the timers from housekeeping to isolated cores, negatively
> > > > > > > > affecting the isolation.
> > > > > > > >
> > > > > > > > > Exclude isolated cores from the timer migration algorithm by extending
> > > > > > > > > the concept of unavailable cores, currently used for offline ones, to
> > > > > > > > > isolated ones:
> > > > > > > > > * A core is unavailable if isolated or offline;
> > > > > > > > > * A core is available if non-isolated and online;
> > > > > > > >
> > > > > > > > A core is considered unavailable as isolated if it belongs to:
> > > > > > > > * the isolcpus (domain) list
> > > > > > > > * an isolated cpuset
> > > > > > > > Except if it is:
> > > > > > > > * in the nohz_full list (already idle for the hierarchy)
> > > > > > > > * the nohz timekeeper core (must be available to handle global timers)
> > > > > > > >
> > > > > > > > > CPUs are added to the hierarchy during late boot, excluding isolated
> > > > > > > > > ones. The hierarchy is also adapted when the cpuset isolation changes.
> > > > > > > >
> > > > > > > > > Due to how the timer migration algorithm works, any CPU that is part of
> > > > > > > > > the hierarchy can have its global timers pulled by remote CPUs and has
> > > > > > > > > to pull remote timers; only skipping pulling remote timers would break
> > > > > > > > > the logic.
> > > > > > > > For this reason, prevent isolated CPUs from pulling remote global
> > > > > > > > timers, but also the other way around: any global timer started on an
> > > > > > > > isolated CPU will run there. This does not break the concept of
> > > > > > > > isolation (global timers don't come from outside the CPU) and, if
> > > > > > > > considered inappropriate, can usually be mitigated with other isolation
> > > > > > > > techniques (e.g. IRQ pinning).
> > > > > > > >
> > > > > > > > > This effect was noticed on a 128-core machine running oslat on the
> > > > > > > > > isolated cores (1-31,33-63,65-95,97-127). The tool monopolises CPUs,
> > > > > > > > > and the CPU with the lowest count in a timer migration hierarchy (here
> > > > > > > > > 1 and 65) appears as always active and continuously pulls global timers
> > > > > > > > > from the housekeeping CPUs. This ends up moving driver work (e.g.
> > > > > > > > > delayed work) to isolated CPUs and causes latency spikes:
> > > > > > > >
> > > > > > > > before the change:
> > > > > > > >
> > > > > > > > # oslat -c 1-31,33-63,65-95,97-127 -D 62s
> > > > > > > > ...
> > > > > > > > Maximum: 1203 10 3 4 ... 5 (us)
> > > > > > > >
> > > > > > > > after the change:
> > > > > > > >
> > > > > > > > # oslat -c 1-31,33-63,65-95,97-127 -D 62s
> > > > > > > > ...
> > > > > > > > Maximum: 10 4 3 4 3 ... 5 (us)
> > > > > > > >
> > > > > > > > > The same behaviour was observed on a machine with as few as 20 cores /
> > > > > > > > > 40 threads with isolcpus set to: 1-9,11-39 with rtla-osnoise-top.
> > > > > > > >
> > > > > > > > > The first 5 patches are preparatory work to change the concept of
> > > > > > > > > online/offline to available/unavailable, keep track of those in a
> > > > > > > > > separate cpumask, clean up the setting/clearing functions and change a
> > > > > > > > > function name in cpuset code.
> > > > > > > >
> > > > > > > > > Patches 6 and 7 adapt isolation and cpuset to prevent the domain isolated
> > > > > > > > > and nohz_full sets from covering all CPUs, leaving no housekeeping one. This
> > > > > > > > can lead to problems with the changes introduced in this series because
> > > > > > > > no CPU would remain to handle global timers.
> > > > > > > >
> > > > > > > > Patch 9 extends the unavailable status to domain isolated CPUs, which
> > > > > > > > is the main contribution of the series.
> > > > > > > >
> > > > > > > > This series is equivalent to v13 but rebased on v6.18-rc2.
> > > > > > > Thomas,
> > > > > > >
> > > > > > > > This patch series has undergone multiple rounds of review. Do you think
> > > > > > > > it is good enough to be merged into tip?
> > > > > > > >
> > > > > > > > It does contain some cpuset code, but most of the changes are in the timer
> > > > > > > > code. So I think it is better to go through the tip tree. It does have
> > > > > > > > some minor conflicts with the current for-6.19 branch of the cgroup tree,
> > > > > > > > but they can be easily resolved during merge.
> > > > > > >
> > > > > > > What do you think?
> > > > > > Just wait a little, I realize I made a buggy suggestion to Gabriele and
> > > > > > a detail needs to be fixed.
> > > > > >
> > > > > > My bad...
> > > > > OK, I thought you were OK with the timer changes. I guess Gabriele will have
> > > > > to send out a new version to address your finding.
> > > > Sure, I'm going to have a look at this next week and send a V14.
> > > I am going to extract out your 2 cpuset patches and send them to the cgroup
> > > mailing list separately. So you don't need to include them in your next
> > > version.
> > I'm not sure this will help if you apply those to an external tree if the
> > plan is to apply the whole to the timer tree. Or we'll create a dependency
> > issue...
>
> These 2 cpuset patches are actually independent of the timer related
> changes. The purpose of these two patches is to prevent the cpuset code
> from adding isolated CPUs in such a way that all the nohz_full HK CPUs
> become domain-isolated. This is a corner case that normal users won't try to
> do. The patches are just an insurance policy to ensure that users can't do
> that. This is complementary to the sched/isolation patch that limits what
> CPUs can be put to the isolcpus and nohz_full boot parameters. All these
> patches are independent of the timer related changes, though you can say
> that the solution will only be complete if all the pieces are in place.
Right, but there will be a conflict if the timer patches don't include
the rename of update_unbound_workqueue_cpumask().
> There is another set of pending cpuset patches from Chen Ridong that does
> some restructuring of the cpuset code that will likely have some conflicts
> with these 2 patches. So I would like to settle the cpuset changes to avoid
> future conflicts.
OK, so it looks like there will eventually be conflicts during the merge
window. In that case it makes sense to take Gabriele's cpuset patches, but
he'll need to rebase the rest on top of the timer tree.
Thanks.
>
> Cheers,
> Longman
>
--
Frederic Weisbecker
SUSE Labs
Thread overview: 21+ messages
2025-10-20 11:27 [RESEND PATCH v13 0/9] timers: Exclude isolated cpus from timer migration Gabriele Monaco
2025-10-20 11:27 ` [RESEND PATCH v13 1/9] timers/migration: Postpone online/offline callbacks registration to late initcall Gabriele Monaco
2025-10-30 14:07 ` Frederic Weisbecker
2025-10-20 11:27 ` [RESEND PATCH v13 2/9] timers: Rename tmigr 'online' bit to 'available' Gabriele Monaco
2025-10-20 11:27 ` [RESEND PATCH v13 3/9] timers: Add the available mask in timer migration Gabriele Monaco
2025-10-20 11:27 ` [RESEND PATCH v13 4/9] timers: Use scoped_guard when setting/clearing the tmigr available flag Gabriele Monaco
2025-10-20 11:27 ` [RESEND PATCH v13 5/9] cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_exclusion_cpumasks() Gabriele Monaco
2025-10-20 11:27 ` [RESEND PATCH v13 6/9] sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any Gabriele Monaco
2025-10-20 11:28 ` [RESEND PATCH v13 7/9] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping Gabriele Monaco
2025-10-20 11:28 ` [RESEND PATCH v13 8/9] cpumask: Add initialiser to use cleanup helpers Gabriele Monaco
2025-10-20 11:28 ` [RESEND PATCH v13 9/9] timers: Exclude isolated cpus from timer migration Gabriele Monaco
2025-10-30 2:56 ` [RESEND PATCH v13 0/9] " Waiman Long
2025-10-30 14:12 ` Frederic Weisbecker
[not found] ` <5457560d-f48a-4a99-8756-51b1017a6aab@redhat.com>
2025-10-30 16:09 ` Gabriele Monaco
2025-10-30 16:37 ` Waiman Long
2025-10-30 17:10 ` Frederic Weisbecker
2025-10-30 17:57 ` Waiman Long
2025-10-31 13:48 ` Frederic Weisbecker [this message]
2025-10-31 14:03 ` Gabriele Monaco
2025-10-31 16:14 ` Waiman Long
2025-10-30 17:08 ` Frederic Weisbecker