public inbox for linux-kernel@vger.kernel.org
From: Frederic Weisbecker <frederic@kernel.org>
To: Waiman Long <llong@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org,
	Gabriele Monaco <gmonaco@redhat.com>,
	Anna-Maria Behnsen <anna-maria@linutronix.de>
Subject: Re: [RESEND PATCH v13 0/9] timers: Exclude isolated cpus from timer migration
Date: Thu, 30 Oct 2025 18:08:54 +0100
Message-ID: <aQObphtf6Vbc-XLJ@localhost.localdomain>
In-Reply-To: <5457560d-f48a-4a99-8756-51b1017a6aab@redhat.com>

On Thu, Oct 30, 2025 at 11:37:36AM -0400, Waiman Long wrote:
> On 10/30/25 10:12 AM, Frederic Weisbecker wrote:
> > Hi Waiman,
> > 
> > On Wed, Oct 29, 2025 at 10:56:06PM -0400, Waiman Long wrote:
> > > On 10/20/25 7:27 AM, Gabriele Monaco wrote:
> > > > The timer migration mechanism allows active CPUs to pull timers from
> > > > idle ones to improve the overall idle time. This is, however, undesirable
> > > > when CPU-intensive workloads run on isolated cores, as the algorithm
> > > > would move the timers from housekeeping to isolated cores, negatively
> > > > affecting the isolation.
> > > > 
> > > > Exclude isolated cores from the timer migration algorithm by extending
> > > > the concept of unavailable cores, currently used for offline ones, to
> > > > isolated ones:
> > > > * A core is unavailable if isolated or offline;
> > > > * A core is available if non-isolated and online;
> > > > 
> > > > A core is considered unavailable as isolated if it belongs to:
> > > > * the isolcpus (domain) list
> > > > * an isolated cpuset
> > > > Except if it is:
> > > > * in the nohz_full list (already idle for the hierarchy)
> > > > * the nohz timekeeper core (must be available to handle global timers)
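
To make the rule quoted above concrete, here is a minimal sketch in Python. It is not kernel code and is not taken from the patches: the function name `available_cpus` and its parameters are hypothetical, and CPUs are modelled as plain sets of integers rather than cpumasks. It only restates the availability conditions listed in the cover letter.

```python
def available_cpus(online, isolcpus_domain, cpuset_isolated,
                   nohz_full, timekeeper=None):
    """Return the CPUs that stay available to the timer migration hierarchy.

    All arguments except `timekeeper` are sets of CPU numbers; `timekeeper`
    is a single CPU number or None. These names are illustrative only.
    """
    # A CPU counts as isolated if it is in the isolcpus (domain) list
    # or in an isolated cpuset.
    isolated = set(isolcpus_domain) | set(cpuset_isolated)

    # Exceptions from the cover letter: nohz_full CPUs and the nohz
    # timekeeper are not treated as unavailable.
    exempt = set(nohz_full)
    if timekeeper is not None:
        exempt.add(timekeeper)

    unavailable_as_isolated = isolated - exempt

    # Available means online and not unavailable-as-isolated.
    return set(online) - unavailable_as_isolated


if __name__ == "__main__":
    cpus = available_cpus(
        online=set(range(8)),
        isolcpus_domain={2, 3, 4},
        cpuset_isolated={5},
        nohz_full={4},
        timekeeper=0,
    )
    print(sorted(cpus))
```

In this toy configuration, CPUs 2, 3 and 5 drop out of the hierarchy, while CPU 4 stays because it is also in the nohz_full set.
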
> > > > 
> > > > CPUs are added to the hierarchy during late boot, excluding isolated
> > > > ones. The hierarchy is also adapted when the cpuset isolation changes.
> > > > 
> > > > Due to how the timer migration algorithm works, any CPU that is part of
> > > > the hierarchy can have its global timers pulled by remote CPUs and has
> > > > to pull remote timers itself; only skipping the pulling of remote timers
> > > > would break the logic.
> > > > For this reason, prevent isolated CPUs from pulling remote global
> > > > timers, but also the other way around: any global timer started on an
> > > > isolated CPU will run there. This does not break the concept of
> > > > isolation (global timers don't come from outside the CPU) and, if
> > > > considered inappropriate, can usually be mitigated with other isolation
> > > > techniques (e.g. IRQ pinning).
> > > > 
> > > > This effect was noticed on a 128-core machine running oslat on the
> > > > isolated cores (1-31,33-63,65-95,97-127). The tool monopolises CPUs,
> > > > and the CPU with the lowest count in a timer migration hierarchy (here 1
> > > > and 65) appears as always active and continuously pulls global timers
> > > > from the housekeeping CPUs. This ends up moving driver work (e.g.
> > > > delayed work) to isolated CPUs and causes latency spikes:
> > > > 
> > > > before the change:
> > > > 
> > > >    # oslat -c 1-31,33-63,65-95,97-127 -D 62s
> > > >    ...
> > > >     Maximum:     1203 10 3 4 ... 5 (us)
> > > > 
> > > > after the change:
> > > > 
> > > >    # oslat -c 1-31,33-63,65-95,97-127 -D 62s
> > > >    ...
> > > >     Maximum:      10 4 3 4 3 ... 5 (us)
> > > > 
> > > > The same behaviour was observed with rtla-osnoise-top on a machine with
> > > > as few as 20 cores / 40 threads and isolcpus set to 1-9,11-39.
> > > > 
> > > > The first 5 patches are preparatory work to change the concept of
> > > > online/offline to available/unavailable, keep track of those in a
> > > > separate cpumask, clean up the setting/clearing functions and change a
> > > > function name in cpuset code.
> > > > 
> > > > Patches 6 and 7 adapt isolation and cpuset to prevent domain isolated and
> > > > nohz_full CPUs from covering all CPUs, leaving no housekeeping one. Such
> > > > a configuration would cause problems with the changes introduced in this
> > > > series because no CPU would remain to handle global timers.
> > > > 
> > > > Patch 9 extends the unavailable status to domain isolated CPUs, which
> > > > is the main contribution of the series.
> > > > 
> > > > This series is equivalent to v13 but rebased on v6.18-rc2.
> > > Thomas,
> > > 
> > > This patch series has undergone multiple rounds of review. Do you think it
> > > is good enough to be merged into tip?
> > > 
> > > It does contain some cpuset code, but most of the changes are in the timer
> > > code. So I think it is better to go through the tip tree. It does have some
> > > minor conflicts with the current for-6.19 branch of the cgroup tree, but they
> > > can be easily resolved during the merge.
> > > 
> > > What do you think?
> > Just wait a little, I realize I made a buggy suggestion to Gabriele and
> > a detail needs to be fixed.
> > 
> > My bad...
> 
> OK, I thought you were OK with the timer changes.

I was ok until...just a few days ago, and I should have written about it right
away but you know, being wrong is a process that takes time :o)

> I guess Gabriele will have to send out a new version to address your finding.

Right.

Thanks!

-- 
Frederic Weisbecker
SUSE Labs
