From: Ingo Molnar <mingo@kernel.org>
To: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: peterz@infradead.org, vincent.guittot@linaro.org,
linux-kernel@vger.kernel.org, kprateek.nayak@amd.com,
dietmar.eggemann@arm.com, vschneid@redhat.com,
rostedt@goodmis.org, tglx@linutronix.de,
tim.c.chen@linux.intel.com
Subject: Re: [PATCH 4/4] sched/fair: Remove atomic nr_cpus and use cpumask instead
Date: Mon, 1 Dec 2025 20:58:35 +0100 [thread overview]
Message-ID: <aS3za7X9BLS5rg65@gmail.com> (raw)
In-Reply-To: <20251201183146.74443-5-sshegde@linux.ibm.com>
* Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
> nohz_balance_enter_idle:
> cpumask_set_cpu(cpu, nohz.idle_cpus_mask)
> atomic_inc(&nohz.nr_cpus)
>
> nohz_balance_exit_idle:
> cpumask_clear_cpu(rq->cpu, nohz.idle_cpus_mask)
> atomic_dec(&nohz.nr_cpus)
>
> kick_ilb:
> if (likely(!atomic_read(&nohz.nr_cpus)))
> return;
>
> So, idle_cpus_mask contains the same information. Instead of doing
> costly atomic in large systems, its better to check if cpumask is empty
> or not to make the same decision to trigger idle load balance.
>
> There might be race between cpumask_empty check and set of cpumask in
> the remote CPUs. In such case at next tick idle load balance will be
> triggered. Race of clearing the bit is not a concern, since _nohz_idle_balance
> checks if CPU is idle or not before doing the balance.
>
> cpumask_empty uses ffs. So should not be very costly.
>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> static struct {
> cpumask_var_t idle_cpus_mask;
> - atomic_t nr_cpus;
> int has_blocked; /* Idle CPUS has blocked load */
> int needs_update; /* Newly idle CPUs need their next_balance collated */
> unsigned long next_balance; /* in jiffy units */
> @@ -12450,7 +12449,7 @@ static void nohz_balancer_kick(struct rq *rq)
> * None are in tickless mode and hence no need for NOHZ idle load
> * balancing, do stats update if its due
> */
> - if (unlikely(!atomic_read(&nohz.nr_cpus)))
> + if (unlikely(cpumask_empty(nohz.idle_cpus_mask)))
> goto out;
So the thing is, if the goal is to avoid cacheline
bouncing, this won't fundamentally change the
situation:
> rq->nohz_tick_stopped = 0;
> cpumask_clear_cpu(rq->cpu, nohz.idle_cpus_mask);
> - atomic_dec(&nohz.nr_cpus);
nohz.idle_cpus_mask will be on a single 64-byte
cacheline even on 512 CPU systems, and the
cpumask_clear_cpu() and cpumask_set_cpu() calls will
dirty the cacheline and make it bounce with exactly the
same frequency as the atomic_inc/dec() of nohz.nr_cpus
does today.
From the 0/4 boilerplate description:
> It was noted when running on large systems
> nohz.nr_cpus cacheline was bouncing quite often.
> There is atomic inc/dec and read happening on many
> CPUs at a time and it is possible for this line to
> bounce often.
That the nr_cpus modification is an atomic op doesn't
change the situation much in terms of cacheline
bouncing, because the cacheline dirtying will still
cause comparable levels of bouncing on modern CPUs with
modern cache coherency protocols.
If nr_cpus and nohz.nr_cpus are in separate cachelines,
then this patch might eliminate about half of the
bounces - but AFAICS they are right next to each other,
so unless it's off-stack cpumasks, they should be in
the same cacheline. Half of 'bad bouncing' is still
kinda 'bad bouncing'. :-)
I'm not really objecting to the patch, because it would
reduce cacheline bouncing in the offstack-mask case,
but the explanation isn't very clear about these
details.
Thanks,
Ingo
next prev parent reply other threads:[~2025-12-01 19:58 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-01 18:31 [PATCH 0/4] sched/fair: improve nohz fields for large systems Shrikanth Hegde
2025-12-01 18:31 ` [PATCH 1/4] sched/fair: Move checking for nohz cpus after time check Shrikanth Hegde
2025-12-01 18:31 ` [PATCH 2/4] sched/fair: Change likelyhood of nohz nr_cpus check Shrikanth Hegde
2025-12-01 18:31 ` [PATCH 3/4] sched/fair: Check for blocked task after time check Shrikanth Hegde
2025-12-02 6:26 ` Ingo Molnar
2025-12-02 6:55 ` Shrikanth Hegde
2025-12-01 18:31 ` [PATCH 4/4] sched/fair: Remove atomic nr_cpus and use cpumask instead Shrikanth Hegde
2025-12-01 19:58 ` Ingo Molnar [this message]
2025-12-02 5:29 ` Shrikanth Hegde
2025-12-02 7:54 ` Ingo Molnar
2025-12-02 14:35 ` Shrikanth Hegde
2025-12-02 16:14 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aS3za7X9BLS5rg65@gmail.com \
--to=mingo@kernel.org \
--cc=dietmar.eggemann@arm.com \
--cc=kprateek.nayak@amd.com \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=sshegde@linux.ibm.com \
--cc=tglx@linutronix.de \
--cc=tim.c.chen@linux.intel.com \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox