From: Ming Lei <ming.lei@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
Jens Axboe <axboe@kernel.dk>,
Wangyang Guo <wangyang.guo@intel.com>
Subject: Re: [PATCH] lib/group_cpus: fix cross-NUMA CPU assignment in group_cpus_evenly
Date: Mon, 22 Dec 2025 21:50:13 +0800
Message-ID: <aUlMlf8P5xYOOsWr@fedora>
In-Reply-To: <20251221112354.3a0ee9e1824f2cac9572d170@linux-foundation.org>
On Sun, Dec 21, 2025 at 11:23:54AM -0800, Andrew Morton wrote:
> On Mon, 20 Oct 2025 20:46:46 +0800 Ming Lei <ming.lei@redhat.com> wrote:
>
> > When numgrps > nodes, group_cpus_evenly() can incorrectly assign CPUs
> > from different NUMA nodes to the same group due to the wrapping logic.
> > This causes poor block IO performance because of remote IO completion.
> > It can be avoided completely when `numgrps > nodes`, because each
> > NUMA node may include more CPUs than a group holds.
>
> Please quantify "poor block IO performance", to help people understand
> the userspace-visible effect of this change.
It is usually a bug: fast NVMe IO performance may drop to 1/2 or 1/3 with
remote completion. The queue mapping shouldn't span CPUs from different
NUMA nodes when nr_queues >= nr_nodes.
>
> > The issue occurs when curgrp reaches last_grp and wraps to 0. This causes
> > CPUs from later-processed nodes to be added to groups that already contain
> > CPUs from earlier-processed nodes, violating NUMA locality.
> >
> > Example with 8 NUMA nodes, 16 groups:
> > - Each node gets 2 groups allocated
> > - After processing nodes, curgrp reaches 16
> > - Wrapping to 0 causes CPUs from node N to be added to group 0 which
> > already has CPUs from node 0
> >
> > Fix this by adding find_next_node_group() helper that searches for the
> > next group (starting from 0) that already contains CPUs from the same
> > NUMA node. When wrapping is needed, use this helper instead of blindly
> > wrapping to 0, ensuring CPUs are only added to groups within the same
> > NUMA node.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > lib/group_cpus.c | 28 +++++++++++++++++++++++++---
>
> The patch overlaps (a lot) with Wangyang Guo's "lib/group_cpus: make
> group CPU cluster aware". I did a lot of surgery but got stuck on the
> absence of node_to_cpumask, so I guess the patch has bitrotted.
>
> Please update the changelog as above and redo this patch against
> Wangyang's patch (which will be in linux-next very soon).
Please ignore this patch for now, because I can't reproduce the original
issue on either v6.18 or v6.19-rc.
>
> Also, it would be great if you and Wangyang were to review and test
> each other's changes, thanks.
OK.
Thanks,
Ming
Thread overview: 5+ messages
2025-10-20 12:46 [PATCH] lib/group_cpus: fix cross-NUMA CPU assignment in group_cpus_evenly Ming Lei
2025-10-27 1:07 ` Ming Lei
2025-11-05 3:35 ` Ming Lei
2025-12-21 19:23 ` Andrew Morton
2025-12-22 13:50 ` Ming Lei [this message]