public inbox for linux-kernel@vger.kernel.org
From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 4/4] sched/numa: Adjust imb_numa_nr to a better approximation of memory channels
Date: Wed, 18 May 2022 18:06:25 +0100	[thread overview]
Message-ID: <20220518170625.GT3441@techsingularity.net> (raw)
In-Reply-To: <YoT9D0YGlWwHQMQi@hirez.programming.kicks-ass.net>

On Wed, May 18, 2022 at 04:05:03PM +0200, Peter Zijlstra wrote:
> On Wed, May 18, 2022 at 12:15:39PM +0100, Mel Gorman wrote:
> 
> > I'm not aware of how it can be done in-kernel on a cross architectural
> > basis. Reading through the arch manual, it states how many channels are
> > in a given processor family and it's available during memory check errors
> > (apparently via the EDAC driver). It's sometimes available via PMUs but
> > I couldn't find a place where it's generically available for topology.c
> > that would work on all x86-64 machines let alone every other architecture.
> 
> So provided it is something we want (below) we can always start an arch
> interface and fill it out where needed.
> 

It could start with a function returning a fixed value that architectures
can override, but discovering and wiring it all up might be a deep rabbit
hole. The most straightforward approach would be based on CPU family and
model, but that is time-consuming to maintain, and it gets fuzzy on
something like PowerKVM where channel details are hidden.

> > It's not even clear if SMBIOS was parsed in early boot whether
> 
> We can always rebuild topology / update variables slightly later in
> boot.
> 
> > it's a
> > good idea. It could result in different imbalance thresholds for each
> > NUMA domain or weird corner cases where asymmetric NUMA node populations
> > would result in run-to-run variances that are difficult to analyse.
> 
> Yeah, maybe. OTOH having a magic value that's guestimated based on
> hardware of the day is something that'll go bad any moment as well.
> 
> I'm not too worried about run-to-run since people don't typically change
> DIMM population over a reboot, but yes, there's always going to be
> corner cases. Same with a fixed value though, that's also going to be
> wrong.
> 

By run-to-run, I mean just running the same workload in a loop and
not rebooting between runs. If there are differences in how nodes are
populated, there will be some run-to-run variance based purely on what
node the workload started on because they will have different "allowed
imbalance" thresholds.

I'm running the tests to recheck exactly how much impact this patch has
on the peak performance. It takes a few hours so I won't have anything
until tomorrow.

Initially "get peak performance" and "stabilise run-to-run variances"
were my objectives. This series only aimed at the peak performance for a
finish as allowed NUMA imbalance was not the sole cause of the problem.
I still haven't spent time figuring out why c6f886546cb8 ("sched/fair:
Trigger the update of blocked load on newly idle cpu") made such a big
difference to variability.

-- 
Mel Gorman
SUSE Labs


Thread overview: 22+ messages
2022-05-11 14:30 [PATCH 0/4] Mitigate inconsistent NUMA imbalance behaviour Mel Gorman
2022-05-11 14:30 ` [PATCH 1/4] sched/numa: Initialise numa_migrate_retry Mel Gorman
2022-05-11 14:30 ` [PATCH 2/4] sched/numa: Do not swap tasks between nodes when spare capacity is available Mel Gorman
2022-05-11 14:30 ` [PATCH 3/4] sched/numa: Apply imbalance limitations consistently Mel Gorman
2022-05-18  9:24   ` [sched/numa] bb2dee337b: unixbench.score -11.2% regression kernel test robot
2022-05-18 15:22     ` Mel Gorman
2022-05-19  7:54       ` ying.huang
2022-05-20  6:44         ` [LKP] " Ying Huang
2022-05-18  9:31   ` [PATCH 3/4] sched/numa: Apply imbalance limitations consistently Peter Zijlstra
2022-05-18 10:46     ` Mel Gorman
2022-05-18 13:59       ` Peter Zijlstra
2022-05-18 15:39         ` Mel Gorman
2022-05-11 14:30 ` [PATCH 4/4] sched/numa: Adjust imb_numa_nr to a better approximation of memory channels Mel Gorman
2022-05-18  9:41   ` Peter Zijlstra
2022-05-18 11:15     ` Mel Gorman
2022-05-18 14:05       ` Peter Zijlstra
2022-05-18 17:06         ` Mel Gorman [this message]
2022-05-19  9:29           ` Mel Gorman
2022-05-20  4:58 ` [PATCH 0/4] Mitigate inconsistent NUMA imbalance behaviour K Prateek Nayak
2022-05-20 10:18   ` Mel Gorman
2022-05-20 15:17     ` K Prateek Nayak
  -- strict thread matches above, loose matches on Subject: below --
2022-05-20 10:35 [PATCH v2 " Mel Gorman
2022-05-20 10:35 ` [PATCH 4/4] sched/numa: Adjust imb_numa_nr to a better approximation of memory channels Mel Gorman
