public inbox for linux-kernel@vger.kernel.org
From: Mel Gorman <mgorman@techsingularity.net>
To: kernel test robot <oliver.sang@intel.com>
Cc: 0day robot <lkp@intel.com>, LKML <linux-kernel@vger.kernel.org>,
	lkp@lists.01.org, ying.huang@intel.com, feng.tang@intel.com,
	zhengjun.xing@linux.intel.com, fengwei.yin@intel.com,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	yu.c.chen@intel.com
Subject: Re: [sched/numa]  bb2dee337b:  unixbench.score -11.2% regression
Date: Wed, 18 May 2022 16:22:58 +0100
Message-ID: <20220518152258.GR3441@techsingularity.net>
In-Reply-To: <20220518092414.GA15472@xsang-OptiPlex-9020>

On Wed, May 18, 2022 at 05:24:14PM +0800, kernel test robot wrote:
> 
> 
> Greeting,
> 
> FYI, we noticed a -11.2% regression of unixbench.score due to commit:
> 
> 
> commit: bb2dee337bd7d314eb7c7627e1afd754f86566bc ("[PATCH 3/4] sched/numa: Apply imbalance limitations consistently")
> url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Mitigate-inconsistent-NUMA-imbalance-behaviour/20220511-223233
> base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git d70522fc541224b8351ac26f4765f2c6268f8d72
> patch link: https://lore.kernel.org/lkml/20220511143038.4620-4-mgorman@techsingularity.net
> 
> in testcase: unixbench
> on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz with 256G memory
> with following parameters:
> 
> 	runtime: 300s
> 	nr_task: 1
> 	test: shell8
> 	cpufreq_governor: performance
> 	ucode: 0xd000331
> 
> test-description: UnixBench is the original BYTE UNIX benchmark suite aims to test performance of Unix-like system.
> test-url: https://github.com/kdlucas/byte-unixbench

I think what is happening for unixbench is that it prefers to run all
instances on a local node if possible. shell8 creates 8 scripts, each
of which spawns more processes. The total number of tasks may exceed the
allowed imbalance at fork time of 16 tasks. Some spill over to a remote
node and, as they are using files, some accesses are remote and
performance suffers. It's not memory bandwidth bound but is sensitive to
locality. The stats somewhat support this idea:

>      83590 ± 13%     -73.7%      21988 ± 32%  numa-meminfo.node0.AnonHugePages
>     225657 ± 18%     -58.0%      94847 ± 18%  numa-meminfo.node0.AnonPages
>     231652 ± 17%     -55.3%     103657 ± 16%  numa-meminfo.node0.AnonPages.max
>     234525 ± 17%     -55.5%     104341 ± 18%  numa-meminfo.node0.Inactive
>     234397 ± 17%     -55.5%     104267 ± 18%  numa-meminfo.node0.Inactive(anon)
>      11724 ±  7%     +17.5%      13781 ±  5%  numa-meminfo.node0.KernelStack
>       4472 ± 34%    +117.1%       9708 ± 31%  numa-meminfo.node0.PageTables
>      15239 ± 75%    +401.2%      76387 ± 10%  numa-meminfo.node1.AnonHugePages
>      67256 ± 63%    +206.3%     205994 ±  6%  numa-meminfo.node1.AnonPages
>      73568 ± 58%    +193.1%     215644 ±  6%  numa-meminfo.node1.AnonPages.max
>      75737 ± 53%    +183.9%     215053 ±  6%  numa-meminfo.node1.Inactive
>      75709 ± 53%    +183.9%     214971 ±  6%  numa-meminfo.node1.Inactive(anon)
>       3559 ± 42%    +187.1%      10216 ±  8%  numa-meminfo.node1.PageTables

Less memory is used on node 0 and more on node 1, so the workload is
getting split across the nodes.
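
To put a rough number on the split, the AnonPages rows quoted above can
be turned into a per-node share. This is only a sketch for reading the
table; node0_share_pct() is a made-up helper name and not something from
the kernel or from lkp.

```c
/*
 * Share of a two-node total that sits on node 0, as a whole percentage
 * (rounded down). Hypothetical helper for reading the quoted stats only.
 */
static unsigned int node0_share_pct(unsigned long node0, unsigned long node1)
{
	unsigned long total = node0 + node1;

	/* Avoid dividing by zero if both counters are empty. */
	if (total == 0)
		return 0;

	return (unsigned int)(node0 * 100 / total);
}
```

Feeding it the AnonPages values from the report, node0_share_pct(225657,
67256) gives 77 for the base kernel and node0_share_pct(94847, 205994)
gives 31 with the patch, so most anonymous memory moved from node 0 to
node 1.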

> In addition to that, the commit also has significant impact on the following tests:
> 
> +------------------+-------------------------------------------------------------------------------------+
> | testcase: change | fsmark: fsmark.files_per_sec -21.5% regression                                      |
> | test machine     | 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
> | test parameters  | cpufreq_governor=performance                                                        |
> |                  | disk=1SSD                                                                           |
> |                  | filesize=8K                                                                         |
> |                  | fs=f2fs                                                                             |
> |                  | iterations=8                                                                        |
> |                  | nr_directories=16d                                                                  |
> |                  | nr_files_per_directory=256fpd                                                       |
> |                  | nr_threads=4                                                                        |
> |                  | sync_method=fsyncBeforeClose                                                        |
> |                  | test_size=72G                                                                       |
> |                  | ucode=0x500320a                                                                     |
> +------------------+-------------------------------------------------------------------------------------+
> 

The stats are less clear-cut for this one, but it's likely getting split
too when it would have preferred locality. It's curious that f2fs is
affected but maybe other filesystems were too.

In both cases, the workloads are not memory bandwidth limited so they
prefer to stack on one node. Previously, because they were cache hot,
the load balancer would avoid splitting them apart if there were other
candidates available.

This is a tradeoff between loads that want to stick on one node for
locality because they are not bandwidth limited and workloads that are
memory bandwidth limited and want to spread wide. We can't tell what
type of workload it is at fork time.
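
As a toy illustration of why fork time cannot tell the two apart, the
sketch below places tasks purely by count against a limit. It is a
simplified userspace model, not the scheduler code: stays_local() and
count_spilled() are made-up names, tasks never exit, there is no load
balancer, and the limit of 16 is just the figure mentioned earlier in
this mail.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for a fork-time imbalance check: the new task
 * stays on the local node if, counting itself, the node would hold at
 * most 'limit' running tasks. Nothing here knows what the task will do.
 */
static bool stays_local(int local_running, int limit)
{
	return local_running + 1 <= limit;
}

/*
 * Fork nr_tasks one at a time and return how many end up on the remote
 * node. Simplification: tasks never exit and are never migrated later.
 */
static int count_spilled(int nr_tasks, int limit)
{
	int local = 0;
	int remote = 0;

	for (int i = 0; i < nr_tasks; i++) {
		if (stays_local(local, limit))
			local++;
		else
			remote++;
	}

	return remote;
}
```

With a limit of 16, count_spilled(8, 16) is 0 but count_spilled(24, 16)
is 8. The decision depends only on how many tasks are running, so a
locality-sensitive workload and a bandwidth-bound one are spread in
exactly the same way.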

Given there is no crystal ball and it's a tradeoff, I think it's better
to be consistent and use similar logic at both fork time and runtime even
if it doesn't have universal benefit.

-- 
Mel Gorman
SUSE Labs


Thread overview: 21+ messages
2022-05-11 14:30 [PATCH 0/4] Mitigate inconsistent NUMA imbalance behaviour Mel Gorman
2022-05-11 14:30 ` [PATCH 1/4] sched/numa: Initialise numa_migrate_retry Mel Gorman
2022-05-11 14:30 ` [PATCH 2/4] sched/numa: Do not swap tasks between nodes when spare capacity is available Mel Gorman
2022-05-11 14:30 ` [PATCH 3/4] sched/numa: Apply imbalance limitations consistently Mel Gorman
2022-05-18  9:24   ` [sched/numa] bb2dee337b: unixbench.score -11.2% regression kernel test robot
2022-05-18 15:22     ` Mel Gorman [this message]
2022-05-19  7:54       ` ying.huang
2022-05-20  6:44         ` [LKP] " Ying Huang
2022-05-18  9:31   ` [PATCH 3/4] sched/numa: Apply imbalance limitations consistently Peter Zijlstra
2022-05-18 10:46     ` Mel Gorman
2022-05-18 13:59       ` Peter Zijlstra
2022-05-18 15:39         ` Mel Gorman
2022-05-11 14:30 ` [PATCH 4/4] sched/numa: Adjust imb_numa_nr to a better approximation of memory channels Mel Gorman
2022-05-18  9:41   ` Peter Zijlstra
2022-05-18 11:15     ` Mel Gorman
2022-05-18 14:05       ` Peter Zijlstra
2022-05-18 17:06         ` Mel Gorman
2022-05-19  9:29           ` Mel Gorman
2022-05-20  4:58 ` [PATCH 0/4] Mitigate inconsistent NUMA imbalance behaviour K Prateek Nayak
2022-05-20 10:18   ` Mel Gorman
2022-05-20 15:17     ` K Prateek Nayak
