public inbox for linux-kernel@vger.kernel.org
From: John Hubbard <jhubbard@nvidia.com>
To: Gang Li <ligang.bdlg@bytedance.com>
Cc: <linux-api@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v5 0/2] sched/numa: add per-process numa_balancing
Date: Wed, 9 Nov 2022 19:11:41 -0800	[thread overview]
Message-ID: <49ed07b1-e167-7f94-9986-8e86fb60bb09@nvidia.com> (raw)
In-Reply-To: <20221027025302.45766-1-ligang.bdlg@bytedance.com>

On 10/26/22 19:53, Gang Li wrote:
> # Introduce
> Add PR_NUMA_BALANCING in prctl.
> 
> NUMA balancing induces a large number of page faults while it is
> running, which costs performance. Processes that care about worst-case
> performance therefore need numa balancing disabled. Others, by
> contrast, can accept a temporary performance loss in exchange for
> higher average performance, so enabling numa balancing is better for
> them.
> 
> numa_balancing can currently only be controlled globally via
> /proc/sys/kernel/numa_balancing. Given the cases above, we want to be
> able to enable/disable numa_balancing per process instead.

Hi Gang Li,

Wow, it feels like I'm getting a Christmas present early, this is great
news!

This feature is something we've always wanted for GPUs and Compute
Accelerators, too. Because what happens there is: we might have GPUs in
the system that have mapped CPU memory into their page tables. When
autonuma unmaps the CPU ptes, this triggers (via mmu invalidate
callbacks) an unmapping on each GPU. But GPU mapping and unmapping is
far heavier weight than the corresponding CPU operations.

And so for things such as OpenCL apps that run on a GPU, the only viable
approach is to somehow disable autonuma balancing. And until your series
here, that was a system wide setting, which leads to not being able to
ever have things set up "right", without constantly intervening at the
sysadmin level.

So for the series, please feel free to add:

Acked-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

> 
> Set per-process numa balancing:
> 	prctl(PR_NUMA_BALANCING, PR_SET_NUMA_BALANCING_DISABLE); //disable
> 	prctl(PR_NUMA_BALANCING, PR_SET_NUMA_BALANCING_ENABLE);  //enable
> 	prctl(PR_NUMA_BALANCING, PR_SET_NUMA_BALANCING_DEFAULT); //follow global
> Get numa_balancing state:
> 	prctl(PR_NUMA_BALANCING, PR_GET_NUMA_BALANCING, &ret);
> 	cat /proc/<pid>/status | grep NumaB_mode
> 
> # Unixbench multithread result
> I ran the benchmark 20 times, but there is still measurement error. I
> will run the benchmark more precisely on the next version of this
> patchset.
> +-------------------+----------+
> |       NAME        | OVERHEAD |
> +-------------------+----------+
> | Dhrystone2        | -0.27%   |
> | Whetstone         | -0.17%   |
> | Execl             | -0.92%   |
> | File_Copy_1024    | 0.31%    |
> | File_Copy_256     | -1.96%   |
> | File_Copy_4096    | 0.40%    |
> | Pipe_Throughput   | -3.08%   |
> | Context_Switching | -1.11%   |
> | Process_Creation  | 3.24%    |
> | Shell_Scripts_1   | 0.26%    |
> | Shell_Scripts_8   | 0.32%    |
> | System_Call       | 0.10%    |
> +-------------------+----------+
> | Total             | -0.21%   |
> +-------------------+----------+
> 
> # Changes
> Changes in v5:
> - replace numab_enabled with numa_balancing_mode (Peter Zijlstra)
> - make numa_balancing_enabled and numa_balancing_mode inline (Peter Zijlstra)
> - use static_branch_inc/dec instead of static_branch_enable/disable (Peter Zijlstra)
> - delete CONFIG_NUMA_BALANCING in task_tick_fair (Peter Zijlstra)
> - reword commit, use imperative mood (Bagas Sanjaya)
> - Unixbench overhead result
> 
> Changes in v4:
> - code clean: add wrapper function `numa_balancing_enabled`
> 
> Changes in v3:
> - Fix compile error.
> 
> Changes in v2:
> - Now PR_NUMA_BALANCING supports three states: enabled, disabled, default.
>    enabled and disabled ignore the global setting, and default follows the
>    global setting.
> 
> Gang Li (2):
>    sched/numa: use static_branch_inc/dec for sched_numa_balancing
>    sched/numa: add per-process numa_balancing
> 
>   Documentation/filesystems/proc.rst   |  2 ++
>   fs/proc/task_mmu.c                   | 20 ++++++++++++
>   include/linux/mm_types.h             |  3 ++
>   include/linux/sched/numa_balancing.h | 45 ++++++++++++++++++++++++++
>   include/uapi/linux/prctl.h           |  7 +++++
>   kernel/fork.c                        |  4 +++
>   kernel/sched/core.c                  | 26 +++++++--------
>   kernel/sched/fair.c                  |  9 +++---
>   kernel/sys.c                         | 47 ++++++++++++++++++++++++++++
>   mm/mprotect.c                        |  6 ++--
>   10 files changed, 150 insertions(+), 19 deletions(-)
> 


Thread overview: 5+ messages
2022-10-27  2:53 [PATCH v5 0/2] sched/numa: add per-process numa_balancing Gang Li
2022-10-27  2:53 ` [PATCH v5 1/2] sched/numa: use static_branch_inc/dec for sched_numa_balancing Gang Li
2022-10-27  2:53 ` [PATCH v5 2/2] sched/numa: add per-process numa_balancing Gang Li
2022-11-10  3:11 ` John Hubbard [this message]
2022-11-16 20:45 ` [PATCH v5 0/2] " John Hubbard
