Re: [PATCH v4] sched/numa: add per-process numa_balancing

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Gang Li <ligang.bdlg@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>, Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	linux-api@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v4] sched/numa: add per-process numa_balancing
Date: Thu, 29 Sep 2022 10:45:01 +0200	[thread overview]
Message-ID: <YzVbDbLOYUVNnWRu@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20220929064359.46932-1-ligang.bdlg@bytedance.com>


The alternative to this is ofcourse to have your latency critical
applications use mbind()/set_mempolicy() etc.., because surely, them
being timing critical, they have the infrastructure to do this right?

Because timing critical software doesn't want it's memory spread
randomly, because well random is bad for performance, hmm?

And once numa balancing sees all the memory has an expliciy policy, it
won't touch it.

On Thu, Sep 29, 2022 at 02:43:58PM +0800, Gang Li wrote:

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ef0e6b3e08ff..87215b3776c9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2818,6 +2818,24 @@ void task_numa_free(struct task_struct *p, bool final)
>  	}
>  }
>  
> +inline bool numa_balancing_enabled(struct task_struct *p)

Does this want to be static?

> +{
> +	if (p->mm) {
> +		int numab = p->mm->numab_enabled;
> +
> +		switch (numab) {
> +		case NUMAB_ENABLED:
> +			return true;
> +		case NUMAB_DISABLED:
> +			return false;
> +		case NUMAB_DEFAULT:
> +			break;
> +		}
> +	}
> +
> +	return static_branch_unlikely(&sched_numa_balancing);
> +}

Blergh, this sucks. Now you have the unconditional pointer chasing and
cache-misses. The advantage of sched_numa_balancing was that there is no
overhead when disabled.

Also, "numab" is a weird word.

What about something like:

static inline bool numa_balancing_enabled(struct task_struct *p)
{
	if (!static_branch_unlikely(&sched_numa_balancing))
		return false;

	if (p->mm) switch (p->mm->numa_balancing_mode) {
	case NUMA_BALANCING_ENABLED:
		return true;
	case NUMA_BALANCING_DISABLED:
		return false
	default:
		break;
	}

	return sysctl_numa_balancing_mode;
}

( Note how that all following the existing 'numa_balancing' wording
  without inventing weird new words. )

And then you frob the sysctl and prctl such that sched_numa_balancing
and sysctl_numa_balancing_mode are not tied together just so.
Specifically, I'm thinking you should use static_branch_inc() to count
how many enables you have, one for the default and one for each prctl().
Then it all just works.

> @@ -11581,8 +11599,10 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
>  		entity_tick(cfs_rq, se, queued);
>  	}
>  
> -	if (static_branch_unlikely(&sched_numa_balancing))
> +#ifdef CONFIG_NUMA_BALANCING
> +	if (numa_balancing_enabled(curr))
>  		task_tick_numa(rq, curr);
> +#endif
>  
>  	update_misfit_status(curr, rq);
>  	update_overutilized_status(task_rq(curr));

Surely you can make that #ifdef go away without much effort.

> diff --git a/kernel/sys.c b/kernel/sys.c
> index 8a6432465dc5..11720a35455a 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -59,6 +59,7 @@
>  #include <linux/sched/coredump.h>
>  #include <linux/sched/task.h>
>  #include <linux/sched/cputime.h>
> +#include <linux/sched/numa_balancing.h>
>  #include <linux/rcupdate.h>
>  #include <linux/uidgid.h>
>  #include <linux/cred.h>
> @@ -2101,6 +2102,23 @@ static int prctl_set_auxv(struct mm_struct *mm, unsigned long addr,
>  	return 0;
>  }
>  
> +#ifdef CONFIG_NUMA_BALANCING
> +static int prctl_pid_numa_balancing_write(int numa_balancing)
> +{
> +	if (numa_balancing != PR_SET_NUMAB_DEFAULT
> +	    && numa_balancing != PR_SET_NUMAB_DISABLED
> +	    && numa_balancing != PR_SET_NUMAB_ENABLED)
> +		return -EINVAL;

Operators go at the end of the line.

> +	current->mm->numab_enabled = numa_balancing;
> +	return 0;
> +}

     prev parent reply	other threads:[~2022-09-29  8:45 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-29  6:43 [PATCH v4] sched/numa: add per-process numa_balancing Gang Li
2022-09-29  8:22 ` Bagas Sanjaya
2022-09-29  8:45 ` Peter Zijlstra [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YzVbDbLOYUVNnWRu@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=corbet@lwn.net \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=ligang.bdlg@bytedance.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox