From: Parth Shah <parth@linux.ibm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>,
mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de,
linux-kernel@vger.kernel.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, hdanton@sina.com
Subject: Re: [PATCH v4 2/5] sched/numa: Replace runnable_load_avg by load_avg
Date: Sun, 23 Feb 2020 12:02:14 +0530
Message-ID: <c595c3be-2fbf-4cd3-5c58-7b5faa055d2f@linux.ibm.com>
In-Reply-To: <20200221132715.20648-3-vincent.guittot@linaro.org>
On 2/21/20 6:57 PM, Vincent Guittot wrote:
> Similarly to what has been done for the normal load balancer, we can
> replace runnable_load_avg by load_avg in numa load balancing and track the
> other statistics like the utilization and the number of running tasks to
> get a better view of the current state of a node.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> Reviewed-by: "Dietmar Eggemann <dietmar.eggemann@arm.com>"
> ---
> kernel/sched/fair.c | 102 ++++++++++++++++++++++++++++++--------------
> 1 file changed, 70 insertions(+), 32 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 27450c4ddc81..637f4eb47889 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1473,38 +1473,35 @@ bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
>  	       group_faults_cpu(ng, src_nid) * group_faults(p, dst_nid) * 4;
>  }
>
> -static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq);
> -
> -static unsigned long cpu_runnable_load(struct rq *rq)
> -{
> -	return cfs_rq_runnable_load_avg(&rq->cfs);
> -}
> +/*
> + * 'numa_type' describes the node at the moment of load balancing.
> + */
> +enum numa_type {
> +	/* The node has spare capacity that can be used to run more tasks. */
> +	node_has_spare = 0,
> +	/*
> +	 * The node is fully used and the tasks don't compete for more CPU
> +	 * cycles. Nevertheless, some tasks might wait before running.
> +	 */
> +	node_fully_busy,
> +	/*
> +	 * The node is overloaded and can't provide expected CPU cycles to all
> +	 * tasks.
> +	 */
> +	node_overloaded
> +};
>
>  /* Cached statistics for all CPUs within a node */
>  struct numa_stats {
>  	unsigned long load;
> -
> +	unsigned long util;
>  	/* Total compute capacity of CPUs on a node */
>  	unsigned long compute_capacity;
> +	unsigned int nr_running;
> +	unsigned int weight;
> +	enum numa_type node_type;
>  };
>
> -/*
> - * XXX borrowed from update_sg_lb_stats
> - */
> -static void update_numa_stats(struct numa_stats *ns, int nid)
> -{
> -	int cpu;
> -
> -	memset(ns, 0, sizeof(*ns));
> -	for_each_cpu(cpu, cpumask_of_node(nid)) {
> -		struct rq *rq = cpu_rq(cpu);
> -
> -		ns->load += cpu_runnable_load(rq);
> -		ns->compute_capacity += capacity_of(cpu);
> -	}
> -
> -}
> -
>  struct task_numa_env {
>  	struct task_struct *p;
>
> @@ -1521,6 +1518,47 @@ struct task_numa_env {
>  	int best_cpu;
>  };
>
> +static unsigned long cpu_load(struct rq *rq);
> +static unsigned long cpu_util(int cpu);
> +
> +static inline enum
> +numa_type numa_classify(unsigned int imbalance_pct,
> +			 struct numa_stats *ns)
> +{
> +	if ((ns->nr_running > ns->weight) &&
> +	    ((ns->compute_capacity * 100) < (ns->util * imbalance_pct)))
> +		return node_overloaded;
> +
> +	if ((ns->nr_running < ns->weight) ||
> +	    ((ns->compute_capacity * 100) > (ns->util * imbalance_pct)))
> +		return node_has_spare;
> +
> +	return node_fully_busy;
> +}

I was pondering the cases in which numa_classify() returns node_fully_busy
here. As far as I can tell, it is returned only in boundary cases: when
ns->nr_running == ns->weight and the scaled util is at or above capacity,
or when ns->nr_running > ns->weight and the scaled util is exactly equal
to capacity. From reading the patch-set, I failed to figure out the
implications of this state. Ideally, tasks should be neither pulled to nor
pulled from such a node. Is this what it is used for?
If yes, then should load_too_imbalanced() return false when it finds that
env->dst_stats.node_type == node_fully_busy?
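
To make the cases concrete, here is a small user-space sketch of the
classification. It copies the two tests from numa_classify() in the hunk
above, in the same order. Everything else is an assumption for illustration:
the pared-down struct numa_stats, the classify() helper, and the sample
numbers (4 CPUs of capacity 1024 each, imbalance_pct = 100 so that the scaled
util equals util). It is not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins that mirror the names in the quoted patch; not kernel code. */
enum numa_type { node_has_spare = 0, node_fully_busy, node_overloaded };

struct numa_stats {
	unsigned long util;
	unsigned long compute_capacity;
	unsigned int nr_running;
	unsigned int weight;
};

/* Same two tests, in the same order, as numa_classify() in the patch. */
static enum numa_type numa_classify(unsigned int imbalance_pct,
				    const struct numa_stats *ns)
{
	assert(ns != NULL);

	if ((ns->nr_running > ns->weight) &&
	    ((ns->compute_capacity * 100) < (ns->util * imbalance_pct)))
		return node_overloaded;

	if ((ns->nr_running < ns->weight) ||
	    ((ns->compute_capacity * 100) > (ns->util * imbalance_pct)))
		return node_has_spare;

	return node_fully_busy;
}

/* Hypothetical helper: classify a node from raw numbers. */
static enum numa_type classify(unsigned int pct, unsigned int nr_running,
			       unsigned int weight, unsigned long util,
			       unsigned long capacity)
{
	struct numa_stats ns = {
		.util = util,
		.compute_capacity = capacity,
		.nr_running = nr_running,
		.weight = weight,
	};

	return numa_classify(pct, &ns);
}
```

With weight = 4 and capacity = 4096, classify(100, 4, 4, 4096, 4096) and
classify(100, 4, 4, 5000, 4096) both give node_fully_busy, as does
classify(100, 5, 4, 4096, 4096). One more task than CPUs with util above
capacity, classify(100, 5, 4, 5000, 4096), gives node_overloaded, and
anything with nr_running < weight or util below capacity gives
node_has_spare.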
[...]
> @@ -1556,6 +1594,11 @@ static bool load_too_imbalanced(long src_load, long dst_load,
>  	long orig_src_load, orig_dst_load;
>  	long src_capacity, dst_capacity;
>
> +
> +	/* If dst node has spare capacity, there is no real load imbalance */
> +	if (env->dst_stats.node_type == node_has_spare)
> +		return false;
[...]
- Parth
Thread overview (11+ messages):
2020-02-21 13:27 [PATCH v4 0/5] remove runnable_load_avg and improve group_classify Vincent Guittot
2020-02-21 13:27 ` [PATCH v4 1/5] sched/fair: Reorder enqueue/dequeue_task_fair path Vincent Guittot
2020-02-21 13:27 ` [PATCH v4 2/5] sched/numa: Replace runnable_load_avg by load_avg Vincent Guittot
2020-02-23 6:32 ` Parth Shah [this message]
2020-02-21 13:27 ` [PATCH v4 3/5] sched/pelt: Remove unused runnable load average Vincent Guittot
2020-02-21 13:27 ` [PATCH v4 4/5] sched/pelt: Add a new runnable average signal Vincent Guittot
2020-02-24 10:05 ` Parth Shah
2020-02-21 13:27 ` [PATCH v4 5/5] sched/fair: Take into account runnable_avg to classify group Vincent Guittot
2020-02-21 15:09 ` [PATCH v4 0/5] remove runnable_load_avg and improve group_classify Mel Gorman
2020-02-23 6:08 ` Parth Shah
[not found] ` <20200222055522.9548-1-hdanton@sina.com>
2020-02-24 8:32 ` [PATCH v4 5/5] sched/fair: Take into account runnable_avg to classify group Vincent Guittot