Date: Thu, 22 Jun 2023 10:02:44 +0100
From: Ionela Voinescu
To: Ricardo Neri
Cc: "Peter Zijlstra (Intel)", Juri Lelli, Vincent Guittot, Ricardo Neri,
 "Ravi V. Shankar", Ben Segall, Daniel Bristot de Oliveira,
 Dietmar Eggemann, Len Brown, Mel Gorman, "Rafael J. Wysocki",
 Srinivas Pandruvada, Steven Rostedt, Tim Chen, Valentin Schneider,
 Lukasz Luba, Zhao Liu, "Yuan, Perry", x86@kernel.org,
 "Joel Fernandes (Google)", linux-kernel@vger.kernel.org,
 linux-pm@vger.kernel.org, "Tim C. Chen", Zhao Liu
Subject: Re: [PATCH v4 07/24] sched/fair: Compute IPC class scores for load balancing
References: <20230613042422.5344-1-ricardo.neri-calderon@linux.intel.com>
 <20230613042422.5344-8-ricardo.neri-calderon@linux.intel.com>
In-Reply-To: <20230613042422.5344-8-ricardo.neri-calderon@linux.intel.com>

On Monday 12 Jun 2023 at 21:24:05 (-0700), Ricardo Neri wrote:
> When using IPCC scores to break ties between two scheduling groups, it is
> necessary to consider both the current score and the score that would
> result after load balancing.
> 
> Compute the combined IPC class score of a scheduling group and the local
> scheduling group. Compute both the current score and the prospective score.
> 
> Collect IPCC statistics only for asym_packing and fully_busy scheduling
> groups. These are the only cases that use IPCC scores.
> 
> These IPCC statistics are used during idle load balancing. The candidate
> scheduling group will have one fewer busy CPU after load balancing. This
> observation is important for cores with SMT support.
> 
> The IPCC score of scheduling groups composed of SMT siblings needs to
> consider that the siblings share CPU resources. When computing the total
> IPCC score of the scheduling group, divide the score of each sibling by
> the number of busy siblings.
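
To make the SMT adjustment concrete (made-up scores): if a core has two
busy siblings whose current tasks score 600 and 400, the group total
counts 600/2 + 400/2 = 500 rather than 1000, reflecting that the siblings
share the core's throughput; a lone busy sibling keeps its full score.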
> 
> Cc: Ben Segall
> Cc: Daniel Bristot de Oliveira
> Cc: Dietmar Eggemann
> Cc: Ionela Voinescu
> Cc: Joel Fernandes (Google)
> Cc: Len Brown
> Cc: Lukasz Luba
> Cc: Mel Gorman
> Cc: Perry Yuan
> Cc: Rafael J. Wysocki
> Cc: Srinivas Pandruvada
> Cc: Steven Rostedt
> Cc: Tim C. Chen
> Cc: Valentin Schneider
> Cc: Zhao Liu
> Cc: x86@kernel.org
> Cc: linux-pm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Ricardo Neri
> ---
> Changes since v3:
>  * None
> 
> Changes since v2:
>  * Also collect IPCC stats for fully_busy sched groups.
>  * Restrict use of IPCC stats to SD_ASYM_PACKING. (Ionela)
>  * Handle errors of arch_get_ipcc_score(). (Ionela)
> 
> Changes since v1:
>  * Implemented cleanups and reworks from PeterZ. I took all his
>    suggestions, except the computation of the IPC score before and after
>    load balancing. We are computing not the average score, but the *total*.
>  * Check for the SD_SHARE_CPUCAPACITY to compute the throughput of the SMT
>    siblings of a physical core.
>  * Used the new interface names.
>  * Reworded commit message for clarity.
> ---
>  kernel/sched/fair.c | 68 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 68 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c0cab5e501b6..a51c65c9335f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9114,6 +9114,8 @@ struct sg_lb_stats {
>  	unsigned long min_score; /* Min(score(rq->curr->ipcc)) */
>  	unsigned short min_ipcc; /* Class of the task with the minimum IPCC score in the rq */
>  	unsigned long sum_score; /* Sum(score(rq->curr->ipcc)) */
> +	long ipcc_score_after; /* Prospective IPCC score after load balancing */
> +	unsigned long ipcc_score_before; /* IPCC score before load balancing */
>  #endif
>  };
>  
> @@ -9452,6 +9454,62 @@ static void update_sg_lb_ipcc_stats(int dst_cpu, struct sg_lb_stats *sgs,
>  	}
>  }
>  
> +static void update_sg_lb_stats_scores(struct sg_lb_stats *sgs,
> +				      struct sched_group *sg,
> +				      struct lb_env *env)
> +{
> +	unsigned long score_on_dst_cpu, before;
> +	int busy_cpus;
> +	long after;
> +
> +	if (!sched_ipcc_enabled())
> +		return;
> +
> +	/*
> +	 * IPCC scores are only useful during idle load balancing. For now,
> +	 * only asym_packing uses IPCC scores.
> +	 */
> +	if (!(env->sd->flags & SD_ASYM_PACKING) ||
> +	    env->idle == CPU_NOT_IDLE)
> +		return;
> +
> +	/*
> +	 * IPCC scores are used to break ties only between these types of
> +	 * groups.
> +	 */
> +	if (sgs->group_type != group_fully_busy &&
> +	    sgs->group_type != group_asym_packing)
> +		return;
> +
> +	busy_cpus = sgs->group_weight - sgs->idle_cpus;
> +
> +	/* No busy CPUs in the group. No tasks to move. */
> +	if (!busy_cpus)
> +		return;
> +
> +	score_on_dst_cpu = arch_get_ipcc_score(sgs->min_ipcc, env->dst_cpu);
> +
> +	/*
> +	 * Do not use IPC scores. sgs::ipcc_score_{after, before} will be zero
> +	 * and not used.
> +	 */
> +	if (IS_ERR_VALUE(score_on_dst_cpu))
> +		return;
> +
> +	before = sgs->sum_score;
> +	after = before - sgs->min_score;

I don't believe this can end up being negative as the sum of all scores
should be higher or equal to the min score, right?

I'm just wondering if ipcc_score_after can be made unsigned long as well,
just for consistency.

> +
> +	/* SMT siblings share throughput. */
> +	if (busy_cpus > 1 && sg->flags & SD_SHARE_CPUCAPACITY) {
> +		before /= busy_cpus;
> +		/* One sibling will become idle after load balance. */
> +		after /= busy_cpus - 1;
> +	}
> +
> +	sgs->ipcc_score_after = after + score_on_dst_cpu;
> +	sgs->ipcc_score_before = before;

Shouldn't the score_on_dst_cpu be added to "after" before being divided
between the SMT siblings?
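
For example (made-up numbers): with sum_score = 1800, min_score = 400,
busy_cpus = 3 and score_on_dst_cpu = 500, the patch computes
ipcc_score_after = (1800 - 400) / 2 + 500 = 1200, whereas adding
score_on_dst_cpu before the division would give (1800 - 400 + 500) / 2 =
950. (The two only differ when the SMT adjustment applies with more than
two busy siblings, since dividing by busy_cpus - 1 = 1 changes nothing.)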

Thanks,
Ionela.

> +}
> +
>  #else /* CONFIG_IPC_CLASSES */
>  static void update_sg_lb_ipcc_stats(int dst_cpu, struct sg_lb_stats *sgs,
>  				    struct rq *rq)
> @@ -9461,6 +9519,13 @@ static void update_sg_lb_ipcc_stats(int dst_cpu, struct sg_lb_stats *sgs,
>  static void init_rq_ipcc_stats(struct sg_lb_stats *sgs)
>  {
>  }
> +
> +static void update_sg_lb_stats_scores(struct sg_lb_stats *sgs,
> +				      struct sched_group *sg,
> +				      struct lb_env *env)
> +{
> +}
> +
>  #endif /* CONFIG_IPC_CLASSES */
>  
>  /**
> @@ -9620,6 +9685,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> 
>  	sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
>  
> +	if (!local_group)
> +		update_sg_lb_stats_scores(sgs, group, env);
> +
>  	/* Computing avg_load makes sense only when group is overloaded */
>  	if (sgs->group_type == group_overloaded)
>  		sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) /
> -- 
> 2.25.1
> 