From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5f8584e4-d180-4f65-ab42-9b0348b703d5@linux.ibm.com>
Date: Thu, 5 Dec 2024 23:34:24 +0530
X-Mailing-List: linux-s390@vger.kernel.org
MIME-Version: 1.0
Subject: Re: [RFC PATCH 1/2] sched/fair: introduce new scheduler group type group_parked
To: Tobias Huschle, linux-kernel@vger.kernel.org
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
References: <20241204112149.25872-1-huschle@linux.ibm.com>
 <20241204112149.25872-2-huschle@linux.ibm.com>
From: Shrikanth Hegde
In-Reply-To: <20241204112149.25872-2-huschle@linux.ibm.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 12/4/24 16:51, Tobias Huschle wrote:
> A parked CPU is considered to be flagged as unsuitable to process
> workload at the moment, but might become usable at any time, depending
> on the necessity for additional computation power and/or the available
> capacity of the underlying hardware.
>
> A scheduler group is considered to be parked if it only contains
> parked CPUs. A parked scheduler group is considered to be busier than
> another if it runs more tasks than the other parked scheduler group.
>
> Indicators of whether a CPU should be parked depend on the underlying
> hardware and must be considered architecture dependent. Therefore the
> check whether a CPU is parked is architecture specific. For
> architectures not relying on this feature, the check is a NOP.
>
> This is more efficient and non-disruptive compared to CPU hotplug in
> environments where such changes can be necessary on a frequent basis.
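
One general remark before the inline comments: I assume an architecture
that wants this would override the hook the same way the arch_scale_*()
hooks are overridden, i.e. roughly like the sketch below (untested;
cpu_parked_mask is a made-up per-arch variable, the real source of
truth would presumably be hypervisor topology information):

/* e.g. in the arch's asm/topology.h (hypothetical) */
extern struct cpumask cpu_parked_mask;

#define arch_cpu_parked arch_cpu_parked
static inline bool arch_cpu_parked(int cpu)
{
        /* made-up mask, updated elsewhere from hypervisor hints */
        return cpumask_test_cpu(cpu, &cpu_parked_mask);
}

If that is the intended shape, a line about it in the changelog or the
kernel-doc below would help.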
> 
> Signed-off-by: Tobias Huschle
> ---
>  include/linux/sched/topology.h |  20 ++++++
>  kernel/sched/core.c            |  10 ++-
>  kernel/sched/fair.c            | 122 ++++++++++++++++++++++++++-------
>  3 files changed, 127 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index 4237daa5ac7a..cfe3c59bc329 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -270,6 +270,26 @@ unsigned long arch_scale_cpu_capacity(int cpu)
>  }
>  #endif
>  
> +#ifndef arch_cpu_parked
> +/**
> + * arch_cpu_parked - Check if a given CPU is currently parked.
> + *
> + * A parked CPU cannot run any kind of workload since underlying
> + * physical CPU should not be used at the moment.
> + *
> + * @cpu: the CPU in question.
> + *
> + * By default assume CPU is not parked
> + *
> + * Return: Parked state of CPU
> + */
> +static __always_inline
> +unsigned long arch_cpu_parked(int cpu)

bool instead?
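
i.e. since this is a true/false predicate, maybe (trivial sketch):

static __always_inline
bool arch_cpu_parked(int cpu)
{
        return false;
}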
> +{
> +        return false;
> +}
> +#endif
> +
>  #ifndef arch_scale_hw_pressure
>  static __always_inline
>  unsigned long arch_scale_hw_pressure(int cpu)
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1dee3f5ef940..8f9aeb97c396 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2437,7 +2437,7 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
>  
>          /* Non kernel threads are not allowed during either online or offline. */
>          if (!(p->flags & PF_KTHREAD))
> -                return cpu_active(cpu);
> +                return !arch_cpu_parked(cpu) && cpu_active(cpu);
>  
>          /* KTHREAD_IS_PER_CPU is always allowed. */
>          if (kthread_is_per_cpu(p))
> @@ -2447,6 +2447,10 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
>          if (cpu_dying(cpu))
>                  return false;
>  
> +        /* CPU should be avoided at the moment */
> +        if (arch_cpu_parked(cpu))
> +                return false;
> +
>          /* But are allowed during online. */
>          return cpu_online(cpu);
>  }
> @@ -3924,6 +3928,10 @@ static inline bool ttwu_queue_cond(struct task_struct *p, int cpu)
>          if (task_on_scx(p))
>                  return false;
>  
> +        /* The task should not be queued onto a parked CPU. */
> +        if (arch_cpu_parked(cpu))
> +                return false;
> +

When it comes here, the CPU is likely not parked already, since the
wakeup path has those checks.

>          /*
>           * Do not complicate things with the async wake_list while the CPU is
>           * in hotplug state.
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4283c818bbd1..fa1c19d285de 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7415,6 +7415,9 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>  {
>          int target = nr_cpumask_bits;
>  
> +        if (arch_cpu_parked(target))
> +                return prev_cpu;
> +
>          if (sched_feat(WA_IDLE))
>                  target = wake_affine_idle(this_cpu, prev_cpu, sync);
>  
> @@ -7454,6 +7457,9 @@ sched_balance_find_dst_group_cpu(struct sched_group *group, struct task_struct *p
>          for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
>                  struct rq *rq = cpu_rq(i);
>  
> +                if (arch_cpu_parked(i))
> +                        continue;
> +
>                  if (!sched_core_cookie_match(rq, p))
>                          continue;
>  
> @@ -7546,10 +7552,14 @@ static inline int sched_balance_find_dst_cpu(struct sched_domain *sd, struct task_struct *p
>          return new_cpu;
>  }
>  
> +static inline bool is_idle_cpu_allowed(int cpu)
> +{
> +        return !arch_cpu_parked(cpu) && (available_idle_cpu(cpu) || sched_idle_cpu(cpu));
> +}

How about adding the below code instead? It could simplify things quite
a bit, no? sched_idle_rq() might need the same check as well, though.

+++ b/kernel/sched/syscalls.c
@@ -214,6 +214,9 @@ int idle_cpu(int cpu)
                 return 0;
 #endif
 
+        if (arch_cpu_parked(cpu))
+                return 0;
+
         return 1;
 }

> +
>  static inline int __select_idle_cpu(int cpu, struct task_struct *p)
>  {
> -        if ((available_idle_cpu(cpu) || sched_idle_cpu(cpu)) &&
> -            sched_cpu_cookie_match(cpu_rq(cpu), p))
> +        if (is_idle_cpu_allowed(cpu) && sched_cpu_cookie_match(cpu_rq(cpu), p))
>                  return cpu;
>  
>          return -1;
> @@ -7657,7 +7667,7 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
>                   */
>                  if (!cpumask_test_cpu(cpu, sched_domain_span(sd)))
>                          continue;
> -                if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> +                if (is_idle_cpu_allowed(cpu))
>                          return cpu;
>          }
>  
> @@ -7779,7 +7789,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
>          for_each_cpu_wrap(cpu, cpus, target) {
>                  unsigned long cpu_cap = capacity_of(cpu);
>  
> -                if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
> +                if (!is_idle_cpu_allowed(cpu))
>                          continue;
>  
>                  fits = util_fits_cpu(task_util, util_min, util_max, cpu);
> @@ -7850,7 +7860,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>           */
>          lockdep_assert_irqs_disabled();
>  
> -        if ((available_idle_cpu(target) || sched_idle_cpu(target)) &&
> +        if (is_idle_cpu_allowed(target) &&
>              asym_fits_cpu(task_util, util_min, util_max, target))
>                  return target;
>  
> @@ -7858,7 +7868,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>           * If the previous CPU is cache affine and idle, don't be stupid:
>           */
>          if (prev != target && cpus_share_cache(prev, target) &&
> -            (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
> +            is_idle_cpu_allowed(prev) &&
>              asym_fits_cpu(task_util, util_min, util_max, prev)) {
>  
>                  if (!static_branch_unlikely(&sched_cluster_active) ||
> @@ -7890,7 +7900,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>          if (recent_used_cpu != prev &&
>              recent_used_cpu != target &&
>              cpus_share_cache(recent_used_cpu, target) &&
> -            (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
> +            is_idle_cpu_allowed(recent_used_cpu) &&
>              cpumask_test_cpu(recent_used_cpu, p->cpus_ptr) &&
>              asym_fits_cpu(task_util, util_min, util_max, recent_used_cpu)) {
>  
> @@ -9198,7 +9208,12 @@ enum group_type {
>           * The CPU is overloaded and can't provide expected CPU cycles to all
>           * tasks.
>           */
> -        group_overloaded
> +        group_overloaded,
> +        /*
> +         * The CPU should be avoided as it can't provide expected CPU cycles
> +         * even for small amounts of workload.
> +         */
> +        group_parked
>  };
>  
>  enum migration_type {
> @@ -9498,7 +9513,7 @@ static int detach_tasks(struct lb_env *env)
>                   * Source run queue has been emptied by another CPU, clear
>                   * LBF_ALL_PINNED flag as we will not test any task.
>                   */
> -                if (env->src_rq->nr_running <= 1) {
> +                if (env->src_rq->nr_running <= 1 && !arch_cpu_parked(env->src_cpu)) {
>                          env->flags &= ~LBF_ALL_PINNED;
>                          return 0;
>                  }
> @@ -9511,7 +9526,7 @@ static int detach_tasks(struct lb_env *env)
>                   * We don't want to steal all, otherwise we may be treated likewise,
>                   * which could at worst lead to a livelock crash.
>                   */
> -                if (env->idle && env->src_rq->nr_running <= 1)
> +                if (env->idle && env->src_rq->nr_running <= 1 && !arch_cpu_parked(env->src_cpu))
>                          break;
>  
>                  env->loop++;
> @@ -9870,6 +9885,8 @@ struct sg_lb_stats {
>          unsigned long group_runnable;        /* Total runnable time over the CPUs of the group */
>          unsigned int sum_nr_running;         /* Nr of all tasks running in the group */
>          unsigned int sum_h_nr_running;       /* Nr of CFS tasks running in the group */
> +        unsigned int sum_nr_parked;
> +        unsigned int parked_cpus;

Could you please explain why you need both of these? Is it to identify
the group with the most parked CPUs? Maybe a comment is needed.
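
For example, something along these lines -- the wording is just my
guess at the intent, please correct it if I got it wrong:

+        unsigned int sum_nr_parked;  /* Nr of tasks running on parked CPUs */
+        unsigned int parked_cpus;    /* Nr of parked CPUs in the group */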
>          unsigned int idle_cpus;              /* Nr of idle CPUs in the group */
>          unsigned int group_weight;
>          enum group_type group_type;
> @@ -10127,6 +10144,9 @@ group_type group_classify(unsigned int imbalance_pct,
>                                            struct sched_group *group,
>                                            struct sg_lb_stats *sgs)
>  {
> +        if (sgs->parked_cpus)
> +                return group_parked;
> +
>          if (group_is_overloaded(imbalance_pct, sgs))
>                  return group_overloaded;
>  
> @@ -10328,10 +10348,15 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>                  sgs->nr_numa_running += rq->nr_numa_running;
>                  sgs->nr_preferred_running += rq->nr_preferred_running;
>  #endif
> +
> +                if (rq->cfs.h_nr_running) {
> +                        sgs->parked_cpus += arch_cpu_parked(i);
> +                        sgs->sum_nr_parked += arch_cpu_parked(i) * rq->cfs.h_nr_running;
> +                }
>                  /*
>                   * No need to call idle_cpu() if nr_running is not 0
>                   */
> -                if (!nr_running && idle_cpu(i)) {
> +                if (!nr_running && idle_cpu(i) && !arch_cpu_parked(i)) {
>                          sgs->idle_cpus++;
>                          /* Idle cpu can't have misfit task */
>                          continue;
> @@ -10355,7 +10380,14 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  
>          sgs->group_capacity = group->sgc->capacity;
>  
> -        sgs->group_weight = group->group_weight;
> +        sgs->group_weight = group->group_weight - sgs->parked_cpus;
> +
> +        /*
> +         * Only a subset of the group is parked, so the group itself has the
> +         * capability to potentially pull tasks
> +         */
> +        if (sgs->parked_cpus < group->group_weight)
> +                sgs->parked_cpus = 0;

Say you had a group with 4 CPUs and 2 of them were parked. Now
group_weight will be 2 and the group will be marked as parked, whereas
if only 1 CPU is parked the group will not be marked as parked. That
seems wrong. Instead, mark it as parked either way and use the
parked_cpus number to compare, no?
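
Roughly like this, just to illustrate what I mean (untested):

-        /*
-         * Only a subset of the group is parked, so the group itself has the
-         * capability to potentially pull tasks
-         */
-        if (sgs->parked_cpus < group->group_weight)
-                sgs->parked_cpus = 0;
+        /*
+         * Any parked CPU classifies the group as group_parked;
+         * update_sd_pick_busiest() compares sum_nr_parked, so partially
+         * and fully parked groups get ranked against each other.
+         */

That way a group with 1 parked CPU out of 4 would still be evacuated,
just with a lower priority than a fully parked group.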
>  
>          /* Check if dst CPU is idle and preferred to this group */
>          if (!local_group && env->idle && sgs->sum_h_nr_running &&
> @@ -10422,6 +10454,8 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>           */
>  
>          switch (sgs->group_type) {
> +        case group_parked:
> +                return sgs->sum_nr_parked > busiest->sum_nr_parked;
>          case group_overloaded:
>                  /* Select the overloaded group with highest avg_load. */
>                  return sgs->avg_load > busiest->avg_load;
> @@ -10633,6 +10667,9 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
>                  nr_running = rq->nr_running - local;
>                  sgs->sum_nr_running += nr_running;
>  
> +                sgs->parked_cpus += arch_cpu_parked(i);
> +                sgs->sum_nr_parked += arch_cpu_parked(i) * rq->cfs.h_nr_running;
> +
>                  /*
>                   * No need to call idle_cpu_without() if nr_running is not 0
>                   */
> @@ -10649,7 +10686,14 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
>  
>          sgs->group_capacity = group->sgc->capacity;
>  
> -        sgs->group_weight = group->group_weight;
> +        sgs->group_weight = group->group_weight - sgs->parked_cpus;
> +
> +        /*
> +         * Only a subset of the group is parked, so the group itself has the
> +         * capability to potentially pull tasks
> +         */
> +        if (sgs->parked_cpus < group->group_weight)
> +                sgs->parked_cpus = 0;

Same comment as above.

>  
>          sgs->group_type = group_classify(sd->imbalance_pct, group, sgs);
>  
> @@ -10680,6 +10724,8 @@ static bool update_pick_idlest(struct sched_group *idlest,
>           */
>  
>          switch (sgs->group_type) {
> +        case group_parked:
> +                return false;

Why not use the parked_cpus to compare?
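
e.g. something like the below (untested), so that between two parked
candidates the one with fewer parked CPUs can still be picked:

+        case group_parked:
+                /* Prefer the candidate group with fewer parked CPUs */
+                return sgs->parked_cpus < idlest_sgs->parked_cpus;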
>          case group_overloaded:
>          case group_fully_busy:
>                  /* Select the group with lowest avg_load. */
> @@ -10730,7 +10776,7 @@ sched_balance_find_dst_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
>          unsigned long imbalance;
>          struct sg_lb_stats idlest_sgs = {
>                          .avg_load = UINT_MAX,
> -                        .group_type = group_overloaded,
> +                        .group_type = group_parked,
>          };
>  
>          do {
> @@ -10788,6 +10834,8 @@ sched_balance_find_dst_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
>                  return idlest;
>  
>          switch (local_sgs.group_type) {
> +        case group_parked:
> +                return idlest;
>          case group_overloaded:
>          case group_fully_busy:
>  
> @@ -11039,6 +11087,12 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
>          local = &sds->local_stat;
>          busiest = &sds->busiest_stat;
>  
> +        if (busiest->group_type == group_parked) {
> +                env->migration_type = migrate_task;
> +                env->imbalance = busiest->sum_nr_parked;
> +                return;
> +        }
> +
>          if (busiest->group_type == group_misfit_task) {
>                  if (env->sd->flags & SD_ASYM_CPUCAPACITY) {
>                          /* Set imbalance to allow misfit tasks to be balanced. */
> @@ -11207,13 +11261,14 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
>  /*
>   * Decision matrix according to the local and busiest group type:
>   *
> - * busiest \ local   has_spare   fully_busy   misfit   asym   imbalanced   overloaded
> - * has_spare         nr_idle     balanced     N/A      N/A    balanced     balanced
> - * fully_busy        nr_idle     nr_idle      N/A      N/A    balanced     balanced
> - * misfit_task       force       N/A          N/A      N/A    N/A          N/A
> - * asym_packing      force       force        N/A      N/A    force        force
> - * imbalanced        force       force        N/A      N/A    force        force
> - * overloaded        force       force        N/A      N/A    force        avg_load
> + * busiest \ local   has_spare   fully_busy   misfit   asym   imbalanced   overloaded   parked
> + * has_spare         nr_idle     balanced     N/A      N/A    balanced     balanced     balanced
> + * fully_busy        nr_idle     nr_idle      N/A      N/A    balanced     balanced     balanced
> + * misfit_task       force       N/A          N/A      N/A    N/A          N/A          N/A
> + * asym_packing      force       force        N/A      N/A    force        force        balanced
> + * imbalanced        force       force        N/A      N/A    force        force        balanced
> + * overloaded        force       force        N/A      N/A    force        avg_load     balanced
> + * parked            force       force        N/A      N/A    force        force        nr_tasks

If I read the code below correctly, when local is parked it always goes
to balanced. How is it nr_tasks then? Am I reading this table wrong?

>   *
>   * N/A :      Not Applicable because already filtered while updating
>   *            statistics.
> @@ -11222,6 +11277,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
>   * avg_load : Only if imbalance is significant enough.
>   * nr_idle :  dst_cpu is not busy and the number of idle CPUs is quite
>   *            different in groups.
> + * nr_task :  balancing can go either way depending on the number of running tasks
> + *            per group
>   */
>  
>  /**
> @@ -11252,6 +11309,13 @@ static struct sched_group *sched_balance_find_src_group(struct lb_env *env)
>                  goto out_balanced;
>  
>          busiest = &sds.busiest_stat;
> +        local = &sds.local_stat;
> +
> +        if (local->group_type == group_parked)
> +                goto out_balanced;
> +
> +        if (busiest->group_type == group_parked)
> +                goto force_balance;
>  
>          /* Misfit tasks should be dealt with regardless of the avg load */
>          if (busiest->group_type == group_misfit_task)
> @@ -11273,7 +11337,6 @@ static struct sched_group *sched_balance_find_src_group(struct lb_env *env)
>          if (busiest->group_type == group_imbalanced)
>                  goto force_balance;
>  
> -        local = &sds.local_stat;
>          /*
>           * If the local group is busier than the selected busiest group
>           * don't try and pull any tasks.
> @@ -11386,6 +11449,8 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
>                  enum fbq_type rt;
>  
>                  rq = cpu_rq(i);
> +                if (arch_cpu_parked(i) && rq->cfs.h_nr_running)
> +                        return rq;
>                  rt = fbq_classify_rq(rq);
>  
>                  /*
> @@ -11556,6 +11621,9 @@ static int need_active_balance(struct lb_env *env)
>  {
>          struct sched_domain *sd = env->sd;
>  
> +        if (arch_cpu_parked(env->src_cpu) && !idle_cpu(env->src_cpu))
> +                return 1;
> +
>          if (asym_active_balance(env))
>                  return 1;
>  
> @@ -11589,6 +11657,9 @@ static int should_we_balance(struct lb_env *env)
>          struct sched_group *sg = env->sd->groups;
>          int cpu, idle_smt = -1;
>  
> +        if (arch_cpu_parked(env->dst_cpu))
> +                return 0;
> +
>          /*
>           * Ensure the balancing environment is consistent; can happen
>           * when the softirq triggers 'during' hotplug.
> @@ -11612,7 +11683,7 @@ static int should_we_balance(struct lb_env *env)
>          cpumask_copy(swb_cpus, group_balance_mask(sg));
>          /* Try to find first idle CPU */
>          for_each_cpu_and(cpu, swb_cpus, env->cpus) {
> -                if (!idle_cpu(cpu))
> +                if (!idle_cpu(cpu) || arch_cpu_parked(cpu))
>                          continue;
>  
>                  /*
> @@ -11707,7 +11778,7 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
>          ld_moved = 0;
>          /* Clear this flag as soon as we find a pullable task */
>          env.flags |= LBF_ALL_PINNED;
> -        if (busiest->nr_running > 1) {
> +        if (busiest->nr_running > 1 || arch_cpu_parked(busiest->cpu)) {
>                  /*
>                   * Attempt to move tasks. If sched_balance_find_src_group has found
>                   * an imbalance but busiest->nr_running <= 1, the group is
> @@ -12721,6 +12792,9 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
>  
>          update_misfit_status(NULL, this_rq);
>  
> +        if (arch_cpu_parked(this_cpu))
> +                return 0;
> +
>          /*
>           * There is a task waiting to run. No need to search for one.
>           * Return 0; the task will be enqueued when switching to idle.