From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82151C4321E for ; Mon, 5 Dec 2022 11:02:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230480AbiLELCG (ORCPT ); Mon, 5 Dec 2022 06:02:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56034 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230350AbiLELCF (ORCPT ); Mon, 5 Dec 2022 06:02:05 -0500 Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D7E78167D7 for ; Mon, 5 Dec 2022 03:02:03 -0800 (PST) Received: by mail-wr1-x434.google.com with SMTP id m14so17975945wrh.7 for ; Mon, 05 Dec 2022 03:02:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=layalina-io.20210112.gappssmtp.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=50n9+I4ZxZRBd4Gzgxnr+F5t6glYKeTWrpBn2vhsZRE=; b=k6LIQd5Iev9CRSssx9gOdJxTMz6B/GodnspKI9plJT4mnElcDfYL/C6O2GzZSQG4uJ kZV8xkD0fGRN5PWuAqtrKW+FOq7Va24G6hgmtAmtVa9bzmEhHe2UqfjagkG7EC8jyNbN GPC+MaUgIo8FSovEfpkMbBit5cKQjAzi7v8EvH7Zk+5SG9xuMujJCzjSL3+vP6PDs+7x QC0DV55+7C6Nx5Fr7dFDrhGvgQw/4Xeqi6+MCNYCrGy1GrO9s5H6GRmHY6SiZmWd8jZ5 +Ge5s0T1GJTM4K4+uC29OEvM09ycL7f9KdrUq0Kak+zWvtd4oI0edjm+6k5Ym9e3GyQH xrdA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=50n9+I4ZxZRBd4Gzgxnr+F5t6glYKeTWrpBn2vhsZRE=; b=xm0lFPX1Z3Kg/SrMoOSCyPdu7EqZodnddIVaY8G1VaHyASBNALlc/Iawsfgn0R/RVc 08T0T3q07U+tkfPB/LICfJ6SiZCy5MWJqfCUpPW6MceI5OAVRbuwcZ7cXJCGmPRtKuM5 icPUUiweF+Brf352zoVeKIXRtD/nW8kTBaJtgmXeIl3bJvBzKQuz4UbVn40Iwoisl2vS hX3cnb53VWbBFu5CcqKqI8YsHNkAJBylQfjKNcc0xB8Kf5UUk/M2cF9A7koC4HJAu56x YRvr/131diKiDPf0GjHzXWe3vNaIWy2kMzWRK8YkdTmRuQGJlUvB9iU1uuiX7v4v/UEh Iz2w== X-Gm-Message-State: ANoB5pmVA1LZeuYJMbXLvAwOnYT4XXPoAvG4VA8bZ7qjGS5QcrtZ5tju ha0LwZFxlOw+VDT2P8OOFW87eQ== X-Google-Smtp-Source: AA0mqf6GC/DR99Gk2V2xSkqMAURWxlcI35iVbq9UMoouyXgZDMFQIevH6e6v28xtjFDFcu4GhmK56Q== X-Received: by 2002:adf:e305:0:b0:236:6089:cc5e with SMTP id b5-20020adfe305000000b002366089cc5emr40746437wrj.118.1670238122223; Mon, 05 Dec 2022 03:02:02 -0800 (PST) Received: from airbuntu (host86-130-134-87.range86-130.btcentralplus.com. [86.130.134.87]) by smtp.gmail.com with ESMTPSA id j11-20020a05600c190b00b003b47e75b401sm23742162wmq.37.2022.12.05.03.02.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Dec 2022 03:02:01 -0800 (PST) Date: Mon, 5 Dec 2022 11:01:59 +0000 From: Qais Yousef To: Vincent Guittot Cc: Ingo Molnar , Peter Zijlstra , Dietmar Eggemann , "Rafael J. Wysocki" , Viresh Kumar , linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Lukasz Luba , Wei Wang , Xuewen Yan , Hank , Jonathan JMChen Subject: Re: [RFC PATCH 3/3] sched/fair: Traverse cpufreq policies to detect capacity inversion Message-ID: <20221205110159.nd5igwvsaj55jar7@airbuntu> References: <20221127141742.1644023-1-qyousef@layalina.io> <20221127141742.1644023-4-qyousef@layalina.io> <20221203143323.w32boxa6asqvvdnp@airbuntu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org On 12/04/22 12:35, Vincent Guittot wrote: > On Sat, 3 Dec 2022 at 15:33, Qais Yousef wrote: > > > > On 12/02/22 15:57, Vincent Guittot wrote: > > > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > > > index 7c0dd57e562a..4bbbca85134b 100644 > > > > --- a/kernel/sched/fair.c > > > > +++ b/kernel/sched/fair.c > > > > @@ -8856,23 +8856,20 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu) > > > > * * Thermal pressure will impact all cpus in this perf domain > > > > * equally. > > > > */ > > > > - if (sched_energy_enabled()) { > > > > + if (static_branch_unlikely(&sched_asym_cpucapacity)) { > > > > unsigned long inv_cap = capacity_orig - thermal_load_avg(rq); > > > > - struct perf_domain *pd = rcu_dereference(rq->rd->pd); > > > > + struct cpufreq_policy *policy, __maybe_unused *policy_n; > > > > > > > > rq->cpu_capacity_inverted = 0; > > > > > > > > - SCHED_WARN_ON(!rcu_read_lock_held()); > > > > - > > > > - for (; pd; pd = pd->next) { > > > > - struct cpumask *pd_span = perf_domain_span(pd); > > > > + for_each_active_policy_safe(policy, policy_n) { > > > > > > So you are looping all cpufreq policy (and before the perf domain) in > > > the period load balance. That' really not something we should or want > > > to do > > > > Why is it not acceptable in the period load balance but acceptable in the hot > > wake up path in feec()? What's the difference? > > This patch loops on all cpufreq policy in sched softirq, how can this > be sane ? and not only in eas mode but also in the default asymmetric Hmm I'm still puzzled. Why it's not sane to do it here but it's okay to do it in the wake up path in feec()? AFAICT the number of cpufreq policies and perf domains have 1:1 mapping. In feec() we not only loop perf domains, but we go through each cpu of each domain to find max_spare_cap then compute_energy(). Which is more intensive. In worst case scenario where there's a perf domain for each cpu, then we'll loop through every CPU *and* compute energy its energy cost in the wake up path. Why it's deemed acceptable in wake up path which is more critical AFAIU but not period load balance which is expected to do more work? What am I missing? > performance one. > > This inverted detection doesn't look like the right way to fix your > problem IMO. That being said, i agree that I haven't made any other > proposal apart that I think that you should use a different rules for > task and for overutilized and part of your problem comes from this. We discussed this before; I need to revisit the thread but I can't see how overutilized different than task will fix the issue. They should be unified by design. I'm all ears if there's a simpler way to address the problem :-) The problem is that thermal pressure on big cpu is not important from uclamp perspective until it is in inversion state. It is quite common to have a system where the medium capacity is in 500 range. If the big is under thermal pressure that it drops to 800, then it is still a fitting CPU from uclamp perspective. Keep in mind uclamp_min is useful for tasks whose utilization is small So we need to be selective when thermal pressure is actually helping out or just creating unnecessary problems. The only other option I had in mind was to do the detection when we update the thermal_pressure in the topology code. But that didn't look better alternative to me. > > Then this make eas and util_fits_cpu even more Arm specific and I What is the Arm specific part about it? Why it wouldn't work on non-Arm systems? > still hope to merge sched_asym_cpucapacity and asym_packing a some > levels because they looks more and more similar but each side is > trying to add some SoC specific policy Oh, it seems Intel relies on asym_packing for their hybrid support approach? I think sched_asym_cpucapacity was designed to be generic. If I gathered correctly lack of support for SMT and inability to provide energy model outside of DT were some required extensions. To be honest, I personally think EAS can be useful on SMP systems and it would be nice to enable it outside of sched_asym_cpucapacity. I'm interested to hear more about this unification idea actually. If you feel a bit chatty to describe in more detail how do you see this being unified, that could be enlightening for some of us who work in this area :-) Thanks! -- Qais Yousef