Date: Wed, 25 Oct 2023 12:01:16 +0100
From: Mark Rutland
To: Zeng Heng
Cc: broonie@kernel.org, joey.gouly@arm.com, will@kernel.org,
    amit.kachhap@arm.com, rafael@kernel.org, catalin.marinas@arm.com,
    james.morse@arm.com, maz@kernel.org, viresh.kumar@linaro.org,
    sumitg@nvidia.com, yang@os.amperecomputing.com,
    linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, wangxiongfeng2@huawei.com,
    xiexiuqi@huawei.com, Ionela Voinescu
Subject: Re: [PATCH 3/3] cpufreq: CPPC: Eliminate the impact of cpc_read() latency error
In-Reply-To: <20231025093847.3740104-4-zengheng4@huawei.com>

On Wed, Oct 25, 2023 at 05:38:47PM +0800, Zeng Heng wrote:
> We have found significant differences in the latency of cpc_read() between
> regular scenarios and scenarios with high memory access pressure. Ignoring
> this error can cause the get-rate interface to occasionally return absurd
> values.
>
> Here is a sample test that creates high memory access pressure with stress-ng.
> My local testing platform has 160 CPUs, the CPC registers are accessed via
> MMIO, and the cpuidle feature is disabled (so the AMU is always active):
>
> ~~~
> ./stress-ng --memrate 160 --timeout 180
> ~~~
>
> The following data comes from ftrace statistics for cppc_get_perf_ctrs():
>
> Regular scenarios                           || High memory access pressure scenarios
> 104)              |  cppc_get_perf_ctrs() { || 133)              |  cppc_get_perf_ctrs() {
> 104)     0.800 us |    cpc_read.isra.0();   || 133)     4.580 us |    cpc_read.isra.0();
> 104)     0.640 us |    cpc_read.isra.0();   || 133)     7.780 us |    cpc_read.isra.0();
> 104)     0.450 us |    cpc_read.isra.0();   || 133)     2.550 us |    cpc_read.isra.0();
> 104)     0.430 us |    cpc_read.isra.0();   || 133)     0.570 us |    cpc_read.isra.0();
> 104)     4.610 us |  }                      || 133) ! 157.610 us |  }
> 104)              |  cppc_get_perf_ctrs() { || 133)              |  cppc_get_perf_ctrs() {
> 104)     0.720 us |    cpc_read.isra.0();   || 133)     0.760 us |    cpc_read.isra.0();
> 104)     0.720 us |    cpc_read.isra.0();   || 133)     4.480 us |    cpc_read.isra.0();
> 104)     0.510 us |    cpc_read.isra.0();   || 133)     0.520 us |    cpc_read.isra.0();
> 104)     0.500 us |    cpc_read.isra.0();   || 133) +  10.100 us |    cpc_read.isra.0();
> 104)     3.460 us |  }                      || 133) ! 120.850 us |  }
> 108)              |  cppc_get_perf_ctrs() { ||  87)              |  cppc_get_perf_ctrs() {
> 108)     0.820 us |    cpc_read.isra.0();   ||  87) ! 255.200 us |    cpc_read.isra.0();
> 108)     0.850 us |    cpc_read.isra.0();   ||  87)     2.910 us |    cpc_read.isra.0();
> 108)     0.590 us |    cpc_read.isra.0();   ||  87)     5.160 us |    cpc_read.isra.0();
> 108)     0.610 us |    cpc_read.isra.0();   ||  87)     4.340 us |    cpc_read.isra.0();
> 108)     5.080 us |  }                      ||  87) ! 315.790 us |  }
> 108)              |  cppc_get_perf_ctrs() { ||  87)              |  cppc_get_perf_ctrs() {
> 108)     0.630 us |    cpc_read.isra.0();   ||  87)     0.800 us |    cpc_read.isra.0();
> 108)     0.630 us |    cpc_read.isra.0();   ||  87)     6.310 us |    cpc_read.isra.0();
> 108)     0.420 us |    cpc_read.isra.0();   ||  87)     1.190 us |    cpc_read.isra.0();
> 108)     0.430 us |    cpc_read.isra.0();   ||  87) +  11.620 us |    cpc_read.isra.0();
> 108)     3.780 us |  }                      ||  87) ! 207.010 us |  }
>
> My local testing platform runs at 3000000 kHz (3 GHz), but the
> cpuinfo_cur_freq interface returns values that are not even close to the
> actual frequency:
>
> [root@localhost ~]# cd /sys/devices/system/cpu
> [root@localhost cpu]# for i in {0..159}; do cat cpu$i/cpufreq/cpuinfo_cur_freq; done
> 5127812
> 2952127
> 3069001
> 3496183
> 922989768
> 2419194
> 3427042
> 2331869
> 3594611
> 8238499
> ...
>
> The reason is that, under heavy memory access pressure, the latency of
> cpc_read() increases from sub-microsecond to several hundred
> microseconds. Moving the cpc_read() calls into a critical section with
> IRQs disabled has minimal impact on the result.
>
>   cppc_get_perf_ctrs()[0]               cppc_get_perf_ctrs()[1]
>          /        \                            /        \
>    cpc_read      cpc_read               cpc_read       cpc_read
>     ref[0]      delivered[0]              ref[1]      delivered[1]
>       |              |                      |              |
>       v              v                      v              v
> --------------------------------------------------------------------> time
>       <--delta[0]---><----sample_period----><---delta[1]--->
>
> Since
>   freq = ref_freq * (delivered[1] - delivered[0]) / (ref[1] - ref[0])
> and
>   delivered[1] - delivered[0] = freq * (delta[1] + sample_period),
>   ref[1] - ref[0] = ref_freq * (delta[0] + sample_period),
> a sampling period of 2us is far from sufficient to eliminate the impact
> of system memory access latency. Consequently, we suggest that
> cppc_cpufreq_get_rate() only be called in process context, and that a
> longer sampling period be used to neutralize the impact of random latency.
>
> Here we call cond_resched() instead of sleep-like functions to ensure
> that `taskset -c $i cat cpu$i/cpufreq/cpuinfo_cur_freq` can work when the
> cpuidle feature is enabled.
>
> Reported-by: Yang Shi
> Link: https://lore.kernel.org/all/20230328193846.8757-1-yang@os.amperecomputing.com/
> Signed-off-by: Zeng Heng
> ---
>  drivers/cpufreq/cppc_cpufreq.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index 321a9dc9484d..a7c5418bcda7 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -851,12 +851,26 @@ static int cppc_get_perf_ctrs_pair(void *val)

The previous patch added this function, and calls it with smp_call_on_cpu(),
where it'll run in IRQ context with IRQs disabled...

>  	struct fb_ctr_pair *fb_ctrs = val;
>  	int cpu = fb_ctrs->cpu;
>  	int ret;
> +	unsigned long timeout;
>
>  	ret = cppc_get_perf_ctrs(cpu, &fb_ctrs->fb_ctrs_t0);
>  	if (ret)
>  		return ret;
>
> -	udelay(2); /* 2usec delay between sampling */
> +	if (likely(!irqs_disabled())) {
> +		/*
> +		 * Set 1ms as sampling interval, but never schedule
> +		 * to the idle task to prevent the AMU counters from
> +		 * stopping working.
> +		 */
> +		timeout = jiffies + msecs_to_jiffies(1);
> +		while (!time_after(jiffies, timeout))
> +			cond_resched();
> +
> +	} else {

... so we'll enter this branch of the if-else ...

> +		pr_warn_once("CPU%d: Get rate in atomic context", cpu);

... and pr_warn_once() for something that's apparently normal and outside of
the user's control? That doesn't make much sense to me.

Mark.

> +		udelay(2); /* 2usec delay between sampling */
> +	}
>
>  	return cppc_get_perf_ctrs(cpu, &fb_ctrs->fb_ctrs_t1);
>  }
> --
> 2.25.1
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel