Date: Wed, 05 Oct 2022 00:40:34 -0700
Message-ID: <878rluzpjh.wl-ashutosh.dixit@intel.com>
From: "Dixit, Ashutosh"
To: Tvrtko Ursulin
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
References: <20221003192419.3541088-1-ashutosh.dixit@intel.com>
Subject: Re: [Intel-gfx] [PATCH] drm/i915/pmu: Match frequencies reported by PMU and sysfs

On Tue, 04 Oct 2022 06:00:22 -0700, Tvrtko Ursulin wrote:
>

Hi Tvrtko,

>
> On 04/10/2022 10:29, Tvrtko Ursulin wrote:
> >
> > On 03/10/2022 20:24, Ashutosh Dixit wrote:
> >> PMU and sysfs use different wakeref's to "interpret" zero freq. Sysfs
> >> uses runtime PM wakeref (see intel_rps_read_punit_req and
> >> intel_rps_read_actual_frequency). PMU uses the GT parked/unparked
> >> wakeref. In general the GT wakeref is held for less time that the runtime
> >> PM wakeref which causes PMU to report a lower average freq than the
> >> average freq obtained from sampling sysfs.
> >>
> >> To resolve this, use the same freq functions (and wakeref's) in PMU as
> >> those used in sysfs.
> >>
> >> Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/7025
> >> Reported-by: Ashwin Kumar Kulkarni
> >> Cc: Tvrtko Ursulin
> >> Signed-off-by: Ashutosh Dixit
> >> ---
> >>   drivers/gpu/drm/i915/i915_pmu.c | 27 ++-------------------------
> >>   1 file changed, 2 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_pmu.c
> >> b/drivers/gpu/drm/i915/i915_pmu.c
> >> index 958b37123bf1..eda03f264792 100644
> >> --- a/drivers/gpu/drm/i915/i915_pmu.c
> >> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> >> @@ -371,37 +371,16 @@ static void
> >>   frequency_sample(struct intel_gt *gt, unsigned int period_ns)
> >>   {
> >>       struct drm_i915_private *i915 = gt->i915;
> >> -    struct intel_uncore *uncore = gt->uncore;
> >>       struct i915_pmu *pmu = &i915->pmu;
> >>       struct intel_rps *rps = &gt->rps;
> >>       if (!frequency_sampling_enabled(pmu))
> >>           return;
> >> -    /* Report 0/0 (actual/requested) frequency while parked. */
> >> -    if (!intel_gt_pm_get_if_awake(gt))
> >> -        return;
> >> -
> >>       if (pmu->enable & config_mask(I915_PMU_ACTUAL_FREQUENCY)) {
> >> -        u32 val;
> >> -
> >> -        /*
> >> -         * We take a quick peek here without using forcewake
> >> -         * so that we don't perturb the system under observation
> >> -         * (forcewake => !rc6 => increased power use). We expect
> >> -         * that if the read fails because it is outside of the
> >> -         * mmio power well, then it will return 0 -- in which
> >> -         * case we assume the system is running at the intended
> >> -         * frequency. Fortunately, the read should rarely fail!
> >> -         */
> >> -        val = intel_uncore_read_fw(uncore, GEN6_RPSTAT1);
> >> -        if (val)
> >> -            val = intel_rps_get_cagf(rps, val);
> >> -        else
> >> -            val = rps->cur_freq;
> >> -
> >>           add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_ACT],
> >> -                intel_gpu_freq(rps, val), period_ns / 1000);
> >> +                intel_rps_read_actual_frequency(rps),
> >> +                period_ns / 1000);
> >>       }
> >>       if (pmu->enable & config_mask(I915_PMU_REQUESTED_FREQUENCY)) {
> >
> > What is software tracking of requested frequency showing when GT is
> > parked or runtime suspended? With this change sampling would be outside
> > any such checks so we need to be sure reported value makes sense.
> >
> > Although more important open is around what is actually correct.
> >
> > For instance how does the patch affect RC6 and power? I don't know how
> > power management of different blocks is wired up, so personally I would
> > only be able to look at it empirically. In other words what I am asking
> > is this - if we changed from skipping obtaining forcewake even when
> > unparked, to obtaining forcewake if not runtime suspended - what hardware
> > blocks does that power up and how it affects RC6 and power? Can it affect
> > actual frequency or not? (Will "something" power up the clocks just
> > because we will be getting forcewake?)
> >
> > Or maybe question simplified - does 200Hz polling on existing sysfs
> > actual frequency field disturbs the system under some circumstances?
> > (Increases power and decreases RC6.) If it does then that would be a
> > problem.
> > We want a solution which shows the real data, but where the act
> > of monitoring itself does not change it too much. If it doesn't then it's
> > okay.
> >
> > Could you somehow investigate on these topics? Maybe log RAPL GPU power
> > while polling on sysfs, versus getting the actual frequency from the
> > existing PMU implementation and see if that shows anything? Or actually
> > simpler - RAPL GPU power for current PMU intel_gpu_top versus this patch?
> > On idle(-ish) desktop workloads perhaps? Power and frequency graphed for
> > both.
>
> Another thought - considering that bspec says for 0xa01c "This register
> reflects real-time values and thus does not have a pre-determined default
> value out of reset" - could it be that it also does not reflect a real
> value when GPU is not executing anything (so zero), just happens to be not
> runtime suspended? That would mean sysfs reads could maybe show last known
> value? Just a thought to check.

Thanks for the suggestion, I'll try to check and report what I find.

> I've also tried on my Alderlake desktop:
>
> 1)
>
> while true; do cat gt_act_freq_mhz >/dev/null; sleep 0.005; done
>
> This costs ~120mW of GPU power and ~20% decrease in RC6.
>
>
> 2)
>
> intel_gpu_top -l -s 5 >/dev/null
>
> This costs no power or RC6.

Thanks for the experiments. As I mentioned, for Gen12+ there is a different
register which doesn't require taking a forcewake (it's not upstream yet
but you can see it in this patch:
https://patchwork.freedesktop.org/patch/504920/?series=109116&rev=1#comment_910146)
so this issue should not be there, at least for Gen12+.

> I have also never observed sysfs to show below min freq. This was with no
> desktop so it's possible this register indeed does not reflect the real
> situation when things are idle.
>
> So I think it is possible sysfs value is the misleading one.

Thanks, I will check.
The other possibility is that someone is holding a forcewake: the
products where we are seeing this have GuC controlling both the frequency
(SLPC) and RC6 (GUCRC).

Thanks.
--
Ashutosh