From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0B04BC433FE for ; Tue, 8 Nov 2022 00:45:30 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D971610E211; Tue, 8 Nov 2022 00:45:29 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 89CE610E211 for ; Tue, 8 Nov 2022 00:45:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667868325; x=1699404325; h=date:message-id:from:to:cc:subject:in-reply-to: references:mime-version; bh=Rj7Dg9goEx/c2pTbdBvGaeGlHlGuR3oQg2tEcCNTK5M=; b=RKrH+6LehGK3DoTWX9isRxLnhf07IF4K4+Uvuk6oO9RZTxwG1GUbNZXj 1JYcaGXJ6Cfxr6rIRUz77JQGr1Wz2Ek13Q4cpHjCvdyeXhJ4JEiv/WhYV KF2zWkjexrSoz76IQEryv9G7lMwAELgOVQeyTHkBX0Dx20bRKwMawllBk 7tf4D+u59DS3ifuGCo9bsyPdqA50TZIz10KfQ1YjD4WFfsHRT72X8Oc84 eT69efaFDLQfiIKmHIgMFY37b+yViQKZWcb7DVZDYu5N3tvox60Ddnukg h9LUXeyVTedU2ZVvcbPTEnlMu+bkn2h/xNRk65rUUA1/lVMLEg4pI+G5O g==; X-IronPort-AV: E=McAfee;i="6500,9779,10524"; a="374834163" X-IronPort-AV: E=Sophos;i="5.96,145,1665471600"; d="scan'208";a="374834163" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Nov 2022 16:45:24 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10524"; a="881294031" X-IronPort-AV: E=Sophos;i="5.96,145,1665471600"; d="scan'208";a="881294031" Received: from adixit-mobl.amr.corp.intel.com (HELO adixit-arch.intel.com) ([10.209.72.209]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Nov 2022 16:45:24 -0800 Date: Mon, 07 Nov 2022 16:45:18 -0800 Message-ID: <87h6zauvdt.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: Umesh Nerlige Ramappa In-Reply-To: References: <20221105003235.1717908-1-umesh.nerlige.ramappa@intel.com> <20221105003235.1717908-2-umesh.nerlige.ramappa@intel.com> <87pmdylarc.wl-ashutosh.dixit@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII Subject: Re: [Intel-gfx] [PATCH 1/2] i915/uncore: Acquire fw before loop in intel_uncore_read64_2x32 X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: intel-gfx@lists.freedesktop.org Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" On Mon, 07 Nov 2022 16:11:27 -0800, Umesh Nerlige Ramappa wrote: > > On Mon, Nov 07, 2022 at 01:23:19PM -0800, Dixit, Ashutosh wrote: > > On Mon, 07 Nov 2022 02:13:46 -0800, Tvrtko Ursulin wrote: > >> > >> On 05/11/2022 00:32, Umesh Nerlige Ramappa wrote: > >> > PMU reads the GT timestamp as a 2x32 mmio read and since upper and lower > >> > 32 bit registers are read in a loop, there is a latency involved between > >> > getting the GT timestamp and the CPU timestamp. As part of the > >> > resolution, refactor intel_uncore_read64_2x32 to acquire forcewake and > >> > uncore lock prior to reading upper and lower regs. > >> > > >> > Signed-off-by: Umesh Nerlige Ramappa > >> > --- > >> > drivers/gpu/drm/i915/intel_uncore.h | 44 ++++++++++++++++++++--------- > >> > 1 file changed, 30 insertions(+), 14 deletions(-) > >> > > >> > diff --git a/drivers/gpu/drm/i915/intel_uncore.h b/drivers/gpu/drm/i915/intel_uncore.h > >> > index 5449146a0624..e9e38490815d 100644 > >> > --- a/drivers/gpu/drm/i915/intel_uncore.h > >> > +++ b/drivers/gpu/drm/i915/intel_uncore.h > >> > @@ -382,20 +382,6 @@ __uncore_write(write_notrace, 32, l, false) > >> > */ > >> > __uncore_read(read64, 64, q, true) > >> > -static inline u64 > >> > -intel_uncore_read64_2x32(struct intel_uncore *uncore, > >> > - i915_reg_t lower_reg, i915_reg_t upper_reg) > >> > -{ > >> > - u32 upper, lower, old_upper, loop = 0; > >> > - upper = intel_uncore_read(uncore, upper_reg); > >> > - do { > >> > - old_upper = upper; > >> > - lower = intel_uncore_read(uncore, lower_reg); > >> > - upper = intel_uncore_read(uncore, upper_reg); > >> > - } while (upper != old_upper && loop++ < 2); > >> > - return (u64)upper << 32 | lower; > >> > -} > >> > - > >> > #define intel_uncore_posting_read(...) ((void)intel_uncore_read_notrace(__VA_ARGS__)) > >> > #define intel_uncore_posting_read16(...) ((void)intel_uncore_read16_notrace(__VA_ARGS__)) > >> > @@ -455,6 +441,36 @@ static inline void intel_uncore_rmw_fw(struct > >> > intel_uncore *uncore, > >> > intel_uncore_write_fw(uncore, reg, val); > >> > } > >> > +static inline u64 > >> > +intel_uncore_read64_2x32(struct intel_uncore *uncore, > >> > + i915_reg_t lower_reg, i915_reg_t upper_reg) > >> > +{ > >> > + u32 upper, lower, old_upper, loop = 0; > >> > + enum forcewake_domains fw_domains; > >> > + unsigned long flags; > >> > + > >> > + fw_domains = intel_uncore_forcewake_for_reg(uncore, lower_reg, > >> > + FW_REG_READ); > >> > + > >> > + fw_domains |= intel_uncore_forcewake_for_reg(uncore, upper_reg, > >> > + FW_REG_READ); > >> > + > >> > + spin_lock_irqsave(&uncore->lock, flags); > >> > + intel_uncore_forcewake_get__locked(uncore, fw_domains); > >> > + > >> > + upper = intel_uncore_read_fw(uncore, upper_reg); > >> > + do { > >> > + old_upper = upper; > >> > + lower = intel_uncore_read_fw(uncore, lower_reg); > >> > + upper = intel_uncore_read_fw(uncore, upper_reg); > >> > + } while (upper != old_upper && loop++ < 2); > >> > + > >> > + intel_uncore_forcewake_put__locked(uncore, fw_domains); > >> > >> I mulled over the fact this no longer applies the put hysteresis, but then > >> I saw GuC busyness is essentially the only current caller so thought it > >> doesn't really warrant adding a super long named > >> intel_uncore_forcewake_put_delayed__locked helper. > >> > >> Perhaps it would make sense to move this out of static inline, > > Are you saying - drop the inline OR drop static inline? I am assuming the > former. No you need to have 'static inline' for functions defined in a header file. I also don't understand completely but seems what Tvrtko is saying is move the function to the .c leaving only the declarations in the .h? Anyway let Tvrtko explain more. > > >> in which > >> case it would also be easier to have the hysteresis without needing to > >> export any new helpers, > > I don't understand this part. Do you mean that it makes it easier to just > call __intel_uncore_forcewake_put(uncore, fw_domains, true) then? Yes I think this will work, drop the lock and call __intel_uncore_forcewake_put. > Just > wondering how 'static inline' has any effect on that. > > >> but mostly because it does not feel the static > >> inline is justified. > > Agree, just carried it over from the previous helper definition. > > >> Sounds an attractive option but it is passable as is. > > > > Yup, copy that. Also see now how this reduces the read latency. And also it > > would increase the latency a bit for a different thread trying to do an > > uncore read/write since we hold uncore->lock longer but should be ok I > > think. > > Didn't think about it from that perspective. Worst case is that > gt_park/gt_unpark may happen very frequently (as seen on some use > cases). In that case, the unpark would end up calling this helper each > time. > > > > >> Reviewed-by: Tvrtko Ursulin > > > > Copy that too: > > > > Reviewed-by: Ashutosh Dixit > > Thanks, > Umesh > > > > >> > >> > + spin_unlock_irqrestore(&uncore->lock, flags); > >> > + > >> > + return (u64)upper << 32 | lower; > >> > +} > >> > + > >> > static inline int intel_uncore_write_and_verify(struct intel_uncore *uncore, > >> > i915_reg_t reg, u32 val, > >> > u32 mask, u32 expected_val)