From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <intel-gfx-bounces@lists.freedesktop.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.lore.kernel.org (Postfix) with ESMTPS id 0B04BC433FE
	for <intel-gfx@archiver.kernel.org>; Tue,  8 Nov 2022 00:45:30 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id D971610E211;
	Tue,  8 Nov 2022 00:45:29 +0000 (UTC)
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 89CE610E211
 for <intel-gfx@lists.freedesktop.org>; Tue,  8 Nov 2022 00:45:25 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1667868325; x=1699404325;
 h=date:message-id:from:to:cc:subject:in-reply-to:
 references:mime-version;
 bh=Rj7Dg9goEx/c2pTbdBvGaeGlHlGuR3oQg2tEcCNTK5M=;
 b=RKrH+6LehGK3DoTWX9isRxLnhf07IF4K4+Uvuk6oO9RZTxwG1GUbNZXj
 1JYcaGXJ6Cfxr6rIRUz77JQGr1Wz2Ek13Q4cpHjCvdyeXhJ4JEiv/WhYV
 KF2zWkjexrSoz76IQEryv9G7lMwAELgOVQeyTHkBX0Dx20bRKwMawllBk
 7tf4D+u59DS3ifuGCo9bsyPdqA50TZIz10KfQ1YjD4WFfsHRT72X8Oc84
 eT69efaFDLQfiIKmHIgMFY37b+yViQKZWcb7DVZDYu5N3tvox60Ddnukg
 h9LUXeyVTedU2ZVvcbPTEnlMu+bkn2h/xNRk65rUUA1/lVMLEg4pI+G5O g==;
X-IronPort-AV: E=McAfee;i="6500,9779,10524"; a="374834163"
X-IronPort-AV: E=Sophos;i="5.96,145,1665471600"; d="scan'208";a="374834163"
Received: from fmsmga006.fm.intel.com ([10.253.24.20])
 by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 07 Nov 2022 16:45:24 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10524"; a="881294031"
X-IronPort-AV: E=Sophos;i="5.96,145,1665471600"; d="scan'208";a="881294031"
Received: from adixit-mobl.amr.corp.intel.com (HELO adixit-arch.intel.com)
 ([10.209.72.209])
 by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 07 Nov 2022 16:45:24 -0800
Date: Mon, 07 Nov 2022 16:45:18 -0800
Message-ID: <87h6zauvdt.wl-ashutosh.dixit@intel.com>
From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
In-Reply-To: <Y2mer/algb78p4Fh@unerlige-ril>
References: <20221105003235.1717908-1-umesh.nerlige.ramappa@intel.com>	<20221105003235.1717908-2-umesh.nerlige.ramappa@intel.com>	<a6daeb51-8552-9f26-94c0-425ae1e24077@linux.intel.com>	<87pmdylarc.wl-ashutosh.dixit@intel.com>	<Y2mer/algb78p4Fh@unerlige-ril>
User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue)
 FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0
 Emacs/28.2 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue")
Content-Type: text/plain; charset=US-ASCII
Subject: Re: [Intel-gfx] [PATCH 1/2] i915/uncore: Acquire fw before loop in
 intel_uncore_read64_2x32
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Intel graphics driver community testing & development
 <intel-gfx.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/intel-gfx>,
 <mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
 <mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Cc: intel-gfx@lists.freedesktop.org
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

On Mon, 07 Nov 2022 16:11:27 -0800, Umesh Nerlige Ramappa wrote:
>
> On Mon, Nov 07, 2022 at 01:23:19PM -0800, Dixit, Ashutosh wrote:
> > On Mon, 07 Nov 2022 02:13:46 -0800, Tvrtko Ursulin wrote:
> >>
> >> On 05/11/2022 00:32, Umesh Nerlige Ramappa wrote:
> >> > PMU reads the GT timestamp as a 2x32 mmio read and since upper and lower
> >> > 32 bit registers are read in a loop, there is a latency involved between
> >> > getting the GT timestamp and the CPU timestamp. As part of the
> >> > resolution, refactor intel_uncore_read64_2x32 to acquire forcewake and
> >> > uncore lock prior to reading upper and lower regs.
> >> >
> >> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >> > ---
> >> >   drivers/gpu/drm/i915/intel_uncore.h | 44 ++++++++++++++++++++---------
> >> >   1 file changed, 30 insertions(+), 14 deletions(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/intel_uncore.h b/drivers/gpu/drm/i915/intel_uncore.h
> >> > index 5449146a0624..e9e38490815d 100644
> >> > --- a/drivers/gpu/drm/i915/intel_uncore.h
> >> > +++ b/drivers/gpu/drm/i915/intel_uncore.h
> >> > @@ -382,20 +382,6 @@ __uncore_write(write_notrace, 32, l, false)
> >> >    */
> >> >   __uncore_read(read64, 64, q, true)
> >> >   -static inline u64
> >> > -intel_uncore_read64_2x32(struct intel_uncore *uncore,
> >> > -			 i915_reg_t lower_reg, i915_reg_t upper_reg)
> >> > -{
> >> > -	u32 upper, lower, old_upper, loop = 0;
> >> > -	upper = intel_uncore_read(uncore, upper_reg);
> >> > -	do {
> >> > -		old_upper = upper;
> >> > -		lower = intel_uncore_read(uncore, lower_reg);
> >> > -		upper = intel_uncore_read(uncore, upper_reg);
> >> > -	} while (upper != old_upper && loop++ < 2);
> >> > -	return (u64)upper << 32 | lower;
> >> > -}
> >> > -
> >> >   #define intel_uncore_posting_read(...) ((void)intel_uncore_read_notrace(__VA_ARGS__))
> >> >   #define intel_uncore_posting_read16(...) ((void)intel_uncore_read16_notrace(__VA_ARGS__))
> >> >   @@ -455,6 +441,36 @@ static inline void intel_uncore_rmw_fw(struct
> >> > intel_uncore *uncore,
> >> >		intel_uncore_write_fw(uncore, reg, val);
> >> >   }
> >> >   +static inline u64
> >> > +intel_uncore_read64_2x32(struct intel_uncore *uncore,
> >> > +			 i915_reg_t lower_reg, i915_reg_t upper_reg)
> >> > +{
> >> > +	u32 upper, lower, old_upper, loop = 0;
> >> > +	enum forcewake_domains fw_domains;
> >> > +	unsigned long flags;
> >> > +
> >> > +	fw_domains = intel_uncore_forcewake_for_reg(uncore, lower_reg,
> >> > +						    FW_REG_READ);
> >> > +
> >> > +	fw_domains |= intel_uncore_forcewake_for_reg(uncore, upper_reg,
> >> > +						    FW_REG_READ);
> >> > +
> >> > +	spin_lock_irqsave(&uncore->lock, flags);
> >> > +	intel_uncore_forcewake_get__locked(uncore, fw_domains);
> >> > +
> >> > +	upper = intel_uncore_read_fw(uncore, upper_reg);
> >> > +	do {
> >> > +		old_upper = upper;
> >> > +		lower = intel_uncore_read_fw(uncore, lower_reg);
> >> > +		upper = intel_uncore_read_fw(uncore, upper_reg);
> >> > +	} while (upper != old_upper && loop++ < 2);
> >> > +
> >> > +	intel_uncore_forcewake_put__locked(uncore, fw_domains);
> >>
> >> I mulled over the fact this no longer applies the put hysteresis, but then
> >> I saw GuC busyness is essentially the only current caller so thought it
> >> doesn't really warrant adding a super long named
> >> intel_uncore_forcewake_put_delayed__locked helper.
> >>
> >> Perhaps it would make sense to move this out of static inline,
>
> Are you saying - drop the inline OR drop static inline? I am assuming the
> former.

No you need to have 'static inline' for functions defined in a header
file. I also don't understand completely but seems what Tvrtko is saying is
move the function to the .c leaving only the declarations in the .h? Anyway
let Tvrtko explain more.

>
> >> in which
> >> case it would also be easier to have the hysteresis without needing to
> >> export any new helpers,
>
> I don't understand this part. Do you mean that it makes it easier to just
> call __intel_uncore_forcewake_put(uncore, fw_domains, true) then?

Yes I think this will work, drop the lock and call
__intel_uncore_forcewake_put.

> Just
> wondering how 'static inline' has any effect on that.
>
> >> but mostly because it does not feel the static
> >> inline is justified.
>
> Agree, just carried it over from the previous helper definition.
>
> >> Sounds an attractive option but it is passable as is.
> >
> > Yup, copy that. Also see now how this reduces the read latency. And also it
> > would increase the latency a bit for a different thread trying to do an
> > uncore read/write since we hold uncore->lock longer but should be ok I
> > think.
>
> Didn't think about it from that perspective. Worst case is that
> gt_park/gt_unpark may happen very frequently (as seen on some use
> cases). In that case, the unpark would end up calling this helper each
> time.
>
> >
> >> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >
> > Copy that too:
> >
> > Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>
> Thanks,
> Umesh
> >
>
> >>
> >> > +	spin_unlock_irqrestore(&uncore->lock, flags);
> >> > +
> >> > +	return (u64)upper << 32 | lower;
> >> > +}
> >> > +
> >> >   static inline int intel_uncore_write_and_verify(struct intel_uncore *uncore,
> >> >						i915_reg_t reg, u32 val,
> >> >						u32 mask, u32 expected_val)