Message-ID: <4482cdde-8c8b-261a-cba7-36595d849a0e@linux.intel.com>
Date: Tue, 8 Nov 2022 10:06:52 +0000
From: Tvrtko Ursulin
Organization: Intel Corporation UK Plc
To: "Dixit, Ashutosh", Umesh Nerlige Ramappa
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 1/2] i915/uncore: Acquire fw before loop in intel_uncore_read64_2x32
In-Reply-To: <87h6zauvdt.wl-ashutosh.dixit@intel.com>
References: <20221105003235.1717908-1-umesh.nerlige.ramappa@intel.com>
 <20221105003235.1717908-2-umesh.nerlige.ramappa@intel.com>
 <87pmdylarc.wl-ashutosh.dixit@intel.com>
 <87h6zauvdt.wl-ashutosh.dixit@intel.com>

On 08/11/2022 00:45, Dixit, Ashutosh wrote:
> On Mon, 07 Nov 2022 16:11:27 -0800, Umesh Nerlige Ramappa wrote:
>>
>> On Mon, Nov 07, 2022 at 01:23:19PM -0800, Dixit, Ashutosh wrote:
>>> On Mon, 07 Nov 2022 02:13:46 -0800, Tvrtko Ursulin wrote:
>>>>
>>>> On 05/11/2022 00:32, Umesh Nerlige Ramappa wrote:
>>>>> PMU reads the GT timestamp as a 2x32 mmio read and since upper and lower
>>>>> 32 bit registers are read in a loop, there is a latency involved between
>>>>> getting the GT timestamp and the CPU timestamp. As part of the
>>>>> resolution, refactor intel_uncore_read64_2x32 to acquire forcewake and
>>>>> uncore lock prior to reading upper and lower regs.
>>>>>
>>>>> Signed-off-by: Umesh Nerlige Ramappa
>>>>> ---
>>>>>  drivers/gpu/drm/i915/intel_uncore.h | 44 ++++++++++++++++++++---------
>>>>>  1 file changed, 30 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/intel_uncore.h b/drivers/gpu/drm/i915/intel_uncore.h
>>>>> index 5449146a0624..e9e38490815d 100644
>>>>> --- a/drivers/gpu/drm/i915/intel_uncore.h
>>>>> +++ b/drivers/gpu/drm/i915/intel_uncore.h
>>>>> @@ -382,20 +382,6 @@ __uncore_write(write_notrace, 32, l, false)
>>>>>    */
>>>>>   __uncore_read(read64, 64, q, true)
>>>>> -static inline u64
>>>>> -intel_uncore_read64_2x32(struct intel_uncore *uncore,
>>>>> -			 i915_reg_t lower_reg, i915_reg_t upper_reg)
>>>>> -{
>>>>> -	u32 upper, lower, old_upper, loop = 0;
>>>>> -	upper = intel_uncore_read(uncore, upper_reg);
>>>>> -	do {
>>>>> -		old_upper = upper;
>>>>> -		lower = intel_uncore_read(uncore, lower_reg);
>>>>> -		upper = intel_uncore_read(uncore, upper_reg);
>>>>> -	} while (upper != old_upper && loop++ < 2);
>>>>> -	return (u64)upper << 32 | lower;
>>>>> -}
>>>>> -
>>>>>   #define intel_uncore_posting_read(...) ((void)intel_uncore_read_notrace(__VA_ARGS__))
>>>>>   #define intel_uncore_posting_read16(...) ((void)intel_uncore_read16_notrace(__VA_ARGS__))
>>>>> @@ -455,6 +441,36 @@ static inline void intel_uncore_rmw_fw(struct intel_uncore *uncore,
>>>>>   	intel_uncore_write_fw(uncore, reg, val);
>>>>>   }
>>>>> +static inline u64
>>>>> +intel_uncore_read64_2x32(struct intel_uncore *uncore,
>>>>> +			 i915_reg_t lower_reg, i915_reg_t upper_reg)
>>>>> +{
>>>>> +	u32 upper, lower, old_upper, loop = 0;
>>>>> +	enum forcewake_domains fw_domains;
>>>>> +	unsigned long flags;
>>>>> +
>>>>> +	fw_domains = intel_uncore_forcewake_for_reg(uncore, lower_reg,
>>>>> +						    FW_REG_READ);
>>>>> +
>>>>> +	fw_domains |= intel_uncore_forcewake_for_reg(uncore, upper_reg,
>>>>> +						     FW_REG_READ);
>>>>> +
>>>>> +	spin_lock_irqsave(&uncore->lock, flags);
>>>>> +	intel_uncore_forcewake_get__locked(uncore, fw_domains);
>>>>> +
>>>>> +	upper = intel_uncore_read_fw(uncore, upper_reg);
>>>>> +	do {
>>>>> +		old_upper = upper;
>>>>> +		lower = intel_uncore_read_fw(uncore, lower_reg);
>>>>> +		upper = intel_uncore_read_fw(uncore, upper_reg);
>>>>> +	} while (upper != old_upper && loop++ < 2);
>>>>> +
>>>>> +	intel_uncore_forcewake_put__locked(uncore, fw_domains);
>>>>
>>>> I mulled over the fact this no longer applies the put hysteresis, but then
>>>> I saw GuC busyness is essentially the only current caller so thought it
>>>> doesn't really warrant adding a super long named
>>>> intel_uncore_forcewake_put_delayed__locked helper.
>>>>
>>>> Perhaps it would make sense to move this out of static inline,
>>
>> Are you saying - drop the inline OR drop static inline? I am assuming the
>> former.
>
> No you need to have 'static inline' for functions defined in a header
> file. I also don't understand completely but seems what Tvrtko is saying is
> move the function to the .c leaving only the declarations in the .h? Anyway
> let Tvrtko explain more.

Yes, it does not feel warranted for it to be a static inline, so I'd just
move it to the .c. In which case..

>>>> in which
>>>> case it would also be easier to have the hysteresis without needing to
>>>> export any new helpers,
>>
>> I don't understand this part. Do you mean that it makes it easier to just
>> call __intel_uncore_forcewake_put(uncore, fw_domains, true) then?

.. you could indeed call this and keep the put hysteresis. But I don't
think that it matters really. You can go with the patch as is as far as I
am concerned.
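For illustration, a rough sketch of what that could look like once the
helper lives in intel_uncore.c; this assumes the file-local
__intel_uncore_forcewake_put(uncore, fw_domains, true) delayed put discussed
above, and re-adds the unlock and return that the quoted hunk cuts off:

u64 intel_uncore_read64_2x32(struct intel_uncore *uncore,
			     i915_reg_t lower_reg, i915_reg_t upper_reg)
{
	u32 upper, lower, old_upper, loop = 0;
	enum forcewake_domains fw_domains;
	unsigned long flags;

	fw_domains = intel_uncore_forcewake_for_reg(uncore, lower_reg,
						    FW_REG_READ);
	fw_domains |= intel_uncore_forcewake_for_reg(uncore, upper_reg,
						     FW_REG_READ);

	spin_lock_irqsave(&uncore->lock, flags);
	intel_uncore_forcewake_get__locked(uncore, fw_domains);

	upper = intel_uncore_read_fw(uncore, upper_reg);
	do {
		old_upper = upper;
		lower = intel_uncore_read_fw(uncore, lower_reg);
		upper = intel_uncore_read_fw(uncore, upper_reg);
	} while (upper != old_upper && loop++ < 2);

	/* Delayed put keeps the usual forcewake release hysteresis. */
	__intel_uncore_forcewake_put(uncore, fw_domains, true);

	spin_unlock_irqrestore(&uncore->lock, flags);

	return (u64)upper << 32 | lower;
}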
> Yes I think this will work, drop the lock and call
> __intel_uncore_forcewake_put.
>
>> Just wondering how 'static inline' has any effect on that.
>>
>>>> but mostly because it does not feel the static
>>>> inline is justified.
>>
>> Agree, just carried it over from the previous helper definition.
>>
>>>> Sounds an attractive option but it is passable as is.
>>>
>>> Yup, copy that. Also see now how this reduces the read latency. And also it
>>> would increase the latency a bit for a different thread trying to do an
>>> uncore read/write since we hold uncore->lock longer but should be ok I
>>> think.
>>
>> Didn't think about it from that perspective. Worst case is that
>> gt_park/gt_unpark may happen very frequently (as seen on some use
>> cases). In that case, the unpark would end up calling this helper each
>> time.

The concern is two mmio reads under the uncore lock versus two lock-unlock
cycles with one mmio read under each? Feels like a wash either way.

I guess with this DC-induced latency issue it is a worse worst case, but the
difference between normal times and the pathological spike is probably
orders of magnitude, right?

Regards,

Tvrtko
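[Aside: the retry loop in the helper exists because the lower dword can wrap
between the two reads, which carries into the upper dword. A self-contained
toy model of the idiom (not i915 code; read_lo()/read_hi() are made-up
stand-ins for the mmio accessors, and the counter ticks on every low read to
force a wrap):]

#include <stdint.h>
#include <stdio.h>

/* 64-bit counter exposed as two 32-bit halves, like the GT timestamp. */
static uint64_t counter = 0xffffffffULL;

static uint32_t read_lo(void) { counter++; return (uint32_t)counter; }
static uint32_t read_hi(void) { return (uint32_t)(counter >> 32); }

static uint64_t read64_2x32(void)
{
	uint32_t upper, lower, old_upper, loop = 0;

	upper = read_hi();
	do {
		/* If the low half wrapped, the high half changed: retry. */
		old_upper = upper;
		lower = read_lo();
		upper = read_hi();
	} while (upper != old_upper && loop++ < 2);

	return (uint64_t)upper << 32 | lower;
}

int main(void)
{
	/* Prints a consistent 0x100000001 despite the wrap mid-read. */
	printf("0x%llx\n", (unsigned long long)read64_2x32());
	return 0;
}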