From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A54E5C77B75 for ; Mon, 15 May 2023 22:07:53 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 3FD7510E2A2; Mon, 15 May 2023 22:07:53 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by gabe.freedesktop.org (Postfix) with ESMTPS id B1DA910E2A2 for ; Mon, 15 May 2023 22:07:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684188470; x=1715724470; h=date:message-id:from:to:cc:subject:in-reply-to: references:mime-version:content-transfer-encoding; bh=yN2EtFJbjpL1gyqMPcuwQ5bxya2X6xhLrRPoGr8gDb0=; b=aXJX2RS7U+1mg8i1I6McDj2xNGyWxitlwtiI9uIQQaaDLCxYhUKQjipD TT3pMEGAeXV93DoQ82RGZcdoI84juJhzEo/bN7LXM/zQEkdk9osOUwkIG tTnKyj/HzoLUxIZKS/os0s5VPwLi86c+IiuYdSHig5zoeSiEB949XVFcV G8RgLOlaWJUR82nNNYnGaZHMoeYMNkHtFQ+0DlRIAzRfOdfihitwLwoi/ H5QQmu+ry/6fpyxgLkWJyKGh46eb88BW7hVTe9rIU6RaTIS4P9ZV67wyW /RSgrcCDrfZNlkHx5XJNm55Ji8wkcACg41c5DqNGf3BMGblhxv6GiKzgZ g==; X-IronPort-AV: E=McAfee;i="6600,9927,10711"; a="350159401" X-IronPort-AV: E=Sophos;i="5.99,277,1677571200"; d="scan'208";a="350159401" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 May 2023 15:07:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10711"; a="695194956" X-IronPort-AV: E=Sophos;i="5.99,277,1677571200"; d="scan'208";a="695194956" Received: from adixit-mobl.amr.corp.intel.com (HELO adixit-arch.intel.com) ([10.212.225.207]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 May 2023 15:07:49 -0700 Date: Mon, 15 May 2023 15:07:29 -0700 Message-ID: <87jzx9p7tq.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: Tvrtko Ursulin In-Reply-To: <45f66bdd-7a67-d6a2-e1cb-111c78c0a70f@linux.intel.com> References: <20230506005816.1891043-1-umesh.nerlige.ramappa@intel.com> <20230506005816.1891043-6-umesh.nerlige.ramappa@intel.com> <87ttwie4qn.wl-ashutosh.dixit@intel.com> <45f66bdd-7a67-d6a2-e1cb-111c78c0a70f@linux.intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: intel-gfx@lists.freedesktop.org Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" On Mon, 15 May 2023 03:10:56 -0700, Tvrtko Ursulin wrote: > Hi Tvrtko, > On 12/05/2023 21:57, Umesh Nerlige Ramappa wrote: > > On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote: > >> > >> On 12/05/2023 02:08, Dixit, Ashutosh wrote: > >>> On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote: > >>>> > >>>> From: Tvrtko Ursulin > >>>> > >>>> Reserve some bits in the counter config namespace which will carry t= he > >>>> tile id and prepare the code to handle this. > >>>> > >>>> No per tile counters have been added yet. > >>>> > >>>> v2: > >>>> - Fix checkpatch issues > >>>> - Use 4 bits for gt id in non-engine counters. Drop FIXME. > >>>> - Set MAX GTs to 4. Drop FIXME. > >>>> > >>>> Signed-off-by: Tvrtko Ursulin > >>>> Signed-off-by: Umesh Nerlige Ramappa > > 8< > > >>>> +static u64 frequency_enabled_mask(void) > >>>> +{ > >>>> +=A0=A0=A0 unsigned int i; > >>>> +=A0=A0=A0 u64 mask =3D 0; > >>>> + > >>>> +=A0=A0=A0 for (i =3D 0; i < I915_PMU_MAX_GTS; i++) > >>>> +=A0=A0=A0=A0=A0=A0=A0 mask |=3D config_mask(__I915_PMU_ACTUAL_FREQU= ENCY(i)) | > >>>> +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 config_mask(__I915_PMU_REQUESTED_= FREQUENCY(i)); > >>>> + > >>>> +=A0=A0=A0 return mask; > >>>> +} > >>>> + > >>>> =A0static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) > >>>> =A0{ > >>>> =A0=A0=A0=A0struct drm_i915_private *i915 =3D container_of(pmu, type= of(*i915), > >>>> pmu); > >>>> @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, > >>>> bool gpu_active) > >>>> =A0=A0=A0=A0 * Mask out all the ones which do not need the timer, or= in > >>>> =A0=A0=A0=A0 * other words keep all the ones that could need the tim= er. > >>>> =A0=A0=A0=A0 */ > >>>> -=A0=A0=A0 enable &=3D config_mask(I915_PMU_ACTUAL_FREQUENCY) | > >>>> -=A0=A0=A0=A0=A0=A0=A0=A0=A0 config_mask(I915_PMU_REQUESTED_FREQUENC= Y) | > >>>> -=A0=A0=A0=A0=A0=A0=A0=A0=A0 ENGINE_SAMPLE_MASK; > >>>> +=A0=A0=A0 enable &=3D frequency_enabled_mask() | ENGINE_SAMPLE_MASK; > >>> > >>> u32 enable & u64 frequency_enabled_mask > >>> > >>> ugly but ok I guess? Or change enable to u64? > > > > making pmu->enable u64 as well as other places where it is assigned to > > local variables. > > > >> > >> Hmm.. yes very ugly. Could have been an accident which happened to work > >> because there is a single timer (not per tile). > > > > Happened to work because the frequency mask does not spill over to the > > upper 32 bits (even for multi tile). > > > > --------------------- START_SECTION ---------------- > >> > >> Similar issue in frequency_sampling_enabled too. Gt_id argument to it > >> seems pointless. > > > > Not sure why it's pointless. We need the gt_id to determine the right > > mask for that specific gt. If it's not enabled, then we just return > > without pm_get and async put (like you mention later). > > And this piece of code is called within for_each_gt. > > I think I got a little confused cross referencing the code and patches la= st > week and did not mentally see all the changes. > > Because the hunk in other_bit() is correctly adding support for per gt bi= ts. > > The layout of pmu->enable ends up like this: > > bits 0 - 2: engine events > bits 3 - 5: gt0 other events > bits 6 - 8: gt1 other events > bits 9 - 11: gt2 other events > bits 12 - 14: gt3 other events Correct. > > >> So I now think whole frequency_enabled_mask() is just pointless and > >> should be removed. And then pmu_needs_time code can stay as is. Possib= ly > >> add a config_mask_32 helper which ensures no bits in upper 32 bits are > >> returned. > >> > >> That is if we are happy for the frequency_sampling_enabled returning > >> true for all gts, regardless of which ones actually have frequency > >> sampling enabled. > >> > >> Or if we want to implement it as I probably have intended, we will need > >> to add some gt bits into pmu->enable. Maybe reserve top four same as > >> with config counters. > > > > Nope. What you have here works just fine. pmu->enable should not include > > any gt id info. gt_id[63:60] is only a concept for pmu config sent by > > user.=A0 config_mask and pmu->enable are i915 internal bookkeeping (bit > > masks) just to track what events need to be sampled.=A0 The 'other' bit > > masks are a function of gt_id because we use gt_id to calculate a > > contiguous numerical value for these 'other' events. That's all. Once t= he > > numerical value is calculated, there is no need for gt_id because > > config_mask is BIT_ULL(numerical_value). Since the numerical values nev= er > > exceeded 31 (even for multi-gts), everything worked even with 32 bit > > pmu->enable. > > Yep. > > So question then is why make pmu->enable u64? The only reason was simplicity, since a lot of the existing code already assumes u64. E.g. if we keep pmu->enable u32, we should have to do the following: * Change config_mask() return type to u32 (in frequency_sampling_enabled(), we have 'pmu->enable & config_mask()') * Change frequency_enabled_mask() return type to u32 (again uses config_mask() so if we change config_mask() to u32 we change return type here too) * In i915_pmu_enable(), change 'pmu->enable |=3D BIT_ULL(bit)' to 'pmu->enable |=3D BIT(bit)' So yes, if we think we should be using pmu->enable u32, let's change this to be consistent everywhere. > Instead frequency_enabled_mask() should be made u32 since the bitwise or > composition of config_masks() is guaranteed to fit. > > At most it can have an internal u64 for the mask, assert upper_32_bits are > zero and return lower_32_bits. > > Did I get it right this time round? :) Yes, though we'd have to make the config_mask() type change above to make frequency_enabled_mask() u32. Or we can just keep everything u64. Let's decide one way or the other and close this. It seems Tvrtko is leaning towards making pmu->enable and frequency_enabled_mask() u32? Thanks. -- Ashutosh