From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 64417C25B75 for ; Wed, 29 May 2024 06:20:23 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 085F010E346; Wed, 29 May 2024 06:20:22 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="ZCDoVEv1"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.7]) by gabe.freedesktop.org (Postfix) with ESMTPS id CC70010E3C0 for ; Wed, 29 May 2024 06:20:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1716963612; x=1748499612; h=date:message-id:from:to:cc:subject:in-reply-to: references:mime-version; bh=oAJfHdSc0EQMk5O450RC2HCBq3kErsFaoaPUEoYWQpE=; b=ZCDoVEv1ykD1PyQ6+OAuX14YNo6HjhYO//ECm8AlOjWLs7XF3z3WtKFi qiqoygiHMHV+/NpRzWCtTISy6cROR9yMAyTF/RUmVulQKqtH663EZTPmY e1W4i0emwMj7LNfodKXp3el5Sum7B9i3gXAx8gOunVs9k7mbxWsKANAnS Ros97VPwfw2xLgqFkrDF6ao0hckA+mQeNOuW7N4BFhLyvkT1FHVgXMVAf 0g46dbAiXXqvpWNu8BikytNAHUDZlLodz7Z6+Wl963dXtNsbyR3phE68z gXHhvQJcDdcihPJsVxs0N3+y1/9gRZtGuBorOOo0AN/D0Ixu8I0iehTgE g==; X-CSE-ConnectionGUID: IFHasX7LS9SiGXGfJF9BpQ== X-CSE-MsgGUID: utGBCKJnRWaSz+SN8PFSZg== X-IronPort-AV: E=McAfee;i="6600,9927,11085"; a="38730391" X-IronPort-AV: E=Sophos;i="6.08,197,1712646000"; d="scan'208";a="38730391" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 May 2024 23:20:11 -0700 X-CSE-ConnectionGUID: qb0ylCvrTMeChplEDWtCOg== X-CSE-MsgGUID: fDLrxoXkTgeXJjVNPDUWHQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,197,1712646000"; d="scan'208";a="66180560" Received: from zhaomin2-mobl1.ccr.corp.intel.com (HELO adixit-arch.intel.com) ([10.124.81.153]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 May 2024 23:20:10 -0700 Date: Tue, 28 May 2024 23:11:33 -0700 Message-ID: <87v82xno4q.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: Lionel Landwerlin Cc: intel-xe@lists.freedesktop.org, Jose Souza , Umesh Nerlige Ramappa Subject: Re: [PATCH 07/17] drm/xe/oa: OA stream initialization (OAG) In-Reply-To: <87wmndnrob.wl-ashutosh.dixit@intel.com> References: <20240527014333.603914-1-ashutosh.dixit@intel.com> <20240527014333.603914-8-ashutosh.dixit@intel.com> <0cd6f261-2345-4cc5-b95c-b2c28dc79fa5@intel.com> <8734q2pkus.wl-ashutosh.dixit@intel.com> <855xuyjw8k.wl-ashutosh.dixit@intel.com> <4e2d7cb9-22e3-48cb-9aec-aaa07ad2f8df@intel.com> <854jaijtj6.wl-ashutosh.dixit@intel.com> <87wmndnrob.wl-ashutosh.dixit@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/29.3 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On Tue, 28 May 2024 21:55:00 -0700, Dixit, Ashutosh wrote: > > On Tue, 28 May 2024 00:43:42 -0700, Lionel Landwerlin wrote: > > > > Hi Lionel/Jose, > > > On 28/05/2024 10:16, Dixit, Ashutosh wrote: > > > On Mon, 27 May 2024 23:39:50 -0700, Lionel Landwerlin wrote: > > >> On 28/05/2024 09:17, Dixit, Ashutosh wrote: > > >>> On Mon, 27 May 2024 22:47:13 -0700, Lionel Landwerlin wrote: > > >>> > > >>>> On 28/05/2024 08:27, Dixit, Ashutosh wrote: > > >>>>> On Mon, 27 May 2024 00:04:21 -0700, Lionel Landwerlin wrote: > > >>>>> > > >>>>>>> +static int xe_oa_stream_init(struct xe_oa_stream *stream, > > >>>>>>> + struct xe_oa_open_param *param) > > >>>>>>> +{ > > >>>>> /snip/ > > >>>>> > > >>>>>>> + stream->k_exec_q = xe_exec_queue_create(stream->oa->xe, NULL, > > >>>>>>> + BIT(stream->hwe->logical_instance), 1, > > >>>>>>> + stream->hwe, EXEC_QUEUE_FLAG_KERNEL, 0); > > >>>>>> Hi Ashutosh, > > >>>>>> > > >>>>>> On i915 the changes of configuration were pipelined in the application's > > >>>>>> execution just like any other submission. > > >>>>>> > > >>>>>> Creating another queue completely unsynchronized from the application's > > >>>>>> submissions makes this non usable in my opinion. > > >>>>> As we discussed previously, the plan here is to provide a drm_xe_sync array, > > >>>>> through stream properties, which can use to synchronize OA programming with > > >>>>> workload submisson. > > >>>>> > > >>>>> Would that not work? If not, we can do what was done in i915. But note that > > >>>>> i915 still has unresolved hangs, which I believe are due to the spinner > > >>>>> running on the application engine (iirc repeatedly opening/closing an OA > > >>>>> stream will hang in i915, though it could be due to other i915 > > >>>>> complexity). That is why thought using drm_xe_sync array is both safer and > > >>>>> more standard way of doing what we want to achieve. > > >>>>> > > >>>>> Basically the output sync object will be signalled after registers are > > >>>>> programmed and also any additional OA programming delay (which is > > >>>>> implemented in i915 using the spinner). > > >>>>> > > >>>>> This would be done both for OA stream open and changing OA stream > > >>>>> configuration. > > >>> That is true. But now that I have other stuff like gpuvis wrapped up, I > > >>> plan to start looking these couple of missing uapi pieces (hold preemption > > >>> and synchronization, likely in that order). > > >>> > > >>> Because synchronization is not implemented I add the delay below: > > >>> > > >>> static int xe_oa_emit_oa_config(struct xe_oa_stream *stream) > > >>> { > > >>> #define NOA_PROGRAM_ADDITIONAL_DELAY_US 500 > > >>> struct xe_oa_config_bo *oa_bo; > > >>> int err, us = NOA_PROGRAM_ADDITIONAL_DELAY_US; > > >>> > > >>> oa_bo = xe_oa_alloc_config_buffer(stream); > > >>> if (IS_ERR(oa_bo)) { > > >>> err = PTR_ERR(oa_bo); > > >>> goto exit; > > >>> } > > >>> > > >>> err = xe_oa_submit_bb(stream, oa_bo->bb); > > >>> > > >>> /* Additional empirical delay needed for NOA programming after registers are written */ > > >>> usleep_range(us, 2 * us); > > >>> exit: > > >>> return err; > > >>> } > > >>> > > >>> I need understand this is temporaty band-aid, since it stalls the > > >>> submission pipeline and needs to be replaced by proper synchronization. > > >>> > > >>>> Just letting you know, because we cannot use the current ioctl because it > > >>>> doesn't behave as we expect > > >>> You wouldn't be able to merge the Mesa PR as per the current uapi now and > > >>> then add additional Mesa patches, when we implement these couple of missing > > >>> uapi features in KMD? > > >> > > >> We could merge only the stuff that parse the reports, that's enough to have > > >> perfetto work. > > > Hi Lionel, so this means the OAG buffer stuff, correct? > > > > Correct > > > > >> But all of the VK_KHR_performance_query won't work as far as I can tell. > > > And this is the OAR stuff, synchronization and hold preemption? > > > > > > > This is actually a mix of OAG/OAR. We do read a couple of OAG registers > > that don't come as part of MI_REPORT_PERF_COUNT. > > > > We will need a way to tell when you've added the feature we need for > > VK_KHR_performance_query. > > > > I don't think we want to test this at runtime given some of the ioctl > > take time. > > Hmm, you mean like a version? We have capability bits in Xe and could > potentially add a bit if needed. It is in the oa_unit query so like init > time, not runtime. Though, maybe, just not implement in mesa yet and implement it only after KMD exposes that property? Till then say VK_KHR_performance_query is not supported? > > > > > Yes even if we can merge the OAG stuff first, that will get most of KMD OA > > > code merged and then we can merge the remaining stuff later. Since it's a > > > pain to maintain the code out of tree, that's why even if I can merge the > > > OAG stuff (together with Mesa) which is the majority of the OA KMD code > > > anyway, that would be a big relief. > > So, do you think you Mesa guys can give us an ack to merge the uapi we have > now? And add the missing uapi bits in the second round of OA uapi changes? > Or is there anything else you want us to do in this first round? > > For the second round, here is the list I have: > > * hold preemption (needs some investigation as to feasibility) > * xe_sync for OA stream configuration and reconfiguration > * Equivalent of DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID. I am still trying > to figure out if this is really needed. See here: > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29312#note_2429650 > > Since this seems to be using some MDAPI metrics in Mesa, I think it > should be ok to do this in the second round too. > > Thanks. > -- > Ashutosh