From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 27FA4C3DA4A for ; Fri, 9 Aug 2024 04:31:26 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E272410E857; Fri, 9 Aug 2024 04:31:25 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="G+d63oCI"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.13]) by gabe.freedesktop.org (Postfix) with ESMTPS id 51C9610E857 for ; Fri, 9 Aug 2024 04:31:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723177884; x=1754713884; h=date:message-id:from:to:cc:subject:in-reply-to: references:mime-version; bh=xmBKtVpob7nW+EGETjOW8pKZudeYr0N8DkSNb7xqX7s=; b=G+d63oCI0NbLSCXZE0YzFqHCkGb5lu67S1WPfN3BA+gHOjSvIUQOaxkU KSZyWX0CNrVba+5CCYbUXOI62K06FSzJqHEFtcFeYhZQhSpy2PC5dcKh6 NkPbfPi2tl+QRyGFhMzEiA84HeXN1sKSwBrq2uqWOT2pTltFbhDLAh0Bm nUn6Uy6dLthON7RbIL0Kv89Q8gXMK5ATO/Vf6nrxvZ7FTxiEeusCyuRqL SpUXzaY0140i6ashWuk/oE0Rrp0wnimMllU+Ta8vTwHZ3ydSmGU7EZFsB auSEM11OzLPMscXtKkBU4AXHUNVK4NGeaelym0m6Tdu3LjrCFjnpV0wGg g==; X-CSE-ConnectionGUID: KZoqoXMLTei5VYKPQ+o3bA== X-CSE-MsgGUID: cIhKqI06RxmJmbceOhgg1w== X-IronPort-AV: E=McAfee;i="6700,10204,11158"; a="24244513" X-IronPort-AV: E=Sophos;i="6.09,275,1716274800"; d="scan'208";a="24244513" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2024 21:31:24 -0700 X-CSE-ConnectionGUID: O9FMpj8XQUWT6+rSzcLBYw== X-CSE-MsgGUID: mOITkJY7QTK3fNC6haHKoQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,275,1716274800"; d="scan'208";a="57100053" Received: from rsanford-mobl1.amr.corp.intel.com (HELO adixit-arch.intel.com) ([10.125.0.114]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2024 21:31:23 -0700 Date: Thu, 08 Aug 2024 21:15:05 -0700 Message-ID: <87mslmcoxi.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: Matthew Brost Cc: intel-xe@lists.freedesktop.org, Umesh Nerlige Ramappa , Jose Souza , Lionel Landwerlin Subject: Re: [PATCH 5/8] drm/xe/oa: Signal output fences In-Reply-To: References: <20240808174139.4027534-1-ashutosh.dixit@intel.com> <20240808174139.4027534-6-ashutosh.dixit@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/29.4 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On Thu, 08 Aug 2024 16:19:30 -0700, Matthew Brost wrote: > Hi Matt, > On Thu, Aug 08, 2024 at 10:41:36AM -0700, Ashutosh Dixit wrote: > > Complete 'struct xe_oa_fence' to include the dma_fence used to signal > > output fences in the xe_sync array. The fences are signalled > > asynchronously. When there are no output fences to signal, the OA > > configuration wait is synchronously re-introduced into the ioctl. > > > > Signed-off-by: Ashutosh Dixit > > --- > > drivers/gpu/drm/xe/xe_oa.c | 46 +++++++++++++++++++++++++++++++++++--- > > 1 file changed, 43 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c > > index 416e031ac454b..bc421cd0af6ba 100644 > > --- a/drivers/gpu/drm/xe/xe_oa.c > > +++ b/drivers/gpu/drm/xe/xe_oa.c > > @@ -96,6 +96,10 @@ struct xe_oa_config_bo { > > }; > > > > struct xe_oa_fence { > > I don't think you need this at all. > > The fence from the job can directly be installed via > xe_sync_entry_signal between xe_sched_job_arm and > xe_sched_job_push. I think I understand what you are saying here, that job->drm.s_fence->finished can be used to signal the output fences. But the complication in OA is that an additional delay is needed after the register programming batch has executed (see below where I point this out) before we signal the output fences. So the sequence is: * Submit a register programming batch * Wait till this completes (signaled by the finished fence) * Wait for an additional delay * Now signal output fences after this delay (signal ofence->base) So 'struct xe_oa_fence' is needed because of this additional delay. Otherwise I agree, we could just have used the finished fence. > Then before xe_sync_entry_cleanup check for !num_signal and directly > wait on the job's fence. Here also, we never want to wait in the direct ioctl call, we always want to do the wait in the work item (so it has to be an asychronous wait). > > Matt > > > + /* @base: dma fence base */ > > + struct dma_fence base; > > + /* @lock: lock for the fence */ > > + spinlock_t lock; > > /* @xe: pointer to xe device */ > > struct xe_device *xe; > > /* @work: work to signal that OA configuration is applied */ > > @@ -953,9 +957,26 @@ static void xe_oa_fence_work_fn(struct work_struct *w) > > /* Additional empirical delay needed for NOA programming after registers are written */ > > usleep_range(us, 2 * us); This is the additional delay after which we signal output fences. Please let me know if you agree with this or if there's another way to do this. Thanks. -- Ashutosh > > > > - kfree(ofence); > > + /* Now signal fence to indicate new OA configuration is active */ > > + dma_fence_signal(&ofence->base); > > + dma_fence_put(&ofence->base); > > } > > > > +static const char *xe_oa_get_driver_name(struct dma_fence *fence) > > +{ > > + return "xe_oa"; > > +} > > + > > +static const char *xe_oa_get_timeline_name(struct dma_fence *fence) > > +{ > > + return "unbound"; > > +} > > + > > +static const struct dma_fence_ops xe_oa_fence_ops = { > > + .get_driver_name = xe_oa_get_driver_name, > > + .get_timeline_name = xe_oa_get_timeline_name, > > +}; > > + > > static struct xe_oa_fence *xe_oa_fence_init(struct xe_device *xe, struct dma_fence *config_fence) > > { > > struct xe_oa_fence *ofence; > > @@ -967,6 +988,8 @@ static struct xe_oa_fence *xe_oa_fence_init(struct xe_device *xe, struct dma_fen > > ofence->xe = xe; > > INIT_WORK(&ofence->work, xe_oa_fence_work_fn); > > ofence->config_fence = config_fence; > > + spin_lock_init(&ofence->lock); > > + dma_fence_init(&ofence->base, &xe_oa_fence_ops, &ofence->lock, 0, 0); > > > > return ofence; > > } > > @@ -975,8 +998,8 @@ static int xe_oa_emit_oa_config(struct xe_oa_stream *stream, struct xe_oa_config > > { > > struct xe_oa_config_bo *oa_bo; > > struct xe_oa_fence *ofence; > > + int i, err, num_signal = 0; > > struct dma_fence *fence; > > - int err; > > > > /* Emit OA configuration batch */ > > oa_bo = xe_oa_alloc_config_buffer(stream, config); > > @@ -989,13 +1012,30 @@ static int xe_oa_emit_oa_config(struct xe_oa_stream *stream, struct xe_oa_config > > if (err) > > goto exit; > > > > + /* Initialize and set fence to signal */ > > ofence = xe_oa_fence_init(stream->oa->xe, fence); > > if (IS_ERR(ofence)) { > > err = PTR_ERR(ofence); > > goto put_fence; > > } > > > > - xe_oa_fence_work_fn(&ofence->work); > > + for (i = 0; i < stream->num_syncs; i++) > > + xe_sync_entry_signal(&stream->syncs[i], &ofence->base); > > + > > + /* Schedule work to signal the fence */ > > + queue_work(system_unbound_wq, &ofence->work); > > + > > + /* If nothing needs to be signaled we wait synchronously */ > > + for (i = 0; i < stream->num_syncs; i++) > > + if (stream->syncs[i].flags & DRM_XE_SYNC_FLAG_SIGNAL) > > + num_signal++; > > + if (!num_signal) > > + flush_work(&ofence->work); > > + > > + /* Done with syncs */ > > + for (i = 0; i < stream->num_syncs; i++) > > + xe_sync_entry_cleanup(&stream->syncs[i]); > > + kfree(stream->syncs); > > > > return 0; > > put_fence: > > -- > > 2.41.0 > >