From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 431A9E7716F for ; Thu, 5 Dec 2024 02:37:04 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 07BE510E0A4; Thu, 5 Dec 2024 02:37:04 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="ETKt8eR7"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.15]) by gabe.freedesktop.org (Postfix) with ESMTPS id D9DB510E0A4 for ; Thu, 5 Dec 2024 02:37:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733366223; x=1764902223; h=date:message-id:from:to:cc:subject:in-reply-to: references:mime-version; bh=eQYLZKNWqJNhU57yJ3PkLvFNMyHq7LrYBcSZbf8XNcM=; b=ETKt8eR7KvWIY3y2PfElEDfws4q3F6MSkIsyfwR6PRHEoFxkNtz/GsEt YaDjrQA6OHhU5aXHUBmNiTZXT5nHg5zISOQkz0IlfOFRb+l0Usf9fICP8 gvkR9B9icdLXgLF1jsQSrE5XKvrVO1yDRgiQWMsI/KwEy3TFMtzRtFmZQ unwA3u8S2lDuwUXIM0iBLwiQAuBj/01md1yc/7GndbMYfI3zLdpqw/9MP lE6uV5iTRxgvQ+T0oIOhfw8FHi9VmcGdv6Cx2ZLNZiqDdrI0//v110K/y fXZDj2avVDYq4LxPlZMFhyRxYn1GkyttOEu/MoQSIPyQ50XX7rrzrQIql Q==; X-CSE-ConnectionGUID: AhevNidrSOGp46IdCoj1uA== X-CSE-MsgGUID: 8n6OkAVaQfyCge1qoViR/g== X-IronPort-AV: E=McAfee;i="6700,10204,11276"; a="37324368" X-IronPort-AV: E=Sophos;i="6.12,209,1728975600"; d="scan'208";a="37324368" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa107.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Dec 2024 18:37:03 -0800 X-CSE-ConnectionGUID: T00QhJZWR1eUEEk/D6aRXg== X-CSE-MsgGUID: STjC+5fxSOiXQEnDqstUbQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,209,1728975600"; d="scan'208";a="93796420" Received: from orsosgc001.jf.intel.com (HELO orsosgc001.intel.com) ([10.165.21.142]) by orviesa010-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Dec 2024 18:37:03 -0800 Date: Wed, 04 Dec 2024 18:37:02 -0800 Message-ID: <85plm63lf5.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: Matthew Brost Cc: Umesh Nerlige Ramappa , intel-xe@lists.freedesktop.org Subject: Re: [PATCH 2/2] xe/oa: Use MI_LOAD_REGISTER_IMMEDIATE to enable OAR/OAC In-Reply-To: References: <20241127003127.42560-1-umesh.nerlige.ramappa@intel.com> <20241127003127.42560-3-umesh.nerlige.ramappa@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-redhat-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On Wed, 04 Dec 2024 13:05:41 -0800, Matthew Brost wrote: > Hi Umesh, > On Tue, Nov 26, 2024 at 04:31:27PM -0800, Umesh Nerlige Ramappa wrote: > > To enable OAR/OAC, a bit in RING_CONTEXT_CONTROL needs to be set. > > Setting this bit cause the context image size to change and if not done > > correct, can cause undesired hangs. > > > > Current code uses a separate exec_queue to modify this bit and is > > error-prone. As per HW recommendation, submit MI_LOAD_REGISTER_IMM to > > the target hardware context to modify the relevant bit. Looks like Matt is fine with the overall approach of this series and the locking here should also be ok after the suggested changes. I can review the OA specific stuff. A couple of comments below. > > > > Signed-off-by: Umesh Nerlige Ramappa > > --- > > drivers/gpu/drm/xe/xe_oa.c | 68 ++++++++++++++++++++++++++++---------- > > 1 file changed, 50 insertions(+), 18 deletions(-) > > > > diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c > > index 8dd55798ab31..73c07fee6409 100644 > > --- a/drivers/gpu/drm/xe/xe_oa.c > > +++ b/drivers/gpu/drm/xe/xe_oa.c > > @@ -719,27 +719,55 @@ static int xe_oa_modify_ctx_image(struct xe_oa_stream *stream, struct xe_lrc *lr > > return err; > > } > > > > -static int xe_oa_load_with_lri(struct xe_oa_stream *stream, struct xe_oa_reg *reg_lri) > > +static int xe_oa_load_with_lri(struct xe_oa_stream *stream, struct xe_oa_reg *reg_lri, u32 count) > > { > > + struct xe_exec_queue *q = stream->exec_q; The two OAR programming batches are followed by another batch to configure the OA registers. We fence on only the last batch. The assumption is that all batches are submitted to the same exec queue, so we only need to fence on the last one. So we need to probably send all batches on either the kernel exec_q (when user exec_q is not availble) or the user exec_q (when user exec_q is available). We'll need to see if there's any reason we cannot do it and then revisit. > > + struct xe_sched_job *job; > > struct dma_fence *fence; > > struct xe_bb *bb; > > + long timeout; > > int err; > > > > - bb = xe_bb_new(stream->gt, 3, false); > > + bb = xe_bb_new(stream->gt, 2 * count + 1, false); > > if (IS_ERR(bb)) { > > err = PTR_ERR(bb); > > goto exit; > > } > > > > - write_cs_mi_lri(bb, reg_lri, 1); > > + write_cs_mi_lri(bb, reg_lri, count); > > > > - fence = xe_oa_submit_bb(stream, XE_OA_SUBMIT_NO_DEPS, bb); > > - if (IS_ERR(fence)) { > > - err = PTR_ERR(fence); > > + if (q->vm) { > > + down_read(&q->vm->lock); > > + xe_vm_lock(q->vm, false); > > + } > > + > > + job = xe_bb_create_job(q, bb); > > + job->ggtt = true; > > Need to set this after the IS_ERR(job) check. > > > + if (IS_ERR(job)) { > > + err = PTR_ERR(job); > > goto free_bb; > > You need to unlock the VM here. > > Matt > > > } > > - xe_bb_free(bb, fence); > > + > > + xe_sched_job_arm(job); > > + fence = dma_fence_get(&job->drm.s_fence->finished); > > + xe_sched_job_push(job); > > + > > + if (q->vm) { > > + xe_vm_unlock(q->vm); > > + up_read(&q->vm->lock); > > + } > > + > > + timeout = dma_fence_wait_timeout(fence, false, HZ); The code should not block here, it should be just push the job and move on. So the code is completely non-blocking if any output fences are specified for stream open. The fences should signal correctly by design. See calls to xe_oa_submit_bb(). > > dma_fence_put(fence); > > + xe_bb_free(bb, fence); > > + > > + if (timeout < 0) { > > + drm_dbg(&stream->oa->xe->drm, "OA load with LRI failed %ld\n", timeout); > > + return timeout; > > + } else if (!timeout) { > > + drm_dbg(&stream->oa->xe->drm, "OA load with LRI timed out\n"); > > + return -ETIME; > > + } > > > > return 0; > > free_bb: > > @@ -751,8 +779,6 @@ static int xe_oa_load_with_lri(struct xe_oa_stream *stream, struct xe_oa_reg *re > > static int xe_oa_configure_oar_context(struct xe_oa_stream *stream, bool enable) > > { > > const struct xe_oa_format *format = stream->oa_buffer.format; > > - struct xe_lrc *lrc = stream->exec_q->lrc[0]; > > - u32 regs_offset = xe_lrc_regs_offset(lrc) / sizeof(u32); > > u32 oacontrol = __format_to_oactrl(format, OAR_OACONTROL_COUNTER_SEL_MASK) | > > (enable ? OAR_OACONTROL_COUNTER_ENABLE : 0); > > > > @@ -762,13 +788,17 @@ static int xe_oa_configure_oar_context(struct xe_oa_stream *stream, bool enable) > > stream->oa->ctx_oactxctrl_offset[stream->hwe->class] + 1, > > enable ? OA_COUNTER_RESUME : 0, > > }, > > + }; > > + struct xe_oa_reg reg_lri[] = { > > + { > > + OAR_OACONTROL, > > + oacontrol, > > + }, > > { > > RING_CONTEXT_CONTROL(stream->hwe->mmio_base), > > - regs_offset + CTX_CONTEXT_CONTROL, > > _MASKED_BIT_ENABLE(CTX_CTRL_OAC_CONTEXT_ENABLE), > > }, > > }; > > - struct xe_oa_reg reg_lri = { OAR_OACONTROL, oacontrol }; > > int err; > > > > /* Modify stream hwe context image with regs_context */ > > @@ -778,14 +808,12 @@ static int xe_oa_configure_oar_context(struct xe_oa_stream *stream, bool enable) > > return err; > > > > /* Apply reg_lri using LRI */ > > - return xe_oa_load_with_lri(stream, ®_lri); > > + return xe_oa_load_with_lri(stream, reg_lri, ARRAY_SIZE(reg_lri)); > > } > > > > static int xe_oa_configure_oac_context(struct xe_oa_stream *stream, bool enable) > > { > > const struct xe_oa_format *format = stream->oa_buffer.format; > > - struct xe_lrc *lrc = stream->exec_q->lrc[0]; > > - u32 regs_offset = xe_lrc_regs_offset(lrc) / sizeof(u32); > > u32 oacontrol = __format_to_oactrl(format, OAR_OACONTROL_COUNTER_SEL_MASK) | > > (enable ? OAR_OACONTROL_COUNTER_ENABLE : 0); > > struct flex regs_context[] = { > > @@ -794,14 +822,18 @@ static int xe_oa_configure_oac_context(struct xe_oa_stream *stream, bool enable) > > stream->oa->ctx_oactxctrl_offset[stream->hwe->class] + 1, > > enable ? OA_COUNTER_RESUME : 0, > > }, > > + }; > > + struct xe_oa_reg reg_lri[] = { > > + { > > + OAC_OACONTROL, > > + oacontrol > > + }, > > { > > RING_CONTEXT_CONTROL(stream->hwe->mmio_base), > > - regs_offset + CTX_CONTEXT_CONTROL, > > _MASKED_BIT_ENABLE(CTX_CTRL_OAC_CONTEXT_ENABLE) | > > _MASKED_FIELD(CTX_CTRL_RUN_ALONE, enable ? CTX_CTRL_RUN_ALONE : 0), > > }, > > - }; > > - struct xe_oa_reg reg_lri = { OAC_OACONTROL, oacontrol }; > > + }; > > int err; > > > > /* Set ccs select to enable programming of OAC_OACONTROL */ > > @@ -815,7 +847,7 @@ static int xe_oa_configure_oac_context(struct xe_oa_stream *stream, bool enable) > > return err; > > > > /* Apply reg_lri using LRI */ > > - return xe_oa_load_with_lri(stream, ®_lri); > > + return xe_oa_load_with_lri(stream, reg_lri, ARRAY_SIZE(reg_lri)); > > } > > > > static int xe_oa_configure_oa_context(struct xe_oa_stream *stream, bool enable) > > -- > > 2.34.1 > > Thanks. -- Ashutosh