From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 392E8C3DA4A for ; Tue, 20 Aug 2024 01:21:37 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 07BCB10E386; Tue, 20 Aug 2024 01:21:37 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="NNfHo3Ke"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5C68610E386 for ; Tue, 20 Aug 2024 01:21:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1724116895; x=1755652895; h=date:message-id:from:to:cc:subject:references: mime-version; bh=awgWhArIsUwm6Hgo5MwKnX4B9Kj5gkwzrmr13ccc1HQ=; b=NNfHo3KeG0KHKR5NgbFWdfBqqh8ZGJq6Gay+z+Y0SEUC/Dg0wH9J2vTU I7Qo/2y4xbbnM3lfZN7MmieK+oBMDhowq3+rw27m8dPsbS/r/cjETB/sZ f5G1+sWIxImQq/s+7Aby9Bh4+F9ZL9nnMOX4wryavu+HN4irXzOU5jR7h GG6SWETw6WgphxFh66d4JVHpmlBbAqL+mjHsNv7c5ERAKKVExFlUmxdIU u0BaYF60Gd4Ojqc9zsytq/hpo7ERUJ4GiKb9isq9sTx6tioyVmba1fpaX Ct2u3T8kKXMjj9l7vYax4g979AveoV/HCS8ijtScDaSRRGndetpeXoesl w==; X-CSE-ConnectionGUID: sNghyg4oRK6PH6CwVxTd4A== X-CSE-MsgGUID: D4lUpFjkTA6UncLTHbQKfA== X-IronPort-AV: E=McAfee;i="6700,10204,11169"; a="13102315" X-IronPort-AV: E=Sophos;i="6.10,160,1719903600"; d="scan'208";a="13102315" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Aug 2024 18:21:35 -0700 X-CSE-ConnectionGUID: jEb6P06FS1m3UVGbGJ7HRw== X-CSE-MsgGUID: 0gRkdj9HS4S9+nnDrm4pAA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,160,1719903600"; d="scan'208";a="61330330" Received: from peterval-mobl1.amr.corp.intel.com (HELO adixit-arch.intel.com) ([10.124.114.37]) by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Aug 2024 18:21:34 -0700 Date: Mon, 19 Aug 2024 18:04:47 -0700 Message-ID: <87le0sdmxc.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: Matthew Brost Cc: intel-xe@lists.freedesktop.org, Umesh Nerlige Ramappa , Jose Souza , Lionel Landwerlin Subject: Re: [PATCH 3/8] drm/xe/oa/uapi: Define and parse OA sync properties References: <20240808174139.4027534-1-ashutosh.dixit@intel.com> <20240808174139.4027534-4-ashutosh.dixit@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/29.4 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" Some time ago, Dixit, Ashutosh wrote: > Hi Matt, > On Thu, 08 Aug 2024 16:13:38 -0700, Matthew Brost wrote: > > > > On Thu, Aug 08, 2024 at 10:41:34AM -0700, Ashutosh Dixit wrote: > > > Now that we have laid the groundwork, introduce OA sync properties in the > > > uapi. Also parse the input xe_sync array as is done elsewhere in the > > > driver. > > > > > > Signed-off-by: Ashutosh Dixit > > > --- > > > drivers/gpu/drm/xe/xe_oa.c | 83 +++++++++++++++++++++++++++++++- > > > drivers/gpu/drm/xe/xe_oa_types.h | 6 +++ > > > include/uapi/drm/xe_drm.h | 14 ++++++ > > > 3 files changed, 102 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c > > > index f97d64ffb460f..ba8f2e9d95b7f 100644 > > > --- a/drivers/gpu/drm/xe/xe_oa.c > > > +++ b/drivers/gpu/drm/xe/xe_oa.c > > > @@ -36,6 +36,7 @@ > > > #include "xe_pm.h" > > > #include "xe_sched_job.h" > > > #include "xe_sriov.h" > > > +#include "xe_sync.h" > > > > > > #define DEFAULT_POLL_FREQUENCY_HZ 200 > > > #define DEFAULT_POLL_PERIOD_NS (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY_HZ) > > > @@ -70,6 +71,7 @@ struct flex { > > > }; > > > > > > struct xe_oa_open_param { > > > + struct xe_file *xef; > > > u32 oa_unit_id; > > > bool sample; > > > u32 metric_set; > > > @@ -81,6 +83,9 @@ struct xe_oa_open_param { > > > struct xe_exec_queue *exec_q; > > > struct xe_hw_engine *hwe; > > > bool no_preempt; > > > + struct drm_xe_sync __user *syncs_user; > > > + int num_syncs; > > > + struct xe_sync_entry *syncs; > > > }; > > > > > > struct xe_oa_config_bo { > > > @@ -1417,6 +1422,9 @@ static int xe_oa_stream_init(struct xe_oa_stream *stream, > > > stream->period_exponent = param->period_exponent; > > > stream->no_preempt = param->no_preempt; > > > > > > + stream->num_syncs = param->num_syncs; > > > + stream->syncs = param->syncs; > > > + > > > /* > > > * For Xe2+, when overrun mode is enabled, there are no partial reports at the end > > > * of buffer, making the OA buffer effectively a non-power-of-2 size circular > > > @@ -1767,6 +1775,20 @@ static int xe_oa_set_no_preempt(struct xe_oa *oa, u64 value, > > > return 0; > > > } > > > > > > +static int xe_oa_set_num_syncs(struct xe_oa *oa, u64 value, > > > + struct xe_oa_open_param *param) > > > +{ > > > + param->num_syncs = value; > > > + return 0; > > > +} > > > + > > > +static int xe_oa_set_syncs_user(struct xe_oa *oa, u64 value, > > > + struct xe_oa_open_param *param) > > > +{ > > > + param->syncs_user = u64_to_user_ptr(value); > > > + return 0; > > > +} > > > + > > > typedef int (*xe_oa_set_property_fn)(struct xe_oa *oa, u64 value, > > > struct xe_oa_open_param *param); > > > static const xe_oa_set_property_fn xe_oa_set_property_funcs[] = { > > > @@ -1779,6 +1801,8 @@ static const xe_oa_set_property_fn xe_oa_set_property_funcs[] = { > > > [DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID] = xe_oa_set_prop_exec_queue_id, > > > [DRM_XE_OA_PROPERTY_OA_ENGINE_INSTANCE] = xe_oa_set_prop_engine_instance, > > > [DRM_XE_OA_PROPERTY_NO_PREEMPT] = xe_oa_set_no_preempt, > > > + [DRM_XE_OA_PROPERTY_NUM_SYNCS] = xe_oa_set_num_syncs, > > > + [DRM_XE_OA_PROPERTY_SYNCS] = xe_oa_set_syncs_user, > > > }; > > > > > > static int xe_oa_user_ext_set_property(struct xe_oa *oa, u64 extension, > > > @@ -1838,6 +1862,49 @@ static int xe_oa_user_extensions(struct xe_oa *oa, u64 extension, int ext_number > > > return 0; > > > } > > > > > > +static int xe_oa_parse_syncs(struct xe_oa *oa, struct xe_oa_open_param *param) > > > +{ > > > + int ret, num_syncs, num_ufence = 0; > > > + > > > + if (param->num_syncs && !param->syncs_user) { > > > + drm_dbg(&oa->xe->drm, "num_syncs specified without sync array\n"); > > > + ret = -EINVAL; > > > + goto exit; > > > + } > > > + > > > + if (param->num_syncs) { > > > + param->syncs = kcalloc(param->num_syncs, sizeof(*param->syncs), GFP_KERNEL); > > > + if (!param->syncs) { > > > + ret = -ENOMEM; > > > + goto exit; > > > + } > > > + } > > > + > > > + for (num_syncs = 0; num_syncs < param->num_syncs; num_syncs++) { > > > + ret = xe_sync_entry_parse(oa->xe, param->xef, ¶m->syncs[num_syncs], > > > + ¶m->syncs_user[num_syncs], SYNC_PARSE_FLAG_EXEC); > > > + if (ret) > > > + goto err_syncs; > > > + > > > + if (xe_sync_is_ufence(¶m->syncs[num_syncs])) > > > > Do you even want to allow ufences for OA? It doesn't seem like you do > > based on this series. Thanks for bringing this up, because I had naively assumed, without verifying, that ufences will "just work" for OA. However, I now tried this out and SYNC_PARSE_FLAG_EXEC flag above is incorrect. By setting the flag to 0, ufences do work for OA, in a way identical to ufences work for the VM bind case (the user fence address must be user pointer). OA submits its batches on a kernel exec queue (and kernel VM) so the 'exec' case is not an option for OA (since a user-visible exec_queue and vm is not always available to OA userspace). I have updated the code and documentation for this in v2. But otherwise assuming we can support ufences for OA too. Thanks. -- Ashutosh > > > > Matt > > > > > + num_ufence++; > > > + } > > > + > > > + if (XE_IOCTL_DBG(oa->xe, num_ufence > 1)) { > > > + ret = -EINVAL; > > > + goto err_syncs; > > > + } > > > + > > > + return 0; > > > + > > > +err_syncs: > > > + while (num_syncs--) > > > + xe_sync_entry_cleanup(¶m->syncs[num_syncs]); > > > + kfree(param->syncs); > > > +exit: > > > + return ret; > > > +} > > > + > > > /** > > > * xe_oa_stream_open_ioctl - Opens an OA stream > > > * @dev: @drm_device > > > @@ -1863,6 +1930,7 @@ int xe_oa_stream_open_ioctl(struct drm_device *dev, u64 data, struct drm_file *f > > > return -ENODEV; > > > } > > > > > > + param.xef = xef; > > > ret = xe_oa_user_extensions(oa, data, 0, ¶m); > > > if (ret) > > > return ret; > > > @@ -1931,11 +1999,24 @@ int xe_oa_stream_open_ioctl(struct drm_device *dev, u64 data, struct drm_file *f > > > drm_dbg(&oa->xe->drm, "Using periodic sampling freq %lld Hz\n", oa_freq_hz); > > > } > > > > > > + ret = xe_oa_parse_syncs(oa, ¶m); > > > + if (ret) > > > + goto err_exec_q; > > > + > > > mutex_lock(¶m.hwe->gt->oa.gt_lock); > > > ret = xe_oa_stream_open_ioctl_locked(oa, ¶m); > > > mutex_unlock(¶m.hwe->gt->oa.gt_lock); > > > + if (ret < 0) > > > + goto err_sync_cleanup; > > > + > > > + return ret; > > > + > > > +err_sync_cleanup: > > > + while (param.num_syncs--) > > > + xe_sync_entry_cleanup(¶m.syncs[param.num_syncs]); > > > + kfree(param.syncs); > > > err_exec_q: > > > - if (ret < 0 && param.exec_q) > > > + if (param.exec_q) > > > xe_exec_queue_put(param.exec_q); > > > return ret; > > > } > > > diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h > > > index 540c3ec53a6d7..c1ca960af9305 100644 > > > --- a/drivers/gpu/drm/xe/xe_oa_types.h > > > +++ b/drivers/gpu/drm/xe/xe_oa_types.h > > > @@ -238,5 +238,11 @@ struct xe_oa_stream { > > > > > > /** @no_preempt: Whether preemption and timeslicing is disabled for stream exec_q */ > > > u32 no_preempt; > > > + > > > + /** @num_syncs: size of @syncs array */ > > > + u32 num_syncs; > > > + > > > + /** @syncs: syncs to wait on and to signal */ > > > + struct xe_sync_entry *syncs; > > > }; > > > #endif > > > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h > > > index b6fbe4988f2e9..0a3daaea4eb27 100644 > > > --- a/include/uapi/drm/xe_drm.h > > > +++ b/include/uapi/drm/xe_drm.h > > > @@ -1632,6 +1632,20 @@ enum drm_xe_oa_property_id { > > > * to be disabled for the stream exec queue. > > > */ > > > DRM_XE_OA_PROPERTY_NO_PREEMPT, > > > + > > > + /** > > > + * @DRM_XE_OA_PROPERTY_NUM_SYNCS: Number of syncs in the sync array > > > + * specified in @DRM_XE_OA_PROPERTY_SYNCS > > > + */ > > > + DRM_XE_OA_PROPERTY_NUM_SYNCS, > > > + > > > + /** > > > + * @DRM_XE_OA_PROPERTY_SYNCS: Pointer to struct @drm_xe_sync array > > > + * with array size specified via @DRM_XE_OA_PROPERTY_NUM_SYNCS. OA > > > + * configuration will wait till input fences signal. Output fences > > > + * will signal after the new OA configuration takes effect. > > > + */ > > > + DRM_XE_OA_PROPERTY_SYNCS, > > > }; > > > > > > /** > > > -- > > > 2.41.0 > > >