Date: Thu, 19 Dec 2024 12:15:42 -0800
Message-ID: <85r0631lap.wl-ashutosh.dixit@intel.com>
From: "Dixit, Ashutosh"
To: Harish Chegondi
Cc: intel-xe@lists.freedesktop.org, james.ausmus@intel.com, felix.j.degrood@intel.com, matias.a.cabral@intel.com, joshua.santosh.ranjan@intel.com, shubham.kumar@intel.com, matthew.d.roper@intel.com, matthew.olson@intel.com
Subject: Re: [PATCH v6 6/7] drm/xe/uapi: Add a device query to get EU stall sampling information
References: <6e104a81ccf931e55a577ce816e5821aab57bec4.1734427624.git.harish.chegondi@intel.com> <855xni2huy.wl-ashutosh.dixit@intel.com> <85seqj1vgf.wl-ashutosh.dixit@intel.com>

On Thu, 19 Dec 2024 12:04:40 -0800, Harish Chegondi wrote:
>
> On Thu, Dec 19, 2024 at 08:36:16AM -0800, Dixit, Ashutosh wrote:
> > On Wed, 18 Dec 2024 15:24:18 -0800, Harish Chegondi wrote:
> > >
> > > On Tue, Dec 17, 2024 at 12:07:49PM -0800, Dixit, Ashutosh wrote:
> > > > On Tue, 17 Dec 2024 01:46:56 -0800, Harish Chegondi wrote:
> > > > >
> > > >
> > > Hi Ashutosh,
> > >
> > > > Hi Harish,
> > > >
> > > > Only reviewing the uapi, not the implementation.
> > > >
> > > > > User space can get the EU stall data record size, EU stall capabilities,
> > > > > EU stall sampling rates, and per XeCore buffer size with query IOCTL
> > > > > DRM_IOCTL_XE_DEVICE_QUERY with .query set to DRM_XE_DEVICE_QUERY_EU_STALL.
> > > > > A struct drm_xe_query_eu_stall will be returned to the user space along
> > > > > with an array of supported sampling rates sorted in the fastest sampling
> > > > > rate first order. sampling_rates in struct drm_xe_query_eu_stall will
> > > > > point to the array of sampling rates.
> > > > >
> > > > > Any capabilities in EU stall sampling as of this patch are considered
> > > > > as base capabilities. Any new capabilities added later will need
> > > > > a new capabilities flag.
> > > >
> > > > s/a new capabilities flag/new capability bits/. Or, better to say something
> > > > like "New capability bits will be added for any new functionality added
> > > > later".
> > > >
> > > > See OA capability bits in the latest drm-tip.
> > > Will check and change.
> > > >
> > > > >
> > > > > v6: Include EU stall sampling rates information and
> > > > > per XeCore buffer size in the query information.
> > > >
> > > > /snip/
> > > >
> > > > > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > > > > index 4ee3b04a1bb5..40c2d274473e 100644
> > > > > --- a/include/uapi/drm/xe_drm.h
> > > > > +++ b/include/uapi/drm/xe_drm.h
> > > > > @@ -700,6 +700,7 @@ struct drm_xe_device_query {
> > > > > #define DRM_XE_DEVICE_QUERY_ENGINE_CYCLES 6
> > > > > #define DRM_XE_DEVICE_QUERY_UC_FW_VERSION 7
> > > > > #define DRM_XE_DEVICE_QUERY_OA_UNITS 8
> > > > > +#define DRM_XE_DEVICE_QUERY_EU_STALL 9
> > > > > /** @query: The type of data to query */
> > > > > __u32 query;
> > > > >
> > > > > @@ -1770,6 +1771,40 @@ enum drm_xe_eu_stall_property_id {
> > > > > DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT,
> > > > > };
> > > > >
> > > > > +/**
> > > > > + * struct drm_xe_query_eu_stall - Information about EU stall sampling.
> > > > > + *
> > > > > + * If a query is made with a struct @drm_xe_device_query where .query
> > > > > + * is equal to @DRM_XE_DEVICE_QUERY_EU_STALL, then the reply uses
> > > > > + * struct @drm_xe_query_eu_stall in .data.
> > > > > + */
> > > > > +struct drm_xe_query_eu_stall {
> > > > > + /** @extensions: Pointer to the first extension struct, if any */
> > > > > + __u64 extensions;
> > > > > +
> > > > > + /** @capabilities: EU stall capabilities bit-mask */
> > > > > + __u64 capabilities;
> > > > > +#define DRM_XE_EU_STALL_CAPS_BASE (1 << 0)
> > > > > +
> > > > > + /** @record_size: size of each EU stall data record */
> > > > > + __u64 record_size;
> > > > > +
> > > > > + /** @per_xecore_buf_size: Per XeCore buffer size */
> > > > > + __u64 per_xecore_buf_size;
> > > >
> > > > This new member has appeared. What is its purpose and how will userspace
> > > > use it? Basically how is it relevant?
> > > This is the per XeCore buffer size which will be used to create the per
> > > GT EU stall data buffer. In the earlier patch, user space configured the
> > > per XeCore size with one of the valid values - 128K, 256K or 512K. But
> > > it was removed from the uAPI and instead the driver uses the biggest
> > > size - 512K and creates the EU stall data buffer with size:
> > > 512K x number of XeCores. User space requested that the per XeCore buffer
> > > size be exposed through the query IOCTL. In the future, I need to add
> > > support for changing this buffer size via debugfs.
> >
> > In my view, Xe is only a temporary branding name whereas our uapi is
> > eternal :) Kidding, but you get the idea. So I would call this by a more
> > general name such as per_core_buf_size if it indeed needs to be exposed (so I
> > prefer removing Xe from the name).
> Based on earlier review feedback, I changed all DSS (Dual Sub Slice) to
> xecore as xecore is the new term for DSS or sub slice as we used to call
> it.
> So, there are "xecore" strings in several parts of the code in
> xe_eu_stall.c. So, I am not sure if I should change xecore to just core
> in just this variable as it may lead to confusion about core vs xecore.

Let's have @Roper, Matthew D take this on. Internally we can do whatever we
want but in the uapi header we should be careful.

> >
> > Because I don't see what userspace will do with this. I don't even know if
> > we can expose it if we cannot demonstrate that UMDs are consuming it. Also
> > we can reintroduce the removed property in case it is useful (which also
> > seems questionable).
> L0 folks have asked for the buffer size to be exposed. Only then did I add
> this field to the uAPI.
> I have a plan to allow the user to change the per subslice buffer size
> through a debugfs entry in the future. Although the driver uses the
> biggest buffer size, in debugging and pre-silicon environments, it helps
> to have smaller buffer sizes. So, in the future I will add support for
> the user to change the buffer size.

So if there's a real use case for it, I would just introduce a property
later on (with a capability bit), rather than side-channel ways like
debugfs. So if we are going to introduce a property anyway, we probably
don't need per_xecore_buf_size in the query.

> >
> > Anyway, for now I would keep the field but just change the name to
> > per_core_buf_size. But let's see if anyone else has an opinion on this.
> >
> > >
> > >
> > > > > +
> > > > > + /** @num_sampling_rates: Number of sampling rates supported */
> > > > > + __u64 num_sampling_rates;
> > > > > +
> > > > > + /**
> > > > > + * @sampling_rates: Pointer to an array of sampling rates
> > > > > + * sorted in the fastest sampling rate first order.
> > > >
> > > > sorted from fastest to slowest.
> > > >
> > > > > + */
> > > > > + __u64 *sampling_rates;
> > > >
> > > > This should be written as a flexible array as follows:
> > > I used a pointer instead of a flexible array as I didn't want any field
> > > after reserved fields. But since flexible array fields don't have any
> > > storage space allocated, unlike a pointer, I will go ahead and change it to
> > > a flexible array and move it to the end of the struct in the next patch
> > > series.
> > > >
> > > > __u64 sampling_rates[];
> > > >
> > > > So sampling rate array is just present at the end of the struct, there
> > > > should be no separate pointer here in the struct.
> > > >
> > > > https://en.wikipedia.org/wiki/Flexible_array_member
> > > >
> > > > Also we need to document what units the sampling rates are in. So something
> > > > like "Sampling rates are specified in number of cycles of the reference
> > > > clock".
> > > >
> > > > So we have decided not to use nanoseconds (so sampling period rather than
> > > > sampling rate)? Any particular reason for that? Though I am ok either way.
> > >
> > > I decided to use GPU cycles instead of nanoseconds as using GPU cycles
> > > will make the uAPI more future proof. I also received feedback that
> > > there are other interfaces in the driver that use GPU cycles instead of
> > > nanoseconds, and therefore to be consistent, using GPU cycles may be
> > > better.
> > > >
> > > > > +
> > > > > + /** @reserved: Reserved */
> > > > > + __u64 reserved[5];
> > > >
> > > > Because flexible arrays must be the last field in a structure, this
> > > > reserved field should be moved before num_sampling_rates field.
> > > Will move to the end of the structure in the next patch series.
> > > >
> > > > > +};
> > > > > +
> > > > > #if defined(__cplusplus)
> > > > > }
> > > > > #endif
> > > > > --
> > > > > 2.47.0
> > > > >
> > > >
> > > > Ashutosh
> > > Thanks for the review.
> > > Harish.
> > >
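
For readers following the flexible-array suggestion, here is a minimal userspace sketch of the layout discussed in this thread: fixed fields first, the reserved space ahead of the count, and the sampling rates as a trailing flexible array member. The struct name eu_stall_query_sketch and the three helper functions are hypothetical and exist only for illustration. This is not the final uAPI, and no ioctl is issued; a plain allocation stands in for the kernel filling the reply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical mirror of the layout suggested in the review (not the
 * final uAPI): the flexible array member must be the last field, so
 * the reserved space sits before num_sampling_rates.
 */
struct eu_stall_query_sketch {
	uint64_t extensions;
	uint64_t capabilities;
	uint64_t record_size;
	uint64_t per_xecore_buf_size;
	uint64_t reserved[5];
	uint64_t num_sampling_rates;
	uint64_t sampling_rates[];	/* sorted from fastest to slowest */
};

/* Bytes needed for the fixed part plus n trailing sampling rates. */
static size_t eu_stall_query_size(uint64_t n)
{
	return sizeof(struct eu_stall_query_sketch) + n * sizeof(uint64_t);
}

/*
 * Stand-in for the kernel filling the reply: allocates one contiguous
 * block and copies n rates into the trailing array. The caller passes
 * rates already sorted fastest first and frees the result.
 */
static struct eu_stall_query_sketch *
eu_stall_query_alloc(const uint64_t *rates, uint64_t n)
{
	struct eu_stall_query_sketch *q = calloc(1, eu_stall_query_size(n));

	if (!q)
		return NULL;

	q->num_sampling_rates = n;
	for (uint64_t i = 0; i < n; i++)
		q->sampling_rates[i] = rates[i];

	return q;
}

/* With fastest-first ordering, element 0 is the fastest rate. */
static uint64_t eu_stall_fastest_rate(const struct eu_stall_query_sketch *q)
{
	assert(q->num_sampling_rates > 0);
	return q->sampling_rates[0];
}
```

A caller sizes the buffer as the fixed part plus num_sampling_rates 64-bit entries, so the array needs no separate pointer and no second allocation. The rate values passed in are placeholders chosen by the caller; the thread only establishes that they are expressed in GPU cycles.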