From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from mga18.intel.com (mga18.intel.com [134.134.136.126])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 518B610E363
 for ; Tue, 11 Jul 2023 10:38:56 +0000 (UTC)
Message-ID: <3a65db76-ab0b-0a4d-1a5a-59d7f001967e@intel.com>
Date: Tue, 11 Jul 2023 12:38:35 +0200
Content-Language: en-US
To: Zbigniew Kempczyński
References: <20230706060555.282757-1-zbigniew.kempczynski@intel.com>
 <20230706060555.282757-12-zbigniew.kempczynski@intel.com>
 <20230711090632.yrmouv5ajsjhdjrs@zkempczy-mobl2>
From: Karolina Stolarek
In-Reply-To: <20230711090632.yrmouv5ajsjhdjrs@zkempczy-mobl2>
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
Subject: Re: [igt-dev] [PATCH i-g-t v2 11/16] lib/intel_ctx: Add xe context information
Cc: igt-dev@lists.freedesktop.org
Errors-To: igt-dev-bounces@lists.freedesktop.org
Sender: "igt-dev"

On 11.07.2023 11:06, Zbigniew Kempczyński wrote:
> On Fri, Jul 07, 2023 at 10:31:19AM +0200, Karolina Stolarek wrote:
>> On 6.07.2023 08:05, Zbigniew Kempczyński wrote:
>>> The most complicated part of adapting i915_blt to intel_blt - which
>>> should handle both drivers - is how to achieve pipelined execution.
>>> By pipelined execution I mean that all gpu workloads are executed
>>> without stalls.
>>>
>>> Compared to i915 relocations and softpinning, the xe architecture
>>> migrates the binding (which also means the unbind operation)
>>> responsibility from the kernel to the user via the vm_bind ioctl().
>>> To avoid stalls the user has to provide in/out fences (syncobjs)
>>> between consecutive bindings/execs. Of course, for many igt tests we
>>> don't need pipelined execution, just a synchronous bind followed by
>>> an exec. But exercising the driver should also cover pipelining to
>>> verify that it is possible to work without stalls.
>>>
>>> I decided to extend intel_ctx_t with all the objects necessary for
>>> xe (vm, engine, syncobjs) to get flexibility in deciding how to
>>> bind, execute and wait for (synchronize) those operations. The
>>> context object along with the i915 engine is already passed to the
>>> blitter library, so adding the xe-required fields doesn't break i915
>>> but allows the xe path to get all the data necessary to execute.
>>>
>>> Using intel_ctx with xe requires some code patterns caused by the
>>> usage of an allocator. For xe the allocator started tracking
>>> alloc()/free() operations to do bind/unbind in one call just before
>>> execution. I've added two helpers in intel_ctx -
>>> intel_ctx_xe_exec() and intel_ctx_xe_sync(). Depending on how
>>> intel_ctx was created (with 0 or real syncobj handles as
>>> in/bind/out fences), bind and exec in intel_ctx_xe_exec() are
>>> pipelined but synchronize the last operation (exec). For real
>>> syncobjs, they are used to join the bind + exec calls, but there's
>>> no wait for exec (sync-out) completion. This allows building more
>>> cascaded bind + exec operations without stalls.
>>>
>>> To wait for a sync-out fence, the caller may use
>>> intel_ctx_xe_sync(), which is a synchronous wait on a syncobj. It
>>> allows the user to reset the fences to prepare for the next
>>> operation.
>>>
>>> Signed-off-by: Zbigniew Kempczyński
>>> ---
>>>  lib/intel_ctx.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++-
>>>  lib/intel_ctx.h |  14 ++++++
>>>  2 files changed, 123 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/intel_ctx.c b/lib/intel_ctx.c
>>> index ded9c0f1e4..f210907fac 100644
>>> --- a/lib/intel_ctx.c
>>> +++ b/lib/intel_ctx.c
>>> @@ -5,9 +5,12 @@
>>>  #include
>>>
>>> +#include "i915/gem_engine_topology.h"
>>> +#include "igt_syncobj.h"
>>> +#include "intel_allocator.h"
>>>  #include "intel_ctx.h"
>>>  #include "ioctl_wrappers.h"
>>> -#include "i915/gem_engine_topology.h"
>>> +#include "xe/xe_ioctl.h"
>>>
>>>  /**
>>>   * SECTION:intel_ctx
>>> @@ -390,3 +393,108 @@ unsigned int intel_ctx_engine_class(const intel_ctx_t *ctx, unsigned int engine)
>>>  {
>>>  	return intel_ctx_cfg_engine_class(&ctx->cfg, engine);
>>>  }
>>> +
>>> +/**
>>> + * intel_ctx_xe:
>>> + * @fd: open i915 drm file descriptor
>>> + * @vm: vm
>>> + * @engine: engine
>>> + *
>>> + * Returns an intel_ctx_t representing the xe context.
>>> + */
>>> +intel_ctx_t *intel_ctx_xe(int fd, uint32_t vm, uint32_t engine,
>>> +			  uint32_t sync_in, uint32_t sync_bind, uint32_t sync_out)
>>> +{
>>> +	intel_ctx_t *ctx;
>>> +
>>> +	ctx = calloc(1, sizeof(*ctx));
>>> +	igt_assert(ctx);
>>> +
>>> +	ctx->fd = fd;
>>> +	ctx->vm = vm;
>>> +	ctx->engine = engine;
>>> +	ctx->sync_in = sync_in;
>>> +	ctx->sync_bind = sync_bind;
>>> +	ctx->sync_out = sync_out;
>>> +
>>> +	return ctx;
>>> +}
>>> +
>>> +static int __xe_exec(int fd, struct drm_xe_exec *exec)
>>> +{
>>> +	int err = 0;
>>> +
>>> +	if (igt_ioctl(fd, DRM_IOCTL_XE_EXEC, exec)) {
>>> +		err = -errno;
>>> +		igt_assume(err != 0);
>>
>> Wouldn't "igt_assume(err)" be enough?
>>
>>> +	}
>>> +	errno = 0;
>>> +	return err;
>>> +}
>>
>> I'm aware that it's a helper that you use in other execs, but it feels
>> out of place; it doesn't deal with intel_ctx_t. Maybe xe_util could be
>> its new home?
>>
>
> I'm going to just export __xe_exec() from xe_ioctl.c.
That's an even better idea, thanks.

>
>>> +
>>> +int __intel_ctx_xe_exec(const intel_ctx_t *ctx, uint64_t ahnd, uint64_t bb_offset)
>>> +{
>>> +	struct drm_xe_sync syncs[2] = {
>>> +		{ .flags = DRM_XE_SYNC_SYNCOBJ, },
>>> +		{ .flags = DRM_XE_SYNC_SYNCOBJ | DRM_XE_SYNC_SIGNAL, },
>>> +	};
>>> +	struct drm_xe_exec exec = {
>>> +		.engine_id = ctx->engine,
>>> +		.syncs = (uintptr_t)syncs,
>>> +		.num_syncs = 2,
>>> +		.address = bb_offset,
>>> +		.num_batch_buffer = 1,
>>> +	};
>>> +	uint32_t sync_in = ctx->sync_in;
>>> +	uint32_t sync_bind = ctx->sync_bind ?: syncobj_create(ctx->fd, 0);
>>> +	uint32_t sync_out = ctx->sync_out ?: syncobj_create(ctx->fd, 0);
>>> +	int ret;
>>> +
>>> +	/* Synchronize allocator state -> vm */
>>> +	intel_allocator_bind(ahnd, sync_in, sync_bind);
>>> +
>>> +	/* Pipelined exec */
>>> +	syncs[0].handle = sync_bind;
>>> +	syncs[1].handle = sync_out;
>>> +
>>> +	ret = __xe_exec(ctx->fd, &exec);
>>> +	if (ret)
>>> +		goto err;
>>> +
>>> +	if (!ctx->sync_bind || !ctx->sync_out)
>>> +		syncobj_wait_err(ctx->fd, &sync_out, 1, INT64_MAX, 0);
>>
>> This whole flow is so nice and tidy, I like it
>>
>>> +
>>> +err:
>>> +	if (!ctx->sync_bind)
>>> +		syncobj_destroy(ctx->fd, sync_bind);
>>> +
>>> +	if (!ctx->sync_out)
>>> +		syncobj_destroy(ctx->fd, sync_out);
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +void intel_ctx_xe_exec(const intel_ctx_t *ctx, uint64_t ahnd, uint64_t bb_offset)
>>> +{
>>> +	igt_assert_eq(__intel_ctx_xe_exec(ctx, ahnd, bb_offset), 0);
>>> +}
>>> +
>>> +#define RESET_SYNCOBJ(__fd, __sync) do { \
>>> +	if (__sync) \
>>> +		syncobj_reset((__fd), &(__sync), 1); \
>>> +} while (0)
>>> +
>>> +int intel_ctx_xe_sync(intel_ctx_t *ctx, bool reset_syncs)
>>> +{
>>> +	int ret;
>>> +
>>> +	ret = syncobj_wait_err(ctx->fd, &ctx->sync_out, 1, INT64_MAX, 0);
>>> +
>>> +	if (reset_syncs) {
>>> +		RESET_SYNCOBJ(ctx->fd, ctx->sync_in);
>>> +		RESET_SYNCOBJ(ctx->fd, ctx->sync_bind);
>>> +		RESET_SYNCOBJ(ctx->fd, ctx->sync_out);
>>> +	}
>>
>> Is there a use case where we want to do a synced execution without
>> resetting syncobjs?
>>
>
> I don't know - that's why I left the decision to the user. :)

If there won't be a client for this functionality, we could drop it in
the future.

>
>>> +
>>> +	return ret;
>>> +}
>>> diff --git a/lib/intel_ctx.h b/lib/intel_ctx.h
>>> index 3cfeaae81e..59d0360ada 100644
>>> --- a/lib/intel_ctx.h
>>> +++ b/lib/intel_ctx.h
>>> @@ -67,6 +67,14 @@ int intel_ctx_cfg_engine_class(const intel_ctx_cfg_t *cfg, unsigned int engine);
>>>  typedef struct intel_ctx {
>>>  	uint32_t id;
>>>  	intel_ctx_cfg_t cfg;
>>> +
>>> +	/* Xe */
>>> +	int fd;
>>> +	uint32_t vm;
>>> +	uint32_t engine;
>>> +	uint32_t sync_in;
>>> +	uint32_t sync_bind;
>>> +	uint32_t sync_out;
>>
>> Hmm, I wonder if we could wrap it in a struct. Yes, it would be painful
>> to unpack, but now it feels like we've just added a bunch of fields
>> that are irrelevant 80% of the time. Instead, we could have one
>> additional field that could be NULL, and use it if it's initialized.
>> But maybe I'm just being too nit-picky.
>
> I wondered about introducing a union of i915 and xe structs, but I
> would need to rewrite almost all igts which are using this struct, so
> I dropped this idea. At the moment I need to handle both drivers, so
> mixing fields is not a big pain imo.

Oh my, that would be painful indeed. I mean, I won't nack the mixed
fields, but I just find it strange to leave them in the open when we're
testing i915 stuff.
All the best,
Karolina

>
> --
> Zbigniew
>
>>
>> All the best,
>> Karolina
>>
>>>  } intel_ctx_t;
>>>
>>>  int __intel_ctx_create(int fd, const intel_ctx_cfg_t *cfg,
>>> @@ -81,4 +89,10 @@ void intel_ctx_destroy(int fd, const intel_ctx_t *ctx);
>>>
>>>  unsigned int intel_ctx_engine_class(const intel_ctx_t *ctx, unsigned int engine);
>>>
>>> +intel_ctx_t *intel_ctx_xe(int fd, uint32_t vm, uint32_t engine,
>>> +			  uint32_t sync_in, uint32_t sync_bind, uint32_t sync_out);
>>> +int __intel_ctx_xe_exec(const intel_ctx_t *ctx, uint64_t ahnd, uint64_t bb_offset);
>>> +void intel_ctx_xe_exec(const intel_ctx_t *ctx, uint64_t ahnd, uint64_t bb_offset);
>>> +int intel_ctx_xe_sync(intel_ctx_t *ctx, bool reset_syncs);
>>> +
>>>  #endif