From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <3033abb2dfe6755ff3559a480e0d21b5665436d5.camel@linux.intel.com>
Subject: Re: [PATCH v3 1/3] drm/xe: Split H2G and G2H into separate buffer objects
From: Thomas Hellström
To: Matthew Brost, intel-xe@lists.freedesktop.org
Cc: stuart.summers@intel.com, francois.dugast@intel.com,
	daniele.ceraolospurio@intel.com, michal.wajdeczko@intel.com
Date: Tue, 24 Feb 2026 16:58:35 +0100
In-Reply-To: <20260218043319.809548-2-matthew.brost@intel.com>
References: <20260218043319.809548-1-matthew.brost@intel.com>
	 <20260218043319.809548-2-matthew.brost@intel.com>
Organization: Intel Sweden AB, Registration Number: 556189-6027

On Tue, 2026-02-17 at 20:33 -0800, Matthew Brost wrote:
> H2G and G2H buffers have different access patterns (H2G is CPU-write,
> GuC-read, while G2H is GPU-write, CPU-read).
> On dGPU, these patterns
> benefit from different memory placements: H2G in VRAM and G2H in
> system memory. Split the CT buffer into two separate buffers, one for
> H2G and one for G2H, and select the optimal placement for each.
>
> This provides a significant performance improvement on the G2H read
> path, reducing a single read from ~20 µs to under 1 µs on BMG.
>
> Signed-off-by: Matthew Brost

Reviewed-by: Thomas Hellström

Perhaps one could experiment with reading the data from the g2h bo
using MOVNTDQA, like the write-combining memcpy. That would avoid
caching the data on the CPU and spare the GuC from having to
invalidate the cache line while snooping on its next write. That would
probably have less impact on the read path, though, and instead mostly
speed up GuC writes. (A rough sketch of such a copy follows after the
quoted patch below.)

/Thomas

>
> ---
> v3:
>  - Move BO to ctbs h2g or g2h structure (Michal)
> ---
>  drivers/gpu/drm/xe/xe_guc_ct.c       | 67 +++++++++++++++++++---------
>  drivers/gpu/drm/xe/xe_guc_ct_types.h |  4 +-
>  2 files changed, 47 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 8a45573f8812..ea07a27757d5 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -255,6 +255,7 @@ static bool g2h_fence_needs_alloc(struct g2h_fence *g2h_fence)
>
>  #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
>  #define CTB_H2G_BUFFER_OFFSET	(CTB_DESC_SIZE * 2)
> +#define CTB_G2H_BUFFER_OFFSET	(CTB_DESC_SIZE * 2)
>  #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
>  #define CTB_H2G_BUFFER_DWORDS	(CTB_H2G_BUFFER_SIZE / sizeof(u32))
>  #define CTB_G2H_BUFFER_SIZE	(SZ_128K)
> @@ -279,10 +280,14 @@ long xe_guc_ct_queue_proc_time_jiffies(struct xe_guc_ct *ct)
>  	return (CTB_H2G_BUFFER_SIZE / SZ_4K) * HZ;
>  }
>
> -static size_t guc_ct_size(void)
> +static size_t guc_h2g_size(void)
>  {
> -	return CTB_H2G_BUFFER_OFFSET + CTB_H2G_BUFFER_SIZE +
> -		CTB_G2H_BUFFER_SIZE;
> +	return CTB_H2G_BUFFER_OFFSET + CTB_H2G_BUFFER_SIZE;
> +}
> +
> +static size_t guc_g2h_size(void)
> +{
> +	return CTB_G2H_BUFFER_OFFSET + CTB_G2H_BUFFER_SIZE;
>  }
>
>  static void guc_ct_fini(struct drm_device *drm, void *arg)
> @@ -311,7 +316,8 @@ int xe_guc_ct_init_noalloc(struct xe_guc_ct *ct)
>  	struct xe_gt *gt = ct_to_gt(ct);
>  	int err;
>
> -	xe_gt_assert(gt, !(guc_ct_size() % PAGE_SIZE));
> +	xe_gt_assert(gt, !(guc_h2g_size() % PAGE_SIZE));
> +	xe_gt_assert(gt, !(guc_g2h_size() % PAGE_SIZE));
>
>  	err = drmm_mutex_init(&xe->drm, &ct->lock);
>  	if (err)
> @@ -356,7 +362,17 @@ int xe_guc_ct_init(struct xe_guc_ct *ct)
>  	struct xe_tile *tile = gt_to_tile(gt);
>  	struct xe_bo *bo;
>
> -	bo = xe_managed_bo_create_pin_map(xe, tile, guc_ct_size(),
> +	bo = xe_managed_bo_create_pin_map(xe, tile, guc_h2g_size(),
> +					  XE_BO_FLAG_SYSTEM |
> +					  XE_BO_FLAG_GGTT |
> +					  XE_BO_FLAG_GGTT_INVALIDATE |
> +					  XE_BO_FLAG_PINNED_NORESTORE);
> +	if (IS_ERR(bo))
> +		return PTR_ERR(bo);
> +
> +	ct->ctbs.h2g.bo = bo;
> +
> +	bo = xe_managed_bo_create_pin_map(xe, tile, guc_g2h_size(),
>  					  XE_BO_FLAG_SYSTEM |
>  					  XE_BO_FLAG_GGTT |
>  					  XE_BO_FLAG_GGTT_INVALIDATE |
>  					  XE_BO_FLAG_PINNED_NORESTORE);
>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
>
> -	ct->bo = bo;
> +	ct->ctbs.g2h.bo = bo;
>
>  	return devm_add_action_or_reset(xe->drm.dev, guc_action_disable_ct, ct);
>  }
> @@ -389,7 +405,7 @@ int xe_guc_ct_init_post_hwconfig(struct xe_guc_ct *ct)
>  	xe_assert(xe, !xe_guc_ct_enabled(ct));
>
>  	if (IS_DGFX(xe)) {
> -		ret = xe_managed_bo_reinit_in_vram(xe, tile, &ct->bo);
> +		ret = xe_managed_bo_reinit_in_vram(xe, tile, &ct->ctbs.h2g.bo);
>  		if (ret)
>  			return ret;
>  	}
> @@ -439,8 +455,7 @@ static void guc_ct_ctb_g2h_init(struct xe_device *xe, struct guc_ctb *g2h,
>  	g2h->desc = IOSYS_MAP_INIT_OFFSET(map, CTB_DESC_SIZE);
>  	xe_map_memset(xe, &g2h->desc, 0, 0, sizeof(struct guc_ct_buffer_desc));
>
> -	g2h->cmds = IOSYS_MAP_INIT_OFFSET(map, CTB_H2G_BUFFER_OFFSET +
> -					  CTB_H2G_BUFFER_SIZE);
> +	g2h->cmds = IOSYS_MAP_INIT_OFFSET(map, CTB_G2H_BUFFER_OFFSET);
>  }
>
>  static int guc_ct_ctb_h2g_register(struct xe_guc_ct *ct)
> @@ -449,8 +464,8 @@ static int guc_ct_ctb_h2g_register(struct xe_guc_ct *ct)
>  	u32 desc_addr, ctb_addr, size;
>  	int err;
>
> -	desc_addr = xe_bo_ggtt_addr(ct->bo);
> -	ctb_addr = xe_bo_ggtt_addr(ct->bo) + CTB_H2G_BUFFER_OFFSET;
> +	desc_addr = xe_bo_ggtt_addr(ct->ctbs.h2g.bo);
> +	ctb_addr = xe_bo_ggtt_addr(ct->ctbs.h2g.bo) + CTB_H2G_BUFFER_OFFSET;
>  	size = ct->ctbs.h2g.info.size * sizeof(u32);
>
>  	err = xe_guc_self_cfg64(guc,
> @@ -476,9 +491,8 @@ static int guc_ct_ctb_g2h_register(struct xe_guc_ct *ct)
>  	u32 desc_addr, ctb_addr, size;
>  	int err;
>
> -	desc_addr = xe_bo_ggtt_addr(ct->bo) + CTB_DESC_SIZE;
> -	ctb_addr = xe_bo_ggtt_addr(ct->bo) + CTB_H2G_BUFFER_OFFSET +
> -		CTB_H2G_BUFFER_SIZE;
> +	desc_addr = xe_bo_ggtt_addr(ct->ctbs.g2h.bo) + CTB_DESC_SIZE;
> +	ctb_addr = xe_bo_ggtt_addr(ct->ctbs.g2h.bo) + CTB_G2H_BUFFER_OFFSET;
>  	size = ct->ctbs.g2h.info.size * sizeof(u32);
>
>  	err = xe_guc_self_cfg64(guc,
> @@ -605,9 +619,12 @@ static int __xe_guc_ct_start(struct xe_guc_ct *ct, bool needs_register)
>  	xe_gt_assert(gt, !xe_guc_ct_enabled(ct));
>
>  	if (needs_register) {
> -		xe_map_memset(xe, &ct->bo->vmap, 0, 0, xe_bo_size(ct->bo));
> -		guc_ct_ctb_h2g_init(xe, &ct->ctbs.h2g, &ct->bo->vmap);
> -		guc_ct_ctb_g2h_init(xe, &ct->ctbs.g2h, &ct->bo->vmap);
> +		xe_map_memset(xe, &ct->ctbs.h2g.bo->vmap, 0, 0,
> +			      xe_bo_size(ct->ctbs.h2g.bo));
> +		xe_map_memset(xe, &ct->ctbs.g2h.bo->vmap, 0, 0,
> +			      xe_bo_size(ct->ctbs.g2h.bo));
> +		guc_ct_ctb_h2g_init(xe, &ct->ctbs.h2g, &ct->ctbs.h2g.bo->vmap);
> +		guc_ct_ctb_g2h_init(xe, &ct->ctbs.g2h, &ct->ctbs.g2h.bo->vmap);
>
>  		err = guc_ct_ctb_h2g_register(ct);
>  		if (err)
> @@ -624,7 +641,7 @@ static int __xe_guc_ct_start(struct xe_guc_ct *ct, bool needs_register)
>  		ct->ctbs.h2g.info.broken = false;
>  		ct->ctbs.g2h.info.broken = false;
>  		/* Skip everything in H2G buffer */
> -		xe_map_memset(xe, &ct->bo->vmap, CTB_H2G_BUFFER_OFFSET, 0,
> +		xe_map_memset(xe, &ct->ctbs.h2g.bo->vmap, CTB_H2G_BUFFER_OFFSET, 0,
>  			      CTB_H2G_BUFFER_SIZE);
>  	}
>
> @@ -1963,8 +1980,9 @@ static struct xe_guc_ct_snapshot *guc_ct_snapshot_alloc(struct xe_guc_ct *ct, bo
>  	if (!snapshot)
>  		return NULL;
>
> -	if (ct->bo && want_ctb) {
> -		snapshot->ctb_size = xe_bo_size(ct->bo);
> +	if (ct->ctbs.h2g.bo && ct->ctbs.g2h.bo && want_ctb) {
> +		snapshot->ctb_size = xe_bo_size(ct->ctbs.h2g.bo) +
> +				     xe_bo_size(ct->ctbs.g2h.bo);
>  		snapshot->ctb = kmalloc(snapshot->ctb_size, atomic ? GFP_ATOMIC : GFP_KERNEL);
>  	}
>
> @@ -2012,8 +2030,13 @@ static struct xe_guc_ct_snapshot *guc_ct_snapshot_capture(struct xe_guc_ct *ct,
>  		guc_ctb_snapshot_capture(xe, &ct->ctbs.g2h, &snapshot->g2h);
>  	}
>
> -	if (ct->bo && snapshot->ctb)
> -		xe_map_memcpy_from(xe, snapshot->ctb, &ct->bo->vmap, 0, snapshot->ctb_size);
> +	if (ct->ctbs.h2g.bo && ct->ctbs.g2h.bo && snapshot->ctb) {
> +		xe_map_memcpy_from(xe, snapshot->ctb, &ct->ctbs.h2g.bo->vmap, 0,
> +				   xe_bo_size(ct->ctbs.h2g.bo));
> +		xe_map_memcpy_from(xe, snapshot->ctb + xe_bo_size(ct->ctbs.h2g.bo),
> +				   &ct->ctbs.g2h.bo->vmap, 0,
> +				   xe_bo_size(ct->ctbs.g2h.bo));
> +	}
>
>  	return snapshot;
>  }
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct_types.h b/drivers/gpu/drm/xe/xe_guc_ct_types.h
> index 09d7ff1ef42a..46ad1402347d 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct_types.h
> +++ b/drivers/gpu/drm/xe/xe_guc_ct_types.h
> @@ -39,6 +39,8 @@ struct guc_ctb_info {
>   * struct guc_ctb - GuC command transport buffer (CTB)
>   */
>  struct guc_ctb {
> +	/** @bo: Xe BO for CTB */
> +	struct xe_bo *bo;
>  	/** @desc: dma buffer map for CTB descriptor */
>  	struct iosys_map desc;
>  	/** @cmds: dma buffer map for CTB commands */
> @@ -126,8 +128,6 @@ struct xe_fast_req_fence {
>   * for the H2G and G2H requests sent and received through the buffers.
>   */
>  struct xe_guc_ct {
> -	/** @bo: Xe BO for CT */
> -	struct xe_bo *bo;
>  	/** @lock: protects everything in CT layer */
>  	struct mutex lock;
>  	/** @fast_lock: protects G2H channel and credits */
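
As referenced above, a rough, untested sketch of the kind of
non-temporal MOVNTDQA read suggested, modelled on the existing
i915_memcpy_from_wc() in the i915 driver. The helper name, the
alignment preconditions (16-byte-aligned src and dst, length a
multiple of 16) and the SSE 4.1 requirement are illustrative
assumptions, not an existing xe API.

/*
 * Illustrative sketch only: copy from WC (write-combined) memory using
 * streaming loads. MOVNTDQA pulls each line through a streaming-load
 * buffer instead of the cache hierarchy, so the CPU never caches the
 * G2H lines and the GuC does not have to invalidate them while
 * snooping on its next write.
 */
#include <linux/types.h>
#include <asm/fpu/api.h>	/* kernel_fpu_begin()/kernel_fpu_end() */

static void xe_memcpy_from_wc_ntdqa(void *dst, const void *src, size_t len)
{
	kernel_fpu_begin();	/* the loop below clobbers %xmm0 */

	while (len >= 16) {
		asm volatile("movntdqa (%0), %%xmm0\n"	/* streaming load */
			     "movaps %%xmm0, (%1)\n"	/* regular store */
			     :: "r" (src), "r" (dst) : "memory");
		src += 16;
		dst += 16;
		len -= 16;
	}

	kernel_fpu_end();
}

For comparison, the i915 version unrolls four 16-byte loads per
iteration and returns false when its alignment and CPU-feature checks
fail, letting the caller fall back to a plain copy; a real xe helper
would presumably want to do the same.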