From: Matthew Brost <matthew.brost@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>, <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v5 4/6] drm/xe: Rewrite GGTT VF initialisation
Date: Fri, 10 Oct 2025 08:34:55 -0700 [thread overview]
Message-ID: <aOknn/V9RVs0cbDv@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <3f96dc8f-c1bd-427a-b24a-64e7f5877fb2@intel.com>
On Fri, Oct 10, 2025 at 05:00:01PM +0200, Michal Wajdeczko wrote:
>
>
> On 10/10/2025 2:07 PM, Maarten Lankhorst wrote:
> > The previous code was using a complicated system with 2 balloons to
> > set GGTT size and adjust GGTT offset. While it works, it's overly
> > complicated.
>
> for the record:
>
> our previous attempts to make GGTT allocations 0-based where simply NAK'ed
> so there was no other option that going with the ballooning
>
I pulled this code yesterday and everything seemed to work.
> >
> > A better approach is to set the offset and size when initialising GGTT,
> > this removes the need for adding balloons. The resize function only
> > needs to re-initialise GGTT at the new offset.
> >
> > We use the newly created drm_mm_shift to shift the nodes.
>
> by keeping start at xe_ggtt level that could be even not needed
>
> >
> > This removes the need to manipulate the internals of xe_ggtt outside
> > of xe_ggtt, and cleans up a lot of now unneeded code.
> >
> > Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > This patch has been rebased by Matthew Brost,
> > with some final fixups by Maarten to remove
> > the use of ggtt->mutex, and all references to the balloons.
> >
> > drivers/gpu/drm/xe/tests/xe_guc_buf_kunit.c | 2 +-
> > drivers/gpu/drm/xe/xe_device_types.h | 2 -
> > drivers/gpu/drm/xe/xe_ggtt.c | 143 +++-----------
> > drivers/gpu/drm/xe/xe_ggtt.h | 5 +-
> > drivers/gpu/drm/xe/xe_ggtt_types.h | 4 +-
> > drivers/gpu/drm/xe/xe_gt_sriov_vf.c | 5 +-
> > drivers/gpu/drm/xe/xe_tile.c | 18 ++
> > drivers/gpu/drm/xe/xe_tile_sriov_vf.c | 197 ++------------------
> > drivers/gpu/drm/xe/xe_tile_sriov_vf.h | 4 +-
> > drivers/gpu/drm/xe/xe_tile_sriov_vf_types.h | 4 +
> > 10 files changed, 69 insertions(+), 315 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/tests/xe_guc_buf_kunit.c b/drivers/gpu/drm/xe/tests/xe_guc_buf_kunit.c
> > index d266882adc0e0..acddbedcf17cb 100644
> > --- a/drivers/gpu/drm/xe/tests/xe_guc_buf_kunit.c
> > +++ b/drivers/gpu/drm/xe/tests/xe_guc_buf_kunit.c
> > @@ -67,7 +67,7 @@ static int guc_buf_test_init(struct kunit *test)
> >
> > KUNIT_ASSERT_EQ(test, 0,
> > xe_ggtt_init_kunit(ggtt, DUT_GGTT_START,
> > - DUT_GGTT_START + DUT_GGTT_SIZE));
> > + DUT_GGTT_SIZE));
> >
> > kunit_activate_static_stub(test, xe_managed_bo_create_pin_map,
> > replacement_xe_managed_bo_create_pin_map);
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > index 02c04ad7296e4..a05164cc669f9 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -192,8 +192,6 @@ struct xe_tile {
> > struct xe_lmtt lmtt;
> > } pf;
> > struct {
> > - /** @sriov.vf.ggtt_balloon: GGTT regions excluded from use. */
> > - struct xe_ggtt_node *ggtt_balloon[2];
> > /** @sriov.vf.self_config: VF configuration data */
> > struct xe_tile_sriov_vf_selfconfig self_config;
> > } vf;
> > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > index 1fcb128e661b6..cd4b62303e0ec 100644
> > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > @@ -274,7 +274,6 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
> > struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> > unsigned int gsm_size;
> > u64 ggtt_start, wopcm = xe_wopcm_size(xe), ggtt_size;
> > - int err;
> >
> > if (!IS_SRIOV_VF(xe)) {
> > if (GRAPHICS_VERx100(xe) >= 1250)
> > @@ -288,9 +287,15 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
> > ggtt_start = wopcm;
> > ggtt_size = (gsm_size / 8) * (u64) XE_PAGE_SIZE - ggtt_start;
> > } else {
> > - /* GGTT is expected to be 4GiB */
> > - ggtt_start = wopcm;
> > - ggtt_size = SZ_4G - ggtt_start;
> > + ggtt_start = xe_tile_sriov_vf_ggtt_base(ggtt->tile);
> > + ggtt_size = xe_tile_sriov_vf_ggtt(ggtt->tile);
> > +
> > + if (ggtt_start < wopcm || ggtt_start > GUC_GGTT_TOP ||
> > + ggtt_size > GUC_GGTT_TOP - ggtt_start) {
> > + xe_tile_err(ggtt->tile, "tile%u: Invalid GGTT configuration: %#llx-%#llx\n",
> > + ggtt->tile->id, ggtt_start, ggtt_start + ggtt_size - 1);
> > + return -ERANGE;
> > + }
> > }
> >
> > ggtt->gsm = ggtt->tile->mmio.regs + SZ_8M;
> > @@ -311,17 +316,7 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
> > ggtt->wq = alloc_workqueue("xe-ggtt-wq", 0, WQ_MEM_RECLAIM);
> > __xe_ggtt_init_early(ggtt, ggtt_start, ggtt_size);
> >
> > - err = drmm_add_action_or_reset(&xe->drm, ggtt_fini_early, ggtt);
> > - if (err)
> > - return err;
> > -
> > - if (IS_SRIOV_VF(xe)) {
> > - err = xe_tile_sriov_vf_prepare_ggtt(ggtt->tile);
> > - if (err)
> > - return err;
> > - }
> > -
> > - return 0;
> > + return drmm_add_action_or_reset(&xe->drm, ggtt_fini_early, ggtt);
> > }
> > ALLOW_ERROR_INJECTION(xe_ggtt_init_early, ERRNO); /* See xe_pci_probe() */
> >
> > @@ -473,83 +468,8 @@ static void xe_ggtt_invalidate(struct xe_ggtt *ggtt)
> > ggtt_invalidate_gt_tlb(ggtt->tile->media_gt);
> > }
> >
> > -static void xe_ggtt_dump_node(struct xe_ggtt *ggtt,
> > - const struct drm_mm_node *node, const char *description)
> > -{
> > - char buf[10];
> > -
> > - if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) {
> > - string_get_size(node->size, 1, STRING_UNITS_2, buf, sizeof(buf));
> > - xe_tile_dbg(ggtt->tile, "GGTT %#llx-%#llx (%s) %s\n",
> > - node->start, node->start + node->size, buf, description);
> > - }
> > -}
> > -
> > /**
> > - * xe_ggtt_node_insert_balloon_locked - prevent allocation of specified GGTT addresses
> > - * @node: the &xe_ggtt_node to hold reserved GGTT node
> > - * @start: the starting GGTT address of the reserved region
> > - * @end: then end GGTT address of the reserved region
> > - *
> > - * To be used in cases where ggtt->lock is already taken.
> > - * Use xe_ggtt_node_remove_balloon_locked() to release a reserved GGTT node.
> > - *
> > - * Return: 0 on success or a negative error code on failure.
> > - */
> > -int xe_ggtt_node_insert_balloon_locked(struct xe_ggtt_node *node, u64 start, u64 end)
> > -{
> > - struct xe_ggtt *ggtt = node->ggtt;
> > - int err;
> > -
> > - xe_tile_assert(ggtt->tile, start < end);
> > - xe_tile_assert(ggtt->tile, IS_ALIGNED(start, XE_PAGE_SIZE));
> > - xe_tile_assert(ggtt->tile, IS_ALIGNED(end, XE_PAGE_SIZE));
> > - xe_tile_assert(ggtt->tile, !drm_mm_node_allocated(&node->base));
> > - lockdep_assert_held(&ggtt->lock);
> > -
> > - node->base.color = 0;
> > - node->base.start = start;
> > - node->base.size = end - start;
> > -
> > - err = drm_mm_reserve_node(&ggtt->mm, &node->base);
> > -
> > - if (xe_tile_WARN(ggtt->tile, err, "Failed to balloon GGTT %#llx-%#llx (%pe)\n",
> > - node->base.start, node->base.start + node->base.size, ERR_PTR(err)))
> > - return err;
> > -
> > - xe_ggtt_dump_node(ggtt, &node->base, "balloon");
> > - return 0;
> > -}
> > -
> > -/**
> > - * xe_ggtt_node_remove_balloon_locked - release a reserved GGTT region
> > - * @node: the &xe_ggtt_node with reserved GGTT region
> > - *
> > - * To be used in cases where ggtt->lock is already taken.
> > - * See xe_ggtt_node_insert_balloon_locked() for details.
> > - */
> > -void xe_ggtt_node_remove_balloon_locked(struct xe_ggtt_node *node)
> > -{
> > - if (!xe_ggtt_node_allocated(node))
> > - return;
> > -
> > - lockdep_assert_held(&node->ggtt->lock);
> > -
> > - xe_ggtt_dump_node(node->ggtt, &node->base, "remove-balloon");
> > -
> > - drm_mm_remove_node(&node->base);
> > -}
> > -
> > -static void xe_ggtt_assert_fit(struct xe_ggtt *ggtt, u64 start, u64 size)
> > -{
> > - struct xe_tile *tile = ggtt->tile;
> > -
> > - xe_tile_assert(tile, start >= ggtt->start);
> > - xe_tile_assert(tile, start + size <= ggtt->start + ggtt->size);
> > -}
> > -
> > -/**
> > - * xe_ggtt_shift_nodes_locked - Shift GGTT nodes to adjust for a change in usable address range.
> > + * xe_ggtt_shift_nodes - Shift GGTT nodes to adjust for a change in usable address range.
> > * @ggtt: the &xe_ggtt struct instance
> > * @shift: change to the location of area provisioned for current VF
> > *
> > @@ -563,29 +483,22 @@ static void xe_ggtt_assert_fit(struct xe_ggtt *ggtt, u64 start, u64 size)
> > * the list of nodes was either already damaged, or that the shift brings the address range
> > * outside of valid bounds. Both cases justify an assert rather than error code.
> > */
> > -void xe_ggtt_shift_nodes_locked(struct xe_ggtt *ggtt, s64 shift)
> > +void xe_ggtt_shift_nodes(struct xe_ggtt *ggtt, s64 shift)
> > {
> > - struct xe_tile *tile __maybe_unused = ggtt->tile;
> > - struct drm_mm_node *node, *tmpn;
> > - LIST_HEAD(temp_list_head);
> > + s64 new_start;
> >
> > - lockdep_assert_held(&ggtt->lock);
> > + if (!ggtt->size) {
> > + xe_tile_err(ggtt->tile, "Asked to resize before xe_ggtt_init_early()?\n");
>
> that's should be xe_assert()
>
> or our VF init flow is broken
>
> > + return;
> > + }
> >
> > - if (IS_ENABLED(CONFIG_DRM_XE_DEBUG))
> > - drm_mm_for_each_node_safe(node, tmpn, &ggtt->mm)
> > - xe_ggtt_assert_fit(ggtt, node->start + shift, node->size);
> > + guard(mutex)(&ggtt->lock);
> >
> > - drm_mm_for_each_node_safe(node, tmpn, &ggtt->mm) {
> > - drm_mm_remove_node(node);
> > - list_add(&node->node_list, &temp_list_head);
> > - }
> > + new_start = ggtt->start + shift;
> > + xe_tile_assert(ggtt->tile, new_start >= xe_wopcm_size(tile_to_xe(ggtt->tile)));
> > + xe_tile_assert(ggtt->tile, new_start + ggtt->size <= GUC_GGTT_TOP);
> >
> > - list_for_each_entry_safe(node, tmpn, &temp_list_head, node_list) {
> > - list_del(&node->node_list);
> > - node->start += shift;
> > - drm_mm_reserve_node(&ggtt->mm, node);
> > - xe_tile_assert(tile, drm_mm_node_allocated(node));
> > - }
> > + drm_mm_shift(&ggtt->mm, shift);
> > }
> >
> > /**
> > @@ -637,11 +550,9 @@ int xe_ggtt_node_insert(struct xe_ggtt_node *node, u32 size, u32 align)
> > * @ggtt: the &xe_ggtt where the new node will later be inserted/reserved.
> > *
> > * This function will allocate the struct %xe_ggtt_node and return its pointer.
> > - * This struct will then be freed after the node removal upon xe_ggtt_node_remove()
> > - * or xe_ggtt_node_remove_balloon_locked().
> > + * This struct will then be freed after the node removal upon xe_ggtt_node_remove().
> > * Having %xe_ggtt_node struct allocated doesn't mean that the node is already allocated
> > - * in GGTT. Only the xe_ggtt_node_insert(), xe_ggtt_node_insert_locked(),
> > - * xe_ggtt_node_insert_balloon_locked() will ensure the node is inserted or reserved in GGTT.
> > + * in GGTT. Only xe_ggtt_node_insert() will ensure the node is inserted or reserved in GGTT.
> > *
> > * Return: A pointer to %xe_ggtt_node struct on success. An ERR_PTR otherwise.
> > **/
> > @@ -662,9 +573,9 @@ struct xe_ggtt_node *xe_ggtt_node_init(struct xe_ggtt *ggtt)
> > * xe_ggtt_node_fini - Forcebly finalize %xe_ggtt_node struct
> > * @node: the &xe_ggtt_node to be freed
> > *
> > - * If anything went wrong with either xe_ggtt_node_insert(), xe_ggtt_node_insert_locked(),
> > - * or xe_ggtt_node_insert_balloon_locked(); and this @node is not going to be reused, then,
> > - * this function needs to be called to free the %xe_ggtt_node struct
> > + * If anything went wrong with either xe_ggtt_node_insert() and this @node is
> > + * not going to be reused, then this function needs to be called to free the
> > + * %xe_ggtt_node struct
> > **/
> > void xe_ggtt_node_fini(struct xe_ggtt_node *node)
> > {
> > diff --git a/drivers/gpu/drm/xe/xe_ggtt.h b/drivers/gpu/drm/xe/xe_ggtt.h
> > index 6482bddb2ef36..eccef2d2b3cee 100644
> > --- a/drivers/gpu/drm/xe/xe_ggtt.h
> > +++ b/drivers/gpu/drm/xe/xe_ggtt.h
> > @@ -19,10 +19,7 @@ int xe_ggtt_init(struct xe_ggtt *ggtt);
> >
> > struct xe_ggtt_node *xe_ggtt_node_init(struct xe_ggtt *ggtt);
> > void xe_ggtt_node_fini(struct xe_ggtt_node *node);
> > -int xe_ggtt_node_insert_balloon_locked(struct xe_ggtt_node *node,
> > - u64 start, u64 size);
> > -void xe_ggtt_node_remove_balloon_locked(struct xe_ggtt_node *node);
> > -void xe_ggtt_shift_nodes_locked(struct xe_ggtt *ggtt, s64 shift);
> > +void xe_ggtt_shift_nodes(struct xe_ggtt *ggtt, s64 shift);
> > u64 xe_ggtt_start(struct xe_ggtt *ggtt);
> > u64 xe_ggtt_size(struct xe_ggtt *ggtt);
> >
> > diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
> > index a27919302d6b2..b659ffc612269 100644
> > --- a/drivers/gpu/drm/xe/xe_ggtt_types.h
> > +++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
> > @@ -57,8 +57,8 @@ struct xe_ggtt {
> > * struct xe_ggtt_node - A node in GGTT.
> > *
> > * This struct needs to be initialized (only-once) with xe_ggtt_node_init() before any node
> > - * insertion, reservation, or 'ballooning'.
> > - * It will, then, be finalized by either xe_ggtt_node_remove() or xe_ggtt_node_deballoon().
> > + * insertion or reservation.
> > + * It will, then, be finalized by either xe_ggtt_node_remove().
> > */
> > struct xe_ggtt_node {
> > /** @ggtt: Back pointer to xe_ggtt where this region will be inserted at */
> > diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> > index 46518e629ba36..dd3cd7f140cd1 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> > @@ -442,7 +442,6 @@ u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt)
> > static int vf_get_ggtt_info(struct xe_gt *gt)
> > {
> > struct xe_tile *tile = gt_to_tile(gt);
> > - struct xe_ggtt *ggtt = tile->mem.ggtt;
> > struct xe_guc *guc = >->uc.guc;
> > u64 start, size, ggtt_size;
> > s64 shift;
> > @@ -450,7 +449,7 @@ static int vf_get_ggtt_info(struct xe_gt *gt)
> >
> > xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
> >
> > - guard(mutex)(&ggtt->lock);
> > + guard(mutex)(&tile->sriov.vf.self_config.ggtt_move_mutex);
>
> maybe instead we should just get a ggtt info *only* from the primary GT ?
>
> and use GT-level lock only ?
>
See below that doesn't work, and IMO out of scope.
Maybe a helper in xe_tile_sriov_vf.h that returns mutex * though as I
got feedback to not reach into other layer structures.
> >
> > err = guc_action_query_single_klv64(guc, GUC_KLV_VF_CFG_GGTT_START_KEY, &start);
> > if (unlikely(err))
> > @@ -480,7 +479,7 @@ static int vf_get_ggtt_info(struct xe_gt *gt)
> > if (shift && shift != start) {
> > xe_gt_sriov_info(gt, "Shifting GGTT base by %lld to 0x%016llx\n",
> > shift, start);
> > - xe_tile_sriov_vf_fixup_ggtt_nodes_locked(gt_to_tile(gt), shift);
> > + xe_ggtt_shift_nodes(tile->mem.ggtt, shift);
> > }
> >
> > if (xe_sriov_vf_migration_supported(gt_to_xe(gt))) {
> > diff --git a/drivers/gpu/drm/xe/xe_tile.c b/drivers/gpu/drm/xe/xe_tile.c
> > index 6edb5062c1da5..be3900681b6d8 100644
> > --- a/drivers/gpu/drm/xe/xe_tile.c
> > +++ b/drivers/gpu/drm/xe/xe_tile.c
> > @@ -17,6 +17,7 @@
> > #include "xe_sa.h"
> > #include "xe_svm.h"
> > #include "xe_tile.h"
> > +#include "xe_tile_sriov_vf.h"
> > #include "xe_tile_sysfs.h"
> > #include "xe_ttm_vram_mgr.h"
> > #include "xe_wa.h"
> > @@ -157,6 +158,12 @@ int xe_tile_init_early(struct xe_tile *tile, struct xe_device *xe, u8 id)
> > if (err)
> > return err;
> >
> > + if (IS_SRIOV_VF(xe)) {
> > + err = xe_tile_sriov_vf_init(tile);
> > + if (err)
> > + return err;
> > + }
> > +
> > tile->primary_gt = xe_gt_alloc(tile);
> > if (IS_ERR(tile->primary_gt))
> > return PTR_ERR(tile->primary_gt);
> > @@ -201,6 +208,17 @@ int xe_tile_init_noalloc(struct xe_tile *tile)
> > return xe_tile_sysfs_init(tile);
> > }
> >
> > +/**
> > + * xe_tile_init - Initialize the remainder of the tile.
> > + * @tile: The tile to initialize.
> > + *
> > + * This function is used for all tile initialization calls that may allocate memory.
> > + *
> > + * Note that since this is tile initialization, it should not perform any
> > + * GT-specific operations, and thus does not need to hold GT forcewake.
> > + *
> > + * Returns: 0 on success, negative error code on error.
> > + */
>
> nice, but since you are not touching xe_tile_init() in this patch, its kernel-doc should be added in separate patch
>
> > int xe_tile_init(struct xe_tile *tile)
> > {
> > int err;
> > diff --git a/drivers/gpu/drm/xe/xe_tile_sriov_vf.c b/drivers/gpu/drm/xe/xe_tile_sriov_vf.c
> > index c9bac2cfdd044..d1fa46e268350 100644
> > --- a/drivers/gpu/drm/xe/xe_tile_sriov_vf.c
> > +++ b/drivers/gpu/drm/xe/xe_tile_sriov_vf.c
> > @@ -14,173 +14,12 @@
> > #include "xe_tile_sriov_vf.h"
> > #include "xe_wopcm.h"
> >
> > -static int vf_init_ggtt_balloons(struct xe_tile *tile)
> > -{
> > - struct xe_ggtt *ggtt = tile->mem.ggtt;
> > -
> > - xe_tile_assert(tile, IS_SRIOV_VF(tile_to_xe(tile)));
> > -
> > - tile->sriov.vf.ggtt_balloon[0] = xe_ggtt_node_init(ggtt);
> > - if (IS_ERR(tile->sriov.vf.ggtt_balloon[0]))
> > - return PTR_ERR(tile->sriov.vf.ggtt_balloon[0]);
> > -
> > - tile->sriov.vf.ggtt_balloon[1] = xe_ggtt_node_init(ggtt);
> > - if (IS_ERR(tile->sriov.vf.ggtt_balloon[1])) {
> > - xe_ggtt_node_fini(tile->sriov.vf.ggtt_balloon[0]);
> > - return PTR_ERR(tile->sriov.vf.ggtt_balloon[1]);
> > - }
> > -
> > - return 0;
> > -}
> > -
> > -/**
> > - * xe_tile_sriov_vf_balloon_ggtt_locked - Insert balloon nodes to limit used GGTT address range.
> > - * @tile: the &xe_tile struct instance
> > - *
> > - * Return: 0 on success or a negative error code on failure.
> > - */
> > -static int xe_tile_sriov_vf_balloon_ggtt_locked(struct xe_tile *tile)
> > -{
> > - u64 ggtt_base = tile->sriov.vf.self_config.ggtt_base;
> > - u64 ggtt_size = tile->sriov.vf.self_config.ggtt_size;
> > - struct xe_device *xe = tile_to_xe(tile);
> > - u64 wopcm = xe_wopcm_size(xe);
> > - u64 start, end;
> > - int err;
> > -
> > - xe_tile_assert(tile, IS_SRIOV_VF(xe));
> > - xe_tile_assert(tile, ggtt_size);
> > - lockdep_assert_held(&tile->mem.ggtt->lock);
> > -
> > - /*
> > - * VF can only use part of the GGTT as allocated by the PF:
> > - *
> > - * WOPCM GUC_GGTT_TOP
> > - * |<------------ Total GGTT size ------------------>|
> > - *
> > - * VF GGTT base -->|<- size ->|
> > - *
> > - * +--------------------+----------+-----------------+
> > - * |////////////////////| block |\\\\\\\\\\\\\\\\\|
> > - * +--------------------+----------+-----------------+
> > - *
> > - * |<--- balloon[0] --->|<-- VF -->|<-- balloon[1] ->|
> > - */
> > -
> > - if (ggtt_base < wopcm || ggtt_base > GUC_GGTT_TOP ||
> > - ggtt_size > GUC_GGTT_TOP - ggtt_base) {
> > - xe_sriov_err(xe, "tile%u: Invalid GGTT configuration: %#llx-%#llx\n",
> > - tile->id, ggtt_base, ggtt_base + ggtt_size - 1);
> > - return -ERANGE;
> > - }
> > -
> > - start = wopcm;
> > - end = ggtt_base;
> > - if (end != start) {
> > - err = xe_ggtt_node_insert_balloon_locked(tile->sriov.vf.ggtt_balloon[0],
> > - start, end);
> > - if (err)
> > - return err;
> > - }
> > -
> > - start = ggtt_base + ggtt_size;
> > - end = GUC_GGTT_TOP;
> > - if (end != start) {
> > - err = xe_ggtt_node_insert_balloon_locked(tile->sriov.vf.ggtt_balloon[1],
> > - start, end);
> > - if (err) {
> > - xe_ggtt_node_remove_balloon_locked(tile->sriov.vf.ggtt_balloon[0]);
> > - return err;
> > - }
> > - }
> > -
> > - return 0;
> > -}
> > -
> > -static int vf_balloon_ggtt(struct xe_tile *tile)
> > -{
> > - struct xe_ggtt *ggtt = tile->mem.ggtt;
> > - int err;
> > -
> > - mutex_lock(&ggtt->lock);
> > - err = xe_tile_sriov_vf_balloon_ggtt_locked(tile);
> > - mutex_unlock(&ggtt->lock);
> > -
> > - return err;
> > -}
> > -
> > -/**
> > - * xe_tile_sriov_vf_deballoon_ggtt_locked - Remove balloon nodes.
> > - * @tile: the &xe_tile struct instance
> > - */
> > -void xe_tile_sriov_vf_deballoon_ggtt_locked(struct xe_tile *tile)
> > -{
> > - xe_tile_assert(tile, IS_SRIOV_VF(tile_to_xe(tile)));
> > -
> > - xe_ggtt_node_remove_balloon_locked(tile->sriov.vf.ggtt_balloon[1]);
> > - xe_ggtt_node_remove_balloon_locked(tile->sriov.vf.ggtt_balloon[0]);
> > -}
> > -
> > -static void vf_deballoon_ggtt(struct xe_tile *tile)
> > -{
> > - mutex_lock(&tile->mem.ggtt->lock);
> > - xe_tile_sriov_vf_deballoon_ggtt_locked(tile);
> > - mutex_unlock(&tile->mem.ggtt->lock);
> > -}
> > -
> > -static void vf_fini_ggtt_balloons(struct xe_tile *tile)
> > -{
> > - xe_tile_assert(tile, IS_SRIOV_VF(tile_to_xe(tile)));
> > -
> > - xe_ggtt_node_fini(tile->sriov.vf.ggtt_balloon[1]);
> > - xe_ggtt_node_fini(tile->sriov.vf.ggtt_balloon[0]);
> > -}
> > -
> > -static void cleanup_ggtt(struct drm_device *drm, void *arg)
> > -{
> > - struct xe_tile *tile = arg;
> > -
> > - vf_deballoon_ggtt(tile);
> > - vf_fini_ggtt_balloons(tile);
> > -}
> > -
> > -/**
> > - * xe_tile_sriov_vf_prepare_ggtt - Prepare a VF's GGTT configuration.
> > - * @tile: the &xe_tile
> > - *
> > - * This function is for VF use only.
> > - *
> > - * Return: 0 on success or a negative error code on failure.
> > - */
> > -int xe_tile_sriov_vf_prepare_ggtt(struct xe_tile *tile)
> > -{
> > - struct xe_device *xe = tile_to_xe(tile);
> > - int err;
> > -
> > - err = vf_init_ggtt_balloons(tile);
> > - if (err)
> > - return err;
> > -
> > - err = vf_balloon_ggtt(tile);
> > - if (err) {
> > - vf_fini_ggtt_balloons(tile);
> > - return err;
> > - }
> > -
> > - return drmm_add_action_or_reset(&xe->drm, cleanup_ggtt, tile);
> > -}
> > -
> > /**
> > * DOC: GGTT nodes shifting during VF post-migration recovery
> > *
> > * The first fixup applied to the VF KMD structures as part of post-migration
> > * recovery is shifting nodes within &xe_ggtt instance. The nodes are moved
> > * from range previously assigned to this VF, into newly provisioned area.
> > - * The changes include balloons, which are resized accordingly.
> > - *
> > - * The balloon nodes are there to eliminate unavailable ranges from use: one
> > - * reserves the GGTT area below the range for current VF, and another one
> > - * reserves area above.
> > *
> > * Below is a GGTT layout of example VF, with a certain address range assigned to
> > * said VF, and inaccessible areas above and below:
> > @@ -198,10 +37,6 @@ int xe_tile_sriov_vf_prepare_ggtt(struct xe_tile *tile)
> > *
> > * |<------- inaccessible for VF ------->|<VF owned>|<-- inaccessible for VF ->|
> > *
> > - * GGTT nodes used for tracking allocations:
> > - *
> > - * |<---------- balloon ------------>|<- nodes->|<----- balloon ------>|
> > - *
> > * After the migration, GGTT area assigned to the VF might have shifted, either
> > * to lower or to higher address. But we expect the total size and extra areas to
> > * be identical, as migration can only happen between matching platforms.
> > @@ -219,35 +54,27 @@ int xe_tile_sriov_vf_prepare_ggtt(struct xe_tile *tile)
> > * So the VF has a new slice of GGTT assigned, and during migration process, the
> > * memory content was copied to that new area. But the &xe_ggtt nodes are still
> > * tracking allocations using the old addresses. The nodes within VF owned area
> > - * have to be shifted, and balloon nodes need to be resized to properly mask out
> > - * areas not owned by the VF.
> > - *
> > - * Fixed &xe_ggtt nodes used for tracking allocations:
> > + * have to be shifted, and the start offset for GGTT adjusted.
> > *
> > - * |<------ balloon ------>|<- nodes->|<----------- balloon ----------->|
> > - *
> > - * Due to use of GPU profiles, we do not expect the old and new GGTT ares to
> > + * Due to use of GPU profiles, we do not expect the old and new GGTT areas to
> > * overlap; but our node shifting will fix addresses properly regardless.
> > */
> >
> > /**
> > - * xe_tile_sriov_vf_fixup_ggtt_nodes_locked - Shift GGTT allocations to match assigned range.
> > - * @tile: the &xe_tile struct instance
> > - * @shift: the shift value
> > + * xe_tile_sriov_vf_init - Init tile specific GGTT configuration.
> > + * @tile: the &xe_tile
> > *
> > - * Since Global GTT is not virtualized, each VF has an assigned range
> > - * within the global space. This range might have changed during migration,
> > - * which requires all memory addresses pointing to GGTT to be shifted.
> > + * This function is for VF use only.
> > + *
> > + * Return: 0 on success, negative value on error.
> > */
> > -void xe_tile_sriov_vf_fixup_ggtt_nodes_locked(struct xe_tile *tile, s64 shift)
> > +int xe_tile_sriov_vf_init(struct xe_tile *tile)
> > {
> > - struct xe_ggtt *ggtt = tile->mem.ggtt;
> > + struct xe_tile_sriov_vf_selfconfig *config = &tile->sriov.vf.self_config;
> >
> > - lockdep_assert_held(&ggtt->lock);
> > + xe_tile_assert(tile, IS_SRIOV_VF(tile_to_xe(tile)));
> >
> > - xe_tile_sriov_vf_deballoon_ggtt_locked(tile);
> > - xe_ggtt_shift_nodes_locked(ggtt, shift);
> > - xe_tile_sriov_vf_balloon_ggtt_locked(tile);
> > + return drmm_mutex_init(&tile->xe->drm, &config->ggtt_move_mutex);
> > }
> >
> > /**
> > @@ -312,6 +139,7 @@ void xe_tile_sriov_vf_ggtt_store(struct xe_tile *tile, u64 ggtt_size)
> > struct xe_tile_sriov_vf_selfconfig *config = &tile->sriov.vf.self_config;
> >
> > xe_tile_assert(tile, IS_SRIOV_VF(tile_to_xe(tile)));
> > + lockdep_assert_held(&config->ggtt_move_mutex);
> >
> > config->ggtt_size = ggtt_size;
> > }
> > @@ -345,6 +173,7 @@ void xe_tile_sriov_vf_ggtt_base_store(struct xe_tile *tile, u64 ggtt_base)
> > struct xe_tile_sriov_vf_selfconfig *config = &tile->sriov.vf.self_config;
> >
> > xe_tile_assert(tile, IS_SRIOV_VF(tile_to_xe(tile)));
> > + lockdep_assert_held(&config->ggtt_move_mutex);
> >
> > config->ggtt_base = ggtt_base;
> > }
> > diff --git a/drivers/gpu/drm/xe/xe_tile_sriov_vf.h b/drivers/gpu/drm/xe/xe_tile_sriov_vf.h
> > index 749f41504883c..1ca5bc87963f0 100644
> > --- a/drivers/gpu/drm/xe/xe_tile_sriov_vf.h
> > +++ b/drivers/gpu/drm/xe/xe_tile_sriov_vf.h
> > @@ -10,9 +10,7 @@
> >
> > struct xe_tile;
> >
> > -int xe_tile_sriov_vf_prepare_ggtt(struct xe_tile *tile);
> > -void xe_tile_sriov_vf_deballoon_ggtt_locked(struct xe_tile *tile);
> > -void xe_tile_sriov_vf_fixup_ggtt_nodes_locked(struct xe_tile *tile, s64 shift);
> > +int xe_tile_sriov_vf_init(struct xe_tile *tile);
> > u64 xe_tile_sriov_vf_ggtt(struct xe_tile *tile);
> > void xe_tile_sriov_vf_ggtt_store(struct xe_tile *tile, u64 ggtt_size);
> > u64 xe_tile_sriov_vf_ggtt_base(struct xe_tile *tile);
> > diff --git a/drivers/gpu/drm/xe/xe_tile_sriov_vf_types.h b/drivers/gpu/drm/xe/xe_tile_sriov_vf_types.h
> > index 4807ca51614cf..2cbbc51c101d4 100644
> > --- a/drivers/gpu/drm/xe/xe_tile_sriov_vf_types.h
> > +++ b/drivers/gpu/drm/xe/xe_tile_sriov_vf_types.h
> > @@ -7,11 +7,15 @@
> > #define _XE_TILE_SRIOV_VF_TYPES_H_
> >
> > #include <linux/types.h>
> > +#include <linux/mutex.h>
> >
> > /**
> > * struct xe_tile_sriov_vf_selfconfig - VF configuration data.
> > */
> > struct xe_tile_sriov_vf_selfconfig {
> > + /** @ggtt_move_mutex: Prevents multiple movements from happening in parallel */
> > + struct mutex ggtt_move_mutex;
>
> a) if we correctly get GGTT only from primary GT, then there should be no parallel updates ever
This is not correct. On dGPU the migration recovery worker runs a per-GT
workqueue - the GGTT query + shift needs to serialized between the two
workqueues, hence the need for a lock here. Ofc we could blindly make
all VFs which support migration share the GT workqueue like we do for
iGPU and could drop this lock. I believe that is out of scope of this
series though.
Matt
> b) if still needed maybe make this mutex more generic to protect all tile-level VF config
> or even make it top-level to protect all VF config ?
>
> > +
> > /** @ggtt_base: assigned base offset of the GGTT region. */
> > u64 ggtt_base;
> > /** @ggtt_size: assigned size of the GGTT region. */
>
next prev parent reply other threads:[~2025-10-10 15:35 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-10 12:06 [PATCH v5 0/6] drm/xe: Make struct xe_ggtt private Maarten Lankhorst
2025-10-10 12:06 ` [PATCH v5 1/6] drm/xe: Only have a single drmm release action Maarten Lankhorst
2025-10-10 12:14 ` Michal Wajdeczko
2025-10-10 13:15 ` Maarten Lankhorst
2025-10-10 12:06 ` [PATCH v5 2/6] drm/mm: Introduce address space shifting Maarten Lankhorst
2025-10-10 12:06 ` [PATCH v5 3/6] drm/xe: Start using ggtt->start in preparation of balloon removal Maarten Lankhorst
2025-10-10 12:54 ` Michal Wajdeczko
2025-10-10 12:07 ` [PATCH v5 4/6] drm/xe: Rewrite GGTT VF initialisation Maarten Lankhorst
2025-10-10 15:00 ` Michal Wajdeczko
2025-10-10 15:34 ` Matthew Brost [this message]
2025-10-10 12:07 ` [PATCH v5 5/6] drm/xe: Convert xe_fb_pin to use a callback for insertion into GGTT Maarten Lankhorst
2025-10-10 12:07 ` [PATCH v5 6/6] drm/xe: Move struct xe_ggtt to xe_ggtt.c Maarten Lankhorst
2025-10-10 15:05 ` Michal Wajdeczko
2025-10-13 17:54 ` Maarten Lankhorst
2025-10-10 13:30 ` ✗ CI.checkpatch: warning for drm/xe: Make struct xe_ggtt private. (rev5) Patchwork
2025-10-10 13:32 ` ✓ CI.KUnit: success " Patchwork
2025-10-10 13:50 ` ✗ CI.checksparse: warning " Patchwork
2025-10-10 14:14 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-10-10 18:57 ` ✗ Xe.CI.Full: " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aOknn/V9RVs0cbDv@lstrano-desk.jf.intel.com \
--to=matthew.brost@intel.com \
--cc=dev@lankhorst.se \
--cc=intel-xe@lists.freedesktop.org \
--cc=michal.wajdeczko@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox