From: Matthew Brost <matthew.brost@intel.com>
To: "Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
<dri-devel@lists.freedesktop.org>,
<thomas.hellstrom@linux.intel.com>
Subject: Re: [PATCH v4 2/5] drm/xe: Strict migration policy for atomic SVM faults
Date: Wed, 23 Apr 2025 10:29:15 -0700
Message-ID: <aAkja+b5DAyuYczd@lstrano-desk.jf.intel.com>
In-Reply-To: <5eab3a5e-361e-4192-9858-70f2fd56a055@intel.com>
On Tue, Apr 22, 2025 at 10:51:44PM +0530, Ghimiray, Himal Prasad wrote:
>
>
> On 22-04-2025 22:34, Matthew Brost wrote:
> > Mixing GPU and CPU atomics does not work unless a strict migration
> > policy is enforced: memory targeted by GPU atomics must be in device
> > memory. Enforce a must-be-in-VRAM policy with a retry loop of 2
> > attempts; if the retry loop fails, abort the fault.
>
> The retry loop is 3 attempts, not 2. With that addressed, the patch looks
> good to me. Since both of us collaborated on this, it has my ack, and it
> needs review from someone else as well.
>
> Acked-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>
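To spell out the policy for other reviewers, here is a rough sketch of
the fault handler flow this patch ends up with (simplified pseudocode,
not the literal code; see the hunks below for the real thing):

	/*
	 * devmem_only is set only for atomic faults on DGFX with
	 * CONFIG_DRM_XE_DEVMEM_MIRROR enabled; such faults must be
	 * serviced from VRAM.
	 */
	int migrate_try_count = ctx.devmem_only ? 3 : 1;

retry:
	if (--migrate_try_count >= 0 &&
	    xe_svm_range_needs_migrate_to_vram(range, vma)) {
		err = xe_svm_alloc_vram(vm, tile, range, &ctx);
		if (err) {
			if (migrate_try_count || !ctx.devmem_only)
				goto retry;	/* attempts remain */
			return err;	/* budget exhausted, abort fault */
		}
	}
	/* get pages; CPU mapping races retry under the same budget */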
> >
> > v2:
> > - Only retry migration on atomics
> > - Drop always migrate modparam
> > v3:
> > - Only set vram_only on DGFX (Himal)
> > - Bail on get_pages failure if vram_only and retry count exceeded (Himal)
> > - s/vram_only/devmem_only
> > - Update xe_svm_range_is_valid to accept devmem_only argument
> > v4:
> > - Fix logic bug in get_pages failure path
> >
> > Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_module.c | 3 --
> > drivers/gpu/drm/xe/xe_module.h | 1 -
> > drivers/gpu/drm/xe/xe_svm.c | 89 +++++++++++++++++++++++++---------
> > drivers/gpu/drm/xe/xe_svm.h | 5 --
> > 4 files changed, 65 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
> > index 05c7d0ae6d83..1c4dfafbcd0b 100644
> > --- a/drivers/gpu/drm/xe/xe_module.c
> > +++ b/drivers/gpu/drm/xe/xe_module.c
> > @@ -33,9 +33,6 @@ struct xe_modparam xe_modparam = {
> > module_param_named(svm_notifier_size, xe_modparam.svm_notifier_size, uint, 0600);
> > MODULE_PARM_DESC(svm_notifier_size, "Set the svm notifier size(in MiB), must be power of 2");
> > -module_param_named(always_migrate_to_vram, xe_modparam.always_migrate_to_vram, bool, 0444);
> > -MODULE_PARM_DESC(always_migrate_to_vram, "Always migrate to VRAM on GPU fault");
> > -
> > module_param_named_unsafe(force_execlist, xe_modparam.force_execlist, bool, 0444);
> > MODULE_PARM_DESC(force_execlist, "Force Execlist submission");
> > diff --git a/drivers/gpu/drm/xe/xe_module.h b/drivers/gpu/drm/xe/xe_module.h
> > index 84339e509c80..5a3bfea8b7b4 100644
> > --- a/drivers/gpu/drm/xe/xe_module.h
> > +++ b/drivers/gpu/drm/xe/xe_module.h
> > @@ -12,7 +12,6 @@
> > struct xe_modparam {
> > bool force_execlist;
> > bool probe_display;
> > - bool always_migrate_to_vram;
> > u32 force_vram_bar_size;
> > int guc_log_level;
> > char *guc_firmware_path;
> > diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> > index 890f6b2f40e9..f749ae367a8f 100644
> > --- a/drivers/gpu/drm/xe/xe_svm.c
> > +++ b/drivers/gpu/drm/xe/xe_svm.c
> > @@ -650,9 +650,11 @@ void xe_svm_fini(struct xe_vm *vm)
> > }
> > static bool xe_svm_range_is_valid(struct xe_svm_range *range,
> > - struct xe_tile *tile)
> > + struct xe_tile *tile,
> > + bool devmem_only)
> > {
> > - return (range->tile_present & ~range->tile_invalidated) & BIT(tile->id);
> > + return ((range->tile_present & ~range->tile_invalidated) & BIT(tile->id))
> > + && (!devmem_only || range->base.flags.migrate_devmem);
Typo...
s/migrate_devmem/has_devmem_pages/
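i.e., the check should end up reading (untested):

	return ((range->tile_present & ~range->tile_invalidated) & BIT(tile->id)) &&
		(!devmem_only || range->base.flags.has_devmem_pages);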
Matt
> > }
> > #if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
> > @@ -726,6 +728,35 @@ static int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> > }
> > #endif
> > +static bool supports_4K_migration(struct xe_device *xe)
> > +{
> > + if (xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +static bool xe_svm_range_needs_migrate_to_vram(struct xe_svm_range *range,
> > + struct xe_vma *vma)
> > +{
> > + struct xe_vm *vm = range_to_vm(&range->base);
> > + u64 range_size = xe_svm_range_size(range);
> > +
> > + if (!range->base.flags.migrate_devmem)
> > + return false;
> > +
> > + if (xe_svm_range_in_vram(range)) {
> > + drm_dbg(&vm->xe->drm, "Range is already in VRAM\n");
> > + return false;
> > + }
> > +
> > + if (range_size <= SZ_64K && !supports_4K_migration(vm->xe)) {
> > + drm_dbg(&vm->xe->drm, "Platform doesn't support SZ_4K range migration\n");
> > + return false;
> > + }
> > +
> > + return true;
> > +}
> > /**
> > * xe_svm_handle_pagefault() - SVM handle page fault
> > @@ -750,12 +781,15 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> > IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR),
> > .check_pages_threshold = IS_DGFX(vm->xe) &&
> > IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR) ? SZ_64K : 0,
> > + .devmem_only = atomic && IS_DGFX(vm->xe) &&
> > + IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR),
> > };
> > struct xe_svm_range *range;
> > struct drm_gpusvm_range *r;
> > struct drm_exec exec;
> > struct dma_fence *fence;
> > struct xe_tile *tile = gt_to_tile(gt);
> > + int migrate_try_count = ctx.devmem_only ? 3 : 1;
> > ktime_t end = 0;
> > int err;
> > @@ -777,23 +811,26 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> > return PTR_ERR(r);
> > range = to_xe_range(r);
> > - if (xe_svm_range_is_valid(range, tile))
> > + if (xe_svm_range_is_valid(range, tile, ctx.devmem_only))
> > return 0;
> > range_debug(range, "PAGE FAULT");
> > - /* XXX: Add migration policy, for now migrate range once */
> > - if (!range->skip_migrate && range->base.flags.migrate_devmem &&
> > - xe_svm_range_size(range) >= SZ_64K) {
> > - range->skip_migrate = true;
> > -
> > + if (--migrate_try_count >= 0 &&
> > + xe_svm_range_needs_migrate_to_vram(range, vma)) {
> > err = xe_svm_alloc_vram(vm, tile, range, &ctx);
> > if (err) {
> > - drm_dbg(&vm->xe->drm,
> > - "VRAM allocation failed, falling back to "
> > - "retrying fault, asid=%u, errno=%pe\n",
> > - vm->usm.asid, ERR_PTR(err));
> > - goto retry;
> > + if (migrate_try_count || !ctx.devmem_only) {
> > + drm_dbg(&vm->xe->drm,
> > + "VRAM allocation failed, falling back to retrying fault, asid=%u, errno=%pe\n",
> > + vm->usm.asid, ERR_PTR(err));
> > + goto retry;
> > + } else {
> > + drm_err(&vm->xe->drm,
> > + "VRAM allocation failed, retry count exceeded, asid=%u, errno=%pe\n",
> > + vm->usm.asid, ERR_PTR(err));
> > + return err;
> > + }
> > }
> > }
> > @@ -801,15 +838,22 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> > err = drm_gpusvm_range_get_pages(&vm->svm.gpusvm, r, &ctx);
> > /* Corner where CPU mappings have changed */
> > if (err == -EOPNOTSUPP || err == -EFAULT || err == -EPERM) {
> > - if (err == -EOPNOTSUPP) {
> > - range_debug(range, "PAGE FAULT - EVICT PAGES");
> > - drm_gpusvm_range_evict(&vm->svm.gpusvm, &range->base);
> > + if (migrate_try_count > 0 || !ctx.devmem_only) {
> > + if (err == -EOPNOTSUPP) {
> > + range_debug(range, "PAGE FAULT - EVICT PAGES");
> > + drm_gpusvm_range_evict(&vm->svm.gpusvm,
> > + &range->base);
> > + }
> > + drm_dbg(&vm->xe->drm,
> > + "Get pages failed, falling back to retrying, asid=%u, gpusvm=%p, errno=%pe\n",
> > + vm->usm.asid, &vm->svm.gpusvm, ERR_PTR(err));
> > + range_debug(range, "PAGE FAULT - RETRY PAGES");
> > + goto retry;
> > + } else {
> > + drm_err(&vm->xe->drm,
> > + "Get pages failed, retry count exceeded, asid=%u, gpusvm=%p, errno=%pe\n",
> > + vm->usm.asid, &vm->svm.gpusvm, ERR_PTR(err));
> > }
> > - drm_dbg(&vm->xe->drm,
> > - "Get pages failed, falling back to retrying, asid=%u, gpusvm=%p, errno=%pe\n",
> > - vm->usm.asid, &vm->svm.gpusvm, ERR_PTR(err));
> > - range_debug(range, "PAGE FAULT - RETRY PAGES");
> > - goto retry;
> > }
> > if (err) {
> > range_debug(range, "PAGE FAULT - FAIL PAGE COLLECT");
> > @@ -843,9 +887,6 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> > }
> > drm_exec_fini(&exec);
> > - if (xe_modparam.always_migrate_to_vram)
> > - range->skip_migrate = false;
> > -
> > dma_fence_wait(fence, false);
> > dma_fence_put(fence);
> > diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
> > index 3d441eb1f7ea..0e1f376a7471 100644
> > --- a/drivers/gpu/drm/xe/xe_svm.h
> > +++ b/drivers/gpu/drm/xe/xe_svm.h
> > @@ -39,11 +39,6 @@ struct xe_svm_range {
> > * range. Protected by GPU SVM notifier lock.
> > */
> > u8 tile_invalidated;
> > - /**
> > - * @skip_migrate: Skip migration to VRAM, protected by GPU fault handler
> > - * locking.
> > - */
> > - u8 skip_migrate :1;
> > };
> > /**
>