From: "Summers, Stuart" <stuart.summers@intel.com>
To: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Chang, Yu bruce" <yu.bruce.chang@intel.com>
Subject: Re: [Intel-xe] [PATCH] drm/xe: Enable scratch page when page fault is enabled
Date: Tue, 29 Aug 2023 16:36:07 +0000 [thread overview]
Message-ID: <3f8e77680f0c28d47445732b1ab8505e2986d294.camel@intel.com> (raw)
In-Reply-To: <CY8PR11MB6940963ED7121F4ED4AB4BA6C3E0A@CY8PR11MB6940.namprd11.prod.outlook.com>
On Mon, 2023-08-28 at 22:44 +0000, Chang, Yu bruce wrote:
>
>
> > -----Original Message-----
> > From: Summers, Stuart <stuart.summers@intel.com>
> > Sent: Monday, August 28, 2023 3:26 PM
> > To: intel-xe@lists.freedesktop.org; Chang, Yu bruce
> > <yu.bruce.chang@intel.com>
> > Cc: Brost, Matthew <matthew.brost@intel.com>; Welty, Brian
> > <brian.welty@intel.com>; Vishwanathapura, Niranjana
> > <niranjana.vishwanathapura@intel.com>; Zeng, Oak
> > <oak.zeng@intel.com>
> > Subject: Re: [PATCH] drm/xe: Enable scratch page when page fault is
> > enabled
> >
> > On Mon, 2023-08-28 at 19:11 +0000, Chang, Yu bruce wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Summers, Stuart <stuart.summers@intel.com>
> > > > Sent: Monday, August 28, 2023 10:56 AM
> > > > To: intel-xe@lists.freedesktop.org; Chang, Yu bruce
> > > > <yu.bruce.chang@intel.com>
> > > > Cc: Brost, Matthew <matthew.brost@intel.com>; Welty, Brian
> > > > <brian.welty@intel.com>; Vishwanathapura, Niranjana
> > > > <niranjana.vishwanathapura@intel.com>; Zeng, Oak
> > > > <oak.zeng@intel.com>
> > > > Subject: Re: [PATCH] drm/xe: Enable scratch page when page
> > > > fault is
> > > > enabled
> > > >
> > > > On Sat, 2023-08-26 at 00:14 +0000, Chang, Bruce wrote:
> > > > > i915 can use the scratch page even when page faulting is
> > > > > enabled; this patch ports that feature over.
> > > >
> > > > But why? From a debug perspective maybe this is interesting, but
> > > > ideally the failed page fault should give the app everything it
> > > > needs to recover. Why not put the onus on the application to fix
> > > > the addressing problems?
> > > >
> > > You are correct that this feature was originally designed to help
> > > applications keep running even when they made invalid VA accesses.
> > > That should no longer be a goal.
> > >
> > > The main ask is EU debug support; it can also serve as a potential
> > > WA if there is any prefetch-related issue, such as L1.
> >
> > I agree with what Matt B. said. It would be nice to know why EU
> > debug needs this. Is there an alternate solution? If not, can we
> > enable this only if EU debug is active?
> >
> > We also don't necessarily want a bunch of workarounds added for
> > potential
> > issues we don't yet know about. That's adding undue burden on the
> > driver.
> >
> > Thanks,
> > Stuart
> >
> The PVC HW has a limitation where a page fault due to an invalid
> access halts the EUs performing it, and the default behavior is a CAT
> fault. So, in order to activate the debugger, the KMD needs to set up
> scratch pages to unhalt the EUs.
We should add this as an explicit workaround for PVC then, not a
general feature. Also IMO it's better to hold off merging this until EU
debug is ready to consume it. We shouldn't have dead code sitting here
untested.
Do you have a link to the EU debug series adding this support?
Thanks,
Stuart
>
> This feature can only be enabled if the scratch flag is set. So, once
> the EU debugger is running, the debugger UMD will set the scratch
> flag; otherwise the flag should not be set. In a regular run, this
> feature will not be activated.
>
> Will add these details to the commit message to make things clearer.
>
> Thanks,
> Bruce
>
> > >
> > > > >
> > > > > The current i915 solution modifies the page tables directly,
> > > > > which may be hard to upstream, so a more complex solution
> > > > > would be needed for the current Xe framework if we followed
> > > > > the existing i915 approach.
> > > > >
> > > > > This patch tries to make minimal impact on the existing
> > > > > driver while still enabling scratch page support.
> > > > >
> > > > > So, the idea is to bind a scratch vma if the page fault comes
> > > > > from an invalid access. This patch takes advantage of the
> > > > > null pte at this point; we may introduce a special vma type
> > > > > for the scratch vma if needed.
> > > >
> > > > So basically we want to avoid a cat fault because the cat fault
> > > > doesn't give enough information about the bad address? Or is
> > > > there
> > > > something else?
> > > >
> > > > Thanks,
> > > > Stuart
> > > >
> > > As mentioned above, mainly for EU debugger support and any
> > > potential
> > > prefetch WA.
> > >
> > > The issue you mentioned should be fixed by a GuC change along
> > > with a corresponding driver change. That is coming, but
> > > separately.
> > >
> > > Thanks,
> > > Bruce
> > >
> > > > > After the bind, the user app can continue to run without
> > > > > causing a fatal failure or a reset-and-stop.
> > > > >
> > > > > If the app later binds a valid address that overlaps this
> > > > > scratch vma, that bind will fail; this patch handles the
> > > > > failure and unbinds the scratch vma[s] so that the user bind
> > > > > can be retried at the valid address.
> > > > >
> > > > > This patch only kicks in when there is a failure in the page
> > > > > fault and bind paths, so it should have no impact on the
> > > > > existing code paths. On the other hand, it uses actual page
> > > > > tables instead of special scratch page tables, so it can
> > > > > potentially avoid invalidating TLBs on unbind when all the
> > > > > upper-layer page tables are still in use.
> > > > >
> > > > > Tested with new scratch IGT tests, which will be sent out for
> > > > > review.
> > > > >
> > > > > Cc: Oak Zeng <oak.zeng@intel.com>
> > > > > Cc: Brian Welty <brian.welty@intel.com>
> > > > > Cc: Niranjana Vishwanathapura
> > > > > <niranjana.vishwanathapura@intel.com>
> > > > > Cc: Stuart Summers <stuart.summers@intel.com>
> > > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/xe/xe_gt_pagefault.c | 9 ++++--
> > > > > drivers/gpu/drm/xe/xe_vm.c | 48 +++++++++++++++++++++++-----
> > > > > drivers/gpu/drm/xe/xe_vm.h | 2 ++
> > > > > 3 files changed, 49 insertions(+), 10 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > > b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > > index b6f781b3d9d7..524b38df3d7a 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > > @@ -137,8 +137,13 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
> > > > > write_locked = true;
> > > > > vma = lookup_vma(vm, pf->page_addr);
> > > > > if (!vma) {
> > > > > - ret = -EINVAL;
> > > > > - goto unlock_vm;
> > > > > + if (vm->flags & XE_VM_FLAG_SCRATCH_PAGE)
> > > > > + vma = xe_bind_scratch_vma(vm, pf->page_addr, SZ_64K);
> > > > > +
> > > > > + if (!vma) {
> > > > > + ret = -EINVAL;
> > > > > + goto unlock_vm;
> > > > > + }
> > > > > }
> > > > >
> > > > > if (!xe_vma_is_userptr(vma) || !xe_vma_userptr_check_repin(vma)) {
> > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > > > index 389ac5ba8ddf..4c3d5d781b58 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > > > @@ -1262,7 +1262,8 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
> > > > > }
> > > > > }
> > > > >
> > > > > - if (flags & XE_VM_FLAG_SCRATCH_PAGE) {
> > > > > + if (flags & XE_VM_FLAG_SCRATCH_PAGE &&
> > > > > + (!(flags & XE_VM_FLAG_FAULT_MODE))) {
> > > > > for_each_tile(tile, xe, id) {
> > > > > if (!vm->pt_root[id])
> > > > > continue;
> > > > > @@ -1998,10 +1999,6 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
> > > > > if (XE_IOCTL_DBG(xe, args->flags &
> > > > > ~ALL_DRM_XE_VM_CREATE_FLAGS))
> > > > > return -EINVAL;
> > > > >
> > > > > - if (XE_IOCTL_DBG(xe, args->flags &
> > > > > DRM_XE_VM_CREATE_SCRATCH_PAGE &&
> > > > > - args->flags &
> > > > > DRM_XE_VM_CREATE_FAULT_MODE))
> > > > > - return -EINVAL;
> > > > > -
> > > > > if (XE_IOCTL_DBG(xe, args->flags &
> > > > > DRM_XE_VM_CREATE_COMPUTE_MODE &&
> > > > > args->flags &
> > > > > DRM_XE_VM_CREATE_FAULT_MODE))
> > > > > return -EINVAL;
> > > > > @@ -2783,6 +2780,39 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
> > > > > return err;
> > > > > }
> > > > >
> > > > > +struct xe_vma *xe_bind_scratch_vma(struct xe_vm *vm, u64 addr, u64 size)
> > > > > +{
> > > > > + struct xe_vma *vma = 0;
> > > > > +
> > > > > + if (!vm->size)
> > > > > + return 0;
> > > > > +
> > > > > + vma = xe_vma_create(vm, NULL, 0, addr, addr + size - 1, false, true, 0);
> > > > > + if (!vma)
> > > > > + return 0;
> > > > > + xe_vm_insert_vma(vm, vma);
> > > > > +
> > > > > + /* fault will handle the bind */
> > > > > +
> > > > > + return vma;
> > > > > +}
> > > > > +
> > > > > +int xe_unbind_scratch_vma(struct xe_vm *vm, u64 addr, u64 range)
> > > > > +{
> > > > > + struct xe_vma *vma;
> > > > > +
> > > > > + vma = xe_vm_find_overlapping_vma(vm, addr, range);
> > > > > + if (!vma)
> > > > > + return -ENXIO;
> > > > > +
> > > > > + if (xe_vma_is_null(vma)) {
> > > > > + prep_vma_destroy(vm, vma, true);
> > > > > + xe_vm_unbind(vm, vma, NULL, NULL, 0, NULL, true, false);
> > > > > + }
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > static int xe_vma_op_execute(struct xe_vm *vm, struct
> > > > > xe_vma_op
> > > > > *op)
> > > > > {
> > > > > int ret = 0;
> > > > > @@ -3205,7 +3235,6 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > > > > err = vm_bind_ioctl_check_args(xe, args, &bind_ops,
> > > > > &async);
> > > > > if (err)
> > > > > return err;
> > > > > -
> > > > > if (args->exec_queue_id) {
> > > > > q = xe_exec_queue_lookup(xef, args->exec_queue_id);
> > > > > if (XE_IOCTL_DBG(xe, !q)) {
> > > > > @@ -3352,10 +3381,13 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > > > > u64 range = bind_ops[i].range;
> > > > > u64 addr = bind_ops[i].addr;
> > > > > u32 op = bind_ops[i].op;
> > > > > -
> > > > > +retry:
> > > > > err = vm_bind_ioctl_lookup_vma(vm, bos[i], addr, range, op);
> > > > > - if (err)
> > > > > + if (err) {
> > > > > + if (!xe_unbind_scratch_vma(vm, addr, range))
> > > > > + goto retry;
> > > > > goto free_syncs;
> > > > > + }
> > > > > }
> > > > >
> > > > > for (i = 0; i < args->num_binds; ++i) {
> > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> > > > > index 6de6e3edb24a..6447bed427b1 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_vm.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > > > > @@ -212,6 +212,8 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma);
> > > > >
> > > > > int xe_vma_userptr_check_repin(struct xe_vma *vma);
> > > > >
> > > > > +struct xe_vma *xe_bind_scratch_vma(struct xe_vm *vm, u64 addr, u64 size);
> > > > > +
> > > > > /*
> > > > > * XE_ONSTACK_TV is used to size the tv_onstack array that is input
> > > > > * to xe_vm_lock_dma_resv() and xe_vm_unlock_dma_resv().
> > >
>
Thread overview: 18+ messages
2023-08-26 0:14 [Intel-xe] [PATCH] drm/xe: Enable scratch page when page fault is enabled Chang, Bruce
2023-08-26 0:16 ` [Intel-xe] ✗ CI.Patch_applied: failure for drm/xe: Enable scratch page when page fault is enabled (rev2) Patchwork
2023-08-28 17:55 ` [Intel-xe] [PATCH] drm/xe: Enable scratch page when page fault is enabled Summers, Stuart
2023-08-28 19:11 ` Chang, Yu bruce
2023-08-28 22:25 ` Summers, Stuart
2023-08-28 22:44 ` Chang, Yu bruce
2023-08-29 16:36 ` Summers, Stuart [this message]
2023-08-28 19:22 ` Matthew Brost
2023-08-28 22:34 ` Chang, Yu bruce
2023-08-28 23:44 ` Chang, Yu bruce
-- strict thread matches above, loose matches on Subject: below --
2023-08-30 21:34 Chang, Bruce
2023-09-01 0:31 ` Welty, Brian
2023-11-02 21:16 ` Summers, Stuart
2023-08-29 23:16 Chang, Bruce
2023-08-29 23:58 ` Welty, Brian
2023-08-30 21:37 ` Chang, Yu bruce
2023-07-10 22:06 Chang, Bruce
2023-07-11 2:11 ` Zeng, Oak