Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: "Zeng, Oak" <oak.zeng@intel.com>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Cc: "Brost, Matthew" <matthew.brost@intel.com>,
	"Cavitt, Jonathan" <jonathan.cavitt@intel.com>
Subject: Re: [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform
Date: Thu, 06 Feb 2025 10:29:47 +0100	[thread overview]
Message-ID: <98073717d6ba2fe5de729315e165f7d77f850c45.camel@linux.intel.com> (raw)
In-Reply-To: <MN2PR11MB4728736D182B9C0671A8501692F62@MN2PR11MB4728.namprd11.prod.outlook.com>

On Thu, 2025-02-06 at 01:54 +0000, Zeng, Oak wrote:
> 
> 
> > -----Original Message-----
> > From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Sent: February 5, 2025 8:14 AM
> > To: Zeng, Oak <oak.zeng@intel.com>; intel-xe@lists.freedesktop.org
> > Cc: Brost, Matthew <matthew.brost@intel.com>; Cavitt, Jonathan
> > <jonathan.cavitt@intel.com>
> > Subject: Re: [PATCH 3/3] drm/xe: Allow scratch page under fault
> > mode for certain platform
> > 
> > On Tue, 2025-02-04 at 13:45 -0500, Oak Zeng wrote:
> > > Normally scratch page is not allowed when a vm is operate under
> > page
> > > fault mode, i.e., in the existing codes,
> > > DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE
> > > and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual
> > exclusive. The reason
> > > is fault mode relies on recoverable page to work, while scratch
> > > page
> > > can mute recoverable page fault.
> > > 
> > > On xe2 and xe3, out of bound prefetch can cause page fault and
> > > further
> > > system hang because xekmd can't resolve such page fault. SYCL and
> > OCL
> > > language runtime requires out of bound prefetch to be silently
> > > dropped
> > > without causing any functional problem, thus the existing
> > > behavior
> > > doesn't meet language runtime requirement.
> > > 
> > > At the same time, HW prefetching can cause page fault interrupt.
> > Due
> > > to
> > > page fault interrupt overhead (i.e., need Guc and KMD involved to
> > fix
> > > the page fault), HW prefetching can be slowed by many orders of
> > > magnitude.
> > > 
> > > Fix those problems by allowing scratch page under fault mode for
> > xe2
> > > and
> > > xe3. With scratch page in place, HW prefetching could always hit
> > > scratch
> > > page instead of causing interrupt.
> > > 
> > > A side effect is, scratch page could hide application program
> > > error.
> > > Application out of bound accesses are hided
> > s/hided/hidden/
> > 
> > >  by scratch page mapping,
> > > instead of get reported to user.
> > > 
> > > igt test: https://patchwork.freedesktop.org/series/144334/. Test
> > > result on
> > > BMG:
> > > 
> > > root@DUT1130BMGFRD:/home/szeng/dii-tools/igt-
> > public/build/tests#
> > > ./xe_exec_fault_mode --run-subtest scratch-fault
> > > IGT-Version: 1.30-gde1a3cb42 (x86_64) (Linux: 6.13.0-xe x86_64)
> > > Using IGT_SRANDOM=1738684805 for randomisation
> > > Opened device: /dev/dri/card0
> > > Starting subtest: scratch-fault
> > > Subtest scratch-fault: SUCCESS (0.080s)
> > > 
> > > Without this series, the test result is:
> > > 
> > > root@DUT1130BMGFRD:/home/szeng/dii-tools/igt-
> > public/build/tests#
> > > ./xe_exec_fault_mode --run-subtest scratch-fault
> > > IGT-Version: 1.30-gde1a3cb42 (x86_64) (Linux: 6.13.0-xe x86_64)
> > > Using IGT_SRANDOM=1738686046 for randomisation
> > > Opened device: /dev/dri/card0
> > > Starting subtest: scratch-fault
> > > (xe_exec_fault_mode:5047) CRITICAL: Test assertion failure
> > function
> > > test_exec, file ../tests/intel/xe_exec_fault_mode.c:349:
> > > (xe_exec_fault_mode:5047) CRITICAL: Failed assertion:
> > > __xe_wait_ufence(fd, &exec_sync[i], 0xdeadbeefdeadbeefull,
> > > exec_queues[i % n_exec_queues], &timeout) == 0
> > > (xe_exec_fault_mode:5047) CRITICAL: Last errno: 62, Timer expired
> > > (xe_exec_fault_mode:5047) CRITICAL: error: -62 != 0
> > > Stack trace:
> > >   #0 ../lib/igt_core.c:2266 __igt_fail_assert()
> > >   #1 ../tests/intel/xe_exec_fault_mode.c:346 test_exec()
> > >   #2 ../tests/intel/xe_exec_fault_mode.c:537
> > > __igt_unique____real_main407()
> > >   #3 ../tests/intel/xe_exec_fault_mode.c:407 main()
> > >   #4 ../sysdeps/nptl/libc_start_call_main.h:74
> > > __libc_start_call_main()
> > >   #5 ../csu/libc-start.c:128 __libc_start_main@@GLIBC_2.34()
> > >   #6 [_start+0x2e]
> > > Subtest scratch-fault failed.
> > > 
> > > v2: Refine commit message (Thomas)
> > > 
> > > Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_vm.c | 9 +++++----
> > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > b/drivers/gpu/drm/xe/xe_vm.c
> > > index 813d893d9b63..c0372f083d42 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -1752,6 +1752,11 @@ int xe_vm_create_ioctl(struct
> > drm_device *dev,
> > > void *data,
> > >  	if (XE_IOCTL_DBG(xe, args->extensions))
> > >  		return -EINVAL;
> > > 
> > > +	if (XE_IOCTL_DBG(xe, args->flags &
> > > DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
> > > +			 args->flags &
> > > DRM_XE_VM_CREATE_FLAG_FAULT_MODE &&
> > > +			 !(NEEDS_SCRATCH(xe))))
> > > +		return -EINVAL;
> > > +
> > 
> > We should probably move this test to where the old test were below,
> > since the WA below enables scratch pages.
> 
> 
> Below wa code is for dg2.
> If we move this test below the wa code, then on dg2, if people create
> vm
> With FAULT_MODE | LR_MODE, above check would fail user.

Though on DG2 we don't support fault mode.

> 
> So I do think we should keep this check above wa codes.

I think what matters here is readability of the code. A naive reader
(like me in this case) would start to wonder why enabling scratch pages
*after* the sanity check for them is done, and whether that is a bug or
an oversight, and that would require looking up what the WA actually
means.

So IMO in general sanity checks should be done after all implicit
enabling of flags if at all possible. And if not, they should be
accompanied by a comment that explains why the enabling of the flag is
not included in the sanity check.

/Thomas


> 
> Oak
> 
> 
> > 
> > /Thomas
> > 
> > 
> > >  	if (XE_WA(xe_root_mmio_gt(xe), 14016763929))
> > >  		args->flags |=
> > DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE;
> > > 
> > > @@ -1765,10 +1770,6 @@ int xe_vm_create_ioctl(struct
> > drm_device *dev,
> > > void *data,
> > >  	if (XE_IOCTL_DBG(xe, args->flags &
> > > ~ALL_DRM_XE_VM_CREATE_FLAGS))
> > >  		return -EINVAL;
> > > 
> > > -	if (XE_IOCTL_DBG(xe, args->flags &
> > > DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
> > > -			 args->flags &
> > > DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
> > > -		return -EINVAL;
> > > -
> > >  	if (XE_IOCTL_DBG(xe, !(args->flags &
> > > DRM_XE_VM_CREATE_FLAG_LR_MODE) &&
> > >  			 args->flags &
> > > DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
> > >  		return -EINVAL;
> 


  reply	other threads:[~2025-02-06  9:29 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-04 18:45 [PATCH 1/3] drm/xe: Introduced needs_scratch bit in device descriptor Oak Zeng
2025-02-04 18:45 ` [PATCH 2/3] drm/xe: Clear scratch page before vm_bind Oak Zeng
2025-02-05 14:35   ` Thomas Hellström
2025-02-06  1:52     ` Zeng, Oak
2025-02-06 12:51       ` Thomas Hellström
2025-02-06 18:56         ` Zeng, Oak
2025-02-06 20:11           ` Thomas Hellström
2025-02-06 21:05             ` Zeng, Oak
2025-02-06 10:34   ` Matthew Brost
2025-02-06 10:43     ` Thomas Hellström
2025-02-10 17:08       ` Matthew Brost
2025-02-06 15:16     ` Zeng, Oak
2025-02-10 17:09       ` Matthew Brost
2025-02-04 18:45 ` [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform Oak Zeng
2025-02-05 13:14   ` Thomas Hellström
2025-02-06  1:54     ` Zeng, Oak
2025-02-06  9:29       ` Thomas Hellström [this message]
2025-02-06 15:14         ` Zeng, Oak
2025-02-25 22:05   ` Matthew Brost
2025-02-04 23:27 ` ✓ CI.Patch_applied: success for series starting with [1/3] drm/xe: Introduced needs_scratch bit in device descriptor Patchwork
2025-02-04 23:27 ` ✗ CI.checkpatch: warning " Patchwork
2025-02-04 23:28 ` ✓ CI.KUnit: success " Patchwork
2025-02-04 23:45 ` ✓ CI.Build: " Patchwork
2025-02-04 23:47 ` ✓ CI.Hooks: " Patchwork
2025-02-04 23:48 ` ✓ CI.checksparse: " Patchwork
2025-02-05  6:05 ` ✓ Xe.CI.BAT: " Patchwork
2025-02-05 13:16 ` [PATCH 1/3] " Thomas Hellström
2025-02-25 21:37 ` Matthew Brost
  -- strict thread matches above, loose matches on Subject: below --
2025-02-13  2:23 Oak Zeng
2025-02-13  2:23 ` [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform Oak Zeng
2025-02-25 22:10   ` Matthew Brost
2025-02-26 22:12     ` Zeng, Oak
2025-02-27  0:22       ` Matthew Brost
2025-02-06 21:38 [PATCH 1/3] drm/xe: Introduced needs_scratch bit in device descriptor Oak Zeng
2025-02-06 21:38 ` [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform Oak Zeng
2025-02-06  2:11 [PATCH 1/3] drm/xe: Introduced needs_scratch bit in device descriptor Oak Zeng
2025-02-06  2:11 ` [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform Oak Zeng
2025-01-28 22:21 [PATCH 1/3] drm/xe: Add a function to zap page table by address range Oak Zeng
2025-01-28 22:21 ` [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform Oak Zeng
2025-01-28 23:05   ` Cavitt, Jonathan
2025-01-29  8:52   ` Thomas Hellström
2025-01-29 16:41     ` Matthew Brost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98073717d6ba2fe5de729315e165f7d77f850c45.camel@linux.intel.com \
    --to=thomas.hellstrom@linux.intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jonathan.cavitt@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=oak.zeng@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox