From: Oak Zeng <oak.zeng@intel.com>
To: intel-xe@lists.freedesktop.org
Cc: Thomas.Hellstrom@linux.intel.com, matthew.brost@intel.com,
jonathan.cavitt@intel.com
Subject: [PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform
Date: Wed, 12 Feb 2025 21:23:31 -0500
Message-ID: <20250213022331.265424-3-oak.zeng@intel.com>
In-Reply-To: <20250213022331.265424-1-oak.zeng@intel.com>

Normally, a scratch page is not allowed when a VM operates in page
fault mode; i.e., in the existing code, DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE
and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutually exclusive. The reason
is that fault mode relies on recoverable page faults to work, while a
scratch page can mask them: an access to an unbound address hits the
scratch page instead of faulting.

On Xe2 and Xe3, an out-of-bounds prefetch can cause a page fault and,
in turn, a system hang, because the Xe KMD cannot resolve such a page
fault. The SYCL and OpenCL language runtimes require out-of-bounds
prefetches to be silently dropped without causing any functional
problem, so the existing behavior does not meet their requirements.

At the same time, HW prefetching can trigger page fault interrupts.
Because of the interrupt overhead (the GuC and KMD must be involved to
resolve each fault), HW prefetching can be slowed down by many orders
of magnitude.

Fix these problems by allowing a scratch page in fault mode on Xe2 and
Xe3. With a scratch page in place, HW prefetches to unbound addresses
always hit the scratch page instead of raising an interrupt.

A side effect is that the scratch page can hide application errors:
out-of-bounds accesses by the application are absorbed by the scratch
page mapping instead of being reported to the user.

IGT test: https://patchwork.freedesktop.org/series/144334/

Test result on BMG:
root@DUT1130BMGFRD:/home/szeng/dii-tools/igt-public/build/tests# ./xe_exec_fault_mode --run-subtest scratch-fault
IGT-Version: 1.30-gde1a3cb42 (x86_64) (Linux: 6.13.0-xe x86_64)
Using IGT_SRANDOM=1738684805 for randomisation
Opened device: /dev/dri/card0
Starting subtest: scratch-fault
Subtest scratch-fault: SUCCESS (0.080s)

Without this series, the test result is:
root@DUT1130BMGFRD:/home/szeng/dii-tools/igt-public/build/tests# ./xe_exec_fault_mode --run-subtest scratch-fault
IGT-Version: 1.30-gde1a3cb42 (x86_64) (Linux: 6.13.0-xe x86_64)
Using IGT_SRANDOM=1738686046 for randomisation
Opened device: /dev/dri/card0
Starting subtest: scratch-fault
(xe_exec_fault_mode:5047) CRITICAL: Test assertion failure function test_exec, file ../tests/intel/xe_exec_fault_mode.c:349:
(xe_exec_fault_mode:5047) CRITICAL: Failed assertion: __xe_wait_ufence(fd, &exec_sync[i], 0xdeadbeefdeadbeefull, exec_queues[i % n_exec_queues], &timeout) == 0
(xe_exec_fault_mode:5047) CRITICAL: Last errno: 62, Timer expired
(xe_exec_fault_mode:5047) CRITICAL: error: -62 != 0
Stack trace:
#0 ../lib/igt_core.c:2266 __igt_fail_assert()
#1 ../tests/intel/xe_exec_fault_mode.c:346 test_exec()
#2 ../tests/intel/xe_exec_fault_mode.c:537 __igt_unique____real_main407()
#3 ../tests/intel/xe_exec_fault_mode.c:407 main()
#4 ../sysdeps/nptl/libc_start_call_main.h:74 __libc_start_call_main()
#5 ../csu/libc-start.c:128 __libc_start_main@@GLIBC_2.34()
#6 [_start+0x2e]
Subtest scratch-fault failed.

v2: Refine commit message (Thomas)
v3: Move the scratch page flag check to after scratch page wa (Thomas)
Signed-off-by: Oak Zeng <oak.zeng@intel.com>
---
drivers/gpu/drm/xe/xe_vm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 813d893d9b63..c2dfd0ade403 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1766,7 +1766,8 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
return -EINVAL;
if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
- args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
+ args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE &&
+ !(NEEDS_SCRATCH(xe))))
return -EINVAL;
if (XE_IOCTL_DBG(xe, !(args->flags & DRM_XE_VM_CREATE_FLAG_LR_MODE) &&
--
2.26.3