* [igt-dev] [PATCH i-g-t] tests/xe/xe_intel_bb: ensure valid next page
@ 2023-06-02 11:48 Matthew Auld
2023-06-02 14:23 ` [igt-dev] ✓ Fi.CI.BAT: success for " Patchwork
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Matthew Auld @ 2023-06-02 11:48 UTC (permalink / raw)
To: igt-dev; +Cc: Thomas Hellström, intel-xe
Due to over-fetch, the recommendation is to ensure we have a single valid
extra page beyond the batch. We currently lack this, which seems to
explain why xe_intel_bb@full-batch generates CAT errors.

Currently we allow using the last GTT page, but this looks to be a no-go,
since in the full-batch case the next page will be beyond the actual GTT.
The i915 path already looks to account for this. However, even with that
fixed, Xe doesn't use scratch pages by default, so the next page will
still not be valid.

With Xe we would rather expect callers to know about HW over-fetch and to
ensure that the batch has an extra page, if needed. Alternatively we could
apply DRM_XE_VM_CREATE_SCRATCH_PAGE when creating the vm, but really
we want to get away from such things.
Bspec: 60223
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/262
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
lib/intel_batchbuffer.c | 6 ++++++
tests/xe/xe_intel_bb.c | 8 +++++++-
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/lib/intel_batchbuffer.c b/lib/intel_batchbuffer.c
index 3cd680072..035facfc4 100644
--- a/lib/intel_batchbuffer.c
+++ b/lib/intel_batchbuffer.c
@@ -881,6 +881,12 @@ static inline uint64_t __intel_bb_get_offset(struct intel_bb *ibb,
  * passed in. If this is the case, it copies the information over to the
  * newly created batch buffer.
  *
+ * NOTE: On Xe scratch pages are not used by default. Due to over-fetch (~512
+ * bytes) there might need to be a valid next page to avoid hangs or CAT errors
+ * if the batch is quite large and approaches the end boundary of the batch
+ * itself. Inflate the @size to ensure there is a valid next page in such
+ * cases.
+ *
  * Returns:
  *
  * Pointer the intel_bb, asserts on failure.
diff --git a/tests/xe/xe_intel_bb.c b/tests/xe/xe_intel_bb.c
index 755cc530e..af8462af5 100644
--- a/tests/xe/xe_intel_bb.c
+++ b/tests/xe/xe_intel_bb.c
@@ -952,7 +952,13 @@ static void full_batch(struct buf_ops *bops)
 	struct intel_bb *ibb;
 	int i;

-	ibb = intel_bb_create(xe, PAGE_SIZE);
+	/*
+	 * Add an extra page to ensure over-fetch always sees a valid next page,
+	 * which includes not going beyond the actual GTT, and ensuring we have
+	 * a valid GTT entry, given that on xe we don't use scratch pages by
+	 * default.
+	 */
+	ibb = intel_bb_create(xe, 2 * PAGE_SIZE);
 	if (debug_bb)
 		intel_bb_set_debug(ibb, true);

--
2.40.1
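
The sizing rule described in the NOTE added by the patch can be sketched as a small helper. This is only an illustration and not part of the patch: sketch_bb_alloc_size(), sketch_align_up(), SKETCH_PAGE_SIZE and SKETCH_OVERFETCH are hypothetical names, the 4 KiB page size is an assumption, and the 512-byte window is the approximate figure quoted in the comment rather than a value confirmed against Bspec.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration only: 4 KiB pages, ~512 bytes of over-fetch. */
#define SKETCH_PAGE_SIZE 4096ull
#define SKETCH_OVERFETCH 512ull

/* Round v up to the next multiple of a (a must be a power of two). */
static uint64_t sketch_align_up(uint64_t v, uint64_t a)
{
	return (v + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: size to request so that the over-fetch window
 * following the last used byte still falls inside the allocation.
 */
static uint64_t sketch_bb_alloc_size(uint64_t batch_bytes)
{
	uint64_t size = sketch_align_up(batch_bytes, SKETCH_PAGE_SIZE);

	/* The window would cross the end of the allocation: add one page. */
	if (batch_bytes + SKETCH_OVERFETCH > size)
		size += SKETCH_PAGE_SIZE;

	return size;
}
```

Under these assumptions a batch that fills a whole page, as in full_batch(), yields two pages, which matches the 2 * PAGE_SIZE used in the patch, while a small batch still gets a single page.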
^ permalink raw reply related [flat|nested] 8+ messages in thread* [igt-dev] ✓ Fi.CI.BAT: success for tests/xe/xe_intel_bb: ensure valid next page 2023-06-02 11:48 [igt-dev] [PATCH i-g-t] tests/xe/xe_intel_bb: ensure valid next page Matthew Auld @ 2023-06-02 14:23 ` Patchwork 2023-06-04 17:38 ` [igt-dev] ✓ Fi.CI.IGT: " Patchwork ` (2 subsequent siblings) 3 siblings, 0 replies; 8+ messages in thread From: Patchwork @ 2023-06-02 14:23 UTC (permalink / raw) To: Matthew Auld; +Cc: igt-dev [-- Attachment #1: Type: text/plain, Size: 9708 bytes --] == Series Details == Series: tests/xe/xe_intel_bb: ensure valid next page URL : https://patchwork.freedesktop.org/series/118772/ State : success == Summary == CI Bug Log - changes from CI_DRM_13219 -> IGTPW_9098 ==================================================== Summary ------- **WARNING** Minor unknown changes coming with IGTPW_9098 need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in IGTPW_9098, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. 
External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/index.html Participating hosts (37 -> 37) ------------------------------ Additional (2): bat-mtlp-6 bat-dg1-5 Missing (2): fi-kbl-soraka fi-snb-2520m Possible new issues ------------------- Here are the unknown changes that may have been introduced in IGTPW_9098: ### IGT changes ### #### Warnings #### * igt@kms_psr@primary_mmap_gtt: - bat-rplp-1: [SKIP][1] ([i915#1072]) -> [ABORT][2] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/bat-rplp-1/igt@kms_psr@primary_mmap_gtt.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-rplp-1/igt@kms_psr@primary_mmap_gtt.html Known issues ------------ Here are the changes found in IGTPW_9098 that come from known issues: ### IGT changes ### #### Issues hit #### * igt@gem_mmap@basic: - bat-dg1-5: NOTRUN -> [SKIP][3] ([i915#4083]) [3]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg1-5/igt@gem_mmap@basic.html * igt@gem_tiled_fence_blits@basic: - bat-dg1-5: NOTRUN -> [SKIP][4] ([i915#4077]) +2 similar issues [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg1-5/igt@gem_tiled_fence_blits@basic.html * igt@gem_tiled_pread_basic: - bat-dg1-5: NOTRUN -> [SKIP][5] ([i915#4079]) +1 similar issue [5]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg1-5/igt@gem_tiled_pread_basic.html * igt@i915_pm_backlight@basic-brightness: - bat-dg1-5: NOTRUN -> [SKIP][6] ([i915#7561]) [6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg1-5/igt@i915_pm_backlight@basic-brightness.html * igt@i915_pm_rps@basic-api: - bat-dg1-5: NOTRUN -> [SKIP][7] ([i915#6621]) [7]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg1-5/igt@i915_pm_rps@basic-api.html * igt@i915_selftest@live@gt_heartbeat: - fi-apl-guc: [PASS][8] -> [DMESG-FAIL][9] ([i915#5334]) [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html * igt@i915_selftest@live@reset: - bat-rpls-2: [PASS][10] -> [ABORT][11] ([i915#4983] / [i915#7461] / [i915#7913] / [i915#7981] / [i915#8347]) [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/bat-rpls-2/igt@i915_selftest@live@reset.html [11]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-rpls-2/igt@i915_selftest@live@reset.html * igt@i915_suspend@basic-s3-without-i915: - bat-rpls-1: NOTRUN -> [ABORT][12] ([i915#6687] / [i915#7978]) [12]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-rpls-1/igt@i915_suspend@basic-s3-without-i915.html * igt@kms_addfb_basic@basic-x-tiled-legacy: - bat-dg1-5: NOTRUN -> [SKIP][13] ([i915#4212]) +7 similar issues [13]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg1-5/igt@kms_addfb_basic@basic-x-tiled-legacy.html * igt@kms_addfb_basic@basic-y-tiled-legacy: - bat-dg1-5: NOTRUN -> [SKIP][14] ([i915#4215]) [14]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg1-5/igt@kms_addfb_basic@basic-y-tiled-legacy.html * igt@kms_chamelium_hpd@vga-hpd-fast: - bat-dg1-5: NOTRUN -> [SKIP][15] ([i915#7828]) +8 similar issues [15]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg1-5/igt@kms_chamelium_hpd@vga-hpd-fast.html * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy: - bat-dg1-5: NOTRUN -> [SKIP][16] ([i915#4103] / [i915#4213]) +1 similar issue [16]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg1-5/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy.html * igt@kms_force_connector_basic@force-load-detect: - bat-dg1-5: NOTRUN -> [SKIP][17] ([fdo#109285]) [17]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg1-5/igt@kms_force_connector_basic@force-load-detect.html * igt@kms_pipe_crc_basic@nonblocking-crc@pipe-c-dp-1: - bat-dg2-8: [PASS][18] -> [FAIL][19] ([i915#7932]) [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/bat-dg2-8/igt@kms_pipe_crc_basic@nonblocking-crc@pipe-c-dp-1.html [19]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg2-8/igt@kms_pipe_crc_basic@nonblocking-crc@pipe-c-dp-1.html * igt@kms_psr@sprite_plane_onoff: - bat-dg1-5: NOTRUN -> [SKIP][20] ([i915#1072] / [i915#4078]) +3 similar issues [20]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg1-5/igt@kms_psr@sprite_plane_onoff.html * igt@kms_setmode@basic-clone-single-crtc: - bat-dg1-5: NOTRUN -> [SKIP][21] ([i915#3555] / [i915#4579]) [21]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg1-5/igt@kms_setmode@basic-clone-single-crtc.html * igt@prime_vgem@basic-fence-read: - bat-dg1-5: NOTRUN -> [SKIP][22] ([i915#3708]) +3 similar issues [22]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg1-5/igt@prime_vgem@basic-fence-read.html * igt@prime_vgem@basic-gtt: - bat-dg1-5: NOTRUN -> [SKIP][23] ([i915#3708] / [i915#4077]) +1 similar issue [23]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-dg1-5/igt@prime_vgem@basic-gtt.html #### Possible fixes #### * igt@i915_selftest@live@requests: - bat-rpls-1: [ABORT][24] ([i915#4983] / [i915#7911] / [i915#7920]) -> [PASS][25] [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/bat-rpls-1/igt@i915_selftest@live@requests.html [25]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/bat-rpls-1/igt@i915_selftest@live@requests.html {name}: This element is suppressed. This means it is ignored when computing the status of the difference (SUCCESS, WARNING, or FAILURE). 
[fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285 [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072 [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845 [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582 [i915#3546]: https://gitlab.freedesktop.org/drm/intel/issues/3546 [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555 [i915#3595]: https://gitlab.freedesktop.org/drm/intel/issues/3595 [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637 [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708 [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077 [i915#4078]: https://gitlab.freedesktop.org/drm/intel/issues/4078 [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079 [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083 [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103 [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212 [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213 [i915#4215]: https://gitlab.freedesktop.org/drm/intel/issues/4215 [i915#4342]: https://gitlab.freedesktop.org/drm/intel/issues/4342 [i915#4423]: https://gitlab.freedesktop.org/drm/intel/issues/4423 [i915#4579]: https://gitlab.freedesktop.org/drm/intel/issues/4579 [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613 [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983 [i915#5190]: https://gitlab.freedesktop.org/drm/intel/issues/5190 [i915#5274]: https://gitlab.freedesktop.org/drm/intel/issues/5274 [i915#5334]: https://gitlab.freedesktop.org/drm/intel/issues/5334 [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367 [i915#6621]: https://gitlab.freedesktop.org/drm/intel/issues/6621 [i915#6645]: https://gitlab.freedesktop.org/drm/intel/issues/6645 [i915#6687]: https://gitlab.freedesktop.org/drm/intel/issues/6687 [i915#7456]: https://gitlab.freedesktop.org/drm/intel/issues/7456 [i915#7461]: 
https://gitlab.freedesktop.org/drm/intel/issues/7461 [i915#7561]: https://gitlab.freedesktop.org/drm/intel/issues/7561 [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828 [i915#7911]: https://gitlab.freedesktop.org/drm/intel/issues/7911 [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913 [i915#7920]: https://gitlab.freedesktop.org/drm/intel/issues/7920 [i915#7932]: https://gitlab.freedesktop.org/drm/intel/issues/7932 [i915#7978]: https://gitlab.freedesktop.org/drm/intel/issues/7978 [i915#7981]: https://gitlab.freedesktop.org/drm/intel/issues/7981 [i915#8347]: https://gitlab.freedesktop.org/drm/intel/issues/8347 Build changes ------------- * CI: CI-20190529 -> None * IGT: IGT_7317 -> IGTPW_9098 CI-20190529: 20190529 CI_DRM_13219: e64abcc87c73dc4c1e510a76f210c65828e06197 @ git://anongit.freedesktop.org/gfx-ci/linux IGTPW_9098: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/index.html IGT_7317: c902b72df45aa49faa38205bc5be3c748d33a3e0 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/index.html [-- Attachment #2: Type: text/html, Size: 10183 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* [igt-dev] ✓ Fi.CI.IGT: success for tests/xe/xe_intel_bb: ensure valid next page 2023-06-02 11:48 [igt-dev] [PATCH i-g-t] tests/xe/xe_intel_bb: ensure valid next page Matthew Auld 2023-06-02 14:23 ` [igt-dev] ✓ Fi.CI.BAT: success for " Patchwork @ 2023-06-04 17:38 ` Patchwork 2023-06-07 9:55 ` [igt-dev] [Intel-xe] [PATCH i-g-t] " Zbigniew Kempczyński 2023-06-07 10:01 ` [igt-dev] ✗ Fi.CI.BUILD: failure for tests/xe/xe_intel_bb: ensure valid next page (rev2) Patchwork 3 siblings, 0 replies; 8+ messages in thread From: Patchwork @ 2023-06-04 17:38 UTC (permalink / raw) To: Matthew Auld; +Cc: igt-dev [-- Attachment #1: Type: text/plain, Size: 16660 bytes --] == Series Details == Series: tests/xe/xe_intel_bb: ensure valid next page URL : https://patchwork.freedesktop.org/series/118772/ State : success == Summary == CI Bug Log - changes from CI_DRM_13219_full -> IGTPW_9098_full ==================================================== Summary ------- **SUCCESS** No regressions found. External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/index.html Participating hosts (7 -> 7) ------------------------------ No changes in participating hosts Possible new issues ------------------- Here are the unknown changes that may have been introduced in IGTPW_9098_full: ### IGT changes ### #### Suppressed #### The following results come from untrusted machines, tests, or statuses. They do not affect the overall result. 
* {igt@kms_plane_scaling@planes-downscale-factor-0-25-upscale-factor-0-25@pipe-a-hdmi-a-1}: - {shard-rkl}: NOTRUN -> [SKIP][1] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-rkl-7/igt@kms_plane_scaling@planes-downscale-factor-0-25-upscale-factor-0-25@pipe-a-hdmi-a-1.html * {igt@kms_plane_scaling@planes-downscale-factor-0-25-upscale-factor-0-25@pipe-c-hdmi-a-4}: - {shard-dg1}: NOTRUN -> [SKIP][2] +2 similar issues [2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-dg1-18/igt@kms_plane_scaling@planes-downscale-factor-0-25-upscale-factor-0-25@pipe-c-hdmi-a-4.html Known issues ------------ Here are the changes found in IGTPW_9098_full that come from known issues: ### IGT changes ### #### Issues hit #### * igt@gem_pwrite@basic-exhaustion: - shard-apl: NOTRUN -> [WARN][3] ([i915#2658]) [3]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-apl3/igt@gem_pwrite@basic-exhaustion.html * igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a: - shard-snb: NOTRUN -> [SKIP][4] ([fdo#109271] / [i915#4579]) +10 similar issues [4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-snb1/igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a.html * igt@kms_ccs@pipe-b-bad-rotation-90-y_tiled_gen12_mc_ccs: - shard-apl: NOTRUN -> [SKIP][5] ([fdo#109271] / [i915#3886]) +2 similar issues [5]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-apl4/igt@kms_ccs@pipe-b-bad-rotation-90-y_tiled_gen12_mc_ccs.html * igt@kms_ccs@pipe-c-bad-aux-stride-y_tiled_gen12_mc_ccs: - shard-glk: NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#3886]) [6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-glk8/igt@kms_ccs@pipe-c-bad-aux-stride-y_tiled_gen12_mc_ccs.html * igt@kms_chamelium_color@ctm-blue-to-red: - shard-glk: NOTRUN -> [SKIP][7] ([fdo#109271]) +20 similar issues [7]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-glk6/igt@kms_chamelium_color@ctm-blue-to-red.html * igt@kms_cursor_crc@cursor-rapid-movement-32x32: - shard-glk: NOTRUN -> [SKIP][8] 
([fdo#109271] / [i915#4579]) [8]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-glk2/igt@kms_cursor_crc@cursor-rapid-movement-32x32.html * igt@kms_cursor_crc@cursor-sliding-32x10: - shard-apl: NOTRUN -> [SKIP][9] ([fdo#109271] / [i915#4579]) [9]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-apl3/igt@kms_cursor_crc@cursor-sliding-32x10.html * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions: - shard-apl: [PASS][10] -> [FAIL][11] ([i915#2346]) [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/shard-apl7/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions.html [11]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-apl4/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions.html * igt@kms_fbcon_fbt@fbc-suspend: - shard-apl: [PASS][12] -> [FAIL][13] ([i915#4767]) [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/shard-apl3/igt@kms_fbcon_fbt@fbc-suspend.html [13]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-apl1/igt@kms_fbcon_fbt@fbc-suspend.html - shard-glk: NOTRUN -> [FAIL][14] ([i915#4767]) [14]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-glk6/igt@kms_fbcon_fbt@fbc-suspend.html * igt@kms_flip@2x-nonexisting-fb: - shard-snb: NOTRUN -> [SKIP][15] ([fdo#109271]) +13 similar issues [15]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-snb7/igt@kms_flip@2x-nonexisting-fb.html * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-mmap-wc: - shard-apl: NOTRUN -> [SKIP][16] ([fdo#109271]) +10 similar issues [16]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-apl3/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-mmap-wc.html #### Possible fixes #### * igt@gem_exec_fair@basic-none@bcs0: - {shard-rkl}: [FAIL][17] ([i915#2842]) -> [PASS][18] +1 similar issue [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/shard-rkl-7/igt@gem_exec_fair@basic-none@bcs0.html [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-rkl-7/igt@gem_exec_fair@basic-none@bcs0.html * igt@gem_exec_fair@basic-pace-share@rcs0: - shard-apl: [FAIL][19] ([i915#2842]) -> [PASS][20] [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/shard-apl7/igt@gem_exec_fair@basic-pace-share@rcs0.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-apl7/igt@gem_exec_fair@basic-pace-share@rcs0.html * igt@gem_exec_suspend@basic-s4-devices@lmem0: - {shard-dg1}: [ABORT][21] ([i915#7975] / [i915#8213]) -> [PASS][22] [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/shard-dg1-14/igt@gem_exec_suspend@basic-s4-devices@lmem0.html [22]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-dg1-18/igt@gem_exec_suspend@basic-s4-devices@lmem0.html * igt@gem_exec_suspend@basic-s4-devices@smem: - {shard-tglu}: [ABORT][23] ([i915#7975] / [i915#8213]) -> [PASS][24] [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/shard-tglu-10/igt@gem_exec_suspend@basic-s4-devices@smem.html [24]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-tglu-2/igt@gem_exec_suspend@basic-s4-devices@smem.html * igt@gen9_exec_parse@allowed-single: - shard-glk: [ABORT][25] ([i915#5566]) -> [PASS][26] [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/shard-glk9/igt@gen9_exec_parse@allowed-single.html [26]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-glk7/igt@gen9_exec_parse@allowed-single.html * igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a: - {shard-rkl}: [SKIP][27] ([i915#1937] / [i915#4579]) -> [PASS][28] [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/shard-rkl-1/igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a.html [28]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-rkl-7/igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a.html * igt@kms_cursor_legacy@single-bo@pipe-b: - {shard-rkl}: [INCOMPLETE][29] ([i915#8011]) -> [PASS][30] [29]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/shard-rkl-7/igt@kms_cursor_legacy@single-bo@pipe-b.html [30]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-rkl-6/igt@kms_cursor_legacy@single-bo@pipe-b.html * igt@kms_flip@flip-vs-suspend@b-dp1: - shard-apl: [ABORT][31] ([i915#180]) -> [PASS][32] [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/shard-apl1/igt@kms_flip@flip-vs-suspend@b-dp1.html [32]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-apl3/igt@kms_flip@flip-vs-suspend@b-dp1.html * igt@kms_plane_scaling@intel-max-src-size@pipe-a-hdmi-a-2: - {shard-rkl}: [FAIL][33] ([i915#8292]) -> [PASS][34] [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13219/shard-rkl-3/igt@kms_plane_scaling@intel-max-src-size@pipe-a-hdmi-a-2.html [34]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/shard-rkl-6/igt@kms_plane_scaling@intel-max-src-size@pipe-a-hdmi-a-2.html {name}: This element is suppressed. This means it is ignored when computing the status of the difference (SUCCESS, WARNING, or FAILURE). 
[IGT#2]: https://gitlab.freedesktop.org/drm/igt-gpu-tools/issues/2 [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271 [fdo#109274]: https://bugs.freedesktop.org/show_bug.cgi?id=109274 [fdo#109279]: https://bugs.freedesktop.org/show_bug.cgi?id=109279 [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280 [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289 [fdo#109314]: https://bugs.freedesktop.org/show_bug.cgi?id=109314 [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315 [fdo#109506]: https://bugs.freedesktop.org/show_bug.cgi?id=109506 [fdo#110189]: https://bugs.freedesktop.org/show_bug.cgi?id=110189 [fdo#110723]: https://bugs.freedesktop.org/show_bug.cgi?id=110723 [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068 [fdo#111614]: https://bugs.freedesktop.org/show_bug.cgi?id=111614 [fdo#111615]: https://bugs.freedesktop.org/show_bug.cgi?id=111615 [fdo#111644]: https://bugs.freedesktop.org/show_bug.cgi?id=111644 [fdo#111825]: https://bugs.freedesktop.org/show_bug.cgi?id=111825 [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827 [fdo#112283]: https://bugs.freedesktop.org/show_bug.cgi?id=112283 [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072 [i915#1397]: https://gitlab.freedesktop.org/drm/intel/issues/1397 [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180 [i915#1825]: https://gitlab.freedesktop.org/drm/intel/issues/1825 [i915#1902]: https://gitlab.freedesktop.org/drm/intel/issues/1902 [i915#1937]: https://gitlab.freedesktop.org/drm/intel/issues/1937 [i915#2346]: https://gitlab.freedesktop.org/drm/intel/issues/2346 [i915#2433]: https://gitlab.freedesktop.org/drm/intel/issues/2433 [i915#2527]: https://gitlab.freedesktop.org/drm/intel/issues/2527 [i915#2575]: https://gitlab.freedesktop.org/drm/intel/issues/2575 [i915#2587]: https://gitlab.freedesktop.org/drm/intel/issues/2587 [i915#2658]: https://gitlab.freedesktop.org/drm/intel/issues/2658 [i915#2672]: 
https://gitlab.freedesktop.org/drm/intel/issues/2672 [i915#2705]: https://gitlab.freedesktop.org/drm/intel/issues/2705 [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842 [i915#2856]: https://gitlab.freedesktop.org/drm/intel/issues/2856 [i915#3023]: https://gitlab.freedesktop.org/drm/intel/issues/3023 [i915#3281]: https://gitlab.freedesktop.org/drm/intel/issues/3281 [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282 [i915#3297]: https://gitlab.freedesktop.org/drm/intel/issues/3297 [i915#3359]: https://gitlab.freedesktop.org/drm/intel/issues/3359 [i915#3361]: https://gitlab.freedesktop.org/drm/intel/issues/3361 [i915#3458]: https://gitlab.freedesktop.org/drm/intel/issues/3458 [i915#3528]: https://gitlab.freedesktop.org/drm/intel/issues/3528 [i915#3539]: https://gitlab.freedesktop.org/drm/intel/issues/3539 [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555 [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637 [i915#3638]: https://gitlab.freedesktop.org/drm/intel/issues/3638 [i915#3689]: https://gitlab.freedesktop.org/drm/intel/issues/3689 [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708 [i915#3734]: https://gitlab.freedesktop.org/drm/intel/issues/3734 [i915#3742]: https://gitlab.freedesktop.org/drm/intel/issues/3742 [i915#3886]: https://gitlab.freedesktop.org/drm/intel/issues/3886 [i915#3936]: https://gitlab.freedesktop.org/drm/intel/issues/3936 [i915#3955]: https://gitlab.freedesktop.org/drm/intel/issues/3955 [i915#4070]: https://gitlab.freedesktop.org/drm/intel/issues/4070 [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077 [i915#4078]: https://gitlab.freedesktop.org/drm/intel/issues/4078 [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079 [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083 [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103 [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212 [i915#4270]: 
https://gitlab.freedesktop.org/drm/intel/issues/4270 [i915#433]: https://gitlab.freedesktop.org/drm/intel/issues/433 [i915#4349]: https://gitlab.freedesktop.org/drm/intel/issues/4349 [i915#4391]: https://gitlab.freedesktop.org/drm/intel/issues/4391 [i915#4423]: https://gitlab.freedesktop.org/drm/intel/issues/4423 [i915#4525]: https://gitlab.freedesktop.org/drm/intel/issues/4525 [i915#4538]: https://gitlab.freedesktop.org/drm/intel/issues/4538 [i915#4565]: https://gitlab.freedesktop.org/drm/intel/issues/4565 [i915#4579]: https://gitlab.freedesktop.org/drm/intel/issues/4579 [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613 [i915#4767]: https://gitlab.freedesktop.org/drm/intel/issues/4767 [i915#4771]: https://gitlab.freedesktop.org/drm/intel/issues/4771 [i915#4812]: https://gitlab.freedesktop.org/drm/intel/issues/4812 [i915#4833]: https://gitlab.freedesktop.org/drm/intel/issues/4833 [i915#4852]: https://gitlab.freedesktop.org/drm/intel/issues/4852 [i915#4860]: https://gitlab.freedesktop.org/drm/intel/issues/4860 [i915#4880]: https://gitlab.freedesktop.org/drm/intel/issues/4880 [i915#4884]: https://gitlab.freedesktop.org/drm/intel/issues/4884 [i915#4885]: https://gitlab.freedesktop.org/drm/intel/issues/4885 [i915#5176]: https://gitlab.freedesktop.org/drm/intel/issues/5176 [i915#5235]: https://gitlab.freedesktop.org/drm/intel/issues/5235 [i915#5286]: https://gitlab.freedesktop.org/drm/intel/issues/5286 [i915#5289]: https://gitlab.freedesktop.org/drm/intel/issues/5289 [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533 [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354 [i915#5493]: https://gitlab.freedesktop.org/drm/intel/issues/5493 [i915#5566]: https://gitlab.freedesktop.org/drm/intel/issues/5566 [i915#6095]: https://gitlab.freedesktop.org/drm/intel/issues/6095 [i915#6245]: https://gitlab.freedesktop.org/drm/intel/issues/6245 [i915#6268]: https://gitlab.freedesktop.org/drm/intel/issues/6268 [i915#6301]: 
https://gitlab.freedesktop.org/drm/intel/issues/6301 [i915#6433]: https://gitlab.freedesktop.org/drm/intel/issues/6433 [i915#6493]: https://gitlab.freedesktop.org/drm/intel/issues/6493 [i915#6524]: https://gitlab.freedesktop.org/drm/intel/issues/6524 [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658 [i915#6768]: https://gitlab.freedesktop.org/drm/intel/issues/6768 [i915#6944]: https://gitlab.freedesktop.org/drm/intel/issues/6944 [i915#6953]: https://gitlab.freedesktop.org/drm/intel/issues/6953 [i915#7116]: https://gitlab.freedesktop.org/drm/intel/issues/7116 [i915#7118]: https://gitlab.freedesktop.org/drm/intel/issues/7118 [i915#7561]: https://gitlab.freedesktop.org/drm/intel/issues/7561 [i915#7697]: https://gitlab.freedesktop.org/drm/intel/issues/7697 [i915#7711]: https://gitlab.freedesktop.org/drm/intel/issues/7711 [i915#7742]: https://gitlab.freedesktop.org/drm/intel/issues/7742 [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828 [i915#7975]: https://gitlab.freedesktop.org/drm/intel/issues/7975 [i915#8011]: https://gitlab.freedesktop.org/drm/intel/issues/8011 [i915#8213]: https://gitlab.freedesktop.org/drm/intel/issues/8213 [i915#8247]: https://gitlab.freedesktop.org/drm/intel/issues/8247 [i915#8292]: https://gitlab.freedesktop.org/drm/intel/issues/8292 [i915#8304]: https://gitlab.freedesktop.org/drm/intel/issues/8304 [i915#8381]: https://gitlab.freedesktop.org/drm/intel/issues/8381 [i915#8411]: https://gitlab.freedesktop.org/drm/intel/issues/8411 [i915#8414]: https://gitlab.freedesktop.org/drm/intel/issues/8414 [i915#8502]: https://gitlab.freedesktop.org/drm/intel/issues/8502 [i915#8555]: https://gitlab.freedesktop.org/drm/intel/issues/8555 Build changes ------------- * CI: CI-20190529 -> None * IGT: IGT_7317 -> IGTPW_9098 * Piglit: piglit_4509 -> None CI-20190529: 20190529 CI_DRM_13219: e64abcc87c73dc4c1e510a76f210c65828e06197 @ git://anongit.freedesktop.org/gfx-ci/linux IGTPW_9098: 
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/index.html IGT_7317: c902b72df45aa49faa38205bc5be3c748d33a3e0 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_9098/index.html [-- Attachment #2: Type: text/html, Size: 11927 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [igt-dev] [Intel-xe] [PATCH i-g-t] tests/xe/xe_intel_bb: ensure valid next page
2023-06-02 11:48 [igt-dev] [PATCH i-g-t] tests/xe/xe_intel_bb: ensure valid next page Matthew Auld
2023-06-02 14:23 ` [igt-dev] ✓ Fi.CI.BAT: success for " Patchwork
2023-06-04 17:38 ` [igt-dev] ✓ Fi.CI.IGT: " Patchwork
@ 2023-06-07 9:55 ` Zbigniew Kempczyński
2023-06-07 10:27 ` Matthew Auld
2023-06-07 10:01 ` [igt-dev] ✗ Fi.CI.BUILD: failure for tests/xe/xe_intel_bb: ensure valid next page (rev2) Patchwork
3 siblings, 1 reply; 8+ messages in thread
From: Zbigniew Kempczyński @ 2023-06-07 9:55 UTC (permalink / raw)
To: Matthew Auld; +Cc: igt-dev, intel-xe

On Fri, Jun 02, 2023 at 12:48:17PM +0100, Matthew Auld wrote:
> Due to over-fetch, recommendation is to ensure we have a single valid
> extra page beyond the batch. We currently lack this which seems to
> explain why xe_intel_bb@full-batch generates CAT errors.
>
> Currently we allow using the last GTT page, but this looks to be no-go,
> since the next page will be beyond the actual GTT, in the case of
> full-batch. The i915 path looks to already account for this. However
> even with that fixed, Xe doesn't use scratch pages by default so the
> next page will still not be valid.
>
> With Xe rather expect that callers know about HW over-fetch, ensuring
> that the batch has an extra page, if needed. Alternatively we could
> apply the DRM_XE_VM_CREATE_SCRATCH_PAGE when creating the vm, but really
> we want to get away from such things.
>
> Bspec: 60223
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/262
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>

I observe this not only on the last page. I've introduced

index 3cd680072e..ef3f782df7 100644
--- a/lib/intel_batchbuffer.c
+++ b/lib/intel_batchbuffer.c
@@ -947,7 +947,7 @@ __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,

 		/* Limit to 48-bit due to MI_* address limitation */
 		ibb->gtt_size = 1ull << min_t(uint32_t, xe_va_bits(fd), 48);
-		end = ibb->gtt_size;
+		end = ibb->gtt_size - xe_get_default_alignment(fd)*2;

or even:

+		end = ibb->gtt_size/2;

Having:

ADLP:~/igt-upstream/build/tests# ./xe_intel_bb --run full-batch --debug
IGT-Version: 1.27.1-NO-GIT (x86_64) (Linux: 6.3.0-xe+ x86_64)
Opened device: /dev/dri/card0
(xe_intel_bb:11420) drmtest-DEBUG: Test requirement passed: !(fd<0)
(xe_intel_bb:11420) intel_bufops-DEBUG: generation: 12, supported tiles: 0x3f, driver: xe
Starting subtest: full-batch
(xe_intel_bb:11420) intel_allocator_simple-DEBUG: Using simple allocator
(xe_intel_bb:11420) intel_batchbuffer-DEBUG: Run on DRM_XE_ENGINE_CLASS_COPY
(xe_intel_bb:11420) intel_batchbuffer-DEBUG: bind: MAP
(xe_intel_bb:11420) intel_batchbuffer-DEBUG: handle: 1, offset: ffffffffd000, size: 1000
(xe_intel_bb:11420) intel_batchbuffer-DEBUG: bind: UNMAP
(xe_intel_bb:11420) intel_batchbuffer-DEBUG: offset: ffffffffd000, size: 1000

And I still observe failure, what means problem is with prefetching from
next page, regardless location.

That means imo umd and any consumer have to overallocate to avoid
hitting:

[ 1406.062731] xe 0000:00:02.0: [drm] Engine memory cat error: guc_id=2
[ 1406.070519] xe 0000:00:02.0: [drm] Timedout job: seqno=4294967169, guc_id=2, flags=0x4
[ 1406.070535] xe 0000:00:02.0: [drm:xe_devcoredump [xe]] Multiple hangs are occurring, but only the first snapshot was taken
[ 1406.071182] xe 0000:00:02.0: [drm] Engine reset: guc_id=2

So from my perspective you've avoided failure but that wasn't my intention
when I've written this test. Do you maybe know does umd has mitigation
code which prevents of entering area which starts prefetching from next
page?

--
Zbigniew

> ---
> lib/intel_batchbuffer.c | 6 ++++++
> tests/xe/xe_intel_bb.c | 8 +++++++-
> 2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/lib/intel_batchbuffer.c b/lib/intel_batchbuffer.c
> index 3cd680072..035facfc4 100644
> --- a/lib/intel_batchbuffer.c
> +++ b/lib/intel_batchbuffer.c
> @@ -881,6 +881,12 @@ static inline uint64_t __intel_bb_get_offset(struct intel_bb *ibb,
>   * passed in. If this is the case, it copies the information over to the
>   * newly created batch buffer.
>   *
> + * NOTE: On Xe scratch pages are not used by default. Due to over-fetch (~512
> + * bytes) there might need to be a valid next page to avoid hangs or CAT errors
> + * if the batch is quite large and approaches the end boundary of the batch
> + * itself. Inflate the @size to ensure there is a valid next page in such
> + * cases.
> + *
>   * Returns:
>   *
>   * Pointer the intel_bb, asserts on failure.
> diff --git a/tests/xe/xe_intel_bb.c b/tests/xe/xe_intel_bb.c
> index 755cc530e..af8462af5 100644
> --- a/tests/xe/xe_intel_bb.c
> +++ b/tests/xe/xe_intel_bb.c
> @@ -952,7 +952,13 @@ static void full_batch(struct buf_ops *bops)
>  	struct intel_bb *ibb;
>  	int i;
>
> -	ibb = intel_bb_create(xe, PAGE_SIZE);
> +	/*
> +	 * Add an extra page to ensure over-fetch always sees a valid next page,
> +	 * which includes not going beyond the actual GTT, and ensuring we have
> +	 * a valid GTT entry, given that on xe we don't use scratch pages by
> +	 * default.
> +	 */
> +	ibb = intel_bb_create(xe, 2 * PAGE_SIZE);
>  	if (debug_bb)
>  		intel_bb_set_debug(ibb, true);
>
> --
> 2.40.1
>

^ permalink raw reply related [flat|nested] 8+ messages in thread
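
The observation in this message (the failure follows the batch wherever it is placed in the GTT) can be illustrated with a small check. This is only a sketch under a simplified model: it assumes the over-fetch window is the ~512 bytes mentioned in the patch NOTE and that nothing valid is bound directly after the batch. `OVERFETCH_BYTES` and `overfetch_leaves_mapping()` are hypothetical names, not IGT API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: over-fetch window of ~512 bytes, as stated in the patch NOTE. */
#define OVERFETCH_BYTES 512u

/*
 * Hypothetical helper: returns true when reading OVERFETCH_BYTES past the
 * last used byte of the batch would run beyond the bound range. The GTT
 * offset of the batch does not appear here; only the slack between the
 * used bytes and the bound size matters in this model.
 */
static bool overfetch_leaves_mapping(uint64_t bound_size, uint64_t used_bytes)
{
	assert(used_bytes <= bound_size);

	return bound_size - used_bytes < OVERFETCH_BYTES;
}
```

With the values from the debug log above (a 0x1000 byte bind that is completely filled), `overfetch_leaves_mapping(0x1000, 0x1000)` is true whether the batch sits at `ffffffffd000` or in the lower half of the address space, which matches moving `end` making no difference.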
* Re: [igt-dev] [Intel-xe] [PATCH i-g-t] tests/xe/xe_intel_bb: ensure valid next page
2023-06-07 9:55 ` [igt-dev] [Intel-xe] [PATCH i-g-t] " Zbigniew Kempczyński
@ 2023-06-07 10:27 ` Matthew Auld
2023-06-07 11:09 ` Zbigniew Kempczyński
0 siblings, 1 reply; 8+ messages in thread
From: Matthew Auld @ 2023-06-07 10:27 UTC (permalink / raw)
To: Zbigniew Kempczyński; +Cc: igt-dev, intel-xe

On 07/06/2023 10:55, Zbigniew Kempczyński wrote:
> On Fri, Jun 02, 2023 at 12:48:17PM +0100, Matthew Auld wrote:
>> Due to over-fetch, recommendation is to ensure we have a single valid
>> extra page beyond the batch. We currently lack this which seems to
>> explain why xe_intel_bb@full-batch generates CAT errors.
>>
>> Currently we allow using the last GTT page, but this looks to be no-go,
>> since the next page will be beyond the actual GTT, in the case of
>> full-batch. The i915 path looks to already account for this. However
>> even with that fixed, Xe doesn't use scratch pages by default so the
>> next page will still not be valid.
>>
>> With Xe rather expect that callers know about HW over-fetch, ensuring
>> that the batch has an extra page, if needed. Alternatively we could
>> apply the DRM_XE_VM_CREATE_SCRATCH_PAGE when creating the vm, but really
>> we want to get away from such things.
>>
>> Bspec: 60223
>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/262
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>
> I observe this not only on the last page. I've introduced
>
> index 3cd680072e..ef3f782df7 100644
> --- a/lib/intel_batchbuffer.c
> +++ b/lib/intel_batchbuffer.c
> @@ -947,7 +947,7 @@ __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,
>
>  		/* Limit to 48-bit due to MI_* address limitation */
>  		ibb->gtt_size = 1ull << min_t(uint32_t, xe_va_bits(fd), 48);
> -		end = ibb->gtt_size;
> +		end = ibb->gtt_size - xe_get_default_alignment(fd)*2;
>
> or even:
>
> +		end = ibb->gtt_size/2;
>
> Having:
>
> ADLP:~/igt-upstream/build/tests# ./xe_intel_bb --run full-batch --debug
> IGT-Version: 1.27.1-NO-GIT (x86_64) (Linux: 6.3.0-xe+ x86_64)
> Opened device: /dev/dri/card0
> (xe_intel_bb:11420) drmtest-DEBUG: Test requirement passed: !(fd<0)
> (xe_intel_bb:11420) intel_bufops-DEBUG: generation: 12, supported tiles: 0x3f, driver: xe
> Starting subtest: full-batch
> (xe_intel_bb:11420) intel_allocator_simple-DEBUG: Using simple allocator
> (xe_intel_bb:11420) intel_batchbuffer-DEBUG: Run on DRM_XE_ENGINE_CLASS_COPY
> (xe_intel_bb:11420) intel_batchbuffer-DEBUG: bind: MAP
> (xe_intel_bb:11420) intel_batchbuffer-DEBUG: handle: 1, offset: ffffffffd000, size: 1000
> (xe_intel_bb:11420) intel_batchbuffer-DEBUG: bind: UNMAP
> (xe_intel_bb:11420) intel_batchbuffer-DEBUG: offset: ffffffffd000, size: 1000
>
> And I still observe failure, what means problem is with prefetching from
> next page, regardless location.

Yeah, AFAICT the CAT error will trigger for any overfetch that hits an
bogus GTT address. Either because it goes beyond the GTT boundary or
because the PTE is marked as invalid and we aren't able to fault it in it.

>
> That means imo umd and any consumer have to overallocate to avoid
> hitting:
>
> [ 1406.062731] xe 0000:00:02.0: [drm] Engine memory cat error: guc_id=2
> [ 1406.070519] xe 0000:00:02.0: [drm] Timedout job: seqno=4294967169, guc_id=2, flags=0x4
> [ 1406.070535] xe 0000:00:02.0: [drm:xe_devcoredump [xe]] Multiple hangs are occurring, but only the first snapshot was taken
> [ 1406.071182] xe 0000:00:02.0: [drm] Engine reset: guc_id=2
>
> So from my perspective you've avoided failure but that wasn't my intention
> when I've written this test. Do you maybe know does umd has mitigation
> code which prevents of entering area which starts prefetching from next
> page?

Userspace should be aware of such HW behaviour I think. In Mesa + Xe I
think it still currently forces scratch pages, but AFAIK they want to get
rid of that once they are sure that all potential overfetch can't trigger
CAT errors or similar. If the vm doesn't have scratch pages, then I think
it's up to userspace to deal with overfetch, and I think that must involve
inflating the batch size or somehow ensuring the next page always has
something valid bound.

Do you know if this test is specifically trying to poke at the overfetch
behaviour, or does this test just want a PAGE_SIZE worth of NOOPS +
BB_END? I wasn't sure given that currently on DG2/ATS-M the batch is
always going to be 64K underneath, and so the overfetch is never going to
find issues on such platforms.

If the test cares specifically about triggering overfetch we could maybe
enable scratch pages for the vm in full-batch, and on DG2/ATS-M also do
s/PAGE_SIZE/64K/ or use system memory for the batch. And then ofc ensure
the last GTT page is never given out. If we later decide to remove scratch
pages completely in Xe then I guess we just remove the test. What do you
think?

>
> --
> Zbigniew
>
>
>> ---
>> lib/intel_batchbuffer.c | 6 ++++++
>> tests/xe/xe_intel_bb.c | 8 +++++++-
>> 2 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/intel_batchbuffer.c b/lib/intel_batchbuffer.c
>> index 3cd680072..035facfc4 100644
>> --- a/lib/intel_batchbuffer.c
>> +++ b/lib/intel_batchbuffer.c
>> @@ -881,6 +881,12 @@ static inline uint64_t __intel_bb_get_offset(struct intel_bb *ibb,
>>   * passed in. If this is the case, it copies the information over to the
>>   * newly created batch buffer.
>>   *
>> + * NOTE: On Xe scratch pages are not used by default. Due to over-fetch (~512
>> + * bytes) there might need to be a valid next page to avoid hangs or CAT errors
>> + * if the batch is quite large and approaches the end boundary of the batch
>> + * itself. Inflate the @size to ensure there is a valid next page in such
>> + * cases.
>> + *
>>   * Returns:
>>   *
>>   * Pointer the intel_bb, asserts on failure.
>> diff --git a/tests/xe/xe_intel_bb.c b/tests/xe/xe_intel_bb.c
>> index 755cc530e..af8462af5 100644
>> --- a/tests/xe/xe_intel_bb.c
>> +++ b/tests/xe/xe_intel_bb.c
>> @@ -952,7 +952,13 @@ static void full_batch(struct buf_ops *bops)
>>  	struct intel_bb *ibb;
>>  	int i;
>>
>> -	ibb = intel_bb_create(xe, PAGE_SIZE);
>> +	/*
>> +	 * Add an extra page to ensure over-fetch always sees a valid next page,
>> +	 * which includes not going beyond the actual GTT, and ensuring we have
>> +	 * a valid GTT entry, given that on xe we don't use scratch pages by
>> +	 * default.
>> +	 */
>> +	ibb = intel_bb_create(xe, 2 * PAGE_SIZE);
>>  	if (debug_bb)
>>  		intel_bb_set_debug(ibb, true);
>>
>> --
>> 2.40.1
>>

^ permalink raw reply [flat|nested] 8+ messages in thread
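
The reply above suggests that, without scratch pages, userspace has to inflate the batch size so the page after the last instruction is always validly bound. A minimal sketch of one way to do that follows, assuming the ~512 byte over-fetch figure from the patch NOTE. `inflate_batch_size()` and `OVERFETCH_BYTES` are hypothetical names for illustration and are not part of IGT, Mesa, or the Xe uAPI.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: over-fetch window of ~512 bytes, as stated in the patch NOTE. */
#define OVERFETCH_BYTES 512u

/*
 * Hypothetical helper: round the used batch bytes up to a whole number of
 * pages, then add one more page only if the slack left inside the last
 * page is smaller than the over-fetch window.
 */
static uint64_t inflate_batch_size(uint64_t used_bytes, uint64_t page_size)
{
	uint64_t aligned;

	assert(used_bytes > 0);
	assert(page_size >= OVERFETCH_BYTES);
	assert((page_size & (page_size - 1)) == 0); /* power of two */

	aligned = (used_bytes + page_size - 1) & ~(page_size - 1);
	if (aligned - used_bytes < OVERFETCH_BYTES)
		aligned += page_size;

	return aligned;
}
```

For a batch that fills a whole 4K page, this returns two pages, the same size the patch passes to `intel_bb_create()`; a batch that leaves at least 512 bytes free in its last page is left at its page-aligned size.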
* Re: [igt-dev] [Intel-xe] [PATCH i-g-t] tests/xe/xe_intel_bb: ensure valid next page
2023-06-07 10:27 ` Matthew Auld
@ 2023-06-07 11:09 ` Zbigniew Kempczyński
2023-06-07 11:14 ` Matthew Auld
0 siblings, 1 reply; 8+ messages in thread
From: Zbigniew Kempczyński @ 2023-06-07 11:09 UTC (permalink / raw)
To: Matthew Auld; +Cc: igt-dev, intel-xe

On Wed, Jun 07, 2023 at 11:27:41AM +0100, Matthew Auld wrote:
> On 07/06/2023 10:55, Zbigniew Kempczyński wrote:
> > On Fri, Jun 02, 2023 at 12:48:17PM +0100, Matthew Auld wrote:
> > > Due to over-fetch, recommendation is to ensure we have a single valid
> > > extra page beyond the batch. We currently lack this which seems to
> > > explain why xe_intel_bb@full-batch generates CAT errors.
> > >
> > > Currently we allow using the last GTT page, but this looks to be no-go,
> > > since the next page will be beyond the actual GTT, in the case of
> > > full-batch. The i915 path looks to already account for this. However
> > > even with that fixed, Xe doesn't use scratch pages by default so the
> > > next page will still not be valid.
> > >
> > > With Xe rather expect that callers know about HW over-fetch, ensuring
> > > that the batch has an extra page, if needed. Alternatively we could
> > > apply the DRM_XE_VM_CREATE_SCRATCH_PAGE when creating the vm, but really
> > > we want to get away from such things.
> > >
> > > Bspec: 60223
> > > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/262
> > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> >
> > I observe this not only on the last page. I've introduced
> >
> > index 3cd680072e..ef3f782df7 100644
> > --- a/lib/intel_batchbuffer.c
> > +++ b/lib/intel_batchbuffer.c
> > @@ -947,7 +947,7 @@ __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,
> >  		/* Limit to 48-bit due to MI_* address limitation */
> >  		ibb->gtt_size = 1ull << min_t(uint32_t, xe_va_bits(fd), 48);
> > -		end = ibb->gtt_size;
> > +		end = ibb->gtt_size - xe_get_default_alignment(fd)*2;
> >
> > or even:
> >
> > +		end = ibb->gtt_size/2;
> >
> > Having:
> >
> > ADLP:~/igt-upstream/build/tests# ./xe_intel_bb --run full-batch --debug
> > IGT-Version: 1.27.1-NO-GIT (x86_64) (Linux: 6.3.0-xe+ x86_64)
> > Opened device: /dev/dri/card0
> > (xe_intel_bb:11420) drmtest-DEBUG: Test requirement passed: !(fd<0)
> > (xe_intel_bb:11420) intel_bufops-DEBUG: generation: 12, supported tiles: 0x3f, driver: xe
> > Starting subtest: full-batch
> > (xe_intel_bb:11420) intel_allocator_simple-DEBUG: Using simple allocator
> > (xe_intel_bb:11420) intel_batchbuffer-DEBUG: Run on DRM_XE_ENGINE_CLASS_COPY
> > (xe_intel_bb:11420) intel_batchbuffer-DEBUG: bind: MAP
> > (xe_intel_bb:11420) intel_batchbuffer-DEBUG: handle: 1, offset: ffffffffd000, size: 1000
> > (xe_intel_bb:11420) intel_batchbuffer-DEBUG: bind: UNMAP
> > (xe_intel_bb:11420) intel_batchbuffer-DEBUG: offset: ffffffffd000, size: 1000
> >
> > And I still observe failure, what means problem is with prefetching from
> > next page, regardless location.
>
> Yeah, AFAICT the CAT error will trigger for any overfetch that hits an bogus
> GTT address. Either because it goes beyond the GTT boundary or because the
> PTE is marked as invalid and we aren't able to fault it in it.
>
> >
> > That means imo umd and any consumer have to overallocate to avoid
> > hitting:
> >
> > [ 1406.062731] xe 0000:00:02.0: [drm] Engine memory cat error: guc_id=2
> > [ 1406.070519] xe 0000:00:02.0: [drm] Timedout job: seqno=4294967169, guc_id=2, flags=0x4
> > [ 1406.070535] xe 0000:00:02.0: [drm:xe_devcoredump [xe]] Multiple hangs are occurring, but only the first snapshot was taken
> > [ 1406.071182] xe 0000:00:02.0: [drm] Engine reset: guc_id=2
> >
> > So from my perspective you've avoided failure but that wasn't my intention
> > when I've written this test. Do you maybe know does umd has mitigation
> > code which prevents of entering area which starts prefetching from next
> > page?
>
> Userspace should be aware of such HW behaviour I think. In Mesa + Xe I think
> it still currently forces scratch pages, but AFAIK they want to get rid of
> that once they are sure that all potential overfetch can't trigger CAT
> errors or similar. If the vm doesn't have scratch pages, then I think it's
> up to userspace to deal with overfetch, and I think that must involve
> inflating the batch size or somehow ensuring the next page always has
> something valid bound.
>
> Do you know if this test is specifically trying to poke at the overfetch
> behaviour, or does this test just want a PAGE_SIZE worth of NOOPS + BB_END?
> I wasn't sure given that currently on DG2/ATS-M the batch is always going to
> be 64K underneath, and so the overfetch is never going to find issues on
> such platforms.

You're right, for discrete instead of PAGE_SIZE I should pick 64K
(xe_get_default_alignment()).

>
> If the test cares specifically about triggering overfetch we could maybe
> enable scratch pages for the vm in full-batch, and on DG2/ATS-M also do
> s/PAGE_SIZE/64K/ or use system memory for the batch. And then ofc ensure the
> last GTT page is never given out. If we later decide to remove scratch pages
> completely in Xe then I guess we just remove the test. What do you think?

Test was written to exercise last page on i915. Looks we got no influence
on prefetching we always get the error when we enter last 512B if I'm not
wrong. Scratch pages are not default option for intel-bb vm and I think it
never will be so I would just remove the test.

--
Zbigniew

>
> >
> > --
> > Zbigniew
> >
> >
> > > ---
> > > lib/intel_batchbuffer.c | 6 ++++++
> > > tests/xe/xe_intel_bb.c | 8 +++++++-
> > > 2 files changed, 13 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/lib/intel_batchbuffer.c b/lib/intel_batchbuffer.c
> > > index 3cd680072..035facfc4 100644
> > > --- a/lib/intel_batchbuffer.c
> > > +++ b/lib/intel_batchbuffer.c
> > > @@ -881,6 +881,12 @@ static inline uint64_t __intel_bb_get_offset(struct intel_bb *ibb,
> > >   * passed in. If this is the case, it copies the information over to the
> > >   * newly created batch buffer.
> > >   *
> > > + * NOTE: On Xe scratch pages are not used by default. Due to over-fetch (~512
> > > + * bytes) there might need to be a valid next page to avoid hangs or CAT errors
> > > + * if the batch is quite large and approaches the end boundary of the batch
> > > + * itself. Inflate the @size to ensure there is a valid next page in such
> > > + * cases.
> > > + *
> > >   * Returns:
> > >   *
> > >   * Pointer the intel_bb, asserts on failure.
> > > diff --git a/tests/xe/xe_intel_bb.c b/tests/xe/xe_intel_bb.c
> > > index 755cc530e..af8462af5 100644
> > > --- a/tests/xe/xe_intel_bb.c
> > > +++ b/tests/xe/xe_intel_bb.c
> > > @@ -952,7 +952,13 @@ static void full_batch(struct buf_ops *bops)
> > >  	struct intel_bb *ibb;
> > >  	int i;
> > > -	ibb = intel_bb_create(xe, PAGE_SIZE);
> > > +	/*
> > > +	 * Add an extra page to ensure over-fetch always sees a valid next page,
> > > +	 * which includes not going beyond the actual GTT, and ensuring we have
> > > +	 * a valid GTT entry, given that on xe we don't use scratch pages by
> > > +	 * default.
> > > +	 */
> > > +	ibb = intel_bb_create(xe, 2 * PAGE_SIZE);
> > >  	if (debug_bb)
> > >  		intel_bb_set_debug(ibb, true);
> > > --
> > > 2.40.1
> > >

^ permalink raw reply [flat|nested] 8+ messages in thread
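
The point about discrete parts in this message (a 4K batch sits inside a 64K backing allocation, so a "full" batch there would need to be 64K) could be sketched as below. This is an illustration only: the `default_alignment` parameter stands in for the value the thread attributes to `xe_get_default_alignment()`, the helper names and `PAGE_SIZE_4K` are hypothetical, and the dword count assumes each NOOP and the batch-buffer-end command occupy one 32-bit dword.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u

/*
 * Hypothetical helper: size of a batch that truly fills its backing
 * allocation, given the platform's default alignment in bytes.
 */
static uint32_t full_batch_bytes(uint32_t default_alignment)
{
	return default_alignment > PAGE_SIZE_4K ? default_alignment
						: PAGE_SIZE_4K;
}

/*
 * Hypothetical helper: number of one-dword NOOPs needed so that a single
 * trailing one-dword batch-buffer-end lands in the last dword.
 */
static uint32_t full_batch_noop_dwords(uint32_t batch_bytes)
{
	assert(batch_bytes >= 4 && batch_bytes % 4 == 0);

	return batch_bytes / 4 - 1;
}
```

Under these assumptions a 4K batch holds 1023 NOOPs before the end marker and a 64K batch holds 16383, so only the latter reaches the end of a 64K allocation.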
* Re: [igt-dev] [Intel-xe] [PATCH i-g-t] tests/xe/xe_intel_bb: ensure valid next page
2023-06-07 11:09 ` Zbigniew Kempczyński
@ 2023-06-07 11:14 ` Matthew Auld
0 siblings, 0 replies; 8+ messages in thread
From: Matthew Auld @ 2023-06-07 11:14 UTC (permalink / raw)
To: Zbigniew Kempczyński; +Cc: igt-dev, intel-xe

On 07/06/2023 12:09, Zbigniew Kempczyński wrote:
> On Wed, Jun 07, 2023 at 11:27:41AM +0100, Matthew Auld wrote:
>> On 07/06/2023 10:55, Zbigniew Kempczyński wrote:
>>> On Fri, Jun 02, 2023 at 12:48:17PM +0100, Matthew Auld wrote:
>>>> Due to over-fetch, recommendation is to ensure we have a single valid
>>>> extra page beyond the batch. We currently lack this which seems to
>>>> explain why xe_intel_bb@full-batch generates CAT errors.
>>>>
>>>> Currently we allow using the last GTT page, but this looks to be no-go,
>>>> since the next page will be beyond the actual GTT, in the case of
>>>> full-batch. The i915 path looks to already account for this. However
>>>> even with that fixed, Xe doesn't use scratch pages by default so the
>>>> next page will still not be valid.
>>>>
>>>> With Xe rather expect that callers know about HW over-fetch, ensuring
>>>> that the batch has an extra page, if needed. Alternatively we could
>>>> apply the DRM_XE_VM_CREATE_SCRATCH_PAGE when creating the vm, but really
>>>> we want to get away from such things.
>>>>
>>>> Bspec: 60223
>>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/262
>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>
>>> I observe this not only on the last page. I've introduced
>>>
>>> index 3cd680072e..ef3f782df7 100644
>>> --- a/lib/intel_batchbuffer.c
>>> +++ b/lib/intel_batchbuffer.c
>>> @@ -947,7 +947,7 @@ __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,
>>>  		/* Limit to 48-bit due to MI_* address limitation */
>>>  		ibb->gtt_size = 1ull << min_t(uint32_t, xe_va_bits(fd), 48);
>>> -		end = ibb->gtt_size;
>>> +		end = ibb->gtt_size - xe_get_default_alignment(fd)*2;
>>>
>>> or even:
>>>
>>> +		end = ibb->gtt_size/2;
>>>
>>> Having:
>>>
>>> ADLP:~/igt-upstream/build/tests# ./xe_intel_bb --run full-batch --debug
>>> IGT-Version: 1.27.1-NO-GIT (x86_64) (Linux: 6.3.0-xe+ x86_64)
>>> Opened device: /dev/dri/card0
>>> (xe_intel_bb:11420) drmtest-DEBUG: Test requirement passed: !(fd<0)
>>> (xe_intel_bb:11420) intel_bufops-DEBUG: generation: 12, supported tiles: 0x3f, driver: xe
>>> Starting subtest: full-batch
>>> (xe_intel_bb:11420) intel_allocator_simple-DEBUG: Using simple allocator
>>> (xe_intel_bb:11420) intel_batchbuffer-DEBUG: Run on DRM_XE_ENGINE_CLASS_COPY
>>> (xe_intel_bb:11420) intel_batchbuffer-DEBUG: bind: MAP
>>> (xe_intel_bb:11420) intel_batchbuffer-DEBUG: handle: 1, offset: ffffffffd000, size: 1000
>>> (xe_intel_bb:11420) intel_batchbuffer-DEBUG: bind: UNMAP
>>> (xe_intel_bb:11420) intel_batchbuffer-DEBUG: offset: ffffffffd000, size: 1000
>>>
>>> And I still observe failure, what means problem is with prefetching from
>>> next page, regardless location.
>>
>> Yeah, AFAICT the CAT error will trigger for any overfetch that hits an bogus
>> GTT address. Either because it goes beyond the GTT boundary or because the
>> PTE is marked as invalid and we aren't able to fault it in it.
>>
>>>
>>> That means imo umd and any consumer have to overallocate to avoid
>>> hitting:
>>>
>>> [ 1406.062731] xe 0000:00:02.0: [drm] Engine memory cat error: guc_id=2
>>> [ 1406.070519] xe 0000:00:02.0: [drm] Timedout job: seqno=4294967169, guc_id=2, flags=0x4
>>> [ 1406.070535] xe 0000:00:02.0: [drm:xe_devcoredump [xe]] Multiple hangs are occurring, but only the first snapshot was taken
>>> [ 1406.071182] xe 0000:00:02.0: [drm] Engine reset: guc_id=2
>>>
>>> So from my perspective you've avoided failure but that wasn't my intention
>>> when I've written this test. Do you maybe know does umd has mitigation
>>> code which prevents of entering area which starts prefetching from next
>>> page?
>>
>> Userspace should be aware of such HW behaviour I think. In Mesa + Xe I think
>> it still currently forces scratch pages, but AFAIK they want to get rid of
>> that once they are sure that all potential overfetch can't trigger CAT
>> errors or similar. If the vm doesn't have scratch pages, then I think it's
>> up to userspace to deal with overfetch, and I think that must involve
>> inflating the batch size or somehow ensuring the next page always has
>> something valid bound.
>>
>> Do you know if this test is specifically trying to poke at the overfetch
>> behaviour, or does this test just want a PAGE_SIZE worth of NOOPS + BB_END?
>> I wasn't sure given that currently on DG2/ATS-M the batch is always going to
>> be 64K underneath, and so the overfetch is never going to find issues on
>> such platforms.
>
> You're right, for discrete instead of PAGE_SIZE I should pick 64K
> (xe_get_default_alignment()).
>
>>
>> If the test cares specifically about triggering overfetch we could maybe
>> enable scratch pages for the vm in full-batch, and on DG2/ATS-M also do
>> s/PAGE_SIZE/64K/ or use system memory for the batch. And then ofc ensure the
>> last GTT page is never given out. If we later decide to remove scratch pages
>> completely in Xe then I guess we just remove the test. What do you think?
>
> Test was written to exercise last page on i915. Looks we got no influence
> on prefetching we always get the error when we enter last 512B if I'm not
> wrong. Scratch pages are not default option for intel-bb vm and I think it
> never will be so I would just remove the test.

Ok, will remove instead. Thanks for taking a look.

>
> --
> Zbigniew
>
>>
>>>
>>> --
>>> Zbigniew
>>>
>>>
>>>> ---
>>>> lib/intel_batchbuffer.c | 6 ++++++
>>>> tests/xe/xe_intel_bb.c | 8 +++++++-
>>>> 2 files changed, 13 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/lib/intel_batchbuffer.c b/lib/intel_batchbuffer.c
>>>> index 3cd680072..035facfc4 100644
>>>> --- a/lib/intel_batchbuffer.c
>>>> +++ b/lib/intel_batchbuffer.c
>>>> @@ -881,6 +881,12 @@ static inline uint64_t __intel_bb_get_offset(struct intel_bb *ibb,
>>>>   * passed in. If this is the case, it copies the information over to the
>>>>   * newly created batch buffer.
>>>>   *
>>>> + * NOTE: On Xe scratch pages are not used by default. Due to over-fetch (~512
>>>> + * bytes) there might need to be a valid next page to avoid hangs or CAT errors
>>>> + * if the batch is quite large and approaches the end boundary of the batch
>>>> + * itself. Inflate the @size to ensure there is a valid next page in such
>>>> + * cases.
>>>> + *
>>>>   * Returns:
>>>>   *
>>>>   * Pointer the intel_bb, asserts on failure.
>>>> diff --git a/tests/xe/xe_intel_bb.c b/tests/xe/xe_intel_bb.c
>>>> index 755cc530e..af8462af5 100644
>>>> --- a/tests/xe/xe_intel_bb.c
>>>> +++ b/tests/xe/xe_intel_bb.c
>>>> @@ -952,7 +952,13 @@ static void full_batch(struct buf_ops *bops)
>>>>  	struct intel_bb *ibb;
>>>>  	int i;
>>>> -	ibb = intel_bb_create(xe, PAGE_SIZE);
>>>> +	/*
>>>> +	 * Add an extra page to ensure over-fetch always sees a valid next page,
>>>> +	 * which includes not going beyond the actual GTT, and ensuring we have
>>>> +	 * a valid GTT entry, given that on xe we don't use scratch pages by
>>>> +	 * default.
>>>> +	 */
>>>> +	ibb = intel_bb_create(xe, 2 * PAGE_SIZE);
>>>>  	if (debug_bb)
>>>>  		intel_bb_set_debug(ibb, true);
>>>> --
>>>> 2.40.1
>>>>

^ permalink raw reply [flat|nested] 8+ messages in thread
* [igt-dev] ✗ Fi.CI.BUILD: failure for tests/xe/xe_intel_bb: ensure valid next page (rev2)
2023-06-02 11:48 [igt-dev] [PATCH i-g-t] tests/xe/xe_intel_bb: ensure valid next page Matthew Auld
` (2 preceding siblings ...)
2023-06-07 9:55 ` [igt-dev] [Intel-xe] [PATCH i-g-t] " Zbigniew Kempczyński
@ 2023-06-07 10:01 ` Patchwork
3 siblings, 0 replies; 8+ messages in thread
From: Patchwork @ 2023-06-07 10:01 UTC (permalink / raw)
To: Zbigniew Kempczyński; +Cc: igt-dev

== Series Details ==

Series: tests/xe/xe_intel_bb: ensure valid next page (rev2)
URL : https://patchwork.freedesktop.org/series/118772/
State : failure

== Summary ==

Applying: tests/xe/xe_intel_bb: ensure valid next page
Patch failed at 0001 tests/xe/xe_intel_bb: ensure valid next page
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

^ permalink raw reply [flat|nested] 8+ messages in thread