* [Intel-xe] [PATCH 0/2] Implement rcs/ccs missing invalidations and flushes
@ 2023-06-07 17:47 Thomas Hellström
2023-06-07 17:47 ` [Intel-xe] [PATCH 1/2] drm/xe: Invalidate TLB also on bind if in scratch page mode Thomas Hellström
` (5 more replies)
0 siblings, 6 replies; 11+ messages in thread
From: Thomas Hellström @ 2023-06-07 17:47 UTC (permalink / raw)
To: intel-xe
Mesa is seeing unexpected content in some tests.
Fixing those requires a TLB invalidation at batch start and a
render cache flush at batch end.
KMD also requires the latter to make sure any GPU side caches are
flushed before handing memory over for reuse. This is implemented
in patch 2.
The former is likely due to scratch PTEs remaining in the TLB after a
prefetch or similar. We could discuss whether user-space should be
responsible for a TLB invalidation after a VM_BIND operation, but
patch 1 implements a TLB flush at batch start for non-LR vms with scratch
pages. For LR vms with scratch pages the TLB flush is incorporated
in the bind fence.
The TLB invalidation can be optimized / coalesced later.
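As a rough illustration of the split described above, the following self-contained C sketch models where the TLB invalidation for a newly bound range would happen, depending on the VM type. The struct, enum and function names are hypothetical and not part of the xe driver; the model also ignores rebinds and fault mode, which the patch handles separately.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the VM properties the cover letter refers to. */
struct vm_props {
	bool long_running;	/* LR (compute-mode) VM */
	bool has_scratch;	/* scratch pages enabled */
};

/* Where the TLB invalidation for a new (non-rebind) bind takes place. */
enum tlb_inval_site {
	TLB_INVAL_NOT_NEEDED,
	TLB_INVAL_AT_BATCH_START,	/* emitted in the ring before the batch */
	TLB_INVAL_IN_BIND_FENCE,	/* done before the bind fence signals */
};

static enum tlb_inval_site
tlb_inval_site_for_new_bind(const struct vm_props *vm)
{
	assert(vm);

	/* Without scratch pages there is no stale scratch PTE to evict. */
	if (!vm->has_scratch)
		return TLB_INVAL_NOT_NEEDED;

	/* LR VMs may chain batches from user space, so bind must invalidate. */
	return vm->long_running ? TLB_INVAL_IN_BIND_FENCE
				: TLB_INVAL_AT_BATCH_START;
}
```

In this model a non-LR VM with scratch pages maps to the batch-start site and an LR VM with scratch pages maps to the bind fence, matching the two cases in the text.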
Thomas Hellström (2):
drm/xe: Invalidate TLB also on bind if in scratch page mode
drm/xe: Emit a render cache flush after each rcs/ccs batch
drivers/gpu/drm/xe/regs/xe_gpu_commands.h | 4 ++
drivers/gpu/drm/xe/xe_pt.c | 17 +++++++-
drivers/gpu/drm/xe/xe_ring_ops.c | 50 +++++++++++++++++++++--
drivers/gpu/drm/xe/xe_wa_oob.rules | 1 +
4 files changed, 67 insertions(+), 5 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Intel-xe] [PATCH 1/2] drm/xe: Invalidate TLB also on bind if in scratch page mode
2023-06-07 17:47 [Intel-xe] [PATCH 0/2] Implement rcs/ccs missing invalidations and flushes Thomas Hellström
@ 2023-06-07 17:47 ` Thomas Hellström
2023-06-07 18:01 ` Souza, Jose
2023-06-09 15:51 ` Matthew Brost
2023-06-07 17:47 ` [Intel-xe] [PATCH 2/2] drm/xe: Emit a render cache flush after each rcs/ccs batch Thomas Hellström
` (4 subsequent siblings)
5 siblings, 2 replies; 11+ messages in thread
From: Thomas Hellström @ 2023-06-07 17:47 UTC (permalink / raw)
To: intel-xe
For scratch table mode we need to cover the case where a scratch PTE might
have been pre-fetched and cached and used instead of that of the newly
bound vma.
For compute vms, invalidate TLB globally using GuC before signalling
bind complete. For !long-running vms, invalidate TLB at batch start.
Also document how TLB invalidation works.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/xe/regs/xe_gpu_commands.h | 1 +
drivers/gpu/drm/xe/xe_pt.c | 17 +++++++++++++++--
drivers/gpu/drm/xe/xe_ring_ops.c | 15 ++++++++++++---
3 files changed, 28 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
index 0f9c5b0b8a3b..d2d41f717525 100644
--- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
+++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
@@ -73,6 +73,7 @@
#define PIPE_CONTROL_STORE_DATA_INDEX (1<<21)
#define PIPE_CONTROL_CS_STALL (1<<20)
#define PIPE_CONTROL_GLOBAL_SNAPSHOT_RESET (1<<19)
+#define PIPE_CONTROL_TLB_INVALIDATE (1<<18)
#define PIPE_CONTROL_PSD_SYNC (1<<17)
#define PIPE_CONTROL_QW_WRITE (1<<14)
#define PIPE_CONTROL_DEPTH_STALL (1<<13)
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index bef265715000..e817fa9fe65e 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1297,7 +1297,20 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_engine *e,
xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries);
- if (rebind && !xe_vm_no_dma_fences(vma->vm)) {
+ /*
+ * If rebind, we have to invalidate TLB on !LR vms to invalidate
+ * cached PTEs pointing to freed memory. On LR vms this is done
+ * automatically when the context is re-enabled by the rebind worker,
+ * or in fault mode it was invalidated on PTE zapping.
+ *
+ * If !rebind, and scratch enabled VMs, there is a chance the scratch
+ * PTE is already cached in the TLB so it needs to be invalidated.
+ * On !LR VMs this is done in the ring ops preceding a batch, but on
+ * non-faulting LR, in particular on user-space batch buffer chaining,
+ * it needs to be done here.
+ */
+ if ((rebind && !xe_vm_no_dma_fences(vm)) ||
+ (!rebind && vm->scratch_bo[tile->id] && xe_vm_in_compute_mode(vm))) {
ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
if (!ifence)
return ERR_PTR(-ENOMEM);
@@ -1313,7 +1326,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_engine *e,
LLIST_HEAD(deferred);
/* TLB invalidation must be done before signaling rebind */
- if (rebind && !xe_vm_no_dma_fences(vma->vm)) {
+ if (ifence) {
int err = invalidation_fence_init(tile->primary_gt, ifence, fence,
vma);
if (err) {
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 2deee7a2bb14..c20fe41c0729 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -15,6 +15,7 @@
#include "xe_macros.h"
#include "xe_sched_job.h"
#include "xe_vm_types.h"
+#include "xe_vm.h"
/*
* 3D-related flags that can't be set on _engines_ that lack access to the 3D
@@ -107,7 +108,7 @@ static int emit_flush_invalidate(u32 flag, u32 *dw, int i)
return i;
}
-static int emit_pipe_invalidate(u32 mask_flags, u32 *dw, int i)
+static int emit_pipe_invalidate(u32 mask_flags, u32 extra_flags, u32 *dw, int i)
{
u32 flags = PIPE_CONTROL_CS_STALL |
PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
@@ -117,7 +118,8 @@ static int emit_pipe_invalidate(u32 mask_flags, u32 *dw, int i)
PIPE_CONTROL_CONST_CACHE_INVALIDATE |
PIPE_CONTROL_STATE_CACHE_INVALIDATE |
PIPE_CONTROL_QW_WRITE |
- PIPE_CONTROL_STORE_DATA_INDEX;
+ PIPE_CONTROL_STORE_DATA_INDEX |
+ extra_flags;
flags &= ~mask_flags;
@@ -250,14 +252,21 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
struct xe_gt *gt = job->engine->gt;
struct xe_device *xe = gt_to_xe(gt);
bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
+ struct xe_vm *vm = job->engine->vm;
u32 mask_flags = 0;
+ u32 extra_flags = 0;
dw[i++] = preparser_disable(true);
if (lacks_render)
mask_flags = PIPE_CONTROL_3D_ARCH_FLAGS;
else if (job->engine->class == XE_ENGINE_CLASS_COMPUTE)
mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
- i = emit_pipe_invalidate(mask_flags, dw, i);
+
+ /* See xe_pt.c for a discussion on TLB invalidations. */
+ if (!xe_vm_no_dma_fences(vm) && vm->scratch_bo[gt_to_tile(gt)->id])
+ extra_flags = PIPE_CONTROL_TLB_INVALIDATE;
+
+ i = emit_pipe_invalidate(mask_flags, extra_flags, dw, i);
/* hsdes: 1809175790 */
if (has_aux_ccs(xe))
--
2.39.2
* [Intel-xe] [PATCH 2/2] drm/xe: Emit a render cache flush after each rcs/ccs batch
2023-06-07 17:47 [Intel-xe] [PATCH 0/2] Implement rcs/ccs missing invalidations and flushes Thomas Hellström
2023-06-07 17:47 ` [Intel-xe] [PATCH 1/2] drm/xe: Invalidate TLB also on bind if in scratch page mode Thomas Hellström
@ 2023-06-07 17:47 ` Thomas Hellström
2023-06-07 18:44 ` Souza, Jose
2023-06-07 17:49 ` [Intel-xe] ✓ CI.Patch_applied: success for Implement rcs/ccs missing invalidations and flushes Patchwork
` (3 subsequent siblings)
5 siblings, 1 reply; 11+ messages in thread
From: Thomas Hellström @ 2023-06-07 17:47 UTC (permalink / raw)
To: intel-xe
We need to flush render caches before fence signalling, where we might
release the memory for reuse. We can't rely on userspace doing this,
so flush render caches after the batch, but before user fence- and
dma_fence signalling.
Copy the cache flush from i915, but omit PIPE_CONTROL_FLUSH_L3, since it
should be implied by the other flushes. Also omit
PIPE_CONTROL_TLB_INVALIDATE since there should be no apparent need to
invalidate TLB after batch completion.
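The ordering requirement described above can be sketched as a small self-contained C model. The opcode names below are hypothetical placeholders, not real GPU commands or xe identifiers; the only point is that the cache flush sits after the batch and before any fence write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ring operations, in the order a job tail would emit them. */
enum ring_op {
	RING_OP_BB_START,		/* jump to the user batch */
	RING_OP_RENDER_CACHE_FLUSH,	/* flush GPU caches after the batch */
	RING_OP_USER_FENCE_WRITE,	/* optional user fence signalling */
	RING_OP_SEQNO_WRITE,		/* dma_fence signalling */
};

/* Fill @ops with the modelled tail of a job and return the op count. */
static size_t emit_job_tail_model(enum ring_op *ops, size_t cap,
				  bool has_user_fence)
{
	size_t n = 0;

	assert(ops && cap >= 4);

	ops[n++] = RING_OP_BB_START;
	/* The flush must land before anything that lets memory be reused. */
	ops[n++] = RING_OP_RENDER_CACHE_FLUSH;
	if (has_user_fence)
		ops[n++] = RING_OP_USER_FENCE_WRITE;
	ops[n++] = RING_OP_SEQNO_WRITE;

	return n;
}
```

Whether or not a user fence is present, the flush is always the operation directly after the batch start in this model.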
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/xe/regs/xe_gpu_commands.h | 3 ++
drivers/gpu/drm/xe/xe_ring_ops.c | 35 +++++++++++++++++++++++
drivers/gpu/drm/xe/xe_wa_oob.rules | 1 +
3 files changed, 39 insertions(+)
diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
index d2d41f717525..dd3408fd3d33 100644
--- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
+++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
@@ -66,6 +66,9 @@
#define PVC_MS_MOCS_INDEX_MASK GENMASK(6, 1)
#define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
+
+#define PIPE_CONTROL0_HDC_PIPELINE_FLUSH REG_BIT(9) /* gen12 */
+
#define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29)
#define PIPE_CONTROL_TILE_CACHE_FLUSH (1<<28)
#define PIPE_CONTROL_AMFS_FLUSH (1<<25)
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index c20fe41c0729..91511f72d971 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -5,6 +5,7 @@
#include "xe_ring_ops.h"
+#include "generated/xe_wa_oob.h"
#include "regs/xe_gpu_commands.h"
#include "regs/xe_gt_regs.h"
#include "regs/xe_lrc_layout.h"
@@ -16,6 +17,7 @@
#include "xe_sched_job.h"
#include "xe_vm_types.h"
#include "xe_vm.h"
+#include "xe_wa.h"
/*
* 3D-related flags that can't be set on _engines_ that lack access to the 3D
@@ -147,6 +149,37 @@ static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
return i;
}
+static int emit_render_cache_flush(struct xe_sched_job *job, u32 *dw, int i)
+{
+ struct xe_gt *gt = job->engine->gt;
+ bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
+ u32 flags;
+
+ flags = (PIPE_CONTROL_CS_STALL |
+ PIPE_CONTROL_TILE_CACHE_FLUSH |
+ PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
+ PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+ PIPE_CONTROL_DC_FLUSH_ENABLE |
+ PIPE_CONTROL_FLUSH_ENABLE);
+
+ if (XE_WA(gt, 1409600907))
+ flags |= PIPE_CONTROL_DEPTH_STALL;
+
+ if (lacks_render)
+ flags &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
+ else if (job->engine->class == XE_ENGINE_CLASS_COMPUTE)
+ flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
+
+ dw[i++] = GFX_OP_PIPE_CONTROL(6) | PIPE_CONTROL0_HDC_PIPELINE_FLUSH;
+ dw[i++] = flags;
+ dw[i++] = 0;
+ dw[i++] = 0;
+ dw[i++] = 0;
+ dw[i++] = 0;
+
+ return i;
+}
+
static int emit_pipe_imm_ggtt(u32 addr, u32 value, bool stall_only, u32 *dw,
int i)
{
@@ -279,6 +312,8 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
i = emit_bb_start(batch_addr, ppgtt_flag, dw, i);
+ i = emit_render_cache_flush(job, dw, i);
+
if (job->user_fence.used)
i = emit_store_imm_ppgtt_posted(job->user_fence.addr,
job->user_fence.value,
diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules
index 1ecb10390b28..15c23813398a 100644
--- a/drivers/gpu/drm/xe/xe_wa_oob.rules
+++ b/drivers/gpu/drm/xe/xe_wa_oob.rules
@@ -14,3 +14,4 @@
SUBPLATFORM(DG2, G12)
18020744125 PLATFORM(PVC)
1509372804 PLATFORM(PVC), GRAPHICS_STEP(A0, C0)
+1409600907 GRAPHICS_VERSION_RANGE(1200, 1250)
--
2.39.2
* [Intel-xe] ✓ CI.Patch_applied: success for Implement rcs/ccs missing invalidations and flushes
2023-06-07 17:47 [Intel-xe] [PATCH 0/2] Implement rcs/ccs missing invalidations and flushes Thomas Hellström
2023-06-07 17:47 ` [Intel-xe] [PATCH 1/2] drm/xe: Invalidate TLB also on bind if in scratch page mode Thomas Hellström
2023-06-07 17:47 ` [Intel-xe] [PATCH 2/2] drm/xe: Emit a render cache flush after each rcs/ccs batch Thomas Hellström
@ 2023-06-07 17:49 ` Patchwork
2023-06-07 17:49 ` [Intel-xe] ✗ CI.checkpatch: warning " Patchwork
` (2 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2023-06-07 17:49 UTC (permalink / raw)
To: Thomas Hellström; +Cc: intel-xe
== Series Details ==
Series: Implement rcs/ccs missing invalidations and flushes
URL : https://patchwork.freedesktop.org/series/119033/
State : success
== Summary ==
=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: dc8094ef1 drm/xe/wa: Extend scope of Wa_14015795083
=== git am output follows ===
Applying: drm/xe: Invalidate TLB also on bind if in scratch page mode
Applying: drm/xe: Emit a render cache flush after each rcs/ccs batch
* [Intel-xe] ✗ CI.checkpatch: warning for Implement rcs/ccs missing invalidations and flushes
2023-06-07 17:47 [Intel-xe] [PATCH 0/2] Implement rcs/ccs missing invalidations and flushes Thomas Hellström
` (2 preceding siblings ...)
2023-06-07 17:49 ` [Intel-xe] ✓ CI.Patch_applied: success for Implement rcs/ccs missing invalidations and flushes Patchwork
@ 2023-06-07 17:49 ` Patchwork
2023-06-07 17:50 ` [Intel-xe] ✗ CI.KUnit: failure " Patchwork
2023-06-07 18:03 ` [Intel-xe] [PATCH 0/2] " Souza, Jose
5 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2023-06-07 17:49 UTC (permalink / raw)
To: Thomas Hellström; +Cc: intel-xe
== Series Details ==
Series: Implement rcs/ccs missing invalidations and flushes
URL : https://patchwork.freedesktop.org/series/119033/
State : warning
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
c7d32770e3cd31d9fc134ce41f329b10aa33ee15
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit e101f891af28a3685c61bb6145a14f4a6db6c87f
Author: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Date: Wed Jun 7 19:47:29 2023 +0200
drm/xe: Emit a render cache flush after each rcs/ccs batch
We need to flush render caches before fence signalling, where we might
release the memory for reuse. We can't rely on userspace doing this,
so flush render caches after the batch, but before user fence- and
dma_fence signalling.
Copy the cache flush from i915, but omit PIPE_CONTROL_FLUSH_L3, since it
should be implied by the other flushes. Also omit
PIPE_CONTROL_TLB_INVALIDATE since there should be no apparent need to
invalidate TLB after batch completion.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
+ /mt/dim checkpatch dc8094ef1ae1faa7bff0a2a3d510a38ba557302a drm-intel
1745dca57 drm/xe: Invalidate TLB also on bind if in scratch page mode
-:27: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#27: FILE: drivers/gpu/drm/xe/regs/xe_gpu_commands.h:76:
+#define PIPE_CONTROL_TLB_INVALIDATE (1<<18)
^
total: 0 errors, 0 warnings, 1 checks, 82 lines checked
e101f891a drm/xe: Emit a render cache flush after each rcs/ccs batch
* [Intel-xe] ✗ CI.KUnit: failure for Implement rcs/ccs missing invalidations and flushes
2023-06-07 17:47 [Intel-xe] [PATCH 0/2] Implement rcs/ccs missing invalidations and flushes Thomas Hellström
` (3 preceding siblings ...)
2023-06-07 17:49 ` [Intel-xe] ✗ CI.checkpatch: warning " Patchwork
@ 2023-06-07 17:50 ` Patchwork
2023-06-07 18:03 ` [Intel-xe] [PATCH 0/2] " Souza, Jose
5 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2023-06-07 17:50 UTC (permalink / raw)
To: Thomas Hellström; +Cc: intel-xe
== Series Details ==
Series: Implement rcs/ccs missing invalidations and flushes
URL : https://patchwork.freedesktop.org/series/119033/
State : failure
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:../drivers/gpu/drm/xe/xe_ring_ops.c:8:10: fatal error: generated/xe_wa_oob.h: No such file or directory
8 | #include "generated/xe_wa_oob.h"
| ^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[6]: *** [../scripts/Makefile.build:252: drivers/gpu/drm/xe/xe_ring_ops.o] Error 1
make[6]: *** Waiting for unfinished jobs....
make[5]: *** [../scripts/Makefile.build:494: drivers/gpu/drm/xe] Error 2
make[5]: *** Waiting for unfinished jobs....
make[4]: *** [../scripts/Makefile.build:494: drivers/gpu/drm] Error 2
make[3]: *** [../scripts/Makefile.build:494: drivers/gpu] Error 2
make[2]: *** [../scripts/Makefile.build:494: drivers] Error 2
make[1]: *** [/kernel/Makefile:2025: .] Error 2
make: *** [Makefile:226: __sub-make] Error 2
[17:49:50] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[17:49:54] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
* Re: [Intel-xe] [PATCH 1/2] drm/xe: Invalidate TLB also on bind if in scratch page mode
2023-06-07 17:47 ` [Intel-xe] [PATCH 1/2] drm/xe: Invalidate TLB also on bind if in scratch page mode Thomas Hellström
@ 2023-06-07 18:01 ` Souza, Jose
2023-06-09 15:51 ` Matthew Brost
1 sibling, 0 replies; 11+ messages in thread
From: Souza, Jose @ 2023-06-07 18:01 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, thomas.hellstrom@linux.intel.com
On Wed, 2023-06-07 at 19:47 +0200, Thomas Hellström wrote:
> For scratch table mode we need to cover the case where a scratch PTE might
> have been pre-fetched and cached and used instead of that of the newly
> bound vma.
> For compute vms, invalidate TLB globally using GuC before signalling
> bind complete. For !long-running vms, invalidate TLB at batch start.
>
> Also document how TLB invalidation works.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/xe/regs/xe_gpu_commands.h | 1 +
> drivers/gpu/drm/xe/xe_pt.c | 17 +++++++++++++++--
> drivers/gpu/drm/xe/xe_ring_ops.c | 15 ++++++++++++---
> 3 files changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> index 0f9c5b0b8a3b..d2d41f717525 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> @@ -73,6 +73,7 @@
> #define PIPE_CONTROL_STORE_DATA_INDEX (1<<21)
> #define PIPE_CONTROL_CS_STALL (1<<20)
> #define PIPE_CONTROL_GLOBAL_SNAPSHOT_RESET (1<<19)
> +#define PIPE_CONTROL_TLB_INVALIDATE (1<<18)
> #define PIPE_CONTROL_PSD_SYNC (1<<17)
> #define PIPE_CONTROL_QW_WRITE (1<<14)
> #define PIPE_CONTROL_DEPTH_STALL (1<<13)
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index bef265715000..e817fa9fe65e 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1297,7 +1297,20 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_engine *e,
>
> xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries);
>
> - if (rebind && !xe_vm_no_dma_fences(vma->vm)) {
> + /*
> + * If rebind, we have to invalidate TLB on !LR vms to invalidate
> + * cached PTEs pointing to freed memory. On LR vms this is done
> + * automatically when the context is re-enabled by the rebind worker,
> + * or in fault mode it was invalidated on PTE zapping.
> + *
> + * If !rebind, and scratch enabled VMs, there is a chance the scratch
> + * PTE is already cached in the TLB so it needs to be invalidated.
> + * On !LR VMs this is done in the ring ops preceding a batch, but on
> + * non-faulting LR, in particular on user-space batch buffer chaining,
> + * it needs to be done here.
> + */
> + if ((rebind && !xe_vm_no_dma_fences(vm)) ||
> + (!rebind && vm->scratch_bo[tile->id] && xe_vm_in_compute_mode(vm))) {
> ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
> if (!ifence)
> return ERR_PTR(-ENOMEM);
> @@ -1313,7 +1326,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_engine *e,
> LLIST_HEAD(deferred);
>
> /* TLB invalidation must be done before signaling rebind */
> - if (rebind && !xe_vm_no_dma_fences(vma->vm)) {
> + if (ifence) {
> int err = invalidation_fence_init(tile->primary_gt, ifence, fence,
> vma);
> if (err) {
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index 2deee7a2bb14..c20fe41c0729 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -15,6 +15,7 @@
> #include "xe_macros.h"
> #include "xe_sched_job.h"
> #include "xe_vm_types.h"
> +#include "xe_vm.h"
>
> /*
> * 3D-related flags that can't be set on _engines_ that lack access to the 3D
> @@ -107,7 +108,7 @@ static int emit_flush_invalidate(u32 flag, u32 *dw, int i)
> return i;
> }
>
> -static int emit_pipe_invalidate(u32 mask_flags, u32 *dw, int i)
> +static int emit_pipe_invalidate(u32 mask_flags, u32 extra_flags, u32 *dw, int i)
> {
> u32 flags = PIPE_CONTROL_CS_STALL |
> PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
> @@ -117,7 +118,8 @@ static int emit_pipe_invalidate(u32 mask_flags, u32 *dw, int i)
> PIPE_CONTROL_CONST_CACHE_INVALIDATE |
> PIPE_CONTROL_STATE_CACHE_INVALIDATE |
> PIPE_CONTROL_QW_WRITE |
> - PIPE_CONTROL_STORE_DATA_INDEX;
> + PIPE_CONTROL_STORE_DATA_INDEX |
> + extra_flags;
>
> flags &= ~mask_flags;
>
> @@ -250,14 +252,21 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
> struct xe_gt *gt = job->engine->gt;
> struct xe_device *xe = gt_to_xe(gt);
> bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
> + struct xe_vm *vm = job->engine->vm;
> u32 mask_flags = 0;
> + u32 extra_flags = 0;
>
> dw[i++] = preparser_disable(true);
> if (lacks_render)
> mask_flags = PIPE_CONTROL_3D_ARCH_FLAGS;
> else if (job->engine->class == XE_ENGINE_CLASS_COMPUTE)
> mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
> - i = emit_pipe_invalidate(mask_flags, dw, i);
> +
> + /* See xe_pt.c for a discussion on TLB invalidations. */
Maybe also mention the function where the comment is in xe_pt.c?
> + if (!xe_vm_no_dma_fences(vm) && vm->scratch_bo[gt_to_tile(gt)->id])
> + extra_flags = PIPE_CONTROL_TLB_INVALIDATE;
> +
> + i = emit_pipe_invalidate(mask_flags, extra_flags, dw, i);
>
> /* hsdes: 1809175790 */
> if (has_aux_ccs(xe))
* Re: [Intel-xe] [PATCH 0/2] Implement rcs/ccs missing invalidations and flushes
2023-06-07 17:47 [Intel-xe] [PATCH 0/2] Implement rcs/ccs missing invalidations and flushes Thomas Hellström
` (4 preceding siblings ...)
2023-06-07 17:50 ` [Intel-xe] ✗ CI.KUnit: failure " Patchwork
@ 2023-06-07 18:03 ` Souza, Jose
5 siblings, 0 replies; 11+ messages in thread
From: Souza, Jose @ 2023-06-07 18:03 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, thomas.hellstrom@linux.intel.com
On Wed, 2023-06-07 at 19:47 +0200, Thomas Hellström wrote:
> Mesa is seeing unexpected content in some tests.
> Fixing those requires a TLB invalidation at batch start and a
> render cache flush at batch end.
>
> KMD also requires the latter to make sure any GPU side caches are
> flushed before handing memory over for reuse. This is implemented
> in patch 2.
>
> The former is likely due to scratch PTEs remaining in the TLB after a
> prefetch or similar. We could discuss whether user-space should be
> responsible for a TLB invalidation after a VM_BIND operation, but
> patch 1 implements a TLB flush at batch start for non-LR vms with scratch
> pages. For LR vms with scratch pages the TLB flush is incorporated
> in the bind fence.
>
> The TLB invalidation can be optimized / coalesced later.
Tested-by: José Roberto de Souza <jose.souza@intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291
>
> Thomas Hellström (2):
> drm/xe: Invalidate TLB also on bind if in scratch page mode
> drm/xe: Emit a render cache flush after each rcs/ccs batch
>
> drivers/gpu/drm/xe/regs/xe_gpu_commands.h | 4 ++
> drivers/gpu/drm/xe/xe_pt.c | 17 +++++++-
> drivers/gpu/drm/xe/xe_ring_ops.c | 50 +++++++++++++++++++++--
> drivers/gpu/drm/xe/xe_wa_oob.rules | 1 +
> 4 files changed, 67 insertions(+), 5 deletions(-)
>
* Re: [Intel-xe] [PATCH 2/2] drm/xe: Emit a render cache flush after each rcs/ccs batch
2023-06-07 17:47 ` [Intel-xe] [PATCH 2/2] drm/xe: Emit a render cache flush after each rcs/ccs batch Thomas Hellström
@ 2023-06-07 18:44 ` Souza, Jose
0 siblings, 0 replies; 11+ messages in thread
From: Souza, Jose @ 2023-06-07 18:44 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, thomas.hellstrom@linux.intel.com
On Wed, 2023-06-07 at 19:47 +0200, Thomas Hellström wrote:
> We need to flush render caches before fence signalling, where we might
> release the memory for reuse. We can't rely on userspace doing this,
> so flush render caches after the batch, but before user fence- and
> dma_fence signalling.
>
> Copy the cache flush from i915, but omit PIPE_CONTROL_FLUSH_L3, since it
> should be implied by the other flushes. Also omit
> PIPE_CONTROL_TLB_INVALIDATE since there should be no apparent need to
> invalidate TLB after batch completion.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/xe/regs/xe_gpu_commands.h | 3 ++
> drivers/gpu/drm/xe/xe_ring_ops.c | 35 +++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_wa_oob.rules | 1 +
> 3 files changed, 39 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> index d2d41f717525..dd3408fd3d33 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> @@ -66,6 +66,9 @@
> #define PVC_MS_MOCS_INDEX_MASK GENMASK(6, 1)
>
> #define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
> +
> +#define PIPE_CONTROL0_HDC_PIPELINE_FLUSH REG_BIT(9) /* gen12 */
> +
> #define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29)
> #define PIPE_CONTROL_TILE_CACHE_FLUSH (1<<28)
> #define PIPE_CONTROL_AMFS_FLUSH (1<<25)
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index c20fe41c0729..91511f72d971 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -5,6 +5,7 @@
>
> #include "xe_ring_ops.h"
>
> +#include "generated/xe_wa_oob.h"
> #include "regs/xe_gpu_commands.h"
> #include "regs/xe_gt_regs.h"
> #include "regs/xe_lrc_layout.h"
> @@ -16,6 +17,7 @@
> #include "xe_sched_job.h"
> #include "xe_vm_types.h"
> #include "xe_vm.h"
> +#include "xe_wa.h"
>
> /*
> * 3D-related flags that can't be set on _engines_ that lack access to the 3D
> @@ -147,6 +149,37 @@ static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
> return i;
> }
>
> +static int emit_render_cache_flush(struct xe_sched_job *job, u32 *dw, int i)
> +{
> + struct xe_gt *gt = job->engine->gt;
> + bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
> + u32 flags;
> +
> + flags = (PIPE_CONTROL_CS_STALL |
> + PIPE_CONTROL_TILE_CACHE_FLUSH |
> + PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
> + PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> + PIPE_CONTROL_DC_FLUSH_ENABLE |
> + PIPE_CONTROL_FLUSH_ENABLE);
> +
> + if (XE_WA(gt, 1409600907))
> + flags |= PIPE_CONTROL_DEPTH_STALL;
> +
> + if (lacks_render)
> + flags &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
> + else if (job->engine->class == XE_ENGINE_CLASS_COMPUTE)
> + flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
> +
> + dw[i++] = GFX_OP_PIPE_CONTROL(6) | PIPE_CONTROL0_HDC_PIPELINE_FLUSH;
> + dw[i++] = flags;
> + dw[i++] = 0;
> + dw[i++] = 0;
> + dw[i++] = 0;
> + dw[i++] = 0;
> +
> + return i;
> +}
> +
> static int emit_pipe_imm_ggtt(u32 addr, u32 value, bool stall_only, u32 *dw,
> int i)
> {
> @@ -279,6 +312,8 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
>
> i = emit_bb_start(batch_addr, ppgtt_flag, dw, i);
>
> + i = emit_render_cache_flush(job, dw, i);
> +
> if (job->user_fence.used)
> i = emit_store_imm_ppgtt_posted(job->user_fence.addr,
> job->user_fence.value,
> diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules
> index 1ecb10390b28..15c23813398a 100644
> --- a/drivers/gpu/drm/xe/xe_wa_oob.rules
> +++ b/drivers/gpu/drm/xe/xe_wa_oob.rules
> @@ -14,3 +14,4 @@
> SUBPLATFORM(DG2, G12)
> 18020744125 PLATFORM(PVC)
> 1509372804 PLATFORM(PVC), GRAPHICS_STEP(A0, C0)
> +1409600907 GRAPHICS_VERSION_RANGE(1200, 1250)
* Re: [Intel-xe] [PATCH 1/2] drm/xe: Invalidate TLB also on bind if in scratch page mode
2023-06-07 17:47 ` [Intel-xe] [PATCH 1/2] drm/xe: Invalidate TLB also on bind if in scratch page mode Thomas Hellström
2023-06-07 18:01 ` Souza, Jose
@ 2023-06-09 15:51 ` Matthew Brost
2023-06-09 15:54 ` Souza, Jose
1 sibling, 1 reply; 11+ messages in thread
From: Matthew Brost @ 2023-06-09 15:51 UTC (permalink / raw)
To: Thomas Hellström; +Cc: intel-xe
On Wed, Jun 07, 2023 at 07:47:28PM +0200, Thomas Hellström wrote:
> For scratch table mode we need to cover the case where a scratch PTE might
> have been pre-fetched and cached and used instead of that of the newly
> bound vma.
> For compute vms, invalidate TLB globally using GuC before signalling
> bind complete. For !long-running vms, invalidate TLB at batch start.
>
> Also document how TLB invalidation works.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/xe/regs/xe_gpu_commands.h | 1 +
> drivers/gpu/drm/xe/xe_pt.c | 17 +++++++++++++++--
> drivers/gpu/drm/xe/xe_ring_ops.c | 15 ++++++++++++---
> 3 files changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> index 0f9c5b0b8a3b..d2d41f717525 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> @@ -73,6 +73,7 @@
> #define PIPE_CONTROL_STORE_DATA_INDEX (1<<21)
> #define PIPE_CONTROL_CS_STALL (1<<20)
> #define PIPE_CONTROL_GLOBAL_SNAPSHOT_RESET (1<<19)
> +#define PIPE_CONTROL_TLB_INVALIDATE (1<<18)
> #define PIPE_CONTROL_PSD_SYNC (1<<17)
> #define PIPE_CONTROL_QW_WRITE (1<<14)
> #define PIPE_CONTROL_DEPTH_STALL (1<<13)
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index bef265715000..e817fa9fe65e 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1297,7 +1297,20 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_engine *e,
>
> xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries);
>
> - if (rebind && !xe_vm_no_dma_fences(vma->vm)) {
> + /*
> + * If rebind, we have to invalidate TLB on !LR vms to invalidate
> + * cached PTEs pointing to freed memory. On LR vms this is done
> + * automatically when the context is re-enabled by the rebind worker,
> + * or in fault mode it was invalidated on PTE zapping.
> + *
> + * If !rebind, and scratch enabled VMs, there is a chance the scratch
> + * PTE is already cached in the TLB so it needs to be invalidated.
> + * On !LR VMs this is done in the ring ops preceding a batch, but on
> + * non-faulting LR, in particular on user-space batch buffer chaining,
> + * it needs to be done here.
> + */
> + if ((rebind && !xe_vm_no_dma_fences(vm)) ||
> + (!rebind && vm->scratch_bo[tile->id] && xe_vm_in_compute_mode(vm))) {
> ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
> if (!ifence)
> return ERR_PTR(-ENOMEM);
> @@ -1313,7 +1326,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_engine *e,
> LLIST_HEAD(deferred);
>
> /* TLB invalidation must be done before signaling rebind */
> - if (rebind && !xe_vm_no_dma_fences(vma->vm)) {
> + if (ifence) {
> int err = invalidation_fence_init(tile->primary_gt, ifence, fence,
> vma);
> if (err) {
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index 2deee7a2bb14..c20fe41c0729 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -15,6 +15,7 @@
> #include "xe_macros.h"
> #include "xe_sched_job.h"
> #include "xe_vm_types.h"
> +#include "xe_vm.h"
>
> /*
> * 3D-related flags that can't be set on _engines_ that lack access to the 3D
> @@ -107,7 +108,7 @@ static int emit_flush_invalidate(u32 flag, u32 *dw, int i)
> return i;
> }
>
> -static int emit_pipe_invalidate(u32 mask_flags, u32 *dw, int i)
> +static int emit_pipe_invalidate(u32 mask_flags, u32 extra_flags, u32 *dw, int i)
> {
> u32 flags = PIPE_CONTROL_CS_STALL |
> PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
> @@ -117,7 +118,8 @@ static int emit_pipe_invalidate(u32 mask_flags, u32 *dw, int i)
> PIPE_CONTROL_CONST_CACHE_INVALIDATE |
> PIPE_CONTROL_STATE_CACHE_INVALIDATE |
> PIPE_CONTROL_QW_WRITE |
> - PIPE_CONTROL_STORE_DATA_INDEX;
> + PIPE_CONTROL_STORE_DATA_INDEX |
> + extra_flags;
>
> flags &= ~mask_flags;
>
> @@ -250,14 +252,21 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
> struct xe_gt *gt = job->engine->gt;
> struct xe_device *xe = gt_to_xe(gt);
> bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
> + struct xe_vm *vm = job->engine->vm;
> u32 mask_flags = 0;
> + u32 extra_flags = 0;
>
> dw[i++] = preparser_disable(true);
> if (lacks_render)
> mask_flags = PIPE_CONTROL_3D_ARCH_FLAGS;
> else if (job->engine->class == XE_ENGINE_CLASS_COMPUTE)
> mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
> - i = emit_pipe_invalidate(mask_flags, dw, i);
> +
> + /* See xe_pt.c for a discussion on TLB invalidations. */
> + if (!xe_vm_no_dma_fences(vm) && vm->scratch_bo[gt_to_tile(gt)->id])
> + extra_flags = PIPE_CONTROL_TLB_INVALIDATE;

I think we need a similar if statement + emit_flush_invalidate call in
the functions that emit jobs for different classes too, right?

e.g. emit_job_gen12_copy, emit_job_gen12_video

Matt

> +
> + i = emit_pipe_invalidate(mask_flags, extra_flags, dw, i);
>
> /* hsdes: 1809175790 */
> if (has_aux_ccs(xe))
> --
> 2.39.2
>
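
For readers following the review, the two conditions this patch introduces can be restated as a small stand-alone model. This is only a sketch, not driver code: `struct vm_model` and both function names are hypothetical, and the `long_running` field assumes that `xe_vm_no_dma_fences()` corresponds to what the patch comment calls an "LR" vm.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the VM state the patch inspects. */
struct vm_model {
	bool long_running; /* assumed to model xe_vm_no_dma_fences(vm) */
	bool compute_mode; /* models xe_vm_in_compute_mode(vm) */
	bool has_scratch;  /* models vm->scratch_bo[tile->id] != NULL */
};

/* Bind path (xe_pt.c hunk): is an invalidation fence allocated? */
bool bind_needs_invalidation_fence(const struct vm_model *vm, bool rebind)
{
	assert(vm);
	return (rebind && !vm->long_running) ||
	       (!rebind && vm->has_scratch && vm->compute_mode);
}

/* Ring path (xe_ring_ops.c hunk): is the TLB invalidated at batch start? */
bool batch_start_needs_tlb_invalidate(const struct vm_model *vm)
{
	assert(vm);
	return !vm->long_running && vm->has_scratch;
}
```

Read this way, a first bind on a scratch-enabled !LR vm is covered only by the ring path. That is why the question above matters: in this version only the render/compute emit function carries the ring-path check.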
* Re: [Intel-xe] [PATCH 1/2] drm/xe: Invalidate TLB also on bind if in scratch page mode
2023-06-09 15:51 ` Matthew Brost
@ 2023-06-09 15:54 ` Souza, Jose
0 siblings, 0 replies; 11+ messages in thread
From: Souza, Jose @ 2023-06-09 15:54 UTC (permalink / raw)
To: Brost, Matthew, thomas.hellstrom@linux.intel.com
Cc: intel-xe@lists.freedesktop.org
On Fri, 2023-06-09 at 15:51 +0000, Matthew Brost wrote:
> On Wed, Jun 07, 2023 at 07:47:28PM +0200, Thomas Hellström wrote:
> > For scratch table mode we need to cover the case where a scratch PTE might
> > have been pre-fetched and cached and used instead of that of the newly
> > bound vma.
> > For compute vms, invalidate TLB globally using GuC before signalling
> > bind complete. For !long-running vms, invalidate TLB at batch start.
> >
> > Also document how TLB invalidation works.
> >
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> > drivers/gpu/drm/xe/regs/xe_gpu_commands.h | 1 +
> > drivers/gpu/drm/xe/xe_pt.c | 17 +++++++++++++++--
> > drivers/gpu/drm/xe/xe_ring_ops.c | 15 ++++++++++++---
> > 3 files changed, 28 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > index 0f9c5b0b8a3b..d2d41f717525 100644
> > --- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > +++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > @@ -73,6 +73,7 @@
> > #define PIPE_CONTROL_STORE_DATA_INDEX (1<<21)
> > #define PIPE_CONTROL_CS_STALL (1<<20)
> > #define PIPE_CONTROL_GLOBAL_SNAPSHOT_RESET (1<<19)
> > +#define PIPE_CONTROL_TLB_INVALIDATE (1<<18)
> > #define PIPE_CONTROL_PSD_SYNC (1<<17)
> > #define PIPE_CONTROL_QW_WRITE (1<<14)
> > #define PIPE_CONTROL_DEPTH_STALL (1<<13)
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > index bef265715000..e817fa9fe65e 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -1297,7 +1297,20 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_engine *e,
> >
> > xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries);
> >
> > - if (rebind && !xe_vm_no_dma_fences(vma->vm)) {
> > + /*
> > + * If rebind, we have to invalidate TLB on !LR vms to invalidate
> > + * cached PTEs pointing to freed memory. On LR vms this is done
> > + * automatically when the context is re-enabled by the rebind worker,
> > + * or in fault mode it was invalidated on PTE zapping.
> > + *
> > + * If !rebind, and scratch enabled VMs, there is a chance the scratch
> > + * PTE is already cached in the TLB so it needs to be invalidated.
> > + * On !LR VMs this is done in the ring ops preceding a batch, but on
> > + * non-faulting LR, in particular on user-space batch buffer chaining,
> > + * it needs to be done here.
> > + */
> > + if ((rebind && !xe_vm_no_dma_fences(vm)) ||
> > + (!rebind && vm->scratch_bo[tile->id] && xe_vm_in_compute_mode(vm))) {
> > ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
> > if (!ifence)
> > return ERR_PTR(-ENOMEM);
> > @@ -1313,7 +1326,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_engine *e,
> > LLIST_HEAD(deferred);
> >
> > /* TLB invalidation must be done before signaling rebind */
> > - if (rebind && !xe_vm_no_dma_fences(vma->vm)) {
> > + if (ifence) {
> > int err = invalidation_fence_init(tile->primary_gt, ifence, fence,
> > vma);
> > if (err) {
> > diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> > index 2deee7a2bb14..c20fe41c0729 100644
> > --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> > +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> > @@ -15,6 +15,7 @@
> > #include "xe_macros.h"
> > #include "xe_sched_job.h"
> > #include "xe_vm_types.h"
> > +#include "xe_vm.h"
> >
> > /*
> > * 3D-related flags that can't be set on _engines_ that lack access to the 3D
> > @@ -107,7 +108,7 @@ static int emit_flush_invalidate(u32 flag, u32 *dw, int i)
> > return i;
> > }
> >
> > -static int emit_pipe_invalidate(u32 mask_flags, u32 *dw, int i)
> > +static int emit_pipe_invalidate(u32 mask_flags, u32 extra_flags, u32 *dw, int i)
> > {
> > u32 flags = PIPE_CONTROL_CS_STALL |
> > PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
> > @@ -117,7 +118,8 @@ static int emit_pipe_invalidate(u32 mask_flags, u32 *dw, int i)
> > PIPE_CONTROL_CONST_CACHE_INVALIDATE |
> > PIPE_CONTROL_STATE_CACHE_INVALIDATE |
> > PIPE_CONTROL_QW_WRITE |
> > - PIPE_CONTROL_STORE_DATA_INDEX;
> > + PIPE_CONTROL_STORE_DATA_INDEX |
> > + extra_flags;
> >
> > flags &= ~mask_flags;
> >
> > @@ -250,14 +252,21 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
> > struct xe_gt *gt = job->engine->gt;
> > struct xe_device *xe = gt_to_xe(gt);
> > bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
> > + struct xe_vm *vm = job->engine->vm;
> > u32 mask_flags = 0;
> > + u32 extra_flags = 0;
> >
> > dw[i++] = preparser_disable(true);
> > if (lacks_render)
> > mask_flags = PIPE_CONTROL_3D_ARCH_FLAGS;
> > else if (job->engine->class == XE_ENGINE_CLASS_COMPUTE)
> > mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
> > - i = emit_pipe_invalidate(mask_flags, dw, i);
> > +
> > + /* See xe_pt.c for a discussion on TLB invalidations. */
> > + if (!xe_vm_no_dma_fences(vm) && vm->scratch_bo[gt_to_tile(gt)->id])
> > + extra_flags = PIPE_CONTROL_TLB_INVALIDATE;
>
> I think we need a similar if statement + emit_flush_invalidate call in
> the functions that emit jobs for different classes too, right?

Handled in the new version: https://patchwork.freedesktop.org/series/119124/

>
> e.g. emit_job_gen12_copy, emit_job_gen12_video
>
> Matt
>
> > +
> > + i = emit_pipe_invalidate(mask_flags, extra_flags, dw, i);
> >
> > /* hsdes: 1809175790 */
> > if (has_aux_ccs(xe))
> > --
> > 2.39.2
> >