* [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations
@ 2025-07-02 23:42 Matthew Brost
2025-07-02 23:42 ` [PATCH v2 1/9] drm/xe: Explicitly mark migration queues with flag Matthew Brost
` (13 more replies)
0 siblings, 14 replies; 45+ messages in thread
From: Matthew Brost @ 2025-07-02 23:42 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, maarten.lankhorst
Use the DRM scheduler for delayed GT TLB invalidations, which properly
fixes the issue raised in [1]. GT TLB fences have their own dma-fence
context, so even if the invalidations are ordered, the dma-resv/DRM
scheduler cannot squash the fences. This results in O(M*N*N) complexity
in the garbage collector, where M is the number of ranges in the garbage
collector and N is the number of pending GT TLB invalidations. After
this change, the resulting complexity is O(M*C), where C is the number of
TLB invalidation contexts (i.e., the number of (exec queue, GT) tuples) with
an invalidation in flight.
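For background on the squashing: dma-resv and the DRM scheduler can only
collapse fences that share a dma-fence context, with the later seqno winning.
A minimal illustrative sketch of that rule (can_squash is hypothetical;
dma_fence_is_later() is the real helper, and this is not the exact dma-resv
implementation):

#include <linux/dma-fence.h>

/*
 * Illustrative only: two fences can collapse into a single slot only if
 * they share a context. Each GT TLB invalidation fence previously had its
 * own context, so this check never fired.
 */
static bool can_squash(struct dma_fence *old, struct dma_fence *new)
{
        return old->context == new->context &&
               dma_fence_is_later(new, old);
}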
Admittedly, it's quite a lot of code, but the series includes extensive
kernel documentation and clear code comments. It introduces a generic
dependency scheduler that can be reused in the future and is logically
much cleaner than the previous open-coded solution for delaying GT TLB
invalidations until a bind job completes.
v2:
- Various cleanup covered in detail in change logs
- Use a per-GT ordered workqueue as DRM scheduler workqueue
- Remove unused ftrace points
Matt
[1] https://patchwork.freedesktop.org/patch/658370/?series=150188&rev=1
Matthew Brost (9):
drm/xe: Explicitly mark migration queues with flag
drm/xe: Add generic dependency jobs / scheduler
drm: Simplify drmm_alloc_ordered_workqueue return
drm/xe: Create ordered workqueue for GT TLB invalidation jobs
drm/xe: Add dependency scheduler for GT TLB invalidations to bind
queues
drm/xe: Add xe_migrate_job_lock/unlock helpers
drm/xe: Add GT TLB invalidation jobs
drm/xe: Use GT TLB invalidation jobs in PT layer
drm/xe: Remove unused GT TLB invalidation trace points
drivers/gpu/drm/vkms/vkms_crtc.c | 2 -
drivers/gpu/drm/xe/Makefile | 2 +
drivers/gpu/drm/xe/xe_dep_job_types.h | 29 +++
drivers/gpu/drm/xe/xe_dep_scheduler.c | 145 +++++++++++
drivers/gpu/drm/xe/xe_dep_scheduler.h | 21 ++
drivers/gpu/drm/xe/xe_exec_queue.c | 48 ++++
drivers/gpu/drm/xe/xe_exec_queue_types.h | 15 ++
drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c | 271 ++++++++++++++++++++
drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h | 34 +++
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 8 +
drivers/gpu/drm/xe/xe_gt_types.h | 2 +
drivers/gpu/drm/xe/xe_migrate.c | 42 ++-
drivers/gpu/drm/xe/xe_migrate.h | 13 +
drivers/gpu/drm/xe/xe_pt.c | 178 +++++--------
drivers/gpu/drm/xe/xe_trace.h | 16 --
include/drm/drm_managed.h | 15 +-
16 files changed, 712 insertions(+), 129 deletions(-)
create mode 100644 drivers/gpu/drm/xe/xe_dep_job_types.h
create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.c
create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.h
create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
--
2.34.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH v2 1/9] drm/xe: Explicitly mark migration queues with flag
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
@ 2025-07-02 23:42 ` Matthew Brost
2025-07-10 8:43 ` Francois Dugast
2025-07-11 21:20 ` Summers, Stuart
2025-07-02 23:42 ` [PATCH v2 2/9] drm/xe: Add generic dependency jobs / scheduler Matthew Brost
` (12 subsequent siblings)
13 siblings, 2 replies; 45+ messages in thread
From: Matthew Brost @ 2025-07-02 23:42 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, maarten.lankhorst
Rather than inferring whether an exec queue is a migration queue, explicitly
mark migration queues with a flag.
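With the flag in place, callers can test it directly; a hypothetical helper
(not part of this patch) would look like:

/* Hypothetical helper built on the flag this patch introduces */
static bool xe_exec_queue_is_migrate(struct xe_exec_queue *q)
{
        return q->flags & EXEC_QUEUE_FLAG_MIGRATE;
}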
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
drivers/gpu/drm/xe/xe_migrate.c | 6 ++++--
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index cc1cffb5c87f..abdf4a57e6e2 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -87,6 +87,8 @@ struct xe_exec_queue {
#define EXEC_QUEUE_FLAG_HIGH_PRIORITY BIT(4)
/* flag to indicate low latency hint to guc */
#define EXEC_QUEUE_FLAG_LOW_LATENCY BIT(5)
+/* for migration (kernel copy, clear, bind) jobs */
+#define EXEC_QUEUE_FLAG_MIGRATE BIT(6)
/**
* @flags: flags for this exec queue, should statically setup aside from ban
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 0838582537e8..b5f85162b9ed 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -437,12 +437,14 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
m->q = xe_exec_queue_create(xe, vm, logical_mask, 1, hwe,
EXEC_QUEUE_FLAG_KERNEL |
EXEC_QUEUE_FLAG_PERMANENT |
- EXEC_QUEUE_FLAG_HIGH_PRIORITY, 0);
+ EXEC_QUEUE_FLAG_HIGH_PRIORITY |
+ EXEC_QUEUE_FLAG_MIGRATE, 0);
} else {
m->q = xe_exec_queue_create_class(xe, primary_gt, vm,
XE_ENGINE_CLASS_COPY,
EXEC_QUEUE_FLAG_KERNEL |
- EXEC_QUEUE_FLAG_PERMANENT, 0);
+ EXEC_QUEUE_FLAG_PERMANENT |
+ EXEC_QUEUE_FLAG_MIGRATE, 0);
}
if (IS_ERR(m->q)) {
xe_vm_close_and_put(vm);
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 2/9] drm/xe: Add generic dependency jobs / scheduler
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
2025-07-02 23:42 ` [PATCH v2 1/9] drm/xe: Explicitly mark migration queues with flag Matthew Brost
@ 2025-07-02 23:42 ` Matthew Brost
2025-07-10 11:51 ` Francois Dugast
2025-07-15 21:04 ` Summers, Stuart
2025-07-02 23:42 ` [PATCH v2 3/9] drm: Simplify drmm_alloc_ordered_workqueue return Matthew Brost
` (11 subsequent siblings)
13 siblings, 2 replies; 45+ messages in thread
From: Matthew Brost @ 2025-07-02 23:42 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, maarten.lankhorst
Add a generic dependency jobs / scheduler which serves as a wrapper for the
DRM scheduler. Useful when we want to delay a generic operation until a
dma-fence signals.
Existing use cases could be destroying resources based on fences / dma-resv,
the preempt rebind worker, and pipelined GT TLB invalidations.
Written in such a way that it could be moved to the DRM subsystem if needed.
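A rough usage sketch, assuming a caller-defined job type embedding struct
xe_dep_job (my_job, my_push, and the done fence are illustrative; the
xe_dep_scheduler_* and drm_sched_* calls below are the real API):

#include <linux/dma-fence.h>
#include <linux/slab.h>

#include <drm/gpu_scheduler.h>

#include "xe_dep_job_types.h"
#include "xe_dep_scheduler.h"

struct my_job {
        struct xe_dep_job dep;          /* must embed the base job */
        struct dma_fence *done;         /* illustrative completion fence */
};

static struct dma_fence *my_run_job(struct xe_dep_job *dep_job)
{
        struct my_job *job = container_of(dep_job, struct my_job, dep);

        /* All dependencies have signaled; do the deferred work here */
        return dma_fence_get(job->done);
}

static void my_free_job(struct xe_dep_job *dep_job)
{
        kfree(container_of(dep_job, struct my_job, dep));
}

static const struct xe_dep_job_ops my_job_ops = {
        .run_job = my_run_job,
        .free_job = my_free_job,
};

/* Queue @job to run once @in_fence signals (error handling elided) */
static void my_push(struct xe_dep_scheduler *ds, struct my_job *job,
                    struct dma_fence *in_fence)
{
        job->dep.ops = &my_job_ops;
        drm_sched_job_init(&job->dep.drm, xe_dep_scheduler_entity(ds),
                           1, NULL, 0);
        /* drm_sched_job_add_dependency() consumes a fence reference */
        drm_sched_job_add_dependency(&job->dep.drm, dma_fence_get(in_fence));
        drm_sched_job_arm(&job->dep.drm);
        drm_sched_entity_push_job(&job->dep.drm);
}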
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/Makefile | 1 +
drivers/gpu/drm/xe/xe_dep_job_types.h | 29 ++++++
drivers/gpu/drm/xe/xe_dep_scheduler.c | 145 ++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_dep_scheduler.h | 21 ++++
4 files changed, 196 insertions(+)
create mode 100644 drivers/gpu/drm/xe/xe_dep_job_types.h
create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.c
create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.h
diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 1d97e5b63f4e..0edcfc770c0d 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -28,6 +28,7 @@ $(obj)/generated/%_wa_oob.c $(obj)/generated/%_wa_oob.h: $(obj)/xe_gen_wa_oob \
xe-y += xe_bb.o \
xe_bo.o \
xe_bo_evict.o \
+ xe_dep_scheduler.o \
xe_devcoredump.o \
xe_device.o \
xe_device_sysfs.o \
diff --git a/drivers/gpu/drm/xe/xe_dep_job_types.h b/drivers/gpu/drm/xe/xe_dep_job_types.h
new file mode 100644
index 000000000000..c6a484f24c8c
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_dep_job_types.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#ifndef _XE_DEP_JOB_TYPES_H_
+#define _XE_DEP_JOB_TYPES_H_
+
+#include <drm/gpu_scheduler.h>
+
+struct xe_dep_job;
+
+/** struct xe_dep_job_ops - Generic Xe dependency job operations */
+struct xe_dep_job_ops {
+ /** @run_job: Run generic Xe dependency job */
+ struct dma_fence *(*run_job)(struct xe_dep_job *job);
+ /** @free_job: Free generic Xe dependency job */
+ void (*free_job)(struct xe_dep_job *job);
+};
+
+/** struct xe_dep_job - Generic dependency Xe job */
+struct xe_dep_job {
+ /** @drm: base DRM scheduler job */
+ struct drm_sched_job drm;
+ /** @ops: dependency job operations */
+ const struct xe_dep_job_ops *ops;
+};
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.c b/drivers/gpu/drm/xe/xe_dep_scheduler.c
new file mode 100644
index 000000000000..fbd55577d787
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_dep_scheduler.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#include <linux/slab.h>
+
+#include <drm/gpu_scheduler.h>
+
+#include "xe_dep_job_types.h"
+#include "xe_dep_scheduler.h"
+#include "xe_device_types.h"
+
+/**
+ * DOC: Xe Dependency Scheduler
+ *
+ * The Xe dependency scheduler is a simple wrapper built around the DRM
+ * scheduler to execute jobs once their dependencies are resolved (i.e., all
+ * input fences specified as dependencies are signaled). The jobs that are
+ * executed contain virtual functions to run (execute) and free the job,
+ * allowing a single dependency scheduler to handle jobs performing different
+ * operations.
+ *
+ * Example use cases include deferred resource freeing, TLB invalidations after
+ * bind jobs, etc.
+ */
+
+/** struct xe_dep_scheduler - Generic Xe dependency scheduler */
+struct xe_dep_scheduler {
+ /** @sched: DRM GPU scheduler */
+ struct drm_gpu_scheduler sched;
+ /** @entity: DRM scheduler entity */
+ struct drm_sched_entity entity;
+ /** @rcu: For safe freeing of exported dma fences */
+ struct rcu_head rcu;
+};
+
+static struct dma_fence *xe_dep_scheduler_run_job(struct drm_sched_job *drm_job)
+{
+ struct xe_dep_job *dep_job =
+ container_of(drm_job, typeof(*dep_job), drm);
+
+ return dep_job->ops->run_job(dep_job);
+}
+
+static void xe_dep_scheduler_free_job(struct drm_sched_job *drm_job)
+{
+ struct xe_dep_job *dep_job =
+ container_of(drm_job, typeof(*dep_job), drm);
+
+ dep_job->ops->free_job(dep_job);
+}
+
+static const struct drm_sched_backend_ops sched_ops = {
+ .run_job = xe_dep_scheduler_run_job,
+ .free_job = xe_dep_scheduler_free_job,
+};
+
+/**
+ * xe_dep_scheduler_create() - Generic Xe dependency scheduler create
+ * @xe: Xe device
+ * @submit_wq: Submit workqueue struct (can be NULL)
+ * @name: Name of dependency scheduler
+ * @job_limit: Max dependency jobs that can be scheduled
+ *
+ * Create a generic Xe dependency scheduler and initialize internal DRM
+ * scheduler objects.
+ *
+ * Return: Generic Xe dependency scheduler object or ERR_PTR
+ */
+struct xe_dep_scheduler *
+xe_dep_scheduler_create(struct xe_device *xe,
+ struct workqueue_struct *submit_wq,
+ const char *name, u32 job_limit)
+{
+ struct xe_dep_scheduler *dep_scheduler;
+ struct drm_gpu_scheduler *sched;
+ const struct drm_sched_init_args args = {
+ .ops = &sched_ops,
+ .submit_wq = submit_wq,
+ .num_rqs = 1,
+ .credit_limit = job_limit,
+ .timeout = MAX_SCHEDULE_TIMEOUT,
+ .name = name,
+ .dev = xe->drm.dev,
+ };
+ int err;
+
+ dep_scheduler = kzalloc(sizeof(*dep_scheduler), GFP_KERNEL);
+ if (!dep_scheduler)
+ return ERR_PTR(-ENOMEM);
+
+ err = drm_sched_init(&dep_scheduler->sched, &args);
+ if (err)
+ goto err_free;
+
+ sched = &dep_scheduler->sched;
+ err = drm_sched_entity_init(&dep_scheduler->entity, 0,
+ (struct drm_gpu_scheduler **)&sched, 1,
+ NULL);
+ if (err)
+ goto err_sched;
+
+ init_rcu_head(&dep_scheduler->rcu);
+
+ return dep_scheduler;
+
+err_sched:
+ drm_sched_fini(&dep_scheduler->sched);
+err_free:
+ kfree(dep_scheduler);
+
+ return ERR_PTR(err);
+}
+
+/**
+ * xe_dep_scheduler_fini() - Generic Xe dependency scheduler finalize
+ * @dep_scheduler: Generic Xe dependency scheduler object
+ *
+ * Finalize internal DRM scheduler objects and free generic Xe dependency
+ * scheduler object
+ */
+void xe_dep_scheduler_fini(struct xe_dep_scheduler *dep_scheduler)
+{
+ drm_sched_entity_fini(&dep_scheduler->entity);
+ drm_sched_fini(&dep_scheduler->sched);
+ /*
+ * RCU free due to sched being exported via DRM scheduler fences
+ * (timeline name).
+ */
+ kfree_rcu(dep_scheduler, rcu);
+}
+
+/**
+ * xe_dep_scheduler_entity() - Retrieve a generic Xe dependency scheduler
+ * DRM scheduler entity
+ * @dep_scheduler: Generic Xe dependency scheduler object
+ *
+ * Return: The generic Xe dependency scheduler's DRM scheduler entity
+ */
+struct drm_sched_entity *
+xe_dep_scheduler_entity(struct xe_dep_scheduler *dep_scheduler)
+{
+ return &dep_scheduler->entity;
+}
diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.h b/drivers/gpu/drm/xe/xe_dep_scheduler.h
new file mode 100644
index 000000000000..853961eec64b
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_dep_scheduler.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#include <linux/types.h>
+
+struct drm_sched_entity;
+struct workqueue_struct;
+struct xe_dep_scheduler;
+struct xe_device;
+
+struct xe_dep_scheduler *
+xe_dep_scheduler_create(struct xe_device *xe,
+ struct workqueue_struct *submit_wq,
+ const char *name, u32 job_limit);
+
+void xe_dep_scheduler_fini(struct xe_dep_scheduler *dep_scheduler);
+
+struct drm_sched_entity *
+xe_dep_scheduler_entity(struct xe_dep_scheduler *dep_scheduler);
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 3/9] drm: Simplify drmm_alloc_ordered_workqueue return
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
2025-07-02 23:42 ` [PATCH v2 1/9] drm/xe: Explicitly mark migration queues with flag Matthew Brost
2025-07-02 23:42 ` [PATCH v2 2/9] drm/xe: Add generic dependency jobs / scheduler Matthew Brost
@ 2025-07-02 23:42 ` Matthew Brost
2025-07-16 1:10 ` Matthew Brost
2025-07-02 23:42 ` [PATCH v2 4/9] drm/xe: Create ordered workqueue for GT TLB invalidation jobs Matthew Brost
` (10 subsequent siblings)
13 siblings, 1 reply; 45+ messages in thread
From: Matthew Brost @ 2025-07-02 23:42 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, maarten.lankhorst
Rather than returning ERR_PTR or NULL on failure, replace the NULL
return with ERR_PTR(-ENOMEM). This simplifies error handling at the
caller. While here, add kernel documentation for
drmm_alloc_ordered_workqueue.
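The resulting caller pattern, sketched (my_init and the "my-wq" name are
illustrative):

static int my_init(struct xe_device *xe)
{
        struct workqueue_struct *wq;

        wq = drmm_alloc_ordered_workqueue(&xe->drm, "my-wq", 0);
        if (IS_ERR(wq))         /* a single IS_ERR() check now suffices */
                return PTR_ERR(wq);

        /* use wq; it is destroyed automatically on final drm_dev_put() */
        return 0;
}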
Cc: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/vkms/vkms_crtc.c | 2 --
include/drm/drm_managed.h | 15 +++++++++++++--
2 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
index 8c9898b9055d..e60573e0f3e9 100644
--- a/drivers/gpu/drm/vkms/vkms_crtc.c
+++ b/drivers/gpu/drm/vkms/vkms_crtc.c
@@ -302,8 +302,6 @@ struct vkms_output *vkms_crtc_init(struct drm_device *dev, struct drm_plane *pri
vkms_out->composer_workq = drmm_alloc_ordered_workqueue(dev, "vkms_composer", 0);
if (IS_ERR(vkms_out->composer_workq))
return ERR_CAST(vkms_out->composer_workq);
- if (!vkms_out->composer_workq)
- return ERR_PTR(-ENOMEM);
return vkms_out;
}
diff --git a/include/drm/drm_managed.h b/include/drm/drm_managed.h
index 53017cc609ac..72bfac002c06 100644
--- a/include/drm/drm_managed.h
+++ b/include/drm/drm_managed.h
@@ -129,14 +129,25 @@ void __drmm_mutex_release(struct drm_device *dev, void *res);
void __drmm_workqueue_release(struct drm_device *device, void *wq);
+/**
+ * drmm_alloc_ordered_workqueue - &drm_device managed alloc_ordered_workqueue()
+ * @dev: DRM device
+ * @fmt: printf format for the name of the workqueue
+ * @flags: WQ_* flags (only WQ_FREEZABLE and WQ_MEM_RECLAIM are meaningful)
+ * @args: args for @fmt
+ *
+ * This is a &drm_device-managed version of alloc_ordered_workqueue(). The
+ * allocated workqueue is automatically destroyed on the final drm_dev_put().
+ *
+ * Returns: workqueue on success, negative ERR_PTR otherwise.
+ */
#define drmm_alloc_ordered_workqueue(dev, fmt, flags, args...) \
({ \
struct workqueue_struct *wq = alloc_ordered_workqueue(fmt, flags, ##args); \
wq ? ({ \
int ret = drmm_add_action_or_reset(dev, __drmm_workqueue_release, wq); \
ret ? ERR_PTR(ret) : wq; \
- }) : \
- wq; \
+ }) : ERR_PTR(-ENOMEM); \
})
#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 4/9] drm/xe: Create ordered workqueue for GT TLB invalidation jobs
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
` (2 preceding siblings ...)
2025-07-02 23:42 ` [PATCH v2 3/9] drm: Simplify drmm_alloc_ordered_workqueue return Matthew Brost
@ 2025-07-02 23:42 ` Matthew Brost
2025-07-17 19:55 ` Summers, Stuart
2025-07-02 23:42 ` [PATCH v2 5/9] drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues Matthew Brost
` (9 subsequent siblings)
13 siblings, 1 reply; 45+ messages in thread
From: Matthew Brost @ 2025-07-02 23:42 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, maarten.lankhorst
It makes no sense to schedule GT TLB invalidation jobs in parallel when they
target the same GT, given these all contend on the same lock; create an
ordered workqueue for GT TLB invalidation jobs.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 8 ++++++++
drivers/gpu/drm/xe/xe_gt_types.h | 2 ++
2 files changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 6088df8e159c..f6f32600e8a5 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -3,6 +3,8 @@
* Copyright © 2023 Intel Corporation
*/
+#include <drm/drm_managed.h>
+
#include "xe_gt_tlb_invalidation.h"
#include "abi/guc_actions_abi.h"
@@ -123,6 +125,12 @@ int xe_gt_tlb_invalidation_init_early(struct xe_gt *gt)
INIT_DELAYED_WORK(&gt->tlb_invalidation.fence_tdr,
xe_gt_tlb_fence_timeout);
+ gt->tlb_invalidation.job_wq =
+ drmm_alloc_ordered_workqueue(&gt_to_xe(gt)->drm, "gt-tlb-inval-job-wq",
+ WQ_MEM_RECLAIM);
+ if (IS_ERR(gt->tlb_invalidation.job_wq))
+ return PTR_ERR(gt->tlb_invalidation.job_wq);
+
return 0;
}
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index 96344c604726..dfd4a16da5f0 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -210,6 +210,8 @@ struct xe_gt {
* xe_gt_tlb_fence_timeout after the timeout interval is over.
*/
struct delayed_work fence_tdr;
+ /** @tlb_invalidation.job_wq: schedules GT TLB invalidation jobs */
+ struct workqueue_struct *job_wq;
/** @tlb_invalidation.lock: protects TLB invalidation fences */
spinlock_t lock;
} tlb_invalidation;
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 5/9] drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
` (3 preceding siblings ...)
2025-07-02 23:42 ` [PATCH v2 4/9] drm/xe: Create ordered workqueue for GT TLB invalidation jobs Matthew Brost
@ 2025-07-02 23:42 ` Matthew Brost
2025-07-15 21:34 ` Summers, Stuart
2025-07-02 23:42 ` [PATCH v2 6/9] drm/xe: Add xe_migrate_job_lock/unlock helpers Matthew Brost
` (8 subsequent siblings)
13 siblings, 1 reply; 45+ messages in thread
From: Matthew Brost @ 2025-07-02 23:42 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, maarten.lankhorst
Add a generic dependency scheduler for GT TLB invalidations, used to
schedule jobs that issue GT TLB invalidations to bind queues.
v2:
- Use shared GT TLB invalidation queue for dep scheduler
- Break allocation of dep scheduler into its own function
- Add define for max number of TLB invalidations
- Skip media if not present
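Each bind queue now carries one dep scheduler per GT on its tile; selecting
the right one by GT type looks roughly like this (illustrative helper,
mirroring xe_gt_tlb_inval_context() added later in the series):

static struct xe_dep_scheduler *
tlb_inval_dep_scheduler(struct xe_exec_queue *q, struct xe_gt *gt)
{
        int idx = xe_gt_is_media_type(gt) ?
                XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT :
                XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT;

        return q->tlb_inval[idx].dep_scheduler;
}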
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue.c | 48 ++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_exec_queue_types.h | 13 +++++++
2 files changed, 61 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index fee22358cc09..7aaf669cf5fc 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -12,6 +12,7 @@
#include <drm/drm_file.h>
#include <uapi/drm/xe_drm.h>
+#include "xe_dep_scheduler.h"
#include "xe_device.h"
#include "xe_gt.h"
#include "xe_hw_engine_class_sysfs.h"
@@ -39,6 +40,12 @@ static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue
static void __xe_exec_queue_free(struct xe_exec_queue *q)
{
+ int i;
+
+ for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i)
+ if (q->tlb_inval[i].dep_scheduler)
+ xe_dep_scheduler_fini(q->tlb_inval[i].dep_scheduler);
+
if (xe_exec_queue_uses_pxp(q))
xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
if (q->vm)
@@ -50,6 +57,39 @@ static void __xe_exec_queue_free(struct xe_exec_queue *q)
kfree(q);
}
+static int alloc_dep_schedulers(struct xe_device *xe, struct xe_exec_queue *q)
+{
+ struct xe_tile *tile = gt_to_tile(q->gt);
+ int i;
+
+ for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i) {
+ struct xe_dep_scheduler *dep_scheduler;
+ struct xe_gt *gt;
+ struct workqueue_struct *wq;
+
+ if (i == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT)
+ gt = tile->primary_gt;
+ else
+ gt = tile->media_gt;
+
+ if (!gt)
+ continue;
+
+ wq = gt->tlb_invalidation.job_wq;
+
+#define MAX_TLB_INVAL_JOBS 16 /* Picking a reasonable value */
+ dep_scheduler = xe_dep_scheduler_create(xe, wq, q->name,
+ MAX_TLB_INVAL_JOBS);
+ if (IS_ERR(dep_scheduler))
+ return PTR_ERR(dep_scheduler);
+
+ q->tlb_inval[i].dep_scheduler = dep_scheduler;
+ }
+#undef MAX_TLB_INVAL_JOBS
+
+ return 0;
+}
+
static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
struct xe_vm *vm,
u32 logical_mask,
@@ -94,6 +134,14 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
else
q->sched_props.priority = XE_EXEC_QUEUE_PRIORITY_NORMAL;
+ if (q->flags & (EXEC_QUEUE_FLAG_MIGRATE | EXEC_QUEUE_FLAG_VM)) {
+ err = alloc_dep_schedulers(xe, q);
+ if (err) {
+ __xe_exec_queue_free(q);
+ return ERR_PTR(err);
+ }
+ }
+
if (vm)
q->vm = xe_vm_get(vm);
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index abdf4a57e6e2..ba443a497b38 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -134,6 +134,19 @@ struct xe_exec_queue {
struct list_head link;
} lr;
+#define XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT 0
+#define XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT 1
+#define XE_EXEC_QUEUE_TLB_INVAL_COUNT (XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT + 1)
+
+ /** @tlb_inval: TLB invalidations exec queue state */
+ struct {
+ /**
+ * @tlb_inval.dep_scheduler: The TLB invalidation
+ * dependency scheduler
+ */
+ struct xe_dep_scheduler *dep_scheduler;
+ } tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
+
/** @pxp: PXP info tracking */
struct {
/** @pxp.type: PXP session type used by this queue */
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 6/9] drm/xe: Add xe_migrate_job_lock/unlock helpers
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
` (4 preceding siblings ...)
2025-07-02 23:42 ` [PATCH v2 5/9] drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues Matthew Brost
@ 2025-07-02 23:42 ` Matthew Brost
2025-07-15 22:48 ` Summers, Stuart
2025-07-02 23:42 ` [PATCH v2 7/9] drm/xe: Add GT TLB invalidation jobs Matthew Brost
` (7 subsequent siblings)
13 siblings, 1 reply; 45+ messages in thread
From: Matthew Brost @ 2025-07-02 23:42 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, maarten.lankhorst
Add xe_migrate_job_lock/unlock helpers, which are used to ensure ordering
when issuing GT TLB invalidation jobs.
v2:
- Fix multi-line comments (checkpatch)
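The helpers bracket DRM scheduler job arm/push so seqno assignment stays
ordered; a sketch of the expected pattern (job here is any struct embedding
a drm_sched_job, as the GT TLB invalidation job push later in the series
uses it):

        xe_migrate_job_lock(m, q);      /* migration queue: mutex; else VM dma-resv asserted */
        drm_sched_job_arm(&job->drm);   /* seqno assigned under the lock */
        drm_sched_entity_push_job(&job->drm);
        xe_migrate_job_unlock(m, q);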
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_migrate.c | 36 +++++++++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_migrate.h | 4 ++++
2 files changed, 40 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index b5f85162b9ed..1f57adcbb535 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1917,6 +1917,42 @@ int xe_migrate_access_memory(struct xe_migrate *m, struct xe_bo *bo,
return IS_ERR(fence) ? PTR_ERR(fence) : 0;
}
+/**
+ * xe_migrate_job_lock() - Lock migrate job lock
+ * @m: The migration context.
+ * @q: Queue associated with the operation which requires a lock
+ *
+ * Lock the migrate job lock if the queue is a migration queue, otherwise
+ * assert the VM's dma-resv is held (user queues have their own locking).
+ */
+void xe_migrate_job_lock(struct xe_migrate *m, struct xe_exec_queue *q)
+{
+ bool is_migrate = q == m->q;
+
+ if (is_migrate)
+ mutex_lock(&m->job_mutex);
+ else
+ xe_vm_assert_held(q->vm); /* User queue VMs should be locked */
+}
+
+/**
+ * xe_migrate_job_unlock() - Unlock migrate job lock
+ * @m: The migration context.
+ * @q: Queue associated with the operation which requires a lock
+ *
+ * Unlock the migrate job lock if the queue is a migration queue, otherwise
+ * assert the VM's dma-resv is held (user queues have their own locking).
+ */
+void xe_migrate_job_unlock(struct xe_migrate *m, struct xe_exec_queue *q)
+{
+ bool is_migrate = q == m->q;
+
+ if (is_migrate)
+ mutex_unlock(&m->job_mutex);
+ else
+ xe_vm_assert_held(q->vm); /* User queue VMs should be locked */
+}
+
#if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
#include "tests/xe_migrate.c"
#endif
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index fb9839c1bae0..e9d83d320f8c 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -134,4 +134,8 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
void xe_migrate_wait(struct xe_migrate *m);
struct xe_exec_queue *xe_tile_migrate_exec_queue(struct xe_tile *tile);
+
+void xe_migrate_job_lock(struct xe_migrate *m, struct xe_exec_queue *q);
+void xe_migrate_job_unlock(struct xe_migrate *m, struct xe_exec_queue *q);
+
#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 7/9] drm/xe: Add GT TLB invalidation jobs
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
` (5 preceding siblings ...)
2025-07-02 23:42 ` [PATCH v2 6/9] drm/xe: Add xe_migrate_job_lock/unlock helpers Matthew Brost
@ 2025-07-02 23:42 ` Matthew Brost
2025-07-15 23:09 ` Summers, Stuart
2025-07-02 23:42 ` [PATCH v2 8/9] drm/xe: Use GT TLB invalidation jobs in PT layer Matthew Brost
` (6 subsequent siblings)
13 siblings, 1 reply; 45+ messages in thread
From: Matthew Brost @ 2025-07-02 23:42 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, maarten.lankhorst
Add GT TLB invalidation jobs, which issue GT TLB invalidations. Built on
top of the generic Xe dependency scheduler.
v2:
- Fix checkpatch
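The intended job lifecycle, sketched below (issue_inval is an illustrative
caller; the xe_gt_tlb_inval_job_* calls are this patch's API):

static struct dma_fence *
issue_inval(struct xe_exec_queue *q, struct xe_gt *gt, struct xe_migrate *m,
            struct dma_fence *bind_fence, u64 start, u64 end, u32 asid)
{
        struct xe_gt_tlb_inval_job *job;
        struct dma_fence *fence;
        int err;

        job = xe_gt_tlb_inval_job_create(q, gt, start, end, asid);
        if (IS_ERR(job))
                return ERR_CAST(job);

        /* Preallocate dep storage; required if bind_fence may be unsignaled */
        err = xe_gt_tlb_inval_job_alloc_dep(job);
        if (err) {
                xe_gt_tlb_inval_job_put(job);
                return ERR_PTR(err);
        }

        fence = xe_gt_tlb_inval_job_push(job, m, bind_fence);
        xe_gt_tlb_inval_job_put(job);   /* drop creation ref; push took its own */

        return fence;   /* caller must dma_fence_put() when done */
}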
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/Makefile | 1 +
drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c | 271 +++++++++++++++++++++++
drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h | 34 +++
3 files changed, 306 insertions(+)
create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 0edcfc770c0d..5aad44a3b5fd 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -55,6 +55,7 @@ xe-y += xe_bb.o \
xe_gt_sysfs.o \
xe_gt_throttle.o \
xe_gt_tlb_invalidation.o \
+ xe_gt_tlb_inval_job.o \
xe_gt_topology.o \
xe_guc.o \
xe_guc_ads.o \
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
new file mode 100644
index 000000000000..428d20f16ec2
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#include "xe_dep_job_types.h"
+#include "xe_dep_scheduler.h"
+#include "xe_exec_queue.h"
+#include "xe_gt.h"
+#include "xe_gt_tlb_invalidation.h"
+#include "xe_gt_tlb_inval_job.h"
+#include "xe_migrate.h"
+#include "xe_pm.h"
+
+/** struct xe_gt_tlb_inval_job - GT TLB invalidation job */
+struct xe_gt_tlb_inval_job {
+ /** @dep: base generic dependency Xe job */
+ struct xe_dep_job dep;
+ /** @gt: GT to invalidate */
+ struct xe_gt *gt;
+ /** @q: exec queue issuing the invalidate */
+ struct xe_exec_queue *q;
+ /** @refcount: ref count of this job */
+ struct kref refcount;
+ /**
+ * @fence: dma fence to indicate completion. One-way relationship: job
+ * can safely reference fence, fence cannot safely reference job.
+ */
+ struct dma_fence *fence;
+ /** @start: Start address to invalidate */
+ u64 start;
+ /** @end: End address to invalidate */
+ u64 end;
+ /** @asid: Address space ID to invalidate */
+ u32 asid;
+ /** @fence_armed: Fence has been armed */
+ bool fence_armed;
+};
+
+static struct dma_fence *xe_gt_tlb_inval_job_run(struct xe_dep_job *dep_job)
+{
+ struct xe_gt_tlb_inval_job *job =
+ container_of(dep_job, typeof(*job), dep);
+ struct xe_gt_tlb_invalidation_fence *ifence =
+ container_of(job->fence, typeof(*ifence), base);
+
+ xe_gt_tlb_invalidation_range(job->gt, ifence, job->start,
+ job->end, job->asid);
+
+ return job->fence;
+}
+
+static void xe_gt_tlb_inval_job_free(struct xe_dep_job *dep_job)
+{
+ struct xe_gt_tlb_inval_job *job =
+ container_of(dep_job, typeof(*job), dep);
+
+ /* Pairs with get in xe_gt_tlb_inval_job_push */
+ xe_gt_tlb_inval_job_put(job);
+}
+
+static const struct xe_dep_job_ops dep_job_ops = {
+ .run_job = xe_gt_tlb_inval_job_run,
+ .free_job = xe_gt_tlb_inval_job_free,
+};
+
+static int xe_gt_tlb_inval_context(struct xe_gt *gt)
+{
+ return xe_gt_is_media_type(gt) ? XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT :
+ XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT;
+}
+
+/**
+ * xe_gt_tlb_inval_job_create() - GT TLB invalidation job create
+ * @q: exec queue issuing the invalidate
+ * @gt: GT to invalidate
+ * @start: Start address to invalidate
+ * @end: End address to invalidate
+ * @asid: Address space ID to invalidate
+ *
+ * Create a GT TLB invalidation job and initialize internal fields. The caller is
+ * responsible for releasing the creation reference.
+ *
+ * Return: GT TLB invalidation job object or ERR_PTR
+ */
+struct xe_gt_tlb_inval_job *xe_gt_tlb_inval_job_create(struct xe_exec_queue *q,
+ struct xe_gt *gt,
+ u64 start, u64 end,
+ u32 asid)
+{
+ struct xe_gt_tlb_inval_job *job;
+ struct xe_dep_scheduler *dep_scheduler =
+ q->tlb_inval[xe_gt_tlb_inval_context(gt)].dep_scheduler;
+ struct drm_sched_entity *entity =
+ xe_dep_scheduler_entity(dep_scheduler);
+ struct xe_gt_tlb_invalidation_fence *ifence;
+ int err;
+
+ job = kmalloc(sizeof(*job), GFP_KERNEL);
+ if (!job)
+ return ERR_PTR(-ENOMEM);
+
+ job->q = q;
+ job->gt = gt;
+ job->start = start;
+ job->end = end;
+ job->asid = asid;
+ job->fence_armed = false;
+ job->dep.ops = &dep_job_ops;
+ kref_init(&job->refcount);
+ xe_exec_queue_get(q);
+
+ ifence = kmalloc(sizeof(*ifence), GFP_KERNEL);
+ if (!ifence) {
+ err = -ENOMEM;
+ goto err_job;
+ }
+ job->fence = &ifence->base;
+
+ err = drm_sched_job_init(&job->dep.drm, entity, 1, NULL,
+ q->xef ? q->xef->drm->client_id : 0);
+ if (err)
+ goto err_fence;
+
+ xe_pm_runtime_get_noresume(gt_to_xe(job->gt));
+ return job;
+
+err_fence:
+ kfree(ifence);
+err_job:
+ xe_exec_queue_put(q);
+ kfree(job);
+
+ return ERR_PTR(err);
+}
+
+static void xe_gt_tlb_inval_job_destroy(struct kref *ref)
+{
+ struct xe_gt_tlb_inval_job *job = container_of(ref, typeof(*job),
+ refcount);
+ struct xe_gt_tlb_invalidation_fence *ifence =
+ container_of(job->fence, typeof(*ifence), base);
+ struct xe_device *xe = gt_to_xe(job->gt);
+ struct xe_exec_queue *q = job->q;
+
+ if (!job->fence_armed)
+ kfree(ifence);
+ else
+ /* Ref from xe_gt_tlb_invalidation_fence_init */
+ dma_fence_put(job->fence);
+
+ drm_sched_job_cleanup(&job->dep.drm);
+ kfree(job);
+ xe_exec_queue_put(q); /* Pairs with get from xe_gt_tlb_inval_job_create */
+ xe_pm_runtime_put(xe); /* Pairs with get from xe_gt_tlb_inval_job_create */
+}
+
+/**
+ * xe_gt_tlb_inval_job_alloc_dep() - GT TLB invalidation job alloc dependency
+ * @job: GT TLB invalidation job to alloc dependency for
+ *
+ * Allocate storage for a dependency in the GT TLB invalidation fence. This
+ * function should be called at most once per job and must be paired with
+ * xe_gt_tlb_inval_job_push being called with a real (non-signaled) fence.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int xe_gt_tlb_inval_job_alloc_dep(struct xe_gt_tlb_inval_job *job)
+{
+ xe_assert(gt_to_xe(job->gt), !xa_load(&job->dep.drm.dependencies, 0));
+
+ return drm_sched_job_add_dependency(&job->dep.drm,
+ dma_fence_get_stub());
+}
+
+/**
+ * xe_gt_tlb_inval_job_push() - GT TLB invalidation job push
+ * @job: GT TLB invalidation job to push
+ * @m: The migration object being used
+ * @fence: Dependency for GT TLB invalidation job
+ *
+ * Pushes a GT TLB invalidation job for execution, using @fence as a dependency.
+ * Storage for @fence must be preallocated with xe_gt_tlb_inval_job_alloc_dep
+ * prior to this call if @fence is not signaled. Takes a reference to the job’s
+ * finished fence, which the caller is responsible for releasing, and retutn it
+ * to the caller. This function is safe to be called in the path of reclaim.
+ *
+ * Return: Job's finished fence
+ */
+struct dma_fence *xe_gt_tlb_inval_job_push(struct xe_gt_tlb_inval_job *job,
+ struct xe_migrate *m,
+ struct dma_fence *fence)
+{
+ struct xe_gt_tlb_invalidation_fence *ifence =
+ container_of(job->fence, typeof(*ifence), base);
+
+ if (!dma_fence_is_signaled(fence)) {
+ void *ptr;
+
+ /*
+ * Can be in path of reclaim, hence the preallocation of fence
+ * storage in xe_gt_tlb_inval_job_alloc_dep. Verify caller did
+ * this correctly.
+ */
+ xe_assert(gt_to_xe(job->gt),
+ xa_load(&job->dep.drm.dependencies, 0) ==
+ dma_fence_get_stub());
+
+ dma_fence_get(fence); /* ref released once dependency processed by scheduler */
+ ptr = xa_store(&job->dep.drm.dependencies, 0, fence,
+ GFP_ATOMIC);
+ xe_assert(gt_to_xe(job->gt), !xa_is_err(ptr));
+ }
+
+ xe_gt_tlb_inval_job_get(job); /* Pairs with put in free_job */
+ job->fence_armed = true;
+
+ /*
+ * We need the migration lock to protect the seqnos (job and
+ * invalidation fence) and the spsc queue. The lock is only taken on the
+ * migration queue; user queues are protected by the VM's dma-resv lock.
+ */
+ xe_migrate_job_lock(m, job->q);
+
+ /* Creation ref pairs with put in xe_gt_tlb_inval_job_destroy */
+ xe_gt_tlb_invalidation_fence_init(job->gt, ifence, false);
+ dma_fence_get(job->fence); /* Pairs with put in DRM scheduler */
+
+ drm_sched_job_arm(&job->dep.drm);
+ /*
+ * Caller ref. The get must be done before the job is pushed as the job
+ * could immediately signal and be freed.
+ */
+ dma_fence_get(&job->dep.drm.s_fence->finished);
+ drm_sched_entity_push_job(&job->dep.drm);
+
+ xe_migrate_job_unlock(m, job->q);
+
+ /*
+ * Not using job->fence, as it has its own dma-fence context, which does
+ * not allow GT TLB invalidation fences on the same (queue, GT) tuple to
+ * be squashed in dma-resv/DRM scheduler. Instead, we use the DRM scheduler
+ * context and job's finished fence, which enables squashing.
+ */
+ return &job->dep.drm.s_fence->finished;
+}
+
+/**
+ * xe_gt_tlb_inval_job_get() - Get a reference to GT TLB invalidation job
+ * @job: GT TLB invalidation job object
+ *
+ * Increment the GT TLB invalidation job's reference count
+ */
+void xe_gt_tlb_inval_job_get(struct xe_gt_tlb_inval_job *job)
+{
+ kref_get(&job->refcount);
+}
+
+/**
+ * xe_gt_tlb_inval_job_put() - Put a reference to GT TLB invalidation job
+ * @job: GT TLB invalidation job object
+ *
+ * Decrement the GT TLB invalidation job's reference count, call
+ * xe_gt_tlb_inval_job_destroy when reference count == 0. Skips decrement if
+ * input @job is NULL or IS_ERR.
+ */
+void xe_gt_tlb_inval_job_put(struct xe_gt_tlb_inval_job *job)
+{
+ if (job && !IS_ERR(job))
+ kref_put(&job->refcount, xe_gt_tlb_inval_job_destroy);
+}
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
new file mode 100644
index 000000000000..883896194a34
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#ifndef _XE_GT_TLB_INVAL_JOB_H_
+#define _XE_GT_TLB_INVAL_JOB_H_
+
+#include <linux/types.h>
+
+struct dma_fence;
+struct drm_sched_job;
+struct kref;
+struct xe_exec_queue;
+struct xe_gt;
+struct xe_gt_tlb_inval_job;
+struct xe_migrate;
+
+struct xe_gt_tlb_inval_job *xe_gt_tlb_inval_job_create(struct xe_exec_queue *q,
+ struct xe_gt *gt,
+ u64 start, u64 end,
+ u32 asid);
+
+int xe_gt_tlb_inval_job_alloc_dep(struct xe_gt_tlb_inval_job *job);
+
+struct dma_fence *xe_gt_tlb_inval_job_push(struct xe_gt_tlb_inval_job *job,
+ struct xe_migrate *m,
+ struct dma_fence *fence);
+
+void xe_gt_tlb_inval_job_get(struct xe_gt_tlb_inval_job *job);
+
+void xe_gt_tlb_inval_job_put(struct xe_gt_tlb_inval_job *job);
+
+#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 8/9] drm/xe: Use GT TLB invalidation jobs in PT layer
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
` (6 preceding siblings ...)
2025-07-02 23:42 ` [PATCH v2 7/9] drm/xe: Add GT TLB invalidation jobs Matthew Brost
@ 2025-07-02 23:42 ` Matthew Brost
2025-07-17 21:00 ` Summers, Stuart
2025-07-02 23:42 ` [PATCH v2 9/9] drm/xe: Remove unused GT TLB invalidation trace points Matthew Brost
` (5 subsequent siblings)
13 siblings, 1 reply; 45+ messages in thread
From: Matthew Brost @ 2025-07-02 23:42 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, maarten.lankhorst
Rather than open-coding GT TLB invalidations in the PT layer, use GT TLB
invalidation jobs. The real benefit is that GT TLB invalidation jobs use
a single dma-fence context, allowing the generated fences to be squashed
in dma-resv/DRM scheduler.
v2:
- s/;;/; (checkpatch)
- Move ijob/mjob job push after range fence install
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_migrate.h | 9 ++
drivers/gpu/drm/xe/xe_pt.c | 178 +++++++++++++-------------------
2 files changed, 80 insertions(+), 107 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index e9d83d320f8c..605398ea773e 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -14,6 +14,7 @@ struct ttm_resource;
struct xe_bo;
struct xe_gt;
+struct xe_gt_tlb_inval_job;
struct xe_exec_queue;
struct xe_migrate;
struct xe_migrate_pt_update;
@@ -89,6 +90,14 @@ struct xe_migrate_pt_update {
struct xe_vma_ops *vops;
/** @job: The job if a GPU page-table update. NULL otherwise */
struct xe_sched_job *job;
+ /**
+ * @ijob: The GT TLB invalidation job for primary tile. NULL otherwise
+ */
+ struct xe_gt_tlb_inval_job *ijob;
+ /**
+ * @mjob: The GT TLB invalidation job for media tile. NULL otherwise
+ */
+ struct xe_gt_tlb_inval_job *mjob;
/** @tile_id: Tile ID of the update */
u8 tile_id;
};
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index c8e63bd23300..67d02307779b 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -13,7 +13,7 @@
#include "xe_drm_client.h"
#include "xe_exec_queue.h"
#include "xe_gt.h"
-#include "xe_gt_tlb_invalidation.h"
+#include "xe_gt_tlb_inval_job.h"
#include "xe_migrate.h"
#include "xe_pt_types.h"
#include "xe_pt_walk.h"
@@ -1261,6 +1261,8 @@ static int op_add_deps(struct xe_vm *vm, struct xe_vma_op *op,
}
static int xe_pt_vm_dependencies(struct xe_sched_job *job,
+ struct xe_gt_tlb_inval_job *ijob,
+ struct xe_gt_tlb_inval_job *mjob,
struct xe_vm *vm,
struct xe_vma_ops *vops,
struct xe_vm_pgtable_update_ops *pt_update_ops,
@@ -1328,6 +1330,20 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
for (i = 0; job && !err && i < vops->num_syncs; i++)
err = xe_sync_entry_add_deps(&vops->syncs[i], job);
+ if (job) {
+ if (ijob) {
+ err = xe_gt_tlb_inval_job_alloc_dep(ijob);
+ if (err)
+ return err;
+ }
+
+ if (mjob) {
+ err = xe_gt_tlb_inval_job_alloc_dep(mjob);
+ if (err)
+ return err;
+ }
+ }
+
return err;
}
@@ -1339,7 +1355,8 @@ static int xe_pt_pre_commit(struct xe_migrate_pt_update *pt_update)
struct xe_vm_pgtable_update_ops *pt_update_ops =
&vops->pt_update_ops[pt_update->tile_id];
- return xe_pt_vm_dependencies(pt_update->job, vm, pt_update->vops,
+ return xe_pt_vm_dependencies(pt_update->job, pt_update->ijob,
+ pt_update->mjob, vm, pt_update->vops,
pt_update_ops, rftree);
}
@@ -1509,75 +1526,6 @@ static int xe_pt_svm_pre_commit(struct xe_migrate_pt_update *pt_update)
}
#endif
-struct invalidation_fence {
- struct xe_gt_tlb_invalidation_fence base;
- struct xe_gt *gt;
- struct dma_fence *fence;
- struct dma_fence_cb cb;
- struct work_struct work;
- u64 start;
- u64 end;
- u32 asid;
-};
-
-static void invalidation_fence_cb(struct dma_fence *fence,
- struct dma_fence_cb *cb)
-{
- struct invalidation_fence *ifence =
- container_of(cb, struct invalidation_fence, cb);
- struct xe_device *xe = gt_to_xe(ifence->gt);
-
- trace_xe_gt_tlb_invalidation_fence_cb(xe, &ifence->base);
- if (!ifence->fence->error) {
- queue_work(system_wq, &ifence->work);
- } else {
- ifence->base.base.error = ifence->fence->error;
- xe_gt_tlb_invalidation_fence_signal(&ifence->base);
- }
- dma_fence_put(ifence->fence);
-}
-
-static void invalidation_fence_work_func(struct work_struct *w)
-{
- struct invalidation_fence *ifence =
- container_of(w, struct invalidation_fence, work);
- struct xe_device *xe = gt_to_xe(ifence->gt);
-
- trace_xe_gt_tlb_invalidation_fence_work_func(xe, &ifence->base);
- xe_gt_tlb_invalidation_range(ifence->gt, &ifence->base, ifence->start,
- ifence->end, ifence->asid);
-}
-
-static void invalidation_fence_init(struct xe_gt *gt,
- struct invalidation_fence *ifence,
- struct dma_fence *fence,
- u64 start, u64 end, u32 asid)
-{
- int ret;
-
- trace_xe_gt_tlb_invalidation_fence_create(gt_to_xe(gt), &ifence->base);
-
- xe_gt_tlb_invalidation_fence_init(gt, &ifence->base, false);
-
- ifence->fence = fence;
- ifence->gt = gt;
- ifence->start = start;
- ifence->end = end;
- ifence->asid = asid;
-
- INIT_WORK(&ifence->work, invalidation_fence_work_func);
- ret = dma_fence_add_callback(fence, &ifence->cb, invalidation_fence_cb);
- if (ret == -ENOENT) {
- dma_fence_put(ifence->fence); /* Usually dropped in CB */
- invalidation_fence_work_func(&ifence->work);
- } else if (ret) {
- dma_fence_put(&ifence->base.base); /* Caller ref */
- dma_fence_put(&ifence->base.base); /* Creation ref */
- }
-
- xe_gt_assert(gt, !ret || ret == -ENOENT);
-}
-
struct xe_pt_stage_unbind_walk {
/** @base: The pagewalk base-class. */
struct xe_pt_walk base;
@@ -2407,8 +2355,8 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
struct xe_vm *vm = vops->vm;
struct xe_vm_pgtable_update_ops *pt_update_ops =
&vops->pt_update_ops[tile->id];
- struct dma_fence *fence;
- struct invalidation_fence *ifence = NULL, *mfence = NULL;
+ struct dma_fence *fence, *ifence, *mfence;
+ struct xe_gt_tlb_inval_job *ijob = NULL, *mjob = NULL;
struct dma_fence **fences = NULL;
struct dma_fence_array *cf = NULL;
struct xe_range_fence *rfence;
@@ -2440,34 +2388,47 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
#endif
if (pt_update_ops->needs_invalidation) {
- ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
- if (!ifence) {
- err = -ENOMEM;
+ ijob = xe_gt_tlb_inval_job_create(pt_update_ops->q,
+ tile->primary_gt,
+ pt_update_ops->start,
+ pt_update_ops->last,
+ vm->usm.asid);
+
+ if (IS_ERR(ijob)) {
+ err = PTR_ERR(ijob);
goto kill_vm_tile1;
}
+
if (tile->media_gt) {
- mfence = kzalloc(sizeof(*ifence), GFP_KERNEL);
- if (!mfence) {
- err = -ENOMEM;
- goto free_ifence;
+ mjob = xe_gt_tlb_inval_job_create(pt_update_ops->q,
+ tile->media_gt,
+ pt_update_ops->start,
+ pt_update_ops->last,
+ vm->usm.asid);
+ if (IS_ERR(mjob)) {
+ err = PTR_ERR(mjob);
+ goto free_ijob;
}
fences = kmalloc_array(2, sizeof(*fences), GFP_KERNEL);
if (!fences) {
err = -ENOMEM;
- goto free_ifence;
+ goto free_ijob;
}
cf = dma_fence_array_alloc(2);
if (!cf) {
err = -ENOMEM;
- goto free_ifence;
+ goto free_ijob;
}
}
+
+ update.ijob = ijob;
+ update.mjob = mjob;
}
rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
if (!rfence) {
err = -ENOMEM;
- goto free_ifence;
+ goto free_ijob;
}
fence = xe_migrate_update_pgtables(tile->migrate, &update);
@@ -2491,30 +2452,30 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
pt_update_ops->last, fence))
dma_fence_wait(fence, false);
- /* tlb invalidation must be done before signaling rebind */
- if (ifence) {
- if (mfence)
- dma_fence_get(fence);
- invalidation_fence_init(tile->primary_gt, ifence, fence,
- pt_update_ops->start,
- pt_update_ops->last, vm->usm.asid);
- if (mfence) {
- invalidation_fence_init(tile->media_gt, mfence, fence,
- pt_update_ops->start,
- pt_update_ops->last, vm->usm.asid);
- fences[0] = &ifence->base.base;
- fences[1] = &mfence->base.base;
+ if (ijob) {
+ struct dma_fence *__fence;
+
+ ifence = xe_gt_tlb_inval_job_push(ijob, tile->migrate, fence);
+ __fence = ifence;
+
+ if (mjob) {
+ fences[0] = ifence;
+ mfence = xe_gt_tlb_inval_job_push(mjob, tile->migrate,
+ fence);
+ fences[1] = mfence;
+
dma_fence_array_init(cf, 2, fences,
vm->composite_fence_ctx,
vm->composite_fence_seqno++,
false);
- fence = &cf->base;
- } else {
- fence = &ifence->base.base;
+ __fence = &cf->base;
}
+
+ dma_fence_put(fence);
+ fence = __fence;
}
- if (!mfence) {
+ if (!mjob) {
dma_resv_add_fence(xe_vm_resv(vm), fence,
pt_update_ops->wait_vm_bookkeep ?
DMA_RESV_USAGE_KERNEL :
@@ -2523,19 +2484,19 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
list_for_each_entry(op, &vops->list, link)
op_commit(vops->vm, tile, pt_update_ops, op, fence, NULL);
} else {
- dma_resv_add_fence(xe_vm_resv(vm), &ifence->base.base,
+ dma_resv_add_fence(xe_vm_resv(vm), ifence,
pt_update_ops->wait_vm_bookkeep ?
DMA_RESV_USAGE_KERNEL :
DMA_RESV_USAGE_BOOKKEEP);
- dma_resv_add_fence(xe_vm_resv(vm), &mfence->base.base,
+ dma_resv_add_fence(xe_vm_resv(vm), mfence,
pt_update_ops->wait_vm_bookkeep ?
DMA_RESV_USAGE_KERNEL :
DMA_RESV_USAGE_BOOKKEEP);
list_for_each_entry(op, &vops->list, link)
- op_commit(vops->vm, tile, pt_update_ops, op,
- &ifence->base.base, &mfence->base.base);
+ op_commit(vops->vm, tile, pt_update_ops, op, ifence,
+ mfence);
}
if (pt_update_ops->needs_svm_lock)
@@ -2543,15 +2504,18 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
if (pt_update_ops->needs_userptr_lock)
up_read(&vm->userptr.notifier_lock);
+ xe_gt_tlb_inval_job_put(mjob);
+ xe_gt_tlb_inval_job_put(ijob);
+
return fence;
free_rfence:
kfree(rfence);
-free_ifence:
+free_ijob:
kfree(cf);
kfree(fences);
- kfree(mfence);
- kfree(ifence);
+ xe_gt_tlb_inval_job_put(mjob);
+ xe_gt_tlb_inval_job_put(ijob);
kill_vm_tile1:
if (err != -EAGAIN && err != -ENODATA && tile->id)
xe_vm_kill(vops->vm, false);
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 9/9] drm/xe: Remove unused GT TLB invalidation trace points
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
` (7 preceding siblings ...)
2025-07-02 23:42 ` [PATCH v2 8/9] drm/xe: Use GT TLB invalidation jobs in PT layer Matthew Brost
@ 2025-07-02 23:42 ` Matthew Brost
2025-07-11 21:13 ` Summers, Stuart
2025-07-03 0:45 ` ✗ CI.checkpatch: warning for Use DRM scheduler for delayed GT TLB invalidations (rev2) Patchwork
` (4 subsequent siblings)
13 siblings, 1 reply; 45+ messages in thread
From: Matthew Brost @ 2025-07-02 23:42 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, maarten.lankhorst
Remove GT TLB invalidation trace points left unused after converting to GT
TLB invalidation jobs. The removed tracepoints were only used during early
bring-up of the unstable driver; with a stable driver there is no need to
replace them with new tracepoints.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_trace.h | 16 ----------------
1 file changed, 16 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index b4a3577df70c..21486a6f693a 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -45,22 +45,6 @@ DECLARE_EVENT_CLASS(xe_gt_tlb_invalidation_fence,
__get_str(dev), __entry->fence, __entry->seqno)
);
-DEFINE_EVENT(xe_gt_tlb_invalidation_fence, xe_gt_tlb_invalidation_fence_create,
- TP_PROTO(struct xe_device *xe, struct xe_gt_tlb_invalidation_fence *fence),
- TP_ARGS(xe, fence)
-);
-
-DEFINE_EVENT(xe_gt_tlb_invalidation_fence,
- xe_gt_tlb_invalidation_fence_work_func,
- TP_PROTO(struct xe_device *xe, struct xe_gt_tlb_invalidation_fence *fence),
- TP_ARGS(xe, fence)
-);
-
-DEFINE_EVENT(xe_gt_tlb_invalidation_fence, xe_gt_tlb_invalidation_fence_cb,
- TP_PROTO(struct xe_device *xe, struct xe_gt_tlb_invalidation_fence *fence),
- TP_ARGS(xe, fence)
-);
-
DEFINE_EVENT(xe_gt_tlb_invalidation_fence, xe_gt_tlb_invalidation_fence_send,
TP_PROTO(struct xe_device *xe, struct xe_gt_tlb_invalidation_fence *fence),
TP_ARGS(xe, fence)
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* ✗ CI.checkpatch: warning for Use DRM scheduler for delayed GT TLB invalidations (rev2)
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
` (8 preceding siblings ...)
2025-07-02 23:42 ` [PATCH v2 9/9] drm/xe: Remove unused GT TLB invalidation trace points Matthew Brost
@ 2025-07-03 0:45 ` Patchwork
2025-07-03 0:46 ` ✓ CI.KUnit: success " Patchwork
` (3 subsequent siblings)
13 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2025-07-03 0:45 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
== Series Details ==
Series: Use DRM scheduler for delayed GT TLB invalidations (rev2)
URL : https://patchwork.freedesktop.org/series/150402/
State : warning
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
f8ff75ae1d2127635239b134695774ed4045d05b
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit ab779db6c849a20cce551a3657118e501035461e
Author: Matthew Brost <matthew.brost@intel.com>
Date: Wed Jul 2 16:42:22 2025 -0700
drm/xe: Remove unused GT TLB invalidation trace points
Remove GT TLB invalidation trace points left unused after converting to GT
TLB invalidation jobs. The removed tracepoints were only used during early
bring-up of the unstable driver; with a stable driver there is no need to
replace them with new tracepoints.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
+ /mt/dim checkpatch d04a54cd3b99001adbc4cd3305b44f9f3e658407 drm-intel
98c94bb20348 drm/xe: Explicitly mark migration queues with flag
07bd59ab265f drm/xe: Add generic dependency jobs / scheduler
-:30: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#30:
new file mode 100644
total: 0 errors, 1 warnings, 0 checks, 202 lines checked
95c78be3a531 drm: Simplify drmm_alloc_ordered_workqueue return
7da0b8367f16 drm/xe: Create ordered workqueue for GT TLB invalidation jobs
f3469f546bcc drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues
b4b49a64b967 drm/xe: Add xe_migrate_job_lock/unlock helpers
7cf9477334c9 drm/xe: Add GT TLB invalidation jobs
-:31: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#31:
new file mode 100644
total: 0 errors, 1 warnings, 0 checks, 312 lines checked
adb50df92929 drm/xe: Use GT TLB invalidation jobs in PT layer
ab779db6c849 drm/xe: Remove unused GT TLB invalidation trace points
^ permalink raw reply [flat|nested] 45+ messages in thread
* ✓ CI.KUnit: success for Use DRM scheduler for delayed GT TLB invalidations (rev2)
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
` (9 preceding siblings ...)
2025-07-03 0:45 ` ✗ CI.checkpatch: warning for Use DRM scheduler for delayed GT TLB invalidations (rev2) Patchwork
@ 2025-07-03 0:46 ` Patchwork
2025-07-03 1:00 ` ✗ CI.checksparse: warning " Patchwork
` (2 subsequent siblings)
13 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2025-07-03 0:46 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
== Series Details ==
Series: Use DRM scheduler for delayed GT TLB invalidations (rev2)
URL : https://patchwork.freedesktop.org/series/150402/
State : success
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[00:45:17] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[00:45:22] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[00:45:48] Starting KUnit Kernel (1/1)...
[00:45:48] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[00:45:49] ================== guc_buf (11 subtests) ===================
[00:45:49] [PASSED] test_smallest
[00:45:49] [PASSED] test_largest
[00:45:49] [PASSED] test_granular
[00:45:49] [PASSED] test_unique
[00:45:49] [PASSED] test_overlap
[00:45:49] [PASSED] test_reusable
[00:45:49] [PASSED] test_too_big
[00:45:49] [PASSED] test_flush
[00:45:49] [PASSED] test_lookup
[00:45:49] [PASSED] test_data
[00:45:49] [PASSED] test_class
[00:45:49] ===================== [PASSED] guc_buf =====================
[00:45:49] =================== guc_dbm (7 subtests) ===================
[00:45:49] [PASSED] test_empty
[00:45:49] [PASSED] test_default
[00:45:49] ======================== test_size ========================
[00:45:49] [PASSED] 4
[00:45:49] [PASSED] 8
[00:45:49] [PASSED] 32
[00:45:49] [PASSED] 256
[00:45:49] ==================== [PASSED] test_size ====================
[00:45:49] ======================= test_reuse ========================
[00:45:49] [PASSED] 4
[00:45:49] [PASSED] 8
[00:45:49] [PASSED] 32
[00:45:49] [PASSED] 256
[00:45:49] =================== [PASSED] test_reuse ====================
[00:45:49] =================== test_range_overlap ====================
[00:45:49] [PASSED] 4
[00:45:49] [PASSED] 8
[00:45:49] [PASSED] 32
[00:45:49] [PASSED] 256
[00:45:49] =============== [PASSED] test_range_overlap ================
[00:45:49] =================== test_range_compact ====================
[00:45:49] [PASSED] 4
[00:45:49] [PASSED] 8
[00:45:49] [PASSED] 32
[00:45:49] [PASSED] 256
[00:45:49] =============== [PASSED] test_range_compact ================
[00:45:49] ==================== test_range_spare =====================
[00:45:49] [PASSED] 4
[00:45:49] [PASSED] 8
[00:45:49] [PASSED] 32
[00:45:49] [PASSED] 256
[00:45:49] ================ [PASSED] test_range_spare =================
[00:45:49] ===================== [PASSED] guc_dbm =====================
[00:45:49] =================== guc_idm (6 subtests) ===================
[00:45:49] [PASSED] bad_init
[00:45:49] [PASSED] no_init
[00:45:49] [PASSED] init_fini
[00:45:49] [PASSED] check_used
[00:45:49] [PASSED] check_quota
[00:45:49] [PASSED] check_all
[00:45:49] ===================== [PASSED] guc_idm =====================
[00:45:49] ================== no_relay (3 subtests) ===================
[00:45:49] [PASSED] xe_drops_guc2pf_if_not_ready
[00:45:49] [PASSED] xe_drops_guc2vf_if_not_ready
[00:45:49] [PASSED] xe_rejects_send_if_not_ready
[00:45:49] ==================== [PASSED] no_relay =====================
[00:45:49] ================== pf_relay (14 subtests) ==================
[00:45:49] [PASSED] pf_rejects_guc2pf_too_short
[00:45:49] [PASSED] pf_rejects_guc2pf_too_long
[00:45:49] [PASSED] pf_rejects_guc2pf_no_payload
[00:45:49] [PASSED] pf_fails_no_payload
[00:45:49] [PASSED] pf_fails_bad_origin
[00:45:49] [PASSED] pf_fails_bad_type
[00:45:49] [PASSED] pf_txn_reports_error
[00:45:49] [PASSED] pf_txn_sends_pf2guc
[00:45:49] [PASSED] pf_sends_pf2guc
[00:45:49] [SKIPPED] pf_loopback_nop
[00:45:49] [SKIPPED] pf_loopback_echo
[00:45:49] [SKIPPED] pf_loopback_fail
[00:45:49] [SKIPPED] pf_loopback_busy
[00:45:49] [SKIPPED] pf_loopback_retry
[00:45:49] ==================== [PASSED] pf_relay =====================
[00:45:49] ================== vf_relay (3 subtests) ===================
[00:45:49] [PASSED] vf_rejects_guc2vf_too_short
[00:45:49] [PASSED] vf_rejects_guc2vf_too_long
[00:45:49] [PASSED] vf_rejects_guc2vf_no_payload
[00:45:49] ==================== [PASSED] vf_relay =====================
[00:45:49] ================= pf_service (11 subtests) =================
[00:45:49] [PASSED] pf_negotiate_any
[00:45:49] [PASSED] pf_negotiate_base_match
[00:45:49] [PASSED] pf_negotiate_base_newer
[00:45:49] [PASSED] pf_negotiate_base_next
[00:45:49] [SKIPPED] pf_negotiate_base_older
[00:45:49] [PASSED] pf_negotiate_base_prev
[00:45:49] [PASSED] pf_negotiate_latest_match
[00:45:49] [PASSED] pf_negotiate_latest_newer
[00:45:49] [PASSED] pf_negotiate_latest_next
[00:45:49] [SKIPPED] pf_negotiate_latest_older
[00:45:49] [SKIPPED] pf_negotiate_latest_prev
[00:45:49] =================== [PASSED] pf_service ====================
[00:45:49] ===================== lmtt (1 subtest) =====================
[00:45:49] ======================== test_ops =========================
[00:45:49] [PASSED] 2-level
[00:45:49] [PASSED] multi-level
[00:45:49] ==================== [PASSED] test_ops =====================
[00:45:49] ====================== [PASSED] lmtt =======================
[00:45:49] =================== xe_mocs (2 subtests) ===================
[00:45:49] ================ xe_live_mocs_kernel_kunit ================
[00:45:49] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[00:45:49] ================ xe_live_mocs_reset_kunit =================
[00:45:49] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[00:45:49] ==================== [SKIPPED] xe_mocs =====================
[00:45:49] ================= xe_migrate (2 subtests) ==================
[00:45:49] ================= xe_migrate_sanity_kunit =================
[00:45:49] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[00:45:49] ================== xe_validate_ccs_kunit ==================
[00:45:49] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[00:45:49] =================== [SKIPPED] xe_migrate ===================
[00:45:49] ================== xe_dma_buf (1 subtest) ==================
[00:45:49] ==================== xe_dma_buf_kunit =====================
[00:45:49] ================ [SKIPPED] xe_dma_buf_kunit ================
[00:45:49] =================== [SKIPPED] xe_dma_buf ===================
[00:45:49] ================= xe_bo_shrink (1 subtest) =================
[00:45:49] =================== xe_bo_shrink_kunit ====================
[00:45:49] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[00:45:49] ================== [SKIPPED] xe_bo_shrink ==================
[00:45:49] ==================== xe_bo (2 subtests) ====================
[00:45:49] ================== xe_ccs_migrate_kunit ===================
[00:45:49] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[00:45:49] ==================== xe_bo_evict_kunit ====================
[00:45:49] =============== [SKIPPED] xe_bo_evict_kunit ================
[00:45:49] ===================== [SKIPPED] xe_bo ======================
[00:45:49] ==================== args (11 subtests) ====================
[00:45:49] [PASSED] count_args_test
[00:45:49] [PASSED] call_args_example
[00:45:49] [PASSED] call_args_test
[00:45:49] [PASSED] drop_first_arg_example
[00:45:49] [PASSED] drop_first_arg_test
[00:45:49] [PASSED] first_arg_example
[00:45:49] [PASSED] first_arg_test
[00:45:49] [PASSED] last_arg_example
[00:45:49] [PASSED] last_arg_test
[00:45:49] [PASSED] pick_arg_example
[00:45:49] [PASSED] sep_comma_example
[00:45:49] ====================== [PASSED] args =======================
[00:45:49] =================== xe_pci (3 subtests) ====================
[00:45:49] ==================== check_graphics_ip ====================
[00:45:49] [PASSED] 12.70 Xe_LPG
[00:45:49] [PASSED] 12.71 Xe_LPG
[00:45:49] [PASSED] 12.74 Xe_LPG+
[00:45:49] [PASSED] 20.01 Xe2_HPG
[00:45:49] [PASSED] 20.02 Xe2_HPG
[00:45:49] [PASSED] 20.04 Xe2_LPG
[00:45:49] [PASSED] 30.00 Xe3_LPG
[00:45:49] [PASSED] 30.01 Xe3_LPG
[00:45:49] [PASSED] 30.03 Xe3_LPG
[00:45:49] ================ [PASSED] check_graphics_ip ================
[00:45:49] ===================== check_media_ip ======================
[00:45:49] [PASSED] 13.00 Xe_LPM+
[00:45:49] [PASSED] 13.01 Xe2_HPM
[00:45:49] [PASSED] 20.00 Xe2_LPM
[00:45:49] [PASSED] 30.00 Xe3_LPM
[00:45:49] [PASSED] 30.02 Xe3_LPM
[00:45:49] ================= [PASSED] check_media_ip ==================
[00:45:49] ================= check_platform_gt_count =================
[00:45:49] [PASSED] 0x9A60 (TIGERLAKE)
[00:45:49] [PASSED] 0x9A68 (TIGERLAKE)
[00:45:49] [PASSED] 0x9A70 (TIGERLAKE)
[00:45:49] [PASSED] 0x9A40 (TIGERLAKE)
[00:45:49] [PASSED] 0x9A49 (TIGERLAKE)
[00:45:49] [PASSED] 0x9A59 (TIGERLAKE)
[00:45:49] [PASSED] 0x9A78 (TIGERLAKE)
[00:45:49] [PASSED] 0x9AC0 (TIGERLAKE)
[00:45:49] [PASSED] 0x9AC9 (TIGERLAKE)
[00:45:49] [PASSED] 0x9AD9 (TIGERLAKE)
[00:45:49] [PASSED] 0x9AF8 (TIGERLAKE)
[00:45:49] [PASSED] 0x4C80 (ROCKETLAKE)
[00:45:49] [PASSED] 0x4C8A (ROCKETLAKE)
[00:45:49] [PASSED] 0x4C8B (ROCKETLAKE)
[00:45:49] [PASSED] 0x4C8C (ROCKETLAKE)
[00:45:49] [PASSED] 0x4C90 (ROCKETLAKE)
[00:45:49] [PASSED] 0x4C9A (ROCKETLAKE)
[00:45:49] [PASSED] 0x4680 (ALDERLAKE_S)
[00:45:49] [PASSED] 0x4682 (ALDERLAKE_S)
[00:45:49] [PASSED] 0x4688 (ALDERLAKE_S)
[00:45:49] [PASSED] 0x468A (ALDERLAKE_S)
[00:45:49] [PASSED] 0x468B (ALDERLAKE_S)
[00:45:49] [PASSED] 0x4690 (ALDERLAKE_S)
[00:45:49] [PASSED] 0x4692 (ALDERLAKE_S)
[00:45:49] [PASSED] 0x4693 (ALDERLAKE_S)
[00:45:49] [PASSED] 0x46A0 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46A1 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46A2 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46A3 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46A6 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46A8 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46AA (ALDERLAKE_P)
[00:45:49] [PASSED] 0x462A (ALDERLAKE_P)
[00:45:49] [PASSED] 0x4626 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x4628 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46B0 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46B1 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46B2 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46B3 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46C0 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46C1 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46C2 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46C3 (ALDERLAKE_P)
[00:45:49] [PASSED] 0x46D0 (ALDERLAKE_N)
[00:45:49] [PASSED] 0x46D1 (ALDERLAKE_N)
[00:45:49] [PASSED] 0x46D2 (ALDERLAKE_N)
[00:45:49] [PASSED] 0x46D3 (ALDERLAKE_N)
[00:45:49] [PASSED] 0x46D4 (ALDERLAKE_N)
[00:45:49] [PASSED] 0xA721 (ALDERLAKE_P)
[00:45:49] [PASSED] 0xA7A1 (ALDERLAKE_P)
[00:45:49] [PASSED] 0xA7A9 (ALDERLAKE_P)
[00:45:49] [PASSED] 0xA7AC (ALDERLAKE_P)
[00:45:49] [PASSED] 0xA7AD (ALDERLAKE_P)
[00:45:49] [PASSED] 0xA720 (ALDERLAKE_P)
[00:45:49] [PASSED] 0xA7A0 (ALDERLAKE_P)
[00:45:49] [PASSED] 0xA7A8 (ALDERLAKE_P)
[00:45:49] [PASSED] 0xA7AA (ALDERLAKE_P)
[00:45:49] [PASSED] 0xA7AB (ALDERLAKE_P)
[00:45:49] [PASSED] 0xA780 (ALDERLAKE_S)
[00:45:49] [PASSED] 0xA781 (ALDERLAKE_S)
[00:45:49] [PASSED] 0xA782 (ALDERLAKE_S)
[00:45:49] [PASSED] 0xA783 (ALDERLAKE_S)
[00:45:49] [PASSED] 0xA788 (ALDERLAKE_S)
[00:45:49] [PASSED] 0xA789 (ALDERLAKE_S)
[00:45:49] [PASSED] 0xA78A (ALDERLAKE_S)
[00:45:49] [PASSED] 0xA78B (ALDERLAKE_S)
[00:45:49] [PASSED] 0x4905 (DG1)
[00:45:49] [PASSED] 0x4906 (DG1)
[00:45:49] [PASSED] 0x4907 (DG1)
[00:45:49] [PASSED] 0x4908 (DG1)
[00:45:49] [PASSED] 0x4909 (DG1)
[00:45:49] [PASSED] 0x56C0 (DG2)
[00:45:49] [PASSED] 0x56C2 (DG2)
[00:45:49] [PASSED] 0x56C1 (DG2)
[00:45:49] [PASSED] 0x7D51 (METEORLAKE)
[00:45:49] [PASSED] 0x7DD1 (METEORLAKE)
[00:45:49] [PASSED] 0x7D41 (METEORLAKE)
[00:45:49] [PASSED] 0x7D67 (METEORLAKE)
[00:45:49] [PASSED] 0xB640 (METEORLAKE)
[00:45:49] [PASSED] 0x56A0 (DG2)
[00:45:49] [PASSED] 0x56A1 (DG2)
[00:45:49] [PASSED] 0x56A2 (DG2)
[00:45:49] [PASSED] 0x56BE (DG2)
[00:45:49] [PASSED] 0x56BF (DG2)
[00:45:49] [PASSED] 0x5690 (DG2)
[00:45:49] [PASSED] 0x5691 (DG2)
[00:45:49] [PASSED] 0x5692 (DG2)
[00:45:49] [PASSED] 0x56A5 (DG2)
[00:45:49] [PASSED] 0x56A6 (DG2)
[00:45:49] [PASSED] 0x56B0 (DG2)
[00:45:49] [PASSED] 0x56B1 (DG2)
[00:45:49] [PASSED] 0x56BA (DG2)
[00:45:49] [PASSED] 0x56BB (DG2)
[00:45:49] [PASSED] 0x56BC (DG2)
[00:45:49] [PASSED] 0x56BD (DG2)
[00:45:49] [PASSED] 0x5693 (DG2)
[00:45:49] [PASSED] 0x5694 (DG2)
[00:45:49] [PASSED] 0x5695 (DG2)
[00:45:49] [PASSED] 0x56A3 (DG2)
[00:45:49] [PASSED] 0x56A4 (DG2)
[00:45:49] [PASSED] 0x56B2 (DG2)
[00:45:49] [PASSED] 0x56B3 (DG2)
[00:45:49] [PASSED] 0x5696 (DG2)
[00:45:49] [PASSED] 0x5697 (DG2)
[00:45:49] [PASSED] 0xB69 (PVC)
[00:45:49] [PASSED] 0xB6E (PVC)
[00:45:49] [PASSED] 0xBD4 (PVC)
[00:45:49] [PASSED] 0xBD5 (PVC)
[00:45:49] [PASSED] 0xBD6 (PVC)
[00:45:49] [PASSED] 0xBD7 (PVC)
[00:45:49] [PASSED] 0xBD8 (PVC)
[00:45:49] [PASSED] 0xBD9 (PVC)
[00:45:49] [PASSED] 0xBDA (PVC)
[00:45:49] [PASSED] 0xBDB (PVC)
[00:45:49] [PASSED] 0xBE0 (PVC)
[00:45:49] [PASSED] 0xBE1 (PVC)
[00:45:49] [PASSED] 0xBE5 (PVC)
[00:45:49] [PASSED] 0x7D40 (METEORLAKE)
[00:45:49] [PASSED] 0x7D45 (METEORLAKE)
[00:45:49] [PASSED] 0x7D55 (METEORLAKE)
[00:45:49] [PASSED] 0x7D60 (METEORLAKE)
[00:45:49] [PASSED] 0x7DD5 (METEORLAKE)
[00:45:49] [PASSED] 0x6420 (LUNARLAKE)
[00:45:49] [PASSED] 0x64A0 (LUNARLAKE)
[00:45:49] [PASSED] 0x64B0 (LUNARLAKE)
[00:45:49] [PASSED] 0xE202 (BATTLEMAGE)
[00:45:49] [PASSED] 0xE20B (BATTLEMAGE)
[00:45:49] [PASSED] 0xE20C (BATTLEMAGE)
[00:45:49] [PASSED] 0xE20D (BATTLEMAGE)
[00:45:49] [PASSED] 0xE210 (BATTLEMAGE)
[00:45:49] [PASSED] 0xE211 (BATTLEMAGE)
[00:45:49] [PASSED] 0xE212 (BATTLEMAGE)
[00:45:49] [PASSED] 0xE216 (BATTLEMAGE)
[00:45:49] [PASSED] 0xE220 (BATTLEMAGE)
[00:45:49] [PASSED] 0xE221 (BATTLEMAGE)
[00:45:49] [PASSED] 0xE222 (BATTLEMAGE)
[00:45:49] [PASSED] 0xE223 (BATTLEMAGE)
[00:45:49] [PASSED] 0xB080 (PANTHERLAKE)
[00:45:49] [PASSED] 0xB081 (PANTHERLAKE)
[00:45:49] [PASSED] 0xB082 (PANTHERLAKE)
[00:45:49] [PASSED] 0xB083 (PANTHERLAKE)
[00:45:49] [PASSED] 0xB084 (PANTHERLAKE)
[00:45:49] [PASSED] 0xB085 (PANTHERLAKE)
[00:45:49] [PASSED] 0xB086 (PANTHERLAKE)
[00:45:49] [PASSED] 0xB087 (PANTHERLAKE)
[00:45:49] [PASSED] 0xB08F (PANTHERLAKE)
[00:45:49] [PASSED] 0xB090 (PANTHERLAKE)
[00:45:49] [PASSED] 0xB0A0 (PANTHERLAKE)
[00:45:49] [PASSED] 0xB0B0 (PANTHERLAKE)
[00:45:49] [PASSED] 0xFD80 (PANTHERLAKE)
[00:45:49] [PASSED] 0xFD81 (PANTHERLAKE)
[00:45:49] ============= [PASSED] check_platform_gt_count =============
[00:45:49] ===================== [PASSED] xe_pci ======================
[00:45:49] =================== xe_rtp (2 subtests) ====================
[00:45:49] =============== xe_rtp_process_to_sr_tests ================
[00:45:49] [PASSED] coalesce-same-reg
[00:45:49] [PASSED] no-match-no-add
[00:45:49] [PASSED] match-or
[00:45:49] [PASSED] match-or-xfail
[00:45:49] [PASSED] no-match-no-add-multiple-rules
[00:45:49] [PASSED] two-regs-two-entries
[00:45:49] [PASSED] clr-one-set-other
[00:45:49] [PASSED] set-field
[00:45:49] [PASSED] conflict-duplicate
[00:45:49] [PASSED] conflict-not-disjoint
[00:45:49] [PASSED] conflict-reg-type
[00:45:49] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[00:45:49] ================== xe_rtp_process_tests ===================
[00:45:49] [PASSED] active1
[00:45:49] [PASSED] active2
[00:45:49] [PASSED] active-inactive
[00:45:49] [PASSED] inactive-active
[00:45:49] [PASSED] inactive-1st_or_active-inactive
[00:45:49] [PASSED] inactive-2nd_or_active-inactive
[00:45:49] [PASSED] inactive-last_or_active-inactive
[00:45:49] [PASSED] inactive-no_or_active-inactive
[00:45:49] ============== [PASSED] xe_rtp_process_tests ===============
[00:45:49] ===================== [PASSED] xe_rtp ======================
[00:45:49] ==================== xe_wa (1 subtest) =====================
[00:45:49] ======================== xe_wa_gt =========================
[00:45:49] [PASSED] TIGERLAKE (B0)
[00:45:49] [PASSED] DG1 (A0)
[00:45:49] [PASSED] DG1 (B0)
[00:45:49] [PASSED] ALDERLAKE_S (A0)
[00:45:49] [PASSED] ALDERLAKE_S (B0)
[00:45:49] [PASSED] ALDERLAKE_S (C0)
[00:45:49] [PASSED] ALDERLAKE_S (D0)
[00:45:49] [PASSED] ALDERLAKE_P (A0)
[00:45:49] [PASSED] ALDERLAKE_P (B0)
[00:45:49] [PASSED] ALDERLAKE_P (C0)
[00:45:49] [PASSED] ALDERLAKE_S_RPLS (D0)
[00:45:49] [PASSED] ALDERLAKE_P_RPLU (E0)
[00:45:49] [PASSED] DG2_G10 (C0)
[00:45:49] [PASSED] DG2_G11 (B1)
[00:45:49] [PASSED] DG2_G12 (A1)
[00:45:49] [PASSED] METEORLAKE (g:A0, m:A0)
[00:45:49] [PASSED] METEORLAKE (g:A0, m:A0)
[00:45:49] [PASSED] METEORLAKE (g:A0, m:A0)
[00:45:49] [PASSED] LUNARLAKE (g:A0, m:A0)
[00:45:49] [PASSED] LUNARLAKE (g:B0, m:A0)
[00:45:49] [PASSED] BATTLEMAGE (g:A0, m:A1)
[00:45:49] ==================== [PASSED] xe_wa_gt =====================
[00:45:49] ====================== [PASSED] xe_wa ======================
[00:45:49] ============================================================
[00:45:49] Testing complete. Ran 296 tests: passed: 280, skipped: 16
[00:45:49] Elapsed time: 31.426s total, 4.198s configuring, 26.862s building, 0.312s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[00:45:49] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[00:45:51] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[00:46:12] Starting KUnit Kernel (1/1)...
[00:46:12] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[00:46:12] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[00:46:12] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[00:46:12] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[00:46:12] =========== drm_validate_clone_mode (2 subtests) ===========
[00:46:12] ============== drm_test_check_in_clone_mode ===============
[00:46:12] [PASSED] in_clone_mode
[00:46:12] [PASSED] not_in_clone_mode
[00:46:12] ========== [PASSED] drm_test_check_in_clone_mode ===========
[00:46:12] =============== drm_test_check_valid_clones ===============
[00:46:12] [PASSED] not_in_clone_mode
[00:46:12] [PASSED] valid_clone
[00:46:12] [PASSED] invalid_clone
[00:46:12] =========== [PASSED] drm_test_check_valid_clones ===========
[00:46:12] ============= [PASSED] drm_validate_clone_mode =============
[00:46:12] ============= drm_validate_modeset (1 subtest) =============
[00:46:12] [PASSED] drm_test_check_connector_changed_modeset
[00:46:12] ============== [PASSED] drm_validate_modeset ===============
[00:46:12] ====== drm_test_bridge_get_current_state (2 subtests) ======
[00:46:12] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[00:46:12] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[00:46:12] ======== [PASSED] drm_test_bridge_get_current_state ========
[00:46:12] ====== drm_test_bridge_helper_reset_crtc (3 subtests) ======
[00:46:12] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[00:46:12] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[00:46:12] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[00:46:12] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[00:46:12] ============== drm_bridge_alloc (2 subtests) ===============
[00:46:12] [PASSED] drm_test_drm_bridge_alloc_basic
[00:46:12] [PASSED] drm_test_drm_bridge_alloc_get_put
[00:46:12] ================ [PASSED] drm_bridge_alloc =================
[00:46:12] ================== drm_buddy (7 subtests) ==================
[00:46:12] [PASSED] drm_test_buddy_alloc_limit
[00:46:12] [PASSED] drm_test_buddy_alloc_optimistic
[00:46:12] [PASSED] drm_test_buddy_alloc_pessimistic
[00:46:12] [PASSED] drm_test_buddy_alloc_pathological
[00:46:12] [PASSED] drm_test_buddy_alloc_contiguous
[00:46:12] [PASSED] drm_test_buddy_alloc_clear
[00:46:12] [PASSED] drm_test_buddy_alloc_range_bias
[00:46:12] ==================== [PASSED] drm_buddy ====================
[00:46:12] ============= drm_cmdline_parser (40 subtests) =============
[00:46:12] [PASSED] drm_test_cmdline_force_d_only
[00:46:12] [PASSED] drm_test_cmdline_force_D_only_dvi
[00:46:12] [PASSED] drm_test_cmdline_force_D_only_hdmi
[00:46:12] [PASSED] drm_test_cmdline_force_D_only_not_digital
[00:46:12] [PASSED] drm_test_cmdline_force_e_only
[00:46:12] [PASSED] drm_test_cmdline_res
[00:46:12] [PASSED] drm_test_cmdline_res_vesa
[00:46:12] [PASSED] drm_test_cmdline_res_vesa_rblank
[00:46:12] [PASSED] drm_test_cmdline_res_rblank
[00:46:12] [PASSED] drm_test_cmdline_res_bpp
[00:46:12] [PASSED] drm_test_cmdline_res_refresh
[00:46:12] [PASSED] drm_test_cmdline_res_bpp_refresh
[00:46:12] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[00:46:12] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[00:46:12] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[00:46:12] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[00:46:12] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[00:46:12] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[00:46:12] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[00:46:12] [PASSED] drm_test_cmdline_res_margins_force_on
[00:46:12] [PASSED] drm_test_cmdline_res_vesa_margins
[00:46:12] [PASSED] drm_test_cmdline_name
[00:46:12] [PASSED] drm_test_cmdline_name_bpp
[00:46:12] [PASSED] drm_test_cmdline_name_option
[00:46:12] [PASSED] drm_test_cmdline_name_bpp_option
[00:46:12] [PASSED] drm_test_cmdline_rotate_0
[00:46:12] [PASSED] drm_test_cmdline_rotate_90
[00:46:12] [PASSED] drm_test_cmdline_rotate_180
[00:46:12] [PASSED] drm_test_cmdline_rotate_270
[00:46:12] [PASSED] drm_test_cmdline_hmirror
[00:46:12] [PASSED] drm_test_cmdline_vmirror
[00:46:12] [PASSED] drm_test_cmdline_margin_options
[00:46:12] [PASSED] drm_test_cmdline_multiple_options
[00:46:12] [PASSED] drm_test_cmdline_bpp_extra_and_option
[00:46:12] [PASSED] drm_test_cmdline_extra_and_option
[00:46:12] [PASSED] drm_test_cmdline_freestanding_options
[00:46:12] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[00:46:12] [PASSED] drm_test_cmdline_panel_orientation
[00:46:12] ================ drm_test_cmdline_invalid =================
[00:46:12] [PASSED] margin_only
[00:46:12] [PASSED] interlace_only
[00:46:12] [PASSED] res_missing_x
[00:46:12] [PASSED] res_missing_y
[00:46:12] [PASSED] res_bad_y
[00:46:12] [PASSED] res_missing_y_bpp
[00:46:12] [PASSED] res_bad_bpp
[00:46:12] [PASSED] res_bad_refresh
[00:46:12] [PASSED] res_bpp_refresh_force_on_off
[00:46:12] [PASSED] res_invalid_mode
[00:46:12] [PASSED] res_bpp_wrong_place_mode
[00:46:12] [PASSED] name_bpp_refresh
[00:46:12] [PASSED] name_refresh
[00:46:12] [PASSED] name_refresh_wrong_mode
[00:46:12] [PASSED] name_refresh_invalid_mode
[00:46:12] [PASSED] rotate_multiple
[00:46:12] [PASSED] rotate_invalid_val
[00:46:12] [PASSED] rotate_truncated
[00:46:12] [PASSED] invalid_option
[00:46:12] [PASSED] invalid_tv_option
[00:46:12] [PASSED] truncated_tv_option
[00:46:12] ============ [PASSED] drm_test_cmdline_invalid =============
[00:46:12] =============== drm_test_cmdline_tv_options ===============
[00:46:12] [PASSED] NTSC
[00:46:12] [PASSED] NTSC_443
[00:46:12] [PASSED] NTSC_J
[00:46:12] [PASSED] PAL
[00:46:12] [PASSED] PAL_M
[00:46:12] [PASSED] PAL_N
[00:46:12] [PASSED] SECAM
[00:46:12] [PASSED] MONO_525
[00:46:12] [PASSED] MONO_625
[00:46:12] =========== [PASSED] drm_test_cmdline_tv_options ===========
[00:46:12] =============== [PASSED] drm_cmdline_parser ================
[00:46:12] ========== drmm_connector_hdmi_init (20 subtests) ==========
[00:46:12] [PASSED] drm_test_connector_hdmi_init_valid
[00:46:12] [PASSED] drm_test_connector_hdmi_init_bpc_8
[00:46:12] [PASSED] drm_test_connector_hdmi_init_bpc_10
[00:46:12] [PASSED] drm_test_connector_hdmi_init_bpc_12
[00:46:12] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[00:46:12] [PASSED] drm_test_connector_hdmi_init_bpc_null
[00:46:12] [PASSED] drm_test_connector_hdmi_init_formats_empty
[00:46:12] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[00:46:12] === drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[00:46:12] [PASSED] supported_formats=0x9 yuv420_allowed=1
[00:46:12] [PASSED] supported_formats=0x9 yuv420_allowed=0
[00:46:12] [PASSED] supported_formats=0x3 yuv420_allowed=1
[00:46:12] [PASSED] supported_formats=0x3 yuv420_allowed=0
[00:46:12] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[00:46:12] [PASSED] drm_test_connector_hdmi_init_null_ddc
[00:46:12] [PASSED] drm_test_connector_hdmi_init_null_product
[00:46:12] [PASSED] drm_test_connector_hdmi_init_null_vendor
[00:46:12] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[00:46:12] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[00:46:12] [PASSED] drm_test_connector_hdmi_init_product_valid
[00:46:12] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[00:46:12] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[00:46:12] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[00:46:12] ========= drm_test_connector_hdmi_init_type_valid =========
[00:46:12] [PASSED] HDMI-A
[00:46:12] [PASSED] HDMI-B
[00:46:12] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[00:46:12] ======== drm_test_connector_hdmi_init_type_invalid ========
[00:46:12] [PASSED] Unknown
[00:46:12] [PASSED] VGA
[00:46:12] [PASSED] DVI-I
[00:46:12] [PASSED] DVI-D
[00:46:12] [PASSED] DVI-A
[00:46:12] [PASSED] Composite
[00:46:12] [PASSED] SVIDEO
[00:46:12] [PASSED] LVDS
[00:46:12] [PASSED] Component
[00:46:12] [PASSED] DIN
[00:46:12] [PASSED] DP
[00:46:12] [PASSED] TV
[00:46:12] [PASSED] eDP
[00:46:12] [PASSED] Virtual
[00:46:12] [PASSED] DSI
[00:46:12] [PASSED] DPI
[00:46:12] [PASSED] Writeback
[00:46:12] [PASSED] SPI
[00:46:12] [PASSED] USB
[00:46:12] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[00:46:12] ============ [PASSED] drmm_connector_hdmi_init =============
[00:46:12] ============= drmm_connector_init (3 subtests) =============
[00:46:12] [PASSED] drm_test_drmm_connector_init
[00:46:12] [PASSED] drm_test_drmm_connector_init_null_ddc
[00:46:12] ========= drm_test_drmm_connector_init_type_valid =========
[00:46:12] [PASSED] Unknown
[00:46:12] [PASSED] VGA
[00:46:12] [PASSED] DVI-I
[00:46:12] [PASSED] DVI-D
[00:46:12] [PASSED] DVI-A
[00:46:12] [PASSED] Composite
[00:46:12] [PASSED] SVIDEO
[00:46:12] [PASSED] LVDS
[00:46:12] [PASSED] Component
[00:46:12] [PASSED] DIN
[00:46:12] [PASSED] DP
[00:46:12] [PASSED] HDMI-A
[00:46:12] [PASSED] HDMI-B
[00:46:12] [PASSED] TV
[00:46:12] [PASSED] eDP
[00:46:12] [PASSED] Virtual
[00:46:12] [PASSED] DSI
[00:46:12] [PASSED] DPI
[00:46:12] [PASSED] Writeback
[00:46:12] [PASSED] SPI
[00:46:12] [PASSED] USB
[00:46:12] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[00:46:12] =============== [PASSED] drmm_connector_init ===============
[00:46:12] ========= drm_connector_dynamic_init (6 subtests) ==========
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_init
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_init_properties
[00:46:12] ===== drm_test_drm_connector_dynamic_init_type_valid ======
[00:46:12] [PASSED] Unknown
[00:46:12] [PASSED] VGA
[00:46:12] [PASSED] DVI-I
[00:46:12] [PASSED] DVI-D
[00:46:12] [PASSED] DVI-A
[00:46:12] [PASSED] Composite
[00:46:12] [PASSED] SVIDEO
[00:46:12] [PASSED] LVDS
[00:46:12] [PASSED] Component
[00:46:12] [PASSED] DIN
[00:46:12] [PASSED] DP
[00:46:12] [PASSED] HDMI-A
[00:46:12] [PASSED] HDMI-B
[00:46:12] [PASSED] TV
[00:46:12] [PASSED] eDP
[00:46:12] [PASSED] Virtual
[00:46:12] [PASSED] DSI
[00:46:12] [PASSED] DPI
[00:46:12] [PASSED] Writeback
[00:46:12] [PASSED] SPI
[00:46:12] [PASSED] USB
[00:46:12] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[00:46:12] ======== drm_test_drm_connector_dynamic_init_name =========
[00:46:12] [PASSED] Unknown
[00:46:12] [PASSED] VGA
[00:46:12] [PASSED] DVI-I
[00:46:12] [PASSED] DVI-D
[00:46:12] [PASSED] DVI-A
[00:46:12] [PASSED] Composite
[00:46:12] [PASSED] SVIDEO
[00:46:12] [PASSED] LVDS
[00:46:12] [PASSED] Component
[00:46:12] [PASSED] DIN
[00:46:12] [PASSED] DP
[00:46:12] [PASSED] HDMI-A
[00:46:12] [PASSED] HDMI-B
[00:46:12] [PASSED] TV
[00:46:12] [PASSED] eDP
[00:46:12] [PASSED] Virtual
[00:46:12] [PASSED] DSI
[00:46:12] [PASSED] DPI
[00:46:12] [PASSED] Writeback
[00:46:12] [PASSED] SPI
[00:46:12] [PASSED] USB
[00:46:12] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[00:46:12] =========== [PASSED] drm_connector_dynamic_init ============
[00:46:12] ==== drm_connector_dynamic_register_early (4 subtests) =====
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[00:46:12] ====== [PASSED] drm_connector_dynamic_register_early =======
[00:46:12] ======= drm_connector_dynamic_register (7 subtests) ========
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[00:46:12] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[00:46:12] ========= [PASSED] drm_connector_dynamic_register ==========
[00:46:12] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[00:46:12] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[00:46:12] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[00:46:12] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[00:46:12] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[00:46:12] ========== drm_test_get_tv_mode_from_name_valid ===========
[00:46:12] [PASSED] NTSC
[00:46:12] [PASSED] NTSC-443
[00:46:12] [PASSED] NTSC-J
[00:46:12] [PASSED] PAL
[00:46:12] [PASSED] PAL-M
[00:46:12] [PASSED] PAL-N
[00:46:12] [PASSED] SECAM
[00:46:12] [PASSED] Mono
[00:46:12] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[00:46:12] [PASSED] drm_test_get_tv_mode_from_name_truncated
[00:46:12] ============ [PASSED] drm_get_tv_mode_from_name ============
[00:46:12] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[00:46:12] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[00:46:12] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[00:46:12] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[00:46:12] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[00:46:12] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[00:46:12] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[00:46:12] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid =
[00:46:12] [PASSED] VIC 96
[00:46:12] [PASSED] VIC 97
[00:46:12] [PASSED] VIC 101
[00:46:12] [PASSED] VIC 102
[00:46:12] [PASSED] VIC 106
[00:46:12] [PASSED] VIC 107
[00:46:12] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[00:46:12] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[00:46:12] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[00:46:12] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[00:46:12] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[00:46:12] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[00:46:12] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[00:46:12] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[00:46:12] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name ====
[00:46:12] [PASSED] Automatic
[00:46:12] [PASSED] Full
[00:46:12] [PASSED] Limited 16:235
[00:46:12] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[00:46:12] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[00:46:12] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[00:46:12] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[00:46:12] === drm_test_drm_hdmi_connector_get_output_format_name ====
[00:46:12] [PASSED] RGB
[00:46:12] [PASSED] YUV 4:2:0
[00:46:12] [PASSED] YUV 4:2:2
[00:46:12] [PASSED] YUV 4:4:4
[00:46:12] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[00:46:12] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[00:46:12] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[00:46:12] ============= drm_damage_helper (21 subtests) ==============
[00:46:12] [PASSED] drm_test_damage_iter_no_damage
[00:46:12] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[00:46:12] [PASSED] drm_test_damage_iter_no_damage_src_moved
[00:46:12] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[00:46:12] [PASSED] drm_test_damage_iter_no_damage_not_visible
[00:46:12] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[00:46:12] [PASSED] drm_test_damage_iter_no_damage_no_fb
[00:46:12] [PASSED] drm_test_damage_iter_simple_damage
[00:46:12] [PASSED] drm_test_damage_iter_single_damage
[00:46:12] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[00:46:12] [PASSED] drm_test_damage_iter_single_damage_outside_src
[00:46:12] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[00:46:12] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[00:46:12] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[00:46:12] [PASSED] drm_test_damage_iter_single_damage_src_moved
[00:46:12] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[00:46:12] [PASSED] drm_test_damage_iter_damage
[00:46:12] [PASSED] drm_test_damage_iter_damage_one_intersect
[00:46:12] [PASSED] drm_test_damage_iter_damage_one_outside
[00:46:12] [PASSED] drm_test_damage_iter_damage_src_moved
[00:46:12] [PASSED] drm_test_damage_iter_damage_not_visible
[00:46:12] ================ [PASSED] drm_damage_helper ================
[00:46:12] ============== drm_dp_mst_helper (3 subtests) ==============
[00:46:12] ============== drm_test_dp_mst_calc_pbn_mode ==============
[00:46:12] [PASSED] Clock 154000 BPP 30 DSC disabled
[00:46:12] [PASSED] Clock 234000 BPP 30 DSC disabled
[00:46:12] [PASSED] Clock 297000 BPP 24 DSC disabled
[00:46:12] [PASSED] Clock 332880 BPP 24 DSC enabled
[00:46:12] [PASSED] Clock 324540 BPP 24 DSC enabled
[00:46:12] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[00:46:12] ============== drm_test_dp_mst_calc_pbn_div ===============
[00:46:12] [PASSED] Link rate 2000000 lane count 4
[00:46:12] [PASSED] Link rate 2000000 lane count 2
[00:46:12] [PASSED] Link rate 2000000 lane count 1
[00:46:12] [PASSED] Link rate 1350000 lane count 4
[00:46:12] [PASSED] Link rate 1350000 lane count 2
[00:46:12] [PASSED] Link rate 1350000 lane count 1
[00:46:12] [PASSED] Link rate 1000000 lane count 4
[00:46:12] [PASSED] Link rate 1000000 lane count 2
[00:46:12] [PASSED] Link rate 1000000 lane count 1
[00:46:12] [PASSED] Link rate 810000 lane count 4
[00:46:12] [PASSED] Link rate 810000 lane count 2
[00:46:12] [PASSED] Link rate 810000 lane count 1
[00:46:12] [PASSED] Link rate 540000 lane count 4
[00:46:12] [PASSED] Link rate 540000 lane count 2
[00:46:12] [PASSED] Link rate 540000 lane count 1
[00:46:12] [PASSED] Link rate 270000 lane count 4
[00:46:12] [PASSED] Link rate 270000 lane count 2
[00:46:12] [PASSED] Link rate 270000 lane count 1
[00:46:12] [PASSED] Link rate 162000 lane count 4
[00:46:12] [PASSED] Link rate 162000 lane count 2
[00:46:12] [PASSED] Link rate 162000 lane count 1
[00:46:12] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[00:46:12] ========= drm_test_dp_mst_sideband_msg_req_decode =========
[00:46:12] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[00:46:12] [PASSED] DP_POWER_UP_PHY with port number
[00:46:12] [PASSED] DP_POWER_DOWN_PHY with port number
[00:46:12] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[00:46:12] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[00:46:12] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[00:46:12] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[00:46:12] [PASSED] DP_QUERY_PAYLOAD with port number
[00:46:12] [PASSED] DP_QUERY_PAYLOAD with VCPI
[00:46:12] [PASSED] DP_REMOTE_DPCD_READ with port number
[00:46:12] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[00:46:12] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[00:46:12] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[00:46:12] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[00:46:12] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[00:46:12] [PASSED] DP_REMOTE_I2C_READ with port number
[00:46:12] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[00:46:12] [PASSED] DP_REMOTE_I2C_READ with transactions array
[00:46:12] [PASSED] DP_REMOTE_I2C_WRITE with port number
[00:46:12] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[00:46:12] [PASSED] DP_REMOTE_I2C_WRITE with data array
[00:46:12] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[00:46:12] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[00:46:12] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[00:46:12] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[00:46:12] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[00:46:12] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[00:46:12] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[00:46:12] ================ [PASSED] drm_dp_mst_helper ================
[00:46:12] ================== drm_exec (7 subtests) ===================
[00:46:12] [PASSED] sanitycheck
[00:46:12] [PASSED] test_lock
[00:46:12] [PASSED] test_lock_unlock
[00:46:12] [PASSED] test_duplicates
[00:46:12] [PASSED] test_prepare
[00:46:12] [PASSED] test_prepare_array
[00:46:12] [PASSED] test_multiple_loops
[00:46:12] ==================== [PASSED] drm_exec =====================
[00:46:12] =========== drm_format_helper_test (17 subtests) ===========
[00:46:12] ============== drm_test_fb_xrgb8888_to_gray8 ==============
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[00:46:12] ============= drm_test_fb_xrgb8888_to_rgb332 ==============
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[00:46:12] ============= drm_test_fb_xrgb8888_to_rgb565 ==============
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[00:46:12] ============ drm_test_fb_xrgb8888_to_xrgb1555 =============
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[00:46:12] ============ drm_test_fb_xrgb8888_to_argb1555 =============
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[00:46:12] ============ drm_test_fb_xrgb8888_to_rgba5551 =============
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[00:46:12] ============= drm_test_fb_xrgb8888_to_rgb888 ==============
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[00:46:12] ============= drm_test_fb_xrgb8888_to_bgr888 ==============
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[00:46:12] ============ drm_test_fb_xrgb8888_to_argb8888 =============
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[00:46:12] =========== drm_test_fb_xrgb8888_to_xrgb2101010 ===========
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[00:46:12] =========== drm_test_fb_xrgb8888_to_argb2101010 ===========
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[00:46:12] ============== drm_test_fb_xrgb8888_to_mono ===============
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[00:46:12] ==================== drm_test_fb_swab =====================
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ================ [PASSED] drm_test_fb_swab =================
[00:46:12] ============ drm_test_fb_xrgb8888_to_xbgr8888 =============
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[00:46:12] ============ drm_test_fb_xrgb8888_to_abgr8888 =============
[00:46:12] [PASSED] single_pixel_source_buffer
[00:46:12] [PASSED] single_pixel_clip_rectangle
[00:46:12] [PASSED] well_known_colors
[00:46:12] [PASSED] destination_pitch
[00:46:12] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[00:46:12] ================= drm_test_fb_clip_offset =================
[00:46:12] [PASSED] pass through
[00:46:12] [PASSED] horizontal offset
[00:46:12] [PASSED] vertical offset
[00:46:12] [PASSED] horizontal and vertical offset
[00:46:12] [PASSED] horizontal offset (custom pitch)
[00:46:12] [PASSED] vertical offset (custom pitch)
[00:46:12] [PASSED] horizontal and vertical offset (custom pitch)
[00:46:12] ============= [PASSED] drm_test_fb_clip_offset =============
[00:46:12] =================== drm_test_fb_memcpy ====================
[00:46:12] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[00:46:12] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[00:46:12] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[00:46:12] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[00:46:12] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[00:46:12] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[00:46:12] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[00:46:12] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[00:46:12] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[00:46:12] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[00:46:12] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[00:46:12] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[00:46:12] =============== [PASSED] drm_test_fb_memcpy ================
[00:46:12] ============= [PASSED] drm_format_helper_test ==============
[00:46:12] ================= drm_format (18 subtests) =================
[00:46:12] [PASSED] drm_test_format_block_width_invalid
[00:46:12] [PASSED] drm_test_format_block_width_one_plane
[00:46:12] [PASSED] drm_test_format_block_width_two_plane
[00:46:12] [PASSED] drm_test_format_block_width_three_plane
[00:46:12] [PASSED] drm_test_format_block_width_tiled
[00:46:12] [PASSED] drm_test_format_block_height_invalid
[00:46:12] [PASSED] drm_test_format_block_height_one_plane
[00:46:12] [PASSED] drm_test_format_block_height_two_plane
[00:46:12] [PASSED] drm_test_format_block_height_three_plane
[00:46:12] [PASSED] drm_test_format_block_height_tiled
[00:46:12] [PASSED] drm_test_format_min_pitch_invalid
[00:46:12] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[00:46:12] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[00:46:12] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[00:46:12] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[00:46:12] [PASSED] drm_test_format_min_pitch_two_plane
[00:46:12] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[00:46:12] [PASSED] drm_test_format_min_pitch_tiled
[00:46:12] =================== [PASSED] drm_format ====================
[00:46:12] ============== drm_framebuffer (10 subtests) ===============
[00:46:12] ========== drm_test_framebuffer_check_src_coords ==========
[00:46:12] [PASSED] Success: source fits into fb
[00:46:12] [PASSED] Fail: overflowing fb with x-axis coordinate
[00:46:12] [PASSED] Fail: overflowing fb with y-axis coordinate
[00:46:12] [PASSED] Fail: overflowing fb with source width
[00:46:12] [PASSED] Fail: overflowing fb with source height
[00:46:12] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[00:46:12] [PASSED] drm_test_framebuffer_cleanup
[00:46:12] =============== drm_test_framebuffer_create ===============
[00:46:12] [PASSED] ABGR8888 normal sizes
[00:46:12] [PASSED] ABGR8888 max sizes
[00:46:12] [PASSED] ABGR8888 pitch greater than min required
[00:46:12] [PASSED] ABGR8888 pitch less than min required
[00:46:12] [PASSED] ABGR8888 Invalid width
[00:46:12] [PASSED] ABGR8888 Invalid buffer handle
[00:46:12] [PASSED] No pixel format
[00:46:12] [PASSED] ABGR8888 Width 0
[00:46:12] [PASSED] ABGR8888 Height 0
[00:46:12] [PASSED] ABGR8888 Out of bound height * pitch combination
[00:46:12] [PASSED] ABGR8888 Large buffer offset
[00:46:12] [PASSED] ABGR8888 Buffer offset for inexistent plane
[00:46:12] [PASSED] ABGR8888 Invalid flag
[00:46:12] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[00:46:12] [PASSED] ABGR8888 Valid buffer modifier
[00:46:12] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[00:46:12] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[00:46:12] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[00:46:12] [PASSED] NV12 Normal sizes
[00:46:12] [PASSED] NV12 Max sizes
[00:46:12] [PASSED] NV12 Invalid pitch
[00:46:12] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[00:46:12] [PASSED] NV12 different modifier per-plane
[00:46:12] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[00:46:12] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[00:46:12] [PASSED] NV12 Modifier for inexistent plane
[00:46:12] [PASSED] NV12 Handle for inexistent plane
[00:46:12] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[00:46:12] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[00:46:12] [PASSED] YVU420 Normal sizes
[00:46:12] [PASSED] YVU420 Max sizes
[00:46:12] [PASSED] YVU420 Invalid pitch
[00:46:12] [PASSED] YVU420 Different pitches
[00:46:12] [PASSED] YVU420 Different buffer offsets/pitches
[00:46:12] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[00:46:12] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[00:46:12] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[00:46:12] [PASSED] YVU420 Valid modifier
[00:46:12] [PASSED] YVU420 Different modifiers per plane
[00:46:12] [PASSED] YVU420 Modifier for inexistent plane
[00:46:12] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[00:46:12] [PASSED] X0L2 Normal sizes
[00:46:12] [PASSED] X0L2 Max sizes
[00:46:12] [PASSED] X0L2 Invalid pitch
[00:46:12] [PASSED] X0L2 Pitch greater than minimum required
[00:46:12] [PASSED] X0L2 Handle for inexistent plane
[00:46:12] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[00:46:12] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[00:46:12] [PASSED] X0L2 Valid modifier
[00:46:12] [PASSED] X0L2 Modifier for inexistent plane
[00:46:12] =========== [PASSED] drm_test_framebuffer_create ===========
[00:46:12] [PASSED] drm_test_framebuffer_free
[00:46:12] [PASSED] drm_test_framebuffer_init
[00:46:12] [PASSED] drm_test_framebuffer_init_bad_format
[00:46:12] [PASSED] drm_test_framebuffer_init_dev_mismatch
[00:46:12] [PASSED] drm_test_framebuffer_lookup
[00:46:12] [PASSED] drm_test_framebuffer_lookup_inexistent
[00:46:12] [PASSED] drm_test_framebuffer_modifiers_not_supported
[00:46:12] ================= [PASSED] drm_framebuffer =================
[00:46:12] ================ drm_gem_shmem (8 subtests) ================
[00:46:12] [PASSED] drm_gem_shmem_test_obj_create
[00:46:12] [PASSED] drm_gem_shmem_test_obj_create_private
[00:46:12] [PASSED] drm_gem_shmem_test_pin_pages
[00:46:12] [PASSED] drm_gem_shmem_test_vmap
[00:46:12] [PASSED] drm_gem_shmem_test_get_pages_sgt
[00:46:12] [PASSED] drm_gem_shmem_test_get_sg_table
[00:46:12] [PASSED] drm_gem_shmem_test_madvise
[00:46:12] [PASSED] drm_gem_shmem_test_purge
[00:46:12] ================== [PASSED] drm_gem_shmem ==================
[00:46:12] === drm_atomic_helper_connector_hdmi_check (27 subtests) ===
[00:46:12] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[00:46:12] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[00:46:12] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[00:46:12] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[00:46:12] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[00:46:12] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[00:46:12] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420 =======
[00:46:12] [PASSED] Automatic
[00:46:12] [PASSED] Full
[00:46:12] [PASSED] Limited 16:235
[00:46:12] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[00:46:12] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[00:46:12] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[00:46:12] [PASSED] drm_test_check_disable_connector
[00:46:12] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[00:46:12] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[00:46:12] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[00:46:12] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[00:46:12] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[00:46:12] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[00:46:12] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[00:46:12] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[00:46:12] [PASSED] drm_test_check_output_bpc_dvi
[00:46:12] [PASSED] drm_test_check_output_bpc_format_vic_1
[00:46:12] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[00:46:12] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[00:46:12] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[00:46:12] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[00:46:12] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[00:46:12] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[00:46:12] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[00:46:12] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[00:46:12] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[00:46:12] [PASSED] drm_test_check_broadcast_rgb_value
[00:46:12] [PASSED] drm_test_check_bpc_8_value
[00:46:12] [PASSED] drm_test_check_bpc_10_value
[00:46:12] [PASSED] drm_test_check_bpc_12_value
[00:46:12] [PASSED] drm_test_check_format_value
[00:46:12] [PASSED] drm_test_check_tmds_char_value
[00:46:12] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[00:46:12] = drm_atomic_helper_connector_hdmi_mode_valid (4 subtests) =
[00:46:12] [PASSED] drm_test_check_mode_valid
[00:46:12] [PASSED] drm_test_check_mode_valid_reject
[00:46:12] [PASSED] drm_test_check_mode_valid_reject_rate
[00:46:12] [PASSED] drm_test_check_mode_valid_reject_max_clock
[00:46:12] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[00:46:12] ================= drm_managed (2 subtests) =================
[00:46:12] [PASSED] drm_test_managed_release_action
[00:46:12] [PASSED] drm_test_managed_run_action
[00:46:12] =================== [PASSED] drm_managed ===================
[00:46:12] =================== drm_mm (6 subtests) ====================
[00:46:12] [PASSED] drm_test_mm_init
[00:46:12] [PASSED] drm_test_mm_debug
[00:46:12] [PASSED] drm_test_mm_align32
[00:46:12] [PASSED] drm_test_mm_align64
[00:46:12] [PASSED] drm_test_mm_lowest
[00:46:12] [PASSED] drm_test_mm_highest
[00:46:12] ===================== [PASSED] drm_mm ======================
[00:46:12] ============= drm_modes_analog_tv (5 subtests) =============
[00:46:12] [PASSED] drm_test_modes_analog_tv_mono_576i
[00:46:12] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[00:46:12] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[00:46:12] [PASSED] drm_test_modes_analog_tv_pal_576i
[00:46:12] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[00:46:12] =============== [PASSED] drm_modes_analog_tv ===============
[00:46:12] ============== drm_plane_helper (2 subtests) ===============
[00:46:12] =============== drm_test_check_plane_state ================
[00:46:12] [PASSED] clipping_simple
[00:46:12] [PASSED] clipping_rotate_reflect
[00:46:12] [PASSED] positioning_simple
[00:46:12] [PASSED] upscaling
[00:46:12] [PASSED] downscaling
[00:46:12] [PASSED] rounding1
[00:46:12] [PASSED] rounding2
[00:46:12] [PASSED] rounding3
[00:46:12] [PASSED] rounding4
[00:46:12] =========== [PASSED] drm_test_check_plane_state ============
[00:46:12] =========== drm_test_check_invalid_plane_state ============
[00:46:12] [PASSED] positioning_invalid
[00:46:12] [PASSED] upscaling_invalid
[00:46:12] [PASSED] downscaling_invalid
[00:46:12] ======= [PASSED] drm_test_check_invalid_plane_state ========
[00:46:12] ================ [PASSED] drm_plane_helper =================
[00:46:12] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[00:46:12] ====== drm_test_connector_helper_tv_get_modes_check =======
[00:46:12] [PASSED] None
[00:46:12] [PASSED] PAL
[00:46:12] [PASSED] NTSC
[00:46:12] [PASSED] Both, NTSC Default
[00:46:12] [PASSED] Both, PAL Default
[00:46:12] [PASSED] Both, NTSC Default, with PAL on command-line
[00:46:12] [PASSED] Both, PAL Default, with NTSC on command-line
[00:46:12] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[00:46:12] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[00:46:12] ================== drm_rect (9 subtests) ===================
[00:46:12] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[00:46:12] [PASSED] drm_test_rect_clip_scaled_not_clipped
[00:46:12] [PASSED] drm_test_rect_clip_scaled_clipped
[00:46:12] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[00:46:12] ================= drm_test_rect_intersect =================
[00:46:12] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[00:46:12] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[00:46:12] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[00:46:12] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[00:46:12] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[00:46:12] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[00:46:12] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[00:46:12] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[00:46:12] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[00:46:12] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[00:46:12] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[00:46:12] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[00:46:12] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[00:46:12] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[00:46:12] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[00:46:12] ============= [PASSED] drm_test_rect_intersect =============
[00:46:12] ================ drm_test_rect_calc_hscale ================
[00:46:12] [PASSED] normal use
[00:46:12] [PASSED] out of max range
[00:46:12] [PASSED] out of min range
[00:46:12] [PASSED] zero dst
[00:46:12] [PASSED] negative src
[00:46:12] [PASSED] negative dst
[00:46:12] ============ [PASSED] drm_test_rect_calc_hscale ============
[00:46:12] ================ drm_test_rect_calc_vscale ================
[00:46:12] [PASSED] normal use
[00:46:12] [PASSED] out of max range
[00:46:12] [PASSED] out of min range
[00:46:12] [PASSED] zero dst
[00:46:12] [PASSED] negative src
[00:46:12] [PASSED] negative dst
[00:46:12] ============ [PASSED] drm_test_rect_calc_vscale ============
[00:46:12] ================== drm_test_rect_rotate ===================
[00:46:12] [PASSED] reflect-x
[00:46:12] [PASSED] reflect-y
[00:46:12] [PASSED] rotate-0
[00:46:12] [PASSED] rotate-90
[00:46:12] [PASSED] rotate-180
[00:46:12] [PASSED] rotate-270
[00:46:12] ============== [PASSED] drm_test_rect_rotate ===============
[00:46:12] ================ drm_test_rect_rotate_inv =================
[00:46:12] [PASSED] reflect-x
[00:46:12] [PASSED] reflect-y
[00:46:12] [PASSED] rotate-0
[00:46:12] [PASSED] rotate-90
[00:46:12] [PASSED] rotate-180
[00:46:12] [PASSED] rotate-270
[00:46:12] ============ [PASSED] drm_test_rect_rotate_inv =============
[00:46:12] ==================== [PASSED] drm_rect =====================
[00:46:12] ============ drm_sysfb_modeset_test (1 subtest) ============
[00:46:12] ============ drm_test_sysfb_build_fourcc_list =============
[00:46:12] [PASSED] no native formats
[00:46:12] [PASSED] XRGB8888 as native format
[00:46:12] [PASSED] remove duplicates
[00:46:12] [PASSED] convert alpha formats
[00:46:12] [PASSED] random formats
[00:46:12] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[00:46:12] ============= [PASSED] drm_sysfb_modeset_test ==============
[00:46:12] ============================================================
[00:46:12] Testing complete. Ran 616 tests: passed: 616
[00:46:12] Elapsed time: 23.505s total, 1.667s configuring, 21.620s building, 0.188s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[00:46:13] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[00:46:14] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[00:46:22] Starting KUnit Kernel (1/1)...
[00:46:22] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[00:46:22] ================= ttm_device (5 subtests) ==================
[00:46:22] [PASSED] ttm_device_init_basic
[00:46:22] [PASSED] ttm_device_init_multiple
[00:46:22] [PASSED] ttm_device_fini_basic
[00:46:22] [PASSED] ttm_device_init_no_vma_man
[00:46:22] ================== ttm_device_init_pools ==================
[00:46:22] [PASSED] No DMA allocations, no DMA32 required
[00:46:22] [PASSED] DMA allocations, DMA32 required
[00:46:22] [PASSED] No DMA allocations, DMA32 required
[00:46:22] [PASSED] DMA allocations, no DMA32 required
[00:46:22] ============== [PASSED] ttm_device_init_pools ==============
[00:46:22] =================== [PASSED] ttm_device ====================
[00:46:22] ================== ttm_pool (8 subtests) ===================
[00:46:22] ================== ttm_pool_alloc_basic ===================
[00:46:22] [PASSED] One page
[00:46:22] [PASSED] More than one page
[00:46:22] [PASSED] Above the allocation limit
[00:46:22] [PASSED] One page, with coherent DMA mappings enabled
[00:46:22] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[00:46:22] ============== [PASSED] ttm_pool_alloc_basic ===============
[00:46:22] ============== ttm_pool_alloc_basic_dma_addr ==============
[00:46:22] [PASSED] One page
[00:46:22] [PASSED] More than one page
[00:46:22] [PASSED] Above the allocation limit
[00:46:22] [PASSED] One page, with coherent DMA mappings enabled
[00:46:22] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[00:46:22] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[00:46:22] [PASSED] ttm_pool_alloc_order_caching_match
[00:46:22] [PASSED] ttm_pool_alloc_caching_mismatch
[00:46:22] [PASSED] ttm_pool_alloc_order_mismatch
[00:46:22] [PASSED] ttm_pool_free_dma_alloc
[00:46:22] [PASSED] ttm_pool_free_no_dma_alloc
[00:46:22] [PASSED] ttm_pool_fini_basic
[00:46:22] ==================== [PASSED] ttm_pool =====================
[00:46:22] ================ ttm_resource (8 subtests) =================
[00:46:22] ================= ttm_resource_init_basic =================
[00:46:22] [PASSED] Init resource in TTM_PL_SYSTEM
[00:46:22] [PASSED] Init resource in TTM_PL_VRAM
[00:46:22] [PASSED] Init resource in a private placement
[00:46:22] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[00:46:22] ============= [PASSED] ttm_resource_init_basic =============
[00:46:22] [PASSED] ttm_resource_init_pinned
[00:46:22] [PASSED] ttm_resource_fini_basic
[00:46:22] [PASSED] ttm_resource_manager_init_basic
[00:46:22] [PASSED] ttm_resource_manager_usage_basic
[00:46:22] [PASSED] ttm_resource_manager_set_used_basic
[00:46:22] [PASSED] ttm_sys_man_alloc_basic
[00:46:22] [PASSED] ttm_sys_man_free_basic
[00:46:22] ================== [PASSED] ttm_resource ===================
[00:46:22] =================== ttm_tt (15 subtests) ===================
[00:46:22] ==================== ttm_tt_init_basic ====================
[00:46:22] [PASSED] Page-aligned size
[00:46:22] [PASSED] Extra pages requested
[00:46:22] ================ [PASSED] ttm_tt_init_basic ================
[00:46:22] [PASSED] ttm_tt_init_misaligned
[00:46:22] [PASSED] ttm_tt_fini_basic
[00:46:22] [PASSED] ttm_tt_fini_sg
[00:46:22] [PASSED] ttm_tt_fini_shmem
[00:46:22] [PASSED] ttm_tt_create_basic
[00:46:22] [PASSED] ttm_tt_create_invalid_bo_type
[00:46:22] [PASSED] ttm_tt_create_ttm_exists
[00:46:22] [PASSED] ttm_tt_create_failed
[00:46:22] [PASSED] ttm_tt_destroy_basic
[00:46:22] [PASSED] ttm_tt_populate_null_ttm
[00:46:22] [PASSED] ttm_tt_populate_populated_ttm
[00:46:22] [PASSED] ttm_tt_unpopulate_basic
[00:46:22] [PASSED] ttm_tt_unpopulate_empty_ttm
[00:46:22] [PASSED] ttm_tt_swapin_basic
[00:46:22] ===================== [PASSED] ttm_tt ======================
[00:46:22] =================== ttm_bo (14 subtests) ===================
[00:46:22] =========== ttm_bo_reserve_optimistic_no_ticket ===========
[00:46:22] [PASSED] Cannot be interrupted and sleeps
[00:46:22] [PASSED] Cannot be interrupted, locks straight away
[00:46:22] [PASSED] Can be interrupted, sleeps
[00:46:22] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[00:46:22] [PASSED] ttm_bo_reserve_locked_no_sleep
[00:46:22] [PASSED] ttm_bo_reserve_no_wait_ticket
[00:46:22] [PASSED] ttm_bo_reserve_double_resv
[00:46:22] [PASSED] ttm_bo_reserve_interrupted
[00:46:22] [PASSED] ttm_bo_reserve_deadlock
[00:46:22] [PASSED] ttm_bo_unreserve_basic
[00:46:22] [PASSED] ttm_bo_unreserve_pinned
[00:46:22] [PASSED] ttm_bo_unreserve_bulk
[00:46:22] [PASSED] ttm_bo_put_basic
[00:46:22] [PASSED] ttm_bo_put_shared_resv
[00:46:22] [PASSED] ttm_bo_pin_basic
[00:46:22] [PASSED] ttm_bo_pin_unpin_resource
[00:46:22] [PASSED] ttm_bo_multiple_pin_one_unpin
[00:46:22] ===================== [PASSED] ttm_bo ======================
[00:46:22] ============== ttm_bo_validate (22 subtests) ===============
[00:46:22] ============== ttm_bo_init_reserved_sys_man ===============
[00:46:22] [PASSED] Buffer object for userspace
[00:46:22] [PASSED] Kernel buffer object
[00:46:22] [PASSED] Shared buffer object
[00:46:22] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[00:46:22] ============== ttm_bo_init_reserved_mock_man ==============
[00:46:22] [PASSED] Buffer object for userspace
[00:46:22] [PASSED] Kernel buffer object
[00:46:22] [PASSED] Shared buffer object
[00:46:22] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[00:46:22] [PASSED] ttm_bo_init_reserved_resv
[00:46:22] ================== ttm_bo_validate_basic ==================
[00:46:22] [PASSED] Buffer object for userspace
[00:46:22] [PASSED] Kernel buffer object
[00:46:22] [PASSED] Shared buffer object
[00:46:22] ============== [PASSED] ttm_bo_validate_basic ==============
[00:46:22] [PASSED] ttm_bo_validate_invalid_placement
[00:46:22] ============= ttm_bo_validate_same_placement ==============
[00:46:22] [PASSED] System manager
[00:46:22] [PASSED] VRAM manager
[00:46:22] ========= [PASSED] ttm_bo_validate_same_placement ==========
[00:46:22] [PASSED] ttm_bo_validate_failed_alloc
[00:46:22] [PASSED] ttm_bo_validate_pinned
[00:46:22] [PASSED] ttm_bo_validate_busy_placement
[00:46:22] ================ ttm_bo_validate_multihop =================
[00:46:22] [PASSED] Buffer object for userspace
[00:46:22] [PASSED] Kernel buffer object
[00:46:22] [PASSED] Shared buffer object
[00:46:22] ============ [PASSED] ttm_bo_validate_multihop =============
[00:46:22] ========== ttm_bo_validate_no_placement_signaled ==========
[00:46:22] [PASSED] Buffer object in system domain, no page vector
[00:46:22] [PASSED] Buffer object in system domain with an existing page vector
[00:46:22] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[00:46:22] ======== ttm_bo_validate_no_placement_not_signaled ========
[00:46:22] [PASSED] Buffer object for userspace
[00:46:22] [PASSED] Kernel buffer object
[00:46:22] [PASSED] Shared buffer object
[00:46:22] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[00:46:22] [PASSED] ttm_bo_validate_move_fence_signaled
[00:46:22] ========= ttm_bo_validate_move_fence_not_signaled =========
[00:46:22] [PASSED] Waits for GPU
[00:46:22] [PASSED] Tries to lock straight away
[00:46:22] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[00:46:22] [PASSED] ttm_bo_validate_swapout
[00:46:22] [PASSED] ttm_bo_validate_happy_evict
[00:46:22] [PASSED] ttm_bo_validate_all_pinned_evict
[00:46:22] [PASSED] ttm_bo_validate_allowed_only_evict
[00:46:22] [PASSED] ttm_bo_validate_deleted_evict
[00:46:22] [PASSED] ttm_bo_validate_busy_domain_evict
[00:46:22] [PASSED] ttm_bo_validate_evict_gutting
[00:46:22] [PASSED] ttm_bo_validate_recrusive_evict
stty: 'standard input': Inappropriate ioctl for device
[00:46:22] ================= [PASSED] ttm_bo_validate =================
[00:46:22] ============================================================
[00:46:22] Testing complete. Ran 102 tests: passed: 102
[00:46:23] Elapsed time: 10.004s total, 1.658s configuring, 7.728s building, 0.531s running
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
^ permalink raw reply [flat|nested] 45+ messages in thread
* ✗ CI.checksparse: warning for Use DRM scheduler for delayed GT TLB invalidations (rev2)
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
` (10 preceding siblings ...)
2025-07-03 0:46 ` ✓ CI.KUnit: success " Patchwork
@ 2025-07-03 1:00 ` Patchwork
2025-07-03 1:32 ` ✓ Xe.CI.BAT: success " Patchwork
2025-07-04 18:11 ` ✗ Xe.CI.Full: failure " Patchwork
13 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2025-07-03 1:00 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
== Series Details ==
Series: Use DRM scheduler for delayed GT TLB invalidations (rev2)
URL : https://patchwork.freedesktop.org/series/150402/
State : warning
== Summary ==
+ trap cleanup EXIT
+ KERNEL=/kernel
+ MT=/root/linux/maintainer-tools
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools /root/linux/maintainer-tools
Cloning into '/root/linux/maintainer-tools'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ make -C /root/linux/maintainer-tools
make: Entering directory '/root/linux/maintainer-tools'
cc -O2 -g -Wextra -o remap-log remap-log.c
make: Leaving directory '/root/linux/maintainer-tools'
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ /root/linux/maintainer-tools/dim sparse --fast d04a54cd3b99001adbc4cd3305b44f9f3e658407
Sparse version: 0.6.4 (Ubuntu: 0.6.4-4ubuntu3)
Fast mode used, each commit won't be checked separately.
-
+drivers/gpu/drm/drm_drv.c:452:6: warning: context imbalance in 'drm_dev_enter' - different lock contexts for basic block
+drivers/gpu/drm/drm_drv.c: note: in included file (through include/linux/notifier.h, arch/x86/include/asm/uprobes.h, include/linux/uprobes.h, include/linux/mm_types.h, include/linux/mmzone.h, include/linux/gfp.h, ...):
+drivers/gpu/drm/drm_plane.c:213:24: warning: Using plain integer as NULL pointer
+drivers/gpu/drm/i915/intel_uncore.c:1927:1: warning: context imbalance in 'fwtable_read8' - unexpected unlock
+drivers/gpu/drm/i915/intel_uncore.c:1928:1: warning: context imbalance in 'fwtable_read16' - unexpected unlock
+drivers/gpu/drm/i915/intel_uncore.c:1929:1: warning: context imbalance in 'fwtable_read32' - unexpected unlock
+drivers/gpu/drm/i915/intel_uncore.c:1930:1: warning: context imbalance in 'fwtable_read64' - unexpected unlock
+drivers/gpu/drm/i915/intel_uncore.c:1995:1: warning: context imbalance in 'gen6_write8' - unexpected unlock
+drivers/gpu/drm/i915/intel_uncore.c:1996:1: warning: context imbalance in 'gen6_write16' - unexpected unlock
+drivers/gpu/drm/i915/intel_uncore.c:1997:1: warning: context imbalance in 'gen6_write32' - unexpected unlock
+drivers/gpu/drm/i915/intel_uncore.c:2017:1: warning: context imbalance in 'fwtable_write8' - unexpected unlock
+drivers/gpu/drm/i915/intel_uncore.c:2018:1: warning: context imbalance in 'fwtable_write16' - unexpected unlock
+drivers/gpu/drm/i915/intel_uncore.c:2019:1: warning: context imbalance in 'fwtable_write32' - unexpected unlock
+./include/linux/srcu.h:400:9: warning: context imbalance in 'drm_dev_exit' - unexpected unlock
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
^ permalink raw reply [flat|nested] 45+ messages in thread
* ✓ Xe.CI.BAT: success for Use DRM scheduler for delayed GT TLB invalidations (rev2)
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
` (11 preceding siblings ...)
2025-07-03 1:00 ` ✗ CI.checksparse: warning " Patchwork
@ 2025-07-03 1:32 ` Patchwork
2025-07-04 18:11 ` ✗ Xe.CI.Full: failure " Patchwork
13 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2025-07-03 1:32 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
[-- Attachment #1: Type: text/plain, Size: 1706 bytes --]
== Series Details ==
Series: Use DRM scheduler for delayed GT TLB invalidations (rev2)
URL : https://patchwork.freedesktop.org/series/150402/
State : success
== Summary ==
CI Bug Log - changes from xe-3339-d04a54cd3b99001adbc4cd3305b44f9f3e658407_BAT -> xe-pw-150402v2_BAT
====================================================
Summary
-------
**SUCCESS**
No regressions found.
Participating hosts (9 -> 8)
------------------------------
Missing (1): bat-adlp-vm
Known issues
------------
Here are the changes found in xe-pw-150402v2_BAT that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@kms_flip@basic-plain-flip@b-edp1:
- bat-adlp-7: [PASS][1] -> [DMESG-WARN][2] ([Intel XE#4543]) +1 other test dmesg-warn
[1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3339-d04a54cd3b99001adbc4cd3305b44f9f3e658407/bat-adlp-7/igt@kms_flip@basic-plain-flip@b-edp1.html
[2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-150402v2/bat-adlp-7/igt@kms_flip@basic-plain-flip@b-edp1.html
[Intel XE#4543]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4543
Build changes
-------------
* IGT: IGT_8434 -> IGT_8435
* Linux: xe-3339-d04a54cd3b99001adbc4cd3305b44f9f3e658407 -> xe-pw-150402v2
IGT_8434: 5185b9527673518a418d575c3f58b5554e27f111 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
IGT_8435: 157b34af651681184df1f41c47576f77c3b784f1 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
xe-3339-d04a54cd3b99001adbc4cd3305b44f9f3e658407: d04a54cd3b99001adbc4cd3305b44f9f3e658407
xe-pw-150402v2: 150402v2
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-150402v2/index.html
[-- Attachment #2: Type: text/html, Size: 2285 bytes --]
^ permalink raw reply [flat|nested] 45+ messages in thread
* ✗ Xe.CI.Full: failure for Use DRM scheduler for delayed GT TLB invalidations (rev2)
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
` (12 preceding siblings ...)
2025-07-03 1:32 ` ✓ Xe.CI.BAT: success " Patchwork
@ 2025-07-04 18:11 ` Patchwork
13 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2025-07-04 18:11 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
[-- Attachment #1: Type: text/plain, Size: 388 bytes --]
== Series Details ==
Series: Use DRM scheduler for delayed GT TLB invalidations (rev2)
URL : https://patchwork.freedesktop.org/series/150402/
State : failure
== Summary ==
ERROR: The runconfig 'xe-3339-d04a54cd3b99001adbc4cd3305b44f9f3e658407_FULL' does not exist in the database
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-150402v2/index.html
[-- Attachment #2: Type: text/html, Size: 953 bytes --]
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 1/9] drm/xe: Explicitly mark migration queues with flag
2025-07-02 23:42 ` [PATCH v2 1/9] drm/xe: Explicitly mark migration queues with flag Matthew Brost
@ 2025-07-10 8:43 ` Francois Dugast
2025-07-11 21:20 ` Summers, Stuart
1 sibling, 0 replies; 45+ messages in thread
From: Francois Dugast @ 2025-07-10 8:43 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe, matthew.auld, maarten.lankhorst
On Wed, Jul 02, 2025 at 04:42:14PM -0700, Matthew Brost wrote:
> Rather than inferring whether an exec queue is a migration queue,
> explicitly mark migration queues with a flag.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
> drivers/gpu/drm/xe/xe_migrate.c | 6 ++++--
> 2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index cc1cffb5c87f..abdf4a57e6e2 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -87,6 +87,8 @@ struct xe_exec_queue {
> #define EXEC_QUEUE_FLAG_HIGH_PRIORITY BIT(4)
> /* flag to indicate low latency hint to guc */
> #define EXEC_QUEUE_FLAG_LOW_LATENCY BIT(5)
> +/* for migration (kernel copy, clear, bind) jobs */
> +#define EXEC_QUEUE_FLAG_MIGRATE BIT(6)
>
> /**
> * @flags: flags for this exec queue, should statically setup aside from ban
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 0838582537e8..b5f85162b9ed 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -437,12 +437,14 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
> m->q = xe_exec_queue_create(xe, vm, logical_mask, 1, hwe,
> EXEC_QUEUE_FLAG_KERNEL |
> EXEC_QUEUE_FLAG_PERMANENT |
> - EXEC_QUEUE_FLAG_HIGH_PRIORITY, 0);
> + EXEC_QUEUE_FLAG_HIGH_PRIORITY |
> + EXEC_QUEUE_FLAG_MIGRATE, 0);
> } else {
> m->q = xe_exec_queue_create_class(xe, primary_gt, vm,
> XE_ENGINE_CLASS_COPY,
> EXEC_QUEUE_FLAG_KERNEL |
> - EXEC_QUEUE_FLAG_PERMANENT, 0);
> + EXEC_QUEUE_FLAG_PERMANENT |
> + EXEC_QUEUE_FLAG_MIGRATE, 0);
> }
> if (IS_ERR(m->q)) {
> xe_vm_close_and_put(vm);
> --
> 2.34.1
>
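For reference, a minimal sketch (hypothetical helper, not part of this
patch) of the kind of direct test the new flag enables, instead of
inferring the migration property from other queue state:

static bool xe_exec_queue_is_migrate(struct xe_exec_queue *q)
{
	/* Direct flag test; EXEC_QUEUE_FLAG_MIGRATE is set at create time */
	return q->flags & EXEC_QUEUE_FLAG_MIGRATE;
}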
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 2/9] drm/xe: Add generic dependency jobs / scheduler
2025-07-02 23:42 ` [PATCH v2 2/9] drm/xe: Add generic dependency jobs / scheduler Matthew Brost
@ 2025-07-10 11:51 ` Francois Dugast
2025-07-10 17:38 ` Matthew Brost
2025-07-15 21:04 ` Summers, Stuart
1 sibling, 1 reply; 45+ messages in thread
From: Francois Dugast @ 2025-07-10 11:51 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe, matthew.auld, maarten.lankhorst
On Wed, Jul 02, 2025 at 04:42:15PM -0700, Matthew Brost wrote:
> Add generic dependency jobs / scheduler which serves as a wrapper for the
> DRM scheduler. Useful when we want to delay a generic operation until a
> dma-fence signals.
>
> Possible use cases include destroying resources based on fences /
> dma-resv, the preempt rebind worker, and pipelined GT TLB invalidations.
>
> Written in such a way that it could be moved to the DRM subsystem if needed.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/Makefile | 1 +
> drivers/gpu/drm/xe/xe_dep_job_types.h | 29 ++++++
> drivers/gpu/drm/xe/xe_dep_scheduler.c | 145 ++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_dep_scheduler.h | 21 ++++
> 4 files changed, 196 insertions(+)
> create mode 100644 drivers/gpu/drm/xe/xe_dep_job_types.h
> create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.c
> create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.h
>
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 1d97e5b63f4e..0edcfc770c0d 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -28,6 +28,7 @@ $(obj)/generated/%_wa_oob.c $(obj)/generated/%_wa_oob.h: $(obj)/xe_gen_wa_oob \
> xe-y += xe_bb.o \
> xe_bo.o \
> xe_bo_evict.o \
> + xe_dep_scheduler.o \
> xe_devcoredump.o \
> xe_device.o \
> xe_device_sysfs.o \
> diff --git a/drivers/gpu/drm/xe/xe_dep_job_types.h b/drivers/gpu/drm/xe/xe_dep_job_types.h
> new file mode 100644
> index 000000000000..c6a484f24c8c
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_dep_job_types.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_DEP_JOB_TYPES_H_
> +#define _XE_DEP_JOB_TYPES_H_
> +
> +#include <drm/gpu_scheduler.h>
> +
> +struct xe_dep_job;
> +
> +/** struct xe_dep_job_ops - Generic Xe dependency job operations */
> +struct xe_dep_job_ops {
> + /** @run_job: Run generic Xe dependency job */
> + struct dma_fence *(*run_job)(struct xe_dep_job *job);
> + /** @free_job: Free generic Xe dependency job */
> + void (*free_job)(struct xe_dep_job *job);
> +};
> +
> +/** struct xe_dep_job - Generic dependency Xe job */
> +struct xe_dep_job {
> + /** @drm: base DRM scheduler job */
> + struct drm_sched_job drm;
> + /** @ops: dependency job operations */
> + const struct xe_dep_job_ops *ops;
> +};
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.c b/drivers/gpu/drm/xe/xe_dep_scheduler.c
> new file mode 100644
> index 000000000000..fbd55577d787
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_dep_scheduler.c
> @@ -0,0 +1,145 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include <linux/slab.h>
> +
> +#include <drm/gpu_scheduler.h>
> +
> +#include "xe_dep_job_types.h"
> +#include "xe_dep_scheduler.h"
> +#include "xe_device_types.h"
> +
> +/**
> + * DOC: Xe Dependency Scheduler
> + *
> + * The Xe dependency scheduler is a simple wrapper built around the DRM
> + * scheduler to execute jobs once their dependencies are resolved (i.e., all
> + * input fences specified as dependencies are signaled). The jobs that are
> + * executed contain virtual functions to run (execute) and free the job,
> + * allowing a single dependency scheduler to handle jobs performing different
> + * operations.
> + *
> + * Example use cases include deferred resource freeing, TLB invalidations after
> + * bind jobs, etc.
> + */
This is already well documented but as this code might eventually get
promoted to DRM, maybe we could add generic pseudo-code showing how it is
intended to be used, for example:
* .. code-block:: c
*
* struct my_job {
* struct xe_dep_job dep;
* struct dma_fence *fence;
* ...
* }
*
*
* static struct dma_fence *my_job_run(struct xe_dep_job *dep_job)
* {
* struct my_job *job = container_of(dep_job, typeof(*job), dep);
*
* // start the job and get a fence
* ...
*
* return job->fence;
* }
*
* static void my_job_free(struct xe_dep_job *dep_job)
* {
* ...
* }
*
* static const struct xe_dep_job_ops my_job_ops = {
* .run_job = my_job_run,
* .free_job = my_job_free,
* };
*
* void init()
* {
* struct xe_dep_scheduler *dep_scheduler = xe_dep_scheduler_create(xe, wq, name, 16);
* ...
* }
*
* struct my_job *create()
* {
* struct drm_sched_entity *entity = xe_dep_scheduler_entity(dep_scheduler);
* struct my_job *job;
* ...
*
* job->dep.ops = &dep_job_ops;
* drm_sched_job_init(&job->dep.drm, entity, ...);
*
* return job;
* }
*
* void cleanup()
* {
* xe_dep_scheduler_fini(dep_scheduler);
* ...
* }
> +
> +/** struct xe_dep_scheduler - Generic Xe dependency scheduler */
> +struct xe_dep_scheduler {
> + /** @sched: DRM GPU scheduler */
> + struct drm_gpu_scheduler sched;
> + /** @entity: DRM scheduler entity */
> + struct drm_sched_entity entity;
> + /** @rcu: For safe freeing of exported dma fences */
> + struct rcu_head rcu;
Is it used in the series?
Francois
> +};
> +
> +static struct dma_fence *xe_dep_scheduler_run_job(struct drm_sched_job *drm_job)
> +{
> + struct xe_dep_job *dep_job =
> + container_of(drm_job, typeof(*dep_job), drm);
> +
> + return dep_job->ops->run_job(dep_job);
> +}
> +
> +static void xe_dep_scheduler_free_job(struct drm_sched_job *drm_job)
> +{
> + struct xe_dep_job *dep_job =
> + container_of(drm_job, typeof(*dep_job), drm);
> +
> + dep_job->ops->free_job(dep_job);
> +}
> +
> +static const struct drm_sched_backend_ops sched_ops = {
> + .run_job = xe_dep_scheduler_run_job,
> + .free_job = xe_dep_scheduler_free_job,
> +};
> +
> +/**
> + * xe_dep_scheduler_create() - Generic Xe dependency scheduler create
> + * @xe: Xe device
> + * @submit_wq: Submit workqueue struct (can be NULL)
> + * @name: Name of dependency scheduler
> + * @job_limit: Max dependency jobs that can be scheduled
> + *
> + * Create a generic Xe dependency scheduler and initialize internal DRM
> + * scheduler objects.
> + *
> + * Return: Generic Xe dependency scheduler object or ERR_PTR
> + */
> +struct xe_dep_scheduler *
> +xe_dep_scheduler_create(struct xe_device *xe,
> + struct workqueue_struct *submit_wq,
> + const char *name, u32 job_limit)
> +{
> + struct xe_dep_scheduler *dep_scheduler;
> + struct drm_gpu_scheduler *sched;
> + const struct drm_sched_init_args args = {
> + .ops = &sched_ops,
> + .submit_wq = submit_wq,
> + .num_rqs = 1,
> + .credit_limit = job_limit,
> + .timeout = MAX_SCHEDULE_TIMEOUT,
> + .name = name,
> + .dev = xe->drm.dev,
> + };
> + int err;
> +
> + dep_scheduler = kzalloc(sizeof(*dep_scheduler), GFP_KERNEL);
> + if (!dep_scheduler)
> + return ERR_PTR(-ENOMEM);
> +
> + err = drm_sched_init(&dep_scheduler->sched, &args);
> + if (err)
> + goto err_free;
> +
> + sched = &dep_scheduler->sched;
> + err = drm_sched_entity_init(&dep_scheduler->entity, 0,
> + (struct drm_gpu_scheduler **)&sched, 1,
> + NULL);
> + if (err)
> + goto err_sched;
> +
> + init_rcu_head(&dep_scheduler->rcu);
> +
> + return dep_scheduler;
> +
> +err_sched:
> + drm_sched_fini(&dep_scheduler->sched);
> +err_free:
> + kfree(dep_scheduler);
> +
> + return ERR_PTR(err);
> +}
> +
> +/**
> + * xe_dep_scheduler_fini() - Generic Xe dependency scheduler finalize
> + * @dep_scheduler: Generic Xe dependency scheduler object
> + *
> + * Finalize internal DRM scheduler objects and free generic Xe dependency
> + * scheduler object
> + */
> +void xe_dep_scheduler_fini(struct xe_dep_scheduler *dep_scheduler)
> +{
> + drm_sched_entity_fini(&dep_scheduler->entity);
> + drm_sched_fini(&dep_scheduler->sched);
> + /*
> + * RCU free due to sched being exported via DRM scheduler fences
> + * (timeline name).
> + */
> + kfree_rcu(dep_scheduler, rcu);
> +}
> +
> +/**
> + * xe_dep_scheduler_entity() - Retrieve a generic Xe dependency scheduler
> + * DRM scheduler entity
> + * @dep_scheduler: Generic Xe dependency scheduler object
> + *
> + * Return: The generic Xe dependency scheduler's DRM scheduler entity
> + */
> +struct drm_sched_entity *
> +xe_dep_scheduler_entity(struct xe_dep_scheduler *dep_scheduler)
> +{
> + return &dep_scheduler->entity;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.h b/drivers/gpu/drm/xe/xe_dep_scheduler.h
> new file mode 100644
> index 000000000000..853961eec64b
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_dep_scheduler.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include <linux/types.h>
> +
> +struct drm_sched_entity;
> +struct workqueue_struct;
> +struct xe_dep_scheduler;
> +struct xe_device;
> +
> +struct xe_dep_scheduler *
> +xe_dep_scheduler_create(struct xe_device *xe,
> + struct workqueue_struct *submit_wq,
> + const char *name, u32 job_limit);
> +
> +void xe_dep_scheduler_fini(struct xe_dep_scheduler *dep_scheduler);
> +
> +struct drm_sched_entity *
> +xe_dep_scheduler_entity(struct xe_dep_scheduler *dep_scheduler);
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 2/9] drm/xe: Add generic dependency jobs / scheduler
2025-07-10 11:51 ` Francois Dugast
@ 2025-07-10 17:38 ` Matthew Brost
0 siblings, 0 replies; 45+ messages in thread
From: Matthew Brost @ 2025-07-10 17:38 UTC (permalink / raw)
To: Francois Dugast; +Cc: intel-xe, matthew.auld, maarten.lankhorst
On Thu, Jul 10, 2025 at 01:51:15PM +0200, Francois Dugast wrote:
> On Wed, Jul 02, 2025 at 04:42:15PM -0700, Matthew Brost wrote:
> > Add generic dependency jobs / scheduler which serves as a wrapper for the
> > DRM scheduler. Useful when we want to delay a generic operation until a
> > dma-fence signals.
> >
> > Possible use cases include destroying resources based on fences /
> > dma-resv, the preempt rebind worker, and pipelined GT TLB invalidations.
> >
> > Written in such a way that it could be moved to the DRM subsystem if needed.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/Makefile | 1 +
> > drivers/gpu/drm/xe/xe_dep_job_types.h | 29 ++++++
> > drivers/gpu/drm/xe/xe_dep_scheduler.c | 145 ++++++++++++++++++++++++++
> > drivers/gpu/drm/xe/xe_dep_scheduler.h | 21 ++++
> > 4 files changed, 196 insertions(+)
> > create mode 100644 drivers/gpu/drm/xe/xe_dep_job_types.h
> > create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.c
> > create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.h
> >
> > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > index 1d97e5b63f4e..0edcfc770c0d 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -28,6 +28,7 @@ $(obj)/generated/%_wa_oob.c $(obj)/generated/%_wa_oob.h: $(obj)/xe_gen_wa_oob \
> > xe-y += xe_bb.o \
> > xe_bo.o \
> > xe_bo_evict.o \
> > + xe_dep_scheduler.o \
> > xe_devcoredump.o \
> > xe_device.o \
> > xe_device_sysfs.o \
> > diff --git a/drivers/gpu/drm/xe/xe_dep_job_types.h b/drivers/gpu/drm/xe/xe_dep_job_types.h
> > new file mode 100644
> > index 000000000000..c6a484f24c8c
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_dep_job_types.h
> > @@ -0,0 +1,29 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#ifndef _XE_DEP_JOB_TYPES_H_
> > +#define _XE_DEP_JOB_TYPES_H_
> > +
> > +#include <drm/gpu_scheduler.h>
> > +
> > +struct xe_dep_job;
> > +
> > +/** struct xe_dep_job_ops - Generic Xe dependency job operations */
> > +struct xe_dep_job_ops {
> > + /** @run_job: Run generic Xe dependency job */
> > + struct dma_fence *(*run_job)(struct xe_dep_job *job);
> > + /** @free_job: Free generic Xe dependency job */
> > + void (*free_job)(struct xe_dep_job *job);
> > +};
> > +
> > +/** struct xe_dep_job - Generic dependency Xe job */
> > +struct xe_dep_job {
> > + /** @drm: base DRM scheduler job */
> > + struct drm_sched_job drm;
> > + /** @ops: dependency job operations */
> > + const struct xe_dep_job_ops *ops;
> > +};
> > +
> > +#endif
> > diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.c b/drivers/gpu/drm/xe/xe_dep_scheduler.c
> > new file mode 100644
> > index 000000000000..fbd55577d787
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_dep_scheduler.c
> > @@ -0,0 +1,145 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#include <linux/slab.h>
> > +
> > +#include <drm/gpu_scheduler.h>
> > +
> > +#include "xe_dep_job_types.h"
> > +#include "xe_dep_scheduler.h"
> > +#include "xe_device_types.h"
> > +
> > +/**
> > + * DOC: Xe Dependency Scheduler
> > + *
> > + * The Xe dependency scheduler is a simple wrapper built around the DRM
> > + * scheduler to execute jobs once their dependencies are resolved (i.e., all
> > + * input fences specified as dependencies are signaled). The jobs that are
> > + * executed contain virtual functions to run (execute) and free the job,
> > + * allowing a single dependency scheduler to handle jobs performing different
> > + * operations.
> > + *
> > + * Example use cases include deferred resource freeing, TLB invalidations after
> > + * bind jobs, etc.
> > + */
>
> This is already well documented but as this code might eventually get
> promoted to DRM, maybe we could add generic pseudo-code showing how it is
> intended to be used, for example:
>
This seems a little verbose, and we typically only use detailed code-block
examples like this for uAPI. xe_gt_tlb_inval_job.c (patch 7) should show
how to properly use this too.
If / when this is promoted to DRM, then perhaps some code examples would be
needed.
> * .. code-block:: c
> *
> * struct my_job {
> * struct xe_dep_job dep;
> * struct dma_fence *fence;
> * ...
> * }
> *
> *
> * static struct dma_fence *my_job_run(struct xe_dep_job *dep_job)
> * {
> * struct my_job *job = container_of(dep_job, typeof(*job), dep);
> *
> * // start the job and get a fence
> * ...
> *
> * return job->fence;
> * }
> *
> * static void my_job_free(struct xe_dep_job *dep_job)
> * {
> * ...
> * }
> *
> * static const struct xe_dep_job_ops my_job_ops = {
> * .run_job = my_job_run,
> * .free_job = my_job_free,
> * };
> *
> * void init()
> * {
> * struct xe_dep_scheduler *dep_scheduler = xe_dep_scheduler_create(xe, wq, name, 16);
> * ...
> * }
> *
> * struct my_job *create()
> * {
> * struct drm_sched_entity *entity = xe_dep_scheduler_entity(dep_scheduler);
> * struct my_job *job;
> * ...
> *
> * job->dep.ops = &dep_job_ops;
> * drm_sched_job_init(&job->dep.drm, entity, ...);
> *
> * return job;
> * }
> *
> * void cleanup()
> * {
> * xe_dep_scheduler_fini(dep_scheduler);
> * ...
> * }
>
> > +
> > +/** struct xe_dep_scheduler - Generic Xe dependency scheduler */
> > +struct xe_dep_scheduler {
> > + /** @sched: DRM GPU scheduler */
> > + struct drm_gpu_scheduler sched;
> > + /** @entity: DRM scheduler entity */
> > + struct drm_sched_entity entity;
> > + /** @rcu: For safe freeing of exported dma fences */
> > + struct rcu_head rcu;
>
> Is it used in the series?
>
Yes. xe_dep_scheduler_fini calls kfree_rcu, which is required to avoid a
UAF on fences that are still hanging around after being generated by this
DRM scheduler.
Matt
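A minimal sketch of the lifetime issue, with hypothetical names (not the
in-tree code): exported fences resolve their timeline name by pointing
into the scheduler's memory, so readers under rcu_read_lock() must not
race with the free in fini:

#include <linux/printk.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct dep_sched {
	const char *name;	/* what a fence's get_timeline_name() returns */
	struct rcu_head rcu;
};

static void fence_trace_reader(struct dep_sched *s)
{
	rcu_read_lock();
	pr_info("timeline: %s\n", s->name);	/* safe until grace period ends */
	rcu_read_unlock();
}

static void dep_sched_fini(struct dep_sched *s)
{
	/* A plain kfree(s) here could race with fence_trace_reader(): UAF */
	kfree_rcu(s, rcu);
}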
> Francois
>
> > +};
> > +
> > +static struct dma_fence *xe_dep_scheduler_run_job(struct drm_sched_job *drm_job)
> > +{
> > + struct xe_dep_job *dep_job =
> > + container_of(drm_job, typeof(*dep_job), drm);
> > +
> > + return dep_job->ops->run_job(dep_job);
> > +}
> > +
> > +static void xe_dep_scheduler_free_job(struct drm_sched_job *drm_job)
> > +{
> > + struct xe_dep_job *dep_job =
> > + container_of(drm_job, typeof(*dep_job), drm);
> > +
> > + dep_job->ops->free_job(dep_job);
> > +}
> > +
> > +static const struct drm_sched_backend_ops sched_ops = {
> > + .run_job = xe_dep_scheduler_run_job,
> > + .free_job = xe_dep_scheduler_free_job,
> > +};
> > +
> > +/**
> > + * xe_dep_scheduler_create() - Generic Xe dependency scheduler create
> > + * @xe: Xe device
> > + * @submit_wq: Submit workqueue struct (can be NULL)
> > + * @name: Name of dependency scheduler
> > + * @job_limit: Max dependency jobs that can be scheduled
> > + *
> > + * Create a generic Xe dependency scheduler and initialize internal DRM
> > + * scheduler objects.
> > + *
> > + * Return: Generic Xe dependency scheduler object or ERR_PTR
> > + */
> > +struct xe_dep_scheduler *
> > +xe_dep_scheduler_create(struct xe_device *xe,
> > + struct workqueue_struct *submit_wq,
> > + const char *name, u32 job_limit)
> > +{
> > + struct xe_dep_scheduler *dep_scheduler;
> > + struct drm_gpu_scheduler *sched;
> > + const struct drm_sched_init_args args = {
> > + .ops = &sched_ops,
> > + .submit_wq = submit_wq,
> > + .num_rqs = 1,
> > + .credit_limit = job_limit,
> > + .timeout = MAX_SCHEDULE_TIMEOUT,
> > + .name = name,
> > + .dev = xe->drm.dev,
> > + };
> > + int err;
> > +
> > + dep_scheduler = kzalloc(sizeof(*dep_scheduler), GFP_KERNEL);
> > + if (!dep_scheduler)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + err = drm_sched_init(&dep_scheduler->sched, &args);
> > + if (err)
> > + goto err_free;
> > +
> > + sched = &dep_scheduler->sched;
> > + err = drm_sched_entity_init(&dep_scheduler->entity, 0,
> > + (struct drm_gpu_scheduler **)&sched, 1,
> > + NULL);
> > + if (err)
> > + goto err_sched;
> > +
> > + init_rcu_head(&dep_scheduler->rcu);
> > +
> > + return dep_scheduler;
> > +
> > +err_sched:
> > + drm_sched_fini(&dep_scheduler->sched);
> > +err_free:
> > + kfree(dep_scheduler);
> > +
> > + return ERR_PTR(err);
> > +}
> > +
> > +/**
> > + * xe_dep_scheduler_fini() - Generic Xe dependency scheduler finalize
> > + * @dep_scheduler: Generic Xe dependency scheduler object
> > + *
> > + * Finalize internal DRM scheduler objects and free generic Xe dependency
> > + * scheduler object
> > + */
> > +void xe_dep_scheduler_fini(struct xe_dep_scheduler *dep_scheduler)
> > +{
> > + drm_sched_entity_fini(&dep_scheduler->entity);
> > + drm_sched_fini(&dep_scheduler->sched);
> > + /*
> > + * RCU free due to sched being exported via DRM scheduler fences
> > + * (timeline name).
> > + */
> > + kfree_rcu(dep_scheduler, rcu);
> > +}
> > +
> > +/**
> > + * xe_dep_scheduler_entity() - Retrieve a generic Xe dependency scheduler
> > + * DRM scheduler entity
> > + * @dep_scheduler: Generic Xe dependency scheduler object
> > + *
> > + * Return: The generic Xe dependency scheduler's DRM scheduler entity
> > + */
> > +struct drm_sched_entity *
> > +xe_dep_scheduler_entity(struct xe_dep_scheduler *dep_scheduler)
> > +{
> > + return &dep_scheduler->entity;
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.h b/drivers/gpu/drm/xe/xe_dep_scheduler.h
> > new file mode 100644
> > index 000000000000..853961eec64b
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_dep_scheduler.h
> > @@ -0,0 +1,21 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#include <linux/types.h>
> > +
> > +struct drm_sched_entity;
> > +struct workqueue_struct;
> > +struct xe_dep_scheduler;
> > +struct xe_device;
> > +
> > +struct xe_dep_scheduler *
> > +xe_dep_scheduler_create(struct xe_device *xe,
> > + struct workqueue_struct *submit_wq,
> > + const char *name, u32 job_limit);
> > +
> > +void xe_dep_scheduler_fini(struct xe_dep_scheduler *dep_scheduler);
> > +
> > +struct drm_sched_entity *
> > +xe_dep_scheduler_entity(struct xe_dep_scheduler *dep_scheduler);
> > --
> > 2.34.1
> >
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 9/9] drm/xe: Remove unused GT TLB invalidation trace points
2025-07-02 23:42 ` [PATCH v2 9/9] drm/xe: Remove unused GT TLB invalidation trace points Matthew Brost
@ 2025-07-11 21:13 ` Summers, Stuart
0 siblings, 0 replies; 45+ messages in thread
From: Summers, Stuart @ 2025-07-11 21:13 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: maarten.lankhorst@linux.intel.com, Auld, Matthew
On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> Remove unused GT TLB invalidation trace points after converting to use
> GT TLB invalidation jobs. The tracepoints removed were used during early
> bring-up of the unstable driver; with a stable driver there is no need
> to replace them with new tracepoints.
Makes sense to me. IMO the g2h/h2g messages are more interesting here
for debug anyway.
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_trace.h | 16 ----------------
> 1 file changed, 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> index b4a3577df70c..21486a6f693a 100644
> --- a/drivers/gpu/drm/xe/xe_trace.h
> +++ b/drivers/gpu/drm/xe/xe_trace.h
> @@ -45,22 +45,6 @@ DECLARE_EVENT_CLASS(xe_gt_tlb_invalidation_fence,
> __get_str(dev), __entry->fence, __entry->seqno)
> );
>
> -DEFINE_EVENT(xe_gt_tlb_invalidation_fence, xe_gt_tlb_invalidation_fence_create,
> - TP_PROTO(struct xe_device *xe, struct xe_gt_tlb_invalidation_fence *fence),
> - TP_ARGS(xe, fence)
> -);
> -
> -DEFINE_EVENT(xe_gt_tlb_invalidation_fence,
> - xe_gt_tlb_invalidation_fence_work_func,
> - TP_PROTO(struct xe_device *xe, struct xe_gt_tlb_invalidation_fence *fence),
> - TP_ARGS(xe, fence)
> -);
> -
> -DEFINE_EVENT(xe_gt_tlb_invalidation_fence, xe_gt_tlb_invalidation_fence_cb,
> - TP_PROTO(struct xe_device *xe, struct xe_gt_tlb_invalidation_fence *fence),
> - TP_ARGS(xe, fence)
> -);
> -
> DEFINE_EVENT(xe_gt_tlb_invalidation_fence, xe_gt_tlb_invalidation_fence_send,
> TP_PROTO(struct xe_device *xe, struct xe_gt_tlb_invalidation_fence *fence),
> TP_ARGS(xe, fence)
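As a reminder of how the surviving events are consumed (the call site
below is hypothetical, but DEFINE_EVENT() always generates a
trace_<name>() helper):

	/* emitted from the invalidation send path */
	trace_xe_gt_tlb_invalidation_fence_send(xe, fence);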
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 1/9] drm/xe: Explicitly mark migration queues with flag
2025-07-02 23:42 ` [PATCH v2 1/9] drm/xe: Explicitly mark migration queues with flag Matthew Brost
2025-07-10 8:43 ` Francois Dugast
@ 2025-07-11 21:20 ` Summers, Stuart
1 sibling, 0 replies; 45+ messages in thread
From: Summers, Stuart @ 2025-07-11 21:20 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: maarten.lankhorst@linux.intel.com, Auld, Matthew
On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> Rather than inferring whether an exec queue is a migration queue,
> explicitly mark migration queues with a flag.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
> drivers/gpu/drm/xe/xe_migrate.c | 6 ++++--
> 2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index cc1cffb5c87f..abdf4a57e6e2 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -87,6 +87,8 @@ struct xe_exec_queue {
> #define EXEC_QUEUE_FLAG_HIGH_PRIORITY BIT(4)
> /* flag to indicate low latency hint to guc */
> #define EXEC_QUEUE_FLAG_LOW_LATENCY BIT(5)
> +/* for migration (kernel copy, clear, bind) jobs */
> +#define EXEC_QUEUE_FLAG_MIGRATE BIT(6)
>
> /**
> * @flags: flags for this exec queue, should statically setup aside from ban
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 0838582537e8..b5f85162b9ed 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -437,12 +437,14 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
> m->q = xe_exec_queue_create(xe, vm, logical_mask, 1, hwe,
> EXEC_QUEUE_FLAG_KERNEL |
> EXEC_QUEUE_FLAG_PERMANENT |
> - EXEC_QUEUE_FLAG_HIGH_PRIORITY, 0);
> + EXEC_QUEUE_FLAG_HIGH_PRIORITY |
> + EXEC_QUEUE_FLAG_MIGRATE, 0);
> } else {
> m->q = xe_exec_queue_create_class(xe, primary_gt, vm,
> XE_ENGINE_CLASS_COPY,
> EXEC_QUEUE_FLAG_KERNEL |
> - EXEC_QUEUE_FLAG_PERMANENT, 0);
> + EXEC_QUEUE_FLAG_PERMANENT |
> + EXEC_QUEUE_FLAG_MIGRATE, 0);
> }
> if (IS_ERR(m->q)) {
> xe_vm_close_and_put(vm);
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 2/9] drm/xe: Add generic dependency jobs / scheduler
2025-07-02 23:42 ` [PATCH v2 2/9] drm/xe: Add generic dependency jobs / scheduler Matthew Brost
2025-07-10 11:51 ` Francois Dugast
@ 2025-07-15 21:04 ` Summers, Stuart
2025-07-15 21:14 ` Matthew Brost
1 sibling, 1 reply; 45+ messages in thread
From: Summers, Stuart @ 2025-07-15 21:04 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: maarten.lankhorst@linux.intel.com, Auld, Matthew
On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> Add generic dependency jobs / scheduler which serves as a wrapper for the
> DRM scheduler. Useful when we want to delay a generic operation until a
> dma-fence signals.
>
> Possible use cases include destroying resources based on fences /
> dma-resv, the preempt rebind worker, and pipelined GT TLB
> invalidations.
>
> Written in such a way that it could be moved to the DRM subsystem if needed.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/Makefile | 1 +
> drivers/gpu/drm/xe/xe_dep_job_types.h | 29 ++++++
> drivers/gpu/drm/xe/xe_dep_scheduler.c | 145
> ++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_dep_scheduler.h | 21 ++++
> 4 files changed, 196 insertions(+)
> create mode 100644 drivers/gpu/drm/xe/xe_dep_job_types.h
> create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.c
> create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.h
>
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 1d97e5b63f4e..0edcfc770c0d 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -28,6 +28,7 @@ $(obj)/generated/%_wa_oob.c $(obj)/generated/%_wa_oob.h: $(obj)/xe_gen_wa_oob \
> xe-y += xe_bb.o \
> xe_bo.o \
> xe_bo_evict.o \
> + xe_dep_scheduler.o \
> xe_devcoredump.o \
> xe_device.o \
> xe_device_sysfs.o \
> diff --git a/drivers/gpu/drm/xe/xe_dep_job_types.h b/drivers/gpu/drm/xe/xe_dep_job_types.h
> new file mode 100644
> index 000000000000..c6a484f24c8c
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_dep_job_types.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_DEP_JOB_TYPES_H_
> +#define _XE_DEP_JOB_TYPES_H_
> +
> +#include <drm/gpu_scheduler.h>
> +
> +struct xe_dep_job;
> +
> +/** struct xe_dep_job_ops - Generic Xe dependency job operations */
> +struct xe_dep_job_ops {
> + /** @run_job: Run generic Xe dependency job */
> + struct dma_fence *(*run_job)(struct xe_dep_job *job);
> + /** @free_job: Free generic Xe dependency job */
> + void (*free_job)(struct xe_dep_job *job);
> +};
> +
> +/** struct xe_dep_job - Generic dependency Xe job */
> +struct xe_dep_job {
> + /** @drm: base DRM scheduler job */
> + struct drm_sched_job drm;
> + /** @ops: dependency job operations */
> + const struct xe_dep_job_ops *ops;
> +};
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.c b/drivers/gpu/drm/xe/xe_dep_scheduler.c
> new file mode 100644
> index 000000000000..fbd55577d787
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_dep_scheduler.c
> @@ -0,0 +1,145 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include <linux/slab.h>
> +
> +#include <drm/gpu_scheduler.h>
> +
> +#include "xe_dep_job_types.h"
> +#include "xe_dep_scheduler.h"
> +#include "xe_device_types.h"
> +
> +/**
> + * DOC: Xe Dependency Scheduler
> + *
> + * The Xe dependency scheduler is a simple wrapper built around the DRM
> + * scheduler to execute jobs once their dependencies are resolved (i.e., all
> + * input fences specified as dependencies are signaled). The jobs that are
> + * executed contain virtual functions to run (execute) and free the job,
> + * allowing a single dependency scheduler to handle jobs performing different
> + * operations.
> + *
> + * Example use cases include deferred resource freeing, TLB invalidations after
> + * bind jobs, etc.
> + */
> +
> +/** struct xe_dep_scheduler - Generic Xe dependency scheduler */
> +struct xe_dep_scheduler {
> + /** @sched: DRM GPU scheduler */
> + struct drm_gpu_scheduler sched;
> + /** @entity: DRM scheduler entity */
> + struct drm_sched_entity entity;
> + /** @rcu: For safe freeing of exported dma fences */
> + struct rcu_head rcu;
> +};
> +
> +static struct dma_fence *xe_dep_scheduler_run_job(struct drm_sched_job *drm_job)
> +{
> + struct xe_dep_job *dep_job =
> + container_of(drm_job, typeof(*dep_job), drm);
> +
> + return dep_job->ops->run_job(dep_job);
> +}
> +
> +static void xe_dep_scheduler_free_job(struct drm_sched_job *drm_job)
> +{
> + struct xe_dep_job *dep_job =
> + container_of(drm_job, typeof(*dep_job), drm);
> +
> + dep_job->ops->free_job(dep_job);
> +}
> +
> +static const struct drm_sched_backend_ops sched_ops = {
> + .run_job = xe_dep_scheduler_run_job,
> + .free_job = xe_dep_scheduler_free_job,
> +};
> +
> +/**
> + * xe_dep_scheduler_create() - Generic Xe dependency scheduler create
> + * @xe: Xe device
> + * @submit_wq: Submit workqueue struct (can be NULL)
> + * @name: Name of dependency scheduler
> + * @job_limit: Max dependency jobs that can be scheduled
> + *
> + * Create a generic Xe dependency scheduler and initialize internal DRM
> + * scheduler objects.
> + *
> + * Return: Generic Xe dependency scheduler object or ERR_PTR
> + */
> +struct xe_dep_scheduler *
> +xe_dep_scheduler_create(struct xe_device *xe,
> + struct workqueue_struct *submit_wq,
> + const char *name, u32 job_limit)
> +{
> + struct xe_dep_scheduler *dep_scheduler;
> + struct drm_gpu_scheduler *sched;
> + const struct drm_sched_init_args args = {
> + .ops = &sched_ops,
> + .submit_wq = submit_wq,
> + .num_rqs = 1,
> + .credit_limit = job_limit,
> + .timeout = MAX_SCHEDULE_TIMEOUT,
> + .name = name,
> + .dev = xe->drm.dev,
> + };
> + int err;
> +
> + dep_scheduler = kzalloc(sizeof(*dep_scheduler), GFP_KERNEL);
> + if (!dep_scheduler)
> + return ERR_PTR(-ENOMEM);
> +
> + err = drm_sched_init(&dep_scheduler->sched, &args);
> + if (err)
> + goto err_free;
> +
> + sched = &dep_scheduler->sched;
> + err = drm_sched_entity_init(&dep_scheduler->entity, 0,
> + (struct drm_gpu_scheduler **)&sched, 1,
Why the cast here?
Otherwise this patch lgtm.
Thanks,
Stuart
> + NULL);
> + if (err)
> + goto err_sched;
> +
> + init_rcu_head(&dep_scheduler->rcu);
> +
> + return dep_scheduler;
> +
> +err_sched:
> + drm_sched_fini(&dep_scheduler->sched);
> +err_free:
> + kfree(dep_scheduler);
> +
> + return ERR_PTR(err);
> +}
> +
> +/**
> + * xe_dep_scheduler_fini() - Generic Xe dependency scheduler finalize
> + * @dep_scheduler: Generic Xe dependency scheduler object
> + *
> + * Finalize internal DRM scheduler objects and free generic Xe dependency
> + * scheduler object
> + */
> +void xe_dep_scheduler_fini(struct xe_dep_scheduler *dep_scheduler)
> +{
> + drm_sched_entity_fini(&dep_scheduler->entity);
> + drm_sched_fini(&dep_scheduler->sched);
> + /*
> + * RCU free due to sched being exported via DRM scheduler fences
> + * (timeline name).
> + */
> + kfree_rcu(dep_scheduler, rcu);
> +}
> +
> +/**
> + * xe_dep_scheduler_entity() - Retrieve a generic Xe dependency scheduler
> + * DRM scheduler entity
> + * @dep_scheduler: Generic Xe dependency scheduler object
> + *
> + * Return: The generic Xe dependency scheduler's DRM scheduler entity
> + */
> +struct drm_sched_entity *
> +xe_dep_scheduler_entity(struct xe_dep_scheduler *dep_scheduler)
> +{
> + return &dep_scheduler->entity;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.h b/drivers/gpu/drm/xe/xe_dep_scheduler.h
> new file mode 100644
> index 000000000000..853961eec64b
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_dep_scheduler.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include <linux/types.h>
> +
> +struct drm_sched_entity;
> +struct workqueue_struct;
> +struct xe_dep_scheduler;
> +struct xe_device;
> +
> +struct xe_dep_scheduler *
> +xe_dep_scheduler_create(struct xe_device *xe,
> + struct workqueue_struct *submit_wq,
> + const char *name, u32 job_limit);
> +
> +void xe_dep_scheduler_fini(struct xe_dep_scheduler *dep_scheduler);
> +
> +struct drm_sched_entity *
> +xe_dep_scheduler_entity(struct xe_dep_scheduler *dep_scheduler);
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 2/9] drm/xe: Add generic dependency jobs / scheduler
2025-07-15 21:14 ` Matthew Brost
@ 2025-07-15 21:13 ` Summers, Stuart
2025-07-15 22:43 ` Summers, Stuart
0 siblings, 1 reply; 45+ messages in thread
From: Summers, Stuart @ 2025-07-15 21:13 UTC (permalink / raw)
To: Brost, Matthew
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Tue, 2025-07-15 at 14:14 -0700, Matthew Brost wrote:
> On Tue, Jul 15, 2025 at 03:04:07PM -0600, Summers, Stuart wrote:
> > On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > > Add generic dependency jobs / scheduler which serves as a wrapper
> > > for the DRM scheduler. Useful when we want to delay a generic
> > > operation until a dma-fence signals.
> > >
> > > Possible use cases include destroying resources based on fences /
> > > dma-resv, the preempt rebind worker, and pipelined GT TLB
> > > invalidations.
> > >
> > > Written in such a way that it could be moved to the DRM subsystem if
> > > needed.
> > >
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/Makefile | 1 +
> > > drivers/gpu/drm/xe/xe_dep_job_types.h | 29 ++++++
> > > drivers/gpu/drm/xe/xe_dep_scheduler.c | 145
> > > ++++++++++++++++++++++++++
> > > drivers/gpu/drm/xe/xe_dep_scheduler.h | 21 ++++
> > > 4 files changed, 196 insertions(+)
> > > create mode 100644 drivers/gpu/drm/xe/xe_dep_job_types.h
> > > create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.c
> > > create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.h
> > >
> > > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > > index 1d97e5b63f4e..0edcfc770c0d 100644
> > > --- a/drivers/gpu/drm/xe/Makefile
> > > +++ b/drivers/gpu/drm/xe/Makefile
> > > @@ -28,6 +28,7 @@ $(obj)/generated/%_wa_oob.c $(obj)/generated/%_wa_oob.h: $(obj)/xe_gen_wa_oob \
> > > xe-y += xe_bb.o \
> > > xe_bo.o \
> > > xe_bo_evict.o \
> > > + xe_dep_scheduler.o \
> > > xe_devcoredump.o \
> > > xe_device.o \
> > > xe_device_sysfs.o \
> > > diff --git a/drivers/gpu/drm/xe/xe_dep_job_types.h b/drivers/gpu/drm/xe/xe_dep_job_types.h
> > > new file mode 100644
> > > index 000000000000..c6a484f24c8c
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/xe/xe_dep_job_types.h
> > > @@ -0,0 +1,29 @@
> > > +/* SPDX-License-Identifier: MIT */
> > > +/*
> > > + * Copyright © 2025 Intel Corporation
> > > + */
> > > +
> > > +#ifndef _XE_DEP_JOB_TYPES_H_
> > > +#define _XE_DEP_JOB_TYPES_H_
> > > +
> > > +#include <drm/gpu_scheduler.h>
> > > +
> > > +struct xe_dep_job;
> > > +
> > > +/** struct xe_dep_job_ops - Generic Xe dependency job operations */
> > > +struct xe_dep_job_ops {
> > > + /** @run_job: Run generic Xe dependency job */
> > > + struct dma_fence *(*run_job)(struct xe_dep_job *job);
> > > + /** @free_job: Free generic Xe dependency job */
> > > + void (*free_job)(struct xe_dep_job *job);
> > > +};
> > > +
> > > +/** struct xe_dep_job - Generic dependency Xe job */
> > > +struct xe_dep_job {
> > > + /** @drm: base DRM scheduler job */
> > > + struct drm_sched_job drm;
> > > + /** @ops: dependency job operations */
> > > + const struct xe_dep_job_ops *ops;
> > > +};
> > > +
> > > +#endif
> > > diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.c b/drivers/gpu/drm/xe/xe_dep_scheduler.c
> > > new file mode 100644
> > > index 000000000000..fbd55577d787
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/xe/xe_dep_scheduler.c
> > > @@ -0,0 +1,145 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright © 2025 Intel Corporation
> > > + */
> > > +
> > > +#include <linux/slab.h>
> > > +
> > > +#include <drm/gpu_scheduler.h>
> > > +
> > > +#include "xe_dep_job_types.h"
> > > +#include "xe_dep_scheduler.h"
> > > +#include "xe_device_types.h"
> > > +
> > > +/**
> > > + * DOC: Xe Dependency Scheduler
> > > + *
> > > + * The Xe dependency scheduler is a simple wrapper built around the
> > > + * DRM scheduler to execute jobs once their dependencies are resolved
> > > + * (i.e., all input fences specified as dependencies are signaled).
> > > + * The jobs that are executed contain virtual functions to run
> > > + * (execute) and free the job, allowing a single dependency scheduler
> > > + * to handle jobs performing different operations.
> > > + *
> > > + * Example use cases include deferred resource freeing, TLB
> > > + * invalidations after bind jobs, etc.
> > > + */
> > > +
> > > +/** struct xe_dep_scheduler - Generic Xe dependency scheduler */
> > > +struct xe_dep_scheduler {
> > > + /** @sched: DRM GPU scheduler */
> > > + struct drm_gpu_scheduler sched;
> > > + /** @entity: DRM scheduler entity */
> > > + struct drm_sched_entity entity;
> > > + /** @rcu: For safe freeing of exported dma fences */
> > > + struct rcu_head rcu;
> > > +};
> > > +
> > > +static struct dma_fence *xe_dep_scheduler_run_job(struct drm_sched_job *drm_job)
> > > +{
> > > + struct xe_dep_job *dep_job =
> > > + container_of(drm_job, typeof(*dep_job), drm);
> > > +
> > > + return dep_job->ops->run_job(dep_job);
> > > +}
> > > +
> > > +static void xe_dep_scheduler_free_job(struct drm_sched_job *drm_job)
> > > +{
> > > + struct xe_dep_job *dep_job =
> > > + container_of(drm_job, typeof(*dep_job), drm);
> > > +
> > > + dep_job->ops->free_job(dep_job);
> > > +}
> > > +
> > > +static const struct drm_sched_backend_ops sched_ops = {
> > > + .run_job = xe_dep_scheduler_run_job,
> > > + .free_job = xe_dep_scheduler_free_job,
> > > +};
> > > +
> > > +/**
> > > + * xe_dep_scheduler_create() - Generic Xe dependency scheduler create
> > > + * @xe: Xe device
> > > + * @submit_wq: Submit workqueue struct (can be NULL)
> > > + * @name: Name of dependency scheduler
> > > + * @job_limit: Max dependency jobs that can be scheduled
> > > + *
> > > + * Create a generic Xe dependency scheduler and initialize internal
> > > + * DRM scheduler objects.
> > > + *
> > > + * Return: Generic Xe dependency scheduler object or ERR_PTR
> > > + */
> > > +struct xe_dep_scheduler *
> > > +xe_dep_scheduler_create(struct xe_device *xe,
> > > + struct workqueue_struct *submit_wq,
> > > + const char *name, u32 job_limit)
> > > +{
> > > + struct xe_dep_scheduler *dep_scheduler;
> > > + struct drm_gpu_scheduler *sched;
> > > + const struct drm_sched_init_args args = {
> > > + .ops = &sched_ops,
> > > + .submit_wq = submit_wq,
> > > + .num_rqs = 1,
> > > + .credit_limit = job_limit,
> > > + .timeout = MAX_SCHEDULE_TIMEOUT,
> > > + .name = name,
> > > + .dev = xe->drm.dev,
> > > + };
> > > + int err;
> > > +
> > > + dep_scheduler = kzalloc(sizeof(*dep_scheduler), GFP_KERNEL);
> > > + if (!dep_scheduler)
> > > + return ERR_PTR(-ENOMEM);
> > > +
> > > + err = drm_sched_init(&dep_scheduler->sched, &args);
> > > + if (err)
> > > + goto err_free;
> > > +
> > > + sched = &dep_scheduler->sched;
> > > + err = drm_sched_entity_init(&dep_scheduler->entity, 0,
> > > + (struct drm_gpu_scheduler **)&sched, 1,
> >
> > Why the cast here?
> >
>
> Copied from some existing code that had a cast; it's not needed in either
> case. Will remove.
Sounds great. With that:
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Thanks,
Stuart
>
> Matt
>
> > Otherwise this patch lgtm.
> >
> > Thanks,
> > Stuart
> >
> > > + NULL);
> > > + if (err)
> > > + goto err_sched;
> > > +
> > > + init_rcu_head(&dep_scheduler->rcu);
> > > +
> > > + return dep_scheduler;
> > > +
> > > +err_sched:
> > > + drm_sched_fini(&dep_scheduler->sched);
> > > +err_free:
> > > + kfree(dep_scheduler);
> > > +
> > > + return ERR_PTR(err);
> > > +}
> > > +
> > > +/**
> > > + * xe_dep_scheduler_fini() - Generic Xe dependency scheduler finalize
> > > + * @dep_scheduler: Generic Xe dependency scheduler object
> > > + *
> > > + * Finalize internal DRM scheduler objects and free generic Xe
> > > + * dependency scheduler object
> > > + */
> > > +void xe_dep_scheduler_fini(struct xe_dep_scheduler *dep_scheduler)
> > > +{
> > > + drm_sched_entity_fini(&dep_scheduler->entity);
> > > + drm_sched_fini(&dep_scheduler->sched);
> > > + /*
> > > + * RCU free due to sched being exported via DRM scheduler fences
> > > + * (timeline name).
> > > + */
> > > + kfree_rcu(dep_scheduler, rcu);
> > > +}
> > > +
> > > +/**
> > > + * xe_dep_scheduler_entity() - Retrieve a generic Xe dependency
> > > + * scheduler DRM scheduler entity
> > > + * @dep_scheduler: Generic Xe dependency scheduler object
> > > + *
> > > + * Return: The generic Xe dependency scheduler's DRM scheduler entity
> > > + */
> > > +struct drm_sched_entity *
> > > +xe_dep_scheduler_entity(struct xe_dep_scheduler *dep_scheduler)
> > > +{
> > > + return &dep_scheduler->entity;
> > > +}
> > > diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.h b/drivers/gpu/drm/xe/xe_dep_scheduler.h
> > > new file mode 100644
> > > index 000000000000..853961eec64b
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/xe/xe_dep_scheduler.h
> > > @@ -0,0 +1,21 @@
> > > +/* SPDX-License-Identifier: MIT */
> > > +/*
> > > + * Copyright © 2025 Intel Corporation
> > > + */
> > > +
> > > +#include <linux/types.h>
> > > +
> > > +struct drm_sched_entity;
> > > +struct workqueue_struct;
> > > +struct xe_dep_scheduler;
> > > +struct xe_device;
> > > +
> > > +struct xe_dep_scheduler *
> > > +xe_dep_scheduler_create(struct xe_device *xe,
> > > + struct workqueue_struct *submit_wq,
> > > + const char *name, u32 job_limit);
> > > +
> > > +void xe_dep_scheduler_fini(struct xe_dep_scheduler
> > > *dep_scheduler);
> > > +
> > > +struct drm_sched_entity *
> > > +xe_dep_scheduler_entity(struct xe_dep_scheduler *dep_scheduler);
> >
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 2/9] drm/xe: Add generic dependency jobs / scheduler
2025-07-15 21:04 ` Summers, Stuart
@ 2025-07-15 21:14 ` Matthew Brost
2025-07-15 21:13 ` Summers, Stuart
0 siblings, 1 reply; 45+ messages in thread
From: Matthew Brost @ 2025-07-15 21:14 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Tue, Jul 15, 2025 at 03:04:07PM -0600, Summers, Stuart wrote:
> On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > Add generic dependency jobs / scheduler which serves as a wrapper for
> > DRM
> > scheduler. Useful when we want to delay a generic operation until a
> > dma-fence signals.
> >
> > Existing use cases could be destroying resources based on fences /
> > dma-resv, the preempt rebind worker, and pipelined GT TLB
> > invalidations.
> >
> > Written in such a way that it could be moved to the DRM subsystem if needed.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/Makefile | 1 +
> > drivers/gpu/drm/xe/xe_dep_job_types.h | 29 ++++++
> > drivers/gpu/drm/xe/xe_dep_scheduler.c | 145
> > ++++++++++++++++++++++++++
> > drivers/gpu/drm/xe/xe_dep_scheduler.h | 21 ++++
> > 4 files changed, 196 insertions(+)
> > create mode 100644 drivers/gpu/drm/xe/xe_dep_job_types.h
> > create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.c
> > create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.h
> >
> > diff --git a/drivers/gpu/drm/xe/Makefile
> > b/drivers/gpu/drm/xe/Makefile
> > index 1d97e5b63f4e..0edcfc770c0d 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -28,6 +28,7 @@ $(obj)/generated/%_wa_oob.c
> > $(obj)/generated/%_wa_oob.h: $(obj)/xe_gen_wa_oob \
> > xe-y += xe_bb.o \
> > xe_bo.o \
> > xe_bo_evict.o \
> > + xe_dep_scheduler.o \
> > xe_devcoredump.o \
> > xe_device.o \
> > xe_device_sysfs.o \
> > diff --git a/drivers/gpu/drm/xe/xe_dep_job_types.h
> > b/drivers/gpu/drm/xe/xe_dep_job_types.h
> > new file mode 100644
> > index 000000000000..c6a484f24c8c
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_dep_job_types.h
> > @@ -0,0 +1,29 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#ifndef _XE_DEP_JOB_TYPES_H_
> > +#define _XE_DEP_JOB_TYPES_H_
> > +
> > +#include <drm/gpu_scheduler.h>
> > +
> > +struct xe_dep_job;
> > +
> > +/** struct xe_dep_job_ops - Generic Xe dependency job operations */
> > +struct xe_dep_job_ops {
> > + /** @run_job: Run generic Xe dependency job */
> > + struct dma_fence *(*run_job)(struct xe_dep_job *job);
> > + /** @free_job: Free generic Xe dependency job */
> > + void (*free_job)(struct xe_dep_job *job);
> > +};
> > +
> > +/** struct xe_dep_job - Generic dependency Xe job */
> > +struct xe_dep_job {
> > + /** @drm: base DRM scheduler job */
> > + struct drm_sched_job drm;
> > + /** @ops: dependency job operations */
> > + const struct xe_dep_job_ops *ops;
> > +};
> > +
> > +#endif
> > diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.c
> > b/drivers/gpu/drm/xe/xe_dep_scheduler.c
> > new file mode 100644
> > index 000000000000..fbd55577d787
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_dep_scheduler.c
> > @@ -0,0 +1,145 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#include <linux/slab.h>
> > +
> > +#include <drm/gpu_scheduler.h>
> > +
> > +#include "xe_dep_job_types.h"
> > +#include "xe_dep_scheduler.h"
> > +#include "xe_device_types.h"
> > +
> > +/**
> > + * DOC: Xe Dependency Scheduler
> > + *
> > + * The Xe dependency scheduler is a simple wrapper built around the
> > DRM
> > + * scheduler to execute jobs once their dependencies are resolved
> > (i.e., all
> > + * input fences specified as dependencies are signaled). The jobs
> > that are
> > + * executed contain virtual functions to run (execute) and free the
> > job,
> > + * allowing a single dependency scheduler to handle jobs performing
> > different
> > + * operations.
> > + *
> > + * Example use cases include deferred resource freeing, TLB
> > invalidations after
> > + * bind jobs, etc.
> > + */
> > +
> > +/** struct xe_dep_scheduler - Generic Xe dependency scheduler */
> > +struct xe_dep_scheduler {
> > + /** @sched: DRM GPU scheduler */
> > + struct drm_gpu_scheduler sched;
> > + /** @entity: DRM scheduler entity */
> > + struct drm_sched_entity entity;
> > + /** @rcu: For safe freeing of exported dma fences */
> > + struct rcu_head rcu;
> > +};
> > +
> > +static struct dma_fence *xe_dep_scheduler_run_job(struct
> > drm_sched_job *drm_job)
> > +{
> > + struct xe_dep_job *dep_job =
> > + container_of(drm_job, typeof(*dep_job), drm);
> > +
> > + return dep_job->ops->run_job(dep_job);
> > +}
> > +
> > +static void xe_dep_scheduler_free_job(struct drm_sched_job *drm_job)
> > +{
> > + struct xe_dep_job *dep_job =
> > + container_of(drm_job, typeof(*dep_job), drm);
> > +
> > + dep_job->ops->free_job(dep_job);
> > +}
> > +
> > +static const struct drm_sched_backend_ops sched_ops = {
> > + .run_job = xe_dep_scheduler_run_job,
> > + .free_job = xe_dep_scheduler_free_job,
> > +};
> > +
> > +/**
> > + * xe_dep_scheduler_create() - Generic Xe dependency scheduler
> > create
> > + * @xe: Xe device
> > + * @submit_wq: Submit workqueue struct (can be NULL)
> > + * @name: Name of dependency scheduler
> > + * @job_limit: Max dependency jobs that can be scheduled
> > + *
> > + * Create a generic Xe dependency scheduler and initialize internal
> > DRM
> > + * scheduler objects.
> > + *
> > + * Return: Generic Xe dependency scheduler object or ERR_PTR
> > + */
> > +struct xe_dep_scheduler *
> > +xe_dep_scheduler_create(struct xe_device *xe,
> > + struct workqueue_struct *submit_wq,
> > + const char *name, u32 job_limit)
> > +{
> > + struct xe_dep_scheduler *dep_scheduler;
> > + struct drm_gpu_scheduler *sched;
> > + const struct drm_sched_init_args args = {
> > + .ops = &sched_ops,
> > + .submit_wq = submit_wq,
> > + .num_rqs = 1,
> > + .credit_limit = job_limit,
> > + .timeout = MAX_SCHEDULE_TIMEOUT,
> > + .name = name,
> > + .dev = xe->drm.dev,
> > + };
> > + int err;
> > +
> > + dep_scheduler = kzalloc(sizeof(*dep_scheduler), GFP_KERNEL);
> > + if (!dep_scheduler)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + err = drm_sched_init(&dep_scheduler->sched, &args);
> > + if (err)
> > + goto err_free;
> > +
> > + sched = &dep_scheduler->sched;
> > + err = drm_sched_entity_init(&dep_scheduler->entity, 0,
> > + (struct drm_gpu_scheduler
> > **)&sched, 1,
>
> Why the cast here?
>
Copied from some existing code that had a cast; it's not needed in
either case. Will remove.
Matt
> Otherwise this patch lgtm.
>
> Thanks,
> Stuart
>
> > + NULL);
> > + if (err)
> > + goto err_sched;
> > +
> > + init_rcu_head(&dep_scheduler->rcu);
> > +
> > + return dep_scheduler;
> > +
> > +err_sched:
> > + drm_sched_fini(&dep_scheduler->sched);
> > +err_free:
> > + kfree(dep_scheduler);
> > +
> > + return ERR_PTR(err);
> > +}
> > +
> > +/**
> > + * xe_dep_scheduler_fini() - Generic Xe dependency scheduler
> > finalize
> > + * @dep_scheduler: Generic Xe dependency scheduler object
> > + *
> > + * Finalize internal DRM scheduler objects and free generic Xe
> > dependency
> > + * scheduler object
> > + */
> > +void xe_dep_scheduler_fini(struct xe_dep_scheduler *dep_scheduler)
> > +{
> > + drm_sched_entity_fini(&dep_scheduler->entity);
> > + drm_sched_fini(&dep_scheduler->sched);
> > + /*
> > + * RCU free due to sched being exported via DRM scheduler fences
> > + * (timeline name).
> > + */
> > + kfree_rcu(dep_scheduler, rcu);
> > +}
> > +
> > +/**
> > + * xe_dep_scheduler_entity() - Retrieve a generic Xe dependency
> > scheduler
> > + * DRM scheduler entity
> > + * @dep_scheduler: Generic Xe dependency scheduler object
> > + *
> > + * Return: The generic Xe dependency scheduler's DRM scheduler
> > entity
> > + */
> > +struct drm_sched_entity *
> > +xe_dep_scheduler_entity(struct xe_dep_scheduler *dep_scheduler)
> > +{
> > + return &dep_scheduler->entity;
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.h
> > b/drivers/gpu/drm/xe/xe_dep_scheduler.h
> > new file mode 100644
> > index 000000000000..853961eec64b
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_dep_scheduler.h
> > @@ -0,0 +1,21 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#include <linux/types.h>
> > +
> > +struct drm_sched_entity;
> > +struct workqueue_struct;
> > +struct xe_dep_scheduler;
> > +struct xe_device;
> > +
> > +struct xe_dep_scheduler *
> > +xe_dep_scheduler_create(struct xe_device *xe,
> > + struct workqueue_struct *submit_wq,
> > + const char *name, u32 job_limit);
> > +
> > +void xe_dep_scheduler_fini(struct xe_dep_scheduler *dep_scheduler);
> > +
> > +struct drm_sched_entity *
> > +xe_dep_scheduler_entity(struct xe_dep_scheduler *dep_scheduler);
>
^ permalink raw reply [flat|nested] 45+ messages in thread
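For readers following the series, the intended usage of the API above is easiest to see in a short sketch. Below is a minimal, hypothetical consumer; my_dep_job, my_run_job, my_free_job, and my_submit are invented names, error handling is elided, and the drm_sched_job_*() calls reflect general DRM scheduler usage rather than anything mandated by this series:

/* Hypothetical consumer of the Xe dependency scheduler; sketch only. */
struct my_dep_job {
	struct xe_dep_job dep;	/* base job, recovered via container_of() */
};

static struct dma_fence *my_run_job(struct xe_dep_job *job)
{
	/*
	 * All input dependencies have signaled; do the deferred work here
	 * and return a fence that signals when it completes.
	 */
	return dma_fence_get_stub();	/* placeholder: already-signaled fence */
}

static void my_free_job(struct xe_dep_job *job)
{
	kfree(container_of(job, struct my_dep_job, dep));
}

static const struct xe_dep_job_ops my_dep_job_ops = {
	.run_job = my_run_job,
	.free_job = my_free_job,
};

static void my_submit(struct xe_device *xe, struct dma_fence *input_fence)
{
	struct xe_dep_scheduler *ds;
	struct my_dep_job *job;

	ds = xe_dep_scheduler_create(xe, NULL, "my-deps", 16);
	job = kzalloc(sizeof(*job), GFP_KERNEL);
	job->dep.ops = &my_dep_job_ops;

	drm_sched_job_init(&job->dep.drm, xe_dep_scheduler_entity(ds), 1, NULL);
	/* add_dependency takes ownership of a fence reference */
	drm_sched_job_add_dependency(&job->dep.drm, dma_fence_get(input_fence));
	drm_sched_job_arm(&job->dep.drm);
	drm_sched_entity_push_job(&job->dep.drm);	/* runs once input_fence signals */
}

The scheduler calls free_job() after the fence returned from run_job() signals, which is when the kfree() above happens.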
* Re: [PATCH v2 5/9] drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues
2025-07-02 23:42 ` [PATCH v2 5/9] drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues Matthew Brost
@ 2025-07-15 21:34 ` Summers, Stuart
2025-07-15 21:44 ` Matthew Brost
0 siblings, 1 reply; 45+ messages in thread
From: Summers, Stuart @ 2025-07-15 21:34 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: maarten.lankhorst@linux.intel.com, Auld, Matthew
On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> Add a generic dependency scheduler for GT TLB invalidations, used to
> schedule jobs that issue GT TLB invalidations to bind queues.
>
> v2:
> - Use shared GT TLB invalidation queue for dep scheduler
> - Break allocation of dep scheduler into its own function
> - Add define for max number of TLB invalidations
> - Skip media if not present
>
> Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue.c | 48
> ++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 13 +++++++
> 2 files changed, 61 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> b/drivers/gpu/drm/xe/xe_exec_queue.c
> index fee22358cc09..7aaf669cf5fc 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -12,6 +12,7 @@
> #include <drm/drm_file.h>
> #include <uapi/drm/xe_drm.h>
>
> +#include "xe_dep_scheduler.h"
> #include "xe_device.h"
> #include "xe_gt.h"
> #include "xe_hw_engine_class_sysfs.h"
> @@ -39,6 +40,12 @@ static int exec_queue_user_extensions(struct
> xe_device *xe, struct xe_exec_queue
>
> static void __xe_exec_queue_free(struct xe_exec_queue *q)
> {
> + int i;
> +
> + for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i)
> + if (q->tlb_inval[i].dep_scheduler)
> + xe_dep_scheduler_fini(q->tlb_inval[i].dep_scheduler);
> +
> if (xe_exec_queue_uses_pxp(q))
> xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
> if (q->vm)
> @@ -50,6 +57,39 @@ static void __xe_exec_queue_free(struct
> xe_exec_queue *q)
> kfree(q);
> }
>
> +static int alloc_dep_schedulers(struct xe_device *xe, struct
> xe_exec_queue *q)
> +{
> + struct xe_tile *tile = gt_to_tile(q->gt);
> + int i;
> +
> + for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i) {
> + struct xe_dep_scheduler *dep_scheduler;
> + struct xe_gt *gt;
> + struct workqueue_struct *wq;
> +
> + if (i == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT)
> + gt = tile->primary_gt;
> + else
> + gt = tile->media_gt;
> +
> + if (!gt)
> + continue;
> +
> + wq = gt->tlb_invalidation.job_wq;
> +
> +#define MAX_TLB_INVAL_JOBS 16 /* Picking a reasonable value
> */
So if we exceed this number, that means we won't get an invalidation of
that range before a context switch to a new queue?
> + dep_scheduler = xe_dep_scheduler_create(xe, wq, q->name,
> + MAX_TLB_INVAL_JOBS);
> + if (IS_ERR(dep_scheduler))
> + return PTR_ERR(dep_scheduler);
> +
> + q->tlb_inval[i].dep_scheduler = dep_scheduler;
> + }
> +#undef MAX_TLB_INVAL_JOBS
> +
> + return 0;
> +}
> +
> static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device
> *xe,
> struct xe_vm *vm,
> u32 logical_mask,
> @@ -94,6 +134,14 @@ static struct xe_exec_queue
> *__xe_exec_queue_alloc(struct xe_device *xe,
> else
> q->sched_props.priority =
> XE_EXEC_QUEUE_PRIORITY_NORMAL;
We want these to be high priority right? I.e. if another user creates a
queue and submits at high priority can we get stale data on those
executions if this one is blocked behind that high priority one?
Thanks,
Stuart
>
> + if (q->flags & (EXEC_QUEUE_FLAG_MIGRATE |
> EXEC_QUEUE_FLAG_VM)) {
> + err = alloc_dep_schedulers(xe, q);
> + if (err) {
> + __xe_exec_queue_free(q);
> + return ERR_PTR(err);
> + }
> + }
> +
> if (vm)
> q->vm = xe_vm_get(vm);
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index abdf4a57e6e2..ba443a497b38 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -134,6 +134,19 @@ struct xe_exec_queue {
> struct list_head link;
> } lr;
>
> +#define XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT 0
> +#define XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT 1
> +#define XE_EXEC_QUEUE_TLB_INVAL_COUNT (XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT + 1)
> +
> + /** @tlb_inval: TLB invalidations exec queue state */
> + struct {
> + /**
> + * @tlb_inval.dep_scheduler: The TLB invalidation
> + * dependency scheduler
> + */
> + struct xe_dep_scheduler *dep_scheduler;
> + } tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
> +
> /** @pxp: PXP info tracking */
> struct {
> /** @pxp.type: PXP session type used by this queue */
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 5/9] drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues
2025-07-15 21:34 ` Summers, Stuart
@ 2025-07-15 21:44 ` Matthew Brost
2025-07-15 21:45 ` Summers, Stuart
0 siblings, 1 reply; 45+ messages in thread
From: Matthew Brost @ 2025-07-15 21:44 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Tue, Jul 15, 2025 at 03:34:29PM -0600, Summers, Stuart wrote:
> On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > Add a generic dependency scheduler for GT TLB invalidations, used to
> > schedule jobs that issue GT TLB invalidations to bind queues.
> >
> > v2:
> > - Use shared GT TLB invalidation queue for dep scheduler
> > - Break allocation of dep scheduler into its own function
> > - Add define for max number of TLB invalidations
> > - Skip media if not present
> >
> > Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_exec_queue.c | 48
> > ++++++++++++++++++++++++
> > drivers/gpu/drm/xe/xe_exec_queue_types.h | 13 +++++++
> > 2 files changed, 61 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > index fee22358cc09..7aaf669cf5fc 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > @@ -12,6 +12,7 @@
> > #include <drm/drm_file.h>
> > #include <uapi/drm/xe_drm.h>
> >
> > +#include "xe_dep_scheduler.h"
> > #include "xe_device.h"
> > #include "xe_gt.h"
> > #include "xe_hw_engine_class_sysfs.h"
> > @@ -39,6 +40,12 @@ static int exec_queue_user_extensions(struct
> > xe_device *xe, struct xe_exec_queue
> >
> > static void __xe_exec_queue_free(struct xe_exec_queue *q)
> > {
> > + int i;
> > +
> > + for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i)
> > + if (q->tlb_inval[i].dep_scheduler)
> > + xe_dep_scheduler_fini(q->tlb_inval[i].dep_scheduler);
> > +
> > if (xe_exec_queue_uses_pxp(q))
> > xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
> > if (q->vm)
> > @@ -50,6 +57,39 @@ static void __xe_exec_queue_free(struct
> > xe_exec_queue *q)
> > kfree(q);
> > }
> >
> > +static int alloc_dep_schedulers(struct xe_device *xe, struct
> > xe_exec_queue *q)
> > +{
> > + struct xe_tile *tile = gt_to_tile(q->gt);
> > + int i;
> > +
> > + for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i) {
> > + struct xe_dep_scheduler *dep_scheduler;
> > + struct xe_gt *gt;
> > + struct workqueue_struct *wq;
> > +
> > + if (i == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT)
> > + gt = tile->primary_gt;
> > + else
> > + gt = tile->media_gt;
> > +
> > + if (!gt)
> > + continue;
> > +
> > + wq = gt->tlb_invalidation.job_wq;
> > +
> > +#define MAX_TLB_INVAL_JOBS 16 /* Picking a reasonable value
> > */
>
> So if we exceed this number, that means we won't get an invalidation of
> that range before a context switch to a new queue?
>
> > + dep_scheduler = xe_dep_scheduler_create(xe, wq, q->name,
> > + MAX_TLB_INVAL_JOBS);
> > + if (IS_ERR(dep_scheduler))
> > + return PTR_ERR(dep_scheduler);
> > +
> > + q->tlb_inval[i].dep_scheduler = dep_scheduler;
> > + }
> > +#undef MAX_TLB_INVAL_JOBS
> > +
> > + return 0;
> > +}
> > +
> > static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device
> > *xe,
> > struct xe_vm *vm,
> > u32 logical_mask,
> > @@ -94,6 +134,14 @@ static struct xe_exec_queue
> > *__xe_exec_queue_alloc(struct xe_device *xe,
> > else
> > q->sched_props.priority =
> > XE_EXEC_QUEUE_PRIORITY_NORMAL;
>
> We want these to be high priority right? I.e. if another user creates a
> queue and submits at high priority can we get stale data on those
> executions if this one is blocked behind that high priority one?
I'm struggling with exactly what you are asking here.
The q->sched_props.priority is priority of exec queue being created
and is existing code. In the case of bind queues (i.e., ones that
call alloc_dep_schedulers) this maps to priority of the GuC context
which runs bind jobs on a hardware copy engine.
dep_schedulers do not have a priority, at least not one that is used in
any way. This is just a software queue which runs TLB invalidations once
the bind jobs complete.
wrt high priority passing lower priority ones, that isn't possible if
there is a dependency chain. The driver holds any jobs in the KMD until
all dependencies are resolved - high priority queues just get serviced by
the GuC over low priority queues if both queues have runnable jobs.
TL;DR nothing in this patch changes anything wrt priorities.
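To illustrate the ordering point with a rough sketch (not actual driver
code; 'job' and 'bind_fence' are placeholders):

	/*
	 * The DRM scheduler will not call run_job() until every fence added
	 * as a dependency has signaled, regardless of queue priority.
	 */
	drm_sched_job_add_dependency(&job->drm, bind_fence);
	drm_sched_entity_push_job(&job->drm); /* held in KMD until bind_fence signals */

A high-priority queue can only jump ahead of work whose dependencies are
already resolved.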
Matt
>
> Thanks,
> Stuart
>
> >
> > + if (q->flags & (EXEC_QUEUE_FLAG_MIGRATE |
> > EXEC_QUEUE_FLAG_VM)) {
> > + err = alloc_dep_schedulers(xe, q);
> > + if (err) {
> > + __xe_exec_queue_free(q);
> > + return ERR_PTR(err);
> > + }
> > + }
> > +
> > if (vm)
> > q->vm = xe_vm_get(vm);
> >
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > index abdf4a57e6e2..ba443a497b38 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > @@ -134,6 +134,19 @@ struct xe_exec_queue {
> > struct list_head link;
> > } lr;
> >
> > +#define XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT 0
> > +#define XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT 1
> > +#define XE_EXEC_QUEUE_TLB_INVAL_COUNT (XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT + 1)
> > +
> > + /** @tlb_inval: TLB invalidations exec queue state */
> > + struct {
> > + /**
> > + * @tlb_inval.dep_scheduler: The TLB invalidation
> > + * dependency scheduler
> > + */
> > + struct xe_dep_scheduler *dep_scheduler;
> > + } tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
> > +
> > /** @pxp: PXP info tracking */
> > struct {
> > /** @pxp.type: PXP session type used by this queue */
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 5/9] drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues
2025-07-15 21:44 ` Matthew Brost
@ 2025-07-15 21:45 ` Summers, Stuart
2025-07-15 21:52 ` Matthew Brost
0 siblings, 1 reply; 45+ messages in thread
From: Summers, Stuart @ 2025-07-15 21:45 UTC (permalink / raw)
To: Brost, Matthew
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Tue, 2025-07-15 at 14:44 -0700, Matthew Brost wrote:
> On Tue, Jul 15, 2025 at 03:34:29PM -0600, Summers, Stuart wrote:
> > On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > > Add a generic dependency scheduler for GT TLB invalidations, used
> > > to
> > > schedule jobs that issue GT TLB invalidations to bind queues.
> > >
> > > v2:
> > > - Use shared GT TLB invalidation queue for dep scheduler
> > > - Break allocation of dep scheduler into its own function
> > > - Add define for max number of TLB invalidations
> > > - Skip media if not present
> > >
> > > Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_exec_queue.c | 48
> > > ++++++++++++++++++++++++
> > > drivers/gpu/drm/xe/xe_exec_queue_types.h | 13 +++++++
> > > 2 files changed, 61 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > index fee22358cc09..7aaf669cf5fc 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > @@ -12,6 +12,7 @@
> > > #include <drm/drm_file.h>
> > > #include <uapi/drm/xe_drm.h>
> > >
> > > +#include "xe_dep_scheduler.h"
> > > #include "xe_device.h"
> > > #include "xe_gt.h"
> > > #include "xe_hw_engine_class_sysfs.h"
> > > @@ -39,6 +40,12 @@ static int exec_queue_user_extensions(struct
> > > xe_device *xe, struct xe_exec_queue
> > >
> > > static void __xe_exec_queue_free(struct xe_exec_queue *q)
> > > {
> > > + int i;
> > > +
> > > + for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i)
> > > + if (q->tlb_inval[i].dep_scheduler)
> > > + xe_dep_scheduler_fini(q->tlb_inval[i].dep_scheduler);
> > > +
> > > if (xe_exec_queue_uses_pxp(q))
> > > xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp,
> > > q);
> > > if (q->vm)
> > > @@ -50,6 +57,39 @@ static void __xe_exec_queue_free(struct
> > > xe_exec_queue *q)
> > > kfree(q);
> > > }
> > >
> > > +static int alloc_dep_schedulers(struct xe_device *xe, struct
> > > xe_exec_queue *q)
> > > +{
> > > + struct xe_tile *tile = gt_to_tile(q->gt);
> > > + int i;
> > > +
> > > + for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i) {
> > > + struct xe_dep_scheduler *dep_scheduler;
> > > + struct xe_gt *gt;
> > > + struct workqueue_struct *wq;
> > > +
> > > + if (i == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT)
> > > + gt = tile->primary_gt;
> > > + else
> > > + gt = tile->media_gt;
> > > +
> > > + if (!gt)
> > > + continue;
> > > +
> > > + wq = gt->tlb_invalidation.job_wq;
> > > +
> > > +#define MAX_TLB_INVAL_JOBS 16 /* Picking a reasonable
> > > value
> > > */
> >
> > So if we exceed this number, that means we won't get an
> > invalidation of
> > that range before a context switch to a new queue?
> >
> > > + dep_scheduler = xe_dep_scheduler_create(xe, wq, q->name,
> > > + MAX_TLB_INVAL_JOBS);
> > > + if (IS_ERR(dep_scheduler))
> > > + return PTR_ERR(dep_scheduler);
> > > +
> > > + q->tlb_inval[i].dep_scheduler = dep_scheduler;
> > > + }
> > > +#undef MAX_TLB_INVAL_JOBS
> > > +
> > > + return 0;
> > > +}
> > > +
> > > static struct xe_exec_queue *__xe_exec_queue_alloc(struct
> > > xe_device
> > > *xe,
> > > struct xe_vm
> > > *vm,
> > > u32
> > > logical_mask,
> > > @@ -94,6 +134,14 @@ static struct xe_exec_queue
> > > *__xe_exec_queue_alloc(struct xe_device *xe,
> > > else
> > > q->sched_props.priority =
> > > XE_EXEC_QUEUE_PRIORITY_NORMAL;
> >
> > We want these to be high priority right? I.e. if another user
> > creates a
> > queue and submits at high priority can we get stale data on those
> > executions if this one is blocked behind that high priority one?
>
> I'm struggling with exactly what you are asking here.
>
> The q->sched_props.priority is priority of exec queue being created
> and is existing code. In the case of bind queues (i.e., ones that
> call alloc_dep_schedulers) this maps to priority of the GuC context
> which runs bind jobs on a hardware copy engine.
>
> dep_schedulers do not have a priority, at least not one that is used
> in any way.
> This is just a software queue which runs TLB invalidations once the
> bind
> jobs complete.
>
> wrt high priority passing lower priority ones, that isn't possible if
> there is a dependency chain. The driver holds any jobs in the KMD
> until
> all dependencies are resolved - high priority queues just get serviced
> by
> the GuC over low priority queues if both queues have runnable jobs.
>
> TL;DR nothing in this patch changes anything wrt priorities.
You're right I think I read this wrong somehow :(, ignore my comment!
Still the minor question above about the max number of dependency jobs
when you have time.
Thanks,
Stuart
>
> Matt
>
> >
> > Thanks,
> > Stuart
> >
> > >
> > > + if (q->flags & (EXEC_QUEUE_FLAG_MIGRATE |
> > > EXEC_QUEUE_FLAG_VM)) {
> > > + err = alloc_dep_schedulers(xe, q);
> > > + if (err) {
> > > + __xe_exec_queue_free(q);
> > > + return ERR_PTR(err);
> > > + }
> > > + }
> > > +
> > > if (vm)
> > > q->vm = xe_vm_get(vm);
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > index abdf4a57e6e2..ba443a497b38 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > @@ -134,6 +134,19 @@ struct xe_exec_queue {
> > > struct list_head link;
> > > } lr;
> > >
> > > +#define XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT 0
> > > +#define XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT 1
> > > +#define XE_EXEC_QUEUE_TLB_INVAL_COUNT (XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT + 1)
> > > +
> > > + /** @tlb_inval: TLB invalidations exec queue state */
> > > + struct {
> > > + /**
> > > + * @tlb_inval.dep_scheduler: The TLB invalidation
> > > + * dependency scheduler
> > > + */
> > > + struct xe_dep_scheduler *dep_scheduler;
> > > + } tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
> > > +
> > > /** @pxp: PXP info tracking */
> > > struct {
> > > /** @pxp.type: PXP session type used by this
> > > queue */
> >
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 5/9] drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues
2025-07-15 21:45 ` Summers, Stuart
@ 2025-07-15 21:52 ` Matthew Brost
2025-07-15 21:53 ` Summers, Stuart
0 siblings, 1 reply; 45+ messages in thread
From: Matthew Brost @ 2025-07-15 21:52 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Tue, Jul 15, 2025 at 03:45:55PM -0600, Summers, Stuart wrote:
> On Tue, 2025-07-15 at 14:44 -0700, Matthew Brost wrote:
> > On Tue, Jul 15, 2025 at 03:34:29PM -0600, Summers, Stuart wrote:
> > > On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > > > Add a generic dependency scheduler for GT TLB invalidations, used
> > > > to
> > > > schedule jobs that issue GT TLB invalidations to bind queues.
> > > >
> > > > v2:
> > > > - Use shared GT TLB invalidation queue for dep scheduler
> > > > - Break allocation of dep scheduler into its own function
> > > > - Add define for max number of TLB invalidations
> > > > - Skip media if not present
> > > >
> > > > Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_exec_queue.c | 48
> > > > ++++++++++++++++++++++++
> > > > drivers/gpu/drm/xe/xe_exec_queue_types.h | 13 +++++++
> > > > 2 files changed, 61 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > index fee22358cc09..7aaf669cf5fc 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > @@ -12,6 +12,7 @@
> > > > #include <drm/drm_file.h>
> > > > #include <uapi/drm/xe_drm.h>
> > > >
> > > > +#include "xe_dep_scheduler.h"
> > > > #include "xe_device.h"
> > > > #include "xe_gt.h"
> > > > #include "xe_hw_engine_class_sysfs.h"
> > > > @@ -39,6 +40,12 @@ static int exec_queue_user_extensions(struct
> > > > xe_device *xe, struct xe_exec_queue
> > > >
> > > > static void __xe_exec_queue_free(struct xe_exec_queue *q)
> > > > {
> > > > + int i;
> > > > +
> > > > + for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i)
> > > > + if (q->tlb_inval[i].dep_scheduler)
> > > > + xe_dep_scheduler_fini(q->tlb_inval[i].dep_scheduler);
> > > > +
> > > > if (xe_exec_queue_uses_pxp(q))
> > > > xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp,
> > > > q);
> > > > if (q->vm)
> > > > @@ -50,6 +57,39 @@ static void __xe_exec_queue_free(struct
> > > > xe_exec_queue *q)
> > > > kfree(q);
> > > > }
> > > >
> > > > +static int alloc_dep_schedulers(struct xe_device *xe, struct
> > > > xe_exec_queue *q)
> > > > +{
> > > > + struct xe_tile *tile = gt_to_tile(q->gt);
> > > > + int i;
> > > > +
> > > > + for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i) {
> > > > + struct xe_dep_scheduler *dep_scheduler;
> > > > + struct xe_gt *gt;
> > > > + struct workqueue_struct *wq;
> > > > +
> > > > + if (i == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT)
> > > > + gt = tile->primary_gt;
> > > > + else
> > > > + gt = tile->media_gt;
> > > > +
> > > > + if (!gt)
> > > > + continue;
> > > > +
> > > > + wq = gt->tlb_invalidation.job_wq;
> > > > +
> > > > +#define MAX_TLB_INVAL_JOBS 16 /* Picking a reasonable
> > > > value
> > > > */
> > >
> > > So if we exceed this number, that means we won't get an
> > > invalidation of
> > > that range before a context switch to a new queue?
Sorry missed this.
No. This is just the maximum number of TLB invalidations, on this queue,
which can be in flight to the hardware at once. If MAX_TLB_INVAL_JOBS is
exceeded, the scheduler just holds jobs until prior ones complete and
more job credits become available.
Context switching or other queues are not really related here. Think of
this as a FIFO in which only MAX_TLB_INVAL_JOBS can be outstanding at a time.
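In rough pseudo-code (illustrative only, not the actual submission path;
'jobs' and 'entity' are placeholders):

	/*
	 * Each invalidation job takes one credit; with credit_limit ==
	 * MAX_TLB_INVAL_JOBS (16), at most 16 jobs reach run_job() at once.
	 * Job 17 simply waits on the entity queue until an earlier job's
	 * fence signals and returns its credit - nothing is dropped.
	 */
	for (i = 0; i < 17; i++) {
		drm_sched_job_init(&jobs[i]->dep.drm, entity, 1, NULL);
		drm_sched_job_arm(&jobs[i]->dep.drm);
		drm_sched_entity_push_job(&jobs[i]->dep.drm);
	}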
Matt
> > >
> > > > + dep_scheduler = xe_dep_scheduler_create(xe, wq, q->name,
> > > > + MAX_TLB_INVAL_JOBS);
> > > > + if (IS_ERR(dep_scheduler))
> > > > + return PTR_ERR(dep_scheduler);
> > > > +
> > > > + q->tlb_inval[i].dep_scheduler = dep_scheduler;
> > > > + }
> > > > +#undef MAX_TLB_INVAL_JOBS
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > static struct xe_exec_queue *__xe_exec_queue_alloc(struct
> > > > xe_device
> > > > *xe,
> > > > struct xe_vm
> > > > *vm,
> > > > u32
> > > > logical_mask,
> > > > @@ -94,6 +134,14 @@ static struct xe_exec_queue
> > > > *__xe_exec_queue_alloc(struct xe_device *xe,
> > > > else
> > > > q->sched_props.priority =
> > > > XE_EXEC_QUEUE_PRIORITY_NORMAL;
> > >
> > > We want these to be high priority right? I.e. if another user
> > > creates a
> > > queue and submits at high priority can we get stale data on those
> > > executions if this one is blocked behind that high priority one?
> >
> > I'm struggling with exactly what you are asking here.
> >
> > The q->sched_props.priority is priority of exec queue being created
> > and is existing code. In the case of bind queues (i.e., ones that
> > call alloc_dep_schedulers) this maps to priority of the GuC context
> > which runs bind jobs on a hardware copy engine.
> >
> > dep_schedulers do not have a priority, at least not one that is used
> > in any way.
> > This is just a software queue which runs TLB invalidations once the
> > bind
> > jobs complete.
> >
> > wrt high priority passing lower priority ones, that isn't possible if
> > there is a dependency chain. The driver holds any jobs in the KMD
> > until
> > all dependencies are resolved - high priority queues just get serviced
> > by
> > the GuC over low priority queues if both queues have runnable jobs.
> >
> > TL;DR nothing in this patch changes anything wrt priorities.
>
> You're right I think I read this wrong somehow :(, ignore my comment!
>
> Still the minor question above about the max number of dependency jobs
> when you have time.
>
> Thanks,
> Stuart
>
> >
> > Matt
> >
> > >
> > > Thanks,
> > > Stuart
> > >
> > > >
> > > > + if (q->flags & (EXEC_QUEUE_FLAG_MIGRATE |
> > > > EXEC_QUEUE_FLAG_VM)) {
> > > > + err = alloc_dep_schedulers(xe, q);
> > > > + if (err) {
> > > > + __xe_exec_queue_free(q);
> > > > + return ERR_PTR(err);
> > > > + }
> > > > + }
> > > > +
> > > > if (vm)
> > > > q->vm = xe_vm_get(vm);
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > index abdf4a57e6e2..ba443a497b38 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > @@ -134,6 +134,19 @@ struct xe_exec_queue {
> > > > struct list_head link;
> > > > } lr;
> > > >
> > > > +#define XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT 0
> > > > +#define XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT 1
> > > > +#define XE_EXEC_QUEUE_TLB_INVAL_COUNT (XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT + 1)
> > > > +
> > > > + /** @tlb_inval: TLB invalidations exec queue state */
> > > > + struct {
> > > > + /**
> > > > + * @tlb_inval.dep_scheduler: The TLB invalidation
> > > > + * dependency scheduler
> > > > + */
> > > > + struct xe_dep_scheduler *dep_scheduler;
> > > > + } tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
> > > > +
> > > > /** @pxp: PXP info tracking */
> > > > struct {
> > > > /** @pxp.type: PXP session type used by this
> > > > queue */
> > >
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 5/9] drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues
2025-07-15 21:52 ` Matthew Brost
@ 2025-07-15 21:53 ` Summers, Stuart
2025-07-15 22:01 ` Matthew Brost
0 siblings, 1 reply; 45+ messages in thread
From: Summers, Stuart @ 2025-07-15 21:53 UTC (permalink / raw)
To: Brost, Matthew
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Tue, 2025-07-15 at 14:52 -0700, Matthew Brost wrote:
> On Tue, Jul 15, 2025 at 03:45:55PM -0600, Summers, Stuart wrote:
> > On Tue, 2025-07-15 at 14:44 -0700, Matthew Brost wrote:
> > > On Tue, Jul 15, 2025 at 03:34:29PM -0600, Summers, Stuart wrote:
> > > > On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > > > > Add a generic dependency scheduler for GT TLB invalidations,
> > > > > used
> > > > > to
> > > > > schedule jobs that issue GT TLB invalidations to bind queues.
> > > > >
> > > > > v2:
> > > > > - Use shared GT TLB invalidation queue for dep scheduler
> > > > > - Break allocation of dep scheduler into its own function
> > > > > - Add define for max number of TLB invalidations
> > > > > - Skip media if not present
> > > > >
> > > > > Suggested-by: Thomas Hellström
> > > > > <thomas.hellstrom@linux.intel.com>
> > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/xe/xe_exec_queue.c | 48
> > > > > ++++++++++++++++++++++++
> > > > > drivers/gpu/drm/xe/xe_exec_queue_types.h | 13 +++++++
> > > > > 2 files changed, 61 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > index fee22358cc09..7aaf669cf5fc 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > @@ -12,6 +12,7 @@
> > > > > #include <drm/drm_file.h>
> > > > > #include <uapi/drm/xe_drm.h>
> > > > >
> > > > > +#include "xe_dep_scheduler.h"
> > > > > #include "xe_device.h"
> > > > > #include "xe_gt.h"
> > > > > #include "xe_hw_engine_class_sysfs.h"
> > > > > @@ -39,6 +40,12 @@ static int
> > > > > exec_queue_user_extensions(struct
> > > > > xe_device *xe, struct xe_exec_queue
> > > > >
> > > > > static void __xe_exec_queue_free(struct xe_exec_queue *q)
> > > > > {
> > > > > + int i;
> > > > > +
> > > > > + for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i)
> > > > > + if (q->tlb_inval[i].dep_scheduler)
> > > > > + xe_dep_scheduler_fini(q->tlb_inval[i].dep_scheduler);
> > > > > +
> > > > > if (xe_exec_queue_uses_pxp(q))
> > > > > xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
> > > > > if (q->vm)
> > > > > @@ -50,6 +57,39 @@ static void __xe_exec_queue_free(struct
> > > > > xe_exec_queue *q)
> > > > > kfree(q);
> > > > > }
> > > > >
> > > > > +static int alloc_dep_schedulers(struct xe_device *xe, struct
> > > > > xe_exec_queue *q)
> > > > > +{
> > > > > + struct xe_tile *tile = gt_to_tile(q->gt);
> > > > > + int i;
> > > > > +
> > > > > + for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i) {
> > > > > + struct xe_dep_scheduler *dep_scheduler;
> > > > > + struct xe_gt *gt;
> > > > > + struct workqueue_struct *wq;
> > > > > +
> > > > > + if (i == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT)
> > > > > + gt = tile->primary_gt;
> > > > > + else
> > > > > + gt = tile->media_gt;
> > > > > +
> > > > > + if (!gt)
> > > > > + continue;
> > > > > +
> > > > > + wq = gt->tlb_invalidation.job_wq;
> > > > > +
> > > > > +#define MAX_TLB_INVAL_JOBS 16 /* Picking a
> > > > > reasonable
> > > > > value
> > > > > */
> > > >
> > > > So if we exceed this number, that means we won't get an
> > > > invalidation of
> > > > that range before a context switch to a new queue?
>
> Sorry missed this.
>
> No. This is just the maximum number of TLB invalidations, on this queue,
> which can be in flight to the hardware at once. If MAX_TLB_INVAL_JOBS
> is
> exceeded, the scheduler just holds jobs until prior ones complete
> and
> more job credits become available.
>
> Context switching or other queues are not really related here. Think
> of
> this as a FIFO in which only MAX_TLB_INVAL_JOBS can be outstanding at
> a time.
Got it. So basically summarizing, if I were to submit 17 TLB
invalidations at once (before hardware completed any one of those)
theoretically, followed by a submission from userspace on a new
context, the scheduler would still accept these, but would only submit
the first 16 to hardware. It would then submit the 17th once a hardware
slot became available, then finally that new user context. Is that
right?
Thanks,
Stuart
>
> Matt
>
> > > >
> > > > > + dep_scheduler = xe_dep_scheduler_create(xe, wq, q->name,
> > > > > + MAX_TLB_INVAL_JOBS);
> > > > > + if (IS_ERR(dep_scheduler))
> > > > > + return PTR_ERR(dep_scheduler);
> > > > > +
> > > > > + q->tlb_inval[i].dep_scheduler =
> > > > > dep_scheduler;
> > > > > + }
> > > > > +#undef MAX_TLB_INVAL_JOBS
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > static struct xe_exec_queue *__xe_exec_queue_alloc(struct
> > > > > xe_device
> > > > > *xe,
> > > > > struct
> > > > > xe_vm
> > > > > *vm,
> > > > > u32
> > > > > logical_mask,
> > > > > @@ -94,6 +134,14 @@ static struct xe_exec_queue
> > > > > *__xe_exec_queue_alloc(struct xe_device *xe,
> > > > > else
> > > > > q->sched_props.priority =
> > > > > XE_EXEC_QUEUE_PRIORITY_NORMAL;
> > > >
> > > > We want these to be high priority right? I.e. if another user
> > > > creates a
> > > > queue and submits at high priority can we get stale data on
> > > > those
> > > > executions if this one is blocked behind that high priority
> > > > one?
> > >
> > > I'm struggling with exactly what you are asking here.
> > >
> > > The q->sched_props.priority is priority of exec queue being
> > > created
> > > and is existing code. In the case of bind queues (i.e., ones that
> > > call alloc_dep_schedulers) this maps to priority of the GuC
> > > context
> > > which runs bind jobs on a hardware copy engine.
> > >
> > > dep_schedulers do not have a priority, at least not one that is
> > > used in any way.
> > > This is just a software queue which runs TLB invalidations once the
> > > bind
> > > jobs complete.
> > >
> > > wrt high priority passing lower priority ones, that isn't
> > > possible if
> > > there is a dependency chain. The driver holds any jobs in the KMD
> > > until
> > > all dependencies are resolved - high priority queues just get
> > > serviced
> > > by
> > > the GuC over low priority queues if both queues have runnable
> > > jobs.
> > >
> > > TL;DR nothing in this patch changes anything wrt priorities.
> >
> > You're right I think I read this wrong somehow :(, ignore my
> > comment!
> >
> > Still the minor question above about the max number of dependency
> > jobs
> > when you have time.
> >
> > Thanks,
> > Stuart
> >
> > >
> > > Matt
> > >
> > > >
> > > > Thanks,
> > > > Stuart
> > > >
> > > > >
> > > > > + if (q->flags & (EXEC_QUEUE_FLAG_MIGRATE |
> > > > > EXEC_QUEUE_FLAG_VM)) {
> > > > > + err = alloc_dep_schedulers(xe, q);
> > > > > + if (err) {
> > > > > + __xe_exec_queue_free(q);
> > > > > + return ERR_PTR(err);
> > > > > + }
> > > > > + }
> > > > > +
> > > > > if (vm)
> > > > > q->vm = xe_vm_get(vm);
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > index abdf4a57e6e2..ba443a497b38 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > @@ -134,6 +134,19 @@ struct xe_exec_queue {
> > > > > struct list_head link;
> > > > > } lr;
> > > > >
> > > > > +#define XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT 0
> > > > > +#define XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT 1
> > > > > +#define XE_EXEC_QUEUE_TLB_INVAL_COUNT (XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT + 1)
> > > > > +
> > > > > + /** @tlb_inval: TLB invalidations exec queue state */
> > > > > + struct {
> > > > > + /**
> > > > > + * @tlb_inval.dep_scheduler: The TLB
> > > > > invalidation
> > > > > + * dependency scheduler
> > > > > + */
> > > > > + struct xe_dep_scheduler *dep_scheduler;
> > > > > + } tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
> > > > > +
> > > > > /** @pxp: PXP info tracking */
> > > > > struct {
> > > > > /** @pxp.type: PXP session type used by this
> > > > > queue */
> > > >
> >
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 5/9] drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues
2025-07-15 21:53 ` Summers, Stuart
@ 2025-07-15 22:01 ` Matthew Brost
2025-07-15 22:49 ` Summers, Stuart
0 siblings, 1 reply; 45+ messages in thread
From: Matthew Brost @ 2025-07-15 22:01 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Tue, Jul 15, 2025 at 03:53:27PM -0600, Summers, Stuart wrote:
> On Tue, 2025-07-15 at 14:52 -0700, Matthew Brost wrote:
> > On Tue, Jul 15, 2025 at 03:45:55PM -0600, Summers, Stuart wrote:
> > > On Tue, 2025-07-15 at 14:44 -0700, Matthew Brost wrote:
> > > > On Tue, Jul 15, 2025 at 03:34:29PM -0600, Summers, Stuart wrote:
> > > > > On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > > > > > Add a generic dependency scheduler for GT TLB invalidations,
> > > > > > used
> > > > > > to
> > > > > > schedule jobs that issue GT TLB invalidations to bind queues.
> > > > > >
> > > > > > v2:
> > > > > > - Use shared GT TLB invalidation queue for dep scheduler
> > > > > > - Break allocation of dep scheduler into its own function
> > > > > > - Add define for max number of TLB invalidations
> > > > > > - Skip media if not present
> > > > > >
> > > > > > Suggested-by: Thomas Hellström
> > > > > > <thomas.hellstrom@linux.intel.com>
> > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > > ---
> > > > > > drivers/gpu/drm/xe/xe_exec_queue.c | 48
> > > > > > ++++++++++++++++++++++++
> > > > > > drivers/gpu/drm/xe/xe_exec_queue_types.h | 13 +++++++
> > > > > > 2 files changed, 61 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > index fee22358cc09..7aaf669cf5fc 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > @@ -12,6 +12,7 @@
> > > > > > #include <drm/drm_file.h>
> > > > > > #include <uapi/drm/xe_drm.h>
> > > > > >
> > > > > > +#include "xe_dep_scheduler.h"
> > > > > > #include "xe_device.h"
> > > > > > #include "xe_gt.h"
> > > > > > #include "xe_hw_engine_class_sysfs.h"
> > > > > > @@ -39,6 +40,12 @@ static int
> > > > > > exec_queue_user_extensions(struct
> > > > > > xe_device *xe, struct xe_exec_queue
> > > > > >
> > > > > > static void __xe_exec_queue_free(struct xe_exec_queue *q)
> > > > > > {
> > > > > > + int i;
> > > > > > +
> > > > > > + for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i)
> > > > > > + if (q->tlb_inval[i].dep_scheduler)
> > > > > > + xe_dep_scheduler_fini(q->tlb_inval[i].dep_scheduler);
> > > > > > +
> > > > > > if (xe_exec_queue_uses_pxp(q))
> > > > > > xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
> > > > > > if (q->vm)
> > > > > > @@ -50,6 +57,39 @@ static void __xe_exec_queue_free(struct
> > > > > > xe_exec_queue *q)
> > > > > > kfree(q);
> > > > > > }
> > > > > >
> > > > > > +static int alloc_dep_schedulers(struct xe_device *xe, struct
> > > > > > xe_exec_queue *q)
> > > > > > +{
> > > > > > + struct xe_tile *tile = gt_to_tile(q->gt);
> > > > > > + int i;
> > > > > > +
> > > > > > + for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT; ++i) {
> > > > > > + struct xe_dep_scheduler *dep_scheduler;
> > > > > > + struct xe_gt *gt;
> > > > > > + struct workqueue_struct *wq;
> > > > > > +
> > > > > > + if (i == XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT)
> > > > > > + gt = tile->primary_gt;
> > > > > > + else
> > > > > > + gt = tile->media_gt;
> > > > > > +
> > > > > > + if (!gt)
> > > > > > + continue;
> > > > > > +
> > > > > > + wq = gt->tlb_invalidation.job_wq;
> > > > > > +
> > > > > > +#define MAX_TLB_INVAL_JOBS 16 /* Picking a
> > > > > > reasonable
> > > > > > value
> > > > > > */
> > > > >
> > > > > So if we exceed this number, that means we won't get an
> > > > > invalidation of
> > > > > that range before a context switch to a new queue?
> >
> > Sorry missed this.
> >
> > No. This is just the maximum number of TLB invalidations, on this queue,
> > which can be in flight to the hardware at once. If MAX_TLB_INVAL_JOBS
> > is
> > exceeded, the scheduler just holds jobs until prior ones complete
> > and
> > more job credits become available.
> >
> > Context switching or other queues are not really related here. Think
> > of
> > this as a FIFO in which only MAX_TLB_INVAL_JOBS can be outstanding at
> > a time.
>
> Got it. So basically summarizing, if I were to submit 17 TLB
> invalidations at once (before hardware completed any one of those)
> theoretically, followed by a submission from userspace on a new
> context, the scheduler would still accept these, but would only submit
> the first 16 to hardware. It would then submit the 17th once a hardware
> slot became available, then finally that new user context. Is that
> right?
Yes, exactly. TLB invalidations from a bind or unbind resolve to a
fence, which is installed in the dma-resv slots and returned to users
via sync objects. The dma-resv slots are used to ensure that KMD-issued
binds are complete before scheduling a user submission. Meanwhile, the
user should use the output DRM syncobj from a bind/unbind as input to a
submission if there is a dependency.
This is all part of the existing flow - this series only changes how TLB
invalidations are scheduled and how the fences are generated.
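In rough pseudo-code (submit_bind_job() and push_tlb_inval_job() are
made-up helpers; dma_resv_add_fence() and drm_syncobj_replace_fence() are
the real kernel APIs):

	/* Illustrative flow for one bind/unbind, not the actual Xe code paths. */
	bind_fence = submit_bind_job(q);                 /* bind job on the copy engine */
	inval_fence = push_tlb_inval_job(q, bind_fence); /* dep scheduler waits on bind */

	/* Later waiters order against the invalidation, not just the bind: */
	dma_resv_add_fence(xe_vm_resv(vm), inval_fence, DMA_RESV_USAGE_KERNEL);
	drm_syncobj_replace_fence(out_syncobj, inval_fence); /* handed back to userspace */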
Matt
>
> Thanks,
> Stuart
>
> >
> > Matt
> >
> > > > >
> > > > > > + dep_scheduler = xe_dep_scheduler_create(xe, wq, q->name,
> > > > > > + MAX_TLB_INVAL_JOBS);
> > > > > > + if (IS_ERR(dep_scheduler))
> > > > > > + return PTR_ERR(dep_scheduler);
> > > > > > +
> > > > > > + q->tlb_inval[i].dep_scheduler =
> > > > > > dep_scheduler;
> > > > > > + }
> > > > > > +#undef MAX_TLB_INVAL_JOBS
> > > > > > +
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > static struct xe_exec_queue *__xe_exec_queue_alloc(struct
> > > > > > xe_device
> > > > > > *xe,
> > > > > > struct
> > > > > > xe_vm
> > > > > > *vm,
> > > > > > u32
> > > > > > logical_mask,
> > > > > > @@ -94,6 +134,14 @@ static struct xe_exec_queue
> > > > > > *__xe_exec_queue_alloc(struct xe_device *xe,
> > > > > > else
> > > > > > q->sched_props.priority =
> > > > > > XE_EXEC_QUEUE_PRIORITY_NORMAL;
> > > > >
> > > > > We want these to be high priority right? I.e. if another user
> > > > > creates a
> > > > > queue and submits at high priority can we get stale data on
> > > > > those
> > > > > executions if this one is blocked behind that high priority
> > > > > one?
> > > >
> > > > I'm struggling with exactly what you are asking here.
> > > >
> > > > The q->sched_props.priority is priority of exec queue being
> > > > created
> > > > and is existing code. In the case of bind queues (i.e., ones that
> > > > call alloc_dep_schedulers) this maps to priority of the GuC
> > > > context
> > > > which runs bind jobs on a hardware copy engine.
> > > >
> > > > dep_schedulers do not have a priority, at least not one that is
> > > > used in any way.
> > > > This is just a software queue which runs TLB invalidations once
> > > > the bind
> > > > jobs complete.
> > > >
> > > > wrt high priority passing lower priority ones, that isn't
> > > > possible if
> > > > there is a dependency chain. The driver holds any jobs in the KMD
> > > > until
> > > > all dependencies are resolved - high priority queues just get
> > > > serviced
> > > > by
> > > > the GuC over low priority queues if both queues have runnable
> > > > jobs.
> > > >
> > > > TL;DR nothing in this patch changes anything wrt priorities.
> > >
> > > You're right I think I read this wrong somehow :(, ignore my
> > > comment!
> > >
> > > Still the minor question above about the max number of dependency
> > > jobs
> > > when you have time.
> > >
> > > Thanks,
> > > Stuart
> > >
> > > >
> > > > Matt
> > > >
> > > > >
> > > > > Thanks,
> > > > > Stuart
> > > > >
> > > > > >
> > > > > > + if (q->flags & (EXEC_QUEUE_FLAG_MIGRATE |
> > > > > > EXEC_QUEUE_FLAG_VM)) {
> > > > > > + err = alloc_dep_schedulers(xe, q);
> > > > > > + if (err) {
> > > > > > + __xe_exec_queue_free(q);
> > > > > > + return ERR_PTR(err);
> > > > > > + }
> > > > > > + }
> > > > > > +
> > > > > > if (vm)
> > > > > > q->vm = xe_vm_get(vm);
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > > index abdf4a57e6e2..ba443a497b38 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > > @@ -134,6 +134,19 @@ struct xe_exec_queue {
> > > > > > struct list_head link;
> > > > > > } lr;
> > > > > >
> > > > > > +#define XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT 0
> > > > > > +#define XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT 1
> > > > > > +#define XE_EXEC_QUEUE_TLB_INVAL_COUNT (XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT + 1)
> > > > > > +
> > > > > > + /** @tlb_inval: TLB invalidations exec queue state */
> > > > > > + struct {
> > > > > > + /**
> > > > > > + * @tlb_inval.dep_scheduler: The TLB
> > > > > > invalidation
> > > > > > + * dependency scheduler
> > > > > > + */
> > > > > > + struct xe_dep_scheduler *dep_scheduler;
> > > > > > + } tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
> > > > > > +
> > > > > > /** @pxp: PXP info tracking */
> > > > > > struct {
> > > > > > /** @pxp.type: PXP session type used by this
> > > > > > queue */
> > > > >
> > >
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 2/9] drm/xe: Add generic dependency jobs / scheduler
2025-07-15 21:13 ` Summers, Stuart
@ 2025-07-15 22:43 ` Summers, Stuart
2025-07-15 22:48 ` Matthew Brost
0 siblings, 1 reply; 45+ messages in thread
From: Summers, Stuart @ 2025-07-15 22:43 UTC (permalink / raw)
To: Brost, Matthew
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Tue, 2025-07-15 at 21:13 +0000, Summers, Stuart wrote:
> On Tue, 2025-07-15 at 14:14 -0700, Matthew Brost wrote:
> > On Tue, Jul 15, 2025 at 03:04:07PM -0600, Summers, Stuart wrote:
> > > On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > > > Add generic dependency jobs / scheduler which serves as a wrapper
> > > > for
> > > > DRM
> > > > scheduler. Useful when we want to delay a generic operation until
> > > > a
> > > > dma-fence signals.
> > > >
> > > > Existing use cases could be destroying resources based on
> > > > fences
> > > > /
> > > > dma-resv, the preempt rebind worker, and pipelined GT TLB
> > > > invalidations.
> > > >
> > > > Written in such a way that it could be moved to the DRM subsystem if
> > > > needed.
> > > >
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/Makefile | 1 +
> > > > drivers/gpu/drm/xe/xe_dep_job_types.h | 29 ++++++
> > > > drivers/gpu/drm/xe/xe_dep_scheduler.c | 145
> > > > ++++++++++++++++++++++++++
> > > > drivers/gpu/drm/xe/xe_dep_scheduler.h | 21 ++++
> > > > 4 files changed, 196 insertions(+)
> > > > create mode 100644 drivers/gpu/drm/xe/xe_dep_job_types.h
> > > > create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.c
> > > > create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.h
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/Makefile
> > > > b/drivers/gpu/drm/xe/Makefile
> > > > index 1d97e5b63f4e..0edcfc770c0d 100644
> > > > --- a/drivers/gpu/drm/xe/Makefile
> > > > +++ b/drivers/gpu/drm/xe/Makefile
> > > > @@ -28,6 +28,7 @@ $(obj)/generated/%_wa_oob.c
> > > > $(obj)/generated/%_wa_oob.h: $(obj)/xe_gen_wa_oob \
> > > > xe-y += xe_bb.o \
> > > > xe_bo.o \
> > > > xe_bo_evict.o \
> > > > + xe_dep_scheduler.o \
> > > > xe_devcoredump.o \
> > > > xe_device.o \
> > > > xe_device_sysfs.o \
> > > > diff --git a/drivers/gpu/drm/xe/xe_dep_job_types.h
> > > > b/drivers/gpu/drm/xe/xe_dep_job_types.h
> > > > new file mode 100644
> > > > index 000000000000..c6a484f24c8c
> > > > --- /dev/null
> > > > +++ b/drivers/gpu/drm/xe/xe_dep_job_types.h
> > > > @@ -0,0 +1,29 @@
> > > > +/* SPDX-License-Identifier: MIT */
> > > > +/*
> > > > + * Copyright © 2025 Intel Corporation
> > > > + */
> > > > +
> > > > +#ifndef _XE_DEP_JOB_TYPES_H_
> > > > +#define _XE_DEP_JOB_TYPES_H_
> > > > +
> > > > +#include <drm/gpu_scheduler.h>
> > > > +
> > > > +struct xe_dep_job;
> > > > +
> > > > +/** struct xe_dep_job_ops - Generic Xe dependency job
> > > > operations
> > > > */
> > > > +struct xe_dep_job_ops {
> > > > + /** @run_job: Run generic Xe dependency job */
> > > > + struct dma_fence *(*run_job)(struct xe_dep_job *job);
> > > > + /** @free_job: Free generic Xe dependency job */
> > > > + void (*free_job)(struct xe_dep_job *job);
> > > > +};
> > > > +
> > > > +/** struct xe_dep_job - Generic dependency Xe job */
> > > > +struct xe_dep_job {
> > > > + /** @drm: base DRM scheduler job */
> > > > + struct drm_sched_job drm;
Not necessary for this patch, you can keep as-is, but this naming was a
little confusing to me. IMO it'd be a little more clear as drm_job or
even just job. drm alone to me implies drm_device struct.
Thanks,
Stuart
> > > > + /** @ops: dependency job operations */
> > > > + const struct xe_dep_job_ops *ops;
> > > > +};
> > > > +
> > > > +#endif
> > > > diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.c
> > > > b/drivers/gpu/drm/xe/xe_dep_scheduler.c
> > > > new file mode 100644
> > > > index 000000000000..fbd55577d787
> > > > --- /dev/null
> > > > +++ b/drivers/gpu/drm/xe/xe_dep_scheduler.c
> > > > @@ -0,0 +1,145 @@
> > > > +// SPDX-License-Identifier: MIT
> > > > +/*
> > > > + * Copyright © 2025 Intel Corporation
> > > > + */
> > > > +
> > > > +#include <linux/slab.h>
> > > > +
> > > > +#include <drm/gpu_scheduler.h>
> > > > +
> > > > +#include "xe_dep_job_types.h"
> > > > +#include "xe_dep_scheduler.h"
> > > > +#include "xe_device_types.h"
> > > > +
> > > > +/**
> > > > + * DOC: Xe Dependency Scheduler
> > > > + *
> > > > + * The Xe dependency scheduler is a simple wrapper built around the
> > > > + * DRM scheduler to execute jobs once their dependencies are resolved
> > > > + * (i.e., all input fences specified as dependencies are signaled).
> > > > + * The jobs that are executed contain virtual functions to run
> > > > + * (execute) and free the job, allowing a single dependency scheduler
> > > > + * to handle jobs performing different operations.
> > > > + *
> > > > + * Example use cases include deferred resource freeing, TLB
> > > > + * invalidations after bind jobs, etc.
> > > > + */
> > > > +
> > > > +/** struct xe_dep_scheduler - Generic Xe dependency scheduler
> > > > */
> > > > +struct xe_dep_scheduler {
> > > > + /** @sched: DRM GPU scheduler */
> > > > + struct drm_gpu_scheduler sched;
> > > > + /** @entity: DRM scheduler entity */
> > > > + struct drm_sched_entity entity;
> > > > + /** @rcu: For safe freeing of exported dma fences */
> > > > + struct rcu_head rcu;
> > > > +};
> > > > +
> > > > +static struct dma_fence *xe_dep_scheduler_run_job(struct
> > > > drm_sched_job *drm_job)
> > > > +{
> > > > + struct xe_dep_job *dep_job =
> > > > + container_of(drm_job, typeof(*dep_job), drm);
> > > > +
> > > > + return dep_job->ops->run_job(dep_job);
> > > > +}
> > > > +
> > > > +static void xe_dep_scheduler_free_job(struct drm_sched_job
> > > > *drm_job)
> > > > +{
> > > > + struct xe_dep_job *dep_job =
> > > > + container_of(drm_job, typeof(*dep_job), drm);
> > > > +
> > > > + dep_job->ops->free_job(dep_job);
> > > > +}
> > > > +
> > > > +static const struct drm_sched_backend_ops sched_ops = {
> > > > + .run_job = xe_dep_scheduler_run_job,
> > > > + .free_job = xe_dep_scheduler_free_job,
> > > > +};
> > > > +
> > > > +/**
> > > > + * xe_dep_scheduler_create() - Generic Xe dependency scheduler
> > > > create
> > > > + * @xe: Xe device
> > > > + * @submit_wq: Submit workqueue struct (can be NULL)
> > > > + * @name: Name of dependency scheduler
> > > > + * @job_limit: Max dependency jobs that can be scheduled
> > > > + *
> > > > + * Create a generic Xe dependency scheduler and initialize
> > > > internal
> > > > DRM
> > > > + * scheduler objects.
> > > > + *
> > > > + * Return: Generic Xe dependency scheduler object or ERR_PTR
> > > > + */
> > > > +struct xe_dep_scheduler *
> > > > +xe_dep_scheduler_create(struct xe_device *xe,
> > > > + struct workqueue_struct *submit_wq,
> > > > + const char *name, u32 job_limit)
> > > > +{
> > > > + struct xe_dep_scheduler *dep_scheduler;
> > > > + struct drm_gpu_scheduler *sched;
> > > > + const struct drm_sched_init_args args = {
> > > > + .ops = &sched_ops,
> > > > + .submit_wq = submit_wq,
> > > > + .num_rqs = 1,
> > > > + .credit_limit = job_limit,
> > > > + .timeout = MAX_SCHEDULE_TIMEOUT,
> > > > + .name = name,
> > > > + .dev = xe->drm.dev,
> > > > + };
> > > > + int err;
> > > > +
> > > > + dep_scheduler = kzalloc(sizeof(*dep_scheduler),
> > > > GFP_KERNEL);
> > > > + if (!dep_scheduler)
> > > > + return ERR_PTR(-ENOMEM);
> > > > +
> > > > + err = drm_sched_init(&dep_scheduler->sched, &args);
> > > > + if (err)
> > > > + goto err_free;
> > > > +
> > > > + sched = &dep_scheduler->sched;
> > > > + err = drm_sched_entity_init(&dep_scheduler->entity, 0,
> > > > + (struct drm_gpu_scheduler
> > > > **)&sched, 1,
> > >
> > > Why the cast here?
> > >
> >
> > Copied from some existing code that had a cast; it's not needed in
> > either case. Will remove.
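> >
> > i.e. something like (untested):
> >
> > 	sched = &dep_scheduler->sched;
> > 	err = drm_sched_entity_init(&dep_scheduler->entity, 0,
> > 				    &sched, 1, NULL);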
>
> Sounds great. With that:
> Reviewed-by: Stuart Summers <stuart.summers@intel.com>
>
> Thanks,
> Stuart
>
> >
> > Matt
> >
> > > Otherwise this patch lgtm.
> > >
> > > Thanks,
> > > Stuart
> > >
> > > > + NULL);
> > > > + if (err)
> > > > + goto err_sched;
> > > > +
> > > > + init_rcu_head(&dep_scheduler->rcu);
> > > > +
> > > > + return dep_scheduler;
> > > > +
> > > > +err_sched:
> > > > + drm_sched_fini(&dep_scheduler->sched);
> > > > +err_free:
> > > > + kfree(dep_scheduler);
> > > > +
> > > > + return ERR_PTR(err);
> > > > +}
> > > > +
> > > > +/**
> > > > + * xe_dep_scheduler_fini() - Generic Xe dependency scheduler
> > > > finalize
> > > > + * @dep_scheduler: Generic Xe dependency scheduler object
> > > > + *
> > > > + * Finalize internal DRM scheduler objects and free generic Xe
> > > > dependency
> > > > + * scheduler object
> > > > + */
> > > > +void xe_dep_scheduler_fini(struct xe_dep_scheduler
> > > > *dep_scheduler)
> > > > +{
> > > > + drm_sched_entity_fini(&dep_scheduler->entity);
> > > > + drm_sched_fini(&dep_scheduler->sched);
> > > > + /*
> > > > + * RCU free due sched being exported via DRM scheduler
> > > > fences
> > > > + * (timeline name).
> > > > + */
> > > > + kfree_rcu(dep_scheduler, rcu);
> > > > +}
> > > > +
> > > > +/**
> > > > + * xe_dep_scheduler_entity() - Retrieve a generic Xe
> > > > dependency
> > > > scheduler
> > > > + * DRM scheduler entity
> > > > + * @dep_scheduler: Generic Xe dependency scheduler object
> > > > + *
> > > > + * Return: The generic Xe dependency scheduler's DRM scheduler
> > > > entity
> > > > + */
> > > > +struct drm_sched_entity *
> > > > +xe_dep_scheduler_entity(struct xe_dep_scheduler
> > > > *dep_scheduler)
> > > > +{
> > > > + return &dep_scheduler->entity;
> > > > +}
> > > > diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.h
> > > > b/drivers/gpu/drm/xe/xe_dep_scheduler.h
> > > > new file mode 100644
> > > > index 000000000000..853961eec64b
> > > > --- /dev/null
> > > > +++ b/drivers/gpu/drm/xe/xe_dep_scheduler.h
> > > > @@ -0,0 +1,21 @@
> > > > +/* SPDX-License-Identifier: MIT */
> > > > +/*
> > > > + * Copyright © 2025 Intel Corporation
> > > > + */
> > > > +
> > > > +#include <linux/types.h>
> > > > +
> > > > +struct drm_sched_entity;
> > > > +struct workqueue_struct;
> > > > +struct xe_dep_scheduler;
> > > > +struct xe_device;
> > > > +
> > > > +struct xe_dep_scheduler *
> > > > +xe_dep_scheduler_create(struct xe_device *xe,
> > > > + struct workqueue_struct *submit_wq,
> > > > + const char *name, u32 job_limit);
> > > > +
> > > > +void xe_dep_scheduler_fini(struct xe_dep_scheduler
> > > > *dep_scheduler);
> > > > +
> > > > +struct drm_sched_entity *
> > > > +xe_dep_scheduler_entity(struct xe_dep_scheduler
> > > > *dep_scheduler);
> > >
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 6/9] drm/xe: Add xe_migrate_job_lock/unlock helpers
2025-07-02 23:42 ` [PATCH v2 6/9] drm/xe: Add xe_migrate_job_lock/unlock helpers Matthew Brost
@ 2025-07-15 22:48 ` Summers, Stuart
2025-07-16 1:11 ` Matthew Brost
0 siblings, 1 reply; 45+ messages in thread
From: Summers, Stuart @ 2025-07-15 22:48 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: maarten.lankhorst@linux.intel.com, Auld, Matthew
On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> Add xe_migrate_job_lock/unlock helpers, which are used to ensure
> ordering when issuing GT TLB invalidation jobs.
>
> v2:
> - Fix multi-line comments (checkpatch)
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
IMO this could be squashed with the patch after since there aren't any
users in this patch. But the code itself looks ok to me.
> ---
> drivers/gpu/drm/xe/xe_migrate.c | 36
> +++++++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_migrate.h | 4 ++++
> 2 files changed, 40 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> b/drivers/gpu/drm/xe/xe_migrate.c
> index b5f85162b9ed..1f57adcbb535 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -1917,6 +1917,42 @@ int xe_migrate_access_memory(struct xe_migrate
> *m, struct xe_bo *bo,
> return IS_ERR(fence) ? PTR_ERR(fence) : 0;
> }
>
> +/**
> + * xe_migrate_job_lock() - Lock migrate job lock
> + * @m: The migration context.
> + * @q: Queue associated with the operation which requires a lock
> + *
> + * Lock the migrate job lock if the queue is a migration queue, otherwise
> + * assert the VM's dma-resv is held (user queues have their own locking).
> + */
> +void xe_migrate_job_lock(struct xe_migrate *m, struct xe_exec_queue
> *q)
> +{
> + bool is_migrate = q == m->q;
Maybe not worth it, but we're doing these same calculations in
xe_migrate.c. Should we just add a helper?
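Something like this maybe (untested sketch, name is just a suggestion):

	static bool xe_migrate_queue_is_migrate(struct xe_migrate *m,
						struct xe_exec_queue *q)
	{
		return q == m->q;
	}

so both lock/unlock (and the existing checks in xe_migrate.c) could share
it.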
Either way for the above 2, the code looks ok to me:
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Thanks,
Stuart
> +
> + if (is_migrate)
> + mutex_lock(&m->job_mutex);
> + else
> +		xe_vm_assert_held(q->vm); /* User queue VMs should be locked */
> +}
> +
> +/**
> + * xe_migrate_job_unlock() - Unlock migrate job lock
> + * @m: The migration context.
> + * @q: Queue associated with the operation which requires a lock
> + *
> + * Unlock the migrate job lock if the queue is a migration queue, otherwise
> + * assert the VM's dma-resv is held (user queues have their own locking).
> + */
> +void xe_migrate_job_unlock(struct xe_migrate *m, struct
> xe_exec_queue *q)
> +{
> + bool is_migrate = q == m->q;
> +
> + if (is_migrate)
> + mutex_unlock(&m->job_mutex);
> + else
> +		xe_vm_assert_held(q->vm); /* User queue VMs should be locked */
> +}
> +
> #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
> #include "tests/xe_migrate.c"
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_migrate.h
> b/drivers/gpu/drm/xe/xe_migrate.h
> index fb9839c1bae0..e9d83d320f8c 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.h
> +++ b/drivers/gpu/drm/xe/xe_migrate.h
> @@ -134,4 +134,8 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
> void xe_migrate_wait(struct xe_migrate *m);
>
> struct xe_exec_queue *xe_tile_migrate_exec_queue(struct xe_tile
> *tile);
> +
> +void xe_migrate_job_lock(struct xe_migrate *m, struct xe_exec_queue
> *q);
> +void xe_migrate_job_unlock(struct xe_migrate *m, struct
> xe_exec_queue *q);
> +
> #endif
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 2/9] drm/xe: Add generic dependecy jobs / scheduler
2025-07-15 22:43 ` Summers, Stuart
@ 2025-07-15 22:48 ` Matthew Brost
0 siblings, 0 replies; 45+ messages in thread
From: Matthew Brost @ 2025-07-15 22:48 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Tue, Jul 15, 2025 at 04:43:54PM -0600, Summers, Stuart wrote:
> On Tue, 2025-07-15 at 21:13 +0000, Summers, Stuart wrote:
> > On Tue, 2025-07-15 at 14:14 -0700, Matthew Brost wrote:
> > > On Tue, Jul 15, 2025 at 03:04:07PM -0600, Summers, Stuart wrote:
> > > > On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > > > > Add generic dependency jobs / scheduler which serves as a wrapper
> > > > > for the DRM scheduler. Useful when we want to delay a generic
> > > > > operation until a dma-fence signals.
> > > > >
> > > > > Existing use cases could be destroying resources based on fences /
> > > > > dma-resv, the preempt rebind worker, and pipelined GT TLB
> > > > > invalidations.
> > > > >
> > > > > Written in such a way that it could be moved to the DRM subsystem
> > > > > if needed.
> > > > >
> > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/xe/Makefile | 1 +
> > > > > drivers/gpu/drm/xe/xe_dep_job_types.h | 29 ++++++
> > > > > drivers/gpu/drm/xe/xe_dep_scheduler.c | 145
> > > > > ++++++++++++++++++++++++++
> > > > > drivers/gpu/drm/xe/xe_dep_scheduler.h | 21 ++++
> > > > > 4 files changed, 196 insertions(+)
> > > > > create mode 100644 drivers/gpu/drm/xe/xe_dep_job_types.h
> > > > > create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.c
> > > > > create mode 100644 drivers/gpu/drm/xe/xe_dep_scheduler.h
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/Makefile
> > > > > b/drivers/gpu/drm/xe/Makefile
> > > > > index 1d97e5b63f4e..0edcfc770c0d 100644
> > > > > --- a/drivers/gpu/drm/xe/Makefile
> > > > > +++ b/drivers/gpu/drm/xe/Makefile
> > > > > @@ -28,6 +28,7 @@ $(obj)/generated/%_wa_oob.c
> > > > > $(obj)/generated/%_wa_oob.h: $(obj)/xe_gen_wa_oob \
> > > > > xe-y += xe_bb.o \
> > > > > xe_bo.o \
> > > > > xe_bo_evict.o \
> > > > > + xe_dep_scheduler.o \
> > > > > xe_devcoredump.o \
> > > > > xe_device.o \
> > > > > xe_device_sysfs.o \
> > > > > diff --git a/drivers/gpu/drm/xe/xe_dep_job_types.h
> > > > > b/drivers/gpu/drm/xe/xe_dep_job_types.h
> > > > > new file mode 100644
> > > > > index 000000000000..c6a484f24c8c
> > > > > --- /dev/null
> > > > > +++ b/drivers/gpu/drm/xe/xe_dep_job_types.h
> > > > > @@ -0,0 +1,29 @@
> > > > > +/* SPDX-License-Identifier: MIT */
> > > > > +/*
> > > > > + * Copyright © 2025 Intel Corporation
> > > > > + */
> > > > > +
> > > > > +#ifndef _XE_DEP_JOB_TYPES_H_
> > > > > +#define _XE_DEP_JOB_TYPES_H_
> > > > > +
> > > > > +#include <drm/gpu_scheduler.h>
> > > > > +
> > > > > +struct xe_dep_job;
> > > > > +
> > > > > +/** struct xe_dep_job_ops - Generic Xe dependency job
> > > > > operations
> > > > > */
> > > > > +struct xe_dep_job_ops {
> > > > > + /** @run_job: Run generic Xe dependency job */
> > > > > + struct dma_fence *(*run_job)(struct xe_dep_job *job);
> > > > > + /** @free_job: Free generic Xe dependency job */
> > > > > + void (*free_job)(struct xe_dep_job *job);
> > > > > +};
> > > > > +
> > > > > +/** struct xe_dep_job - Generic dependency Xe job */
> > > > > +struct xe_dep_job {
> > > > > + /** @drm: base DRM scheduler job */
> > > > > + struct drm_sched_job drm;
>
> Not necessary for this patch, you can keep it as-is, but this naming was
> a little confusing to me. IMO it'd be a little clearer as drm_job or even
> just job; drm alone implies the drm_device struct to me.
>
I followed the xe_sched_job naming here. I guess it could be better...
'base' or 'drm_job' would be better everywhere. Can do that in a follow-up.
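e.g. something like this (untested, just to illustrate the rename):

	/** struct xe_dep_job - Generic dependency Xe job */
	struct xe_dep_job {
		/** @base: base DRM scheduler job */
		struct drm_sched_job base;
		/** @ops: dependency job operations */
		const struct xe_dep_job_ops *ops;
	};

with the container_of() and drm_sched_job init/cleanup call sites updated
to match.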
Matt
> Thanks,
> Stuart
>
> > > > > + /** @ops: dependency job operations */
> > > > > + const struct xe_dep_job_ops *ops;
> > > > > +};
> > > > > +
> > > > > +#endif
> > > > > diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.c
> > > > > b/drivers/gpu/drm/xe/xe_dep_scheduler.c
> > > > > new file mode 100644
> > > > > index 000000000000..fbd55577d787
> > > > > --- /dev/null
> > > > > +++ b/drivers/gpu/drm/xe/xe_dep_scheduler.c
> > > > > @@ -0,0 +1,145 @@
> > > > > +// SPDX-License-Identifier: MIT
> > > > > +/*
> > > > > + * Copyright © 2025 Intel Corporation
> > > > > + */
> > > > > +
> > > > > +#include <linux/slab.h>
> > > > > +
> > > > > +#include <drm/gpu_scheduler.h>
> > > > > +
> > > > > +#include "xe_dep_job_types.h"
> > > > > +#include "xe_dep_scheduler.h"
> > > > > +#include "xe_device_types.h"
> > > > > +
> > > > > +/**
> > > > > + * DOC: Xe Dependency Scheduler
> > > > > + *
> > > > > + * The Xe dependency scheduler is a simple wrapper built around the
> > > > > + * DRM scheduler to execute jobs once their dependencies are resolved
> > > > > + * (i.e., all input fences specified as dependencies are signaled).
> > > > > + * The jobs that are executed contain virtual functions to run
> > > > > + * (execute) and free the job, allowing a single dependency scheduler
> > > > > + * to handle jobs performing different operations.
> > > > > + *
> > > > > + * Example use cases include deferred resource freeing, TLB
> > > > > + * invalidations after bind jobs, etc.
> > > > > + */
> > > > > +
> > > > > +/** struct xe_dep_scheduler - Generic Xe dependency scheduler
> > > > > */
> > > > > +struct xe_dep_scheduler {
> > > > > + /** @sched: DRM GPU scheduler */
> > > > > + struct drm_gpu_scheduler sched;
> > > > > + /** @entity: DRM scheduler entity */
> > > > > + struct drm_sched_entity entity;
> > > > > + /** @rcu: For safe freeing of exported dma fences */
> > > > > + struct rcu_head rcu;
> > > > > +};
> > > > > +
> > > > > +static struct dma_fence *xe_dep_scheduler_run_job(struct
> > > > > drm_sched_job *drm_job)
> > > > > +{
> > > > > + struct xe_dep_job *dep_job =
> > > > > + container_of(drm_job, typeof(*dep_job), drm);
> > > > > +
> > > > > + return dep_job->ops->run_job(dep_job);
> > > > > +}
> > > > > +
> > > > > +static void xe_dep_scheduler_free_job(struct drm_sched_job
> > > > > *drm_job)
> > > > > +{
> > > > > + struct xe_dep_job *dep_job =
> > > > > + container_of(drm_job, typeof(*dep_job), drm);
> > > > > +
> > > > > + dep_job->ops->free_job(dep_job);
> > > > > +}
> > > > > +
> > > > > +static const struct drm_sched_backend_ops sched_ops = {
> > > > > + .run_job = xe_dep_scheduler_run_job,
> > > > > + .free_job = xe_dep_scheduler_free_job,
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * xe_dep_scheduler_create() - Generic Xe dependency scheduler
> > > > > create
> > > > > + * @xe: Xe device
> > > > > + * @submit_wq: Submit workqueue struct (can be NULL)
> > > > > + * @name: Name of dependency scheduler
> > > > > + * @job_limit: Max dependency jobs that can be scheduled
> > > > > + *
> > > > > + * Create a generic Xe dependency scheduler and initialize
> > > > > internal
> > > > > DRM
> > > > > + * scheduler objects.
> > > > > + *
> > > > > + * Return: Generic Xe dependency scheduler object or ERR_PTR
> > > > > + */
> > > > > +struct xe_dep_scheduler *
> > > > > +xe_dep_scheduler_create(struct xe_device *xe,
> > > > > + struct workqueue_struct *submit_wq,
> > > > > + const char *name, u32 job_limit)
> > > > > +{
> > > > > + struct xe_dep_scheduler *dep_scheduler;
> > > > > + struct drm_gpu_scheduler *sched;
> > > > > + const struct drm_sched_init_args args = {
> > > > > + .ops = &sched_ops,
> > > > > + .submit_wq = submit_wq,
> > > > > + .num_rqs = 1,
> > > > > + .credit_limit = job_limit,
> > > > > + .timeout = MAX_SCHEDULE_TIMEOUT,
> > > > > + .name = name,
> > > > > + .dev = xe->drm.dev,
> > > > > + };
> > > > > + int err;
> > > > > +
> > > > > + dep_scheduler = kzalloc(sizeof(*dep_scheduler),
> > > > > GFP_KERNEL);
> > > > > + if (!dep_scheduler)
> > > > > + return ERR_PTR(-ENOMEM);
> > > > > +
> > > > > + err = drm_sched_init(&dep_scheduler->sched, &args);
> > > > > + if (err)
> > > > > + goto err_free;
> > > > > +
> > > > > + sched = &dep_scheduler->sched;
> > > > > + err = drm_sched_entity_init(&dep_scheduler->entity, 0,
> > > > > + (struct drm_gpu_scheduler
> > > > > **)&sched, 1,
> > > >
> > > > Why the cast here?
> > > >
> > >
> > > Copied from some existing code that had a cast; it's not needed in
> > > either case. Will remove.
> >
> > Sounds great. With that:
> > Reviewed-by: Stuart Summers <stuart.summers@intel.com>
> >
> > Thanks,
> > Stuart
> >
> > >
> > > Matt
> > >
> > > > Otherwise this patch lgtm.
> > > >
> > > > Thanks,
> > > > Stuart
> > > >
> > > > > + NULL);
> > > > > + if (err)
> > > > > + goto err_sched;
> > > > > +
> > > > > + init_rcu_head(&dep_scheduler->rcu);
> > > > > +
> > > > > + return dep_scheduler;
> > > > > +
> > > > > +err_sched:
> > > > > + drm_sched_fini(&dep_scheduler->sched);
> > > > > +err_free:
> > > > > + kfree(dep_scheduler);
> > > > > +
> > > > > + return ERR_PTR(err);
> > > > > +}
> > > > > +
> > > > > +/**
> > > > > + * xe_dep_scheduler_fini() - Generic Xe dependency scheduler
> > > > > finalize
> > > > > + * @dep_scheduler: Generic Xe dependency scheduler object
> > > > > + *
> > > > > + * Finalize internal DRM scheduler objects and free generic Xe
> > > > > dependency
> > > > > + * scheduler object
> > > > > + */
> > > > > +void xe_dep_scheduler_fini(struct xe_dep_scheduler
> > > > > *dep_scheduler)
> > > > > +{
> > > > > + drm_sched_entity_fini(&dep_scheduler->entity);
> > > > > + drm_sched_fini(&dep_scheduler->sched);
> > > > > + /*
> > > > > + * RCU free due sched being exported via DRM scheduler
> > > > > fences
> > > > > + * (timeline name).
> > > > > + */
> > > > > + kfree_rcu(dep_scheduler, rcu);
> > > > > +}
> > > > > +
> > > > > +/**
> > > > > + * xe_dep_scheduler_entity() - Retrieve a generic Xe
> > > > > dependency
> > > > > scheduler
> > > > > + * DRM scheduler entity
> > > > > + * @dep_scheduler: Generic Xe dependency scheduler object
> > > > > + *
> > > > > + * Return: The generic Xe dependency scheduler's DRM scheduler
> > > > > entity
> > > > > + */
> > > > > +struct drm_sched_entity *
> > > > > +xe_dep_scheduler_entity(struct xe_dep_scheduler
> > > > > *dep_scheduler)
> > > > > +{
> > > > > + return &dep_scheduler->entity;
> > > > > +}
> > > > > diff --git a/drivers/gpu/drm/xe/xe_dep_scheduler.h
> > > > > b/drivers/gpu/drm/xe/xe_dep_scheduler.h
> > > > > new file mode 100644
> > > > > index 000000000000..853961eec64b
> > > > > --- /dev/null
> > > > > +++ b/drivers/gpu/drm/xe/xe_dep_scheduler.h
> > > > > @@ -0,0 +1,21 @@
> > > > > +/* SPDX-License-Identifier: MIT */
> > > > > +/*
> > > > > + * Copyright © 2025 Intel Corporation
> > > > > + */
> > > > > +
> > > > > +#include <linux/types.h>
> > > > > +
> > > > > +struct drm_sched_entity;
> > > > > +struct workqueue_struct;
> > > > > +struct xe_dep_scheduler;
> > > > > +struct xe_device;
> > > > > +
> > > > > +struct xe_dep_scheduler *
> > > > > +xe_dep_scheduler_create(struct xe_device *xe,
> > > > > + struct workqueue_struct *submit_wq,
> > > > > + const char *name, u32 job_limit);
> > > > > +
> > > > > +void xe_dep_scheduler_fini(struct xe_dep_scheduler
> > > > > *dep_scheduler);
> > > > > +
> > > > > +struct drm_sched_entity *
> > > > > +xe_dep_scheduler_entity(struct xe_dep_scheduler
> > > > > *dep_scheduler);
> > > >
> >
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 5/9] drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues
2025-07-15 22:01 ` Matthew Brost
@ 2025-07-15 22:49 ` Summers, Stuart
0 siblings, 0 replies; 45+ messages in thread
From: Summers, Stuart @ 2025-07-15 22:49 UTC (permalink / raw)
To: Brost, Matthew
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Tue, 2025-07-15 at 15:01 -0700, Matthew Brost wrote:
> On Tue, Jul 15, 2025 at 03:53:27PM -0600, Summers, Stuart wrote:
> > On Tue, 2025-07-15 at 14:52 -0700, Matthew Brost wrote:
> > > On Tue, Jul 15, 2025 at 03:45:55PM -0600, Summers, Stuart wrote:
> > > > On Tue, 2025-07-15 at 14:44 -0700, Matthew Brost wrote:
> > > > > On Tue, Jul 15, 2025 at 03:34:29PM -0600, Summers, Stuart
> > > > > wrote:
> > > > > > On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > > > > > > Add a generic dependency scheduler for GT TLB invalidations, used
> > > > > > > to schedule jobs that issue GT TLB invalidations to bind queues.
> > > > > > >
> > > > > > > v2:
> > > > > > > - Use shared GT TLB invalidation queue for dep scheduler
> > > > > > > - Break allocation of dep scheduler into its own function
> > > > > > > - Add define for max number of TLB invalidations
> > > > > > > - Skip media if not present
> > > > > > >
> > > > > > > Suggested-by: Thomas Hellström
> > > > > > > <thomas.hellstrom@linux.intel.com>
> > > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > > > ---
> > > > > > > drivers/gpu/drm/xe/xe_exec_queue.c | 48
> > > > > > > ++++++++++++++++++++++++
> > > > > > > drivers/gpu/drm/xe/xe_exec_queue_types.h | 13 +++++++
> > > > > > > 2 files changed, 61 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > > index fee22358cc09..7aaf669cf5fc 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > > @@ -12,6 +12,7 @@
> > > > > > > #include <drm/drm_file.h>
> > > > > > > #include <uapi/drm/xe_drm.h>
> > > > > > >
> > > > > > > +#include "xe_dep_scheduler.h"
> > > > > > > #include "xe_device.h"
> > > > > > > #include "xe_gt.h"
> > > > > > > #include "xe_hw_engine_class_sysfs.h"
> > > > > > > @@ -39,6 +40,12 @@ static int
> > > > > > > exec_queue_user_extensions(struct
> > > > > > > xe_device *xe, struct xe_exec_queue
> > > > > > >
> > > > > > > static void __xe_exec_queue_free(struct xe_exec_queue
> > > > > > > *q)
> > > > > > > {
> > > > > > > + int i;
> > > > > > > +
> > > > > > > + for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT;
> > > > > > > ++i)
> > > > > > > + if (q->tlb_inval[i].dep_scheduler)
> > > > > > > +			xe_dep_scheduler_fini(q->tlb_inval[i].dep_scheduler);
> > > > > > > +
> > > > > > > if (xe_exec_queue_uses_pxp(q))
> > > > > > >  		xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
> > > > > > > if (q->vm)
> > > > > > > @@ -50,6 +57,39 @@ static void
> > > > > > > __xe_exec_queue_free(struct
> > > > > > > xe_exec_queue *q)
> > > > > > > kfree(q);
> > > > > > > }
> > > > > > >
> > > > > > > +static int alloc_dep_schedulers(struct xe_device *xe,
> > > > > > > struct
> > > > > > > xe_exec_queue *q)
> > > > > > > +{
> > > > > > > + struct xe_tile *tile = gt_to_tile(q->gt);
> > > > > > > + int i;
> > > > > > > +
> > > > > > > + for (i = 0; i < XE_EXEC_QUEUE_TLB_INVAL_COUNT;
> > > > > > > ++i) {
> > > > > > > + struct xe_dep_scheduler *dep_scheduler;
> > > > > > > + struct xe_gt *gt;
> > > > > > > + struct workqueue_struct *wq;
> > > > > > > +
> > > > > > > + if (i ==
> > > > > > > XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT)
> > > > > > > + gt = tile->primary_gt;
> > > > > > > + else
> > > > > > > + gt = tile->media_gt;
> > > > > > > +
> > > > > > > + if (!gt)
> > > > > > > + continue;
> > > > > > > +
> > > > > > > + wq = gt->tlb_invalidation.job_wq;
> > > > > > > +
> > > > > > > +#define MAX_TLB_INVAL_JOBS 16 /* Picking a
> > > > > > > reasonable
> > > > > > > value
> > > > > > > */
> > > > > >
> > > > > > So if we exceed this number, that means we won't get an
> > > > > > invalidation of that range before a context switch to a new
> > > > > > queue?
> > >
> > > Sorry missed this.
> > >
> > > No. This is just the maximum number of TLB invalidations, on this
> > > queue, which can be in flight to the hardware at once. If
> > > MAX_TLB_INVAL_JOBS is exceeded, the scheduler just holds jobs until
> > > prior ones complete and more job credits become available.
> > >
> > > Context switching and other queues are not really related here. Think
> > > of this as a FIFO in which only MAX_TLB_INVAL_JOBS can be outstanding
> > > at a time.
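> > >
> > > (The limit is just the DRM scheduler credit limit, i.e. the job_limit
> > > passed to xe_dep_scheduler_create() as MAX_TLB_INVAL_JOBS in the
> > > previous patch.)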
> >
> > Got it. So basically summarizing: if I were theoretically to submit 17
> > TLB invalidations at once (before hardware completed any one of those),
> > followed by a submission from userspace on a new context, the scheduler
> > would still accept these but would only submit the first 16 to hardware.
> > It would then submit the 17th once a hardware slot became available,
> > then finally that new user context. Is that right?
>
> Yes, exactly. TLB invalidations from a bind or unbind resolve to a
> fence, which is installed in the dma-resv slots and returned to users
> via sync objects. The dma-resv slots are used to ensure that KMD-issued
> binds are complete before scheduling a user submission. Meanwhile, the
> user should use the output DRM syncobj from a bind/unbind as input to a
> submission if there is a dependency.
>
> This is all part of the existing flow; this series only changes how TLB
> invalidations are scheduled and how the fences are generated.
Thanks Matt and makes sense:
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
>
> Matt
>
> >
> > Thanks,
> > Stuart
> >
> > >
> > > Matt
> > >
> > > > > >
> > > > > > > +		dep_scheduler = xe_dep_scheduler_create(xe, wq, q->name,
> > > > > > > +							MAX_TLB_INVAL_JOBS);
> > > > > > > + if (IS_ERR(dep_scheduler))
> > > > > > > + return PTR_ERR(dep_scheduler);
> > > > > > > +
> > > > > > > + q->tlb_inval[i].dep_scheduler =
> > > > > > > dep_scheduler;
> > > > > > > + }
> > > > > > > +#undef MAX_TLB_INVAL_JOBS
> > > > > > > +
> > > > > > > + return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > >  static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
> > > > > > >  						   struct xe_vm *vm,
> > > > > > >  						   u32 logical_mask,
> > > > > > > @@ -94,6 +134,14 @@ static struct xe_exec_queue
> > > > > > > *__xe_exec_queue_alloc(struct xe_device *xe,
> > > > > > > else
> > > > > > > q->sched_props.priority =
> > > > > > > XE_EXEC_QUEUE_PRIORITY_NORMAL;
> > > > > >
> > > > > > We want these to be high priority right? I.e. if another
> > > > > > user
> > > > > > creates a
> > > > > > queue and submits at high priority can we get stale data on
> > > > > > those
> > > > > > executions if this one is blocked behind that high priority
> > > > > > one?
> > > > >
> > > > > I'm struggling with exactly what you are asking here.
> > > > >
> > > > > The q->sched_props.priority is the priority of the exec queue being
> > > > > created and is existing code. In the case of bind queues (i.e., ones
> > > > > that call alloc_dep_schedulers) this maps to the priority of the GuC
> > > > > context which runs bind jobs on a hardware copy engine.
> > > > >
> > > > > dep_schedulers do not have a priority, at least not one that is used
> > > > > in any way. This is just a software queue which runs TLB
> > > > > invalidations once the bind jobs complete.
> > > > >
> > > > > wrt high priority queues passing lower priority ones, that isn't
> > > > > possible if there is a dependency chain. The driver holds any jobs in
> > > > > the KMD until all dependencies are resolved - high priority queues
> > > > > just get serviced by the GuC over low priority queues if both queues
> > > > > have runnable jobs.
> > > > >
> > > > > TL;DR: nothing in this patch changes anything wrt priorities.
> > > >
> > > > You're right I think I read this wrong somehow :(, ignore my
> > > > comment!
> > > >
> > > > Still the minor question above about the max number of
> > > > dependency
> > > > jobs
> > > > when you have time.
> > > >
> > > > Thanks,
> > > > Stuart
> > > >
> > > > >
> > > > > Matt
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Stuart
> > > > > >
> > > > > > >
> > > > > > > + if (q->flags & (EXEC_QUEUE_FLAG_MIGRATE |
> > > > > > > EXEC_QUEUE_FLAG_VM)) {
> > > > > > > + err = alloc_dep_schedulers(xe, q);
> > > > > > > + if (err) {
> > > > > > > + __xe_exec_queue_free(q);
> > > > > > > + return ERR_PTR(err);
> > > > > > > + }
> > > > > > > + }
> > > > > > > +
> > > > > > > if (vm)
> > > > > > > q->vm = xe_vm_get(vm);
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > > > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > > > index abdf4a57e6e2..ba443a497b38 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > > > @@ -134,6 +134,19 @@ struct xe_exec_queue {
> > > > > > > struct list_head link;
> > > > > > > } lr;
> > > > > > >
> > > > > > > +#define XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT 0
> > > > > > > +#define XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT 1
> > > > > > > +#define XE_EXEC_QUEUE_TLB_INVAL_COUNT	(XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT + 1)
> > > > > > > +
> > > > > > > + /** @tlb_inval: TLB invalidations exec queue
> > > > > > > state */
> > > > > > > + struct {
> > > > > > > + /**
> > > > > > > + * @tlb_inval.dep_scheduler: The TLB
> > > > > > > invalidation
> > > > > > > + * dependency scheduler
> > > > > > > + */
> > > > > > > + struct xe_dep_scheduler *dep_scheduler;
> > > > > > > + } tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT];
> > > > > > > +
> > > > > > > /** @pxp: PXP info tracking */
> > > > > > > struct {
> > > > > > > /** @pxp.type: PXP session type used by
> > > > > > > this
> > > > > > > queue */
> > > > > >
> > > >
> >
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 7/9] drm/xe: Add GT TLB invalidation jobs
2025-07-02 23:42 ` [PATCH v2 7/9] drm/xe: Add GT TLB invalidation jobs Matthew Brost
@ 2025-07-15 23:09 ` Summers, Stuart
2025-07-16 1:08 ` Matthew Brost
0 siblings, 1 reply; 45+ messages in thread
From: Summers, Stuart @ 2025-07-15 23:09 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: maarten.lankhorst@linux.intel.com, Auld, Matthew
On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> Add GT TLB invalidation jobs which issue GT TLB invalidations. Built on
> top of the Xe generic dependency scheduler.
>
> v2:
> - Fix checkpatch
>
> Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/Makefile | 1 +
> drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c | 271
> +++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h | 34 +++
> 3 files changed, 306 insertions(+)
> create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
> create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
>
> diff --git a/drivers/gpu/drm/xe/Makefile
> b/drivers/gpu/drm/xe/Makefile
> index 0edcfc770c0d..5aad44a3b5fd 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -55,6 +55,7 @@ xe-y += xe_bb.o \
> xe_gt_sysfs.o \
> xe_gt_throttle.o \
> xe_gt_tlb_invalidation.o \
> + xe_gt_tlb_inval_job.o \
> xe_gt_topology.o \
> xe_guc.o \
> xe_guc_ads.o \
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
> b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
> new file mode 100644
> index 000000000000..428d20f16ec2
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
> @@ -0,0 +1,271 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include "xe_dep_job_types.h"
> +#include "xe_dep_scheduler.h"
> +#include "xe_exec_queue.h"
> +#include "xe_gt.h"
> +#include "xe_gt_tlb_invalidation.h"
> +#include "xe_gt_tlb_inval_job.h"
> +#include "xe_migrate.h"
> +#include "xe_pm.h"
> +
> +/** struct xe_gt_tlb_inval_job - GT TLB invalidation job */
> +struct xe_gt_tlb_inval_job {
> + /** @dep: base generic dependency Xe job */
> + struct xe_dep_job dep;
> + /** @gt: GT to invalidate */
> + struct xe_gt *gt;
> + /** @q: exec queue issuing the invalidate */
> + struct xe_exec_queue *q;
> + /** @refcount: ref count of this job */
> + struct kref refcount;
> + /**
> + * @fence: dma fence to indicate completion. 1 way
> relationship - job
> + * can safely reference fence, fence cannot safely reference
> job.
> + */
> + struct dma_fence *fence;
> + /** @start: Start address to invalidate */
> + u64 start;
> + /** @end: End address to invalidate */
> + u64 end;
> + /** @asid: Address space ID to invalidate */
> + u32 asid;
> + /** @fence_armed: Fence has been armed */
> + bool fence_armed;
> +};
> +
> +static struct dma_fence *xe_gt_tlb_inval_job_run(struct xe_dep_job
> *dep_job)
> +{
> + struct xe_gt_tlb_inval_job *job =
> + container_of(dep_job, typeof(*job), dep);
> + struct xe_gt_tlb_invalidation_fence *ifence =
> + container_of(job->fence, typeof(*ifence), base);
> +
> + xe_gt_tlb_invalidation_range(job->gt, ifence, job->start,
> + job->end, job->asid);
> +
> + return job->fence;
> +}
> +
> +static void xe_gt_tlb_inval_job_free(struct xe_dep_job *dep_job)
> +{
> + struct xe_gt_tlb_inval_job *job =
> + container_of(dep_job, typeof(*job), dep);
> +
> + /* Pairs with get in xe_gt_tlb_inval_job_push */
> + xe_gt_tlb_inval_job_put(job);
> +}
> +
> +static const struct xe_dep_job_ops dep_job_ops = {
> + .run_job = xe_gt_tlb_inval_job_run,
> + .free_job = xe_gt_tlb_inval_job_free,
> +};
> +
> +static int xe_gt_tlb_inval_context(struct xe_gt *gt)
> +{
> + return xe_gt_is_media_type(gt) ?
> XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT :
> + XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT;
> +}
> +
> +/**
> + * xe_gt_tlb_inval_job_create() - GT TLB invalidation job create
> + * @gt: GT to invalidate
> + * @q: exec queue issuing the invalidate
> + * @start: Start address to invalidate
> + * @end: End address to invalidate
> + * @asid: Address space ID to invalidate
> + *
> + * Create a GT TLB invalidation job and initialize internal fields.
> The caller is
> + * responsible for releasing the creation reference.
> + *
> + * Return: GT TLB invalidation job object or ERR_PTR
> + */
> +struct xe_gt_tlb_inval_job *xe_gt_tlb_inval_job_create(struct xe_exec_queue *q,
> +						       struct xe_gt *gt,
> +						       u64 start, u64 end,
> +						       u32 asid)
> +{
> + struct xe_gt_tlb_inval_job *job;
> + struct xe_dep_scheduler *dep_scheduler =
> +		q->tlb_inval[xe_gt_tlb_inval_context(gt)].dep_scheduler;
> + struct drm_sched_entity *entity =
> + xe_dep_scheduler_entity(dep_scheduler);
> + struct xe_gt_tlb_invalidation_fence *ifence;
> + int err;
> +
> + job = kmalloc(sizeof(*job), GFP_KERNEL);
> + if (!job)
> + return ERR_PTR(-ENOMEM);
> +
> + job->q = q;
> + job->gt = gt;
> + job->start = start;
> + job->end = end;
> + job->asid = asid;
> + job->fence_armed = false;
> + job->dep.ops = &dep_job_ops;
> + kref_init(&job->refcount);
> + xe_exec_queue_get(q);
> +
> + ifence = kmalloc(sizeof(*ifence), GFP_KERNEL);
> + if (!ifence) {
> + err = -ENOMEM;
> + goto err_job;
> + }
> + job->fence = &ifence->base;
> +
> + err = drm_sched_job_init(&job->dep.drm, entity, 1, NULL,
> + q->xef ? q->xef->drm->client_id :
> 0);
> + if (err)
> + goto err_fence;
> +
> + xe_pm_runtime_get_noresume(gt_to_xe(job->gt));
> + return job;
> +
> +err_fence:
> + kfree(ifence);
> +err_job:
> + xe_exec_queue_put(q);
> + kfree(job);
> +
> + return ERR_PTR(err);
> +}
> +
> +static void xe_gt_tlb_inval_job_destroy(struct kref *ref)
> +{
> + struct xe_gt_tlb_inval_job *job = container_of(ref,
> typeof(*job),
> + refcount);
> + struct xe_gt_tlb_invalidation_fence *ifence =
> + container_of(job->fence, typeof(*ifence), base);
> + struct xe_device *xe = gt_to_xe(job->gt);
> + struct xe_exec_queue *q = job->q;
> +
> + if (!job->fence_armed)
> + kfree(ifence);
> + else
> + /* Ref from xe_gt_tlb_invalidation_fence_init */
> + dma_fence_put(job->fence);
> +
> + drm_sched_job_cleanup(&job->dep.drm);
> + kfree(job);
> + xe_exec_queue_put(q); /* Pairs with get from
> xe_gt_tlb_inval_job_create */
> + xe_pm_runtime_put(xe); /* Pairs with get from
> xe_gt_tlb_inval_job_create */
This patch also looks great to me. My only concern here is that it feels
weird reordering these puts and kfrees from how we allocated them in
xe_gt_tlb_inval_job_create. It does look functional, because the kfree of
the job and the exec queue/runtime puts are really independent, but if for
some reason that changed in the future (however unlikely) and we wanted to
link those, it seems safer to do the cleanup in the reverse of the order
you did the allocation. Let me know what you think here.
Thanks,
Stuart
> +}
> +
> +/**
> + * xe_gt_tlb_inval_job_alloc_dep() - GT TLB invalidation job alloc dependency
> + * @job: GT TLB invalidation job to alloc dependency for
> + *
> + * Allocate storage for a dependency in the GT TLB invalidation job. This
> + * function should be called at most once per job and must be paired
> with
> + * xe_gt_tlb_inval_job_push being called with a real (non-signaled)
> fence.
> + *
> + * Return: 0 on success, -errno on failure
> + */
> +int xe_gt_tlb_inval_job_alloc_dep(struct xe_gt_tlb_inval_job *job)
> +{
> +	xe_assert(gt_to_xe(job->gt), !xa_load(&job->dep.drm.dependencies, 0));
> +
> + return drm_sched_job_add_dependency(&job->dep.drm,
> + dma_fence_get_stub());
> +}
> +
> +/**
> + * xe_gt_tlb_inval_job_push() - GT TLB invalidation job push
> + * @job: GT TLB invalidation job to push
> + * @m: The migration object being used
> + * @fence: Dependency for GT TLB invalidation job
> + *
> + * Pushes a GT TLB invalidation job for execution, using @fence as a
> dependency.
> + * Storage for @fence must be preallocated with
> xe_gt_tlb_inval_job_alloc_dep
> + * prior to this call if @fence is not signaled. Takes a reference to the
> + * job's finished fence, which the caller is responsible for releasing, and
> + * returns it to the caller. This function is safe to be called in the path
> + * of reclaim.
> + *
> + * Return: Job's finished fence
> + */
> +struct dma_fence *xe_gt_tlb_inval_job_push(struct
> xe_gt_tlb_inval_job *job,
> + struct xe_migrate *m,
> + struct dma_fence *fence)
> +{
> + struct xe_gt_tlb_invalidation_fence *ifence =
> + container_of(job->fence, typeof(*ifence), base);
> +
> + if (!dma_fence_is_signaled(fence)) {
> + void *ptr;
> +
> + /*
> + * Can be in path of reclaim, hence the preallocation
> of fence
> + * storage in xe_gt_tlb_inval_job_alloc_dep. Verify
> caller did
> + * this correctly.
> + */
> + xe_assert(gt_to_xe(job->gt),
> + xa_load(&job->dep.drm.dependencies, 0) ==
> + dma_fence_get_stub());
> +
> + dma_fence_get(fence); /* ref released once
> dependency processed by scheduler */
> + ptr = xa_store(&job->dep.drm.dependencies, 0, fence,
> + GFP_ATOMIC);
> + xe_assert(gt_to_xe(job->gt), !xa_is_err(ptr));
> + }
> +
> + xe_gt_tlb_inval_job_get(job); /* Pairs with put in free_job
> */
> + job->fence_armed = true;
> +
> + /*
> + * We need the migration lock to protect the seqnos (job and
> + * invalidation fence) and the spsc queue, only taken on
> migration
> + * queue, user queues protected dma-resv VM lock.
> + */
> + xe_migrate_job_lock(m, job->q);
> +
> + /* Creation ref pairs with put in xe_gt_tlb_inval_job_destroy
> */
> + xe_gt_tlb_invalidation_fence_init(job->gt, ifence, false);
> + dma_fence_get(job->fence); /* Pairs with put in DRM
> scheduler */
> +
> + drm_sched_job_arm(&job->dep.drm);
> + /*
> + * caller ref, get must be done before job push as it could
> immediately
> + * signal and free.
> + */
> + dma_fence_get(&job->dep.drm.s_fence->finished);
> + drm_sched_entity_push_job(&job->dep.drm);
> +
> + xe_migrate_job_unlock(m, job->q);
> +
> + /*
> + * Not using job->fence, as it has its own dma-fence context,
> which does
> + * not allow GT TLB invalidation fences on the same queue, GT
> tuple to
> + * be squashed in dma-resv/DRM scheduler. Instead, we use the
> DRM scheduler
> + * context and job's finished fence, which enables squashing.
> + */
> + return &job->dep.drm.s_fence->finished;
> +}
> +
> +/**
> + * xe_gt_tlb_inval_job_get() - Get a reference to GT TLB
> invalidation job
> + * @job: GT TLB invalidation job object
> + *
> + * Increment the GT TLB invalidation job's reference count
> + */
> +void xe_gt_tlb_inval_job_get(struct xe_gt_tlb_inval_job *job)
> +{
> + kref_get(&job->refcount);
> +}
> +
> +/**
> + * xe_gt_tlb_inval_job_put() - Put a reference to GT TLB
> invalidation job
> + * @job: GT TLB invalidation job object
> + *
> + * Decrement the GT TLB invalidation job's reference count, call
> + * xe_gt_tlb_inval_job_destroy when reference count == 0. Skips
> decrement if
> + * input @job is NULL or IS_ERR.
> + */
> +void xe_gt_tlb_inval_job_put(struct xe_gt_tlb_inval_job *job)
> +{
> + if (job && !IS_ERR(job))
> + kref_put(&job->refcount,
> xe_gt_tlb_inval_job_destroy);
> +}
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
> b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
> new file mode 100644
> index 000000000000..883896194a34
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_GT_TLB_INVAL_JOB_H_
> +#define _XE_GT_TLB_INVAL_JOB_H_
> +
> +#include <linux/types.h>
> +
> +struct dma_fence;
> +struct drm_sched_job;
> +struct kref;
> +struct xe_exec_queue;
> +struct xe_gt;
> +struct xe_gt_tlb_inval_job;
> +struct xe_migrate;
> +
> +struct xe_gt_tlb_inval_job *xe_gt_tlb_inval_job_create(struct xe_exec_queue *q,
> +						       struct xe_gt *gt,
> +						       u64 start, u64 end,
> +						       u32 asid);
> +
> +int xe_gt_tlb_inval_job_alloc_dep(struct xe_gt_tlb_inval_job *job);
> +
> +struct dma_fence *xe_gt_tlb_inval_job_push(struct
> xe_gt_tlb_inval_job *job,
> + struct xe_migrate *m,
> + struct dma_fence *fence);
> +
> +void xe_gt_tlb_inval_job_get(struct xe_gt_tlb_inval_job *job);
> +
> +void xe_gt_tlb_inval_job_put(struct xe_gt_tlb_inval_job *job);
> +
> +#endif
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 7/9] drm/xe: Add GT TLB invalidation jobs
2025-07-15 23:09 ` Summers, Stuart
@ 2025-07-16 1:08 ` Matthew Brost
2025-07-17 15:58 ` Summers, Stuart
0 siblings, 1 reply; 45+ messages in thread
From: Matthew Brost @ 2025-07-16 1:08 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Tue, Jul 15, 2025 at 05:09:01PM -0600, Summers, Stuart wrote:
> On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > Add GT TLB invalidation jobs which issue GT TLB invalidations. Built on
> > top of the Xe generic dependency scheduler.
> >
> > v2:
> > - Fix checkpatch
> >
> > Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/Makefile | 1 +
> > drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c | 271
> > +++++++++++++++++++++++
> > drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h | 34 +++
> > 3 files changed, 306 insertions(+)
> > create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
> > create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
> >
> > diff --git a/drivers/gpu/drm/xe/Makefile
> > b/drivers/gpu/drm/xe/Makefile
> > index 0edcfc770c0d..5aad44a3b5fd 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -55,6 +55,7 @@ xe-y += xe_bb.o \
> > xe_gt_sysfs.o \
> > xe_gt_throttle.o \
> > xe_gt_tlb_invalidation.o \
> > + xe_gt_tlb_inval_job.o \
> > xe_gt_topology.o \
> > xe_guc.o \
> > xe_guc_ads.o \
> > diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
> > b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
> > new file mode 100644
> > index 000000000000..428d20f16ec2
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
> > @@ -0,0 +1,271 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#include "xe_dep_job_types.h"
> > +#include "xe_dep_scheduler.h"
> > +#include "xe_exec_queue.h"
> > +#include "xe_gt.h"
> > +#include "xe_gt_tlb_invalidation.h"
> > +#include "xe_gt_tlb_inval_job.h"
> > +#include "xe_migrate.h"
> > +#include "xe_pm.h"
> > +
> > +/** struct xe_gt_tlb_inval_job - GT TLB invalidation job */
> > +struct xe_gt_tlb_inval_job {
> > + /** @dep: base generic dependency Xe job */
> > + struct xe_dep_job dep;
> > + /** @gt: GT to invalidate */
> > + struct xe_gt *gt;
> > + /** @q: exec queue issuing the invalidate */
> > + struct xe_exec_queue *q;
> > + /** @refcount: ref count of this job */
> > + struct kref refcount;
> > + /**
> > + * @fence: dma fence to indicate completion. 1 way
> > relationship - job
> > + * can safely reference fence, fence cannot safely reference
> > job.
> > + */
> > + struct dma_fence *fence;
> > + /** @start: Start address to invalidate */
> > + u64 start;
> > + /** @end: End address to invalidate */
> > + u64 end;
> > + /** @asid: Address space ID to invalidate */
> > + u32 asid;
> > + /** @fence_armed: Fence has been armed */
> > + bool fence_armed;
> > +};
> > +
> > +static struct dma_fence *xe_gt_tlb_inval_job_run(struct xe_dep_job
> > *dep_job)
> > +{
> > + struct xe_gt_tlb_inval_job *job =
> > + container_of(dep_job, typeof(*job), dep);
> > + struct xe_gt_tlb_invalidation_fence *ifence =
> > + container_of(job->fence, typeof(*ifence), base);
> > +
> > + xe_gt_tlb_invalidation_range(job->gt, ifence, job->start,
> > + job->end, job->asid);
> > +
> > + return job->fence;
> > +}
> > +
> > +static void xe_gt_tlb_inval_job_free(struct xe_dep_job *dep_job)
> > +{
> > + struct xe_gt_tlb_inval_job *job =
> > + container_of(dep_job, typeof(*job), dep);
> > +
> > + /* Pairs with get in xe_gt_tlb_inval_job_push */
> > + xe_gt_tlb_inval_job_put(job);
> > +}
> > +
> > +static const struct xe_dep_job_ops dep_job_ops = {
> > + .run_job = xe_gt_tlb_inval_job_run,
> > + .free_job = xe_gt_tlb_inval_job_free,
> > +};
> > +
> > +static int xe_gt_tlb_inval_context(struct xe_gt *gt)
> > +{
> > + return xe_gt_is_media_type(gt) ?
> > XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT :
> > + XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT;
> > +}
> > +
> > +/**
> > + * xe_gt_tlb_inval_job_create() - GT TLB invalidation job create
> > + * @gt: GT to invalidate
> > + * @q: exec queue issuing the invalidate
> > + * @start: Start address to invalidate
> > + * @end: End address to invalidate
> > + * @asid: Address space ID to invalidate
> > + *
> > + * Create a GT TLB invalidation job and initialize internal fields.
> > The caller is
> > + * responsible for releasing the creation reference.
> > + *
> > + * Return: GT TLB invalidation job object or ERR_PTR
> > + */
> > +struct xe_gt_tlb_inval_job *xe_gt_tlb_inval_job_create(struct xe_exec_queue *q,
> > +						       struct xe_gt *gt,
> > +						       u64 start, u64 end,
> > +						       u32 asid)
> > +{
> > + struct xe_gt_tlb_inval_job *job;
> > + struct xe_dep_scheduler *dep_scheduler =
> > +		q->tlb_inval[xe_gt_tlb_inval_context(gt)].dep_scheduler;
> > + struct drm_sched_entity *entity =
> > + xe_dep_scheduler_entity(dep_scheduler);
> > + struct xe_gt_tlb_invalidation_fence *ifence;
> > + int err;
> > +
> > + job = kmalloc(sizeof(*job), GFP_KERNEL);
> > + if (!job)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + job->q = q;
> > + job->gt = gt;
> > + job->start = start;
> > + job->end = end;
> > + job->asid = asid;
> > + job->fence_armed = false;
> > + job->dep.ops = &dep_job_ops;
> > + kref_init(&job->refcount);
> > + xe_exec_queue_get(q);
> > +
> > + ifence = kmalloc(sizeof(*ifence), GFP_KERNEL);
> > + if (!ifence) {
> > + err = -ENOMEM;
> > + goto err_job;
> > + }
> > + job->fence = &ifence->base;
> > +
> > + err = drm_sched_job_init(&job->dep.drm, entity, 1, NULL,
> > + q->xef ? q->xef->drm->client_id :
> > 0);
> > + if (err)
> > + goto err_fence;
> > +
> > + xe_pm_runtime_get_noresume(gt_to_xe(job->gt));
> > + return job;
> > +
> > +err_fence:
> > + kfree(ifence);
> > +err_job:
> > + xe_exec_queue_put(q);
> > + kfree(job);
> > +
> > + return ERR_PTR(err);
> > +}
> > +
> > +static void xe_gt_tlb_inval_job_destroy(struct kref *ref)
> > +{
> > + struct xe_gt_tlb_inval_job *job = container_of(ref,
> > typeof(*job),
> > + refcount);
> > + struct xe_gt_tlb_invalidation_fence *ifence =
> > + container_of(job->fence, typeof(*ifence), base);
> > + struct xe_device *xe = gt_to_xe(job->gt);
> > + struct xe_exec_queue *q = job->q;
> > +
> > + if (!job->fence_armed)
> > + kfree(ifence);
> > + else
> > + /* Ref from xe_gt_tlb_invalidation_fence_init */
> > + dma_fence_put(job->fence);
> > +
> > + drm_sched_job_cleanup(&job->dep.drm);
> > + kfree(job);
> > + xe_exec_queue_put(q); /* Pairs with get from
> > xe_gt_tlb_inval_job_create */
> > + xe_pm_runtime_put(xe); /* Pairs with get from
> > xe_gt_tlb_inval_job_create */
>
> This patch also looks great to me. My only concern here is it feels
> weird reordering these puts and kfrees from how we allocated them in
> xe_gt_tlb_inval_job_create. It does look functional though because the
> kfree of the job and exec queue/runtime put are really independent, but
> if for some reason that changed in the future (however unlikely) and we
> wanted to link those, it seems safer to do the cleanup in the reverse
> of the order you did the allocation. Let me know what you think here.
They are independent. The ordering is just mirroring what
xe_sched_job_destroy does, but in both cases it could be reordered if
desired. I'd rather leave it as is for now, so the functions that create
/ destroy jobs look more or less the same in Xe.
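FWIW, a strictly reversed cleanup would look something like this
(untested sketch):

	xe_pm_runtime_put(xe);
	drm_sched_job_cleanup(&job->dep.drm);
	if (!job->fence_armed)
		kfree(ifence);
	else
		/* Ref from xe_gt_tlb_invalidation_fence_init */
		dma_fence_put(job->fence);
	xe_exec_queue_put(q);
	kfree(job);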
Matt
>
> Thanks,
> Stuart
>
> > +}
> > +
> > +/**
> > + * xe_gt_tlb_inval_job_alloc_dep() - GT TLB invalidation job alloc dependency
> > + * @job: GT TLB invalidation job to alloc dependency for
> > + *
> > + * Allocate storage for a dependency in the GT TLB invalidation job. This
> > + * function should be called at most once per job and must be paired
> > with
> > + * xe_gt_tlb_inval_job_push being called with a real (non-signaled)
> > fence.
> > + *
> > + * Return: 0 on success, -errno on failure
> > + */
> > +int xe_gt_tlb_inval_job_alloc_dep(struct xe_gt_tlb_inval_job *job)
> > +{
> > +	xe_assert(gt_to_xe(job->gt), !xa_load(&job->dep.drm.dependencies, 0));
> > +
> > + return drm_sched_job_add_dependency(&job->dep.drm,
> > + dma_fence_get_stub());
> > +}
> > +
> > +/**
> > + * xe_gt_tlb_inval_job_push() - GT TLB invalidation job push
> > + * @job: GT TLB invalidation job to push
> > + * @m: The migration object being used
> > + * @fence: Dependency for GT TLB invalidation job
> > + *
> > + * Pushes a GT TLB invalidation job for execution, using @fence as a
> > dependency.
> > + * Storage for @fence must be preallocated with
> > xe_gt_tlb_inval_job_alloc_dep
> > + * prior to this call if @fence is not signaled. Takes a reference to the
> > + * job's finished fence, which the caller is responsible for releasing, and
> > + * returns it to the caller. This function is safe to be called in the path
> > + * of reclaim.
> > + *
> > + * Return: Job's finished fence
> > + */
> > +struct dma_fence *xe_gt_tlb_inval_job_push(struct
> > xe_gt_tlb_inval_job *job,
> > + struct xe_migrate *m,
> > + struct dma_fence *fence)
> > +{
> > + struct xe_gt_tlb_invalidation_fence *ifence =
> > + container_of(job->fence, typeof(*ifence), base);
> > +
> > + if (!dma_fence_is_signaled(fence)) {
> > + void *ptr;
> > +
> > + /*
> > + * Can be in path of reclaim, hence the preallocation
> > of fence
> > + * storage in xe_gt_tlb_inval_job_alloc_dep. Verify
> > caller did
> > + * this correctly.
> > + */
> > + xe_assert(gt_to_xe(job->gt),
> > + xa_load(&job->dep.drm.dependencies, 0) ==
> > + dma_fence_get_stub());
> > +
> > + dma_fence_get(fence); /* ref released once
> > dependency processed by scheduler */
> > + ptr = xa_store(&job->dep.drm.dependencies, 0, fence,
> > + GFP_ATOMIC);
> > + xe_assert(gt_to_xe(job->gt), !xa_is_err(ptr));
> > + }
> > +
> > + xe_gt_tlb_inval_job_get(job); /* Pairs with put in free_job
> > */
> > + job->fence_armed = true;
> > +
> > + /*
> > + * We need the migration lock to protect the seqnos (job and
> > + * invalidation fence) and the spsc queue, only taken on
> > migration
> > + * queue, user queues protected dma-resv VM lock.
> > + */
> > + xe_migrate_job_lock(m, job->q);
> > +
> > + /* Creation ref pairs with put in xe_gt_tlb_inval_job_destroy
> > */
> > + xe_gt_tlb_invalidation_fence_init(job->gt, ifence, false);
> > + dma_fence_get(job->fence); /* Pairs with put in DRM
> > scheduler */
> > +
> > + drm_sched_job_arm(&job->dep.drm);
> > + /*
> > + * caller ref, get must be done before job push as it could
> > immediately
> > + * signal and free.
> > + */
> > + dma_fence_get(&job->dep.drm.s_fence->finished);
> > + drm_sched_entity_push_job(&job->dep.drm);
> > +
> > + xe_migrate_job_unlock(m, job->q);
> > +
> > + /*
> > + * Not using job->fence, as it has its own dma-fence context,
> > which does
> > + * not allow GT TLB invalidation fences on the same queue, GT
> > tuple to
> > + * be squashed in dma-resv/DRM scheduler. Instead, we use the
> > DRM scheduler
> > + * context and job's finished fence, which enables squashing.
> > + */
> > + return &job->dep.drm.s_fence->finished;
> > +}
> > +
> > +/**
> > + * xe_gt_tlb_inval_job_get() - Get a reference to GT TLB
> > invalidation job
> > + * @job: GT TLB invalidation job object
> > + *
> > + * Increment the GT TLB invalidation job's reference count
> > + */
> > +void xe_gt_tlb_inval_job_get(struct xe_gt_tlb_inval_job *job)
> > +{
> > + kref_get(&job->refcount);
> > +}
> > +
> > +/**
> > + * xe_gt_tlb_inval_job_put() - Put a reference to GT TLB
> > invalidation job
> > + * @job: GT TLB invalidation job object
> > + *
> > + * Decrement the GT TLB invalidation job's reference count, call
> > + * xe_gt_tlb_inval_job_destroy when reference count == 0. Skips
> > decrement if
> > + * input @job is NULL or IS_ERR.
> > + */
> > +void xe_gt_tlb_inval_job_put(struct xe_gt_tlb_inval_job *job)
> > +{
> > + if (job && !IS_ERR(job))
> > + kref_put(&job->refcount,
> > xe_gt_tlb_inval_job_destroy);
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
> > b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
> > new file mode 100644
> > index 000000000000..883896194a34
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
> > @@ -0,0 +1,34 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#ifndef _XE_GT_TLB_INVAL_JOB_H_
> > +#define _XE_GT_TLB_INVAL_JOB_H_
> > +
> > +#include <linux/types.h>
> > +
> > +struct dma_fence;
> > +struct drm_sched_job;
> > +struct kref;
> > +struct xe_exec_queue;
> > +struct xe_gt;
> > +struct xe_gt_tlb_inval_job;
> > +struct xe_migrate;
> > +
> > +struct xe_gt_tlb_inval_job *xe_gt_tlb_inval_job_create(struct xe_exec_queue *q,
> > +						       struct xe_gt *gt,
> > +						       u64 start, u64 end,
> > +						       u32 asid);
> > +
> > +int xe_gt_tlb_inval_job_alloc_dep(struct xe_gt_tlb_inval_job *job);
> > +
> > +struct dma_fence *xe_gt_tlb_inval_job_push(struct
> > xe_gt_tlb_inval_job *job,
> > + struct xe_migrate *m,
> > + struct dma_fence *fence);
> > +
> > +void xe_gt_tlb_inval_job_get(struct xe_gt_tlb_inval_job *job);
> > +
> > +void xe_gt_tlb_inval_job_put(struct xe_gt_tlb_inval_job *job);
> > +
> > +#endif
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 3/9] drm: Simplify drmm_alloc_ordered_workqueue return
2025-07-02 23:42 ` [PATCH v2 3/9] drm: Simplify drmm_alloc_ordered_workqueue return Matthew Brost
@ 2025-07-16 1:10 ` Matthew Brost
0 siblings, 0 replies; 45+ messages in thread
From: Matthew Brost @ 2025-07-16 1:10 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, maarten.lankhorst
On Wed, Jul 02, 2025 at 04:42:16PM -0700, Matthew Brost wrote:
> Rather than returning ERR_PTR or NULL on failure, replace the NULL
> return with ERR_PTR(-ENOMEM). This simplifies error handling at the
> caller. While here, add kernel documentation for
> drmm_alloc_ordered_workqueue.
>
> Cc: Louis Chauvet <louis.chauvet@bootlin.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
This one is already merged through drm-xe-misc.
Matt
> ---
> drivers/gpu/drm/vkms/vkms_crtc.c | 2 --
> include/drm/drm_managed.h | 15 +++++++++++++--
> 2 files changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
> index 8c9898b9055d..e60573e0f3e9 100644
> --- a/drivers/gpu/drm/vkms/vkms_crtc.c
> +++ b/drivers/gpu/drm/vkms/vkms_crtc.c
> @@ -302,8 +302,6 @@ struct vkms_output *vkms_crtc_init(struct drm_device *dev, struct drm_plane *pri
> vkms_out->composer_workq = drmm_alloc_ordered_workqueue(dev, "vkms_composer", 0);
> if (IS_ERR(vkms_out->composer_workq))
> return ERR_CAST(vkms_out->composer_workq);
> - if (!vkms_out->composer_workq)
> - return ERR_PTR(-ENOMEM);
>
> return vkms_out;
> }
> diff --git a/include/drm/drm_managed.h b/include/drm/drm_managed.h
> index 53017cc609ac..72bfac002c06 100644
> --- a/include/drm/drm_managed.h
> +++ b/include/drm/drm_managed.h
> @@ -129,14 +129,25 @@ void __drmm_mutex_release(struct drm_device *dev, void *res);
>
> void __drmm_workqueue_release(struct drm_device *device, void *wq);
>
> +/**
> + * drmm_alloc_ordered_workqueue - &drm_device managed alloc_ordered_workqueue()
> + * @dev: DRM device
> + * @fmt: printf format for the name of the workqueue
> + * @flags: WQ_* flags (only WQ_FREEZABLE and WQ_MEM_RECLAIM are meaningful)
> + * @args: args for @fmt
> + *
> + * This is a &drm_device-managed version of alloc_ordered_workqueue(). The
> + * allocated workqueue is automatically destroyed on the final drm_dev_put().
> + *
> + * Returns: workqueue on success, negative ERR_PTR otherwise.
> + */
> #define drmm_alloc_ordered_workqueue(dev, fmt, flags, args...) \
> ({ \
> struct workqueue_struct *wq = alloc_ordered_workqueue(fmt, flags, ##args); \
> wq ? ({ \
> int ret = drmm_add_action_or_reset(dev, __drmm_workqueue_release, wq); \
> ret ? ERR_PTR(ret) : wq; \
> - }) : \
> - wq; \
> + }) : ERR_PTR(-ENOMEM); \
> })
>
> #endif
> --
> 2.34.1
>
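
With the NULL case folded into ERR_PTR(-ENOMEM), callers only need the
single IS_ERR() check, as the vkms hunk above shows. A minimal sketch of
the resulting call-site pattern (hypothetical workqueue name):

	struct workqueue_struct *wq;

	wq = drmm_alloc_ordered_workqueue(&xe->drm, "my-wq", 0);
	if (IS_ERR(wq))
		return PTR_ERR(wq); /* covers alloc and action failure */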
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 6/9] drm/xe: Add xe_migrate_job_lock/unlock helpers
2025-07-15 22:48 ` Summers, Stuart
@ 2025-07-16 1:11 ` Matthew Brost
0 siblings, 0 replies; 45+ messages in thread
From: Matthew Brost @ 2025-07-16 1:11 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Tue, Jul 15, 2025 at 04:48:00PM -0600, Summers, Stuart wrote:
> On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > Add xe_migrate_job_lock/unlock helpers, used to ensure ordering when
> > issuing GT TLB invalidation jobs.
> >
> > v2:
> > - Fix multi-line comments (checkpatch)
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>
> IMO this could be squashed with the patch after since there aren't any
> users in this patch. But the code itself looks ok to me.
>
Sure, I'll squash into the following patch.
Matt
> > ---
> > drivers/gpu/drm/xe/xe_migrate.c | 36
> > +++++++++++++++++++++++++++++++++
> > drivers/gpu/drm/xe/xe_migrate.h | 4 ++++
> > 2 files changed, 40 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > b/drivers/gpu/drm/xe/xe_migrate.c
> > index b5f85162b9ed..1f57adcbb535 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > @@ -1917,6 +1917,42 @@ int xe_migrate_access_memory(struct xe_migrate
> > *m, struct xe_bo *bo,
> > return IS_ERR(fence) ? PTR_ERR(fence) : 0;
> > }
> >
> > +/**
> > + * xe_migrate_job_lock() - Lock migrate job lock
> > + * @m: The migration context.
> > + * @q: Queue associated with the operation which requires a lock
> > + *
> > + * Lock the migrate job lock if the queue is a migration queue,
> > otherwise
> > + * assert the VM's dma-resv is held (user queue's have own locking).
> > + */
> > +void xe_migrate_job_lock(struct xe_migrate *m, struct xe_exec_queue
> > *q)
> > +{
> > + bool is_migrate = q == m->q;
>
> Maybe not worth it, but we're doing these same calculations in
> xe_migrate.c. Should we just add a helper?
>
> Either way for the above 2, the code looks ok to me:
> Reviewed-by: Stuart Summers <stuart.summers@intel.com>
>
> Thanks,
> Stuart
>
> > +
> > + if (is_migrate)
> > + mutex_lock(&m->job_mutex);
> > + else
> > + xe_vm_assert_held(q->vm); /* User queues VM's
> > should be locked */
> > +}
> > +
> > +/**
> > + * xe_migrate_job_unlock() - Unlock migrate job lock
> > + * @m: The migration context.
> > + * @q: Queue associated with the operation which requires a lock
> > + *
> > + * Unlock the migrate job lock if the queue is a migration queue,
> > otherwise
> > + * assert the VM's dma-resv is held (user queue's have own locking).
> > + */
> > +void xe_migrate_job_unlock(struct xe_migrate *m, struct
> > xe_exec_queue *q)
> > +{
> > + bool is_migrate = q == m->q;
> > +
> > + if (is_migrate)
> > + mutex_unlock(&m->job_mutex);
> > + else
> > + xe_vm_assert_held(q->vm); /* User queues VM's
> > should be locked */
> > +}
> > +
> > #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
> > #include "tests/xe_migrate.c"
> > #endif
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.h
> > b/drivers/gpu/drm/xe/xe_migrate.h
> > index fb9839c1bae0..e9d83d320f8c 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.h
> > +++ b/drivers/gpu/drm/xe/xe_migrate.h
> > @@ -134,4 +134,8 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
> > void xe_migrate_wait(struct xe_migrate *m);
> >
> > struct xe_exec_queue *xe_tile_migrate_exec_queue(struct xe_tile
> > *tile);
> > +
> > +void xe_migrate_job_lock(struct xe_migrate *m, struct xe_exec_queue
> > *q);
> > +void xe_migrate_job_unlock(struct xe_migrate *m, struct
> > xe_exec_queue *q);
> > +
> > #endif
>
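
For reference, the helper suggested above might look something like this
(hypothetical name, assuming a static function in xe_migrate.c):

	static bool xe_migrate_queue_is_migrate(struct xe_migrate *m,
						struct xe_exec_queue *q)
	{
		return q == m->q;
	}

xe_migrate_job_lock/unlock and the existing open-coded checks in
xe_migrate.c could then share it.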
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 7/9] drm/xe: Add GT TLB invalidation jobs
2025-07-16 1:08 ` Matthew Brost
@ 2025-07-17 15:58 ` Summers, Stuart
0 siblings, 0 replies; 45+ messages in thread
From: Summers, Stuart @ 2025-07-17 15:58 UTC (permalink / raw)
To: Brost, Matthew
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Tue, 2025-07-15 at 18:08 -0700, Matthew Brost wrote:
> > On Tue, Jul 15, 2025 at 05:09:01PM -0600, Summers, Stuart wrote:
> > > > On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > > > > > Add GT TLB invalidation jobs which issue GT TLB
> > > > > > invalidations. Built on
> > > > > > top of Xe generic dependency scheduler.
> > > > > >
> > > > > > v2:
> > > > > > - Fix checkpatch
> > > > > >
> > > > > > Suggested-by: Thomas Hellström
> > > > > > <thomas.hellstrom@linux.intel.com>
> > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > > ---
> > > > > > drivers/gpu/drm/xe/Makefile | 1 +
> > > > > > drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c | 271
> > > > > > +++++++++++++++++++++++
> > > > > > drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h | 34 +++
> > > > > > 3 files changed, 306 insertions(+)
> > > > > > create mode 100644
> > > > > > drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
> > > > > > create mode 100644
> > > > > > drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/Makefile
> > > > > > b/drivers/gpu/drm/xe/Makefile
> > > > > > index 0edcfc770c0d..5aad44a3b5fd 100644
> > > > > > --- a/drivers/gpu/drm/xe/Makefile
> > > > > > +++ b/drivers/gpu/drm/xe/Makefile
> > > > > > @@ -55,6 +55,7 @@ xe-y += xe_bb.o \
> > > > > > xe_gt_sysfs.o \
> > > > > > xe_gt_throttle.o \
> > > > > > xe_gt_tlb_invalidation.o \
> > > > > > + xe_gt_tlb_inval_job.o \
> > > > > > xe_gt_topology.o \
> > > > > > xe_guc.o \
> > > > > > xe_guc_ads.o \
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
> > > > > > b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
> > > > > > new file mode 100644
> > > > > > index 000000000000..428d20f16ec2
> > > > > > --- /dev/null
> > > > > > +++ b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.c
> > > > > > @@ -0,0 +1,271 @@
> > > > > > +// SPDX-License-Identifier: MIT
> > > > > > +/*
> > > > > > + * Copyright © 2025 Intel Corporation
> > > > > > + */
> > > > > > +
> > > > > > +#include "xe_dep_job_types.h"
> > > > > > +#include "xe_dep_scheduler.h"
> > > > > > +#include "xe_exec_queue.h"
> > > > > > +#include "xe_gt.h"
> > > > > > +#include "xe_gt_tlb_invalidation.h"
> > > > > > +#include "xe_gt_tlb_inval_job.h"
> > > > > > +#include "xe_migrate.h"
> > > > > > +#include "xe_pm.h"
> > > > > > +
> > > > > > +/** struct xe_gt_tlb_inval_job - GT TLB invalidation job
> > > > > > */
> > > > > > +struct xe_gt_tlb_inval_job {
> > > > > > + /** @dep: base generic dependency Xe job */
> > > > > > + struct xe_dep_job dep;
> > > > > > + /** @gt: GT to invalidate */
> > > > > > + struct xe_gt *gt;
> > > > > > + /** @q: exec queue issuing the invalidate */
> > > > > > + struct xe_exec_queue *q;
> > > > > > + /** @refcount: ref count of this job */
> > > > > > + struct kref refcount;
> > > > > > + /**
> > > > > > + * @fence: dma fence to indicate completion. 1 way
> > > > > > relationship - job
> > > > > > + * can safely reference fence, fence cannot safely
> > > > > > reference job.
> > > > > > + */
> > > > > > + struct dma_fence *fence;
> > > > > > + /** @start: Start address to invalidate */
> > > > > > + u64 start;
> > > > > > + /** @end: End address to invalidate */
> > > > > > + u64 end;
> > > > > > + /** @asid: Address space ID to invalidate */
> > > > > > + u32 asid;
> > > > > > + /** @fence_armed: Fence has been armed */
> > > > > > + bool fence_armed;
> > > > > > +};
> > > > > > +
> > > > > > +static struct dma_fence *xe_gt_tlb_inval_job_run(struct
> > > > > > xe_dep_job *dep_job)
> > > > > > +{
> > > > > > + struct xe_gt_tlb_inval_job *job =
> > > > > > + container_of(dep_job, typeof(*job), dep);
> > > > > > + struct xe_gt_tlb_invalidation_fence *ifence =
> > > > > > + container_of(job->fence, typeof(*ifence),
> > > > > > base);
> > > > > > +
> > > > > > + xe_gt_tlb_invalidation_range(job->gt, ifence, job-
> > > > > > >start,
> > > > > > + job->end, job->asid);
> > > > > > +
> > > > > > + return job->fence;
> > > > > > +}
> > > > > > +
> > > > > > +static void xe_gt_tlb_inval_job_free(struct xe_dep_job
> > > > > > *dep_job)
> > > > > > +{
> > > > > > + struct xe_gt_tlb_inval_job *job =
> > > > > > + container_of(dep_job, typeof(*job), dep);
> > > > > > +
> > > > > > + /* Pairs with get in xe_gt_tlb_inval_job_push */
> > > > > > + xe_gt_tlb_inval_job_put(job);
> > > > > > +}
> > > > > > +
> > > > > > +static const struct xe_dep_job_ops dep_job_ops = {
> > > > > > + .run_job = xe_gt_tlb_inval_job_run,
> > > > > > + .free_job = xe_gt_tlb_inval_job_free,
> > > > > > +};
> > > > > > +
> > > > > > +static int xe_gt_tlb_inval_context(struct xe_gt *gt)
> > > > > > +{
> > > > > > + return xe_gt_is_media_type(gt) ?
> > > > > > XE_EXEC_QUEUE_TLB_INVAL_MEDIA_GT :
> > > > > > + XE_EXEC_QUEUE_TLB_INVAL_PRIMARY_GT;
> > > > > > +}
> > > > > > +
> > > > > > +/**
> > > > > > + * xe_gt_tlb_inval_job_create() - GT TLB invalidation job
> > > > > > create
> > > > > > + * @gt: GT to invalidate
> > > > > > + * @q: exec queue issuing the invalidate
> > > > > > + * @start: Start address to invalidate
> > > > > > + * @end: End address to invalidate
> > > > > > + * @asid: Address space ID to invalidate
> > > > > > + *
> > > > > > + * Create a GT TLB invalidation job and initialize
> > > > > > internal fields.
> > > > > > The caller is
> > > > > > + * responsible for releasing the creation reference.
> > > > > > + *
> > > > > > + * Return: GT TLB invalidation job object or ERR_PTR
> > > > > > + */
> > > > > > +struct xe_gt_tlb_inval_job *xe_gt_tlb_inval_job_create(struct
> > > > > > xe_exec_queue *q,
> > > > > > +                                                       struct xe_gt
> > > > > > *gt,
> > > > > > +                                                       u64 start, u64
> > > > > > end,
> > > > > > +                                                       u32 asid)
> > > > > > +{
> > > > > > + struct xe_gt_tlb_inval_job *job;
> > > > > > + struct xe_dep_scheduler *dep_scheduler =
> > > > > > + q->tlb_inval[xe_gt_tlb_inval_context(gt)].dep_scheduler;
> > > > > > + struct drm_sched_entity *entity =
> > > > > > + xe_dep_scheduler_entity(dep_scheduler);
> > > > > > + struct xe_gt_tlb_invalidation_fence *ifence;
> > > > > > + int err;
> > > > > > +
> > > > > > + job = kmalloc(sizeof(*job), GFP_KERNEL);
> > > > > > + if (!job)
> > > > > > + return ERR_PTR(-ENOMEM);
> > > > > > +
> > > > > > + job->q = q;
> > > > > > + job->gt = gt;
> > > > > > + job->start = start;
> > > > > > + job->end = end;
> > > > > > + job->asid = asid;
> > > > > > + job->fence_armed = false;
> > > > > > + job->dep.ops = &dep_job_ops;
> > > > > > + kref_init(&job->refcount);
> > > > > > + xe_exec_queue_get(q);
> > > > > > +
> > > > > > + ifence = kmalloc(sizeof(*ifence), GFP_KERNEL);
> > > > > > + if (!ifence) {
> > > > > > + err = -ENOMEM;
> > > > > > + goto err_job;
> > > > > > + }
> > > > > > + job->fence = &ifence->base;
> > > > > > +
> > > > > > + err = drm_sched_job_init(&job->dep.drm, entity, 1,
> > > > > > NULL,
> > > > > > + q->xef ? q->xef->drm->client_id :
> > > > > > 0);
> > > > > > + if (err)
> > > > > > + goto err_fence;
> > > > > > +
> > > > > > + xe_pm_runtime_get_noresume(gt_to_xe(job->gt));
> > > > > > + return job;
> > > > > > +
> > > > > > +err_fence:
> > > > > > + kfree(ifence);
> > > > > > +err_job:
> > > > > > + xe_exec_queue_put(q);
> > > > > > + kfree(job);
> > > > > > +
> > > > > > + return ERR_PTR(err);
> > > > > > +}
> > > > > > +
> > > > > > +static void xe_gt_tlb_inval_job_destroy(struct kref *ref)
> > > > > > +{
> > > > > > + struct xe_gt_tlb_inval_job *job = container_of(ref,
> > > > > > typeof(*job),
> > > > > > +
> > > > > > refcount);
> > > > > > + struct xe_gt_tlb_invalidation_fence *ifence =
> > > > > > + container_of(job->fence, typeof(*ifence),
> > > > > > base);
> > > > > > + struct xe_device *xe = gt_to_xe(job->gt);
> > > > > > + struct xe_exec_queue *q = job->q;
> > > > > > +
> > > > > > + if (!job->fence_armed)
> > > > > > + kfree(ifence);
> > > > > > + else
> > > > > > + /* Ref from
> > > > > > xe_gt_tlb_invalidation_fence_init */
> > > > > > + dma_fence_put(job->fence);
> > > > > > +
> > > > > > + drm_sched_job_cleanup(&job->dep.drm);
> > > > > > + kfree(job);
> > > > > > + xe_exec_queue_put(q); /* Pairs with get from
> > > > > > xe_gt_tlb_inval_job_create */
> > > > > > + xe_pm_runtime_put(xe); /* Pairs with get from
> > > > > > xe_gt_tlb_inval_job_create */
> > > >
> > > > This patch also looks great to me. My only concern here is it
> > > > feels weird reordering these puts and kfrees from how we
> > > > allocated them in xe_gt_tlb_inval_job_create. It does look
> > > > functional though because the kfree of the job and exec
> > > > queue/runtime put are really independent, but if for some reason
> > > > that changed in the future (however unlikely) and we wanted to
> > > > link those, it seems safer to do the cleanup in the reverse of
> > > > the order you did the allocation. Let me know what you think
> > > > here.
> >
> > They are independent. The ordering is just mirroring what
> > xe_sched_job_destroy does but in both cases could be reordered if
> > desired. I'd rather leave it as is for now, so the functions that
> > create / destroy jobs look more or less the same in Xe.
Yeah makes sense to me. Like I said I don't think this is a functional
problem or we'd have a lot of other issues. But eventually it would be
nice to clean this up.
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
> >
> > Matt
> >
> > > >
> > > > Thanks,
> > > > Stuart
> > > >
> > > > > > +}
> > > > > > +
> > > > > > +/**
> > > > > > + * xe_gt_tlb_inval_job_alloc_dep() - GT TLB invalidation job
> > > > > > alloc
> > > > > > dependency
> > > > > > + * @job: GT TLB invalidation job to alloc dependency for
> > > > > > + *
> > > > > > + * Allocate storage for a dependency in the GT TLB
> > > > > > invalidation
> > > > > > fence. This
> > > > > > + * function should be called at most once per job and must
> > > > > > be paired with
> > > > > > + * xe_gt_tlb_inval_job_push being called with a real
> > > > > > (non-signaled) fence.
> > > > > > + *
> > > > > > + * Return: 0 on success, -errno on failure
> > > > > > + */
> > > > > > +int xe_gt_tlb_inval_job_alloc_dep(struct
> > > > > > xe_gt_tlb_inval_job *job)
> > > > > > +{
> > > > > > + xe_assert(gt_to_xe(job->gt),
> > > > > > !xa_load(&job->dep.drm.dependencies, 0));
> > > > > > +
> > > > > > + return drm_sched_job_add_dependency(&job->dep.drm,
> > > > > > dma_fence_get_stub());
> > > > > > +}
> > > > > > +
> > > > > > +/**
> > > > > > + * xe_gt_tlb_inval_job_push() - GT TLB invalidation job
> > > > > > push
> > > > > > + * @job: GT TLB invalidation job to push
> > > > > > + * @m: The migration object being used
> > > > > > + * @fence: Dependency for GT TLB invalidation job
> > > > > > + *
> > > > > > + * Pushes a GT TLB invalidation job for execution, using
> > > > > > @fence as a dependency.
> > > > > > + * Storage for @fence must be preallocated with
> > > > > > xe_gt_tlb_inval_job_alloc_dep
> > > > > > + * prior to this call if @fence is not signaled. Takes a
> > > > > > reference to the job's
> > > > > > + * finished fence, which the caller is responsible for
> > > > > > releasing, and returns it
> > > > > > + * to the caller. This function is safe to be called in
> > > > > > the path of reclaim.
> > > > > > + *
> > > > > > + * Return: Job's finished fence
> > > > > > + */
> > > > > > +struct dma_fence *xe_gt_tlb_inval_job_push(struct
> > > > > > xe_gt_tlb_inval_job *job,
> > > > > > + struct
> > > > > > xe_migrate *m,
> > > > > > + struct dma_fence
> > > > > > *fence)
> > > > > > +{
> > > > > > + struct xe_gt_tlb_invalidation_fence *ifence =
> > > > > > + container_of(job->fence, typeof(*ifence),
> > > > > > base);
> > > > > > +
> > > > > > + if (!dma_fence_is_signaled(fence)) {
> > > > > > + void *ptr;
> > > > > > +
> > > > > > + /*
> > > > > > + * Can be in path of reclaim, hence the
> > > > > > preallocation of fence
> > > > > > + * storage in
> > > > > > xe_gt_tlb_inval_job_alloc_dep. Verify
> > > > > > caller did
> > > > > > + * this correctly.
> > > > > > + */
> > > > > > + xe_assert(gt_to_xe(job->gt),
> > > > > > + xa_load(&job->dep.drm.dependencies, 0) ==
> > > > > > + dma_fence_get_stub());
> > > > > > +
> > > > > > + dma_fence_get(fence); /* ref released once
> > > > > > dependency processed by scheduler */
> > > > > > + ptr = xa_store(&job->dep.drm.dependencies, 0, fence,
> > > > > > + GFP_ATOMIC);
> > > > > > + xe_assert(gt_to_xe(job->gt), !xa_is_err(ptr));
> > > > > > + }
> > > > > > +
> > > > > > + xe_gt_tlb_inval_job_get(job); /* Pairs with put
> > > > > > in free_job */
> > > > > > + job->fence_armed = true;
> > > > > > +
> > > > > > + /*
> > > > > > + * We need the migration lock to protect the seqnos
> > > > > > (job and
> > > > > > + * invalidation fence) and the spsc queue, only
> > > > > > taken on migration
> > > > > > + * queue, user queues protected dma-resv VM lock.
> > > > > > + */
> > > > > > + xe_migrate_job_lock(m, job->q);
> > > > > > +
> > > > > > + /* Creation ref pairs with put in
> > > > > > xe_gt_tlb_inval_job_destroy */
> > > > > > + xe_gt_tlb_invalidation_fence_init(job->gt, ifence,
> > > > > > false);
> > > > > > + dma_fence_get(job->fence); /* Pairs with put
> > > > > > in DRM
> > > > > > scheduler */
> > > > > > +
> > > > > > + drm_sched_job_arm(&job->dep.drm);
> > > > > > + /*
> > > > > > + * caller ref, get must be done before job push as
> > > > > > it could immediately
> > > > > > + * signal and free.
> > > > > > + */
> > > > > > + dma_fence_get(&job->dep.drm.s_fence->finished);
> > > > > > + drm_sched_entity_push_job(&job->dep.drm);
> > > > > > +
> > > > > > + xe_migrate_job_unlock(m, job->q);
> > > > > > +
> > > > > > + /*
> > > > > > + * Not using job->fence, as it has its own dma-fence
> > > > > > context, which does
> > > > > > + * not allow GT TLB invalidation fences on the same
> > > > > > queue, GT tuple to
> > > > > > + * be squashed in dma-resv/DRM scheduler. Instead,
> > > > > > we use the DRM scheduler
> > > > > > + * context and job's finished fence, which enables
> > > > > > squashing.
> > > > > > + */
> > > > > > + return &job->dep.drm.s_fence->finished;
> > > > > > +}
> > > > > > +
> > > > > > +/**
> > > > > > + * xe_gt_tlb_inval_job_get() - Get a reference to GT TLB
> > > > > > invalidation job
> > > > > > + * @job: GT TLB invalidation job object
> > > > > > + *
> > > > > > + * Increment the GT TLB invalidation job's reference count
> > > > > > + */
> > > > > > +void xe_gt_tlb_inval_job_get(struct xe_gt_tlb_inval_job
> > > > > > *job)
> > > > > > +{
> > > > > > + kref_get(&job->refcount);
> > > > > > +}
> > > > > > +
> > > > > > +/**
> > > > > > + * xe_gt_tlb_inval_job_put() - Put a reference to GT TLB
> > > > > > invalidation job
> > > > > > + * @job: GT TLB invalidation job object
> > > > > > + *
> > > > > > + * Decrement the GT TLB invalidation job's reference
> > > > > > count, call
> > > > > > + * xe_gt_tlb_inval_job_destroy when reference count == 0.
> > > > > > Skips
> > > > > > decrement if
> > > > > > + * input @job is NULL or IS_ERR.
> > > > > > + */
> > > > > > +void xe_gt_tlb_inval_job_put(struct xe_gt_tlb_inval_job
> > > > > > *job)
> > > > > > +{
> > > > > > + if (job && !IS_ERR(job))
> > > > > > + kref_put(&job->refcount,
> > > > > > xe_gt_tlb_inval_job_destroy);
> > > > > > +}
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
> > > > > > b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
> > > > > > new file mode 100644
> > > > > > index 000000000000..883896194a34
> > > > > > --- /dev/null
> > > > > > +++ b/drivers/gpu/drm/xe/xe_gt_tlb_inval_job.h
> > > > > > @@ -0,0 +1,34 @@
> > > > > > +/* SPDX-License-Identifier: MIT */
> > > > > > +/*
> > > > > > + * Copyright © 2025 Intel Corporation
> > > > > > + */
> > > > > > +
> > > > > > +#ifndef _XE_GT_TLB_INVAL_JOB_H_
> > > > > > +#define _XE_GT_TLB_INVAL_JOB_H_
> > > > > > +
> > > > > > +#include <linux/types.h>
> > > > > > +
> > > > > > +struct dma_fence;
> > > > > > +struct drm_sched_job;
> > > > > > +struct kref;
> > > > > > +struct xe_exec_queue;
> > > > > > +struct xe_gt;
> > > > > > +struct xe_gt_tlb_inval_job;
> > > > > > +struct xe_migrate;
> > > > > > +
> > > > > > +struct xe_gt_tlb_inval_job *xe_gt_tlb_inval_job_create(struct
> > > > > > xe_exec_queue *q,
> > > > > > +                                                       struct xe_gt
> > > > > > *gt,
> > > > > > +                                                       u64 start, u64
> > > > > > end,
> > > > > > +                                                       u32 asid);
> > > > > > +
> > > > > > +int xe_gt_tlb_inval_job_alloc_dep(struct
> > > > > > xe_gt_tlb_inval_job *job);
> > > > > > +
> > > > > > +struct dma_fence *xe_gt_tlb_inval_job_push(struct
> > > > > > xe_gt_tlb_inval_job *job,
> > > > > > + struct
> > > > > > xe_migrate *m,
> > > > > > + struct dma_fence
> > > > > > *fence);
> > > > > > +
> > > > > > +void xe_gt_tlb_inval_job_get(struct xe_gt_tlb_inval_job
> > > > > > *job);
> > > > > > +
> > > > > > +void xe_gt_tlb_inval_job_put(struct xe_gt_tlb_inval_job
> > > > > > *job);
> > > > > > +
> > > > > > +#endif
> > > >
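
To spell out the preallocation pattern this patch relies on:
xe_gt_tlb_inval_job_alloc_dep() fills dependency slot 0 with the
always-signaled stub fence while GFP_KERNEL allocations are still
allowed; xe_gt_tlb_inval_job_push() later swaps the real dependency in
with xa_store(), which does not need to allocate when the index is
already occupied, keeping the push path safe in reclaim. A condensed
sketch using names from the patch:

	/* Outside reclaim: may allocate xarray storage for slot 0. */
	err = drm_sched_job_add_dependency(&job->dep.drm,
					   dma_fence_get_stub());

	/* In reclaim: slot 0 already exists, xa_store() just overwrites. */
	dma_fence_get(fence);
	ptr = xa_store(&job->dep.drm.dependencies, 0, fence, GFP_ATOMIC);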
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 4/9] drm/xe: Create ordered workqueue for GT TLB invalidation jobs
2025-07-02 23:42 ` [PATCH v2 4/9] drm/xe: Create ordered workqueue for GT TLB invalidation jobs Matthew Brost
@ 2025-07-17 19:55 ` Summers, Stuart
2025-07-17 19:59 ` Matthew Brost
0 siblings, 1 reply; 45+ messages in thread
From: Summers, Stuart @ 2025-07-17 19:55 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: maarten.lankhorst@linux.intel.com, Auld, Matthew
On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> No sense to schedule GT TLB invalidation jobs in parallel which
> target
> the same given these all contend on the same lock, create ordered
This was supposed to be "in parallel which target the same exec queue"?
> workqueue for GT TLB invalidation jobs.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
I still think we should be including these patches with patches that
actually use what is implemented here generally, but not a huge deal
here. This is pretty straightforward and already used in the
subsequent patches in the series.
Verified also the drmm_alloc_ordered_workqueue is auto destroyed on the
drm dev put.
Other than the minor commit message confusion above:
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
> ---
> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 8 ++++++++
> drivers/gpu/drm/xe/xe_gt_types.h | 2 ++
> 2 files changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> index 6088df8e159c..f6f32600e8a5 100644
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> @@ -3,6 +3,8 @@
> * Copyright © 2023 Intel Corporation
> */
>
> +#include <drm/drm_managed.h>
> +
> #include "xe_gt_tlb_invalidation.h"
>
> #include "abi/guc_actions_abi.h"
> @@ -123,6 +125,12 @@ int xe_gt_tlb_invalidation_init_early(struct
> xe_gt *gt)
> INIT_DELAYED_WORK(>->tlb_invalidation.fence_tdr,
> xe_gt_tlb_fence_timeout);
>
> + gt->tlb_invalidation.job_wq =
> + drmm_alloc_ordered_workqueue(>_to_xe(gt)->drm, "gt-
> tbl-inval-job-wq",
> + WQ_MEM_RECLAIM);
> + if (IS_ERR(gt->tlb_invalidation.job_wq))
> + return PTR_ERR(gt->tlb_invalidation.job_wq);
> +
> return 0;
> }
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_types.h
> b/drivers/gpu/drm/xe/xe_gt_types.h
> index 96344c604726..dfd4a16da5f0 100644
> --- a/drivers/gpu/drm/xe/xe_gt_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_types.h
> @@ -210,6 +210,8 @@ struct xe_gt {
> * xe_gt_tlb_fence_timeout after the timeut interval
> is over.
> */
> struct delayed_work fence_tdr;
> + /** @tlb_invalidation.job_wq: schedules GT TLB
> invalidation jobs */
> + struct workqueue_struct *job_wq;
> /** @tlb_invalidation.lock: protects TLB invalidation
> fences */
> spinlock_t lock;
> } tlb_invalidation;
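
For reference, an ordered workqueue is one with max_active == 1, so
invalidation jobs queued for a given GT run strictly one after another
instead of waking multiple workers that would only serialize on the
invalidation lock anyway. Roughly (hypothetical work items):

	wq = alloc_ordered_workqueue("gt-tlb-inval", WQ_MEM_RECLAIM);
	queue_work(wq, &inval_a); /* runs first */
	queue_work(wq, &inval_b); /* starts only after inval_a completes */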
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 4/9] drm/xe: Create ordered workqueue for GT TLB invalidation jobs
2025-07-17 19:55 ` Summers, Stuart
@ 2025-07-17 19:59 ` Matthew Brost
0 siblings, 0 replies; 45+ messages in thread
From: Matthew Brost @ 2025-07-17 19:59 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Thu, Jul 17, 2025 at 01:55:21PM -0600, Summers, Stuart wrote:
> On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > No sense to schedule GT TLB invalidation jobs in parallel which
> > target
> > the same given these all contend on the same lock, create ordered
>
> This was supposed to be "in parallel which target the same exec queue"?
>
'same GT'.
Will fix.
Matt
> > workqueue for GT TLB invalidation jobs.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>
> I still think we should be including these patches with patches that
> actually use what is implemented here generally, but not a huge deal
> here. This is pretty straight forward and already used in the
> subsequent patches in the series.
>
> Verified also the drmm_alloc_ordered_workqueue is auto destroyed on the
> drm dev put.
>
> Other than the minor commit message confusion above:
> Reviewed-by: Stuart Summers <stuart.summers@intel.com>
>
> > ---
> > drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 8 ++++++++
> > drivers/gpu/drm/xe/xe_gt_types.h | 2 ++
> > 2 files changed, 10 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > index 6088df8e159c..f6f32600e8a5 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > @@ -3,6 +3,8 @@
> > * Copyright © 2023 Intel Corporation
> > */
> >
> > +#include <drm/drm_managed.h>
> > +
> > #include "xe_gt_tlb_invalidation.h"
> >
> > #include "abi/guc_actions_abi.h"
> > @@ -123,6 +125,12 @@ int xe_gt_tlb_invalidation_init_early(struct
> > xe_gt *gt)
> > INIT_DELAYED_WORK(>->tlb_invalidation.fence_tdr,
> > xe_gt_tlb_fence_timeout);
> >
> > + gt->tlb_invalidation.job_wq =
> > + drmm_alloc_ordered_workqueue(>_to_xe(gt)->drm, "gt-
> > tbl-inval-job-wq",
> > + WQ_MEM_RECLAIM);
> > + if (IS_ERR(gt->tlb_invalidation.job_wq))
> > + return PTR_ERR(gt->tlb_invalidation.job_wq);
> > +
> > return 0;
> > }
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt_types.h
> > b/drivers/gpu/drm/xe/xe_gt_types.h
> > index 96344c604726..dfd4a16da5f0 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_types.h
> > +++ b/drivers/gpu/drm/xe/xe_gt_types.h
> > @@ -210,6 +210,8 @@ struct xe_gt {
> > * xe_gt_tlb_fence_timeout after the timeut interval
> > is over.
> > */
> > struct delayed_work fence_tdr;
> > + /** @tlb_invalidation.job_wq: schedules GT TLB
> > invalidation jobs */
> > + struct workqueue_struct *job_wq;
> > /** @tlb_invalidation.lock: protects TLB invalidation
> > fences */
> > spinlock_t lock;
> > } tlb_invalidation;
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 8/9] drm/xe: Use GT TLB invalidation jobs in PT layer
2025-07-02 23:42 ` [PATCH v2 8/9] drm/xe: Use GT TLB invalidation jobs in PT layer Matthew Brost
@ 2025-07-17 21:00 ` Summers, Stuart
2025-07-17 21:07 ` Matthew Brost
0 siblings, 1 reply; 45+ messages in thread
From: Summers, Stuart @ 2025-07-17 21:00 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Brost, Matthew
Cc: maarten.lankhorst@linux.intel.com, Auld, Matthew
On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> Rather than open-coding GT TLB invalidations in the PT layer, use GT
> TLB
> invalidation jobs. The real benefit is that GT TLB invalidation jobs
> use
> a single dma-fence context, allowing the generated fences to be
> squashed
> in dma-resv/DRM scheduler.
>
> v2:
> - s/;;/; (checkpatch)
> - Move ijob/mjob job push after range fence install
>
> Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_migrate.h | 9 ++
> drivers/gpu/drm/xe/xe_pt.c | 178 +++++++++++++-----------------
> --
> 2 files changed, 80 insertions(+), 107 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_migrate.h
> b/drivers/gpu/drm/xe/xe_migrate.h
> index e9d83d320f8c..605398ea773e 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.h
> +++ b/drivers/gpu/drm/xe/xe_migrate.h
> @@ -14,6 +14,7 @@ struct ttm_resource;
>
> struct xe_bo;
> struct xe_gt;
> +struct xe_gt_tlb_inval_job;
> struct xe_exec_queue;
> struct xe_migrate;
> struct xe_migrate_pt_update;
> @@ -89,6 +90,14 @@ struct xe_migrate_pt_update {
> struct xe_vma_ops *vops;
> /** @job: The job if a GPU page-table update. NULL otherwise
> */
> struct xe_sched_job *job;
> + /**
> + * @ijob: The GT TLB invalidation job for primary tile. NULL
> otherwise
> + */
> + struct xe_gt_tlb_inval_job *ijob;
> + /**
> + * @mjob: The GT TLB invalidation job for media tile. NULL
> otherwise
> + */
> + struct xe_gt_tlb_inval_job *mjob;
> /** @tile_id: Tile ID of the update */
> u8 tile_id;
> };
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index c8e63bd23300..67d02307779b 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -13,7 +13,7 @@
> #include "xe_drm_client.h"
> #include "xe_exec_queue.h"
> #include "xe_gt.h"
> -#include "xe_gt_tlb_invalidation.h"
> +#include "xe_gt_tlb_inval_job.h"
> #include "xe_migrate.h"
> #include "xe_pt_types.h"
> #include "xe_pt_walk.h"
> @@ -1261,6 +1261,8 @@ static int op_add_deps(struct xe_vm *vm, struct
> xe_vma_op *op,
> }
>
> static int xe_pt_vm_dependencies(struct xe_sched_job *job,
> + struct xe_gt_tlb_inval_job *ijob,
> + struct xe_gt_tlb_inval_job *mjob,
> struct xe_vm *vm,
> struct xe_vma_ops *vops,
> struct xe_vm_pgtable_update_ops
> *pt_update_ops,
> @@ -1328,6 +1330,20 @@ static int xe_pt_vm_dependencies(struct
> xe_sched_job *job,
> for (i = 0; job && !err && i < vops->num_syncs; i++)
> err = xe_sync_entry_add_deps(&vops->syncs[i], job);
>
> + if (job) {
> + if (ijob) {
> + err = xe_gt_tlb_inval_job_alloc_dep(ijob);
> + if (err)
> + return err;
> + }
> +
> + if (mjob) {
> + err = xe_gt_tlb_inval_job_alloc_dep(mjob);
> + if (err)
> + return err;
> + }
> + }
> +
> return err;
> }
>
> @@ -1339,7 +1355,8 @@ static int xe_pt_pre_commit(struct
> xe_migrate_pt_update *pt_update)
> struct xe_vm_pgtable_update_ops *pt_update_ops =
> &vops->pt_update_ops[pt_update->tile_id];
>
> - return xe_pt_vm_dependencies(pt_update->job, vm, pt_update-
> >vops,
> + return xe_pt_vm_dependencies(pt_update->job, pt_update->ijob,
> + pt_update->mjob, vm, pt_update-
> >vops,
> pt_update_ops, rftree);
> }
>
> @@ -1509,75 +1526,6 @@ static int xe_pt_svm_pre_commit(struct
> xe_migrate_pt_update *pt_update)
> }
> #endif
>
> -struct invalidation_fence {
> - struct xe_gt_tlb_invalidation_fence base;
> - struct xe_gt *gt;
> - struct dma_fence *fence;
> - struct dma_fence_cb cb;
> - struct work_struct work;
> - u64 start;
> - u64 end;
> - u32 asid;
> -};
> -
> -static void invalidation_fence_cb(struct dma_fence *fence,
> - struct dma_fence_cb *cb)
> -{
> - struct invalidation_fence *ifence =
> - container_of(cb, struct invalidation_fence, cb);
> - struct xe_device *xe = gt_to_xe(ifence->gt);
> -
> - trace_xe_gt_tlb_invalidation_fence_cb(xe, &ifence->base);
> - if (!ifence->fence->error) {
> - queue_work(system_wq, &ifence->work);
> - } else {
> - ifence->base.base.error = ifence->fence->error;
> - xe_gt_tlb_invalidation_fence_signal(&ifence->base);
> - }
> - dma_fence_put(ifence->fence);
> -}
> -
> -static void invalidation_fence_work_func(struct work_struct *w)
> -{
> - struct invalidation_fence *ifence =
> - container_of(w, struct invalidation_fence, work);
> - struct xe_device *xe = gt_to_xe(ifence->gt);
> -
> - trace_xe_gt_tlb_invalidation_fence_work_func(xe, &ifence-
> >base);
> - xe_gt_tlb_invalidation_range(ifence->gt, &ifence->base,
> ifence->start,
> - ifence->end, ifence->asid);
> -}
> -
> -static void invalidation_fence_init(struct xe_gt *gt,
> - struct invalidation_fence
> *ifence,
> - struct dma_fence *fence,
> - u64 start, u64 end, u32 asid)
> -{
> - int ret;
> -
> - trace_xe_gt_tlb_invalidation_fence_create(gt_to_xe(gt),
> &ifence->base);
> -
> - xe_gt_tlb_invalidation_fence_init(gt, &ifence->base, false);
> -
> - ifence->fence = fence;
> - ifence->gt = gt;
> - ifence->start = start;
> - ifence->end = end;
> - ifence->asid = asid;
> -
> - INIT_WORK(&ifence->work, invalidation_fence_work_func);
> - ret = dma_fence_add_callback(fence, &ifence->cb,
> invalidation_fence_cb);
> - if (ret == -ENOENT) {
> - dma_fence_put(ifence->fence); /* Usually dropped in
> CB */
> - invalidation_fence_work_func(&ifence->work);
> - } else if (ret) {
> - dma_fence_put(&ifence->base.base); /* Caller ref
> */
> - dma_fence_put(&ifence->base.base); /* Creation
> ref */
> - }
> -
> - xe_gt_assert(gt, !ret || ret == -ENOENT);
> -}
> -
> struct xe_pt_stage_unbind_walk {
> /** @base: The pagewalk base-class. */
> struct xe_pt_walk base;
> @@ -2407,8 +2355,8 @@ xe_pt_update_ops_run(struct xe_tile *tile,
> struct xe_vma_ops *vops)
> struct xe_vm *vm = vops->vm;
> struct xe_vm_pgtable_update_ops *pt_update_ops =
> &vops->pt_update_ops[tile->id];
> - struct dma_fence *fence;
> - struct invalidation_fence *ifence = NULL, *mfence = NULL;
> + struct dma_fence *fence, *ifence, *mfence;
> + struct xe_gt_tlb_inval_job *ijob = NULL, *mjob = NULL;
> struct dma_fence **fences = NULL;
> struct dma_fence_array *cf = NULL;
> struct xe_range_fence *rfence;
> @@ -2440,34 +2388,47 @@ xe_pt_update_ops_run(struct xe_tile *tile,
> struct xe_vma_ops *vops)
> #endif
>
> if (pt_update_ops->needs_invalidation) {
> - ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
> - if (!ifence) {
> - err = -ENOMEM;
> + ijob = xe_gt_tlb_inval_job_create(pt_update_ops->q,
> + tile->primary_gt,
> + pt_update_ops-
> >start,
> + pt_update_ops-
> >last,
> + vm->usm.asid);
> +
Remove extra line.
> + if (IS_ERR(ijob)) {
> + err = PTR_ERR(ijob);
> goto kill_vm_tile1;
> }
> +
> if (tile->media_gt) {
> - mfence = kzalloc(sizeof(*ifence),
I realize it's the same, but this should probably be sizeof(*mfence).
> GFP_KERNEL);
> - if (!mfence) {
> - err = -ENOMEM;
> - goto free_ifence;
> + mjob =
> xe_gt_tlb_inval_job_create(pt_update_ops->q,
> + tile-
> >media_gt,
> +
> pt_update_ops->start,
> +
> pt_update_ops->last,
> + vm-
> >usm.asid);
> + if (IS_ERR(mjob)) {
> + err = PTR_ERR(mjob);
> + goto free_ijob;
I think this needs a little more granularity. In free_ijob below, we're
also doing a kfree for fences and cf, both of which at this point
aren't yet allocated. I realize you aren't changing anything really
here, but that looks wrong to me. Maybe we just haven't hit an issue
here because we've never tested the ENOMEM case?
> }
> fences = kmalloc_array(2, sizeof(*fences),
> GFP_KERNEL);
> if (!fences) {
> err = -ENOMEM;
> - goto free_ifence;
> + goto free_ijob;
> }
> cf = dma_fence_array_alloc(2);
> if (!cf) {
> err = -ENOMEM;
> - goto free_ifence;
> + goto free_ijob;
> }
> }
> +
> + update.ijob = ijob;
> + update.mjob = mjob;
Is there a reason not to put these inline above where the ijob and mjob
are allocated? That way if we moved this to a loop eventually (not
here, in a future patch) we could more easily reduce the number of
indentations here by doing:
if (!media_gt)
continue;
> }
>
> rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
> if (!rfence) {
> err = -ENOMEM;
> - goto free_ifence;
> + goto free_ijob;
> }
>
> fence = xe_migrate_update_pgtables(tile->migrate, &update);
> @@ -2491,30 +2452,30 @@ xe_pt_update_ops_run(struct xe_tile *tile,
> struct xe_vma_ops *vops)
> pt_update_ops->last, fence))
> dma_fence_wait(fence, false);
>
> - /* tlb invalidation must be done before signaling rebind */
Why drop the comment?
Thanks,
Stuart
> - if (ifence) {
> - if (mfence)
> - dma_fence_get(fence);
> - invalidation_fence_init(tile->primary_gt, ifence,
> fence,
> - pt_update_ops->start,
> - pt_update_ops->last, vm-
> >usm.asid);
> - if (mfence) {
> - invalidation_fence_init(tile->media_gt,
> mfence, fence,
> - pt_update_ops->start,
> - pt_update_ops->last,
> vm->usm.asid);
> - fences[0] = &ifence->base.base;
> - fences[1] = &mfence->base.base;
> + if (ijob) {
> + struct dma_fence *__fence;
> +
> + ifence = xe_gt_tlb_inval_job_push(ijob, tile-
> >migrate, fence);
> + __fence = ifence;
> +
> + if (mjob) {
> + fences[0] = ifence;
> + mfence = xe_gt_tlb_inval_job_push(mjob, tile-
> >migrate,
> + fence);
> + fences[1] = mfence;
> +
> dma_fence_array_init(cf, 2, fences,
> vm->composite_fence_ctx,
> vm-
> >composite_fence_seqno++,
> false);
> - fence = &cf->base;
> - } else {
> - fence = &ifence->base.base;
> + __fence = &cf->base;
> }
> +
> + dma_fence_put(fence);
> + fence = __fence;
> }
>
> - if (!mfence) {
> + if (!mjob) {
> dma_resv_add_fence(xe_vm_resv(vm), fence,
> pt_update_ops->wait_vm_bookkeep ?
> DMA_RESV_USAGE_KERNEL :
> @@ -2523,19 +2484,19 @@ xe_pt_update_ops_run(struct xe_tile *tile,
> struct xe_vma_ops *vops)
> list_for_each_entry(op, &vops->list, link)
> op_commit(vops->vm, tile, pt_update_ops, op,
> fence, NULL);
> } else {
> - dma_resv_add_fence(xe_vm_resv(vm), &ifence-
> >base.base,
> + dma_resv_add_fence(xe_vm_resv(vm), ifence,
> pt_update_ops->wait_vm_bookkeep ?
> DMA_RESV_USAGE_KERNEL :
> DMA_RESV_USAGE_BOOKKEEP);
>
> - dma_resv_add_fence(xe_vm_resv(vm), &mfence-
> >base.base,
> + dma_resv_add_fence(xe_vm_resv(vm), mfence,
> pt_update_ops->wait_vm_bookkeep ?
> DMA_RESV_USAGE_KERNEL :
> DMA_RESV_USAGE_BOOKKEEP);
>
> list_for_each_entry(op, &vops->list, link)
> - op_commit(vops->vm, tile, pt_update_ops, op,
> - &ifence->base.base, &mfence-
> >base.base);
> + op_commit(vops->vm, tile, pt_update_ops, op,
> ifence,
> + mfence);
> }
>
> if (pt_update_ops->needs_svm_lock)
> @@ -2543,15 +2504,18 @@ xe_pt_update_ops_run(struct xe_tile *tile,
> struct xe_vma_ops *vops)
> if (pt_update_ops->needs_userptr_lock)
> up_read(&vm->userptr.notifier_lock);
>
> + xe_gt_tlb_inval_job_put(mjob);
> + xe_gt_tlb_inval_job_put(ijob);
> +
> return fence;
>
> free_rfence:
> kfree(rfence);
> -free_ifence:
> +free_ijob:
> kfree(cf);
> kfree(fences);
> - kfree(mfence);
> - kfree(ifence);
> + xe_gt_tlb_inval_job_put(mjob);
> + xe_gt_tlb_inval_job_put(ijob);
> kill_vm_tile1:
> if (err != -EAGAIN && err != -ENODATA && tile->id)
> xe_vm_kill(vops->vm, false);
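
To make the ijob/mjob flow in the hunk above concrete: when both GTs
need an invalidation, the two fences returned by
xe_gt_tlb_inval_job_push() are combined into a dma_fence_array, and that
composite is what the rest of the function installs and returns;
condensed from the diff:

	fences[0] = ifence; /* primary GT invalidation fence */
	fences[1] = mfence; /* media GT invalidation fence */
	dma_fence_array_init(cf, 2, fences, vm->composite_fence_ctx,
			     vm->composite_fence_seqno++, false);
	fence = &cf->base; /* signals once both members have signaled */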
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 8/9] drm/xe: Use GT TLB invalidation jobs in PT layer
2025-07-17 21:00 ` Summers, Stuart
@ 2025-07-17 21:07 ` Matthew Brost
2025-07-17 22:26 ` Summers, Stuart
0 siblings, 1 reply; 45+ messages in thread
From: Matthew Brost @ 2025-07-17 21:07 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Thu, Jul 17, 2025 at 03:00:14PM -0600, Summers, Stuart wrote:
> On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > Rather than open-coding GT TLB invalidations in the PT layer, use GT
> > TLB
> > invalidation jobs. The real benefit is that GT TLB invalidation jobs
> > use
> > a single dma-fence context, allowing the generated fences to be
> > squashed
> > in dma-resv/DRM scheduler.
> >
> > v2:
> > - s/;;/; (checkpatch)
> > - Move ijob/mjob job push after range fence install
> >
> > Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_migrate.h | 9 ++
> > drivers/gpu/drm/xe/xe_pt.c | 178 +++++++++++++-----------------
> > --
> > 2 files changed, 80 insertions(+), 107 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.h
> > b/drivers/gpu/drm/xe/xe_migrate.h
> > index e9d83d320f8c..605398ea773e 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.h
> > +++ b/drivers/gpu/drm/xe/xe_migrate.h
> > @@ -14,6 +14,7 @@ struct ttm_resource;
> >
> > struct xe_bo;
> > struct xe_gt;
> > +struct xe_gt_tlb_inval_job;
> > struct xe_exec_queue;
> > struct xe_migrate;
> > struct xe_migrate_pt_update;
> > @@ -89,6 +90,14 @@ struct xe_migrate_pt_update {
> > struct xe_vma_ops *vops;
> > /** @job: The job if a GPU page-table update. NULL otherwise
> > */
> > struct xe_sched_job *job;
> > + /**
> > + * @ijob: The GT TLB invalidation job for primary tile. NULL
> > otherwise
> > + */
> > + struct xe_gt_tlb_inval_job *ijob;
> > + /**
> > + * @mjob: The GT TLB invalidation job for media tile. NULL
> > otherwise
> > + */
> > + struct xe_gt_tlb_inval_job *mjob;
> > /** @tile_id: Tile ID of the update */
> > u8 tile_id;
> > };
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > index c8e63bd23300..67d02307779b 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -13,7 +13,7 @@
> > #include "xe_drm_client.h"
> > #include "xe_exec_queue.h"
> > #include "xe_gt.h"
> > -#include "xe_gt_tlb_invalidation.h"
> > +#include "xe_gt_tlb_inval_job.h"
> > #include "xe_migrate.h"
> > #include "xe_pt_types.h"
> > #include "xe_pt_walk.h"
> > @@ -1261,6 +1261,8 @@ static int op_add_deps(struct xe_vm *vm, struct
> > xe_vma_op *op,
> > }
> >
> > static int xe_pt_vm_dependencies(struct xe_sched_job *job,
> > + struct xe_gt_tlb_inval_job *ijob,
> > + struct xe_gt_tlb_inval_job *mjob,
> > struct xe_vm *vm,
> > struct xe_vma_ops *vops,
> > struct xe_vm_pgtable_update_ops
> > *pt_update_ops,
> > @@ -1328,6 +1330,20 @@ static int xe_pt_vm_dependencies(struct
> > xe_sched_job *job,
> > for (i = 0; job && !err && i < vops->num_syncs; i++)
> > err = xe_sync_entry_add_deps(&vops->syncs[i], job);
> >
> > + if (job) {
> > + if (ijob) {
> > + err = xe_gt_tlb_inval_job_alloc_dep(ijob);
> > + if (err)
> > + return err;
> > + }
> > +
> > + if (mjob) {
> > + err = xe_gt_tlb_inval_job_alloc_dep(mjob);
> > + if (err)
> > + return err;
> > + }
> > + }
> > +
> > return err;
> > }
> >
> > @@ -1339,7 +1355,8 @@ static int xe_pt_pre_commit(struct
> > xe_migrate_pt_update *pt_update)
> > struct xe_vm_pgtable_update_ops *pt_update_ops =
> > &vops->pt_update_ops[pt_update->tile_id];
> >
> > - return xe_pt_vm_dependencies(pt_update->job, vm, pt_update-
> > >vops,
> > + return xe_pt_vm_dependencies(pt_update->job, pt_update->ijob,
> > + pt_update->mjob, vm, pt_update-
> > >vops,
> > pt_update_ops, rftree);
> > }
> >
> > @@ -1509,75 +1526,6 @@ static int xe_pt_svm_pre_commit(struct
> > xe_migrate_pt_update *pt_update)
> > }
> > #endif
> >
> > -struct invalidation_fence {
> > - struct xe_gt_tlb_invalidation_fence base;
> > - struct xe_gt *gt;
> > - struct dma_fence *fence;
> > - struct dma_fence_cb cb;
> > - struct work_struct work;
> > - u64 start;
> > - u64 end;
> > - u32 asid;
> > -};
> > -
> > -static void invalidation_fence_cb(struct dma_fence *fence,
> > - struct dma_fence_cb *cb)
> > -{
> > - struct invalidation_fence *ifence =
> > - container_of(cb, struct invalidation_fence, cb);
> > - struct xe_device *xe = gt_to_xe(ifence->gt);
> > -
> > - trace_xe_gt_tlb_invalidation_fence_cb(xe, &ifence->base);
> > - if (!ifence->fence->error) {
> > - queue_work(system_wq, &ifence->work);
> > - } else {
> > - ifence->base.base.error = ifence->fence->error;
> > - xe_gt_tlb_invalidation_fence_signal(&ifence->base);
> > - }
> > - dma_fence_put(ifence->fence);
> > -}
> > -
> > -static void invalidation_fence_work_func(struct work_struct *w)
> > -{
> > - struct invalidation_fence *ifence =
> > - container_of(w, struct invalidation_fence, work);
> > - struct xe_device *xe = gt_to_xe(ifence->gt);
> > -
> > - trace_xe_gt_tlb_invalidation_fence_work_func(xe, &ifence-
> > >base);
> > - xe_gt_tlb_invalidation_range(ifence->gt, &ifence->base,
> > ifence->start,
> > - ifence->end, ifence->asid);
> > -}
> > -
> > -static void invalidation_fence_init(struct xe_gt *gt,
> > - struct invalidation_fence
> > *ifence,
> > - struct dma_fence *fence,
> > - u64 start, u64 end, u32 asid)
> > -{
> > - int ret;
> > -
> > - trace_xe_gt_tlb_invalidation_fence_create(gt_to_xe(gt),
> > &ifence->base);
> > -
> > - xe_gt_tlb_invalidation_fence_init(gt, &ifence->base, false);
> > -
> > - ifence->fence = fence;
> > - ifence->gt = gt;
> > - ifence->start = start;
> > - ifence->end = end;
> > - ifence->asid = asid;
> > -
> > - INIT_WORK(&ifence->work, invalidation_fence_work_func);
> > - ret = dma_fence_add_callback(fence, &ifence->cb,
> > invalidation_fence_cb);
> > - if (ret == -ENOENT) {
> > - dma_fence_put(ifence->fence); /* Usually dropped in
> > CB */
> > - invalidation_fence_work_func(&ifence->work);
> > - } else if (ret) {
> > - dma_fence_put(&ifence->base.base); /* Caller ref
> > */
> > - dma_fence_put(&ifence->base.base); /* Creation
> > ref */
> > - }
> > -
> > - xe_gt_assert(gt, !ret || ret == -ENOENT);
> > -}
> > -
> > struct xe_pt_stage_unbind_walk {
> > /** @base: The pagewalk base-class. */
> > struct xe_pt_walk base;
> > @@ -2407,8 +2355,8 @@ xe_pt_update_ops_run(struct xe_tile *tile,
> > struct xe_vma_ops *vops)
> > struct xe_vm *vm = vops->vm;
> > struct xe_vm_pgtable_update_ops *pt_update_ops =
> > &vops->pt_update_ops[tile->id];
> > - struct dma_fence *fence;
> > - struct invalidation_fence *ifence = NULL, *mfence = NULL;
> > + struct dma_fence *fence, *ifence, *mfence;
> > + struct xe_gt_tlb_inval_job *ijob = NULL, *mjob = NULL;
> > struct dma_fence **fences = NULL;
> > struct dma_fence_array *cf = NULL;
> > struct xe_range_fence *rfence;
> > @@ -2440,34 +2388,47 @@ xe_pt_update_ops_run(struct xe_tile *tile,
> > struct xe_vma_ops *vops)
> > #endif
> >
> > if (pt_update_ops->needs_invalidation) {
> > - ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
> > - if (!ifence) {
> > - err = -ENOMEM;
> > + ijob = xe_gt_tlb_inval_job_create(pt_update_ops->q,
> > + tile->primary_gt,
> > + pt_update_ops-
> > >start,
> > + pt_update_ops-
> > >last,
> > + vm->usm.asid);
> > +
>
> Remove extra line.
>
Sure.
> > + if (IS_ERR(ijob)) {
> > + err = PTR_ERR(ijob);
> > goto kill_vm_tile1;
> > }
> > +
> > if (tile->media_gt) {
> > - mfence = kzalloc(sizeof(*ifence),
>
> I realize it's the same, but this should probably be sizeof(*mfence).
>
Well it is getting removed.
> > GFP_KERNEL);
> > - if (!mfence) {
> > - err = -ENOMEM;
> > - goto free_ifence;
> > + mjob =
> > xe_gt_tlb_inval_job_create(pt_update_ops->q,
> > + tile-
> > >media_gt,
> > +
> > pt_update_ops->start,
> > +
> > pt_update_ops->last,
> > + vm-
> > >usm.asid);
> > + if (IS_ERR(mjob)) {
> > + err = PTR_ERR(mjob);
> > + goto free_ijob;
>
> I think this needs a little more granularity. In free_ijob below, we're
> also doing a kfree for fences and cf, both of which at this point
> aren't yet allocated. I realize you aren't changing anything really
> here, but that looks wrong to me. Maybe we just haven't hit an issue
> here because we've never tested the ENOMEM case?
>
fences & cf are initialized to NULL, kfree on a NULL is a nop.
Same with xe_gt_tlb_inval_job_put, skips on NULL or IS_ERR.
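In other words, the single unwind label relies on the usual
NULL-initialized idiom, e.g.:

	struct dma_fence **fences = NULL;
	struct dma_fence_array *cf = NULL;
	struct xe_gt_tlb_inval_job *ijob = NULL, *mjob = NULL;
	...
free_ijob:
	kfree(cf);			/* kfree(NULL) is a no-op */
	kfree(fences);			/* likewise */
	xe_gt_tlb_inval_job_put(mjob);	/* skips NULL and IS_ERR */
	xe_gt_tlb_inval_job_put(ijob);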
> > }
> > fences = kmalloc_array(2, sizeof(*fences),
> > GFP_KERNEL);
> > if (!fences) {
> > err = -ENOMEM;
> > - goto free_ifence;
> > + goto free_ijob;
> > }
> > cf = dma_fence_array_alloc(2);
> > if (!cf) {
> > err = -ENOMEM;
> > - goto free_ifence;
> > + goto free_ijob;
> > }
> > }
> > +
> > + update.ijob = ijob;
> > + update.mjob = mjob;
>
> Is there a reason not to put these inline above where the ijob and mjob
> are allocated? That way if we moved this to a loop eventually (not
> here, in a future patch) we could more easily reduce the number of
> indentations here by doing:
Sure, can move.
> if (!media_gt)
> continue;
>
> > }
> >
> > rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
> > if (!rfence) {
> > err = -ENOMEM;
> > - goto free_ifence;
> > + goto free_ijob;
> > }
> >
> > fence = xe_migrate_update_pgtables(tile->migrate, &update);
> > @@ -2491,30 +2452,30 @@ xe_pt_update_ops_run(struct xe_tile *tile,
> > struct xe_vma_ops *vops)
> > pt_update_ops->last, fence))
> > dma_fence_wait(fence, false);
> >
> > - /* tlb invalidation must be done before signaling rebind */
>
> Why drop the comment?
>
Let me pull that back in.
Matt
> Thanks,
> Stuart
>
> > - if (ifence) {
> > - if (mfence)
> > - dma_fence_get(fence);
> > - invalidation_fence_init(tile->primary_gt, ifence,
> > fence,
> > - pt_update_ops->start,
> > - pt_update_ops->last, vm-
> > >usm.asid);
> > - if (mfence) {
> > - invalidation_fence_init(tile->media_gt,
> > mfence, fence,
> > - pt_update_ops->start,
> > - pt_update_ops->last,
> > vm->usm.asid);
> > - fences[0] = &ifence->base.base;
> > - fences[1] = &mfence->base.base;
> > + if (ijob) {
> > + struct dma_fence *__fence;
> > +
> > + ifence = xe_gt_tlb_inval_job_push(ijob, tile-
> > >migrate, fence);
> > + __fence = ifence;
> > +
> > + if (mjob) {
> > + fences[0] = ifence;
> > + mfence = xe_gt_tlb_inval_job_push(mjob, tile-
> > >migrate,
> > + fence);
> > + fences[1] = mfence;
> > +
> > dma_fence_array_init(cf, 2, fences,
> > vm->composite_fence_ctx,
> > vm-
> > >composite_fence_seqno++,
> > false);
> > - fence = &cf->base;
> > - } else {
> > - fence = &ifence->base.base;
> > + __fence = &cf->base;
> > }
> > +
> > + dma_fence_put(fence);
> > + fence = __fence;
> > }
> >
> > - if (!mfence) {
> > + if (!mjob) {
> > dma_resv_add_fence(xe_vm_resv(vm), fence,
> > pt_update_ops->wait_vm_bookkeep ?
> > DMA_RESV_USAGE_KERNEL :
> > @@ -2523,19 +2484,19 @@ xe_pt_update_ops_run(struct xe_tile *tile,
> > struct xe_vma_ops *vops)
> > list_for_each_entry(op, &vops->list, link)
> > op_commit(vops->vm, tile, pt_update_ops, op,
> > fence, NULL);
> > } else {
> > - dma_resv_add_fence(xe_vm_resv(vm), &ifence-
> > >base.base,
> > + dma_resv_add_fence(xe_vm_resv(vm), ifence,
> > pt_update_ops->wait_vm_bookkeep ?
> > DMA_RESV_USAGE_KERNEL :
> > DMA_RESV_USAGE_BOOKKEEP);
> >
> > - dma_resv_add_fence(xe_vm_resv(vm), &mfence-
> > >base.base,
> > + dma_resv_add_fence(xe_vm_resv(vm), mfence,
> > pt_update_ops->wait_vm_bookkeep ?
> > DMA_RESV_USAGE_KERNEL :
> > DMA_RESV_USAGE_BOOKKEEP);
> >
> > list_for_each_entry(op, &vops->list, link)
> > - op_commit(vops->vm, tile, pt_update_ops, op,
> > - &ifence->base.base, &mfence-
> > >base.base);
> > + op_commit(vops->vm, tile, pt_update_ops, op,
> > ifence,
> > + mfence);
> > }
> >
> > if (pt_update_ops->needs_svm_lock)
> > @@ -2543,15 +2504,18 @@ xe_pt_update_ops_run(struct xe_tile *tile,
> > struct xe_vma_ops *vops)
> > if (pt_update_ops->needs_userptr_lock)
> > up_read(&vm->userptr.notifier_lock);
> >
> > + xe_gt_tlb_inval_job_put(mjob);
> > + xe_gt_tlb_inval_job_put(ijob);
> > +
> > return fence;
> >
> > free_rfence:
> > kfree(rfence);
> > -free_ifence:
> > +free_ijob:
> > kfree(cf);
> > kfree(fences);
> > - kfree(mfence);
> > - kfree(ifence);
> > + xe_gt_tlb_inval_job_put(mjob);
> > + xe_gt_tlb_inval_job_put(ijob);
> > kill_vm_tile1:
> > if (err != -EAGAIN && err != -ENODATA && tile->id)
> > xe_vm_kill(vops->vm, false);
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 8/9] drm/xe: Use GT TLB invalidation jobs in PT layer
2025-07-17 21:07 ` Matthew Brost
@ 2025-07-17 22:26 ` Summers, Stuart
2025-07-17 22:35 ` Matthew Brost
0 siblings, 1 reply; 45+ messages in thread
From: Summers, Stuart @ 2025-07-17 22:26 UTC (permalink / raw)
To: Brost, Matthew
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Thu, 2025-07-17 at 14:07 -0700, Matthew Brost wrote:
> On Thu, Jul 17, 2025 at 03:00:14PM -0600, Summers, Stuart wrote:
> > On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > > Rather than open-coding GT TLB invalidations in the PT layer, use
> > > GT
> > > TLB
> > > invalidation jobs. The real benefit is that GT TLB invalidation
> > > jobs
> > > use
> > > a single dma-fence context, allowing the generated fences to be
> > > squashed
> > > in dma-resv/DRM scheduler.
> > >
> > > v2:
> > > - s/;;/; (checkpatch)
> > > - Move ijob/mjob job push after range fence install
> > >
> > > Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_migrate.h | 9 ++
> > > drivers/gpu/drm/xe/xe_pt.c | 178 +++++++++++++-------------
> > > ----
> > > --
> > > 2 files changed, 80 insertions(+), 107 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_migrate.h
> > > b/drivers/gpu/drm/xe/xe_migrate.h
> > > index e9d83d320f8c..605398ea773e 100644
> > > --- a/drivers/gpu/drm/xe/xe_migrate.h
> > > +++ b/drivers/gpu/drm/xe/xe_migrate.h
> > > @@ -14,6 +14,7 @@ struct ttm_resource;
> > >
> > > struct xe_bo;
> > > struct xe_gt;
> > > +struct xe_gt_tlb_inval_job;
> > > struct xe_exec_queue;
> > > struct xe_migrate;
> > > struct xe_migrate_pt_update;
> > > @@ -89,6 +90,14 @@ struct xe_migrate_pt_update {
> > > struct xe_vma_ops *vops;
> > > /** @job: The job if a GPU page-table update. NULL
> > > otherwise
> > > */
> > > struct xe_sched_job *job;
> > > + /**
> > > + * @ijob: The GT TLB invalidation job for primary tile.
> > > NULL
> > > otherwise
> > > + */
> > > + struct xe_gt_tlb_inval_job *ijob;
> > > + /**
> > > + * @mjob: The GT TLB invalidation job for media tile.
> > > NULL
> > > otherwise
> > > + */
> > > + struct xe_gt_tlb_inval_job *mjob;
> > > /** @tile_id: Tile ID of the update */
> > > u8 tile_id;
> > > };
> > > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > > b/drivers/gpu/drm/xe/xe_pt.c
> > > index c8e63bd23300..67d02307779b 100644
> > > --- a/drivers/gpu/drm/xe/xe_pt.c
> > > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > > @@ -13,7 +13,7 @@
> > > #include "xe_drm_client.h"
> > > #include "xe_exec_queue.h"
> > > #include "xe_gt.h"
> > > -#include "xe_gt_tlb_invalidation.h"
> > > +#include "xe_gt_tlb_inval_job.h"
> > > #include "xe_migrate.h"
> > > #include "xe_pt_types.h"
> > > #include "xe_pt_walk.h"
> > > @@ -1261,6 +1261,8 @@ static int op_add_deps(struct xe_vm *vm, struct xe_vma_op *op,
> > >  }
> > >
> > > static int xe_pt_vm_dependencies(struct xe_sched_job *job,
> > > +				 struct xe_gt_tlb_inval_job *ijob,
> > > +				 struct xe_gt_tlb_inval_job *mjob,
> > >  				 struct xe_vm *vm,
> > >  				 struct xe_vma_ops *vops,
> > >  				 struct xe_vm_pgtable_update_ops *pt_update_ops,
> > > @@ -1328,6 +1330,20 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
> > >  	for (i = 0; job && !err && i < vops->num_syncs; i++)
> > >  		err = xe_sync_entry_add_deps(&vops->syncs[i], job);
> > >
> > > +	if (job) {
> > > +		if (ijob) {
> > > +			err = xe_gt_tlb_inval_job_alloc_dep(ijob);
> > > +			if (err)
> > > +				return err;
> > > +		}
> > > +
> > > +		if (mjob) {
> > > +			err = xe_gt_tlb_inval_job_alloc_dep(mjob);
> > > +			if (err)
> > > +				return err;
> > > +		}
> > > +	}
> > > +
> > > return err;
> > > }
> > >
> > > @@ -1339,7 +1355,8 @@ static int xe_pt_pre_commit(struct xe_migrate_pt_update *pt_update)
> > >  	struct xe_vm_pgtable_update_ops *pt_update_ops =
> > >  		&vops->pt_update_ops[pt_update->tile_id];
> > >
> > > -	return xe_pt_vm_dependencies(pt_update->job, vm, pt_update->vops,
> > > +	return xe_pt_vm_dependencies(pt_update->job, pt_update->ijob,
> > > +				     pt_update->mjob, vm, pt_update->vops,
> > >  				     pt_update_ops, rftree);
> > > }
> > >
> > > @@ -1509,75 +1526,6 @@ static int xe_pt_svm_pre_commit(struct xe_migrate_pt_update *pt_update)
> > >  }
> > >  #endif
> > >
> > > -struct invalidation_fence {
> > > -	struct xe_gt_tlb_invalidation_fence base;
> > > -	struct xe_gt *gt;
> > > -	struct dma_fence *fence;
> > > -	struct dma_fence_cb cb;
> > > -	struct work_struct work;
> > > -	u64 start;
> > > -	u64 end;
> > > -	u32 asid;
> > > -};
> > > -
> > > -static void invalidation_fence_cb(struct dma_fence *fence,
> > > -				  struct dma_fence_cb *cb)
> > > -{
> > > -	struct invalidation_fence *ifence =
> > > -		container_of(cb, struct invalidation_fence, cb);
> > > -	struct xe_device *xe = gt_to_xe(ifence->gt);
> > > -
> > > -	trace_xe_gt_tlb_invalidation_fence_cb(xe, &ifence->base);
> > > -	if (!ifence->fence->error) {
> > > -		queue_work(system_wq, &ifence->work);
> > > -	} else {
> > > -		ifence->base.base.error = ifence->fence->error;
> > > -		xe_gt_tlb_invalidation_fence_signal(&ifence->base);
> > > -	}
> > > -	dma_fence_put(ifence->fence);
> > > -}
> > > -
> > > -static void invalidation_fence_work_func(struct work_struct *w)
> > > -{
> > > -	struct invalidation_fence *ifence =
> > > -		container_of(w, struct invalidation_fence, work);
> > > -	struct xe_device *xe = gt_to_xe(ifence->gt);
> > > -
> > > -	trace_xe_gt_tlb_invalidation_fence_work_func(xe, &ifence->base);
> > > -	xe_gt_tlb_invalidation_range(ifence->gt, &ifence->base, ifence->start,
> > > -				     ifence->end, ifence->asid);
> > > -}
> > > -
> > > -static void invalidation_fence_init(struct xe_gt *gt,
> > > -				    struct invalidation_fence *ifence,
> > > -				    struct dma_fence *fence,
> > > -				    u64 start, u64 end, u32 asid)
> > > -{
> > > -	int ret;
> > > -
> > > -	trace_xe_gt_tlb_invalidation_fence_create(gt_to_xe(gt), &ifence->base);
> > > -
> > > -	xe_gt_tlb_invalidation_fence_init(gt, &ifence->base, false);
> > > -
> > > -	ifence->fence = fence;
> > > -	ifence->gt = gt;
> > > -	ifence->start = start;
> > > -	ifence->end = end;
> > > -	ifence->asid = asid;
> > > -
> > > -	INIT_WORK(&ifence->work, invalidation_fence_work_func);
> > > -	ret = dma_fence_add_callback(fence, &ifence->cb, invalidation_fence_cb);
> > > -	if (ret == -ENOENT) {
> > > -		dma_fence_put(ifence->fence);	/* Usually dropped in CB */
> > > -		invalidation_fence_work_func(&ifence->work);
> > > -	} else if (ret) {
> > > -		dma_fence_put(&ifence->base.base);	/* Caller ref */
> > > -		dma_fence_put(&ifence->base.base);	/* Creation ref */
> > > -	}
> > > -
> > > -	xe_gt_assert(gt, !ret || ret == -ENOENT);
> > > -}
> > > -
> > > struct xe_pt_stage_unbind_walk {
> > > /** @base: The pagewalk base-class. */
> > > struct xe_pt_walk base;
> > > @@ -2407,8 +2355,8 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
> > > struct xe_vm *vm = vops->vm;
> > > struct xe_vm_pgtable_update_ops *pt_update_ops =
> > > &vops->pt_update_ops[tile->id];
> > > - struct dma_fence *fence;
> > > - struct invalidation_fence *ifence = NULL, *mfence = NULL;
> > > + struct dma_fence *fence, *ifence, *mfence;
> > > + struct xe_gt_tlb_inval_job *ijob = NULL, *mjob = NULL;
> > > struct dma_fence **fences = NULL;
> > > struct dma_fence_array *cf = NULL;
> > > struct xe_range_fence *rfence;
> > > @@ -2440,34 +2388,47 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
> > >  #endif
> > >
> > > if (pt_update_ops->needs_invalidation) {
> > > - ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
> > > - if (!ifence) {
> > > - err = -ENOMEM;
> > > +		ijob = xe_gt_tlb_inval_job_create(pt_update_ops->q,
> > > +						  tile->primary_gt,
> > > +						  pt_update_ops->start,
> > > +						  pt_update_ops->last,
> > > +						  vm->usm.asid);
> > > +
> >
> > Remove extra line.
> >
>
> Sure.
>
> > > + if (IS_ERR(ijob)) {
> > > + err = PTR_ERR(ijob);
> > > goto kill_vm_tile1;
> > > }
> > > +
> > > if (tile->media_gt) {
> > > - mfence = kzalloc(sizeof(*ifence),
> >
> > I realize it's the same, but this should probably be
> > sizeof(*mfence).
> >
>
> Well it is getting removed.
Heh, somehow I misread that...
>
>
> > > 						GFP_KERNEL);
> > > -			if (!mfence) {
> > > -				err = -ENOMEM;
> > > -				goto free_ifence;
> > > +			mjob = xe_gt_tlb_inval_job_create(pt_update_ops->q,
> > > +							  tile->media_gt,
> > > +							  pt_update_ops->start,
> > > +							  pt_update_ops->last,
> > > +							  vm->usm.asid);
> > > +			if (IS_ERR(mjob)) {
> > > +				err = PTR_ERR(mjob);
> > > +				goto free_ijob;
> >
> > I think this needs a little more granularity. In free_ijob below,
> > we're
> > also doing a kfree for fences and cf, both of which at this point
> > aren't yet allocated. I realize you aren't changing anything really
> > here, but that looks wrong to me. Maybe we just haven't hit an
> > issue
> > here because we've never tested the ENOMEM case?
> >
>
> fences & cf are initialized to NULL, kfree on a NULL is a nop.
>
> Same with xe_gt_tlb_inval_job_put, skips on NULL or IS_ERR.
Yeah, I saw the job_put... and true on the kfree... I guess we're
saving a couple of labels at the bottom of the function, but it still
feels wrong since we're depending on the underlying implementation,
which would break if we changed that or wrapped it somehow.
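To spell it out, the more granular shape I have in mind is one unwind
label per allocation, so no cleanup call ever runs on a pointer that
was never set. Roughly (sketch only, untested, reusing the calls from
this patch):

	cf = dma_fence_array_alloc(2);
	if (!cf) {
		err = -ENOMEM;
		goto free_fences;	/* fences is known-valid here */
	}
	...
	/* unwind ladder, deepest allocation first */
free_fences:
	kfree(fences);
put_mjob:
	xe_gt_tlb_inval_job_put(mjob);
put_ijob:
	xe_gt_tlb_inval_job_put(ijob);
kill_vm_tile1:
	...

with each failure point jumping to the label matching what has actually
been allocated so far.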
Anyway not a deal breaker, it's the same as it was before anyway.
>
> > > }
> > >  			fences = kmalloc_array(2, sizeof(*fences),
> > >  					       GFP_KERNEL);
> > >  			if (!fences) {
> > >  				err = -ENOMEM;
> > > -				goto free_ifence;
> > > +				goto free_ijob;
> > >  			}
> > >  			cf = dma_fence_array_alloc(2);
> > >  			if (!cf) {
> > >  				err = -ENOMEM;
> > > -				goto free_ifence;
> > > +				goto free_ijob;
> > >  			}
> > >  		}
> > > +
> > > + update.ijob = ijob;
> > > + update.mjob = mjob;
> >
> > Is there a reason not to put these inline above where the ijob and
> > mjob
> > are allocated? That way if we moved this to a loop eventually (not
> > here, in a future patch) we could more easily reduce the number of
> > indentations here by doing:
>
> Sure, can move.
>
> > if (!media_gt)
> > continue;
> >
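i.e., roughly this shape (sketch only, reusing the create call from the
patch), with the assignment sitting right next to each create:

	ijob = xe_gt_tlb_inval_job_create(pt_update_ops->q,
					  tile->primary_gt,
					  pt_update_ops->start,
					  pt_update_ops->last,
					  vm->usm.asid);
	if (IS_ERR(ijob)) {
		err = PTR_ERR(ijob);
		goto kill_vm_tile1;
	}
	update.ijob = ijob;

	if (tile->media_gt) {
		mjob = xe_gt_tlb_inval_job_create(pt_update_ops->q,
						  tile->media_gt,
						  pt_update_ops->start,
						  pt_update_ops->last,
						  vm->usm.asid);
		if (IS_ERR(mjob)) {
			err = PTR_ERR(mjob);
			goto free_ijob;
		}
		update.mjob = mjob;
	}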
> > > }
> > >
> > > rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
> > > if (!rfence) {
> > > err = -ENOMEM;
> > > - goto free_ifence;
> > > + goto free_ijob;
> > > }
> > >
> > >  	fence = xe_migrate_update_pgtables(tile->migrate, &update);
> > > @@ -2491,30 +2452,30 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
> > >  					   pt_update_ops->last, fence))
> > >  		dma_fence_wait(fence, false);
> > >
> > > -	/* tlb invalidation must be done before signaling rebind */
> >
> > Why drop the comment?
> >
>
> Let me pull that back in.
Yeah, so other than those two (the comment here and the line break
above), the rest looks good to me. I like the job-based approach rather
than the embedding we used to have; thanks for the change, Matt!
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 8/9] drm/xe: Use GT TLB invalidation jobs in PT layer
2025-07-17 22:26 ` Summers, Stuart
@ 2025-07-17 22:35 ` Matthew Brost
2025-07-17 22:36 ` Summers, Stuart
0 siblings, 1 reply; 45+ messages in thread
From: Matthew Brost @ 2025-07-17 22:35 UTC (permalink / raw)
To: Summers, Stuart
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Thu, Jul 17, 2025 at 04:26:13PM -0600, Summers, Stuart wrote:
> On Thu, 2025-07-17 at 14:07 -0700, Matthew Brost wrote:
> > On Thu, Jul 17, 2025 at 03:00:14PM -0600, Summers, Stuart wrote:
> > > On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > > > Rather than open-coding GT TLB invalidations in the PT layer, use
> > > > GT
> > > > TLB
> > > > invalidation jobs. The real benefit is that GT TLB invalidation
> > > > jobs
> > > > use
> > > > a single dma-fence context, allowing the generated fences to be
> > > > squashed
> > > > in dma-resv/DRM scheduler.
> > > >
> > > > v2:
> > > > - s/;;/; (checkpatch)
> > > > - Move ijob/mjob job push after range fence install
> > > >
> > > > Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_migrate.h | 9 ++
> > > >  drivers/gpu/drm/xe/xe_pt.c      | 178 +++++++++++++-------------------
> > > > 2 files changed, 80 insertions(+), 107 deletions(-)
> > > >
> > > > [snip]
> > > > @@ -2440,34 +2388,47 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
> > > >  #endif
> > > >
> > > > if (pt_update_ops->needs_invalidation) {
> > > > - ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
> > > > - if (!ifence) {
> > > > - err = -ENOMEM;
> > > > +		ijob = xe_gt_tlb_inval_job_create(pt_update_ops->q,
> > > > +						  tile->primary_gt,
> > > > +						  pt_update_ops->start,
> > > > +						  pt_update_ops->last,
> > > > +						  vm->usm.asid);
> > > > +
> > >
> > > Remove extra line.
> > >
> >
> > Sure.
> >
> > > > + if (IS_ERR(ijob)) {
> > > > + err = PTR_ERR(ijob);
> > > > goto kill_vm_tile1;
> > > > }
> > > > +
> > > > if (tile->media_gt) {
> > > > - mfence = kzalloc(sizeof(*ifence),
> > >
> > > I realize it's the same, but this should probably be
> > > sizeof(*mfence).
> > >
> >
> > Well it is getting removed.
>
> Heh, somehow I misread that...
>
> >
> >
> > > > 						GFP_KERNEL);
> > > > -			if (!mfence) {
> > > > -				err = -ENOMEM;
> > > > -				goto free_ifence;
> > > > +			mjob = xe_gt_tlb_inval_job_create(pt_update_ops->q,
> > > > +							  tile->media_gt,
> > > > +							  pt_update_ops->start,
> > > > +							  pt_update_ops->last,
> > > > +							  vm->usm.asid);
> > > > +			if (IS_ERR(mjob)) {
> > > > +				err = PTR_ERR(mjob);
> > > > +				goto free_ijob;
> > >
> > > I think this needs a little more granularity. In free_ijob below,
> > > we're
> > > also doing a kfree for fences and cf, both of which at this point
> > > aren't yet allocated. I realize you aren't changing anything really
> > > here, but that looks wrong to me. Maybe we just haven't hit an
> > > issue
> > > here because we've never tested the ENOMEM case?
> > >
> >
> > fences & cf are initialized to NULL, kfree on a NULL is a nop.
> >
> > Same with xe_gt_tlb_inval_job_put, skips on NULL or IS_ERR.
>
> Yeah I saw the job_put.. and true on the kfree... I guess we're saving
> a couple of labels at the bottom of the function, but it still feels
> wrong as we're depending on the underlying implementation, which would
> break if we changed that or wrapped it somehow.
>
> Anyway not a deal breaker, it's the same as it was before anyway.
>
I get what you are saying, but we do things like this all over the
driver, in particular with kfree and dma_fence_get/put, which rely on
NULL being an acceptable argument. I'm not saying this is the best
practice, just that it is done everywhere in Xe / Linux. If either of
these, kfree or dma_fence_get/put, were changed to not accept NULL, the
entire Linux kernel would explode, so it is pretty safe to assume this
will not be changing.
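FWIW the put side is trivially tolerant of both; it boils down to
something like this sketch (the refcount field and destroy callback
names are made up here, not the actual implementation):

	void xe_gt_tlb_inval_job_put(struct xe_gt_tlb_inval_job *job)
	{
		/* Accept NULL and error pointers so shared error labels
		 * can call this unconditionally.
		 */
		if (!IS_ERR_OR_NULL(job))
			kref_put(&job->refcount,
				 xe_gt_tlb_inval_job_destroy);
	}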
Matt
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 8/9] drm/xe: Use GT TLB invalidation jobs in PT layer
2025-07-17 22:35 ` Matthew Brost
@ 2025-07-17 22:36 ` Summers, Stuart
0 siblings, 0 replies; 45+ messages in thread
From: Summers, Stuart @ 2025-07-17 22:36 UTC (permalink / raw)
To: Brost, Matthew
Cc: intel-xe@lists.freedesktop.org, maarten.lankhorst@linux.intel.com,
Auld, Matthew
On Thu, 2025-07-17 at 15:35 -0700, Matthew Brost wrote:
> On Thu, Jul 17, 2025 at 04:26:13PM -0600, Summers, Stuart wrote:
> > On Thu, 2025-07-17 at 14:07 -0700, Matthew Brost wrote:
> > > On Thu, Jul 17, 2025 at 03:00:14PM -0600, Summers, Stuart wrote:
> > > > On Wed, 2025-07-02 at 16:42 -0700, Matthew Brost wrote:
> > > > > Rather than open-coding GT TLB invalidations in the PT layer,
> > > > > use
> > > > > GT
> > > > > TLB
> > > > > invalidation jobs. The real benefit is that GT TLB
> > > > > invalidation
> > > > > jobs
> > > > > use
> > > > > a single dma-fence context, allowing the generated fences to
> > > > > be
> > > > > squashed
> > > > > in dma-resv/DRM scheduler.
> > > > >
> > > > > v2:
> > > > > - s/;;/; (checkpatch)
> > > > > - Move ijob/mjob job push after range fence install
> > > > >
> > > > > Suggested-by: Thomas Hellström
> > > > > <thomas.hellstrom@linux.intel.com>
> > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/xe/xe_migrate.h | 9 ++
> > > > >  drivers/gpu/drm/xe/xe_pt.c      | 178 +++++++++++++-------------------
> > > > > 2 files changed, 80 insertions(+), 107 deletions(-)
> > > > >
> > > > > [snip]
> > > > > @@ -2440,34 +2388,47 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
> > > > >  #endif
> > > > >
> > > > > if (pt_update_ops->needs_invalidation) {
> > > > > - ifence = kzalloc(sizeof(*ifence),
> > > > > GFP_KERNEL);
> > > > > - if (!ifence) {
> > > > > - err = -ENOMEM;
> > > > > +		ijob = xe_gt_tlb_inval_job_create(pt_update_ops->q,
> > > > > +						  tile->primary_gt,
> > > > > +						  pt_update_ops->start,
> > > > > +						  pt_update_ops->last,
> > > > > +						  vm->usm.asid);
> > > > > +
> > > >
> > > > Remove extra line.
> > > >
> > >
> > > Sure.
> > >
> > > > > + if (IS_ERR(ijob)) {
> > > > > + err = PTR_ERR(ijob);
> > > > > goto kill_vm_tile1;
> > > > > }
> > > > > +
> > > > > if (tile->media_gt) {
> > > > > - mfence = kzalloc(sizeof(*ifence),
> > > >
> > > > I realize it's the same, but this should probably be
> > > > sizeof(*mfence).
> > > >
> > >
> > > Well it is getting removed.
> >
> > Heh, somehow I misread that...
> >
> > >
> > >
> > > > > 						GFP_KERNEL);
> > > > > -			if (!mfence) {
> > > > > -				err = -ENOMEM;
> > > > > -				goto free_ifence;
> > > > > +			mjob = xe_gt_tlb_inval_job_create(pt_update_ops->q,
> > > > > +							  tile->media_gt,
> > > > > +							  pt_update_ops->start,
> > > > > +							  pt_update_ops->last,
> > > > > +							  vm->usm.asid);
> > > > > +			if (IS_ERR(mjob)) {
> > > > > +				err = PTR_ERR(mjob);
> > > > > +				goto free_ijob;
> > > >
> > > > I think this needs a little more granularity. In free_ijob
> > > > below,
> > > > we're
> > > > also doing a kfree for fences and cf, both of which at this
> > > > point
> > > > aren't yet allocated. I realize you aren't changing anything
> > > > really
> > > > here, but that looks wrong to me. Maybe we just haven't hit an
> > > > issue
> > > > here because we've never tested the ENOMEM case?
> > > >
> > >
> > > fences & cf are initialized to NULL, kfree on a NULL is a nop.
> > >
> > > Same with xe_gt_tlb_inval_job_put, skips on NULL or IS_ERR.
> >
> > Yeah I saw the job_put.. and true on the kfree... I guess we're
> > saving
> > a couple of labels at the bottom of the function, but it still
> > feels
> > wrong as we're depending on the underlying implementation, which
> > would
> > break if we changed that or wrapped it somehow.
> >
> > Anyway not a deal breaker, it's the same as it was before anyway.
> >
>
> I get what you are saying, but we do things like this all over the
> driver, in particular with kfree and dma_fence_get/put, which rely on
> NULL being an acceptable argument. I'm not saying this is the best
> practice, just that it is done everywhere in Xe / Linux. If either of
> these, kfree or dma_fence_get/put, were changed to not accept NULL,
> the entire Linux kernel would explode, so it is pretty safe to assume
> this will not be changing.
Yeah, I get that and it makes sense. I do think that, in general, we
should aim to do things right regardless of how the rest of the kernel
is implemented (within reason, of course). But as I said, this was
already implemented this way; you're just changing the pointers around
a bit. No worries here for now. Maybe we can look at refactoring some
of this down the road when there's time.
Thanks,
Stuart
^ permalink raw reply [flat|nested] 45+ messages in thread
end of thread, other threads: [~2025-07-17 22:36 UTC | newest]

Thread overview: 45+ messages; links below jump to the message on this page.
2025-07-02 23:42 [PATCH v2 0/9] Use DRM scheduler for delayed GT TLB invalidations Matthew Brost
2025-07-02 23:42 ` [PATCH v2 1/9] drm/xe: Explicitly mark migration queues with flag Matthew Brost
2025-07-10 8:43 ` Francois Dugast
2025-07-11 21:20 ` Summers, Stuart
2025-07-02 23:42 ` [PATCH v2 2/9] drm/xe: Add generic dependecy jobs / scheduler Matthew Brost
2025-07-10 11:51 ` Francois Dugast
2025-07-10 17:38 ` Matthew Brost
2025-07-15 21:04 ` Summers, Stuart
2025-07-15 21:14 ` Matthew Brost
2025-07-15 21:13 ` Summers, Stuart
2025-07-15 22:43 ` Summers, Stuart
2025-07-15 22:48 ` Matthew Brost
2025-07-02 23:42 ` [PATCH v2 3/9] drm: Simplify drmm_alloc_ordered_workqueue return Matthew Brost
2025-07-16 1:10 ` Matthew Brost
2025-07-02 23:42 ` [PATCH v2 4/9] drm/xe: Create ordered workqueue for GT TLB invalidation jobs Matthew Brost
2025-07-17 19:55 ` Summers, Stuart
2025-07-17 19:59 ` Matthew Brost
2025-07-02 23:42 ` [PATCH v2 5/9] drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues Matthew Brost
2025-07-15 21:34 ` Summers, Stuart
2025-07-15 21:44 ` Matthew Brost
2025-07-15 21:45 ` Summers, Stuart
2025-07-15 21:52 ` Matthew Brost
2025-07-15 21:53 ` Summers, Stuart
2025-07-15 22:01 ` Matthew Brost
2025-07-15 22:49 ` Summers, Stuart
2025-07-02 23:42 ` [PATCH v2 6/9] drm/xe: Add xe_migrate_job_lock/unlock helpers Matthew Brost
2025-07-15 22:48 ` Summers, Stuart
2025-07-16 1:11 ` Matthew Brost
2025-07-02 23:42 ` [PATCH v2 7/9] drm/xe: Add GT TLB invalidation jobs Matthew Brost
2025-07-15 23:09 ` Summers, Stuart
2025-07-16 1:08 ` Matthew Brost
2025-07-17 15:58 ` Summers, Stuart
2025-07-02 23:42 ` [PATCH v2 8/9] drm/xe: Use GT TLB invalidation jobs in PT layer Matthew Brost
2025-07-17 21:00 ` Summers, Stuart
2025-07-17 21:07 ` Matthew Brost
2025-07-17 22:26 ` Summers, Stuart
2025-07-17 22:35 ` Matthew Brost
2025-07-17 22:36 ` Summers, Stuart
2025-07-02 23:42 ` [PATCH v2 9/9] drm/xe: Remove unused GT TLB invalidation trace points Matthew Brost
2025-07-11 21:13 ` Summers, Stuart
2025-07-03 0:45 ` ✗ CI.checkpatch: warning for Use DRM scheduler for delayed GT TLB invalidations (rev2) Patchwork
2025-07-03 0:46 ` ✓ CI.KUnit: success " Patchwork
2025-07-03 1:00 ` ✗ CI.checksparse: warning " Patchwork
2025-07-03 1:32 ` ✓ Xe.CI.BAT: success " Patchwork
2025-07-04 18:11 ` ✗ Xe.CI.Full: failure " Patchwork