* [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe
@ 2023-09-12 2:16 Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 01/13] drm/sched: Add drm_sched_submit_* helpers Matthew Brost
` (14 more replies)
0 siblings, 15 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 2:16 UTC (permalink / raw)
To: dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, christian.koenig, faith.ekstrand
As a prerequisite to merging the new Intel Xe DRM driver [1] [2], we
have been asked to merge our common DRM scheduler patches first.
This is a continuation of an RFC [3] with all comments addressed, ready for
a full review, and hopefully in a state that can be merged in the near
future. More details on this series can be found in the cover letter of the
RFC [3].
These changes have been tested with the Xe driver.
v2:
- Break run job, free job, and process message into their own work items
- This might break other drivers, as run job and free job can now run in
parallel; can fix up if needed
v3:
- Include missing patch 'drm/sched: Add drm_sched_submit_* helpers'
- Fix issue with setting timestamp too early
- Don't dequeue jobs for single entity after calling entity fini
- Flush pending jobs on entity fini
- Add documentation for entity teardown
- Add Matthew Brost to maintainers of DRM scheduler
Matt
[1] https://gitlab.freedesktop.org/drm/xe/kernel
[2] https://patchwork.freedesktop.org/series/112188/
[3] https://patchwork.freedesktop.org/series/116055/
Matthew Brost (13):
drm/sched: Add drm_sched_submit_* helpers
drm/sched: Convert drm scheduler to use a work queue rather than
kthread
drm/sched: Move schedule policy to scheduler / entity
drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
drm/sched: Split free_job into own work item
drm/sched: Add generic scheduler message interface
drm/sched: Add drm_sched_start_timeout_unlocked helper
drm/sched: Start run wq before TDR in drm_sched_start
drm/sched: Submit job before starting TDR
drm/sched: Add helper to set TDR timeout
drm/sched: Waiting for pending jobs to complete in scheduler kill
drm/sched/doc: Add Entity teardown documentation
drm/sched: Update maintainers of GPU scheduler
Documentation/gpu/drm-mm.rst | 6 +
MAINTAINERS | 1 +
.../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 17 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +-
drivers/gpu/drm/etnaviv/etnaviv_sched.c | 5 +-
drivers/gpu/drm/lima/lima_sched.c | 5 +-
drivers/gpu/drm/msm/adreno/adreno_device.c | 6 +-
drivers/gpu/drm/msm/msm_ringbuffer.c | 5 +-
drivers/gpu/drm/nouveau/nouveau_sched.c | 5 +-
drivers/gpu/drm/panfrost/panfrost_job.c | 5 +-
drivers/gpu/drm/scheduler/sched_entity.c | 111 +++-
drivers/gpu/drm/scheduler/sched_fence.c | 2 +-
drivers/gpu/drm/scheduler/sched_main.c | 497 ++++++++++++++----
drivers/gpu/drm/v3d/v3d_sched.c | 25 +-
include/drm/gpu_scheduler.h | 96 +++-
16 files changed, 644 insertions(+), 159 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 53+ messages in thread
* [Intel-xe] [PATCH v3 01/13] drm/sched: Add drm_sched_submit_* helpers
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
@ 2023-09-12 2:16 ` Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 02/13] drm/sched: Convert drm scheduler to use a work queue rather than kthread Matthew Brost
` (13 subsequent siblings)
14 siblings, 0 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 2:16 UTC (permalink / raw)
To: dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, christian.koenig, faith.ekstrand
Add scheduler submit ready, stop, and start helpers to hide the
implementation details of the scheduler from the drivers.
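The driver-facing pattern these helpers enable can be sketched as a userspace toy model. The `toy_*` names and struct fields below are invented for illustration; the real drm_gpu_scheduler internals differ, which is exactly what the helpers are meant to hide:

```c
#include <stdbool.h>

/* Toy stand-in for drm_gpu_scheduler; fields are illustrative only. */
struct toy_sched {
	bool ready;
	bool paused;
};

static bool toy_submit_ready(struct toy_sched *s) { return s->ready; }
static void toy_submit_stop(struct toy_sched *s)  { s->paused = true; }
static void toy_submit_start(struct toy_sched *s) { s->paused = false; }

/* Driver-side pattern mirrored from the amdgpu hunks below: skip
 * schedulers that are not ready, park the rest around a critical
 * operation, then restart them. */
static int toy_preempt(struct toy_sched *s)
{
	if (!toy_submit_ready(s))
		return -1; /* -EINVAL in the kernel code */

	toy_submit_stop(s);
	/* ... preempt / reset work would happen here ... */
	toy_submit_start(s);
	return 0;
}

/* Small driver for the assertions below. */
static int toy_demo(bool ready)
{
	struct toy_sched s = { .ready = ready, .paused = false };

	return toy_preempt(&s);
}
```

Because drivers now go through this interface instead of touching sched->thread directly, a later patch can swap the kthread for a workqueue without touching any driver.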
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
.../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 15 +++----
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++---
drivers/gpu/drm/msm/adreno/adreno_device.c | 6 ++-
drivers/gpu/drm/scheduler/sched_main.c | 40 ++++++++++++++++++-
include/drm/gpu_scheduler.h | 3 ++
6 files changed, 60 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 625db444df1c..36a1accbc846 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -290,7 +290,7 @@ static int suspend_resume_compute_scheduler(struct amdgpu_device *adev, bool sus
for (i = 0; i < adev->gfx.num_compute_rings; i++) {
struct amdgpu_ring *ring = &adev->gfx.compute_ring[i];
- if (!(ring && ring->sched.thread))
+ if (!(ring && drm_sched_submit_ready(&ring->sched)))
continue;
/* stop secheduler and drain ring. */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index a4faea4fa0b5..fb5dad687168 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1659,9 +1659,9 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
struct amdgpu_ring *ring = adev->rings[i];
- if (!ring || !ring->sched.thread)
+ if (!ring || !drm_sched_submit_ready(&ring->sched))
continue;
- kthread_park(ring->sched.thread);
+ drm_sched_submit_stop(&ring->sched);
}
seq_puts(m, "run ib test:\n");
@@ -1675,9 +1675,9 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
struct amdgpu_ring *ring = adev->rings[i];
- if (!ring || !ring->sched.thread)
+ if (!ring || !drm_sched_submit_ready(&ring->sched))
continue;
- kthread_unpark(ring->sched.thread);
+ drm_sched_submit_start(&ring->sched);
}
up_write(&adev->reset_domain->sem);
@@ -1897,7 +1897,8 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
ring = adev->rings[val];
- if (!ring || !ring->funcs->preempt_ib || !ring->sched.thread)
+ if (!ring || !ring->funcs->preempt_ib ||
+ !drm_sched_submit_ready(&ring->sched))
return -EINVAL;
/* the last preemption failed */
@@ -1915,7 +1916,7 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
goto pro_end;
/* stop the scheduler */
- kthread_park(ring->sched.thread);
+ drm_sched_submit_stop(&ring->sched);
/* preempt the IB */
r = amdgpu_ring_preempt_ib(ring);
@@ -1949,7 +1950,7 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
failure:
/* restart the scheduler */
- kthread_unpark(ring->sched.thread);
+ drm_sched_submit_start(&ring->sched);
up_read(&adev->reset_domain->sem);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3f001a50b34a..1f8a794704d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4614,7 +4614,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev)
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i];
- if (!ring || !ring->sched.thread)
+ if (!ring || !drm_sched_submit_ready(&ring->sched))
continue;
spin_lock(&ring->sched.job_list_lock);
@@ -4753,7 +4753,7 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i];
- if (!ring || !ring->sched.thread)
+ if (!ring || !drm_sched_submit_ready(&ring->sched))
continue;
/* Clear job fence from fence drv to avoid force_completion
@@ -5292,7 +5292,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = tmp_adev->rings[i];
- if (!ring || !ring->sched.thread)
+ if (!ring || !drm_sched_submit_ready(&ring->sched))
continue;
drm_sched_stop(&ring->sched, job ? &job->base : NULL);
@@ -5367,7 +5367,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = tmp_adev->rings[i];
- if (!ring || !ring->sched.thread)
+ if (!ring || !drm_sched_submit_ready(&ring->sched))
continue;
drm_sched_start(&ring->sched, true);
@@ -5693,7 +5693,7 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i];
- if (!ring || !ring->sched.thread)
+ if (!ring || !drm_sched_submit_ready(&ring->sched))
continue;
drm_sched_stop(&ring->sched, NULL);
@@ -5821,7 +5821,7 @@ void amdgpu_pci_resume(struct pci_dev *pdev)
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i];
- if (!ring || !ring->sched.thread)
+ if (!ring || !drm_sched_submit_ready(&ring->sched))
continue;
drm_sched_start(&ring->sched, true);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index fa527935ffd4..e046dc5ff72a 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -809,7 +809,8 @@ static void suspend_scheduler(struct msm_gpu *gpu)
*/
for (i = 0; i < gpu->nr_rings; i++) {
struct drm_gpu_scheduler *sched = &gpu->rb[i]->sched;
- kthread_park(sched->thread);
+
+ drm_sched_submit_stop(sched);
}
}
@@ -819,7 +820,8 @@ static void resume_scheduler(struct msm_gpu *gpu)
for (i = 0; i < gpu->nr_rings; i++) {
struct drm_gpu_scheduler *sched = &gpu->rb[i]->sched;
- kthread_unpark(sched->thread);
+
+ drm_sched_submit_start(sched);
}
}
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 506371c42745..e4fa62abca41 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -439,7 +439,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
{
struct drm_sched_job *s_job, *tmp;
- kthread_park(sched->thread);
+ drm_sched_submit_stop(sched);
/*
* Reinsert back the bad job here - now it's safe as
@@ -552,7 +552,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
spin_unlock(&sched->job_list_lock);
}
- kthread_unpark(sched->thread);
+ drm_sched_submit_start(sched);
}
EXPORT_SYMBOL(drm_sched_start);
@@ -1206,3 +1206,39 @@ void drm_sched_increase_karma(struct drm_sched_job *bad)
}
}
EXPORT_SYMBOL(drm_sched_increase_karma);
+
+/**
+ * drm_sched_submit_ready - scheduler ready for submission
+ *
+ * @sched: scheduler instance
+ *
+ * Returns true if submission is ready
+ */
+bool drm_sched_submit_ready(struct drm_gpu_scheduler *sched)
+{
+ return !!sched->thread;
+
+}
+EXPORT_SYMBOL(drm_sched_submit_ready);
+
+/**
+ * drm_sched_submit_stop - stop scheduler submission
+ *
+ * @sched: scheduler instance
+ */
+void drm_sched_submit_stop(struct drm_gpu_scheduler *sched)
+{
+ kthread_park(sched->thread);
+}
+EXPORT_SYMBOL(drm_sched_submit_stop);
+
+/**
+ * drm_sched_submit_start - start scheduler submission
+ *
+ * @sched: scheduler instance
+ */
+void drm_sched_submit_start(struct drm_gpu_scheduler *sched)
+{
+ kthread_unpark(sched->thread);
+}
+EXPORT_SYMBOL(drm_sched_submit_start);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index f9544d9b670d..f12c5aea5294 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -550,6 +550,9 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
void drm_sched_job_cleanup(struct drm_sched_job *job);
void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched);
+bool drm_sched_submit_ready(struct drm_gpu_scheduler *sched);
+void drm_sched_submit_stop(struct drm_gpu_scheduler *sched);
+void drm_sched_submit_start(struct drm_gpu_scheduler *sched);
void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad);
void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery);
void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
--
2.34.1
* [Intel-xe] [PATCH v3 02/13] drm/sched: Convert drm scheduler to use a work queue rather than kthread
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 01/13] drm/sched: Add drm_sched_submit_* helpers Matthew Brost
@ 2023-09-12 2:16 ` Matthew Brost
2023-09-12 7:29 ` Boris Brezillon
` (2 more replies)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity Matthew Brost
` (12 subsequent siblings)
14 siblings, 3 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 2:16 UTC (permalink / raw)
To: dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, christian.koenig, faith.ekstrand
In XE, the new Intel GPU driver, a choice has been made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd, but let us explain the reasoning below.
1. In XE the submission order from multiple drm_sched_entity is not
guaranteed to match the completion order, even when targeting the same
hardware engine. This is because in XE we have a firmware scheduler, the
GuC, which is allowed to reorder, timeslice, and preempt submissions. If a
drm_gpu_scheduler is shared across multiple drm_sched_entity, the TDR falls
apart, as the TDR expects submission order == completion order. Using a
dedicated drm_gpu_scheduler per drm_sched_entity solves this problem.
2. In XE submissions are done by programming a ring buffer (circular
buffer). A drm_gpu_scheduler provides a limit on the number of in-flight
jobs; if that limit is set to RING_SIZE / MAX_SIZE_PER_JOB, we get flow
control on the ring for free.
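The flow-control claim in point 2 is simple arithmetic; here is a userspace C sketch of it (the sizes below are invented for illustration, not taken from Xe):

```c
/* Illustrative values only -- real ring and worst-case job sizes are
 * driver-specific. */
#define RING_SIZE        (16 * 1024) /* bytes in the ring buffer */
#define MAX_SIZE_PER_JOB 256         /* worst-case ring bytes one job emits */

/* With the scheduler job limit set to RING_SIZE / MAX_SIZE_PER_JOB,
 * even if every in-flight job emits its worst-case footprint, the ring
 * cannot overflow, so no separate flow-control mechanism is needed. */
static unsigned int hw_submission_limit(void)
{
	return RING_SIZE / MAX_SIZE_PER_JOB;
}

static int ring_can_overflow(unsigned int inflight_jobs)
{
	return inflight_jobs * MAX_SIZE_PER_JOB > RING_SIZE;
}
```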
A problem with this design is that a drm_gpu_scheduler currently uses a
kthread for submission / job cleanup. This doesn't scale if a large number
of drm_gpu_scheduler instances are used. To work around the scaling issue,
use a worker rather than a kthread for submission / job cleanup.
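The work-item model (including the v3 "don't have loop in worker" change) can be sketched as a single-threaded toy in C. Here queue_work() is simulated by a direct call and all names are invented for illustration; the real code runs on a workqueue:

```c
#include <stdbool.h>

/* Minimal model of the "no loop in the worker" pattern: each invocation
 * handles at most one job, then requeues itself while work remains. */
static int jobs_pending;
static int jobs_run;
static bool pause_submit;

static void submit_work(void);

/* Stand-in for queue_work(): invoke the work item directly, unless
 * submission is paused (mirrors the pause_submit check in the patch). */
static void queue_submit_work(void)
{
	if (!pause_submit)
		submit_work();
}

static void submit_work(void)
{
	if (pause_submit || !jobs_pending)
		return;

	jobs_pending--; /* run exactly one job per invocation */
	jobs_run++;

	if (jobs_pending)
		queue_submit_work(); /* more work: requeue, don't loop */
}

/* Helpers for the assertions below. */
static int run_n_jobs(int n)
{
	jobs_pending = n;
	jobs_run = 0;
	pause_submit = false;
	queue_submit_work();
	return jobs_run;
}

static int run_paused(int n)
{
	jobs_pending = n;
	jobs_run = 0;
	pause_submit = true;
	queue_submit_work();
	pause_submit = false;
	return jobs_run;
}
```

Requeueing instead of looping keeps each work-item invocation short, which is what lets many schedulers share one workqueue without starving each other.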
v2:
- (Rob Clark) Fix msm build
- Pass in run work queue
v3:
- (Boris) don't have loop in worker
v4:
- (Tvrtko) break out submit ready, stop, start helpers into own patch
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +-
drivers/gpu/drm/lima/lima_sched.c | 2 +-
drivers/gpu/drm/msm/msm_ringbuffer.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_sched.c | 2 +-
drivers/gpu/drm/panfrost/panfrost_job.c | 2 +-
drivers/gpu/drm/scheduler/sched_main.c | 106 +++++++++------------
drivers/gpu/drm/v3d/v3d_sched.c | 10 +-
include/drm/gpu_scheduler.h | 12 ++-
9 files changed, 65 insertions(+), 75 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1f8a794704d0..c83a76bccc1d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2305,7 +2305,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
break;
}
- r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
+ r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL,
ring->num_hw_submission, 0,
timeout, adev->reset_domain->wq,
ring->sched_score, ring->name,
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 345fec6cb1a4..618a804ddc34 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -134,7 +134,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
{
int ret;
- ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
+ ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
msecs_to_jiffies(500), NULL, NULL,
dev_name(gpu->dev), gpu->dev);
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index ffd91a5ee299..8d858aed0e56 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
INIT_WORK(&pipe->recover_work, lima_sched_recover_work);
- return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
+ return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
lima_job_hang_limit,
msecs_to_jiffies(timeout), NULL,
NULL, name, pipe->ldev->dev);
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 40c0bc35a44c..b8865e61b40f 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -94,7 +94,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
/* currently managing hangcheck ourselves: */
sched_timeout = MAX_SCHEDULE_TIMEOUT;
- ret = drm_sched_init(&ring->sched, &msm_sched_ops,
+ ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
num_hw_submissions, 0, sched_timeout,
NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
if (ret) {
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
index 88217185e0f3..d458c2227d4f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -429,7 +429,7 @@ int nouveau_sched_init(struct nouveau_drm *drm)
if (!drm->sched_wq)
return -ENOMEM;
- return drm_sched_init(sched, &nouveau_sched_ops,
+ return drm_sched_init(sched, &nouveau_sched_ops, NULL,
NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
NULL, NULL, "nouveau_sched", drm->dev->dev);
}
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 033f5e684707..326ca1ddf1d7 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -831,7 +831,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
js->queue[j].fence_context = dma_fence_context_alloc(1);
ret = drm_sched_init(&js->queue[j].sched,
- &panfrost_sched_ops,
+ &panfrost_sched_ops, NULL,
nentries, 0,
msecs_to_jiffies(JOB_TIMEOUT_MS),
pfdev->reset.wq,
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index e4fa62abca41..614e8c97a622 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -48,7 +48,6 @@
* through the jobs entity pointer.
*/
-#include <linux/kthread.h>
#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/completion.h>
@@ -256,6 +255,16 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
return rb ? rb_entry(rb, struct drm_sched_entity, rb_tree_node) : NULL;
}
+/**
+ * drm_sched_submit_queue - scheduler queue submission
+ * @sched: scheduler instance
+ */
+static void drm_sched_submit_queue(struct drm_gpu_scheduler *sched)
+{
+ if (!READ_ONCE(sched->pause_submit))
+ queue_work(sched->submit_wq, &sched->work_submit);
+}
+
/**
* drm_sched_job_done - complete a job
* @s_job: pointer to the job which is done
@@ -275,7 +284,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
dma_fence_get(&s_fence->finished);
drm_sched_fence_finished(s_fence, result);
dma_fence_put(&s_fence->finished);
- wake_up_interruptible(&sched->wake_up_worker);
+ drm_sched_submit_queue(sched);
}
/**
@@ -868,7 +877,7 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched)
{
if (drm_sched_can_queue(sched))
- wake_up_interruptible(&sched->wake_up_worker);
+ drm_sched_submit_queue(sched);
}
/**
@@ -978,61 +987,42 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
}
EXPORT_SYMBOL(drm_sched_pick_best);
-/**
- * drm_sched_blocked - check if the scheduler is blocked
- *
- * @sched: scheduler instance
- *
- * Returns true if blocked, otherwise false.
- */
-static bool drm_sched_blocked(struct drm_gpu_scheduler *sched)
-{
- if (kthread_should_park()) {
- kthread_parkme();
- return true;
- }
-
- return false;
-}
-
/**
* drm_sched_main - main scheduler thread
*
* @param: scheduler instance
- *
- * Returns 0.
*/
-static int drm_sched_main(void *param)
+static void drm_sched_main(struct work_struct *w)
{
- struct drm_gpu_scheduler *sched = (struct drm_gpu_scheduler *)param;
+ struct drm_gpu_scheduler *sched =
+ container_of(w, struct drm_gpu_scheduler, work_submit);
+ struct drm_sched_entity *entity;
+ struct drm_sched_job *cleanup_job;
int r;
- sched_set_fifo_low(current);
+ if (READ_ONCE(sched->pause_submit))
+ return;
- while (!kthread_should_stop()) {
- struct drm_sched_entity *entity = NULL;
- struct drm_sched_fence *s_fence;
- struct drm_sched_job *sched_job;
- struct dma_fence *fence;
- struct drm_sched_job *cleanup_job = NULL;
+ cleanup_job = drm_sched_get_cleanup_job(sched);
+ entity = drm_sched_select_entity(sched);
- wait_event_interruptible(sched->wake_up_worker,
- (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
- (!drm_sched_blocked(sched) &&
- (entity = drm_sched_select_entity(sched))) ||
- kthread_should_stop());
+ if (!entity && !cleanup_job)
+ return; /* No more work */
- if (cleanup_job)
- sched->ops->free_job(cleanup_job);
+ if (cleanup_job)
+ sched->ops->free_job(cleanup_job);
- if (!entity)
- continue;
+ if (entity) {
+ struct dma_fence *fence;
+ struct drm_sched_fence *s_fence;
+ struct drm_sched_job *sched_job;
sched_job = drm_sched_entity_pop_job(entity);
-
if (!sched_job) {
complete_all(&entity->entity_idle);
- continue;
+ if (!cleanup_job)
+ return; /* No more work */
+ goto again;
}
s_fence = sched_job->s_fence;
@@ -1063,7 +1053,9 @@ static int drm_sched_main(void *param)
wake_up(&sched->job_scheduled);
}
- return 0;
+
+again:
+ drm_sched_submit_queue(sched);
}
/**
@@ -1071,6 +1063,7 @@ static int drm_sched_main(void *param)
*
* @sched: scheduler instance
* @ops: backend operations for this scheduler
+ * @submit_wq: workqueue to use for submission. If NULL, the system_wq is used
* @hw_submission: number of hw submissions that can be in flight
* @hang_limit: number of times to allow a job to hang before dropping it
* @timeout: timeout value in jiffies for the scheduler
@@ -1084,14 +1077,16 @@ static int drm_sched_main(void *param)
*/
int drm_sched_init(struct drm_gpu_scheduler *sched,
const struct drm_sched_backend_ops *ops,
+ struct workqueue_struct *submit_wq,
unsigned hw_submission, unsigned hang_limit,
long timeout, struct workqueue_struct *timeout_wq,
atomic_t *score, const char *name, struct device *dev)
{
- int i, ret;
+ int i;
sched->ops = ops;
sched->hw_submission_limit = hw_submission;
sched->name = name;
+ sched->submit_wq = submit_wq ? : system_wq;
sched->timeout = timeout;
sched->timeout_wq = timeout_wq ? : system_wq;
sched->hang_limit = hang_limit;
@@ -1100,23 +1095,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
drm_sched_rq_init(sched, &sched->sched_rq[i]);
- init_waitqueue_head(&sched->wake_up_worker);
init_waitqueue_head(&sched->job_scheduled);
INIT_LIST_HEAD(&sched->pending_list);
spin_lock_init(&sched->job_list_lock);
atomic_set(&sched->hw_rq_count, 0);
INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
+ INIT_WORK(&sched->work_submit, drm_sched_main);
atomic_set(&sched->_score, 0);
atomic64_set(&sched->job_id_count, 0);
-
- /* Each scheduler will run on a seperate kernel thread */
- sched->thread = kthread_run(drm_sched_main, sched, sched->name);
- if (IS_ERR(sched->thread)) {
- ret = PTR_ERR(sched->thread);
- sched->thread = NULL;
- DRM_DEV_ERROR(sched->dev, "Failed to create scheduler for %s.\n", name);
- return ret;
- }
+ sched->pause_submit = false;
sched->ready = true;
return 0;
@@ -1135,8 +1122,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
struct drm_sched_entity *s_entity;
int i;
- if (sched->thread)
- kthread_stop(sched->thread);
+ drm_sched_submit_stop(sched);
for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
struct drm_sched_rq *rq = &sched->sched_rq[i];
@@ -1216,7 +1202,7 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
*/
bool drm_sched_submit_ready(struct drm_gpu_scheduler *sched)
{
- return !!sched->thread;
+ return sched->ready;
}
EXPORT_SYMBOL(drm_sched_submit_ready);
@@ -1228,7 +1214,8 @@ EXPORT_SYMBOL(drm_sched_submit_ready);
*/
void drm_sched_submit_stop(struct drm_gpu_scheduler *sched)
{
- kthread_park(sched->thread);
+ WRITE_ONCE(sched->pause_submit, true);
+ cancel_work_sync(&sched->work_submit);
}
EXPORT_SYMBOL(drm_sched_submit_stop);
@@ -1239,6 +1226,7 @@ EXPORT_SYMBOL(drm_sched_submit_stop);
*/
void drm_sched_submit_start(struct drm_gpu_scheduler *sched)
{
- kthread_unpark(sched->thread);
+ WRITE_ONCE(sched->pause_submit, false);
+ queue_work(sched->submit_wq, &sched->work_submit);
}
EXPORT_SYMBOL(drm_sched_submit_start);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 06238e6d7f5c..38e092ea41e6 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -388,7 +388,7 @@ v3d_sched_init(struct v3d_dev *v3d)
int ret;
ret = drm_sched_init(&v3d->queue[V3D_BIN].sched,
- &v3d_bin_sched_ops,
+ &v3d_bin_sched_ops, NULL,
hw_jobs_limit, job_hang_limit,
msecs_to_jiffies(hang_limit_ms), NULL,
NULL, "v3d_bin", v3d->drm.dev);
@@ -396,7 +396,7 @@ v3d_sched_init(struct v3d_dev *v3d)
return ret;
ret = drm_sched_init(&v3d->queue[V3D_RENDER].sched,
- &v3d_render_sched_ops,
+ &v3d_render_sched_ops, NULL,
hw_jobs_limit, job_hang_limit,
msecs_to_jiffies(hang_limit_ms), NULL,
NULL, "v3d_render", v3d->drm.dev);
@@ -404,7 +404,7 @@ v3d_sched_init(struct v3d_dev *v3d)
goto fail;
ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
- &v3d_tfu_sched_ops,
+ &v3d_tfu_sched_ops, NULL,
hw_jobs_limit, job_hang_limit,
msecs_to_jiffies(hang_limit_ms), NULL,
NULL, "v3d_tfu", v3d->drm.dev);
@@ -413,7 +413,7 @@ v3d_sched_init(struct v3d_dev *v3d)
if (v3d_has_csd(v3d)) {
ret = drm_sched_init(&v3d->queue[V3D_CSD].sched,
- &v3d_csd_sched_ops,
+ &v3d_csd_sched_ops, NULL,
hw_jobs_limit, job_hang_limit,
msecs_to_jiffies(hang_limit_ms), NULL,
NULL, "v3d_csd", v3d->drm.dev);
@@ -421,7 +421,7 @@ v3d_sched_init(struct v3d_dev *v3d)
goto fail;
ret = drm_sched_init(&v3d->queue[V3D_CACHE_CLEAN].sched,
- &v3d_cache_clean_sched_ops,
+ &v3d_cache_clean_sched_ops, NULL,
hw_jobs_limit, job_hang_limit,
msecs_to_jiffies(hang_limit_ms), NULL,
NULL, "v3d_cache_clean", v3d->drm.dev);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index f12c5aea5294..278106e358a8 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -473,17 +473,16 @@ struct drm_sched_backend_ops {
* @timeout: the time after which a job is removed from the scheduler.
* @name: name of the ring for which this scheduler is being used.
* @sched_rq: priority wise array of run queues.
- * @wake_up_worker: the wait queue on which the scheduler sleeps until a job
- * is ready to be scheduled.
* @job_scheduled: once @drm_sched_entity_do_release is called the scheduler
* waits on this wait queue until all the scheduled jobs are
* finished.
* @hw_rq_count: the number of jobs currently in the hardware queue.
* @job_id_count: used to assign unique id to the each job.
+ * @submit_wq: workqueue used to queue @work_submit
* @timeout_wq: workqueue used to queue @work_tdr
+ * @work_submit: schedules jobs and cleans up entities
* @work_tdr: schedules a delayed call to @drm_sched_job_timedout after the
* timeout interval is over.
- * @thread: the kthread on which the scheduler which run.
* @pending_list: the list of jobs which are currently in the job queue.
* @job_list_lock: lock to protect the pending_list.
* @hang_limit: once the hangs by a job crosses this limit then it is marked
@@ -492,6 +491,7 @@ struct drm_sched_backend_ops {
* @_score: score used when the driver doesn't provide one
* @ready: marks if the underlying HW is ready to work
* @free_guilty: A hit to time out handler to free the guilty job.
+ * @pause_submit: pause queuing of @work_submit on @submit_wq
* @dev: system &struct device
*
* One scheduler is implemented for each hardware ring.
@@ -502,13 +502,13 @@ struct drm_gpu_scheduler {
long timeout;
const char *name;
struct drm_sched_rq sched_rq[DRM_SCHED_PRIORITY_COUNT];
- wait_queue_head_t wake_up_worker;
wait_queue_head_t job_scheduled;
atomic_t hw_rq_count;
atomic64_t job_id_count;
+ struct workqueue_struct *submit_wq;
struct workqueue_struct *timeout_wq;
+ struct work_struct work_submit;
struct delayed_work work_tdr;
- struct task_struct *thread;
struct list_head pending_list;
spinlock_t job_list_lock;
int hang_limit;
@@ -516,11 +516,13 @@ struct drm_gpu_scheduler {
atomic_t _score;
bool ready;
bool free_guilty;
+ bool pause_submit;
struct device *dev;
};
int drm_sched_init(struct drm_gpu_scheduler *sched,
const struct drm_sched_backend_ops *ops,
+ struct workqueue_struct *submit_wq,
uint32_t hw_submission, unsigned hang_limit,
long timeout, struct workqueue_struct *timeout_wq,
atomic_t *score, const char *name, struct device *dev);
--
2.34.1
* [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 01/13] drm/sched: Add drm_sched_submit_* helpers Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 02/13] drm/sched: Convert drm scheduler to use a work queue rather than kthread Matthew Brost
@ 2023-09-12 2:16 ` Matthew Brost
2023-09-12 7:37 ` Boris Brezillon
` (2 more replies)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 04/13] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy Matthew Brost
` (11 subsequent siblings)
14 siblings, 3 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 2:16 UTC (permalink / raw)
To: dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, christian.koenig, faith.ekstrand
Rather than a global modparam for scheduling policy, move the scheduling
policy to the scheduler / entity so the user can control each scheduler /
entity policy.
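The shape of the change can be sketched in userspace C. The toy names below, and the DEFAULT-falls-back-to-a-global behavior, are illustrative assumptions rather than the patch's exact semantics:

```c
/* Illustrative policy enum; names mirror the patch, values are invented. */
enum toy_sched_policy {
	TOY_SCHED_POLICY_DEFAULT,
	TOY_SCHED_POLICY_RR,
	TOY_SCHED_POLICY_FIFO,
};

/* Before: a single global (modparam-style) setting governed everything. */
static enum toy_sched_policy default_policy = TOY_SCHED_POLICY_FIFO;

struct toy_sched {
	enum toy_sched_policy policy;
};

/* After: each scheduler stores its own policy, with DEFAULT falling back
 * to the global setting so existing drivers keep their behavior. */
static void toy_sched_init(struct toy_sched *s, enum toy_sched_policy policy)
{
	s->policy = (policy == TOY_SCHED_POLICY_DEFAULT) ?
		default_policy : policy;
}

/* Helper for the assertions below. */
static enum toy_sched_policy effective(enum toy_sched_policy p)
{
	struct toy_sched s;

	toy_sched_init(&s, p);
	return s.policy;
}
```

Drivers that pass DRM_SCHED_POLICY_DEFAULT, as all the hunks below do, see no behavior change.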
v2:
- s/DRM_SCHED_POLICY_MAX/DRM_SCHED_POLICY_COUNT (Luben)
- Only include policy in scheduler (Luben)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
drivers/gpu/drm/etnaviv/etnaviv_sched.c | 3 ++-
drivers/gpu/drm/lima/lima_sched.c | 3 ++-
drivers/gpu/drm/msm/msm_ringbuffer.c | 3 ++-
drivers/gpu/drm/nouveau/nouveau_sched.c | 3 ++-
drivers/gpu/drm/panfrost/panfrost_job.c | 3 ++-
drivers/gpu/drm/scheduler/sched_entity.c | 24 ++++++++++++++++++----
drivers/gpu/drm/scheduler/sched_main.c | 23 +++++++++++++++------
drivers/gpu/drm/v3d/v3d_sched.c | 15 +++++++++-----
include/drm/gpu_scheduler.h | 20 ++++++++++++------
10 files changed, 72 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c83a76bccc1d..ecb00991dd51 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2309,6 +2309,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
ring->num_hw_submission, 0,
timeout, adev->reset_domain->wq,
ring->sched_score, ring->name,
+ DRM_SCHED_POLICY_DEFAULT,
adev->dev);
if (r) {
DRM_ERROR("Failed to create scheduler on ring %s.\n",
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 618a804ddc34..3646f995ca94 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -137,7 +137,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
msecs_to_jiffies(500), NULL, NULL,
- dev_name(gpu->dev), gpu->dev);
+ dev_name(gpu->dev), DRM_SCHED_POLICY_DEFAULT,
+ gpu->dev);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 8d858aed0e56..465d4bf3882b 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -491,7 +491,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
lima_job_hang_limit,
msecs_to_jiffies(timeout), NULL,
- NULL, name, pipe->ldev->dev);
+ NULL, name, DRM_SCHED_POLICY_DEFAULT,
+ pipe->ldev->dev);
}
void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index b8865e61b40f..f45e674a0aaf 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -96,7 +96,8 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
num_hw_submissions, 0, sched_timeout,
- NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
+ NULL, NULL, to_msm_bo(ring->bo)->name,
+ DRM_SCHED_POLICY_DEFAULT, gpu->dev->dev);
if (ret) {
goto fail;
}
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
index d458c2227d4f..70e497e40c70 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -431,7 +431,8 @@ int nouveau_sched_init(struct nouveau_drm *drm)
return drm_sched_init(sched, &nouveau_sched_ops, NULL,
NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
- NULL, NULL, "nouveau_sched", drm->dev->dev);
+ NULL, NULL, "nouveau_sched",
+ DRM_SCHED_POLICY_DEFAULT, drm->dev->dev);
}
void nouveau_sched_fini(struct nouveau_drm *drm)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 326ca1ddf1d7..ad36bf3a4699 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -835,7 +835,8 @@ int panfrost_job_init(struct panfrost_device *pfdev)
nentries, 0,
msecs_to_jiffies(JOB_TIMEOUT_MS),
pfdev->reset.wq,
- NULL, "pan_js", pfdev->dev);
+ NULL, "pan_js", DRM_SCHED_POLICY_DEFAULT,
+ pfdev->dev);
if (ret) {
dev_err(pfdev->dev, "Failed to create scheduler: %d.", ret);
goto err_sched;
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index a42763e1429d..65a972b52eda 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -33,6 +33,20 @@
#define to_drm_sched_job(sched_job) \
container_of((sched_job), struct drm_sched_job, queue_node)
+static bool bad_policies(struct drm_gpu_scheduler **sched_list,
+ unsigned int num_sched_list)
+{
+ enum drm_sched_policy sched_policy = sched_list[0]->sched_policy;
+ unsigned int i;
+
+ /* All schedule policies must match */
+ for (i = 1; i < num_sched_list; ++i)
+ if (sched_policy != sched_list[i]->sched_policy)
+ return true;
+
+ return false;
+}
+
/**
* drm_sched_entity_init - Init a context entity used by scheduler when
* submit to HW ring.
@@ -62,7 +76,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
unsigned int num_sched_list,
atomic_t *guilty)
{
- if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
+ if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])) ||
+ bad_policies(sched_list, num_sched_list))
return -EINVAL;
memset(entity, 0, sizeof(struct drm_sched_entity));
@@ -486,7 +501,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
* Update the entity's location in the min heap according to
* the timestamp of the next job, if any.
*/
- if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
+ if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO) {
struct drm_sched_job *next;
next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
@@ -558,7 +573,8 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
{
struct drm_sched_entity *entity = sched_job->entity;
- bool first;
+ bool first, fifo = entity->rq->sched->sched_policy ==
+ DRM_SCHED_POLICY_FIFO;
ktime_t submit_ts;
trace_drm_sched_job(sched_job, entity);
@@ -587,7 +603,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
- if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+ if (fifo)
drm_sched_rq_update_fifo(entity, submit_ts);
drm_sched_wakeup_if_can_queue(entity->rq->sched);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 614e8c97a622..545d5298c086 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -66,14 +66,14 @@
#define to_drm_sched_job(sched_job) \
container_of((sched_job), struct drm_sched_job, queue_node)
-int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
+int default_drm_sched_policy = DRM_SCHED_POLICY_FIFO;
/**
* DOC: sched_policy (int)
* Used to override default entities scheduling policy in a run queue.
*/
-MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
-module_param_named(sched_policy, drm_sched_policy, int, 0444);
+MODULE_PARM_DESC(sched_policy, "Specify the default scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
+module_param_named(sched_policy, default_drm_sched_policy, int, 0444);
static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
const struct rb_node *b)
@@ -177,7 +177,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
if (rq->current_entity == entity)
rq->current_entity = NULL;
- if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+ if (rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_remove_fifo_locked(entity);
spin_unlock(&rq->lock);
@@ -898,7 +898,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
/* Kernel run queue has higher priority than normal run queue*/
for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
- entity = drm_sched_policy == DRM_SCHED_POLICY_FIFO ?
+ entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]) :
drm_sched_rq_select_entity_rr(&sched->sched_rq[i]);
if (entity)
@@ -1071,6 +1071,7 @@ static void drm_sched_main(struct work_struct *w)
* used
* @score: optional score atomic shared with other schedulers
* @name: name used for debugging
+ * @sched_policy: scheduling policy
* @dev: target &struct device
*
* Return 0 on success, otherwise error code.
@@ -1080,9 +1081,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
struct workqueue_struct *submit_wq,
unsigned hw_submission, unsigned hang_limit,
long timeout, struct workqueue_struct *timeout_wq,
- atomic_t *score, const char *name, struct device *dev)
+ atomic_t *score, const char *name,
+ enum drm_sched_policy sched_policy,
+ struct device *dev)
{
int i;
+
+ if (sched_policy >= DRM_SCHED_POLICY_COUNT)
+ return -EINVAL;
+
sched->ops = ops;
sched->hw_submission_limit = hw_submission;
sched->name = name;
@@ -1092,6 +1099,10 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
sched->hang_limit = hang_limit;
sched->score = score ? score : &sched->_score;
sched->dev = dev;
+ if (sched_policy == DRM_SCHED_POLICY_DEFAULT)
+ sched->sched_policy = default_drm_sched_policy;
+ else
+ sched->sched_policy = sched_policy;
for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
drm_sched_rq_init(sched, &sched->sched_rq[i]);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 38e092ea41e6..5e3fe77fa991 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -391,7 +391,8 @@ v3d_sched_init(struct v3d_dev *v3d)
&v3d_bin_sched_ops, NULL,
hw_jobs_limit, job_hang_limit,
msecs_to_jiffies(hang_limit_ms), NULL,
- NULL, "v3d_bin", v3d->drm.dev);
+ NULL, "v3d_bin", DRM_SCHED_POLICY_DEFAULT,
+ v3d->drm.dev);
if (ret)
return ret;
@@ -399,7 +400,8 @@ v3d_sched_init(struct v3d_dev *v3d)
&v3d_render_sched_ops, NULL,
hw_jobs_limit, job_hang_limit,
msecs_to_jiffies(hang_limit_ms), NULL,
- NULL, "v3d_render", v3d->drm.dev);
+ NULL, "v3d_render", DRM_SCHED_POLICY_DEFAULT,
+ v3d->drm.dev);
if (ret)
goto fail;
@@ -407,7 +409,8 @@ v3d_sched_init(struct v3d_dev *v3d)
&v3d_tfu_sched_ops, NULL,
hw_jobs_limit, job_hang_limit,
msecs_to_jiffies(hang_limit_ms), NULL,
- NULL, "v3d_tfu", v3d->drm.dev);
+ NULL, "v3d_tfu", DRM_SCHED_POLICY_DEFAULT,
+ v3d->drm.dev);
if (ret)
goto fail;
@@ -416,7 +419,8 @@ v3d_sched_init(struct v3d_dev *v3d)
&v3d_csd_sched_ops, NULL,
hw_jobs_limit, job_hang_limit,
msecs_to_jiffies(hang_limit_ms), NULL,
- NULL, "v3d_csd", v3d->drm.dev);
+ NULL, "v3d_csd", DRM_SCHED_POLICY_DEFAULT,
+ v3d->drm.dev);
if (ret)
goto fail;
@@ -424,7 +428,8 @@ v3d_sched_init(struct v3d_dev *v3d)
&v3d_cache_clean_sched_ops, NULL,
hw_jobs_limit, job_hang_limit,
msecs_to_jiffies(hang_limit_ms), NULL,
- NULL, "v3d_cache_clean", v3d->drm.dev);
+ NULL, "v3d_cache_clean",
+ DRM_SCHED_POLICY_DEFAULT, v3d->drm.dev);
if (ret)
goto fail;
}
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 278106e358a8..897d52a4ff4f 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -72,11 +72,15 @@ enum drm_sched_priority {
DRM_SCHED_PRIORITY_UNSET = -2
};
-/* Used to chose between FIFO and RR jobs scheduling */
-extern int drm_sched_policy;
-
-#define DRM_SCHED_POLICY_RR 0
-#define DRM_SCHED_POLICY_FIFO 1
+/* Used to choose the default scheduling policy */
+extern int default_drm_sched_policy;
+
+enum drm_sched_policy {
+ DRM_SCHED_POLICY_DEFAULT,
+ DRM_SCHED_POLICY_RR,
+ DRM_SCHED_POLICY_FIFO,
+ DRM_SCHED_POLICY_COUNT,
+};
/**
* struct drm_sched_entity - A wrapper around a job queue (typically
@@ -489,6 +493,7 @@ struct drm_sched_backend_ops {
* guilty and it will no longer be considered for scheduling.
* @score: score to help loadbalancer pick a idle sched
* @_score: score used when the driver doesn't provide one
+ * @sched_policy: Scheduling policy for the scheduler
* @ready: marks if the underlying HW is ready to work
* @free_guilty: A hit to time out handler to free the guilty job.
* @pause_submit: pause queuing of @work_submit on @submit_wq
@@ -514,6 +519,7 @@ struct drm_gpu_scheduler {
int hang_limit;
atomic_t *score;
atomic_t _score;
+ enum drm_sched_policy sched_policy;
bool ready;
bool free_guilty;
bool pause_submit;
@@ -525,7 +531,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
struct workqueue_struct *submit_wq,
uint32_t hw_submission, unsigned hang_limit,
long timeout, struct workqueue_struct *timeout_wq,
- atomic_t *score, const char *name, struct device *dev);
+ atomic_t *score, const char *name,
+ enum drm_sched_policy sched_policy,
+ struct device *dev);
void drm_sched_fini(struct drm_gpu_scheduler *sched);
int drm_sched_job_init(struct drm_sched_job *job,
--
2.34.1
* [Intel-xe] [PATCH v3 04/13] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
` (2 preceding siblings ...)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity Matthew Brost
@ 2023-09-12 2:16 ` Matthew Brost
2023-09-13 12:30 ` kernel test robot
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 05/13] drm/sched: Split free_job into own work item Matthew Brost
` (10 subsequent siblings)
14 siblings, 1 reply; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 2:16 UTC (permalink / raw)
To: dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, christian.koenig, faith.ekstrand
DRM_SCHED_POLICY_SINGLE_ENTITY creates a 1 to 1 relationship between
scheduler and entity. No priorities or run queues are used in this mode,
which is intended for devices with firmware schedulers.
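The 1 to 1 pairing can be sketched in isolation (a userspace analogue — the
structs below are stripped down to just the pointer plumbing this patch adds,
and bind_single() is an invented name for the init-time pairing check):

```c
/* Illustrative userspace analogue: an entity either sits on a run queue
 * (rq->sched, RR/FIFO mode) or is bound directly to one scheduler via
 * single_sched (SINGLE_ENTITY mode). entity_to_scheduler() hides which. */
#include <assert.h>
#include <stddef.h>

struct drm_sched_entity;
struct drm_gpu_scheduler;

struct drm_gpu_scheduler {
	struct drm_sched_entity *single_entity; /* set only in 1:1 mode */
};

struct drm_sched_rq {
	struct drm_gpu_scheduler *sched;
};

struct drm_sched_entity {
	struct drm_sched_rq *rq;                /* RR / FIFO mode */
	struct drm_gpu_scheduler *single_sched; /* SINGLE_ENTITY mode */
};

/* Mirrors drm_sched_entity_to_scheduler() */
static struct drm_gpu_scheduler *
entity_to_scheduler(struct drm_sched_entity *entity)
{
	return entity->single_sched ? entity->single_sched
				    : entity->rq->sched;
}

/* Mirrors the drm_sched_entity_init() check: one entity per scheduler */
static int bind_single(struct drm_gpu_scheduler *sched,
		       struct drm_sched_entity *entity)
{
	if (sched->single_entity)
		return -1; /* scheduler already paired */
	sched->single_entity = entity;
	entity->single_sched = sched;
	return 0;
}
```

This is why the patch can route all call sites through
drm_sched_entity_to_scheduler() instead of dereferencing entity->rq->sched,
which would be NULL for a single-entity scheduler.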
v2:
- Drop sched / rq union (Luben)
v3:
- Don't pick entity if stopped in drm_sched_select_entity (Danilo)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/scheduler/sched_entity.c | 69 ++++++++++++++++++------
drivers/gpu/drm/scheduler/sched_fence.c | 2 +-
drivers/gpu/drm/scheduler/sched_main.c | 64 +++++++++++++++++++---
include/drm/gpu_scheduler.h | 8 +++
4 files changed, 120 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 65a972b52eda..1dec97caaba3 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -83,6 +83,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
memset(entity, 0, sizeof(struct drm_sched_entity));
INIT_LIST_HEAD(&entity->list);
entity->rq = NULL;
+ entity->single_sched = NULL;
entity->guilty = guilty;
entity->num_sched_list = num_sched_list;
entity->priority = priority;
@@ -90,8 +91,17 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
RCU_INIT_POINTER(entity->last_scheduled, NULL);
RB_CLEAR_NODE(&entity->rb_tree_node);
- if(num_sched_list)
- entity->rq = &sched_list[0]->sched_rq[entity->priority];
+ if (num_sched_list) {
+ if (sched_list[0]->sched_policy !=
+ DRM_SCHED_POLICY_SINGLE_ENTITY) {
+ entity->rq = &sched_list[0]->sched_rq[entity->priority];
+ } else {
+ if (num_sched_list != 1 || sched_list[0]->single_entity)
+ return -EINVAL;
+ sched_list[0]->single_entity = entity;
+ entity->single_sched = sched_list[0];
+ }
+ }
init_completion(&entity->entity_idle);
@@ -124,7 +134,8 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
struct drm_gpu_scheduler **sched_list,
unsigned int num_sched_list)
{
- WARN_ON(!num_sched_list || !sched_list);
+ WARN_ON(!num_sched_list || !sched_list ||
+ !!entity->single_sched);
entity->sched_list = sched_list;
entity->num_sched_list = num_sched_list;
@@ -231,13 +242,15 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
{
struct drm_sched_job *job;
struct dma_fence *prev;
+ bool single_entity = !!entity->single_sched;
- if (!entity->rq)
+ if (!entity->rq && !single_entity)
return;
spin_lock(&entity->rq_lock);
entity->stopped = true;
- drm_sched_rq_remove_entity(entity->rq, entity);
+ if (!single_entity)
+ drm_sched_rq_remove_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
/* Make sure this entity is not used by the scheduler at the moment */
@@ -259,6 +272,20 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
dma_fence_put(prev);
}
+/**
+ * drm_sched_entity_to_scheduler - Get the GPU scheduler for an entity
+ * @entity: scheduler entity
+ *
+ * Returns GPU scheduler for the entity
+ */
+struct drm_gpu_scheduler *
+drm_sched_entity_to_scheduler(struct drm_sched_entity *entity)
+{
+ bool single_entity = !!entity->single_sched;
+
+ return single_entity ? entity->single_sched : entity->rq->sched;
+}
+
/**
* drm_sched_entity_flush - Flush a context entity
*
@@ -276,11 +303,12 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
struct drm_gpu_scheduler *sched;
struct task_struct *last_user;
long ret = timeout;
+ bool single_entity = !!entity->single_sched;
- if (!entity->rq)
+ if (!entity->rq && !single_entity)
return 0;
- sched = entity->rq->sched;
+ sched = drm_sched_entity_to_scheduler(entity);
/**
* The client will not queue more IBs during this fini, consume existing
* queued IBs or discard them on SIGKILL
@@ -373,7 +401,7 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
container_of(cb, struct drm_sched_entity, cb);
drm_sched_entity_clear_dep(f, cb);
- drm_sched_wakeup_if_can_queue(entity->rq->sched);
+ drm_sched_wakeup_if_can_queue(drm_sched_entity_to_scheduler(entity));
}
/**
@@ -387,6 +415,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
enum drm_sched_priority priority)
{
+ WARN_ON(!!entity->single_sched);
+
spin_lock(&entity->rq_lock);
entity->priority = priority;
spin_unlock(&entity->rq_lock);
@@ -399,7 +429,7 @@ EXPORT_SYMBOL(drm_sched_entity_set_priority);
*/
static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
{
- struct drm_gpu_scheduler *sched = entity->rq->sched;
+ struct drm_gpu_scheduler *sched = drm_sched_entity_to_scheduler(entity);
struct dma_fence *fence = entity->dependency;
struct drm_sched_fence *s_fence;
@@ -501,7 +531,8 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
* Update the entity's location in the min heap according to
* the timestamp of the next job, if any.
*/
- if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO) {
+ if (drm_sched_entity_to_scheduler(entity)->sched_policy ==
+ DRM_SCHED_POLICY_FIFO) {
struct drm_sched_job *next;
next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
@@ -524,6 +555,8 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
struct drm_gpu_scheduler *sched;
struct drm_sched_rq *rq;
+ WARN_ON(!!entity->single_sched);
+
/* single possible engine and already selected */
if (!entity->sched_list)
return;
@@ -573,12 +606,13 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
{
struct drm_sched_entity *entity = sched_job->entity;
- bool first, fifo = entity->rq->sched->sched_policy ==
- DRM_SCHED_POLICY_FIFO;
+ bool single_entity = !!entity->single_sched;
+ bool first;
ktime_t submit_ts;
trace_drm_sched_job(sched_job, entity);
- atomic_inc(entity->rq->sched->score);
+ if (!single_entity)
+ atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
/*
@@ -591,6 +625,10 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
/* first job wakes up scheduler */
if (first) {
+ struct drm_gpu_scheduler *sched =
+ drm_sched_entity_to_scheduler(entity);
+ bool fifo = sched->sched_policy == DRM_SCHED_POLICY_FIFO;
+
/* Add the entity to the run queue */
spin_lock(&entity->rq_lock);
if (entity->stopped) {
@@ -600,13 +638,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
return;
}
- drm_sched_rq_add_entity(entity->rq, entity);
+ if (!single_entity)
+ drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
if (fifo)
drm_sched_rq_update_fifo(entity, submit_ts);
- drm_sched_wakeup_if_can_queue(entity->rq->sched);
+ drm_sched_wakeup_if_can_queue(sched);
}
}
EXPORT_SYMBOL(drm_sched_entity_push_job);
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 06cedfe4b486..f6b926f5e188 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -225,7 +225,7 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
{
unsigned seq;
- fence->sched = entity->rq->sched;
+ fence->sched = drm_sched_entity_to_scheduler(entity);
seq = atomic_inc_return(&entity->fence_seq);
dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
&fence->lock, entity->fence_context, seq);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 545d5298c086..3820e9ae12c8 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -32,7 +32,8 @@
* backend operations to the scheduler like submitting a job to hardware run queue,
* returning the dependencies of a job etc.
*
- * The organisation of the scheduler is the following:
+ * The organisation of the scheduler is the following for scheduling policies
+ * DRM_SCHED_POLICY_RR and DRM_SCHED_POLICY_FIFO:
*
* 1. Each hw run queue has one scheduler
* 2. Each scheduler has multiple run queues with different priorities
@@ -43,6 +44,23 @@
*
* The jobs in a entity are always scheduled in the order that they were pushed.
*
+ * The organisation of the scheduler is the following for scheduling policy
+ * DRM_SCHED_POLICY_SINGLE_ENTITY:
+ *
+ * 1. One to one relationship between scheduler and entity
+ * 2. No priorities implemented per scheduler (single job queue)
+ * 3. No run queues in the scheduler; jobs are dequeued directly from the entity
+ * 4. The entity maintains a queue of jobs that will be scheduled on the
+ * hardware
+ *
+ * The jobs in an entity are always scheduled in the order that they were pushed
+ * regardless of scheduling policy.
+ *
+ * A policy of DRM_SCHED_POLICY_RR or DRM_SCHED_POLICY_FIFO is expected to be used
+ * when the KMD is scheduling directly on the hardware while a scheduling policy
+ * of DRM_SCHED_POLICY_SINGLE_ENTITY is expected to be used when there is a
+ * firmware scheduler.
+ *
* Note that once a job was taken from the entities queue and pushed to the
* hardware, i.e. the pending queue, the entity must not be referenced anymore
* through the jobs entity pointer.
@@ -96,6 +114,8 @@ static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti
void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
{
+ WARN_ON(!!entity->single_sched);
+
/*
* Both locks need to be grabbed, one to protect from entity->rq change
* for entity from within concurrent drm_sched_entity_select_rq and the
@@ -126,6 +146,8 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
struct drm_sched_rq *rq)
{
+ WARN_ON(sched->sched_policy == DRM_SCHED_POLICY_SINGLE_ENTITY);
+
spin_lock_init(&rq->lock);
INIT_LIST_HEAD(&rq->entities);
rq->rb_tree_root = RB_ROOT_CACHED;
@@ -144,6 +166,8 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
struct drm_sched_entity *entity)
{
+ WARN_ON(!!entity->single_sched);
+
if (!list_empty(&entity->list))
return;
@@ -166,6 +190,8 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
struct drm_sched_entity *entity)
{
+ WARN_ON(!!entity->single_sched);
+
if (list_empty(&entity->list))
return;
@@ -641,7 +667,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
struct drm_sched_entity *entity,
void *owner)
{
- if (!entity->rq)
+ if (!entity->rq && !entity->single_sched)
return -ENOENT;
job->entity = entity;
@@ -674,13 +700,16 @@ void drm_sched_job_arm(struct drm_sched_job *job)
{
struct drm_gpu_scheduler *sched;
struct drm_sched_entity *entity = job->entity;
+ bool single_entity = !!entity->single_sched;
BUG_ON(!entity);
- drm_sched_entity_select_rq(entity);
- sched = entity->rq->sched;
+ if (!single_entity)
+ drm_sched_entity_select_rq(entity);
+ sched = drm_sched_entity_to_scheduler(entity);
job->sched = sched;
- job->s_priority = entity->rq - sched->sched_rq;
+ if (!single_entity)
+ job->s_priority = entity->rq - sched->sched_rq;
job->id = atomic64_inc_return(&sched->job_id_count);
drm_sched_fence_init(job->s_fence, job->entity);
@@ -896,6 +925,14 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
if (!drm_sched_can_queue(sched))
return NULL;
+ if (sched->single_entity) {
+ if (!READ_ONCE(sched->single_entity->stopped) &&
+ drm_sched_entity_is_ready(sched->single_entity))
+ return sched->single_entity;
+
+ return NULL;
+ }
+
/* Kernel run queue has higher priority than normal run queue*/
for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
@@ -1091,6 +1128,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
return -EINVAL;
sched->ops = ops;
+ sched->single_entity = NULL;
sched->hw_submission_limit = hw_submission;
sched->name = name;
sched->submit_wq = submit_wq ? : system_wq;
@@ -1103,7 +1141,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
sched->sched_policy = default_drm_sched_policy;
else
sched->sched_policy = sched_policy;
- for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
+ for (i = DRM_SCHED_PRIORITY_MIN; sched_policy !=
+ DRM_SCHED_POLICY_SINGLE_ENTITY && i < DRM_SCHED_PRIORITY_COUNT;
+ i++)
drm_sched_rq_init(sched, &sched->sched_rq[i]);
init_waitqueue_head(&sched->job_scheduled);
@@ -1135,7 +1175,15 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
drm_sched_submit_stop(sched);
- for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
+ if (sched->single_entity) {
+ spin_lock(&sched->single_entity->rq_lock);
+ sched->single_entity->stopped = true;
+ spin_unlock(&sched->single_entity->rq_lock);
+ }
+
+ for (i = DRM_SCHED_PRIORITY_COUNT - 1; sched->sched_policy !=
+ DRM_SCHED_POLICY_SINGLE_ENTITY && i >= DRM_SCHED_PRIORITY_MIN;
+ i--) {
struct drm_sched_rq *rq = &sched->sched_rq[i];
spin_lock(&rq->lock);
@@ -1176,6 +1224,8 @@ void drm_sched_increase_karma(struct drm_sched_job *bad)
struct drm_sched_entity *entity;
struct drm_gpu_scheduler *sched = bad->sched;
+ WARN_ON(sched->sched_policy == DRM_SCHED_POLICY_SINGLE_ENTITY);
+
/* don't change @bad's karma if it's from KERNEL RQ,
* because sometimes GPU hang would cause kernel jobs (like VM updating jobs)
* corrupt but keep in mind that kernel jobs always considered good.
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 897d52a4ff4f..04eec2d7635f 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -79,6 +79,7 @@ enum drm_sched_policy {
DRM_SCHED_POLICY_DEFAULT,
DRM_SCHED_POLICY_RR,
DRM_SCHED_POLICY_FIFO,
+ DRM_SCHED_POLICY_SINGLE_ENTITY,
DRM_SCHED_POLICY_COUNT,
};
@@ -112,6 +113,9 @@ struct drm_sched_entity {
*/
struct drm_sched_rq *rq;
+ /** @single_sched: Single scheduler */
+ struct drm_gpu_scheduler *single_sched;
+
/**
* @sched_list:
*
@@ -473,6 +477,7 @@ struct drm_sched_backend_ops {
* struct drm_gpu_scheduler - scheduler instance-specific data
*
* @ops: backend operations provided by the driver.
+ * @single_entity: Single entity for the scheduler
* @hw_submission_limit: the max size of the hardware queue.
* @timeout: the time after which a job is removed from the scheduler.
* @name: name of the ring for which this scheduler is being used.
@@ -503,6 +508,7 @@ struct drm_sched_backend_ops {
*/
struct drm_gpu_scheduler {
const struct drm_sched_backend_ops *ops;
+ struct drm_sched_entity *single_entity;
uint32_t hw_submission_limit;
long timeout;
const char *name;
@@ -585,6 +591,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
struct drm_gpu_scheduler **sched_list,
unsigned int num_sched_list,
atomic_t *guilty);
+struct drm_gpu_scheduler *
+drm_sched_entity_to_scheduler(struct drm_sched_entity *entity);
long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout);
void drm_sched_entity_fini(struct drm_sched_entity *entity);
void drm_sched_entity_destroy(struct drm_sched_entity *entity);
--
2.34.1
* [Intel-xe] [PATCH v3 05/13] drm/sched: Split free_job into own work item
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
` (3 preceding siblings ...)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 04/13] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy Matthew Brost
@ 2023-09-12 2:16 ` Matthew Brost
2023-09-12 8:08 ` Boris Brezillon
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 06/13] drm/sched: Add generic scheduler message interface Matthew Brost
` (9 subsequent siblings)
14 siblings, 1 reply; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 2:16 UTC (permalink / raw)
To: dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, christian.koenig, faith.ekstrand
Rather than calling free_job and run_job in the same work item, have a
dedicated work item for each. This aligns with the design and intended use of
work queues.
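A stripped-down sketch of the free_job side (a userspace analogue — the bool
flag stands in for queue_work() on the dedicated work_free_job item, and the
pending list is reduced to a bare singly linked list; names and types here are
simplified stand-ins for the patch's structures):

```c
/* Illustrative userspace analogue of drm_sched_free_job_queue_if_ready():
 * only kick the dedicated free_job work item when the oldest pending
 * job's finished fence has signaled, and only if submission isn't paused. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct job {
	bool finished_signaled; /* stands in for the finished dma_fence */
	struct job *next;
};

struct sched {
	bool pause_submit;
	bool free_job_queued;   /* stands in for queue_work(work_free_job) */
	struct job *pending;    /* head of the pending_list */
};

/* Mirrors drm_sched_free_job_queue() */
static void free_job_queue(struct sched *s)
{
	if (!s->pause_submit)
		s->free_job_queued = true;
}

/* Mirrors drm_sched_free_job_queue_if_ready() */
static void free_job_queue_if_ready(struct sched *s)
{
	if (s->pending && s->pending->finished_signaled)
		free_job_queue(s);
}
```

Because run_job and free_job now live in separate work items, the scheduler
can requeue one without the other; the "if ready" check keeps the free_job
work item from spinning when nothing at the head of the pending list has
finished yet.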
v2:
- Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
timestamp in free_job() work item (Danilo)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 143 ++++++++++++++++++-------
include/drm/gpu_scheduler.h | 8 +-
2 files changed, 110 insertions(+), 41 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 3820e9ae12c8..d28b6751256e 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -213,11 +213,12 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
* drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
*
* @rq: scheduler run queue to check.
+ * @dequeue: dequeue selected entity
*
* Try to find a ready entity, returns NULL if none found.
*/
static struct drm_sched_entity *
-drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq, bool dequeue)
{
struct drm_sched_entity *entity;
@@ -227,8 +228,10 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
if (entity) {
list_for_each_entry_continue(entity, &rq->entities, list) {
if (drm_sched_entity_is_ready(entity)) {
- rq->current_entity = entity;
- reinit_completion(&entity->entity_idle);
+ if (dequeue) {
+ rq->current_entity = entity;
+ reinit_completion(&entity->entity_idle);
+ }
spin_unlock(&rq->lock);
return entity;
}
@@ -238,8 +241,10 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
list_for_each_entry(entity, &rq->entities, list) {
if (drm_sched_entity_is_ready(entity)) {
- rq->current_entity = entity;
- reinit_completion(&entity->entity_idle);
+ if (dequeue) {
+ rq->current_entity = entity;
+ reinit_completion(&entity->entity_idle);
+ }
spin_unlock(&rq->lock);
return entity;
}
@@ -257,11 +262,12 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
* drm_sched_rq_select_entity_fifo - Select an entity which provides a job to run
*
* @rq: scheduler run queue to check.
+ * @dequeue: dequeue selected entity
*
* Find oldest waiting ready entity, returns NULL if none found.
*/
static struct drm_sched_entity *
-drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq, bool dequeue)
{
struct rb_node *rb;
@@ -271,8 +277,10 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
if (drm_sched_entity_is_ready(entity)) {
- rq->current_entity = entity;
- reinit_completion(&entity->entity_idle);
+ if (dequeue) {
+ rq->current_entity = entity;
+ reinit_completion(&entity->entity_idle);
+ }
break;
}
}
@@ -282,13 +290,54 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
}
/**
- * drm_sched_submit_queue - scheduler queue submission
+ * drm_sched_run_job_queue - queue job submission
* @sched: scheduler instance
*/
-static void drm_sched_submit_queue(struct drm_gpu_scheduler *sched)
+static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
{
if (!READ_ONCE(sched->pause_submit))
- queue_work(sched->submit_wq, &sched->work_submit);
+ queue_work(sched->submit_wq, &sched->work_run_job);
+}
+
+static struct drm_sched_entity *
+drm_sched_select_entity(struct drm_gpu_scheduler *sched, bool dequeue);
+
+/**
+ * drm_sched_run_job_queue_if_ready - queue job submission if ready
+ * @sched: scheduler instance
+ */
+static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
+{
+ if (drm_sched_select_entity(sched, false))
+ drm_sched_run_job_queue(sched);
+}
+
+/**
+ * drm_sched_free_job_queue - queue free job
+ *
+ * @sched: scheduler instance to queue free job
+ */
+static void drm_sched_free_job_queue(struct drm_gpu_scheduler *sched)
+{
+ if (!READ_ONCE(sched->pause_submit))
+ queue_work(sched->submit_wq, &sched->work_free_job);
+}
+
+/**
+ * drm_sched_free_job_queue_if_ready - queue free job if ready
+ *
+ * @sched: scheduler instance to queue free job
+ */
+static void drm_sched_free_job_queue_if_ready(struct drm_gpu_scheduler *sched)
+{
+ struct drm_sched_job *job;
+
+ spin_lock(&sched->job_list_lock);
+ job = list_first_entry_or_null(&sched->pending_list,
+ struct drm_sched_job, list);
+ if (job && dma_fence_is_signaled(&job->s_fence->finished))
+ drm_sched_free_job_queue(sched);
+ spin_unlock(&sched->job_list_lock);
}
/**
@@ -310,7 +359,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
dma_fence_get(&s_fence->finished);
drm_sched_fence_finished(s_fence, result);
dma_fence_put(&s_fence->finished);
- drm_sched_submit_queue(sched);
+ drm_sched_free_job_queue(sched);
}
/**
@@ -906,18 +955,19 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched)
{
if (drm_sched_can_queue(sched))
- drm_sched_submit_queue(sched);
+ drm_sched_run_job_queue(sched);
}
/**
* drm_sched_select_entity - Select next entity to process
*
* @sched: scheduler instance
+ * @dequeue: dequeue selected entity
*
* Returns the entity to process or NULL if none are found.
*/
static struct drm_sched_entity *
-drm_sched_select_entity(struct drm_gpu_scheduler *sched)
+drm_sched_select_entity(struct drm_gpu_scheduler *sched, bool dequeue)
{
struct drm_sched_entity *entity;
int i;
@@ -936,8 +986,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
/* Kernel run queue has higher priority than normal run queue*/
for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
- drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]) :
- drm_sched_rq_select_entity_rr(&sched->sched_rq[i]);
+ drm_sched_rq_select_entity_fifo(&sched->sched_rq[i],
+ dequeue) :
+ drm_sched_rq_select_entity_rr(&sched->sched_rq[i],
+ dequeue);
if (entity)
break;
}
@@ -974,8 +1026,10 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
typeof(*next), list);
if (next) {
- next->s_fence->scheduled.timestamp =
- job->s_fence->finished.timestamp;
+ if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
+ &next->s_fence->scheduled.flags))
+ next->s_fence->scheduled.timestamp =
+ job->s_fence->finished.timestamp;
/* start TO timer for next job */
drm_sched_start_timeout(sched);
}
@@ -1025,30 +1079,44 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
EXPORT_SYMBOL(drm_sched_pick_best);
/**
- * drm_sched_main - main scheduler thread
+ * drm_sched_free_job_work - worker to call free_job
*
- * @param: scheduler instance
+ * @w: free job work
*/
-static void drm_sched_main(struct work_struct *w)
+static void drm_sched_free_job_work(struct work_struct *w)
{
struct drm_gpu_scheduler *sched =
- container_of(w, struct drm_gpu_scheduler, work_submit);
- struct drm_sched_entity *entity;
+ container_of(w, struct drm_gpu_scheduler, work_free_job);
struct drm_sched_job *cleanup_job;
- int r;
if (READ_ONCE(sched->pause_submit))
return;
cleanup_job = drm_sched_get_cleanup_job(sched);
- entity = drm_sched_select_entity(sched);
+ if (cleanup_job) {
+ sched->ops->free_job(cleanup_job);
+
+ drm_sched_free_job_queue_if_ready(sched);
+ drm_sched_run_job_queue_if_ready(sched);
+ }
+}
- if (!entity && !cleanup_job)
- return; /* No more work */
+/**
+ * drm_sched_run_job_work - worker to call run_job
+ *
+ * @w: run job work
+ */
+static void drm_sched_run_job_work(struct work_struct *w)
+{
+ struct drm_gpu_scheduler *sched =
+ container_of(w, struct drm_gpu_scheduler, work_run_job);
+ struct drm_sched_entity *entity;
+ int r;
- if (cleanup_job)
- sched->ops->free_job(cleanup_job);
+ if (READ_ONCE(sched->pause_submit))
+ return;
+ entity = drm_sched_select_entity(sched, true);
if (entity) {
struct dma_fence *fence;
struct drm_sched_fence *s_fence;
@@ -1057,9 +1125,7 @@ static void drm_sched_main(struct work_struct *w)
sched_job = drm_sched_entity_pop_job(entity);
if (!sched_job) {
complete_all(&entity->entity_idle);
- if (!cleanup_job)
- return; /* No more work */
- goto again;
+ return; /* No more work */
}
s_fence = sched_job->s_fence;
@@ -1089,10 +1155,8 @@ static void drm_sched_main(struct work_struct *w)
}
wake_up(&sched->job_scheduled);
+ drm_sched_run_job_queue_if_ready(sched);
}
-
-again:
- drm_sched_submit_queue(sched);
}
/**
@@ -1151,7 +1215,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
spin_lock_init(&sched->job_list_lock);
atomic_set(&sched->hw_rq_count, 0);
INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
- INIT_WORK(&sched->work_submit, drm_sched_main);
+ INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
+ INIT_WORK(&sched->work_free_job, drm_sched_free_job_work);
atomic_set(&sched->_score, 0);
atomic64_set(&sched->job_id_count, 0);
sched->pause_submit = false;
@@ -1276,7 +1341,8 @@ EXPORT_SYMBOL(drm_sched_submit_ready);
void drm_sched_submit_stop(struct drm_gpu_scheduler *sched)
{
WRITE_ONCE(sched->pause_submit, true);
- cancel_work_sync(&sched->work_submit);
+ cancel_work_sync(&sched->work_run_job);
+ cancel_work_sync(&sched->work_free_job);
}
EXPORT_SYMBOL(drm_sched_submit_stop);
@@ -1288,6 +1354,7 @@ EXPORT_SYMBOL(drm_sched_submit_stop);
void drm_sched_submit_start(struct drm_gpu_scheduler *sched)
{
WRITE_ONCE(sched->pause_submit, false);
- queue_work(sched->submit_wq, &sched->work_submit);
+ queue_work(sched->submit_wq, &sched->work_run_job);
+ queue_work(sched->submit_wq, &sched->work_free_job);
}
EXPORT_SYMBOL(drm_sched_submit_start);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 04eec2d7635f..fbc083a92757 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -487,9 +487,10 @@ struct drm_sched_backend_ops {
* finished.
* @hw_rq_count: the number of jobs currently in the hardware queue.
* @job_id_count: used to assign unique id to the each job.
- * @submit_wq: workqueue used to queue @work_submit
+ * @submit_wq: workqueue used to queue @work_run_job and @work_free_job
* @timeout_wq: workqueue used to queue @work_tdr
- * @work_submit: schedules jobs and cleans up entities
+ * @work_run_job: schedules jobs
+ * @work_free_job: cleans up jobs
* @work_tdr: schedules a delayed call to @drm_sched_job_timedout after the
* timeout interval is over.
* @pending_list: the list of jobs which are currently in the job queue.
@@ -518,7 +519,8 @@ struct drm_gpu_scheduler {
atomic64_t job_id_count;
struct workqueue_struct *submit_wq;
struct workqueue_struct *timeout_wq;
- struct work_struct work_submit;
+ struct work_struct work_run_job;
+ struct work_struct work_free_job;
struct delayed_work work_tdr;
struct list_head pending_list;
spinlock_t job_list_lock;
--
2.34.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Intel-xe] [PATCH v3 06/13] drm/sched: Add generic scheduler message interface
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
` (4 preceding siblings ...)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 05/13] drm/sched: Split free_job into own work item Matthew Brost
@ 2023-09-12 2:16 ` Matthew Brost
2023-09-12 8:23 ` Boris Brezillon
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 07/13] drm/sched: Add drm_sched_start_timeout_unlocked helper Matthew Brost
` (8 subsequent siblings)
14 siblings, 1 reply; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 2:16 UTC (permalink / raw)
To: dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, christian.koenig, faith.ekstrand
Add a generic scheduler message interface which sends messages to the
backend from the drm_gpu_scheduler main submission thread. The idea is
that some of these messages modify state in a drm_sched_entity which is
also modified during submission. By scheduling these messages and
submission in the same thread there is no race when changing states in
drm_sched_entity.
This interface will be used in Xe, a new Intel GPU driver, to clean up,
suspend, resume, and change scheduling properties of a drm_sched_entity.
The interface is designed to be generic and extendable, with only the
backend understanding the messages.
v2:
- (Christian) Use a dedicated work item
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 98 ++++++++++++++++++++++++++
include/drm/gpu_scheduler.h | 34 ++++++++-
2 files changed, 131 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index d28b6751256e..13697f45bd7b 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -340,6 +340,35 @@ static void drm_sched_free_job_queue_if_ready(struct drm_gpu_scheduler *sched)
spin_unlock(&sched->job_list_lock);
}
+/**
+ * drm_sched_process_msg_queue - queue process msg worker
+ *
+ * @sched: scheduler instance to queue process_msg worker
+ */
+static void drm_sched_process_msg_queue(struct drm_gpu_scheduler *sched)
+{
+ if (!READ_ONCE(sched->pause_submit))
+ queue_work(sched->submit_wq, &sched->work_process_msg);
+}
+
+/**
+ * drm_sched_process_msg_queue_if_ready - queue process msg worker if ready
+ *
+ * @sched: scheduler instance to queue process_msg worker
+ */
+static void
+drm_sched_process_msg_queue_if_ready(struct drm_gpu_scheduler *sched)
+{
+ struct drm_sched_msg *msg;
+
+ spin_lock(&sched->job_list_lock);
+ msg = list_first_entry_or_null(&sched->msgs,
+ struct drm_sched_msg, link);
+ if (msg)
+ drm_sched_process_msg_queue(sched);
+ spin_unlock(&sched->job_list_lock);
+}
+
/**
* drm_sched_job_done - complete a job
* @s_job: pointer to the job which is done
@@ -1078,6 +1107,71 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
}
EXPORT_SYMBOL(drm_sched_pick_best);
+/**
+ * drm_sched_add_msg - add scheduler message
+ *
+ * @sched: scheduler instance
+ * @msg: message to be added
+ *
+ * Can and will pass any jobs waiting on dependencies or in a runnable queue.
+ * Message processing will stop if the scheduler run wq is stopped and resume
+ * when the run wq is started.
+ */
+void drm_sched_add_msg(struct drm_gpu_scheduler *sched,
+ struct drm_sched_msg *msg)
+{
+ spin_lock(&sched->job_list_lock);
+ list_add_tail(&msg->link, &sched->msgs);
+ spin_unlock(&sched->job_list_lock);
+
+ drm_sched_process_msg_queue(sched);
+}
+EXPORT_SYMBOL(drm_sched_add_msg);
+
+/**
+ * drm_sched_get_msg - get scheduler message
+ *
+ * @sched: scheduler instance
+ *
+ * Returns NULL or message
+ */
+static struct drm_sched_msg *
+drm_sched_get_msg(struct drm_gpu_scheduler *sched)
+{
+ struct drm_sched_msg *msg;
+
+ spin_lock(&sched->job_list_lock);
+ msg = list_first_entry_or_null(&sched->msgs,
+ struct drm_sched_msg, link);
+ if (msg)
+ list_del(&msg->link);
+ spin_unlock(&sched->job_list_lock);
+
+ return msg;
+}
+
+/**
+ * drm_sched_process_msg_work - worker to call process_msg
+ *
+ * @w: process msg work
+ */
+static void drm_sched_process_msg_work(struct work_struct *w)
+{
+ struct drm_gpu_scheduler *sched =
+ container_of(w, struct drm_gpu_scheduler, work_process_msg);
+ struct drm_sched_msg *msg;
+
+ if (READ_ONCE(sched->pause_submit))
+ return;
+
+ msg = drm_sched_get_msg(sched);
+ if (msg) {
+ sched->ops->process_msg(msg);
+
+ drm_sched_process_msg_queue_if_ready(sched);
+ }
+}
+
/**
* drm_sched_free_job_work - worker to call free_job
*
@@ -1212,11 +1306,13 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
init_waitqueue_head(&sched->job_scheduled);
INIT_LIST_HEAD(&sched->pending_list);
+ INIT_LIST_HEAD(&sched->msgs);
spin_lock_init(&sched->job_list_lock);
atomic_set(&sched->hw_rq_count, 0);
INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
INIT_WORK(&sched->work_free_job, drm_sched_free_job_work);
+ INIT_WORK(&sched->work_process_msg, drm_sched_process_msg_work);
atomic_set(&sched->_score, 0);
atomic64_set(&sched->job_id_count, 0);
sched->pause_submit = false;
@@ -1343,6 +1439,7 @@ void drm_sched_submit_stop(struct drm_gpu_scheduler *sched)
WRITE_ONCE(sched->pause_submit, true);
cancel_work_sync(&sched->work_run_job);
cancel_work_sync(&sched->work_free_job);
+ cancel_work_sync(&sched->work_process_msg);
}
EXPORT_SYMBOL(drm_sched_submit_stop);
@@ -1356,5 +1453,6 @@ void drm_sched_submit_start(struct drm_gpu_scheduler *sched)
WRITE_ONCE(sched->pause_submit, false);
queue_work(sched->submit_wq, &sched->work_run_job);
queue_work(sched->submit_wq, &sched->work_free_job);
+ queue_work(sched->submit_wq, &sched->work_process_msg);
}
EXPORT_SYMBOL(drm_sched_submit_start);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index fbc083a92757..5d753ecb5d71 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -394,6 +394,23 @@ enum drm_gpu_sched_stat {
DRM_GPU_SCHED_STAT_ENODEV,
};
+/**
+ * struct drm_sched_msg - an in-band (relative to GPU scheduler run queue)
+ * message
+ *
+ * Generic enough for backend defined messages, backend can expand if needed.
+ */
+struct drm_sched_msg {
+ /** @link: list link into the gpu scheduler list of messages */
+ struct list_head link;
+ /**
+ * @private_data: opaque pointer to message private data (backend defined)
+ */
+ void *private_data;
+ /** @opcode: opcode of message (backend defined) */
+ unsigned int opcode;
+};
+
/**
* struct drm_sched_backend_ops - Define the backend operations
* called by the scheduler
@@ -471,6 +488,12 @@ struct drm_sched_backend_ops {
* and it's time to clean it up.
*/
void (*free_job)(struct drm_sched_job *sched_job);
+
+ /**
+ * @process_msg: Process a message. Allowed to block; it is this
+ * function's responsibility to free the message if dynamically allocated.
+ */
+ void (*process_msg)(struct drm_sched_msg *msg);
};
/**
@@ -482,15 +505,18 @@ struct drm_sched_backend_ops {
* @timeout: the time after which a job is removed from the scheduler.
* @name: name of the ring for which this scheduler is being used.
* @sched_rq: priority wise array of run queues.
+ * @msgs: list of messages to be processed in @work_process_msg
* @job_scheduled: once @drm_sched_entity_do_release is called the scheduler
* waits on this wait queue until all the scheduled jobs are
* finished.
* @hw_rq_count: the number of jobs currently in the hardware queue.
* @job_id_count: used to assign unique id to the each job.
- * @submit_wq: workqueue used to queue @work_run_job and @work_free_job
+ * @submit_wq: workqueue used to queue @work_run_job, @work_free_job, and
+ * @work_process_msg
* @timeout_wq: workqueue used to queue @work_tdr
* @work_run_job: schedules jobs
* @work_free_job: cleans up jobs
+ * @work_process_msg: processes messages
* @work_tdr: schedules a delayed call to @drm_sched_job_timedout after the
* timeout interval is over.
* @pending_list: the list of jobs which are currently in the job queue.
@@ -502,6 +528,8 @@ struct drm_sched_backend_ops {
* @sched_policy: Schedule policy for scheduler
* @ready: marks if the underlying HW is ready to work
* @free_guilty: A hit to time out handler to free the guilty job.
+ * @pause_submit: pause queuing of @work_run_job, @work_free_job, and
+ * @work_process_msg on @submit_wq
* @pause_submit: pause queuing of @work_submit on @submit_wq
* @dev: system &struct device
*
@@ -514,6 +542,7 @@ struct drm_gpu_scheduler {
long timeout;
const char *name;
struct drm_sched_rq sched_rq[DRM_SCHED_PRIORITY_COUNT];
+ struct list_head msgs;
wait_queue_head_t job_scheduled;
atomic_t hw_rq_count;
atomic64_t job_id_count;
@@ -521,6 +550,7 @@ struct drm_gpu_scheduler {
struct workqueue_struct *timeout_wq;
struct work_struct work_run_job;
struct work_struct work_free_job;
+ struct work_struct work_process_msg;
struct delayed_work work_tdr;
struct list_head pending_list;
spinlock_t job_list_lock;
@@ -568,6 +598,8 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
void drm_sched_job_cleanup(struct drm_sched_job *job);
void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched);
+void drm_sched_add_msg(struct drm_gpu_scheduler *sched,
+ struct drm_sched_msg *msg);
bool drm_sched_submit_ready(struct drm_gpu_scheduler *sched);
void drm_sched_submit_stop(struct drm_gpu_scheduler *sched);
void drm_sched_submit_start(struct drm_gpu_scheduler *sched);
--
2.34.1
* [Intel-xe] [PATCH v3 07/13] drm/sched: Add drm_sched_start_timeout_unlocked helper
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
` (5 preceding siblings ...)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 06/13] drm/sched: Add generic scheduler message interface Matthew Brost
@ 2023-09-12 2:16 ` Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 08/13] drm/sched: Start run wq before TDR in drm_sched_start Matthew Brost
` (7 subsequent siblings)
14 siblings, 0 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 2:16 UTC (permalink / raw)
To: dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, christian.koenig, faith.ekstrand
Also add a lockdep assert to drm_sched_start_timeout.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 13697f45bd7b..bc080e09d9ed 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -412,11 +412,20 @@ static void drm_sched_job_done_cb(struct dma_fence *f, struct dma_fence_cb *cb)
*/
static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
{
+ lockdep_assert_held(&sched->job_list_lock);
+
if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
!list_empty(&sched->pending_list))
queue_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout);
}
+static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
+{
+ spin_lock(&sched->job_list_lock);
+ drm_sched_start_timeout(sched);
+ spin_unlock(&sched->job_list_lock);
+}
+
/**
* drm_sched_fault - immediately start timeout handler
*
@@ -529,11 +538,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
spin_unlock(&sched->job_list_lock);
}
- if (status != DRM_GPU_SCHED_STAT_ENODEV) {
- spin_lock(&sched->job_list_lock);
- drm_sched_start_timeout(sched);
- spin_unlock(&sched->job_list_lock);
- }
+ if (status != DRM_GPU_SCHED_STAT_ENODEV)
+ drm_sched_start_timeout_unlocked(sched);
}
/**
@@ -659,11 +665,8 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
drm_sched_job_done(s_job, -ECANCELED);
}
- if (full_recovery) {
- spin_lock(&sched->job_list_lock);
- drm_sched_start_timeout(sched);
- spin_unlock(&sched->job_list_lock);
- }
+ if (full_recovery)
+ drm_sched_start_timeout_unlocked(sched);
drm_sched_submit_start(sched);
}
--
2.34.1
* [Intel-xe] [PATCH v3 08/13] drm/sched: Start run wq before TDR in drm_sched_start
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
` (6 preceding siblings ...)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 07/13] drm/sched: Add drm_sched_start_timeout_unlocked helper Matthew Brost
@ 2023-09-12 2:16 ` Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 09/13] drm/sched: Submit job before starting TDR Matthew Brost
` (6 subsequent siblings)
14 siblings, 0 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 2:16 UTC (permalink / raw)
To: dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, christian.koenig, faith.ekstrand
If the TDR is set to a very small value it can fire before the run wq is
started in drm_sched_start. The run wq is expected to be running when
the TDR fires; fix this ordering so the expectation is always met.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index bc080e09d9ed..c627d3e6494a 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -665,10 +665,10 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
drm_sched_job_done(s_job, -ECANCELED);
}
+ drm_sched_submit_start(sched);
+
if (full_recovery)
drm_sched_start_timeout_unlocked(sched);
-
- drm_sched_submit_start(sched);
}
EXPORT_SYMBOL(drm_sched_start);
--
2.34.1
* [Intel-xe] [PATCH v3 09/13] drm/sched: Submit job before starting TDR
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
` (7 preceding siblings ...)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 08/13] drm/sched: Start run wq before TDR in drm_sched_start Matthew Brost
@ 2023-09-12 2:16 ` Matthew Brost
2023-09-14 2:56 ` Luben Tuikov
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 10/13] drm/sched: Add helper to set TDR timeout Matthew Brost
` (5 subsequent siblings)
14 siblings, 1 reply; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 2:16 UTC (permalink / raw)
To: dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, christian.koenig, faith.ekstrand
If the TDR is set to a small value, it can fire before a job is
submitted in drm_sched_main. The job should always be submitted before
the TDR fires; fix this ordering.
v2:
- Add to pending list before run_job, start TDR after (Luben, Boris)
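The invariant this reorder (together with the previous patch) establishes can be sketched with the scheduler steps reduced to plain function calls. The names below are illustrative stand-ins for the kernel helpers, not the real API, and the list and timer are simple variables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct job {
	struct job *next;
	bool ran;
};

struct sched {
	struct job *pending;	/* head of the pending_list analog */
	bool tdr_armed;
};

/* Analog of drm_sched_job_begin() after this patch: only put the job
 * on the pending list; the timer is no longer armed here. */
static void job_begin(struct sched *s, struct job *j)
{
	j->next = s->pending;
	s->pending = j;
}

/* Analog of starting the timeout: it may only be armed once there is
 * a job to police -- a TDR firing with an empty pending list would
 * have nothing to time out. */
static void start_timeout(struct sched *s)
{
	assert(s->pending != NULL);	/* invariant the reorder preserves */
	s->tdr_armed = true;
}

/* Analog of drm_sched_run_job_work() after this patch:
 * begin (job visible to the TDR), run, then arm the timer last. */
static void run_job_work(struct sched *s, struct job *j)
{
	job_begin(s, j);	/* 1: job on the pending list */
	j->ran = true;		/* 2: sched->ops->run_job() */
	start_timeout(s);	/* 3: TDR armed after submission */
}
```

With the pre-patch ordering (arm in `job_begin`, before `run_job`), an arbitrarily small timeout could expire between steps 1 and 2 and find a job that was never handed to the hardware; arming last removes that window.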
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index c627d3e6494a..9dbfab7be2c6 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -498,7 +498,6 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
spin_lock(&sched->job_list_lock);
list_add_tail(&s_job->list, &sched->pending_list);
- drm_sched_start_timeout(sched);
spin_unlock(&sched->job_list_lock);
}
@@ -1234,6 +1233,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
fence = sched->ops->run_job(sched_job);
complete_all(&entity->entity_idle);
drm_sched_fence_scheduled(s_fence, fence);
+ drm_sched_start_timeout_unlocked(sched);
if (!IS_ERR_OR_NULL(fence)) {
/* Drop for original kref_init of the fence */
--
2.34.1
* [Intel-xe] [PATCH v3 10/13] drm/sched: Add helper to set TDR timeout
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
` (8 preceding siblings ...)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 09/13] drm/sched: Submit job before starting TDR Matthew Brost
@ 2023-09-12 2:16 ` Matthew Brost
2023-09-14 2:38 ` Luben Tuikov
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill Matthew Brost
` (4 subsequent siblings)
14 siblings, 1 reply; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 2:16 UTC (permalink / raw)
To: dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, christian.koenig, faith.ekstrand
Add a helper to set the TDR timeout and restart the TDR with the new
timeout value. This will be used in Xe, a new Intel GPU driver, to
trigger the TDR to clean up a drm_sched_entity that encounters errors.
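The cancel-then-rearm pattern the helper uses can be modelled in userspace. This is a sketch under stated assumptions, not the kernel implementation: MAX_SCHEDULE_TIMEOUT is mapped to -1, the delayed work is a plain variable, and a pthread mutex stands in for job_list_lock.

```c
#include <stdbool.h>
#include <pthread.h>

#define MAX_SCHEDULE_TIMEOUT (-1L)

struct sched {
	pthread_mutex_t job_list_lock;
	long timeout;
	bool pending_jobs;	/* analog of !list_empty(&sched->pending_list) */
	long armed_timeout;	/* period the "delayed work" was queued with;
				 * 0 means the timer is not armed */
};

/* Analog of drm_sched_start_timeout(): arm only with a finite timeout
 * and work to police; the caller holds job_list_lock. */
static void sched_start_timeout(struct sched *s)
{
	if (s->timeout != MAX_SCHEDULE_TIMEOUT && s->pending_jobs)
		s->armed_timeout = s->timeout;
}

/* Analog of the drm_sched_set_timeout() helper this patch adds:
 * store the new period, cancel the old timer, and re-arm, all under
 * the same lock so a stale period can never be re-armed. */
static void sched_set_timeout(struct sched *s, long timeout)
{
	pthread_mutex_lock(&s->job_list_lock);
	s->timeout = timeout;
	s->armed_timeout = 0;		/* cancel_delayed_work(&work_tdr) */
	sched_start_timeout(s);		/* restart with the new period */
	pthread_mutex_unlock(&s->job_list_lock);
}
```

Doing the cancel and re-arm under job_list_lock matters because drm_sched_start_timeout() elsewhere also runs under that lock; without it, a concurrent arming could race with the period update.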
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 18 ++++++++++++++++++
include/drm/gpu_scheduler.h | 1 +
2 files changed, 19 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 9dbfab7be2c6..689fb6686e01 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -426,6 +426,24 @@ static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
spin_unlock(&sched->job_list_lock);
}
+/**
+ * drm_sched_set_timeout - set timeout for reset worker
+ *
+ * @sched: scheduler instance to set and (re)-start the worker for
+ * @timeout: timeout period
+ *
+ * Set and (re)-start the timeout for the given scheduler.
+ */
+void drm_sched_set_timeout(struct drm_gpu_scheduler *sched, long timeout)
+{
+ spin_lock(&sched->job_list_lock);
+ sched->timeout = timeout;
+ cancel_delayed_work(&sched->work_tdr);
+ drm_sched_start_timeout(sched);
+ spin_unlock(&sched->job_list_lock);
+}
+EXPORT_SYMBOL(drm_sched_set_timeout);
+
/**
* drm_sched_fault - immediately start timeout handler
*
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 5d753ecb5d71..b7b818cd81b6 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -596,6 +596,7 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
struct drm_gpu_scheduler **sched_list,
unsigned int num_sched_list);
+void drm_sched_set_timeout(struct drm_gpu_scheduler *sched, long timeout);
void drm_sched_job_cleanup(struct drm_sched_job *job);
void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched);
void drm_sched_add_msg(struct drm_gpu_scheduler *sched,
--
2.34.1
* [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
` (9 preceding siblings ...)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 10/13] drm/sched: Add helper to set TDR timeout Matthew Brost
@ 2023-09-12 2:16 ` Matthew Brost
2023-09-12 8:44 ` Boris Brezillon
` (2 more replies)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 12/13] drm/sched/doc: Add Entity teardown documentation Matthew Brost
` (3 subsequent siblings)
14 siblings, 3 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 2:16 UTC (permalink / raw)
To: dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, christian.koenig, faith.ekstrand
Wait for pending jobs to complete before signaling queued jobs. This
ensures the dma-fence signaling order is correct and also ensures the
entity is not running on the hardware after drm_sched_entity_flush or
drm_sched_entity_fini returns.
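The per-entity pending-job accounting this patch introduces can be sketched in userspace. A condition variable plus predicate loop stands in for the kernel's struct completion (so no explicit reinit_completion() analog is needed on the 0 -> 1 transition), and a pthread mutex stands in for job_list_lock; none of the names below are the real kernel API.

```c
#include <pthread.h>

struct entity {
	pthread_mutex_t lock;		/* stand-in for job_list_lock */
	pthread_cond_t jobs_done;	/* stand-in for struct completion */
	unsigned int pending_job_count;
};

/* Analog of drm_sched_add_pending_job(): account the job; the kernel
 * version also links it onto pending_list and, on the 0 -> 1
 * transition, calls reinit_completion(&entity->jobs_done). */
static void entity_add_pending_job(struct entity *e)
{
	pthread_mutex_lock(&e->lock);
	e->pending_job_count++;
	pthread_mutex_unlock(&e->lock);
}

/* Analog of drm_sched_remove_pending_job(): the last job completing
 * signals waiters, mirroring complete_all(&entity->jobs_done). */
static void entity_remove_pending_job(struct entity *e)
{
	pthread_mutex_lock(&e->lock);
	if (!--e->pending_job_count)
		pthread_cond_broadcast(&e->jobs_done);
	pthread_mutex_unlock(&e->lock);
}

/* Analog of the wait_for_completion(&entity->jobs_done) added to
 * drm_sched_entity_kill(): block until the hardware has drained. */
static void entity_wait_jobs_done(struct entity *e)
{
	pthread_mutex_lock(&e->lock);
	while (e->pending_job_count)
		pthread_cond_wait(&e->jobs_done, &e->lock);
	pthread_mutex_unlock(&e->lock);
}
```

The teardown path then becomes: wait for entity_idle, wait for jobs_done, and only then signal the still-queued jobs, which is what preserves dma-fence signaling order.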
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
drivers/gpu/drm/scheduler/sched_entity.c | 7 ++-
drivers/gpu/drm/scheduler/sched_main.c | 50 ++++++++++++++++++---
include/drm/gpu_scheduler.h | 18 ++++++++
4 files changed, 70 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index fb5dad687168..7835c0da65c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1873,7 +1873,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct amdgpu_ring *ring)
list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
if (dma_fence_is_signaled(&s_job->s_fence->finished)) {
/* remove job from ring_mirror_list */
- list_del_init(&s_job->list);
+ drm_sched_remove_pending_job(s_job);
sched->ops->free_job(s_job);
continue;
}
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 1dec97caaba3..37557fbb96d0 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -104,9 +104,11 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
}
init_completion(&entity->entity_idle);
+ init_completion(&entity->jobs_done);
- /* We start in an idle state. */
+ /* We start in an idle and jobs done state. */
complete_all(&entity->entity_idle);
+ complete_all(&entity->jobs_done);
spin_lock_init(&entity->rq_lock);
spsc_queue_init(&entity->job_queue);
@@ -256,6 +258,9 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
/* Make sure this entity is not used by the scheduler at the moment */
wait_for_completion(&entity->entity_idle);
+ /* Make sure all pending jobs are done */
+ wait_for_completion(&entity->jobs_done);
+
/* The entity is guaranteed to not be used by the scheduler */
prev = rcu_dereference_check(entity->last_scheduled, true);
dma_fence_get(prev);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 689fb6686e01..ed6f5680793a 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -510,12 +510,52 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
}
EXPORT_SYMBOL(drm_sched_resume_timeout);
+/**
+ * drm_sched_add_pending_job - Add pending job to scheduler
+ *
+ * @job: scheduler job to add
+ * @tail: add to tail of pending list
+ */
+void drm_sched_add_pending_job(struct drm_sched_job *job, bool tail)
+{
+ struct drm_gpu_scheduler *sched = job->sched;
+ struct drm_sched_entity *entity = job->entity;
+
+ lockdep_assert_held(&sched->job_list_lock);
+
+ if (tail)
+ list_add_tail(&job->list, &sched->pending_list);
+ else
+ list_add(&job->list, &sched->pending_list);
+ if (!entity->pending_job_count++)
+ reinit_completion(&entity->jobs_done);
+}
+EXPORT_SYMBOL(drm_sched_add_pending_job);
+
+/**
+ * drm_sched_remove_pending_job - Remove pending job from scheduler
+ *
+ * @job: scheduler job to remove
+ */
+void drm_sched_remove_pending_job(struct drm_sched_job *job)
+{
+ struct drm_gpu_scheduler *sched = job->sched;
+ struct drm_sched_entity *entity = job->entity;
+
+ lockdep_assert_held(&sched->job_list_lock);
+
+ list_del_init(&job->list);
+ if (!--entity->pending_job_count)
+ complete_all(&entity->jobs_done);
+}
+EXPORT_SYMBOL(drm_sched_remove_pending_job);
+
static void drm_sched_job_begin(struct drm_sched_job *s_job)
{
struct drm_gpu_scheduler *sched = s_job->sched;
spin_lock(&sched->job_list_lock);
- list_add_tail(&s_job->list, &sched->pending_list);
+ drm_sched_add_pending_job(s_job, true);
spin_unlock(&sched->job_list_lock);
}
@@ -538,7 +578,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
* drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
* is parked at which point it's safe.
*/
- list_del_init(&job->list);
+ drm_sched_remove_pending_job(job);
spin_unlock(&sched->job_list_lock);
status = job->sched->ops->timedout_job(job);
@@ -589,7 +629,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
* Add at the head of the queue to reflect it was the earliest
* job extracted.
*/
- list_add(&bad->list, &sched->pending_list);
+ drm_sched_add_pending_job(bad, false);
/*
* Iterate the job list from later to earlier one and either deactive
@@ -611,7 +651,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
* Locking here is for concurrent resume timeout
*/
spin_lock(&sched->job_list_lock);
- list_del_init(&s_job->list);
+ drm_sched_remove_pending_job(s_job);
spin_unlock(&sched->job_list_lock);
/*
@@ -1066,7 +1106,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
/* remove job from pending_list */
- list_del_init(&job->list);
+ drm_sched_remove_pending_job(job);
/* cancel this job's TO timer */
cancel_delayed_work(&sched->work_tdr);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index b7b818cd81b6..7c628f36fe78 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -233,6 +233,21 @@ struct drm_sched_entity {
*/
struct completion entity_idle;
+ /**
+ * @pending_job_count:
+ *
+ * Number of pending jobs.
+ */
+ unsigned int pending_job_count;
+
+ /**
+ * @jobs_done:
+ *
+ * Signals when entity has no pending jobs, used to sequence entity
+ * cleanup in drm_sched_entity_fini().
+ */
+ struct completion jobs_done;
+
/**
* @oldest_job_waiting:
*
@@ -656,4 +671,7 @@ struct drm_gpu_scheduler *
drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
unsigned int num_sched_list);
+void drm_sched_add_pending_job(struct drm_sched_job *job, bool tail);
+void drm_sched_remove_pending_job(struct drm_sched_job *job);
+
#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Intel-xe] [PATCH v3 12/13] drm/sched/doc: Add Entity teardown documentation
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
` (10 preceding siblings ...)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill Matthew Brost
@ 2023-09-12 2:16 ` Matthew Brost
2023-09-13 15:04 ` Christian König
` (2 more replies)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 13/13] drm/sched: Update maintainers of GPU scheduler Matthew Brost
` (2 subsequent siblings)
14 siblings, 3 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 2:16 UTC (permalink / raw)
To: dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, christian.koenig, faith.ekstrand
Provide documentation describing the ways to tear down an entity.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
Documentation/gpu/drm-mm.rst | 6 ++++++
drivers/gpu/drm/scheduler/sched_entity.c | 19 +++++++++++++++++++
2 files changed, 25 insertions(+)
diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index c19b34b1c0ed..cb4d6097897e 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -552,6 +552,12 @@ Overview
.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
:doc: Overview
+Entity teardown
+---------------
+
+.. kernel-doc:: drivers/gpu/drm/scheduler/sched_entity.c
+ :doc: Entity teardown
+
Scheduler Function References
-----------------------------
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 37557fbb96d0..76f3e10218bb 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -21,6 +21,25 @@
*
*/
+/**
+ * DOC: Entity teardown
+ *
+ * Drivers can tear down an entity for several reasons: typically the user
+ * closes the entity via an IOCTL, the FD associated with the entity is
+ * closed, or the entity encounters an error. The GPU scheduler provides the
+ * basic infrastructure to do this in a few different ways.
+ *
+ * 1. Let the entity run dry (both the pending list and job queue) and then call
+ * drm_sched_entity_fini. The backend can accelerate the process of running dry.
+ * For example, set a flag so run_job is a NOP and set the TDR timeout to a low
+ * value to signal all jobs in a timely manner (this example works for
+ * DRM_SCHED_POLICY_SINGLE_ENTITY).
+ *
+ * 2. Kill the entity directly via drm_sched_entity_flush /
+ * drm_sched_entity_fini ensuring all pending and queued jobs are off the
+ * hardware and signaled.
+ */
+
#include <linux/kthread.h>
#include <linux/slab.h>
#include <linux/completion.h>
--
2.34.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Intel-xe] [PATCH v3 13/13] drm/sched: Update maintainers of GPU scheduler
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
` (11 preceding siblings ...)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 12/13] drm/sched/doc: Add Entity teardown documentation Matthew Brost
@ 2023-09-12 2:16 ` Matthew Brost
2023-09-12 2:20 ` [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev5) Patchwork
2023-09-14 1:45 ` [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Luben Tuikov
14 siblings, 0 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 2:16 UTC (permalink / raw)
To: dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, christian.koenig, faith.ekstrand
Add Matthew Brost to maintainers of GPU scheduler
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index d1035fdcaa97..38d96077b35d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7131,6 +7131,7 @@ F: drivers/gpu/drm/xlnx/
DRM GPU SCHEDULER
M: Luben Tuikov <luben.tuikov@amd.com>
+M: Matthew Brost <matthew.brost@intel.com>
L: dri-devel@lists.freedesktop.org
S: Maintained
T: git git://anongit.freedesktop.org/drm/drm-misc
--
2.34.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev5)
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
` (12 preceding siblings ...)
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 13/13] drm/sched: Update maintainers of GPU scheduler Matthew Brost
@ 2023-09-12 2:20 ` Patchwork
2023-09-14 1:45 ` [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Luben Tuikov
14 siblings, 0 replies; 53+ messages in thread
From: Patchwork @ 2023-09-12 2:20 UTC (permalink / raw)
To: Danilo Krummrich; +Cc: intel-xe
== Series Details ==
Series: DRM scheduler changes for Xe (rev5)
URL : https://patchwork.freedesktop.org/series/121744/
State : failure
== Summary ==
=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: b4ddac861 fixup! drm/xe/display: Implement display support
=== git am output follows ===
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:290
error: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c: patch does not apply
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1659
error: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c: patch does not apply
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4614
error: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: patch does not apply
error: patch failed: drivers/gpu/drm/msm/adreno/adreno_device.c:809
error: drivers/gpu/drm/msm/adreno/adreno_device.c: patch does not apply
error: patch failed: drivers/gpu/drm/scheduler/sched_main.c:439
error: drivers/gpu/drm/scheduler/sched_main.c: patch does not apply
error: patch failed: include/drm/gpu_scheduler.h:550
error: include/drm/gpu_scheduler.h: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch
Applying: drm/sched: Add drm_sched_submit_* helpers
Patch failed at 0001 drm/sched: Add drm_sched_submit_* helpers
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 02/13] drm/sched: Convert drm scheduler to use a work queue rather than kthread
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 02/13] drm/sched: Convert drm scheduler to use a work queue rather than kthread Matthew Brost
@ 2023-09-12 7:29 ` Boris Brezillon
2023-09-12 15:02 ` Matthew Brost
2023-09-14 3:35 ` Luben Tuikov
2023-09-16 17:07 ` Danilo Krummrich
2 siblings, 1 reply; 53+ messages in thread
From: Boris Brezillon @ 2023-09-12 7:29 UTC (permalink / raw)
To: Matthew Brost
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, luben.tuikov,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Mon, 11 Sep 2023 19:16:04 -0700
Matthew Brost <matthew.brost@intel.com> wrote:
> @@ -1071,6 +1063,7 @@ static int drm_sched_main(void *param)
> *
> * @sched: scheduler instance
> * @ops: backend operations for this scheduler
> + * @submit_wq: workqueue to use for submission. If NULL, the system_wq is used
> * @hw_submission: number of hw submissions that can be in flight
> * @hang_limit: number of times to allow a job to hang before dropping it
> * @timeout: timeout value in jiffies for the scheduler
> @@ -1084,14 +1077,16 @@ static int drm_sched_main(void *param)
> */
> int drm_sched_init(struct drm_gpu_scheduler *sched,
> const struct drm_sched_backend_ops *ops,
> + struct workqueue_struct *submit_wq,
> unsigned hw_submission, unsigned hang_limit,
> long timeout, struct workqueue_struct *timeout_wq,
> atomic_t *score, const char *name, struct device *dev)
> {
> - int i, ret;
> + int i;
> sched->ops = ops;
> sched->hw_submission_limit = hw_submission;
> sched->name = name;
> + sched->submit_wq = submit_wq ? : system_wq;
My understanding is that the new design is based on the idea of
splitting the drm_sched_main function into work items that can be
scheduled independently so users/drivers can insert their own
steps/works without requiring changes to drm_sched. This approach is
relying on the properties of ordered workqueues (1 work executed at a
time, FIFO behavior) to guarantee that these steps are still executed
in order, and one at a time.
Given what you're trying to achieve I think we should create an ordered
workqueue instead of using the system_wq when submit_wq is NULL,
otherwise you lose this ordering/serialization guarantee which both
the dedicated kthread and ordered wq provide. It will probably work for
most drivers, but might lead to subtle/hard to spot ordering issues.
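Boris's suggestion would amount to something like the following in drm_sched_init(); this is only a sketch of the proposed fallback, not the posted patch, and `own_submit_wq` is a hypothetical flag the scheduler would need so drm_sched_fini() knows whether to destroy the queue it created:

```c
	if (submit_wq) {
		sched->submit_wq = submit_wq;
		sched->own_submit_wq = false;
	} else {
		/* Fall back to a dedicated ordered workqueue so the run_job /
		 * free_job work items keep the one-at-a-time, FIFO execution
		 * guarantee the dedicated kthread used to provide.
		 */
		sched->submit_wq = alloc_ordered_workqueue("%s", 0, name);
		if (!sched->submit_wq)
			return -ENOMEM;
		sched->own_submit_wq = true;
	}
```

With this, drivers that pass NULL still get serialized submission instead of the unordered system_wq.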
> sched->timeout = timeout;
> sched->timeout_wq = timeout_wq ? : system_wq;
> sched->hang_limit = hang_limit;
> @@ -1100,23 +1095,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
> drm_sched_rq_init(sched, &sched->sched_rq[i]);
>
> - init_waitqueue_head(&sched->wake_up_worker);
> init_waitqueue_head(&sched->job_scheduled);
> INIT_LIST_HEAD(&sched->pending_list);
> spin_lock_init(&sched->job_list_lock);
> atomic_set(&sched->hw_rq_count, 0);
> INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
> + INIT_WORK(&sched->work_submit, drm_sched_main);
> atomic_set(&sched->_score, 0);
> atomic64_set(&sched->job_id_count, 0);
> -
> - /* Each scheduler will run on a seperate kernel thread */
> - sched->thread = kthread_run(drm_sched_main, sched, sched->name);
> - if (IS_ERR(sched->thread)) {
> - ret = PTR_ERR(sched->thread);
> - sched->thread = NULL;
> - DRM_DEV_ERROR(sched->dev, "Failed to create scheduler for %s.\n", name);
> - return ret;
> - }
> + sched->pause_submit = false;
>
> sched->ready = true;
> return 0;
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity Matthew Brost
@ 2023-09-12 7:37 ` Boris Brezillon
2023-09-12 15:14 ` Matthew Brost
2023-09-12 14:11 ` kernel test robot
2023-09-14 4:18 ` Luben Tuikov
2 siblings, 1 reply; 53+ messages in thread
From: Boris Brezillon @ 2023-09-12 7:37 UTC (permalink / raw)
To: Matthew Brost
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, luben.tuikov,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Mon, 11 Sep 2023 19:16:05 -0700
Matthew Brost <matthew.brost@intel.com> wrote:
> Rather than a global modparam for scheduling policy, move the scheduling
> policy to scheduler / entity so user can control each scheduler / entity
> policy.
I'm a bit confused by the commit message (I think I'm okay with the
diff though). Sounds like entity is involved in the sched policy
choice, but AFAICT, it just has to live with the scheduler policy chosen
by the driver at init time. If my understanding is correct, I'd just
drop the ' / entity'.
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 05/13] drm/sched: Split free_job into own work item
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 05/13] drm/sched: Split free_job into own work item Matthew Brost
@ 2023-09-12 8:08 ` Boris Brezillon
2023-09-12 14:37 ` Matthew Brost
0 siblings, 1 reply; 53+ messages in thread
From: Boris Brezillon @ 2023-09-12 8:08 UTC (permalink / raw)
To: Matthew Brost
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, luben.tuikov,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Mon, 11 Sep 2023 19:16:07 -0700
Matthew Brost <matthew.brost@intel.com> wrote:
> Rather than call free_job and run_job in same work item have a dedicated
> work item for each. This aligns with the design and intended use of work
> queues.
>
> v2:
> - Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
> timestamp in free_job() work item (Danilo)
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 143 ++++++++++++++++++-------
> include/drm/gpu_scheduler.h | 8 +-
> 2 files changed, 110 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 3820e9ae12c8..d28b6751256e 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -213,11 +213,12 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
> *
> * @rq: scheduler run queue to check.
> + * @dequeue: dequeue selected entity
> *
> * Try to find a ready entity, returns NULL if none found.
> */
> static struct drm_sched_entity *
> -drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq, bool dequeue)
> {
> struct drm_sched_entity *entity;
>
> @@ -227,8 +228,10 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
> if (entity) {
> list_for_each_entry_continue(entity, &rq->entities, list) {
> if (drm_sched_entity_is_ready(entity)) {
> - rq->current_entity = entity;
> - reinit_completion(&entity->entity_idle);
> + if (dequeue) {
> + rq->current_entity = entity;
> + reinit_completion(&entity->entity_idle);
> + }
> spin_unlock(&rq->lock);
> return entity;
> }
> @@ -238,8 +241,10 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
> list_for_each_entry(entity, &rq->entities, list) {
>
> if (drm_sched_entity_is_ready(entity)) {
> - rq->current_entity = entity;
> - reinit_completion(&entity->entity_idle);
> + if (dequeue) {
> + rq->current_entity = entity;
> + reinit_completion(&entity->entity_idle);
> + }
> spin_unlock(&rq->lock);
> return entity;
> }
> @@ -257,11 +262,12 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
> * drm_sched_rq_select_entity_fifo - Select an entity which provides a job to run
> *
> * @rq: scheduler run queue to check.
> + * @dequeue: dequeue selected entity
> *
> * Find oldest waiting ready entity, returns NULL if none found.
> */
> static struct drm_sched_entity *
> -drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq, bool dequeue)
> {
> struct rb_node *rb;
>
> @@ -271,8 +277,10 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
>
> entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
> if (drm_sched_entity_is_ready(entity)) {
> - rq->current_entity = entity;
> - reinit_completion(&entity->entity_idle);
> + if (dequeue) {
> + rq->current_entity = entity;
> + reinit_completion(&entity->entity_idle);
> + }
> break;
> }
> }
> @@ -282,13 +290,54 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
> }
>
> /**
> - * drm_sched_submit_queue - scheduler queue submission
> + * drm_sched_run_job_queue - queue job submission
> * @sched: scheduler instance
> */
> -static void drm_sched_submit_queue(struct drm_gpu_scheduler *sched)
> +static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> {
> if (!READ_ONCE(sched->pause_submit))
> - queue_work(sched->submit_wq, &sched->work_submit);
> + queue_work(sched->submit_wq, &sched->work_run_job);
> +}
> +
> +static struct drm_sched_entity *
> +drm_sched_select_entity(struct drm_gpu_scheduler *sched, bool dequeue);
Nit: Can you drop this forward declaration and move the function here?
> +
> +/**
> + * drm_sched_run_job_queue_if_ready - queue job submission if ready
> + * @sched: scheduler instance
> + */
> +static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
> +{
> + if (drm_sched_select_entity(sched, false))
> + drm_sched_run_job_queue(sched);
> +}
> +
> +/**
> + * drm_sched_free_job_queue - queue free job
> + *
> + * @sched: scheduler instance to queue free job
> + */
> +static void drm_sched_free_job_queue(struct drm_gpu_scheduler *sched)
> +{
> + if (!READ_ONCE(sched->pause_submit))
> + queue_work(sched->submit_wq, &sched->work_free_job);
> +}
> +
> +/**
> + * drm_sched_free_job_queue_if_ready - queue free job if ready
> + *
> + * @sched: scheduler instance to queue free job
> + */
> +static void drm_sched_free_job_queue_if_ready(struct drm_gpu_scheduler *sched)
> +{
> + struct drm_sched_job *job;
> +
> + spin_lock(&sched->job_list_lock);
> + job = list_first_entry_or_null(&sched->pending_list,
> + struct drm_sched_job, list);
> + if (job && dma_fence_is_signaled(&job->s_fence->finished))
> + drm_sched_free_job_queue(sched);
> + spin_unlock(&sched->job_list_lock);
> }
>
> /**
> @@ -310,7 +359,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
> dma_fence_get(&s_fence->finished);
> drm_sched_fence_finished(s_fence, result);
> dma_fence_put(&s_fence->finished);
> - drm_sched_submit_queue(sched);
> + drm_sched_free_job_queue(sched);
> }
>
> /**
> @@ -906,18 +955,19 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
> void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched)
> {
> if (drm_sched_can_queue(sched))
> - drm_sched_submit_queue(sched);
> + drm_sched_run_job_queue(sched);
> }
>
> /**
> * drm_sched_select_entity - Select next entity to process
> *
> * @sched: scheduler instance
> + * @dequeue: dequeue selected entity
> *
> * Returns the entity to process or NULL if none are found.
> */
> static struct drm_sched_entity *
> -drm_sched_select_entity(struct drm_gpu_scheduler *sched)
> +drm_sched_select_entity(struct drm_gpu_scheduler *sched, bool dequeue)
> {
> struct drm_sched_entity *entity;
> int i;
> @@ -936,8 +986,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
> /* Kernel run queue has higher priority than normal run queue*/
> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
> - drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]) :
> - drm_sched_rq_select_entity_rr(&sched->sched_rq[i]);
> + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i],
> + dequeue) :
> + drm_sched_rq_select_entity_rr(&sched->sched_rq[i],
> + dequeue);
> if (entity)
> break;
> }
> @@ -974,8 +1026,10 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
> typeof(*next), list);
>
> if (next) {
> - next->s_fence->scheduled.timestamp =
> - job->s_fence->finished.timestamp;
> + if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
> + &next->s_fence->scheduled.flags))
> + next->s_fence->scheduled.timestamp =
> + job->s_fence->finished.timestamp;
Looks like you are changing the behavior here (unconditional ->
conditional timestamp update)? Probably something that should go in a
separate patch.
> /* start TO timer for next job */
> drm_sched_start_timeout(sched);
> }
> @@ -1025,30 +1079,44 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
> EXPORT_SYMBOL(drm_sched_pick_best);
>
> /**
> - * drm_sched_main - main scheduler thread
> + * drm_sched_free_job_work - worker to call free_job
> *
> - * @param: scheduler instance
> + * @w: free job work
> */
> -static void drm_sched_main(struct work_struct *w)
> +static void drm_sched_free_job_work(struct work_struct *w)
> {
> struct drm_gpu_scheduler *sched =
> - container_of(w, struct drm_gpu_scheduler, work_submit);
> - struct drm_sched_entity *entity;
> + container_of(w, struct drm_gpu_scheduler, work_free_job);
> struct drm_sched_job *cleanup_job;
> - int r;
>
> if (READ_ONCE(sched->pause_submit))
> return;
>
> cleanup_job = drm_sched_get_cleanup_job(sched);
> - entity = drm_sched_select_entity(sched);
> + if (cleanup_job) {
> + sched->ops->free_job(cleanup_job);
> +
> + drm_sched_free_job_queue_if_ready(sched);
> + drm_sched_run_job_queue_if_ready(sched);
> + }
> +}
>
> - if (!entity && !cleanup_job)
> - return; /* No more work */
> +/**
> + * drm_sched_run_job_work - worker to call run_job
> + *
> + * @w: run job work
> + */
> +static void drm_sched_run_job_work(struct work_struct *w)
> +{
> + struct drm_gpu_scheduler *sched =
> + container_of(w, struct drm_gpu_scheduler, work_run_job);
> + struct drm_sched_entity *entity;
> + int r;
>
> - if (cleanup_job)
> - sched->ops->free_job(cleanup_job);
> + if (READ_ONCE(sched->pause_submit))
> + return;
>
> + entity = drm_sched_select_entity(sched, true);
Nit:
if (!entity)
return;
then you can save an indentation level for the rest of the function.
> if (entity) {
> struct dma_fence *fence;
> struct drm_sched_fence *s_fence;
> @@ -1057,9 +1125,7 @@ static void drm_sched_main(struct work_struct *w)
> sched_job = drm_sched_entity_pop_job(entity);
> if (!sched_job) {
> complete_all(&entity->entity_idle);
> - if (!cleanup_job)
> - return; /* No more work */
> - goto again;
> + return; /* No more work */
> }
>
> s_fence = sched_job->s_fence;
> @@ -1089,10 +1155,8 @@ static void drm_sched_main(struct work_struct *w)
> }
>
> wake_up(&sched->job_scheduled);
> + drm_sched_run_job_queue_if_ready(sched);
> }
> -
> -again:
> - drm_sched_submit_queue(sched);
> }
>
> /**
> @@ -1151,7 +1215,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> spin_lock_init(&sched->job_list_lock);
> atomic_set(&sched->hw_rq_count, 0);
> INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
> - INIT_WORK(&sched->work_submit, drm_sched_main);
> + INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
> + INIT_WORK(&sched->work_free_job, drm_sched_free_job_work);
> atomic_set(&sched->_score, 0);
> atomic64_set(&sched->job_id_count, 0);
> sched->pause_submit = false;
> @@ -1276,7 +1341,8 @@ EXPORT_SYMBOL(drm_sched_submit_ready);
> void drm_sched_submit_stop(struct drm_gpu_scheduler *sched)
> {
> WRITE_ONCE(sched->pause_submit, true);
> - cancel_work_sync(&sched->work_submit);
> + cancel_work_sync(&sched->work_run_job);
> + cancel_work_sync(&sched->work_free_job);
> }
> EXPORT_SYMBOL(drm_sched_submit_stop);
>
> @@ -1288,6 +1354,7 @@ EXPORT_SYMBOL(drm_sched_submit_stop);
> void drm_sched_submit_start(struct drm_gpu_scheduler *sched)
> {
> WRITE_ONCE(sched->pause_submit, false);
> - queue_work(sched->submit_wq, &sched->work_submit);
> + queue_work(sched->submit_wq, &sched->work_run_job);
> + queue_work(sched->submit_wq, &sched->work_free_job);
> }
> EXPORT_SYMBOL(drm_sched_submit_start);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 04eec2d7635f..fbc083a92757 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -487,9 +487,10 @@ struct drm_sched_backend_ops {
> * finished.
> * @hw_rq_count: the number of jobs currently in the hardware queue.
> * @job_id_count: used to assign unique id to the each job.
> - * @submit_wq: workqueue used to queue @work_submit
> + * @submit_wq: workqueue used to queue @work_run_job and @work_free_job
> * @timeout_wq: workqueue used to queue @work_tdr
> - * @work_submit: schedules jobs and cleans up entities
> + * @work_run_job: schedules jobs
> + * @work_free_job: cleans up jobs
> * @work_tdr: schedules a delayed call to @drm_sched_job_timedout after the
> * timeout interval is over.
> * @pending_list: the list of jobs which are currently in the job queue.
> @@ -518,7 +519,8 @@ struct drm_gpu_scheduler {
> atomic64_t job_id_count;
> struct workqueue_struct *submit_wq;
> struct workqueue_struct *timeout_wq;
> - struct work_struct work_submit;
> + struct work_struct work_run_job;
> + struct work_struct work_free_job;
> struct delayed_work work_tdr;
> struct list_head pending_list;
> spinlock_t job_list_lock;
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 06/13] drm/sched: Add generic scheduler message interface
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 06/13] drm/sched: Add generic scheduler message interface Matthew Brost
@ 2023-09-12 8:23 ` Boris Brezillon
2023-09-12 14:50 ` Matthew Brost
0 siblings, 1 reply; 53+ messages in thread
From: Boris Brezillon @ 2023-09-12 8:23 UTC (permalink / raw)
To: Matthew Brost
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, luben.tuikov,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Mon, 11 Sep 2023 19:16:08 -0700
Matthew Brost <matthew.brost@intel.com> wrote:
> Add generic schedule message interface which sends messages to backend
> from the drm_gpu_scheduler main submission thread. The idea is some of
> these messages modify some state in drm_sched_entity which is also
> modified during submission. By scheduling these messages and submission
> in the same thread their is not race changing states in
> drm_sched_entity.
>
> This interface will be used in Xe, a new Intel GPU driver, to clean up,
> suspend, resume, and change scheduling properties of a drm_sched_entity.
>
> The interface is designed to be generic and extendable with only the
> backend understanding the messages.
I didn't follow the previous discussions closely enough, but it seemed
to me that the whole point of this 'ordered-wq for scheduler' approach
was so you could interleave your driver-specific work items in the
processing without changing the core. This messaging system looks like
something that could/should be entirely driver-specific to me, and I'm
not convinced this thin 'work -> generic_message_callback' layer is
worth it. You can simply have your own xe_msg_process work, and a
xe_msg_send helper that schedules this work. Assuming other drivers
need this messaging API, they'll probably have their own message ids
and payloads, and the automation done here is simple enough that it can
be duplicated. That's just my personal opinion, of course, and if
others see this message interface as valuable, I'm fine with it.
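The driver-private alternative Boris describes could look roughly like the sketch below. All of the names here (xe_sched, xe_sched_msg, xe_sched_process_msg_work, the msg_lock) are hypothetical illustrations, not Xe's actual code:

```c
struct xe_sched_msg {
	struct list_head link;
	u32 opcode;		/* driver-defined: cleanup, suspend, resume, ... */
	void *private_data;
};

static void xe_sched_process_msg_work(struct work_struct *w)
{
	struct xe_sched *sched = container_of(w, struct xe_sched,
					      work_process_msg);
	struct xe_sched_msg *msg;

	spin_lock(&sched->msg_lock);
	msg = list_first_entry_or_null(&sched->msgs, struct xe_sched_msg, link);
	if (msg)
		list_del(&msg->link);
	spin_unlock(&sched->msg_lock);

	if (msg)
		sched->process_msg(msg);	/* backend-specific handler */
}

static void xe_sched_msg_send(struct xe_sched *sched, struct xe_sched_msg *msg)
{
	spin_lock(&sched->msg_lock);
	list_add_tail(&msg->link, &sched->msgs);
	spin_unlock(&sched->msg_lock);

	/* Queued on the same ordered wq as the run_job / free_job work items,
	 * so message processing is serialized with submission without any
	 * support in the scheduler core.
	 */
	queue_work(sched->base.submit_wq, &sched->work_process_msg);
}
```

Because the work item runs on the scheduler's ordered workqueue, it gets the same no-race-with-submission property the generic interface provides, entirely inside the driver.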
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill Matthew Brost
@ 2023-09-12 8:44 ` Boris Brezillon
2023-09-12 9:57 ` Christian König
2023-09-12 10:28 ` Boris Brezillon
2 siblings, 0 replies; 53+ messages in thread
From: Boris Brezillon @ 2023-09-12 8:44 UTC (permalink / raw)
To: Matthew Brost
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, luben.tuikov,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Mon, 11 Sep 2023 19:16:13 -0700
Matthew Brost <matthew.brost@intel.com> wrote:
> Wait for pending jobs to complete before signaling queued jobs.
You probably want to add 'in drm_sched_entity_kill()', even if it's
already specified in the subject, because I was trying to understand
why we'd want to do that in the normal path.
> This
> ensures dma-fence signaling order is correct and also ensures the entity is
> not running on the hardware after drm_sched_entity_flush or
> drm_sched_entity_fini returns.
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill Matthew Brost
2023-09-12 8:44 ` Boris Brezillon
@ 2023-09-12 9:57 ` Christian König
2023-09-12 14:47 ` Matthew Brost
2023-09-12 10:28 ` Boris Brezillon
2 siblings, 1 reply; 53+ messages in thread
From: Christian König @ 2023-09-12 9:57 UTC (permalink / raw)
To: Matthew Brost, dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, faith.ekstrand
Am 12.09.23 um 04:16 schrieb Matthew Brost:
> Wait for pending jobs to complete before signaling queued jobs. This
> ensures dma-fence signaling order is correct and also ensures the entity is
> not running on the hardware after drm_sched_entity_flush or
> drm_sched_entity_fini returns.
Entities are *not* supposed to outlive the submissions they carry and we
absolutely *can't* wait for submissions to finish while killing the entity.
In other words it is perfectly expected that entities don't exist any
more while the submissions they carried are still running on the hardware.
I need to better document how this works and especially why it
works like that.
This approach came up like four or five times now and we already applied
and reverted patches doing this.
For now let's take a look at the source code of drm_sched_entity_kill():
/* The entity is guaranteed to not be used by the scheduler */
prev = rcu_dereference_check(entity->last_scheduled, true);
dma_fence_get(prev);
while ((job =
to_drm_sched_job(spsc_queue_pop(&entity->job_queue)))) {
struct drm_sched_fence *s_fence = job->s_fence;
dma_fence_get(&s_fence->finished);
if (!prev || dma_fence_add_callback(prev, &job->finish_cb,
drm_sched_entity_kill_jobs_cb))
drm_sched_entity_kill_jobs_cb(NULL,
&job->finish_cb);
prev = &s_fence->finished;
}
dma_fence_put(prev);
This ensures the dma-fence signaling order by delegating signaling of
the scheduler fences into callbacks.
Regards,
Christian.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
> drivers/gpu/drm/scheduler/sched_entity.c | 7 ++-
> drivers/gpu/drm/scheduler/sched_main.c | 50 ++++++++++++++++++---
> include/drm/gpu_scheduler.h | 18 ++++++++
> 4 files changed, 70 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index fb5dad687168..7835c0da65c5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -1873,7 +1873,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct amdgpu_ring *ring)
> list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
> if (dma_fence_is_signaled(&s_job->s_fence->finished)) {
> /* remove job from ring_mirror_list */
> - list_del_init(&s_job->list);
> + drm_sched_remove_pending_job(s_job);
> sched->ops->free_job(s_job);
> continue;
> }
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 1dec97caaba3..37557fbb96d0 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -104,9 +104,11 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
> }
>
> init_completion(&entity->entity_idle);
> + init_completion(&entity->jobs_done);
>
> - /* We start in an idle state. */
> + /* We start in an idle and jobs done state. */
> complete_all(&entity->entity_idle);
> + complete_all(&entity->jobs_done);
>
> spin_lock_init(&entity->rq_lock);
> spsc_queue_init(&entity->job_queue);
> @@ -256,6 +258,9 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
> /* Make sure this entity is not used by the scheduler at the moment */
> wait_for_completion(&entity->entity_idle);
>
> + /* Make sure all pending jobs are done */
> + wait_for_completion(&entity->jobs_done);
> +
> /* The entity is guaranteed to not be used by the scheduler */
> prev = rcu_dereference_check(entity->last_scheduled, true);
> dma_fence_get(prev);
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 689fb6686e01..ed6f5680793a 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -510,12 +510,52 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
> }
> EXPORT_SYMBOL(drm_sched_resume_timeout);
>
> +/**
> + * drm_sched_add_pending_job - Add pending job to scheduler
> + *
> + * @job: scheduler job to add
> + * @tail: add to tail of pending list
> + */
> +void drm_sched_add_pending_job(struct drm_sched_job *job, bool tail)
> +{
> + struct drm_gpu_scheduler *sched = job->sched;
> + struct drm_sched_entity *entity = job->entity;
> +
> + lockdep_assert_held(&sched->job_list_lock);
> +
> + if (tail)
> + list_add_tail(&job->list, &sched->pending_list);
> + else
> + list_add(&job->list, &sched->pending_list);
> + if (!entity->pending_job_count++)
> + reinit_completion(&entity->jobs_done);
> +}
> +EXPORT_SYMBOL(drm_sched_add_pending_job);
> +
> +/**
> + * drm_sched_remove_pending_job - Remove pending job from scheduler
> + *
> + * @job: scheduler job to remove
> + */
> +void drm_sched_remove_pending_job(struct drm_sched_job *job)
> +{
> + struct drm_gpu_scheduler *sched = job->sched;
> + struct drm_sched_entity *entity = job->entity;
> +
> + lockdep_assert_held(&sched->job_list_lock);
> +
> + list_del_init(&job->list);
> + if (!--entity->pending_job_count)
> + complete_all(&entity->jobs_done);
> +}
> +EXPORT_SYMBOL(drm_sched_remove_pending_job);
> +
> static void drm_sched_job_begin(struct drm_sched_job *s_job)
> {
> struct drm_gpu_scheduler *sched = s_job->sched;
>
> spin_lock(&sched->job_list_lock);
> - list_add_tail(&s_job->list, &sched->pending_list);
> + drm_sched_add_pending_job(s_job, true);
> spin_unlock(&sched->job_list_lock);
> }
>
> @@ -538,7 +578,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
> * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
> * is parked at which point it's safe.
> */
> - list_del_init(&job->list);
> + drm_sched_remove_pending_job(job);
> spin_unlock(&sched->job_list_lock);
>
> status = job->sched->ops->timedout_job(job);
> @@ -589,7 +629,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> * Add at the head of the queue to reflect it was the earliest
> * job extracted.
> */
> - list_add(&bad->list, &sched->pending_list);
> + drm_sched_add_pending_job(bad, false);
>
> /*
> * Iterate the job list from later to earlier one and either deactive
> @@ -611,7 +651,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> * Locking here is for concurrent resume timeout
> */
> spin_lock(&sched->job_list_lock);
> - list_del_init(&s_job->list);
> + drm_sched_remove_pending_job(s_job);
> spin_unlock(&sched->job_list_lock);
>
> /*
> @@ -1066,7 +1106,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>
> if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
> /* remove job from pending_list */
> - list_del_init(&job->list);
> + drm_sched_remove_pending_job(job);
>
> /* cancel this job's TO timer */
> cancel_delayed_work(&sched->work_tdr);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index b7b818cd81b6..7c628f36fe78 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -233,6 +233,21 @@ struct drm_sched_entity {
> */
> struct completion entity_idle;
>
> + /**
> + * @pending_job_count:
> + *
> + * Number of pending jobs.
> + */
> + unsigned int pending_job_count;
> +
> + /**
> + * @jobs_done:
> + *
> + * Signals when entity has no pending jobs, used to sequence entity
> + * cleanup in drm_sched_entity_fini().
> + */
> + struct completion jobs_done;
> +
> /**
> * @oldest_job_waiting:
> *
> @@ -656,4 +671,7 @@ struct drm_gpu_scheduler *
> drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
> unsigned int num_sched_list);
>
> +void drm_sched_add_pending_job(struct drm_sched_job *job, bool tail);
> +void drm_sched_remove_pending_job(struct drm_sched_job *job);
> +
> #endif
^ permalink raw reply [flat|nested] 53+ messages in thread
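The pending-job accounting this patch introduces can be sketched as a small
standalone userspace model; the struct and helper names below are invented
for illustration, and a plain bool stands in for the kernel's struct
completion:

```c
#include <stdbool.h>

/* Userspace model of entity->pending_job_count / entity->jobs_done:
 * the completion is "signaled" exactly while no pending jobs exist. */
struct entity_model {
	unsigned int pending_job_count;
	bool jobs_done;			/* models completion_done(&jobs_done) */
};

static void entity_model_init(struct entity_model *e)
{
	e->pending_job_count = 0;
	e->jobs_done = true;		/* complete_all() at init */
}

static void model_add_pending_job(struct entity_model *e)
{
	if (!e->pending_job_count++)	/* first job re-arms the completion */
		e->jobs_done = false;	/* reinit_completion() */
}

static void model_remove_pending_job(struct entity_model *e)
{
	if (!--e->pending_job_count)	/* last job signals it */
		e->jobs_done = true;	/* complete_all() */
}
```

With this model, a kill that waits on jobs_done blocks until every job added
under job_list_lock has been removed again, which is the sequencing the
patch relies on in drm_sched_entity_kill().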
* Re: [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill Matthew Brost
2023-09-12 8:44 ` Boris Brezillon
2023-09-12 9:57 ` Christian König
@ 2023-09-12 10:28 ` Boris Brezillon
2023-09-12 14:54 ` Matthew Brost
2 siblings, 1 reply; 53+ messages in thread
From: Boris Brezillon @ 2023-09-12 10:28 UTC (permalink / raw)
To: Matthew Brost
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, luben.tuikov,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Mon, 11 Sep 2023 19:16:13 -0700
Matthew Brost <matthew.brost@intel.com> wrote:
> +void drm_sched_add_pending_job(struct drm_sched_job *job, bool tail)
> +{
> + struct drm_gpu_scheduler *sched = job->sched;
> + struct drm_sched_entity *entity = job->entity;
drm_sched_entity_pop_job() sets job->entity to NULL [1], and I end up with
a NULL deref in this function. I guess you have another patch in your
tree dropping this job->entity = NULL in drm_sched_entity_pop_job(),
but given this comment [1], it's probably not the right thing to do.
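The hazard can be reproduced in a minimal standalone model (invented names;
this is not the kernel code): once the pop helper clears the back-pointer,
any later dereference of job->entity needs a guard:

```c
#include <stddef.h>

/* Minimal model of the NULL-deref Boris describes. */
struct pop_entity {
	unsigned int pending_job_count;
};

struct popped_job {
	struct pop_entity *entity;
};

static struct popped_job *model_pop_job(struct popped_job *job)
{
	job->entity = NULL;	/* what drm_sched_entity_pop_job() does */
	return job;
}

/* A NULL-safe variant of the accounting would have to re-check entity. */
static int model_add_pending_job(struct popped_job *job)
{
	if (!job->entity)	/* would be a NULL deref without this check */
		return -1;
	job->entity->pending_job_count++;
	return 0;
}
```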
> +
> + lockdep_assert_held(&sched->job_list_lock);
> +
> + if (tail)
> + list_add_tail(&job->list, &sched->pending_list);
> + else
> + list_add(&job->list, &sched->pending_list);
> + if (!entity->pending_job_count++)
> + reinit_completion(&entity->jobs_done);
> +}
> +EXPORT_SYMBOL(drm_sched_add_pending_job);
[1]https://elixir.bootlin.com/linux/v6.6-rc1/source/drivers/gpu/drm/scheduler/sched_entity.c#L497
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity Matthew Brost
2023-09-12 7:37 ` Boris Brezillon
@ 2023-09-12 14:11 ` kernel test robot
2023-09-12 15:17 ` Matthew Brost
2023-09-14 4:18 ` Luben Tuikov
2 siblings, 1 reply; 53+ messages in thread
From: kernel test robot @ 2023-09-12 14:11 UTC (permalink / raw)
To: Matthew Brost, dri-devel, intel-xe
Cc: robdclark, mcanal, sarah.walker, ketil.johnsen, lina, llvm,
Liviu.Dudau, oe-kbuild-all, luben.tuikov, boris.brezillon,
donald.robson, christian.koenig, faith.ekstrand
Hi Matthew,
kernel test robot noticed the following build errors:
[auto build test ERROR on drm/drm-next]
[also build test ERROR on drm-exynos/exynos-drm-next drm-intel/for-linux-next-fixes drm-tip/drm-tip linus/master v6.6-rc1 next-20230912]
[cannot apply to drm-misc/drm-misc-next drm-intel/for-linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Brost/drm-sched-Add-drm_sched_submit_-helpers/20230912-102001
base: git://anongit.freedesktop.org/drm/drm drm-next
patch link: https://lore.kernel.org/r/20230912021615.2086698-4-matthew.brost%40intel.com
patch subject: [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity
config: arm64-randconfig-r032-20230912 (https://download.01.org/0day-ci/archive/20230912/202309122100.HAEi8ytJ-lkp@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230912/202309122100.HAEi8ytJ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309122100.HAEi8ytJ-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/gpu/drm/v3d/v3d_sched.c:403:9: error: use of undeclared identifier 'ULL'
ULL, "v3d_render", DRM_SCHED_POLICY_DEFAULT,
^
1 error generated.
vim +/ULL +403 drivers/gpu/drm/v3d/v3d_sched.c
381
382 int
383 v3d_sched_init(struct v3d_dev *v3d)
384 {
385 int hw_jobs_limit = 1;
386 int job_hang_limit = 0;
387 int hang_limit_ms = 500;
388 int ret;
389
390 ret = drm_sched_init(&v3d->queue[V3D_BIN].sched,
391 &v3d_bin_sched_ops, NULL,
392 hw_jobs_limit, job_hang_limit,
393 msecs_to_jiffies(hang_limit_ms), NULL,
394 NULL, "v3d_bin", DRM_SCHED_POLICY_DEFAULT,
395 v3d->drm.dev);
396 if (ret)
397 return ret;
398
399 ret = drm_sched_init(&v3d->queue[V3D_RENDER].sched,
400 &v3d_render_sched_ops, NULL,
401 hw_jobs_limit, job_hang_limit,
402 msecs_to_jiffies(hang_limit_ms), NULL,
> 403 ULL, "v3d_render", DRM_SCHED_POLICY_DEFAULT,
404 v3d->drm.dev);
405 if (ret)
406 goto fail;
407
408 ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
409 &v3d_tfu_sched_ops, NULL,
410 hw_jobs_limit, job_hang_limit,
411 msecs_to_jiffies(hang_limit_ms), NULL,
412 NULL, "v3d_tfu", DRM_SCHED_POLICY_DEFAULT,
413 v3d->drm.dev);
414 if (ret)
415 goto fail;
416
417 if (v3d_has_csd(v3d)) {
418 ret = drm_sched_init(&v3d->queue[V3D_CSD].sched,
419 &v3d_csd_sched_ops, NULL,
420 hw_jobs_limit, job_hang_limit,
421 msecs_to_jiffies(hang_limit_ms), NULL,
422 NULL, "v3d_csd", DRM_SCHED_POLICY_DEFAULT,
423 v3d->drm.dev);
424 if (ret)
425 goto fail;
426
427 ret = drm_sched_init(&v3d->queue[V3D_CACHE_CLEAN].sched,
428 &v3d_cache_clean_sched_ops, NULL,
429 hw_jobs_limit, job_hang_limit,
430 msecs_to_jiffies(hang_limit_ms), NULL,
431 NULL, "v3d_cache_clean",
432 DRM_SCHED_POLICY_DEFAULT, v3d->drm.dev);
433 if (ret)
434 goto fail;
435 }
436
437 return 0;
438
439 fail:
440 v3d_sched_fini(v3d);
441 return ret;
442 }
443
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 05/13] drm/sched: Split free_job into own work item
2023-09-12 8:08 ` Boris Brezillon
@ 2023-09-12 14:37 ` Matthew Brost
2023-09-12 14:53 ` Boris Brezillon
0 siblings, 1 reply; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 14:37 UTC (permalink / raw)
To: Boris Brezillon
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, luben.tuikov,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Tue, Sep 12, 2023 at 10:08:33AM +0200, Boris Brezillon wrote:
> On Mon, 11 Sep 2023 19:16:07 -0700
> Matthew Brost <matthew.brost@intel.com> wrote:
>
> > Rather than call free_job and run_job in the same work item, have a dedicated
> > work item for each. This aligns with the design and intended use of work
> > queues.
> >
> > v2:
> > - Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
> > timestamp in free_job() work item (Danilo)
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/scheduler/sched_main.c | 143 ++++++++++++++++++-------
> > include/drm/gpu_scheduler.h | 8 +-
> > 2 files changed, 110 insertions(+), 41 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 3820e9ae12c8..d28b6751256e 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -213,11 +213,12 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> > * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
> > *
> > * @rq: scheduler run queue to check.
> > + * @dequeue: dequeue selected entity
> > *
> > * Try to find a ready entity, returns NULL if none found.
> > */
> > static struct drm_sched_entity *
> > -drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
> > +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq, bool dequeue)
> > {
> > struct drm_sched_entity *entity;
> >
> > @@ -227,8 +228,10 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
> > if (entity) {
> > list_for_each_entry_continue(entity, &rq->entities, list) {
> > if (drm_sched_entity_is_ready(entity)) {
> > - rq->current_entity = entity;
> > - reinit_completion(&entity->entity_idle);
> > + if (dequeue) {
> > + rq->current_entity = entity;
> > + reinit_completion(&entity->entity_idle);
> > + }
> > spin_unlock(&rq->lock);
> > return entity;
> > }
> > @@ -238,8 +241,10 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
> > list_for_each_entry(entity, &rq->entities, list) {
> >
> > if (drm_sched_entity_is_ready(entity)) {
> > - rq->current_entity = entity;
> > - reinit_completion(&entity->entity_idle);
> > + if (dequeue) {
> > + rq->current_entity = entity;
> > + reinit_completion(&entity->entity_idle);
> > + }
> > spin_unlock(&rq->lock);
> > return entity;
> > }
> > @@ -257,11 +262,12 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
> > * drm_sched_rq_select_entity_fifo - Select an entity which provides a job to run
> > *
> > * @rq: scheduler run queue to check.
> > + * @dequeue: dequeue selected entity
> > *
> > * Find oldest waiting ready entity, returns NULL if none found.
> > */
> > static struct drm_sched_entity *
> > -drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
> > +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq, bool dequeue)
> > {
> > struct rb_node *rb;
> >
> > @@ -271,8 +277,10 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
> >
> > entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
> > if (drm_sched_entity_is_ready(entity)) {
> > - rq->current_entity = entity;
> > - reinit_completion(&entity->entity_idle);
> > + if (dequeue) {
> > + rq->current_entity = entity;
> > + reinit_completion(&entity->entity_idle);
> > + }
> > break;
> > }
> > }
> > @@ -282,13 +290,54 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
> > }
> >
> > /**
> > - * drm_sched_submit_queue - scheduler queue submission
> > + * drm_sched_run_job_queue - queue job submission
> > * @sched: scheduler instance
> > */
> > -static void drm_sched_submit_queue(struct drm_gpu_scheduler *sched)
> > +static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> > {
> > if (!READ_ONCE(sched->pause_submit))
> > - queue_work(sched->submit_wq, &sched->work_submit);
> > + queue_work(sched->submit_wq, &sched->work_run_job);
> > +}
> > +
> > +static struct drm_sched_entity *
> > +drm_sched_select_entity(struct drm_gpu_scheduler *sched, bool dequeue);
>
> Nit: Can you drop this forward declaration and move the function here?
>
Sure. Will likely move this function in a separate patch though.
> > +
> > +/**
> > + * drm_sched_run_job_queue_if_ready - queue job submission if ready
> > + * @sched: scheduler instance
> > + */
> > +static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
> > +{
> > + if (drm_sched_select_entity(sched, false))
> > + drm_sched_run_job_queue(sched);
> > +}
> > +
> > +/**
> > + * drm_sched_free_job_queue - queue free job
> > + *
> > + * @sched: scheduler instance to queue free job
> > + */
> > +static void drm_sched_free_job_queue(struct drm_gpu_scheduler *sched)
> > +{
> > + if (!READ_ONCE(sched->pause_submit))
> > + queue_work(sched->submit_wq, &sched->work_free_job);
> > +}
> > +
> > +/**
> > + * drm_sched_free_job_queue_if_ready - queue free job if ready
> > + *
> > + * @sched: scheduler instance to queue free job
> > + */
> > +static void drm_sched_free_job_queue_if_ready(struct drm_gpu_scheduler *sched)
> > +{
> > + struct drm_sched_job *job;
> > +
> > + spin_lock(&sched->job_list_lock);
> > + job = list_first_entry_or_null(&sched->pending_list,
> > + struct drm_sched_job, list);
> > + if (job && dma_fence_is_signaled(&job->s_fence->finished))
> > + drm_sched_free_job_queue(sched);
> > + spin_unlock(&sched->job_list_lock);
> > }
> >
> > /**
> > @@ -310,7 +359,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
> > dma_fence_get(&s_fence->finished);
> > drm_sched_fence_finished(s_fence, result);
> > dma_fence_put(&s_fence->finished);
> > - drm_sched_submit_queue(sched);
> > + drm_sched_free_job_queue(sched);
> > }
> >
> > /**
> > @@ -906,18 +955,19 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
> > void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched)
> > {
> > if (drm_sched_can_queue(sched))
> > - drm_sched_submit_queue(sched);
> > + drm_sched_run_job_queue(sched);
> > }
> >
> > /**
> > * drm_sched_select_entity - Select next entity to process
> > *
> > * @sched: scheduler instance
> > + * @dequeue: dequeue selected entity
> > *
> > * Returns the entity to process or NULL if none are found.
> > */
> > static struct drm_sched_entity *
> > -drm_sched_select_entity(struct drm_gpu_scheduler *sched)
> > +drm_sched_select_entity(struct drm_gpu_scheduler *sched, bool dequeue)
> > {
> > struct drm_sched_entity *entity;
> > int i;
> > @@ -936,8 +986,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
> > /* Kernel run queue has higher priority than normal run queue*/
> > for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> > entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
> > - drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]) :
> > - drm_sched_rq_select_entity_rr(&sched->sched_rq[i]);
> > + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i],
> > + dequeue) :
> > + drm_sched_rq_select_entity_rr(&sched->sched_rq[i],
> > + dequeue);
> > if (entity)
> > break;
> > }
> > @@ -974,8 +1026,10 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
> > typeof(*next), list);
> >
> > if (next) {
> > - next->s_fence->scheduled.timestamp =
> > - job->s_fence->finished.timestamp;
> > + if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
> > + &next->s_fence->scheduled.flags))
> > + next->s_fence->scheduled.timestamp =
> > + job->s_fence->finished.timestamp;
>
> Looks like you are changing the behavior here (unconditional ->
> conditional timestamp update)? Probably something that should go in a
> separate patch.
>
This patch creates a race, so this check isn't needed before this patch.
With that I think it makes sense to have all in a single patch. If you
feel strongly about this, I can break this change out into a patch prior
to this one.
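The conditional timestamp hand-off being discussed can be sketched in
isolation; the names below are invented and a plain bool stands in for
DMA_FENCE_FLAG_TIMESTAMP_BIT:

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the hand-off: with free_job running concurrently with
 * run_job, the next job's scheduled fence may not have been signaled
 * (and time-stamped) yet, so only backdate a timestamp that exists. */
struct model_fence {
	bool timestamp_bit;	/* models DMA_FENCE_FLAG_TIMESTAMP_BIT */
	uint64_t timestamp;
};

static void model_backdate_scheduled(struct model_fence *next_scheduled,
				     const struct model_fence *prev_finished)
{
	if (next_scheduled->timestamp_bit)
		next_scheduled->timestamp = prev_finished->timestamp;
}
```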
> > /* start TO timer for next job */
> > drm_sched_start_timeout(sched);
> > }
> > @@ -1025,30 +1079,44 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
> > EXPORT_SYMBOL(drm_sched_pick_best);
> >
> > /**
> > - * drm_sched_main - main scheduler thread
> > + * drm_sched_free_job_work - worker to call free_job
> > *
> > - * @param: scheduler instance
> > + * @w: free job work
> > */
> > -static void drm_sched_main(struct work_struct *w)
> > +static void drm_sched_free_job_work(struct work_struct *w)
> > {
> > struct drm_gpu_scheduler *sched =
> > - container_of(w, struct drm_gpu_scheduler, work_submit);
> > - struct drm_sched_entity *entity;
> > + container_of(w, struct drm_gpu_scheduler, work_free_job);
> > struct drm_sched_job *cleanup_job;
> > - int r;
> >
> > if (READ_ONCE(sched->pause_submit))
> > return;
> >
> > cleanup_job = drm_sched_get_cleanup_job(sched);
> > - entity = drm_sched_select_entity(sched);
> > + if (cleanup_job) {
> > + sched->ops->free_job(cleanup_job);
> > +
> > + drm_sched_free_job_queue_if_ready(sched);
> > + drm_sched_run_job_queue_if_ready(sched);
> > + }
> > +}
> >
> > - if (!entity && !cleanup_job)
> > - return; /* No more work */
> > +/**
> > + * drm_sched_run_job_work - worker to call run_job
> > + *
> > + * @w: run job work
> > + */
> > +static void drm_sched_run_job_work(struct work_struct *w)
> > +{
> > + struct drm_gpu_scheduler *sched =
> > + container_of(w, struct drm_gpu_scheduler, work_run_job);
> > + struct drm_sched_entity *entity;
> > + int r;
> >
> > - if (cleanup_job)
> > - sched->ops->free_job(cleanup_job);
> > + if (READ_ONCE(sched->pause_submit))
> > + return;
> >
> > + entity = drm_sched_select_entity(sched, true);
>
> Nit:
>
> if (!entity)
> return;
>
> then you can save an indentation level for the rest of the function.
>
Sure.
Matt
> > if (entity) {
> > struct dma_fence *fence;
> > struct drm_sched_fence *s_fence;
> > @@ -1057,9 +1125,7 @@ static void drm_sched_main(struct work_struct *w)
> > sched_job = drm_sched_entity_pop_job(entity);
> > if (!sched_job) {
> > complete_all(&entity->entity_idle);
> > - if (!cleanup_job)
> > - return; /* No more work */
> > - goto again;
> > + return; /* No more work */
> > }
> >
> > s_fence = sched_job->s_fence;
> > @@ -1089,10 +1155,8 @@ static void drm_sched_main(struct work_struct *w)
> > }
> >
> > wake_up(&sched->job_scheduled);
> > + drm_sched_run_job_queue_if_ready(sched);
> > }
> > -
> > -again:
> > - drm_sched_submit_queue(sched);
> > }
> >
> > /**
> > @@ -1151,7 +1215,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> > spin_lock_init(&sched->job_list_lock);
> > atomic_set(&sched->hw_rq_count, 0);
> > INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
> > - INIT_WORK(&sched->work_submit, drm_sched_main);
> > + INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
> > + INIT_WORK(&sched->work_free_job, drm_sched_free_job_work);
> > atomic_set(&sched->_score, 0);
> > atomic64_set(&sched->job_id_count, 0);
> > sched->pause_submit = false;
> > @@ -1276,7 +1341,8 @@ EXPORT_SYMBOL(drm_sched_submit_ready);
> > void drm_sched_submit_stop(struct drm_gpu_scheduler *sched)
> > {
> > WRITE_ONCE(sched->pause_submit, true);
> > - cancel_work_sync(&sched->work_submit);
> > + cancel_work_sync(&sched->work_run_job);
> > + cancel_work_sync(&sched->work_free_job);
> > }
> > EXPORT_SYMBOL(drm_sched_submit_stop);
> >
> > @@ -1288,6 +1354,7 @@ EXPORT_SYMBOL(drm_sched_submit_stop);
> > void drm_sched_submit_start(struct drm_gpu_scheduler *sched)
> > {
> > WRITE_ONCE(sched->pause_submit, false);
> > - queue_work(sched->submit_wq, &sched->work_submit);
> > + queue_work(sched->submit_wq, &sched->work_run_job);
> > + queue_work(sched->submit_wq, &sched->work_free_job);
> > }
> > EXPORT_SYMBOL(drm_sched_submit_start);
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 04eec2d7635f..fbc083a92757 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -487,9 +487,10 @@ struct drm_sched_backend_ops {
> > * finished.
> > * @hw_rq_count: the number of jobs currently in the hardware queue.
> > * @job_id_count: used to assign unique id to the each job.
> > - * @submit_wq: workqueue used to queue @work_submit
> > + * @submit_wq: workqueue used to queue @work_run_job and @work_free_job
> > * @timeout_wq: workqueue used to queue @work_tdr
> > - * @work_submit: schedules jobs and cleans up entities
> > + * @work_run_job: schedules jobs
> > + * @work_free_job: cleans up jobs
> > * @work_tdr: schedules a delayed call to @drm_sched_job_timedout after the
> > * timeout interval is over.
> > * @pending_list: the list of jobs which are currently in the job queue.
> > @@ -518,7 +519,8 @@ struct drm_gpu_scheduler {
> > atomic64_t job_id_count;
> > struct workqueue_struct *submit_wq;
> > struct workqueue_struct *timeout_wq;
> > - struct work_struct work_submit;
> > + struct work_struct work_run_job;
> > + struct work_struct work_free_job;
> > struct delayed_work work_tdr;
> > struct list_head pending_list;
> > spinlock_t job_list_lock;
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill
2023-09-12 9:57 ` Christian König
@ 2023-09-12 14:47 ` Matthew Brost
2023-09-16 17:52 ` Danilo Krummrich
0 siblings, 1 reply; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 14:47 UTC (permalink / raw)
To: Christian König
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, intel-xe, faith.ekstrand
On Tue, Sep 12, 2023 at 11:57:30AM +0200, Christian König wrote:
> Am 12.09.23 um 04:16 schrieb Matthew Brost:
> > Wait for pending jobs to be complete before signaling queued jobs. This
> > ensures the dma-fence signaling order is correct and also ensures the entity is
> > not running on the hardware after drm_sched_entity_flush or
> > drm_sched_entity_fini returns.
>
> Entities are *not* supposed to outlive the submissions they carry and we
> absolutely *can't* wait for submissions to finish while killing the entity.
>
> In other words it is perfectly expected that entities don't exist any
> more while the submissions they carried are still running on the hardware.
>
> I somehow need to better document how this is working and especially why
> it is working like that.
>
> This approach came up like four or five times now and we already applied and
> reverted patches doing this.
>
> For now let's take a look at the source code of drm_sched_entity_kill():
>
> /* The entity is guaranteed to not be used by the scheduler */
> prev = rcu_dereference_check(entity->last_scheduled, true);
> dma_fence_get(prev);
>
> while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue))))
> {
> struct drm_sched_fence *s_fence = job->s_fence;
>
> dma_fence_get(&s_fence->finished);
> if (!prev || dma_fence_add_callback(prev, &job->finish_cb,
> drm_sched_entity_kill_jobs_cb))
> drm_sched_entity_kill_jobs_cb(NULL,
> &job->finish_cb);
>
> prev = &s_fence->finished;
> }
> dma_fence_put(prev);
>
> This ensures the dma-fence signaling order by delegating signaling of the
> scheduler fences into callbacks.
>
Thanks for the explanation, this code makes more sense now. I agree this
patch is not correct.
This patch really is an RFC for something Nouveau needs; I can delete
this patch in the next rev and let Nouveau run with a slightly different
version if needed.
Matt
> Regards,
> Christian.
>
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
> > drivers/gpu/drm/scheduler/sched_entity.c | 7 ++-
> > drivers/gpu/drm/scheduler/sched_main.c | 50 ++++++++++++++++++---
> > include/drm/gpu_scheduler.h | 18 ++++++++
> > 4 files changed, 70 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > index fb5dad687168..7835c0da65c5 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > @@ -1873,7 +1873,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct amdgpu_ring *ring)
> > list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
> > if (dma_fence_is_signaled(&s_job->s_fence->finished)) {
> > /* remove job from ring_mirror_list */
> > - list_del_init(&s_job->list);
> > + drm_sched_remove_pending_job(s_job);
> > sched->ops->free_job(s_job);
> > continue;
> > }
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index 1dec97caaba3..37557fbb96d0 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -104,9 +104,11 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
> > }
> > init_completion(&entity->entity_idle);
> > + init_completion(&entity->jobs_done);
> > - /* We start in an idle state. */
> > + /* We start in an idle and jobs done state. */
> > complete_all(&entity->entity_idle);
> > + complete_all(&entity->jobs_done);
> > spin_lock_init(&entity->rq_lock);
> > spsc_queue_init(&entity->job_queue);
> > @@ -256,6 +258,9 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
> > /* Make sure this entity is not used by the scheduler at the moment */
> > wait_for_completion(&entity->entity_idle);
> > + /* Make sure all pending jobs are done */
> > + wait_for_completion(&entity->jobs_done);
> > +
> > /* The entity is guaranteed to not be used by the scheduler */
> > prev = rcu_dereference_check(entity->last_scheduled, true);
> > dma_fence_get(prev);
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 689fb6686e01..ed6f5680793a 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -510,12 +510,52 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
> > }
> > EXPORT_SYMBOL(drm_sched_resume_timeout);
> > +/**
> > + * drm_sched_add_pending_job - Add pending job to scheduler
> > + *
> > + * @job: scheduler job to add
> > + * @tail: add to tail of pending list
> > + */
> > +void drm_sched_add_pending_job(struct drm_sched_job *job, bool tail)
> > +{
> > + struct drm_gpu_scheduler *sched = job->sched;
> > + struct drm_sched_entity *entity = job->entity;
> > +
> > + lockdep_assert_held(&sched->job_list_lock);
> > +
> > + if (tail)
> > + list_add_tail(&job->list, &sched->pending_list);
> > + else
> > + list_add(&job->list, &sched->pending_list);
> > + if (!entity->pending_job_count++)
> > + reinit_completion(&entity->jobs_done);
> > +}
> > +EXPORT_SYMBOL(drm_sched_add_pending_job);
> > +
> > +/**
> > + * drm_sched_remove_pending_job - Remove pending job from scheduler
> > + *
> > + * @job: scheduler job to remove
> > + */
> > +void drm_sched_remove_pending_job(struct drm_sched_job *job)
> > +{
> > + struct drm_gpu_scheduler *sched = job->sched;
> > + struct drm_sched_entity *entity = job->entity;
> > +
> > + lockdep_assert_held(&sched->job_list_lock);
> > +
> > + list_del_init(&job->list);
> > + if (!--entity->pending_job_count)
> > + complete_all(&entity->jobs_done);
> > +}
> > +EXPORT_SYMBOL(drm_sched_remove_pending_job);
> > +
> > static void drm_sched_job_begin(struct drm_sched_job *s_job)
> > {
> > struct drm_gpu_scheduler *sched = s_job->sched;
> > spin_lock(&sched->job_list_lock);
> > - list_add_tail(&s_job->list, &sched->pending_list);
> > + drm_sched_add_pending_job(s_job, true);
> > spin_unlock(&sched->job_list_lock);
> > }
> > @@ -538,7 +578,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
> > * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
> > * is parked at which point it's safe.
> > */
> > - list_del_init(&job->list);
> > + drm_sched_remove_pending_job(job);
> > spin_unlock(&sched->job_list_lock);
> > status = job->sched->ops->timedout_job(job);
> > @@ -589,7 +629,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> > * Add at the head of the queue to reflect it was the earliest
> > * job extracted.
> > */
> > - list_add(&bad->list, &sched->pending_list);
> > + drm_sched_add_pending_job(bad, false);
> > /*
> > * Iterate the job list from later to earlier one and either deactive
> > @@ -611,7 +651,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> > * Locking here is for concurrent resume timeout
> > */
> > spin_lock(&sched->job_list_lock);
> > - list_del_init(&s_job->list);
> > + drm_sched_remove_pending_job(s_job);
> > spin_unlock(&sched->job_list_lock);
> > /*
> > @@ -1066,7 +1106,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
> > if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
> > /* remove job from pending_list */
> > - list_del_init(&job->list);
> > + drm_sched_remove_pending_job(job);
> > /* cancel this job's TO timer */
> > cancel_delayed_work(&sched->work_tdr);
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index b7b818cd81b6..7c628f36fe78 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -233,6 +233,21 @@ struct drm_sched_entity {
> > */
> > struct completion entity_idle;
> > + /**
> > + * @pending_job_count:
> > + *
> > + * Number of pending jobs.
> > + */
> > + unsigned int pending_job_count;
> > +
> > + /**
> > + * @jobs_done:
> > + *
> > + * Signals when entity has no pending jobs, used to sequence entity
> > + * cleanup in drm_sched_entity_fini().
> > + */
> > + struct completion jobs_done;
> > +
> > /**
> > * @oldest_job_waiting:
> > *
> > @@ -656,4 +671,7 @@ struct drm_gpu_scheduler *
> > drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
> > unsigned int num_sched_list);
> > +void drm_sched_add_pending_job(struct drm_sched_job *job, bool tail);
> > +void drm_sched_remove_pending_job(struct drm_sched_job *job);
> > +
> > #endif
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 06/13] drm/sched: Add generic scheduler message interface
2023-09-12 8:23 ` Boris Brezillon
@ 2023-09-12 14:50 ` Matthew Brost
0 siblings, 0 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 14:50 UTC (permalink / raw)
To: Boris Brezillon
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, luben.tuikov,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Tue, Sep 12, 2023 at 10:23:02AM +0200, Boris Brezillon wrote:
> On Mon, 11 Sep 2023 19:16:08 -0700
> Matthew Brost <matthew.brost@intel.com> wrote:
>
> > Add a generic scheduler message interface which sends messages to the
> > backend from the drm_gpu_scheduler main submission thread. The idea is
> > that some of these messages modify state in drm_sched_entity which is
> > also modified during submission. By scheduling these messages and
> > submission in the same thread there is no race when changing states in
> > drm_sched_entity.
> >
> > This interface will be used in Xe, the new Intel GPU driver, to clean
> > up, suspend, resume, and change scheduling properties of a
> > drm_sched_entity.
> >
> > The interface is designed to be generic and extendable with only the
> > backend understanding the messages.
>
> I didn't follow the previous discussions closely enough, but it seemed
> to me that the whole point of this 'ordered-wq for scheduler' approach
> was so you could interleave your driver-specific work items in the
> processing without changing the core. This messaging system looks like
> something that could/should be entirely driver-specific to me, and I'm
> not convinced this thin 'work -> generic_message_callback' layer is
> worth it. You can simply have your own xe_msg_process work, and a
> xe_msg_send helper that schedules this work. Assuming other drivers
> need this messaging API, they'll probably have their own message ids
> and payloads, and the automation done here is simple enough that it can
> be duplicated. That's just my personal opinion, of course, and if
> others see this message interface as valuable, I'm fine with it.
Good point. I am fine with deleting this from the scheduler and making
it driver-specific.
Matt
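
A driver-local version of this could be as small as the following sketch
(all identifiers here are illustrative, not from either series; it queues
the message on the scheduler's ordered submit workqueue so it serializes
with the run_job/free_job work items, per Boris's suggestion):

```c
/* Hypothetical Xe-side messaging; xe_msg, xe_msg_send and
 * xe_process_msg are illustrative names, not real patch code. */
struct xe_msg {
	struct work_struct work;
	unsigned int opcode;	/* driver-private message id */
	void *payload;
};

static void xe_msg_work(struct work_struct *w)
{
	struct xe_msg *msg = container_of(w, struct xe_msg, work);

	xe_process_msg(msg);	/* driver-private handler */
	kfree(msg);
}

static void xe_msg_send(struct drm_gpu_scheduler *sched, struct xe_msg *msg)
{
	INIT_WORK(&msg->work, xe_msg_work);
	/* Ordered wq => message runs mutually exclusive with submission. */
	queue_work(sched->submit_wq, &msg->work);
}
```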
* Re: [Intel-xe] [PATCH v3 05/13] drm/sched: Split free_job into own work item
2023-09-12 14:37 ` Matthew Brost
@ 2023-09-12 14:53 ` Boris Brezillon
2023-09-12 14:55 ` Matthew Brost
0 siblings, 1 reply; 53+ messages in thread
From: Boris Brezillon @ 2023-09-12 14:53 UTC (permalink / raw)
To: Matthew Brost
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, luben.tuikov,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Tue, 12 Sep 2023 14:37:57 +0000
Matthew Brost <matthew.brost@intel.com> wrote:
> > Looks like you are changing the behavior here (unconditional ->
> > conditional timestamp update)? Probably something that should go in a
> > separate patch.
> >
>
> This patch creates a race, so this check isn't needed before this patch.
> With that I think it makes sense to have it all in a single patch. If you
> feel strongly about this, I can break this change out into a patch prior
> to this one.
It's probably fine to keep it in this patch, but we should
definitely have a comment explaining why this check is needed.
* Re: [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill
2023-09-12 10:28 ` Boris Brezillon
@ 2023-09-12 14:54 ` Matthew Brost
0 siblings, 0 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 14:54 UTC (permalink / raw)
To: Boris Brezillon
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, luben.tuikov,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Tue, Sep 12, 2023 at 12:28:28PM +0200, Boris Brezillon wrote:
> On Mon, 11 Sep 2023 19:16:13 -0700
> Matthew Brost <matthew.brost@intel.com> wrote:
>
> > +void drm_sched_add_pending_job(struct drm_sched_job *job, bool tail)
> > +{
> > + struct drm_gpu_scheduler *sched = job->sched;
> > + struct drm_sched_entity *entity = job->entity;
>
> drm_sched_entity_pop_job() sets job->entity to NULL [1], and I end with
> a NULL deref in this function. I guess you have another patch in your
> tree dropping this job->entity = NULL in drm_sched_entity_pop_job(),
> but given this comment [1], it's probably not the right thing to do.
>
Didn't fully test this one; regardless, I will drop this patch in the
next rev.
Matt
> > +
> > + lockdep_assert_held(&sched->job_list_lock);
> > +
> > + if (tail)
> > + list_add_tail(&job->list, &sched->pending_list);
> > + else
> > + list_add(&job->list, &sched->pending_list);
> > + if (!entity->pending_job_count++)
> > + reinit_completion(&entity->jobs_done);
> > +}
> > +EXPORT_SYMBOL(drm_sched_add_pending_job);
>
> [1]https://elixir.bootlin.com/linux/v6.6-rc1/source/drivers/gpu/drm/scheduler/sched_entity.c#L497
* Re: [Intel-xe] [PATCH v3 05/13] drm/sched: Split free_job into own work item
2023-09-12 14:53 ` Boris Brezillon
@ 2023-09-12 14:55 ` Matthew Brost
0 siblings, 0 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 14:55 UTC (permalink / raw)
To: Boris Brezillon
Cc: robdclark, sarah.walker, ketil.johnsen, lina, mcanal, Liviu.Dudau,
dri-devel, intel-xe, luben.tuikov, donald.robson,
christian.koenig, faith.ekstrand
On Tue, Sep 12, 2023 at 04:53:00PM +0200, Boris Brezillon wrote:
> On Tue, 12 Sep 2023 14:37:57 +0000
> Matthew Brost <matthew.brost@intel.com> wrote:
>
> > > Looks like you are changing the behavior here (unconditional ->
> > > conditional timestamp update)? Probably something that should go in a
> > > separate patch.
> > >
> >
> > This patch creates a race, so this check isn't needed before this patch.
> > With that I think it makes sense to have it all in a single patch. If you
> > feel strongly about this, I can break this change out into a patch prior
> > to this one.
>
> It's probably fine to keep it in this patch, but we should
> definitely have a comment explaining why this check is needed.
Sure, will add comment in next rev.
Matt
* Re: [Intel-xe] [PATCH v3 02/13] drm/sched: Convert drm scheduler to use a work queue rather than kthread
2023-09-12 7:29 ` Boris Brezillon
@ 2023-09-12 15:02 ` Matthew Brost
2023-09-14 3:41 ` Luben Tuikov
0 siblings, 1 reply; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 15:02 UTC (permalink / raw)
To: Boris Brezillon
Cc: robdclark, sarah.walker, ketil.johnsen, lina, mcanal, Liviu.Dudau,
dri-devel, intel-xe, luben.tuikov, donald.robson,
christian.koenig, faith.ekstrand
On Tue, Sep 12, 2023 at 09:29:53AM +0200, Boris Brezillon wrote:
> On Mon, 11 Sep 2023 19:16:04 -0700
> Matthew Brost <matthew.brost@intel.com> wrote:
>
> > @@ -1071,6 +1063,7 @@ static int drm_sched_main(void *param)
> > *
> > * @sched: scheduler instance
> > * @ops: backend operations for this scheduler
> > + * @submit_wq: workqueue to use for submission. If NULL, the system_wq is used
> > * @hw_submission: number of hw submissions that can be in flight
> > * @hang_limit: number of times to allow a job to hang before dropping it
> > * @timeout: timeout value in jiffies for the scheduler
> > @@ -1084,14 +1077,16 @@ static int drm_sched_main(void *param)
> > */
> > int drm_sched_init(struct drm_gpu_scheduler *sched,
> > const struct drm_sched_backend_ops *ops,
> > + struct workqueue_struct *submit_wq,
> > unsigned hw_submission, unsigned hang_limit,
> > long timeout, struct workqueue_struct *timeout_wq,
> > atomic_t *score, const char *name, struct device *dev)
> > {
> > - int i, ret;
> > + int i;
> > sched->ops = ops;
> > sched->hw_submission_limit = hw_submission;
> > sched->name = name;
> > + sched->submit_wq = submit_wq ? : system_wq;
>
> My understanding is that the new design is based on the idea of
> splitting the drm_sched_main function into work items that can be
> scheduled independently so users/drivers can insert their own
> steps/works without requiring changes to drm_sched. This approach is
> relying on the properties of ordered workqueues (1 work executed at a
> time, FIFO behavior) to guarantee that these steps are still executed
> in order, and one at a time.
>
> Given what you're trying to achieve I think we should create an ordered
> workqueue instead of using the system_wq when submit_wq is NULL,
> otherwise you lose this ordering/serialization guarantee which both
> the dedicated kthread and ordered wq provide. It will probably work for
> most drivers, but might lead to subtle/hard to spot ordering issues.
>
I debated choosing between the system_wq and creating an ordered wq by
default myself. Indeed, using the system_wq by default subtly changes
the behavior, as the run_job and free_job workers can then run in
parallel. To be safe, agreed: the default should be an ordered wq. If
drivers are fine with run_job() and free_job() running in parallel, they
are free to set submit_wq == system_wq. Will change in next rev.
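
A possible shape for that fallback in drm_sched_init() (a sketch of what
the next rev might do; the own_submit_wq bookkeeping flag is
illustrative, not from the actual patch):

```c
/* Fall back to a scheduler-owned ordered workqueue when the driver
 * passes submit_wq == NULL, preserving the one-work-at-a-time, FIFO
 * guarantee the dedicated kthread provided. */
if (submit_wq) {
	sched->submit_wq = submit_wq;
	sched->own_submit_wq = false;
} else {
	sched->submit_wq = alloc_ordered_workqueue("%s", 0, name);
	if (!sched->submit_wq)
		return -ENOMEM;
	/* destroy_workqueue() would then be needed in drm_sched_fini() */
	sched->own_submit_wq = true;
}
```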
Matt
> > sched->timeout = timeout;
> > sched->timeout_wq = timeout_wq ? : system_wq;
> > sched->hang_limit = hang_limit;
> > @@ -1100,23 +1095,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> > for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
> > drm_sched_rq_init(sched, &sched->sched_rq[i]);
> >
> > - init_waitqueue_head(&sched->wake_up_worker);
> > init_waitqueue_head(&sched->job_scheduled);
> > INIT_LIST_HEAD(&sched->pending_list);
> > spin_lock_init(&sched->job_list_lock);
> > atomic_set(&sched->hw_rq_count, 0);
> > INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
> > + INIT_WORK(&sched->work_submit, drm_sched_main);
> > atomic_set(&sched->_score, 0);
> > atomic64_set(&sched->job_id_count, 0);
> > -
> > - /* Each scheduler will run on a seperate kernel thread */
> > - sched->thread = kthread_run(drm_sched_main, sched, sched->name);
> > - if (IS_ERR(sched->thread)) {
> > - ret = PTR_ERR(sched->thread);
> > - sched->thread = NULL;
> > - DRM_DEV_ERROR(sched->dev, "Failed to create scheduler for %s.\n", name);
> > - return ret;
> > - }
> > + sched->pause_submit = false;
> >
> > sched->ready = true;
> > return 0;
* Re: [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity
2023-09-12 7:37 ` Boris Brezillon
@ 2023-09-12 15:14 ` Matthew Brost
0 siblings, 0 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 15:14 UTC (permalink / raw)
To: Boris Brezillon
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, luben.tuikov,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Tue, Sep 12, 2023 at 09:37:06AM +0200, Boris Brezillon wrote:
> On Mon, 11 Sep 2023 19:16:05 -0700
> Matthew Brost <matthew.brost@intel.com> wrote:
>
> > Rather than a global modparam for scheduling policy, move the scheduling
> > policy to scheduler / entity so user can control each scheduler / entity
> > policy.
>
> I'm a bit confused by the commit message (I think I'm okay with the
> diff though). Sounds like entity is involved in the sched policy
> choice, but AFAICT, it just has to live with the scheduler policy chosen
> by the driver at init time. If my understanding is correct, I'd just
> drop the ' / entity'.
Yep, stale commit message. Will fix in next rev.
Matt
* Re: [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity
2023-09-12 14:11 ` kernel test robot
@ 2023-09-12 15:17 ` Matthew Brost
0 siblings, 0 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-12 15:17 UTC (permalink / raw)
To: kernel test robot
Cc: robdclark, mcanal, sarah.walker, ketil.johnsen, lina, llvm,
Liviu.Dudau, dri-devel, christian.koenig, luben.tuikov,
boris.brezillon, donald.robson, oe-kbuild-all, intel-xe,
faith.ekstrand
On Tue, Sep 12, 2023 at 10:11:56PM +0800, kernel test robot wrote:
> Hi Matthew,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on drm/drm-next]
> [also build test ERROR on drm-exynos/exynos-drm-next drm-intel/for-linux-next-fixes drm-tip/drm-tip linus/master v6.6-rc1 next-20230912]
> [cannot apply to drm-misc/drm-misc-next drm-intel/for-linux-next]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Brost/drm-sched-Add-drm_sched_submit_-helpers/20230912-102001
> base: git://anongit.freedesktop.org/drm/drm drm-next
> patch link: https://lore.kernel.org/r/20230912021615.2086698-4-matthew.brost%40intel.com
> patch subject: [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity
> config: arm64-randconfig-r032-20230912 (https://download.01.org/0day-ci/archive/20230912/202309122100.HAEi8ytJ-lkp@intel.com/config)
> compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230912/202309122100.HAEi8ytJ-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202309122100.HAEi8ytJ-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> >> drivers/gpu/drm/v3d/v3d_sched.c:403:9: error: use of undeclared identifier 'ULL'
> ULL, "v3d_render", DRM_SCHED_POLICY_DEFAULT,
> ^
Typo, s/ULL/NULL in next rev.
Matt
> 1 error generated.
>
>
> vim +/ULL +403 drivers/gpu/drm/v3d/v3d_sched.c
>
> 381
> 382 int
> 383 v3d_sched_init(struct v3d_dev *v3d)
> 384 {
> 385 int hw_jobs_limit = 1;
> 386 int job_hang_limit = 0;
> 387 int hang_limit_ms = 500;
> 388 int ret;
> 389
> 390 ret = drm_sched_init(&v3d->queue[V3D_BIN].sched,
> 391 &v3d_bin_sched_ops, NULL,
> 392 hw_jobs_limit, job_hang_limit,
> 393 msecs_to_jiffies(hang_limit_ms), NULL,
> 394 NULL, "v3d_bin", DRM_SCHED_POLICY_DEFAULT,
> 395 v3d->drm.dev);
> 396 if (ret)
> 397 return ret;
> 398
> 399 ret = drm_sched_init(&v3d->queue[V3D_RENDER].sched,
> 400 &v3d_render_sched_ops, NULL,
> 401 hw_jobs_limit, job_hang_limit,
> 402 msecs_to_jiffies(hang_limit_ms), NULL,
> > 403 ULL, "v3d_render", DRM_SCHED_POLICY_DEFAULT,
> 404 v3d->drm.dev);
> 405 if (ret)
> 406 goto fail;
> 407
> 408 ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
> 409 &v3d_tfu_sched_ops, NULL,
> 410 hw_jobs_limit, job_hang_limit,
> 411 msecs_to_jiffies(hang_limit_ms), NULL,
> 412 NULL, "v3d_tfu", DRM_SCHED_POLICY_DEFAULT,
> 413 v3d->drm.dev);
> 414 if (ret)
> 415 goto fail;
> 416
> 417 if (v3d_has_csd(v3d)) {
> 418 ret = drm_sched_init(&v3d->queue[V3D_CSD].sched,
> 419 &v3d_csd_sched_ops, NULL,
> 420 hw_jobs_limit, job_hang_limit,
> 421 msecs_to_jiffies(hang_limit_ms), NULL,
> 422 NULL, "v3d_csd", DRM_SCHED_POLICY_DEFAULT,
> 423 v3d->drm.dev);
> 424 if (ret)
> 425 goto fail;
> 426
> 427 ret = drm_sched_init(&v3d->queue[V3D_CACHE_CLEAN].sched,
> 428 &v3d_cache_clean_sched_ops, NULL,
> 429 hw_jobs_limit, job_hang_limit,
> 430 msecs_to_jiffies(hang_limit_ms), NULL,
> 431 NULL, "v3d_cache_clean",
> 432 DRM_SCHED_POLICY_DEFAULT, v3d->drm.dev);
> 433 if (ret)
> 434 goto fail;
> 435 }
> 436
> 437 return 0;
> 438
> 439 fail:
> 440 v3d_sched_fini(v3d);
> 441 return ret;
> 442 }
> 443
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
* Re: [Intel-xe] [PATCH v3 04/13] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 04/13] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy Matthew Brost
@ 2023-09-13 12:30 ` kernel test robot
0 siblings, 0 replies; 53+ messages in thread
From: kernel test robot @ 2023-09-13 12:30 UTC (permalink / raw)
To: Matthew Brost, dri-devel, intel-xe
Cc: robdclark, mcanal, sarah.walker, ketil.johnsen, lina,
oe-kbuild-all, Liviu.Dudau, luben.tuikov, boris.brezillon,
donald.robson, christian.koenig, faith.ekstrand
Hi Matthew,
kernel test robot noticed the following build warnings:
[auto build test WARNING on drm/drm-next]
[also build test WARNING on drm-exynos/exynos-drm-next drm-intel/for-linux-next-fixes drm-tip/drm-tip linus/master v6.6-rc1 next-20230913]
[cannot apply to drm-misc/drm-misc-next drm-intel/for-linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Brost/drm-sched-Add-drm_sched_submit_-helpers/20230912-102001
base: git://anongit.freedesktop.org/drm/drm drm-next
patch link: https://lore.kernel.org/r/20230912021615.2086698-5-matthew.brost%40intel.com
patch subject: [PATCH v3 04/13] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
reproduce: (https://download.01.org/0day-ci/archive/20230913/202309132041.76l2uKon-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309132041.76l2uKon-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> Documentation/gpu/drm-mm:552: ./drivers/gpu/drm/scheduler/sched_main.c:52: WARNING: Enumerated list ends without a blank line; unexpected unindent.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
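
The warning above comes from the reStructuredText enumerated list in the
sched_main.c DOC comment: Sphinx requires a blank comment line between
the list and any following prose. A generic sketch of the required
pattern (the list items below are illustrative, not the actual comment
from the patch):

```c
/**
 * DOC: Overview
 *
 * 1. First scheduling policy description.
 * 2. Second scheduling policy description.
 *
 * Prose after the list must be separated from it by the blank comment
 * line above; otherwise kernel-doc/Sphinx emits the "Enumerated list
 * ends without a blank line" warning.
 */
```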
* Re: [Intel-xe] [PATCH v3 12/13] drm/sched/doc: Add Entity teardown documentaion
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 12/13] drm/sched/doc: Add Entity teardown documentaion Matthew Brost
@ 2023-09-13 15:04 ` Christian König
2023-09-14 2:06 ` Luben Tuikov
2023-09-16 18:06 ` Danilo Krummrich
2 siblings, 0 replies; 53+ messages in thread
From: Christian König @ 2023-09-13 15:04 UTC (permalink / raw)
To: Matthew Brost, dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, luben.tuikov, lina, donald.robson, daniel,
boris.brezillon, airlied, faith.ekstrand
Am 12.09.23 um 04:16 schrieb Matthew Brost:
> Provide documentation to guide in ways to teardown an entity.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> Documentation/gpu/drm-mm.rst | 6 ++++++
> drivers/gpu/drm/scheduler/sched_entity.c | 19 +++++++++++++++++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index c19b34b1c0ed..cb4d6097897e 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -552,6 +552,12 @@ Overview
> .. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> :doc: Overview
>
> +Entity teardown
> +---------------
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_entity.c
> + :doc: Entity teardown
> +
> Scheduler Function References
> -----------------------------
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 37557fbb96d0..76f3e10218bb 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -21,6 +21,25 @@
> *
> */
>
> +/**
> + * DOC: Entity teardown
> + *
> > + * Drivers can tear down an entity for several reasons. Reasons typically
> + * are a user closes the entity via an IOCTL, the FD associated with the entity
> + * is closed, or the entity encounters an error. The GPU scheduler provides the
> + * basic infrastructure to do this in a few different ways.
> + *
> + * 1. Let the entity run dry (both the pending list and job queue) and then call
> + * drm_sched_entity_fini. The backend can accelerate the process of running dry.
> + * For example set a flag so run_job is a NOP and set the TDR to a low value to
> + * signal all jobs in a timely manner (this example works for
> + * DRM_SCHED_POLICY_SINGLE_ENTITY).
Please note that it is a requirement from the X server that all
externally visible effects of command submission must still be visible
even after the fd is closed.
This has given us tons of headaches and is one of the reasons we
have the drm_sched_entity_flush() handling in the first place.
As long as you don't care about X server compatibility that shouldn't
matter to you.
Regards,
Christian.
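
For reference, the flush-then-fini sequence Christian refers to is
roughly the following (a sketch of the existing pattern that
drm_sched_entity_destroy() follows; the 500 ms budget is illustrative):

```c
/* On fd close: give queued jobs a bounded chance to reach the hardware
 * so their externally visible effects still land, then finish the
 * entity teardown. */
drm_sched_entity_flush(entity, msecs_to_jiffies(500));
drm_sched_entity_fini(entity);
```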
> + *
> + * 2. Kill the entity directly via drm_sched_entity_flush /
> + * drm_sched_entity_fini ensuring all pending and queued jobs are off the
> + * hardware and signaled.
> + */
> +
> #include <linux/kthread.h>
> #include <linux/slab.h>
> #include <linux/completion.h>
* Re: [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
` (13 preceding siblings ...)
2023-09-12 2:20 ` [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev5) Patchwork
@ 2023-09-14 1:45 ` Luben Tuikov
14 siblings, 0 replies; 53+ messages in thread
From: Luben Tuikov @ 2023-09-14 1:45 UTC (permalink / raw)
To: Matthew Brost, dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, boris.brezillon, donald.robson, daniel, lina,
airlied, christian.koenig, faith.ekstrand
On 2023-09-11 22:16, Matthew Brost wrote:
> As a prerequisite to merging the new Intel Xe DRM driver [1] [2], we
> have been asked to merge our common DRM scheduler patches first.
>
> This a continuation of a RFC [3] with all comments addressed, ready for
> a full review, and hopefully in state which can merged in the near
> future. More details of this series can found in the cover letter of the
> RFC [3].
>
> These changes have been tested with the Xe driver.
>
> v2:
> - Break run job, free job, and process message in own work items
> - This might break other drivers as run job and free job now can run in
> parallel, can fix up if needed
Hi Matthew,
Do you mean "run job B and free job A" ... "in parallel"?
I don't see why this cannot be done. One can have a work-item/thread
push jobs to hardware, while another post-processes them on a wakeup from
a driver interrupt, and frees them, and both of those run in parallel,
subject to any dependencies.
Regards,
Luben
>
> v3:
> - Include missing patch 'drm/sched: Add drm_sched_submit_* helpers'
> - Fix issue with setting timestamp to early
> - Don't dequeue jobs for single entity after calling entity fini
> - Flush pending jobs on entity fini
> - Add documentation for entity teardown
> - Add Matthew Brost to maintainers of DRM scheduler
>
> Matt
>
> [1] https://gitlab.freedesktop.org/drm/xe/kernel
> [2] https://patchwork.freedesktop.org/series/112188/
> [3] https://patchwork.freedesktop.org/series/116055/
>
> Matthew Brost (13):
> drm/sched: Add drm_sched_submit_* helpers
> drm/sched: Convert drm scheduler to use a work queue rather than
> kthread
> drm/sched: Move schedule policy to scheduler / entity
> drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
> drm/sched: Split free_job into own work item
> drm/sched: Add generic scheduler message interface
> drm/sched: Add drm_sched_start_timeout_unlocked helper
> drm/sched: Start run wq before TDR in drm_sched_start
> drm/sched: Submit job before starting TDR
> drm/sched: Add helper to set TDR timeout
> drm/sched: Waiting for pending jobs to complete in scheduler kill
> drm/sched/doc: Add Entity teardown documentaion
> drm/sched: Update maintainers of GPU scheduler
>
> Documentation/gpu/drm-mm.rst | 6 +
> MAINTAINERS | 1 +
> .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 17 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +-
> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 5 +-
> drivers/gpu/drm/lima/lima_sched.c | 5 +-
> drivers/gpu/drm/msm/adreno/adreno_device.c | 6 +-
> drivers/gpu/drm/msm/msm_ringbuffer.c | 5 +-
> drivers/gpu/drm/nouveau/nouveau_sched.c | 5 +-
> drivers/gpu/drm/panfrost/panfrost_job.c | 5 +-
> drivers/gpu/drm/scheduler/sched_entity.c | 111 +++-
> drivers/gpu/drm/scheduler/sched_fence.c | 2 +-
> drivers/gpu/drm/scheduler/sched_main.c | 497 ++++++++++++++----
> drivers/gpu/drm/v3d/v3d_sched.c | 25 +-
> include/drm/gpu_scheduler.h | 96 +++-
> 16 files changed, 644 insertions(+), 159 deletions(-)
>
--
Regards,
Luben
* Re: [Intel-xe] [PATCH v3 12/13] drm/sched/doc: Add Entity teardown documentaion
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 12/13] drm/sched/doc: Add Entity teardown documentaion Matthew Brost
2023-09-13 15:04 ` Christian König
@ 2023-09-14 2:06 ` Luben Tuikov
2023-09-16 18:06 ` Danilo Krummrich
2 siblings, 0 replies; 53+ messages in thread
From: Luben Tuikov @ 2023-09-14 2:06 UTC (permalink / raw)
To: Matthew Brost, dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, boris.brezillon, donald.robson, daniel, lina,
airlied, christian.koenig, faith.ekstrand
On 2023-09-11 22:16, Matthew Brost wrote:
> Provide documentation to guide in ways to teardown an entity.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> Documentation/gpu/drm-mm.rst | 6 ++++++
> drivers/gpu/drm/scheduler/sched_entity.c | 19 +++++++++++++++++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index c19b34b1c0ed..cb4d6097897e 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -552,6 +552,12 @@ Overview
> .. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> :doc: Overview
>
> +Entity teardown
> +---------------
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_entity.c
> + :doc: Entity teardown
> +
> Scheduler Function References
> -----------------------------
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 37557fbb96d0..76f3e10218bb 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -21,6 +21,25 @@
> *
> */
>
> +/**
> + * DOC: Entity teardown
> + *
> > + * Drivers can tear down an entity for several reasons. Reasons typically
> + * are a user closes the entity via an IOCTL, the FD associated with the entity
> + * is closed, or the entity encounters an error.
So in this third case, "entity encounters an error", we need to make sure
that no new jobs are being pushed to the entity, or at least say that here.
IOW, in all three cases, the common denominator (requirement?) is that no new
jobs are being pushed to the entity, i.e. that there are no incoming jobs.
> The GPU scheduler provides the
> + * basic infrastructure to do this in a few different ways.
Well, I'd say "in two different ways." or "in the following two ways."
I'd rather have "two" in there to make sure that it is these two, and
not any more/less/etc.
> + *
> + * 1. Let the entity run dry (both the pending list and job queue) and then call
> + * drm_sched_entity_fini. The backend can accelerate the process of running dry.
> + * For example set a flag so run_job is a NOP and set the TDR to a low value to
> + * signal all jobs in a timely manner (this example works for
> + * DRM_SCHED_POLICY_SINGLE_ENTITY).
> + *
> + * 2. Kill the entity directly via drm_sched_entity_flush /
> + * drm_sched_entity_fini ensuring all pending and queued jobs are off the
> + * hardware and signaled.
> + */
> +
> #include <linux/kthread.h>
> #include <linux/slab.h>
> #include <linux/completion.h>
--
Regards,
Luben
* Re: [Intel-xe] [PATCH v3 10/13] drm/sched: Add helper to set TDR timeout
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 10/13] drm/sched: Add helper to set TDR timeout Matthew Brost
@ 2023-09-14 2:38 ` Luben Tuikov
2023-09-14 17:36 ` Matthew Brost
0 siblings, 1 reply; 53+ messages in thread
From: Luben Tuikov @ 2023-09-14 2:38 UTC (permalink / raw)
To: Matthew Brost, dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, boris.brezillon, donald.robson, daniel, lina,
airlied, christian.koenig, faith.ekstrand
On 2023-09-11 22:16, Matthew Brost wrote:
> Add helper to set TDR timeout and restart the TDR with new timeout
> value. This will be used in XE, new Intel GPU driver, to trigger the TDR
> to cleanup drm_sched_entity that encounter errors.
Do you just want to trigger the cleanup or do you really want to
modify the timeout and requeue TDR delayed work (to be triggered
later at a different time)?
If the former, then might as well just add a function to queue that
right away. If the latter, then this would do, albeit with a few
notes as mentioned below.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 18 ++++++++++++++++++
> include/drm/gpu_scheduler.h | 1 +
> 2 files changed, 19 insertions(+)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 9dbfab7be2c6..689fb6686e01 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -426,6 +426,24 @@ static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
> spin_unlock(&sched->job_list_lock);
> }
>
> +/**
> + * drm_sched_set_timeout - set timeout for reset worker
> + *
> + * @sched: scheduler instance to set and (re)-start the worker for
> + * @timeout: timeout period
> + *
> + * Set and (re)-start the timeout for the given scheduler.
> + */
> +void drm_sched_set_timeout(struct drm_gpu_scheduler *sched, long timeout)
> +{
Well, I'd perhaps call this "drm_sched_set_tdr_timeout()", or something
to that effect, as "drm_sched_set_timeout()" doesn't make it clear that it
is indeed a cleanup timeout. However, it's totally up to you. :-)
It appears that "long timeout" is the new job timeout, so it is possible
that a stuck job might be given old timeout + new timeout recovery time
after this function is called.
> + spin_lock(&sched->job_list_lock);
> + sched->timeout = timeout;
> + cancel_delayed_work(&sched->work_tdr);
> + drm_sched_start_timeout(sched);
> + spin_unlock(&sched->job_list_lock);
> +}
> +EXPORT_SYMBOL(drm_sched_set_timeout);
Well, drm_sched_start_timeout() (which also has a name lacking description;
perhaps it should be "drm_sched_start_tdr_timeout()" or "...start_cleanup_timeout()")
checks the timeout against MAX_SCHEDULE_TIMEOUT and that the pending list is
not empty before it requeues the delayed TDR work item. So, while a remote
possibility, this new function may have the unintended consequence of canceling
the TDR and never restarting it.
I see it grabs the lock, however. Maybe wrap it in "if (sched->timeout != MAX_SCHEDULE_TIMEOUT)"?
How about using mod_delayed_work()?
How about using mod_delayed_work()?
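
That could look something like the following sketch (guarding the
MAX_SCHEDULE_TIMEOUT and empty-list cases as suggested; not the actual
patch):

```c
void drm_sched_set_timeout(struct drm_gpu_scheduler *sched, long timeout)
{
	spin_lock(&sched->job_list_lock);
	sched->timeout = timeout;
	if (timeout != MAX_SCHEDULE_TIMEOUT &&
	    !list_empty(&sched->pending_list))
		/* Atomically reschedules the TDR with the new delay. */
		mod_delayed_work(sched->timeout_wq, &sched->work_tdr, timeout);
	else
		cancel_delayed_work(&sched->work_tdr);
	spin_unlock(&sched->job_list_lock);
}
```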
--
Regards,
Luben
> +
> /**
> * drm_sched_fault - immediately start timeout handler
> *
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 5d753ecb5d71..b7b818cd81b6 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -596,6 +596,7 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> struct drm_gpu_scheduler **sched_list,
> unsigned int num_sched_list);
>
> +void drm_sched_set_timeout(struct drm_gpu_scheduler *sched, long timeout);
> void drm_sched_job_cleanup(struct drm_sched_job *job);
> void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched);
> void drm_sched_add_msg(struct drm_gpu_scheduler *sched,
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 09/13] drm/sched: Submit job before starting TDR
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 09/13] drm/sched: Submit job before starting TDR Matthew Brost
@ 2023-09-14 2:56 ` Luben Tuikov
2023-09-14 17:48 ` Matthew Brost
0 siblings, 1 reply; 53+ messages in thread
From: Luben Tuikov @ 2023-09-14 2:56 UTC (permalink / raw)
To: Matthew Brost, dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, boris.brezillon, donald.robson, daniel, lina,
airlied, christian.koenig, faith.ekstrand
On 2023-09-11 22:16, Matthew Brost wrote:
> If the TDR is set to a value, it can fire before a job is submitted in
> drm_sched_main. The job should always be submitted before the TDR
> fires; fix this ordering.
>
> v2:
> - Add to pending list before run_job, start TDR after (Luben, Boris)
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index c627d3e6494a..9dbfab7be2c6 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -498,7 +498,6 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
>
> spin_lock(&sched->job_list_lock);
> list_add_tail(&s_job->list, &sched->pending_list);
> - drm_sched_start_timeout(sched);
> spin_unlock(&sched->job_list_lock);
> }
>
> @@ -1234,6 +1233,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
> fence = sched->ops->run_job(sched_job);
> complete_all(&entity->entity_idle);
> drm_sched_fence_scheduled(s_fence, fence);
> + drm_sched_start_timeout_unlocked(sched);
>
> if (!IS_ERR_OR_NULL(fence)) {
> /* Drop for original kref_init of the fence */
So, sched->ops->run_job() is a "job inflection point" from the point of view of
the DRM scheduler. After that call, DRM has relinquished control of the job to the
firmware/hardware.
Putting the job in the pending list before submitting it down to the firmware/hardware
goes along with starting a timeout timer for the job. The timeout always includes
time for the firmware/hardware to get it prepped, as well as time for the actual
execution of the job (task). Thus, we want to do this:
1. Put the job in pending list. "Pending list" means "pends in hardware".
2. Start a timeout timer for the job.
3. Start executing the job/task. This usually involves giving it to firmware/hardware,
i.e. ownership of the job/task changes to another domain. In our case this is accomplished
by calling sched->ops->run_job().
Perhaps move drm_sched_start_timeout() closer to sched->ops->run_job() from above and/or increase
the timeout value?
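The three-step ordering above can be modeled with a tiny event log; this is
purely an illustrative mock (all names here are invented for the sketch, not
taken from the scheduler code), asserting that the timer is armed after the
pending-list insertion and before the job is handed to the firmware/hardware.

```c
#include <assert.h>
#include <string.h>

#define MAX_EVENTS 8

/* toy event log standing in for the real scheduler state */
static const char *events[MAX_EVENTS];
static int nevents;

static void log_event(const char *ev)
{
	events[nevents++] = ev;
}

/* sketch of the suggested ordering when beginning a job:
 * pending list -> TDR timer -> hand off to firmware/hardware */
static void mock_job_begin_and_run(void)
{
	log_event("add_to_pending_list");	/* 1. "pends in hardware" */
	log_event("start_tdr_timer");		/* 2. timeout covers prep + exec */
	log_event("run_job");			/* 3. ownership changes domain */
}
```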
--
Regards,
Luben
* Re: [Intel-xe] [PATCH v3 02/13] drm/sched: Convert drm scheduler to use a work queue rather than kthread
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 02/13] drm/sched: Convert drm scheduler to use a work queue rather than kthread Matthew Brost
2023-09-12 7:29 ` Boris Brezillon
@ 2023-09-14 3:35 ` Luben Tuikov
2023-09-16 17:07 ` Danilo Krummrich
2 siblings, 0 replies; 53+ messages in thread
From: Luben Tuikov @ 2023-09-14 3:35 UTC (permalink / raw)
To: Matthew Brost, dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, boris.brezillon, donald.robson, daniel, lina,
airlied, christian.koenig, faith.ekstrand
On 2023-09-11 22:16, Matthew Brost wrote:
> In XE, the new Intel GPU driver, a choice has made to have a 1 to 1
has --> was
> mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
> seems a bit odd but let us explain the reasoning below.
It's totally fine! :-)
>
> 1. In XE the submission order from multiple drm_sched_entity is not
> guaranteed to be the same completion even if targeting the same hardware
> engine. This is because in XE we have a firmware scheduler, the GuC,
> which allowed to reorder, timeslice, and preempt submissions. If a using
> shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls
> apart as the TDR expects submission order == completion order. Using a
> dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.
>
> 2. In XE submissions are done via programming a ring buffer (circular
> buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the
> limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow
> control on the ring for free.
>
> A problem with this design is currently a drm_gpu_scheduler uses a
> kthread for submission / job cleanup. This doesn't scale if a large
> number of drm_gpu_scheduler are used. To work around the scaling issue,
> use a worker rather than kthread for submission / job cleanup.
>
> v2:
> - (Rob Clark) Fix msm build
> - Pass in run work queue
> v3:
> - (Boris) don't have loop in worker
> v4:
> - (Tvrtko) break out submit ready, stop, start helpers into own patch
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +-
> drivers/gpu/drm/lima/lima_sched.c | 2 +-
> drivers/gpu/drm/msm/msm_ringbuffer.c | 2 +-
> drivers/gpu/drm/nouveau/nouveau_sched.c | 2 +-
> drivers/gpu/drm/panfrost/panfrost_job.c | 2 +-
> drivers/gpu/drm/scheduler/sched_main.c | 106 +++++++++------------
> drivers/gpu/drm/v3d/v3d_sched.c | 10 +-
> include/drm/gpu_scheduler.h | 12 ++-
> 9 files changed, 65 insertions(+), 75 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 1f8a794704d0..c83a76bccc1d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2305,7 +2305,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
> break;
> }
>
> - r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
> + r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL,
> ring->num_hw_submission, 0,
> timeout, adev->reset_domain->wq,
> ring->sched_score, ring->name,
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index 345fec6cb1a4..618a804ddc34 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -134,7 +134,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
> {
> int ret;
>
> - ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
> + ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
> etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
> msecs_to_jiffies(500), NULL, NULL,
> dev_name(gpu->dev), gpu->dev);
> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> index ffd91a5ee299..8d858aed0e56 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
>
> INIT_WORK(&pipe->recover_work, lima_sched_recover_work);
>
> - return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
> + return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
> lima_job_hang_limit,
> msecs_to_jiffies(timeout), NULL,
> NULL, name, pipe->ldev->dev);
> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
> index 40c0bc35a44c..b8865e61b40f 100644
> --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> @@ -94,7 +94,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
> /* currently managing hangcheck ourselves: */
> sched_timeout = MAX_SCHEDULE_TIMEOUT;
>
> - ret = drm_sched_init(&ring->sched, &msm_sched_ops,
> + ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
> num_hw_submissions, 0, sched_timeout,
> NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
> if (ret) {
> diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
> index 88217185e0f3..d458c2227d4f 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_sched.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
> @@ -429,7 +429,7 @@ int nouveau_sched_init(struct nouveau_drm *drm)
> if (!drm->sched_wq)
> return -ENOMEM;
>
> - return drm_sched_init(sched, &nouveau_sched_ops,
> + return drm_sched_init(sched, &nouveau_sched_ops, NULL,
> NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
> NULL, NULL, "nouveau_sched", drm->dev->dev);
> }
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 033f5e684707..326ca1ddf1d7 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -831,7 +831,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
> js->queue[j].fence_context = dma_fence_context_alloc(1);
>
> ret = drm_sched_init(&js->queue[j].sched,
> - &panfrost_sched_ops,
> + &panfrost_sched_ops, NULL,
> nentries, 0,
> msecs_to_jiffies(JOB_TIMEOUT_MS),
> pfdev->reset.wq,
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index e4fa62abca41..614e8c97a622 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -48,7 +48,6 @@
> * through the jobs entity pointer.
> */
>
> -#include <linux/kthread.h>
> #include <linux/wait.h>
> #include <linux/sched.h>
> #include <linux/completion.h>
> @@ -256,6 +255,16 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
> return rb ? rb_entry(rb, struct drm_sched_entity, rb_tree_node) : NULL;
> }
>
> +/**
> + * drm_sched_submit_queue - scheduler queue submission
> + * @sched: scheduler instance
> + */
I'd probably use a much, much higher-level comment description here,
and use verbs as opposed to nouns. Something like "start/run the scheduler",
"let the scheduler submit", etc., etc.
> +static void drm_sched_submit_queue(struct drm_gpu_scheduler *sched)
> +{
> + if (!READ_ONCE(sched->pause_submit))
> + queue_work(sched->submit_wq, &sched->work_submit);
> +}
> +
> /**
> * drm_sched_job_done - complete a job
> * @s_job: pointer to the job which is done
> @@ -275,7 +284,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
> dma_fence_get(&s_fence->finished);
> drm_sched_fence_finished(s_fence, result);
> dma_fence_put(&s_fence->finished);
> - wake_up_interruptible(&sched->wake_up_worker);
> + drm_sched_submit_queue(sched);
> }
>
> /**
> @@ -868,7 +877,7 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
> void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched)
> {
> if (drm_sched_can_queue(sched))
> - wake_up_interruptible(&sched->wake_up_worker);
> + drm_sched_submit_queue(sched);
> }
>
> /**
> @@ -978,61 +987,42 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
> }
> EXPORT_SYMBOL(drm_sched_pick_best);
>
> -/**
> - * drm_sched_blocked - check if the scheduler is blocked
> - *
> - * @sched: scheduler instance
> - *
> - * Returns true if blocked, otherwise false.
> - */
> -static bool drm_sched_blocked(struct drm_gpu_scheduler *sched)
> -{
> - if (kthread_should_park()) {
> - kthread_parkme();
> - return true;
> - }
> -
> - return false;
> -}
> -
> /**
> * drm_sched_main - main scheduler thread
> *
> * @param: scheduler instance
> - *
> - * Returns 0.
> */
> -static int drm_sched_main(void *param)
> +static void drm_sched_main(struct work_struct *w)
I'd have to apply this to see the big picture, but this
is good work! Thanks! :-)
--
Regards,
Luben
> {
> - struct drm_gpu_scheduler *sched = (struct drm_gpu_scheduler *)param;
> + struct drm_gpu_scheduler *sched =
> + container_of(w, struct drm_gpu_scheduler, work_submit);
> + struct drm_sched_entity *entity;
> + struct drm_sched_job *cleanup_job;
> int r;
>
> - sched_set_fifo_low(current);
> + if (READ_ONCE(sched->pause_submit))
> + return;
>
> - while (!kthread_should_stop()) {
> - struct drm_sched_entity *entity = NULL;
> - struct drm_sched_fence *s_fence;
> - struct drm_sched_job *sched_job;
> - struct dma_fence *fence;
> - struct drm_sched_job *cleanup_job = NULL;
> + cleanup_job = drm_sched_get_cleanup_job(sched);
> + entity = drm_sched_select_entity(sched);
>
> - wait_event_interruptible(sched->wake_up_worker,
> - (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
> - (!drm_sched_blocked(sched) &&
> - (entity = drm_sched_select_entity(sched))) ||
> - kthread_should_stop());
> + if (!entity && !cleanup_job)
> + return; /* No more work */
>
> - if (cleanup_job)
> - sched->ops->free_job(cleanup_job);
> + if (cleanup_job)
> + sched->ops->free_job(cleanup_job);
>
> - if (!entity)
> - continue;
> + if (entity) {
> + struct dma_fence *fence;
> + struct drm_sched_fence *s_fence;
> + struct drm_sched_job *sched_job;
>
> sched_job = drm_sched_entity_pop_job(entity);
> -
> if (!sched_job) {
> complete_all(&entity->entity_idle);
> - continue;
> + if (!cleanup_job)
> + return; /* No more work */
> + goto again;
> }
>
> s_fence = sched_job->s_fence;
> @@ -1063,7 +1053,9 @@ static int drm_sched_main(void *param)
>
> wake_up(&sched->job_scheduled);
> }
> - return 0;
> +
> +again:
> + drm_sched_submit_queue(sched);
> }
>
> /**
> @@ -1071,6 +1063,7 @@ static int drm_sched_main(void *param)
> *
> * @sched: scheduler instance
> * @ops: backend operations for this scheduler
> + * @submit_wq: workqueue to use for submission. If NULL, the system_wq is used
> * @hw_submission: number of hw submissions that can be in flight
> * @hang_limit: number of times to allow a job to hang before dropping it
> * @timeout: timeout value in jiffies for the scheduler
> @@ -1084,14 +1077,16 @@ static int drm_sched_main(void *param)
> */
> int drm_sched_init(struct drm_gpu_scheduler *sched,
> const struct drm_sched_backend_ops *ops,
> + struct workqueue_struct *submit_wq,
> unsigned hw_submission, unsigned hang_limit,
> long timeout, struct workqueue_struct *timeout_wq,
> atomic_t *score, const char *name, struct device *dev)
> {
> - int i, ret;
> + int i;
> sched->ops = ops;
> sched->hw_submission_limit = hw_submission;
> sched->name = name;
> + sched->submit_wq = submit_wq ? : system_wq;
> sched->timeout = timeout;
> sched->timeout_wq = timeout_wq ? : system_wq;
> sched->hang_limit = hang_limit;
> @@ -1100,23 +1095,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
> drm_sched_rq_init(sched, &sched->sched_rq[i]);
>
> - init_waitqueue_head(&sched->wake_up_worker);
> init_waitqueue_head(&sched->job_scheduled);
> INIT_LIST_HEAD(&sched->pending_list);
> spin_lock_init(&sched->job_list_lock);
> atomic_set(&sched->hw_rq_count, 0);
> INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
> + INIT_WORK(&sched->work_submit, drm_sched_main);
> atomic_set(&sched->_score, 0);
> atomic64_set(&sched->job_id_count, 0);
> -
> - /* Each scheduler will run on a seperate kernel thread */
> - sched->thread = kthread_run(drm_sched_main, sched, sched->name);
> - if (IS_ERR(sched->thread)) {
> - ret = PTR_ERR(sched->thread);
> - sched->thread = NULL;
> - DRM_DEV_ERROR(sched->dev, "Failed to create scheduler for %s.\n", name);
> - return ret;
> - }
> + sched->pause_submit = false;
>
> sched->ready = true;
> return 0;
> @@ -1135,8 +1122,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
> struct drm_sched_entity *s_entity;
> int i;
>
> - if (sched->thread)
> - kthread_stop(sched->thread);
> + drm_sched_submit_stop(sched);
>
> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> struct drm_sched_rq *rq = &sched->sched_rq[i];
> @@ -1216,7 +1202,7 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
> */
> bool drm_sched_submit_ready(struct drm_gpu_scheduler *sched)
> {
> - return !!sched->thread;
> + return sched->ready;
>
> }
> EXPORT_SYMBOL(drm_sched_submit_ready);
> @@ -1228,7 +1214,8 @@ EXPORT_SYMBOL(drm_sched_submit_ready);
> */
> void drm_sched_submit_stop(struct drm_gpu_scheduler *sched)
> {
> - kthread_park(sched->thread);
> + WRITE_ONCE(sched->pause_submit, true);
> + cancel_work_sync(&sched->work_submit);
> }
> EXPORT_SYMBOL(drm_sched_submit_stop);
>
> @@ -1239,6 +1226,7 @@ EXPORT_SYMBOL(drm_sched_submit_stop);
> */
> void drm_sched_submit_start(struct drm_gpu_scheduler *sched)
> {
> - kthread_unpark(sched->thread);
> + WRITE_ONCE(sched->pause_submit, false);
> + queue_work(sched->submit_wq, &sched->work_submit);
> }
> EXPORT_SYMBOL(drm_sched_submit_start);
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 06238e6d7f5c..38e092ea41e6 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -388,7 +388,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> int ret;
>
> ret = drm_sched_init(&v3d->queue[V3D_BIN].sched,
> - &v3d_bin_sched_ops,
> + &v3d_bin_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> NULL, "v3d_bin", v3d->drm.dev);
> @@ -396,7 +396,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> return ret;
>
> ret = drm_sched_init(&v3d->queue[V3D_RENDER].sched,
> - &v3d_render_sched_ops,
> + &v3d_render_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> NULL, "v3d_render", v3d->drm.dev);
> @@ -404,7 +404,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> goto fail;
>
> ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
> - &v3d_tfu_sched_ops,
> + &v3d_tfu_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> NULL, "v3d_tfu", v3d->drm.dev);
> @@ -413,7 +413,7 @@ v3d_sched_init(struct v3d_dev *v3d)
>
> if (v3d_has_csd(v3d)) {
> ret = drm_sched_init(&v3d->queue[V3D_CSD].sched,
> - &v3d_csd_sched_ops,
> + &v3d_csd_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> NULL, "v3d_csd", v3d->drm.dev);
> @@ -421,7 +421,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> goto fail;
>
> ret = drm_sched_init(&v3d->queue[V3D_CACHE_CLEAN].sched,
> - &v3d_cache_clean_sched_ops,
> + &v3d_cache_clean_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> NULL, "v3d_cache_clean", v3d->drm.dev);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index f12c5aea5294..278106e358a8 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -473,17 +473,16 @@ struct drm_sched_backend_ops {
> * @timeout: the time after which a job is removed from the scheduler.
> * @name: name of the ring for which this scheduler is being used.
> * @sched_rq: priority wise array of run queues.
> - * @wake_up_worker: the wait queue on which the scheduler sleeps until a job
> - * is ready to be scheduled.
> * @job_scheduled: once @drm_sched_entity_do_release is called the scheduler
> * waits on this wait queue until all the scheduled jobs are
> * finished.
> * @hw_rq_count: the number of jobs currently in the hardware queue.
> * @job_id_count: used to assign unique id to the each job.
> + * @submit_wq: workqueue used to queue @work_submit
> * @timeout_wq: workqueue used to queue @work_tdr
> + * @work_submit: schedules jobs and cleans up entities
> * @work_tdr: schedules a delayed call to @drm_sched_job_timedout after the
> * timeout interval is over.
> - * @thread: the kthread on which the scheduler which run.
> * @pending_list: the list of jobs which are currently in the job queue.
> * @job_list_lock: lock to protect the pending_list.
> * @hang_limit: once the hangs by a job crosses this limit then it is marked
> @@ -492,6 +491,7 @@ struct drm_sched_backend_ops {
> * @_score: score used when the driver doesn't provide one
> * @ready: marks if the underlying HW is ready to work
> * @free_guilty: A hit to time out handler to free the guilty job.
> + * @pause_submit: pause queuing of @work_submit on @submit_wq
> * @dev: system &struct device
> *
> * One scheduler is implemented for each hardware ring.
> @@ -502,13 +502,13 @@ struct drm_gpu_scheduler {
> long timeout;
> const char *name;
> struct drm_sched_rq sched_rq[DRM_SCHED_PRIORITY_COUNT];
> - wait_queue_head_t wake_up_worker;
> wait_queue_head_t job_scheduled;
> atomic_t hw_rq_count;
> atomic64_t job_id_count;
> + struct workqueue_struct *submit_wq;
> struct workqueue_struct *timeout_wq;
> + struct work_struct work_submit;
> struct delayed_work work_tdr;
> - struct task_struct *thread;
> struct list_head pending_list;
> spinlock_t job_list_lock;
> int hang_limit;
> @@ -516,11 +516,13 @@ struct drm_gpu_scheduler {
> atomic_t _score;
> bool ready;
> bool free_guilty;
> + bool pause_submit;
> struct device *dev;
> };
>
> int drm_sched_init(struct drm_gpu_scheduler *sched,
> const struct drm_sched_backend_ops *ops,
> + struct workqueue_struct *submit_wq,
> uint32_t hw_submission, unsigned hang_limit,
> long timeout, struct workqueue_struct *timeout_wq,
> atomic_t *score, const char *name, struct device *dev);
* Re: [Intel-xe] [PATCH v3 02/13] drm/sched: Convert drm scheduler to use a work queue rather than kthread
2023-09-12 15:02 ` Matthew Brost
@ 2023-09-14 3:41 ` Luben Tuikov
0 siblings, 0 replies; 53+ messages in thread
From: Luben Tuikov @ 2023-09-14 3:41 UTC (permalink / raw)
To: Matthew Brost, Boris Brezillon
Cc: robdclark, sarah.walker, ketil.johnsen, lina, mcanal, Liviu.Dudau,
dri-devel, intel-xe, donald.robson, christian.koenig,
faith.ekstrand
On 2023-09-12 11:02, Matthew Brost wrote:
> On Tue, Sep 12, 2023 at 09:29:53AM +0200, Boris Brezillon wrote:
>> On Mon, 11 Sep 2023 19:16:04 -0700
>> Matthew Brost <matthew.brost@intel.com> wrote:
>>
>>> @@ -1071,6 +1063,7 @@ static int drm_sched_main(void *param)
>>> *
>>> * @sched: scheduler instance
>>> * @ops: backend operations for this scheduler
>>> + * @submit_wq: workqueue to use for submission. If NULL, the system_wq is used
>>> * @hw_submission: number of hw submissions that can be in flight
>>> * @hang_limit: number of times to allow a job to hang before dropping it
>>> * @timeout: timeout value in jiffies for the scheduler
>>> @@ -1084,14 +1077,16 @@ static int drm_sched_main(void *param)
>>> */
>>> int drm_sched_init(struct drm_gpu_scheduler *sched,
>>> const struct drm_sched_backend_ops *ops,
>>> + struct workqueue_struct *submit_wq,
>>> unsigned hw_submission, unsigned hang_limit,
>>> long timeout, struct workqueue_struct *timeout_wq,
>>> atomic_t *score, const char *name, struct device *dev)
>>> {
>>> - int i, ret;
>>> + int i;
>>> sched->ops = ops;
>>> sched->hw_submission_limit = hw_submission;
>>> sched->name = name;
>>> + sched->submit_wq = submit_wq ? : system_wq;
>>
>> My understanding is that the new design is based on the idea of
>> splitting the drm_sched_main function into work items that can be
>> scheduled independently so users/drivers can insert their own
>> steps/works without requiring changes to drm_sched. This approach is
>> relying on the properties of ordered workqueues (1 work executed at a
>> time, FIFO behavior) to guarantee that these steps are still executed
>> in order, and one at a time.
>>
>> Given what you're trying to achieve I think we should create an ordered
>> workqueue instead of using the system_wq when submit_wq is NULL,
>> otherwise you lose this ordering/serialization guarantee which both
>> the dedicated kthread and ordered wq provide. It will probably work for
>> most drivers, but might lead to subtle/hard to spot ordering issues.
>>
>
> I debated choosing between the system_wq or creating an ordered wq by
> default myself. Indeed, using the system_wq by default subtly changes
> the behavior, as the run_job & free_job workers can then run in
> parallel. To be safe, I agree the default should be an ordered wq. If
> drivers are fine with run_job() and free_job() running in parallel,
> they are free to set submit_wq == system_wq. Will change in next rev.
>
> Matt
So, yes, this is very good--do make that change. However, in case
parallelism between run_job() and free_job() is wanted, perhaps we should
have a function parameter to control this, and then internally
we decide whether to use the system_wq (perhaps not) or our own
workqueue which is simply not ordered. This will give us some flexibility
should we need better control/reporting/etc. of our workqueue.
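The ordering property Boris described can be illustrated with a toy FIFO
model (purely a sketch, not kernel code; the names are invented for the
example): items queued on an ordered workqueue execute one at a time in
submission order, which is the guarantee the dedicated kthread gave and
which the system_wq does not provide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ITEMS 8

/* toy model of an ordered workqueue: strict FIFO, one item at a time */
struct ordered_wq {
	const char *items[MAX_ITEMS];
	int head, tail;
};

static void wq_queue(struct ordered_wq *wq, const char *work)
{
	wq->items[wq->tail++] = work;
}

/* executes exactly one item, in queue order; returns NULL when drained.
 * Because only one item runs at a time, a free_job step queued before a
 * run_job step can never overlap with it or be reordered after it. */
static const char *wq_run_one(struct ordered_wq *wq)
{
	return wq->head < wq->tail ? wq->items[wq->head++] : NULL;
}
```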
--
Regards,
Luben
>
>>> sched->timeout = timeout;
>>> sched->timeout_wq = timeout_wq ? : system_wq;
>>> sched->hang_limit = hang_limit;
>>> @@ -1100,23 +1095,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>> for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
>>> drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>
>>> - init_waitqueue_head(&sched->wake_up_worker);
>>> init_waitqueue_head(&sched->job_scheduled);
>>> INIT_LIST_HEAD(&sched->pending_list);
>>> spin_lock_init(&sched->job_list_lock);
>>> atomic_set(&sched->hw_rq_count, 0);
>>> INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
>>> + INIT_WORK(&sched->work_submit, drm_sched_main);
>>> atomic_set(&sched->_score, 0);
>>> atomic64_set(&sched->job_id_count, 0);
>>> -
>>> - /* Each scheduler will run on a seperate kernel thread */
>>> - sched->thread = kthread_run(drm_sched_main, sched, sched->name);
>>> - if (IS_ERR(sched->thread)) {
>>> - ret = PTR_ERR(sched->thread);
>>> - sched->thread = NULL;
>>> - DRM_DEV_ERROR(sched->dev, "Failed to create scheduler for %s.\n", name);
>>> - return ret;
>>> - }
>>> + sched->pause_submit = false;
>>>
>>> sched->ready = true;
>>> return 0;
* Re: [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity Matthew Brost
2023-09-12 7:37 ` Boris Brezillon
2023-09-12 14:11 ` kernel test robot
@ 2023-09-14 4:18 ` Luben Tuikov
2023-09-14 4:23 ` Luben Tuikov
2023-09-14 15:49 ` Matthew Brost
2 siblings, 2 replies; 53+ messages in thread
From: Luben Tuikov @ 2023-09-14 4:18 UTC (permalink / raw)
To: Matthew Brost, dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, boris.brezillon, donald.robson, daniel, lina,
airlied, christian.koenig, faith.ekstrand
On 2023-09-11 22:16, Matthew Brost wrote:
> Rather than a global modparam for scheduling policy, move the scheduling
> policy to scheduler / entity so user can control each scheduler / entity
> policy.
>
> v2:
> - s/DRM_SCHED_POLICY_MAX/DRM_SCHED_POLICY_COUNT (Luben)
> - Only include policy in scheduler (Luben)
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 3 ++-
> drivers/gpu/drm/lima/lima_sched.c | 3 ++-
> drivers/gpu/drm/msm/msm_ringbuffer.c | 3 ++-
> drivers/gpu/drm/nouveau/nouveau_sched.c | 3 ++-
> drivers/gpu/drm/panfrost/panfrost_job.c | 3 ++-
> drivers/gpu/drm/scheduler/sched_entity.c | 24 ++++++++++++++++++----
> drivers/gpu/drm/scheduler/sched_main.c | 23 +++++++++++++++------
> drivers/gpu/drm/v3d/v3d_sched.c | 15 +++++++++-----
> include/drm/gpu_scheduler.h | 20 ++++++++++++------
> 10 files changed, 72 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index c83a76bccc1d..ecb00991dd51 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2309,6 +2309,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
> ring->num_hw_submission, 0,
> timeout, adev->reset_domain->wq,
> ring->sched_score, ring->name,
> + DRM_SCHED_POLICY_DEFAULT,
> adev->dev);
> if (r) {
> DRM_ERROR("Failed to create scheduler on ring %s.\n",
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index 618a804ddc34..3646f995ca94 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -137,7 +137,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
> ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
> etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
> msecs_to_jiffies(500), NULL, NULL,
> - dev_name(gpu->dev), gpu->dev);
> + dev_name(gpu->dev), DRM_SCHED_POLICY_DEFAULT,
> + gpu->dev);
> if (ret)
> return ret;
>
> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> index 8d858aed0e56..465d4bf3882b 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -491,7 +491,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
> return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
> lima_job_hang_limit,
> msecs_to_jiffies(timeout), NULL,
> - NULL, name, pipe->ldev->dev);
> + NULL, name, DRM_SCHED_POLICY_DEFAULT,
> + pipe->ldev->dev);
> }
>
> void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
> index b8865e61b40f..f45e674a0aaf 100644
> --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> @@ -96,7 +96,8 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
>
> ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
> num_hw_submissions, 0, sched_timeout,
> - NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
> + NULL, NULL, to_msm_bo(ring->bo)->name,
> + DRM_SCHED_POLICY_DEFAULT, gpu->dev->dev);
> if (ret) {
> goto fail;
> }
> diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
> index d458c2227d4f..70e497e40c70 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_sched.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
> @@ -431,7 +431,8 @@ int nouveau_sched_init(struct nouveau_drm *drm)
>
> return drm_sched_init(sched, &nouveau_sched_ops, NULL,
> NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
> - NULL, NULL, "nouveau_sched", drm->dev->dev);
> + NULL, NULL, "nouveau_sched",
> + DRM_SCHED_POLICY_DEFAULT, drm->dev->dev);
> }
>
> void nouveau_sched_fini(struct nouveau_drm *drm)
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 326ca1ddf1d7..ad36bf3a4699 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -835,7 +835,8 @@ int panfrost_job_init(struct panfrost_device *pfdev)
> nentries, 0,
> msecs_to_jiffies(JOB_TIMEOUT_MS),
> pfdev->reset.wq,
> - NULL, "pan_js", pfdev->dev);
> + NULL, "pan_js", DRM_SCHED_POLICY_DEFAULT,
> + pfdev->dev);
> if (ret) {
> dev_err(pfdev->dev, "Failed to create scheduler: %d.", ret);
> goto err_sched;
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index a42763e1429d..65a972b52eda 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -33,6 +33,20 @@
> #define to_drm_sched_job(sched_job) \
> container_of((sched_job), struct drm_sched_job, queue_node)
>
> +static bool bad_policies(struct drm_gpu_scheduler **sched_list,
> + unsigned int num_sched_list)
Rename the function to follow the existing naming convention,
drm_sched_policy_mismatch(...
> +{
> + enum drm_sched_policy sched_policy = sched_list[0]->sched_policy;
> + unsigned int i;
> +
> + /* All schedule policies must match */
> + for (i = 1; i < num_sched_list; ++i)
> + if (sched_policy != sched_list[i]->sched_policy)
> + return true;
> +
> + return false;
> +}
> +
> /**
> * drm_sched_entity_init - Init a context entity used by scheduler when
> * submit to HW ring.
> @@ -62,7 +76,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
> unsigned int num_sched_list,
> atomic_t *guilty)
> {
> - if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
> + if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])) ||
> + bad_policies(sched_list, num_sched_list))
> return -EINVAL;
>
> memset(entity, 0, sizeof(struct drm_sched_entity));
> @@ -486,7 +501,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> * Update the entity's location in the min heap according to
> * the timestamp of the next job, if any.
> */
> - if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
> + if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO) {
> struct drm_sched_job *next;
>
> next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> @@ -558,7 +573,8 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> {
> struct drm_sched_entity *entity = sched_job->entity;
> - bool first;
> + bool first, fifo = entity->rq->sched->sched_policy ==
> + DRM_SCHED_POLICY_FIFO;
> ktime_t submit_ts;
>
> trace_drm_sched_job(sched_job, entity);
> @@ -587,7 +603,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> drm_sched_rq_add_entity(entity->rq, entity);
> spin_unlock(&entity->rq_lock);
>
> - if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> + if (fifo)
> drm_sched_rq_update_fifo(entity, submit_ts);
>
> drm_sched_wakeup_if_can_queue(entity->rq->sched);
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 614e8c97a622..545d5298c086 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -66,14 +66,14 @@
> #define to_drm_sched_job(sched_job) \
> container_of((sched_job), struct drm_sched_job, queue_node)
>
> -int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
> +int default_drm_sched_policy = DRM_SCHED_POLICY_FIFO;
>
> /**
> * DOC: sched_policy (int)
> * Used to override default entities scheduling policy in a run queue.
> */
> -MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
> -module_param_named(sched_policy, drm_sched_policy, int, 0444);
> +MODULE_PARM_DESC(sched_policy, "Specify the default scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
Note that you don't need to add "default" in the text, as it is already there at the very end: "FIFO (default)."
Otherwise, it gets confusing what is meant by "default". Like this:
Specify the default scheduling policy for entities on a run-queue, 1 = Round Robin, 2 = FIFO (default).
See how "default" appears twice and creates confusion? We don't need our internal "default" handling to get
exported all the way to the casual user reading this. It is much clearer, however, as:
Specify the scheduling policy for entities on a run-queue, 1 = Round Robin, 2 = FIFO (default).
meaning that, if unset, the default one is used. But this is all internal code stuff.
So I'd say leave this one alone.
> +module_param_named(sched_policy, default_drm_sched_policy, int, 0444);
Put "default" as a postfix:
default_drm_sched_policy --> drm_sched_policy_default
>
> static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
> const struct rb_node *b)
> @@ -177,7 +177,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> if (rq->current_entity == entity)
> rq->current_entity = NULL;
>
> - if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> + if (rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO)
> drm_sched_rq_remove_fifo_locked(entity);
>
> spin_unlock(&rq->lock);
> @@ -898,7 +898,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>
> /* Kernel run queue has higher priority than normal run queue*/
> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> - entity = drm_sched_policy == DRM_SCHED_POLICY_FIFO ?
> + entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
> drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]) :
> drm_sched_rq_select_entity_rr(&sched->sched_rq[i]);
> if (entity)
> @@ -1071,6 +1071,7 @@ static void drm_sched_main(struct work_struct *w)
> * used
> * @score: optional score atomic shared with other schedulers
> * @name: name used for debugging
> + * @sched_policy: schedule policy
> * @dev: target &struct device
> *
> * Return 0 on success, otherwise error code.
> @@ -1080,9 +1081,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> struct workqueue_struct *submit_wq,
> unsigned hw_submission, unsigned hang_limit,
> long timeout, struct workqueue_struct *timeout_wq,
> - atomic_t *score, const char *name, struct device *dev)
> + atomic_t *score, const char *name,
> + enum drm_sched_policy sched_policy,
> + struct device *dev)
> {
> int i;
> +
> + if (sched_policy >= DRM_SCHED_POLICY_COUNT)
> + return -EINVAL;
> +
> sched->ops = ops;
> sched->hw_submission_limit = hw_submission;
> sched->name = name;
> @@ -1092,6 +1099,10 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> sched->hang_limit = hang_limit;
> sched->score = score ? score : &sched->_score;
> sched->dev = dev;
> + if (sched_policy == DRM_SCHED_POLICY_DEFAULT)
> + sched->sched_policy = default_drm_sched_policy;
> + else
> + sched->sched_policy = sched_policy;
> for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
> drm_sched_rq_init(sched, &sched->sched_rq[i]);
>
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 38e092ea41e6..5e3fe77fa991 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -391,7 +391,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> &v3d_bin_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> - NULL, "v3d_bin", v3d->drm.dev);
> + NULL, "v3d_bin", DRM_SCHED_POLICY_DEFAULT,
> + v3d->drm.dev);
> if (ret)
> return ret;
>
> @@ -399,7 +400,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> &v3d_render_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> - NULL, "v3d_render", v3d->drm.dev);
> +				 NULL, "v3d_render", DRM_SCHED_POLICY_DEFAULT,
> + v3d->drm.dev);
> if (ret)
> goto fail;
>
> @@ -407,7 +409,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> &v3d_tfu_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> - NULL, "v3d_tfu", v3d->drm.dev);
> + NULL, "v3d_tfu", DRM_SCHED_POLICY_DEFAULT,
> + v3d->drm.dev);
> if (ret)
> goto fail;
>
> @@ -416,7 +419,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> &v3d_csd_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> - NULL, "v3d_csd", v3d->drm.dev);
> + NULL, "v3d_csd", DRM_SCHED_POLICY_DEFAULT,
> + v3d->drm.dev);
> if (ret)
> goto fail;
>
> @@ -424,7 +428,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> &v3d_cache_clean_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> - NULL, "v3d_cache_clean", v3d->drm.dev);
> + NULL, "v3d_cache_clean",
> + DRM_SCHED_POLICY_DEFAULT, v3d->drm.dev);
> if (ret)
> goto fail;
> }
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 278106e358a8..897d52a4ff4f 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -72,11 +72,15 @@ enum drm_sched_priority {
> DRM_SCHED_PRIORITY_UNSET = -2
> };
>
> -/* Used to chose between FIFO and RR jobs scheduling */
> -extern int drm_sched_policy;
> -
> -#define DRM_SCHED_POLICY_RR 0
> -#define DRM_SCHED_POLICY_FIFO 1
> +/* Used to chose default scheduling policy*/
> +extern int default_drm_sched_policy;
> +
> +enum drm_sched_policy {
> + DRM_SCHED_POLICY_DEFAULT,
> + DRM_SCHED_POLICY_RR,
> + DRM_SCHED_POLICY_FIFO,
> + DRM_SCHED_POLICY_COUNT,
> +};
No. Name the first (0th) element "DRM_SCHED_POLICY_UNSET".
The DRM scheduling policies are:
* unset, meaning no preference — whatever the default is (but that's NOT the "default"),
* Round-Robin, and
* FIFO.
"Default" is a _result_ of the policy being _unset_. "Default" is not a policy.
IOW, we want to say,
"If you don't set the policy (i.e. it's unset), we'll set it to the default one,
which could be either Round-Robin, or FIFO."
It may look a bit strange in function calls up there, "What do you mean `unset'? What is it?"
but it needs to be understood that the _policy_ is "unset", "rr", or "fifo", and if it is "unset",
we'll set it to whatever the default one was set to, at boot/compile time, RR or FIFO.
Note that "unset" is equivalent to a function not having the policy parameter altogether (as is the case right now).
Now that you're adding it, you can extend that, as opposed to renaming the enum
to "DEFAULT" to tell the caller that it will be set to the default one. But we don't need
to encode function behaviour in the name of a function parameter/enum element.
>
> /**
> * struct drm_sched_entity - A wrapper around a job queue (typically
> @@ -489,6 +493,7 @@ struct drm_sched_backend_ops {
> * guilty and it will no longer be considered for scheduling.
> * @score: score to help loadbalancer pick a idle sched
> * @_score: score used when the driver doesn't provide one
> + * @sched_policy: Schedule policy for scheduler
> * @ready: marks if the underlying HW is ready to work
> * @free_guilty: A hit to time out handler to free the guilty job.
> * @pause_submit: pause queuing of @work_submit on @submit_wq
> @@ -514,6 +519,7 @@ struct drm_gpu_scheduler {
> int hang_limit;
> atomic_t *score;
> atomic_t _score;
> + enum drm_sched_policy sched_policy;
> bool ready;
> bool free_guilty;
> bool pause_submit;
> @@ -525,7 +531,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> struct workqueue_struct *submit_wq,
> uint32_t hw_submission, unsigned hang_limit,
> long timeout, struct workqueue_struct *timeout_wq,
> - atomic_t *score, const char *name, struct device *dev);
> + atomic_t *score, const char *name,
> + enum drm_sched_policy sched_policy,
> + struct device *dev);
>
> void drm_sched_fini(struct drm_gpu_scheduler *sched);
> int drm_sched_job_init(struct drm_sched_job *job,
--
Regards,
Luben
* Re: [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity
2023-09-14 4:18 ` Luben Tuikov
@ 2023-09-14 4:23 ` Luben Tuikov
2023-09-14 15:48 ` Matthew Brost
2023-09-14 15:49 ` Matthew Brost
1 sibling, 1 reply; 53+ messages in thread
From: Luben Tuikov @ 2023-09-14 4:23 UTC (permalink / raw)
To: Matthew Brost, dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, frank.binns, mcanal,
Liviu.Dudau, boris.brezillon, donald.robson, daniel, lina,
airlied, christian.koenig, faith.ekstrand
On 2023-09-14 00:18, Luben Tuikov wrote:
> On 2023-09-11 22:16, Matthew Brost wrote:
>> Rather than a global modparam for scheduling policy, move the scheduling
>> policy to scheduler / entity so user can control each scheduler / entity
>> policy.
>> [snip]
>> @@ -1092,6 +1099,10 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>> sched->hang_limit = hang_limit;
>> sched->score = score ? score : &sched->_score;
>> sched->dev = dev;
>> + if (sched_policy == DRM_SCHED_POLICY_DEFAULT)
>> + sched->sched_policy = default_drm_sched_policy;
>> + else
>> + sched->sched_policy = sched_policy;
Note also that here you can use a ternary operator as opposed to an if-control.
sched->sched_policy = sched_policy == DRM_SCHED_POLICY_UNSET ?
drm_sched_policy_default : sched_policy;
--
Regards,
Luben
* Re: [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity
2023-09-14 4:23 ` Luben Tuikov
@ 2023-09-14 15:48 ` Matthew Brost
0 siblings, 0 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-14 15:48 UTC (permalink / raw)
To: Luben Tuikov
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, boris.brezillon,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Thu, Sep 14, 2023 at 12:23:35AM -0400, Luben Tuikov wrote:
> On 2023-09-14 00:18, Luben Tuikov wrote:
> > On 2023-09-11 22:16, Matthew Brost wrote:
> >> Rather than a global modparam for scheduling policy, move the scheduling
> >> policy to scheduler / entity so user can control each scheduler / entity
> >> policy.
> >> [snip]
> >> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> >> index a42763e1429d..65a972b52eda 100644
> >> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> >> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> >> @@ -33,6 +33,20 @@
> >> #define to_drm_sched_job(sched_job) \
> >> container_of((sched_job), struct drm_sched_job, queue_node)
> >>
> >> +static bool bad_policies(struct drm_gpu_scheduler **sched_list,
> >> + unsigned int num_sched_list)
> >
> > Rename the function to the status quo,
> > drm_sched_policy_mismatch(...
> >
Will do.
> >> [snip]
> >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >> index 614e8c97a622..545d5298c086 100644
> >> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >> @@ -66,14 +66,14 @@
> >> #define to_drm_sched_job(sched_job) \
> >> container_of((sched_job), struct drm_sched_job, queue_node)
> >>
> >> -int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
> >> +int default_drm_sched_policy = DRM_SCHED_POLICY_FIFO;
> >>
> >> /**
> >> * DOC: sched_policy (int)
> >> * Used to override default entities scheduling policy in a run queue.
> >> */
> >> -MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
> >> -module_param_named(sched_policy, drm_sched_policy, int, 0444);
> >> +MODULE_PARM_DESC(sched_policy, "Specify the default scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
> >
> > Note, that you don't need to add "default" in the text as it is already there at the very end "FIFO (default)."
> > Else, it gets confusing what is meant by "default". Like this:
> >
> > Specify the default scheduling policy for entities on a run-queue, 1 = Round Robin, 2 = FIFO (default).
> >
> > See "default" appear twice and it creates confusion? We don't need our internal "default" play to get
> > exported all the way to the casual user reading this. It is much clear, however,
> >
> > Specify the scheduling policy for entities on a run-queue, 1 = Round Robin, 2 = FIFO (default).
> >
> > To mean, if unset, the default one would be used. But this is all internal code stuff.
> >
> > So I'd say leave this one alone.
> >
Ok.
> >> +module_param_named(sched_policy, default_drm_sched_policy, int, 0444);
> >
> > Put "default" as a postfix:
> > default_drm_sched_policy --> drm_sched_policy_default
> >
Sure.
> >>
> >> static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
> >> const struct rb_node *b)
> >> @@ -177,7 +177,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> >> if (rq->current_entity == entity)
> >> rq->current_entity = NULL;
> >>
> >> - if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> >> + if (rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO)
> >> drm_sched_rq_remove_fifo_locked(entity);
> >>
> >> spin_unlock(&rq->lock);
> >> @@ -898,7 +898,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
> >>
> >> /* Kernel run queue has higher priority than normal run queue*/
> >> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> >> - entity = drm_sched_policy == DRM_SCHED_POLICY_FIFO ?
> >> + entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
> >> drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]) :
> >> drm_sched_rq_select_entity_rr(&sched->sched_rq[i]);
> >> if (entity)
> >> @@ -1071,6 +1071,7 @@ static void drm_sched_main(struct work_struct *w)
> >> * used
> >> * @score: optional score atomic shared with other schedulers
> >> * @name: name used for debugging
> >> + * @sched_policy: schedule policy
> >> * @dev: target &struct device
> >> *
> >> * Return 0 on success, otherwise error code.
> >> @@ -1080,9 +1081,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> >> struct workqueue_struct *submit_wq,
> >> unsigned hw_submission, unsigned hang_limit,
> >> long timeout, struct workqueue_struct *timeout_wq,
> >> - atomic_t *score, const char *name, struct device *dev)
> >> + atomic_t *score, const char *name,
> >> + enum drm_sched_policy sched_policy,
> >> + struct device *dev)
> >> {
> >> int i;
> >> +
> >> + if (sched_policy >= DRM_SCHED_POLICY_COUNT)
> >> + return -EINVAL;
> >> +
> >> sched->ops = ops;
> >> sched->hw_submission_limit = hw_submission;
> >> sched->name = name;
> >> @@ -1092,6 +1099,10 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> >> sched->hang_limit = hang_limit;
> >> sched->score = score ? score : &sched->_score;
> >> sched->dev = dev;
> >> + if (sched_policy == DRM_SCHED_POLICY_DEFAULT)
> >> + sched->sched_policy = default_drm_sched_policy;
> >> + else
> >> + sched->sched_policy = sched_policy;
>
> Note also that here you can use a ternary operator as opposed to an if-control.
>
> sched->sched_policy = sched_policy == DRM_SCHED_POLICY_UNSET ?
> drm_sched_policy_default : sched_policy;
Sure, will fix in next rev.
Matt
>
> --
> Regards,
> Luben
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity
2023-09-14 4:18 ` Luben Tuikov
2023-09-14 4:23 ` Luben Tuikov
@ 2023-09-14 15:49 ` Matthew Brost
1 sibling, 0 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-14 15:49 UTC (permalink / raw)
To: Luben Tuikov
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, boris.brezillon,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Thu, Sep 14, 2023 at 12:18:11AM -0400, Luben Tuikov wrote:
> On 2023-09-11 22:16, Matthew Brost wrote:
> > Rather than a global modparam for scheduling policy, move the scheduling
> > policy to the scheduler / entity so the user can control each scheduler /
> > entity policy.
> >
> > v2:
> > - s/DRM_SCHED_POLICY_MAX/DRM_SCHED_POLICY_COUNT (Luben)
> > - Only include policy in scheduler (Luben)
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
> > drivers/gpu/drm/etnaviv/etnaviv_sched.c | 3 ++-
> > drivers/gpu/drm/lima/lima_sched.c | 3 ++-
> > drivers/gpu/drm/msm/msm_ringbuffer.c | 3 ++-
> > drivers/gpu/drm/nouveau/nouveau_sched.c | 3 ++-
> > drivers/gpu/drm/panfrost/panfrost_job.c | 3 ++-
> > drivers/gpu/drm/scheduler/sched_entity.c | 24 ++++++++++++++++++----
> > drivers/gpu/drm/scheduler/sched_main.c | 23 +++++++++++++++------
> > drivers/gpu/drm/v3d/v3d_sched.c | 15 +++++++++-----
> > include/drm/gpu_scheduler.h | 20 ++++++++++++------
> > 10 files changed, 72 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index c83a76bccc1d..ecb00991dd51 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2309,6 +2309,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
> > ring->num_hw_submission, 0,
> > timeout, adev->reset_domain->wq,
> > ring->sched_score, ring->name,
> > + DRM_SCHED_POLICY_DEFAULT,
> > adev->dev);
> > if (r) {
> > DRM_ERROR("Failed to create scheduler on ring %s.\n",
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > index 618a804ddc34..3646f995ca94 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > @@ -137,7 +137,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
> > ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
> > etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
> > msecs_to_jiffies(500), NULL, NULL,
> > - dev_name(gpu->dev), gpu->dev);
> > + dev_name(gpu->dev), DRM_SCHED_POLICY_DEFAULT,
> > + gpu->dev);
> > if (ret)
> > return ret;
> >
> > diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> > index 8d858aed0e56..465d4bf3882b 100644
> > --- a/drivers/gpu/drm/lima/lima_sched.c
> > +++ b/drivers/gpu/drm/lima/lima_sched.c
> > @@ -491,7 +491,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
> > return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
> > lima_job_hang_limit,
> > msecs_to_jiffies(timeout), NULL,
> > - NULL, name, pipe->ldev->dev);
> > + NULL, name, DRM_SCHED_POLICY_DEFAULT,
> > + pipe->ldev->dev);
> > }
> >
> > void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
> > diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > index b8865e61b40f..f45e674a0aaf 100644
> > --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> > +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > @@ -96,7 +96,8 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
> >
> > ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
> > num_hw_submissions, 0, sched_timeout,
> > - NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
> > + NULL, NULL, to_msm_bo(ring->bo)->name,
> > + DRM_SCHED_POLICY_DEFAULT, gpu->dev->dev);
> > if (ret) {
> > goto fail;
> > }
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
> > index d458c2227d4f..70e497e40c70 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_sched.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
> > @@ -431,7 +431,8 @@ int nouveau_sched_init(struct nouveau_drm *drm)
> >
> > return drm_sched_init(sched, &nouveau_sched_ops, NULL,
> > NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
> > - NULL, NULL, "nouveau_sched", drm->dev->dev);
> > + NULL, NULL, "nouveau_sched",
> > + DRM_SCHED_POLICY_DEFAULT, drm->dev->dev);
> > }
> >
> > void nouveau_sched_fini(struct nouveau_drm *drm)
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> > index 326ca1ddf1d7..ad36bf3a4699 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> > @@ -835,7 +835,8 @@ int panfrost_job_init(struct panfrost_device *pfdev)
> > nentries, 0,
> > msecs_to_jiffies(JOB_TIMEOUT_MS),
> > pfdev->reset.wq,
> > - NULL, "pan_js", pfdev->dev);
> > + NULL, "pan_js", DRM_SCHED_POLICY_DEFAULT,
> > + pfdev->dev);
> > if (ret) {
> > dev_err(pfdev->dev, "Failed to create scheduler: %d.", ret);
> > goto err_sched;
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index a42763e1429d..65a972b52eda 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -33,6 +33,20 @@
> > #define to_drm_sched_job(sched_job) \
> > container_of((sched_job), struct drm_sched_job, queue_node)
> >
> > +static bool bad_policies(struct drm_gpu_scheduler **sched_list,
> > + unsigned int num_sched_list)
>
> Rename the function to the status quo,
> drm_sched_policy_mismatch(...
>
> > +{
> > + enum drm_sched_policy sched_policy = sched_list[0]->sched_policy;
> > + unsigned int i;
> > +
> > + /* All schedule policies must match */
> > + for (i = 1; i < num_sched_list; ++i)
> > + if (sched_policy != sched_list[i]->sched_policy)
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > /**
> > * drm_sched_entity_init - Init a context entity used by scheduler when
> > * submit to HW ring.
> > @@ -62,7 +76,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
> > unsigned int num_sched_list,
> > atomic_t *guilty)
> > {
> > - if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
> > + if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])) ||
> > + bad_policies(sched_list, num_sched_list))
> > return -EINVAL;
> >
> > memset(entity, 0, sizeof(struct drm_sched_entity));
> > @@ -486,7 +501,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> > * Update the entity's location in the min heap according to
> > * the timestamp of the next job, if any.
> > */
> > - if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
> > + if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO) {
> > struct drm_sched_job *next;
> >
> > next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> > @@ -558,7 +573,8 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> > void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> > {
> > struct drm_sched_entity *entity = sched_job->entity;
> > - bool first;
> > + bool first, fifo = entity->rq->sched->sched_policy ==
> > + DRM_SCHED_POLICY_FIFO;
> > ktime_t submit_ts;
> >
> > trace_drm_sched_job(sched_job, entity);
> > @@ -587,7 +603,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> > drm_sched_rq_add_entity(entity->rq, entity);
> > spin_unlock(&entity->rq_lock);
> >
> > - if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> > + if (fifo)
> > drm_sched_rq_update_fifo(entity, submit_ts);
> >
> > drm_sched_wakeup_if_can_queue(entity->rq->sched);
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 614e8c97a622..545d5298c086 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -66,14 +66,14 @@
> > #define to_drm_sched_job(sched_job) \
> > container_of((sched_job), struct drm_sched_job, queue_node)
> >
> > -int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
> > +int default_drm_sched_policy = DRM_SCHED_POLICY_FIFO;
> >
> > /**
> > * DOC: sched_policy (int)
> > * Used to override default entities scheduling policy in a run queue.
> > */
> > -MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
> > -module_param_named(sched_policy, drm_sched_policy, int, 0444);
> > +MODULE_PARM_DESC(sched_policy, "Specify the default scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
>
> Note that you don't need to add "default" in the text as it is already there at the very end "FIFO (default)."
> Else, it gets confusing what is meant by "default". Like this:
>
> Specify the default scheduling policy for entities on a run-queue, 1 = Round Robin, 2 = FIFO (default).
>
> See how "default" appears twice and creates confusion? We don't need our internal "default" play to get
> exported all the way to the casual user reading this. It is much clearer, however,
>
> Specify the scheduling policy for entities on a run-queue, 1 = Round Robin, 2 = FIFO (default).
>
> To mean, if unset, the default one would be used. But this is all internal code stuff.
>
> So I'd say leave this one alone.
>
> > +module_param_named(sched_policy, default_drm_sched_policy, int, 0444);
>
> Put "default" as a postfix:
> default_drm_sched_policy --> drm_sched_policy_default
>
> >
> > static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
> > const struct rb_node *b)
> > @@ -177,7 +177,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> > if (rq->current_entity == entity)
> > rq->current_entity = NULL;
> >
> > - if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> > + if (rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO)
> > drm_sched_rq_remove_fifo_locked(entity);
> >
> > spin_unlock(&rq->lock);
> > @@ -898,7 +898,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
> >
> > /* Kernel run queue has higher priority than normal run queue*/
> > for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> > - entity = drm_sched_policy == DRM_SCHED_POLICY_FIFO ?
> > + entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
> > drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]) :
> > drm_sched_rq_select_entity_rr(&sched->sched_rq[i]);
> > if (entity)
> > @@ -1071,6 +1071,7 @@ static void drm_sched_main(struct work_struct *w)
> > * used
> > * @score: optional score atomic shared with other schedulers
> > * @name: name used for debugging
> > + * @sched_policy: schedule policy
> > * @dev: target &struct device
> > *
> > * Return 0 on success, otherwise error code.
> > @@ -1080,9 +1081,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> > struct workqueue_struct *submit_wq,
> > unsigned hw_submission, unsigned hang_limit,
> > long timeout, struct workqueue_struct *timeout_wq,
> > - atomic_t *score, const char *name, struct device *dev)
> > + atomic_t *score, const char *name,
> > + enum drm_sched_policy sched_policy,
> > + struct device *dev)
> > {
> > int i;
> > +
> > + if (sched_policy >= DRM_SCHED_POLICY_COUNT)
> > + return -EINVAL;
> > +
> > sched->ops = ops;
> > sched->hw_submission_limit = hw_submission;
> > sched->name = name;
> > @@ -1092,6 +1099,10 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> > sched->hang_limit = hang_limit;
> > sched->score = score ? score : &sched->_score;
> > sched->dev = dev;
> > + if (sched_policy == DRM_SCHED_POLICY_DEFAULT)
> > + sched->sched_policy = default_drm_sched_policy;
> > + else
> > + sched->sched_policy = sched_policy;
> > for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
> > drm_sched_rq_init(sched, &sched->sched_rq[i]);
> >
> > diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> > index 38e092ea41e6..5e3fe77fa991 100644
> > --- a/drivers/gpu/drm/v3d/v3d_sched.c
> > +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> > @@ -391,7 +391,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> > &v3d_bin_sched_ops, NULL,
> > hw_jobs_limit, job_hang_limit,
> > msecs_to_jiffies(hang_limit_ms), NULL,
> > - NULL, "v3d_bin", v3d->drm.dev);
> > + NULL, "v3d_bin", DRM_SCHED_POLICY_DEFAULT,
> > + v3d->drm.dev);
> > if (ret)
> > return ret;
> >
> > @@ -399,7 +400,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> > &v3d_render_sched_ops, NULL,
> > hw_jobs_limit, job_hang_limit,
> > msecs_to_jiffies(hang_limit_ms), NULL,
> > - NULL, "v3d_render", v3d->drm.dev);
> > + NULL, "v3d_render", DRM_SCHED_POLICY_DEFAULT,
> > + v3d->drm.dev);
> > if (ret)
> > goto fail;
> >
> > @@ -407,7 +409,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> > &v3d_tfu_sched_ops, NULL,
> > hw_jobs_limit, job_hang_limit,
> > msecs_to_jiffies(hang_limit_ms), NULL,
> > - NULL, "v3d_tfu", v3d->drm.dev);
> > + NULL, "v3d_tfu", DRM_SCHED_POLICY_DEFAULT,
> > + v3d->drm.dev);
> > if (ret)
> > goto fail;
> >
> > @@ -416,7 +419,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> > &v3d_csd_sched_ops, NULL,
> > hw_jobs_limit, job_hang_limit,
> > msecs_to_jiffies(hang_limit_ms), NULL,
> > - NULL, "v3d_csd", v3d->drm.dev);
> > + NULL, "v3d_csd", DRM_SCHED_POLICY_DEFAULT,
> > + v3d->drm.dev);
> > if (ret)
> > goto fail;
> >
> > @@ -424,7 +428,8 @@ v3d_sched_init(struct v3d_dev *v3d)
> > &v3d_cache_clean_sched_ops, NULL,
> > hw_jobs_limit, job_hang_limit,
> > msecs_to_jiffies(hang_limit_ms), NULL,
> > - NULL, "v3d_cache_clean", v3d->drm.dev);
> > + NULL, "v3d_cache_clean",
> > + DRM_SCHED_POLICY_DEFAULT, v3d->drm.dev);
> > if (ret)
> > goto fail;
> > }
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 278106e358a8..897d52a4ff4f 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -72,11 +72,15 @@ enum drm_sched_priority {
> > DRM_SCHED_PRIORITY_UNSET = -2
> > };
> >
> > -/* Used to chose between FIFO and RR jobs scheduling */
> > -extern int drm_sched_policy;
> > -
> > -#define DRM_SCHED_POLICY_RR 0
> > -#define DRM_SCHED_POLICY_FIFO 1
> > +/* Used to chose default scheduling policy*/
> > +extern int default_drm_sched_policy;
> > +
> > +enum drm_sched_policy {
> > + DRM_SCHED_POLICY_DEFAULT,
> > + DRM_SCHED_POLICY_RR,
> > + DRM_SCHED_POLICY_FIFO,
> > + DRM_SCHED_POLICY_COUNT,
> > +};
>
> No. Use as the first (0th) element name "DRM_SCHED_POLICY_UNSET".
> The DRM scheduling policies are,
> * unset, meaning no preference, whatever the default is, (but that's NOT the "default"),
> * Round-Robin, and
> * FIFO.
> "Default" is a _result_ of the policy being _unset_. "Default" is not a policy.
> IOW, we want to say,
> "If you don't set the policy (i.e. it's unset), we'll set it to the default one,
> which could be either Round-Robin, or FIFO."
>
> It may look a bit strange in function calls up there, "What do you mean `unset'? What is it?"
> but it needs to be understood that the _policy_ is "unset", "rr", or "fifo", and if it is "unset",
> we'll set it to whatever the default one was set to, at boot/compile time, RR or FIFO.
>
> Note that "unset" is equivalent to a function not having the policy parameter altogether (as right now).
> Now that you're adding it, you can extend that, as opposed to renaming the enum
> to "DEFAULT" to tell the caller that it will be set to the default one. But we don't need
> to tell function behaviour in the name of a function parameter/enum element.
>
s/DRM_SCHED_POLICY_DEFAULT/DRM_SCHED_POLICY_UNSET/ it is.
Matt
> >
> > /**
> > * struct drm_sched_entity - A wrapper around a job queue (typically
> > @@ -489,6 +493,7 @@ struct drm_sched_backend_ops {
> > * guilty and it will no longer be considered for scheduling.
> > * @score: score to help loadbalancer pick a idle sched
> > * @_score: score used when the driver doesn't provide one
> > + * @sched_policy: Schedule policy for scheduler
> > * @ready: marks if the underlying HW is ready to work
> > * @free_guilty: A hit to time out handler to free the guilty job.
> > * @pause_submit: pause queuing of @work_submit on @submit_wq
> > @@ -514,6 +519,7 @@ struct drm_gpu_scheduler {
> > int hang_limit;
> > atomic_t *score;
> > atomic_t _score;
> > + enum drm_sched_policy sched_policy;
> > bool ready;
> > bool free_guilty;
> > bool pause_submit;
> > @@ -525,7 +531,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> > struct workqueue_struct *submit_wq,
> > uint32_t hw_submission, unsigned hang_limit,
> > long timeout, struct workqueue_struct *timeout_wq,
> > - atomic_t *score, const char *name, struct device *dev);
> > + atomic_t *score, const char *name,
> > + enum drm_sched_policy sched_policy,
> > + struct device *dev);
> >
> > void drm_sched_fini(struct drm_gpu_scheduler *sched);
> > int drm_sched_job_init(struct drm_sched_job *job,
>
> --
> Regards,
> Luben
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 10/13] drm/sched: Add helper to set TDR timeout
2023-09-14 2:38 ` Luben Tuikov
@ 2023-09-14 17:36 ` Matthew Brost
0 siblings, 0 replies; 53+ messages in thread
From: Matthew Brost @ 2023-09-14 17:36 UTC (permalink / raw)
To: Luben Tuikov
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, boris.brezillon,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Wed, Sep 13, 2023 at 10:38:24PM -0400, Luben Tuikov wrote:
> On 2023-09-11 22:16, Matthew Brost wrote:
> > Add a helper to set the TDR timeout and restart the TDR with the new
> > timeout value. This will be used in XE, the new Intel GPU driver, to
> > trigger the TDR to clean up drm_sched_entity objects that encounter
> > errors.
>
> Do you just want to trigger the cleanup or do you really want to
> modify the timeout and requeue TDR delayed work (to be triggered
> later at a different time)?
>
> If the former, then might as well just add a function to queue that
> right away. If the latter, then this would do, albeit with a few
Let me change the function to queue it immediately as that is what is
needed.
Matt
> notes as mentioned below.
>
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/scheduler/sched_main.c | 18 ++++++++++++++++++
> > include/drm/gpu_scheduler.h | 1 +
> > 2 files changed, 19 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 9dbfab7be2c6..689fb6686e01 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -426,6 +426,24 @@ static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
> > spin_unlock(&sched->job_list_lock);
> > }
> >
> > +/**
> > + * drm_sched_set_timeout - set timeout for reset worker
> > + *
> > + * @sched: scheduler instance to set and (re)-start the worker for
> > + * @timeout: timeout period
> > + *
> > + * Set and (re)-start the timeout for the given scheduler.
> > + */
> > +void drm_sched_set_timeout(struct drm_gpu_scheduler *sched, long timeout)
> > +{
>
> Well, I'd perhaps call this "drm_sched_set_tdr_timeout()", or something
> to that effect, as "drm_sched_set_timeout()" isn't clear that it is indeed
> a cleanup timeout. However, it's totally up to you. :-)
>
> It appears that "long timeout" is the new job timeout, so it is possible
> that a stuck job might be given old timeout + new timeout recovery time,
> after this function is called.
>
> > + spin_lock(&sched->job_list_lock);
> > + sched->timeout = timeout;
> > + cancel_delayed_work(&sched->work_tdr);
> > + drm_sched_start_timeout(sched);
> > + spin_unlock(&sched->job_list_lock);
> > +}
> > +EXPORT_SYMBOL(drm_sched_set_timeout);
>
> Well, drm_sched_start_timeout() (which also has a name lacking description, perhaps
> it should be "drm_sched_start_tdr_timeout()" or "...start_cleanup_timeout()"), anyway,
> so that function compares to MAX_SCHEDULE_TIMEOUT and pending list not being empty
> before it requeues delayed TDR work item. So, while a remote possibility, this new
> function may have the unintended consequence of canceling TDR, and never restarting it.
> I see it grabs the lock, however. Maybe wrap it in "if (sched->timeout != MAX_SCHEDULE_TIMEOUT)"?
> How about using mod_delayed_work()?
> --
> Regards,
> Luben
>
> > +
> > /**
> > * drm_sched_fault - immediately start timeout handler
> > *
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 5d753ecb5d71..b7b818cd81b6 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -596,6 +596,7 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> > struct drm_gpu_scheduler **sched_list,
> > unsigned int num_sched_list);
> >
> > +void drm_sched_set_timeout(struct drm_gpu_scheduler *sched, long timeout);
> > void drm_sched_job_cleanup(struct drm_sched_job *job);
> > void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched);
> > void drm_sched_add_msg(struct drm_gpu_scheduler *sched,
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 09/13] drm/sched: Submit job before starting TDR
2023-09-14 2:56 ` Luben Tuikov
@ 2023-09-14 17:48 ` Matthew Brost
2023-09-21 3:35 ` Luben Tuikov
0 siblings, 1 reply; 53+ messages in thread
From: Matthew Brost @ 2023-09-14 17:48 UTC (permalink / raw)
To: Luben Tuikov
Cc: robdclark, sarah.walker, ketil.johnsen, Liviu.Dudau, mcanal,
frank.binns, dri-devel, christian.koenig, boris.brezillon,
donald.robson, daniel, lina, airlied, intel-xe, faith.ekstrand
On Wed, Sep 13, 2023 at 10:56:10PM -0400, Luben Tuikov wrote:
> On 2023-09-11 22:16, Matthew Brost wrote:
> > If the TDR is set to a value, it can fire before a job is submitted in
> > drm_sched_main. The job should always be submitted before the TDR
> > fires; fix this ordering.
> >
> > v2:
> > - Add to pending list before run_job, start TDR after (Luben, Boris)
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/scheduler/sched_main.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index c627d3e6494a..9dbfab7be2c6 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -498,7 +498,6 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
> >
> > spin_lock(&sched->job_list_lock);
> > list_add_tail(&s_job->list, &sched->pending_list);
> > - drm_sched_start_timeout(sched);
> > spin_unlock(&sched->job_list_lock);
> > }
> >
> > @@ -1234,6 +1233,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
> > fence = sched->ops->run_job(sched_job);
> > complete_all(&entity->entity_idle);
> > drm_sched_fence_scheduled(s_fence, fence);
> > + drm_sched_start_timeout_unlocked(sched);
> >
> > if (!IS_ERR_OR_NULL(fence)) {
> > /* Drop for original kref_init of the fence */
>
> So, sched->ops->run_job(), is a "job inflection point" from the point of view of
> the DRM scheduler. After that call, DRM has relinquished control of the job to the
> firmware/hardware.
>
> Putting the job in the pending list, before submitting it to down to the firmware/hardware,
> goes along with starting a timeout timer for the job. The timeout always includes
> time for the firmware/hardware to get it prepped, as well as time for the actual
> execution of the job (task). Thus, we want to do this:
> 1. Put the job in pending list. "Pending list" means "pends in hardware".
> 2. Start a timeout timer for the job.
> 3. Start executing the job/task. This usually involves giving it to firmware/hardware,
> i.e. ownership of the job/task changes to another domain. In our case this is accomplished
> by calling sched->ops->run_job().
> Perhaps move drm_sched_start_timeout() closer to sched->ops->run_job() from above and/or increase
> the timeout value?
I disagree. There is a clear race if the timeout starts before run_job():
the TDR can fire before run_job() is called. The entire point of this
patch is to close this race by starting the TDR after run_job() is
called.
Matt
> --
> Regards,
> Luben
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 02/13] drm/sched: Convert drm scheduler to use a work queue rather than kthread
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 02/13] drm/sched: Convert drm scheduler to use a work queue rather than kthread Matthew Brost
2023-09-12 7:29 ` Boris Brezillon
2023-09-14 3:35 ` Luben Tuikov
@ 2023-09-16 17:07 ` Danilo Krummrich
2 siblings, 0 replies; 53+ messages in thread
From: Danilo Krummrich @ 2023-09-16 17:07 UTC (permalink / raw)
To: Matthew Brost, dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, lina, mcanal, Liviu.Dudau,
luben.tuikov, donald.robson, boris.brezillon, christian.koenig,
faith.ekstrand
On 9/12/23 04:16, Matthew Brost wrote:
> In XE, the new Intel GPU driver, a choice has been made to have a 1 to 1
> mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
> seems a bit odd but let us explain the reasoning below.
>
> 1. In XE the submission order from multiple drm_sched_entity is not
> guaranteed to match the completion order, even when targeting the same
> hardware engine. This is because in XE we have a firmware scheduler, the
> GuC, which is allowed to reorder, timeslice, and preempt submissions. If
> a shared drm_gpu_scheduler is used across multiple drm_sched_entity, the
> TDR falls apart, as the TDR expects submission order == completion order.
> Using a dedicated drm_gpu_scheduler per drm_sched_entity solves this
> problem.
>
> 2. In XE submissions are done via programming a ring buffer (circular
> buffer), and a drm_gpu_scheduler provides a limit on the number of jobs;
> if this limit is set to RING_SIZE / MAX_SIZE_PER_JOB, we get flow
> control on the ring for free.
>
> A problem with this design is currently a drm_gpu_scheduler uses a
> kthread for submission / job cleanup. This doesn't scale if a large
> number of drm_gpu_scheduler are used. To work around the scaling issue,
> use a worker rather than kthread for submission / job cleanup.
>
> v2:
> - (Rob Clark) Fix msm build
> - Pass in run work queue
> v3:
> - (Boris) don't have loop in worker
> v4:
> - (Tvrtko) break out submit ready, stop, start helpers into own patch
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +-
> drivers/gpu/drm/lima/lima_sched.c | 2 +-
> drivers/gpu/drm/msm/msm_ringbuffer.c | 2 +-
> drivers/gpu/drm/nouveau/nouveau_sched.c | 2 +-
> drivers/gpu/drm/panfrost/panfrost_job.c | 2 +-
> drivers/gpu/drm/scheduler/sched_main.c | 106 +++++++++------------
> drivers/gpu/drm/v3d/v3d_sched.c | 10 +-
> include/drm/gpu_scheduler.h | 12 ++-
> 9 files changed, 65 insertions(+), 75 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 1f8a794704d0..c83a76bccc1d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2305,7 +2305,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
> break;
> }
>
> - r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
> + r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL,
> ring->num_hw_submission, 0,
> timeout, adev->reset_domain->wq,
> ring->sched_score, ring->name,
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index 345fec6cb1a4..618a804ddc34 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -134,7 +134,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
> {
> int ret;
>
> - ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
> + ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
> etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
> msecs_to_jiffies(500), NULL, NULL,
> dev_name(gpu->dev), gpu->dev);
> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> index ffd91a5ee299..8d858aed0e56 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
>
> INIT_WORK(&pipe->recover_work, lima_sched_recover_work);
>
> - return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
> + return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
> lima_job_hang_limit,
> msecs_to_jiffies(timeout), NULL,
> NULL, name, pipe->ldev->dev);
> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
> index 40c0bc35a44c..b8865e61b40f 100644
> --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> @@ -94,7 +94,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
> /* currently managing hangcheck ourselves: */
> sched_timeout = MAX_SCHEDULE_TIMEOUT;
>
> - ret = drm_sched_init(&ring->sched, &msm_sched_ops,
> + ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
> num_hw_submissions, 0, sched_timeout,
> NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
> if (ret) {
> diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
> index 88217185e0f3..d458c2227d4f 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_sched.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
> @@ -429,7 +429,7 @@ int nouveau_sched_init(struct nouveau_drm *drm)
> if (!drm->sched_wq)
> return -ENOMEM;
>
> - return drm_sched_init(sched, &nouveau_sched_ops,
> + return drm_sched_init(sched, &nouveau_sched_ops, NULL,
> NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
> NULL, NULL, "nouveau_sched", drm->dev->dev);
> }
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 033f5e684707..326ca1ddf1d7 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -831,7 +831,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
> js->queue[j].fence_context = dma_fence_context_alloc(1);
>
> ret = drm_sched_init(&js->queue[j].sched,
> - &panfrost_sched_ops,
> + &panfrost_sched_ops, NULL,
> nentries, 0,
> msecs_to_jiffies(JOB_TIMEOUT_MS),
> pfdev->reset.wq,
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index e4fa62abca41..614e8c97a622 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -48,7 +48,6 @@
> * through the jobs entity pointer.
> */
>
> -#include <linux/kthread.h>
> #include <linux/wait.h>
> #include <linux/sched.h>
> #include <linux/completion.h>
> @@ -256,6 +255,16 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
> return rb ? rb_entry(rb, struct drm_sched_entity, rb_tree_node) : NULL;
> }
>
> +/**
> + * drm_sched_submit_queue - scheduler queue submission
> + * @sched: scheduler instance
> + */
> +static void drm_sched_submit_queue(struct drm_gpu_scheduler *sched)
> +{
> + if (!READ_ONCE(sched->pause_submit))
> + queue_work(sched->submit_wq, &sched->work_submit);
> +}
> +
> /**
> * drm_sched_job_done - complete a job
> * @s_job: pointer to the job which is done
> @@ -275,7 +284,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
> dma_fence_get(&s_fence->finished);
> drm_sched_fence_finished(s_fence, result);
> dma_fence_put(&s_fence->finished);
> - wake_up_interruptible(&sched->wake_up_worker);
> + drm_sched_submit_queue(sched);
> }
>
> /**
> @@ -868,7 +877,7 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
> void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched)
> {
> if (drm_sched_can_queue(sched))
> - wake_up_interruptible(&sched->wake_up_worker);
> + drm_sched_submit_queue(sched);
> }
>
> /**
> @@ -978,61 +987,42 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
> }
> EXPORT_SYMBOL(drm_sched_pick_best);
>
> -/**
> - * drm_sched_blocked - check if the scheduler is blocked
> - *
> - * @sched: scheduler instance
> - *
> - * Returns true if blocked, otherwise false.
> - */
> -static bool drm_sched_blocked(struct drm_gpu_scheduler *sched)
> -{
> - if (kthread_should_park()) {
> - kthread_parkme();
> - return true;
> - }
> -
> - return false;
> -}
> -
> /**
> * drm_sched_main - main scheduler thread
> *
> * @param: scheduler instance
> - *
> - * Returns 0.
> */
> -static int drm_sched_main(void *param)
> +static void drm_sched_main(struct work_struct *w)
> {
> - struct drm_gpu_scheduler *sched = (struct drm_gpu_scheduler *)param;
> + struct drm_gpu_scheduler *sched =
> + container_of(w, struct drm_gpu_scheduler, work_submit);
> + struct drm_sched_entity *entity;
> + struct drm_sched_job *cleanup_job;
> int r;
>
> - sched_set_fifo_low(current);
> + if (READ_ONCE(sched->pause_submit))
> + return;
>
> - while (!kthread_should_stop()) {
> - struct drm_sched_entity *entity = NULL;
> - struct drm_sched_fence *s_fence;
> - struct drm_sched_job *sched_job;
> - struct dma_fence *fence;
> - struct drm_sched_job *cleanup_job = NULL;
> + cleanup_job = drm_sched_get_cleanup_job(sched);
> + entity = drm_sched_select_entity(sched);
>
> - wait_event_interruptible(sched->wake_up_worker,
> - (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
> - (!drm_sched_blocked(sched) &&
> - (entity = drm_sched_select_entity(sched))) ||
> - kthread_should_stop());
> + if (!entity && !cleanup_job)
> + return; /* No more work */
>
> - if (cleanup_job)
> - sched->ops->free_job(cleanup_job);
> + if (cleanup_job)
> + sched->ops->free_job(cleanup_job);
>
> - if (!entity)
> - continue;
> + if (entity) {
> + struct dma_fence *fence;
> + struct drm_sched_fence *s_fence;
> + struct drm_sched_job *sched_job;
>
> sched_job = drm_sched_entity_pop_job(entity);
> -
> if (!sched_job) {
> complete_all(&entity->entity_idle);
> - continue;
> + if (!cleanup_job)
> + return; /* No more work */
> + goto again;
> }
>
> s_fence = sched_job->s_fence;
> @@ -1063,7 +1053,9 @@ static int drm_sched_main(void *param)
>
> wake_up(&sched->job_scheduled);
> }
> - return 0;
> +
> +again:
> + drm_sched_submit_queue(sched);
> }
>
> /**
> @@ -1071,6 +1063,7 @@ static int drm_sched_main(void *param)
> *
> * @sched: scheduler instance
> * @ops: backend operations for this scheduler
> + * @submit_wq: workqueue to use for submission. If NULL, the system_wq is used
As discussed here [1], I think it would be worth adding a comment somewhere explaining
the implications of this regarding free_job() being part of the fence signalling
critical section.
For instance, a scheduler-dedicated multi-threaded workqueue guarantees that free-job
work can't block run-job work, whereas e.g. using the system_wq won't give such
guarantees.
[1] https://lore.kernel.org/dri-devel/20230811023137.659037-1-matthew.brost@intel.com/T/#mb349b8393b7ace232cff76a969a2ce4d99f852ac
> * @hw_submission: number of hw submissions that can be in flight
> * @hang_limit: number of times to allow a job to hang before dropping it
> * @timeout: timeout value in jiffies for the scheduler
> @@ -1084,14 +1077,16 @@ static int drm_sched_main(void *param)
> */
> int drm_sched_init(struct drm_gpu_scheduler *sched,
> const struct drm_sched_backend_ops *ops,
> + struct workqueue_struct *submit_wq,
> unsigned hw_submission, unsigned hang_limit,
> long timeout, struct workqueue_struct *timeout_wq,
> atomic_t *score, const char *name, struct device *dev)
> {
> - int i, ret;
> + int i;
> sched->ops = ops;
> sched->hw_submission_limit = hw_submission;
> sched->name = name;
> + sched->submit_wq = submit_wq ? : system_wq;
> sched->timeout = timeout;
> sched->timeout_wq = timeout_wq ? : system_wq;
> sched->hang_limit = hang_limit;
> @@ -1100,23 +1095,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
> drm_sched_rq_init(sched, &sched->sched_rq[i]);
>
> - init_waitqueue_head(&sched->wake_up_worker);
> init_waitqueue_head(&sched->job_scheduled);
> INIT_LIST_HEAD(&sched->pending_list);
> spin_lock_init(&sched->job_list_lock);
> atomic_set(&sched->hw_rq_count, 0);
> INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
> + INIT_WORK(&sched->work_submit, drm_sched_main);
> atomic_set(&sched->_score, 0);
> atomic64_set(&sched->job_id_count, 0);
> -
> - /* Each scheduler will run on a seperate kernel thread */
> - sched->thread = kthread_run(drm_sched_main, sched, sched->name);
> - if (IS_ERR(sched->thread)) {
> - ret = PTR_ERR(sched->thread);
> - sched->thread = NULL;
> - DRM_DEV_ERROR(sched->dev, "Failed to create scheduler for %s.\n", name);
> - return ret;
> - }
> + sched->pause_submit = false;
>
> sched->ready = true;
> return 0;
> @@ -1135,8 +1122,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
> struct drm_sched_entity *s_entity;
> int i;
>
> - if (sched->thread)
> - kthread_stop(sched->thread);
> + drm_sched_submit_stop(sched);
>
> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> struct drm_sched_rq *rq = &sched->sched_rq[i];
> @@ -1216,7 +1202,7 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
> */
> bool drm_sched_submit_ready(struct drm_gpu_scheduler *sched)
> {
> - return !!sched->thread;
> + return sched->ready;
>
> }
> EXPORT_SYMBOL(drm_sched_submit_ready);
> @@ -1228,7 +1214,8 @@ EXPORT_SYMBOL(drm_sched_submit_ready);
> */
> void drm_sched_submit_stop(struct drm_gpu_scheduler *sched)
> {
> - kthread_park(sched->thread);
> + WRITE_ONCE(sched->pause_submit, true);
> + cancel_work_sync(&sched->work_submit);
> }
> EXPORT_SYMBOL(drm_sched_submit_stop);
>
> @@ -1239,6 +1226,7 @@ EXPORT_SYMBOL(drm_sched_submit_stop);
> */
> void drm_sched_submit_start(struct drm_gpu_scheduler *sched)
> {
> - kthread_unpark(sched->thread);
> + WRITE_ONCE(sched->pause_submit, false);
> + queue_work(sched->submit_wq, &sched->work_submit);
> }
> EXPORT_SYMBOL(drm_sched_submit_start);
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 06238e6d7f5c..38e092ea41e6 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -388,7 +388,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> int ret;
>
> ret = drm_sched_init(&v3d->queue[V3D_BIN].sched,
> - &v3d_bin_sched_ops,
> + &v3d_bin_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> NULL, "v3d_bin", v3d->drm.dev);
> @@ -396,7 +396,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> return ret;
>
> ret = drm_sched_init(&v3d->queue[V3D_RENDER].sched,
> - &v3d_render_sched_ops,
> + &v3d_render_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> NULL, "v3d_render", v3d->drm.dev);
> @@ -404,7 +404,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> goto fail;
>
> ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
> - &v3d_tfu_sched_ops,
> + &v3d_tfu_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> NULL, "v3d_tfu", v3d->drm.dev);
> @@ -413,7 +413,7 @@ v3d_sched_init(struct v3d_dev *v3d)
>
> if (v3d_has_csd(v3d)) {
> ret = drm_sched_init(&v3d->queue[V3D_CSD].sched,
> - &v3d_csd_sched_ops,
> + &v3d_csd_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> NULL, "v3d_csd", v3d->drm.dev);
> @@ -421,7 +421,7 @@ v3d_sched_init(struct v3d_dev *v3d)
> goto fail;
>
> ret = drm_sched_init(&v3d->queue[V3D_CACHE_CLEAN].sched,
> - &v3d_cache_clean_sched_ops,
> + &v3d_cache_clean_sched_ops, NULL,
> hw_jobs_limit, job_hang_limit,
> msecs_to_jiffies(hang_limit_ms), NULL,
> NULL, "v3d_cache_clean", v3d->drm.dev);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index f12c5aea5294..278106e358a8 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -473,17 +473,16 @@ struct drm_sched_backend_ops {
> * @timeout: the time after which a job is removed from the scheduler.
> * @name: name of the ring for which this scheduler is being used.
> * @sched_rq: priority wise array of run queues.
> - * @wake_up_worker: the wait queue on which the scheduler sleeps until a job
> - * is ready to be scheduled.
> * @job_scheduled: once @drm_sched_entity_do_release is called the scheduler
> * waits on this wait queue until all the scheduled jobs are
> * finished.
> * @hw_rq_count: the number of jobs currently in the hardware queue.
> * @job_id_count: used to assign unique id to the each job.
> + * @submit_wq: workqueue used to queue @work_submit
> * @timeout_wq: workqueue used to queue @work_tdr
> + * @work_submit: schedules jobs and cleans up entities
> * @work_tdr: schedules a delayed call to @drm_sched_job_timedout after the
> * timeout interval is over.
> - * @thread: the kthread on which the scheduler which run.
> * @pending_list: the list of jobs which are currently in the job queue.
> * @job_list_lock: lock to protect the pending_list.
> * @hang_limit: once the hangs by a job crosses this limit then it is marked
> @@ -492,6 +491,7 @@ struct drm_sched_backend_ops {
> * @_score: score used when the driver doesn't provide one
> * @ready: marks if the underlying HW is ready to work
> * @free_guilty: A hit to time out handler to free the guilty job.
> + * @pause_submit: pause queuing of @work_submit on @submit_wq
> * @dev: system &struct device
> *
> * One scheduler is implemented for each hardware ring.
> @@ -502,13 +502,13 @@ struct drm_gpu_scheduler {
> long timeout;
> const char *name;
> struct drm_sched_rq sched_rq[DRM_SCHED_PRIORITY_COUNT];
> - wait_queue_head_t wake_up_worker;
> wait_queue_head_t job_scheduled;
> atomic_t hw_rq_count;
> atomic64_t job_id_count;
> + struct workqueue_struct *submit_wq;
> struct workqueue_struct *timeout_wq;
> + struct work_struct work_submit;
> struct delayed_work work_tdr;
> - struct task_struct *thread;
> struct list_head pending_list;
> spinlock_t job_list_lock;
> int hang_limit;
> @@ -516,11 +516,13 @@ struct drm_gpu_scheduler {
> atomic_t _score;
> bool ready;
> bool free_guilty;
> + bool pause_submit;
> struct device *dev;
> };
>
> int drm_sched_init(struct drm_gpu_scheduler *sched,
> const struct drm_sched_backend_ops *ops,
> + struct workqueue_struct *submit_wq,
> uint32_t hw_submission, unsigned hang_limit,
> long timeout, struct workqueue_struct *timeout_wq,
> atomic_t *score, const char *name, struct device *dev);
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill
2023-09-12 14:47 ` Matthew Brost
@ 2023-09-16 17:52 ` Danilo Krummrich
2023-09-18 11:03 ` Christian König
0 siblings, 1 reply; 53+ messages in thread
From: Danilo Krummrich @ 2023-09-16 17:52 UTC (permalink / raw)
To: Matthew Brost, Christian König
Cc: robdclark, sarah.walker, ketil.johnsen, lina, mcanal, Liviu.Dudau,
dri-devel, luben.tuikov, donald.robson, boris.brezillon, intel-xe,
faith.ekstrand
On 9/12/23 16:47, Matthew Brost wrote:
> On Tue, Sep 12, 2023 at 11:57:30AM +0200, Christian König wrote:
>> Am 12.09.23 um 04:16 schrieb Matthew Brost:
>>> Wait for pending jobs to be complete before signaling queued jobs. This
>>> ensures the dma-fence signaling order is correct and also ensures the entity is
>>> not running on the hardware after drm_sched_entity_flush or
>>> drm_sched_entity_fini returns.
>>
>> Entities are *not* supposed to outlive the submissions they carry and we
>> absolutely *can't* wait for submissions to finish while killing the entity.
>>
>> In other words it is perfectly expected that entities don't exist any
>> more while the submissions they carried are still running on the hardware.
>>
>> I somehow better need to document how this is working and especially why it is
>> working like that.
>>
>> This approach came up like four or five times now and we already applied and
>> reverted patches doing this.
>>
>> For now let's take a look at the source code of drm_sched_entity_kill():
>>
>> /* The entity is guaranteed to not be used by the scheduler */
>> prev = rcu_dereference_check(entity->last_scheduled, true);
>> dma_fence_get(prev);
>>
>> while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue))))
>> {
>> struct drm_sched_fence *s_fence = job->s_fence;
>>
>> dma_fence_get(&s_fence->finished);
>> if (!prev || dma_fence_add_callback(prev, &job->finish_cb,
>> drm_sched_entity_kill_jobs_cb))
>> drm_sched_entity_kill_jobs_cb(NULL,
>> &job->finish_cb);
>>
>> prev = &s_fence->finished;
>> }
>> dma_fence_put(prev);
>>
>> This ensures the dma-fence signaling order by delegating signaling of the
>> scheduler fences into callbacks.
>>
>
> Thanks for the explanation, this code makes more sense now. Agree this
> patch is not correct.
>
> This patch really is an RFC for something Nouveau needs, I can delete
> this patch in the next rev and let Nouveau run with a slightly different
> version if needed.
Maybe there was a misunderstanding; I do not see any need for this in Nouveau.
Instead, what I think we need is a way to wait for the pending_list to be empty
(meaning all jobs on the pending_list are freed) before we call drm_sched_fini().
Currently, if we call drm_sched_fini() there might still be pending jobs on the
pending_list (unless the driver implements something driver specific).
drm_sched_fini() stops the scheduler work though, hence pending jobs will never be
freed up, leaking their memory.
This might also be true for existing drivers, but since they call drm_sched_fini()
from their driver remove callback, this race is extremely unlikely. However, it
definitely is an issue for drivers using the single entity policy and calling
drm_sched_fini() from a context where it is much more likely that pending jobs still
exist.
>
> Matt
>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
>>> drivers/gpu/drm/scheduler/sched_entity.c | 7 ++-
>>> drivers/gpu/drm/scheduler/sched_main.c | 50 ++++++++++++++++++---
>>> include/drm/gpu_scheduler.h | 18 ++++++++
>>> 4 files changed, 70 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>> index fb5dad687168..7835c0da65c5 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>> @@ -1873,7 +1873,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct amdgpu_ring *ring)
>>> list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
>>> if (dma_fence_is_signaled(&s_job->s_fence->finished)) {
>>> /* remove job from ring_mirror_list */
>>> - list_del_init(&s_job->list);
>>> + drm_sched_remove_pending_job(s_job);
>>> sched->ops->free_job(s_job);
>>> continue;
>>> }
>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>> index 1dec97caaba3..37557fbb96d0 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>> @@ -104,9 +104,11 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>>> }
>>> init_completion(&entity->entity_idle);
>>> + init_completion(&entity->jobs_done);
>>> - /* We start in an idle state. */
>>> + /* We start in an idle and jobs done state. */
>>> complete_all(&entity->entity_idle);
>>> + complete_all(&entity->jobs_done);
>>> spin_lock_init(&entity->rq_lock);
>>> spsc_queue_init(&entity->job_queue);
>>> @@ -256,6 +258,9 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
>>> /* Make sure this entity is not used by the scheduler at the moment */
>>> wait_for_completion(&entity->entity_idle);
>>> + /* Make sure all pending jobs are done */
>>> + wait_for_completion(&entity->jobs_done);
>>> +
>>> /* The entity is guaranteed to not be used by the scheduler */
>>> prev = rcu_dereference_check(entity->last_scheduled, true);
>>> dma_fence_get(prev);
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 689fb6686e01..ed6f5680793a 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -510,12 +510,52 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
>>> }
>>> EXPORT_SYMBOL(drm_sched_resume_timeout);
>>> +/**
>>> + * drm_sched_add_pending_job - Add pending job to scheduler
>>> + *
>>> + * @job: scheduler job to add
>>> + * @tail: add to tail of pending list
>>> + */
>>> +void drm_sched_add_pending_job(struct drm_sched_job *job, bool tail)
>>> +{
>>> + struct drm_gpu_scheduler *sched = job->sched;
>>> + struct drm_sched_entity *entity = job->entity;
>>> +
>>> + lockdep_assert_held(&sched->job_list_lock);
>>> +
>>> + if (tail)
>>> + list_add_tail(&job->list, &sched->pending_list);
>>> + else
>>> + list_add(&job->list, &sched->pending_list);
>>> + if (!entity->pending_job_count++)
>>> + reinit_completion(&entity->jobs_done);
>>> +}
>>> +EXPORT_SYMBOL(drm_sched_add_pending_job);
>>> +
>>> +/**
>>> + * drm_sched_remove_pending_job - Remove pending job from scheduler
>>> + *
>>> + * @job: scheduler job to remove
>>> + */
>>> +void drm_sched_remove_pending_job(struct drm_sched_job *job)
>>> +{
>>> + struct drm_gpu_scheduler *sched = job->sched;
>>> + struct drm_sched_entity *entity = job->entity;
>>> +
>>> + lockdep_assert_held(&sched->job_list_lock);
>>> +
>>> + list_del_init(&job->list);
>>> + if (!--entity->pending_job_count)
>>> + complete_all(&entity->jobs_done);
>>> +}
>>> +EXPORT_SYMBOL(drm_sched_remove_pending_job);
>>> +
>>> static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>> {
>>> struct drm_gpu_scheduler *sched = s_job->sched;
>>> spin_lock(&sched->job_list_lock);
>>> - list_add_tail(&s_job->list, &sched->pending_list);
>>> + drm_sched_add_pending_job(s_job, true);
>>> spin_unlock(&sched->job_list_lock);
>>> }
>>> @@ -538,7 +578,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>> * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
>>> * is parked at which point it's safe.
>>> */
>>> - list_del_init(&job->list);
>>> + drm_sched_remove_pending_job(job);
>>> spin_unlock(&sched->job_list_lock);
>>> status = job->sched->ops->timedout_job(job);
>>> @@ -589,7 +629,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>> * Add at the head of the queue to reflect it was the earliest
>>> * job extracted.
>>> */
>>> - list_add(&bad->list, &sched->pending_list);
>>> + drm_sched_add_pending_job(bad, false);
>>> /*
>>> * Iterate the job list from later to earlier one and either deactive
>>> @@ -611,7 +651,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>> * Locking here is for concurrent resume timeout
>>> */
>>> spin_lock(&sched->job_list_lock);
>>> - list_del_init(&s_job->list);
>>> + drm_sched_remove_pending_job(s_job);
>>> spin_unlock(&sched->job_list_lock);
>>> /*
>>> @@ -1066,7 +1106,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>> if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>> /* remove job from pending_list */
>>> - list_del_init(&job->list);
>>> + drm_sched_remove_pending_job(job);
>>> /* cancel this job's TO timer */
>>> cancel_delayed_work(&sched->work_tdr);
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index b7b818cd81b6..7c628f36fe78 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -233,6 +233,21 @@ struct drm_sched_entity {
>>> */
>>> struct completion entity_idle;
>>> + /**
>>> + * @pending_job_count:
>>> + *
>>> + * Number of pending jobs.
>>> + */
>>> + unsigned int pending_job_count;
>>> +
>>> + /**
>>> + * @jobs_done:
>>> + *
>>> + * Signals when entity has no pending jobs, used to sequence entity
>>> + * cleanup in drm_sched_entity_fini().
>>> + */
>>> + struct completion jobs_done;
>>> +
>>> /**
>>> * @oldest_job_waiting:
>>> *
>>> @@ -656,4 +671,7 @@ struct drm_gpu_scheduler *
>>> drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>>> unsigned int num_sched_list);
>>> +void drm_sched_add_pending_job(struct drm_sched_job *job, bool tail);
>>> +void drm_sched_remove_pending_job(struct drm_sched_job *job);
>>> +
>>> #endif
>>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 12/13] drm/sched/doc: Add Entity teardown documentaion
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 12/13] drm/sched/doc: Add Entity teardown documentaion Matthew Brost
2023-09-13 15:04 ` Christian König
2023-09-14 2:06 ` Luben Tuikov
@ 2023-09-16 18:06 ` Danilo Krummrich
2 siblings, 0 replies; 53+ messages in thread
From: Danilo Krummrich @ 2023-09-16 18:06 UTC (permalink / raw)
To: Matthew Brost, dri-devel, intel-xe
Cc: robdclark, sarah.walker, ketil.johnsen, lina, mcanal, Liviu.Dudau,
luben.tuikov, donald.robson, boris.brezillon, christian.koenig,
faith.ekstrand
On 9/12/23 04:16, Matthew Brost wrote:
> Provide documentation to guide in ways to teardown an entity.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> Documentation/gpu/drm-mm.rst | 6 ++++++
> drivers/gpu/drm/scheduler/sched_entity.c | 19 +++++++++++++++++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index c19b34b1c0ed..cb4d6097897e 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -552,6 +552,12 @@ Overview
> .. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> :doc: Overview
>
> +Entity teardown
> +---------------
While I think it is good to document this as well, my concern was more about tearing
down the drm_gpu_scheduler. (See also my response to patch 11 of this series.)
How do we ensure that the pending_list is actually empty before calling
drm_sched_fini()? If we don't, we potentially leak memory.
For instance, we could let drm_sched_fini() (or a separate drm_sched_teardown())
cancel run work first and leave free work running until the pending_list is empty.
If we think drivers should take care of this themselves (e.g. through reference counting jobs
per scheduler), we should document this and explain why we can't have the scheduler do
this for us.
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_entity.c
> + :doc: Entity teardown
> +
> Scheduler Function References
> -----------------------------
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 37557fbb96d0..76f3e10218bb 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -21,6 +21,25 @@
> *
> */
>
> +/**
> + * DOC: Entity teardown
> + *
> + * Drivers can tear down an entity for several reasons. Typical reasons are
> + * that a user closes the entity via an IOCTL, the FD associated with the entity
> + * is closed, or the entity encounters an error. The GPU scheduler provides the
> + * basic infrastructure to do this in a few different ways.
> + *
> + * 1. Let the entity run dry (both the pending list and job queue) and then call
> + * drm_sched_entity_fini. The backend can accelerate the process of running dry.
> + * For example set a flag so run_job is a NOP and set the TDR to a low value to
> + * signal all jobs in a timely manner (this example works for
> + * DRM_SCHED_POLICY_SINGLE_ENTITY).
> + *
> + * 2. Kill the entity directly via drm_sched_entity_flush /
> + * drm_sched_entity_fini ensuring all pending and queued jobs are off the
> + * hardware and signaled.
> + */
> +
> #include <linux/kthread.h>
> #include <linux/slab.h>
> #include <linux/completion.h>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill
2023-09-16 17:52 ` Danilo Krummrich
@ 2023-09-18 11:03 ` Christian König
2023-09-18 14:57 ` Danilo Krummrich
0 siblings, 1 reply; 53+ messages in thread
From: Christian König @ 2023-09-18 11:03 UTC (permalink / raw)
To: Danilo Krummrich, Matthew Brost
Cc: robdclark, sarah.walker, ketil.johnsen, lina, mcanal, Liviu.Dudau,
dri-devel, luben.tuikov, donald.robson, boris.brezillon, intel-xe,
faith.ekstrand
Am 16.09.23 um 19:52 schrieb Danilo Krummrich:
> On 9/12/23 16:47, Matthew Brost wrote:
>> On Tue, Sep 12, 2023 at 11:57:30AM +0200, Christian König wrote:
>>> Am 12.09.23 um 04:16 schrieb Matthew Brost:
>>>> Wait for pending jobs to be complete before signaling queued jobs.
>>>> This
>>>> ensures the dma-fence signaling order is correct and also ensures the
>>>> entity is
>>>> not running on the hardware after drm_sched_entity_flush or
>>>> drm_sched_entity_fini returns.
>>>
>>> Entities are *not* supposed to outlive the submissions they carry
>>> and we
>>> absolutely *can't* wait for submissions to finish while killing the
>>> entity.
>>>
>>> In other words it is perfectly expected that entities don't exist
>>> any
>>> more while the submissions they carried are still running on the
>>> hardware.
>>>
>>> I somehow better need to document how this is working and especially
>>> why it is
>>> working like that.
>>>
>>> This approach came up like four or five times now and we already
>>> applied and
>>> reverted patches doing this.
>>>
>>> For now let's take a look at the source code of
>>> drm_sched_entity_kill():
>>>
>>> /* The entity is guaranteed to not be used by the scheduler */
>>> prev = rcu_dereference_check(entity->last_scheduled, true);
>>> dma_fence_get(prev);
>>>
>>> while ((job =
>>> to_drm_sched_job(spsc_queue_pop(&entity->job_queue))))
>>> {
>>> struct drm_sched_fence *s_fence = job->s_fence;
>>>
>>> dma_fence_get(&s_fence->finished);
>>> if (!prev || dma_fence_add_callback(prev,
>>> &job->finish_cb,
>>> drm_sched_entity_kill_jobs_cb))
>>> drm_sched_entity_kill_jobs_cb(NULL,
>>> &job->finish_cb);
>>>
>>> prev = &s_fence->finished;
>>> }
>>> dma_fence_put(prev);
>>>
>>> This ensures the dma-fence signaling order by delegating signaling
>>> of the
>>> scheduler fences into callbacks.
>>>
>>
>> Thanks for the explanation, this code makes more sense now. Agree this
>> patch is not correct.
>>
>> This patch really is an RFC for something Nouveau needs, I can delete
>> this patch in the next rev and let Nouveau run with a slightly different
>> version if needed.
>
> Maybe there was a misunderstanding, I do not see any need for this in
> Nouveau.
>
> Instead, what I think we need is a way to wait for the pending_list
> being empty
> (meaning all jobs on the pending_list are freed) before we call
> drm_sched_fini().
>
> Currently, if we call drm_sched_fini() there might still be pending
> jobs on the
> pending_list (unless the driver implements something driver specific).
> drm_sched_fini() stops the scheduler work though, hence pending jobs
> will never be
> freed up leaking their memory.
>
> This might also be true for existing drivers, but since they call
> drm_sched_fini()
> from their driver remove callback, this race is extremely unlikely.
> However, it
> definitely is an issue for drivers using the single entitiy policy
> calling
> drm_sched_fini() from a context where it is much more likely pending
> jobs still
> exist.
Yeah, that's exactly one of the reasons why I want to get away from the
idea that the scheduler is necessary for executing the commands.
What this component should do is push jobs to the hardware and not
oversee their execution; that's the job of the driver.
In other words drivers should be able to call drm_sched_fini() while
there are jobs still pending on the hardware.
Also keep in mind that you *can't* wait for all hw operations to finish
in your flush or file descriptor close callback, or you create
un-killable processes.
Regards,
Christian.
>
>>
>> Matt
>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
>>>> drivers/gpu/drm/scheduler/sched_entity.c | 7 ++-
>>>> drivers/gpu/drm/scheduler/sched_main.c | 50
>>>> ++++++++++++++++++---
>>>> include/drm/gpu_scheduler.h | 18 ++++++++
>>>> 4 files changed, 70 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>>> index fb5dad687168..7835c0da65c5 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>>>> @@ -1873,7 +1873,7 @@ static void
>>>> amdgpu_ib_preempt_mark_partial_job(struct amdgpu_ring *ring)
>>>> list_for_each_entry_safe(s_job, tmp, &sched->pending_list,
>>>> list) {
>>>> if (dma_fence_is_signaled(&s_job->s_fence->finished)) {
>>>> /* remove job from ring_mirror_list */
>>>> - list_del_init(&s_job->list);
>>>> + drm_sched_remove_pending_job(s_job);
>>>> sched->ops->free_job(s_job);
>>>> continue;
>>>> }
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> index 1dec97caaba3..37557fbb96d0 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> @@ -104,9 +104,11 @@ int drm_sched_entity_init(struct
>>>> drm_sched_entity *entity,
>>>> }
>>>> init_completion(&entity->entity_idle);
>>>> + init_completion(&entity->jobs_done);
>>>> - /* We start in an idle state. */
>>>> + /* We start in an idle and jobs done state. */
>>>> complete_all(&entity->entity_idle);
>>>> + complete_all(&entity->jobs_done);
>>>> spin_lock_init(&entity->rq_lock);
>>>> spsc_queue_init(&entity->job_queue);
>>>> @@ -256,6 +258,9 @@ static void drm_sched_entity_kill(struct
>>>> drm_sched_entity *entity)
>>>> /* Make sure this entity is not used by the scheduler at the
>>>> moment */
>>>> wait_for_completion(&entity->entity_idle);
>>>> + /* Make sure all pending jobs are done */
>>>> + wait_for_completion(&entity->jobs_done);
>>>> +
>>>> /* The entity is guaranteed to not be used by the scheduler */
>>>> prev = rcu_dereference_check(entity->last_scheduled, true);
>>>> dma_fence_get(prev);
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index 689fb6686e01..ed6f5680793a 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -510,12 +510,52 @@ void drm_sched_resume_timeout(struct
>>>> drm_gpu_scheduler *sched,
>>>> }
>>>> EXPORT_SYMBOL(drm_sched_resume_timeout);
>>>> +/**
>>>> + * drm_sched_add_pending_job - Add pending job to scheduler
>>>> + *
>>>> + * @job: scheduler job to add
>>>> + * @tail: add to tail of pending list
>>>> + */
>>>> +void drm_sched_add_pending_job(struct drm_sched_job *job, bool tail)
>>>> +{
>>>> + struct drm_gpu_scheduler *sched = job->sched;
>>>> + struct drm_sched_entity *entity = job->entity;
>>>> +
>>>> + lockdep_assert_held(&sched->job_list_lock);
>>>> +
>>>> + if (tail)
>>>> + list_add_tail(&job->list, &sched->pending_list);
>>>> + else
>>>> + list_add(&job->list, &sched->pending_list);
>>>> + if (!entity->pending_job_count++)
>>>> + reinit_completion(&entity->jobs_done);
>>>> +}
>>>> +EXPORT_SYMBOL(drm_sched_add_pending_job);
>>>> +
>>>> +/**
>>>> + * drm_sched_remove_pending_job - Remove pending job from scheduler
>>>> + *
>>>> + * @job: scheduler job to remove
>>>> + */
>>>> +void drm_sched_remove_pending_job(struct drm_sched_job *job)
>>>> +{
>>>> + struct drm_gpu_scheduler *sched = job->sched;
>>>> + struct drm_sched_entity *entity = job->entity;
>>>> +
>>>> + lockdep_assert_held(&sched->job_list_lock);
>>>> +
>>>> + list_del_init(&job->list);
>>>> + if (!--entity->pending_job_count)
>>>> + complete_all(&entity->jobs_done);
>>>> +}
>>>> +EXPORT_SYMBOL(drm_sched_remove_pending_job);
>>>> +
>>>> static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>>> {
>>>> struct drm_gpu_scheduler *sched = s_job->sched;
>>>> spin_lock(&sched->job_list_lock);
>>>> - list_add_tail(&s_job->list, &sched->pending_list);
>>>> + drm_sched_add_pending_job(s_job, true);
>>>> spin_unlock(&sched->job_list_lock);
>>>> }
>>>> @@ -538,7 +578,7 @@ static void drm_sched_job_timedout(struct
>>>> work_struct *work)
>>>> * drm_sched_cleanup_jobs. It will be reinserted back
>>>> after sched->thread
>>>> * is parked at which point it's safe.
>>>> */
>>>> - list_del_init(&job->list);
>>>> + drm_sched_remove_pending_job(job);
>>>> spin_unlock(&sched->job_list_lock);
>>>> status = job->sched->ops->timedout_job(job);
>>>> @@ -589,7 +629,7 @@ void drm_sched_stop(struct drm_gpu_scheduler
>>>> *sched, struct drm_sched_job *bad)
>>>> * Add at the head of the queue to reflect it was the
>>>> earliest
>>>> * job extracted.
>>>> */
>>>> - list_add(&bad->list, &sched->pending_list);
>>>> + drm_sched_add_pending_job(bad, false);
>>>> /*
>>>> * Iterate the job list from later to earlier one and
>>>> either deactive
>>>> @@ -611,7 +651,7 @@ void drm_sched_stop(struct drm_gpu_scheduler
>>>> *sched, struct drm_sched_job *bad)
>>>> * Locking here is for concurrent resume timeout
>>>> */
>>>> spin_lock(&sched->job_list_lock);
>>>> - list_del_init(&s_job->list);
>>>> + drm_sched_remove_pending_job(s_job);
>>>> spin_unlock(&sched->job_list_lock);
>>>> /*
>>>> @@ -1066,7 +1106,7 @@ drm_sched_get_cleanup_job(struct
>>>> drm_gpu_scheduler *sched)
>>>> if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>>> /* remove job from pending_list */
>>>> - list_del_init(&job->list);
>>>> + drm_sched_remove_pending_job(job);
>>>> /* cancel this job's TO timer */
>>>> cancel_delayed_work(&sched->work_tdr);
>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>> index b7b818cd81b6..7c628f36fe78 100644
>>>> --- a/include/drm/gpu_scheduler.h
>>>> +++ b/include/drm/gpu_scheduler.h
>>>> @@ -233,6 +233,21 @@ struct drm_sched_entity {
>>>> */
>>>> struct completion entity_idle;
>>>> + /**
>>>> + * @pending_job_count:
>>>> + *
>>>> + * Number of pending jobs.
>>>> + */
>>>> + unsigned int pending_job_count;
>>>> +
>>>> + /**
>>>> + * @jobs_done:
>>>> + *
>>>> + * Signals when entity has no pending jobs, used to sequence
>>>> entity
>>>> + * cleanup in drm_sched_entity_fini().
>>>> + */
>>>> + struct completion jobs_done;
>>>> +
>>>> /**
>>>> * @oldest_job_waiting:
>>>> *
>>>> @@ -656,4 +671,7 @@ struct drm_gpu_scheduler *
>>>> drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>>>> unsigned int num_sched_list);
>>>> +void drm_sched_add_pending_job(struct drm_sched_job *job, bool tail);
>>>> +void drm_sched_remove_pending_job(struct drm_sched_job *job);
>>>> +
>>>> #endif
>>>
>
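[Editor's note: the pending_job_count / jobs_done pairing in the quoted patch (re-arm the completion when the count leaves zero, complete it when the count returns to zero) can be modeled in plain C. This is only an illustrative sketch; toy_entity and the bool standing in for struct completion are invented for the example, not kernel code.]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the patch's bookkeeping: the bool stands in for
 * struct completion, the counter for entity->pending_job_count. */
struct toy_entity {
	unsigned int pending_job_count;
	bool jobs_done; /* "completed" state of the jobs_done completion */
};

void toy_entity_init(struct toy_entity *e)
{
	e->pending_job_count = 0;
	e->jobs_done = true; /* complete_all(): starts in jobs-done state */
}

void toy_add_pending_job(struct toy_entity *e)
{
	if (!e->pending_job_count++)
		e->jobs_done = false; /* reinit_completion() on 0 -> 1 */
}

void toy_remove_pending_job(struct toy_entity *e)
{
	if (!--e->pending_job_count)
		e->jobs_done = true; /* complete_all() on 1 -> 0 */
}
```

In the real patch the transitions happen under sched->job_list_lock, which is why both helpers assert the lock is held.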
* Re: [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill
2023-09-18 11:03 ` Christian König
@ 2023-09-18 14:57 ` Danilo Krummrich
2023-09-19 5:55 ` Christian König
0 siblings, 1 reply; 53+ messages in thread
From: Danilo Krummrich @ 2023-09-18 14:57 UTC (permalink / raw)
To: Christian König, Matthew Brost
Cc: robdclark, sarah.walker, ketil.johnsen, lina, mcanal, Liviu.Dudau,
dri-devel, luben.tuikov, donald.robson, boris.brezillon, intel-xe,
faith.ekstrand
On 9/18/23 13:03, Christian König wrote:
> Am 16.09.23 um 19:52 schrieb Danilo Krummrich:
>> On 9/12/23 16:47, Matthew Brost wrote:
>>> On Tue, Sep 12, 2023 at 11:57:30AM +0200, Christian König wrote:
>>>> Am 12.09.23 um 04:16 schrieb Matthew Brost:
>>>>> Wait for pending jobs to be complete before signaling queued jobs. This
>>>>> ensures dma-fence signaling order is correct and also ensures the entity is
>>>>> not running on the hardware after drm_sched_entity_flush or
>>>>> drm_sched_entity_fini returns.
>>>>
>>>> Entities are *not* supposed to outlive the submissions they carry and we
>>>> absolutely *can't* wait for submissions to finish while killing the entity.
>>>>
>>>> In other words it is perfectly expected that entities don't exist
>>>> anymore while the submissions they carried are still running on the hardware.
>>>>
>>>> I somehow need to better document how this is working and especially why it
>>>> is working like that.
>>>>
>>>> This approach came up like four or five times now and we already applied and
>>>> reverted patches doing this.
>>>>
>>>> For now let's take a look at the source code of drm_sched_entity_kill():
>>>>
>>>> /* The entity is guaranteed to not be used by the scheduler */
>>>> prev = rcu_dereference_check(entity->last_scheduled, true);
>>>> dma_fence_get(prev);
>>>>
>>>> while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue))))
>>>> {
>>>> struct drm_sched_fence *s_fence = job->s_fence;
>>>>
>>>> dma_fence_get(&s_fence->finished);
>>>> if (!prev || dma_fence_add_callback(prev, &job->finish_cb,
>>>> drm_sched_entity_kill_jobs_cb))
>>>> drm_sched_entity_kill_jobs_cb(NULL,
>>>> &job->finish_cb);
>>>>
>>>> prev = &s_fence->finished;
>>>> }
>>>> dma_fence_put(prev);
>>>>
>>>> This ensures the dma-fence signaling order by delegating signaling of the
>>>> scheduler fences into callbacks.
>>>>
>>>
>>> Thanks for the explanation, this code makes more sense now. Agree this
>>> patch is not correct.
>>>
>>> This patch really is an RFC for something Nouveau needs; I can delete
>>> this patch in the next rev and let Nouveau run with a slightly different
>>> version if needed.
>>
>> Maybe there was a misunderstanding; I do not see any need for this in Nouveau.
>>
>> Instead, what I think we need is a way to wait for the pending_list to be empty
>> (meaning all jobs on the pending_list are freed) before we call drm_sched_fini().
>>
>> Currently, if we call drm_sched_fini() there might still be pending jobs on the
>> pending_list (unless the driver implements something driver specific).
>> drm_sched_fini() stops the scheduler work though, hence pending jobs will never be
>> freed up, leaking their memory.
>>
>> This might also be true for existing drivers, but since they call drm_sched_fini()
>> from their driver remove callback, this race is extremely unlikely. However, it
>> definitely is an issue for drivers using the single entity policy calling
>> drm_sched_fini() from a context where it is much more likely pending jobs still
>> exist.
>
> Yeah, that's exactly one of the reasons why I want to get away from the idea that the scheduler is necessary for executing the commands.
>
> What this component should do is to push jobs to the hardware and not overview their execution, that's the job of the driver.
While, generally, I'd agree, I think we can't really get around having something that
frees the job once its fence got signaled. This "something" could be the driver, but
once it ends up being the same code over and over again for every driver, we're probably
back to letting the scheduler do it instead in a common way.
>
> In other words drivers should be able to call drm_sched_fini() while there are jobs still pending on the hardware.
Unless we have a better idea on how to do this, I'd, as mentioned, suggest having something
like drm_sched_teardown() and/or drm_sched_teardown_timeout() waiting for pending jobs.
>
> Also keep in mind that you *can't* wait for all hw operations to finish in your flush or file descriptor close callback or you create un-killable processes.
Right, that's why in Nouveau I try to wait for the channel (ring) to be idle and, if this
doesn't work in a "reasonable" amount of time, I kill the fence context, signalling all
fences with an error code, and wait for the scheduler to be idle, which comes down to only
waiting for all free_job() callbacks to finish, since all jobs are signaled already.
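[Editor's note: this teardown sequence can be sketched as plain C with stubbed-out steps. Every function name below is invented for illustration; none of it is real Nouveau or DRM API.]

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of the teardown sequence described above. The stubs only
 * record the order in which the steps run. */
enum { MAX_STEPS = 8 };
static const char *steps[MAX_STEPS];
static int nsteps;

static void record(const char *s) { steps[nsteps++] = s; }

/* Pretend the ring never goes idle, forcing the error path. */
static bool wait_channel_idle(int timeout_ms)
{
	record("wait_channel_idle");
	return false; /* timed out */
}

static void kill_fence_context(void)
{
	/* Signal every remaining fence with an error code. */
	record("kill_fence_context");
}

static void wait_sched_idle(void)
{
	/* Only free_job() callbacks remain; all fences already signaled. */
	record("wait_sched_idle");
}

void toy_channel_teardown(void)
{
	if (!wait_channel_idle(1000))
		kill_fence_context();
	wait_sched_idle();
}
```

The key property is that the final wait can never block forever: by the time it runs, every fence has either signaled normally or been signaled with an error.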
>
> Regards,
> Christian.
>
* Re: [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill
2023-09-18 14:57 ` Danilo Krummrich
@ 2023-09-19 5:55 ` Christian König
0 siblings, 0 replies; 53+ messages in thread
From: Christian König @ 2023-09-19 5:55 UTC (permalink / raw)
To: Danilo Krummrich, Matthew Brost
Cc: robdclark, sarah.walker, ketil.johnsen, lina, mcanal, Liviu.Dudau,
dri-devel, luben.tuikov, donald.robson, boris.brezillon, intel-xe,
faith.ekstrand
Am 18.09.23 um 16:57 schrieb Danilo Krummrich:
> [SNIP]
>> What this component should do is to push jobs to the hardware and not
>> overview their execution, that's the job of the driver.
>
> While, generally, I'd agree, I think we can't really get around having
> something that
> frees the job once it's fence got signaled. This "something" could be
> the driver, but
> once it ends up being the same code over and over again for every
> driver, we're probably
> back letting the scheduler do it instead in a common way.
We already have a driver-private void* in the scheduler fence. What we
could do is let the scheduler provide a way to call a driver-supplied
function when the fence signals.
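[Editor's note: a minimal model of that idea, assuming a driver-provided callback run from the fence signaling path. toy_fence and toy_job are invented for the sketch; this is not the real dma_fence API.]

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of "call a driver function when the scheduler fence
 * signals". The driver registers free_job_cb(); the signaling path
 * invokes it, so freeing needs no scheduler thread of its own. */
struct toy_fence {
	bool signaled;
	void (*cb)(void *data);
	void *cb_data;
};

struct toy_job {
	struct toy_fence finished;
	bool freed; /* for demonstration only */
};

static void toy_fence_add_callback(struct toy_fence *f,
				   void (*cb)(void *), void *data)
{
	if (f->signaled) {
		cb(data); /* already signaled: run immediately */
		return;
	}
	f->cb = cb;
	f->cb_data = data;
}

static void toy_fence_signal(struct toy_fence *f)
{
	f->signaled = true;
	if (f->cb)
		f->cb(f->cb_data);
}

/* Driver-provided free_job(), invoked from the signaling path. */
static void free_job_cb(void *data)
{
	struct toy_job *job = data;

	job->freed = true;
}
```

The real dma_fence callback rules are stricter (callbacks run in fence-signaling context and must not sleep), which is one reason the thread debates who should own this logic.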
>
>>
>> In other words drivers should be able to call drm_sched_fini() while
>> there are jobs still pending on the hardware.
>
> Unless we have a better idea on how to do this, I'd, as mentioned,
> suggest to have something
> like drm_sched_teardown() and/or drm_sched_teardown_timeout() waiting
> for pending jobs.
Yeah, something like that. But I think the better functionality would be
to provide an iterator to go over the pending fences in the scheduler.
This could then be used for quite a bunch of use cases, e.g. even for
signaling the hardware fences etc.
Waiting for the last one is then just a "drm_sched_for_each_pending(...)
dma_fence_wait_timeout(pending->finished...);".
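[Editor's note: a sketch of the suggested iterator. The macro name mirrors the hypothetical drm_sched_for_each_pending(); a plain singly linked list stands in for the real pending_list, and none of this is existing DRM API.]

```c
#include <assert.h>
#include <stddef.h>

/* Toy pending-job list plus an iterator macro in the style suggested
 * above. Each visited job exposes the fence a caller could wait on. */
struct toy_job {
	int id;
	struct toy_job *next;
};

struct toy_sched {
	struct toy_job *pending_list;
};

#define toy_sched_for_each_pending(job, sched) \
	for ((job) = (sched)->pending_list; (job) != NULL; (job) = (job)->next)
```

With such an iterator, waiting for the last pending job reduces to looping over each entry and calling something like dma_fence_wait_timeout() on its finished fence, as the message suggests.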
>
>>
>> Also keep in mind that you *can't* wait for all hw operations to
>> finish in your flush or file descriptor close callback or you create
>> un-killable processes.
>
> Right, that's why in Nouveau I try to wait for the channel (ring)
> being idle and if this didn't
> work in a "reasonable" amount of time, I kill the fence context,
> signalling all fences with an
> error code, and wait for the scheduler being idle, which comes down to
> only wait for all free_job()
> callbacks to finish, since all jobs are signaled already.
Exactly, that's the right thing to do. Can we please document that somewhere?
Regards,
Christian.
* Re: [Intel-xe] [PATCH v3 09/13] drm/sched: Submit job before starting TDR
2023-09-14 17:48 ` Matthew Brost
@ 2023-09-21 3:35 ` Luben Tuikov
0 siblings, 0 replies; 53+ messages in thread
From: Luben Tuikov @ 2023-09-21 3:35 UTC (permalink / raw)
To: Matthew Brost
Cc: robdclark, sarah.walker, ketil.johnsen, lina, mcanal, Liviu.Dudau,
dri-devel, intel-xe, boris.brezillon, donald.robson,
christian.koenig, faith.ekstrand
On 2023-09-14 13:48, Matthew Brost wrote:
> On Wed, Sep 13, 2023 at 10:56:10PM -0400, Luben Tuikov wrote:
>> On 2023-09-11 22:16, Matthew Brost wrote:
>>> If the TDR is set to a value, it can fire before a job is submitted in
>>> drm_sched_main. The job should always be submitted before the TDR
>>> fires; fix this ordering.
>>>
>>> v2:
>>> - Add to pending list before run_job, start TDR after (Luben, Boris)
>>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>> drivers/gpu/drm/scheduler/sched_main.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index c627d3e6494a..9dbfab7be2c6 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -498,7 +498,6 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>>
>>> spin_lock(&sched->job_list_lock);
>>> list_add_tail(&s_job->list, &sched->pending_list);
>>> - drm_sched_start_timeout(sched);
>>> spin_unlock(&sched->job_list_lock);
>>> }
>>>
>>> @@ -1234,6 +1233,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
>>> fence = sched->ops->run_job(sched_job);
>>> complete_all(&entity->entity_idle);
>>> drm_sched_fence_scheduled(s_fence, fence);
>>> + drm_sched_start_timeout_unlocked(sched);
>>>
>>> if (!IS_ERR_OR_NULL(fence)) {
>>> /* Drop for original kref_init of the fence */
>>
>> So, sched->ops->run_job(), is a "job inflection point" from the point of view of
>> the DRM scheduler. After that call, DRM has relinquished control of the job to the
>> firmware/hardware.
>>
>> Putting the job in the pending list, before submitting it down to the firmware/hardware,
>> goes along with starting a timeout timer for the job. The timeout always includes
>> time for the firmware/hardware to get it prepped, as well as time for the actual
>> execution of the job (task). Thus, we want to do this:
>> 1. Put the job in pending list. "Pending list" means "pends in hardware".
>> 2. Start a timeout timer for the job.
>> 3. Start executing the job/task. This usually involves giving it to firmware/hardware,
>> i.e. ownership of the job/task changes to another domain. In our case this is accomplished
>> by calling sched->ops->run_job().
>> Perhaps move drm_sched_start_timeout() closer to sched->ops->run_job() from above and/or increase
>> the timeout value?
>
> I disagree. It is a clear race: if the timeout starts before run_job(),
> the TDR can fire before run_job() is called. The entire point of this
Then that would mean that 1) the timeout time is too short, and/or 2) the firmware/hardware
took a really long time to complete the job (from the point of view of the scheduler TDR).
> patch is to seal this race by starting the TDR after run_job() is
> called.
Once you call run_job() you're no longer in control of the job and things can
happen, like this job being returned/cancelled due to reasons out of the scheduler's
control. If you started the timeout _after_ submitting the job to the hardware,
you may be racing with what the hardware might want to do to the job as described
in the previous sentence.
The timeout timer should start before we give away the job to the hardware.
This is not that dissimilar to sending a network packet out the interface.
If you need a longer timeout time, then we can do that, but starting the timeout
after giving away the job to the hardware is a no-go.
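[Editor's note: the ordering argued for above (steps 1-3 in the quoted text) can be pictured as a toy sequence check. All helpers are invented stand-ins for the real scheduler internals.]

```c
#include <assert.h>
#include <string.h>

/* Toy trace of the ordering Luben describes: pending list first,
 * then the timeout, then run_job(). */
static const char *trace[3];
static int ntrace;

static void step(const char *s) { trace[ntrace++] = s; }

static void add_to_pending_list(void) { step("pending"); }
static void start_timeout(void)       { step("timeout"); }
static void run_job(void)             { step("run_job"); }

void toy_job_begin_and_run(void)
{
	/* held under job_list_lock in the real code */
	add_to_pending_list();
	start_timeout();
	/* after this call, ownership moves to firmware/hardware */
	run_job();
}
```

Matthew's patch swaps the last two steps; Luben's objection is that once run_job() returns, the scheduler no longer controls the job, so the timer must already be armed.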
--
Regards,
Luben
^ permalink raw reply [flat|nested] 53+ messages in thread
end of thread, newest: 2023-09-21 3:35 UTC
Thread overview: 53+ messages
2023-09-12 2:16 [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 01/13] drm/sched: Add drm_sched_submit_* helpers Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 02/13] drm/sched: Convert drm scheduler to use a work queue rather than kthread Matthew Brost
2023-09-12 7:29 ` Boris Brezillon
2023-09-12 15:02 ` Matthew Brost
2023-09-14 3:41 ` Luben Tuikov
2023-09-14 3:35 ` Luben Tuikov
2023-09-16 17:07 ` Danilo Krummrich
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity Matthew Brost
2023-09-12 7:37 ` Boris Brezillon
2023-09-12 15:14 ` Matthew Brost
2023-09-12 14:11 ` kernel test robot
2023-09-12 15:17 ` Matthew Brost
2023-09-14 4:18 ` Luben Tuikov
2023-09-14 4:23 ` Luben Tuikov
2023-09-14 15:48 ` Matthew Brost
2023-09-14 15:49 ` Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 04/13] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy Matthew Brost
2023-09-13 12:30 ` kernel test robot
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 05/13] drm/sched: Split free_job into own work item Matthew Brost
2023-09-12 8:08 ` Boris Brezillon
2023-09-12 14:37 ` Matthew Brost
2023-09-12 14:53 ` Boris Brezillon
2023-09-12 14:55 ` Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 06/13] drm/sched: Add generic scheduler message interface Matthew Brost
2023-09-12 8:23 ` Boris Brezillon
2023-09-12 14:50 ` Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 07/13] drm/sched: Add drm_sched_start_timeout_unlocked helper Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 08/13] drm/sched: Start run wq before TDR in drm_sched_start Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 09/13] drm/sched: Submit job before starting TDR Matthew Brost
2023-09-14 2:56 ` Luben Tuikov
2023-09-14 17:48 ` Matthew Brost
2023-09-21 3:35 ` Luben Tuikov
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 10/13] drm/sched: Add helper to set TDR timeout Matthew Brost
2023-09-14 2:38 ` Luben Tuikov
2023-09-14 17:36 ` Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill Matthew Brost
2023-09-12 8:44 ` Boris Brezillon
2023-09-12 9:57 ` Christian König
2023-09-12 14:47 ` Matthew Brost
2023-09-16 17:52 ` Danilo Krummrich
2023-09-18 11:03 ` Christian König
2023-09-18 14:57 ` Danilo Krummrich
2023-09-19 5:55 ` Christian König
2023-09-12 10:28 ` Boris Brezillon
2023-09-12 14:54 ` Matthew Brost
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 12/13] drm/sched/doc: Add Entity teardown documentaion Matthew Brost
2023-09-13 15:04 ` Christian König
2023-09-14 2:06 ` Luben Tuikov
2023-09-16 18:06 ` Danilo Krummrich
2023-09-12 2:16 ` [Intel-xe] [PATCH v3 13/13] drm/sched: Update maintainers of GPU scheduler Matthew Brost
2023-09-12 2:20 ` [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev5) Patchwork
2023-09-14 1:45 ` [Intel-xe] [PATCH v3 00/13] DRM scheduler changes for Xe Luben Tuikov